TLBs and virtualisation
TLB
Nehalem also features changes in the TLB hierarchy, which go hand in hand with the changes in the cache hierarchy.
Nehalem now has a true two-level TLB hierarchy whose entries can be allocated dynamically between threads. The first-level TLB serves all memory accesses and contains 64 entries for 4 KB pages as well as 32 entries for 2 MB / 4 MB pages, while remaining four-way associative. In addition, Nehalem contains a unified second-level TLB with 512 entries for small pages, which is again four-way associative.
In total, each core thus has 576 entries for small pages, or 2304 for the whole quad-core chip. This number of TLB entries allows 9216 KB to be translated at once, which is more than enough to cover the 8 MB L3 cache Nehalem comes with.
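The arithmetic behind these figures is easy to verify; the short C sketch below reproduces it. The entry counts come from the text above, while the four-core count is an assumption matching the launch parts:

#include <stdio.h>

/* Per-core and per-chip TLB reach for 4 KB pages on Nehalem:
 * 64 first-level + 512 second-level small-page entries per core,
 * four cores assumed for the full chip. */
int main(void)
{
    const int small_page_kb    = 4;
    const int entries_per_core = 64 + 512;   /* 576 small-page entries */
    const int cores            = 4;

    int reach_core_kb = entries_per_core * small_page_kb;          /* 2304 KB */
    int reach_chip_kb = entries_per_core * cores * small_page_kb;  /* 9216 KB */

    printf("TLB reach per core: %d KB\n", reach_core_kb);
    printf("TLB reach per chip: %d KB (vs. 8192 KB L3)\n", reach_chip_kb);
    return 0;
}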
Virtualisation
Nehalem's TLB entries are also tagged with VPIDs (Virtual Processor IDs). Every TLB entry caches the translation of a virtual to a physical address. This translation is specific to a given process and virtual machine. Earlier Intel CPUs had to flush the TLB whenever execution switched between a virtualised guest and the host instance. Intel estimates that the latency of a VM round trip is roughly 40 percent lower than on Merom (65 nm Core 2).
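The following C sketch illustrates why VPID tagging removes the need for flushes. All names and the flat lookup loop are hypothetical; real TLB matching happens in parallel in hardware. The point is only that an entry belonging to another guest simply fails to match instead of having to be invalidated:

#include <stdint.h>
#include <stdbool.h>

struct tlb_entry {
    uint64_t virt_page;   /* virtual page number */
    uint64_t phys_page;   /* physical page number */
    uint16_t vpid;        /* ID of the virtual processor that owns it */
    bool     valid;
};

#define TLB_ENTRIES 512

static struct tlb_entry tlb[TLB_ENTRIES];

/* Hit only if both the page number AND the VPID of the currently
 * running guest match. Stale entries of other guests never match,
 * so no flush is needed on a VM switch. */
bool tlb_lookup(uint64_t virt_page, uint16_t current_vpid, uint64_t *phys_page)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid &&
            tlb[i].virt_page == virt_page &&
            tlb[i].vpid == current_vpid) {
            *phys_page = tlb[i].phys_page;
            return true;
        }
    }
    return false;   /* miss: walk the page tables instead */
}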
A further improvement in virtualisation support are the Extended Page Tables (EPT). They eliminate many VM transitions outright rather than only reducing their latency, as VPIDs do. Earlier Intel designs required the hypervisor to handle guest page faults in order to keep its shadow page tables up to date. With EPT the hardware walks both the guest's page tables and the extended page tables itself, which saves many unnecessary VM exits.
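Conceptually, EPT resolves an address in two chained steps: guest-virtual to guest-physical via the guest's own page tables, then guest-physical to host-physical via the extended page tables. The toy C sketch below stands in for this; the one-level arrays and all names are hypothetical placeholders for the real multi-level walks the MMU performs without any VM exit:

#include <stdint.h>
#include <stdio.h>

#define PAGES 16

static uint64_t guest_page_table[PAGES];  /* guest-virtual  -> guest-physical */
static uint64_t ept[PAGES];               /* guest-physical -> host-physical  */

/* Two translation steps, both done by hardware under EPT:
 * the hypervisor no longer has to trap guest page faults. */
static uint64_t translate(uint64_t guest_virt_page)
{
    uint64_t guest_phys_page = guest_page_table[guest_virt_page % PAGES];
    uint64_t host_phys_page  = ept[guest_phys_page % PAGES];
    return host_phys_page;
}

int main(void)
{
    /* Fill the toy tables with arbitrary mappings. */
    for (int i = 0; i < PAGES; i++) {
        guest_page_table[i] = (i + 3) % PAGES;
        ept[i]              = (i + 7) % PAGES;
    }
    printf("guest virtual page 5 -> host physical page %llu\n",
           (unsigned long long)translate(5));
    return 0;
}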