Almost fourteen months ago, I started working on my bachelor thesis. Although I finished it half a year ago, it’s still part of my work as a student research assistant.
During my initial work, most of the code was written for an internal research kernel. I’m now happy that we were able to port it to an open source kernel called eduOS. This minimal operating system is used for practical demos and assignments during the OS course at my university. There’s much more I could write about it, so that will probably become a separate blog post.
The motive for this article is an abstract I wrote for the student research competition of the ASPLOS conference, which is held this year in Istanbul, Turkey. Unfortunately, my submission got rejected. But as a nice side effect, I now have the chance to present my work to an English audience as well:
Self-referencing Page Tables for the x86-Architecture
A simple Paging Implementation for a minimalistic Operating System
Academic advisor: Dr. rer. nat. Stefan Lankes
Institute for Automation of Complex Power Systems
E.ON Energy Research Center, RWTH Aachen University
Mathieustr. 10, 52074 Aachen, Germany
This was a submission for the ASPLOS Student Research Competition ’15 in Istanbul, Turkey¹
The adoption of 64 bit architectures went along with an extension of the virtual address space (VAS). To cope with this growth, the memory management unit (MMU) had to be extended as well. For paging-based systems like Intel’s x86-architecture this was realized by adding more levels of indirection to the page table walk.
This walk translates virtual pages to physical page frames (PF) by performing look-ups in a radix (prefix) tree in which every node represents a page table (Figure 1a). Since the tables are part of the translation process, they must be referenced by physical page frame numbers (PFN, blue line). Because the operating system can only access memory through the VAS, it cannot follow the path of a walk itself. To be able to manipulate page tables, it must provide:
- Access to the table entries, by mapping the tables themselves to the VAS.
- A mapping from physical references to the corresponding locations in the VAS.
Additionally, every level of the page table walk increases the complexity of managing these mappings. The mappings also increase memory consumption because they occupy physical page frames. Both drawbacks can be avoided with the technique described in the following.
In my bachelor thesis, I presented an approach that is compatible with both the 32 bit and the 64 bit version of Intel’s x86-architecture. This allows two code bases, one for each architecture, to be replaced by a single one supporting both, which results in shorter, more comprehensible, and more maintainable code. Our teaching OS called “eduOS” was used as the foundation for this implementation². “eduOS” supports only the 32 bit protected mode, whereas the 64 bit long mode is only implemented for an internal research kernel.
Thanks to the sophisticated design of Intel’s x86 MMU, it is possible to avoid most of the complexity and space requirements by using a little trick. Adding a self-reference to the root table (PML4 or PGD, respectively) automatically enables access to all page tables from the VAS without the need for the manual mappings described above (Figure 1b). The operating system does not need to follow the path of a page table walk manually, because the MMU performs this task itself and ends up at individual tables instead of page frames.
An access to the VAS region covered by a self-reference causes the MMU to look up the root table twice (red line). Effectively, this shifts the whole page table walk by one level. The walk therefore ends at the PFN of a page table instead of the page frame that the MMU would usually translate to. Here, both the PML4 and PDPT indexes are used to choose an entry out of the PML4 table. Therefore, it must be guaranteed that PML4 entries can be interpreted as PDPT entries, too. This leads to the following requirements:
- Homogeneous encoding of paging flags across all paging levels.
- Equal table sizes across all paging levels.
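The shifted walk can be illustrated with a small software model of the MMU. This is only a sketch under simplifying assumptions: "physical memory" is a static array of frames, only the present bit is modelled, and all names (`phys`, `walk`, `build_example`) are hypothetical and not taken from eduOS.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES   512u                  /* entries per table on x86-64 */
#define FRAMES    8u                    /* size of the simulated physical memory */
#define PRESENT   0x1ull                /* bit 0: entry is valid */
#define ADDR_MASK 0x000FFFFFFFFFF000ull /* bits 12..51: physical frame address */

/* Simulated physical memory: every frame holds one table of 512 entries. */
static uint64_t phys[FRAMES][ENTRIES];

/* Build a present entry that points at physical frame number pfn. */
uint64_t make_entry(uint64_t pfn) { return (pfn << 12) | PRESENT; }

/* Software model of the 4-level walk (PML4, PDPT, PGD, PGT).
 * Returns the frame number reached after four look-ups,
 * or UINT64_MAX if an entry on the way is not present. */
uint64_t walk(uint64_t root_pfn, uint64_t vaddr)
{
    uint64_t pfn = root_pfn;
    for (int level = 3; level >= 0; level--) {
        unsigned idx = (vaddr >> (12 + 9 * level)) & (ENTRIES - 1);
        uint64_t entry = phys[pfn][idx];
        if (!(entry & PRESENT))
            return UINT64_MAX;
        pfn = (entry & ADDR_MASK) >> 12;
        assert(pfn < FRAMES);
    }
    return pfn;
}

/* Frames 0..3 act as PML4, PDPT, PGD and PGT; one page is mapped to
 * frame 4. The self-reference sits in the last PML4 slot (assumption,
 * matching the choice of the 512th entry in the text). */
uint64_t build_example(void)
{
    phys[0][0]   = make_entry(1);  /* PML4[0]   -> PDPT       */
    phys[1][0]   = make_entry(2);  /* PDPT[0]   -> PGD        */
    phys[2][0]   = make_entry(3);  /* PGD[0]    -> PGT        */
    phys[3][5]   = make_entry(4);  /* PGT[5]    -> data frame */
    phys[0][511] = make_entry(0);  /* PML4[511] -> PML4 itself */
    return 0;                      /* frame number of the root table */
}
```

In this model, `walk(root, 0x5000)` ends at the data frame 4, while `walk(root, 0xFFFFFF8000000000)` passes through the self-reference once and ends at frame 3, the PGT. Using the self-reference four times (`0xFFFFFFFFFFFFF000`) ends at frame 0, the PML4 itself.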
Fortunately, the x86-architecture complies with these prerequisites, as shown in Figure 2. Flags colored green are encoded consistently across all paging levels. Only the PAT, size, and global flags have a slightly different meaning for entries in the PGT. My bachelor thesis shows that these deviations still allow full control over the caching and memory protection properties of self-mapped tables. This includes support for common system calls like fork() and kill().
By repeatedly addressing the self-reference, it is also possible to access tables of the upper levels (PGD to PML4). Table 1 shows the resulting virtual addresses of all page tables when using the last (512th) entry of the PML4 table for the self-reference³. This grants access to all possible page tables, including those which do not yet exist and may be allocated in the future. Hence, the self-reference reserves a fixed fraction of the VAS for the page tables. The size of this region is equal to 256 TiB / 512 = 512 GiB for 64 bit (or 4 GiB / 1024 = 4 MiB for 32 bit), which is negligible in comparison to the huge VAS of 2^48 bytes.
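The address arithmetic behind these numbers can be sketched in a few lines of C. The sketch assumes 4 KiB pages, 48 implemented address bits, and the self-reference in the last PML4 slot (index 511), so that sign extension makes all upper bits one. The function names are hypothetical; the level numbering (0 = PGT up to 3 = PML4) is chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS  12   /* 4 KiB pages */
#define INDEX_BITS 9    /* 512 entries per table */
#define VA_BITS    48   /* implemented virtual address bits */

/* Base virtual address under which all tables of one level appear.
 * level: 0 = PGT, 1 = PGD, 2 = PDPT, 3 = PML4.
 * Only valid for a self-reference in PML4 slot 511. */
uint64_t table_base(unsigned level)
{
    assert(level <= 3);
    return ~0ull << (VA_BITS - INDEX_BITS * (level + 1));
}

/* Virtual address of the entry on the given level that takes part
 * in the translation of vaddr. */
uint64_t entry_address(uint64_t vaddr, unsigned level)
{
    uint64_t low48 = vaddr & ((1ull << VA_BITS) - 1);
    uint64_t index = low48 >> (PAGE_BITS + INDEX_BITS * level);
    return table_base(level) | (index << 3);   /* 8 bytes per entry */
}

/* Size of the VAS region reserved by one self-reference (one PML4 slot). */
uint64_t self_map_size(void)
{
    return (1ull << VA_BITS) / 512;
}
```

Under these assumptions, `table_base` yields `0xFFFFFF8000000000` for the PGTs, `0xFFFFFFFFC0000000` for the PGDs, `0xFFFFFFFFFFE00000` for the PDPTs and `0xFFFFFFFFFFFFF000` for the PML4, and `self_map_size()` returns 512 GiB. For example, the PGT entry that maps the virtual address `0x5000` is found at `entry_address(0x5000, 0)`, which is `0xFFFFFF8000000028`.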
For the manipulation of page table entries, two approaches are possible:
- Top-down: Use known tree traversals, starting at the root node, which corresponds to the PML4 or PGD, respectively.
- Bottom-up: Use the page fault handler to create new tables on the fly when they are not yet present.
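The top-down variant can be sketched against a simulated MMU. This is an illustration under stated assumptions, not the eduOS implementation: physical memory is a static array, the frame allocator is a simple counter, only the present bit is modelled, and every name (`translate`, `entry_address`, `map_page`, `init_root`) is hypothetical. The point of the sketch is that `map_page` reaches every table exclusively through self-mapped virtual addresses and never follows a physical reference itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES   512u
#define FRAMES    16u
#define PRESENT   0x1ull
#define ADDR_MASK 0x000FFFFFFFFFF000ull

static uint64_t phys[FRAMES][ENTRIES]; /* simulated physical memory */
static uint64_t next_free = 1;         /* trivial allocator; frame 0 is the root */

uint64_t make_entry(uint64_t pfn) { return (pfn << 12) | PRESENT; }

/* Simulated MMU: translate vaddr into a pointer into phys,
 * or return NULL where real hardware would raise a page fault. */
uint64_t *translate(uint64_t vaddr)
{
    uint64_t pfn = 0;                  /* the walk starts at the root table */
    for (int level = 3; level >= 0; level--) {
        uint64_t entry = phys[pfn][(vaddr >> (12 + 9 * level)) & 511];
        if (!(entry & PRESENT))
            return NULL;
        pfn = (entry & ADDR_MASK) >> 12;
        assert(pfn < FRAMES);
    }
    return &phys[pfn][(vaddr & 0xFFF) / 8];
}

/* Virtual address of the entry on `level` (0 = PGT .. 3 = PML4) that
 * covers vaddr, assuming the self-reference is in PML4 slot 511. */
uint64_t entry_address(uint64_t vaddr, unsigned level)
{
    uint64_t index = (vaddr & 0xFFFFFFFFFFFFull) >> (12 + 9 * level);
    return (~0ull << (39 - 9 * level)) | (index << 3);
}

/* Clear the simulated memory and install the self-reference. */
void init_root(void)
{
    memset(phys, 0, sizeof phys);
    next_free = 1;
    phys[0][511] = make_entry(0);
}

/* Top-down: visit the PML4, PDPT and PGD entries in that order, create
 * missing tables, then write the final PGT entry.
 * Returns 0 on success, -1 if the simulated memory is exhausted. */
int map_page(uint64_t vaddr, uint64_t pfn)
{
    for (int level = 3; level >= 1; level--) {
        uint64_t *entry = translate(entry_address(vaddr, (unsigned)level));
        assert(entry != NULL);         /* the parent table was ensured before */
        if (!(*entry & PRESENT)) {
            if (next_free >= FRAMES)
                return -1;
            *entry = make_entry(next_free++);  /* frames start out zeroed */
        }
    }
    *translate(entry_address(vaddr, 0)) = make_entry(pfn);
    return 0;
}
```

After `init_root()`, a call such as `map_page(0x5000, 9)` creates three new tables and makes `translate(0x5000)` point into frame 9. A second mapping in the same 2 MiB range reuses the existing tables. The bottom-up variant would instead write the PGT entry directly and leave the creation of missing tables to the page fault handler; it is not shown here.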
Other architectures also satisfy the prerequisites described above. One of these is the Alpha⁴ architecture, whose reference manual suggests a similar approach. Intel and AMD do not mention the technique in their x86 manuals. In the field of operating systems, support is far more limited. There is only a single reference⁵, dated 2010, indicating that Microsoft might use a similar approach for its NT kernel. Linux cannot benefit because its paging implementation must support a broad selection of virtual memory architectures, not all of which fulfill the requirements mentioned above.