Hybrid (Software-Hardware) Dynamic Memory Allocator
Outline
• Introduction
• Related Research
• Proposed Hybrid Allocator
• Complexity and Performance Comparison
• Conclusion
• References
Introduction
Current Systems
- Execution time spent on Memory Management is 42%.
- Still an important research area, focusing on
- Good execution performance
- Memory locality
- How to get free chunks of memory?
- Software Allocator
- Hardware Allocator
DMM (Dynamic Memory Management)
Pure Software
Low Cost Allocator
Introduction
- Software Allocator
- Different search techniques to organize available chunks of free memory
- Disadvantage
- Search could be in the critical path of allocators, causing a major performance bottleneck.
Allocation
- Improve Performance
- Hide execution latency of freeing objects
- Coalescing of free chunks of memory
Complexity
Related Research
PHK (Poul-Henning Kamp) Allocator
The two most popular general-purpose open-source allocators:
1. Doug Lea's allocator, used in Linux systems
2. PHK, used in FreeBSD systems
The difference between them is less than 3% for memory-allocation-intensive benchmarks in SPEC CPU2000.
The PHK allocator was chosen because of its suitability for hardware/software co-design.
FreeBSD (Berkeley Software Distribution) is an advanced operating system for x86-compatible architectures (including Pentium® and Athlon™). It is derived from BSD, the version of UNIX® developed at the University of California, Berkeley, and is developed and maintained by a large team of individuals.
Related Research
PHK (Poul-Henning Kamp) Allocator
- Page based allocator
- Each page can only contain objects of one size
- For a large object, a sufficient number of pages is allocated
- For small objects (less than half a page), the object size is padded to the nearest power of 2
- The allocator keeps a page directory for all allocated pages, and a bitmap of allocation information is created at the beginning of each small-object page
- While allocating small objects, the PHK allocator performs a linear search on the bitmap to find the first available chunk in that page
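The small-object path described above can be sketched in a few lines. This is only an illustrative model, not PHK's actual code: the 4096-byte page, the 16-byte minimum size, the function names, and the convention that a bitmap bit of 1 marks a free chunk are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def size_class(request, min_size=16):
    """Pad a small request up to a power of two (assumed minimum: 16 bytes)."""
    size = min_size
    while size < request:
        size *= 2
    return size


def first_free_chunk(bitmap):
    """Linear search for the first free chunk; returns -1 if the page is full."""
    for index, bit in enumerate(bitmap):
        if bit == 1:  # assumption: 1 marks a free chunk
            return index
    return -1


def allocate_small(bitmap, request):
    """Return the byte offset of the allocated chunk within its page, or -1."""
    chunk = size_class(request)
    index = first_free_chunk(bitmap)
    if index < 0:
        return -1
    bitmap[index] = 0  # mark the chunk as allocated
    return index * chunk
```

The loop in `first_free_chunk` is the linear search that sits on the allocation path; its cost grows with the number of chunks in the page.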
Related Research
- Chang’s Hardware Allocator
- Each leaf node of the OR-tree represents the smallest unit of memory that can be allocated
- The leaves of OR-tree together represent the entire memory
- AND-tree has the same number of leaves as the OR-tree
- Input of the AND-tree is generated by a complex interconnection network of the OR-tree
Or Gates
Related Research
- Chang’s Hardware Allocator
- OR-tree
- Determine if there is a large enough space for the allocation request
- AND-tree
- Find the beginning address of that memory chunk
- Flip the bits corresponding to the memory chunk in the bit-vector
Bit-vector
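A rough software model of the OR-tree check is given below. It is a sketch under several assumptions that the slides do not state: a bit of 1 marks an allocated unit, blocks are aligned powers of two as in a buddy system, and the bit-vector length is a power of two. The address lookup is a plain loop standing in for the AND-tree, whose wiring is not detailed here. All function names are hypothetical.

```python
def or_tree_levels(bits):
    """Build OR-tree levels bottom-up; levels[k][i] ORs 2**k adjacent leaves."""
    levels = [list(bits)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] | prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def allocate_block(bits, k):
    """Allocate an aligned block of 2**k units; return its first unit or -1."""
    levels = or_tree_levels(bits)
    if k >= len(levels):
        return -1  # request is larger than the whole memory
    for i, node in enumerate(levels[k]):
        if node == 0:  # no allocated unit below this node: the block is free
            start = i << k
            for j in range(start, start + (1 << k)):
                bits[j] = 1  # flip the bits covering the block
            return start
    return -1
```

For an 8-unit memory, `allocate_block(bits, 1)` on an all-zero vector returns unit 0 and sets the first two bits.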
Related Research
- The interconnection between the OR-tree and the AND-tree is the most complex part of Chang’s allocator
- The interconnection has the same critical path delay as the OR/AND-tree
- The final allocation result is produced by the output of the AND-tree through a set of multiplexers
- The hardware complexity, in terms of number of gates, is O(n log n), where n is the number of memory chunks
Critical path delay
Proposed Hybrid Allocator
Problems of hardware-only and software-only allocators
- Pure hardware allocators based on the buddy system
1. Complexity of the hardware increases with the size of the memory managed
2. Poor object locality
- Software allocators
1. Poor execution performance
Proposed Hybrid Allocator
Software portion
Responsible for
- Creating page indexes
- Allocating large objects (> half a page) without any assistance from hardware
- For a small object, locating the bitmap of a page with free memory and issuing a search request to the hardware
Hardware portion
- Search the page index (or bitmap) in parallel to find a free chunk
- Mark the bitmap to indicate an allocation
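The division of labour above can be illustrated with a small software model. It is a sketch only: `hw_search_and_mark` is a hypothetical stand-in for the hardware unit (real hardware would search in parallel), and the class name, return tuples, 16-byte minimum size, and the convention that 1 marks a free chunk are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def hw_search_and_mark(bitmap):
    """Stand-in for the hardware unit: find the first free chunk and mark it."""
    for i, bit in enumerate(bitmap):  # real hardware would search in parallel
        if bit == 1:                  # assumption: 1 marks a free chunk
            bitmap[i] = 0
            return i
    return -1


class HybridAllocator:
    def __init__(self):
        self.pages = {}  # hypothetical page index: chunk size -> list of bitmaps
        self.large = []  # page counts handed out for large objects

    def malloc(self, size):
        if size > PAGE_SIZE // 2:
            # Large object: software alone allocates whole pages.
            npages = -(-size // PAGE_SIZE)  # ceiling division
            self.large.append(npages)
            return ("large", npages)
        chunk = 16
        while chunk < size:
            chunk *= 2
        bitmaps = self.pages.setdefault(chunk, [])
        for page_no, bitmap in enumerate(bitmaps):
            if any(bitmap):  # software locates a page with free memory
                return ("small", chunk, page_no, hw_search_and_mark(bitmap))
        bitmaps.append([1] * (PAGE_SIZE // chunk))  # software creates a new page
        return ("small", chunk, len(bitmaps) - 1, hw_search_and_mark(bitmaps[-1]))
```

Only the call to `hw_search_and_mark` would move into hardware; page creation and the large-object path stay in software.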
Proposed Hybrid Allocator
- OR-tree responsible for determining if there is a free chunk in a page (similar to Chang’s system)
- AND-tree will locate the position of the first free chunk in the page (similar to Chang’s system)
- Because an OR-tree and an AND-tree are dedicated to one object size, complex interconnections between the OR-tree and the AND-tree are not needed (unlike Chang’s)
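A functional sketch of the two per-page trees follows. It reproduces only their results (whether a free chunk exists, and the position of the first one), not their gate-level structure. The convention that 1 marks a free chunk, the power-of-two bitmap length, and the function names are assumptions.

```python
def or_levels(bitmap):
    """OR-reduce the bitmap level by level; the last level holds the root."""
    levels = [list(bitmap)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] | prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def has_free_chunk(bitmap):
    """Root of the OR-tree: 1 if any chunk in the page is free."""
    return or_levels(bitmap)[-1][0] == 1


def first_free_position(bitmap):
    """Walk down from the root, preferring the left child, to the first free bit."""
    levels = or_levels(bitmap)
    if levels[-1][0] == 0:
        return -1
    pos = 0
    for level in reversed(levels[:-1]):
        pos *= 2             # move to the left child
        if level[pos] == 0:  # left subtree has no free chunk
            pos += 1         # so the right subtree must have one
    return pos
```

For a 4096-byte page of 16-byte objects the bitmap has 256 bits, so the descent takes 8 steps instead of a scan over up to 256 entries.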
Proposed Hybrid Allocator
- Bit-flippers use the decoded address and the opcode to determine how to flip a desired bit
Block Diagram of Proposed Hardware Component (For Page Size 4096 bytes and Object Size 16 bytes)
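The bit-flipper's behaviour can be modelled as a mask operation on the bitmap. This is a minimal sketch: the opcode values, the function name, and the convention that 1 marks a free chunk are assumptions, and the bitmap is held in a single Python integer for convenience.

```python
ALLOC, FREE = 0, 1  # hypothetical opcodes


def bit_flip(bitmap_word, position, opcode):
    """Return the bitmap with the bit at `position` updated for the opcode."""
    mask = 1 << position  # decoded (one-hot) address
    if opcode == ALLOC:
        return bitmap_word & ~mask  # clear the bit: chunk now in use
    if opcode == FREE:
        return bitmap_word | mask   # set the bit: chunk free again
    raise ValueError("unknown opcode")
```

With a 256-bit bitmap (4096-byte page, 16-byte objects), `bit_flip((1 << 256) - 1, 5, ALLOC)` clears bit 5 and leaves the other 255 bits set.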
Proposed Hybrid Allocator
- Overall design of the system with 4096-byte pages
- For different object sizes, the hardware needed to support the bit-map will be different
- In our design, the preselected object sizes range from 16 bytes to 2048 bytes, and hardware is included to support pages for each of these sizes
- MUX is used to select the hardware unit that will be responsible for supporting objects of a given size
- The larger the object size, the smaller the amount of hardware needed to support the bit-maps indicating the availability of chunks in that page
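The relationship between object size and bit-map width is simple arithmetic, shown here as a sketch. It assumes one bit per chunk and power-of-two object sizes between 16 and 2048 bytes; the names are illustrative.

```python
PAGE_SIZE = 4096  # page size in bytes, as in the design above


def bitmap_bits(object_size, page_size=PAGE_SIZE):
    """Number of chunks per page, i.e. bits the hardware must search."""
    return page_size // object_size


sizes = [16 << i for i in range(8)]  # 16, 32, ..., 2048 (assumed powers of two)
table = {size: bitmap_bits(size) for size in sizes}
```

Under these assumptions, a page of 16-byte objects needs a 256-bit bit-map, while a page of 2048-byte objects needs only 2 bits, which is why the hardware shrinks as the object size grows.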