Hybrid (Software-Hardware) Dynamic Memory Allocator, Slides of Computer Architecture and Organization

These slides present a hybrid dynamic memory allocator that balances hardware complexity with performance by combining the PHK (Poul-Henning Kamp) allocation algorithm used in FreeBSD with Chang’s hardware allocator, which is based on the buddy system. The proposed allocator aims to improve memory usage and execution performance for modern programming languages that rely on dynamic memory allocation.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

dinarr

Hybrid (Software-Hardware) Dynamic Memory Allocator

Outline

• Introduction

• Related Research

• Proposed Hybrid Allocator

• Complexity and Performance Comparison

• Conclusion

• References

Introduction

Current Systems

  • Memory management accounts for 42% of execution time.
  • Research is still needed on
    • Good execution performance
    • Memory locality
  • How to get free chunks of memory?
    • Software Allocator
    • Hardware Allocator
DMM: Dynamic Memory Management

[Figure labels: Pure Software; Low Cost Allocator]

Introduction

  • Software Allocator
    • Different search techniques to organize available chunks of free memory
    • Disadvantage
      • Search could be in the critical path of allocators, causing a major performance bottleneck

  • Hardware Allocator
    • Parallel search
      • Speeds up memory allocation
    • Improves performance
    • Hides execution latency of freeing objects
    • Coalescing of free chunks of memory
    • Disadvantage
      • Potential hardware complexity

Related Research

PHK (Poul-Henning Kamp) Allocator

The two most popular general-purpose open-source allocators:

  1. Doug Lea’s allocator, used in Linux systems
  2. PHK, used in FreeBSD

The difference between them is less than 3% for memory-allocation-intensive benchmarks in SPEC CPU2000.

The PHK allocator was chosen because of its suitability for hardware/software co-design.

FreeBSD (Berkeley Software Distribution) is an advanced operating system for x86-compatible architectures (including Pentium® and Athlon™). It is derived from BSD, the version of UNIX® developed at the University of California, Berkeley, and is developed and maintained by a large team of individuals.

Related Research

PHK (Poul-Henning Kamp) Allocator

  • Page-based allocator
  • Each page can only contain objects of one size
  • For a large object, a sufficient number of pages is allocated
  • For small objects (less than half a page), the object size is padded to the nearest power of 2
  • The allocator keeps a page directory for all allocated pages, and a bitmap of allocation information is created at the beginning of each small-object page
  • While allocating small objects, the PHK allocator performs a linear search on the bitmap to find the first available chunk in that page
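The small-object path described above can be sketched in a few lines of Python. This is an illustrative model only, not PHK's actual code: the names `round_up_pow2` and `SmallObjectPage`, the 4096-byte page, the 16-byte minimum chunk, and the convention that `True` means "free" are all assumptions, and the sketch ignores the space the bitmap itself occupies at the start of the page.

```python
PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def round_up_pow2(size, minimum=16):
    """Pad a small-object request to the next power of two (at least `minimum`)."""
    chunk = minimum
    while chunk < size:
        chunk *= 2
    return chunk


class SmallObjectPage:
    """One page holding chunks of a single size, tracked by a bitmap (True = free)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.bitmap = [True] * (PAGE_SIZE // chunk_size)

    def allocate(self):
        """Linear search for the first free chunk; return its byte offset or None."""
        for index, free in enumerate(self.bitmap):
            if free:
                self.bitmap[index] = False  # mark the chunk as allocated
                return index * self.chunk_size
        return None  # page is full

    def release(self, offset):
        """Mark the chunk at `offset` as free again."""
        self.bitmap[offset // self.chunk_size] = True
```

The `for` loop in `allocate` is the linear search that sits on the allocator's critical path; it is the step a parallel hardware search would replace.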

Related Research

  • Chang’s Hardware Allocator
    • Each leaf node of the OR-tree represents the base size, the smallest unit of memory that can be allocated
    • The leaves of the OR-tree together represent the entire memory
    • The AND-tree has the same number of leaves as the OR-tree
    • The input of the AND-tree is generated from the OR-tree through a complex interconnection network

Or Gates

Related Research

  • Chang’s Hardware Allocator
    • OR-tree
      • Determines whether there is a large enough space for the allocation request
    • AND-tree
      • Finds the beginning address of that memory chunk
    • Flip the bits corresponding to the memory chunk in the bit-vector

Bit-vector
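A rough software model of the three steps (check for space, find the start address, flip the bits) is shown below. It is a sketch under stated assumptions, not Chang's circuit: it assumes a bit value of 0 means "free", that request sizes are powers of two and blocks are aligned as in a buddy system, and it replaces the parallel gate trees with a sequential loop. The function names are hypothetical.

```python
def find_free_block(bits, size):
    """Return the start index of the first aligned free block of `size` units.

    bits[i] == 0 means unit i is free (an assumption of this sketch).
    `size` is assumed to be a power of two that divides len(bits).
    """
    for start in range(0, len(bits), size):
        # The OR over a block is 0 only when every unit in it is free.
        if not any(bits[start:start + size]):
            return start
    return None  # no sufficiently large free block


def allocate(bits, size):
    """Find a free block and flip its bits in the bit-vector to mark it allocated."""
    start = find_free_block(bits, size)
    if start is not None:
        for i in range(start, start + size):
            bits[i] = 1
    return start
```

For an 8-unit memory, `allocate(bits, 2)` followed by `allocate(bits, 4)` returns start addresses 0 and 4, because the second request must skip the partially used first half.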

Related Research

  • The interconnection between the OR-tree and the AND-tree is the most complex part of Chang’s allocator
  • The interconnection has the same critical path delay as the OR/AND-tree
  • The final allocation result is produced by the output of the AND-tree through a set of multiplexers
  • The hardware complexity, in terms of number of gates, is O(n log n), where n is the number of memory chunks

[Figure label: Critical path delay]

Proposed Hybrid Allocator

Problems of hardware-only and software-only allocators

Pure hardware allocators based on the buddy system

  1. Complexity of the hardware increases with the size of the memory managed
  2. Poor object locality

Software allocators

  • Poor execution performance

Proposed Hybrid Allocator

Software portion

Responsible for

  1. Creating page indexes
  2. Allocating large objects (> half a page) without any assistance from hardware
  3. For small objects, locating the bitmap of a page with free memory and issuing a search request to the hardware

Hardware portion

  1. Searches the page index (or bitmap) in parallel to find a free chunk
  2. Marks the bitmap to indicate an allocation
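The division of labour above can be illustrated with a small Python model. Everything here is a hypothetical sketch: `HybridAllocator`, `hardware_search_and_mark`, the 16-byte minimum chunk, and the returned tuples are invented for illustration, and the "hardware" is simulated by an ordinary function.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def hardware_search_and_mark(bitmap):
    """Stand-in for the hardware unit: find a free chunk and mark it allocated."""
    for i, free in enumerate(bitmap):
        if free:
            bitmap[i] = False
            return i
    return None


class HybridAllocator:
    def __init__(self):
        self.pages = {}  # chunk size -> list of page bitmaps (software page index)
        self.large = []  # page counts of large allocations

    def malloc(self, size):
        if size > PAGE_SIZE // 2:
            # Large object: handled entirely in software, in whole pages.
            pages_needed = -(-size // PAGE_SIZE)  # ceiling division
            self.large.append(pages_needed)
            return ("software", pages_needed)
        # Small object: pad to a power of two, then find a page with free memory.
        chunk = 16
        while chunk < size:
            chunk *= 2
        bitmaps = self.pages.setdefault(chunk, [])
        for page_no, bitmap in enumerate(bitmaps):
            if any(bitmap):
                # Software located the page; the search request goes to hardware.
                return ("hardware", chunk, page_no, hardware_search_and_mark(bitmap))
        # No page of this size has room: create a new one (True = free).
        bitmaps.append([True] * (PAGE_SIZE // chunk))
        return ("hardware", chunk, len(bitmaps) - 1,
                hardware_search_and_mark(bitmaps[-1]))
```

In this model a 2048-byte request still takes the hardware path, which matches the 16-byte to 2048-byte range of preselected object sizes; only requests larger than half a page stay in software.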

Proposed Hybrid Allocator

  • The OR-tree determines whether there is a free chunk in a page (similar to Chang’s system)
  • The AND-tree locates the position of the first free chunk in the page (similar to Chang’s system)
  • Because each OR-tree and AND-tree pair is dedicated to one object size, complex interconnections between the OR-tree and the AND-tree are not needed (unlike Chang’s)
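The per-page search can be modelled in software as a reduction tree followed by a root-to-leaf descent. This is only a sketch: it assumes a bit value of 1 means "free" and a bitmap whose length is a power of two, and the descent is a software stand-in for the locating logic rather than a gate-level description of the AND-tree. Both function names are hypothetical.

```python
def build_or_tree(bitmap):
    """Return the levels of an OR-tree; levels[0] is the leaves, levels[-1] the root."""
    levels = [list(bitmap)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent is the OR of its two children.
        levels.append([prev[i] | prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def first_free(bitmap):
    """Return the index of the first free chunk (bit == 1), or None if the page is full."""
    levels = build_or_tree(bitmap)
    if levels[-1][0] == 0:
        return None  # root is 0: no free chunk anywhere in the page
    index = 0
    # Walk from the root towards the leaves, preferring the left child.
    for level in reversed(levels[:-1]):
        index *= 2
        if level[index] == 0:
            index += 1  # left subtree has no free chunk, so take the right one
    return index
```

The descent visits one node per level, so the number of steps grows with log2 of the bitmap length instead of with the length itself, which is the advantage over the linear bitmap search.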

Proposed Hybrid Allocator

  • Bit-flippers use the decoded address and the opcode to determine how to flip a desired bit

Block Diagram of Proposed Hardware Component (For Page Size 4096 bytes and Object Size 16 bytes)
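A bit-flipper of this kind can be approximated as a one-hot decoder plus a per-bit select. The sketch below is an assumption-laden illustration: the opcode names `ALLOC` and `FREE`, their encodings, and the convention that 1 means "free" do not come from the slides.

```python
ALLOC, FREE = 0, 1  # hypothetical opcode encodings


def decode(index, width):
    """One-hot decode a chunk index into `width` select lines."""
    return [1 if i == index else 0 for i in range(width)]


def bit_flipper(bitmap, index, opcode):
    """Return a new bitmap with the selected bit set according to the opcode.

    Assumes 1 = free, so ALLOC clears the bit and FREE sets it.
    """
    select = decode(index, len(bitmap))
    new_value = 0 if opcode == ALLOC else 1
    # Only the selected position changes; all other bits pass through unchanged.
    return [new_value if s else b for s, b in zip(select, bitmap)]
```

For the 16-byte object size in the block diagram, a 4096-byte page has 256 chunks, so the decoder in this model would drive 256 select lines.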

Proposed Hybrid Allocator

  • Overall design of the system with 4096-byte pages
  • For different object sizes, the hardware needed to support the bitmap will be different
  • In our design, the preselected object sizes range from 16 bytes to 2048 bytes, and hardware is included to support pages for these objects
  • A MUX is used to select the hardware unit that will be responsible for supporting objects of a given size
  • The larger the object size, the smaller the amount of hardware needed to support the bitmaps indicating the availability of chunks in that page
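The last point follows from simple arithmetic: a page of 4096 bytes holds 4096 / size chunks, so the bitmap needs that many bits. The short sketch below tabulates this, assuming the preselected sizes are the powers of two from 16 to 2048 bytes; `unit_index` is a hypothetical helper showing how a MUX select value could be derived from the object size.

```python
PAGE_SIZE = 4096  # bytes

# Assumed set of preselected object sizes: powers of two from 16 to 2048 bytes.
SIZES = [16 << k for k in range(8)]


def bitmap_bits(object_size):
    """Number of bitmap bits (one per chunk) for a page of the given object size."""
    return PAGE_SIZE // object_size


def unit_index(object_size):
    """Hypothetical MUX select value: 0 for 16-byte objects up to 7 for 2048-byte ones."""
    return object_size.bit_length() - 5


BITS_PER_SIZE = {size: bitmap_bits(size) for size in SIZES}
```

Under these assumptions a 16-byte page needs a 256-bit bitmap while a 2048-byte page needs only 2 bits, so the search trees for large sizes are correspondingly small.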