Memory Organization and Virtual Memory Management: Concurrency and Caching - Prof. Sudhaka, Study notes of Computer Architecture and Organization

Georgia Institute of Technology - Main Campus Computer Architecture and Organization

Prof. Sudhakar Yalamanchili

Covers various memory organization techniques for cache and main memory, focusing on concurrency and virtual memory management. Topics include sequential and concurrent bank operations, data skewing, virtual memory, and caching from disk. The document also covers management policies, address translation concepts, and implementation using page tables and translation lookaside buffers (TLBs).

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009


Module: Main Memory Organization and Operation

ECE 4100/6100 (2)

Reading for this Module

•Main memory organizations

–Section 5.8

•Memory Technology

–Section 5.9

•Virtual memory

–Section 5.10

•Examples

–Paged virtual memory: Section 5.11

–Memory hierarchies: Section 5.13


© Sudhakar Yalamanchili, Georgia Institute of Technology


Main and Secondary Memory Organization

[Figure: memory organizations ranging from a single chip (single bank, single bit), to interleaved memory banks (mem mem mem mem), to interleaved disks]

Each level is characterized by distinct technology and architecture

Main Memory Storage Technologies

DRAM: “Dynamic” Random Access Memory
- Highest densities
- Optimized for cost/bit → main memory
SRAM: “Static” Random Access Memory
- Densities 1/4 to 1/8 of DRAM
- Speeds 8-16x faster than DRAM
- Cost 8-16x more per bit
- Optimized for speed → caches

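Generic Memory Organization

[Figure: memory array with RAS and CAS address inputs, sense amplifiers, and a buffer]
A memory cell stores a bit of information: DRAM or SRAM cell
The row address causes a row of bits to be read into the buffer
The column address selects a bit from the buffer
CS – Chip Select, RW – Read/Write
Row and column addresses are multiplexed on the same pins, so the chip uses half the address pins
- Row Address Strobe (RAS) latches the row address; Column Address Strobe (CAS) latches the column address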
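The address multiplexing above can be sketched as follows. The array geometry is a hypothetical 1024 rows × 1024 columns, and the names `dram_row`, `dram_col` and `multiplexed_pins` are illustrative, not taken from the slide. The upper half of the cell address is the row, latched by RAS. The lower half is the column, latched by CAS. As a result, 10 pins carry a 20-bit cell address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DRAM array: 2^ROW_BITS rows x 2^COL_BITS columns.
 * These sizes are illustrative assumptions, not a real part. */
#define ROW_BITS 10u
#define COL_BITS 10u

/* Row half of the cell address: driven on the shared address pins
 * and latched when RAS is asserted; reads a whole row into the buffer. */
static uint32_t dram_row(uint32_t cell_addr) {
    assert(cell_addr < (1u << (ROW_BITS + COL_BITS)));
    return cell_addr >> COL_BITS;
}

/* Column half: driven on the same pins afterwards and latched by CAS;
 * selects one bit from the row buffer. */
static uint32_t dram_col(uint32_t cell_addr) {
    assert(cell_addr < (1u << (ROW_BITS + COL_BITS)));
    return cell_addr & ((1u << COL_BITS) - 1u);
}

/* Address pins needed when row and column share the same pins. */
static unsigned multiplexed_pins(void) {
    return ROW_BITS > COL_BITS ? ROW_BITS : COL_BITS;
}
```

For example, cell 74565 is row 72, column 837 (72 × 1024 + 837), and both halves travel over the same 10 pins.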

DRAM Optimizations

Faster to read data from the same row
- Called “page mode” (fast page mode and EDO are variations)
- Multiple CAS accesses for a single RAS
Bandwidth determined by cycle time
- Example row: 100 ns+
- Example column: 30 ns+, usually more like 50 ns due to external components
Add a clock to the interface → synchronous DRAMs
- Enable split transactions
Use both edges of the clock → double data rate
In practice
- There are multiple banks on chip
- Arrays are 1-4 Mbits
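The example cycle times above show why page mode helps. The sketch below rests on a simplified timing model that is an assumption, not something the slide states. An access that opens a new row pays the full row cycle. Each further page-mode access to the already-open row pays only the column cycle.

```c
#include <assert.h>

/* Time (ns) to read `words` words that all lie in the same DRAM row.
 * Assumed model: the first access pays the full row cycle t_row_ns,
 * each further page-mode (CAS-only) access pays t_col_ns. */
static unsigned page_mode_ns(unsigned words, unsigned t_row_ns, unsigned t_col_ns) {
    assert(words > 0);
    return t_row_ns + (words - 1) * t_col_ns;
}

/* Same words, but every access opens its row again (no page mode). */
static unsigned random_access_ns(unsigned words, unsigned t_row_ns) {
    return words * t_row_ns;
}
```

With the slide's example numbers, `page_mode_ns(4, 100, 30)` gives 190 ns and `random_access_ns(4, 100)` gives 400 ns. With the more realistic 50 ns column cycle, page mode takes 250 ns.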


Conventional SDRAM Memory System

Define a memory bank as an independently addressable and controllable entity
- Typically 2-4 banks per chip
- Different topologies for each set of signals between the memory controller and the memory chips
(S)DRAM-based designs are typically constructed with chips contributing part of a word
- Maximum number of outstanding requests limited by concurrency in the chip
- Internal “banks” used for performance optimizations
  - Speed via concurrency
  - Power

[Figure: memory controller driving ADDR/CLK and RAS/CAS/CS to all chips; each chip supplies an 8-bit word slice of a 32-bit word on DATA]
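The word-slice organization in the figure can be sketched as follows. Four 8-bit-wide chips each hold one byte of every 32-bit word, and all of them see the same address and control signals. The byte-to-chip assignment (chip 0 holds the least significant byte) is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CHIPS 4u          /* 32-bit word / 8 bits per chip */
#define BITS_PER_CHIP 8u

/* Byte of `word` stored by chip `chip` (assumed ordering: chip 0 = LSB).
 * Every chip is addressed identically; each contributes its slice. */
static uint8_t chip_slice(uint32_t word, unsigned chip) {
    assert(chip < CHIPS);
    return (uint8_t)(word >> (chip * BITS_PER_CHIP));
}

/* Memory controller side: reassemble the word from the four slices. */
static uint32_t assemble_word(const uint8_t slices[CHIPS]) {
    uint32_t word = 0;
    for (unsigned chip = 0; chip < CHIPS; chip++)
        word |= (uint32_t)slices[chip] << (chip * BITS_PER_CHIP);
    return word;
}
```

Because every chip supplies part of every word, one access keeps all four chips busy. This is why the number of outstanding requests is limited by concurrency inside each chip.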

Rambus: Direct RDRAM

Designed to improve bandwidth to memory: reinvent the DRAM interface
- Split transaction bus
- Tailored signal traces: the Rambus Signal Layer
- Short bus length for high speed
- 2 bytes @ 800 MHz (1.6 GB/s per bus)
Each chip operates a set of 16 memory banks rather than a slice
- Actually 2 banks where pairs share the sense amplifiers
- Performs its own refresh
- Block transfer, streaming, arbitrary sequencing

[Figure: ADDR/CLK and DATA channel carrying 16-bit packets; only one device responds to a request]
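The 1.6 GB/s figure is bytes per transfer times transfer rate. The sketch below shows that arithmetic and the resulting raw streaming time for a hypothetical 64-byte block. It ignores command packets and access latency, which a real channel also pays, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bandwidth in MB/s (10^6 bytes/s): bytes per transfer times
 * millions of transfers per second. */
static uint64_t peak_mb_per_s(uint64_t bytes_per_transfer, uint64_t mtransfers_per_s) {
    return bytes_per_transfer * mtransfers_per_s;
}

/* Picoseconds to stream `bytes`, counting data transfers only
 * (assumption: no command/packet overhead, no access latency). */
static uint64_t stream_ps(uint64_t bytes, uint64_t bytes_per_transfer,
                          uint64_t mtransfers_per_s) {
    assert(bytes_per_transfer > 0 && mtransfers_per_s > 0);
    uint64_t transfers = (bytes + bytes_per_transfer - 1) / bytes_per_transfer;
    return transfers * 1000000u / mtransfers_per_s;  /* one transfer = 10^6/MT ps */
}
```

`peak_mb_per_s(2, 800)` is 1600 MB/s. A 64-byte block needs 32 two-byte transfers, which takes 40 ns of data time at 800 MHz.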


Main Memory Organizations

The processor-memory bus may have a width of one or more memory words
Multiple memory banks can operate in parallel
- Transfer from memory to the cache is subject to the width of the processor-memory bus
Wide memory comes with constraints on expansion
- Use of error correcting codes requires the complete “width” to be read to recompute the codes on writes
- Minimum expansion unit size is increased

[Figure: three organizations, each with registers, ALU, and cache: (a) one-word-wide bus to a single memory, (b) wide bus to wide memory (Mem Mem Mem Mem), (c) one-word-wide bus to four memory banks (Mem Mem Mem Mem)]
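The first two organizations in the figure differ in how many bus-width chunks a cache line needs. The sketch below uses hypothetical timings, not taken from the slides: 1 cycle to send the address, 6 cycles of memory access, and 1 cycle per bus transfer. The cache line is assumed to be 4 words.

```c
#include <assert.h>

/* Miss penalty in bus cycles when bus and memory are `width_words` wide
 * (width 1 = the simple one-word-wide organization). Each bus-width
 * chunk of the line pays address, access and transfer time in turn. */
static unsigned miss_penalty_wide(unsigned line_words, unsigned width_words,
                                  unsigned t_addr, unsigned t_mem, unsigned t_xfer) {
    assert(width_words > 0 && line_words % width_words == 0);
    return (line_words / width_words) * (t_addr + t_mem + t_xfer);
}
```

With those timings, a one-word-wide memory costs 32 cycles per 4-word line, a 2-word-wide one costs 16, and a 4-word-wide one costs 8. The gain is paid for with the expansion and ECC constraints listed above.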

Word Level Interleaved Memory

Memory is organized into multiple, concurrent banks
Word-level interleaving across banks
A single address generates multiple, concurrent accesses
Well matched to cache line access patterns
Assuming a word-wide bus, the cache miss penalty is T_address + T_mem_access + #words × T_transfer cycles

[Figure: timeline of overlapped memory accesses (memory access 1, memory access 2), separating the memory module access time from the time to read its output; consecutive words (0, 1, 2, ...) are interleaved across Bank 0 to Bank 3]
Note the effect of a split transaction vs. a locked bus
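The miss-penalty formula above can be written as a small function, next to the cost of fetching the same line from a single bank. The sketch assumes at least as many banks as words in the line, so all bank accesses overlap. The timings in the usage note are hypothetical.

```c
#include <assert.h>

/* Slide's formula for word-interleaved memory on a word-wide bus:
 *   T_address + T_mem_access + #words * T_transfer
 * Assumes >= line_words banks so all accesses overlap. */
static unsigned miss_penalty_interleaved(unsigned line_words, unsigned t_addr,
                                         unsigned t_mem, unsigned t_xfer) {
    return t_addr + t_mem + line_words * t_xfer;
}

/* Same line from one bank: every word pays all three terms. */
static unsigned miss_penalty_single_bank(unsigned line_words, unsigned t_addr,
                                         unsigned t_mem, unsigned t_xfer) {
    return line_words * (t_addr + t_mem + t_xfer);
}
```

With 1 address cycle, 6 access cycles and 1 transfer cycle, a 4-word line costs 11 cycles interleaved versus 32 from a single bank. Only the transfers remain serialized.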


Sequential Bank Operation

[Figure: the address is split into n-m higher-order bits and m lower-order bits; accesses (access 1, word 1) proceed through module 0, module 1, ... in sequence]
Implement using DRAM arrays with page mode
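The address split in the figure can be sketched under the usual low-order interleaving convention, which is an assumption here. The m lower-order bits select one of 2^m modules. The n-m higher-order bits give the word address inside the module. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low-order interleaving across 2^m modules (assumed convention). */

/* m lower-order bits: which module holds the word. */
static uint32_t module_of(uint32_t word_addr, unsigned m) {
    assert(m < 32);
    return word_addr & ((1u << m) - 1u);
}

/* n-m higher-order bits: word address inside that module. */
static uint32_t word_in_module(uint32_t word_addr, unsigned m) {
    assert(m < 32);
    return word_addr >> m;
}
```

With m = 2, words 4 through 7 fall in modules 0 through 3 at internal address 1. A cache line of consecutive words therefore touches each module once.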

Concurrent Bank Operation

[Figure: address split into n-m and m bits; module 0, module 1, module 2, ... each with its own ADDR and DATA paths, accessed concurrently]
Supports arbitrary accesses
Needs sources of multiple, independent accesses
- Lock-up free caches, data speculation, write buffers, prefetching
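Concurrent banks pay off only when independent requests map to different banks. The sketch below shows this with a deliberately simplified model, chosen for illustration rather than taken from the slides. One request issues per cycle, in order. The bank is the word address mod `banks`. A bank stays busy for `busy` cycles after starting an access.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BANKS 64u

/* Start cycle of the last request under the simplified in-order model
 * described above; a request to a busy bank waits for it to become idle. */
static unsigned last_start_cycle(const unsigned *addrs, size_t count,
                                 unsigned banks, unsigned busy) {
    unsigned free_at[MAX_BANKS] = {0};  /* first idle cycle of each bank */
    unsigned next = 0, start = 0;       /* earliest issue cycle, last start */
    assert(banks > 0 && banks <= MAX_BANKS && count > 0);
    for (size_t k = 0; k < count; k++) {
        unsigned b = addrs[k] % banks;
        start = next > free_at[b] ? next : free_at[b];  /* stall on conflict */
        free_at[b] = start + busy;
        next = start + 1;
    }
    return start;
}
```

With 4 banks and a busy time of 4 cycles, addresses {0, 1, 2, 3} start back to back (last start at cycle 3). Addresses {0, 4, 8, 12} all hit bank 0 and serialize (last start at cycle 12).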


Conflict-Free Access

Conflict-free access to elements of the vector if
- M >= N
- M >= N · gcd(M, d)
Multi-dimensional arrays are treated as arrays of 1-D vectors
Conflict-free access for various patterns in a matrix requires
- M >= N · gcd(M, δ1) for columns
- M >= N · gcd(M, δ2) for rows
- M >= N · gcd(M, δ1 + δ2) for forward diagonals
- M >= N · gcd(M, δ1 - δ2) for backward diagonals
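The symbols are not defined in this part of the notes. The sketch assumes the usual meanings: M is the number of banks, N is the bank busy time in cycles, and d is the stride in words. A stride-d stream visits only M / gcd(M, d) distinct banks before repeating. It is conflict-free when that count is at least N, which is the condition M >= N · gcd(M, d). The brute-force counter checks the M / gcd(M, d) claim directly.

```c
#include <assert.h>

static unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) { unsigned t = a % b; a = b; b = t; }
    return a;
}

/* Condition from the slide: M >= N * gcd(M, d). */
static int conflict_free(unsigned M, unsigned N, unsigned d) {
    assert(M > 0);
    return M >= N * gcd_u(M, d);
}

/* Brute force: distinct banks visited by addresses 0, d, 2d, ...
 * before the bank sequence repeats; equals M / gcd(M, d). */
static unsigned distinct_banks(unsigned M, unsigned d) {
    unsigned seen = 0, bank = 0;
    assert(M > 0);
    do { seen++; bank = (bank + d) % M; } while (bank != 0);
    return seen;
}
```

With M = N = 8, stride 1 is conflict-free, but every even stride fails because the gcd is at least 2. This bears on the M = N = even case. With M = 17 banks and N = 8, any stride that is not a multiple of 17 passes.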

Conflict-Free Access

Implications for M = N = even number?
For non-power-of-two values of M, indexing and address computation must be efficient
Vectors that are accessed are scrambled
- Unscrambling of vectors is a non-trivial performance issue
Data dependencies can still reduce bandwidth far below O(M)

Avoiding Bank Conflicts: Compiler Techniques

Many banks

    int x[256][512];
    int i, j;

    for (j = 0; j < 512; j = j+1)
        for (i = 0; i < 256; i = i+1)
            x[i][j] = 2 * x[i][j];

Even with 128 banks, the word accesses conflict: 512 is a multiple of 128, so the inner loop (over i) touches words 512 apart, which all map to the same bank
Solutions:
- Software: loop interchange
- Software: adjust the array size to a prime # (“array padding”)
- Hardware: prime number of banks (e.g., 17)
- Data skewing
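The sketch below counts how many of the 128 word-interleaved banks the inner loop touches. It assumes row-major storage, so x[i][j] sits at word i*512 + j. It compares three cases: the original column walk (stride 512), a padded array declared as a hypothetical x[256][513] (stride 513), and the interchanged loop with j innermost (stride 1).

```c
#include <assert.h>

#define BANKS 128u

/* Distinct banks touched by n consecutive inner-loop accesses at word
 * addresses base, base + stride, ... (bank = word address mod BANKS). */
static unsigned banks_touched(unsigned base, unsigned stride, unsigned n) {
    unsigned char hit[BANKS] = {0};
    unsigned count = 0;
    for (unsigned k = 0; k < n; k++) {
        unsigned b = (base + k * stride) % BANKS;
        if (!hit[b]) { hit[b] = 1; count++; }
    }
    return count;
}
```

The original loop hits a single bank (`banks_touched(0, 512, 256)` is 1). Padding to 513 columns spreads the column walk over all 128 banks, because 513 mod 128 is 1. Loop interchange reaches all 128 banks with unit stride.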

Pentium IV Execution Core

[Figure: compute μop queue and memory μop queue feeding four schedulers and four dispatch ports]
- Exec Port 0: ALU (2X): Add/Sub, Logic, Store Data, Branches; FP Move: FP/SSE Move, FP/SSE Store
- Exec Port 1: ALU (2X): Add/Sub; Integer: Shift/Rotate; FP: FP/SSE
- Load Port: Load
- Store Port: Store
Out-of-order schedulers feed the dispatch ports

Summary

Two dominant memory technologies for cache and main memory
SRAM and DRAM
Chip-level organizations for concurrency
Main memory organizations for concurrency
Match the cache design
Optimize for bandwidth or latency
Latency and bandwidth computation
Miss penalty computation
Compiler techniques for organizing data for concurrent access

Virtual Memory

Topical Outline

Review of virtual memory management
- Motivation
- Operation
Speeding up address translation via concurrency
Speeding up address translation via use of virtually addressed caches

The Memory Hierarchy

[Figure: registers/ALU, cache, memory, and secondary storage; registers are managed by the compiler, the cache by the hardware, and memory and the levels below it by the operating system]
Caching is the mode of operation at each level of the hierarchy
- Data movement from tape → disk → memory → cache → registers
Control of movement is performed by HW or SW, guided by
- Static vs. run-time management of resources
- Time scales of operation
  - Span nanoseconds (register access) to seconds (tapes)

Virtual Memory Management

Follows the same basic principles as cache management
Unit of management is a page
- A page is typically 4 Kbytes – 32 Kbytes
The program sees a virtual address space partitioned into virtual pages
Virtual address space typically exceeds physical memory size
The program resides on disk
Physical memory is partitioned into physical pages

Management Policies

Demand driven operation
- Pages are brought into memory when referenced
Placement policy
- Fully associative placement
Replacement policy
- Approximations to least recently used (LRU) are the most common
Update policy
- At current disk latencies, write-through is infeasible
- Write-back update policy is employed
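The slides do not name a particular LRU approximation. One common choice is the clock (second-chance) algorithm, which this sketch uses. It assumes a hypothetical 4-frame memory with one reference bit per frame. Each frame also has a dirty bit, so that under the write-back policy only modified victims are copied to disk.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES 4u   /* tiny physical memory, for illustration */

/* Clock (second-chance) replacement, one common approximation to LRU.
 * ref[] models the hardware-set reference bit, dirty[] the modified
 * bit that the write-back update policy relies on. */
struct clock_policy {
    int page[FRAMES];            /* virtual page in each frame; -1 = empty */
    unsigned char ref[FRAMES];
    unsigned char dirty[FRAMES];
    size_t hand;                 /* next frame the clock hand examines */
    unsigned writebacks;         /* modified victims written back to disk */
};

static void clock_init(struct clock_policy *c) {
    for (size_t f = 0; f < FRAMES; f++) {
        c->page[f] = -1;
        c->ref[f] = 0;
        c->dirty[f] = 0;
    }
    c->hand = 0;
    c->writebacks = 0;
}

/* Reference virtual page vp (a write if is_write != 0).
 * Returns 1 on a page fault (demand fetch), 0 on a hit. */
static int clock_access(struct clock_policy *c, int vp, int is_write) {
    for (size_t f = 0; f < FRAMES; f++) {
        if (c->page[f] == vp) {
            c->ref[f] = 1;                   /* hit: mark recently used */
            if (is_write) c->dirty[f] = 1;
            return 0;
        }
    }
    /* Fault: sweep the hand, giving referenced frames a second chance. */
    while (c->ref[c->hand]) {
        c->ref[c->hand] = 0;
        c->hand = (c->hand + 1) % FRAMES;
    }
    if (c->page[c->hand] != -1 && c->dirty[c->hand])
        c->writebacks++;                     /* write back the modified victim */
    c->page[c->hand] = vp;                   /* fully associative placement */
    c->ref[c->hand] = 1;
    c->dirty[c->hand] = (unsigned char)(is_write != 0);
    c->hand = (c->hand + 1) % FRAMES;
    return 1;
}
```

After pages 1 to 4 fill memory and page 5 evicts page 1, a hit on page 2 sets its reference bit. Page 2 therefore survives the next fault, and page 3 is evicted instead.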


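Address Translation: Concepts

[Figure: the VPN field of the virtual address is translated through an address translation data structure (located by a base address) into a PPN; the offset is carried over unchanged]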

Offsets within the virtual page and corresponding physical page are the same
We only need to translate the virtual page number (VPN) to the corresponding physical page number (PPN)

[Figure: virtual memory pages (located on disk) mapped to physical memory pages; a referenced page is moved into physical memory and its address translated (VPN → PPN)]
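Because only the page number is translated, building a physical address is a shift and a mask. The sketch assumes 32-bit addresses and 4 Kbyte pages (the smallest size quoted earlier), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12u   /* assumed 4 Kbyte pages: 12-bit offset */

static uint32_t vpn_of(uint32_t va)    { return va >> PAGE_BITS; }
static uint32_t offset_of(uint32_t va) { return va & ((1u << PAGE_BITS) - 1u); }

/* Only the VPN is translated (to ppn); the offset is copied unchanged. */
static uint32_t physical_address(uint32_t ppn, uint32_t va) {
    return (ppn << PAGE_BITS) | offset_of(va);
}
```

For virtual address 0x12345678, the VPN is 0x12345 and the offset is 0x678. If that page lives in physical page 0x42, the physical address is 0x42678.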

Address Translation Implementation: The Page Table

Translate the virtual page address to the physical page address
Keep state information on a page
- Modified/Not Modified
- Access rights
- In memory or on disk
- Caching policies

[Figure: the page table base register locates the page table and the VPN is the index; each entry holds V, state, and PPN/Disk Address; the PPN combined with the offset forms the physical address sent to the cache]
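A single-level page-table lookup along the lines of the figure can be sketched as follows. The PTE layout, the 16-entry table and the result codes are hypothetical. The pointer `table` plays the role of the page table base register, and the VPN is the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u     /* assumed 4 KB pages */
#define NUM_VPAGES 16u    /* tiny illustrative virtual address space */

/* Hypothetical page table entry mirroring the slide's fields. */
struct pte {
    bool     valid;     /* V: page is in physical memory */
    bool     modified;  /* state: written since it was loaded */
    bool     writable;  /* state: access rights */
    uint32_t frame;     /* PPN if valid, otherwise a disk address */
};

enum xlate { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROTECTION };

static enum xlate translate(struct pte *table, uint32_t va, bool is_write,
                            uint32_t *pa) {
    uint32_t vpn = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1u);
    assert(vpn < NUM_VPAGES);
    struct pte *e = &table[vpn];             /* base + VPN index */
    if (!e->valid)
        return XLATE_PAGE_FAULT;             /* on disk: OS brings it in */
    if (is_write && !e->writable)
        return XLATE_PROTECTION;             /* access rights violated */
    if (is_write)
        e->modified = true;                  /* used by the write-back policy */
    *pa = (e->frame << PAGE_BITS) | offset;
    return XLATE_OK;
}
```

If entry 3 is valid and maps to frame 7, address 0x3ABC translates to 0x7ABC. Address 0x4000 (VPN 4, not valid) reports a page fault.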


TLB Operation

TLB size is typically a function of the target domain
- High-end machines will have large, fully associative TLBs
Page table entries (PTEs) are replaced on a demand-driven basis
The TLB is in the critical path

[Figure: the ALU issues a virtual address to the TLB, which supplies the physical address to the cache and memory; on a TLB miss, translate and update the TLB]
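The hit and miss paths in the figure can be sketched with a hypothetical 4-entry fully associative TLB and 4 KB pages. Replacement is round-robin for simplicity; real TLBs may use other policies. A flat array `page_table` stands in for the page table walk on a miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4u    /* tiny fully associative TLB, for illustration */
#define PAGE_BITS 12u     /* assumed 4 KB pages */

struct tlb_entry { bool valid; uint32_t vpn, ppn; };

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned next_victim;   /* round-robin replacement, a simplification */
    unsigned misses;
};

static void tlb_init(struct tlb *t) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
    t->next_victim = 0;
    t->misses = 0;
}

/* Compare the VPN against every entry (fully associative). On a miss,
 * page_table[vpn] stands in for the walk, and the translation is
 * installed on demand, replacing a victim entry. */
static uint32_t tlb_translate(struct tlb *t, const uint32_t *page_table, uint32_t va) {
    uint32_t vpn = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1u);
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].vpn == vpn)
            return (t->e[i].ppn << PAGE_BITS) | offset;   /* TLB hit */
    t->misses++;                                           /* translate & update TLB */
    struct tlb_entry *v = &t->e[t->next_victim];
    v->valid = true;
    v->vpn = vpn;
    v->ppn = page_table[vpn];
    t->next_victim = (t->next_victim + 1) % TLB_ENTRIES;
    return (v->ppn << PAGE_BITS) | offset;
}
```

The first access to a page misses and fills an entry. Later accesses to the same page hit without consulting `page_table`.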

The Memory Access Path

Virtual-to-physical address translation occurs on every access!
Adds to the latency of every memory access
How can we optimize the critical path?

[Figure: registers/ALU issue a virtual address to the TLB; the resulting physical address accesses the cache and memory]

Optimizing the Critical Path: Principles

Concurrency between address translation and cache access
- Overlap cache access and TLB translation
Making translation the exception
- Address the cache with the virtual address

Overlapping Cache and TLB Access

Example: direct-mapped, 16 Kbyte cache with 64-byte lines and 16 Kbyte pages

[Figure: the 32-bit virtual address (bits 31:0) is split as VPN (18 bits) | page offset (14 bits) for the TLB, and as tag (18 bits) | index (8 bits) | byte offset (6 bits) for the direct-mapped cache (state bits, tag, data); TLB entries (VPN, PPN, V, state) feed a Mux]
Access TLB with VPN
Access Cache with Line Address
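The bit widths in this example can be checked mechanically. The sketch assumes 32-bit virtual and physical addresses, as the 31:0 label suggests, and adds a hypothetical `ways` parameter for set associativity. The 14 page-offset bits cover the 8 index bits plus the 6 byte-offset bits. The cache can therefore be indexed while the TLB translates the 18-bit VPN, and the 18-bit tag is then compared with the PPN.

```c
#include <assert.h>
#include <stdbool.h>

/* log2 of a power of two. */
static unsigned log2u(unsigned x) {
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x >>= 1)
        n++;
    return n;
}

/* Field widths of an address as seen by the TLB and by the cache. */
struct addr_split {
    unsigned vpn, page_offset;       /* TLB view */
    unsigned tag, index, offset;     /* cache view */
};

static struct addr_split split_bits(unsigned addr_bits, unsigned cache_bytes,
                                    unsigned line_bytes, unsigned ways,
                                    unsigned page_bytes) {
    struct addr_split s;
    s.offset = log2u(line_bytes);
    s.index = log2u(cache_bytes / (line_bytes * ways));
    s.tag = addr_bits - s.index - s.offset;
    s.page_offset = log2u(page_bytes);
    s.vpn = addr_bits - s.page_offset;
    return s;
}

/* The cache can be indexed in parallel with the TLB lookup only if the
 * index and byte-offset bits lie inside the untranslated page offset. */
static bool can_overlap(struct addr_split s) {
    return s.index + s.offset <= s.page_offset;
}
```

`split_bits(32, 16384, 64, 1, 16384)` reproduces the slide's 18/14 and 18/8/6 splits. A 64 Kbyte direct-mapped cache would need 10 index bits and fail the check. Raising associativity is one way to keep the index inside the page offset.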
