Xilinx FPGA Memory Types: Distributed RAM vs. Block RAM - Prof. David Hwang, Study notes of Digital Systems Design

These notes give an overview of Xilinx FPGA memory types, focusing on distributed RAM and block RAM. They explain the differences between the two types, their configurations, and their use cases, and include VHDL code examples for distributed RAM with asynchronous read and distributed dual-port RAM with asynchronous read.

Uploaded on 02/12/2009

ECE 545—Digital System Design with VHDL
Lecture 10
Memories (RAM/ROM)
11/11/08
Outline

  • Memory
    • Distributed RAM
    • Block RAM
  • Instantiation versus Inference
  • VHDL Inference Code
    • Distributed RAM
    • Block RAM
    • ROM
  • VHDL Instantiation Code


Memory Types

[Figure: memory taxonomy. Memory divides into RAM and ROM; a memory can be single port or dual port, and can have an asynchronous read or a synchronous read.]


The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

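Xilinx Multipurpose LUT

[Figure: a LUT configured as distributed RAM. One LUT forms a RAM16X1S (inputs D, WE, WCLK, and four address lines A; output O). Two LUTs form a RAM32X1S (five address lines), a RAM16X2S (two data inputs D and two outputs O), or a dual-port RAM16X1D (write address A, read address DPRA, outputs SPO and DPO).]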

Distributed RAM

  • A CLB LUT can be configured as distributed RAM
    • One LUT equals a 16 x 1 RAM
    • Cascade LUTs to increase RAM size
  • Synchronous write
  • Asynchronous read
    • The native distributed RAM read is asynchronous
    • A synchronous read can be created by adding extra flip-flops
  • Two LUTs can make
    • 32 x 1 single-port RAM
    • 16 x 2 single-port RAM
    • 16 x 1 dual-port RAM


FPGA Block RAM

[Figure: Spartan- Dual-Port Block RAM, showing Port A and Port B.]

Block RAM

  • Most efficient memory implementation
  • Dedicated blocks of memory
  • Ideal for most memory requirements
  • 4 to 104 memory blocks
    • 18 kbits = 18,432 bits per block (16 kbits without parity bits)
  • Use multiple blocks for larger memories
  • Builds both single-port and true dual-port RAMs
  • Synchronous write and read (different from distributed RAM)


Block RAM Port Aspect Ratios

Block RAM can have various configurations (port aspect ratios). The "+1" and "+2" are parity bits:

  • 16k x 1
  • 8k x 2
  • 4k x 4
  • 2k x (8+1), addresses 0 to 2047
  • 1024 x (16+2), addresses 0 to 1023
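As a quick check on the aspect ratios above, the arithmetic below multiplies depth by width for each configuration. It assumes only the figures already given in these notes (16 kbits of data plus parity for a total of 18,432 bits per block); the narrow configurations do not expose the parity bits.

```latex
\begin{align*}
16\text{k} \times 1     &= 16{,}384 \times 1 = 16{,}384 \text{ bits} \\
8\text{k} \times 2      &= 8{,}192 \times 2  = 16{,}384 \text{ bits} \\
4\text{k} \times 4      &= 4{,}096 \times 4  = 16{,}384 \text{ bits} \\
2\text{k} \times (8+1)  &= 2{,}048 \times 9  = 18{,}432 \text{ bits} \\
1024 \times (16+2)      &= 1{,}024 \times 18 = 18{,}432 \text{ bits}
\end{align*}
```

Every configuration addresses the same physical block; only the trade-off between depth and word width changes.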


Single-Port Block RAM

Dual-Port Block RAM


Inference vs. Instantiation


Generic Inferred RAM


Distributed RAM with asynchronous read

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;
    USE ieee.std_logic_unsigned.all;

    entity raminfr is
      generic (
        bits      : integer := 32;  -- number of bits per RAM word
        addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
      port (
        clk : in  std_logic;
        we  : in  std_logic;
        a   : in  std_logic_vector(addr_bits-1 downto 0);
        di  : in  std_logic_vector(bits-1 downto 0);
        do  : out std_logic_vector(bits-1 downto 0));
    end raminfr;

    architecture behavioral of raminfr is
      type ram_type is array (2**addr_bits-1 downto 0)
        of std_logic_vector (bits-1 downto 0);
      signal RAM : ram_type;
    begin
      -- synchronous write
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (we = '1') then
            RAM(conv_integer(unsigned(a))) <= di;
          end if;
        end if;
      end process;

      -- asynchronous read: the output follows the address without a clock
      do <= RAM(conv_integer(unsigned(a)));
    end behavioral;


Report from Synthesis

    Resource Usage Report for raminfr

    Mapping to part: xc3s50pq208-

    Cell usage:
      GND        1 use
      RAM16X4S   8 uses

    I/O ports: 69
    I/O primitives: 68
      IBUF      36 uses
      OBUF      32 uses

      BUFGP      1 use

    I/O Register bits: 0
    Register bits not including I/Os: 0 (0%)

    RAM/ROM usage summary
      Single Port Rams (RAM16X4S): 8

    Global Clock Buffers: 1 of 8 (12%)

    Mapping Summary:
      Total LUTs: 32 (2%)

Report from Implementation

    Design Summary:
    Number of errors:   0
    Number of warnings: 0
    Logic Utilization:
    Logic Distribution:
      Number of occupied Slices: 16 out of 768 2%
      Number of Slices containing only related logic: 16 out of 16 100%
      Number of Slices containing unrelated logic: 0 out of 16 0%
      *See NOTES below for an explanation of the effects of unrelated logic
    Total Number of 4 input LUTs: 32 out of 1,536 2%
      Number used as 16x1 RAMs: 32
    Number of bonded IOBs: 69 out of 124 55%
    Number of GCLKs: 1 out of 8 12%
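The resource counts in these reports can be cross-checked by hand from the generics of raminfr (bits = 32, addr_bits = 3). The arithmetic below is a sketch under two assumptions: that each RAM16X4S cell packs four 16 x 1 LUT RAMs (consistent with "Number used as 16x1 RAMs: 32"), and that each slice holds two LUTs (consistent with 768 slices and 1,536 LUTs on this part).

```latex
\begin{align*}
\text{words}          &= 2^{\texttt{addr\_bits}} = 2^{3} = 8 \le 16 \\
\text{LUTs}           &= \left\lceil \tfrac{8}{16} \right\rceil \times \texttt{bits} = 1 \times 32 = 32 \\
\text{RAM16X4S cells} &= 32 / 4 = 8 \\
\text{slices}         &= 32 / 2 = 16 \\
\text{I/O ports}      &= \underbrace{1}_{\texttt{clk}} + \underbrace{1}_{\texttt{we}} + \underbrace{3}_{\texttt{a}} + \underbrace{32}_{\texttt{di}} + \underbrace{32}_{\texttt{do}} = 69
\end{align*}
```

Only 8 of the 16 locations in each LUT are used, so a 16-word RAM of the same width would cost the same 32 LUTs.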


Distributed RAM with "false" synchronous read

    architecture behavioral of raminfr is
      type ram_type is array (2**addr_bits-1 downto 0)
        of std_logic_vector (bits-1 downto 0);
      signal RAM : ram_type;
    begin
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (we = '1') then
            RAM(conv_integer(unsigned(a))) <= di;
          end if;
          -- registered read: the RAM output is captured in extra flip-flops
          do <= RAM(conv_integer(unsigned(a)));
        end if;
      end process;
    end behavioral;

Report from Synthesis

    Resource Usage Report for raminfr

    Mapping to part: xc3s50pq208-

    Cell usage:
      FD        32 uses
      GND        1 use
      RAM16X4S   8 uses

    I/O ports: 69
    I/O primitives: 68
      IBUF      36 uses
      OBUF      32 uses

      BUFGP      1 use

    I/O Register bits: 0
    Register bits not including I/Os: 32 (2%)

    RAM/ROM usage summary
      Single Port Rams (RAM16X4S): 8

    Global Clock Buffers: 1 of 8 (12%)

    Mapping Summary:
      Total LUTs: 32 (2%)


Report from Implementation

    Design Summary:
    Number of errors:   0
    Number of warnings: 0
    Logic Utilization:
      Number of Slice Flip Flops: 32 out of 1,536 2%
    Logic Distribution:
      Number of occupied Slices: 16 out of 768 2%
      Number of Slices containing only related logic: 16 out of 16 100%
      Number of Slices containing unrelated logic: 0 out of 16 0%
      *See NOTES below for an explanation of the effects of unrelated logic
    Total Number of 4 input LUTs: 32 out of 1,536 2%
      Number used as 16x1 RAMs: 32
    Number of bonded IOBs: 69 out of 124 55%
    Number of GCLKs: 1 out of 8 12%
    Total equivalent gate count for design: 4,

Block RAM with synchronous read (read through)
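A minimal sketch of how a read-through memory is commonly described for inference, assuming the widely used coding style in which the read address is registered and the array is read through that registered address. The entity name `ram_sync_read_sketch` and the signal `read_a` are hypothetical, the ports mirror raminfr, and `ieee.numeric_std` is used here in place of `std_logic_arith`. Whether a given synthesis tool maps this to block RAM rather than distributed RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; generics and ports mirror raminfr.
entity ram_sync_read_sketch is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2**addr_bits = number of words in RAM
  port (
    clk : in  std_logic;
    we  : in  std_logic;
    a   : in  std_logic_vector(addr_bits-1 downto 0);
    di  : in  std_logic_vector(bits-1 downto 0);
    do  : out std_logic_vector(bits-1 downto 0));
end ram_sync_read_sketch;

architecture behavioral of ram_sync_read_sketch is
  type ram_type is array (2**addr_bits-1 downto 0)
    of std_logic_vector(bits-1 downto 0);
  signal RAM    : ram_type;
  signal read_a : std_logic_vector(addr_bits-1 downto 0);  -- registered address
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        RAM(to_integer(unsigned(a))) <= di;  -- synchronous write
      end if;
      read_a <= a;                           -- address is sampled on the clock edge
    end if;
  end process;

  -- The output depends only on the registered address, so it changes only
  -- after a clock edge. A word written on an edge appears on do after that
  -- same edge, which is the read-through behavior.
  do <= RAM(to_integer(unsigned(read_a)));
end behavioral;
```

Compared with the "false" synchronous read, the register here sits on the address rather than on the data output, which matches the structure of a memory with a built-in synchronous read port.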


Report from Synthesis

    Resource Usage Report for raminfr

    Mapping to part: xc3s50pq208-

    Cell usage:
      GND          1 use
      RAMB16_S36   1 use
      VCC          1 use

    I/O ports: 69
    I/O primitives: 68
      IBUF        36 uses
      OBUF        32 uses

      BUFGP        1 use

    I/O Register bits: 0
    Register bits not including I/Os: 0 (0%)

    RAM/ROM usage summary
      Block Rams : 1 of 4 (25%)

    Global Clock Buffers: 1 of 8 (12%)

    Mapping Summary:
      Total LUTs: 0 (0%)

Report from Implementation

    Design Summary:
    Number of errors:   0
    Number of warnings: 0
    Logic Utilization:
    Logic Distribution:
      Number of Slices containing only related logic: 0 out of 0 0%
      Number of Slices containing unrelated logic: 0 out of 0 0%
      *See NOTES below for an explanation of the effects of unrelated logic
    Number of bonded IOBs: 69 out of 124 55%
    Number of Block RAMs: 1 out of 4 25%
    Number of GCLKs: 1 out of 8 12%


Distributed dual-port RAM with asynchronous read

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;
    use ieee.std_logic_arith.all;

    entity raminfr is
      generic (
        bits      : integer := 32;  -- number of bits per RAM word
        addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
      port (
        clk  : in  std_logic;
        we   : in  std_logic;
        a    : in  std_logic_vector(addr_bits-1 downto 0);
        dpra : in  std_logic_vector(addr_bits-1 downto 0);
        di   : in  std_logic_vector(bits-1 downto 0);
        spo  : out std_logic_vector(bits-1 downto 0);
        dpo  : out std_logic_vector(bits-1 downto 0));
    end raminfr;
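A matching architecture could look like the sketch below. It assumes the same pattern as the single-port distributed RAM in these notes (synchronous write, asynchronous read) with a second read through `dpra`, and is not taken from the original slides. The entity name `dpram_async_sketch` is hypothetical and `ieee.numeric_std` replaces `std_logic_arith`, so the block compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; generics and ports mirror the dual-port raminfr.
entity dpram_async_sketch is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2**addr_bits = number of words in RAM
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    a    : in  std_logic_vector(addr_bits-1 downto 0);   -- write / first read address
    dpra : in  std_logic_vector(addr_bits-1 downto 0);   -- second read address
    di   : in  std_logic_vector(bits-1 downto 0);
    spo  : out std_logic_vector(bits-1 downto 0);        -- data at address a
    dpo  : out std_logic_vector(bits-1 downto 0));       -- data at address dpra
end dpram_async_sketch;

architecture behavioral of dpram_async_sketch is
  type ram_type is array (2**addr_bits-1 downto 0)
    of std_logic_vector(bits-1 downto 0);
  signal RAM : ram_type;
begin
  -- Single synchronous write port, addressed by a.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        RAM(to_integer(unsigned(a))) <= di;
      end if;
    end if;
  end process;

  -- Two asynchronous read ports sharing the same storage.
  spo <= RAM(to_integer(unsigned(a)));
  dpo <= RAM(to_integer(unsigned(dpra)));
end behavioral;
```

Only port `a` can write; `dpra` is read-only, which corresponds to the RAM16X1D primitive's SPO and DPO outputs rather than to a true dual-port memory.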