Basics - High Performance Embedded Systems Design - Lecture Slides, Slides of Computer Science

These are the Lecture Slides of High Performance Embedded Systems Design which includes Performance Analysis, Digital Camera Case Study, Simple Digital Camera, Requirements Specification, Design, Four Implementations, Custom, Standard, Memory etc. Key important points are: Basics, Hardware Design, Combinational Logic, Sequential Logic, Custom Single Purpose, Single Purpose Processor, Inputs, Controller and Datapath, Templates, Algorithm

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhritiman
2-Hardware Design Basics of Embedded Processors (cont.)

Outline

• Introduction

• Combinational logic

• Sequential logic

• Custom single-purpose processor design

• RT-level custom single-purpose processor design

Example: greatest common divisor

• First create algorithm

• Convert algorithm to "complex" state machine

  • Known as FSMD: finite-state machine with datapath

  • Can use templates to perform such conversion

[Figure (a): black-box view of the GCD block, with inputs go_i, x_i and y_i and output d_o]

  0: int x, y;
  1: while (1) {
  2:   while (!go_i);
  3:   x = x_i;
  4:   y = y_i;
  5:   while (x != y) {
  6:     if (x < y)
  7:       y = y - x;
         else
  8:       x = x - y;
       }
  9:   d_o = x;
     }

(b) desired functionality

[Figure (c): state diagram]

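State diagram templates

Assignment statement

  a = b
  next statement

State diagram: one state performs a = b, then control moves on to the state of the next statement.

Loop statement

  while (cond) {
    loop-body-statements
  }
  next statement

State diagram: a condition state C: goes to the loop-body states when cond holds and to the next statement when !cond holds. The loop body ends in a join state J:, which returns to C:.

Branch statement

  if (c1)
    c1 stmts
  else if (c2)
    c2 stmts
  else
    other stmts
  next statement

State diagram: a condition state C: goes to the c1 stmts on c1, to the c2 stmts on !c1&c2, and to the other stmts on !c1&!c2. All three paths meet in a join state J: before the next statement.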
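The templates can be tried out in software. The sketch below is a C model of the FSMD that results from applying them to the GCD program, with one `case` per state. The enum names, the function `gcd_fsmd` and the `cycles` counter are illustrative assumptions, not part of the slides; the outer `while (1)` is cut to a single pass so that the function returns.

```c
#include <assert.h>

/* States follow the statement numbers of the GCD program; the *_J states
   are the join states introduced by the loop and branch templates. */
enum state { S2, S2_J, S3, S4, S5, S6, S7, S8, S6_J, S5_J, S9, DONE };

/* Runs one pass of the outer loop and returns d_o.
   cycles (hypothetical, may be 0) receives the number of states visited. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, int go_i, unsigned *cycles)
{
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    enum state s = S2;

    assert(x_i > 0 && y_i > 0); /* repeated subtraction never ends on 0 */
    assert(go_i);               /* otherwise state 2 would spin forever */

    while (s != DONE) {
        n++;
        switch (s) {
        case S2:   s = !go_i ? S2_J : S3;   break; /* loop template   */
        case S2_J: s = S2;                  break;
        case S3:   x = x_i; s = S4;         break; /* assignment      */
        case S4:   y = y_i; s = S5;         break; /* assignment      */
        case S5:   s = (x != y) ? S6 : S9;  break; /* loop template   */
        case S6:   s = (x < y) ? S7 : S8;   break; /* branch template */
        case S7:   y = y - x; s = S6_J;     break;
        case S8:   x = x - y; s = S6_J;     break;
        case S6_J: s = S5_J;                break;
        case S5_J: s = S5;                  break;
        case S9:   d_o = x; s = DONE;       break;
        case DONE: break;
        }
    }
    if (cycles) *cycles = n;
    return d_o;
}
```

Calling `gcd_fsmd(42, 8, 1, 0)` walks the same states as the state diagram and returns the value written to d_o.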

Creating the datapath

• Create a register for any declared variable

• Create a functional unit for each arithmetic operation

• Connect the ports, registers and functional units

  • Based on reads and writes
  • Use multiplexors for multiple sources

• Create a unique identifier

  • for each datapath component control input and output

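[Figure: the GCD FSMD (states 1 to 9 with join states 1-J, 2-J, 5-J and 6-J) next to the datapath derived from it]

Datapath

  • Registers 0: x, 0: y and 9: d
  • Two n-bit 2x1 multiplexors, selected by x_sel and y_sel
  • Two subtractors, for 8: x-y and 7: y-x
  • Comparison units for 5: x!=y and 6: x<y
  • Data ports x_i, y_i and d_o
  • Control inputs x_sel, y_sel, x_ld, y_ld and d_ld
  • Status outputs x_neq_y and x_lt_y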
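A rough C model can make the datapath rules concrete: one register per variable, one functional unit per operation, and a multiplexor wherever a register has two sources. The struct and function names below are assumptions made for this sketch; the signal names come from the slide. The model assumes that a select value of 0 routes the external input and 1 routes the subtractor output.

```c
#include <assert.h>
#include <stdbool.h>

/* Registers created for the declared variables x and y and the output d. */
struct gcd_datapath { unsigned x, y, d; };

/* Control inputs driven by the controller (names from the slide). */
struct gcd_control { bool x_sel, y_sel, x_ld, y_ld, d_ld; };

/* Status outputs read back by the controller. */
struct gcd_status { bool x_neq_y, x_lt_y; };

/* Combinational comparison units: x != y and x < y. */
struct gcd_status gcd_dp_status(const struct gcd_datapath *dp)
{
    struct gcd_status st = { dp->x != dp->y, dp->x < dp->y };
    return st;
}

/* One clock edge: every functional unit reads the old register values,
   then each register whose load signal is set takes its new value. */
void gcd_dp_clock(struct gcd_datapath *dp, struct gcd_control c,
                  unsigned x_i, unsigned y_i)
{
    assert(dp);
    unsigned x_minus_y = dp->x - dp->y;          /* subtractor, 8: x - y */
    unsigned y_minus_x = dp->y - dp->x;          /* subtractor, 7: y - x */
    unsigned x_next = c.x_sel ? x_minus_y : x_i; /* n-bit 2x1 mux for x  */
    unsigned y_next = c.y_sel ? y_minus_x : y_i; /* n-bit 2x1 mux for y  */
    unsigned d_next = dp->x;                     /* 9: d_o = x           */

    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = d_next;
}
```

The datapath does nothing on its own; a controller has to set the select and load signals in the right order.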

Creating the controller's FSM

• Same structure as FSMD

• Replace complex actions/conditions with datapath configurations

[Figure: the controller FSM for the GCD example, shown beside the FSMD and the datapath it drives]

Each FSMD action or condition maps to datapath control and status signals:

  FSMD                    Controller
  2: !go_i / !(!go_i)     2: !go_i / go_i
  3: x = x_i              3: x_sel = 0, x_ld = 1
  4: y = y_i              4: y_sel = 0, y_ld = 1
  5: x!=y / !(x!=y)       5: x_neq_y / !x_neq_y
  6: x<y / !(x<y)         6: x_lt_y / !x_lt_y
  7: y = y - x            7: y_sel = 1, y_ld = 1
  8: x = x - y            8: x_sel = 1, x_ld = 1
  9: d_o = x              9: d_ld = 1

The join states 1-J, 2-J, 5-J and 6-J appear in both machines and carry no actions.
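The split between controller and datapath can be sketched in C as a pure next-state and control function plus a small loop that plays the role of the registers. The names `cstate`, `ctrl_out`, `gcd_controller` and `gcd_run` are assumptions made for this illustration; `gcd_run` holds go_i at 1 and stops after one pass of the outer loop.

```c
#include <assert.h>
#include <stdbool.h>

enum cstate { C1, C2, C2_J, C3, C4, C5, C6, C7, C8, C6_J, C5_J, C9, C1_J };

struct ctrl_out { enum cstate next; bool x_sel, y_sel, x_ld, y_ld, d_ld; };

/* Next-state and control logic: a pure function of the current state and
   the inputs go_i, x_neq_y and x_lt_y. Signals not mentioned stay 0. */
struct ctrl_out gcd_controller(enum cstate s, bool go_i,
                               bool x_neq_y, bool x_lt_y)
{
    struct ctrl_out o = { C1, false, false, false, false, false };
    switch (s) {
    case C1:   o.next = C2;                              break;
    case C2:   o.next = go_i ? C3 : C2_J;                break;
    case C2_J: o.next = C2;                              break;
    case C3:   o.x_sel = 0; o.x_ld = 1; o.next = C4;     break;
    case C4:   o.y_sel = 0; o.y_ld = 1; o.next = C5;     break;
    case C5:   o.next = x_neq_y ? C6 : C9;               break;
    case C6:   o.next = x_lt_y ? C7 : C8;                break;
    case C7:   o.y_sel = 1; o.y_ld = 1; o.next = C6_J;   break;
    case C8:   o.x_sel = 1; o.x_ld = 1; o.next = C6_J;   break;
    case C6_J: o.next = C5_J;                            break;
    case C5_J: o.next = C5;                              break;
    case C9:   o.d_ld = 1; o.next = C1_J;                break;
    case C1_J: o.next = C1;                              break;
    }
    return o;
}

/* Minimal stand-in for the datapath so the controller can be exercised. */
unsigned gcd_run(unsigned x_i, unsigned y_i)
{
    unsigned x = 0, y = 0, d = 0;
    enum cstate s = C1;

    assert(x_i > 0 && y_i > 0);  /* repeated subtraction needs nonzero inputs */
    do {
        struct ctrl_out o = gcd_controller(s, true, x != y, x < y);
        unsigned x_next = o.x_sel ? x - y : x_i;  /* mux and subtractor */
        unsigned y_next = o.y_sel ? y - x : y_i;
        if (o.d_ld) d = x;       /* registers load from the old values */
        if (o.x_ld) x = x_next;
        if (o.y_ld) y = y_next;
        s = o.next;
    } while (s != C1_J);         /* stop after one pass of the outer loop */
    return d;
}
```

The controller never sees x or y directly; it only sees the two status bits, which is what lets it be implemented as a plain FSM.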

Controller state table for the GCD example

Inputs                              | Outputs
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i | I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
0  0  0  0  *       *      *    | 0  0  0  1  X     X     0    0    0
0  0  0  1  *       *      0    | 0  0  1  0  X     X     0    0    0
0  0  0  1  *       *      1    | 0  0  1  1  X     X     0    0    0
0  0  1  0  *       *      *    | 0  0  0  1  X     X     0    0    0
0  0  1  1  *       *      *    | 0  1  0  0  0     X     1    0    0
0  1  0  0  *       *      *    | 0  1  0  1  X     0     0    1    0
0  1  0  1  0       *      *    | 1  0  1  1  X     X     0    0    0
0  1  0  1  1       *      *    | 0  1  1  0  X     X     0    0    0
0  1  1  0  *       0      *    | 1  0  0  0  X     X     0    0    0
0  1  1  0  *       1      *    | 0  1  1  1  X     X     0    0    0
0  1  1  1  *       *      *    | 1  0  0  1  X     1     0    1    0
1  0  0  0  *       *      *    | 1  0  0  1  1     X     1    0    0
1  0  0  1  *       *      *    | 1  0  1  0  X     X     0    0    0
1  0  1  0  *       *      *    | 0  1  0  1  X     X     0    0    0
1  0  1  1  *       *      *    | 1  1  0  0  X     X     0    0    1
1  1  0  0  *       *      *    | 0  0  0  0  X     X     0    0    0
1  1  0  1  *       *      *    | 0  0  0  0  X     X     0    0    0
1  1  1  0  *       *      *    | 0  0  0  0  X     X     0    0    0
1  1  1  1  *       *      *    | 0  0  0  0  X     X     0    0    0

Q3..Q0 is the current state and I3..I0 the next state; * marks an input that does not matter and X an output that does not matter.

Completing the GCD custom single-purpose processor design

• We finished the datapath

• We have a state table for the next state and control logic

  • All that's left is combinational logic design

• This is not an optimized design,

[Figure: a view inside the controller and datapath. The controller contains the state register and the next-state and control logic; the datapath contains the registers and functional units.]

RT-level custom single-purpose processor design (cont')

[Figure: Bridge.
(a) Controller: an FSM with states WaitFirst4, RecFirst4Start, RecFirst4End, WaitSecond, RecSecond4Start, RecSecond4End, Send8Start and Send8End. Transitions are guarded by rdy_in. RecFirst4Start drives data_lo_ld, RecSecond4Start drives data_hi_ld, Send8Start drives data_out_ld and rdy_out, and Send8End drives rdy_out.
(b) Datapath: input data_in(4); registers data_hi, data_lo and data_out with load signals data_hi_ld, data_lo_ld and data_out_ld; clk goes to all registers.]

Optimizing single-purpose processors

• Optimization is the task of making design metric values the best possible

• Optimization opportunities

  • original program
  • FSMD
  • datapath
  • FSM

Optimizing the original program (cont')

Original program:

  0: int x, y;
  1: while (1) {
  2:   while (!go_i);
  3:   x = x_i;
  4:   y = y_i;
  5:   while (x != y) {
  6:     if (x < y)
  7:       y = y - x;
         else
  8:       x = x - y;
       }
  9:   d_o = x;
     }

Optimized program:

  int x, y, r;
  while (1) {
    while (!go_i);
    // x must be the larger number
    if (x_i >= y_i) {
      x = x_i;
      y = y_i;
    }
    else {
      x = y_i;
      y = x_i;
    }
    while (y != 0) {
      r = x % y;
      x = y;
      y = r;
    }
    d_o = x;
  }

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

Original program: GCD(42, 8) takes 9 iterations to complete the loop. The x and y values are evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

Optimized program: GCD(42, 8) takes 3 iterations to complete the loop. The x and y values are evaluated as follows: (42, 8), (8, 2), (2, 0).
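The iteration counts for GCD(42, 8) can be checked with a short C sketch. The function names and the `checks` parameter are assumptions made for this illustration; `checks` counts how many (x, y) pairs the loop condition evaluates, which is how the slide counts iterations. The go_i handshake is left out.

```c
#include <assert.h>

/* Subtraction-based GCD. Returns the result and stores in *checks the
   number of (x, y) pairs tested by the loop condition. */
unsigned gcd_sub(unsigned x, unsigned y, unsigned *checks)
{
    assert(x > 0 && y > 0);  /* the loop would not terminate on 0 */
    *checks = 1;             /* the first pair is tested on entry */
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        (*checks)++;         /* the new pair is tested next */
    }
    return x;
}

/* Modulo-based GCD, counted the same way. */
unsigned gcd_mod(unsigned x_i, unsigned y_i, unsigned *checks)
{
    unsigned x, y, r;

    /* x must be the larger number */
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }

    *checks = 1;
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*checks)++;
    }
    return x;
}
```

For the inputs 42 and 8 both functions return 2, with 9 pairs tested by `gcd_sub` and 3 by `gcd_mod`.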

Optimizing the FSMD

• Areas of possible improvements

  • merge states

    • states with constants on transitions can be eliminated, since the transition taken is already known

    • states with independent operations can be merged

  • separate states

    • states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size

  • scheduling
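As a rough illustration of merging states, the C sketch below models one possible reduced FSMD for the subtraction-based GCD. The particular merges are assumptions made for this example, not taken from the slides: the two independent assignments x = x_i and y = y_i share one state, the loop test and the branch share one state, and the join states are dropped because their transitions are unconditional. go_i is assumed to be asserted already.

```c
#include <assert.h>

enum mstate { M2, M3, M5, M7, M8, M9, M_DONE };

/* Runs one pass of the outer loop and returns d_o.
   cycles (hypothetical, may be 0) receives the number of states visited. */
unsigned gcd_fsmd_merged(unsigned x_i, unsigned y_i, unsigned *cycles)
{
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    enum mstate s = M2;

    assert(x_i > 0 && y_i > 0);  /* repeated subtraction never ends on 0 */

    while (s != M_DONE) {
        n++;
        switch (s) {
        case M2: s = M3;                     break; /* go_i assumed asserted */
        case M3: x = x_i; y = y_i; s = M5;   break; /* independent ops merged */
        case M5: s = (x == y) ? M9
                   : (x < y)  ? M7 : M8;     break; /* loop test plus branch */
        case M7: y = y - x; s = M5;          break; /* join states removed */
        case M8: x = x - y; s = M5;          break;
        case M9: d_o = x; s = M_DONE;        break;
        case M_DONE: break;
        }
    }
    if (cycles) *cycles = n;
    return d_o;
}
```

Each subtraction now costs two states instead of passing through separate test, branch and join states, so fewer clock cycles are spent per loop iteration.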

Optimizing the datapath

• Sharing of functional units

  • one-to-one mapping, as done previously, is not necessary

  • if the same operation occurs in different states, those states can share a single functional unit

• Multi-functional units

  • an ALU supports a variety of operations, so it can be shared among operations occurring in different states
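Sharing can be illustrated with the two subtractions of the GCD example, which occur in different states and never in the same clock cycle. The C sketch below models a single subtractor with a multiplexor on each input. The function `shared_sub` and its `swap` control signal are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* One subtractor shared between "x = x - y" and "y = y - x".
   swap (hypothetical control signal) drives two 2x1 muxes that choose
   which register feeds each subtractor input. */
unsigned shared_sub(unsigned x, unsigned y, bool swap)
{
    unsigned a = swap ? y : x;  /* mux on the minuend input */
    unsigned b = swap ? x : y;  /* mux on the subtrahend input */

    assert(a >= b);             /* GCD only subtracts the smaller value */
    return a - b;               /* the single shared functional unit */
}
```

The trade is one subtractor saved against two extra multiplexors and one more control signal, so sharing pays off mainly when the functional unit is larger than the multiplexors it needs.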

Optimizing the FSM

• State encoding

  • task of assigning a unique bit pattern to each state in an FSM

  • the size of the state register and of the combinational logic varies with the encoding

  • can be treated as an ordering problem

• State minimization

  • task of merging equivalent states into a single state

    • two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
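The equivalence test can be written down directly. The C sketch below checks two states of a small table-driven FSM against that definition. The `fsm` struct, the function `states_equivalent` and the 4-state example machine are all hypothetical and exist only to exercise the check.

```c
#include <assert.h>
#include <stdbool.h>

#define N_STATES 4
#define N_INPUTS 2  /* one input bit: 0 or 1 */

struct fsm {
    int next[N_STATES][N_INPUTS];  /* next state for each (state, input) */
    int out[N_STATES][N_INPUTS];   /* output for each (state, input) */
};

/* Applies the definition directly: same outputs and same next state
   for every input combination. */
bool states_equivalent(const struct fsm *m, int a, int b)
{
    assert(a >= 0 && a < N_STATES && b >= 0 && b < N_STATES);
    for (int in = 0; in < N_INPUTS; in++) {
        if (m->out[a][in] != m->out[b][in])
            return false;
        if (m->next[a][in] != m->next[b][in])
            return false;
    }
    return true;
}

/* Hypothetical 4-state machine in which states 2 and 3 behave identically. */
const struct fsm example_fsm = {
    .next = { {1, 2}, {3, 0}, {0, 1}, {0, 1} },
    .out  = { {0, 0}, {0, 1}, {1, 0}, {1, 0} },
};
```

Here states 2 and 3 pass the test, so state 3 can be removed and every transition into it redirected to state 2. Full minimization procedures usually also treat two states as equivalent when their next states are themselves equivalent; that refinement is not shown.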