Outline
• Introduction to a simple digital camera
• Designer’s perspective
• Requirements specification
• Design
– Four implementations
Introduction
• Putting it all together
– General-purpose processor
– Single-purpose processor
• Custom
• Standard
– Memory
– Interfacing
• Knowledge applied to designing a simple digital camera
– General-purpose vs. single-purpose processors
– Partitioning of functionality among different processor types
Designer’s perspective
• Two key tasks
– Processing images and storing in memory
• When shutter pressed:
– Image captured
– Converted to digital form by charge-coupled device (CCD)
– Compressed and archived in internal memory
– Uploading images to PC
• Digital camera attached to PC
• Special software commands the camera to transmit archived images serially
CCD module
- Simulates real CCD
- CcdInitialize is passed name of image file
- CcdCapture reads “image” from file
- CcdPopPixel outputs pixels one at a time
#include <stdio.h>
#define SZ_ROW 64
#define SZ_COL (64 + 2)   /* 64 pixels + 2 blacked-out pixels per row */
static FILE *imageFileHandle;
static char buffer[SZ_ROW][SZ_COL];
static unsigned rowIndex, colIndex;
void CcdInitialize(const char *imageFileName) {
    imageFileHandle = fopen(imageFileName, "r");
    rowIndex = -1;   /* no image captured yet (wraps to UINT_MAX) */
    colIndex = -1;
}
void CcdCapture(void) {
    int pixel;
    rewind(imageFileHandle);
    for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
        for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            if( fscanf(imageFileHandle, "%i", &pixel) == 1 ) {
                buffer[rowIndex][colIndex] = (char)pixel;
            }
        }
    }
    rowIndex = 0;   /* next CcdPopPixel returns the first pixel */
    colIndex = 0;
}
char CcdPopPixel(void) {
    char pixel;
    pixel = buffer[rowIndex][colIndex];
    if( ++colIndex == SZ_COL ) {      /* end of row: move to next row */
        colIndex = 0;
        if( ++rowIndex == SZ_ROW ) {  /* end of image */
            colIndex = -1;
            rowIndex = -1;
        }
    }
    return pixel;
}
UART module
• Actually a half UART
– Only transmits, does not receive
• UartInitialize is passed the name of the file to output to
• UartSend transmits (writes to the output file) one byte at a time
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
    outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
    fprintf(outputFileHandle, "%i\n", (int)d);   /* one byte per line, as a decimal integer */
}
CODEC (cont.)
- Implementing the FDCT formula: $C(h) = \frac{1}{\sqrt{2}}$ if $h = 0$, otherwise $C(h) = 1$, and $F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} D_{xy}\cos\left(\frac{\pi(2x+1)u}{16}\right)\cos\left(\frac{\pi(2y+1)v}{16}\right)$
- Only 64 possible inputs to COS, so a lookup table can be used to save execution time
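- Floating-point values multiplied by 32,768 and converted to integers (the table entries shown are truncated rather than rounded)
- 32,768 (2^15) chosen so that each value can be stored in 2 bytes of memory; note that 1.0 itself scales to 32,768, one more than the largest signed 16-bit value (32,767)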
- Fixed-point representation explained more later
- FDCT unrolls the inner loop of the summation and implements the outer summation as two consecutive for loops
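As a quick sanity check on the formula (a worked example added here, not from the slides), take a block whose 64 pixels all equal a constant $d$. Only the DC coefficient survives, because $\sum_{x=0}^{7}\cos\frac{\pi(2x+1)u}{16} = 0$ for every $u = 1,\dots,7$:

```latex
% Constant block: D_{xy} = d for all x, y
F(0,0) = \frac{1}{4}\,C(0)\,C(0)\sum_{x=0}^{7}\sum_{y=0}^{7} d\cos(0)\cos(0)
       = \frac{1}{4}\cdot\frac{1}{2}\cdot 64d = 8d

F(u,v) = \frac{1}{4}\,C(u)\,C(v)\,d
         \left(\sum_{x=0}^{7}\cos\frac{\pi(2x+1)u}{16}\right)
         \left(\sum_{y=0}^{7}\cos\frac{\pi(2y+1)v}{16}\right) = 0
\quad \text{if } u \neq 0 \text{ or } v \neq 0
```

A flat block therefore compresses to a single nonzero coefficient, which is what makes the later quantization and encoding effective on smooth image regions.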
/* COS_TABLE[x][u] = cos(pi*(2x+1)*u/16) scaled by 32768.
   long is used because cos(0) scales to 32768, which exceeds SHRT_MAX (32767). */
static const long COS_TABLE[8][8] = {
    { 32768, 32138, 30273, 27245, 23170, 18204, 12539, 6392 },
    { 32768, 27245, 12539, -6392, -23170, -32138, -30273, -18204 },
    { 32768, 18204, -12539, -32138, -23170, 6392, 30273, 27245 },
    { 32768, 6392, -30273, -18204, 23170, 27245, -12539, -32138 },
    { 32768, -6392, -30273, 18204, 23170, -27245, -12539, 32138 },
    { 32768, -18204, -12539, 32138, -23170, -6392, 30273, -27245 },
    { 32768, -27245, 12539, 6392, -23170, 32138, -30273, 18204 },
    { 32768, -32138, 30273, -27245, 23170, -18204, 12539, -6392 }
};
static const short ONE_OVER_SQRT_TWO = 23170;   /* 1/sqrt(2) scaled by 32768 */
static double COS(int xy, int uv) {
    return COS_TABLE[xy][uv] / 32768.0;
}
static double C(int h) {
    return h ? 1.0 : ONE_OVER_SQRT_TWO / 32768.0;
}
static int FDCT(int u, int v, short img[8][8]) {
    double s[8], r = 0;
    int x;
    for(x=0; x<8; x++) {   /* inner summation over y, unrolled */
        s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
               img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
               img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
               img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
    }
    for(x=0; x<8; x++)     /* outer summation over x */
        r += s[x] * COS(x, u);
    return (short)(r * .25 * C(u) * C(v));
}
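The scaling by 32,768 described above is a fixed-point representation: a real value r in [-1, 1] is stored as the integer r × 32,768. The sketch below illustrates converting back (as COS() does) and multiplying a pixel by a scaled constant using only integer arithmetic. The helper names FixToDouble and FixMulPixel are hypothetical, not part of the CODEC module; a 32-bit product type is assumed because +1.0 scales to 32,768, which does not fit in 16 bits.

```c
#include <stdint.h>

#define FIX_ONE 32768   /* 1.0 is stored as 2^15 */

/* Converts a scaled integer back to its real value (what COS() does). */
double FixToDouble(int32_t v) {
    return v / (double)FIX_ONE;
}

/* Multiplies a pixel by a scaled constant and rescales, integers only.
   Assumes |pixel| <= 32767 so the 32-bit product cannot overflow.
   Integer division truncates toward zero (C99 and later). */
int32_t FixMulPixel(int32_t pixel, int32_t scaled) {
    return (pixel * scaled) / FIX_ONE;
}
```

For example, FixMulPixel(100, 23170) multiplies 100 by the stored 1/√2 and yields 70, close to the exact 70.71. Replacing floating-point multiplies with integer ones matters on a processor like the 8051, which has no floating-point hardware.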
CNTRL (controller) module
- Heart of the system
- CntrlInitialize for consistency with other modules only
- CntrlCaptureImage uses CCDPP module to input image and place in buffer
- CntrlCompressImage breaks the 64 x 64 buffer into 8 x 8 blocks and performs FDCT on each block using the CODEC module
- Also performs quantization on each block
- CntrlSendImage transmits encoded image serially using UART module
#define SZ_ROW 64
#define SZ_COL 64
#define NUM_ROW_BLOCKS (SZ_ROW / 8)
#define NUM_COL_BLOCKS (SZ_COL / 8)
/* Provided by the CCDPP, CODEC, and UART modules (signatures assumed) */
void CcdppCapture(void);
char CcdppPopPixel(void);
void CodecPushPixel(short p);
void CodecDoFdct(void);
short CodecPopPixel(void);
void UartSend(char d);
static short buffer[SZ_ROW][SZ_COL], i, j, k, l, temp;
void CntrlInitialize(void) {}
void CntrlCaptureImage(void) {
    CcdppCapture();
    for(i=0; i<SZ_ROW; i++)
        for(j=0; j<SZ_COL; j++)
            buffer[i][j] = CcdppPopPixel();
}
void CntrlCompressImage(void) {
    for(i=0; i<NUM_ROW_BLOCKS; i++)
        for(j=0; j<NUM_COL_BLOCKS; j++) {
            for(k=0; k<8; k++)
                for(l=0; l<8; l++)
                    CodecPushPixel((char)buffer[i * 8 + k][j * 8 + l]);
            CodecDoFdct();   /* part 1 - FDCT */
            for(k=0; k<8; k++)
                for(l=0; l<8; l++) {
                    buffer[i * 8 + k][j * 8 + l] = CodecPopPixel();
                    /* part 2 - quantization: shift right by 6 (scale by 1/64) */
                    buffer[i * 8 + k][j * 8 + l] >>= 6;
                }
        }
}
void CntrlSendImage(void) {
    for(i=0; i<SZ_ROW; i++)
        for(j=0; j<SZ_COL; j++) {
            temp = buffer[i][j];
            /* byte [0] is the upper byte only on a big-endian target */
            UartSend(((char*)&temp)[0]);   /* send upper byte */
            UartSend(((char*)&temp)[1]);   /* send lower byte */
        }
}
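CntrlSendImage picks out the two bytes of each 16-bit value through a char pointer, so which byte goes first depends on the target's byte order. The sketch below shows a byte-order-independent alternative using shifts and masks; SplitShort is a hypothetical helper, not part of the CNTRL module.

```c
#include <stdint.h>

/* Splits a 16-bit value into upper and lower bytes with shifts and masks,
   giving the same result on big- and little-endian machines. */
void SplitShort(int16_t value, uint8_t *upper, uint8_t *lower) {
    uint16_t bits = (uint16_t)value;   /* view the bit pattern as unsigned */
    *upper = (uint8_t)(bits >> 8);
    *lower = (uint8_t)(bits & 0xFF);
}
```

Inside the send loop, one would call SplitShort(buffer[i][j], &hi, &lo) and then UartSend hi followed by lo, so the receiver always sees the upper byte first.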
Putting it all together
• Main initializes all modules, then uses the CNTRL module to capture, compress, and transmit one image
• This system-level model can be used for extensive experimentation
– Bugs are much easier to correct here than in later models
int main(int argc, char *argv[]) {
    char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
    char *imageFileName = argc > 2 ? argv[2] : "image.txt";
    /* initialize the modules */
    UartInitialize(uartOutputFileName);
    CcdInitialize(imageFileName);
    CcdppInitialize();
    CodecInitialize();
    CntrlInitialize();
    /* simulate functionality */
    CntrlCaptureImage();
    CntrlCompressImage();
    CntrlSendImage();
    return 0;
}
Implementation 1: Microcontroller alone
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
- 12 MHz, 12 cycles per instruction
- Executes one million instructions per second
- CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
- ~100 assembly instructions each iteration
- 409,600 (4096 x 100) instructions per image
- Nearly half of the budget spent just reading the image
- Would be over budget after adding compute-intensive DCT and Huffman encoding
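The budget argument above is simple arithmetic; the sketch below just encodes it so the numbers can be rechecked or varied. All constants are the slide's figures, including its rough estimate of ~100 instructions per loop iteration; the function names are hypothetical.

```c
/* Figures from the slide: 12 MHz clock, 12 cycles per instruction,
   a 64 x 64 capture loop, and ~100 instructions per iteration (estimate). */
#define CLOCK_HZ         12000000L
#define CYCLES_PER_INSN  12L
#define IMG_ROWS         64L
#define IMG_COLS         64L
#define INSNS_PER_ITER   100L

/* Instructions the 8051 can execute in one second. */
long InstructionsPerSecond(void) {
    return CLOCK_HZ / CYCLES_PER_INSN;
}

/* Instructions spent by CcdppCapture alone on one image. */
long CaptureInstructions(void) {
    return IMG_ROWS * IMG_COLS * INSNS_PER_ITER;
}

/* Percentage of the one-image-per-second budget used by capture. */
double CaptureShareOfBudget(void) {
    return 100.0 * (double)CaptureInstructions() / (double)InstructionsPerSecond();
}
```

With these inputs the budget is 1,000,000 instructions per second and capture alone takes 409,600 of them, about 41%, before any DCT or Huffman encoding runs.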
Implementation 2: Microcontroller and CCDPP
- CCDPP function implemented on custom single-purpose processor
- Improves performance – fewer microcontroller cycles
- Increases NRE cost and time-to-market
- Easy to implement
- Simple datapath
- Few states in controller
- Simple UART easy to implement as single-purpose processor also
- EEPROM for program memory and RAM for data memory added as well
[Figure: SOC block diagram containing the UART, CCDPP, EEPROM, and RAM]
UART
- UART in idle mode until invoked
- UART invoked when the 8051 executes a store instruction with the UART's enable register as the target address
- Memory-mapped communication between the 8051 and all single-purpose processors
- Lower 8 bits of memory address for RAM
- Upper 8 bits of memory address for memory-mapped I/O devices
- Start state transmits 0, indicating the start of a byte transmission, then transitions to the Data state
- Data state sends 8 bits serially, then transitions to the Stop state
- Stop state transmits 1, indicating the transmission is done, then transitions back to idle mode
FSMD description of UART:
- Idle: I = 0; when invoked, go to Start
- Start: Transmit LOW; go to Data
- Data: Transmit data(I), then I++; stay in Data while I < 8, go to Stop when I = 8
- Stop: Transmit HIGH; return to Idle
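The UART FSMD can be simulated in software to see exactly what appears on the serial line for one byte. The sketch below steps through the four states and records one line level per step. Two assumptions are labeled here: UartFsmdTransmit and UartState are hypothetical names, and data(0) is taken to be the least-significant bit, the usual UART convention.

```c
#include <stdint.h>

/* States of the UART FSMD. */
typedef enum { UART_IDLE, UART_START, UART_DATA, UART_STOP } UartState;

/* Runs one invocation of the FSMD, writing one line level per state step
   into levels[] (0 = LOW, 1 = HIGH). Returns the number of levels written:
   start bit, 8 data bits, stop bit. Assumes data(0) is the LSB. */
int UartFsmdTransmit(uint8_t data, uint8_t levels[10]) {
    UartState state = UART_START;   /* Idle (I = 0), then "invoked" */
    int I = 0, n = 0;
    while (state != UART_IDLE) {
        switch (state) {
        case UART_START:
            levels[n++] = 0;                     /* Transmit LOW */
            state = UART_DATA;
            break;
        case UART_DATA:
            levels[n++] = (uint8_t)((data >> I) & 1);  /* Transmit data(I) */
            I++;
            if (I == 8) state = UART_STOP;
            break;
        case UART_STOP:
            levels[n++] = 1;                     /* Transmit HIGH */
            state = UART_IDLE;
            break;
        default:
            state = UART_IDLE;
        }
    }
    return n;
}
```

For 0xA5 the line carries 0, then 1 0 1 0 0 1 0 1, then 1: ten bit times per byte, which is one reason the UART may be much slower than the processor.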
CCDPP
- Hardware implementation of zero-bias operations
- Interacts with external CCD chip
- CCD chip resides outside our SOC, mainly because combining a CCD with ordinary logic on one chip is not feasible
- Internal buffer, B, memory-mapped to 8051
- Variables R, C are the buffer's row and column indices
- GetRow state reads in one row from CCD to B
- 66 bytes: 64 pixels + 2 blacked-out pixels
- ComputeBias state computes the bias for that row and stores it in variable Bias
- FixBias state iterates over the same row, subtracting Bias from each element
- NextRow transitions to GetRow to repeat the process on the next row, or to the Idle state when all 64 rows are completed
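FSMD description of CCDPP:
- Idle: R = 0, C = 0; when invoked, go to GetRow
- GetRow: B[R][C] = Pxl, C = C + 1; stay while C < 66, go to ComputeBias when C = 66
- ComputeBias: Bias = (B[R][11] + B[R][10]) / 2, C = 0; go to FixBias
- FixBias: B[R][C] = B[R][C] - Bias; stay while C < 64, go to NextRow when C = 64
- NextRow: R++, C = 0; go to GetRow while R < 64, go to Idle when R = 64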
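A software walk through the CCDPP FSMD helps check the state sequence before committing it to hardware. The sketch below models each state as a switch case and reads pixels through a callback. Assumptions: CcdppFsmd, PixelSource, and the DemoCcd test source are hypothetical names; int is used for B instead of bytes to keep the simulation simple; and ComputeBias here averages the two blacked-out pixels (columns 64 and 65 of the 66-byte row), so check that choice against the FSMD's bias expression before reusing it.

```c
#define CCD_ROWS 64
#define CCD_COLS 66   /* 64 pixels + 2 blacked-out pixels per row */
#define CCD_PIX  64

typedef enum { CCDPP_IDLE, CCDPP_GET_ROW, CCDPP_COMPUTE_BIAS,
               CCDPP_FIX_BIAS, CCDPP_NEXT_ROW } CcdppState;

/* Hypothetical stand-in for reading Pxl from the external CCD chip. */
typedef int (*PixelSource)(void *ctx);

void CcdppFsmd(int B[CCD_ROWS][CCD_COLS], PixelSource readPxl, void *ctx) {
    CcdppState state = CCDPP_GET_ROW;   /* Idle: R = 0, C = 0, then "invoked" */
    int R = 0, C = 0, Bias = 0;
    while (state != CCDPP_IDLE) {
        switch (state) {
        case CCDPP_GET_ROW:             /* B[R][C] = Pxl, C = C + 1 */
            B[R][C] = readPxl(ctx);
            C = C + 1;
            if (C == CCD_COLS) state = CCDPP_COMPUTE_BIAS;
            break;
        case CCDPP_COMPUTE_BIAS:        /* assumed: mean of blacked-out pixels */
            Bias = (B[R][CCD_PIX] + B[R][CCD_PIX + 1]) / 2;
            C = 0;
            state = CCDPP_FIX_BIAS;
            break;
        case CCDPP_FIX_BIAS:            /* B[R][C] = B[R][C] - Bias */
            B[R][C] = B[R][C] - Bias;
            C = C + 1;
            if (C == CCD_PIX) state = CCDPP_NEXT_ROW;
            break;
        case CCDPP_NEXT_ROW:            /* R++, C = 0 */
            R++;
            C = 0;
            state = (R < CCD_ROWS) ? CCDPP_GET_ROW : CCDPP_IDLE;
            break;
        default:
            state = CCDPP_IDLE;
        }
    }
}

/* Demo source: every real pixel reads 50; the blacked-out pair reads 10 and 12. */
typedef struct { int count; } DemoCcd;
int DemoReadPxl(void *ctx) {
    DemoCcd *d = (DemoCcd *)ctx;
    int col = d->count % CCD_COLS;
    d->count++;
    if (col < CCD_PIX) return 50;
    return col == CCD_PIX ? 10 : 12;
}
```

With the demo source, each row's bias is (10 + 12) / 2 = 11, so every image pixel ends up as 39, while the blacked-out columns are left untouched because FixBias stops at C = 64.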
Software
- System-level model provides majority of code
- Module hierarchy, procedure names, and main program unchanged
- Code for UART and CCDPP modules must be redesigned
- Simply replace with memory assignments
- xdata used to load/store variables over external memory bus
- at specifies memory address to store these variables
- Byte sent to U_TX_REG by processor will invoke UART
- U_STAT_REG used by UART to indicate whether it is ready for the next byte
- UART may be much slower than processor
- Similar modification for CCDPP code
- All other modules untouched
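static volatile unsigned char xdata U_TX_REG at 65535;   /* transmit register */
static volatile unsigned char xdata U_STAT_REG at 65534; /* status: 1 = busy */
void UartInitialize(void) {}
void UartSend(unsigned char d) {
    while( U_STAT_REG == 1 ) {
        /* busy wait until the UART can accept another byte */
    }
    U_TX_REG = d;   /* storing to this address invokes the UART */
}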
Rewritten UART module
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
    outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
    fprintf(outputFileHandle, "%i\n", (int)d);
}
Original code from system-level model
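The rewritten UartSend depends on two things: polling the status register, and the compiler actually re-reading that register on every loop iteration. The sketch below is a host-side simulation in which the two registers are backed by an ordinary array instead of addresses 65534 and 65535, so it runs on a PC; simRegs and UartSendMmio are hypothetical names. Declaring the registers volatile tells a standard C compiler that every access must really happen; without it, an optimizer may read the status once and never see it change.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins for the memory-mapped UART registers.
   On the SOC they sit at addresses 65534 and 65535; here an array backs them. */
static volatile uint8_t simRegs[2];
#define U_STAT_REG (simRegs[0])   /* stands in for address 65534; 1 = busy */
#define U_TX_REG   (simRegs[1])   /* stands in for address 65535 */

/* Same polling logic as the rewritten UartSend: wait while the UART is busy,
   then store the byte, which on the real SOC invokes the UART. */
void UartSendMmio(uint8_t d) {
    while (U_STAT_REG == 1) {
        /* busy wait: volatile forces a fresh read on every iteration */
    }
    U_TX_REG = d;
}
```

On the real SOC the array would be replaced by the compiler-specific placement shown in the rewritten module (xdata and at), leaving the polling logic unchanged.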
Analysis
- Entire SOC tested on VHDL simulator
- Interprets VHDL descriptions and functionally simulates execution of the system
- Recall that program code is translated to a VHDL description of ROM
- Tests for correct functionality
- Measures clock cycles to process one image (performance)
- Gate-level description obtained through synthesis
- Synthesis tool like compiler for SPPs
- Simulate gate-level models to obtain data for power analysis
- Number of times gates switch from 1 to 0 or 0 to 1
- Count number of gates for chip area
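Power analysis starts from switching activity: how often signals change between 0 and 1. Real gate-level simulators count toggles on every gate output; the sketch below only illustrates the counting idea for a single 8-bit signal traced over time, using XOR to find the bits that changed. CountToggles8 and CountSwitching are hypothetical names, and the power equation that turns counts into watts is not shown because the slide does not give it.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts bit positions that differ between two successive 8-bit values. */
static int CountToggles8(uint8_t prev, uint8_t next) {
    uint8_t diff = prev ^ next;   /* 1 wherever a wire switched 0->1 or 1->0 */
    int count = 0;
    while (diff) {
        diff &= (uint8_t)(diff - 1);   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Total switching activity over a trace of values on an 8-bit signal. */
long CountSwitching(const uint8_t *trace, size_t n) {
    long total = 0;
    for (size_t i = 1; i < n; i++)
        total += CountToggles8(trace[i - 1], trace[i]);
    return total;
}
```

For the trace 0x00, 0xFF, 0x0F the count is 8 + 4 = 12 toggles; a higher count for the same work generally means more dynamic power.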
[Figure: Obtaining design metrics of interest. VHDL descriptions run on a VHDL simulator give execution time; a synthesis tool turns them into gates; a gate-level simulator plus a power equation gives power; summing the gates gives chip area]