





























































Material Type: Project; Class: Senior Design Project Lab; Subject: Electrical and Computer Engr; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;
By
Ishaan Gupta
Akhilesh Khanna
TA: Richard Marty Cantzler
rd December 2004
Project No. 41
Reconfigurable computing has become increasingly important over the last few years. Considerable
research has been carried out on the advantages of using reconfigurable hardware alongside a
conventional processor. Signal processing is one field where this approach has been found to yield
significant improvements, and this project explores that effect.
This project implements the JPEG compression scheme using reconfigurable hardware. The
implementation targets the FPGA on the XSA-100 board. Data transfer to and from the XSA board is
done using the Personal Computer Memory Card International Association (PCMCIA) interface. The
JPEG compression scheme uses the Forward Discrete Cosine Transform (FDCT) DSP algorithm along
with quantization and entropy encoding. This project is intended as a proof of concept that JPEG
compression can be sped up by using reconfigurable hardware, thus demonstrating the
acceleration of DSP algorithms in hardware.
Hardware Image Compression
example: the back-projection algorithm used in MRI and CT scans)
DMA-type, interrupt-based access can be used to offload work to the reconfigurable logic
Image compression using the JPEG algorithm in hardware
Portable to any logic device family (Xilinx, Altera, and others)
Reduce the load on the embedded processor by doing part of its work
The following specifications must be met in order to implement JPEG compression
on the XSA-100 board.
image has to be transferred to the SRAM, read from the SRAM in 8x8 blocks of pixels, compressed
in hardware, written back to the SRAM and then transferred back from the SRAM. There should not
be any loss or corruption of the image data while the image is being transferred over the PCMCIA
bus. Thus the PCMCIA communication has to be robust and reliable.
The VHDL code written to implement the JPEG compression in hardware has to be synthesizable in
order for it to work on the board.
level of compression.
This project can be broken down into two major components: the compression algorithm, and the
PCMCIA interface together with the synchronization between the two for access to the SRAM. Both these
components can be implemented and tested separately and then integrated to get a fully working design
on the board.
The PCMCIA interface is used to transfer image data to and from the board. The data is transferred
over the PCMCIA bus into the SRAM on the XSA-100 board and back out of the SRAM. A
PCMCIA card inserted into a computer is used for this purpose. Thus, this component implements a
reliable and robust system of communication between an FPGA board and a computer
system. A printed circuit board (PCB) connects the header pins on the XSA-100 board
to the pins of the PCMCIA card.
The second component of this project covers the steps that need to be executed in order to perform
compression in hardware. This component can itself be broken down into three separate components
that can be designed and tested independently. These components are:
a. Entropy Encoding
b. Run-Length Encoding
The Xilinx ISE tool can also be used to place and route the design manually. This feature
was used in some cases to make sure that everything was routed on the FPGA chip.
ModelSim XE is an industry-standard tool for simulating VHDL and Verilog hardware
descriptions. It was used to make sure that there were no conceptual bugs in the design of the JPEG
compression pipeline and the PCMCIA and SDRAM controllers.
EasyTrax was used to design the board that connects the PCMCIA breakout board to the FPGA
board. This was very useful because it can be used in conjunction with the machine in the ECE parts
shop, which reduces design time.
OrCAD was used to draw the schematics for the PCMCIA connector and the FPGA board extension
header connections. The OrCAD schematic for the PCMCIA printed circuit board is in the appendix,
Schematics and Images, Section A.
JPEG compression has a number of steps, and each step had different design alternatives. The algorithm
takes an 8x8 image block from the original image and compresses it. The compressed data
from the 8x8 image block is then written to the compressed file, and the subsequent 8x8 image
blocks are compressed in the same way. The algorithm compresses each block independently and
The JPEG compression pipeline is shown in Figure 3.1. It reads values from the SRAM and
processes them through the successive stages of the pipeline. Each of these stages is described in
detail below:
Figure 3.1 JPEG compression Pipeline
The FDCT equation is the summation of the input function and cosine functions over the 8x8 block that
is being compressed.
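As a minimal sketch of that summation, the standard 8x8 FDCT definition (taken from the usual JPEG formulation, not from the project's VHDL) can be written directly in Python. The function name `fdct_8x8` and the use of floating-point arithmetic are assumptions made for illustration; a hardware version would typically use fixed-point values.

```python
import math

N = 8  # JPEG block size


def fdct_8x8(block):
    """Direct 2-D forward DCT of an 8x8 block (list of 8 rows of 8 numbers).

    F(u, v) = 1/4 * C(u) * C(v) * sum_x sum_y f(x, y)
              * cos((2x + 1) u pi / 16) * cos((2y + 1) v pi / 16)
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
    """
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# A constant block has no spatial variation, so only the DC term survives.
flat = [[100] * N for _ in range(N)]
coeffs = fdct_8x8(flat)
```

For the constant block of value 100, the DC coefficient is 1/4 * 1/2 * 64 * 100 = 800, and every other coefficient is zero up to rounding error.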
higher frequencies; thus the matrix has larger numbers towards the bottom right corner and smaller
numbers towards the top left corner.
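A minimal sketch of the quantization step follows, assuming a made-up quantization matrix: `make_quant_matrix` and its `step` parameter are hypothetical and do not reproduce the matrix used in this project. Each coefficient is divided by the matching matrix entry and rounded, so the larger entries towards the bottom right discard more of the high-frequency detail.

```python
def make_quant_matrix(step=2):
    """Hypothetical 8x8 matrix whose entries grow towards the bottom right."""
    return [[1 + (1 + u + v) * step for v in range(8)] for u in range(8)]


def quantize(coeffs, q):
    """Divide each DCT coefficient by its quantization entry and round."""
    return [[int(round(coeffs[u][v] / q[u][v])) for v in range(8)]
            for u in range(8)]


q = make_quant_matrix(step=2)
# Example input: every coefficient is 12, so only the divisor varies.
coeffs = [[12.0] * 8 for _ in range(8)]
quantized = quantize(coeffs, q)
```

With this assumed matrix the top-left divisor is 3 and the bottom-right divisor is 31, so the same input value of 12 quantizes to 4 at the top left and to 0 at the bottom right.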
In this design the entropy-encoding stage is essentially a zig-zag traversal of each 8x8 block of
quantized values, so in hardware it can be implemented as an address translation unit. The address
translation is applied as the values coming out of the Quantization stage of the pipeline are loaded
into the registers of the Encoding stage of the pipeline.
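The address translation can be sketched as a 64-entry lookup table that maps each output position to the linear address of the input value it reads. The names `zigzag_order`, `ZIGZAG_ADDR`, and `zigzag` are illustrative, and the row-major addressing (row * 8 + column) is an assumption about how the block is stored.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the JPEG zig-zag scan."""
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


# Address translation table: output position k reads linear input address r*8 + c.
ZIGZAG_ADDR = [r * 8 + c for r, c in zigzag_order()]


def zigzag(block):
    """Flatten an 8x8 block into a 64-entry list in zig-zag order."""
    flat = [v for row in block for v in row]
    return [flat[a] for a in ZIGZAG_ADDR]
```

In hardware the same table could be held as a small ROM or as fixed wiring, since the mapping never changes from block to block.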
Run Length Encoding is the point where the data compression really occurs. In the Run Length
Encoding stage the image data produced by the previous stages is stored using a
smaller number of bits. The input to this stage is a stream of numbers, and the output is
also a stream of numbers, with the Nth value in the output stream being a number occurring in the input
stream and the (N+1)th value in the output stream being the number of consecutive occurrences of that
number. In the hardware implementation used in this project this is implemented using an input and an
output matrix. The input matrix is the zigzag sequence of the values after they have been quantized, and
the output matrix is the run length encoding of the values in the input matrix, with zeroes padded at the
end of the output matrix.
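The value/count scheme described above can be sketched as follows. This is an illustrative software model, not the project's VHDL; the function name `run_length_encode` and the `out_len` parameter (standing in for the fixed size of the output matrix) are assumptions.

```python
def run_length_encode(values, out_len=None):
    """Encode values as [v0, n0, v1, n1, ...] and zero-pad to out_len entries."""
    out = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1            # same value again: extend the current run
        else:
            out.extend([v, 1])      # new value: start a run of length 1
    if out_len is not None:
        if len(out) > out_len:
            raise ValueError("encoded data does not fit in the output matrix")
        out += [0] * (out_len - len(out))   # pad the tail with zeroes
    return out


encoded = run_length_encode([5, 5, 5, 0, 0, 0, 0, 2], out_len=10)
```

Here the eight input values become the pairs (5, 3), (0, 4), (2, 1) followed by four padding zeroes. Note that an input with no repeated neighbours produces two output entries per input value, so this scheme only saves space when runs are common, as they are after quantization.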
The PCMCIA SDRAM base system consists of an SDRAM controller, PCMCIA state machine, and an
SDRAM arbitrator.
The PCMCIA state machine (shown in Figure 3.2) reads the PCMCIA signals coming in from the host
(the iPaq) and acts accordingly.
Figure 3.2 PCMCIA state machine
The SDRAM controller used is supplied by Xess Corporation, the manufacturer of the
XSA-100 board on which our design was implemented.
[Figure 3.2 labels: states marked read and write; transition conditions REG = '0' and OE = '0' and not(CE = "11"); REG = '0' and WE = '0' and not(CE = "11"); REG = '1' and WE = '0' and not(CE = "11"); REG = '1' and OE = '0' and not(CE = "11"); Mem_ack = '1'.]
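The transition conditions listed for Figure 3.2 can be read as a decode of the PCMCIA control lines. The sketch below is one possible reading under stated assumptions, not the project's VHDL: the function `decode_access` and the access names are hypothetical, and treating REG = '0' as an attribute-memory access follows the usual PCMCIA convention of active-low signals rather than anything stated in this report.

```python
def decode_access(reg, oe, we, ce):
    """Classify one sample of the PCMCIA control lines (all active low).

    reg, oe, we are '0' or '1'; ce is the two-bit string for the card enables.
    Returns a hypothetical access name, or 'idle' when no condition matches.
    """
    if ce == "11":                      # neither card enable asserted
        return "idle"
    space = "attribute" if reg == "0" else "common"
    if oe == "0":                       # assumption: OE is checked before WE
        return space + "_read"
    if we == "0":
        return space + "_write"
    return "idle"
```

For example, `decode_access("1", "1", "0", "10")` returns `"common_write"`, matching the condition REG = '1' and WE = '0' and not(CE = "11").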
The design is intended as a proof of concept of the usability of reconfigurable hardware for
implementing DSP algorithms and of the speed-up obtained by doing the implementation in
hardware. The transfer of image data over the PCMCIA interface also has to be tested. Thus, testing
and verification of the design has to be done at various levels.
The compression algorithm has to be tested and verified to ensure the desired functionality. To
do this, a C++ implementation of the algorithm was written first. The C++ code
implements the JPEG compression using matrix multiplications, as intended for the hardware.
Initially, individual functions of the algorithm were tested using test matrices. These
matrices were chosen such that the expected output of each function could be calculated by hand and
checked against the output the function generated. In this manner individual functions were debugged
and corrected. Finally, a test image file was used and the code was debugged and tested repeatedly
until the desired output was obtained. The algorithm was thereby verified for its functionality,
and the matrix formulation derived to implement the compression also achieves the required
behavior.
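A sketch of what such a hand-checkable test might look like is given below, using the matrix-multiplication form of the FDCT, F = C * X * C^T. This is an illustrative Python model rather than the project's C++ code; the helper names are hypothetical, and the choice of a constant test block is an assumption about the kind of test matrix that is easy to verify by hand.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so that F = C * X * C^T."""
    c = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for x in range(n):
            c[k][x] = scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
    return c


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def fdct_matrix_form(block):
    """2-D FDCT as two matrix multiplications: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# Hand-checkable test matrix: a constant block has only a DC term, 8 * value.
test_block = [[10.0] * 8 for _ in range(8)]
result = fdct_matrix_form(test_block)
```

For a constant block of value 10 the expected result can be worked out by hand: the DC coefficient is 8 * 10 = 80 and all other coefficients are zero, so any deviation beyond rounding error points to a bug in the transform.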
The design should be able to achieve variable levels of compression through a simple change at a
single point in the code. This is done by changing the values of the coefficient matrix used in the
algorithm. This provides an easy extension that lets the user choose the amount of compression by
simply deciding between a number of coefficient matrices available for use. The matrix to achieve 50%
compression is given below:
Another test measured the compression achieved for different file sizes and different image
dimensions. The results of these tests are outlined in the following table:
Sample image     Image dimensions   Original size (bytes)   Compressed size using our algorithm (bytes)
Ferrari.bmp      1024*768           2,595,295               900,564 (65.3%)
aishwarya.bmp    144*150            72,089                  26,168 (63.7%)
soccer.bmp       1600*1200          6,336,059               2,071,891 (67.3%)
football.bmp     481*415            662,323                 231,150 (65.1%)
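The percentages in parentheses are consistent with the size reduction relative to the original file, that is 1 - compressed/original. A short check using the figures from the table above (the helper name `reduction_percent` is illustrative):

```python
def reduction_percent(original_bytes, compressed_bytes):
    """Size reduction as a percentage of the original size."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


# (original size, compressed size) in bytes, copied from the table above.
samples = {
    "Ferrari.bmp": (2595295, 900564),
    "aishwarya.bmp": (72089, 26168),
    "soccer.bmp": (6336059, 2071891),
    "football.bmp": (662323, 231150),
}
reductions = {name: round(reduction_percent(o, c), 1)
              for name, (o, c) in samples.items()}
```

Rounded to one decimal place, this gives 65.3, 63.7, 67.3, and 65.1 for the four images, matching the table.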
A comparison was also made with the standard JPEG and PNG compression provided by Paint,
and the results obtained are outlined in the following table:
Sample image     Image dimensions   Original size (bytes)   JPEG compression using Paint (bytes)   PNG compression using Paint (bytes)
Ferrari.bmp      1024*768           2,595,295               65,536 (97.3%)                         593,920 (77.1%)
aishwarya.bmp    144*150            72,089                  8,192 (88.6%)                          61,440 (14.7%)
soccer.bmp       1600*1200          6,336,059               466,944 (92.3%)                        4,669,440 (26.3%)
football.bmp     481*415            662,323                 49,152 (92.6%)                         458,350 (30.8%)
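Hence, the compression achieved by the algorithm implemented in this project (roughly 64% to 67%) is comparable to that of other standard compression schemes: it is below Paint's JPEG output but above Paint's PNG output for three of the four images.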
The image Ferrari.bmp is shown in the appendix, Schematics and Images, Sections B and C. These sections
show the original image and the image after being compressed and then decompressed. These images
show that the data loss is almost negligible.
properly. These holes were then drilled out manually to enlarge them, which left the holes
too large for the connector pins to be soldered properly to the board. A second PCB
was then manufactured, which did not work initially. One of the issues faced was
that the impedance between some pins mapped on the board was too low. This was determined using
an oscilloscope. The soldering of components on the board was checked using the continuity test
of the multimeter. Some extra wires had to be soldered onto the board to connect all the required
pins.
The PCMCIA component is one of the two major components of the design and was designed and
tested independently. To test the PCMCIA interface an SDRAM controller module was needed,
and the one provided by Xess Corporation was used. The PCMCIA-SDRAM synchronous
controller was tested and found to work correctly. These tests were done using the following
benchmarks:
6 writes and reads.
All of the above tests gave the required results, showing proper operation. Once these small steps were
implemented and tested, more demanding testing was done by loading entire image files into the SRAM
and reading them back. Debugging was done using logic analyzer traces, and the final working design
returned the loaded image unchanged.
This component of the design could not be tested properly because there was not enough space on the
board. Even the individual components of the compression algorithm, i.e. the FDCT
algorithm, quantization, zigzag encoding, or even the smallest component, the Run-Length Encoding,
were too large to fit on the board alongside the PCMCIA implementation. Thus this part of the design
could not be tested.
Through the above tests the design was steadily improved and debugged, making it more robust
and effective. The tests were carried out in the order described above because this project involved
so many different steps. Six of the seven levels of tests were completed; the
last turned out to be the stumbling block due to lack of space on the XSA-100 board.