Computer Organization and
Assembly Language
Lecture 2 – x86 Processor
Architecture
What is a processor?
• CPU (Central Processing Unit) or Processor
- is the brain of the computer.
• In the PC, the Processor is in the Intel
80x86 or Pentium family.
What does the processor contain?
• Buses - carry data, control signals and
addresses between the processor
components and other devices within the
computer.
• Registers - High-speed memory units within
the CPU.
• Clock - synchronizes all the steps in
fetching, decoding and executing
instructions.
Basic Microprocessor Design
[Figure: block diagram of the Central Processor Unit (registers, ALU, CU, clock) connected to the Memory Storage Unit and two I/O devices by the data bus, control bus and address bus.]
Instruction Execution Cycle
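[Figure: the PC points into the program in memory; instructions (I-1, I-2, I-3, ...) are fetched into the instruction queue, decoded, and executed by the ALU, which reads operands from memory and registers, writes results back to memory and registers, and updates the flags.]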
Intel 32-bit Architecture
The 80386 was Intel’s first 32-bit processor and the first to
include parallel stages of execution. The parts that carry
these out are:
- BIU (Bus Interface Unit) - accesses memory and provides input-output.
- Code Prefetch Unit - receives machine instructions from the BIU and inserts them into the prefetch queue.
- Instruction Decode Unit - decodes machine instructions and converts them into microcode instructions.
- Execution Unit - executes microcode instructions.
- Segment Unit - translates logical addresses into linear addresses (and performs protection checks).
- Paging Unit - translates linear addresses into physical addresses, performs page protection checks and keeps track of recently accessed pages.
Six-Stage Non-Pipeline Execution
[Figure: cycles 1 to 12 (rows) against stages S-1 to S-6 (columns). Only one stage is occupied in each cycle, and an instruction must pass through all six stages before the next one enters S-1.]
Without pipelining, CPU resources are wasted.
For k stages, n instructions require n × k clock cycles.
Six-Stage Pipeline Instruction
[Figure: cycles 1 to 7 (rows) against stages S-1 to S-6 (columns). I-2 enters each stage one cycle behind the instruction ahead of it, so the two instructions overlap in the pipeline.]
With pipelining, a second instruction can begin execution almost immediately and finish sooner.
For k stages, n instructions require k + (n-1) clock cycles.
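The two cycle-count formulas can be compared directly. The following is a minimal sketch in Python; the function names are illustrative and not part of the lecture, and it assumes an ideal pipeline with no stalls.

```python
def cycles_non_pipelined(n: int, k: int) -> int:
    """Cycles for n instructions on a k-stage processor without pipelining."""
    return n * k


def cycles_pipelined(n: int, k: int) -> int:
    """Cycles with an ideal pipeline: k cycles for the first instruction,
    then one more cycle for each remaining instruction."""
    if n == 0:
        return 0  # nothing to execute
    return k + (n - 1)


if __name__ == "__main__":
    k = 6  # six stages, as in the slides
    for n in (1, 2, 100):
        print(n, cycles_non_pipelined(n, k), cycles_pipelined(n, k))
```

For two instructions on six stages this gives 12 cycles without pipelining and 7 with it, matching the cycle counts in the two figures.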
Reading From Memory
• Memory access is an important factor in
understanding program execution speed
because memory access via the system bus
is much slower than the CPU clock.
• The clock cycles that are wasted while
waiting for operands to be fetched are called
wait states.
Cache Memory
• Cache memory holds data recently fetched
from or written to memory. Since it is
much faster than conventional memory,
there are fewer wait states.
• Level-1 cache is built into the processor.
• Level-2 cache is located on separate chips
near the processor.
Load and Execute Process
When you “tell” the computer to run a program,
certain things happen:
- The user issues a command to run the program
- The operating system (OS) finds the program’s
filename in the system directory, if necessary searching through the path for the name.
- The OS retrieves the basic file information,
including size and disk location.
- The OS determines a memory location for the file
and reads it in and creates a process table entry for it.
Load and Execute Process (continued)
- The OS executes a branching instruction,
beginning program execution, creating a new process (the user’s program).
- The process runs by itself, with the OS keeping
track of its use of system resources.
- When the program is finished, its table entry and
memory are made available for reuse.
IA-32 Processor Address Space
- In protected mode, IA-32 processors can access up to 4 gigabytes of memory, with memory addresses from 0 to 2^32 - 1.
- In real mode, a maximum of 1 megabyte of memory can be accessed, with memory addresses from 0 to 2^20 - 1.
- The IA-32 processors provide a virtual-8086 mode, in which multiple MS-DOS programs can run safely within a Windows environment.
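The address-space sizes follow from the number of address bits: 32 in protected mode and 20 in real mode. A minimal Python sketch of that arithmetic (the helper name is illustrative):

```python
def address_space(bits: int) -> tuple[int, int]:
    """Return (size in bytes, highest address) for a given address width."""
    size = 2 ** bits
    return size, size - 1


if __name__ == "__main__":
    for mode, bits in (("protected", 32), ("real", 20)):
        size, top = address_space(bits)
        print(f"{mode} mode: {size} bytes, addresses 0 to {top:X}h")
```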
32-bit Registers
General Purpose (bits 31-0): EAX, EBX, ECX, EDX (low 16 bits: AX, BX, CX, DX)
Index: EBP, ESP, ESI, EDI
Status and Control: EFLAGS, EIP (16-bit forms: Flags, IP)
Segment: CS, SS, DS, ES, FS, GS
16-bit Processor Architecture
General Purpose Registers
Each 16-bit register (bits 15-0) is made up of a high byte and a low byte (bits 7-0 within each half):
AX = AH:AL
BX = BH:BL
CX = CH:CL
DX = DH:DL
AX (Accumulator) - favored for arithmetic operations
BX (Base) - holds the base address for procedures and variables
CX (Counter) - used as a counter for looping operations
DX (Data) - used in multiplication and division operations
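The relationship between a 16-bit register such as AX and its byte halves AH and AL can be modelled with shifts and masks. This is a minimal sketch; the function names are illustrative, and Python integers stand in for the hardware registers.

```python
def split_register(value16: int) -> tuple[int, int]:
    """Split a 16-bit value (e.g. AX) into its high and low bytes (AH, AL)."""
    value16 &= 0xFFFF                  # keep only bits 15-0
    return (value16 >> 8) & 0xFF, value16 & 0xFF


def join_register(high: int, low: int) -> int:
    """Combine a high byte and a low byte (e.g. AH, AL) into a 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


if __name__ == "__main__":
    ah, al = split_register(0x1234)
    print(f"AH={ah:02X}h AL={al:02X}h")           # high and low bytes of 1234h
    print(f"AX={join_register(ah, al):04X}h")
```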
Segment Registers
Segment registers are used to hold base addresses
for program code, data and the stack.
[Figure: four 16-bit registers (bits 15-0): CS, SS, DS, ES]
CS (Code Segment) - holds the base address for all executable instructions in the program
SS (Stack Segment) - holds the base address for the stack
DS (Data Segment) - holds the base address for variables
ES (Extra Segment) - an additional base address value for variables
Flags
There are two types of flags: control flags (which determine how instructions are carried out) and status flags (which report on the results of operations).
Control flags include:
- Direction Flag (DF) - affects the direction of block data transfers (such as long character strings). 1 = down (addresses decrease); 0 = up (addresses increase).
- Interrupt Flag (IF) - determines whether interrupts can occur, that is, whether hardware devices like the keyboard, disk drives, and system clock can get the CPU’s attention.
- Trap Flag (TF) - determines whether the CPU is halted after every instruction. Used for debugging purposes.
Status Flags
- Status Flags include:
- Carry Flag (CF) - set when the result of unsigned arithmetic is too large to fit in the destination. 1 = carry; 0 = no carry.
- Overflow Flag (OF) - set when the result of signed arithmetic is too large to fit in the destination. 1 = overflow; 0 = no overflow.
- Sign Flag (SF) - set when an arithmetic or logical operation generates a negative result. 1 = negative; 0 = positive.
- Zero Flag (ZF) - set when an arithmetic or logical operation generates a result of zero. Used primarily in jump and loop operations. 1 = zero; 0 = not zero.
- Auxiliary Carry Flag (AF) - set when an operation causes a carry from bit 3 to bit 4 (or a borrow from bit 4 to bit 3). 1 = carry; 0 = no carry.
- Parity Flag (PF) - reflects the number of 1 bits in the low byte of the result; it can be used for error checking. 1 = even number of 1s (even parity); 0 = odd number of 1s (odd parity).
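The status flags can be illustrated by modelling an 8-bit addition. The sketch below is an illustration only, not processor microcode: the function name `add8` and the dictionary layout are assumptions made for this example, and it covers addition only.

```python
def add8(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and report the status flags the addition would set."""
    a &= 0xFF
    b &= 0xFF
    full = a + b                 # unbounded sum, used to detect the carry out
    result = full & 0xFF         # what fits in the 8-bit destination
    flags = {
        # unsigned result too large for 8 bits
        "CF": int(full > 0xFF),
        # signed overflow: both operands differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # copy of the result's sign bit (bit 7)
        "SF": (result >> 7) & 1,
        # result is zero
        "ZF": int(result == 0),
        # carry out of bit 3 into bit 4
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # even number of 1 bits in the result byte
        "PF": int(bin(result).count("1") % 2 == 0),
    }
    return result, flags


if __name__ == "__main__":
    print(add8(0xFF, 0x01))  # unsigned wrap-around: carry, zero result
    print(add8(0x7F, 0x01))  # signed overflow: +127 + 1
```

Adding FFh and 01h sets CF and ZF but not OF, while adding 7Fh and 01h sets OF and SF but not CF, which shows why unsigned and signed overflow are reported by separate flags.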
Floating-Point Unit
80-bit Data Registers: ST(0), ST(1), ST(2), ST(3), ST(4), ST(5), ST(6), ST(7)
48-bit Pointer Registers: FPU Instruction Pointer, FPU Data Pointer
16-bit Control Registers: Tag Register, Control Register, Status Register
Opcode Register
The Intel Microprocessor Family
- The Intel family owes its origins to the 8080, an 8-bit processor that could access only 64 kilobytes of memory.
- The 8086 (1978) had 16-bit registers, a 16-bit data bus, and 20-bit addressing using segmented memory. The IBM PC used the 8088, which was identical except that it used an 8-bit data bus.
- 8087 - a math co-processor that worked together with the 8086/8088. Without it, floating-point arithmetic required complex software routines.
- 80286 - ran in real mode (like the 8086/8088) or in protected mode, where it could access up to 16 MB using 24-bit addressing. Its clock speed was between 12 and 25 MHz. Its math co-processor was the 80287.
Intel Core Processor Family
• Intel introduced the Core family of
processors in 2006, which are more
powerful than the Pentium processors that
preceded them.
• So far they include:
– Core 2 Duo – 2 processor cores, 1.8-3.3 GHz,
64 bit, 6 MByte L2 cache.
– Core 2 Quad – 4 processor cores, up to 12
MByte L2 cache, 1333 MHz front side bus.
CISC Architecture
• The Intel processors have been based on the
CISC (Complex Instruction Set Computer)
approach to processor design.
• CISC processors have large, powerful
instruction sets that can include many high-level
operations. But the size of the
instruction set makes the control unit
relatively slow.
RISC Architecture
• RISC computers use smaller, streamlined
instruction sets that allow their control units
to be quicker.
• Intel processors are backwards-compatible
and are basically CISC but use RISC
features such as pipelining and superscalar execution.
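Segmented Memory Map, Real-Address Mode
[Figure: real-address mode memory map marked in steps of 10000h from 00000 upward. One segment is highlighted, ending at 8000:FFFF, with a location at offset 0250 inside it; addresses are written as segment:offset, with segment 8000.]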
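In real-address mode a segment:offset pair is turned into a 20-bit linear address by multiplying the segment value by 16 (10h) and adding the offset. A minimal Python sketch, using segment 8000 and offset 0250 from the memory map (the function name is illustrative):

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-address mode translation: segment * 16 + offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


if __name__ == "__main__":
    for seg, off in ((0x8000, 0x0000), (0x8000, 0x0250), (0x8000, 0xFFFF)):
        print(f"{seg:04X}:{off:04X} -> {linear_address(seg, off):05X}h")
```

Because the offset is 16 bits wide, each segment spans 64 KB, which is why the map is divided in steps of 10000h.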
Protected Mode Memory Management
- When the processor runs in protected mode, a program can
access up to 4 gigabytes of memory.
- Although the programmer’s view of memory is a flat
image of 4 GB, the operating system works in the
background to create and maintain this image.
- The segment registers point to segment descriptor tables,
which define locations of the program segments:
- CS refers to the code segment’s descriptor table
- DS refers to the data segment’s descriptor table
- SS refers to the stack segment’s descriptor table
Flat Segmentation Memory Model
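[Figure: a single segment descriptor (base address, limit, access) describes the flat address space. Physical RAM is drawn between the labels 00000000 and 00040000; the rest of the address space is marked not used.]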
Multi-Segment Memory Model
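[Figure: a Local Descriptor Table whose entries (base address, limit, access) each describe a separate segment; the visible entry has base address 00008000 and limit 000A. The memory side of the figure carries the labels 3000, 8000 and 26000.]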
Paging
- IA-32 architecture also allows memory segments to be divided into 4 KB units called pages.
- Many of these pages are kept on disk in a swap file. When the CPU needs a page that is not present in physical memory, a page fault occurs and the page is loaded into memory from the swap file (a page it displaces may be written back to the swap file).
- The use of paging and swap files allows the memory a program uses to be several times larger than physical memory; this is known as virtual memory.
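The idea of a page fault can be sketched with a toy model. With 4 KB pages, the low 12 bits of a linear address are the offset within the page and the remaining bits are the page number. The class below is an illustration only: `ToyMemory` and its methods are hypothetical names, it never evicts pages, and it does not model the real IA-32 page tables.

```python
PAGE_SIZE = 4096  # 4 KB pages, so the offset occupies the low 12 bits


def page_and_offset(address: int) -> tuple[int, int]:
    """Split a linear address into (page number, offset within the page)."""
    return address >> 12, address & (PAGE_SIZE - 1)


class ToyMemory:
    """Tracks which pages are resident and counts page faults."""

    def __init__(self) -> None:
        self.resident: set[int] = set()
        self.page_faults = 0

    def access(self, address: int) -> bool:
        """Touch an address; return True if the access caused a page fault."""
        page, _ = page_and_offset(address)
        if page in self.resident:
            return False
        self.page_faults += 1
        self.resident.add(page)  # stands in for loading the page from the swap file
        return True


if __name__ == "__main__":
    mem = ToyMemory()
    for addr in (0x1234, 0x1FFF, 0x2000):
        print(f"{addr:05X}h fault={mem.access(addr)}")
    print("page faults:", mem.page_faults)
```

The first two addresses fall in the same page, so only the first access to that page faults; the third address starts a new page and faults again.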