Jump Instructions: Microprocessor and Assembly Language Programming (Lecture Notes)
So far we have accumulated the very basic tools of assembly language
programming. A very important weapon in our arsenal is the conditional
jump instruction. Over the course of the last two chapters we used these tools
to write two very useful algorithms, sorting and multiplication. The
multiplication algorithm is useful even though there is a MUL instruction in
the 8088 instruction set, which can multiply 8-bit and 16-bit operands. This is
because our algorithm is extensible: it is not limited to 16 bits and
can do 32-bit or 64-bit multiplication with minor changes.
Both of these algorithms will be used a number of times in any program of
a reasonable size and complexity. An application does not multiply at only
a single point in code; it multiplies at a number of places. If
multiplication or sorting is needed at 100 places in code, copying it 100
times is a totally infeasible solution. Maintaining such code is an
impossible task.
The straightforward solution to this problem, using the concepts we have
acquired so far, is to write the code at one place with a label, and
whenever we need to sort we jump to this label. But there is a problem with
this logic: after sorting is complete, how will the processor
know where to go back? The immediate answer is to jump back to a label
following the jump to bubble sort. But we have jumped to bubble sort from
100 places in code. To which of the 100 positions should we jump
back? We could jump back to the first invocation, but a jump has a single
fixed target. How will the second invocation work? The second jump to bubble
sort will never get control back at its next line.
Instructions are tied to one another forming an execution thread, just like a
knitted thread where pieces of cotton of different sizes are twisted together to
form a thread. This thread of execution is our program. The jump instruction
breaks this thread permanently, making a permanent diversion, like a turn
on a highway. The conditional jump selects one of two possible
directions, like a right or left turn on a road. So there is no concept of
returning.
However, there are roundabouts on roads as well that take us back to
where we started after we have traveled around the circle. This is
the concept of a temporary diversion. Two or more permanent diversions can
take us back to where we started, just like two or more road turns can
take us back to the starting point, but they are still permanent diversions in
their nature.
We need some way to implement the concept of temporary diversion in
assembly language. We want to create a roundabout of bubble sort and another
roundabout of our multiplication algorithm, so that we can enter the
roundabout whenever we need it and return to wherever we left from
after completing the round.
The key point in the above discussion is returning to where we left from, like a
loop in a knitted thread. The diversion should be temporary and not permanent.
The code of bubble sort is written at one place and multiply at another, and we
temporarily divert to that place, thus avoiding a repetition of code at 100
places.
CALL and RET
In every processor, instructions are available to divert temporarily and to
divert permanently. The instructions for permanent diversion in the 8088 are the
jump instructions, while the instruction for temporary diversion is the CALL
instruction. The word call must be familiar to readers from the subroutine
call in higher level languages. The CALL instruction allows temporary
diversion and therefore reusability of code. Now we can place the code for
bubble sort at one place and reuse it again and again. This was not possible
with permanent diversion. Actually the 8088 permanent diversion
mechanism can be tricked into achieving temporary diversion. However, it is not
possible without getting into a lot of trouble. The key idea in doing it this way
is to use the form of the jump instruction that takes a register as argument.
Therefore this is not impossible, but it is not the way it is done.
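The register-based trick mentioned above can be sketched as follows. This is only an illustration under stated assumptions: BX is assumed to be free to hold the return address, and the names routine, back1, back2, and count are hypothetical and do not come from the original examples.

```nasm
; faking a temporary diversion with JMP only (illustrative sketch)
[org 0x0100]
              jmp  start

count:        dw   0                  ; hypothetical variable changed by the routine

routine:      inc  word [count]       ; the reusable piece of work
              jmp  bx                 ; "return" to the address the caller left in bx

start:        mov  bx, back1          ; caller must remember where to come back
              jmp  routine            ; permanent diversion to the routine
back1:        mov  bx, back2          ; a different return point for the second use
              jmp  routine
back2:        mov  ax, 0x4c00         ; terminate program
              int  0x21
```

Every use needs a dedicated register and a unique label of its own, which is the kind of trouble the CALL instruction removes.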
The natural way to do this is to use the CALL instruction followed by a
label, just like JMP is followed by a label. Execution will divert to the code
following the label. Up to this point the operation is similar to the JMP
instruction. When the subroutine completes we need to return. The RET
instruction is used for this purpose. The word return implies
that we are to return to where we came from and need no explicit destination.
Therefore RET takes no arguments and transfers control back to the
instruction following the CALL that took us into this subroutine. The actual
technical process that informs RET where to return will be discussed later
after we have discussed the system stack.
CALL takes a label as argument and execution starts from that label, until
the RET instruction is encountered and takes execution back to the
instruction following the CALL. Both instructions are commonly used as
a pair; however, technically they are independent in their operation. The RET
works regardless of the CALL and the CALL works regardless of the RET. If
you CALL a subroutine the processor will not complain if there is no RET
present, and similarly if you RET without being called it won’t complain. They
are a logical pair and are used as a pair in all well-written code. However,
sometimes we play tricks with the processor and use CALL or RET alone. This
will become clear when we need to play such tricks in later chapters.
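A minimal sketch of the CALL and RET pair is given here to make the flow concrete. The subroutine addfive and the variable total are hypothetical names chosen only for this illustration.

```nasm
; minimal CALL and RET pair (illustrative sketch)
[org 0x0100]
              jmp  start

total:        dw   0                  ; hypothetical variable

addfive:      add  word [total], 5    ; body of the subroutine
              ret                     ; go back to the instruction after the CALL

start:        call addfive            ; temporary diversion, execution resumes below
              call addfive            ; the same code reused from a second place

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```

The same RET serves both calls, since each CALL records its own return point.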
[Figure: blocks labeled Program, Bubble Sort, and Swap]
takes two bytes. Left shifting has been used to multiply by two.
Base+index+offset addressing has been used. BX holds the start of
the array, SI the offset into it, and an offset of 2 is added when the next
element is to be read. BX could be changed directly, but then a separate
counter would be needed, as SI is directly compared with CX in our case.
The code starting from the start label is our main program,
analogous to main in the C language. BX and CX hold our
parameters for the bubblesort subroutine and the CALL is made to
invoke the subroutine.
Inside the debugger we observe the same unsigned data that we are so
used to by now. The number 0103, which is the start of our data, is passed via
BX to the subroutine, and the number 000A, which is the number of
elements in our data, is passed via CX. If we step over the CALL instruction
we see our data sorted in a single step and we are at the termination
instructions. The processor has jumped to the bubblesort routine, executed it
to completion, and returned from it, but the process was hidden by the step
over command. If, however, we trace into the CALL instruction, we land at the
first instruction of our routine. At the end of the routine, when the RET
instruction is executed, we immediately land back at our termination
instructions, to be precise at the instruction following the CALL.
Also observe that with the CALL instruction SP is decremented by two from
FFFE to FFFC, and the stack window shows 0150 at its top. As the RET is
executed SP is recovered and the 0150 is also removed from the stack. Match
it with the address of the instruction following the CALL, which is 0150 as
well. The 0150 removed from the stack by the RET instruction has been
loaded into the IP register, thereby resuming execution from address 0150.
CALL placed the return address on the stack for the RET instruction. The stack
is automatically used by the CALL and RET instructions. The stack will be
explained in detail later; however, the idea is that the one who is departing
stores the address to return to at a known place. This is the place through
which CALL and RET coordinate. How this place is actually used by the CALL and
RET instructions will be described after the stack is discussed.
After emphasizing reusability so much, it is time for another example
which uses the same bubblesort routine on two different arrays of different
sizes.
Example
; bubble sort subroutine called twice
[org 0x0100]
              jmp  start

data:         dw   60, 55, 45, 50, 40, 35, 25, 30, 10, 0
data2:        dw   328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
              dw   888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
swap:         db   0

bubblesort:   dec  cx                 ; last element not compared
              shl  cx, 1              ; turn into byte count

mainloop:     mov  si, 0              ; initialize array index to zero
              mov  byte [swap], 0     ; reset swap flag to no swaps

innerloop:    mov  ax, [bx+si]        ; load number in ax
              cmp  ax, [bx+si+2]      ; compare with next number
              jbe  noswap             ; no swap if already in order

              mov  dx, [bx+si+2]      ; load second element in dx
              mov  [bx+si], dx        ; store second number in first position
              mov  [bx+si+2], ax      ; store first number in second position
              mov  byte [swap], 1     ; flag that a swap has been done

noswap:       add  si, 2              ; advance si to next index
              cmp  si, cx             ; are we at last index
              jne  innerloop          ; if not compare next two

              cmp  byte [swap], 1     ; check if a swap has been done
              je   mainloop           ; if yes make another pass

              ret                     ; go back to where we came from

start:        mov  bx, data           ; send start of array in bx
              mov  cx, 10             ; send count of elements in cx
              call bubblesort         ; call our subroutine

              mov  bx, data2          ; send start of array in bx
              mov  cx, 20             ; send count of elements in cx
              call bubblesort         ; call our subroutine again

              mov  ax, 0x4c00         ; terminate program
              int  0x21
There are two different data arrays declared, one of 10 elements and
the other of 20 elements. The second array is declared on two lines,
where the second line is a continuation of the first. No additional label
is needed since they are situated consecutively in memory.
The other change is in the main program, where the bubblesort subroutine is
called twice, once on the first array and once on the second.
Inside the debugger observe that on stepping over the first call the first
array is sorted, and on stepping over the second call the second array is
sorted. If, however, we step into the call, SP is decremented and the stack
holds 0178, which is the address of the instruction following the call. The
RET consumes that 0178 and restores SP. The next CALL places 0181 on the stack
and SP is again decremented. The RET consumes this number and execution
resumes from the instruction at 0181. This is the coordinated function of CALL
and RET using the stack.
In both of the above examples, there is a shortcoming. The subroutine to
sort the elements destroys the registers AX, CX, DX, and SI. That means
that the caller of this routine has to make sure that it does not hold any
important data in these registers before calling this function, because after
the call has returned the registers will contain data that is meaningless to
the caller. In a program containing thousands of subroutines, expecting the
caller to remember the set of modified registers for each subroutine is
unrealistic and unreasonable. Also, registers are limited in number, and
restricting the caller in its use of registers will make the caller’s job very
tough. This shortcoming will be removed using the very important system
stack.
STACK
Stack is a data structure that behaves in a first in last out manner. It can
contain many elements and there is only one way in and out of the container.
When an element is inserted it sits on top of all other elements and when an
element is removed the one sitting at top of all others is removed first. To
visualize the structure consider a test tube and put some balls in it. The
second ball will come above the first and the third will come above the
second. When a ball is taken out only the one at the top can be removed. The
operation of placing an element on top of the stack is called pushing the
element and the operation of removing an element from the top of the stack
is called popping the element. The last thing pushed is popped out first; this
is the last in first out behavior.
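The last in first out behavior can be observed with the 8088 PUSH and POP instructions, which are described later in this chapter. The following is a small sketch written for illustration only; the choice of registers and values is arbitrary.

```nasm
; last in first out behavior of the stack (illustrative sketch)
[org 0x0100]
              mov  ax, 1
              push ax                 ; stack from the top: 1
              mov  ax, 2
              push ax                 ; stack from the top: 2, 1
              mov  ax, 3
              push ax                 ; stack from the top: 3, 2, 1

              pop  bx                 ; bx = 3, the last value pushed
              pop  cx                 ; cx = 2
              pop  dx                 ; dx = 1, the first value pushed

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```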
We can peek at any ball inside the test tube but we cannot remove it
without removing every ball on top of it. Similarly we can read any element
from the stack but cannot remove it without removing everything above it.
The stack operations of pushing and popping only work at the top of the stack.
The corresponding instruction RETF will pop the offset in the instruction
pointer followed by popping the segment in the code segment register.
Apart from CALL and RET, the operations that use the stack are PUSH and
POP. Two other operations that will be discussed later are INT and IRET.
Regarding the stack, the operation of PUSH is similar to CALL however with
a register other than the instruction pointer. For example “push ax” will push
the current value of the AX register on the stack. The operation of PUSH is
shown below.
SP ← SP - 2
[SP] ← AX
The operation of POP is the reverse of this. A copy of the element at the top
of the stack is made in the operand, and the stack pointer is incremented
afterwards. The operation of “pop ax” is shown below.
AX ← [SP]
SP ← SP + 2
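Since PUSH does for a general register what CALL does for the instruction pointer, a CALL can be imitated by pushing a return address by hand and then jumping. The sketch below is only an illustration of that relationship, not a recommended practice; the names setvalue, resume, and value are hypothetical.

```nasm
; what CALL does, written out by hand (illustrative sketch)
[org 0x0100]
              jmp  start

value:        dw   0                  ; hypothetical variable

setvalue:     mov  word [value], 1    ; body of the routine
              ret                     ; pops the word on top of the stack into ip

start:        mov  ax, resume         ; address of the instruction to come back to
              push ax                 ; place it on the stack, as CALL would
              jmp  setvalue           ; divert; the ret in setvalue lands at resume
resume:       mov  ax, 0x4c00         ; terminate program
              int  0x21
```

The RET has no way of knowing that no CALL was executed; it simply uses whatever word is on top of the stack.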
Making corresponding PUSH and POP operations is the responsibility of
the programmer. If “push ax” is followed by “pop dx”, effectively copying the
value of the AX register into the DX register, the processor won’t complain.
Whether this sequence is logically correct or not should be ensured by the
programmer. For example, when PUSH and POP are used to save and restore
registers from the stack, the order must be correct so that the saved value of
AX is reloaded into the AX register and not into any other register. For this
the order of POP operations needs to be the reverse of the order of PUSH
operations.
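The ordering rule can be sketched as follows. The register values are arbitrary and are chosen only so that the effect is easy to follow in a debugger.

```nasm
; saving and restoring registers in matching order (illustrative sketch)
[org 0x0100]
              mov  ax, 0x1111
              mov  bx, 0x2222

              push ax                 ; saved first
              push bx                 ; saved second, now on top of the stack

              mov  ax, 0              ; registers are free for other use here
              mov  bx, 0

              pop  bx                 ; top of stack is the saved bx, so pop it first
              pop  ax                 ; then the saved ax; ax = 0x1111, bx = 0x2222

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```

Popping in the same order as the pushes would instead exchange the two values, and the processor would not report any error.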
Now we consider another example that is similar to the previous examples;
however, the code to swap the two elements has been extracted into another
subroutine, so that the formation of the stack can be observed during nested
subroutine calls.
Example
; bubble sort subroutine using swap subroutine
[org 0x0100]
              jmp  start

data:         dw   60, 55, 45, 50, 40, 35, 25, 30, 10, 0
data2:        dw   328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
              dw   888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
swapflag:     db   0

swap:         mov  ax, [bx+si]        ; load first number in ax
              xchg ax, [bx+si+2]      ; exchange with second number
              mov  [bx+si], ax        ; store second number in first
              ret                     ; go back to where we came from

bubblesort:   dec  cx                 ; last element not compared
              shl  cx, 1              ; turn into byte count

mainloop:     mov  si, 0              ; initialize array index to zero
              mov  byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:    mov  ax, [bx+si]        ; load number in ax
              cmp  ax, [bx+si+2]      ; compare with next number
              jbe  noswap             ; no swap if already in order

              call swap               ; swaps two elements
              mov  byte [swapflag], 1 ; flag that a swap has been done

noswap:       add  si, 2              ; advance si to next index
              cmp  si, cx             ; are we at last index
              jne  innerloop          ; if not compare next two

              cmp  byte [swapflag], 1 ; check if a swap has been done
              je   mainloop           ; if yes make another pass
              ret                     ; go back to where we came from

start:        mov  bx, data           ; send start of array in bx
              mov  cx, 10             ; send count of elements in cx
              call bubblesort         ; call our subroutine

              mov  bx, data2          ; send start of array in bx
              mov  cx, 20             ; send count of elements in cx
              call bubblesort         ; call our subroutine again

              mov  ax, 0x4c00         ; terminate program
              int  0x21
A new instruction, XCHG, has been introduced. The instruction
swaps its source and destination operands; however, at most one
of the operands can be in memory, so the other has to be loaded into
a register. The instruction has reduced the code size by one
instruction.
The RET at the end of swap makes it a subroutine.
Inside the debugger observe the use of stack by CALL and RET
instructions, especially the nested CALL.
SAVING AND RESTORING REGISTERS
The subroutines we have written so far have been destroying certain registers
and our calling code has been carefully written not to use those registers.
However, this cannot be remembered for a large number of subroutines.
Therefore our subroutines need to implement some mechanism for retaining
the caller’s value of any registers used.
The trick is to use the PUSH and POP operations to save the caller’s
value on the stack and recover it from there on return. Our swap subroutine
destroyed the AX register while the bubblesort subroutine destroyed AX, CX,
and SI. BX was not modified in the subroutine. It had the same value at
entry and at exit; it was only used by the subroutine. Our next example
improves on the previous version by saving and restoring any registers that it
will modify using the PUSH and POP operations.
Example
; bubble sort and swap subroutines saving and restoring registers
[org 0x0100]
              jmp  start

data:         dw   60, 55, 45, 50, 40, 35, 25, 30, 10, 0
data2:        dw   328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
              dw   888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
swapflag:     db   0

swap:         push ax                 ; save old value of ax

              mov  ax, [bx+si]        ; load first number in ax
              xchg ax, [bx+si+2]      ; exchange with second number
              mov  [bx+si], ax        ; store second number in first

              pop  ax                 ; restore old value of ax
              ret                     ; go back to where we came from

bubblesort:   push ax                 ; save old value of ax
              push cx                 ; save old value of cx
              push si                 ; save old value of si

              dec  cx                 ; last element not compared
              shl  cx, 1              ; turn into byte count

mainloop:     mov  si, 0              ; initialize array index to zero
              mov  byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:    mov  ax, [bx+si]        ; load number in ax
              cmp  ax, [bx+si+2]      ; compare with next number
              jbe  noswap             ; no swap if already in order
The out-of-line procedure is the temporary diversion, the concept of the
roundabout that we discussed. Near calls are also called intra-segment calls,
while far calls are called inter-segment calls. There are also versions that
are called indirect calls; however, they will be discussed later when they are
used.
RET
RET (Return) transfers control from a procedure back to the instruction
following the CALL that activated the procedure. RET pops the word at the
top of the stack (pointed to by register SP) into the instruction pointer and
increments SP by two. If RETF (inter segment RET) is used the word at the
top of the stack is popped into the IP register and SP is incremented by two.
The word at the new top of stack is popped into the CS register, and SP is
again incremented by two. If an optional pop value has been specified, RET
adds that value to SP. This feature may be used to discard parameters
pushed onto the stack before the execution of the CALL instruction.
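The optional pop value can be seen in a very small sketch. The subroutine name ignoreone is hypothetical, and the routine does nothing except return and discard the one word that the caller pushed.

```nasm
; RET with an optional pop value (illustrative sketch)
[org 0x0100]
              jmp  start

ignoreone:    ret  2                  ; pop return address into ip, then add 2 to sp

start:        mov  ax, 0x1234
              push ax                 ; one word placed on the stack before the call
              call ignoreone          ; on return sp has its value from before the push

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```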
PARAMETER PASSING THROUGH STACK
Due to the limited number of registers, parameter passing by registers is
constrained in two ways. The maximum number of parameters a subroutine can
receive is seven when all the general registers are used. Also, the subroutines
are themselves limited in their use of registers, and this limitation increases
when the subroutine has to make a nested call, thereby using certain
registers as its parameters. Due to this, parameter passing by registers is not
expandable and generalizable. However, this is the fastest mechanism
available for passing parameters and is used where speed is important.
Considering the stack as an alternative, we observe that whatever data is
placed there stays there, across function calls as well. For example, the
bubble sort subroutine needs an array address and the count of elements. If
we place both of these on the stack and call the subroutine afterwards, they
will stay there. The subroutine is invoked with its return address on top of
the stack and its parameters beneath it.
To access the arguments from the stack, the immediate idea that strikes is
to pop them off the stack. This is the only possibility using the information
given so far. However, the first thing popped off the stack would be the
return address and not the arguments. This is because the arguments were
pushed on the stack first and the subroutine was called afterwards. The
arguments cannot be popped without first popping the return address. If a
heavy thing falls on someone’s leg, the heavy thing is removed first and the
leg is not pulled out, to reduce the damage. The same is the case with our
parameters, on which the return address has fallen.
To handle this using PUSH and POP, we must first pop the return address
into a register, then pop the operands, and push the return address back on
the stack so that RET will function normally. However, so much effort doesn’t
seem to be worth the price. Processor designers should have provided a
logical and neat way to perform this operation. They did provide a way, and
in fact we will do this without introducing any new instruction.
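The PUSH and POP approach described above can be sketched as follows, mainly to show how clumsy it is. The subroutine addtwo and the variable result are hypothetical names, and the sketch assumes that AX, BX, and DX may be freely overwritten.

```nasm
; reading stack parameters with POP only (illustrative sketch)
[org 0x0100]
              jmp  start

result:       dw   0                  ; hypothetical variable for the sum

addtwo:       pop  dx                 ; remove the return address first
              pop  bx                 ; second parameter (pushed last)
              pop  ax                 ; first parameter (pushed first)
              add  ax, bx
              mov  [result], ax       ; result = 10 + 20
              push dx                 ; put the return address back for RET
              ret

start:        mov  ax, 10
              push ax                 ; first parameter
              mov  ax, 20
              push ax                 ; second parameter
              call addtwo

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```

A register is tied up just to hold the return address, and the parameters are gone after being read once.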
Recall that the default segment association of the BP register is the stack
segment and that the reason for this association had been deferred until now.
The reason is to peek inside the stack using the BP register and read the
parameters without removing them and without touching the stack pointer.
The stack pointer could not be used for this purpose, as it cannot be used in
an effective address. It is automatically used as a pointer and cannot be
explicitly used. Also, the stack pointer is a dynamic pointer and sometimes
changes in the background without telling us. It is just that whenever we
touch it, it is where we expect it to be. The base pointer is provided as a
replacement for the stack pointer so that we can peek inside the stack
without modifying the structure of the stack.
When the bubble sort subroutine is called, the stack pointer is pointing to
the return address. Two bytes below it in the stack, that is at the next higher
address, is the second parameter, and four bytes below is the first parameter.
The stack pointer is a reference point to these parameters. If the value of SP
is captured in BP, then the return address is located at [bp+0], the second
parameter is at [bp+2], and the first parameter is at [bp+4]. This is because
SP and BP both have the same value and they both default to the same segment,
the stack segment.
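The offsets just described can be sketched in a small subroutine. This is an illustration only; addtwo and result are hypothetical names, and the sketch deliberately overwrites the caller's BP without saving it.

```nasm
; peeking at parameters through a snapshot of sp (illustrative sketch)
[org 0x0100]
              jmp  start

result:       dw   0                  ; hypothetical variable for the sum

addtwo:       mov  bp, sp             ; snapshot of sp; [bp+0] is the return address
              mov  ax, [bp+4]         ; first parameter (pushed first)
              add  ax, [bp+2]         ; second parameter (pushed last)
              mov  [result], ax       ; result = 10 + 20
              ret  4                  ; return and discard both parameters

start:        mov  ax, 10
              push ax                 ; first parameter
              mov  ax, 20
              push ax                 ; second parameter
              call addtwo

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```

The parameters are read in place, and the stack pointer is never touched by the subroutine body.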
This copying of SP into BP is like taking a snapshot or like freezing the
stack at that moment. Even if more pushes are made on the stack
decrementing the stack pointer, our reference point will not change. The
parameters will still be accessible at the same offsets from the base pointer.
If however the stack pointer increments beyond the base pointer, the
references will become invalid. The base pointer will act as the datum point
to access our parameters. However we have destroyed the original value of
BP in the process, and this will cause problems in nested calls where both
the outer and the inner subroutines need to access their own parameters.
The outer subroutine will have its base pointer destroyed after the call and
will be unable to access its parameters.
To solve both of these problems, we arrive at the standard way of accessing
parameters on the stack. The first two instructions of any subroutine
accessing its parameters from the stack are given below.
              push bp
              mov  bp, sp
As a result our datum point has shifted by a word. Now the old value of BP
will be contained in [bp] and the return address will be at [bp+2]. The second
parameter will be at [bp+4] while the first one will be at [bp+6]. We give an
example of the bubble sort subroutine using this standard way of argument
passing through the stack.
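Before the bubble sort example, the shifted offsets can be checked on a much smaller subroutine. The following is a sketch for illustration; subtwo and result are hypothetical names, and subtraction is used so that the order of the two parameters is visible in the result.

```nasm
; standard entry sequence with two stack parameters (illustrative sketch)
[org 0x0100]
              jmp  start

result:       dw   0                  ; hypothetical variable for the difference

subtwo:       push bp                 ; save old value of bp
              mov  bp, sp             ; make bp our reference point
              push ax                 ; save old value of ax

              mov  ax, [bp+6]         ; first parameter (pushed first)
              sub  ax, [bp+4]         ; second parameter (pushed last)
              mov  [result], ax       ; result = 50 - 20

              pop  ax                 ; restore old value of ax
              pop  bp                 ; restore old value of bp
              ret  4                  ; go back and remove two parameters

start:        mov  ax, 50
              push ax                 ; first parameter
              mov  ax, 20
              push ax                 ; second parameter
              call subtwo

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```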
Example
; bubble sort subroutine taking parameters from stack
[org 0x0100]
              jmp  start

data:         dw   60, 55, 45, 50, 40, 35, 25, 30, 10, 0
data2:        dw   328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
              dw   888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
swapflag:     db   0

bubblesort:   push bp                 ; save old value of bp
              mov  bp, sp             ; make bp our reference point
              push ax                 ; save old value of ax
              push bx                 ; save old value of bx
              push cx                 ; save old value of cx
              push si                 ; save old value of si

              mov  bx, [bp+6]         ; load start of array in bx
              mov  cx, [bp+4]         ; load count of elements in cx
              dec  cx                 ; last element not compared
              shl  cx, 1              ; turn into byte count

mainloop:     mov  si, 0              ; initialize array index to zero
              mov  byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:    mov  ax, [bx+si]        ; load number in ax
              cmp  ax, [bx+si+2]      ; compare with next number
              jbe  noswap             ; no swap if already in order

              xchg ax, [bx+si+2]      ; exchange ax with second number
              mov  [bx+si], ax        ; store second number in first
              mov  byte [swapflag], 1 ; flag that a swap has been done

noswap:       add  si, 2              ; advance si to next index
              cmp  si, cx             ; are we at last index
              jne  innerloop          ; if not compare next two
useless. Also observe that RET n has discarded the arguments rather than
popping them, as they were no longer of any use to either the caller or the
callee.
The strong argument in favour of callee cleared stacks is that the
arguments were placed on the stack for the subroutine and the caller did not
need them for itself, so the subroutine is responsible for removing them.
Removing the arguments is important because if the stack is not cleared, or is
only partially cleared, the stack will eventually become full, SP will reach
0, and thereafter wrap around, producing unexpected results. This is called
stack overflow. Therefore clearing anything placed on the stack is very
important.
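For contrast, the sketch below places a callee cleared subroutine next to a caller cleared one. The caller cleared variant is an assumption added purely for comparison and is not the convention used in these notes; the names addcallee, addcaller, and result are hypothetical.

```nasm
; callee cleared versus caller cleared stack (illustrative sketch)
[org 0x0100]
              jmp  start

result:       dw   0                  ; hypothetical variable for the sum

addcallee:    push bp
              mov  bp, sp
              push ax
              mov  ax, [bp+6]         ; first parameter
              add  ax, [bp+4]         ; second parameter
              mov  [result], ax
              pop  ax
              pop  bp
              ret  4                  ; subroutine discards its own parameters

addcaller:    push bp
              mov  bp, sp
              push ax
              mov  ax, [bp+6]         ; first parameter
              add  ax, [bp+4]         ; second parameter
              mov  [result], ax
              pop  ax
              pop  bp
              ret                     ; parameters are left on the stack

start:        mov  ax, 10
              push ax
              mov  ax, 20
              push ax
              call addcallee          ; stack is already clean on return

              mov  ax, 10
              push ax
              mov  ax, 20
              push ax
              call addcaller
              add  sp, 4              ; caller must discard the two words itself

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```

If the "add sp, 4" is forgotten after every call to addcaller, four bytes are lost from the stack each time, which is the gradual filling of the stack described above.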
LOCAL VARIABLES
Another important role of the stack is in the creation of local variables that
are only needed while the subroutine is in execution and not afterwards.
They should not take permanent space like global variables. Local variables
should be created when the subroutine is called and discarded afterwards,
so that the space used by them can be reused for the local variables of
another subroutine. They only have meaning inside the subroutine and no
meaning outside it.
The most convenient place to store these variables is the stack. We need
some special manipulation of the stack for this task. We need to produce a
gap in the stack for our variables. This is explained with the help of the
swapflag in the bubble sort example.
The swapflag we have declared as a global variable occupying space permanently
is only needed by the bubble sort subroutine and should be a local variable.
Actually the variable was introduced with the intent of making it a local
variable at this time. The stack pointer will be decremented by an extra two
bytes, thereby producing a gap in which a word can reside. This gap will be
used for our temporary, local, or automatic variable, whatever we name it. We
can decrement it as much as we want to produce the desired space; however,
the decrement must be by an even number, as the unit of stack operation is
a word. In our case we need just one word. Also, the most convenient
position for this gap is immediately after saving the value of SP in BP, so
that the same base pointer can be used to access the local variables as well,
this time using negative offsets. The standard way to start a subroutine which
needs to access parameters and has local variables is as under.
              push bp
              mov  bp, sp
              sub  sp, 2
The gap could have been created with a dummy push, but the subtraction
makes it clear that the value pushed is not important and the gap will be
used for our local variable. Also, a gap of any size can be created in a single
instruction with subtraction. The parameters can still be accessed at bp+4
and bp+6 and the swapflag can be accessed at bp-2. The subtraction from SP
was done after taking the snapshot; therefore BP is above the parameters but
below the local variables. The parameters are therefore accessed using
positive offsets from BP and the local variables are accessed using negative
offsets.
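To show a gap larger than one word, the following sketch reserves two local words. It is an illustration only; the subroutine sumsquares and the variable result are hypothetical names, and only the low word of each product is kept.

```nasm
; subroutine with two local words on the stack (illustrative sketch)
[org 0x0100]
              jmp  start

result:       dw   0                  ; hypothetical variable for the answer

sumsquares:   push bp                 ; save old value of bp
              mov  bp, sp             ; make bp our reference point
              sub  sp, 4              ; gap for two locals at [bp-2] and [bp-4]
              push ax                 ; registers are saved above the gap
              push dx

              mov  ax, [bp+6]         ; first parameter
              mul  ax                 ; dx:ax = ax * ax
              mov  [bp-2], ax         ; first local holds the first square
              mov  ax, [bp+4]         ; second parameter
              mul  ax
              mov  [bp-4], ax         ; second local holds the second square

              mov  ax, [bp-2]
              add  ax, [bp-4]
              mov  [result], ax       ; result = 3*3 + 4*4

              pop  dx
              pop  ax
              mov  sp, bp             ; remove the space created for the locals
              pop  bp                 ; restore old value of bp
              ret  4                  ; go back and remove two parameters

start:        mov  ax, 3
              push ax                 ; first parameter
              mov  ax, 4
              push ax                 ; second parameter
              call sumsquares

              mov  ax, 0x4c00         ; terminate program
              int  0x21
```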
We modify the bubble sort subroutine to use a local variable to store the
swap flag. The swap flag remembers whether a swap has been done in a
particular iteration of bubble sort.
Example
; bubble sort subroutine using a local variable
[org 0x0100]
              jmp  start

data:         dw   60, 55, 45, 50, 40, 35, 25, 30, 10, 0
data2:        dw   328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
              dw   888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

bubblesort:   push bp                 ; save old value of bp
              mov  bp, sp             ; make bp our reference point
              sub  sp, 2              ; make two byte space on stack
              push ax                 ; save old value of ax
              push bx                 ; save old value of bx
              push cx                 ; save old value of cx
              push si                 ; save old value of si

              mov  bx, [bp+6]         ; load start of array in bx
              mov  cx, [bp+4]         ; load count of elements in cx
              dec  cx                 ; last element not compared
              shl  cx, 1              ; turn into byte count

mainloop:     mov  si, 0              ; initialize array index to zero
              mov  word [bp-2], 0     ; reset swap flag to no swaps

innerloop:    mov  ax, [bx+si]        ; load number in ax
              cmp  ax, [bx+si+2]      ; compare with next number
              jbe  noswap             ; no swap if already in order

              xchg ax, [bx+si+2]      ; exchange ax with second number
              mov  [bx+si], ax        ; store second number in first
              mov  word [bp-2], 1     ; flag that a swap has been done

noswap:       add  si, 2              ; advance si to next index
              cmp  si, cx             ; are we at last index
              jne  innerloop          ; if not compare next two

              cmp  word [bp-2], 1     ; check if a swap has been done
              je   mainloop           ; if yes make another pass

              pop  si                 ; restore old value of si
              pop  cx                 ; restore old value of cx
              pop  bx                 ; restore old value of bx
              pop  ax                 ; restore old value of ax
              mov  sp, bp             ; remove space created on stack
              pop  bp                 ; restore old value of bp
              ret  4                  ; go back and remove two params

start:        mov  ax, data
              push ax                 ; place start of array on stack
              mov  ax, 10
              push ax                 ; place element count on stack
              call bubblesort         ; call our subroutine

              mov  ax, data2          ; second array (20 elements)
              push ax                 ; place start of array on stack
              mov  ax, 20
              push ax                 ; place element count on stack
              call bubblesort         ; call our subroutine again

              mov  ax, 0x4c00         ; terminate program
              int  0x21
A word gap has been created for the swap flag. This is equivalent to a
dummy push. The registers are pushed above this gap.
The swapflag is accessed with [bp-2]. The parameters are accessed
in the same manner as in the last examples.
We are removing the hole that we created. The hole is removed by
restoring SP to the value it had at the time of the snapshot, that is, the
value it had before the local variable was created. This can be
replaced with “add sp, 2”; however, the one used in the code is
preferred since it does not require remembering how much space for
local variables was allocated at the start. After this operation SP
points to the old value of BP, from where we can proceed as usual.
We needed memory to store the swap flag. The fact that it is in the stack
segment or the data segment doesn’t bother us. This will just change the
addressing scheme.