STOS, SCAS and Other Instructions - Microprocessor and Assembly Language Programming - Lecture Notes
Until now only very simple instructions of the 8088 microprocessor have been
introduced. In this chapter we discuss somewhat more powerful instructions
that can process blocks of data in one go. They are called block processing or
string instructions. This is the appropriate place to discuss them, as we
have just introduced a block of memory, the video memory. To the processor
this memory is just a block of memory starting at a special address. For
example the clear screen operation initializes every word of this block to
0x0720.
There are just five block processing instructions in the 8088. In their
primitive form, the instructions operate on a single cell of memory at a
time. However a special prefix, called the REP prefix, repeats the
instruction in hardware. The REP prefix allows these instructions to operate
on a number of data elements in one instruction. This is not like a loop;
rather the repetition is built into the processor. The five instructions are
STOS, LODS, CMPS, SCAS, and MOVS, called store string, load string, compare
string, scan string, and move string respectively. MOVS is the instruction
that allows memory to memory moves, as was discussed in the exceptions to
the memory to memory movement rules. String instructions are complex
instructions in that each one performs a number of tasks. With the REP
prefix they perform the task of a complete loop in one instruction. This
causes drastic speed improvements in operations on large blocks of memory.
The reduction in code size and the improvement in speed are the two reasons
why these instructions were included in the 8088 processor.
There are a number of common things in these instructions. Firstly they
all work on a block of data. DI and SI are used to access memory. SI and DI
are called source index and destination index because of string instructions.
Whenever an instruction needs a memory source, DS:SI holds the pointer to
it. An override is possible that can change the association from DS but the
default is DS. Whenever a string instruction needs a memory destination,
ES:DI holds the pointer to it. No override is possible in this case. Whenever a
byte register is needed, AL holds the value. Whenever a word register is used
AX holds the value. For example STOS stores a register in memory so AL or
AX is the register used and ES:DI points to the destination. The LODS
instruction loads from memory to register so the source is pointed to by
DS:SI and the register used is AL or AX.
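The implied operands can be seen in a minimal sketch. The labels src and dst
are hypothetical, and the sketch assumes a COM program, in which DS and ES
both start out pointing at the program's own segment.

```nasm
; implied operands of the string instructions (illustrative sketch)
[org 0x0100]
              jmp  start

src:          db   'A'                 ; hypothetical source byte
dst:          db   0                   ; hypothetical destination byte

start:        cld                      ; auto increment mode
              mov  si, src             ; ds:si is the implied source
              mov  di, dst             ; es:di is the implied destination
              lodsb                    ; al = [ds:si], si = si + 1
              stosb                    ; [es:di] = al, di = di + 1

              mov  si, src             ; point si to the source again
              es lodsb                 ; same load with the source overridden to es

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```

The override is written as a prefix in front of the instruction because LODSB
has no explicit operand to attach it to.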
String instructions work on a block of data. A block has a start and an
end. The instructions can work from the start towards the end or from the
end towards the start. They must be allowed to work in both directions,
otherwise certain operations with overlapping blocks become impossible. This
problem is discussed in detail later. The direction of movement is controlled
with the Direction Flag (DF) in the flags register. If this flag is cleared
the direction is from lower addresses towards higher addresses, and if this
flag is set the direction is from higher addresses to lower addresses. If DF
is cleared, this is called the auto-increment mode of string instructions,
and if DF is set, this is called the auto-decrement mode. There are two
instructions to set and clear the direction flag.
cld                      ; clear direction flag
std                      ; set direction flag
Every string instruction has two variants; a byte variant and a word
variant. For example the two variants of STOS are STOSB and STOSW.
Similarly the variants for the other string instructions are attained by
appending a B or a W to the instruction name. The operation of each of the
string instructions and each of the repetition prefixes is discussed below.
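As a rough illustration of the two modes, the sketch below stores the same
word once in each direction and notes where DI ends up. The label buffer and
the stored value are assumptions made only for this example.

```nasm
; effect of the direction flag on di (illustrative sketch)
[org 0x0100]
              jmp  start

buffer:       times 8 db 0             ; hypothetical 8 byte scratch area

start:        mov  ax, 0x2020          ; arbitrary word to store
              cld                      ; auto increment mode
              mov  di, buffer
              stosw                    ; writes buffer+0 and buffer+1, di = buffer+2

              std                      ; auto decrement mode
              mov  di, buffer+6
              stosw                    ; writes buffer+6 and buffer+7, di = buffer+4

              cld                      ; leave the flag in its usual state
              mov  ax, 0x4c00          ; terminate program
              int  0x21
```

Note that in auto-decrement mode DI still names the lowest byte of the word
being written; only the step taken afterwards changes sign.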
STOS
STOS transfers a byte or word from register AL or AX to the string element
addressed by ES:DI and updates DI to point to the next location. STOS is
often used to clear a block of memory or fill it with a constant.
The implied source will always be in AL or AX. If DF is clear, DI will be
incremented by one or two depending on whether STOSB or STOSW is used.
If DF is set, DI will be decremented by one or two depending on whether
STOSB or STOSW is used. If REP is used before this instruction, the process
will be repeated CX times. CX is called the counter register because of the
special treatment given to it in the LOOP and JCXZ instructions and the REP
set of prefixes. So if REP is used with STOS the whole block of memory will
be filled with a constant value. REP will always decrement CX like the LOOP
instruction, and this cannot be changed with the direction flag. It is also
independent of whether the byte or the word variant is used. It always
decrements by one; therefore CX holds the count of repetitions and not the
count of bytes.
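A minimal sketch of such a fill is shown below. It assumes the 80x25 colour
text screen at segment 0xb800 and uses 0x0720 (a space in normal attribute)
as the constant, so CX is loaded with 2000 words rather than 4000 bytes.

```nasm
; filling the screen with one repeated store (illustrative sketch)
[org 0x0100]
              mov  ax, 0xb800
              mov  es, ax              ; point es to video base
              xor  di, di              ; start at the top left of the screen
              mov  ax, 0x0720          ; space in normal attribute
              mov  cx, 2000            ; 80 x 25 words, a count of repetitions
              cld                      ; auto increment mode
              rep  stosw               ; store ax at es:di, cx times

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```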
LODS
LODS transfers a byte or word from the source location DS:SI to AL or AX
and updates SI to point to the next location. LODS is generally used in a loop
and not with the REP prefix since the value previously loaded in the register
is overwritten if the instruction is repeated and only the last value of the
block remains in the register.
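A small sketch of LODS inside a loop follows. It adds up a block of bytes;
the labels data, count and sum and the values in the block are hypothetical.

```nasm
; summing a block of bytes with lodsb in a loop (illustrative sketch)
[org 0x0100]
              jmp  start

data:         db   1, 2, 3, 4, 5       ; hypothetical block of bytes
count:        dw   5                   ; number of bytes in the block
sum:          dw   0                   ; result is stored here

start:        mov  si, data            ; ds:si points to the block
              mov  cx, [count]         ; load count of repetitions
              xor  bx, bx              ; running total in bx
              xor  ah, ah              ; keep ah zero so that ax equals al
              cld                      ; auto increment mode
nextbyte:     lodsb                    ; load next byte in al
              add  bx, ax              ; process it before the next load
              loop nextbyte            ; repeat for the whole block

              mov  [sum], bx           ; total of the block, 15 for this data

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```

Each loaded value is consumed before the next LODSB overwrites AL, which is
exactly what a REP prefix would not allow.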
SCAS
SCAS compares a source byte or word in register AL or AX with the
destination string element addressed by ES:DI and updates the flags. DI is
updated to point to the next location. SCAS is often used to locate equality
or inequality in a string through the use of an appropriate prefix.
SCAS is a bit different from the other instructions. It is more like the
CMP instruction in that it subtracts its operands to set the flags. The
prefixes REPE (repeat while equal) and REPNE (repeat while not equal) are
used with this instruction. The instruction is used to locate the byte in AL
within a block of memory, stopping at either the first equality or the first
inequality; both have uses. For example this instruction can be used to
search for a 0 in a null terminated string to calculate the length of the
string. In this form REPNE will be used to repeat while the null is not
found.
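The search for a byte might look like the sketch below. The labels text and
position are hypothetical, and 0xffff is used here merely as a marker for
"not found". The zero flag left by the last comparison tells whether the scan
stopped on a match or because CX ran out.

```nasm
; locating a character in a block with repne scasb (illustrative sketch)
[org 0x0100]
              jmp  start

text:         db   'hello world'       ; block to search, 11 bytes
position:     dw   0xffff              ; stays 0xffff if nothing is found

start:        push ds
              pop  es                  ; es:di must point to the block
              mov  di, text
              mov  cx, 11              ; maximum number of bytes to scan
              mov  al, 'w'             ; byte to look for
              cld                      ; auto increment mode
              repne scasb              ; repeat while not equal
              jne  done                ; zf clear means no match in the block
              dec  di                  ; di has already moved past the match
              sub  di, text            ; turn the address into an index
              mov  [position], di      ; index 6 for this data

done:         mov  ax, 0x4c00          ; terminate program
              int  0x21
```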
MOVS
MOVS transfers a byte or word from the source location DS:SI to the
destination ES:DI and updates SI and DI to point to the next locations.
MOVS is used to move a block of memory. The DF is important in the case of
overlapping blocks. When the source and destination blocks overlap and the
source is below the destination, the copy must be done downwards, starting
from the end of the blocks, while if the destination is below the source the
copy must be done upwards, starting from the beginning. We could not perform
both these copy operations properly if the direction flag were not provided.
If the source is below the destination and an upwards copy is used, the part
of the source still to be copied is destroyed. If however the copy is done
downwards, the portion of the source destroyed is the one that has already
been copied.
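The overlapping case can be sketched as follows. The label buffer and its
contents are hypothetical. Eight bytes are moved two places up within the
same buffer, so the source lies below the destination and the copy runs from
the end of the blocks with DF set.

```nasm
; moving an overlapping block towards higher addresses (illustrative sketch)
[org 0x0100]
              jmp  start

buffer:       db   'abcdefgh', 0, 0    ; hypothetical 10 byte buffer

start:        push ds
              pop  es                  ; source and destination share a segment
              mov  si, buffer+7        ; last byte of the source block
              mov  di, buffer+9        ; last byte of the destination block
              mov  cx, 8               ; number of bytes to move
              std                      ; auto decrement mode
              rep  movsb               ; copy from the end towards the start
              cld                      ; leave the flag in its usual state
                                       ; buffer now holds 'ababcdefgh'

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```

Had the copy been done with CLD starting from buffer and buffer+2, the bytes
at buffer+2 onwards would have been overwritten before they were read.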
              pop  cx
              pop  ax
              pop  es
              ret

start:        call clrscr              ; call clrscr subroutine

              mov  ax, 0x4c00          ; terminate program
              int  0x21
013 A space efficient way to zero a 16-bit register is to XOR it with itself.
Remember that exclusive or results in a zero whenever the bits at
the source and at the destination are the same. This instruction takes
just two bytes compared to “mov di, 0” which would take three. This
is a standard way to zero a 16-bit register.
Inside the debugger the operation of the string instruction can be
monitored. The trace into command can be used to monitor every repetition
of the string instruction. The screen will not appear cleared inside the
debugger, as the debugger overwrites the screen with its own display. CX
decrements with every iteration and DI increments by 2. The first access is
made at B800:0000, the second at B800:0002, and so on. A complex and
inefficient loop is replaced with a fast and simple instruction that does the
same operation many times faster.
LODS EXAMPLE – STRING PRINTING
The use of LODS with the REP prefix is not meaningful as only the last
value loaded will remain in the register. It is normally used in a loop paired
with a STOS instruction to do some block processing. We use LODS to pick
the data, do the processing, and then use STOS to put it back or at some
other place. For example in string printing, we will use LODS to read a
character of the string, attach the attribute byte to it, and use STOS to write
it on the video memory.
The following example will print the string using string instructions.
Example
; hello world printing using string instructions
[org 0x0100]
              jmp  start

message:      db   'hello world'       ; string to be printed
length:       dw   11                  ; length of string

;;;;; COPY LINES 005-024 FROM EXAMPLE 7.1 (clrscr) ;;;;;

; subroutine to print a string
; takes the x position, y position, attribute, address of string and
; its length as parameters
printstr:     push bp
              mov  bp, sp
              push es
              push ax
              push cx
              push si
              push di

              mov  ax, 0xb800
              mov  es, ax              ; point es to video base
              mov  al, 80              ; load al with columns per row
              mul  byte [bp+10]        ; multiply with y position
              add  ax, [bp+12]         ; add x position
              shl  ax, 1               ; turn into byte offset
              mov  di, ax              ; point di to required location
              mov  si, [bp+6]          ; point si to string
              mov  cx, [bp+4]          ; load length of string in cx
              mov  ah, [bp+8]          ; load attribute in ah

              cld                      ; auto increment mode
nextchar:     lodsb                    ; load next char in al
              stosw                    ; print char/attribute pair
              loop nextchar            ; repeat for the whole string

              pop  di
              pop  si
              pop  cx
              pop  ax
              pop  es
              pop  bp
              ret  10

start:        call clrscr              ; call the clrscr subroutine

              mov  ax, 30
              push ax                  ; push x position
              mov  ax, 20
              push ax                  ; push y position
              mov  ax, 1               ; blue on black attribute
              push ax                  ; push attribute
              mov  ax, message
              push ax                  ; push address of message
              push word [length]       ; push message length
              call printstr            ; call the printstr subroutine

              mov  ax, 0x4c00          ; terminate program
              int  0x21
051 Both operations are in auto increment mode.
052-053 DS is automatically initialized to our segment. ES points to video
memory. SI points to the address of our string. DI points to the
screen location. AH holds the attribute. Whenever we read a
character from the string in AL, the attribute byte is implicitly
attached and the pair is present in AX. The same effect could not be
achieved with a REP prefix as the REP will repeat LODS and then
start repeating STOS, but we need to alternate them.
054 CX holds the length of the string. Therefore LOOP repeats for each
character of the string.
Inside the debugger we observe how LODS and STOS alternate and CX is
only used by the LOOP instruction. In the original code there were four
instructions inside the loop; now there are only two. This is how string
instructions help in reducing code size.
SCAS EXAMPLE – STRING LENGTH
Many higher level languages do not explicitly store string length; rather
they use a null character, a character with an ASCII code of zero, to signal
the end of a string. In assembly language programs it is also easier to store
a zero at the end of the string than to count its characters by hand, which
is a tedious and error prone process for longer strings. So we delegate
length calculation to the processor and modify our string printing subroutine
to take a null terminated string and no length. We use SCASB with REPNE and
a zero in AL to find a zero byte in the string. In CX we load the maximum
possible count, 0xFFFF, which covers nearly the whole 64K segment. Actual
strings will be much smaller. An important thing regarding SCAS and CMPS is
that when they stop due to equality or inequality, the index registers have
already been updated. Therefore when SCAS stops, DI will be pointing past
the null character.
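The length calculation described above can be sketched inline as follows,
assuming the string lives in the program's own segment so that ES can simply
be copied from DS. The label length is used here only to hold the result.

```nasm
; length of a null terminated string with repne scasb (illustrative sketch)
[org 0x0100]
              jmp  start

message:      db   'hello world', 0    ; null terminated string
length:       dw   0                   ; result is stored here

start:        push ds
              pop  es                  ; es:di must point to the string
              mov  di, message
              mov  cx, 0xffff          ; load maximum number in cx
              xor  al, al              ; load a zero in al
              cld                      ; auto increment mode
              repne scasb              ; find zero in the string
                                       ; di now points one past the null
              mov  ax, 0xffff          ; load maximum number in ax
              sub  ax, cx              ; bytes scanned, null included
              dec  ax                  ; exclude null from length
              mov  [length], ax        ; 11 for this string

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```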
Example
; hello world printing with a null terminated string
[org 0x0100]
LES and LDS Instructions
Since the string instructions need their source and destination in the form
of a segment offset pair, there are two special instructions that load a
segment register and a general purpose register from two consecutive
memory locations. LES loads ES while LDS loads DS. Both these instructions
have two parameters, one is the general purpose register to be loaded and
the other is the memory location from which to load these registers. The
major application of these instructions is when a subroutine receives a
segment offset pair as an argument and the pair is to be loaded in a segment
and an offset register. According to the Intel rules of significance, the
word at the higher address is loaded in the segment register while the word
at the lower address is loaded in the offset register. As parameters, the
segment should be pushed first so that it ends up at the higher address and
the offset should be pushed afterwards. The instruction is given the lower of
the two addresses. For example “lds si, [bp+4]” will load SI from BP+4 and
DS from BP+6.
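The push order and the matching load can be sketched with a deliberately
small subroutine. The name firstchar and its job of returning the first
character of the string in AL are hypothetical and chosen only to keep the
example short.

```nasm
; passing a segment offset pair and loading it with les (illustrative sketch)
[org 0x0100]
              jmp  start

message:      db   'hello world', 0

; hypothetical subroutine, returns the first character of a string in al
; takes the segment and offset of the string as parameters
firstchar:    push bp
              mov  bp, sp
              push es
              push di

              les  di, [bp+4]          ; di from bp+4, es from bp+6
              mov  al, [es:di]         ; read through the loaded pair

              pop  di
              pop  es
              pop  bp
              ret  4

start:        push ds                  ; segment first, ends up at higher address
              mov  ax, message
              push ax                  ; offset next, ends up at lower address
              call firstchar           ; al holds 'h' on return

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```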
LES AND LDS EXAMPLE
We modify the string length calculation subroutine to take the segment
and offset of the string and use the LES instruction to load that segment
offset pair in ES and DI.
Example
; hello world printing with length calculation subroutine
[org 0x0100]
              jmp  start

message:      db   'hello world', 0    ; null terminated string

;;;;; COPY LINES 005-024 FROM EXAMPLE 7.1 (clrscr) ;;;;;

; subroutine to calculate the length of a string
; takes the segment and offset of a string as parameters
strlen:       push bp
              mov  bp, sp
              push es
              push cx
              push di

              les  di, [bp+4]          ; point es:di to string
              mov  cx, 0xffff          ; load maximum number in cx
              xor  al, al              ; load a zero in al
              repne scasb              ; find zero in the string
              mov  ax, 0xffff          ; load maximum number in ax
              sub  ax, cx              ; find change in cx
              dec  ax                  ; exclude null from length

              pop  di
              pop  cx
              pop  es
              pop  bp
              ret  4

; subroutine to print a string
; takes the x position, y position, attribute, and address of a null
; terminated string as parameters
printstr:     push bp
              mov  bp, sp
              push es
              push ax
              push cx
              push si
              push di

              push ds                  ; push segment of string
              mov  ax, [bp+4]
              push ax                  ; push offset of string
              call strlen              ; calculate string length

              cmp  ax, 0               ; is the string empty
              jz   exit                ; no printing if string is empty
              mov  cx, ax              ; save length in cx

              mov  ax, 0xb800
              mov  es, ax              ; point es to video base
              mov  al, 80              ; load al with columns per row
              mul  byte [bp+8]         ; multiply with y position
              add  ax, [bp+10]         ; add x position
              shl  ax, 1               ; turn into byte offset
              mov  di, ax              ; point di to required location
              mov  si, [bp+4]          ; point si to string
              mov  ah, [bp+6]          ; load attribute in ah

              cld                      ; auto increment mode
nextchar:     lodsb                    ; load next char in al
              stosw                    ; print char/attribute pair
              loop nextchar            ; repeat for the whole string

exit:         pop  di
              pop  si
              pop  cx
              pop  ax
              pop  es
              pop  bp
              ret  8

start:        call clrscr              ; call the clrscr subroutine

              mov  ax, 30
              push ax                  ; push x position
              mov  ax, 20
              push ax                  ; push y position
              mov  ax, 0x71            ; blue on white attribute
              push ax                  ; push attribute
              mov  ax, message
              push ax                  ; push address of message
              call printstr            ; call the printstr subroutine

              mov  ax, 0x4c00          ; terminate program
              int  0x21
036 The LES instruction is used to load the DI register from BP+4 and
the ES register from BP+6.
065 The convention to return a value from a subroutine is to use the AX
register. That is why AX is not saved and restored in the subroutine.
Inside the debugger observe that the segment register is pushed followed
by the offset. The higher address FFE6 contains the segment and the lower
address FFE4 contains the offset. This is because we have a decrementing
stack. Then observe the loading of ES and DI from the stack.
MOVS EXAMPLE – SCREEN SCROLLING
MOVS has the two forms MOVSB and MOVSW. REP allows the instruction
to be repeated CX times, allowing blocks of memory to be copied. We will
perform such a copy on the video screen.
Scrolling is the process in which all the lines on the screen move one or
more lines towards the top or towards the bottom, and the lines vacated at
the other end are cleared. Scrolling is a process to which string movement
is naturally applicable. REP with MOVS will utilize the full processor power
to do the scrolling in minimum time.
In this example we want to scroll a variable number of lines given as
argument. Therefore we have to calculate the source address, which is 160
times the number of lines to scroll, since each line holds 80 character and
attribute pairs of two bytes each. The destination address is 0, which is the
top left of the screen. The lines that scroll off the top are discarded, so
the source pointer is placed just after them. An equal number of lines at the
bottom are cleared. These lines have actually been copied above.
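The scroll up operation described above can be sketched as follows. This is
a sketch under the same assumptions as the other examples (80x25 text screen
at 0xb800, 0x0720 as the blank); the subroutine name scrollup and the value 5
pushed by the caller are chosen for illustration only.

```nasm
; scroll up the screen (illustrative sketch)
[org 0x0100]
              jmp  start

; subroutine to scroll up the screen
; takes the number of lines to scroll as parameter
scrollup:     push bp
              mov  bp, sp
              push ax
              push cx
              push si
              push di
              push es
              push ds

              mov  ax, 80              ; load chars per row in ax
              mul  byte [bp+4]         ; count of locations scrolled off
              mov  si, ax              ; load source position in si
              push si                  ; save count for later use
              shl  si, 1               ; convert to byte offset, 160 per line
              mov  cx, 2000            ; number of screen locations
              sub  cx, ax              ; count of words to move
              mov  ax, 0xb800
              mov  es, ax              ; point es to video base
              mov  ds, ax              ; point ds to video base
              xor  di, di              ; point di to top left column
              cld                      ; set auto increment mode
              rep  movsw               ; scroll up
              mov  ax, 0x0720          ; space in normal attribute
              pop  cx                  ; count of positions to clear
              rep  stosw               ; clear the lines left at the bottom

              pop  ds
              pop  es
              pop  di
              pop  si
              pop  cx
              pop  ax
              pop  bp
              ret  2

start:        mov  ax, 5               ; hypothetical number of lines
              push ax                  ; push number of lines to scroll
              call scrollup            ; call scroll up subroutine

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```

After the move DI already points at the first location to be cleared, so the
STOSW can follow without reloading it.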
Example
; scroll down the screen
[org 0x0100]
              jmp  start

; subroutine to scroll down the screen
; takes the number of lines to scroll as parameter
scrolldown:   push bp
              mov  bp, sp
              push ax
              push cx
              push si
              push di
              push es
              push ds

              mov  ax, 80              ; load chars per row in ax
              mul  byte [bp+4]         ; count of locations to scroll
              push ax                  ; save count for later use
              mov  cx, 2000            ; number of screen locations
              sub  cx, ax              ; count of words to move
              shl  ax, 1               ; convert to byte offset
              mov  si, 3998            ; last location on the screen
              sub  si, ax              ; load source position in si
              mov  ax, 0xb800
              mov  es, ax              ; point es to video base
              mov  ds, ax              ; point ds to video base
              mov  di, 3998            ; point di to lower right column
              std                      ; set auto decrement mode
              rep  movsw               ; scroll down
              mov  ax, 0x0720          ; space in normal attribute
              pop  cx                  ; count of positions to clear
              rep  stosw               ; clear the scrolled space

              pop  ds
              pop  es
              pop  di
              pop  si
              pop  cx
              pop  ax
              pop  bp
              ret  2
start:        mov  ax,
              push ax                  ; push number of lines to scroll
              call scrolldown          ; call scroll down subroutine

              mov  ax, 0x4c00          ; terminate program
              int  0x21
CMPS EXAMPLE – STRING COMPARISON
For the last string instruction, we take string comparison as an example.
The subroutine will take two segment offset pairs containing the addresses
of two null terminated strings. The subroutine will return 0 if the strings
are different and 1 if they are the same. The AX register will be used to
hold the return value.
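The core of such a comparison can be sketched inline with REPE CMPSB. This
sketch assumes both strings are in the program's own segment and that the
number of bytes to compare (12, including the null) is known in advance; a
full subroutine would first have to measure the strings. The label result is
hypothetical.

```nasm
; comparing two blocks with repe cmpsb (illustrative sketch)
[org 0x0100]
              jmp  start

msg1:         db   'hello world', 0
msg2:         db   'hello WORLD', 0
result:       dw   0                   ; 1 if same, 0 if different

start:        push ds
              pop  es                  ; es:di must point to second string
              mov  si, msg1            ; ds:si points to first string
              mov  di, msg2
              mov  cx, 12              ; bytes to compare, null included
              xor  ax, ax              ; assume the strings are different
              cld                      ; auto increment mode
              repe cmpsb               ; repeat while bytes are equal
              jne  done                ; stopped on a mismatch
              mov  ax, 1               ; all bytes matched
done:         mov  [result], ax        ; 0 here, the case differs

              mov  ax, 0x4c00          ; terminate program
              int  0x21
```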
Example
; comparing null terminated strings
[org 0x0100]
              jmp  start

msg1:         db   'hello world', 0
msg2:         db   'hello WORLD', 0
msg3:         db   'hello world', 0

;;;;; COPY LINES 028-050 FROM EXAMPLE 7.4 (strlen) ;;;;;

; subroutine to compare two strings
; takes segment and offset pairs of two strings to compare