Computer Organization
X86 Assembly Language
Mohammad Sharaf
Handouts + IBM PC Assembly Language & Programming, Peter Abel, Prentice Hall, 5th edition.
Chapters: 1, 4, 6, 7, 8
Evolution of Microprocessor
Evolution of Microprocessor cont.
Basic Concepts
What are Registers?

You can think of them as variables inside the CPU chip
General Purpose Registers

AX, BX, CX, and DX: They can be assigned any value you want

AX (Accumulator Register): Most arithmetic operations are done with AX
BX (Base Register): Used as a base address for array operations. BX is usually combined with an index register such as SI or DI to point into an array
CX (Counter Register): Used as a counter, for example in loops
DX (Data Register): Used for storing data values
Index Registers
SI and DI: Usually used to process arrays or strings:
SI (Source Index): conventionally points to the source array
DI (Destination Index): conventionally points to the destination array
Segment Registers

CS, DS, ES, and SS:

CS (Code Segment Register): Points to the segment of the running program. We may NOT modify CS directly
DS (Data Segment Register): Points to the segment of the data used by the running program. You can point this anywhere you want as long as it contains the desired data
ES (Extra Segment Register): Usually used with DI for pointer operations. The pairs DS:SI and ES:DI are commonly used for string operations
SS (Stack Segment Register): Points to the stack segment
Pointer Registers
BP, SP, and IP:
BP (Base Pointer): used to address parameters and local variables on the stack
SP (Stack Pointer): points to the current top of the stack
IP (Instruction Pointer): holds the offset of the next instruction of the running program. It is always coupled with CS and it is NOT directly modifiable. So, the pair CS:IP is a pointer to the current position in the running program. You can NOT load CS or IP with mov; they change only through jumps, calls, returns, and interrupts
16-bit Register

The general registers AX, BX, CX, and DX are 16-bit

However, each of them is composed of two smaller 8-bit registers
For example: AX
The high 8 bits are called AH, and the low 8 bits are called AL
Both AH and AL can be accessed directly

However, since together they make up AX:

Modifying AH modifies the high 8 bits of AX

Modifying AL modifies the low 8 bits of AX
AL occupies bit 0 to bit 7 of AX, AH occupies bit 8 to bit 15 of AX
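A minimal sketch of how AH and AL overlay AX, written for the TASM Ideal-mode COM layout these slides use. The values are arbitrary and chosen only for illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ax, 1234h   ; AH = 12h, AL = 34h
    mov al, 0FFh    ; only the low byte changes:  AX = 12FFh
    mov ah, 00h     ; only the high byte changes: AX = 00FFh
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```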
Extended Register
The 80386 processor introduced extended registers

Most of the registers, except the segment registers, are extended to 32 bits

So, we have extended registers EAX, EBX, ECX, and so on

AX is only the low 16 bits (bit 0 to 15) of EAX

There is NO special direct access to the upper 16 bits (bit 16 to 31) of an extended register
Flag Register
The flag register is a 16-bit register that contains the CPU status

It holds values that programmers may need to check, such as whether the last arithmetic operation produced a zero result or an overflow

Intel doesn't provide direct access to the whole register; rather it is accessed via the stack (with PUSHF and POPF)

You can test each flag by using a bitwise AND operation, since each status is mostly represented by just 1 bit
Flag Register cont.

C carry flag: set to 1 whenever the last arithmetic operation, such as an addition or a subtraction, produced a carry or a borrow; otherwise 0

P parity flag: set to 1 if the low 8 bits of the last arithmetic or logical result contain an even number of 1 bits

A auxiliary flag: set on a carry or borrow between bit 3 and bit 4. It is used in Binary Coded Decimal (BCD) operations

Z zero flag: used to detect whether the last arithmetic or logical operation produced a zero result

S sign flag: used to detect whether the last operation produced a negative result. It is set to 1 if the highest bit (bit 7 in bytes or bit 15 in words) of the last result is 1
Flag Register cont.

T trap flag: used by debuggers to turn on the step-by-step feature

I interrupt flag: used to enable or disable interrupts. If the bit is set (= 1), interrupts are enabled; otherwise they are disabled. The default is on

D direction flag: sets the direction of string operations. If the bit is set, all string operations are done backward; otherwise, forward. The default is forward (0)

O overflow flag: used to detect whether the result of the last signed arithmetic operation has overflowed. If the bit is set, an overflow has occurred
Memory

The 8086 CPU has only 16-bit registers, so the maximum amount of memory that a single register can address is:
2^16 = 65536 bytes (64K)

However, the 8086 has a 20-bit address bus, so it can address 1 MB of memory. That is 16 times bigger than what one 16-bit register can reach
Segmentation: means the memory is divided virtually into several areas called segments

The segment registers are 16 bits wide

The idea of segmentation is NOT to divide 1 MB into 16 exact parts
Memory cont.
Interleaved: means that the segments overlap. With segment number 0 we can access memory 0 to 65535. Segment number 1 allows us to access memory 16 to 65551, segment 2 memory 32 to 65567, and so on in increments of 16
Seg 0: 0 to 65535
Seg 1: 16 to 65551
Seg 2: 32 to 65567
Memory Interleaved
Why did they do that?
It is for the sake of the operating system's memory management
Therefore, the OS aligns the loaded code to the nearest 16-byte boundary
Memory cont.
Memory access must be done with a pair of registers
The first is a segment register and the second is an offset register, usually BX, BP, SI or DI
The register pair is usually written like this: ES:DI, with a colon between them
The pair is called the Segment:Offset pair
So, ES:DI means that the segment part is given by ES, and the offset part is given by DI
Memory cont.
Example: logical address versus absolute (physical) address
If ES contains 1 and DI is 5, that does NOT mean that we access memory address 5
If ES:DI = 0001:0005 then it actually accesses physical address 21 (1 * 16 + 5 = 21)
So, 0000:0015h and 0001:0005h are actually the same address (15h = 21)
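The segment * 16 + offset rule can be checked with a few instructions. This is only a sketch in the TASM Ideal-mode COM layout of these slides; the register choices are an illustration, not something the CPU requires.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ax, 0001h   ; segment part
    mov bx, 0005h   ; offset part
    mov dx, 16
    mul dx          ; DX:AX = segment * 16 = 00010h
    add ax, bx      ; add the offset to the low word
    adc dx, 0       ; carry into the high word if needed
                    ; DX:AX = 0000:0015h, i.e. physical address 21
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```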
Stacks
The stack (LIFO) is a temporary area to store temporary things
It is mainly used to pass parameter values to procedures or functions
Sometimes, it also acts as temporary space for local variables. Therefore, the role of the stack is very important
Interrupts

Upon an interrupt request, the CPU usually stores the context of the running program, then goes to the interrupt routine

After processing the interrupt, the processor restores all the stored state and resumes the program. There are 3 kinds of interrupts:

Hardware interrupts occur when one of the hardware devices inside your computer needs immediate processing

Software interrupts occur when the running program itself requests to be interrupted in order to do something else

CPU-generated interrupts occur when the processor detects that something is wrong with the running code (for example, dividing a number by 0)
Why Assembly?
It's difficult
Error prone
Hard to debug
Takes a lot of time to develop
Why Assembly?
However:

Assembly is fast. Carefully hand-written assembly can run faster than the code a compiler produces
Assembly is a lot closer to machine level than any other language because the commands of assembly language map 1-to-1 to machine instructions
Assembly code can be a lot smaller than the code a compiler produces
In Assembly, we can do a lot of things that we can't do in any higher level language
Notes
Assembly language is NOT case-sensitive
A comment in assembly begins with a semicolon (;). Everything after a semicolon until the end of the line is ignored
COM Structure
ideal
p286n
model tiny
codeseg
org 100h
jmp start
; your data and subroutine here
start:
mov ax, 4c00h
int 21h
end
COM Program Explanation

ideal says that we're using the Ideal syntax of TASM

p286n or .286 says that we're using 80286 processor instructions

model tiny or .model tiny says that we're using the COM format

codeseg or .code says that this is the beginning of our code

org 100h says that our code starts at offset 100h, because DOS loads a COM program right after the 256-byte (100h) Program Segment Prefix

COM programs almost always begin with a jump, i.e. a jump to the beginning of the code. Between the jump and the beginning of your code, you place your variables. The jump is written with the word jmp followed by a label (here we call it start)

After the label start, the next two lines are just the code to terminate your program

end or .end entry specifies the end point of your program
Making Labels

Pick any name and end it with a colon (:)

A label usually serves as a tag for a place you'd like to jump to, and so on

You have to pick a unique name for each label, otherwise the assembler will fail
There is a way to make a label local: prefix the label name with @@ and still end it with a colon
Variables in Assembly
Variables Declaration

Our ideal syntax (TASM based) looks like this:
Ideal
p286n
model tiny
codeseg
org 100h
jmp start
; your data and subroutine here (this is a comment)
start:
mov ax, 4c00h
int 21h
end

Put variable declarations after the jmp start statement.
Variables Declaration

There are 3 main types of variable declarations in assembly:

db declares a value of 1 byte
dw is for a word (2 bytes)
dd is for a double-word (4 bytes)
The declaration syntax is as follows:
var_name db value
For example:
bits db 101001b
var2 dw 4567h
var3 dw 0BABEh
In a complete program:
ideal
p286n
model tiny
codeseg
org 100h
jmp start
score db 100
year dw 2001
money dd 1000000
start:
mov ax, 4c00h
int 21h
end
Variables Declaration cont.

Variable Limits and Negative Values
Declaration   Stands for           Length    Limit
db            define byte          1 byte    0-255
dw            define word          2 bytes   0-65535
dd            define double-word   4 bytes   0-4294967295
You can assign negative values to the variables, too. However, the assembler will convert them to the corresponding 2's complement value. For example: if you assign -1 to a db variable, the assembler will store it as 255
2’s Complement
Moving Around Values

If you need to do calculations or run commands involving variables, you'll have to load the variable values into registers
The syntax of the mov command is: mov a, b
which means assign b to a
mov ax, [var2]   ; register <- memory: load var2 into AX
mov [var1], ax   ; memory <- register: store AX into var1
Moving Around Values: example
:
jmp start
our_var dw 10
start:
mov bx, [our_var]
mov cx, bx
mov [our_var], cx
mov ax, 4c00h
int 21h
end
Moving Around Values cont.

When we deal with byte variables (i.e. db), we need to use byte registers (e.g. AL, AH, BL, BH, and so on)

AX, BX, CX, DX, and so on are word registers

You can use double-word registers, which are available in 80386 processors or better (use p386n instead of p286n to enable double-word registers)

The double-word registers include EAX, EBX, ECX, EDX, and so on
Moving Around Values cont.

We can assign constants to variables directly with the mov instruction:
mov [word ptr our_var], 1
Notice that the word ptr modifier must be used when you assign constants to variables, because a constant alone does not tell the assembler the operand size. Since our_var is a word variable, we need to use the word ptr modifier
Likewise, a byte variable uses the byte ptr modifier and a double-word variable uses dword ptr
Moving Around Values example
Notice the way Intel processors store a word value
They store the least significant byte first, then the most significant byte
Big-endian & Little-endian

These terms describe the order in which a sequence of bytes is stored in a computer's memory

In a big-endian system, the most significant byte in the sequence is stored at the lowest storage address (i.e., first)

In a little-endian system, the least significant byte in the sequence is stored first
Moving Around Values cont.
Recall that variables in assembly are treated as addresses
AX <- 0502h
Moving Around Values cont.

Double-word variables are also stored similarly
my_var dd 1234BABEh
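To make the byte order visible, the sketch below reads the double-word back one piece at a time. It assumes the TASM Ideal-mode COM layout from the earlier slides; my_var is the variable declared above.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
my_var dd 1234BABEh                 ; bytes in memory: BE BA 34 12
start:
    mov al, [byte ptr my_var]       ; AL = 0BEh, the least significant byte comes first
    mov ah, [byte ptr my_var + 3]   ; AH = 12h, the most significant byte comes last
    mov bx, [word ptr my_var]       ; BX = 0BABEh, the low word
    mov cx, [word ptr my_var + 2]   ; CX = 1234h, the high word
    mov ax, 4c00h                   ; terminate the program
    int 21h
end
```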
Impacts on Registers
Recall that the word register AX consists of AH and AL
Modifying either AH or AL will modify the contents of AX
Likewise, modifying AX will modify AH and AL
Question Marks on Variables

If you don't need a particular initial value for a variable, you can give a question mark ("?") instead. For example:
another_var dw ?
String Variables

You can define string variables in assembly as follows:
message db "Hello World!$"
String variables must be stored as db variables. The string is surrounded by quotes, either single or double, up to you
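String Variables
Why do we have to end our string with a dollar sign ("$")? Because the DOS print-string service (interrupt 21h, service 09h) prints characters until it reaches a "$"
Each character of the string is converted to its corresponding ASCII code
message db "Hello World!$"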
Multi-Valued Variables
A variable defined as db means each value is defined as a byte
However, there is no restriction on how many values we can define for each variable name
multivar db 12h, 34h, 56h, 78h, 00h, 11h, 22h, 00h
Multi-Valued Variables

So multi valued variables are stored contiguously
multivar2 dw 1234h, 5678h, 0011h, 2200h
Using dup

Another way to declare a multi-valued variable is to use the dup operator:
my_array db 5 dup (00h)
The example above is equivalent to:
my_array db 00h, 00h, 00h, 00h, 00h
dup is a kind of shortcut to define variables with repeated values

Of course you can define something like this:
bar_array db 10 dup (?)
Arithmetic Instructions
Addition & Subtraction

The syntax is add x, y and sub x, y, which compute x = x + y and x = x - y. You may also add constants to, or subtract them from, variables. But don't forget to add the word ptr or dword ptr as appropriate

If the result of an addition overflows, the carry flag
is set to 1, otherwise it is 0

Similarly, if the result of subtraction requires a
borrow, then the carry flag is also set to 1,
otherwise it is 0
Addition & Subtraction
Suppose you'd like to add 32-bit integers using 16-bit registers

Intel processors have a special instruction called adc (add with carry), which also adds the carry flag left by the previous addition
For subtraction, we have a similar instruction called sbb (subtract with borrow)
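A sketch of a 32-bit addition and subtraction done in two 16-bit halves. The variable names num1 and num2 and their values are made up for the illustration; the layout is the TASM Ideal-mode COM skeleton from these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
num1 dd 0001FFFFh
num2 dd 00000001h
start:
    mov ax, [word ptr num1]       ; low word of num1  (FFFFh)
    mov dx, [word ptr num1 + 2]   ; high word of num1 (0001h)
    add ax, [word ptr num2]       ; FFFFh + 0001h = 0000h, carry flag = 1
    adc dx, [word ptr num2 + 2]   ; 0001h + 0000h + carry = 0002h
                                  ; DX:AX = 00020000h
    sub ax, [word ptr num2]       ; 0000h - 0001h = FFFFh, borrow sets the carry flag
    sbb dx, [word ptr num2 + 2]   ; 0002h - 0000h - borrow = 0001h
                                  ; DX:AX = 0001FFFFh again
    mov ax, 4c00h                 ; terminate the program
    int 21h
end
```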
Multiplication & Division

Multiplication and division always use AX (or AL for byte operands) as the implicit operand, so mul and div take only one operand

If the result of a multiplication does not fit in the lower half of the destination, the carry and overflow flags are set

Note: mul and div treat every number as unsigned. If you have negative values, you'll need to replace them with imul and idiv respectively
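A short sketch of byte-sized multiplication and division, unsigned first and then signed. The operand values are arbitrary; the layout follows the TASM Ideal-mode COM skeleton used in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 200
    mov bl, 3
    mul bl          ; AX = AL * BL = 600 (0258h); it does not fit in AL, so CF = OF = 1
    mov bl, 7
    div bl          ; AX / BL: AL = quotient 85, AH = remainder 5

    mov al, -6
    mov bl, 4
    imul bl         ; signed: AX = -24 (0FFE8h)
    mov bl, 5
    idiv bl         ; signed: AL = -4, AH = -4 (the remainder takes the sign of the dividend)

    mov ax, 4c00h   ; terminate the program
    int 21h
end
```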
Increment & Decrement

Often, we'd like to increment something by 1 or decrement something by 1

You can use add x, 1 or sub x, 1 if you'd like to, but Intel x86 assembly has special instructions for them

Instead of add x, 1 we use inc x. These are equivalent

Likewise for subtraction, you can use dec x

Beware that neither inc nor dec changes the carry flag as add and sub do
Tips
Arithmetic operations can have special properties
For example: add x, x is actually equal to multiplying x by 2
Similarly, sub x, x actually sets x to 0
On the 8086 processor, these operations are faster than doing mul or doing mov x, 0. Moreover, their code size is smaller
Bitwise Operations
And, Or, Xor
and, or, and xor take two operands
You can have both operands as registers, one of them as a variable, etc.
The syntax is and x, y (likewise or x, y and xor x, y); the result is stored in x
And, Or, Xor: example
AH = 76
and
AL = 45
AH = 01001100
and
AL = 00101101
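The three operations applied to the values above can be sketched as follows. The copies into BL, CL and DL are only there so that AH and AL stay intact; the layout is the TASM Ideal-mode COM skeleton from these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ah, 76      ; 01001100b
    mov al, 45      ; 00101101b
    mov bl, ah
    and bl, al      ; BL = 00001100b = 12  (1 only where both bits are 1)
    mov cl, ah
    or  cl, al      ; CL = 01101101b = 109 (1 where at least one bit is 1)
    mov dl, ah
    xor dl, al      ; DL = 01100001b = 97  (1 where the bits differ)
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```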
Not
The not operation takes a single operand and flips every bit of it
Bit Masking & Flipping

Sometimes, one byte can contain several pieces of information encoded in bits (like the flag register)

Example: Suppose AL = 00101100. However, you only need the lower four bits (i.e. 1100)

This can be done by creating a mask based on the behavior of and
Since we need only the lower four bits, the mask would be: 00001111
Bit Masking example

Suppose you have AL = 00101100. Now, you'd like to
store the lower 4 bits of your data in CL = 00000011 into
the lower 4 bits of AL
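One way to do this, sketched under the assumption that CL may be modified: clear the lower four bits of AL with one mask, isolate the lower four bits of CL with the opposite mask, then combine the two with or.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 00101100b
    mov cl, 00000011b
    and al, 11110000b   ; keep the upper 4 bits of AL:  AL = 00100000b
    and cl, 00001111b   ; keep the lower 4 bits of CL:  CL = 00000011b
    or  al, cl          ; merge the two halves:         AL = 00100011b
    mov ax, 4c00h       ; terminate the program
    int 21h
end
```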
Bit Masking & Flipping



There are times we only want to flip some of the bits
We can use xor for this. You can observe that any bit xor-ed with 1 is flipped
Suppose we'd like to flip the middle four bits of AL:
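A minimal sketch, reusing the value AL = 00101100 from the masking example as an assumed starting point. The mask has a 1 in each of the four middle positions (bits 2 to 5).

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 00101100b
    xor al, 00111100b   ; bits 2..5 are flipped, the rest stay: AL = 00010000b
    mov ax, 4c00h       ; terminate the program
    int 21h
end
```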
Bit Shifting

Shifting left one position means dropping the leftmost bit, moving the remaining bits one place to the left, then adding one 0 at the right end

Shifting right is analogous

The syntax is shl x, y and shr x, y. The x part can be a register or a variable, but of course not a constant. The y part, the number of positions, is a constant or the CL register

What happens to the bits that get shifted out? The carry flag will hold the last shifted-out bit
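A small sketch of shifting in both directions, with the expected register contents in the comments. The starting value is arbitrary.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 10010110b
    shl al, 1       ; AL = 00101100b, carry flag = 1 (the bit that fell off the left)
    shr al, 1       ; AL = 00010110b, carry flag = 0 (the bit that fell off the right)
    mov cl, 3
    shl al, cl      ; shift by the count in CL: AL = 10110000b, carry flag = 0
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```

Shifting left by one position doubles an unsigned value as long as no 1 bit is lost, which is why shifts are a common substitute for multiplying by powers of two.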
Shift and Rotate
Bit Rolling

Bit rolling (rotation) is similar to bit shifting. Instead of being shifted out, the bits get rolled back in at the other end. Rolling to the left is done by rol

Rolling to the right (ror) is similar

There is another variant of rolling bits that goes through the carry flag. Rolling bits through the carry flag is done by rcl and rcr
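The difference between the plain and the through-carry variants can be sketched like this. The starting value is arbitrary, and clc is used only to put the carry flag in a known state first.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 10010110b
    rol al, 1       ; bit 7 wraps around to bit 0: AL = 00101101b, carry flag = 1
    ror al, 1       ; bit 0 wraps around to bit 7: AL = 10010110b, carry flag = 1
    clc             ; carry flag = 0
    rcl al, 1       ; old carry enters at bit 0, bit 7 leaves into carry:
                    ; AL = 00101100b, carry flag = 1
    rcr al, 1       ; old carry enters at bit 7, bit 0 leaves into carry:
                    ; AL = 10010110b, carry flag = 0
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```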
Shift and Rotate cont.
Branching & Loop Instructions
Unconditional & Conditional Jumps

Conditional jumps always consider some condition

If the condition is satisfied, then the jump is taken,
otherwise it is not

The conditions are usually reflected in the
processor flags

On the other hand, unconditional jumps do not
regard any conditions

So, it is more like goto in a sense
Making Labels

Labels are essential to jump instructions

A label marks the destination of a jump. Making labels in assembly is easy

Labels can be made like this:
example:

So, we can pick any name and put a colon (:) after it

You must make sure that all label names throughout your program are unique, with no duplicates
Unconditional Jumps


For an unconditional jump, the instruction is jmp
An unconditional jump takes no regard of conditions. So, whenever the processor arrives at the instruction jmp somewhere, it continues execution at the instruction marked by the label somewhere, skipping everything in between
Conditional Jumps
Before the jump instruction, we (usually) have to put a comparison or testing instruction
The comparison instruction is cmp
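cmp x, y subtracts y from x only to set the flags, and the conditional jump that follows reads those flags. The sketch below uses that pair to add 1+2+...+10; the label name and the choice of jle ("jump if less or equal") are illustrative.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ax, 0       ; running sum
    mov bx, 1       ; current number
next_number:
    add ax, bx
    inc bx
    cmp bx, 10      ; compare BX with 10; only the flags change
    jle next_number ; jump back while BX <= 10
                    ; here AX = 55 and BX = 11
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```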
Conditional Jumps cont.
Note that jg, jge, jl, and jle work for signed values only
For unsigned values, use ja "jump if above", jae, jb "jump if below", and jbe as the respective substitutes
The rest (i.e. je, jne, and jc) work with both signed and unsigned values
Testing Instruction

The syntax of the test instruction:
test x, y

It behaves like and, but it does not store the result back to x

So it only computes x and y in order to set the flags

After this instruction, we usually check whether the result of the and-ing is zero or not using jz ("jump if zero") or jnz ("jump if not zero")
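A sketch that checks a single bit without disturbing the register. The value in AL and the use of BL as a yes/no result are assumptions made for the illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 00101100b
    test al, 00000100b  ; is bit 2 set? AL itself is left unchanged
    jz bit_clear        ; zero result means the bit was 0
    mov bl, 1           ; bit 2 is set (this path is taken for the value above)
    jmp done
bit_clear:
    mov bl, 0           ; bit 2 is clear
done:
    mov ax, 4c00h       ; terminate the program
    int 21h
end
```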
Testing Instruction example 1
Add 1+2+3+...+10
Testing Instruction example 2
8! Factorial
Loop Construct

The syntax is loop mylabel, placed at the end of the loop body that begins at mylabel
This structure is just like the do..while construct in C/Java
When the processor executes the loop instruction, it first decreases the register CX by one
After that, CX is tested whether it is zero or not. If it is not zero, the processor jumps to mylabel
It's a kind of countdown counter
Loop Construct example
Let's take 1+2+...+10 example
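A possible version of that example, sketched with the loop instruction. Because CX counts down, the numbers are added from 10 to 1, which gives the same sum.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ax, 0       ; running sum
    mov cx, 10      ; loop counter, also the number being added
mylabel:
    add ax, cx      ; adds 10, 9, ..., 1
    loop mylabel    ; CX = CX - 1, jump to mylabel while CX is not zero
                    ; here AX = 55 and CX = 0
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```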
Interrupt Essentials
Introduction to Interrupt

An interrupt is just like a procedure provided by the system that you can invoke
The two lines mov ax, 4c00h and int 21h at the end of our programs actually request the operating system to terminate the program

An interrupt is called using the int instruction with a number after it

This number is referred to as the Interrupt Number
Introduction to Interrupt cont.
The interrupt number alone is not enough
An interrupt behaves differently depending on which Service Number is called
Service numbers are usually placed in AH
The sub-service number is usually placed in AL
This interrupt mechanism is pretty much like a phone number
Output to Screen

After the start label we invoke interrupt number 21h, service 09h

Interrupt 21h is reserved for Operating System calls

When you look up what service 09h does on interrupt 21h in the interrupt list, you find that it prints the "$"-terminated string whose address is in DS:DX

To insert a new line simply change the message declaration into:
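One common way to do this, given here as an assumption since the slide's own declaration is not shown, is to append the carriage return (13) and line feed (10) codes before the "$". The complete program then looks roughly like this:

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
message db "Hello World!", 13, 10, "$"   ; text, CR, LF, then the terminator
start:
    mov ah, 09h             ; service 09h: print a "$"-terminated string
    mov dx, offset message  ; DS:DX must point to the string
    int 21h
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```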
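Input from Keyboard
Interrupt 21h service 0Ah offers a means to read input from the keyboard. According to the interrupt list, DS:DX must point to a buffer whose first byte holds the maximum number of characters to read; on return, the second byte holds the number of characters actually read, and the characters themselves follow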
Input from keyboard example
Buffer
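A sketch of a buffer laid out the way service 0Ah expects it. The name buffer and the size of 20 characters are assumptions for the illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
buffer db 20            ; byte 0: maximum number of characters, including the final Enter
       db ?             ; byte 1: DOS stores the number of characters typed here
       db 20 dup (?)    ; byte 2 onward: the characters, ended by a carriage return (0Dh)
start:
    mov ah, 0Ah             ; service 0Ah: buffered keyboard input
    mov dx, offset buffer   ; DS:DX must point to the buffer
    int 21h
    mov cl, [buffer + 1]    ; CL = number of characters typed (Enter is not counted)
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```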
Output: A Better Version

There is one way to cope with the "$" issue: output the characters one by one using a loop
The loop terminates if the character being read is 0

Zero in ASCII is the NUL character and is commonly used to terminate strings

Interrupt 21h, service 06h can be used to print one character on screen
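The loop described above can be sketched as follows. The message text is made up to show that a "$" can now appear inside the string; service 06h takes the character to print in DL.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
message db "Price: $5", 0   ; zero-terminated, so the "$" is ordinary text
start:
    mov si, offset message  ; SI points to the current character
next_char:
    mov dl, [si]            ; fetch one character
    cmp dl, 0
    je finished             ; stop at the terminating zero
    mov ah, 06h             ; service 06h: output the character in DL
    int 21h
    inc si                  ; move on to the next character
    jmp next_char
finished:
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```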
Input one Character
Number to String
The output routines we discussed so far are intended only for outputting strings
How can we output numbers?
We have to convert the numbers to strings first
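One possible conversion, sketched for an unsigned 16-bit number: divide by 10 repeatedly and store each remainder as an ASCII digit, filling the buffer from right to left. The names number and digits and the sample value are assumptions for the illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
number dw 40320
digits db 5 dup (?)         ; room for the 5 digits a 16-bit number can have
       db "$"               ; terminator for service 09h
start:
    mov ax, [number]
    mov bx, 10
    mov di, offset digits + 5   ; one position past the last digit
next_digit:
    mov dx, 0               ; div uses DX:AX as the dividend, so clear the high word
    div bx                  ; AX = quotient, DX = remainder (0 to 9)
    add dl, '0'             ; turn the remainder into its ASCII character
    dec di
    mov [di], dl            ; store the digit, moving right to left
    cmp ax, 0
    jne next_digit          ; repeat until nothing is left to divide
    mov dx, di              ; DS:DX = first digit produced
    mov ah, 09h             ; service 09h: print up to the "$"
    int 21h
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```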
Stacks
Why Stack?
There are several reasons why we need stacks:
To save register values if we run out of registers
To pass parameters to subroutines
To make space for local variables in subroutines
To preserve original register values if we change them in a subroutine
To fetch the processor flag status
Stack Operations

The stack works last in, first out (LIFO)

Stack operations are mainly done by two instructions: push and pop

The instruction push pushes a value onto the stack, while pop pops it off

The syntax is push X and pop X

The operand X is a 16-bit value

You cannot push an 8-bit register on its own. An 8-bit constant can be pushed, but the processor will push a 16-bit value anyway
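A small sketch of the last in, first out behaviour: two registers are pushed and then popped in the same register order, which swaps their contents. The values are arbitrary.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov ax, 1111h
    mov bx, 2222h
    push ax         ; stack top = 1111h
    push bx         ; stack top = 2222h
    pop ax          ; AX = 2222h, the value pushed last comes out first
    pop bx          ; BX = 1111h
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```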
Memory Layout

You should know that register CS by default points to the segment where the code resides. DS points to the data segment. ES usually points to the data segment too. SS points to the stack segment. In a COM program, CS, DS, ES, and SS all point to the same segment, which means code, data, and stack reside in the same region
[Figure: one memory segment shared by the code, data, extra, and stack segments, pointed to by CS, DS, ES, and SS]
How can we manage this?

The stack is pointed to not only by the SS register, but also by the SP register

So, the pair SS:SP points to the top of the stack. Initially, SP is set to the very end of the segment in "tiny" mode, at address FFFEh

Each time we push something onto the stack, the SP register is decremented by 2. If we pop something, SP is incremented by 2

Meanwhile, our code and our data start at offset 100h
So, the layout looks something like this:
Application
Other Uses

Can we push a constant? On the 8086, NO. On the 80286 or above, YES. So, push 1 is treated as pushing a 16-bit value. No need to specify word ptr

A more useful application of push and pop is to push the flags and then pop them into a register. That way, we can examine the flag contents directly. Look at the following code:
pushf    ; top of stack <- flag register
pop ax   ; AX <- top of stack

Then we can examine the flag values in register AX. The net effect is the same as assigning the flags to AX

Likewise, you can set the flag values using push ax then popf
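As a sketch of examining one flag this way, the code below isolates the zero flag, which sits at bit 6 of the flag register (mask 0040h). The subtraction is only there to produce a known zero result.

```asm
ideal
p286n
model tiny
codeseg
org 100h
start:
    mov al, 5
    sub al, 5       ; the result is zero, so the zero flag is set
    pushf           ; top of stack <- flag register
    pop ax          ; AX <- copy of the flags
    and ax, 0040h   ; keep only bit 6, the zero flag
                    ; AX = 0040h here; it would be 0 if the zero flag were clear
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```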
Subroutines & Macros
Subroutine Syntax
More on Parameters & Local Variables
• Note that we can not initialize local variables
• Of course you can do a mov to assign it with a
value later on
• The parameters are passed down through
stack using push and pop
A Word of Caution
Since procedures are built with the help of the stack, you have to remember not to modify SP and BP carelessly inside a subroutine
That's because SP holds the current stack position and BP holds the stack position at entry to the subroutine
Moreover, when you modify certain registers in a subroutine, you are likely to interfere with the main program
How to cope with this situation then?
pusha ("push all"): stores (almost) all registers on the stack
popa ("pop all"): pops the saved values back into the appropriate registers
How About Functions?

Functions are subroutines that return some values

Usually, we designate registers to hold the output or result of our subroutine

Many programmers tend to choose AX for this purpose. If you have more than one output from the subroutine, you can select multiple registers to hold the results
Because of this, the output registers need not be saved or restored, since the caller itself expects those designated registers to change
Functions example
Let's make a subroutine to calculate 1+2+...+n
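A possible version of such a subroutine, sketched in TASM Ideal-mode proc syntax. Passing n in CX instead of on the stack is a simplification chosen for this example, and the name sum_to_n is made up.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start

; sum_to_n: calculates 1 + 2 + ... + n
; input:   CX = n (must be at least 1)
; output:  AX = the sum
; changes: CX (it is 0 on return)
proc sum_to_n
    mov ax, 0
add_next:
    add ax, cx      ; adds n, n-1, ..., 1
    loop add_next
    ret
endp sum_to_n

start:
    mov cx, 10
    call sum_to_n   ; AX = 55
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```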
Document a Subroutine
It is a good habit to document a subroutine. At least give a comment above it
Routine Placement
Macros
Notice:
• We use the macro and endm keywords instead
• We may not specify the parameter type
• There is no ret instruction at the end
• There is no call keyword
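A sketch of a macro with one parameter, written in TASM Ideal-mode syntax. The macro name print_char and its parameter name are made up for the illustration; each invocation is replaced by a fresh copy of the three instructions.

```asm
ideal
p286n
model tiny
codeseg
org 100h

; print_char: prints one character through interrupt 21h, service 06h
macro print_char character
    mov dl, character
    mov ah, 06h
    int 21h
endm

start:
    print_char 'H'  ; expands to the three instructions with 'H'
    print_char 'i'  ; a second, separate copy with 'i'
    mov ax, 4c00h   ; terminate the program
    int 21h
end
```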
Recap
The main differences (behavior-wise) are:

Macros use string replacement for their invocation whereas subroutines use calls
Due to this replacement, a macro can exist in multiple copies in the program whereas a subroutine exists in only one copy

Because of the possibility of multiple copies, you cannot obtain a macro's address, whereas you can obtain a subroutine's address

Macros can be faster since they have no call and return time penalty

Macros can be harder to debug
Arrays
Array Revisited

To refresh our mind, declaring a ten-byte array is like this:
To load the 1st element of the array into register al is like this:
Accessing the 2nd, the 3rd, and the 4th element is like this:
Memory (MM) contents of the array:
Address  Value
100      05
101      02
102      08
103      09
104      01
105      07
106      03
107      00
108      04
109      06
Access Array through a loop
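A sketch that declares the ten-byte array with the values from the memory picture, reads a few elements directly, and then walks the whole array in a loop to add it up. The name my_array and the summing task are assumptions for the illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
my_array db 5, 2, 8, 9, 1, 7, 3, 0, 4, 6
start:
    mov al, [my_array]      ; 1st element: AL = 5
    mov al, [my_array + 1]  ; 2nd element: AL = 2
    mov al, [my_array + 2]  ; 3rd element: AL = 8

    mov bx, offset my_array ; BX = base address of the array
    mov si, 0               ; SI = index of the current element
    mov al, 0               ; running sum
    mov cx, 10              ; number of elements
next_element:
    add al, [bx + si]       ; add element number SI
    inc si
    loop next_element
                            ; here AL = 45
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```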
Reverse array example
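One way to reverse an array, sketched by copying it into a second array back to front. The names source and dest are made up; SI walks forward through the source while DI walks backward through the destination.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
source db 5, 2, 8, 9, 1, 7, 3, 0, 4, 6
dest   db 10 dup (?)
start:
    mov si, offset source       ; SI = first element of the source
    mov di, offset dest + 9     ; DI = last element of the destination
    mov cx, 10                  ; number of elements
copy_next:
    mov al, [si]                ; read from the source
    mov [di], al                ; write to the destination
    inc si                      ; source moves forward
    dec di                      ; destination moves backward
    loop copy_next
                                ; dest = 6, 4, 0, 3, 7, 1, 9, 8, 2, 5
    mov ax, 4c00h               ; terminate the program
    int 21h
end
```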
Note:
BX is nicknamed the 'Base register'
SI is the 'Source Index'
DI is the 'Destination Index'
String Instructions

There are five basic groups of string instructions:
1. LES, LDS
2. MOVS
3. CMPS
4. SCAS
5. STOS, LODS
These instructions can be "emulated" with mov, cmp, loop and jmp. However, these five are faster since they are "built-in" instructions
LES DI and LDS SI

String instructions typically use the DS:SI pair to denote the source string and the ES:DI pair to denote the destination string

When the segment registers are already correct, the only thing we need to do is set the registers SI and DI to point to the source and destination offsets respectively
LES DI, [SomeStringVar]
LDS SI, [OtherStringVar]

These instructions set both ES and DI, or both DS and SI, respectively, by loading a far pointer (offset and segment) stored in a double-word variable
Direction Flag

After setting the source and/or destination register pairs, you may want to specify how the string instruction is performed: should it go backwards or forwards?

String instructions can work in both directions

Determining which way to go involves setting the direction flag. Intel x86 assembly has two instructions for this:
CLD ; Clear Direction Flag
STD ; Set Direction Flag
Clearing the direction flag causes the string instructions to run forward. Setting it makes them run in the reverse direction
MOVS

The instruction movs is used to copy the source string into the destination. This instruction comes in two variants: movsb (byte) and movsw (word)
Since we'd like to move several bytes at a time, these movs instructions are done in batches using the rep prefix. The number of repetitions is specified by the CX register
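A sketch of copying a 12-byte string with rep movsb. The names source and dest are made up. In a COM program ES already equals DS, so the push/pop pair is shown only to make the requirement visible.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
source db "Hello World!"    ; 12 characters
dest   db 12 dup (?)
start:
    push ds
    pop es                  ; ES = DS, so ES:DI addresses the same segment
    mov si, offset source   ; DS:SI = source string
    mov di, offset dest     ; ES:DI = destination string
    mov cx, 12              ; number of bytes to copy
    cld                     ; copy forward
    rep movsb               ; copy CX bytes, advancing SI and DI each time
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```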
CMPS

The instruction cmps is used to compare two strings. It also has two variants: cmpsb and cmpsw
After repe cmpsb (repeat while equal), the zero flag is set if the two strings are equal
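A sketch comparing two 8-byte strings that differ only in the last letter. The names and the use of BL as a yes/no result are assumptions; the code relies on ES being equal to DS, as it is when a COM program starts.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
str1 db "assembly"
str2 db "assemble"
start:
    mov si, offset str1     ; DS:SI = first string
    mov di, offset str2     ; ES:DI = second string
    mov cx, 8               ; number of bytes to compare
    cld                     ; compare forward
    repe cmpsb              ; stop at the first difference or when CX reaches 0
    je strings_equal        ; zero flag set: every byte matched
    mov bl, 0               ; different (taken here: 'y' versus 'e')
    jmp done
strings_equal:
    mov bl, 1               ; equal
done:
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```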
SCAS

The instruction scas is used to scan a string pointed to by ES:DI

It is typically used for searching for a particular character in a string

scas has two variants: scasb and scasw. In scasb, the string at ES:DI is searched for the occurrence of the element specified by the register AL, whereas in scasw, the element to be searched for is in AX
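A sketch that searches a string for the letter 'W' using the repne prefix (repeat while not equal). The string and the name text are made up; the code relies on ES being equal to DS, as it is when a COM program starts.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
text db "Hello World!"      ; 12 characters
start:
    mov di, offset text     ; ES:DI = string to scan
    mov cx, 12              ; number of bytes to scan
    mov al, 'W'             ; the character we are looking for
    cld                     ; scan forward
    repne scasb             ; stop at the first match or when CX reaches 0
    jne not_found           ; zero flag clear: no byte matched
    dec di                  ; DI stopped one past the match, step back
    sub di, offset text     ; DI = position of 'W' in the string = 6
not_found:
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```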
STOS

The stos instruction fills the string pointed to by the ES:DI pair with the value in AL or AX. So, it is great when you'd like to initialize arrays (usually with zeroes)

It has two variants: stosb and stosw. In stosb, each byte in the string at ES:DI is replaced with whatever AL contains. In stosw, each word is replaced with whatever AX contains
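A sketch that clears a ten-byte array with rep stosb. The array name is borrowed from the earlier dup example; the code relies on ES being equal to DS, as it is when a COM program starts.

```asm
ideal
p286n
model tiny
codeseg
org 100h
    jmp start
my_array db 10 dup (?)
start:
    mov di, offset my_array ; ES:DI = start of the array
    mov cx, 10              ; number of bytes to fill
    mov al, 0               ; the fill value
    cld                     ; fill forward
    rep stosb               ; store AL into CX consecutive bytes
    mov ax, 4c00h           ; terminate the program
    int 21h
end
```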
LODS
The lods instruction loads a chunk (either a byte or a word) from the string pointed to by DS:SI into AL or AX
It has two variants: lodsb and lodsw