Assembly Language - Kirkwood Community College

Assembly Language
part 1
Some terminology
• Assembly language: a low-level language
that is a little more human-friendly than
machine language
• assembler: a program that translates
assembly language source code into
executable form
• object code: machine language program
(assembler output)
Types of assemblers
• Resident assembler: an assembler written
for its own platform, using the native
instruction set
• Cross assembler: an assembler that runs on one
platform to produce object code for
another
• Disassembler: program that attempts to
recover source code from object code (not
100% successful)
Assembly language instructions
• Mnemonics: abbreviated words used instead of
machine language hex code;
– have one-to-one correspondence with underlying
instruction
– always possible to determine underlying machine
language statement from assembly language
mnemonic, but not vice-versa
• Pseudo-ops: assembly language statements
used mostly for data declaration; do not
correspond to specific machine language
instructions
Pep/8 assembly language
• General syntax notes:
– one instruction per line of code
– comments start with semicolon, continue until
end of line
– not case-sensitive
– Spacing:
• at least one space required after each instruction
(mnemonic or pseudo-op)
• otherwise doesn’t matter
– last line of program must be .END pseudo-op
Pep/8 Assembly Language
• Mnemonic instruction format:
– 2-6 letter instruction specifier (most are 3-4 letters)
– operand specifier, usually followed by a comma and
– 1-3 letter address mode specifier (most are 1)
• Examples:
LDA 0x0014,i ; load hex value 14 to A
LDX 0x1110,d ; load data at address 1110 into x
• Entire Pep/8 assembly language instruction set
is printed on the inside front cover (and on page
191) of your textbook
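As a rough sketch of how the two example instructions might assemble, the Python below packs an instruction specifier and a 16-bit operand specifier into three bytes. The opcode bases (0xC0 for LDA, 0xC8 for LDX) and the 3-bit addressing-mode codes are assumptions taken from the usual Pep/8 opcode table, and `encode` is a hypothetical helper, not part of any Pep/8 tool.

```python
# Assumed 3-bit addressing-mode codes (i=000 ... sxf=111).
MODES = {"i": 0, "d": 1, "n": 2, "s": 3, "sf": 4, "x": 5, "sx": 6, "sxf": 7}
# Assumed opcode bases: LDr is 1100 raaa, with r=0 for A and r=1 for X.
BASES = {"LDA": 0xC0, "LDX": 0xC8}


def encode(mnemonic, operand_spec, mode):
    """Return 3 bytes: the instruction specifier, then the operand specifier."""
    specifier = BASES[mnemonic.upper()] | MODES[mode.lower()]
    return bytes([specifier, (operand_spec >> 8) & 0xFF, operand_spec & 0xFF])


print(encode("LDA", 0x0014, "i").hex())  # c00014
print(encode("LDX", 0x1110, "d").hex())  # c91110
```

Only the low three bits of the first byte change with the addressing mode; the operand specifier is stored most significant byte first.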
Pep/8 Assembly Language
• Addressing mode specifiers:
– i: immediate
– d: direct
– n: indirect
– s: stack-relative
– sf: stack-relative deferred
– x: indexed
– sx: stack-indexed
– sxf: stack-indexed deferred
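To make the eight specifiers concrete, here is a minimal Python sketch of how an operand could be located in each mode. The formulas follow the usual Pep/8 definitions (for example, direct means the word at the address given by the operand specifier); the function names, the bytearray memory image, and the sample values are assumptions made for this illustration.

```python
def read_word(mem, addr):
    """Read a 16-bit word (most significant byte first) from memory."""
    return (mem[addr] << 8) | mem[addr + 1]


def operand(mode, spec, mem, sp=0, x=0):
    """Return the word operand selected by an addressing mode specifier."""
    if mode == "i":    # immediate: the operand specifier is the operand
        return spec
    if mode == "d":    # direct: operand is stored at address spec
        return read_word(mem, spec)
    if mode == "n":    # indirect: address spec holds the operand's address
        return read_word(mem, read_word(mem, spec))
    if mode == "s":    # stack-relative: offset from the stack pointer
        return read_word(mem, (sp + spec) & 0xFFFF)
    if mode == "sf":   # stack-relative deferred: the stack holds an address
        return read_word(mem, read_word(mem, (sp + spec) & 0xFFFF))
    if mode == "x":    # indexed: spec plus the index register
        return read_word(mem, (spec + x) & 0xFFFF)
    if mode == "sx":   # stack-indexed: stack offset plus index register
        return read_word(mem, (sp + spec + x) & 0xFFFF)
    if mode == "sxf":  # stack-indexed deferred: address on stack, plus index
        base = read_word(mem, (sp + spec) & 0xFFFF)
        return read_word(mem, (base + x) & 0xFFFF)
    raise ValueError(f"unknown addressing mode: {mode}")


mem = bytearray(32)
mem[0x0010:0x0012] = bytes([0x00, 0x14])  # word at 0x0010 holds 0x0014
mem[0x0014:0x0016] = bytes([0x12, 0x34])  # word at 0x0014 holds 0x1234

print(hex(operand("i", 0x0010, mem)))  # 0x10
print(hex(operand("d", 0x0010, mem)))  # 0x14
print(hex(operand("n", 0x0010, mem)))  # 0x1234
```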
Pep/8 Assembly Language
• Unimplemented opcodes
– instructions available at assembly language level,
even though they are not (directly) available at the
machine language level
– represent operations handled by the operating system
• They include:
– NOPn: unary no operation trap
– NOP: non-unary NOP
– DECI: decimal input trap
– DECO: decimal output trap
– STRO: string output trap
Pseudo-ops
• .ADDRSS: used to create labeled jump destinations
• .ASCII: specifies char string
• .BLOCK: allocates specified # of bytes, initializes whole
set to zero
• .BURN: used for OS configuration
• .BYTE: allocates one byte; can specify hex or decimal
content
• .END: stop code
• .EQUATE: equate symbol with literal value; like #define
in C/C++
• .WORD: allocates one word of memory
Example program 1
; Program example 1
CHARO 0x0010,d
CHARO 0x0011,d
CHARO 0x0012,d
CHARO 0x0013,d
CHARO 0x0014,d
STOP
.ASCII "Arrr!"
.END
• First line: comment
• CHARO lines and STOP: instructions; each CHARO outputs one
character; starting address of program
is 0000, and each instruction (except
STOP) is 3 bytes long; STOP is one
byte
• .ASCII line: data
Object code & assembler output
from program example 1
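The address arithmetic behind example 1 can be checked with a short Python sketch: five 3-byte CHARO instructions plus a 1-byte STOP put the first data byte at 0x0010. The opcode values (0x51 for CHARO with direct addressing, 0x00 for STOP) are assumptions from the usual Pep/8 opcode table, and the variable names are made up for the illustration.

```python
CHARO_DIRECT = 0x51  # assumed opcode: CHARO with direct addressing
STOP = 0x00          # assumed opcode: STOP

text = "Arrr!"
code_size = 3 * len(text) + 1  # five 3-byte CHAROs plus a 1-byte STOP
data_start = code_size         # the program is loaded at address 0000

object_code = bytearray()
for offset in range(len(text)):
    addr = data_start + offset  # address of the character to output
    object_code += bytes([CHARO_DIRECT, addr >> 8, addr & 0xFF])
object_code.append(STOP)
object_code += text.encode("ascii")  # the .ASCII data follows the code

print(f"data starts at 0x{data_start:04X}")  # data starts at 0x0010
print(" ".join(f"{b:02X}" for b in object_code))
```

Adding or removing one CHARO shifts every data address by 3, which is why hand-computed operands are fragile.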
Program example 2
BR 0x0008 ; go past data (minimizes offset calculations; contrast with data after code)
.ASCII "#?" ; “constant” declarations
.ASCII "\n"
.BLOCK 2 ; “variable” declaration
CHARO 0x0003,d ; output prompt
CHARO 0x0004,d
DECI 0x0006,d ; read number
CHARO 0x0005,d ; output newline character
DECO 0x0006,d ; output number
STOP
.END
Data declaration & storage
• The previous example included two
different types of data declaration
instructions:
– the .ASCII pseudo-op is used to allocate a
contiguous set of bytes large enough to hold
the specified data; as with Java & C++, the
backslash character (\) is used as the escape
character for control codes, like newline
– the .BLOCK pseudo-op allocates the specified
number of bytes and initializes their values to 0
Data declaration & storage
• Two other pseudo-ops provide data
storage:
– the .WORD instruction allocates two bytes,
suitable for storage of integers
– the .BYTE instruction allocates one byte,
suitable for storage of characters
– like .ASCII (and unlike .BLOCK), both of these
instructions allow the programmer to specify
initial values, as shown on next slide
Initialization examples
• .WORD 7 ; allocates 2 bytes, with decimal value 7
• .BYTE 0x2B ; allocates 1 byte, with hex value 2B (‘+’)
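A small Python sketch of the bytes these pseudo-ops would lay down in memory, assuming words are stored most significant byte first. The helper names `word`, `byte`, and `block` are hypothetical and exist only for this illustration.

```python
def word(value):
    """Bytes for .WORD: two bytes, most significant byte first."""
    return (value & 0xFFFF).to_bytes(2, "big")


def byte(value):
    """Bytes for .BYTE: a single byte."""
    return bytes([value & 0xFF])


def block(count):
    """Bytes for .BLOCK: count bytes, all initialized to zero."""
    return bytes(count)


print(word(7).hex())   # 0007
print(byte(0x2B))      # b'+'
print(block(2).hex())  # 0000
```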
I/O instructions
• The DECI and DECO instructions considerably
ease the process of reading and writing
numbers
• Each one deals with word-size data, and
represents an instruction not available in the
underlying machine language – thus they are
part of the set of unimplemented op codes
• The actual I/O is performed by the operating
system; the instructions generate program
interrupts that allow the OS to temporarily take
over to provide a service to the program
I/O instructions
• The CHARI and CHARO instructions are
simply assembly language versions of the
machine language input and output
instructions:
– read or write a byte of data
– data source (for output) and destination (for
input) are memory (not registers)
I/O instructions
• STRO is yet another example of an
unimplemented op code
• Outputs a string of data
– String can be predefined with the .ASCII
pseudo-op
– Predefined string must be terminated with a
null character: “\x00”
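Why the null terminator matters can be shown with a minimal Python sketch: a STRO-style routine keeps reading bytes until it meets a zero byte. The function `stro` and the sample memory image are assumptions for illustration, not the operating system's actual trap handler.

```python
def stro(mem, addr):
    """Return the characters from addr up to (not including) the next zero byte."""
    end = mem.index(0, addr)  # position of the terminating null
    return mem[addr:end].decode("ascii")


terminated = b"Arrr!\x00next\x00"
unterminated = b"Arrr!next\x00"  # first string lacks its "\x00"

print(stro(terminated, 0))    # Arrr!
print(stro(unterminated, 0))  # Arrr!next
```

Without the terminator, output runs on into whatever bytes follow the string.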
Arranging instructions and data
• In the first program example (see Monday’s
notes), as with all of the machine language
examples, instructions were placed first, ended
with a STOP code, and data followed
• Problems with this approach:
– requires address calculations based on the number of
instructions (which may not be known as you’re
writing a particular instruction)
– addresses may have to be adjusted if even minor
changes are made to the program
Putting the data first
• An easy solution to the problems
described on the previous slide was
illustrated by the program example; the
solution is twofold:
– declare the data first
– place an unconditional branch instruction at
the beginning of the program, pointing to the
first instruction after the data
• The following example provides another
illustration
Program example 3
br 0x0020 ; bypass data
.block 4 ; space for 2 ints
.ascii "Enter a number: \x00"
.ascii " + \x00"
.ascii " = \x00"
stro 0x0007,d ; prompt
deci 0x0003,d ; get 1st number
stro 0x0007,d ; prompt
deci 0x0005,d ; get 2nd number
deco 0x0003,d ; output 1st number
stro 0x0018,d ; output ascii string " + "
deco 0x0005,d ; output 2nd number
stro 0x001c,d ; output string " = "
lda 0x0003,d ; put the first # in A
adda 0x0005,d ; add 2nd # to first
sta 0x0003,d ; store sum
deco 0x0003,d ; output sum
stop
.end
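The hard-coded addresses in example 3 can be reproduced by adding up the size of each item in order. The Python below is a sketch under that assumption; the item names are invented labels for the unlabeled data in the listing.

```python
def layout(items, start=0):
    """Map each item name to its start address, given (name, size) pairs."""
    addresses = {}
    addr = start
    for name, size in items:
        addresses[name] = addr
        addr += size
    addresses["code"] = addr  # first instruction after the data
    return addresses


items = [
    ("br", 3),                                # the 3-byte branch instruction
    ("ints", 4),                              # .block 4
    ("prompt", len("Enter a number: ") + 1),  # +1 for the \x00 terminator
    ("plus", len(" + ") + 1),
    ("equals", len(" = ") + 1),
]

for name, addr in layout(items).items():
    print(f"{name:7s} 0x{addr:04X}")
```

The second integer sits two bytes into the block, at 0x0005, and the computed `code` address matches the operand of the opening `br`.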
Program example 4: using labels
br code
pirate: .ASCII "Arrr!\x00"
code: stro pirate ,d
STOP
.END
Symbols
• Symbols are assembler names for memory
addresses
• Can be used to label data or instructions
• Syntax rules:
– start with letter
– contain letters & digits
– 8 characters max
– CASE sensitive
• Define by placing symbol label at start of line,
followed by colon
Symbol Table
• Assembler stores labels & corresponding
addresses in lookup table called symbol
table
• Value of symbol is the memory address of the 1st
byte (of data or instruction)
• Symbol table only stores label & address,
not nature of what is stored
• Instruction can still be interpreted as data,
& vice versa
Example
Example program:
this: deco this, d
stop
.end
Output:
14592
What happened?
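One plausible explanation: the symbol `this` is address 0000, so DECO prints the word stored there, which is the first two bytes of the DECO instruction itself. The Python sketch below assumes the opcode for DECO with direct addressing is 0x39 (from the usual Pep/8 opcode table); the variable names are made up.

```python
DECO_DIRECT = 0x39  # assumed opcode: DECO with direct addressing

# "this: deco this, d" assembles to 39 00 00, followed by STOP (00).
memory = bytes([DECO_DIRECT, 0x00, 0x00, 0x00])

this = 0x0000  # value of the symbol: address of the instruction
value = int.from_bytes(memory[this:this + 2], "big", signed=True)
print(value)  # 14592
```

The word 0x3900 is 57 * 256 = 14592: the instruction was interpreted as data.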
High Level Languages & Compilers
• Compilers translate high level language
code into low level language; may be:
– machine language
– assembly language
– for the latter an additional translation step is
required to make the program executable
C++/Java example
// C++ code:
#include <iostream>
#include <string>
using namespace std;
string greeting = "Hello world";
int main ()
{
    cout << greeting << endl;
    return 0;
}
// Java code:
public class Hello {
    static String greeting = "Hello world";
    public static void main (String [] args) {
        System.out.print(greeting);
        System.out.print('\n');
    }
}
Assembly language (approximate)
equivalent
br main
greeting: .ASCII "Hello world \x00"
main: stro greeting, d
charo '\n', i
stop
.end
Data types
• In a high level language, such as Java or C++,
variables have the following characteristics:
– Name
– Value
– Data Type
• At a lower level (assembly or machine
language), a variable is just a memory location
• The compiler generates a symbol table to keep
track of high level language variables
Symbol table entries
The illustration above shows a snippet of output
from the Pep/8 assembler. Each symbol table
entry includes:
• the symbol
• the value (of the symbol’s start address)
• the type (.ASCII in this case)
Pep/8 Branching instructions
• We have already seen the use of BR, the
unconditional branch instruction
• Pep/8 also includes 8 conditional branch
instructions; these are used to create
assembly language control structures
• These instructions are described on the
next couple of slides
Conditional branching instructions
• BRLE:
– branch on less than or equal
– how it works: if N or Z is 1, PC = operand
• BRLT:
– branch on less than
– how it works: if N is 1, PC = operand
• BREQ:
– branch on equal
– how it works: if Z is 1, PC = operand
• BRNE:
– branch on not equal
– how it works: if Z is 0, PC = operand
Conditional branching instructions
• BRGE:
– branch on greater than or equal
– if N is 0, PC = operand
• BRGT:
– branch on greater than
– if N and Z are 0, PC = operand
• BRV:
– branch if overflow
– if V is 1, PC = operand
• BRC:
– branch if carry
– if C is 1, PC = operand
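The eight conditions can be summarized as tests on the status bits. The Python below is a minimal sketch of that table; `BRANCH_TESTS` and `branch_taken` are hypothetical names, and the sample bits are chosen by hand rather than produced by a real comparison.

```python
# Each entry maps a mnemonic to its test on the N, Z, V, C status bits.
BRANCH_TESTS = {
    "BRLE": lambda n, z, v, c: n == 1 or z == 1,
    "BRLT": lambda n, z, v, c: n == 1,
    "BREQ": lambda n, z, v, c: z == 1,
    "BRNE": lambda n, z, v, c: z == 0,
    "BRGE": lambda n, z, v, c: n == 0,
    "BRGT": lambda n, z, v, c: n == 0 and z == 0,
    "BRV": lambda n, z, v, c: v == 1,
    "BRC": lambda n, z, v, c: c == 1,
}


def branch_taken(mnemonic, n=0, z=0, v=0, c=0):
    """Return True if the branch would set PC to its operand."""
    return BRANCH_TESTS[mnemonic.upper()](n, z, v, c)


# Status bits after a result that is negative and nonzero (N=1, Z=0).
taken = [m for m in BRANCH_TESTS if branch_taken(m, n=1, z=0)]
print(taken)  # ['BRLE', 'BRLT', 'BRNE']
```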
Example
HLL code:
int num;
Scanner kb = new Scanner(System.in);
System.out.print("Enter a number: ");
num = kb.nextInt();
if (num < 0)
num = -num;
System.out.print(num);
Pep/8 code:
br main
num: .block 2
prompt: .ascii "Enter a number: \x00"
main: stro prompt, d
deci num, d
lda num, d
brge endif
lda num, d
nega ; negate value in a
sta num, d
endif: deco num, d
stop
.end
Analysis of example
Pep/8 code:
br main
num: .block 2
prompt: .ascii "Enter a number: \x00"
main: stro prompt, d
deci num, d
lda num, d
brge endif
lda num, d
nega ; negate value in a
sta num, d
endif: deco num, d
stop
.end
• brge endif: the if statement, if
translated back to Java,
would now be more like:
if (num >= 0);
else
num = -num;
• Second lda num, d: this part requires a little
more explanation; see
next slide
Analysis continued
• A compiler must be programmed to translate
assignment statements; a reasonable translation
of x = 3 might be:
– load a value into the accumulator
– evaluate the expression
– store result to variable
• In the case above (and in the assembly
language code on the previous page),
evaluation of the expression isn’t necessary,
since the initial value loaded into A is the only
value involved (the second load is really the
evaluation of the expression)
Compiler types and efficiency
• An optimizing compiler would perform the
necessary source code analysis to recognize
that the second load is extraneous
– advantage: end product (executable code) is shorter
& faster
– disadvantage: takes longer to compile
• So, an optimizing compiler is good for producing
the end product, or a product that will be
executed many times (for testing); a non-optimizing
compiler, because it does the
translation quickly, is better for mid-development
Another example
HLL code:
final int limit = 100;
int num;
System.out.print("Enter a #: ");
if (num >= limit)
    System.out.print("high");
else
    System.out.print("low");
Pep/8 code:
br main
limit: .equate 100
num: .block 2
high: .ascii "high\x00"
low: .ascii "low\x00"
prompt: .ascii "Enter a #: \x00"
main: stro prompt, d
deci num, d
if: lda num, d
cpa limit, i
brlt else
stro high, d
br endif
else: stro low, d
endif: stop
.end
Compare instruction: CPr, where
r is a register (A or X):
action is the same as SUBr except that the
difference (result) isn’t stored in
the register; it just sets the status bits.
If N or Z is 1, <= is true
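A minimal Python sketch of the compare step, assuming 16-bit arithmetic and ignoring the V and C bits for simplicity: the register is left alone and only N and Z are derived from the difference. `compare` and `classify` are hypothetical helpers mirroring the `cpa limit, i` / `brlt else` pair in the example.

```python
def compare(register, operand):
    """Return (N, Z) for register - operand as a 16-bit result."""
    diff = (register - operand) & 0xFFFF
    return diff >> 15, int(diff == 0)  # N is the sign bit; Z flags zero


def classify(num, limit=100):
    n, z = compare(num, limit)          # cpa limit, i
    return "low" if n == 1 else "high"  # brlt else


print(compare(42, 100), classify(42))    # (1, 0) low
print(compare(100, 100), classify(100))  # (0, 1) high
print(compare(150, 100), classify(150))  # (0, 0) high
```

A register value less than or equal to the operand shows up as N = 1 or Z = 1.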
Writing loops in assembly language
• As we have seen, an if or if/else structure in
assembly language involves a comparison and
then a (possible) branch forward to another
section of code
• A loop structure is actually more like its high
level language equivalent; for a while loop, the
algorithm is:
– perform comparison; branch forward if condition isn’t
met (loop ends)
– otherwise, perform statements in loop body
– perform unconditional branch back to comparison
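The three steps above can be sketched in Python with an explicit "current label" standing in for the program counter. This is only an illustration of the branch pattern; the function `count_down` and the label names are made up.

```python
def count_down(n):
    """Collect n, n-1, ..., 1 using branch-style control flow."""
    visited = []
    label = "test"
    while label != "end":
        if label == "test":
            # comparison; branch forward if the condition isn't met
            label = "body" if n > 0 else "end"
        elif label == "body":
            visited.append(n)  # statements in the loop body
            n -= 1
            label = "test"     # unconditional branch back to comparison
    return visited


print(count_down(3))  # [3, 2, 1]
print(count_down(0))  # []
```

Because the test comes first, the body may run zero times.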
Example
• The following example shows a C++
program (because this is easier to
demonstrate in C++ than in Java) that
performs the following algorithm:
– prompt for input (of a string)
– read one character
– while (character != end character (‘*’))
• write out character
• read next character
C++ code
#include <iostream>
using namespace std;
int main ()
{
    char ch;
    cout << "Enter a line of text, ending with *" << endl;
    cin.get (ch);
    while (ch != '*')
    {
        cout << ch;
        cin.get(ch);
    }
    return 0;
}
Pep/8 code
;am3ex5
br main
ch: .block 1
prompt: .ascii "Enter a line of text ending with *\n\x00"
main: stro prompt, d
chari ch, d ; initial read
lda 0x0000, i ; clear accumulator
while: ldbytea ch, d ; load ch into A
cpa '*', i
breq endW
charo ch, d
chari ch, d ; read next letter
br while
endW: stop
.end
Do/while loop
• Post-test loop: condition test occurs after
iteration
• Premise of sample program:
– cop is sitting at a speed trap
– speeder drives by
– within 2 seconds, cop starts following, going 5
meters/second faster
– how far does the cop travel before catching
up with the speeder?
C++ version of speedtrap
#include <iostream>
using namespace std;
int main()
{
    int copDistance = 0; // cop is sitting still
    int speeder; // speeder's speed: entered by user
    int speederDistance; // distance speeder travels from cop's position
    cout << "How fast is the driver going? (Enter whole #): ";
    cin >> speeder;
    speederDistance = speeder;
    do
    {
        copDistance += speeder + 5;
        speederDistance += speeder;
    } while (copDistance < speederDistance);
    cout << "Cop catches up to speeder in " << copDistance << " meters." << endl;
    return 0;
}
Pep/8 version
;speedTrap
br main
cop: .block 2
drvspd: .block 2
drvpos: .block 2
prompt: .ascii "How fast is the driver going? (Enter whole #): \x00"
outpt1: .ascii "Cop catches up to speeder in \x00"
outpt2: .ascii " meters\n\x00"
main: lda 0, i
sta cop, d
stro prompt, d
deci drvspd, d ; cin >> speeder;
ldx drvspd, d ; speederDistance = speeder;
stx drvpos, d
Pep/8 version continued
do: lda 5, i ; copDistance += speeder + 5;
adda drvspd, d
adda cop, d
sta cop, d
addx drvspd, d ; speederDistance += speeder;
stx drvpos, d
while: lda cop, d ; while (copDistance < speederDistance);
cpa drvpos, d
brlt do
stro outpt1, d
deco cop, d
stro outpt2, d
stop
.end
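For checking the expected output by hand, the same post-test loop can be sketched in Python, which has no do/while and so uses `while True` with a test at the bottom. The function name `cop_distance` is an assumption; the arithmetic follows the C++ version.

```python
def cop_distance(speeder):
    """Distance the cop travels before catching up, per the speed-trap loop."""
    cop = 0
    speeder_distance = speeder
    while True:
        cop += speeder + 5           # copDistance += speeder + 5;
        speeder_distance += speeder  # speederDistance += speeder;
        if not cop < speeder_distance:  # condition tested after the body
            break
    return cop


print(cop_distance(10))  # 30
print(cop_distance(20))  # 100
```

The body always runs at least once, which is the defining property of a post-test loop.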
For loops
• For loop is just a count-controlled while
loop
• Next example illustrates nested for loops
C++ version
#include <iostream>
using namespace std;
int main()
{
    int x,y;
    for (x=0; x < 4; x++)
    {
        for (y = x; y > 0; y--)
            cout << "* ";
        cout << endl;
    }
    return 0;
}
Pep/8 version
;nestfor
br main
x: .word 0x0000
y: .word 0x0000
main:
sta x, d
stx y, d
outer:
adda 1, i
cpa 5, i
breq endo
sta x, d
ldx x, d
inner:
charo '*', i
charo ' ', i
subx 1, i
cpx 0, i
brne inner
charo '\n', i
br outer
endo: stop
.end
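Tracing the two listings as written suggests they do not print quite the same thing. The Python sketch below mirrors each one, assuming both registers start at zero in the Pep/8 version; the function names are made up. The inner loop of the Pep/8 listing tests after printing, so every row has at least one star.

```python
def cpp_rows():
    """Rows printed by the C++ version: x runs 0..3, inner loop is pre-test."""
    return ["* " * x for x in range(4)]


def pep8_rows():
    """Rows printed by the Pep/8 listing, traced instruction by instruction."""
    rows = []
    a = 0                 # accumulator assumed to start at zero
    while True:
        a += 1            # adda 1, i
        if a == 5:        # cpa 5, i / breq endo
            break
        x = a             # sta x, d / ldx x, d
        row = ""
        while True:       # inner: body runs before the test
            row += "* "   # charo '*', i / charo ' ', i
            x -= 1        # subx 1, i
            if x == 0:    # cpx 0, i / brne inner
                break
        rows.append(row)  # charo '\n', i
    return rows


print(cpp_rows())   # ['', '* ', '* * ', '* * * ']
print(pep8_rows())  # ['* ', '* * ', '* * * ', '* * * * ']
```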
Notes on control structures
• It’s possible to create “control structures”
in assembly language that don’t exist at a
higher level
• Your text describes such a structure,
illustrated and explained on the next slide
A control structure not found in
nature
• Condition C1 is tested; if true,
branch to middle of loop (S3)
• After S3 (however you happen to
get there – via branch from C1 or
sequentially, from S2) test C2
• If C2 is true, branch to top of loop
• No way to do this in C++ or Java
(at least, not without the dreaded
goto statement)
High level language programs vs.
assembly language programs
• If you’re talking about pure speed, a program in
assembly language will almost always beat one
that originated in a high level language
• Assembly and machine language programs
produced by a compiler are almost always
longer and slower
• So why use high level languages (besides the
fact that assembly language is a pain in the
patoot)?
Why high level languages?
• Type checking:
– data types sort of exist at low level, but the
assembler doesn’t check your syntax to
ensure you’re using them correctly
– can attempt to DECO a string, for example
• Encourages structured programming
Structured programming
• Flow of control in program is limited to
nestings of if/else, switch, while, do/while
and for statements
• Overuse of branching instructions leads to
spaghetti code
Unstructured branching
• Advantage: can lead to faster, smaller programs
• Disadvantage: Difficult to understand
– and debug
– and maintain
– and modify
• Structured flow of control is newer idea than
branching; a form of branching with gotos by
another name lives on in the Java/C++
switch/case structure
Evolution of structured
programming
• First widespread high level language was FORTRAN; it
introduced a new conditional branch statement:
if (expression) GOTO new location
• Considered improvement over assembly language –
combined CPr and BR statements
• Still used opposite logic:
if (expression not true) branch else
// if-related statements here
branch past else
else:
// else-related statements here
destination for if branch
Block-structured languages
• ALGOL-60 (introduced in 1960 – hey, me
too) featured first use of program blocks
for selection/iteration structures
• Descendants of ALGOL include C, C++,
and Java
Structured Programming Theorem
• Any algorithm containing GOTOs can be written
using only nested ifs and while loops (proven
back in 1966)
• In 1968, Edsger Dijkstra wrote a famous letter to
the editor of Communications of the ACM
entitled “Go To Statement Considered Harmful” – considered
the structured programming manifesto
• It turns out that structured code is less
expensive to develop, debug and maintain than
unstructured code – even factoring in the cost of
additional memory requirements and execution
time