Download lesson2 - USF Computer Science Department

IA32 programming for Linux
Concepts and requirements for
writing Linux assembly language
programs for Pentium CPUs
A source program’s format
• Source-file: a pure ASCII-character textfile
• Is created using a text-editor (such as ‘vi’)
• You cannot use a ‘word processor’ (why?)
• Program consists of series of ‘statements’
• Each program-statement fits on one line
• Program-statements all have same layout
• Design in 1950s was for IBM punch-cards
Statement Layout (1950s)
• Each ‘statement’ consisted of four ‘fields’
• Fields appear in a prescribed left-to-right order
• These four fields were named (in order):
-- the ‘label’ field
-- the ‘opcode’ field
-- the ‘operand’ field
-- the ‘comment’ field
• In many cases some fields could be left blank
• Extreme case (very useful): whole line is blank!
The ‘as’ program
• The ‘assembler’ is a computer program
• It accepts a specified text-file as its input
• It must be able to ‘parse’ each statement
• It can produce onscreen ‘error messages’
• It can generate an ELF-format output file
• (That file is known as an ‘object module’)
• It can also generate a ‘listing file’ (optional)
The ‘label’ field
• A label is a ‘symbol’ followed by a colon (‘:’)
• The programmer invents his own ‘symbols’
• Symbols can use letters and digits, plus a very
small number of ‘special’ characters ( ‘.’, ‘_’, ‘$’ )
• A ‘symbol’ is allowed to be of arbitrary length
• The Linux assembler (‘as’) was designed for
translating source-text produced by a high-level
language compiler (such as ‘cc’)
• But humans can also write such files directly
The ‘opcode’ field
• Opcodes are predefined symbols that are
recognized by the GNU assembler
• There are two categories of ‘opcodes’
(called ‘instructions’ and ‘directives’)
• ‘Instructions’ represent operations that the
CPU is able to perform (e.g., ‘add’, ‘inc’)
• ‘Directives’ are commands that guide the
work of the assembler (e.g., ‘.globl’, ‘.int’)
Instructions vs Directives
• Each ‘instruction’ gets translated by ‘as’
into a machine-language statement that
will be fetched and executed by the CPU
when the program runs (i.e., at ‘runtime’)
• Each ‘directive’ modifies the behavior of
the assembler (i.e., at ‘assembly time’)
• With GNU assembly language, they are
easy to distinguish: directives begin with ‘.’
A list of the Pentium opcodes
• An ‘official’ list of the instruction codes can
be found in Intel’s programmer manuals:
http://developer.intel.com
• But it’s three volumes, nearly 1000 pages
(it describes ‘everything’ about Pentiums)
• An ‘unofficial’ list of (most) Intel instruction
codes can fit on one sheet, front and back:
http://www.jegerlehner/intel/
The AT&T syntax
• The GNU assembler uses AT&T syntax
(instead of official Intel/Microsoft syntax)
so the opcode names differ slightly from
names that you will see on those lists:
Intel-syntax    AT&T-syntax
------------    --------------
ADD             addb/addw/addl
INC             incb/incw/incl
CMP             cmpb/cmpw/cmpl
The UNIX culture
• Linux is intended to be a version of UNIX (so
that UNIX-trained users already know Linux)
• UNIX was developed at AT&T (in the early 1970s)
and AT&T’s computers were built by DEC, thus
UNIX users learned DEC’s assembly language
• Intel was an early ally of DEC’s competitor, IBM,
which deliberately used ‘incompatible’ designs
• Also: an ‘East Coast’ versus ‘West Coast’ thing
(California, versus New York and New Jersey)
Bytes, Words, Longwords
• CPU Instructions usually operate on data-items
• Only certain sizes of data are supported:
BYTE: one byte consists of 8 bits
WORD: consists of two bytes (16 bits)
LONGWORD: uses four bytes (32 bits)
• With AT&T’s syntax, an instruction’s name also
incorporates its effective data-size (as a suffix)
• With Intel syntax, data-size usually isn’t explicit,
but is inferred by context (i.e., from operands)
The ‘operand’ field
• Operands can be of several types:
-- a CPU register may hold the datum
-- a memory location may hold the datum
-- an instruction can have ‘built-in’ data
-- frequently there are multiple data-items
-- and sometimes there are no data-items
• An instruction’s operands usually are ‘explicit’,
but in a few cases they also could be ‘implicit’
Examples of operands
• Some instructions that have two operands:
    movl   %ebx, %ecx
    addl   $4, %esp
• Some instructions that have one operand:
    incl   %eax
    pushl  $fmt
• An instruction that lacks explicit operands:
    ret
The ‘comment’ field
• An assembly language program often can
be hard for a human being to understand
• Even a program’s author may not be able
to recall his programming idea after a while
• So programmer ‘comments’ can be vital
• A comment begins with the ‘#’ character
• The assembler disregards all comments
(but they will appear in program listings)
‘Directives’
• Sometimes called ‘pseudo-instructions’
• They tell the assembler what to do
• The assembler will recognize them
• Their names begin with a dot (‘.’)
• Examples: ‘.section’, ‘.global’, ‘.int’, …
• The names of valid directives appear in
the table-of-contents of the GNU manual
New program example
• Let’s look at a demo program (‘squares.s’)
• It prints out a mathematical table showing some
numbers and their squares
• But it doesn’t use any multiplications!
• It uses an algorithm based on algebra:
(n+1)^2 - n^2 = n + n + 1
If you already know the square of a given
number n, you can get the square of the
next number n+1 by just doing additions
Visualizing the algorithm idea
[Figure: diagram with two sides labeled n]
(n + 1)^2 = n^2 + 2n + 1
A program with a ‘loop’
• Here’s our program idea (expressed in C)
int num = 1, val = 1;
do {
    printf( " %d %d \n", num, val );
    val += num + num + 1;
    num += 1;
}
while ( num <= 20 );
Some new ‘directives’
• ‘.equ’ – equates a symbol to a value:
    .equ    MAX, 20
• ‘.globl’ – just an alternative for ‘.global’:
    .globl  main
Some new ‘instructions’
• ‘inc’ – adds one to the specified operand:
    incl    arg
• ‘cmp’ – compares two specified operands:
    cmpl    $MAX, arg
• ‘jle’ – jump (to a specified instruction) if
condition ‘less than or equal to’ is true:
    jle     again
Comparisons can be ‘tricky’
• It’s easy to get confused by AT&T syntax:
        mov $5, %eax
while:
        inc %eax
        cmp $5, %eax
        jle while
(e.g., will this loop ever finish executing?)
• REMEMBER: ‘compare’ means ‘subtract’
The FLAGS register
[Figure: FLAGS register bit fields, left to right]
OF | DF | IF | TF | SF | ZF | 0 | AF | 0 | PF | 1 | CF
Legend:
ZF = Zero Flag
SF = Sign Flag
CF = Carry Flag
PF = Parity Flag
OF = Overflow Flag
AF = Auxiliary Flag
In-class exercise #1
• How would you modify the source-code for
the ‘squares’ program so that it prints out a
larger table (i.e., more than 20 lines)?
• How many squares can you display on the
screen before your program starts to show
‘wrong’ entries?
In-class exercise #2
• Can you write a program that prints out a
table showing powers of 2? (It’s useful for
computer science students to keep handy)
• Can you see how to do it without using
any ‘multiply’ operations – just additions?
• Hint: study the ‘squares.s’ source-code
• Then write your own ‘powers.s’ solution
• Turn in printouts (source and its output)