PROGRAMMING
I. Programming
A. Options
1. C/C++
2. Linear Assembly
3. Assembly
4. Mixed: C/Assembly, Assembly/Assembly, etc
B. Choice depends on real-time requirements and optimization achieved with each language
– will look at this in next chapter.
II. Assembly Language Programming
A. Basics
1. Format
label    ||    [condition]    instruction    unit    operands    ; comments
A label identifies a line of code or a variable and represents a memory address that
contains either an instruction or data.
Ex.
x_addr    .short 1,2,3,4    ; numbers in x array
(x_addr can be used as a pointer)
Labels must meet the following conditions:
The first character must be a letter or an underscore ( _ )
The first character must be in the first column
(All other elements of the line cannot be in the first column.)
Labels can include up to 32 alphanumeric characters.
An instruction that executes in parallel with the previous instruction is signified by
parallel bars (||).
[ ] Represents a condition
A1, A2, B0, B1, B2 are available for use as conditional registers.
[A2] means that the following instruction should execute if A2 is not zero.
[!A2] means the instruction should execute if A2 is zero.
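The conditional rule above can be sketched in C. This is only a model of the behavior described in these notes, not TI code; the function name cond_add and its parameters are hypothetical.

```c
#include <stdint.h>

/* Model of a conditionally executed ADD (hypothetical helper).
 * cond_reg : value of the condition register, e.g. A2
 * negate   : 0 models [A2], nonzero models [!A2]
 * Returns the new value of dst. When the condition fails, dst is
 * returned unchanged, i.e. the instruction has no effect. */
int32_t cond_add(int32_t cond_reg, int negate,
                 int32_t src1, int32_t src2, int32_t dst)
{
    int execute = negate ? (cond_reg == 0) : (cond_reg != 0);
    if (!execute)
        return dst;
    /* unsigned arithmetic avoids signed-overflow undefined behavior */
    return (int32_t)((uint32_t)src1 + (uint32_t)src2);
}
```

With cond_reg = 0, a call with negate = 0 leaves dst alone, while negate = 1 performs the add.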
Instructions are either directives or mnemonics:
- Directives are commands for the assembler that control the assembly process or
define the data structures (constants and variables) in the program. All assembler
directives begin with a period.
- Mnemonics are the actual microprocessor instructions that execute at runtime
and perform the operations in the program.
The functional unit in the assembly code is optional. The functional unit can be used
to document which resource each instruction uses and can be used to optimize
performance.
Operands – three types:
· Register operands indicate a register that contains the data.
· Constant operands specify the data within the assembly code.
· Pointer operands contain addresses of data values.
Only the load and store instructions require and use pointer operands to move data
values between memory and a register.
Instructions have the following requirements for operands in the assembly code:
· The destination operand must be in the same register file as one source operand.
· In an execute packet, for a functional unit on each side, one source operand can
come from the opposite register file.
2. Assembler directives – commands for the assembler that mark assembly code
sections and declare data types.
a. Directives that define sections
The smallest unit of an object file is called a section.
A section is a block of code or data that occupies space in the memory map with
other sections.
There are two basic types of sections:
Initialized sections contain data or code. Ex: .text, .data, .sect
Uninitialized sections reserve space in the memory map for uninitialized
data. It is a place for creating and storing variables during execution.
Ex: .bss, .usect
Object files always contain three default sections:
.text section - contains executable code
.data section - contains initialized data
.bss section - reserves space for uninitialized variables
The assembler allows you to create sections through directives.
.text – to declare the following section as program (not needed but for memory
allocation)
.data – to declare the following section as data
.sect “name” – defines a section of code or data named ‘name’ and associates
subsequent code or data with that section. ex: mydata
.bss symbol, size in bytes – reserves size bytes in the .bss (uninitialized data) section
symbol .usect “section name”, size in bytes - reserves space in an uninitialized
named section. The .usect directive is similar to the .bss directive, but it
allows you to reserve space separately from the .bss section.
b. Directives that reference or define files
.def symbol - identifies a symbol that is defined in the current module
and that can be used in another module. Defines a routine.
.ref symbol - identifies a symbol that is used in the current module but
is defined in another module. Used for a subroutine called in the main
program.
.global symbol - declares a symbol external so that it is available to other
modules at link time. The .global directive does double duty, acting as a .def
for defined symbols and as a .ref for undefined symbols.
A symbol can be declared global for either of two reasons:
- If the symbol is not defined in the current module, the .global or .ref directive
tells the assembler that the symbol is defined in an external module.
- If the symbol is defined in the current module, the .global or .def directive
declares that the symbol and its definition can be used externally by other
modules.
c. Directives that initialize constants (data and memory)
.byte value – Initializes a byte in memory. Reserves 8 bits in memory and fills it
with the specified value
.short value - Initializes a 16-bit integer. 2s complement/binary (-2^15 to 2^15-1)
.float value - Initializes a 32-bit floating-point constant
.int value or .word value – Initializes a 32-bit integer -2^31 to 2^31-1
.double value - Initializes a 64-bit constant in memory
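The sizes and integer ranges listed above can be captured in a small C lookup. This is an illustrative sketch only; the names directive_size_bytes, fits_short and fits_int are hypothetical and are not part of the assembler.

```c
#include <stdint.h>
#include <string.h>

/* Storage size of each initializing directive, as listed in the notes. */
struct directive_info { const char *name; int bytes; };

static const struct directive_info DIRECTIVES[] = {
    { ".byte", 1 }, { ".short", 2 }, { ".int", 4 },
    { ".word", 4 }, { ".float", 4 }, { ".double", 8 },
};

/* Returns the size in bytes, or -1 for a directive not in the table. */
int directive_size_bytes(const char *name)
{
    size_t n = sizeof DIRECTIVES / sizeof DIRECTIVES[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(DIRECTIVES[i].name, name) == 0)
            return DIRECTIVES[i].bytes;
    return -1;
}

/* .short holds a 16-bit two's-complement value: -2^15 to 2^15-1 */
int fits_short(int64_t v) { return v >= INT16_MIN && v <= INT16_MAX; }

/* .int / .word hold a 32-bit value: -2^31 to 2^31-1 */
int fits_int(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }
```

For example, 32768 does not fit in a .short, so it would need .int or .word.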
d. Directives that define symbols at assembly time
symbol .equ value - Equates a constant value to a symbol
symbol .set value
*The symbol is a label that must appear in the label field.
3. Mnemonics – Instructions
a. Location: in CCS: Help > Contents > Instruction Set Summary
Each instruction provides the following:
Syntax: ADD (.unit) src1, src2, dst
.unit = .L1, .L2, .S1, .S2
src and dst indicate source and destination
src – what is being operated on
dst – the result
Operand table:
Instruction    src1    src2     dst     Unit
ADD            sint    xsint    sint    .L1, .L2
Tells what type of operands can be used.
s means signed; u means unsigned.
Any operand that begins with x can be read from a register file that is
different from the destination register file – this is called a cross path.
Instruction type and delay slots. Useful for pipelining/constraints.
[Functional latency for a given instruction type can be found in CCS:
Help > CPU Reference Guide > ‘C67x Pipeline > Pipeline Execution of Instruction
Types]
b. Add/Subtract/Multiply
ADD .L1 A3, A7, A7 ; add A3+A7 -> A7
SUB .S1 A1, 1, A1 ; A1-1 -> A1 using the S unit
MPY .M2 A7,B7,B6 ;multiply 16LSBs of A7,B7 -> B6
|| MPYH .M1 A7,B7,A6 ;multiply 16MSBs of A7,B7 -> A6
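What MPY and MPYH compute can be modeled in C as below. This is a sketch of the semantics only, treating both 16-bit halves as signed; the function names mpy and mpyh are hypothetical stand-ins for the instructions.

```c
#include <stdint.h>

/* MPY: multiply the signed 16 LSBs of each source register.
 * The narrowing casts rely on the usual two's-complement conversion. */
int32_t mpy(uint32_t src1, uint32_t src2)
{
    int16_t lo1 = (int16_t)(src1 & 0xFFFFu);
    int16_t lo2 = (int16_t)(src2 & 0xFFFFu);
    return (int32_t)lo1 * (int32_t)lo2;
}

/* MPYH: multiply the signed 16 MSBs of each source register. */
int32_t mpyh(uint32_t src1, uint32_t src2)
{
    int16_t hi1 = (int16_t)(src1 >> 16);
    int16_t hi2 = (int16_t)(src2 >> 16);
    return (int32_t)hi1 * (int32_t)hi2;
}
```

With src1 = 0x0003FFFE (halves 3 and -2) and src2 = 0x00050004 (halves 5 and 4), mpy gives -8 and mpyh gives 15, so one pair of registers can carry two 16-bit operands each.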
c. Load/Store (.D unit)
The address register to be used must be on the same side as the .D unit.
LDH .D2 *B2++,B7 ; load (B2) -> B7, increment B2
|| LDH .D1 *A2++,A7 ; load (A2) -> A7, increment A2
Loads the half-word at the address pointed to by B2 into B7. Then B2 is
incremented to point at the next half-word in memory.
NOT valid: LDH .D2 *A2++,B7
Can use LDW to load a 32-bit word into each side or LDDW to load two 32-bit
words into each side. This is a way to get more data in at one time.
STW .D2 A1,*B4 ; store A1 -> (B4)
Stores the 32-bit word in A1 into the memory location whose address is in B4.
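The load and store forms above correspond closely to C pointer operations. The sketch below models LDH with post-increment and STW; the helper names ldh_postinc and stw are hypothetical.

```c
#include <stdint.h>

/* LDH *ptr++,dst : load a half-word, sign-extend it to 32 bits,
 * then advance the pointer by one half-word (2 bytes). */
int32_t ldh_postinc(const int16_t **ptr)
{
    int32_t value = **ptr;
    *ptr += 1;
    return value;
}

/* STW src,*ptr : store a 32-bit word at the address held in ptr. */
void stw(int32_t src, int32_t *ptr)
{
    *ptr = src;
}
```

Calling ldh_postinc repeatedly on the same pointer walks through an array of shorts, which is the pattern used in the dot product examples later in these notes.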
Data Address Paths
- The .D functional units access general memory (not register files) through data
address paths. There is one for each side, T1 and T2. The .D unit from one side
can use the data address path from the other side by adding the T1 or T2 to the
functional unit.
- Ex: LDW .D1T2 *A0, B3
or
LDW .D1 *A0, B3
This command loads a 32-bit word based on the address provided by .D1, i.e.
from register file A, *A0. But the data access path T2 is used in order to put the
data into the B register file, B3.
d. Branch/Move
x    .short 1, 2, 3
Loop MVK .S1 x,A4 ;move 16LSBs of x address -> A4
MVKH .S1 x,A4 ;move 16MSBs of x address -> A4
.
.
.
SUB .S1 A1,1,A1 ; decrement A1
 [A1] B .S2 Loop ; branch to Loop if A1 is not equal to 0
NOP 5
STW .D1 A3,*A7 ; store A3 into (A7)
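The MVK/MVKH pair in this example builds a full 32-bit address in two steps. A C model of that sequence is sketched below, assuming MVK sign-extends its 16-bit constant and MVKH replaces only the upper half of the register; the function names are hypothetical.

```c
#include <stdint.h>

/* MVK: take the 16 LSBs of the value and sign-extend them to 32 bits. */
uint32_t mvk(uint32_t value)
{
    return (uint32_t)(int32_t)(int16_t)(value & 0xFFFFu);
}

/* MVKH: put the 16 MSBs of the value in the upper half of the register,
 * leaving the lower half of the register unchanged. */
uint32_t mvkh(uint32_t value, uint32_t reg)
{
    return (value & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}

/* MVK followed by MVKH on the same register yields the full address. */
uint32_t load_address(uint32_t addr)
{
    return mvkh(addr, mvk(addr));
}
```

For an address such as 0x0000F000, mvk alone leaves 0xFFFFF000 in the register because of the sign extension, which is why the MVKH step is needed even when the upper half of the address is zero.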
e. Division – is done by taking the reciprocal of the denominator and then multiplying
by the numerator. One single-precision floating point instruction: RCPSP
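A sketch of division by reciprocal in C follows. It assumes the reciprocal instruction returns only an estimate, so two Newton-Raphson refinement steps are added; that refinement and the helper rough_reciprocal (a stand-in that keeps 8 mantissa bits) are assumptions for illustration, not something stated in these notes. The denominator is assumed to be nonzero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
 * an exact reciprocal with the mantissa truncated to its top 8 bits. */
static float rough_reciprocal(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;          /* clear the low 15 of 23 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* numerator / denominator computed as numerator * (1 / denominator). */
float divide(float numerator, float denominator)
{
    float r = rough_reciprocal(denominator);
    r = r * (2.0f - denominator * r);   /* Newton-Raphson step 1 */
    r = r * (2.0f - denominator * r);   /* Newton-Raphson step 2 */
    return numerator * r;
}
```

Each refinement step roughly doubles the number of correct bits, so two steps take an 8-bit estimate close to single-precision accuracy.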
4. Cross-Paths
a. There are two data cross-paths, 1X and 2X, which allow the functional units on
one side to access data from the register file on the other side.
b. There can only be two cross-path reads per cycle. So only one functional unit
per data path per execute cycle can get an operand from the opposite register
file.
Ex:
MPY .M2 A7,B7,B6 ;multiply 16LSBs of A7,B7 -> B6
|| MPYH .M1 A7,B7,A6 ;multiply 16MSBs of A7,B7 -> A6
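The one-cross-path-read-per-side rule can be checked mechanically. The C sketch below models an execute packet as an array of slots; the struct slot type and the function cross_paths_ok are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct slot {
    int side;        /* 1 for .L1/.S1/.M1/.D1, 2 for .L2/.S2/.M2/.D2 */
    int uses_cross;  /* nonzero if a source comes from the other register file */
};

/* Returns 1 if at most one instruction per side reads through its
 * cross path (1X or 2X) in this execute packet, 0 otherwise. */
int cross_paths_ok(const struct slot *packet, size_t n)
{
    int reads[3] = { 0, 0, 0 };   /* index 1 -> path 1X, index 2 -> path 2X */
    for (size_t i = 0; i < n; i++) {
        if (packet[i].side != 1 && packet[i].side != 2)
            return 0;             /* not a valid side */
        if (packet[i].uses_cross && ++reads[packet[i].side] > 1)
            return 0;             /* second cross read on the same side */
    }
    return 1;
}
```

The MPY/MPYH pair above passes this check: the .M2 instruction reads A7 over 2X and the .M1 instruction reads B7 over 1X, one read per path.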
B. Programming Constraints
1. Functional Unit Constraints
a. The same functional unit cannot be used by two instructions executing in parallel.
b. Location: in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline >
Functional Unit Constraints
A functional unit may not be available for another instruction during an execute
cycle because of performing certain operations, like reading or writing. Important
for pipelining.
c. Ex. Let's look at two instructions: the 16x16 (integer – short) multiply MPY (listed with the fixed-point instructions) and the 32x32 (integer) multiply MPYI (listed with the 'C67x floating-point instructions).
MPY: in CCS: Help > Contents > Instruction Set Summary > 'C62x/'C64x/'C67x
(Shared) Fixed-Point Instructions
Pipeline Stage    E1        E2
Read              src1,2
Write                       dst
Unit in use       .M
So we see that we have an instruction with a functional unit latency of 1
and a delay slot of 1.
MPYI: in CCS: Help > Contents > Instruction Set Summary > 'C67x (Specific)
Floating-Point Instructions
Pipeline Stage    E1        E2        E3        E4        …    E9
Read              src1,2    src1,2    src1,2    src1,2
Write                                                          dst
Unit in use       .M        .M        .M        .M
So we have an instruction with a functional unit latency of 4 and a delay
slot of 8.
So we find in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline >
Functional Unit Constraints for MPYI:
Cycle                               1     2     3     4     …     8     9
MPYI                                R     R     R     R                 W
Subsequent Same Unit Instruction:
16x16 multiply                            Xr    Xr    Xr    …     Xw
Xr = a read conflict
Xw = a write conflict
In other words, a MPY instruction cannot follow a MPYI instruction ON
THE SAME unit during the MPYI's E1, E2, E3, E4, and E8 phases.
Valid:
    MPYI .M1
    ADD
    SUB
    ADD
    MPY  .M1
Or:
    MPYI .M1
    NOP  3
    MPY  .M1
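The MPYI/MPY constraint above reduces to a simple rule on the cycle in which the MPY issues. The C sketch below encodes that rule, counting the MPYI's own issue cycle as cycle 1; the function name mpy_may_issue_after_mpyi is hypothetical.

```c
/* Returns 1 if a 16x16 MPY may issue on the same .M unit in the given
 * cycle, where cycle 1 is the cycle in which the MPYI itself issued. */
int mpy_may_issue_after_mpyi(int cycle)
{
    if (cycle <= 1)
        return 0;   /* cycle 1 is taken by the MPYI itself */
    if (cycle <= 4)
        return 0;   /* read conflict: MPYI still reads its sources */
    if (cycle == 8)
        return 0;   /* write conflict: both results would be written in cycle 9 */
    return 1;
}
```

Both valid sequences shown above place the MPY in cycle 5, the first cycle for which this function returns 1.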
2. Cross-Path Constraints
a. Recall: only one functional unit per data path per execute cycle can get an
operand from the opposite register file.
b. Location: in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline >
Functional Unit Constraints provides the cross-path constraints
So we find for the MPYI instruction
Cycle                               1     2     3     4     …     8     9
MPYI                                R     R     R     R                 W
Same Side, Different Unit, Both Using Cross-Path:
Single Cycle                              Xr    Xr    Xr
Xr = a read conflict
Valid:
    MPYI .M1 A1,B1,A2
    ADD  .S1 3,A3,A3
Not Valid:
    MPYI .M1 A1,B1,A2
    ADD  .S1 3,B3,A3
3. Load/Store Constraints
Loading and storing cannot be done from/to the same register file.
Valid:
LDW .D1 *A0,B1 (use data address path T2 to load into B)
|| STW .D2 A1,*B2 (use data address path T1 to get data to store in memory)
(Note: both addresses come from the same register file as the functional unit)
Not valid:
LDW .D1 *A0,A1 (use data address path T1 to load data into A)
|| STW .D2 A2,*B2 (use data address path T1 to get data from A to store in
memory)
C. File Structure
Main program: generally named init, with a separate vectors program that transfers control to init
Calling Assembly Language Subroutines (true if called from C or assembly language)
Arguments are passed to the subroutine through register A4,B4,A6, … in that order.
Result is passed through A4.
The return address is in B3.
When called from C, an assembly function's label must begin with an underscore: _func
The name of the *.c file cannot be the same as the *.asm file.
In C, an external declaration of the assembly function is optional, e.g. extern int func();
D. Examples
Example 1: Assembly calling assembly program to do dot product
Dotp_init.asm: ASM program to init variables. Calls dotpfunc.asm
            .def  init              ;starting address
            .ref  dotpfunc          ;subroutine
            .text                   ;section for code follows
x_addr      .short 1,2,3,4          ;numbers in x array
y_addr      .short 0,2,4,6          ;numbers in y array
result_addr .short 0                ;initialize sum of products,
                                    ;address for result
init        MVK   .S1 x_addr,A4     ;16 LSBs address of x in A4
            MVKH  .S1 x_addr,A4     ;16 MSBs address of x in A4
            MVK   .S2 y_addr,B4     ;B4 since we pass arguments in this way
            MVKH  .S2 y_addr,B4
            MVK   .S1 4,A6          ;A6 is another argument size of array
            B     .S1 dotpfunc      ;branch to the subroutine
            MVK   .S2 ret_addr,B3   ;B3 is the return address for dotpfunc
            MVKH  .S2 ret_addr,B3
            NOP   3
ret_addr    MVK   .S1 result_addr,A0
            MVKH  .S1 result_addr,A0
            STW   .D1 A4,*A0        ;store result
wait        B     .S1 wait
            NOP   5
Dotpfunc.asm Dot product subroutine
            .def  dotpfunc          ;define dot product function
            .text
dotpfunc    MV    A6,A1             ;move loop count to conditional register
            ZERO  A7                ;init A7 for sum of products
loop        LDH   .D1 *A4++,A2      ;load half-word x(1) to A2
            LDH   .D2 *B4++,B2      ;B2=y(1) these two could be in parallel
            NOP   4
            MPY   .M1 A2,B2,A3      ;A3=x*y
            NOP
            ADD   .L1 A3,A7,A7      ;sum of products in A7
            SUB   .L1 A1,1,A1       ;decrement loop counter
 [A1]       B     .S1 loop          ;branch back to loop until A1=0
            NOP   5
            MV    A7,A4             ;put the result into the return register
            B     .S2 B3            ;branch to addr in B3 return_addr
            NOP   5
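For reference, the same computation can be written as a C model that mirrors the register roles of dotpfunc.asm. This is a sketch for checking the expected result, not a replacement for the assembly; the name dotpfunc_model is hypothetical, and count is assumed to be at least 1 because the assembly loop tests its counter only after the first pass.

```c
#include <stdint.h>

/* C model of dotpfunc.asm: x plays the role of A4, y of B4, count of A6. */
int32_t dotpfunc_model(const int16_t *x, const int16_t *y, int32_t count)
{
    int32_t a1 = count;           /* MV A6,A1 : loop counter */
    int32_t a7 = 0;               /* ZERO A7  : sum of products */
    do {
        int32_t a2 = *x++;        /* LDH *A4++,A2 */
        int32_t b2 = *y++;        /* LDH *B4++,B2 */
        int32_t a3 = a2 * b2;     /* MPY A2,B2,A3 */
        a7 += a3;                 /* ADD A3,A7,A7 */
        a1 -= 1;                  /* SUB A1,1,A1 */
    } while (a1 != 0);            /* [A1] B loop */
    return a7;                    /* MV A7,A4 : result returned in A4 */
}
```

For the arrays in this example, x = {1,2,3,4} and y = {0,2,4,6}, the sum is 0 + 4 + 12 + 24 = 40.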
Example 2: C calling assembly language subroutine
Dotp.c
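#include <stdio.h>
#define count 4
short x[4] = {1,2,3,4};
short y[4] = {0,2,4,6};
int result;
extern int dotpfunc(short *a, short *b, int n);  /* assembly routine; this declaration is optional */

int main(void)
{
    result = dotpfunc(x,y,count);
    printf("result = %d \n", result);
    return 0;
}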
Change dotpfunc.asm so that the function name begins with an underscore:
            .def  _dotpfunc
_dotpfunc   MV    A6,A1
III. Linear Assembly Programming
A. Basics
1. Assembler Optimizer
An assembler optimizer (instead of C compiler) is used with a linear assembly
program (*.sa) to create an assembly source program (*.asm). Usually more efficient
than code generated from C compiler.
Assembler optimizer assigns the functional unit and registers to use, finds instructions
that can execute in parallel, and performs pipelining.
2. General Programming
Parallel instructions are not valid in a linear assembly program.
Specifying the functional unit, register, or NOPs is optional.
Use syntax of assembly code instructions: ADD, SUB
Use operands as used in C. Variables are used to designate the registers.
A C program calling a linear assembly subroutine requires that the subroutine be
named with a leading underscore: _func.
3. Directives
.cproc and .endproc specify a C-callable procedure or section of code to be
optimized by the assembler optimizer. The variables being passed must follow the
.cproc directive:
.cproc x,y,count
.proc and .endproc start and end a general procedure, i.e. no arguments passed.
.return is used to return result to calling function.
.reg is to declare variables and use descriptive names for values that will be stored in
registers. When you use .reg, the assembly optimizer chooses a register whose use
agrees with the functional units chosen for the instructions that operate on the value.
    .reg  a,b     ;represents registers which will be determined by optimizer.
    MVK   5,a     ;moves 5 to the register the optimizer assigns to a
.def defines a function
.trip must be included for the optimizer to pipeline code for a loop. Specifies the
number of times a loop iterates.
loop    .trip 4,20,4    ;loop will iterate a minimum of 4 times, max of 20, in
                        ;multiples of 4, i.e. 4,8,12,16,20
loop    .trip 4         ;loop iterates at least 4 times
loop    .trip 4,10      ;minimum of 4 times, max of 10 times
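The meaning of the three .trip operands can be expressed as a check on an actual iteration count. The C sketch below is illustrative only; the function trip_count_ok and the TRIP_NO_MAX marker are hypothetical and stand in for an omitted maximum.

```c
#define TRIP_NO_MAX (-1)   /* hypothetical marker: no maximum was given */

/* Returns 1 if n iterations are consistent with ".trip min,max,factor".
 * Pass TRIP_NO_MAX for max when only a minimum is given, and 1 for
 * factor when no factor is given. */
int trip_count_ok(int n, int min, int max, int factor)
{
    if (n < min)
        return 0;                       /* fewer than the stated minimum */
    if (max != TRIP_NO_MAX && n > max)
        return 0;                       /* more than the stated maximum */
    if (factor > 1 && n % factor != 0)
        return 0;                       /* not a multiple of the factor */
    return 1;
}
```

For .trip 4,20,4 the accepted counts are 4, 8, 12, 16 and 20; a count of 6 or 24 is rejected.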
B. Example
Dot product: C program calling a linear assembly program
Dotp.c
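#include <stdio.h>
#define count 4
short x[4] = {1,2,3,4};
short y[4] = {0,2,4,6};
int result;
extern int dotpfunc(short *a, short *b, int n);  /* linear assembly routine; this declaration is optional */

int main(void)
{
    result = dotpfunc(x,y,count);
    printf("result = %d \n", result);
    return 0;
}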
Dotpfunc.sa Linear assembly program to do a dot product
            .def    _dotpfunc         ;defines the function
_dotpfunc   .cproc  x,y,count         ;start linear asm section
            .reg    a,b,prod,sum      ;define variables for registers
            ZERO    sum               ;initialize sum of products
loop        .trip   4,4               ;exactly 4 iterations through loop
            LDH     *x++,a            ;pointer to x array > a
            LDH     *y++,b            ;put an element of y into b
            MPY     a,b,prod          ;prod=x*y
            ADD     prod,sum,sum      ;sum of products > sum
            SUB     count,1,count     ;decrement counter
 [count]    B       loop              ;go to loop if count is not equal to 0
            .return sum               ;return sum as the result
            .endproc                  ;end linear assembly function