Embedded System Design Center
ARM7TDMI
ARM Architecture
Sai Kumar Devulapalli
Objectives
• To know the historical background in which the ARM processor was designed.
• To understand the organization and architecture of the ARM7TDMI processor core.
• To learn how data is handled in the ARM7TDMI.
• To introduce the exception-handling modes and the other modes available in the ARM7TDMI processor.
• To understand the register organization of the ARM7TDMI and the usage of its registers.
Acorn Computer?
• Acorn Computers Limited was based in Cambridge, England.
• In 1980, the Acorn Atom was released, built around a 1 MHz 8-bit 6502 CPU.
• It was not very famous.
• When Acorn met the BBC (British Broadcasting Corporation), a new computer was born.
BBC micro?
• In 1981, the BBC's "The Computer Programme" project needed a computer to demonstrate various tasks, including "teletext/telesoftware, comms, controlling hardware, programming, artificial intelligence, graphics, sound and music, etc."
• The Acorn team worked very hard to build a prototype for the BBC, and the BBC finally accepted their design.
Great Success of BBC micro.
* Because the BBC Micro was used in the BBC's programme, it became very famous in the U.K.
* Many U.K. schools and research labs bought the BBC Micro.
* Later, the Acorn engineers started looking for a new processor.
The Birth of ARM.
• Acorn could not find any processor on the market that met its needs, so it decided to design a new one.
• Designing a new processor normally needs great investment and experience.
• Luckily, the papers describing the Berkeley RISC I design had been published.
• After some custom modifications by Acorn, a new RISC processor was born:
• The ARM (Acorn RISC Machine, later Advanced RISC Machine).
History of ARM
Acorn - a Computer Manufacturer
1983:
• Acorn Limited held a dominant position in the UK personal computer market with the Rockwell 6502 (8-bit) CPU.
• The 16-bit CISC CPUs then available were slower than standard memory parts and had long interrupt latencies.
1983-85:
• Acorn designed the first commercial RISC CPU: the Acorn RISC Machine (ARM).
1990:
• Advanced RISC Machines was formed to broaden the market beyond Acorn's product range.
History of ARM (continued)
1990:
• Startup with 12 engineers and 1 CEO
• No patents, no customers, very little money
Mid-1990s:
• Texas Instruments (TI) licensed the ARM7
• It was incorporated into a chip for mobile phones
Spring 1998:
• IPO
• 13 millionaires
Architectural Inheritance from Berkeley RISC I
Used:
• Load-store architecture
• Fixed-length 32-bit instructions
• 3-address instruction format
Rejected:
• Register windows => costly
– ARM uses shadow (banked) registers instead
• Delayed branches
• Single-cycle execution of all instructions
– Memory accesses take multiple cycles when there is no separate data and instruction memory support
– Auto-indexing addressing modes
Result: RISC with a few CISC features
What is RISC/CISC?
Reduced Instruction Set Computer
• Fewer Addressing modes.
• Fewer Instructions available.
• For example, ARM, NEC VR series.
Complex Instruction Set Computer
• More Instructions available
• Many addressing modes.
• For example, Intel x86.
Advantages of RISC?
• Smaller die size
– Simple instructions: a simple processor requires fewer transistors.
• Shorter development time
– A simple processor takes less effort to design.
• Higher performance?
Disadvantages:
• Complex compiler
• Poor code density
The ARM programmer's model
• ARM is a Reduced Instruction Set Computer (RISC).
• It has:
• a large, regular register file
– any register can be used for any purpose
• a load-store architecture
– instructions that reference memory just move data; they do no processing
– processing uses values in registers only
• fixed-length instructions
– 32-bit ARM instruction set
– 16-bit Thumb instruction set
Main Features
• A large set of general-purpose registers
• A load-store architecture
• 3-address instructions
• Conditional execution for every instruction
• Inclusion of very powerful load-store multiple register instructions
• Ability to perform a general shift and a general ALU operation in one instruction that executes in one clock cycle
• Open instruction set extension through the coprocessor instruction set
• A unique architecture, THUMB, suited to high-volume applications with memory restrictions
ARM7TDMI
• The ARM7TDMI is the current low-end ARM core.
• It is widely used across a range of applications, notably in digital mobile telephones.
The origin of the name ARM7TDMI:
• ARM7: a 3-volt-compatible rework of the ARM6 32-bit integer core
• T: the THUMB 16-bit compressed instruction set
• D: on-chip Debug support, enabling the processor to halt in response to a debug request
• M: an enhanced Multiplier, with higher performance than its predecessors and yielding a full 64-bit result
– four extra instructions are provided, which perform 32 * 32 -> 64 multiplications and 32 * 32 + 64 -> 64 multiply-and-accumulate operations
• I: EmbeddedICE hardware to give on-chip breakpoint and watchpoint support
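The 64-bit multiply operations mentioned above can be modelled in a few lines. This is only an arithmetic sketch in Python, not a description of the hardware multiplier; the function names are borrowed from the ARM long-multiply mnemonics (UMULL, SMULL, UMLAL), which the slide itself does not list, and the (low word, high word) return convention is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def umull(rm, rs):
    """Unsigned 32 * 32 -> 64 multiply; returns (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Signed 32 * 32 -> 64 multiply; returns (low word, high word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(lo, hi, rm, rs):
    """Unsigned 32 * 32 + 64 -> 64 multiply-accumulate on the pair (hi:lo)."""
    accumulator = ((hi & MASK32) << 32) | (lo & MASK32)
    total = (accumulator + (rm & MASK32) * (rs & MASK32)) & MASK64
    return total & MASK32, total >> 32
```

For example, multiplying 0xFFFFFFFF by itself as unsigned values needs the full 64 bits: the low word is 1 and the high word is 0xFFFFFFFE.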
DATA TYPES
Byte (8-bit):
placed on any byte boundary.
Half-word (16-bit):
aligned to two-byte boundaries.
Word (32-bit):
aligned to four-byte boundaries.
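The alignment rules above amount to a simple divisibility check on the address. A minimal sketch in Python follows; the names `SIZES` and `is_aligned` are made up for this illustration.

```python
# Size in bytes of each ARM7TDMI data type listed above.
SIZES = {"byte": 1, "halfword": 2, "word": 4}


def is_aligned(address, kind):
    """True if `address` is a legal boundary for a value of type `kind`."""
    return address % SIZES[kind] == 0
```

So address 0x1002 is a valid half-word address but not a valid word address, while any address is valid for a byte.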
Processor Modes
* The ARM has six operating modes:
• User (unprivileged mode under which most tasks run)
• Fast Interrupt Request mode, FIQ (entered when a high-priority (fast) interrupt is raised)
• Interrupt mode, IRQ (entered when a low-priority (normal) interrupt is raised)
• Supervisor mode, SVC (entered on reset and when a Software Interrupt instruction is executed)
• Abort mode, ABT (used to handle memory access violations)
• Undefined mode, UND (used to handle undefined instructions)
* ARM Architecture Version 4 adds a seventh mode:
• System mode, SYS (privileged mode using the same registers as User mode)
ARM programming model
[Figure: the user-mode register set r0 to r15, each 32 bits wide (bit 31 down to bit 0), with r15 as the PC, plus the CPSR holding the N, Z, C and V flags]
Endianness
The relationship between bit ordering and byte/word ordering defines endianness:
Little-endian:
bit 31 ... bit 0
byte 3 | byte 2 | byte 1 | byte 0
Big-endian:
bit 31 ... bit 0
byte 0 | byte 1 | byte 2 | byte 3
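The two byte orderings can be seen by laying a 32-bit word out in memory. The Python sketch below uses the standard `int.to_bytes` and `int.from_bytes` methods; the helper names `store_word` and `load_word` are assumptions for illustration, and index 0 of the returned bytes stands for the lowest memory address.

```python
def store_word(value, big_endian=False):
    """Return the 4 bytes of a 32-bit word in ascending address order."""
    order = "big" if big_endian else "little"
    return (value & 0xFFFFFFFF).to_bytes(4, order)


def load_word(data, big_endian=False):
    """Rebuild a 32-bit word from 4 bytes given in ascending address order."""
    order = "big" if big_endian else "little"
    return int.from_bytes(data[:4], order)
```

Storing 0x11223344 little-endian puts 0x44 (bits 7:0) at byte 0, whereas big-endian puts 0x11 (bits 31:24) at byte 0.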
The Instruction Pipeline
The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor.
• This allows several operations to be undertaken simultaneously, rather than serially.
Pipeline stages in ARM state:
PC     | FETCH   | Instruction fetched from memory
PC - 4 | DECODE  | Decoding of registers used in instruction
PC - 8 | EXECUTE | Register(s) read from register bank; shift and ALU operation; register(s) written back to register bank
Rather than pointing to the instruction being executed, the PC points to the instruction being fetched.
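The PC offsets above can be traced with a toy model of the three stages. This Python sketch assumes straight-line ARM-state code with at least one instruction, no branches and no stalls; `pipeline_trace` and `pc_read_by` are hypothetical names, and unlike the real core the model stops fetching at the end of the listed instructions.

```python
def pipeline_trace(start, count):
    """List (cycle, fetch, decode, execute) addresses for `count` ARM-state
    instructions stored from `start`; None marks an empty stage."""
    last = start + 4 * (count - 1)

    def in_program(address):
        return address if start <= address <= last else None

    rows = []
    pc = start
    while pc - 8 <= last:
        rows.append((len(rows) + 1, in_program(pc),
                     in_program(pc - 4), in_program(pc - 8)))
        pc += 4
    return rows


def pc_read_by(executing_address, thumb=False):
    """Value an executing instruction sees when it reads r15."""
    return executing_address + (4 if thumb else 8)
```

With three instructions starting at 0x8000, the third cycle has 0x8008 in fetch, 0x8004 in decode and 0x8000 in execute, which is why the executing instruction sees PC = its own address + 8.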
CPU Pipeline
• 3-stage pipeline.
• One instruction issued each cycle.
• Each instruction takes at least 3 clock cycles.
• During its first datapath cycle, each instruction issues a fetch for the next-but-one instruction.
• A branch flushes and refills the instruction pipeline.
3-stage pipeline organisation
• Register bank
- 2 read ports and 1 write port to access any register
- 1 read port and 1 write port to access the PC
• Barrel shifter: to shift or rotate one operand
• ALU: to perform arithmetic and logic functions
• Address register and incrementer
- to select and hold memory addresses
- to generate sequential addresses
• Data register: to hold data passing to and from memory
• Instruction decoder and associated control logic
Latency and Throughput
• At any time, three different instructions may occupy the three stages of the pipeline.
• It may therefore take three cycles to complete a single-cycle instruction.
• This is said to be a three-cycle latency.
• Once the pipeline fills, the processor completes a single-cycle instruction every clock cycle. Therefore the throughput is one instruction per cycle.
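The difference between latency and throughput shows up when counting cycles for a run of instructions. The Python sketch below assumes only single-cycle instructions, no branches and an initially empty pipeline; `cycles_to_complete` is an illustrative name.

```python
PIPELINE_DEPTH = 3  # fetch, decode, execute


def cycles_to_complete(instruction_count):
    """Clock cycles needed to finish a run of single-cycle instructions
    on an initially empty 3-stage pipeline."""
    if instruction_count == 0:
        return 0
    # The first instruction pays the full latency; each later one adds a cycle.
    return PIPELINE_DEPTH + (instruction_count - 1)
```

One instruction takes 3 cycles (the latency), but 100 instructions take 102 cycles, which approaches one instruction per cycle.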
CPU Pipeline Stages
* Fetch
• The instruction is fetched from memory and placed in the instruction pipeline
• In a data transfer instruction, the address is sent to the address register
* Decode
• The instruction is decoded
• Datapath control signals are prepared for the next cycle
• The instruction owns the decode logic but not the datapath
• In data transfer instructions, the ALU holds the address components to compute the auto-indexing modification if required
* Execute
• The instruction owns the datapath
• The register bank is read
• An operand is shifted
• The ALU result is generated
• The result is written back into the destination register
Clocking Scheme
• Two-phase non-overlapping clocks are generated internally from a single input clock signal
• Uses level-sensitive transparent latches
• Data movement is controlled by passing data alternately through latches
Exception Conditions
• Generated as a direct effect of executing an instruction:
- software interrupts
- undefined instructions
- prefetch aborts
• Generated as a side effect of an instruction:
- data aborts
• Generated externally:
- Reset
- IRQ
- FIQ
Coprocessor
• Support for up to 16 logical coprocessors, each with its own internal function
• Each coprocessor can have up to 16 private registers of any size
• Coprocessors use a load-store architecture
ARM7TDMI core
The Registers
• ARM has 37 registers in total, all of which are 32 bits long:
– 30 general-purpose registers
– 5 dedicated saved program status registers
– 1 dedicated program counter
– 1 dedicated current program status register
• However, these are arranged into several banks, with the accessible bank being governed by the processor mode.
Each mode can access
• a particular set of r0-r12 registers
• a particular r13 (the stack pointer) and r14 (link register)
• r15 (the program counter)
• cpsr (the current program status register)
and privileged modes (except System mode) can also access
• a particular spsr (saved program status register)
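The total of 37 can be checked with a small model of the banking scheme. This Python sketch assumes the usual ARM banking rule (FIQ mode has its own r8 to r14, the other exception modes have their own r13 and r14, and System mode shares the User registers); the mode suffixes and the function name `physical_register` are chosen for this illustration.

```python
EXCEPTION_MODES = ["fiq", "svc", "abt", "irq", "undef"]
ALL_MODES = ["usr", "sys"] + EXCEPTION_MODES


def physical_register(mode, number):
    """Name of the physical register selected by r<number> in `mode`."""
    if mode == "sys":
        mode = "usr"                       # System shares the User bank
    if number == 15:
        return "r15"                       # one program counter for all modes
    if mode == "fiq" and 8 <= number <= 14:
        return f"r{number}_fiq"            # FIQ banks r8 to r14
    if mode in EXCEPTION_MODES and number in (13, 14):
        return f"r{number}_{mode}"         # other modes bank sp and lr
    return f"r{number}"


general = {physical_register(m, n) for m in ALL_MODES for n in range(15)}
status = {"cpsr"} | {f"spsr_{m}" for m in EXCEPTION_MODES}
total = len(general) + 1 + len(status)     # + 1 for the program counter
```

Under these assumptions `general` holds 30 names, `status` holds 6 (one CPSR and five SPSRs), and `total` is 37.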
30 general-purpose, 32-bit registers
• Fifteen general-purpose registers are visible at any one time,
depending on the current processor mode, as r0, r1, ... ,r13, r14.
• By convention, r13 is used as a stack pointer (sp) in ARM
assembly language. The C and C++ compilers always use r13 as
the stack pointer.
• In User mode, r14 is used as a link register (lr) to store the
return address when a subroutine call is made. It can also be
used as a general-purpose register if the return address is stored
on the stack.
• In the exception handling modes, r14 holds the return address
for the exception, or a subroutine return address if subroutine
calls are executed within an exception. r14 can be used as a
general-purpose register if the return address is stored on the
stack.
Saved Program Status Registers (SPSRs)
• The SPSRs are used to store the CPSR when an exception is taken. One SPSR is accessible in each of the exception-handling modes.
• User mode and System mode do not have an SPSR because they are not exception-handling modes.
The program counter (pc)
• The program counter is accessed as r15 (or pc). It is incremented by one word (four bytes) for each instruction in ARM state, or by two bytes in Thumb state.
• Branch instructions load the destination address into the program counter. You can also load the program counter directly using data operation instructions. For example, to return from a subroutine, you can copy the link register into the program counter using:
– MOV pc,lr
• During execution, r15 does not contain the address of the currently executing instruction. The address of the currently executing instruction is typically pc - 8 for ARM, or pc - 4 for Thumb.
The Current Program Status Register (CPSR)
* The CPSR holds:
– copies of the Arithmetic Logic Unit (ALU) status flags
– the current processor mode
– interrupt disable flags.
* The ALU status flags in the CPSR are used to determine
whether conditional instructions are executed or not.
* On Thumb-capable processors, the CPSR also holds the
current processor state (ARM or Thumb).
ARM Register Organisation
ARM general registers and program counter
User32 / System | FIQ32    | Supervisor32 | Abort32  | IRQ32    | Undefined32
r0              | r0       | r0           | r0       | r0       | r0
r1              | r1       | r1           | r1       | r1       | r1
r2              | r2       | r2           | r2       | r2       | r2
r3              | r3       | r3           | r3       | r3       | r3
r4              | r4       | r4           | r4       | r4       | r4
r5              | r5       | r5           | r5       | r5       | r5
r6              | r6       | r6           | r6       | r6       | r6
r7              | r7       | r7           | r7       | r7       | r7
r8              | r8_fiq   | r8           | r8       | r8       | r8
r9              | r9_fiq   | r9           | r9       | r9       | r9
r10             | r10_fiq  | r10          | r10      | r10      | r10
r11             | r11_fiq  | r11          | r11      | r11      | r11
r12             | r12_fiq  | r12          | r12      | r12      | r12
r13 (sp)        | r13_fiq  | r13_svc      | r13_abt  | r13_irq  | r13_undef
r14 (lr)        | r14_fiq  | r14_svc      | r14_abt  | r14_irq  | r14_undef
r15 (pc)        | r15 (pc) | r15 (pc)     | r15 (pc) | r15 (pc) | r15 (pc)
ARM program status registers
User32 / System | FIQ32    | Supervisor32 | Abort32  | IRQ32    | Undefined32
cpsr            | cpsr     | cpsr         | cpsr     | cpsr     | cpsr
(none)          | spsr_fiq | spsr_svc     | spsr_abt | spsr_irq | spsr_undef
* Banked registers (shaded in the original slide) are those with a mode suffix such as _fiq or _svc.
Accessing Registers using
ARM Instructions
• No breakdown of currently accessible registers.
• All instructions can access r0-r14 directly.
• Most instructions also allow use of the PC.
• Specific instructions to allow access to CPSR and SPSR.
The Program Status Registers
(CPSR and SPSRs)
Bit layout:
31 30 29 28 | 27 ... 8 | 7 6 5 | 4 ... 0
N  Z  C  V  | (unused) | I F T | Mode
Condition code flags
Copies of the ALU status flags (latched if the instruction has the "S" bit set).
N = Negative result from ALU
Z = Zero result from ALU
C = ALU operation Carried out
V = ALU operation oVerflowed
Interrupt disable bits
I = 1 disables the IRQ.
F = 1 disables the FIQ.
T bit (Architecture v4T only)
T = 0, processor in ARM state
T = 1, processor in Thumb state
Mode bits
M[4:0] define the processor mode.
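The fields described above can be pulled out of a 32-bit CPSR value with shifts and masks. The Python sketch below is illustrative only; the M[4:0] encodings in `MODE_NAMES` are the standard ARM values, which the slide does not list, and the dictionary keys are made-up names.

```python
# Standard ARM encodings of the M[4:0] mode field (not shown on the slide).
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "irq_disabled": bool((cpsr >> 7) & 1),   # I bit
        "fiq_disabled": bool((cpsr >> 6) & 1),   # F bit
        "thumb_state": bool((cpsr >> 5) & 1),    # T bit
        "mode": MODE_NAMES.get(cpsr & 0x1F, "reserved"),
    }
```

For instance, the value 0x600000D3 decodes as Z and C set, both interrupts disabled, ARM state, Supervisor mode.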
Condition Flags
Flag             | Logical instruction | Arithmetic instruction
Negative (N = 1) | No meaning | Bit 31 of the result has been set. Indicates a negative number in signed operations
Zero (Z = 1)     | Result is all zeroes | Result of operation was zero
Carry (C = 1)    | After a shift operation, a '1' was left in the carry flag | Result was greater than 32 bits
oVerflow (V = 1) | No meaning | Result was greater than 31 bits. Indicates a possible corruption of the sign bit in signed numbers
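The arithmetic column of the table can be illustrated by computing the four flags for a 32-bit addition. This Python sketch models only an add with the "S" bit set; the function name `adds` echoes the ARM mnemonic, but the code is an arithmetic model rather than a description of the ALU hardware.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """32-bit add returning (result, flags) as an S-suffixed ADD would set them."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                  # bit 31 of the result
        "Z": int(result == 0),              # result is all zeroes
        "C": int(full > MASK32),            # result needed more than 32 bits
        # Overflow: both operands have the same sign, the result's differs.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags
```

Adding 1 to 0x7FFFFFFF sets N and V (the sign bit was corrupted), while adding 1 to 0xFFFFFFFF sets Z and C.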
The Program Counter (R15)
• When the processor is executing in ARM state:
• All instructions are 32 bits in length
• All instructions must be word aligned
• Therefore the PC value is stored in bits [31:2], with bits [1:0] equal to zero (as instructions cannot be halfword- or byte-aligned).
• R14 is used as the subroutine link register (LR) and stores the
return address when Branch with Link operations are
performed, calculated from the PC.
• Thus to return from a linked branch
• MOV r15,r14
or
• MOV pc,lr
Internal Organization of ARM
• Two main blocks: datapath and decoder
• Register bank (r0 to r15)
• Two read ports to A-bus/B-bus
• One write port from ALU-bus
• Additional read/write ports for
program counter r15
• Barrel shifter - shift/rotate 2nd operand by
any number of bits
• ALU performs arithmetic/logic functions
• Address registers/incrementer holds either
PC address (with increment) or operand
address
Datapath activity during data processing instruction
• SUB r0, r1, #128 ; r0 := r1 - 128
• A subtract instruction in which one operand is a constant
• The constant 128, encoded in the instruction, passes through the barrel shifter with a shift of 0, producing 128
• The ALU operates on the operands and writes the result back to register r0
• The PC value in the address register is incremented and copied back to r15 and the address register
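How the constant passes through the barrel shifter can be sketched using the ARM data-processing immediate format, in which an 8-bit value is rotated right by twice a 4-bit rotate field. That encoding detail is not stated on the slide and is brought in here as background; the Python names `ror32`, `encode_immediate` and `sub_immediate` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits (the barrel shifter's ROR)."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (imm8, rot) with ror32(imm8, 2 * rot) == value, or None if the
    constant cannot be expressed as a rotated 8-bit immediate."""
    value &= MASK32
    for rot in range(16):
        imm8 = ror32(value, 32 - 2 * rot)   # undo the rotation (rotate left)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


def sub_immediate(rn, imm8, rot):
    """Model SUB rd, rn, #imm: shifter output on the B side, then ALU subtract."""
    operand2 = ror32(imm8, 2 * rot)
    return (rn - operand2) & MASK32
```

The constant 128 encodes as the 8-bit value 128 with a rotation of 0, so it leaves the shifter unchanged; a constant such as 0x101 has no such encoding.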
Internal Organization
• The data register holds data read from or written to memory
• The instruction decoder decodes machine code instructions to produce control signals for the datapath
• In single-cycle data processing instructions, data values are read onto the A-bus and B-bus, and the result from the ALU is written back into the register bank
• The PC value in the address register is incremented and copied back to r15 and the address register – this allows new instructions to be fetched ahead of time (instruction pre-fetch)
Summary
ARM7TDMI
• RISC processor
– Load-store architecture
– Fixed-length instructions and a fixed-time pipelined organization
• Seven operating modes
• Register organization
– 16 GPRs under User mode
– 5 shadow registers under FIQ mode
– 5 SP registers for exception mode stack handling
– 5 LR registers for exception handling
– 5 SPSRs to hold status flag contents
– 1 CPSR to indicate the status of the ALU flags
Thank you. Any questions?