Overview
- ARM: Advanced RISC Machine
- Company: ARM Ltd
- Founded in: November 1990
- Company HQ: Cambridge, UK
- Design Center: Cambridge
- Sales & Support: All over the world
- Best known for: A wide range of processor core designs
- Used in:
- High-end applications involving complex computation
- Handheld devices
- Robotics
- Automation systems
- Consumer electronics
- Website: www.arm.com
Features
- High performance, low power, small in size (ideal for embedded systems)
- Large Register File, Small instruction set, Load-Store instructions
- Fixed length instructions, Conditional execution of instructions
- High code density, most instructions executable in single cycle
- 32-bit in-line barrel shifter, built-in circuit for hardware debugging
- DSP-enhanced instructions, Jazelle (Java bytecode extension, adding a third execution state)
- TrustZone (SoC approach to security)
Applications
ARM Processor Cores
ARM is popular because its cores span a wide range of capability and performance. The figure above shows how the cores have evolved.
ARM7-TDMI-S
- Widely adopted by the cell phone industry from the mid-1990s onwards.
- Foundation for ARM’s early success.
- Still widely available (but ARM no longer licenses the ARM7-TDMI).
Cortex Processors
Cortex M Family
It is intended for use in microcontrollers, where cost is a primary concern.
Cortex R Family
- Provides very high performance and throughput.
- Precise timing properties.
- Predictable interrupt latency.
- Ideal embedded core for deeply embedded, timing-critical applications.
- E.g. engine management systems.
Cortex A Family
- It provides scalable high performance for applications that require a platform operating system, e.g. Linux.
- It incorporates a memory management unit (MMU).
- Extended instruction set.
- Supports Multimedia Processing.
- All processors are available in multi-core designs
- It balances performance & power consumption in real time.
Development of the ARM Architecture
V4T | V5TE | V6 | V7 |
---|---|---|---|
Halfword & signed halfword/byte support | Improved ARM/Thumb interworking | Single Instruction Multiple Data (SIMD) | Thumb-2 |
System mode | Arithmetic saturation | Multi-processing | NEON |
Thumb instruction set | DSP MLA instructions | V6 memory architecture | TrustZone |
 | | Unaligned data support | Virtualization |
ARM7TDMI-S | ARM926EJ-S | ARM1136J(F)-S | Architecture profiles: V7-A (Application), V7-R (Real time), V7-M (Microcontroller) |
ARM Nomenclature
ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S} (Example: ARM7-TDMI-S) | |
---|---|
ARM | Advanced RISC Machine |
x | Series (family) |
y | Memory management/protection unit (MMU/MPU) |
z | Cache memory |
T | Thumb instruction support |
D | Debugger (debugging via JTAG interface) |
M | Multiplier (fast multiplier) |
I | In-Circuit Emulator (ICE) macrocell |
E | Enhanced instructions for DSP-related applications |
J | Jazelle instruction support for Java code execution |
F | Floating-point unit |
S | Synthesizable version |
Other examples: ARM926EJ-S, ARM1136J(F)-S
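The suffix letters in the nomenclature table can be decoded mechanically. The following Python snippet is a minimal sketch, not an official ARM tool: the function name `decode_core_name` and the short feature descriptions are illustrative assumptions, and the digits are returned as one string because splitting them into x, y and z is not always unambiguous.

```python
import re

# Suffix letters and their meanings, taken from the nomenclature table.
SUFFIX_MEANINGS = {
    "T": "Thumb instruction support",
    "D": "Debugger (JTAG interface)",
    "M": "Multiplier",
    "I": "In-Circuit Emulator (ICE) macrocell",
    "E": "Enhanced DSP instructions",
    "J": "Jazelle (Java) support",
    "F": "Floating-point unit",
    "S": "Synthesizable version",
}


def decode_core_name(name):
    """Split a classic ARM core name into its digits and feature letters.

    Hypothetical helper for illustration only. Hyphens and the parentheses
    that mark an optional feature, as in ARM1136J(F)-S, are ignored.
    """
    match = re.fullmatch(r"ARM(\d+)(.*)", name)
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    digits, suffix = match.groups()
    features = {ch: SUFFIX_MEANINGS[ch] for ch in suffix if ch in SUFFIX_MEANINGS}
    return digits, features


if __name__ == "__main__":
    for core in ("ARM7-TDMI-S", "ARM926EJ-S", "ARM1136J(F)-S"):
        digits, features = decode_core_name(core)
        print(core, "-> digits", digits, "features", list(features))
```

Names that do not follow the classic scheme, such as the Cortex cores, raise a `ValueError` in this sketch.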
ARM7 Processor Family
Introduction
- It was introduced in 1994 (ARM7TDMI, ARM7EJ-S, ARM720T).
- The ARM7 family has been immensely successful & has established ARM as the architecture of choice in the digital world.
- Over the years, more than 10 billion ARM7 processor family based devices have powered a variety of cost- & power-sensitive applications.
- Nowadays, newer embedded designs use more recent ARM processors such as the Cortex-M0 & Cortex-M3.
Features of ARM7
- Pipeline Depth: 3 stage (Fetch, Decode, Execute)
- Operating frequency: 80 MHz
- Power Consumption: 0.06 mW/MHz.
- MIPS/MHz: 0.97
- Architecture used: Von-Neumann
- MMU/MPU: Not present
- Cache Memory: Not present
- Jazelle Instruction: Not present
- Thumb Instruction: Yes (16 bit instruction set)
- ARM Instruction set: Yes (32 bit)
- ISA (Instruction Set Architecture): V4T (4th version, with Thumb)
- Interrupt Controller: Not Present
- ISR entry: Non Deterministic ISR entry
- Power Management: No in built Power Management
- Instruction set performance vs. code size: an optimal balance of performance and code size requires interworking between ARM & Thumb code
- Ease of application porting from one device to another: Lack of standardization inhibits application porting
ARM9 Processor Family
Introduction
- This family enables single processor solution for microcontroller, DSP & JAVA applications, offering savings in chip area & complexity, power consumption & time to market
- ARM9 – enhanced processors are well suited for applications requiring a mix of DSP+ Microcontroller performance
- ARM9 family includes – ARM926EJ-S, ARM946E-S, & ARM968E-S processors.
Features of ARM9
- Pipeline Depth: 5 stage (Fetch, Decode, Execute, Memory, Write)
- Operating frequency: 150 MHz
- Power Consumption: 0.19 mW/MHz
- MIPS/MHz: 1.1
- Architecture used: Harvard
- MMU/MPU: Present
- Cache Memory: Present (separate 16k/8k)
- ARM/ Thumb Instruction: Support both
- ISA (Instruction Set Architecture): V5TE (the ARM926EJ-S implements V5TEJ)
- 31 general-purpose registers (32 bits each)
- 32-bit ALU & Barrel Shifter
- Enhanced 32- bit MAC block
- Memory Controller: memory operations are controlled by an MMU or an MPU
- MMU:
- Provides virtual memory support
- Fast Context Switching Extensions
- MPU:
- Enables memory protection & bounding
- Sandboxing of applications
- Flexible cache design (sizes can be 4 KB to 128 KB)
- Flexible core design
- DSP Enhancements: (very important)
- Single-cycle 32x16 multiplier implementation
- Speeds up all the multiply instructions
- New 32x16 & 16x16 multiply instructions
- Allows independent access to the 16-bit halves of registers
- ARM ISA supports 32x32 multiply instructions
- Saturating arithmetic (QADD, QSUB)
- Count Leading Zeros (CLZ) instruction for faster division
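To make the saturating arithmetic and count-leading-zeros items concrete, here is a minimal Python sketch of what QADD, QSUB and CLZ compute on 32-bit values. The function names `qadd`, `qsub` and `clz32` are illustrative; the code models the arithmetic only, not instruction timing or encoding.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def _saturate(value):
    """Clamp to the signed 32-bit range; also report whether clamping occurred."""
    clamped = max(INT32_MIN, min(INT32_MAX, value))
    return clamped, clamped != value


def qadd(a, b):
    """Saturating signed add: the result sticks at the limit instead of wrapping."""
    return _saturate(a + b)


def qsub(a, b):
    """Saturating signed subtract."""
    return _saturate(a - b)


def clz32(value):
    """Count the leading zero bits of a 32-bit value (32 for an input of 0)."""
    return 32 - (value & 0xFFFFFFFF).bit_length()


if __name__ == "__main__":
    print(qadd(INT32_MAX, 1))   # clamps at INT32_MAX, saturation reported
    print(qsub(INT32_MIN, 1))   # clamps at INT32_MIN, saturation reported
    print(clz32(0x00010000))    # leading zeros above bit 16
```

The second element of each result plays the role of the Q (saturation) flag: it is true only when the exact result did not fit in 32 bits. A division routine can use the leading-zero count to normalize its operands in one step rather than shifting bit by bit.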
Applications of ARM9
- Consumer type: Smart phones, PDA, Set-Top box, Electronics Toys, Digital Cameras, etc.
- Networking type: Wireless LAN, 802.11, Bluetooth, etc.
- Automotive: Power train, ABS, navigation, etc.
- Embedded: USB controllers, Bluetooth controllers, medical scanners, etc.
- Storage: HDD controllers, solid-state drives, etc.
ARM11 Processors Family
Introduction
- This family provides the engine that powers many smartphones and is also widely used in consumer, home & embedded applications.
- It delivers low power & a range of performance from 350MHz to 1GHz.
- ARM11 processor software is compatible with all previous generations of ARM processors.
- It introduces 32-bit SIMD for media processing
- Physically tagged caches to improve OS context switch performance.
- TrustZone for hardware-enforced security.
- Tightly coupled memories for real-time applications.
- The ARM11 family includes the ARM1176JZ(F)-S, ARM11 MPCore, ARM1136J(F)-S & ARM1156T2-S processors.
Features of ARM11
- Pipeline Depth: 8 stage
- Operating frequency: 335 MHz
- Power Consumption: 0.4 mW/MHz
- MIPS/MHz: 1.2
- Architecture used: Harvard
- MMU/MPU: Present
- Multiplier unit: 16x32 (16 bits of 32-bit size register)
- Cache Memory: present (4-64k size)
- ISA (Instruction Set Architecture): V6
- Enhanced multiply instruction & saturation
- Powerful ARMV6 instruction set architecture
- Supports the Thumb instruction set, which reduces memory bandwidth & size requirements by up to 35%
- Supports Jazelle Technology for efficient embedded JAVA execution
- Supports the DSP extensions
- SIMD media processing extensions deliver up to 2x performance for video processing
- ARM Trust-Zone Technology for on chip security
- Thumb-2 technology for enhanced performance, energy efficiency & code density
- Low power consumption
- High performance integer processor
- Vectored interrupt interface & low-interrupt latency mode speeds up interrupt response & real time performance
- Optional vector floating point co-processor for automotive/ industrial controls & 3D graphics acceleration
ARM7 Programmer's Model or Register Model
Diagram
Explanation
- In total: 17 (visible) + 20 (banked registers) = 37 registers.
- The active registers available in User mode are shown below.
- User mode is a protected mode that is normally used while executing applications.
- 16 data registers & one status register
- r0 to r13 are orthogonal general-purpose registers.
- Orthogonal means that any instruction that you can apply to r0 can equally be applied to any of the other registers.
- E.g. ADD r0, r1, r2
- ADD r5, r6, r7
- r13 (SP) is the stack pointer and stores the address of the top of the stack in the current processor mode.
- r14 (LR) is the link register, where the core puts the return address when executing a subroutine call.
- r15 (PC) is the program counter and stores the address of the next instruction to be executed.
- In ARM state, all instructions are 32 bits wide.
- In Thumb state, all instructions are 16 bits wide.
- In ARM state, instructions must be four-byte aligned in memory, which implies that the bottom two bits of the PC are always zero (e.g. memory locations 1000H, 1004H, 1008H).
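The count of 17 visible plus 20 banked registers can be tallied explicitly. The Python sketch below assumes the standard ARM7 banking scheme (FIQ banks r8 to r14, the other exception modes bank only r13 and r14, and every exception mode has its own SPSR). That breakdown is not spelled out in the text above, so treat it as background knowledge rather than something taken from the diagram.

```python
# Registers visible in User/System mode: r0-r15 plus the CPSR.
VISIBLE = [f"r{n}" for n in range(16)] + ["cpsr"]

# Banked registers per exception mode (assumed standard ARM7 banking).
BANKED = {
    "fiq": [f"r{n}_fiq" for n in range(8, 15)] + ["spsr_fiq"],
    "irq": ["r13_irq", "r14_irq", "spsr_irq"],
    "svc": ["r13_svc", "r14_svc", "spsr_svc"],
    "abt": ["r13_abt", "r14_abt", "spsr_abt"],
    "und": ["r13_und", "r14_und", "spsr_und"],
}

banked_count = sum(len(registers) for registers in BANKED.values())
total = len(VISIBLE) + banked_count

print(f"{len(VISIBLE)} visible + {banked_count} banked = {total}")
```

FIQ mode contributes 8 of the 20 banked registers. Having private copies of r8 to r12 means a fast interrupt handler does not need to save and restore those registers.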
CPSR: Current Processor Status Register
About CPSR
- The ARM core uses the CPSR to monitor & control internal operations.
- The unused part is reserved for future expansion.
- The CPSR is divided into four fields, each 8 bits wide: flags, status, extension, and control.
- In current designs the status & extension fields are reserved for future use.
- Some ARM processor cores have extra bits allocated, such as the J bit (available only on Jazelle-enabled processors, which can execute 8-bit instructions).
CPSR Diagram
Flag bit | Set when |
---|---|
N - Negative | In signed operations, the MSB of the result is 1, indicating that the result is negative |
Z - Zero | The result of the operation is zero |
C - Carry | The result causes an unsigned carry (carry out of the MSB) |
V - Overflow | The result causes a signed overflow |
Q - Saturation | The result causes an overflow or saturation |
I - Interrupt request disable | If set, the interrupt request channel is disabled |
F - Fast interrupt request disable | If set, the fast interrupt request channel is disabled |
J - Jazelle instruction set | If set, the processor executes Jazelle instructions |
T - Thumb instruction set | If set, the processor executes the Thumb instruction set |
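The flag definitions in the table can be exercised with a short Python sketch. The bit positions used here (N=31, Z=30, C=29, V=28, Q=27, J=24, I=7, F=6, T=5, mode in bits 4:0) come from the ARM architecture, not from the table itself, and the helper names `decode_cpsr` and `add_with_flags` are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

# Bit positions of the CPSR fields (architectural values, not from the table).
CPSR_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "Q": 27,
             "J": 24, "I": 7, "F": 6, "T": 5}


def decode_cpsr(value):
    """Return each flag/control bit as a bool, plus the 5-bit mode field."""
    fields = {name: bool((value >> bit) & 1) for name, bit in CPSR_BITS.items()}
    fields["mode_bits"] = value & 0x1F
    return fields


def add_with_flags(a, b):
    """32-bit ADD that also reports the N, Z, C and V condition flags."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    # Signed overflow: operands share a sign, but the result's sign differs.
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    flags = {
        "N": bool(result >> 31),   # MSB of the result
        "Z": result == 0,          # result is zero
        "C": full > MASK32,        # carry out of bit 31
        "V": bool(overflow),
    }
    return result, flags


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
    print(add_with_flags(0x7FFFFFFF, 1))
    print(add_with_flags(0xFFFFFFFF, 1))
```

Adding 1 to 0x7FFFFFFF sets N and V (signed overflow into the sign bit) but not C, whereas adding 1 to 0xFFFFFFFF sets Z and C but not V. That is the difference between the carry and overflow rows of the table.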
SPSR: Saved Program Status Register
Suppose the processor is in User mode and an IRQ request arrives. The processor has to switch to IRQ mode, but after servicing the interrupt it should return to User mode and resume its work.
So the current processor status is copied from the CPSR into the SPSR, from where it can be restored on return.
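The save-and-restore sequence described above can be sketched as a tiny Python model. It is a simplification under stated assumptions: the class `Core` and its method names are hypothetical, the mode encodings and the I and T bit positions are the architectural ones, and the link-register offset that real hardware applies to the return address is deliberately ignored.

```python
MODE_MASK = 0x1F
MODE_USER = 0b10000
MODE_IRQ = 0b10010
I_BIT = 1 << 7   # IRQ disable
T_BIT = 1 << 5   # Thumb state


class Core:
    """Toy model of CPSR/SPSR handling on IRQ entry and return."""

    def __init__(self, cpsr=MODE_USER):
        self.cpsr = cpsr
        self.spsr_irq = None   # banked SPSR for IRQ mode
        self.lr_irq = None     # banked link register for IRQ mode

    def enter_irq(self, return_address):
        # 1. Save the interrupted program's status.
        self.spsr_irq = self.cpsr
        self.lr_irq = return_address
        # 2. Switch to IRQ mode in ARM state with further IRQs disabled.
        self.cpsr = (self.cpsr & ~MODE_MASK & ~T_BIT) | MODE_IRQ | I_BIT

    def return_from_irq(self):
        # Restore the saved status and resume where the program left off.
        self.cpsr = self.spsr_irq
        return self.lr_irq


if __name__ == "__main__":
    core = Core(cpsr=MODE_USER | (1 << 30))   # User mode with the Z flag set
    core.enter_irq(return_address=0x1008)
    print(hex(core.cpsr))                     # IRQ mode, I bit set
    print(hex(core.return_from_irq()), hex(core.cpsr))
```

Because each exception mode has its own SPSR, a handler can change the flags freely without disturbing the state of the interrupted program.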
ARM and RISC design Philosophy
RISC PROCESSORS
RISC is a design philosophy aimed at delivering a simple but powerful instruction set in which each instruction executes within a single cycle at a high clock speed.
- RISC is an acronym for Reduced Instruction Set Computers
- CISC – Complex Instruction Set Computer
The RISC Design Philosophy
- CISC and RISC differ in the complexity of their instruction sets: CISC instruction sets are more complex than RISC ones.
- RISC concentrates on reducing the complexity of the instructions performed by the hardware, because it is easier to provide greater flexibility and intelligence in software than in hardware.
- The smaller instruction set allows a designer to implement a hardwired control unit, which runs at a higher clock rate than an equivalent microsequenced control unit.
The RISC Philosophy (Four major Rules)
Rule 1. Instructions
- Reduced number of instruction classes to provide simple operations that can each execute in a single cycle.
- Each instruction is a fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. (Unlike CISC)
Rule 2. Pipelines
- The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines.
- Ideally, the pipeline advances by one step on each cycle for maximum throughput. Instructions can be decoded in one pipeline stage.
Rule 3. Registers
- RISC machines have a large general-purpose register set.
- Any register can contain either data or an address.
(CISC: Have dedicated registers for specific purposes)
Rule 4. Load-Store Architecture
- The processor operates on data held in registers.
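- Separate load and store instructions transfer data between the register bank and external memory. Because memory accesses are costly, keeping them separate from data processing lets values held in registers be reused without further memory accesses.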
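As an illustration of Rule 4, the following Python sketch models a load-store machine in miniature. The names `ldr`, `str_` and `add`, the register array and the memory contents are all hypothetical. The only point being shown is that arithmetic touches registers alone, so adding two values held in memory takes two loads, one add and one store.

```python
# Toy memory (address -> 32-bit word) and a 16-entry register bank.
memory = {0x1000: 5, 0x1004: 7, 0x1008: 0}
regs = [0] * 16


def ldr(rd, address):
    """Load: copy a word from memory into register rd."""
    regs[rd] = memory[address]


def str_(rd, address):
    """Store: copy register rd out to memory."""
    memory[address] = regs[rd]


def add(rd, rn, rm):
    """Data processing: operates on registers only, never on memory."""
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF


# memory[0x1008] = memory[0x1000] + memory[0x1004]
ldr(0, 0x1000)
ldr(1, 0x1004)
add(2, 0, 1)
str_(2, 0x1008)

print(memory[0x1008])
```

Once the operands are in r0 and r1 they can be reused by later instructions at no additional memory cost, which is the advantage this rule is after.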
RISC vs CISC
RISC | CISC |
---|---|
Simple instructions taking one cycle. | Complex instructions may take one or more clock cycles. |
Large symmetric register file. | Few registers to store data. |
Fewer instructions to access memory. | More instructions to access memory. |
Few addressing modes. | More addressing modes. |
Instruction decoder is simple. Hardwired logic is used for the decoder. | Instruction decoder is complex. The decoder uses a ROM containing microcode. |
Supports pipelining, i.e. overlapping of fetch, decode, and execute. | Does not support pipelining. |
Fixed instruction size. | Variable instruction size. |
Core takes less chip area, so more space for cache and MMU. | More chip area is taken by the core CPU. |
Complexity in software. Compiler design is difficult. | Complexity in hardware. Emphasis is on hardware. |
Higher clock rates, so faster. | Lower clock rates, so comparatively slower. |
Cache memory is present. | Cache memory is absent, or a unified cache is present. |
ARM7 Fundamentals
- All ARM instructions are 32 bits long & stored word aligned.
- The ARM processor, like all RISC processors, is a load-store architecture; the ARM7 uses a Von Neumann architecture (same memory for program + data).
- ARM has two special instruction types for transferring data into & out of the processor.
- Load instruction = copy data from memory to registers in the core.
- (Registers in the processor core <---- Memory)
- Store instruction = copy data from registers to memory.
- (Registers in the processor core ----> Memory)
- There are no data processing instructions that directly manipulate data in memory (hence data processing is carried out only in registers).
- The ARM core is a 32-bit processor; most instructions treat the registers as holding signed or unsigned 32-bit values.
- Data types
- Word: 32-bit, Halfword: 16-bit, Byte: 8-bit
- Memory is byte addressable and can hold 2^32 bytes (= 4 GB)
- Word/halfword/byte size data are placed at word/halfword/byte aligned addresses.
- 32-bit ARM instructions are placed at word-aligned addresses.
- Byte order (endian format)
- Word/halfword size data can be stored/retrieved in big-endian or little-endian format.
- Big endian: the most significant byte (MSB) of word/halfword data is stored at the lowest address, and the data is addressed by the address of the MSB.
- Little endian: the least significant byte (LSB) of word/halfword data is stored at the lowest address, and the data is addressed by the address of the LSB.
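The two byte orders are easy to see with Python's standard `struct` module. This is only an illustration of the memory layout. The example value 0x12345678 is arbitrary, and the list index stands in for increasing memory addresses.

```python
import struct

value = 0x12345678

# ">I" packs an unsigned 32-bit word big-endian, "<I" little-endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

# Index 0 corresponds to the lowest memory address.
print("big endian:   ", [hex(b) for b in big])
print("little endian:", [hex(b) for b in little])

# Reading the same four bytes with the wrong byte order scrambles the word.
misread = struct.unpack("<I", big)[0]
print(hex(misread))
```

In the big-endian layout the most significant byte 0x12 sits at the lowest address, and in the little-endian layout the least significant byte 0x78 does.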
Advanced Microcontroller Bus Architecture (AMBA)
- The bus system connects memory, controllers and peripherals in an ARM processor based microcontroller to the ARM core.
- AMBA is a bus protocol standard, adopted as the on-chip bus by many microcontrollers.
- ARM core is bus master, peripherals are slaves
Three Buses within AMBA spec
- AHB (Advanced High-performance Bus)
- Provides high band-width.
- Supports multiple masters, slaves (e.g. of masters: DMA, Test interface, DSP, and e.g. of slaves: external memory).
- Includes bus arbiter, decoder
- Used in complex and more sophisticated systems
- ASB (Advanced System Bus)
- AHB and ASB have many things in common
- Both support bursting, pipelining, split transaction
- ASB is used in simple cost effective designs
- APB (Advanced Peripheral Bus)
- Simple, low-speed, low-power bus for peripherals such as UARTs
- Implemented with simple tri-stated data bus
- AHB-APB bridge: buffers data & operations between the two
ARM Core Data Flow Model
Definition
The data flow model describes how an instruction, once decoded inside the ARM core, is executed by interacting with the internal register file, and how the result is written back to the registers.
Features
- Von Neumann architecture: items arriving over the bus are either instructions or data (same memory).
- The sign-extend hardware converts signed 8-bit & 16-bit numbers to 32-bit values as they are read from memory & placed in a register; unsigned values are zero-filled instead.
- Source operands (Rn & Rm) are read from the register file using the internal buses A & B respectively, & the result Rd is written back.
- The PC value is in the address register, which is fed into the incrementer; the incremented value is then copied back into r15.
- It is also written into the address register to be used as the address for the next instruction fetch.
- ALU: the arithmetic & logic unit (ALU) or the multiply & accumulate unit (MAC) takes the register values Rn & Rm from the A & B buses & computes a result.
- Data processing instructions write the result in Rd directly to the register file.
- Load & store instructions use the ALU to generate an address, which is held in the address register & broadcast on the address bus.
- Barrel shifter:
- One important feature of the ARM is that register Rm can alternatively be preprocessed in the barrel shifter before it enters the ALU (left shift, right shift, rotate, etc.).
- Depending on the instruction, the barrel shifter may be used or bypassed.
- Together, the barrel shifter & ALU can calculate a wide range of expressions & addresses in the same cycle.
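Two of the blocks in this data path, the sign-extend hardware and the barrel shifter feeding the ALU, can be modelled in a few lines of Python. This is a behavioural sketch only: the function names are illustrative, values are treated as unsigned 32-bit integers, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Extend a signed `bits`-wide value (8 or 16) to 32 bits."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign_bit) - sign_bit) & MASK32


def lsl(value, amount):
    """Logical shift left."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right (zero fill)."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right (sign fill)."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32


def ror(value, amount):
    """Rotate right."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add_shifted(rn, rm, shift, amount):
    """Model of ADD Rd, Rn, Rm, <shift> #amount: Rm passes through the
    barrel shifter on the B bus before reaching the ALU."""
    return (rn + shift(rm & MASK32, amount)) & MASK32


if __name__ == "__main__":
    print(hex(sign_extend(0xFF, 8)))      # signed byte -1 as a 32-bit word
    print(add_shifted(10, 3, lsl, 2))     # 10 + (3 << 2)
```

`add_shifted(rn, rm, lsl, 2)` computes rn + 4*rm in one step, which is how an array element address (base plus index times four) can be formed by a single instruction.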
ARM7 Processor Modes
Definition: The processor mode determines which registers are active & the access rights to the CPSR register itself.
- Privileged modes: allow full read-write access to the CPSR.
- Non-privileged mode: allows only read access to the control field of the CPSR, but read-write access to the condition flags.
Mode | When does the ARM enter this mode? |
---|---|
Abort | A failed attempt to access memory. |
Fast interrupt request | An interrupt request arrives through the FIQ channel (input). |
Interrupt request | An interrupt request arrives through the IRQ channel (input). |
Supervisor | After reset. It is generally the mode that an OS kernel operates in. |
System | A special version of User mode that allows full read-write access to the CPSR. |
Undefined | The processor encounters an instruction that is undefined or not supported by the implementation. |
User | Used for programs & applications. |
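The current mode is encoded in the low five bits of the CPSR. The Python sketch below lists the encodings and marks which modes are privileged. The bit patterns are the architectural values and are not given in the table above; the helper name `describe_mode` is an assumption made for illustration.

```python
# Mode field M[4:0] of the CPSR -> (mode name, privileged?)
MODES = {
    0b10000: ("User", False),
    0b10001: ("Fast interrupt request (FIQ)", True),
    0b10010: ("Interrupt request (IRQ)", True),
    0b10011: ("Supervisor", True),
    0b10111: ("Abort", True),
    0b11011: ("Undefined", True),
    0b11111: ("System", True),
}


def describe_mode(cpsr):
    """Return (mode name, privileged flag) for a CPSR value."""
    return MODES[cpsr & 0x1F]


if __name__ == "__main__":
    for pattern, (name, privileged) in MODES.items():
        access = "full CPSR access" if privileged else "flags only"
        print(f"{pattern:05b}  {name:30s} {access}")
```

User mode is the only non-privileged entry, which matches the table: System mode uses the same registers as User mode but keeps full access to the CPSR.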
Tiva TM4C123G
Description
- Texas Instruments' Tiva™ C Series microcontrollers offer an 80 MHz Cortex-M core with FPU, a variety of integrated memories, and multiple programmable GPIO.
- All members of the Tiva™ C Series, including the TM4C123G microcontroller, are designed around an ARM Cortex-M processor core.
- The ARM Cortex-M processor provides the core for a high-performance, low-cost platform that meets the needs of minimal memory implementation, reduced pin count, and low power consumption while delivering outstanding computational performance and exceptional response to interrupts.
Block Diagram
Fig. Block diagram of TI’s Tiva C Series TM4C123x microcontrollers
Features
- Core: ARM Cortex-M4 processor core
- Performance: 80-MHz operation; 100 DMIPS
- Flash: 256 KB single-cycle Flash memory
- System SRAM: 32 KB single-cycle SRAM
- EEPROM: 2 KB
- Communication interfaces:
- Universal Asynchronous Receiver/Transmitter (UART): eight UARTs
- Synchronous Serial Interface (SSI): four SSI modules
- Inter-Integrated Circuit (I2C): four I2C modules with four transmission speeds, including high-speed mode
- Controller Area Network (CAN): two CAN 2.0 A/B controllers
- Universal Serial Bus (USB): USB 2.0 OTG/Host/Device
- General-Purpose Timer (GPTM): six 16/32-bit GPTM blocks
- Watchdog Timer (WDT): two watchdog timers
- General-Purpose Input/Output (GPIO): six physical GPIO blocks
- Pulse Width Modulator (PWM): two PWM modules, each with four PWM generator blocks and a control block, for a total of 16 PWM outputs
- Analog-to-Digital Converter (ADC): two 12-bit ADC modules, each with a maximum sample rate of one million samples/second
- Analog Comparator Controller: two independent integrated analog comparators
- JTAG and Serial Wire Debug (SWD): one JTAG module with integrated ARM SWD
- Thumb-2 mixed 16-/32-bit instruction set delivers the high performance expected of a 32-bit ARM core in a compact memory size
- Single-cycle multiply instruction and hardware divide
- IEEE754-compliant single-precision Floating-Point Unit (FPU)
- Fast code execution permits slower processor clock or increases sleep mode time
- Harvard architecture characterized by separate buses for instruction and data
- Deterministic, high-performance interrupt handling for time-critical applications
- Memory protection unit (MPU) to provide a privileged mode for protected operating system functionality
- Enhanced system debug with extensive breakpoint and trace capabilities
- Serial Wire Debug and Serial Wire Trace reduce the number of pins required for debugging and tracing.
- Migration from the ARM7™ processor family for better performance and power efficiency
Applications
Tiva™ C Series microcontrollers are the leading choice in high-performance 32-bit applications.
The product family is positioned for cost-conscious applications requiring significant control processing and connectivity capabilities such as:
- Low power, hand-held smart devices
- Gaming equipment
- Home and commercial site monitoring and control
- Motion control
- Medical instrumentation
- Test and measurement equipment
- Smart Energy/Smart Grid solutions
- Factory automation
- Fire and security
- Intelligent lighting control