Monday, February 25, 2019

ARM Processors

Overview

  • ARM: Advanced RISC Machine (originally Acorn RISC Machine)
  • Company: ARM Ltd
  • Founded in: November 1990
  • Company HQ: Cambridge, UK
  • Design Center: Cambridge 
  • Sales & Support: All over the world
  • Best known for: A wide range of processor core designs
  • Used in:
    • High-end applications involving complex computation
    • Hand-held devices
    • Robotics
    • Automation systems
    • Consumer electronics
  • Website: www.arm.com

Features

  • High performance, low power, small die size (ideal for embedded systems)
  • Large register file, small instruction set, load-store architecture
  • Fixed-length instructions, conditional execution of instructions
  • High code density, most instructions execute in a single cycle
  • 32-bit in-line barrel shifter, built-in circuitry for hardware debugging
  • DSP-enhanced instructions, Jazelle (Java bytecode execution as a third processor state)
  • TrustZone (a system-on-chip approach to security)

Applications

  • Hard Disk Drives
  • Printers 
  • Tele parking
  • Utility Motors 
  • Digital Sensors
  • Smart Meters
  • Digital exercise machines
  • Energy-efficient applications
  • Gaming devices
  • Washing machines
  • Digital Television
  • Many more...

ARM Processor Cores

Figure: Evolution of ARM processor cores
ARM is popular because its cores cover a wide range of capability, functionality and performance. The evolution of these cores is shown in the figure above.

ARM7-TDMI-S

  • Widely adopted by the mobile phone industry from the mid-1990s onwards.
  • Foundation for ARM’s early success.
  • Still widely available (but ARM no longer licenses the ARM7TDMI).

Cortex Processors

Cortex M Family

Intended for microcontrollers where cost is at a premium.

Cortex R Family

  • Provides very high performance and throughput.
  • Precise timing properties.
  • Predictable interrupt latency.
  • Ideal core for deeply embedded, timing-critical applications.
  • e.g. engine management systems.

Cortex A Family

  • Provides scalable high performance for applications that run a platform operating system, e.g. Linux.
  • Incorporates a memory management unit (MMU).
  • Extended instruction set.
  • Supports multimedia processing.
  • Many processors in the family are available in multi-core designs.
  • Balances performance and power consumption in real time.

Development of the ARM Architecture

Figure: Development of the ARM architecture

  • V4T: Halfword and signed halfword/byte support, System mode, Thumb instruction set. Example core: ARM7TDMI-S
  • V5TE: Improved ARM/Thumb interworking, arithmetic saturation, DSP MLA (multiply-accumulate) instructions. Example core: ARM926EJ-S
  • V6: Single Instruction Multiple Data (SIMD), multi-processing, V6 memory architecture, unaligned data support. Example core: ARM1136J(F)-S
  • V7: Thumb-2, NEON, TrustZone, virtualization. Architecture profiles: V7-A (Application), V7-R (Real-time), V7-M (Microcontroller)

ARM Nomenclature

A R M {x}{y}{z} T D M I E J F S (Example: ARM7TDMI-S)
  • ARM: Advanced RISC Machine
  • x: Series (family)
  • y: Memory management unit (MMU/MPU)
  • z: Cache memory
  • T: Thumb instruction support
  • D: Debugger (debugging via JTAG interface)
  • M: Multiplier
  • I: In-Circuit Emulator (ICE) macrocell
  • E: Enhanced instructions for DSP-related applications
  • J: Jazelle instruction support for Java code execution
  • F: Floating-point unit
  • S: Synthesizable version
e.g. ARM7TDMI, ARM926EJ-S, ARM1136J(F)-S
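As a quick illustration of the naming scheme above, here is a minimal C sketch that prints the meaning of each suffix letter in a core name. The function names `suffix_meaning` and `describe_core` are hypothetical, and only the letters listed in the nomenclature are decoded; any other letter (for example the Z in ARM1176JZ(F)-S) is silently skipped.

```c
#include <stdio.h>
#include <string.h>

/* Map one suffix letter of an ARM core name to its meaning, following the
   nomenclature list. Returns NULL for letters the list does not cover. */
const char *suffix_meaning(char letter)
{
    switch (letter) {
    case 'T': return "Thumb instruction support";
    case 'D': return "Debugger (JTAG interface)";
    case 'M': return "Multiplier";
    case 'I': return "In-Circuit Emulator (ICE) macrocell";
    case 'E': return "Enhanced DSP instructions";
    case 'J': return "Jazelle (Java bytecode) support";
    case 'F': return "Floating-point unit";
    case 'S': return "Synthesizable version";
    default:  return NULL;
    }
}

/* Print the meaning of every recognised suffix letter in a name such as
   "ARM7TDMI-S". The "ARM" prefix is skipped; digits, '-', '(' and ')'
   are ignored. Returns the number of letters decoded. */
int describe_core(const char *name)
{
    int decoded = 0;
    const char *p = name;
    if (strncmp(p, "ARM", 3) == 0)
        p += 3;                          /* do not decode the prefix itself */
    for (; *p != '\0'; ++p) {
        const char *meaning = suffix_meaning(*p);
        if (meaning != NULL) {
            printf("%c: %s\n", *p, meaning);
            ++decoded;
        }
    }
    return decoded;
}
```

For example, `describe_core("ARM7TDMI-S")` prints five lines (T, D, M, I and S).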

ARM7 Processor Family

Introduction

  • Introduced in 1994 (family members include the ARM7TDMI, ARM7EJ-S and ARM720T).
  • The ARM7 family has been immensely successful and established ARM as the architecture of choice in the digital world.
  • Over the years, more than 10 billion devices based on the ARM7 processor family have powered a variety of cost- and power-sensitive applications.
  • Nowadays, newer embedded designs use the latest ARM processors such as the Cortex-M0 and Cortex-M3.
Note: The ARM7 processor family (ARM7TDMI) is not recommended for new designs.

Features of ARM7

  1. Pipeline depth: 3-stage (Fetch, Decode, Execute)
  2. Operating frequency: 80 MHz
  3. Power consumption: 0.06 mW/MHz
  4. MIPS/MHz: 0.97
  5. Architecture used: von Neumann
  6. MMU/MPU: Not present
  7. Cache memory: Not present
  8. Jazelle instructions: Not present
  9. Thumb instruction set: Yes (16-bit instruction set)
  10. ARM instruction set: Yes (32-bit)
  11. ISA (Instruction Set Architecture): V4T (version 4)
  12. Interrupt controller: Not present
  13. ISR entry: Non-deterministic ISR entry
  14. Power management: No built-in power management
  15. Instruction set performance vs. code size: An optimal balance between performance and code size requires interworking between ARM and Thumb code
  16. Ease of porting applications from one device to another: Lack of standardization inhibits application porting
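The pipeline depths listed for ARM7 (3 stages) and ARM9 (5 stages) can be made concrete with a small model. This C sketch assumes an ideal pipeline with no stalls, branches or memory wait states, which real cores do not achieve; under that assumption n instructions finish in n + depth - 1 cycles. All names are hypothetical.

```c
#include <stdio.h>

/* Ideal (stall-free) pipeline: instruction i enters stage 0 in cycle i and
   advances one stage per cycle. */

/* Stage occupied by instruction `instr` during `cycle` (both 0-based),
   or -1 if that instruction is not in the pipeline in that cycle. */
int stage_of(int instr, int cycle, int depth)
{
    int stage = cycle - instr;
    return (stage >= 0 && stage < depth) ? stage : -1;
}

/* Cycles to complete n instructions on a depth-stage pipeline: the first
   finishes after `depth` cycles, then one more completes every cycle. */
int total_cycles(int n, int depth)
{
    return n > 0 ? n + depth - 1 : 0;
}

/* Print a cycle-by-cycle table for a 3-stage (ARM7-style) pipeline. */
void print_arm7_pipeline(int n)
{
    static const char *names[] = { "Fetch", "Decode", "Execute" };
    for (int cycle = 0; cycle < total_cycles(n, 3); ++cycle) {
        printf("cycle %d:", cycle + 1);
        for (int i = 0; i < n; ++i) {
            int s = stage_of(i, cycle, 3);
            if (s >= 0)
                printf("  I%d=%s", i + 1, names[s]);
        }
        printf("\n");
    }
}
```

For example, `print_arm7_pipeline(3)` prints five cycles; in cycle 3 the first instruction executes while the second is decoded and the third is fetched.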

ARM9 Processor Family

Introduction

  • This family enables a single-processor solution for microcontroller, DSP and Java applications, offering savings in chip area, complexity, power consumption and time to market.
  • ARM9E (enhanced) processors are well suited to applications requiring a mix of DSP and microcontroller performance.
  • The ARM9 family includes the ARM926EJ-S, ARM946E-S and ARM968E-S processors.

Features of ARM9

  1. Pipeline Depth: 5-stage (Fetch, Decode, Execute, Memory, Write)
  2. Operating frequency: 150 MHz
  3. Power Consumption: 0.19 mW/MHz
  4. MIPS/MHz: 1.1
  5. Architecture used: Harvard
  6. MMU/MPU: Present
  7. Cache Memory: Present (separate instruction and data caches, e.g. 16 KB/8 KB)
  8. ARM/Thumb instructions: Both supported
  9. ISA (Instruction Set Architecture): V5TEJ (ARM926EJ-S)
  10. 31 general-purpose 32-bit registers
  11. 32-bit ALU and barrel shifter
  12. Enhanced 32-bit MAC block
  13. Memory Controller
    Memory operations are controlled by MMU or MPU
    1. MMU:
      • Provides Virtual Memory Support
      • Fast Context Switching Extensions
    2. MPU:
      • Enables memory protection & bounding 
      • Sandboxing of applications
  14. Flexible Cache Design (sizes can be 4KB to 128KB)
  15. Flexible Core Design
  16. DSP enhancements (very important):
    • Single-cycle 32x16 multiplier implementation
    • Speeds up all multiply instructions
    • New 32x16 and 16x16 multiply instructions
    • Independent access to the 16-bit halves of registers
    • The ARM ISA supports 32x32 multiply instructions
    • Saturating arithmetic (QADD, QSUB)
    • Count Leading Zeros (CLZ) instruction for faster division
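To show what the saturating and count-leading-zeros instructions compute, here is a C sketch of their results. It models only the arithmetic, not the instruction encodings; `qadd`, `qsub` and `clz32` are hypothetical names, and the `q` argument stands in for the sticky Q flag in the CPSR (it is set on saturation and never cleared).

```c
#include <stdint.h>

/* Signed saturating add (QADD-like): clamp to INT32_MAX/INT32_MIN instead
   of wrapping around, and set *q when saturation happens. */
int32_t qadd(int32_t a, int32_t b, int *q)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact in 64 bits */
    if (sum > INT32_MAX) { *q = 1; return INT32_MAX; }
    if (sum < INT32_MIN) { *q = 1; return INT32_MIN; }
    return (int32_t)sum;
}

/* Signed saturating subtract (QSUB-like). */
int32_t qsub(int32_t a, int32_t b, int *q)
{
    int64_t diff = (int64_t)a - (int64_t)b;
    if (diff > INT32_MAX) { *q = 1; return INT32_MAX; }
    if (diff < INT32_MIN) { *q = 1; return INT32_MIN; }
    return (int32_t)diff;
}

/* Count leading zeros (CLZ-like): number of 0 bits above the most
   significant 1 bit. CLZ of 0 is 32. Division routines use it to
   normalise (line up) the divisor with the dividend. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}
```

Saturation is what DSP code usually wants: an audio sample that overflows is clipped to full scale instead of wrapping to a large value of the opposite sign.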

Applications of ARM9

  1. Consumer: Smartphones, PDAs, set-top boxes, electronic toys, digital cameras, etc.
  2. Networking: Wireless LAN (802.11), Bluetooth, etc.
  3. Automotive: Power train, ABS, navigation, etc.
  4. Embedded: USB controllers, Bluetooth controllers, medical scanners, etc.
  5. Storage: HDD controllers, solid-state drives, etc.

ARM11 Processor Family

Introduction

  • This family provides the engine that powers many smartphones and is also widely used in consumer, home and embedded applications.
  • It delivers low power and a range of performance from 350 MHz to 1 GHz.
  • ARM11 processors are software-compatible with all previous generations of ARM processors.
  • It introduces:
    • 32-bit SIMD for media processing.
    • Physically tagged caches to improve OS context-switch performance.
    • TrustZone for hardware-enforced security.
    • Tightly coupled memories for real-time applications.
  • The ARM11 family includes the ARM1136J(F)-S, ARM1156T2-S, ARM1176JZ(F)-S and ARM11 MPCore processors.

Features of ARM11

  1. Pipeline depth: 8-stage
  2. Operating frequency: 335 MHz
  3. Power consumption: 0.4 mW/MHz
  4. MIPS/MHz: 1.2
  5. Architecture used: Harvard
  6. MMU/MPU: Present
  7. Multiplier unit: 16x32 (16 bits of a 32-bit register)
  8. Cache memory: Present (4 KB to 64 KB)
  9. ISA (Instruction Set Architecture): V6
  10. Enhanced multiply instructions and saturation
  11. Powerful ARMv6 instruction set architecture
  12. Supports the Thumb instruction set, reducing memory bandwidth and size requirements by up to 35%
  13. Supports Jazelle technology for efficient embedded Java execution
  14. Supports the DSP extensions
  15. SIMD media processing extensions deliver up to 2x performance for video processing
  16. ARM TrustZone technology for on-chip security
  17. Thumb-2 technology for enhanced performance, energy efficiency and code density
  18. Low power consumption
  19. High-performance integer processor
  20. Vectored interrupt interface and low-interrupt-latency mode speed up interrupt response and real-time performance
  21. Optional vector floating-point coprocessor for automotive/industrial controls and 3D graphics acceleration
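The SIMD media extensions operate on several narrow values packed into one 32-bit register. The C sketch below models a four-lane unsigned saturating byte add, the kind of operation ARMv6 provides as UQADD8, for example to brighten four 8-bit pixels at once. It is a behavioural model only; the function name is hypothetical and no flags are modelled.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes independently. Each lane clamps at 0xFF
   instead of carrying into its neighbour, as a saturating SIMD add does. */
uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        unsigned shift = 8u * lane;
        unsigned x = (a >> shift) & 0xFFu;
        unsigned y = (b >> shift) & 0xFFu;
        unsigned s = x + y;
        if (s > 0xFFu)
            s = 0xFFu;                      /* saturate this lane only */
        result |= (uint32_t)s << shift;
    }
    return result;
}
```

A plain 32-bit ADD of 0x01FF7F80 and 0x01010101 would let the 0xFF lane carry into its neighbour; the lane-wise version keeps the lanes separate and returns 0x02FF8081.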

ARM7 Programmer's Model or Register Model

Diagram

Figure: ARM7 programmer's (register) model

Explanation

  • In total there are 37 registers: 17 visible + 20 banked registers.
  • The figure shows the active registers available in user mode.
  • User mode is the unprivileged mode normally used when executing applications.
  • 16 data registers (r0 to r15) and one status register (CPSR).
  • r0 to r13 are orthogonal general-purpose registers.
  • Orthogonal means that any instruction you can apply to r0 can equally be applied to any of the other registers.
    • e.g. ADD r0, r1, r2
    • ADD r5, r6, r7
  • R13 (SP), the stack pointer, stores the top of the stack in the current processor mode.
  • R14 (LR), the link register, is where the core puts the return address when it calls a subroutine.
  • R15 (PC), the program counter, stores the address of the next instruction to be executed.
  • In ARM state, all ARM instructions are 32 bits wide.
  • In Thumb state, all instructions are 16 bits wide.
  • In ARM state, instructions must be four-byte aligned in memory, which implies that the bottom two bits of the PC are always zero (e.g. memory locations 1000H, 1004H, 1008H).
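The alignment rule in the last bullet can be checked with a simple mask test. This C sketch assumes 4-byte ARM instructions and 2-byte Thumb instructions as stated above and ignores branches; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM state: 4-byte instructions, so the bottom two address bits must be 0.
   Thumb state: 2-byte instructions, so only bit 0 must be 0. */
bool is_valid_instr_addr(uint32_t addr, bool thumb)
{
    uint32_t mask = thumb ? 0x1u : 0x3u;    /* bits that must be zero */
    return (addr & mask) == 0;
}

/* Address of the next sequential instruction (branches ignored). */
uint32_t next_sequential(uint32_t pc, bool thumb)
{
    return pc + (thumb ? 2u : 4u);
}
```

Stepping from 1000H in ARM state gives 1004H and then 1008H, matching the addresses above.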

CPSR: Current Processor Status Register

About CPSR

  • The ARM core uses the CPSR to monitor and control internal operations.
  • Unused bits are reserved for future expansion.
  • The CPSR is divided into four fields, each 8 bits wide: flags, status, extension and control.
  • In current designs, the status and extension fields are reserved for future use.
  • Some ARM processor cores allocate extra bits, such as the J bit, which is available only on Jazelle-enabled processors that execute 8-bit instructions.

CPSR Diagram

Figure: CPSR bit layout

Flag bits and their meanings:
  • N (Negative): set when, for signed operations, the MSB of the result is 1, i.e. the result is negative
  • Z (Zero): set when the result of the operation is zero
  • C (Carry): set when the result causes an unsigned carry (carry out of the MSB)
  • V (Overflow): set when the result causes a signed overflow
  • Q (Saturation): set when the result causes an overflow or saturation
  • I (Interrupt request disable): if set, the interrupt request (IRQ) channel is disabled
  • F (Fast interrupt request disable): if set, the fast interrupt request (FIQ) channel is disabled
  • J (Jazelle instruction set): if set, the processor executes Jazelle instructions
  • T (Thumb instruction set): if set, the processor executes the Thumb instruction set
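The condition flags are what make ARM's conditional execution work: a conditional instruction such as ADDEQ only executes when its condition holds for the current N, Z, C and V values. The C sketch below computes the flags an ADDS instruction would produce and evaluates a few condition codes from them. The flag bit positions (N = bit 31, Z = 30, C = 29, V = 28) follow the ARM CPSR layout; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Flags produced by a 32-bit addition a + b (as ADDS would set them). */
Flags flags_after_adds(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                        /* wraps modulo 2^32 */
    Flags f;
    f.n = (r >> 31) & 1u;                      /* MSB of the result */
    f.z = (r == 0);
    f.c = (r < a);                             /* unsigned carry out of bit 31 */
    /* signed overflow: operands share a sign and the result's sign differs */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* Place the flags at their CPSR bit positions. */
uint32_t pack_nzcv(Flags f)
{
    return ((uint32_t)f.n << 31) | ((uint32_t)f.z << 30) |
           ((uint32_t)f.c << 29) | ((uint32_t)f.v << 28);
}

/* A few ARM condition codes evaluated from the flags. */
bool cond_eq(Flags f) { return f.z; }            /* equal / zero */
bool cond_ne(Flags f) { return !f.z; }           /* not equal */
bool cond_ge(Flags f) { return f.n == f.v; }     /* signed >= */
bool cond_lt(Flags f) { return f.n != f.v; }     /* signed < */
bool cond_hi(Flags f) { return f.c && !f.z; }    /* unsigned > */
```

Adding 1 to 0x7FFFFFFF, for instance, sets N and V (the signed result overflowed into the sign bit) but not C, so the packed flags are 0x90000000.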

SPSR: Saved Program Status Register

Suppose the processor is in user mode when an IRQ request arrives. The processor has to switch to IRQ mode, and after servicing the IRQ it must return to user mode and resume its work.
To make this possible, the current processor status is copied from the CPSR into the SPSR so that it can be restored on return.
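The save-and-restore sequence described above can be modelled in a few lines of C. This is a simplified sketch that tracks only the CPSR, SPSR_irq, LR_irq and PC. The mode encodings (User = 0x10, IRQ = 0x12) and the I bit (bit 7) follow the ARM CPSR layout, while the struct and function names are hypothetical. Real cores also apply a pipeline-dependent offset to the saved return address; that detail is omitted here by passing the return address in directly.

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu        /* CPSR bits [4:0] hold the mode */
#define MODE_USR  0x10u
#define MODE_IRQ  0x12u
#define CPSR_I    (1u << 7)    /* IRQ disable bit */

typedef struct {
    uint32_t cpsr;
    uint32_t spsr_irq;         /* banked SPSR for IRQ mode */
    uint32_t lr_irq;           /* banked LR for IRQ mode */
    uint32_t pc;
} Core;

/* IRQ entry: save status and return address, switch mode, mask IRQs. */
void enter_irq(Core *c, uint32_t return_addr, uint32_t vector)
{
    c->spsr_irq = c->cpsr;                              /* CPSR -> SPSR_irq */
    c->lr_irq = return_addr;
    c->cpsr = (c->cpsr & ~MODE_MASK) | MODE_IRQ | CPSR_I;
    c->pc = vector;
}

/* IRQ return: restore the saved status and resume the interrupted code. */
void leave_irq(Core *c)
{
    c->cpsr = c->spsr_irq;                              /* SPSR_irq -> CPSR */
    c->pc = c->lr_irq;
}
```

Because the whole CPSR is restored, the flags, the mode (back to user) and the interrupt mask all return to their values at the moment the IRQ arrived.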

ARM and RISC design Philosophy

RISC PROCESSORS

  • RISC is a design philosophy aimed at delivering a simple but powerful instruction set whose instructions execute within a single cycle at a high clock speed.
  • RISC is an acronym for Reduced Instruction Set Computer.
  • CISC: Complex Instruction Set Computer.

The RISC Design Philosophy

  • CISC and RISC differ in the complexity of their instruction sets: CISC is more complex than RISC.
  • RISC concentrates on reducing the complexity of the instructions performed by the hardware, providing greater flexibility and intelligence in software.
  • The smaller instruction set allows a designer to implement a hardwired control unit, which runs at a higher clock rate than an equivalent microsequenced control unit.
Figure: RISC design philosophy

The RISC Philosophy (Four major Rules)

Rule 1. Instructions

  • Reduced number of instruction classes to provide simple operations that can each execute in a single cycle.
  • Each instruction has a fixed length, allowing the pipeline to fetch future instructions before decoding the current instruction (unlike CISC).

Rule 2. Pipelines

  • The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines.
  • Ideally, the pipeline advances by one step on each cycle for maximum throughput. Instructions can be decoded in one pipeline stage.

Rule 3. Registers

  • RISC machines have a large general-purpose register set.
  • Any register can contain either data or an address.
(CISC processors have dedicated registers for specific purposes.)

Rule 4. Load-Store Architecture

  • The processor operates only on data held in registers.
  • Separate load and store instructions transfer data between the register bank and external memory, because memory accesses are costly.
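A toy C model makes the load-store rule concrete: data-processing operations only read and write registers, so even incrementing a word in memory takes a load, an add and a store. The machine, its 64-word memory and the helper names are all hypothetical.

```c
#include <stdint.h>

/* Toy load-store machine: 16 registers and a small word-addressed memory. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[64];
} Machine;

/* The only operations that touch memory. */
void ldr(Machine *m, int rd, int word_index) { m->r[rd] = m->mem[word_index]; }
void str(Machine *m, int rs, int word_index) { m->mem[word_index] = m->r[rs]; }

/* Data processing works on registers only. */
void add_imm(Machine *m, int rd, int rn, uint32_t imm) { m->r[rd] = m->r[rn] + imm; }

/* Incrementing a value in memory therefore needs three steps:
       LDR r0, [addr]
       ADD r0, r0, #1
       STR r0, [addr]                                              */
void increment_in_memory(Machine *m, int word_index)
{
    ldr(m, 0, word_index);
    add_imm(m, 0, 0, 1);
    str(m, 0, word_index);
}
```

A CISC machine might do the same job with a single memory-to-memory increment instruction; RISC trades that for simpler, faster individual instructions.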

RISC vs CISC

  • RISC: Simple instructions taking one cycle. CISC: Complex instructions that may take one or more clock cycles.
  • RISC: Large symmetric register file. CISC: Few registers to store data.
  • RISC: Fewer instructions that access memory. CISC: More instructions that access memory.
  • RISC: Few addressing modes. CISC: More addressing modes.
  • RISC: Simple instruction decoder built from hardwired logic. CISC: Complex instruction decoder using a ROM that holds microcode.
  • RISC: Supports pipelining, i.e. overlapping of fetch, decode and execute. CISC: Pipelining is difficult to implement.
  • RISC: Fixed instruction size. CISC: Variable instruction size.
  • RISC: The core takes less chip area, leaving more space for cache and MMU. CISC: The core CPU takes more chip area.
  • RISC: Complexity lies in software; compiler design is difficult. CISC: Complexity lies in hardware; the emphasis is on hardware.
  • RISC: Higher clock rates, so faster. CISC: Lower clock rates, so comparatively slower.
  • RISC: Cache memory is present. CISC: Cache memory is absent or a unified cache is present.

ARM7 Fundamentals

  1. All ARM instructions are 32 bits long and stored word-aligned.
  2. Like all RISC processors, the ARM processor uses a load-store architecture; it is also a von Neumann architecture (program and data share the same memory).
  3. ARM has two special instruction types for transferring data into and out of the processor.
    • Load instruction: copies data from memory to registers in the core.
      • (Registers in the processor core <---- Memory)
    • Store instruction: copies data from registers to memory.
      • (Registers in the processor core ----> Memory)
  4. No data processing instructions manipulate data directly in memory (hence data processing is carried out only in registers).
  5. The ARM core is a 32-bit processor; most instructions treat the registers as holding signed or unsigned 32-bit values.
  6. Data types: Word (32-bit), Halfword (16-bit), Byte (8-bit)
    • Memory is byte addressable and can hold 2^32 bytes (= 4 GB).
    • Word/halfword/byte data are placed at word/halfword/byte-aligned addresses.
    • 32-bit ARM instructions are placed at word-aligned addresses.
  7. Byte order: endian format
    • Word/halfword data can be saved/retrieved in big-endian or little-endian format.
    • Big endian: the MSB of word/halfword data is stored at the lowest address, and the data is addressed by the address of the MSB.
    • Little endian: the LSB of word/halfword data is stored at the lowest address, and the data is addressed by the address of the LSB.
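The two byte orders can be seen by storing the same word both ways. This C sketch writes and reads a 32-bit word in little-endian and big-endian order using shifts, so it behaves the same whatever the endianness of the host running it; the function names are hypothetical.

```c
#include <stdint.h>

/* Little endian: LSB at the lowest address. */
void store_word_le(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w);
    mem[1] = (uint8_t)(w >> 8);
    mem[2] = (uint8_t)(w >> 16);
    mem[3] = (uint8_t)(w >> 24);
}

/* Big endian: MSB at the lowest address. */
void store_word_be(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w >> 24);
    mem[1] = (uint8_t)(w >> 16);
    mem[2] = (uint8_t)(w >> 8);
    mem[3] = (uint8_t)(w);
}

uint32_t load_word_le(const uint8_t mem[4])
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}

uint32_t load_word_be(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}
```

Storing 0x12345678 little-endian puts 0x78 at the lowest address; storing it big-endian puts 0x12 there. Reading big-endian bytes with the little-endian loader gives the byte-swapped value 0x78563412.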

Advanced Microcontroller Bus Architecture (AMBA)

Figure: ARM bus system (AMBA)
  • The bus system connects memory, controllers and peripherals in an ARM processor-based microcontroller to the ARM core.
  • AMBA is a bus protocol standard, adopted as the on-chip bus by many microcontrollers.
  • The ARM core is the bus master; peripherals are slaves.
Three buses within the AMBA specification:
  1. AHB (Advanced High-performance Bus)
    • Provides high bandwidth.
    • Supports multiple masters and slaves (e.g. masters: DMA, test interface, DSP; slaves: external memory).
    • Includes a bus arbiter and a decoder.
    • Used in complex, more sophisticated systems.
  2. ASB (Advanced System Bus)
    • AHB and ASB have many things in common.
    • Both support bursting, pipelining and split transactions.
    • ASB is used in simple, cost-effective designs.
  3. APB (Advanced Peripheral Bus)
    • Simple, low-speed, low-power bus for peripherals such as UARTs.
    • Implemented with a simple tri-stated data bus.
    • AHB-APB bridge: buffers data and operations between the two buses.
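The decoder mentioned under AHB selects which slave responds to each address. The C sketch below shows the idea as a lookup over address ranges. The memory map is entirely hypothetical and is not taken from any real device or from the AMBA specification; a hardware decoder does the same comparison combinationally rather than in a loop.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SLAVE_NONE = -1, SLAVE_ROM, SLAVE_SRAM, SLAVE_APB_BRIDGE } Slave;

typedef struct {
    uint32_t base;
    uint32_t size;
    Slave slave;
} Region;

/* Hypothetical memory map, for illustration only. */
static const Region memory_map[] = {
    { 0x00000000u, 0x00040000u, SLAVE_ROM },        /* 256 KB */
    { 0x20000000u, 0x00008000u, SLAVE_SRAM },       /* 32 KB */
    { 0x40000000u, 0x00100000u, SLAVE_APB_BRIDGE }, /* peripherals behind the bridge */
};

/* Return the slave whose region contains addr, or SLAVE_NONE. */
Slave decode(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; ++i) {
        /* unsigned subtraction wraps when addr < base, so one compare
           checks both ends of the range */
        if (addr - memory_map[i].base < memory_map[i].size)
            return memory_map[i].slave;
    }
    return SLAVE_NONE;
}
```

An access that no slave claims (SLAVE_NONE here) would normally be routed to a default slave that signals an error response.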

ARM Core Data Flow Model

Definition

The data flow model shows how an instruction is decoded inside the ARM core, how it is executed by interacting with the internal register file, and how the result is sent out of the registers.
Figure: ARM core data flow model

Features

  • Von Neumann architecture: data coming through the bus is either an instruction or data (same memory).
  • The sign-extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory and placed in a register; for unsigned values it fills the upper bits with zeros.
  • Source operands Rn and Rm are read from the register file over internal buses A and B respectively, and the result Rd is written back.
  • The PC value is held in the address register, which feeds the incrementer; the incremented value is then copied back into r15.
  • It is also written into the address register to be used as the address for the next instruction fetch.
  • The ALU (arithmetic and logic unit) or MAC (multiply-accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result.
  • Data processing instructions write the result in Rd directly to the register file.
  • Load and store instructions use the ALU to generate an address, which is held in the address register and broadcast on the address bus.
  • Barrel shifter:
    • An important feature is that register Rm can optionally be preprocessed in the barrel shifter before it enters the ALU (left shift, right shift, rotate, etc.).
    • Depending on the instruction, the barrel shifter may be used or bypassed.
    • Together, the barrel shifter and ALU can calculate a wide range of expressions and addresses in the same cycle.
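The sign-extend hardware and the barrel shifter can both be modelled as pure functions on 32-bit values. This C sketch assumes shift amounts of 0 to 31 (special encodings such as LSR #32 and RRX are not modelled) and uses hypothetical names; `add_shifted` mirrors an instruction such as ADD r0, r1, r2, LSL #2, which computes r1 + (r2 << 2) in one instruction.

```c
#include <stdint.h>

/* Sign extension of a loaded byte/halfword: copy the sign bit upwards. */
uint32_t sign_extend8(uint8_t b)
{
    return (b & 0x80u) ? ((uint32_t)b | 0xFFFFFF00u) : (uint32_t)b;
}

uint32_t sign_extend16(uint16_t h)
{
    return (h & 0x8000u) ? ((uint32_t)h | 0xFFFF0000u) : (uint32_t)h;
}

typedef enum { LSL, LSR, ASR, ROR } Shift;

/* Barrel shifter applied to the Rm operand; n is taken modulo 32. */
uint32_t barrel_shift(uint32_t x, Shift type, unsigned n)
{
    n &= 31u;
    switch (type) {
    case LSL: return x << n;                        /* logical shift left */
    case LSR: return x >> n;                        /* logical shift right */
    case ASR:                                       /* arithmetic: refill with sign bit */
        return (x >> n) | ((x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u);
    case ROR:                                       /* bits shifted out re-enter at the top */
        return n ? (x >> n) | (x << (32u - n)) : x;
    }
    return x;
}

/* ADD Rd, Rn, Rm, <shift> #n: Rm is shifted before it reaches the ALU. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, Shift type, unsigned n)
{
    return rn + barrel_shift(rm, type, n);
}
```

This is why ARM code can compute an expression such as base + index * 4 with a single ADD: the multiply by four is a free LSL #2 in the barrel shifter.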

ARM7 Processor Modes

Definition: The processor mode determines which registers are active and the access rights to the CPSR register itself.
  • Privileged modes (allow full read-write access to the CPSR):
    • Fast interrupt request (FIQ)
    • Interrupt request (IRQ)
    • System
    • Supervisor
    • Undefined
    • Abort
  • Non-privileged mode (allows only read access to the control field of the CPSR, but read-write access to the condition flags):
    • User

When does the ARM enter a particular mode?
  • Abort: after a failed attempt to access memory.
  • Fast interrupt request: when an interrupt request arrives through the FIQ channel (input).
  • Interrupt request: when an interrupt request arrives through the IRQ channel (input).
  • Supervisor: after reset. It is generally the mode that an OS kernel operates in.
  • System: a special version of user mode that allows full read-write access to the CPSR.
  • Undefined: when the processor encounters an instruction that is undefined or not supported by the implementation.
  • User: used for programs and applications.
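The current mode is held in the low five bits of the CPSR (M[4:0]). The C sketch below decodes those bits and reports whether the mode is privileged, using the standard ARM mode encodings (User 0x10, FIQ 0x11, IRQ 0x12, Supervisor 0x13, Abort 0x17, Undefined 0x1B, System 0x1F); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decode the mode field, CPSR bits [4:0]. */
const char *mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Reserved";   /* encoding not used by these modes */
    }
}

/* Every valid mode except User is privileged. */
bool is_privileged(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) != 0x10u && strcmp(mode_name(cpsr), "Reserved") != 0;
}
```

For example, a CPSR value of 0x600000D3 decodes to Supervisor mode with IRQ and FIQ both disabled (bits 7 and 6 set), which is typical just after reset.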


Tiva TM4C123G

Description

  • Texas Instruments' Tiva™ C Series microcontrollers offer an 80 MHz Cortex-M with FPU, a variety of integrated memories and multiple programmable GPIO.
  • All members of the Tiva™ C Series, including the TM4C123G microcontroller, are designed around an ARM Cortex-M processor core.
  • The ARM Cortex-M processor provides the core for a high-performance, low-cost platform that meets the needs of minimal memory implementation, reduced pin count, and low power consumption while delivering outstanding computational performance and exceptional response to interrupts.

Block Diagram

Fig. Block diagram of TI’s Tiva C Series TM4C123x microcontrollers

Features

  • Core: ARM Cortex-M4 processor core
  • Performance: 80 MHz operation; 100 DMIPS
  • Flash: 256 KB single-cycle Flash memory
  • System SRAM: 32 KB single-cycle SRAM
  • EEPROM: 2 KB
  • Communication interfaces:
    • Universal Asynchronous Receiver/Transmitter (UART): eight UARTs
    • Synchronous Serial Interface (SSI): four SSI modules
  • Inter-Integrated Circuit (I2C):
    • Four I2C modules with four transmission speeds, including high-speed mode
  • Controller Area Network (CAN):
    • Two CAN 2.0 A/B controllers
  • Universal Serial Bus (USB): USB 2.0 OTG/Host/Device
  • General-Purpose Timer (GPTM):
    • Six 16/32-bit GPTM blocks
  • Watchdog Timer (WDT):
    • Two watchdog timers
  • General-Purpose Input/Output (GPIO):
    • Six physical GPIO blocks
  • Pulse Width Modulator (PWM):
    • Two PWM modules, each with four PWM generator blocks and a control block, for a total of 16 PWM outputs
  • Analog-to-Digital Converter (ADC):
    • Two 12-bit ADC modules, each with a maximum sample rate of one million samples/second
  • Analog Comparator Controller:
    • Two independent integrated analog comparators
  • JTAG and Serial Wire Debug (SWD):
    • One JTAG module with integrated ARM SWD
  • Thumb-2 mixed 16-/32-bit instruction set delivers the high performance expected of a 32-bit ARM core in a compact memory size
  • Single-cycle multiply instruction and hardware divide
  • IEEE754-compliant single-precision Floating-Point Unit (FPU)
  • Fast code execution permits slower processor clock or increases sleep mode time
  • Harvard architecture characterized by separate buses for instruction and data
  • Deterministic, high-performance interrupt handling for time-critical applications
  • Memory protection unit (MPU) to provide a privileged mode for protected operating system functionality
  • Enhanced system debug with extensive breakpoint and trace capabilities
  • Serial Wire Debug and Serial Wire Trace reduce the number of pins required for debugging and tracing.
  • Migration from the ARM7™ processor family for better performance and power efficiency
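As a worked example for the 12-bit ADC listed above, a raw reading can be converted to a voltage with integer arithmetic. This C sketch assumes a 3.3 V (3300 mV) full-scale reference and the raw * Vref / 4095 scaling convention; check the device datasheet for the actual reference and transfer function. The function name is hypothetical.

```c
#include <stdint.h>

/* Convert a raw 12-bit ADC reading (0..4095) to millivolts, assuming the
   full-scale reference is vref_mv. Integer math, result truncated. */
uint32_t adc_to_millivolts(uint16_t raw, uint32_t vref_mv)
{
    if (raw > 4095u)
        raw = 4095u;                         /* clamp out-of-range input */
    return ((uint32_t)raw * vref_mv) / 4095u;
}
```

For example, a raw reading of 2048 with a 3300 mV reference gives 2048 * 3300 / 4095, about 1650 mV, roughly half of full scale. Integer arithmetic avoids floating point in the conversion, although the TM4C123G's FPU would handle a float version just as well.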

Applications

Tiva™ C Series microcontrollers are the leading choice in high-performance 32-bit applications.
The product family is positioned for cost-conscious applications requiring significant control processing and connectivity capabilities such as:
  • Low power, hand-held smart devices
  • Gaming equipment
  • Home and commercial site monitoring and control
  • Motion control
  • Medical instrumentation
  • Test and measurement equipment
  • Smart Energy/Smart Grid solutions
  • Factory automation
  • Fire and security
  • Intelligent lighting control




