Introduction to ARM Architecture and the Working Principle of Its Modules
- Uchi Embedded Solutions

- Mar 27
Updated: Apr 7
ARM originally stood for Acorn RISC Machine and later for Advanced RISC Machine, and it is one of the most widely used and licensed processor architectures in the world. Acorn Computers, founded in Cambridge, England, in 1978, began the ARM project in 1983 and produced the first ARM processor (ARM1) in 1985. ARM processors are widely used in portable and embedded devices such as digital cameras, mobile phones, home networking equipment, wireless communication devices and many other embedded applications, because they combine low power consumption, efficient performance and a compact design.

ARM Architecture
ARM is a RISC (Reduced Instruction Set Computing) architecture. It was originally a 32-bit architecture (64-bit support arrived with ARMv8) and is implemented in everything from small microcontrollers to application processors. Acorn first used it commercially in its Archimedes computers in 1987, and the design was later licensed to many semiconductor manufacturers, such as STMicroelectronics and Motorola. Over time the architecture has evolved through a series of versions (ARMv1, ARMv2 and later), each with its own strengths and limitations.

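The ARM Cortex family is one of the most important groups of ARM processor cores. It was introduced with the ARMv7 architecture (later Cortex cores also implement ARMv6-M, ARMv8 and newer versions) and is divided into three major subfamilies:
ARM Cortex-A series (application processors)
ARM Cortex-R series (real-time processors)
ARM Cortex-M series (microcontrollers)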
Main Components of ARM Architecture
The ARM processor core described here is built mainly from the following components:
Arithmetic Logic Unit (ALU)
Booth Multiplier
Barrel Shifter
Control Unit
Register File
In addition to these, ARM processors include a Program Status Register, which holds the condition flags N (negative, also called the sign flag), Z (zero), C (carry) and V (overflow), along with the mode bits and the interrupt-disable bits. Other special-purpose registers include the instruction register, the memory data registers for reads and writes, and the memory address register. A priority encoder is used by the load-multiple and store-multiple (LDM/STM) instructions to identify which register in the register file should be loaded or stored next. Several multiplexers control the processor's internal buses. Each architectural block can be modeled behaviorally in a hardware description language, which makes the design easier to build, optimize and maintain.
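To make the priority encoder's role concrete, here is a minimal C sketch. It assumes the standard ARM encoding in which an LDM/STM instruction carries a 16-bit register list (bit n set means register Rn is transferred) and registers are transferred lowest-numbered first. The function names are hypothetical, and real hardware resolves the priority combinationally rather than with a loop.

```c
#include <stdint.h>

/* Returns the number (0..15) of the lowest set bit in a 16-bit
 * LDM/STM register list, or -1 if the list is empty. Like a priority
 * encoder, the lowest-numbered pending register wins. */
int priority_encode(uint16_t reg_list) {
    for (int n = 0; n < 16; n++) {
        if (reg_list & (1u << n))
            return n;
    }
    return -1;
}

/* Clears the bit of the register just transferred, so the next
 * encoder pass selects the following register in the list. */
uint16_t clear_transferred(uint16_t reg_list, int reg) {
    return (uint16_t)(reg_list & ~(1u << reg));
}

/* Walks a register list in transfer order, writing register numbers
 * into out[] (room for 16 entries). Returns how many were written. */
int transfer_order(uint16_t reg_list, int out[16]) {
    int count = 0;
    int reg;
    while ((reg = priority_encode(reg_list)) >= 0) {
        out[count++] = reg;
        reg_list = clear_transferred(reg_list, reg);
    }
    return count;
}
```

For example, the list 0x4030 selects R4, R5 and R14 (LR), and the encoder yields them in the order 4, 5, 14, one register per transfer.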
Arithmetic Logic Unit (ALU)
The ALU has two 32-bit inputs: one comes from the register file and the other from the barrel shifter. The ALU updates the status flags from its result:
V flag is updated from the overflow output
C flag is updated from the carry output
N (sign) flag is taken from the most significant bit of the result
Z flag is produced by NORing all 32 bits of the result, so Z = 1 only when the result is zero
The ALU uses a 4-bit function bus, which allows up to 16 different operations to be implemented.
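The flag rules above can be sketched in C for the two simplest operations, addition and subtraction. This is a behavioural model only: the 4-bit function bus and the other operations are not modeled, the names are hypothetical, and the flag the text calls the sign flag appears here as `n`. Subtraction is done as a + ~b + 1, which is how ARM's SUB sets C (C = 1 means "no borrow").

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v; /* negative (sign), zero, carry, overflow */
} alu_flags;

/* Computes a + b + carry_in and derives the four status flags. */
uint32_t alu_add(uint32_t a, uint32_t b, bool carry_in, alu_flags *f) {
    uint64_t wide = (uint64_t)a + b + (carry_in ? 1u : 0u);
    uint32_t result = (uint32_t)wide;

    f->n = (result >> 31) & 1u;   /* most significant bit of the result */
    f->z = (result == 0);         /* NOR of all 32 result bits */
    f->c = (wide >> 32) & 1u;     /* carry out of bit 31 */
    /* Signed overflow: both operands have the same sign, and the
     * result's sign differs from it. */
    f->v = ((~(a ^ b) & (a ^ result)) >> 31) & 1u;
    return result;
}

/* a - b computed as a + ~b + 1. */
uint32_t alu_sub(uint32_t a, uint32_t b, alu_flags *f) {
    return alu_add(a, ~b, true, f);
}
```

For example, 0x7FFFFFFF + 1 gives 0x80000000 with N and V set (a positive plus a positive produced a negative), while 0xFFFFFFFF + 1 gives 0 with Z and C set.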
Booth Multiplier
The Booth algorithm is a multiplication technique for two's-complement numbers. It handles positive and negative operands uniformly, with no separate sign correction. It also reduces work by treating runs of identical bits in the multiplier specially: a run of 0s needs no operation, and a run of 1s is replaced by a single subtraction at its start and a single addition just past its end, which avoids many unnecessary additions and can speed up multiplication significantly. In the implementation described here, a multiplication completes in 16 clock cycles.
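The text does not say which Booth variant is used. A 16-cycle figure for 32-bit operands is consistent with radix-4 (modified) Booth recoding, which consumes two multiplier bits per step, so the sketch below assumes that variant. The function name is hypothetical, and each loop iteration stands in for one clock cycle of the hardware multiplier.

```c
#include <stdint.h>

/* Radix-4 (modified) Booth multiplication of two signed 32-bit values.
 * Each step examines multiplier bits (i+1, i, i-1) and adds 0, +/-1 or
 * +/-2 times the multiplicand, shifted left by i bits.
 * 32 bits / 2 bits per step = 16 steps. */
int64_t booth_radix4_mul(int32_t multiplicand, int32_t multiplier) {
    const int64_t m = multiplicand;
    const uint32_t y = (uint32_t)multiplier; /* raw two's-complement bits */
    int64_t product = 0;
    int prev = 0; /* implicit bit y[-1] = 0 */

    for (int i = 0; i < 32; i += 2) {
        int b0 = (int)((y >> i) & 1u);
        int b1 = (int)((y >> (i + 1)) & 1u);
        int digit = -2 * b1 + b0 + prev; /* Booth digit in {-2..2} */
        /* m * 2^i, computed by multiplication to avoid left-shifting a
         * negative signed value (undefined behaviour in C). */
        int64_t shifted = m * ((int64_t)1 << i);

        switch (digit) {
        case  1: product += shifted;     break;
        case  2: product += 2 * shifted; break;
        case -1: product -= shifted;     break;
        case -2: product -= 2 * shifted; break;
        default: break; /* 0: inside a run of equal bits, nothing to do */
        }
        prev = b1;
    }
    return product;
}
```

A multiplier of -1 (all ones) shows the run-skipping effect: only the first step produces a nonzero digit (-1), and the remaining 15 steps add nothing.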
Barrel Shifter
The barrel shifter takes a 32-bit input, which comes either from the register file or from immediate data. Its operation is controlled by fields in the instruction register, and the shift-type field selects one of the following operations:
Logical shift left (LSL)
Logical shift right (LSR)
Arithmetic shift right (ASR)
Rotate right (ROR)
The shift amount comes either from an immediate field in the instruction or from the lower 6 bits of a register in the register file. The shift_val input bus is 6 bits wide, so it can encode values from 0 to 63, which covers every shift amount up to 32. (In the ARM instruction set itself, a register-specified shift amount is taken from the bottom byte of the register.) The shifttype input is encoded as follows:
00 for logical shift left
01 for logical shift right
10 for arithmetic shift right
11 for rotate right
The barrel shifter is mainly built using multiplexers.
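A behavioural C model of the shifter is sketched below, assuming the 2-bit shift-type encoding listed above and a 6-bit shift amount. Out-of-range amounts follow the ARM convention for register-specified shifts (LSL or LSR by 32 or more gives 0, ASR by 32 or more fills the result with the sign bit, ROR uses the amount modulo 32). The carry-out that the real shifter also produces is omitted, and `barrel_shift` is a hypothetical name.

```c
#include <stdint.h>

/* Shift-type encodings: 00 LSL, 01 LSR, 10 ASR, 11 ROR. */
enum shift_type { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

uint32_t barrel_shift(uint32_t value, unsigned type, unsigned amount) {
    amount &= 0x3Fu; /* model the 6-bit shift_val bus: 0..63 */

    switch (type & 3u) {
    case SHIFT_LSL:
        return amount >= 32 ? 0u : value << amount;
    case SHIFT_LSR:
        return amount >= 32 ? 0u : value >> amount;
    case SHIFT_ASR: {
        if (amount >= 32)
            amount = 31; /* result becomes 32 copies of the sign bit */
        uint32_t shifted = value >> amount;
        /* Refill the vacated top bits with the sign bit explicitly,
         * since right-shifting a negative signed int is
         * implementation-defined in C. */
        if ((value & 0x80000000u) && amount > 0)
            shifted |= ~(0xFFFFFFFFu >> amount);
        return shifted;
    }
    default: /* SHIFT_ROR */
        amount &= 31u;
        return amount == 0 ? value
                           : (value >> amount) | (value << (32u - amount));
    }
}
```

In hardware the same function is typically built from five layers of 2:1 multiplexers that shift by 1, 2, 4, 8 and 16 positions, which is why the shifter maps so naturally onto multiplexers. Note that the ARM instruction set also gives some immediate encodings special meanings (for example, ROR #0 denotes RRX), which this sketch does not model.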
Control Unit
The control unit is the heart of the processor: it supervises the operation of the entire system, and its design is one of the most important parts of the processor architecture. Control units are sometimes built as purely combinational logic, but here the control unit is described as a simple state machine, which also handles the processor's timing. The control signals it generates are routed to every processor component to coordinate their operation.
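The text does not give the state machine itself, so the states and instruction classes below are assumptions chosen for illustration: a classic fetch, decode, execute, memory, write-back sequence. It is not the ARM7's actual control logic, which overlaps these steps in a pipeline. The sketch shows the essential idea: a next-state function that depends on the current state and on what the decoder reported.

```c
/* Hypothetical states for a simple multi-cycle control unit. */
typedef enum { ST_FETCH, ST_DECODE, ST_EXECUTE, ST_MEMORY, ST_WRITEBACK } cu_state;

/* Hypothetical instruction classes reported by the decoder. */
typedef enum { CLS_DATA, CLS_LOAD, CLS_STORE, CLS_BRANCH } instr_class;

/* Next-state function: the core of the control unit. */
cu_state cu_next_state(cu_state s, instr_class cls) {
    switch (s) {
    case ST_FETCH:
        return ST_DECODE;
    case ST_DECODE:
        return ST_EXECUTE;
    case ST_EXECUTE:
        if (cls == CLS_LOAD || cls == CLS_STORE)
            return ST_MEMORY;    /* address computed, now access memory */
        if (cls == CLS_DATA)
            return ST_WRITEBACK; /* ALU result goes to a register */
        return ST_FETCH;         /* branch: PC already updated */
    case ST_MEMORY:
        return cls == CLS_LOAD ? ST_WRITEBACK : ST_FETCH;
    case ST_WRITEBACK:
        return ST_FETCH;
    }
    return ST_FETCH; /* not reached for valid states */
}

/* Counts the clock cycles one instruction takes before the next fetch. */
int cu_cycles(instr_class cls) {
    int cycles = 0;
    cu_state s = ST_FETCH;
    do {
        s = cu_next_state(s, cls);
        cycles++;
    } while (s != ST_FETCH);
    return cycles;
}
```

Under these assumptions a load takes five cycles, a data-processing instruction or store four, and a branch three; this is how a state-machine control unit also ends up defining the processor's timing.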
ARM7 Functional Diagram
The final aspect to understand is how the ARM7 processor is used and how the chip is structured. The processor interfaces with a variety of signals, including inputs, outputs and control (supervisory) signals, which together govern its overall operation.

ARM Microcontroller Register Modes
ARM follows a load-store architecture: data-processing instructions cannot operate directly on memory. Data must first be loaded into registers, processed there and then written back to memory. Classic ARM cores such as the ARM7TDMI have 37 registers: 31 general-purpose 32-bit registers and 6 status registers (the CPSR plus five SPSRs). (The Cortex-M3 uses a simpler model with only Thread and Handler modes.) These classic ARM processors use several operating modes:
User Mode
FIQ Mode
IRQ Mode
SVC Mode
Undefined Mode
Abort Mode
System Mode
Monitor Mode (only on cores with the TrustZone Security Extensions)
Description of the Modes
User Mode:
This is the normal mode for application code. It has no SPSR and only limited access to the CPSR: it can read the register and change the condition flags, but cannot change the mode or interrupt-disable bits.
FIQ and IRQ Modes:
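These are the interrupt modes. FIQ (fast interrupt) mode handles high-priority, time-critical interrupts, while IRQ mode handles standard interrupts. In addition to the banked SP and LR that every exception mode has, FIQ mode banks five more registers (R8 to R12), so an FIQ handler can use them without first saving them to the stack, which shortens interrupt response.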
SVC Mode:
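Supervisor mode is entered on reset and when a software interrupt (SWI, later renamed SVC) instruction is executed; operating systems typically run their kernel code in this mode.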
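Undefined Mode:
This mode is entered when the processor tries to execute an instruction it does not recognize (an undefined instruction).
Abort Mode:
This mode is entered when a memory access fails, either while fetching an instruction (prefetch abort) or while reading or writing data (data abort).
Monitor Mode:
On cores with the TrustZone Security Extensions, this mode is used to switch between the Secure and Non-secure states.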
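The current mode is recorded in the low five bits of the CPSR. The sketch below decodes that field using the encodings from the classic ARM architecture (User 0x10, FIQ 0x11, IRQ 0x12, Supervisor 0x13, Monitor 0x16, Abort 0x17, Undefined 0x1B, System 0x1F); Hyp mode from the later virtualization extensions is left out. It does not apply to the Cortex-M3, whose xPSR uses a different layout. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mode field encodings in CPSR bits [4:0] (classic ARM / A-profile). */
enum arm_mode {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_MON = 0x16, MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
};

const char *mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case MODE_USR: return "User";
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SVC: return "Supervisor";
    case MODE_MON: return "Monitor";
    case MODE_ABT: return "Abort";
    case MODE_UND: return "Undefined";
    case MODE_SYS: return "System";
    default:       return "Reserved";
    }
}

/* Every exception mode has its own SPSR; User and System modes,
 * which share the user register bank, do not. */
bool mode_has_spsr(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case MODE_FIQ: case MODE_IRQ: case MODE_SVC:
    case MODE_MON: case MODE_ABT: case MODE_UND:
        return true;
    default: /* User, System, or a reserved encoding */
        return false;
    }
}
```

For example, a CPSR value of 0xD3 (Supervisor mode with IRQ and FIQ masked, as after reset on an ARM7) decodes as "Supervisor", and that mode has an SPSR.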
THUMB and THUMB-2 Instruction Sets
In THUMB state, the processor executes 16-bit instruction encodings that still operate on 32-bit registers and data. This improves code density and can improve performance on systems with narrow (16-bit) memory. THUMB-2 extends this with a mix of 16-bit and 32-bit instructions, providing a good balance between compact code and high performance. The ARM Cortex-M3 executes only THUMB-2 instructions; it has no ARM state.
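In THUMB-2 code, the processor tells a 16-bit instruction from the first half of a 32-bit one by looking at the top five bits of each halfword: according to the ARMv7-M architecture manual, the values 0b11101, 0b11110 and 0b11111 mark a 32-bit instruction, and any other value is a complete 16-bit instruction. The helper names below are hypothetical; the sketch simply walks a buffer of halfwords using that rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if this halfword is the first half of a 32-bit THUMB-2
 * instruction (bits [15:11] are 0b11101, 0b11110 or 0b11111). */
bool thumb_is_32bit(uint16_t first_halfword) {
    unsigned top5 = first_halfword >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Counts the instructions in a buffer of THUMB-2 halfwords. */
size_t thumb_count_instructions(const uint16_t *code, size_t halfwords) {
    size_t i = 0, count = 0;
    while (i < halfwords) {
        i += thumb_is_32bit(code[i]) ? 2 : 1;
        count++;
    }
    return count;
}
```

For example, the sequence 0xB580 (PUSH {r7, lr}), 0xF000 0xF800 (a 32-bit BL) and 0x4770 (BX LR) occupies four halfwords but contains three instructions.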
Some registers have dedicated roles:
Stack Pointer (SP, R13)
Link Register (LR, R14)
Program Counter (PC, R15)
Current Program Status Register (CPSR)
Saved Program Status Register (SPSR, exception modes only)
The CPSR and SPSR hold control and status bits such as the operating mode, the interrupt-disable bits and the ALU condition flags. The T bit in the CPSR records whether the core is in 32-bit ARM state or THUMB state. When an exception is taken, the CPSR is copied into the SPSR of the new mode so that it can be restored on return.
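The sketch below decodes the main CPSR fields and shows what happens to the status registers when an IRQ is taken. It assumes the classic ARM7/ARM9 (ARMv4/ARMv5) CPSR layout: N, Z, C, V in bits 31 to 28, I and F (interrupt disable) in bits 7 and 6, T in bit 5 and the mode in bits 4 to 0. The Cortex-M3's xPSR is laid out differently. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic ARM (ARMv4/ARMv5) CPSR/SPSR bit layout. */
#define PSR_N (1u << 31) /* negative */
#define PSR_Z (1u << 30) /* zero */
#define PSR_C (1u << 29) /* carry */
#define PSR_V (1u << 28) /* overflow */
#define PSR_I (1u << 7)  /* 1 = IRQ disabled */
#define PSR_F (1u << 6)  /* 1 = FIQ disabled */
#define PSR_T (1u << 5)  /* 1 = THUMB state, 0 = ARM state */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_IRQ  0x12u

typedef struct {
    bool n, z, c, v;
    bool irq_disabled, fiq_disabled, thumb;
    uint32_t mode;
} psr_fields;

psr_fields psr_decode(uint32_t psr) {
    psr_fields f;
    f.n = (psr & PSR_N) != 0;
    f.z = (psr & PSR_Z) != 0;
    f.c = (psr & PSR_C) != 0;
    f.v = (psr & PSR_V) != 0;
    f.irq_disabled = (psr & PSR_I) != 0;
    f.fiq_disabled = (psr & PSR_F) != 0;
    f.thumb = (psr & PSR_T) != 0;
    f.mode = psr & PSR_MODE_MASK;
    return f;
}

/* Status-register side of IRQ entry: save CPSR into SPSR_irq, switch
 * to IRQ mode, enter ARM state and mask further IRQs. Returning from
 * the handler copies SPSR_irq back into CPSR. */
uint32_t irq_entry(uint32_t cpsr, uint32_t *spsr_irq) {
    *spsr_irq = cpsr;
    cpsr = (cpsr & ~PSR_MODE_MASK) | PSR_MODE_IRQ;
    cpsr &= ~PSR_T;
    cpsr |= PSR_I;
    return cpsr;
}
```

For example, if an IRQ arrives while User-mode THUMB code is running with Z set, SPSR_irq keeps that exact value, and the new CPSR shows IRQ mode, ARM state, IRQs masked and the flags unchanged.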
ARM Cortex Microcontroller Programming
Today, many microcontroller manufacturers offer 32-bit microcontrollers based on ARM Cortex-M cores such as the Cortex-M3, and embedded system developers increasingly choose them for modern applications. ARM microcontrollers can be programmed in both low-level (assembly) and high-level languages such as C and C++. Older microcontroller architectures often had limited memory and lower performance, which made high-level programming more difficult. Many ARM microcontrollers, by contrast, run at 100 MHz or more, with enough memory to support high-level languages and more advanced software.

ARM microcontrollers are commonly programmed using IDEs such as:
Keil µVision3
Keil µVision4
CooCox CoIDE
While 8-bit microcontrollers have an 8-bit data path, ARM Cortex-M devices have a 32-bit data path and execute a mix of 16-bit and 32-bit THUMB-2 instructions, which gives them more processing capability.
Additional Uses and Features of Cortex Processors
Cortex-M processors such as the Cortex-M3 offer many important features:
Reduced Instruction Set Computing (RISC) design
32-bit high-performance CPU
Compact 3-stage pipeline
THUMB-2 technology
Efficient combination of 16-bit and 32-bit instructions
High performance with low power usage
Support for development tools and RTOS
CoreSight debug and trace support
JTAG or 2-pin Serial Wire Debug connections
Support for multi-processor systems
Low-power sleep modes
Software-controlled power management
Multiple power domains
Nested Vectored Interrupt Controller (NVIC)
Low-latency, deterministic interrupt response
Interrupt handlers and startup code can be written in C, with little or no assembly language
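As a small illustration of programming the NVIC from C, the sketch below computes which Interrupt Set-Enable Register (ISER) word and bit control a given external interrupt. It assumes the ARMv7-M memory map, where ISER0 sits at 0xE000E100 and each 32-bit word covers 32 interrupts. The function names are hypothetical, and the actual register write is left as a comment so the code can run on a host.

```c
#include <stdint.h>

/* Start of the NVIC Interrupt Set-Enable Registers on ARMv7-M. */
#define NVIC_ISER0_ADDR 0xE000E100u

/* Address of the ISER word that holds the enable bit for external IRQ n. */
uint32_t nvic_iser_address(unsigned irq) {
    return NVIC_ISER0_ADDR + 4u * (irq / 32u);
}

/* Bit mask within that word. Writing 1 enables the interrupt and
 * writing 0 has no effect, so no read-modify-write is needed. */
uint32_t nvic_iser_mask(unsigned irq) {
    return 1u << (irq % 32u);
}

/* On the target, enabling IRQ n would then be a single store:
 *   *(volatile uint32_t *)nvic_iser_address(n) = nvic_iser_mask(n);
 */
```

For IRQ 37, this gives address 0xE000E104 and mask 0x20. In practice, the CMSIS function `NVIC_EnableIRQ()` performs this write, so application code rarely needs to compute it by hand.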
