Embedded Systems: ARM Programming and Optimization combines an exploration of the ARM architecture with an examination of the facilities offered by the Linux operating system to explain how various features of program design can influence processor performance. It demonstrates methods by which a programmer can optimize program code to improve its performance without changing its behavior. Several applications, including image transformations, fractal generation, image convolution, and computer vision tasks, are used to describe and demonstrate these methods. From these examples, the reader will gain insight into computer architecture and application design, as well as practical knowledge of software design for modern embedded systems.
Key Features
- Covers two ARM instruction set architectures, ARMv6 and ARMv7-A, as well as three ARM cores: the ARM11 on the Raspberry Pi, the Cortex-A9 on the Xilinx Zynq 7020, and the Cortex-A15 on the NVIDIA Tegra K1
- Describes how to fully leverage the facilities offered by the Linux operating system, including the Linux GCC compiler toolchain and debug tools, performance monitoring support, OpenMP multicore runtime environment, video frame buffer, and video capture capabilities
- Designed to accompany and work with most of the low-cost Linux/ARM embedded development boards currently available
Table of Contents
- Dedication
- Preface
- Using this Book
- Instructor Support
- Acknowledgments
- Chapter 1: The Linux/ARM embedded platform
- Abstract
- 1.1 Performance-Oriented Programming
- 1.2 ARM Technology
- 1.3 Brief History of ARM
- 1.4 ARM Programming
- 1.5 ARM Instruction Set Architecture
- 1.6 Assembly Optimization #1: Sorting
- 1.7 Assembly Optimization #2: Bit Manipulation
- 1.8 Code Optimization Objectives
- 1.9 Runtime Profiling with Performance Counters
- 1.10 Measuring Memory Bandwidth
- 1.11 Performance Results
- 1.12 Performance Bounds
- 1.13 Basic ARM Instruction Set
- 1.14 Chapter Wrap-Up
- Exercises
- Chapter 2: Multicore and data-level optimization: OpenMP and SIMD
- Abstract
- 2.1 Optimization Techniques Covered by this Book
- 2.2 Amdahl's Law
- 2.3 Test Kernel: Polynomial Evaluation
- 2.4 Using Multiple Cores: OpenMP
- 2.5 Performance Bounds
- 2.6 Performance Analysis
- 2.7 Inline Assembly Language in GCC
- 2.8 Optimization #1: Reducing Instructions per Flop
- 2.9 Optimization #2: Reducing CPI
- 2.10 Optimization #3: Multiple Flops per Instruction with Single Instruction, Multiple Data
- 2.11 Chapter Wrap-Up
- Chapter 3: Arithmetic optimization and the Linux Framebuffer
- Abstract
- 3.1 The Linux Framebuffer
- 3.2 Affine Image Transformations
- 3.3 Bilinear Interpolation
- 3.4 Floating-Point Image Transformation
- 3.5 Analysis of Floating-Point Performance
- 3.6 Fixed-Point Arithmetic
- 3.7 Fixed-Point Performance
- 3.8 Real-Time Fractal Generation
- 3.9 Chapter Wrap-Up
- Chapter 4: Memory optimization and video processing
- Abstract
- 4.1 Stencil Loops
- 4.2 Example Stencil: The Mean Filter
- 4.3 Separable Filters
- 4.4 Memory Access Behavior of 2D Filters
- 4.5 Loop Tiling
- 4.6 Tiling and the Stencil Halo Region
- 4.7 Example 2D Filter Implementation
- 4.8 Capturing and Converting Video Frames
- 4.9 Video4Linux Driver and API
- 4.10 Applying the 2D Tiled Filter
- 4.11 Applying the Separated 2D Tiled Filter
- 4.12 Top-Level Loop
- 4.13 Performance Results
- 4.14 Chapter Wrap-Up
- Chapter 5: Embedded heterogeneous programming with OpenCL
- Abstract
- 5.1 GPU Microarchitecture
- 5.2 OpenCL
- 5.3 OpenCL Programming Model, Idioms, and Abstractions
- 5.4 Kernel Workload Distribution
- 5.5 OpenCL Implementation of Horner's Method: Device Code
- 5.6 Performance Results
- 5.7 Chapter Wrap-Up
- Appendix A: Adding PMU support to Raspbian for the Generation 1 Raspberry Pi
- A.1 Download the Linux Kernel and Cross-Compiler Tools
- A.2 Kernel Modifications
- A.3 Building the Kernel
- A.4 Installing the Kernel
- Appendix B: NEON intrinsic reference
- B.1 Vector Data Types
- B.2 Reading and Writing Vector Variables
- B.3 Vector Element Manipulation
- B.4 Optimizing Floating-Point Code with NEON Intrinsics
- B.5 Summary of NEON Intrinsics
- Appendix C: OpenCL reference
- C.1 Platform Layer
- C.2 Memory Types
- C.3 Buffer Management
- C.4 Programs and Compiling
- C.5 Kernel Functions
- C.6 Command Queue Functions
- C.7 Vector and Image Data Types
- C.8 Attributes
- C.9 Constants
- C.10 Built-in Functions
- Index