If you're reading this book, you've probably already heard about this thing called the Arm assembly language and know that understanding it is the key to analyzing binaries that run on Arm. But what is this language, and why does it exist? After all, programmers usually write code in high-level languages such as C/C++, and hardly anyone programs in assembly directly; high-level languages are simply far more convenient to write.
Unfortunately, these high-level languages are too complex for processors to interpret directly. Instead, programmers compile these high-level programs down into the binary machine code that the processor can run.
This machine code is not quite the same as assembly language. If you were to look at it directly in a text editor, it would look unintelligible. Processors also don't run assembly language; they run only machine code. So why, then, is assembly language so important in reverse engineering?
To understand the purpose of assembly, let's do a quick tour of the history of computing to see how we got to where we are and how everything connects.
Back in the mists of time when it all started, people decided to create computers and have them perform simple tasks. Computers don't speak our human languages (they are just electronic devices, after all), and so we needed a way to communicate with them electronically. At the lowest level, computers operate on electrical signals, and these signals are formed by switching electrical voltages between one of two levels: on and off.
The first problem is that we need a way to describe these "ons" and "offs" for communication, storage, and simply describing the state of the system. Since there are two states, it was only natural to use the binary system for encoding these values. Each binary digit (or bit) could be 0 or 1. Although each bit can store only the smallest amount of information possible, stringing multiple bits together allows representation of much larger numbers. For example, the number 30,284,334,537 could be represented in just 35 bits as the following:
11100001101000101100100010111001001
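This is easy to check with a few lines of C. The sketch below is only an illustration (the variable names are arbitrary); it prints the binary representation of 30,284,334,537 and reproduces the 35-bit pattern shown above.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t value = 30284334537ULL;  /* the example number from the text */
    int started = 0;

    /* Walk from the most significant bit down, skipping leading zeros. */
    for (int bit = 63; bit >= 0; bit--) {
        int b = (int)((value >> bit) & 1);
        if (b) started = 1;
        if (started) putchar('0' + b);
    }
    putchar('\n');  /* prints 11100001101000101100100010111001001 (35 bits) */
    return 0;
}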
Already this system allows for encoding large numbers, but now we have a new problem: where does one number in memory (or on a magnetic tape) end and the next one begin? This is perhaps a strange question to ask modern readers, but back when computers were first being designed, this was a serious problem. The simplest solution here would be to create fixed-size groupings of bits. Computer scientists, never wanting to miss out on a good naming pun, called this group of binary digits or bits a byte.
So, how many bits should be in a byte? This might seem like a blindingly obvious question to our modern ears, since we all know that a modern byte is 8 bits. But it was not always so.
Originally, different systems made different choices for how many bits would be in their bytes. The predecessor of the 8-bit byte we know today is the 6-bit Binary Coded Decimal Interchange Code (BCDIC) format for representing alphanumeric information used in early IBM computers, such as the IBM 1620 in 1959. Before that, bytes were often 4 bits long, and earlier still, a byte stood for an arbitrary number of bits greater than 1. Only later, with IBM's 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), introduced in the 1960s with the System/360 mainframe product line and its byte-addressable memory of 8-bit bytes, did the byte start to standardize around 8 bits. This then led to the adoption of the 8-bit storage size in other widely used computer systems, including the Intel 8080 and Motorola 6800.
The following excerpt is from a book titled Planning a Computer System, published in 1962, listing three main reasons for adopting the 8-bit byte1:
An 8-bit byte can hold one of 256 distinct values, from 00000000 to 11111111. The interpretation of those values, of course, depends on the software using them. For example, we can treat a byte as an unsigned integer and represent numbers from 0 to 255 inclusive. We can also use the two's complement scheme to represent signed numbers from -128 to 127 inclusive.
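The same bit pattern can be read either way. The short C sketch below uses an arbitrarily chosen example byte, 0xB9, to show both interpretations.

#include <stdio.h>

int main(void) {
    unsigned char raw = 0xB9;                      /* example bit pattern: 1011 1001 */

    unsigned char as_unsigned = raw;               /* unsigned range 0..255          */
    signed char   as_signed   = (signed char)raw;  /* two's complement, -128..127    */

    printf("0x%02X as unsigned: %u\n", raw, as_unsigned);  /* prints 185 */
    printf("0x%02X as signed:   %d\n", raw, as_signed);    /* prints -71 on
                                                               two's complement machines */
    return 0;
}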
Of course, computers didn't just use bytes for encoding and processing integers. They would also often store and process human-readable letters and numbers, called characters.
Early character encodings, such as ASCII, settled on using 7 bits per character, but this gave only a limited set of 128 possible characters. This allowed for encoding English-language letters and digits, as well as a few symbol and control characters, but could not represent many of the letters used in other languages. The EBCDIC standard, using its 8-bit bytes, chose a different character set entirely, with code pages for "swapping" to different languages. But ultimately this character set was too cumbersome and inflexible.
Over time, it became clear that we needed a truly universal character set, supporting all the world's living languages and special symbols. This culminated in the creation of the Unicode project in 1987. A few different Unicode encodings exist, but the dominant encoding used on the Web is UTF-8. Characters within the ASCII character set are included verbatim in UTF-8, and "extended characters" can spread out over multiple consecutive bytes.
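The C sketch below illustrates this difference; it hard-codes the UTF-8 bytes of the euro sign so it does not depend on the compiler's source encoding, and simply prints the bytes behind an ASCII character and behind an extended character.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *ascii    = "A";               /* plain ASCII: one byte       */
    const char *extended = "\xE2\x82\xAC";    /* U+20AC (euro sign) in UTF-8 */

    printf("'A' is %zu byte:", strlen(ascii));
    for (size_t i = 0; i < strlen(ascii); i++)
        printf(" %02X", (unsigned char)ascii[i]);       /* 41 */
    printf("\n");

    printf("euro sign is %zu bytes:", strlen(extended));
    for (size_t i = 0; i < strlen(extended); i++)
        printf(" %02X", (unsigned char)extended[i]);    /* E2 82 AC */
    printf("\n");
    return 0;
}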
Since characters are now encoded as bytes, we can represent each character using two hexadecimal digits. For example, the characters A, R, and M are normally encoded with the octets shown in Figure 1.1.
Figure 1.1: Letters A, R, and M and their hexadecimal values
Each hexadecimal digit can be encoded with a 4-bit pattern ranging from 0000 to 1111, as shown in Figure 1.2.
Figure 1.2: Hexadecimal ASCII values and their 8-bit binary equivalents
Since two hexadecimal values are required to encode an ASCII character, 8 bits seemed like the ideal size for storing text in most written languages around the world, or a multiple of 8 bits for characters that cannot be represented in 8 bits alone.
Using this pattern, we can more easily interpret the meaning of a long string of bits. The following bit pattern encodes the word Arm:
0100 0001 0101 0010 0100 1101
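Reading this pattern back is just a matter of splitting it into bytes and looking up each value, as in the following minimal C sketch:

#include <stdio.h>

int main(void) {
    /* The 24-bit pattern above, split into three bytes. */
    unsigned char bytes[] = { 0x41, 0x52, 0x4D };   /* 0100 0001, 0101 0010, 0100 1101 */

    for (int i = 0; i < 3; i++)
        printf("%02X -> %c\n", bytes[i], bytes[i]);  /* 41 -> A, 52 -> R, 4D -> M */
    return 0;
}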
One uniquely powerful aspect of computers, as opposed to the mechanical calculators that predated them, is that they can also encode their logic as data. This code can also be stored in memory or on disk and be processed or changed on demand. For example, a software update can completely change the operating system of a computer without the need to purchase a new machine.
We've already seen how numbers and characters are encoded, but how is this logic encoded? This is where the processor architecture and its instruction set come into play.
If we were to create our own computer processor from scratch, we could design our own instruction encoding, mapping binary patterns to machine codes that our processor can interpret and respond to, in effect, creating our own "machine language." Since machine codes are meant to "instruct" the circuitry to perform an "operation," these machine codes are also referred to as instruction codes, or, more commonly, operation codes (opcodes).
In practice, most people use existing computer processors and therefore use the instruction encodings defined by the processor manufacturer. On Arm, instruction encodings have a fixed size and can be either 32-bit or 16-bit, depending on the instruction set in use by the program. The processor fetches and interprets each instruction and runs each in turn to perform the logic of the program. Each instruction is a binary pattern, or instruction encoding, which follows specific rules defined by the Arm architecture.
By way of example, let's assume we're building a tiny 16-bit instruction set and are defining how each instruction will look. Our first task is to designate part of the encoding as specifying exactly what type of instruction is to be run, called the opcode. For example, we might set the first 7 bits of the instruction to be an opcode and specify the opcodes for addition and subtraction, as shown in Table 1.1.
Table 1.1: Addition and Subtraction Opcodes
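To make the idea concrete, the C sketch below packs and unpacks such a 16-bit instruction. The field layout and opcode values here are invented purely for illustration; they are not the encodings from Table 1.1, nor those of any real Arm instruction set.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout for the toy 16-bit instruction:
   bits 15..9 hold the 7-bit opcode, bits 8..0 are left for operands.
   The opcode values below are made up for this example. */
#define OP_ADD 0x01
#define OP_SUB 0x02

static uint16_t encode(uint8_t opcode, uint16_t operands) {
    return (uint16_t)(((opcode & 0x7F) << 9) | (operands & 0x1FF));
}

static uint8_t decode_opcode(uint16_t instruction) {
    return (uint8_t)((instruction >> 9) & 0x7F);
}

int main(void) {
    uint16_t instruction = encode(OP_ADD, 0x005);
    printf("encoded instruction: 0x%04X\n", instruction);               /* 0x0205 */
    printf("decoded opcode:      0x%02X\n", decode_opcode(instruction)); /* 0x01  */
    return 0;
}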
Writing machine code by hand is possible...