[Repost] Forensic analysis of physical memory


Original link: http://www.forensicfocus.com/digital-forensics-of-physical-memory
Author: Mariusz Burdach
Source: Forensic Focus

Digital forensics of the physical memory

Abstract
This paper presents methods by which the physical memory of a compromised machine can be analyzed. Using these methods, it is possible to extract useful information from memory, such as the full contents of files, detailed information about each process, and even information about processes that were executed and have since terminated. This paper aims to explain the concepts of the digital investigation of volatile memory. The techniques covered will lead you through the process of analyzing important structures and recovering the contents of files from physical memory.

In addition, a technique that detects hidden User Mode processes is discussed in depth. This technique can detect processes hidden by various methods, such as function hooking or direct kernel object manipulation (DKOM). Based on the methods discussed in this paper, a proof-of-concept toolkit called idetect is presented. This toolkit can help an investigator extract information from a memory image or from a memory object on a live system.


1. Introduction

In the past, the procedure of making an accurate and reliable copy of the data from a compromised machine was limited to storage media such as hard disks. As a result, forensic analysis relied on evidence found on file systems. There are several reasons for this. First, the acquisition procedure is quite easy and requires little experience from the investigator: it is enough to remove power from the compromised machine and then protect the crime scene. The second reason is more important: in most cases, the examination tools available on the market can be used only to investigate file systems. Some forensic tools, such as EnCase EE or ProDiscover IR, help digital investigators preserve data from a live system, but for several reasons these tools are much more useful in an incident response process. Clearly, if we omit volatile data during acquisition, we can lose evidence. Furthermore, sophisticated methods of infecting computers, used by tools such as the FU rootkit or the SQL Slammer worm, show that in the near future memory content will be the only place where evidence can be found. Injecting malicious code into running processes, as Internet worms and viruses do, is becoming more and more common. For example, SQL Slammer resides only in memory and never writes anything to disk.

There are also other advantages to memory investigation. Suppose that we need to recover part of an email, or part of a document lost after a word processor crash. Where should we look for it? Even a simple search for strings in main memory is sometimes very useful and can reveal interesting information, such as commands typed by an intruder [6].
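A simple string search of this kind can be sketched in C. The function below is a hypothetical helper, not part of idetect: it behaves roughly like the `strings -t x` utility, printing each run of printable ASCII characters together with its offset in the image (for a /dev/mem dump the offset equals the physical address). It assumes the default "C" locale, so only bytes 0x20-0x7e and tab count as printable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least `min_len` printable bytes found in `buf`,
 * prefixed with its hexadecimal offset. Returns the number of runs printed. */
size_t find_strings(const unsigned char *buf, size_t len, size_t min_len, FILE *out)
{
    size_t count = 0, start = 0, run = 0;
    assert(min_len > 0 && out != NULL);
    /* Loop one past the end so a run that reaches the end of buf is flushed. */
    for (size_t i = 0; i <= len; i++) {
        int printable = i < len && (isprint(buf[i]) || buf[i] == '\t');
        if (printable) {
            if (run == 0)
                start = i;          /* a new run begins here */
            run++;
            continue;
        }
        if (run >= min_len) {
            fprintf(out, "%08zx %.*s\n", start, (int)run, (const char *)buf + start);
            count++;
        }
        run = 0;
    }
    return count;
}
```

Reading a whole image into memory is the simplest approach. For large dumps the same loop could run over successive blocks, at the cost of handling strings that cross a block boundary.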

The above examples show that memory investigation is critical for digital forensics. It is worth mentioning that the most interesting information can be found when the compromised system has not been rebooted. In this paper I will discuss some techniques for finding evidence in a preserved memory image.


2. Problems with memory acquisition procedure

Most standards and best-practice guidelines, such as the “Computer Security Incident Handling Guide” from NIST or RFC 3227 “Guidelines for Evidence Collection and Archiving”, include procedures for gathering volatile data. These documents specify which data must be acquired, for example: current network connections, running processes, user sessions, kernel parameters, open files, etc. However, to gather this data an investigator must use several tools, such as netstat, lsof and ifconfig. These tools collect only the obvious data, leaving most of the system's memory unanalyzed. Moreover, these tools are executed in User Mode, so even statically linked tools can report unreliable data if the kernel has been modified.

The perfect tool for collecting volatile data should not rely on the operating system. Such solutions exist; one of them is described in the “Digital Investigation” journal, Vol. 1, No. 1. The hardware-based solution described there, called Tribble, is almost perfect. Unfortunately, its special PCI card must be physically installed in a machine before an intrusion occurs, and it is obviously impossible to install such a card in every machine on the Internet. A memory acquisition procedure should be usable in every environment, so in most cases it must be a software solution.

The only thing an investigator can do when an intrusion occurs is to limit the memory collection process to a few steps, which minimizes the impact on the compromised machine. First, he should dump main memory using only one command. Second, he should remove power from the compromised machine and then preserve the remaining storage media, such as hard disks and floppy disks.

The dd tool can be used to dump main memory; it performs a bit-by-bit copy from one file to another. The content of main memory must be saved on storage other than the local file systems. One solution is to send the data to a remote host, for example with the well-known netcat tool, which can send data over a network. In the Linux operating system there are two files, /dev/mem and /proc/kcore, that correspond to main memory (RAM). The size of an image dumped from /dev/mem is equal to the size of RAM. The /proc/kcore object is presented in the ELF core format, so it can easily be analyzed with the gdb tool; the /proc/kcore file is slightly larger than RAM because of the ELF header.
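Because /proc/kcore is an ELF core file, a kernel virtual address is not directly a file offset: each PT_LOAD program header maps a range of virtual addresses (p_vaddr, p_filesz) to a range of file offsets (p_offset), and gdb relies on those headers when it reads memory. The sketch below shows that lookup, assuming the program headers have already been parsed into a simplified, hypothetical `kcore_segment` struct. For a raw /dev/mem image no lookup is needed, because the file offset equals the physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified view of one PT_LOAD program header of /proc/kcore;
 * the fields mirror the ELF p_vaddr, p_offset and p_filesz members. */
struct kcore_segment {
    uint64_t vaddr;   /* first virtual address covered by the segment */
    uint64_t offset;  /* file offset where that address is stored */
    uint64_t filesz;  /* number of bytes of the segment present in the file */
};

/* Translate a kernel virtual address into a file offset inside /proc/kcore.
 * Returns 0 and stores the offset in *out, or -1 if no segment covers vaddr. */
int kcore_vaddr_to_offset(const struct kcore_segment *segs, size_t nsegs,
                          uint64_t vaddr, uint64_t *out)
{
    assert(out != NULL);
    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].vaddr && vaddr - segs[i].vaddr < segs[i].filesz) {
            *out = segs[i].offset + (vaddr - segs[i].vaddr);
            return 0;
        }
    }
    return -1;
}
```

On Linux the headers themselves could be read with the Elf32_Phdr type from <elf.h>; 64-bit fields are used here only so the same helper would also fit 64-bit core files.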

The whole memory can be dumped and sent to a remote host as shown below (nc must also be given the address and port of the listening remote host):

#/mnt/cdrom/dd if=/dev/mem | /mnt/cdrom/nc

Once we have a dumped memory image, we can start the digital investigation.

3. Introduction to analysis of the physical memory
3.1 Limitations of the paper

To limit the size of this document it was necessary to specify a few conditions:

· The 2.4.20 kernel release is used in all examples. Similar investigations can be performed with other kernel releases.
· The total size of physical memory is less than 896 MB. When physical memory is larger, additional calculations must be performed to locate page frames properly.
· The page frame size is 4 KB. This is the default value used in almost every Linux distribution.

The proof-of-concept toolkit idetect is used to simplify the described investigation. After simple modifications, the presented tools can be used on live systems during incident response.


3.2 Symbols

During a digital investigation, the System.map file can be very helpful. It maps the names of important kernel symbols to their addresses. The addresses of various symbols may change every time a new kernel is compiled, so the System.map file must come from the same kernel build that was running on the examined machine. Suppose we want to enumerate the addresses of system calls. These addresses are stored in a kernel structure called the system call table, whose address is given by the sys_call_table symbol. Using the cat and grep commands, we obtain the address of that table:

$ cat /boot/System.map | grep sys_call_table
c030a0f0 D sys_call_table
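The same lookup can be done programmatically when a tool needs many symbols. The sketch below is a hypothetical helper, not part of idetect; it assumes the usual System.map line format shown above: a hexadecimal address, a one-letter symbol type and the symbol name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a System.map stream for `name` (lines look like "c030a0f0 D sys_call_table").
 * Returns 1 and stores the address (and, if type != NULL, the type letter) when
 * the symbol is found, 0 otherwise. */
int lookup_symbol(FILE *map, const char *name, unsigned long *addr, char *type)
{
    char line[512];
    assert(map != NULL && name != NULL && addr != NULL);
    while (fgets(line, sizeof line, map) != NULL) {
        unsigned long a;
        char t;
        char sym[256];
        /* The space before %c skips whitespace; %255s reads only the name token. */
        if (sscanf(line, "%lx %c %255s", &a, &t, sym) == 3 && strcmp(sym, name) == 0) {
            *addr = a;
            if (type != NULL)
                *type = t;
            return 1;
        }
    }
    return 0;
}
```

Called with "sys_call_table" on the file above, it stores 0xc030a0f0 and the type letter 'D'.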

Listing 1 shows the first few entries of the system call table.

(gdb) x/256 0xc030a0f0
0xc030a0f0 : 0xc0128fa0 0xc011f8e0 0xc0107aa0 0xc0146cb0
0xc030a100 : 0xc0146df0 0xc0146220 0xc0146370 0xc0120060
0xc030a110 : 0xc01462c0 0xc0154510 0xc0154070 0xc0107bb0
0xc030a120 : 0xc01457f0 0xc0120d40 0xc01536b0 0xc0145b70
0xc030a130 : 0xc012ca00 0xc0128fa0 0xc014e910 0xc0146b40


Listing 1. The result of running the gdb tool against the /proc/kcore file.

The index of each entry in this table is a system call number; the numbers are defined in the file /usr/include/asm/unistd.h:

#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6


Counting entries from 0, the sys_write function (number 4) is at 0xc0146df0, the sys_open function (number 5) is at 0xc0146220, and so on.
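Reading an entry from a raw image instead of gdb only requires decoding 32-bit little-endian words (the x86 byte order). The sketch below assumes that `table` points at the bytes of sys_call_table copied out of the image; in a /dev/mem dump they start at offset 0xc030a0f0 - 0xc0000000 = 0x30a0f0. The function names are illustrative, not part of idetect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from a raw memory image. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the handler address stored for system call number `nr`.
 * `table_len` is the number of table bytes available in `table`. */
uint32_t syscall_entry(const unsigned char *table, size_t table_len, unsigned nr)
{
    assert(table != NULL);
    assert((size_t)nr * 4 + 4 <= table_len);
    return read_le32(table + (size_t)nr * 4);
}
```

One common check built on this lookup is to compare each entry with the address of the matching sys_* symbol in System.map; a mismatch can indicate a hooked system call.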

The System.map file is usually located in the /boot directory on the local file system.


4. An introduction to the digital investigation of the physical memory

The terminology used in the digital investigation of physical memory is similar to that used in the investigation of file systems. We can define data units and metadata units. A data unit contains raw data, such as executable code or the data section of a memory-mapped file. Data units can also contain the content of a stack or metadata such as a process descriptor. Data units have a fixed size: in most systems a data unit is 4 KB, the default size of a page frame.

A metadata unit stores descriptive data about various memory structures. This kind of unit includes structures such as page descriptors, process descriptors, memory regions, and so on.


4.1 Virtual Address Space

Most examples in this paper use virtual (linear) addresses. All modern operating systems, including Linux, use this kind of address to access the contents of memory cells. On the 32-bit x86 architecture, a single 32-bit unsigned integer can address up to 4 GB.

The Linux operating system divides this linear address space into two parts. The upper 1 GB (0xc0000000 – 0xffffffff) is reserved for the operating system kernel; this area can be accessed only when the CPU is running in Kernel Mode. The remaining 3 GB (0x00000000 – 0xbfffffff) is called User Land.
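The split can be expressed with the kernel's PAGE_OFFSET constant (0xc0000000 for the default 2.4 x86 configuration assumed here). The helper name below is hypothetical; it simply tells whether an address found in a dump points into kernel space or into User Land.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u  /* start of the kernel's 1 GB (3G/1G split) */

/* Compile-time check of the split described above: User Land is exactly 3 GB. */
static_assert(PAGE_OFFSET == 3u * 1024u * 1024u * 1024u, "expected a 3 GB / 1 GB split");

/* Returns 1 for kernel-space linear addresses (0xc0000000 - 0xffffffff),
 * 0 for User Land addresses (0x00000000 - 0xbfffffff). */
int is_kernel_address(uint32_t linear)
{
    return linear >= PAGE_OFFSET;
}
```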


4.2 Physical addresses

Physical addresses are used to address memory cells in the memory chips. They are represented as 32-bit unsigned integers. The CPU's paging unit translates a linear address into a physical address automatically. Fortunately, for kernel addresses the calculation from a linear address to a physical one is quite simple, as will be shown several times in the following sections.

5. Map of system memory
This chapter discusses the important structures of kernel memory. Only elements useful to forensic investigators are described. Books [1] and [2] listed in the references provide detailed information about each structure discussed in this document.


5.1 Uniform Memory Access

On the x86 architecture, Linux treats physical memory as a homogeneous, shared resource. This model is called Uniform Memory Access (UMA): the operating system sees the memory of the computer as a single node. This node is represented by a static pg_data_t structure, whose address is given by the contig_page_data symbol. The pg_data_t structure contains information such as the size of the node (that is, the total size of physical memory), the number of zones in the node, the address of the table of page descriptors for this node, and much more. At this point, it is important to understand what zones are.


5.2 Zones

Physical memory (the node) is partitioned into three zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. As book [1] explains, there are reasons for this partitioning:

“However, real computer architectures have hardware constraints that may limit the way page frames can be used. In particular, the Linux kernel must deal with two hardware constraints of the 80x86 architecture:

- The Direct Memory Access (DMA) processors for ISA buses have a strong limitation: they are able to address only the first 16 MB of RAM.
- In modern 32-bit computers with lots of RAM, the CPU cannot directly access all physical memory because the linear address space is too small.

To cope with these two limitations, Linux partitions the physical memory in three zones.”

The size of the ZONE_DMA zone is 16 MB. The size of the ZONE_NORMAL zone is 896 MB - 16 MB (ZONE_DMA) = 880 MB. The memory above 896 MB belongs to the ZONE_HIGHMEM zone. This last zone contains page frames that the kernel cannot access directly, because the kernel's 1 GB portion of the 32-bit linear address space is too small to map them permanently. Each memory zone has its own descriptor of type zone_struct, defined in the file /usr/src/linux-2.4/include/linux/mmzone.h.
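The zone boundaries above can be turned into a small classifier for physical addresses found during an investigation. This is a minimal sketch assuming the 2.4 x86 boundaries of 16 MB and 896 MB; the enum and function names are hypothetical and only echo the kernel's zone names.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024u * 1024u)

/* The 896 MB boundary expressed as a physical address. */
static_assert(896u * MB == 0x38000000u, "ZONE_HIGHMEM starts at 0x38000000");

enum zone_kind { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Classify a physical address: [0, 16 MB) -> ZONE_DMA,
 * [16 MB, 896 MB) -> ZONE_NORMAL, 896 MB and above -> ZONE_HIGHMEM. */
enum zone_kind zone_of(uint64_t phys)
{
    if (phys < 16u * (uint64_t)MB)
        return ZONE_DMA;
    if (phys < 896u * (uint64_t)MB)
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}
```

On the 128 MB machine used in the examples, every address falls into the first two cases.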

In all examples I used a machine with 128 MB of physical memory. This means that all user and kernel data are stored in the ZONE_NORMAL zone and sometimes in the ZONE_DMA zone. Pointers to the zone descriptors are kept in the zone_table array, whose address can be found in the System.map file:

$ cat System.map | grep zone_table
c03e6238 B zone_table

The letter "B" means that the symbol is in the uninitialized data section (known as BSS).

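Listing 2 presents the content of the zone_table array. Because the array lies in low memory, its physical address, and therefore its offset in the /dev/mem image, is 0xc03e6238 - 0xc0000000 = 0x003e6238. The array holds three pointers: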

· zone_table[0] stores an address of the ZONE_DMA descriptor
· zone_table[1] stores an address of the ZONE_NORMAL descriptor
· zone_table[2] stores an address of the ZONE_HIGHMEM descriptor

003e6230 80 9b 34 c0 ce ff 02 00 80 9b 34 c0 80 9e 34 c0 |..4.......4...4.|
003e6240 80 a1 34 c0 00 00 00 00 00 00 00 00 00 00 00 00 |..4.............|
003e6250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
003e6260 ce ff 02 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|

Listing 2. Fragment of physical memory with the zone_table array.

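Each entry is stored in little-endian byte order, so the bytes 80 9e 34 c0 at 0x003e623c encode the address 0xc0349e80 (zone_table[1]). At this address we find the zone descriptor for the ZONE_NORMAL zone.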
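Decoding the three pointers can be automated. The sketch below assumes that `buf` is a window of a /dev/mem image starting at physical address `buf_phys` (for Listing 2, 0x003e6230), and that zone_table lies in low memory so its physical address is its virtual address minus PAGE_OFFSET. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u

/* Decode a 32-bit little-endian value (x86 byte order). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read zone_table[0..2] (ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM descriptor addresses)
 * from a window of the image; zone_table_va comes from System.map. */
void read_zone_table(const unsigned char *buf, uint32_t buf_phys, size_t buf_len,
                     uint32_t zone_table_va, uint32_t zones[3])
{
    assert(zone_table_va >= PAGE_OFFSET);
    uint32_t phys = zone_table_va - PAGE_OFFSET;   /* low memory: linear mapping */
    assert(phys >= buf_phys && (size_t)(phys - buf_phys) + 12 <= buf_len);
    const unsigned char *p = buf + (phys - buf_phys);
    for (int i = 0; i < 3; i++)
        zones[i] = read_le32(p + 4 * i);
}
```

Fed with the first two rows of Listing 2 and the address 0xc03e6238, it yields 0xc0349b80, 0xc0349e80 and 0xc034a180.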

Each zone descriptor contains a lot of information about the zone's page frames, including the address of the zone's mem_map array. This array stores the page descriptor of each page frame in the zone. Before looking more closely at the mem_map array, let's focus on page frames and page descriptors.


5.3 Page frames and page descriptors

Most data used by the CPU is stored in physical memory in page frames (physical memory is partitioned into fixed-length page frames). Each page frame is 4 KB by default. x86 processors can also use larger pages, such as 4 MB or 2 MB, but the standard memory allocation unit is 4 KB, and only this standard size is discussed in this document. All volatile data is stored in page frames. For instance, when a 7 KB file is mapped, its content (code and data segments) is stored in two page frames.

When a process requests a memory item, it uses a linear (virtual) address, and the hardware Memory Management Unit (MMU) automatically translates it into a physical one. A page may be paged in or paged out. If the page is paged in, the memory access proceeds after the virtual address has been translated into a physical address. If the requested page is paged out, the MMU raises a page fault, and the kernel has to locate the page in the swap area and load it into physical memory. These two cases are discussed in the next sections.
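The arithmetic behind page frames is simple with 4 KB frames: a physical address splits into a page frame number (the upper 20 bits) and an offset inside the frame (the lower 12 bits), and a file occupies the size rounded up to whole frames. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* 4 KB page frames */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Number of 4 KB page frames needed to hold `bytes` bytes (rounded up). */
uint32_t frames_needed(uint32_t bytes)
{
    assert(bytes <= UINT32_MAX - (PAGE_SIZE - 1));   /* avoid overflow when rounding */
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Page frame number of a physical address. */
uint32_t pfn_of(uint32_t phys) { return phys >> PAGE_SHIFT; }

/* Offset of a physical address inside its page frame. */
uint32_t offset_in_frame(uint32_t phys) { return phys & (PAGE_SIZE - 1); }
```

For the 7 KB file above, frames_needed(7 * 1024) returns 2.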

The kernel uses page descriptors to keep track of all physical pages. Each page frame has a corresponding page descriptor, which stores information about the state of the page.

A structure of page descriptor is defined in the file /usr/src/linux-2.4/include/linux/mm.h.

typedef struct page {
struct list_head list;
struct address_space *mapping;
unsigned long index;
struct page *next_hash;
atomic_t count;
unsigned long flags;
struct list_head lru;
union {
struct pte_chain *chain;
pte_addr_t direct;
} pte;
unsigned char age;
struct page **pprev_hash;
struct buffer_head * buffers;
#if defined(CONFIG_HIGHMEM) || defined(WANT_PAGE_VIRTUAL)
void *virtual;
#endif /* CONFIG_HIGMEM || WANT_PAGE_VIRTUAL */
} mem_map_t;
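An analysis tool running on another machine cannot simply include this kernel header, because pointer sizes and alignment may differ from the 32-bit target. A common approach is a mirror struct in which every kernel pointer becomes a 32-bit integer. The sketch below assumes the i386 layout, a one-word pte union, and that the virtual field is compiled in (CONFIG_HIGHMEM or WANT_PAGE_VIRTUAL), which is what the 56-byte descriptor size used in Section 5.4 implies. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit mirror of the page descriptor above, for parsing a dump on any host. */
struct page32 {
    uint32_t list_next, list_prev;   /* struct list_head list */
    uint32_t mapping;                /* struct address_space * */
    uint32_t index;                  /* offset within mapping, in pages */
    uint32_t next_hash;
    uint32_t count;                  /* atomic_t */
    uint32_t flags;
    uint32_t lru_next, lru_prev;     /* struct list_head lru */
    uint32_t pte;                    /* union: pte_chain * or pte_addr_t */
    uint8_t  age;                    /* followed by 3 bytes of padding */
    uint32_t pprev_hash;
    uint32_t buffers;                /* struct buffer_head * */
    uint32_t virtual_addr;           /* `virtual`: kernel linear address of the frame */
};

static_assert(sizeof(struct page32) == 56, "page descriptor should be 56 (0x38) bytes");
static_assert(offsetof(struct page32, virtual_addr) == 52, "virtual is the last field");

/* Physical address of the described frame (valid below 896 MB, see below). */
uint32_t page32_phys(const struct page32 *pg)
{
    assert(pg != NULL && pg->virtual_addr >= 0xc0000000u);
    return pg->virtual_addr - 0xc0000000u;   /* subtract PAGE_OFFSET */
}
```

The field named `virtual` in the kernel is called `virtual_addr` here only so the code also compiles as C++.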


The mapping field is a pointer to an address_space structure. The virtual field holds the kernel linear (virtual) address of the page frame where the data is stored. When the total amount of physical memory is less than 896 MB, the physical address of each page frame is easily calculated by subtracting PAGE_OFFSET (0xc0000000) from the address stored in the virtual field.

The list field contains pointers to the next and previous page descriptors that belong to the same memory region.

As mentioned earlier, all page descriptors are stored in the global mem_map array.


5.4 The mem_map array

In fact, we can identify three mem_map arrays, one for each zone. They are contiguous: where the first mem_map array (for the ZONE_DMA zone) ends, the second mem_map array (for the ZONE_NORMAL zone) begins.

If we know the size of a page descriptor, it is quite easy to find the beginning address of the mem_map array for the ZONE_NORMAL zone. ZONE_NORMAL is especially important because most page frames allocated by user processes belong to this zone. For most 2.4.x kernels, the size of the page descriptor is 56 (0x38) bytes. The total size of ZONE_DMA is 16 MB (0x01000000), so 4096 (0x1000) page frames are needed to cover it (0x00000000 – 0x01000000). The size of the mem_map array for ZONE_DMA is therefore 0x38 bytes * 0x1000 = 0x38000 bytes.
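The same calculation, written as a small helper under the assumptions of this section (4 KB frames and 56-byte descriptors; the names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE     0x1000u      /* 4 KB page frames */
#define PAGE_DESC_SZ  0x38u        /* sizeof(struct page) on most 2.4.x kernels */
#define ZONE_DMA_SIZE 0x01000000u  /* 16 MB */

/* Bytes occupied by the mem_map array of a zone spanning `zone_bytes` bytes:
 * one page descriptor per page frame. */
uint32_t mem_map_bytes(uint32_t zone_bytes)
{
    assert(zone_bytes % PAGE_SIZE == 0);
    return (zone_bytes / PAGE_SIZE) * PAGE_DESC_SZ;
}
```

mem_map_bytes(ZONE_DMA_SIZE) gives 0x38000. On the 128 MB test machine, ZONE_NORMAL covers the remaining 112 MB (0x07000000 bytes), so its mem_map array would take 0x7000 * 0x38 = 0x188000 bytes.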

Note that Linux maps kernel virtual addresses to physical addresses linearly, starting from PAGE_OFFSET. For instance, the virtual address 0xc0000000 corresponds to the physical address 0x00000000, the address 0xc0001000 corresponds to 0x00001000, and so on.
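This linear mapping is what the i386 kernel's own __pa() and __va() macros implement: subtracting or adding PAGE_OFFSET. The sketch below uses hypothetical names and assumes the address lies in the directly mapped first 896 MB; the result is also the file offset in a /dev/mem image.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET  0xc0000000u
#define LOWMEM_LIMIT 0x38000000u   /* 896 MB: RAM mapped linearly by the kernel */

/* Kernel linear address -> physical address (low memory only). */
uint32_t virt_to_phys32(uint32_t va)
{
    assert(va >= PAGE_OFFSET && va - PAGE_OFFSET < LOWMEM_LIMIT);
    return va - PAGE_OFFSET;
}

/* Physical address -> kernel linear address (low memory only). */
uint32_t phys_to_virt32(uint32_t pa)
{
    assert(pa < LOWMEM_LIMIT);
    return pa + PAGE_OFFSET;
}
```

For example, the sys_call_table address 0xc030a0f0 becomes offset 0x0030a0f0 in the image.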

It is also important to note that physically the mem_map array is placed in page frames which belong to the ZONE_NORMAL zone. The location of the mem_map array is shown in Figure 1.



Figure 1. The mem_map array is stored in the ZONE_NORMAL zone.

Based on the above scheme, we could assume that the mem_map array for the ZONE_DMA zone starts at the physical address 0x01000000. In fact, it starts at offset 0x30, that is, at 0x01000030. We can now easily locate the beginning physical address of the mem_map array for the ZONE_NORMAL zone: 0x01000030 + 0x00038000 = 0x01038030 (the corresponding virtual address is 0xc1038030).
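Once the array bases are known, the page descriptor of any page frame can be located by indexing with its page frame number. The sketch below hard-codes the example layout from this section (128 MB of RAM, 56-byte descriptors, arrays at 0xc1000030 and 0xc1038030); on another system the bases should be read from the zone descriptors instead. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define PAGE_DESC_SZ      0x38u
#define MEM_MAP_DMA_VA    0xc1000030u  /* ZONE_DMA mem_map in the example */
#define MEM_MAP_NORMAL_VA 0xc1038030u  /* ZONE_NORMAL mem_map in the example */
#define ZONE_NORMAL_START 0x01000000u  /* 16 MB */

/* Virtual address of the page descriptor for the frame holding physical address pa. */
uint32_t page_desc_va(uint32_t pa)
{
    uint32_t pfn = pa >> PAGE_SHIFT;
    assert(pa < 0x08000000u);          /* 128 MB example machine */
    if (pa < ZONE_NORMAL_START)
        return MEM_MAP_DMA_VA + pfn * PAGE_DESC_SZ;
    /* Index relative to the first ZONE_NORMAL frame (pfn 0x1000). */
    return MEM_MAP_NORMAL_VA + (pfn - (ZONE_NORMAL_START >> PAGE_SHIFT)) * PAGE_DESC_SZ;
}
```

Because the arrays are contiguous, both branches agree with the single formula 0xc1000030 + pfn * 0x38.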

I have focused on the ZONE_NORMAL zone, but digital investigators must also look at the ZONE_DMA zone. Under some conditions, page frames belonging to user processes can be allocated in ZONE_DMA; this happens when all page frames in ZONE_NORMAL have already been allocated.

When a file is mapped into main memory, its inode structure (discussed in the next section) has an associated address_space structure. The mapping field of each page descriptor that holds part of the file points to exactly this structure.
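Because each such descriptor stores the file's address_space pointer in mapping and the page's position in the file (in page-size units) in index, one way to rebuild a cached file is to scan the mem_map arrays for descriptors whose mapping matches. The sketch below assumes the descriptors have already been decoded into a hypothetical minimal struct; it is not idetect's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FRAME UINT32_MAX   /* marker: this page of the file is not resident */

/* Hypothetical minimal decoded page descriptor: only the fields used here. */
struct page_info {
    uint32_t pfn;      /* page frame number described by the descriptor */
    uint32_t mapping;  /* address_space pointer read from the descriptor */
    uint32_t index;    /* offset of the page within the file, in 4 KB pages */
};

/* Fill frames[i] with the frame holding page i of the file whose address_space
 * is `target`; pages not found stay NO_FRAME. Returns the number of pages found. */
size_t collect_file_frames(const struct page_info *pages, size_t npages,
                           uint32_t target, uint32_t *frames, size_t nframes)
{
    size_t found = 0;
    assert(frames != NULL);
    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < nframes; i++)
        frames[i] = NO_FRAME;
    for (size_t i = 0; i < npages; i++) {
        if (pages[i].mapping != target || pages[i].index >= nframes)
            continue;                  /* other file, or beyond the requested size */
        if (frames[pages[i].index] == NO_FRAME)
            found++;
        frames[pages[i].index] = pages[i].pfn;
    }
    return found;
}
```

Page i of the file is then found at offset frames[i] << 12 in a /dev/mem image; the last page may contain bytes beyond the end of the file.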



For more, see the original link (the original article includes the figures).
Sprite
