[Repost] Defeating Honeypots: System Issues (Part 1)

Authors: Thorsten Holz and Frédéric Raynal

To learn about attack patterns and attacker behavior, electronic decoys, or honeypots, are often used. These look like regular network resources (computers, routers, switches, etc.) that are deployed to be probed, attacked, and compromised. This electronic bait lures in attackers and helps with the assessment of vulnerabilities. As honeypots are deployed more and more often within computer networks, blackhats have started to devise techniques to detect, circumvent, and disable the logging mechanisms used on honeypots.

This paper will explain how an attacker typically proceeds when he attacks a honeypot for fun and profit. We will introduce several publicly known (or perhaps unknown) techniques and present some diverse tools that help blackhats discover and interact with honeypots. The article aims to show security teams and practitioners who would like to set up or harden their own deception-based lines of defense what the current limitations of honeypot-based research are. After a brief theoretical introduction, we present several technical examples of different methodologies. This two-part paper will focus on the system world and the application layer, as opposed to our first paper, "Defeating Honeypots: Network Issues" [ref 1], which concentrated purely on network issues.

Honeypots versus steganography
Before going any further, let us talk briefly about steganography. Its goal is to hide the existence of a communication channel from anyone but the intended recipient of a message. As an art and science, it came to the forefront when Simmons introduced his classic prisoners' problem [ref 2]. Assume two prisoners are jailed in different cells. A warden has been authorized to carry messages from one to the other. If the messages are encrypted, so that the warden cannot understand their content, he will become suspicious and stop the communication channel. But if the prisoners have agreed on a code (for instance, a red sun on a painting means one thing, while a yellow sun means something else), the warden will not notice the message, and the prisoners will have the chance to covertly plot their escape.

When we configure a high-interaction honeypot, we hope to capture a great deal of information about the attacker's activity. Even if he notices that he is on a honeypot, learning how he recognized it as a fake system is still valuable information. So honeypots do need to be covert, but not necessarily completely covert.

Steganography and honeypots share some characteristics: mainly, once you are discovered, the game is almost over. Also, in both steganography and honeypots you have to hide the presence of something as best you can. But you always leave signs that inevitably allow for detection. To return to our analogy with the warden: he may examine the image he is carrying, and if he looks closely he will notice differences between several pictures and perhaps become suspicious. For honeypots, the situation is comparable: if an attacker carefully watches for signs of deception, he will sooner or later find some.

Since honeypots are being deployed all across the Internet, more and more blackhat tools are starting to include automatic detection of suspect environments. This has already begun with the backdoor-virus-worm known as AgoBot (also known as Gaobot) [ref 3].

Let's start with some technical examples that show some of the different techniques that attackers can use to detect honeypots.

Many tools are available for building a high-interaction honeypot. We will focus on some of the best known, and help show you the inside of the matrix.

User Mode Linux (UML)
Some people have tried to use UML [ref 4] as a honeypot, but in order to gauge its effectiveness, we first need to recall what UML is. Basically, UML is a way to have a Linux system running inside another Linux system. We will call the initial Linux kernel the host kernel (or host OS), while the one started by the command linux will be called the guest OS. It runs "above" the host kernel, entirely in userland. Note that UML is only a modified kernel that is able to run in userland. Thus, you have to provide the filesystem containing your preferred Linux distribution.

By default, UML executes in TT (Tracing Thread) mode. One main thread will ptrace() each new process that is started in the guest OS. On the host OS, you can see this tracing with the help of ps:

host$ ps a
1039 pts/6   S    0:00 linux [(tracing thread)]
1044 pts/6   S    0:00 linux [(kernel thread)]
1049 pts/6   S    0:00 linux [(kernel thread)]
1051 pts/6   S    0:00 linux [(kernel thread)]
1053 pts/6   S    0:00 linux [(kernel thread)]
1055 pts/6   S    0:00 linux [(kernel thread)]
1057 pts/6   S    0:00 linux [(kernel thread)]
1059 pts/6   S    0:00 linux [(kernel thread)]
1061 pts/6   S    0:00 linux [(kernel thread)]
1063 pts/6   S    0:00 linux [(kernel thread)]
1064 pts/6   S    0:00 linux [(kernel thread)]
1065 pts/6   S    0:00 linux [(kernel thread)]
1066 pts/6   S    0:00 linux [(kernel thread)]
1068 pts/6   S    0:00 linux [/sbin/init]
1268 pts/6   S    0:00 linux [ile]
1272 pts/6   S    0:00 linux [/bin/sh]
1348 pts/6   S    0:00 linux [dd]

You can identify the main thread (PID 1039) and several threads that are ptrace()d: several kernel threads (PIDs 1044 to 1066), init (PID 1068), ile (PID 1268), a shell (PID 1272), and dd (PID 1348).

We quickly discover that, in its default configuration, UML is not designed to be hidden:

uml$ dmesg
Linux version 2.6.10-rc2 (wstearns@sparrow.stearns.org)
(gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)) #1 Tue Nov 16 01:43:27 EST 2004
On node 0 totalpages: 8192
Kernel command line: ubd0=/home/raynal/MISC/uml/FS/debian.ext3 eth0=tuntap,tap0
PID hash table entries: 256 (order: 8, 4096 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...missing
Checking that host ptys support output SIGIO...Yes
Checking that host ptys support SIGIO on close...No, enabling workaround
Checking for /dev/anon on the host...Not available (open failed with errno 2)
NET: Registered protocol family 16
mconsole (version 2) initialized on /home/raynal/.uml/Es5BHO/mconsole
UML Audio Relay (host dsp = /dev/sound/dsp, host mixer = /dev/sound/mixer)
Netdevice 0 : TUN/TAP backend -
divert: allocating divert_blk for eth0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Coda Kernel/Venus communications, v6.0.0, coda@cs.cmu.edu
devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
Initializing software serial port version 1
/dev/ubd/disc0: unknown partition table
Initializing stdio console driver

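Many of these lines are specific to UML in its default mode: for instance, the ubd0= kernel command line, the checks on the host's ptrace and ptys, the mconsole and audio relay messages, and the /dev/ubd/disc0 device. Also note that network device 0 uses a TUN/TAP backend, which is not that common on a real system.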
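
Checking the boot log for such lines is easy to automate. The sketch below is a minimal illustration, assuming a Linux guest with glibc: it reads the kernel ring buffer (the buffer that dmesg prints) through klogctl(), glibc's wrapper for the syslog(2) system call, and counts how many of a few markers picked from the log above appear in it. The marker list and the function names are our own hypothetical choices, not part of UML. Kernels that restrict access to the log return EPERM to unprivileged callers, in which case the function returns -1.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/klog.h>

/* Action codes documented in syslog(2). */
#define SYSLOG_ACTION_READ_ALL    3
#define SYSLOG_ACTION_SIZE_BUFFER 10

/* Hypothetical marker list, taken from the UML boot log shown above. */
static const char *const uml_log_markers[] = {
    "ubd0=",                                              /* UML block device on the command line */
    "Checking that ptrace can change system call numbers", /* host ptrace check */
    "mconsole",                                           /* UML management console */
    "UML Audio Relay",
    "/dev/ubd/",
};

/* Count how many distinct markers occur somewhere in a log text. */
static int count_uml_log_markers(const char *log) {
    int hits = 0;
    for (size_t i = 0; i < sizeof uml_log_markers / sizeof uml_log_markers[0]; i++)
        if (strstr(log, uml_log_markers[i]) != NULL)
            hits++;
    return hits;
}

/* Read the whole kernel ring buffer and count the markers in it.
   Returns -1 if the buffer cannot be read (e.g. EPERM). */
static int scan_kernel_log(void) {
    int size = klogctl(SYSLOG_ACTION_SIZE_BUFFER, NULL, 0);
    if (size <= 0)
        return -1;
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return -1;
    int n = klogctl(SYSLOG_ACTION_READ_ALL, buf, size);
    if (n < 0) {
        free(buf);
        return -1;
    }
    buf[n] = '\0';
    int hits = count_uml_log_markers(buf);
    free(buf);
    return hits;
}
```

For example, count_uml_log_markers() returns 2 for the "Kernel command line" and "mconsole" lines of the log above. A single hit is weak evidence; several hits together are much stronger.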

One of the big issues with UML is that it does not use a real hard disk; it uses a virtual block device called ubd*. If you take a look at /etc/fstab, execute the command mount, or check the directory /dev/ubd/, you will notice that you are inside a UML system. To hide that information, it is possible to start UML with the options fake_ide and fakehd. But don't forget that a device name can lie: look at the major number of UML's root device, and you will see that it is still 98 (0x62), the major number of ubd devices.
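
The major-number check needs nothing more than stat(). A minimal sketch, assuming Linux with glibc (major() and makedev() come from <sys/sysmacros.h>); is_ubd_device() and path_on_ubd() are hypothetical helper names. It compares the major number of the device holding a path with 98, the ubd major mentioned above, so renaming the device does not fool it.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

#define UBD_MAJOR 98  /* 0x62: block major number of UML's ubd devices */

/* 1 if the device number belongs to a ubd device, 0 otherwise. */
static int is_ubd_device(dev_t dev) {
    return major(dev) == UBD_MAJOR;
}

/* 1 if the filesystem holding `path` sits on a ubd device, 0 if not,
   -1 if stat() fails. Changing the device's name does not change the
   major number that stat() reports in st_dev. */
static int path_on_ubd(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return is_ubd_device(st.st_dev);
}
```

Calling path_on_ubd("/") inside the guest checks the root filesystem, which is exactly the device the text suggests examining.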

UML can also be easily identified by taking a look at the /proc tree. Most of the entries in this directory will show signs of UML if you just take a closer look:

$ cat /proc/cpuinfo
processor    : 0
vendor_id    : User Mode Linux
model name    : UML
mode        : tt

$ cat /proc/devices
Block devices:
60 cow
90 ubd

$ cat /proc/filesystems
nodev  hostfs

$ egrep -i "uml|honey" /proc/ksyms
a02eb408 uml_physmem
a02ed688 honeypot

In addition, the entries iomem, ioports, interrupts, and many others look suspicious. To counter this way of fingerprinting UML, you can use hppfs (Honeypot procfs [ref 5]) and customize the entries in the /proc hierarchy.
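
These /proc checks can be scripted in a few lines. A minimal sketch, assuming a Linux guest; file_contains() and count_proc_uml_hints() are hypothetical helpers, and the three needles are simply strings from the listings above. Against a UML customized with hppfs, every check may come back negative, so a zero count proves nothing.

```c
#include <stdio.h>
#include <string.h>

/* 1 if the file contains `needle`, 0 if not, -1 if it cannot be opened.
   Lines are read in chunks of at most 511 bytes, so a match straddling a
   chunk boundary would be missed; acceptable for short /proc entries. */
static int file_contains(const char *path, const char *needle) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char chunk[512];
    int found = 0;
    while (!found && fgets(chunk, sizeof chunk, f) != NULL)
        found = strstr(chunk, needle) != NULL;
    fclose(f);
    return found;
}

/* Number of UML hints found in /proc, based on the listings above. */
static int count_proc_uml_hints(void) {
    static const struct {
        const char *path;
        const char *needle;
    } checks[] = {
        {"/proc/cpuinfo", "User Mode Linux"},  /* vendor_id line */
        {"/proc/devices", "ubd"},              /* UML block driver */
        {"/proc/filesystems", "hostfs"},       /* access to the host's files */
    };
    int hits = 0;
    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++)
        if (file_contains(checks[i].path, checks[i].needle) == 1)
            hits++;
    return hits;
}
```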

Another place to look for UML is the address space of a process. On the host OS, the address space looks as follows:

host$ cat /proc/self/maps
08048000-0804c000 r-xp 00000000 03:01 1058722   /bin/cat
0804c000-0804d000 rw-p 00003000 03:01 1058722   /bin/cat
0804d000-0806e000 rw-p 0804d000 00:00 0
b7ca9000-b7ea9000 r--p 00000000 03:01 171      /usr/lib/locale/locale-archive
b7ea9000-b7eaa000 rw-p b7ea9000 00:00 0
b7eaa000-b7fd3000 r-xp 00000000 03:01 781848    /lib/tls/i686/cmov/libc-2.3.2.so
b7fd3000-b7fdb000 rw-p 00129000 03:01 781848    /lib/tls/i686/cmov/libc-2.3.2.so
b7fdb000-b7fde000 rw-p b7fdb000 00:00 0
b7fe9000-b7fea000 rw-p b7fe9000 00:00 0
b7fea000-b8000000 r-xp 00000000 03:01 782112    /lib/ld-2.3.2.so
b8000000-b8001000 rw-p 00015000 03:01 782112    /lib/ld-2.3.2.so
bfffe000-c0000000 rw-p bfffe000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0

In contrast, the address space inside the guest OS looks like this:

uml:~# cat /proc/self/maps
08048000-0804c000 r-xp 00000000 62:00 9957     /bin/cat
0804c000-0804d000 rw-p 00003000 62:00 9957     /bin/cat
0804d000-0806e000 rw-p 0804d000 00:00 0
40000000-40016000 r-xp 00000000 62:00 13907    /lib/ld-2.3.2.so
40016000-40017000 rw-p 00015000 62:00 13907    /lib/ld-2.3.2.so
40017000-40018000 rw-p 40017000 00:00 0
4001b000-4014b000 r-xp 00000000 62:00 21846    /lib/tls/libc-2.3.2.so
4014b000-40154000 rw-p 0012f000 62:00 21846    /lib/tls/libc-2.3.2.so
40154000-40156000 rw-p 40154000 00:00 0
9ffff000-a0000000 rw-p 9ffff000 00:00 0
beffe000-befff000 ---p 00000000 00:00 0

What one should notice, and what is not that common, is the topmost address, which indicates the end of the stack (ignore the mapping of the dynamic libraries). Depending on the host kernel's configuration, it is usually 0xc0000000. However, inside UML we have 0xbefff000. In fact, the address space between 0xbefff000 and 0xc0000000 on UML contains the mapping of the UML kernel. This means that each process can access, change, or do whatever it wants with the UML kernel.
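
The stack-top observation can be turned into a check by parsing /proc/self/maps. A minimal sketch, under loudly stated assumptions: a 32-bit x86 guest with the usual 0xc0000000 user/kernel split and no stack randomization, exactly as in the two listings above. On kernels that randomize the stack, or on 64-bit systems, the top of user space lies elsewhere and the heuristic must be adapted. All function and constant names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumptions (hypothetical constants): 32-bit x86, 3 GB/1 GB split,
   no stack randomization, as in the listings above. */
#define USER_SPACE_TOP 0xc0000000ULL
#define VSYSCALL_START 0xffffe000ULL  /* vsyscall page, ignored */

/* Highest end address of the mappings in a /proc/<pid>/maps text,
   ignoring the vsyscall page. Returns 0 if no line parses. */
static unsigned long long highest_map_end(const char *maps) {
    unsigned long long best = 0;
    for (const char *p = maps; p != NULL && *p != '\0';) {
        unsigned long long start, end;
        if (sscanf(p, "%llx-%llx", &start, &end) == 2 &&
            start < VSYSCALL_START && end > best)
            best = end;
        p = strchr(p, '\n');  /* advance to the next line */
        if (p != NULL)
            p++;
    }
    return best;
}

/* 1 if user space ends below 0xc0000000, as seen inside UML's TT mode. */
static int looks_like_uml_tt(unsigned long long top) {
    return top != 0 && top < USER_SPACE_TOP;
}

/* Apply the check to the calling process; -1 if maps cannot be read. */
static int self_looks_like_uml_tt(void) {
    static char buf[1 << 16];
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return looks_like_uml_tt(highest_map_end(buf));
}
```

On the host listing above, highest_map_end() yields 0xc0000000; on the UML listing it yields 0xbefff000, which looks_like_uml_tt() flags.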

To fix most of these problems, you can start UML either with the honeypot argument [ref 6, ref 7] or in skas (Separate Kernel Address Space) mode [ref 8]. However, getting skas mode running is not that easy, and the host kernel is really not stable when it is running (hanging processes and the like lead to reboots).

VMware
VMware is a very efficient virtual machine that provides a virtual x86 system. Thus, you can install (almost) any operating system you want, from Linux or Windows to Solaris 10.

The first step in detecting VMware is to look at the hardware it emulates. Prior to version 4.5, some specific pieces of hardware were not configurable:

the video card: VMware Inc [VMware SVGA II] PCI Display Adapter,
the network card: Advanced Micro Devices [AMD] 79c970 [PCnet 32 LANCE] (rev 10),
the name of IDE and SCSI devices: VMware Virtual IDE Hard Drive, NECVMWar VMware IDE CDR10, VMware SCSI Controller.
However, it is possible to patch the VMware binary to change these default values. Kostya Kortchinsky from the French Honeynet Project has written such a patch, which sets them to other values. This patch is publicly available [ref 9].

Furthermore, VMware has an I/O backdoor, which is used to configure VMware at runtime. The following sequence is used to call the backdoor functions:

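   MOV EAX, 564D5868h              ; Magic number
   MOV EBX, <parameter>            ; Parameter for the command
   MOV ECX, <command>              ; Command number (see table below)
   MOV DX, 5658h                   ; Port number
   IN EAX, DX                      ; Execute the backdoor command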


First, register EAX is loaded with a magic number that is used to "authenticate" the backdoor commands. Register EBX holds the parameter for the command, and register ECX holds the command number itself. The following table gives an overview of some possible commands:

Number     Description
00h-03h    Unknown
04h        Get current mouse cursor position.
05h        Set current mouse cursor position.
06h        Get data length in host's clipboard.
07h        Read data from host's clipboard.
08h        Set data length to send to host's clipboard.
09h        Send data to host's clipboard.
0Ah        Get VMware version.
0Bh        Get device information.

In total, there are at least 15 implemented commands.

Register DX holds the backdoor I/O port, and the IN instruction finally executes the backdoor command. Clearly, the VMware I/O backdoor makes it possible to interfere with a running VMware. It also gives VMware away: a real machine does not answer these commands, so a successful call reveals that the code runs inside VMware.

With the help of Kostya Kortchinsky's patch, you can change the magic number and thus "hide" the backdoor from an attacker. More information about the backdoor in VMware is also available. [ref 10]

Detecting additional lines of defense: chroot and jails
chroot() was never designed for security, but it is often considered a necessity when one wants to protect a sensitive server. Detecting that you are in a chroot environment, or even circumventing it, is not really that difficult.

Unless the chroot directory is the top-level directory of its own partition, the inode numbers are not those expected of a real root directory:

# ls -ial /
2 drwxr-xr-x  24 root root  4096 2004-11-30 08:14 .
2 drwxr-xr-x  24 root root  4096 2004-11-30 08:14 ..

Here, the inode numbers of . and .. are the same, and equal to 2 (which is the normal value for the root directory of a partition). In the current directory, we have:

# ls -ail .
1553552 drwxr-xr-x  6 raynal users  4096 2004-12-14 13:58 .
6657574 drwxr-xr-x  6 raynal raynal  4096 2004-12-12 16:25 ..

Then, when we chroot a shell in the current directory, we retrieve the same inode numbers:

# chroot . /bin/busybox
BusyBox v0.60.5 (2004.10.29-22:08+0000) multi-call binary
# ls -ial
1553552 drwxr-xr-x   6 1000    100       4096 Dec 14 12:58 .
1553552 drwxr-xr-x   6 1000    100       4096 Dec 14 12:58 ..

Although .. now matches ., its inode number is still not the expected value of 2.
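
The inode test takes a single stat() call. A minimal sketch, assuming the real root filesystem is ext2/ext3, where the root directory is inode 2; other filesystems number their root directory differently, so a mismatch is only a hint. root_inode_suspicious() and check_root_inode() are hypothetical helper names.

```c
#include <sys/stat.h>
#include <sys/types.h>

#define EXT_ROOT_INODE 2  /* root directory inode on ext2/ext3 */

/* 1 if the inode number is not the one expected for a real root. */
static int root_inode_suspicious(ino_t ino) {
    return ino != EXT_ROOT_INODE;
}

/* 1 if "/" does not carry inode 2, 0 if it does, -1 if stat() fails. */
static int check_root_inode(void) {
    struct stat st;
    if (stat("/", &st) != 0)
        return -1;
    return root_inode_suspicious(st.st_ino);
}
```

With the values from the transcript above, root_inode_suspicious(1553552) returns 1, while root_inode_suspicious(2) returns 0.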

Note that much more can be done from inside a chroot. For instance, given sufficient privileges (typically root), you can send signals to any process outside the chroot(), or even attach to outside processes with ptrace(), which gives the attacker an easy way to inject whatever he wants on the host. Such evasions are also possible through mount(), fchdir(), sysctl(), and many other calls [ref 11].
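
The harmless half of the signal observation is easy to demonstrate. A minimal sketch, assuming a POSIX system: kill(pid, 0) performs the existence and permission checks without delivering any signal, so success or an EPERM error reveals a live process, even inside a chroot that contains no /proc. The helper names and the PID range are hypothetical choices.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if a process with this PID exists. Signal 0 is never delivered;
   EPERM means the process exists but we may not signal it. */
static int pid_exists(pid_t pid) {
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Count processes with PIDs 1..max_pid, other than ourselves, that
   answer the probe. chroot() changes only the filesystem view, so
   processes outside it still answer. */
static int count_other_processes(pid_t max_pid) {
    int count = 0;
    pid_t self = getpid();
    for (pid_t pid = 1; pid <= max_pid; pid++)
        if (pid != self && pid_exists(pid))
            count++;
    return count;
}
```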

When we think about virtual environments and security, it's pretty clear that chroot() is definitely not something to rely upon. FreeBSD provides another confinement mechanism, jail(), which is based on chroot() but is more reliable. A jail() lets you create a virtual host, bound to an IP address, with its own tools, users, and more. It is very convenient for virtual hosting, and it could be used for honeypots too.

However, even though FreeBSD's jail() is more reliable, it is not much more covert. There are several tests one can perform to detect being in a jail:

All processes in a jail have a specific 'J' flag, as shown below:
    jail# ps
    6908  p0  SJ    0:00.02 /bin/sh
    6910  p0  R+J   0:00.00 ps

You can also have a look at the PIDs as they do not increase in the usual way.

The inode number of the root directory is not 2 as expected.
By default, raw sockets are forbidden as you can see:
    jail# ping -c 3 miscmag.com
    ping: socket: Operation not permitted

Note that this is now configurable in recent versions of FreeBSD.

Sniffing in a jail gives access to all the traffic that comes through the device. This is normal since a jail is usually built as an alias on a real device.
There are surely more ...
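
The raw-socket test from the list above can be sketched in C. A minimal sketch, assuming a POSIX sockets API: it tries to open the raw ICMP socket that ping needs. Because an unprivileged user is refused on an ordinary system too, the refusal is only a meaningful hint when the process runs as root. raw_socket_allowed() and root_denied_raw_socket() are hypothetical helper names.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if a raw ICMP socket can be opened, 0 if refused with EPERM or
   EACCES (as in the jail transcript above), -1 for other errors. */
static int raw_socket_allowed(void) {
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == EPERM || errno == EACCES) ? 0 : -1;
}

/* 1 if we run as root and are still refused: a jail-like restriction. */
static int root_denied_raw_socket(void) {
    return geteuid() == 0 && raw_socket_allowed() == 0;
}
```
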
In this section, we focused on detecting whether we are in a confined environment built with chroot() or jail(). However, are these really even issues for a hacker inside a honeypot? Learning that we are on a "restricted host" is not all that important anymore, as such systems are spreading all across the Internet. The real issue is whether security leaks from the guest to the host. And currently, there are very few (if any) systems out there that have proved to be sufficiently well confined.

Concluding part one
In the first part of this two-part series, we compared honeypots to steganography and then looked at three common techniques for virtualizing honeypots. For each of these methods, which included User Mode Linux, VMware environments, and chroot/jail environments, we looked at weaknesses that lead to their detection. It's clear that while each of these has its advantages, they can all be easily detected by an experienced hacker.
Next time, we'll continue our look at honeypot virtualization tools by discussing the Sebek data capture tool in detail, along with some of the ways it too can be detected. Then we'll discuss some other techniques available for detecting honeypots, such as x86-specific ones and time-based analysis. Stay tuned.