Getting started with syscalls

What is a system call?

If you search the MSDN documentation for the word syscall or system call, you might come up empty-handed. You might even hit this page thinking you found something of interest. Nope. Not even close. Perhaps the best formal documentation is found in the Windows Internals books, where they describe system service calls, trapping, and overall system service handling. Windows Internals Chapter 8: System mechanisms talks about this in great detail. It is something I might cover much later, near the end of this thread.

In general, a system call is a controlled trap into the system (kernel) that invokes a service routine according to an index value. This value is commonly referred to as the system service call number, system service number, or syscall ID. The kernel handles dispatching to the routine.

Perhaps the best source for this is from the Intel Software Developer Manual where it is defined that a syscall is a fast system call to privilege level 0 system procedures.

What generates a system call?

For our purposes in this thread, syscall is interchangeable with system call. From here on out I will just use the word syscall. Syscalls are generated by executing the syscall instruction or the int 2e instruction. These instructions are found throughout two low-level DLLs: ntdll.dll and win32u.dll.

Where do syscalls come from?

GUI vs Native

Short answer: it depends. Longer answer: a syscall can come from GUI applications that depend on win32u.dll or from Native applications where the only dependency is ntdll.dll.

win32u.dll

This module is the lowest level DLL for all GUI applications or rather GUI threads to be the most technically accurate. Something that will become clearer later on is this DLL has a system call table identifier of 0x20. This module implements many syscalls like win32u!NtUserCloseClipboard or win32u!NtUserOpenClipboard. Bottom line here is practically anything that happens from a GUI window will trickle its way down to this module.

On my Dev VM, my version of win32u.dll has 5,411 syscalls implemented.

ntdll.dll

This module is the lowest level DLL for all native functions that are not related to GUIs. Typically you will see a native function with the two-letter prefix Nt, but don’t be fooled, Nt functions are also implemented inside win32u.dll. Bottom line here is anything stemming from native programs, like programs that execute early on in the boot process (the session manager: smss.exe) will trickle its way down to this module.

This graphic shows a nice representation of the flow a syscall can take depending on the process (GUI vs Native).

How is a syscall structured?

The generic format

All syscalls, no matter if they come from win32u.dll or ntdll.dll, will all have the same kind of structure. You might even think of it as a signature, a signature that can be scanned for in memory just like what is done in a lab for Day 5, or Section 5 for OnDemand students :grin:. Here is what the format looks like:

4c8bd1           mov     r10, rcx
b8c3100000       mov     eax, <some number here>
f604250803fe7f01 test    byte ptr [7FFE0308h], 1
7503             jne     <module_name>!<Some Nt function>+0x15
0f05             syscall 
c3               ret     
cd2e             int     2Eh
c3               ret     
0f1f840000000000 nop     dword ptr [rax+rax]

Let’s break it down a bit more…

mov r10, rcx

For x64 CPUs and of course x64 code, the first parameter is typically found in RCX. You will see that RCX is moved into R10 and this is obviously intentional.

Why is this intentional?

Because the code can eventually execute the syscall instruction, it has to get rid of the RCX value and move it into R10. Further, syscall will destroy the RCX register and load it with the return address. If the MOV is not done, the first parameter will be lost. You can’t use R11 either since it will be clobbered by syscall so the last best option was R10 to hold the first parameter as code transitions into Ring 0.

What’s next?

Let’s continue breaking it down…

mov     eax, <some number here>

<some number here> will be reserved for the number of the syscall to be “called”. Again, this is really an index into a table that we can take a look at later in this series of threads using a kernel debugger.

Here is one full example for a syscall from win32u.dll:

4c8bd1           mov     r10, rcx
b8c3100000       mov     eax, 10C3h

For the function win32u!NtUserOpenClipboard the above number is tied to that function.
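To make the layout concrete, here is a minimal sketch in C that pulls the syscall number out of a stub's bytes. It assumes the stub begins with the mov r10, rcx / mov eax, imm32 pair shown above. The helper name stub_syscall_number is made up for illustration and is not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the syscall number encoded in a stub,
 * or -1 if the bytes do not start with the expected two instructions. */
static int64_t stub_syscall_number(const uint8_t *stub, size_t len)
{
    /* 4c 8b d1 = mov r10, rcx ; b8 = opcode of mov eax, imm32 */
    static const uint8_t prologue[] = { 0x4c, 0x8b, 0xd1, 0xb8 };

    assert(stub != NULL);
    if (len < sizeof(prologue) + 4 ||
        memcmp(stub, prologue, sizeof(prologue)) != 0)
        return -1;

    /* The imm32 is stored little-endian right after the b8 opcode. */
    return (int64_t)((uint32_t)stub[4] |
                     ((uint32_t)stub[5] << 8) |
                     ((uint32_t)stub[6] << 16) |
                     ((uint32_t)stub[7] << 24));
}
```

Feeding it the eight bytes 4c 8b d1 b8 c3 10 00 00 from the example above yields 10C3h, the number tied to win32u!NtUserOpenClipboard on that VM.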

I’m going to skip…

test    byte ptr [7FFE0308h], 1
jne     <module_name>!<Some Nt function>+0x15

… for now just because that is getting too deep too soon for where I want this thread.

The next thing that will happen is that syscall will be executed.

The flow to a syscall

Native processes

We already saw a very high level graphic depicting the flow for syscalls, but now let’s dive into a real example on the native side of things, since that’s the most common in our world of implant dev.

Allocating memory

Let’s say that you want to allocate a page or so of memory for shellcode execution or the manual mapping of a COFF object, EXE image or DLL image. To do so, you’d have to call VirtualAlloc (staying inside the local process), the highest level API we can call, implemented in kernelbase.dll, but ultimately forwarded to ntdll.dll.

Let’s check it out on the C-side.

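#include <windows.h>

// PAGE_SIZE is not defined by the user-mode SDK headers; 4 KB is the x64 page size
#define PAGE_SIZE 0x1000

INT
__cdecl
main(VOID)
{
  // alloc a single page
  LPVOID pBuffer = VirtualAlloc(NULL, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
  // error check here
  if (pBuffer == NULL)
  {
    return (INT)GetLastError();
  }
  // do shady things
  // cleanup, overwrite with garbage data
  FillMemory(pBuffer, PAGE_SIZE, 0xCC);
  VirtualFree(pBuffer, 0, MEM_RELEASE);
  pBuffer = NULL;
  // go home
  return ERROR_SUCCESS;
}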

It’s a simple program that allocates one page of memory. Now we can follow this in WinDbg… the best debugger on the planet.

Tip

Launch your executable under WinDbg (e.g. windbg your.exe). WinDbg stops at the initial breakpoint before your code runs, so you can set your breakpoint and go.

I launched a new executable under WinDbg and set a BP on kernelbase!VirtualAlloc using the command: bp kernelbase!VirtualAlloc

F5 the program and let the BP hit.

The BP will get hit and at that time you can see what calls it makes. We are looking for some Nt prefixed function in the output.

0:000> uf /c kernelbase!VirtualAlloc
KERNELBASE!VirtualAlloc (00007ffc`b1bb18a0)
  KERNELBASE!VirtualAlloc+0x41 (00007ffc`b1bb18e1):
    // bingo!
    call to ntdll!NtAllocateVirtualMemory (00007ffc`b402d060)
  KERNELBASE!VirtualAlloc+0x5e (00007ffc`b1bb18fe):
    call to KERNELBASE!BaseSetLastNTError (00007ffc`b1b7b300)
  KERNELBASE!VirtualAlloc+0x4e30d (00007ffc`b1bffbad):
    call to ntdll!RtlSetLastWin32Error (00007ffc`b3fe0770)

Cool. Now we can jump into NTDLL and take a look from there.

Tip

From here, set a BP on that routine or just hit TC (Trace to next Call) to single-step until the first call instruction—you’ll land at the ntdll stub.

Here is what things will look like at the BP.

ntdll!NtAllocateVirtualMemory:
00007ffc`b402d060 4c8bd1           mov     r10, rcx  // <--- first param
00007ffc`b402d063 b818000000       mov     eax, 18h  // <--- syscall number
00007ffc`b402d068 f604250803fe7f01 test    byte ptr [7FFE0308h], 1
00007ffc`b402d070 7503             jne     ntdll!NtAllocateVirtualMemory+0x15 (7ffcb402d075)
00007ffc`b402d072 0f05             syscall 
00007ffc`b402d074 c3               ret     
00007ffc`b402d075 cd2e             int     2Eh
00007ffc`b402d077 c3               ret     
00007ffc`b402d078 0f1f840000000000 nop     dword ptr [rax+rax]

You can continue to single step up to the syscall if you’d like. You won’t be able to jump with the syscall into the kernel just yet, but we will dive into that later and take this all the way home!

Note

Inspecting the call stack with k at this point shows the full path from your code → KERNELBASE → ntdll. That chain is exactly what we’re walking in this chapter.

One thing I like to check out around this time is the call stack. Run k to show the call stack up to this point.

0:000> k
 # Child-SP          RetAddr               Call Site
00 000000f4`8970fb18 00007ffc`b1bb18e8     ntdll!NtAllocateVirtualMemory
01 000000f4`8970fb20 00007ff7`63a91e84     KERNELBASE!VirtualAlloc+0x48
[..SNIP..]
05 000000f4`8970fb60 00007ff7`63a9408c     Shellcode!main+0x254
[..SNIP..]
07 000000f4`8970fc80 00007ffc`b3fe2651     KERNEL32!BaseThreadInitThunk+0x14
08 000000f4`8970fcb0 00000000`00000000     ntdll!RtlUserThreadStart+0x21

Cool. Now we can start to dive a bit deeper into things over the next several chapters.

What is a direct syscall?

Warning

Wrapper routines in kernel32.dll and kernelbase.dll can be hooked by EDRs. Bypassing them with direct syscalls avoids those hooks but introduces other telltales (e.g. return address) that we’ll cover when we talk about indirect syscalls.

Over in The flow to a syscall, you see what are viewed as wrappers that wrap around direct syscalls. Some of the wrapping could be seen as bloat or unwanted overhead in a program. Also, those wrapper routines can be subject to user mode hooks, but let’s leave hooks out of this for now.

An option that can be done is to avoid those wrapper functions that are typically found in kernel32.dll and kernelbase.dll, to name a few, and just go directly to the syscall itself in ntdll.dll. This action is how the technique named direct syscalls was born. Skip all of the higher level stuff and go directly to the lowest level possible.

Syscall stub

We have already seen the format of a syscall, and many folks simply call it a stub that prepares for the transition from ring 3 to ring 0. Those are the only two rings a Windows installation uses on Intel CPUs, even though rings 1 and 2 exist. The ring transition does not happen until the syscall instruction is executed. At that point, user mode code is left behind and the system transitions from ring 3 to ring 0, which also changes something called the Current Privilege Level (CPL) to 0.
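As a small illustration of the CPL idea, the sketch below reads the privilege level out of a segment selector. On Intel CPUs the low two bits of a selector carry the privilege level, and for the selector loaded in CS that value is the CPL. The selector values mentioned after the code (33h for user mode code, 10h for kernel mode code) are assumptions about a typical x64 Windows system and are not taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* The low two bits of a segment selector hold its privilege level.
 * When the selector is the one loaded in CS, that value is the CPL. */
static unsigned selector_privilege_level(uint16_t selector)
{
    return selector & 0x3u;
}
```

With those assumed values, selector_privilege_level(0x33) gives 3 (ring 3) and selector_privilege_level(0x10) gives 0 (ring 0).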

The syscall stub pattern

When implant developers want to perform a direct syscall, many times a search is implemented looking for the following sequence of bytes:

// the start of the syscall stub
4c 8b d1 b8 XX XX 00 00

// the end of the stub
0f 05 c3 cd 2e c3

Note

On older Windows builds the byte pattern above can match a full stub. On recent builds, the stub includes extra instructions (the test/jne we cover in the next chapter), so pattern length and offsets may differ.

At the root of it, that is not really a complete stub, but on some older versions of Windows and ntdll.dll, it is. On more recent versions of Windows, there are some missing instructions from what is shown above. Those missing instructions are the following:

test    byte ptr [7FFE0308h], 1
jne     ntdll!<Some Nt Function>+0x15

Let’s dive into those instructions in the next chapter.
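A rough sketch of such a search, written in C, could classify a candidate address as either layout. It follows the byte sequences exactly as this chapter describes them: the 8-byte start, an optional test/jne pair, then the 6-byte end. The names classify_stub and stub_kind are hypothetical, and the test/jne bytes are copied from the ntdll!NtAllocateVirtualMemory dump earlier, so treat them as specific to that build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum stub_kind { STUB_NONE = 0, STUB_SHORT, STUB_WITH_TEST };

/* Hypothetical classifier for the two stub layouts described in the text. */
static enum stub_kind classify_stub(const uint8_t *p, size_t len)
{
    /* mov r10, rcx ; mov eax, imm32 (opcode only) */
    static const uint8_t head[] = { 0x4c, 0x8b, 0xd1, 0xb8 };
    /* test byte ptr [7FFE0308h], 1 ; jne +3 */
    static const uint8_t test_jne[] = { 0xf6, 0x04, 0x25, 0x08, 0x03,
                                        0xfe, 0x7f, 0x01, 0x75, 0x03 };
    /* syscall ; ret ; int 2Eh ; ret */
    static const uint8_t tail[] = { 0x0f, 0x05, 0xc3, 0xcd, 0x2e, 0xc3 };
    const size_t head_len = 8; /* 4 opcode bytes + 4 immediate bytes */

    assert(p != NULL);
    if (len < head_len + sizeof(tail) || memcmp(p, head, sizeof(head)) != 0)
        return STUB_NONE;
    if (p[6] != 0x00 || p[7] != 0x00)      /* the "XX XX 00 00" part */
        return STUB_NONE;

    if (memcmp(p + head_len, tail, sizeof(tail)) == 0)
        return STUB_SHORT;

    if (len >= head_len + sizeof(test_jne) + sizeof(tail) &&
        memcmp(p + head_len, test_jne, sizeof(test_jne)) == 0 &&
        memcmp(p + head_len + sizeof(test_jne), tail, sizeof(tail)) == 0)
        return STUB_WITH_TEST;

    return STUB_NONE;
}
```

Checking both layouts in one routine keeps a scanner from silently missing stubs when the same code runs across different Windows builds.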

The test instruction

What is at address `7FFE0308h`?

For those who have sat the class, you know exactly what that address is. For those who haven’t, this is looking into an undocumented structure. If you treat 308h as an offset into a struct, you are left with a base of 7FFE0000h. That user mode address is the static memory location for the _KUSER_SHARED_DATA struct. Here is a small snippet of that struct dumped from WinDbg on a Windows 10 system without Virtualization-Based Security enabled.

0: kd> dt ntdll!_KUSER_SHARED_DATA 0x7ffe0000
   +0x000 TickCountLowDeprecated 0 : Uint4B
   +0x004 TickCountMultiplier 0xfa00000 : Uint4B
   +0x008 InterruptTime    : _KSYSTEM_TIME
   +0x014 SystemTime       : _KSYSTEM_TIME
   +0x020 TimeZoneBias     : _KSYSTEM_TIME
   +0x02c ImageNumberLow   0x8664 : Uint2B
   +0x02e ImageNumberHigh  0x8664 : Uint2B
   +0x030 NtSystemRoot     "C:\Windows": [260] Wchar
[...SNIP...]
   +0x308 SystemCall      0 : Uint4B

Note

_KUSER_SHARED_DATA is read-only for user mode and is mapped at the same address in every process. Exploit and implant code often reads it to detect VBS or to find kernel-related hints.

It’s a massive structure that is often abused by exploit devs and implant devs. One of the interesting fields is at offset 308h, SystemCall.

`_KUSER_SHARED_DATA.SystemCall`

Back in the day, this field used to hold the address that was going to be executed in the kernel. That was abused, so the meaning has changed. Currently, the value will be either clear (0) or set (1), and it is a way to detect whether Virtualization-Based Security is enabled for a system. If it is, the TEST instruction produces a non-zero result, ZF in EFLAGS is cleared, and the JNE is taken. The JNE then lands on the INT 2Eh instruction. This also means that there are Virtual Trust Levels and the syscalls will now happen in VTL0.
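The decision the stub makes can be modeled in a few lines of C. This is only a sketch of the test/jne logic with the field value passed in as a parameter; it does not read the real shared page, and the names pick_transition, USE_SYSCALL and USE_INT2E are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define KUSER_SHARED_DATA_BASE 0x7FFE0000u  /* static user mode address */
#define SYSTEMCALL_OFFSET      0x308u       /* offset of the SystemCall field */

enum transition { USE_SYSCALL, USE_INT2E };

/* Mirrors: test byte ptr [7FFE0308h], 1 ; jne <int 2Eh path> */
static enum transition pick_transition(uint32_t system_call_field)
{
    /* TEST only reads the low byte of the field */
    uint8_t low_byte = (uint8_t)(system_call_field & 0xFFu);

    /* Bit 0 set -> result is non-zero -> ZF clear -> JNE taken */
    return (low_byte & 1u) ? USE_INT2E : USE_SYSCALL;
}
```

With the value 0 from the dump above, the fall-through path is chosen and the stub executes SYSCALL.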

Tip

The SYSCALL instruction is faster than INT 2Eh because it does not go through an interrupt dispatch via the IDT. When VBS is off and both paths are valid, the stub uses SYSCALL; when VBS is on, the stub uses INT 2Eh and VTL0.

Interrupts

What is INT 2Eh?

Note

The IDT is per-processor and lives in kernel space. You need a kernel debugger (e.g. !idt in WinDbg) to inspect it; user-mode debuggers cannot read it.

The INT instruction invokes something called an interrupt that kind of “halts” the system to “wake up” the kernel’s interrupt handler routine. The value given to the instruction is an index into a table called the Interrupt Descriptor Table, or IDT. Every system is going to have one of these tables and you can see them with a kernel debugging session. Here is a dump of the IDT on a Windows 10 Dev-VM built for SEC670.

0: kd> !idt
Dumping IDT: fffff804774f2000
00:    fffff80474a13800 nt!KiDivideErrorFault
[..SNIP..]
03:    fffff80474a14500 nt!KiBreakpointTrap
[..SNIP..]
1f:    fffff80474a0ce40 nt!KiApcInterrupt
[..SNIP..]
2f:    fffff80474a0efe0 nt!KiDpcInterrupt

There are some interesting ones in this snippet, like dividing by 0, a breakpoint getting hit, APC interrupts, and DPC interrupts. Whenever one of those events happens, like a divide by 0, an interrupt will be triggered and INT 00h will be the underlying entry for that. You can also see the address of the routine that will handle an interrupt.

Caution

Hooking IDT entries was a common technique in the past. Modern systems (e.g. with HVCI/VBS) make this much harder and less reliable; consider it historical context, not a recommended approach.

Back in the day, people loved to hook entries in the IDT, but that’s not really a thing anymore.

What’s missing from the above snippet is the entry 2Eh. This one is missing because even though Hyper-V is enabled, VBS is not. Thus, there will be no corresponding entry for it built into the table. It will never get hit and there’s no need to have a handler routine for it. Simple as that.

Building the IDT

The IDT is built when the kernel is being initialized in the routine nt!KiInitializeKernel. Also in that massive routine, the _KUSER_SHARED_DATA structure is filled out by the nt!KiInitializeBootStructures routine.

Indirect syscalls

Cons of direct syscalls

As a refresher, direct syscalls are invoked from within our own EXE. Cool, we can maybe get past some EDRs, but there is a downside to this. We can be caught because the return address is not going to be located where it should be: ntdll.dll.

Caution

This check can happen when the kernel is done doing its thing. Inside _KPROCESS.InstrumentationCallback is a pointer to the routine invoked after SYSRET. If the return address is inside your EXE instead of ntdll, you can be flagged as using direct syscalls.

The check itself can be quite simple. If you know how to parse a PE image, then I’m sure you can already think of the 3 or so lines of C++ code to get that done, like so:

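// retaddr is assumed to be the return address under inspection, as a ULONG_PTR
const auto ImageBase = reinterpret_cast<ULONG_PTR>(NtCurrentPeb()->ImageBaseAddress);
const auto NtHeaders = RtlImageNtHeader(NtCurrentPeb()->ImageBaseAddress);
const auto ImageEnd = ImageBase + NtHeaders->OptionalHeader.SizeOfImage;
if ((retaddr >= ImageBase) && (retaddr < ImageEnd))
{
    // retaddr is within the EXE
}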

What are indirect syscalls?

Instead of having the SYSCALL instruction come from within our own EXE’s image, we need to have it come from within NTDLL, just like any normal SYSCALL would. To help with this, Bouncy Gate and Recycled Gate were made. There are too many gates.

Implementation

Tip

With indirect syscalls you don’t execute the SYSCALL from your own code—you find the SYSCALL instruction inside ntdll and jump to it. The transition to kernel therefore appears to come from ntdll, which looks like normal execution.

Here, instead of executing the SYSCALL instruction ourselves, we jump to it. We find the address of the SYSCALL instruction inside of NTDLL and JMP there. Reaching SYSCALL through that extra hop, rather than executing it inline, is what makes the technique indirect. In addition to the gates mentioned above, SysWhispers3 also uses indirect syscalls, as do Cobalt Strike’s BOFs.
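Finding that address boils down to locating the syscall ; ret bytes inside a stub. The sketch below, in C, does a naive scan for 0f 05 c3 over a buffer of stub bytes. The helper name find_syscall_ret is hypothetical, and a plain byte scan like this is only a reasonable assumption for the short, fixed stub layouts shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of the first "syscall ; ret" (0f 05 c3)
 * inside a stub, or -1 if it does not appear within len bytes. */
static ptrdiff_t find_syscall_ret(const uint8_t *stub, size_t len)
{
    assert(stub != NULL);
    for (size_t i = 0; i + 3 <= len; i++) {
        if (stub[i] == 0x0f && stub[i + 1] == 0x05 && stub[i + 2] == 0xc3)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

For the ntdll!NtAllocateVirtualMemory bytes dumped earlier, the result is 12h, which matches the distance from 00007ffc`b402d060 to the syscall at 00007ffc`b402d072. Adding that offset to the stub address gives the JMP target.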

Cons about indirect syscalls

Warning

EDRs are increasingly checking not only the return address but also where the syscall was invoked from. If the call site is still your EXE (even if you JMP to ntdll for the actual instruction), you may need call-stack spoofing or a different execution context (e.g. a new thread with a faked stack) to blend in.

One of the downsides is that EDRs are catching on here. In addition to checking the return address of a syscall, they are now looking at where the call came from, EXE or NTDLL. So now we have to fake where it came from by creating a new thread and then spoofing its call stack.

The `SYSCALL` instruction

Let’s get stuck in

The absolute best place to understand what is really happening with this instruction is to look at the Intel manual for it. Let’s take a look at this right from there:

“SYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR (after saving the address of the instruction following SYSCALL into RCX). (The WRMSR instruction ensures that the IA32_LSTAR MSR always contain a canonical address.)”

Things to note

OK, so we have some things to take away from that:

System-call handler
IA32_LSTAR MSR
Return address saved in RCX

Let’s keep going:

“SYSCALL also saves RFLAGS into R11 and then masks RFLAGS using the IA32_FMASK MSR (MSR address C0000084H); specifically, the processor clears in RFLAGS every bit corresponding to a bit that is set in the IA32_FMASK MSR. SYSCALL loads the CS and SS selectors …”

OK, that isn’t entirely useful anymore. We have what we need to move on with this.

System-call handler

From the notes above, we need to know what this routine is if we want to dive into this any deeper. To find out what the handler is for IA32_LSTAR MSR, we need to be in a kernel debugger session.

Note

Reading MSRs requires CPL0 (kernel mode). In the debugger you use rdmsr; in a kernel driver you’d use the __readmsr compiler intrinsic.

0: kd> rdmsr C0000082H
msr[c0000082] = fffff804`74a1a2c0

We now have the address of the system-call handler, but what is it really? Let’s find out back in WinDbg. To do this, we want to use the ln command to list the nearest symbol at that address.

0: kd> ln fffff804`74a1a2c0
Browse module
Set bu breakpoint

(fffff804`74a1a2c0)   nt!KiSystemCall64   |  (fffff804`74a1a500)   nt!KiSystemServiceUser
Exact matches:

For this remote system, VBS is not enabled and as such, the handler is nt!KiSystemCall64.

Handling syscalls

Discovering the syscall handler

From the previous chapter, The SYSCALL instruction, we saw how to obtain the handler that services all syscalls coming from userland. Since we still have a kernel debugging session up and running, we can disassemble the function and also set a breakpoint on it.

Breakpoints

Caution

Do not set a breakpoint on nt!KiSystemCall64 itself. The first instruction is SWAPGS, and RSP still points at userland until the next two instructions run. Hitting a BP there can crash or BSOD the VM. Use bp nt!KiSystemCall64+15h so you break after the kernel stack is set up (e.g. after the push 2Bh).

The first instinct might have you set a BP right at the function itself like so: bp nt!KiSystemCall64. Not a bad idea at first and I did this as well. After BSODing my VM a few times after setting that BP, I realized that this isn’t your normal function with a proper function prolog. Typically, a proper function prolog will set up the stack frame, make sure RSP is taken care of before execution of that function proceeds.

The first instruction here is a SWAPGS which is insanely critical for syscall handling and is also the reason the VM becomes unstable and ultimately BSODs when setting a BP on that instruction. After a bit more experimentation, I found that setting a BP at nt!KiSystemCall64+15h was more stable and reliable.

Let’s look at the first several instructions for the syscall handler on this VM.

nt!KiSystemCall64:
0f01f8                         swapgs      // BP here crashes the VM
654889242510000000             mov     qword ptr gs:[10h], rsp
65488b2425a8010000             mov     rsp, qword ptr gs:[1A8h]
6a2b                           push    2Bh  // BP here, all is good!

Setting up `RSP`

If you look at the disassembly, you will see that RSP isn’t properly set up until the second MOV instruction: MOV RSP, QWORD PTR GS:[1A8h]. This is why setting a BP any time before, or even on that instruction, creates instability. At the first MOV instruction, RSP will still be pointing to a userland address.
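The +15h offset is not magic; it is simply the combined length of the three instructions that run before the kernel stack is in place. A tiny worked example in C, with the lengths counted from the byte dumps above (the names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Encoded lengths of the first three instructions of nt!KiSystemCall64,
 * counted from the byte dumps shown for this VM. */
static const size_t k_prologue_lengths[] = {
    3,  /* 0f01f8              swapgs                        */
    9,  /* 654889242510000000  mov qword ptr gs:[10h], rsp   */
    9,  /* 65488b2425a8010000  mov rsp, qword ptr gs:[1A8h]  */
};

/* Offset of the first instruction that executes on the kernel stack. */
static size_t first_safe_bp_offset(void)
{
    size_t offset = 0;
    size_t count = sizeof(k_prologue_lengths) / sizeof(k_prologue_lengths[0]);

    for (size_t i = 0; i < count; i++)
        offset += k_prologue_lengths[i];
    return offset;
}
```

The sum is 21, or 15h, which is where push 2Bh sits. On a different build the instruction encodings could differ, so it is worth recounting from your own disassembly.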

Note

Using that userland RSP in the kernel (CPL0) is unsafe—writes could corrupt user data or cause a bugcheck. The handler deliberately switches to the kernel stack with the second MOV before doing anything else. The GS segment offsets (10h, 1A8h) are part of the kernel’s per-CPU data; we’ll cover SWAPGS and that layout in a later chapter.

This is not good now that we are in the kernel with CPL0. The offsets you see for the GS segment register won’t really make sense yet. For that, I will dedicate a chapter for the SWAPGS instruction next.

KiSystemServiceRepeat

The purpose

At some point, the code flow will make its way down to KiSystemServiceRepeat after the trap frame has been established in KiSystemServiceStart. One of the first things KiSystemServiceRepeat does is grab pointers to 2 of 3 arrays that are used to keep track of system service tables. The 3 arrays are KeServiceDescriptorTable, KeServiceDescriptorTableShadow, and KeServiceDescriptorTableFilter. Inside each of the tables, there will be a couple of entries where each entry will hold some useful information like the following:

A pointer to the array of system calls that are implemented in that table
How many system calls are present in that table
A pointer to an array of byte arguments for each of the system calls in that table

Like most things, this data is held within a structure called _SYSTEM_SERVICE_TABLE.

typedef struct _SYSTEM_SERVICE_TABLE {
    PULONG      ServiceTable;   // pointer to array of system calls in this table
    PULONG_PTR  CounterTable;
    ULONG_PTR   ServiceLimit;   // how many system calls are present in this table
    PBYTE       ArgumentTable;  // pointer to array of byte arguments for each system call
} SYSTEM_SERVICE_TABLE;

Argument table

The argument table is very important. Its purpose is to tell the kernel how many arguments it needs to find on the user-mode stack. It will then take those arguments and bring them into the kernel’s stack.

In the debugger

In the kernel debugger, you can easily dump the table and see the pointers as well as some of the other data from the structs. First off, grab the address of the table: x nt!KiServiceTable. You can use that symbol and index your way into the table to find things like arguments for a syscall. Like so:

dx (((int*)&(nt!KiServiceTable))[1] & 0xf)
(((int*)&(nt!KiServiceTable))[1] & 0xf) : 0

dx (((int*)&(nt!KiServiceTable))[2] & 0xf)
(((int*)&(nt!KiServiceTable))[2] & 0xf) : 2  // 2 stack args

It’s quite a manual process doing that especially if there are hundreds of syscalls in that table.

Tip

WinDbg’s dx (debugger data model) can iterate the service table, apply the RVA shift, and resolve symbols in one expression. The examples below dump the table and then resolve to function names—handy when you’re exploring the full SSDT.

It would be much better to use the true power of WinDbg and its debugger data model like so:

// make a pseudo variable
dx @$table = &nt!KiServiceTable
@$table = &nt!KiServiceTable : 0xfffff8012dee4eb0 [Type: void *]
// dump the table and shift right by 4, then add to base of the table
dx (((int(*)[90000])&(nt!KiServiceTable)))->Take(*(int*)&nt!KiServiceLimit)->Select(x => (x >> 4) + @$table)
    [0]              : 0xfffff8012e14b650 [Type: void *]
    [1]              : 0xfffff8012e155ce0 [Type: void *]
    [2]              : 0xfffff8012e506e10 [Type: void *]
    [3]              : 0xfffff8012e6f1640 [Type: void *]
    [4]              : 0xfffff8012e435a40 [Type: void *]
    [5]              : 0xfffff8012e218710 [Type: void *]
    [6]              : 0xfffff8012e419f60 [Type: void *]

The above output is cool and all but we can do better. The pointers can be resolved and we can see what lies behind them; the syscalls!

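Let’s again leverage the number of syscalls in the table (ServiceLimit) and dump everything, but then resolve the symbolic names. Note that @$dumpit is a user-defined dx helper that resolves an address to its symbol; its definition is not shown here.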

dx (((int(*)[90000])&(nt!KiServiceTable)))->Take(*(int*)&nt!KiServiceLimit)->Select(x => @$dumpit((x >> 4) + @$table))
    [0]              : nt!NtAccessCheck (fffff801`2e14b650)
    [1]              : nt!NtWorkerFactoryWorkerReady (fffff801`2e155ce0)
    [2]              : nt!NtAcceptConnectPort (fffff801`2e506e10)
    [3]              : nt!NtMapUserPhysicalPagesScatter (fffff801`2e6f1640)
    [4]              : nt!NtWaitForSingleObject (fffff801`2e435a40)
    [5]              : nt!NtCallbackReturn (fffff801`2e218710)
    [6]              : nt!NtReadFile (fffff801`2e419f60)

Note

This is only the table for service syscalls that come from ntdll.dll (native). GUI-related syscalls from win32u.dll use a different table (e.g. table ID 0x20) and dispatch into win32k.sys. You can explore that path the same way in the debugger.

I leave that as an exercise to you all!

Checking the syscall index

One of the next actions KiSystemServiceRepeat does is check to see if the syscall index is beyond this table. Remember, every table has a ServiceLimit that indicates how many syscalls there are. Let’s see how it computes this in code.

cmp eax, [r10+rdi+10h]

EAX holds the syscall index. R10 holds the table. RDI holds the table ID as an offset, either 00h for the native table or 20h for the GUI table. The 10h displacement is just 16 in decimal, which is the offset of ServiceLimit within _SYSTEM_SERVICE_TABLE. So, the check here is accessing an offset into the KeServiceDescriptorTable to find the ServiceLimit.

dps nt!KeServiceDescriptorTable
fffff801`2ec1e8c0  fffff801`2dee4eb0 nt!KiServiceTable
fffff801`2ec1e8c8  00000000`00000000
fffff801`2ec1e8d0  00000000`000001d7  // this is the ServiceLimit
fffff801`2ec1e8d8  fffff801`2dee5610 nt!KiArgumentTable

This can be validated because the ServiceLimit is a global symbol so check it out.

dc nt!KiServiceLimit l1
fffff801`2dee560c  000001d7  // it's a match!!
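The comparison can be sketched in C with a portable stand-in for the structure. Two things here are assumptions rather than something shown in this chapter: the way the raw number in EAX is split into a table offset (bit 12, giving 00h or 20h) and an index (low 12 bits), and the function names. The struct uses fixed 8-byte fields so the offsets match x64 regardless of where you compile it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for _SYSTEM_SERVICE_TABLE on x64 (every field 8 bytes). */
typedef struct service_table64 {
    uint64_t ServiceTable;
    uint64_t CounterTable;
    uint64_t ServiceLimit;   /* lands at offset 10h */
    uint64_t ArgumentTable;
} service_table64;

/* Assumed split of the raw syscall number: bit 12 selects the table
 * (as an offset of 00h or 20h), the low 12 bits are the index. */
static uint32_t table_offset(uint32_t number) { return (number >> 7) & 0x20u; }
static uint32_t table_index(uint32_t number)  { return number & 0xFFFu; }

/* Mirrors cmp eax, [r10+rdi+10h]: non-zero when the index is inside the table. */
static int index_in_range(const service_table64 *tables, uint32_t number)
{
    const service_table64 *t;

    assert(tables != NULL);
    t = &tables[table_offset(number) / sizeof(service_table64)];
    return table_index(number) < (uint32_t)t->ServiceLimit;
}
```

With the first entry filled in from the dps output above (ServiceLimit of 1D7h), number 18h is in range and 1D7h is not. If the second entry is left empty, which is an assumption for illustration, a win32u number such as 10C3h also fails the check.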

After this check, a jump will be taken if it’s out of range, meaning it is not in this table. Then when it jumps, the thread will be converted to a GUI thread by calling KiConvertToGuiThread. This is what that code block looks like:

mov     [rbp-80h], eax
mov     [rbp-78h], rcx
mov     [rbp-70h], rdx
mov     [rbp-68h], r8
mov     [rbp-60h], r9
call    KiConvertToGuiThread
or      eax, eax
mov     eax, [rbp-80h]
mov     rcx, [rbp-78h]
mov     rdx, [rbp-70h]
mov     r8, [rbp-68h]
mov     r9, [rbp-60h]
mov     [rbx+90h], rsp
jz      KiSystemServiceRepeat   // start the process all over after the conversion

Eventually, the jump will not be taken and we will wind up in this code block:

// KeServiceDescriptorTable + tableID (00h or 20h) = nt!KiServiceTable
mov     r10, [r10+rdi]
// movsxd will move a 32-bit number into 64-bit register keeping the sign
// so negative number will stay negative
// nt!KiServiceTable + syscallIndex * 4
// syscall index 23h - NtQueryVirtualMemory
// dd /c1 nt!KiServiceTable + 23h * 4 l1 = fffff801`14cdcf3c  0557bd02 <-- RVA!
movsxd  r11, dword ptr [r10+rax*4]
// R11 - RVA of syscall - 0557bd02

This RVA is interesting. The low 4 bits hold the number of stack args. In this instance, the value is 2. So, the kernel will be copying 2 values from the user mode stack. This is why the RVA is shifted right by 4 bits, to skip over this value.

mov     rax, r11
// RVA >> 4 = 557bd0
sar     r11, 4
// nt!KiServiceTable + RVA = fffff801`15234a80
// ln fffff801`15234a80
// (fffff801`15234a80) nt!NtQueryVirtualMemory <-- found it!!
add     r10, r11
cmp     edi, 20h ; ' '
jnz     short loc_140408ED0
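The same decoding can be written as a short C sketch, which is handy for checking the arithmetic in the comments above. The function names are hypothetical. The table base used in the usage note is derived from the dd address in the comment (the entry address minus 23h * 4), so treat it as specific to that boot.

```c
#include <assert.h>
#include <stdint.h>

/* Number of stack arguments: the low 4 bits of the table entry. */
static unsigned entry_stack_args(int32_t entry)
{
    return (unsigned)entry & 0xFu;
}

/* Routine address: table base plus the sign-extended entry shifted right
 * by 4, mirroring the movsxd / sar / add sequence. */
static uint64_t entry_target(uint64_t table_base, int32_t entry)
{
    int64_t wide = (int64_t)entry;  /* movsxd: sign-extend to 64 bits */
    int64_t offset;

    /* Portable arithmetic shift right: invert negatives before shifting. */
    if (wide < 0)
        offset = ~((~wide) >> 4);
    else
        offset = wide >> 4;

    return table_base + (uint64_t)offset;
}
```

For the entry 0557bd02 and a base of fffff801`14cdceb0, this gives 2 stack args and the address fffff801`15234a80, the same address that ln resolves to nt!NtQueryVirtualMemory above.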

After a few more checks are done, the syscall is eventually invoked using an indirect call like this:

mov rax, r10
call rax

At this time, all the proper registers RCX, RDX, R8, R9 have been populated with the proper syscall args, and the stack arguments have been copied from the user mode stack to the kernel stack.

The end

Thanks for sticking with me

Tip

The best way to internalize this is to follow along in your own kernel debugger (local or remote VM). Set breakpoints, single-step through the handler, and inspect the service tables—you’ll have the full picture from user mode to kernel and back.

Thanks a ton for sticking with me through this journey of an incredible deep dive into syscalls. If you follow along in a kernel debugger, you will have truly mastered the entire flow from user mode to kernel mode.

I hope you learned something along the way!
