Call Stack Validation: Detecting API Calls from Unbacked Memory
This post covers how Peregrine validates call stacks inside its injected DLL hooks. Every time a hooked API is called, the validation code captures the return address chain, checks each frame against a cache of known module address ranges, and flags any return address that lands in unbacked executable memory. The technique catches both direct shellcode callers and call stack spoofing attempts where attackers manipulate return addresses to disguise the true origin of an API call.
Why Call Stacks Matter for Detection Evasion
When an anti-cheat or EDR hooks an API like OpenProcess or WriteProcessMemory, the hook can inspect the return addresses on the call stack to determine who made the call. A return address inside game.exe or kernel32.dll looks normal. A return address pointing into a PAGE_EXECUTE_READWRITE allocation with no backing file is suspicious.
Call stack spoofing is the evasion technique that targets this exact check. Cheats manipulate the stack frames before calling a hooked API so the return addresses appear to originate from legitimate modules. The goal is to make the call look like it came from ordinary code rather than from injected shellcode or a manually mapped DLL [1].
Peregrine’s approach treats any return address in unbacked executable memory as a detection signal. This complements the VAD scanning that finds the unbacked memory regions themselves, and the shellcode thread analysis that catches threads with start addresses outside loaded modules. Call stack validation catches the moment that unbacked code actually calls a sensitive API.
Building a Module Range Cache with VirtualQuery
The first step is knowing where all the legitimate modules live. The callstack.c module maintains a sorted array of [base, end) intervals covering every MEM_IMAGE region in the process. MEM_IMAGE regions are memory-mapped sections of on-disk PE files, as set by the Windows loader [2]. Anything that is not MEM_IMAGE is either MEM_MAPPED (a view of a non-image section, such as a data file mapping) or MEM_PRIVATE (memory allocated at runtime).
The cache is built by walking the entire virtual address space with VirtualQuery [3]:
// From: PeregrineDLL/callstack.c
typedef struct {
    ULONG_PTR base;   // inclusive start of the range
    ULONG_PTR end;    // exclusive end of the range
} ModuleRange;

#define MAX_MODULE_RANGES 512
#define CACHE_REFRESH_MS  5000

static ModuleRange g_modules[MAX_MODULE_RANGES];
static volatile LONG g_module_count = 0;
static volatile LONG g_cache_lock = 0;
static volatile ULONGLONG g_cache_last_refresh = 0;

static void refresh_module_cache(void) {
    // Try-lock: if another thread is already refreshing, skip instead of waiting.
    if (InterlockedCompareExchange(&g_cache_lock, 1, 0) != 0)
        return;

    ModuleRange tmp[MAX_MODULE_RANGES];
    int count = 0;
    MEMORY_BASIC_INFORMATION mbi;
    ULONG_PTR addr = 0;

    // Walk the address space one region at a time, keeping MEM_IMAGE regions.
    while (VirtualQuery((LPCVOID)addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        if (mbi.Type == MEM_IMAGE && count < MAX_MODULE_RANGES) {
            ULONG_PTR rBase = (ULONG_PTR)mbi.BaseAddress;
            ULONG_PTR rEnd = rBase + mbi.RegionSize;
            if (count > 0 && tmp[count - 1].end == rBase) {
                // Region starts where the previous one ended: extend it.
                tmp[count - 1].end = rEnd;
            } else {
                tmp[count].base = rBase;
                tmp[count].end = rEnd;
                count++;
            }
        }
        ULONG_PTR next = (ULONG_PTR)mbi.BaseAddress + mbi.RegionSize;
        if (next <= addr) break;   // address wrapped around: stop
        addr = next;
    }

    qsort(tmp, count, sizeof(ModuleRange), range_cmp);
    memcpy(g_modules, tmp, count * sizeof(ModuleRange));
    InterlockedExchange(&g_module_count, count);
    g_cache_last_refresh = GetTickCount64();
    InterlockedExchange(&g_cache_lock, 0);
}
A few design choices worth noting:
- Adjacent regions are coalesced. A single DLL is typically represented by multiple MEM_IMAGE regions (one per PE section: .text, .rdata, .data, etc.) [4]. The code merges consecutive regions into a single [base, end) range so the final array has roughly one entry per module, not one per section.
- The cache is sorted and searched with binary search. This keeps lookup cost at O(log n) rather than linear, which matters because every hooked API call checks every frame in the captured stack.
- The spinlock uses InterlockedCompareExchange. If another thread is already refreshing the cache, the current thread simply skips the refresh rather than blocking. A stale cache (at most 5 seconds old) is acceptable; a deadlock is not.
- The 5-second refresh interval balances accuracy against performance. A new DLL loaded via LoadLibrary will appear in the cache within 5 seconds. If the cache is stale during a check, the validation code triggers an immediate refresh before declaring a detection (described below).
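The refresh routine calls range_cmp without showing it. The following is a minimal, portable sketch of what such a comparator and the coalescing step could look like. The names Range, range_cmp_sketch, and sort_and_coalesce are hypothetical, uintptr_t stands in for ULONG_PTR, and unlike the Peregrine code (which coalesces during the VirtualQuery walk and sorts afterwards) this sketch sorts first and merges second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Half-open interval [base, end); a stand-in for the post's ModuleRange. */
typedef struct {
    uintptr_t base;
    uintptr_t end;
} Range;

/* Hypothetical comparator: orders ranges by base address.
 * Uses comparisons rather than subtraction so 64-bit addresses
 * cannot overflow the int return value. */
static int range_cmp_sketch(const void *a, const void *b) {
    const Range *ra = (const Range *)a;
    const Range *rb = (const Range *)b;
    return (ra->base > rb->base) - (ra->base < rb->base);
}

/* Sorts r[0..n) by base, merges ranges that touch or overlap in place,
 * and returns the number of ranges that remain. */
static size_t sort_and_coalesce(Range *r, size_t n) {
    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(r, n, sizeof(Range), range_cmp_sketch);
    size_t out = 0;                        /* index of the last merged range */
    for (size_t i = 1; i < n; i++) {
        if (r[i].base <= r[out].end) {     /* adjacent or overlapping */
            if (r[i].end > r[out].end)
                r[out].end = r[i].end;
        } else {
            r[++out] = r[i];               /* gap: start a new range */
        }
    }
    return out + 1;
}
```

Three section-sized regions that sit back to back collapse into one range here, while a region separated by a gap stays a separate entry.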
Binary Search Over Module Ranges
Each return address is tested against the sorted module array:
// From: PeregrineDLL/callstack.c
static int is_in_module(ULONG_PTR addr) {
    LONG count = g_module_count;
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (addr < g_modules[mid].base)
            hi = mid - 1;
        else if (addr >= g_modules[mid].end)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
This is a standard binary search over non-overlapping intervals. A return value of 1 means the address falls within a known file-backed module. A return value of 0 means it does not.
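To make the half-open boundary behavior concrete, here is a standalone sketch of the same search that takes the array as a parameter so it can be compiled and exercised outside the DLL. Range and range_contains are hypothetical names, and uintptr_t stands in for ULONG_PTR; this is an illustration of the technique, not the Peregrine source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open interval [base, end). */
typedef struct {
    uintptr_t base;
    uintptr_t end;
} Range;

/* Returns 1 if addr lies in one of the sorted, non-overlapping
 * half-open ranges r[0..n), otherwise 0. */
static int range_contains(const Range *r, size_t n, uintptr_t addr) {
    assert(r != NULL || n == 0);
    size_t lo = 0, hi = n;                 /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < r[mid].base)
            hi = mid;                      /* addr is left of this range */
        else if (addr >= r[mid].end)
            lo = mid + 1;                  /* addr is right of this range */
        else
            return 1;                      /* base <= addr < end */
    }
    return 0;
}
```

With ranges [0x1000, 0x2000) and [0x5000, 0x6000), the address 0x1000 is inside the first range and 0x2000 is not, because the end of each range is exclusive.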
The Core Check: Walking Frames with RtlCaptureStackBackTrace
The callstack_check function is called from every hook in dllmain.cpp. It captures the call stack, walks each frame, and determines whether the return addresses belong to legitimate modules:
// From: PeregrineDLL/callstack.c
#define MAX_FRAMES  32
#define SKIP_FRAMES 2

int callstack_check(const char* hook_name) {
    if (!rate_limit_allow(hook_name))
        return 0;

    // Periodic refresh so newly loaded modules show up in the cache.
    ULONGLONG now = GetTickCount64();
    if (now - g_cache_last_refresh > CACHE_REFRESH_MS)
        refresh_module_cache();

    void* frames[MAX_FRAMES];
    USHORT count = RtlCaptureStackBackTrace(SKIP_FRAMES, MAX_FRAMES, frames, NULL);

    for (USHORT i = 0; i < count; i++) {
        ULONG_PTR addr = (ULONG_PTR)frames[i];
        if (addr == 0) continue;

        // Stage 1: fast path, the address is inside a cached module range.
        if (!is_in_module(addr)) {
            // Stage 2: the cache may be stale, so refresh and re-check.
            refresh_module_cache();
            if (!is_in_module(addr)) {
                // Stage 3: query the region that contains the address.
                MEMORY_BASIC_INFORMATION mbi;
                if (VirtualQuery((LPCVOID)addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
                    // Mask off modifier bits such as PAGE_GUARD.
                    DWORD prot = mbi.Protect & 0xFF;
                    int executable = (prot == PAGE_EXECUTE ||
                                      prot == PAGE_EXECUTE_READ ||
                                      prot == PAGE_EXECUTE_READWRITE ||
                                      prot == PAGE_EXECUTE_WRITECOPY);
                    if (mbi.Type == MEM_PRIVATE && executable) {
                        ipc_log_event("CallstackAnomaly",
                            "\"hook\":\"%s\",\"address\":\"0x%llX\","
                            "\"frameIndex\":%u,\"regionBase\":\"0x%llX\","
                            "\"regionSize\":%llu,\"protect\":\"0x%lX\"",
                            hook_name,
                            (unsigned long long)addr,
                            (unsigned int)i,
                            (unsigned long long)(ULONG_PTR)mbi.BaseAddress,
                            (unsigned long long)mbi.RegionSize,
                            (unsigned long)mbi.Protect);
                        return 1;
                    }
                }
            }
        }
    }
    return 0;
}
RtlCaptureStackBackTrace is the Windows API that walks the call stack and fills an array of return addresses [5]. SKIP_FRAMES is set to 2, which skips the frame for callstack_check itself and the frame for the hook function that called it. The remaining frames represent the actual callers leading up to the hooked API.
The detection logic uses a three-stage filter for each frame:
- Fast path: binary search against the module cache. If the address falls inside a known module, the frame is clean. Move on.
- Cache miss recovery: force a cache refresh. If the address is not in any cached module range, the cache might be stale (a DLL loaded in the last 5 seconds). A forced refresh and re-check eliminates false positives from newly loaded modules.
- VirtualQuery confirmation. If the address still is not in any module after a fresh cache, VirtualQuery inspects the memory region directly. The code only flags it as a detection if two conditions hold: the region type is MEM_PRIVATE (not backed by a file) and the protection flags include execute permission.
The combination of MEM_PRIVATE and executable protection is the core signal. Normal module code lives in MEM_IMAGE regions. Heap allocations and stack memory are MEM_PRIVATE but not executable (unless DEP is disabled) [6]. Shellcode and manually mapped code typically end up in MEM_PRIVATE regions with execute permission because the Windows loader did not create those regions.
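The final stage of the filter can be isolated as a pure predicate over the Type and Protect fields, which makes the masking step easier to see. The sketch below is an illustration under assumptions: is_unbacked_executable is a hypothetical helper name, and the constants are redefined with their Windows SDK values only so the snippet compiles without windows.h.

```c
#include <assert.h>
#include <stdint.h>

/* Windows SDK values, guarded so the sketch also builds with <windows.h>. */
#ifndef MEM_PRIVATE
#define MEM_PRIVATE 0x00020000u
#define MEM_MAPPED  0x00040000u
#define MEM_IMAGE   0x01000000u
#endif
#ifndef PAGE_EXECUTE
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#define PAGE_EXECUTE_WRITECOPY 0x80u
#define PAGE_GUARD             0x100u
#endif

/* Returns 1 when a region is private (no backing file) and executable. */
static int is_unbacked_executable(uint32_t type, uint32_t protect) {
    /* The low byte holds the base protection; higher bits are
     * modifiers such as PAGE_GUARD and must not affect the comparison. */
    uint32_t base = protect & 0xFFu;
    int executable = (base == PAGE_EXECUTE ||
                      base == PAGE_EXECUTE_READ ||
                      base == PAGE_EXECUTE_READWRITE ||
                      base == PAGE_EXECUTE_WRITECOPY);
    return type == MEM_PRIVATE && executable;
}
```

Note that an executable MEM_MAPPED region does not satisfy this predicate, which matches the check described above: only MEM_PRIVATE is flagged.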
When a detection fires, the event includes the hook name, the suspicious address, the frame index in the call stack, and the region’s base address, size, and protection flags. This telemetry is sent over the named pipe IPC to the Peregrine service.
Per-Hook Rate Limiting
Calling RtlCaptureStackBackTrace and VirtualQuery on every single API invocation would be expensive, especially for high-frequency hooks like ReadProcessMemory. The rate limiter prevents this from becoming a performance problem:
// From: PeregrineDLL/callstack.c
typedef struct {
    const char* hook_name;
    LONG max_per_minute;
    volatile LONG count;
    volatile ULONGLONG window_start;
} RateLimit;

#define NUM_HOOKS 8

static RateLimit g_limits[NUM_HOOKS] = {
    { "OpenProcess",          50, 0, 0 },
    { "CreateRemoteThread",   50, 0, 0 },
    { "VirtualAllocEx",       20, 0, 0 },
    { "VirtualProtectEx",     20, 0, 0 },
    { "ReadProcessMemory",     2, 0, 0 },
    { "WriteProcessMemory",    2, 0, 0 },
    { "NtReadVirtualMemory",   2, 0, 0 },
    { "NtWriteVirtualMemory",  2, 0, 0 },
};
Each hook has its own per-minute budget. OpenProcess and CreateRemoteThread get 50 checks per minute. The memory read/write APIs get only 2 because they can be called at extremely high frequency by legitimate code (anti-tamper systems, for example) and exhaustive checking would degrade performance.
The rate limiter uses InterlockedIncrement for thread-safe counting and resets the window when 60 seconds have elapsed:
// From: PeregrineDLL/callstack.c
static int rate_limit_allow(const char* hook_name) {
    RateLimit* rl = NULL;
    for (int i = 0; i < NUM_HOOKS; i++) {
        if (strcmp(g_limits[i].hook_name, hook_name) == 0) {
            rl = &g_limits[i];
            break;
        }
    }
    if (!rl) return 1;

    ULONGLONG now = GetTickCount64();
    if (now - rl->window_start >= 60000) {
        rl->window_start = now;
        InterlockedExchange(&rl->count, 0);
    }
    LONG c = InterlockedIncrement(&rl->count);
    return c <= rl->max_per_minute;
}
The rate limiter is a detection trade-off. A cheat whose ReadProcessMemory call is one of the first 2 invocations in a given minute window will have its call stack validated. A cheat whose call is the 3rd or later in that window will not. Patchi chose these thresholds to balance detection coverage against the performance cost of stack walking in a hot path.
Integration with the MinHook Hooks
Every hook function in dllmain.cpp calls callstack_check immediately after invoking the original function. For example, the OpenProcess hook:
// From: PeregrineDLL/dllmain.cpp
static HANDLE WINAPI HookOpenProcess(DWORD dwAccess, BOOL bInherit, DWORD dwPID) {
    HANDLE result = oOpenProcess(dwAccess, bInherit, dwPID);
    callstack_check("OpenProcess");

    const DWORD DANGEROUS = 0x0001 | 0x0002 | 0x0008 | 0x0010 | 0x0020 | 0x0040 | 0x0800;
    if (dwPID != PID && (dwAccess & DANGEROUS)) {
        ipc_log_event("OpenProcess",
            "\"callerPID\":%lu,\"targetPID\":%lu,\"access\":\"0x%08X\"",
            PID, dwPID, dwAccess);
    }
    return result;
}
The call to callstack_check happens before the parameter-based logging. This means call stack validation runs on every invocation (subject to rate limits), regardless of whether the parameters would have triggered a log event. A cheat calling OpenProcess on its own PID with benign flags will still have its call stack checked.
The pattern is identical across all eight hooks. For example, the ReadProcessMemory hook and its NT-layer counterpart, NtReadVirtualMemory:
// From: PeregrineDLL/dllmain.cpp
static BOOL WINAPI HookReadProcessMemory(HANDLE hProcess, LPCVOID lpBase,
                                         LPVOID lpBuf, SIZE_T nSize, SIZE_T* pRead) {
    BOOL result = oReadProcessMemory(hProcess, lpBase, lpBuf, nSize, pRead);
    callstack_check("ReadProcessMemory");
    // ...
    return result;
}

static NTSTATUS NTAPI HookNtReadVirtualMemory(HANDLE hProcess, PVOID base,
                                              PVOID buf, SIZE_T size, PSIZE_T pRead) {
    NTSTATUS status = oNtReadVirtualMemory(hProcess, base, buf, size, pRead);
    callstack_check("NtReadVirtualMemory");
    // ...
    return status;
}
The call stack module is initialized at the end of the init thread, after all hooks are installed:
// From: PeregrineDLL/dllmain.cpp
// ... hook installation ...
callstack_init();
DebugLog("[PeregrineDLL] Initialization complete\n");
callstack_init populates the module cache and resets the rate limiter state, starting every window at the current tick count:
// From: PeregrineDLL/callstack.c
void callstack_init(void) {
    refresh_module_cache();
    ULONGLONG now = GetTickCount64();
    for (int i = 0; i < NUM_HOOKS; i++) {
        g_limits[i].window_start = now;
        g_limits[i].count = 0;
    }
}
The Test Binary: Calling OpenProcess from Shellcode
The cheat_callstack.c test binary demonstrates the detection in action. It allocates a PAGE_EXECUTE_READWRITE page, copies shellcode into it, and calls OpenProcess from that unbacked memory:
// From: test/cheat_callstack.c
/* x64 shellcode: calls OpenProcess(PROCESS_VM_READ, FALSE, <pid>)
 *
 *   mov ecx, 0x0010      ; dwDesiredAccess = PROCESS_VM_READ
 *   xor edx, edx         ; bInheritHandle = FALSE
 *   mov r8d, <pid>       ; dwProcessId (patched at +9)
 *   mov rax, <addr>      ; OpenProcess pointer (patched at +15)
 *   sub rsp, 0x28        ; shadow space
 *   call rax
 *   add rsp, 0x28
 *   ret
 */
static unsigned char shellcode_template[] = {
    0xB9, 0x10, 0x00, 0x00, 0x00,                     /* mov ecx, 0x10     */
    0x31, 0xD2,                                       /* xor edx, edx      */
    0x41, 0xB8, 0x00, 0x00, 0x00, 0x00,               /* mov r8d, imm32    */
    0x48, 0xB8,                                       /* mov rax, imm64    */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,   /*   (imm64 bytes)   */
    0x48, 0x83, 0xEC, 0x28,                           /* sub rsp, 0x28     */
    0xFF, 0xD0,                                       /* call rax          */
    0x48, 0x83, 0xC4, 0x28,                           /* add rsp, 0x28     */
    0xC3                                              /* ret               */
};
The shellcode is a minimal x64 stub that sets up the three OpenProcess parameters in the correct registers (RCX, RDX, R8 per the Windows x64 calling convention [7]), allocates shadow space, and calls through a function pointer. The target PID and OpenProcess address are patched in at runtime:
// From: test/cheat_callstack.c
void* execMem = VirtualAlloc(NULL, 4096,
                             MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
// ...
unsigned char sc[sizeof(shellcode_template)];
memcpy(sc, shellcode_template, sizeof(sc));
*(DWORD*)(sc + PID_OFFSET) = targetPid;
*(ULONGLONG*)(sc + ADDR_OFFSET) = (ULONGLONG)(ULONG_PTR)pOpenProcess;
memcpy(execMem, sc, sizeof(sc));
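The excerpt does not show how PID_OFFSET and ADDR_OFFSET are defined. Going by the "patched at +9" and "patched at +15" notes in the template's comment, a portable sketch of the patching step might look like the following. The values 9 and 15 are taken from that comment, while stub_template, STUB_SIZE, and patch_stub are hypothetical names; memcpy is used instead of pointer casts to avoid unaligned stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PID_OFFSET   9   /* imm32 of "mov r8d, <pid>", per the template comment */
#define ADDR_OFFSET 15   /* imm64 of "mov rax, <addr>", per the template comment */
#define STUB_SIZE   34   /* total length of the template in bytes */

/* Same bytes as the post's shellcode_template. */
static const unsigned char stub_template[STUB_SIZE] = {
    0xB9, 0x10, 0x00, 0x00, 0x00,
    0x31, 0xD2,
    0x41, 0xB8, 0x00, 0x00, 0x00, 0x00,
    0x48, 0xB8,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x48, 0x83, 0xEC, 0x28,
    0xFF, 0xD0,
    0x48, 0x83, 0xC4, 0x28,
    0xC3
};

/* Copies the template into out (at least STUB_SIZE bytes) and fills in
 * the two immediates. Values are written in host byte order, which is
 * little-endian on the x64 targets this stub is meant for. */
static void patch_stub(unsigned char *out, uint32_t pid, uint64_t target) {
    assert(out != NULL);
    memcpy(out, stub_template, STUB_SIZE);
    memcpy(out + PID_OFFSET, &pid, sizeof(pid));
    memcpy(out + ADDR_OFFSET, &target, sizeof(target));
}
```

Checking that the opcode bytes on either side of each immediate are unchanged after patching is a cheap way to confirm the offsets are right.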
The test then calls the shellcode five times with a 2-second delay between calls:
// From: test/cheat_callstack.c
typedef HANDLE (*ShellFunc)(void);
ShellFunc fn = (ShellFunc)execMem;
for (int i = 0; i < 5; i++) {
    HANDLE h = fn();
    printf("[CALLSTACK] Call %d: OpenProcess returned 0x%p\n", i + 1, h);
    if (h) CloseHandle(h);
    Sleep(2000);
}
When the Peregrine DLL is injected into this process, the HookOpenProcess function fires on each of these calls. The call stack captured by RtlCaptureStackBackTrace will include a return address inside the VirtualAlloc’d page. That address is not in any MEM_IMAGE region. VirtualQuery confirms it is MEM_PRIVATE with PAGE_EXECUTE_READWRITE protection. A CallstackAnomaly event is emitted.
This mirrors exactly what a real cheat does: allocate executable memory (or get it from a manual mapper), write code into it, and call Windows APIs from there.
Limitations and Evasion Vectors
JIT engines and legitimate unbacked code. Some processes contain legitimate code executing from MEM_PRIVATE executable regions. .NET’s RyuJIT, Java’s HotSpot, and JavaScript engines like V8 all allocate executable memory at runtime for JIT-compiled code [8]. If a hooked API call passes through JIT-compiled code, the call stack will contain return addresses in unbacked executable memory. This would produce a false positive. In practice, game processes rarely have JIT engines in the call path for OpenProcess or WriteProcessMemory, but the possibility exists.
ROP-based call stack spoofing. The current check validates that return addresses point into file-backed module memory. An attacker using ROP (Return-Oriented Programming) gadgets can construct a fake call stack where every return address is inside a real module [9]. The addresses point to small instruction sequences ending in ret, effectively chaining through legitimate code. Peregrine’s is_in_module check would see valid module addresses and pass them. Detecting ROP-based spoofing requires deeper analysis, such as verifying that each return address is preceded by a call instruction targeting the next frame, or checking that frame sizes are consistent with the function prologues at those addresses.
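As an illustration of the "preceded by a call instruction" idea, the sketch below inspects the bytes just before a return address for common x86-64 CALL encodings. It is not part of Peregrine: preceded_by_call is a hypothetical helper, it works on a caller-supplied byte buffer, and it is a heuristic only. It does not verify the call target, and because x86 instructions are variable length, unrelated bytes can match by coincidence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the bytes ending at code[ret_off] look like a CALL
 * instruction, otherwise 0. code must hold at least ret_off bytes. */
static int preceded_by_call(const uint8_t *code, size_t ret_off) {
    assert(code != NULL);

    /* E8 rel32: direct near call, always 5 bytes long. */
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;

    /* FF /2: indirect near call. Measured from the FF opcode, the
     * length is 2 to 7 bytes depending on ModRM, SIB and displacement.
     * Any REX prefix sits before FF and does not change these lengths. */
    static const size_t lens[] = { 2, 3, 4, 6, 7 };
    for (size_t i = 0; i < sizeof(lens) / sizeof(lens[0]); i++) {
        size_t len = lens[i];
        if (ret_off < len)
            continue;
        const uint8_t *p = code + ret_off - len;
        /* The reg field of ModRM (bits 5..3) must be 2 for CALL. */
        if (p[0] == 0xFF && ((p[1] >> 3) & 7) == 2)
            return 1;
    }
    return 0;
}
```

A return address that lands on an arbitrary gadget is often not preceded by any of these encodings, which is what makes the check useful as a signal rather than as proof.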
MEM_IMAGE spoofing. An attacker could map a dummy PE file from disk, making their code region appear as MEM_IMAGE to VirtualQuery. Windows reports a region as MEM_IMAGE when it is a view of an image section, that is, a section object created from a PE file on disk and mapped via NtMapViewOfSection [2]. An attacker who maps their payload as an image section from a temporary file on disk would bypass the MEM_IMAGE vs. MEM_PRIVATE distinction. However, this leaves a file artifact and the mapped image would appear in module enumeration, giving other Peregrine detection layers (like IAT/EAT scanning and relocation-aware hashing) an opportunity to flag it.
Rate limiter windows. The per-minute rate limits mean that a cheat timing its API calls carefully could avoid having its call stack checked. If ReadProcessMemory has already been called twice in the current minute window by legitimate code, the third call (from the cheat) will not be validated. This is an explicit trade-off: Patchi chose to accept some detection gaps rather than impose the cost of stack walking on every high-frequency API call.
Drafted with LLM assistance from the Peregrine Anti-Cheat source code, reviewed and verified against the actual implementation.
References
[1] F-Secure Labs, “Call Stack Spoofing,” https://blog.f-secure.com/call-stack-spoofing/
[2] Microsoft, “MEMORY_BASIC_INFORMATION structure,” https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-memory_basic_information
[3] Microsoft, “VirtualQuery function,” https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualquery
[4] Microsoft, “PE Format: Section Headers,” https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers
[5] Microsoft, “RtlCaptureStackBackTrace function,” https://learn.microsoft.com/en-us/windows/win32/debug/capturestackbacktrace
[6] Microsoft, “Data Execution Prevention,” https://learn.microsoft.com/en-us/windows/win32/memory/data-execution-prevention
[7] Microsoft, “x64 calling convention,” https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention
[8] Microsoft, “.NET RyuJIT Overview,” https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-8/runtime#codegen
[9] Hovav Shacham, “The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86),” ACM CCS 2007