Threads Outside the Map: Shellcode Detection
This post covers how Peregrine detects injected shellcode by enumerating a process’s threads and checking whether each thread’s instruction pointer and start address fall within known module boundaries. Threads executing outside any loaded module’s address range are, with high probability, running from raw allocated memory. A thread outside the map is a strong shellcode candidate, not proof of shellcode; the limitations section covers the legitimate exceptions.
Building a Module Map from EnumProcessModulesEx
Every loaded DLL and the main executable occupy a contiguous region of virtual memory described by a base address and a size [1]. Enumerating these regions with EnumProcessModulesEx and GetModuleInformation produces a list of [base, base+size) intervals covering the images that the Windows loader brought into the process and still tracks. Anything executing outside these intervals is not running from a loader-registered image.
The module enumeration lives in pe.rs. Each entry stores the base address, size, and full path of the module:
// From: peregrine-tauri/src-tauri/src/detections/pe.rs
#[derive(Debug, Clone)]
pub struct ModuleEntry {
    pub base: usize,
    pub size: usize,
    pub path: String,
}

impl ModuleEntry {
    pub fn name(&self) -> &str {
        self.path.rsplit('\\').next().unwrap_or(&self.path)
    }
}
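As a quick illustration of what name() returns, the sketch below repeats the struct so it compiles on its own. The entry_for helper and the sample paths are made up for the example and are not part of the repository.

```rust
// Standalone copy of ModuleEntry, repeated so this sketch compiles alone.
#[derive(Debug, Clone)]
pub struct ModuleEntry {
    pub base: usize,
    pub size: usize,
    pub path: String,
}

impl ModuleEntry {
    /// Returns the last backslash-separated component of the path.
    pub fn name(&self) -> &str {
        self.path.rsplit('\\').next().unwrap_or(&self.path)
    }
}

/// Hypothetical helper for the example: builds an entry from a path only.
pub fn entry_for(path: &str) -> ModuleEntry {
    ModuleEntry { base: 0, size: 0, path: path.to_string() }
}
```

Because rsplit always yields at least one item, the unwrap_or fallback is purely defensive. An entry whose path lookup failed (empty path) simply reports an empty name.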
The modules() method on ProcessHandle calls EnumProcessModulesEx with LIST_MODULES_ALL, retrieves MODULEINFO for each handle, and grabs the filename via GetModuleFileNameExW:
// From: peregrine-tauri/src-tauri/src/detections/pe.rs
pub fn modules(&self) -> Vec<ModuleEntry> {
    // First call passes a null buffer and only asks how many bytes are needed.
    let mut needed = 0u32;
    unsafe {
        let _ = EnumProcessModulesEx(
            self.0,
            std::ptr::null_mut(),
            0,
            &mut needed,
            LIST_MODULES_ALL,
        );
    }
    if needed == 0 {
        return Vec::new();
    }
    let count = needed as usize / std::mem::size_of::<HMODULE>();
    let mut arr = vec![HMODULE::default(); count];
    // Second call fills the handle array.
    let ok = unsafe {
        EnumProcessModulesEx(
            self.0,
            arr.as_mut_ptr(),
            needed,
            &mut needed,
            LIST_MODULES_ALL,
        )
    };
    if ok.is_err() {
        return Vec::new();
    }
    // Drop unused slots if the module list shrank between the two calls.
    arr.truncate(needed as usize / std::mem::size_of::<HMODULE>());
    let mut result = Vec::new();
    for hmod in &arr {
        let mut info = MODULEINFO::default();
        let ok = unsafe {
            GetModuleInformation(
                self.0,
                *hmod,
                &mut info,
                std::mem::size_of::<MODULEINFO>() as u32,
            )
        };
        if ok.is_err() {
            continue;
        }
        let base = info.lpBaseOfDll as usize;
        let size = info.SizeOfImage as usize;
        // An empty path is stored if the file name cannot be retrieved.
        let mut buf = [0u16; 1024];
        let len = unsafe { GetModuleFileNameExW(Some(self.0), Some(*hmod), &mut buf) } as usize;
        let path = if len > 0 {
            String::from_utf16_lossy(&buf[..len])
        } else {
            String::new()
        };
        result.push(ModuleEntry { base, size, path });
    }
    result
}
The resulting vector serves as ground truth for all subsequent thread checks.
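To make the "ground truth" role concrete, here is a minimal sketch of an address lookup against such a vector. The Module struct and find_module function are illustrative stand-ins rather than repository code; the sketch uses a subtraction-based range test so that base + size is never computed and cannot overflow.

```rust
/// Minimal stand-in for ModuleEntry with only the fields the lookup needs.
/// Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Module {
    pub base: u64,
    pub size: u64,
    pub name: String,
}

/// Returns the module whose half-open range [base, base + size) contains addr.
pub fn find_module(modules: &[Module], addr: u64) -> Option<&Module> {
    modules.iter().find(|m| {
        // addr - base < size is equivalent to base <= addr < base + size.
        // checked_sub returns None when addr is below base.
        addr.checked_sub(m.base).map_or(false, |offset| offset < m.size)
    })
}

/// Made-up two-module map used only to exercise the lookup.
pub fn sample_map() -> Vec<Module> {
    vec![
        Module { base: 0x7ff6_0000_0000, size: 0x1000, name: "game.exe".into() },
        Module { base: 0x7ffb_0000_0000, size: 0x2000, name: "ntdll.dll".into() },
    ]
}
```

The upper bound is exclusive: an address equal to base + size belongs to whatever follows the module, not to the module itself.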
Snapshotting Threads with the Toolhelp API
Thread enumeration uses CreateToolhelp32Snapshot with the TH32CS_SNAPTHREAD flag, which captures every thread on the system [2]. There is no per-process thread snapshot flag; the toolhelp API requires filtering by PID in the enumeration loop. For each thread belonging to the target process, the scanner opens a handle with THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION and extracts two pieces of data.
The first is the current instruction pointer. GetThreadContext fills a CONTEXT64 struct, and its Rip field gives the address the thread was executing when the context was captured. The developer defined a custom CONTEXT64 struct to guarantee the 16-byte alignment that the x64 CONTEXT structure requires [3], which the windows crate’s CONTEXT binding reportedly did not enforce:
// From: peregrine-tauri/src-tauri/src/detections/threads.rs
#[repr(C, align(16))]
#[allow(non_snake_case)]
struct CONTEXT64 {
    P1Home: u64,
    P2Home: u64,
    P3Home: u64,
    P4Home: u64,
    P5Home: u64,
    P6Home: u64,
    ContextFlags: u32,
    MxCsr: u32,
    SegCs: u16,
    SegDs: u16,
    SegEs: u16,
    SegFs: u16,
    SegGs: u16,
    SegSs: u16,
    EFlags: u32,
    Dr0: u64,
    Dr1: u64,
    Dr2: u64,
    Dr3: u64,
    Dr6: u64,
    Dr7: u64,
    Rax: u64,
    Rcx: u64,
    Rdx: u64,
    Rbx: u64,
    Rsp: u64,
    Rbp: u64,
    Rsi: u64,
    Rdi: u64,
    R8: u64,
    R9: u64,
    R10: u64,
    R11: u64,
    R12: u64,
    R13: u64,
    R14: u64,
    R15: u64,
    Rip: u64,
    FltSave: [u8; 512],
    VectorRegister: [M128A; 26],
    VectorControl: u64,
    DebugControl: u64,
    LastBranchToRip: u64,
    LastBranchFromRip: u64,
    LastExceptionToRip: u64,
    LastExceptionFromRip: u64,
}

const CONTEXT_AMD64: u32 = 0x00100000;
// Note: despite the name, 0x07 selects CONTROL | INTEGER | SEGMENTS.
// winnt.h defines CONTEXT_FULL on AMD64 as CONTROL | INTEGER | FLOATING_POINT
// (0x0010000B). The CONTROL bit (0x01) is the one that populates Rip.
const CONTEXT_FULL: u32 = CONTEXT_AMD64 | 0x07;
The align(16) attribute is critical. The x64 CONTEXT structure must be 16-byte aligned, and GetThreadContext can fail when handed a misaligned buffer.
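A hand-written layout like this is easy to get subtly wrong, so a layout check is cheap insurance. The sketch below uses a condensed stand-in (Context64Layout, a hypothetical name) with the same field order and sizes, folding runs of same-typed registers into arrays, and reports the size, alignment, and Rip offset. The expected values of 1232 bytes and offset 0xF8 are those of the x64 CONTEXT in winnt.h.

```rust
use std::mem::{align_of, size_of};

/// Condensed stand-in for CONTEXT64: same field order and sizes, with runs of
/// same-typed fields folded into arrays. Not the repository's definition.
#[repr(C, align(16))]
pub struct Context64Layout {
    pub home: [u64; 6],                  // P1Home..P6Home
    pub context_flags: u32,
    pub mx_csr: u32,
    pub seg: [u16; 6],                   // SegCs..SegSs
    pub eflags: u32,
    pub dr: [u64; 6],                    // Dr0..Dr3, Dr6, Dr7
    pub gpr: [u64; 16],                  // Rax..R15
    pub rip: u64,
    pub flt_save: [u8; 512],
    pub vector_register: [[u64; 2]; 26], // 26 entries of 16 bytes (M128A)
    pub vector_control: u64,
    pub debug: [u64; 5],                 // DebugControl..LastExceptionFromRip
}

/// Returns (size in bytes, alignment in bytes, byte offset of rip).
pub fn context_layout() -> (usize, usize, usize) {
    // All fields are plain integers, so an all-zero value is valid.
    let ctx: Context64Layout = unsafe { std::mem::zeroed() };
    let base = &ctx as *const Context64Layout as usize;
    let rip = &ctx.rip as *const u64 as usize;
    (
        size_of::<Context64Layout>(),
        align_of::<Context64Layout>(),
        rip - base,
    )
}
```

The same three assertions could be applied to the real CONTEXT64 struct in a unit test, so that an accidental field reorder is caught at test time rather than as a wrong Rip at scan time.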
Retrieving the Win32 Start Address via NtQueryInformationThread
The second data point is the thread’s original start address: the function pointer passed to CreateThread or NtCreateThreadEx. This comes from NtQueryInformationThread with the information class ThreadQuerySetWin32StartAddress (value 9):
// From: peregrine-tauri/src-tauri/src/detections/threads.rs
extern "system" {
    fn NtQueryInformationThread(
        ThreadHandle: windows::Win32::Foundation::HANDLE,
        ThreadInformationClass: u32,
        ThreadInformation: *mut u64,
        ThreadInformationLength: u32,
        ReturnLength: *mut u32,
    ) -> i32;
}

const THREAD_QUERY_SET_WIN32_START_ADDRESS: u32 = 9;
NtQueryInformationThread is only partially documented by Microsoft [4]. The THREADINFOCLASS enumeration is exposed in SDK headers, but ThreadQuerySetWin32StartAddress sits in the grey zone between documented and undocumented. In practice it has behaved consistently across Windows versions and is widely relied on by security and diagnostic tools.
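The function returns an NTSTATUS as an i32, and by the NT_SUCCESS convention any non-negative value means success. A minimal sketch of how the result could be interpreted follows; nt_success and start_address are hypothetical helper names, not repository code.

```rust
/// NT_SUCCESS convention: an NTSTATUS is a signed 32-bit value and
/// non-negative values indicate success.
pub fn nt_success(status: i32) -> bool {
    status >= 0
}

/// Hypothetical wrapper: keeps the start address only when the query
/// succeeded and produced a non-null value.
pub fn start_address(status: i32, raw: u64) -> Option<u64> {
    if nt_success(status) && raw != 0 {
        Some(raw)
    } else {
        None
    }
}
```

Returning an Option makes the "unknown start address" case explicit instead of encoding it as the sentinel value 0.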
Two Addresses Checked Against the Module Map
With both the current RIP and the original start address in hand, the scanner checks each against the module map:
// From: peregrine-tauri/src-tauri/src/detections/threads.rs
let rip_mod = modules
    .iter()
    .find(|m| rip >= m.base as u64 && rip < (m.base + m.size) as u64)
    .map(|m| m.name().to_string());
let start_mod = if start_addr != 0 {
    modules
        .iter()
        .find(|m| {
            start_addr >= m.base as u64
                && start_addr < (m.base + m.size) as u64
        })
        .map(|m| m.name().to_string())
} else {
    None
};
let suspicious = rip_mod.is_none() || (start_addr != 0 && start_mod.is_none());
The suspicious flag fires when either the current RIP or the start address falls outside every known module. This produces two confidence tiers:
- Both RIP and start address outside all modules: high-confidence shellcode. The thread was created pointing at allocated memory and is still executing there. This is the classic VirtualAllocEx + CreateRemoteThread injection, or reflective DLL loading that never registers with the loader as a proper module.
- Start address inside a module, but RIP outside: lower confidence. The thread started in a legitimate function (often ntdll!RtlUserThreadStart or a known DLL export), but the instruction pointer has since moved into memory that no module backs. This can indicate a hijacked thread, a JIT engine, or a callback trampoline.
The inverse case (start address outside, RIP inside) is flagged as well. It means the thread was spawned pointing at shellcode that has since called or jumped into a legitimate module, possibly after manually resolving API addresses. The start address remains as lasting evidence.
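The scanner itself only records a boolean, but the cases described above can be made explicit. The following sketch maps the two lookup results onto a verdict; the Verdict enum and classify function are hypothetical names introduced here. It treats a start address of 0 as unknown rather than as off the map, mirroring the suspicious expression.

```rust
/// Hypothetical verdict type naming the cases discussed in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// RIP is inside a module and the start address is inside one or unknown.
    Clean,
    /// Start address is fine or unknown, RIP is off the map (lower confidence).
    RipOutside,
    /// Start address is off the map, RIP is inside a module.
    StartOutside,
    /// Both addresses are off the map (high confidence).
    BothOutside,
}

/// Classifies a thread from its module lookups. A start address of 0 is
/// treated as unknown, so it never counts as off the map.
pub fn classify(
    rip_module: Option<&str>,
    start_address: u64,
    start_module: Option<&str>,
) -> Verdict {
    let rip_outside = rip_module.is_none();
    let start_outside = start_address != 0 && start_module.is_none();
    match (rip_outside, start_outside) {
        (true, true) => Verdict::BothOutside,
        (true, false) => Verdict::RipOutside,
        (false, true) => Verdict::StartOutside,
        (false, false) => Verdict::Clean,
    }
}
```

Every verdict other than Clean corresponds to suspicious being true, so a front end could use the enum for display while keeping the boolean for filtering.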
The ThreadInfo Output Structure
Each thread produces a ThreadInfo struct that gets serialized and sent to the front end:
// From: peregrine-tauri/src-tauri/src/detections/threads.rs
#[derive(Debug, Clone, Serialize)]
pub struct ThreadInfo {
    pub tid: u32,
    pub rip: u64,
    pub start_address: u64,
    pub rip_module: Option<String>,
    pub start_module: Option<String>,
    pub suspicious: bool,
}
The Option<String> fields carry the module name when found (Some("ntdll.dll"), Some("kernel32.dll"), etc.) or None when the address is off the map. This gives the GUI everything it needs to display a thread inventory with clear indicators of what belongs and what does not.
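To show how a consumer might read these fields, here is a small sketch that renders one entry as a text row. The struct is repeated without the Serialize derive so the example needs only the standard library; render_row and the row format are invented for illustration and do not reflect how the Peregrine GUI displays threads.

```rust
/// Standalone copy of ThreadInfo without the serde derive, so the sketch
/// compiles with the standard library alone.
#[derive(Debug, Clone)]
pub struct ThreadInfo {
    pub tid: u32,
    pub rip: u64,
    pub start_address: u64,
    pub rip_module: Option<String>,
    pub start_module: Option<String>,
    pub suspicious: bool,
}

/// Hypothetical formatter: one line per thread, with a placeholder where an
/// address resolved to no module.
pub fn render_row(t: &ThreadInfo) -> String {
    let rip_mod = t.rip_module.as_deref().unwrap_or("<no module>");
    let start_mod = t.start_module.as_deref().unwrap_or("<no module>");
    format!(
        "tid={} rip={:#x} [{}] start={:#x} [{}]{}",
        t.tid,
        t.rip,
        rip_mod,
        t.start_address,
        start_mod,
        if t.suspicious { " SUSPICIOUS" } else { "" }
    )
}
```

Note that a None start_module paired with a start_address of 0 means the start address was never obtained, which is different from an address that is off the map.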
The Complete Scan: check_all_threads
The check_all_threads function ties the pieces together. It opens the target process, builds the module map, snapshots all system threads, filters by PID, and checks each thread:
// From: peregrine-tauri/src-tauri/src/detections/threads.rs
pub fn check_all_threads(pid: u32) -> Result<Vec<ThreadInfo>, String> {
    let proc = ProcessHandle::open(pid).ok_or("OpenProcess failed")?;
    let modules = proc.modules();
    let snap = unsafe { CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0) }
        .map_err(|e| format!("snapshot: {e}"))?;
    if snap == INVALID_HANDLE_VALUE {
        return Err("CreateToolhelp32Snapshot failed".into());
    }
    let mut results = Vec::new();
    let mut te = THREADENTRY32::default();
    te.dwSize = std::mem::size_of::<THREADENTRY32>() as u32;
    if unsafe { Thread32First(snap, &mut te) }.is_err() {
        let _ = unsafe { CloseHandle(snap) };
        return Ok(results);
    }
    loop {
        // The snapshot is system-wide, so keep only the target's threads.
        if te.th32OwnerProcessID == pid {
            let tid = te.th32ThreadID;
            let th = unsafe {
                OpenThread(THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION, false, tid)
            };
            if let Ok(th) = th {
                if th != INVALID_HANDLE_VALUE {
                    let mut ctx: CONTEXT64 = unsafe { std::mem::zeroed() };
                    ctx.ContextFlags = CONTEXT_FULL;
                    let ok = unsafe { GetThreadContext(th, &mut ctx) };
                    if ok != 0 {
                        let rip = ctx.Rip;
                        let mut start_addr: u64 = 0;
                        // The NTSTATUS result is discarded; a failed query is
                        // expected to leave start_addr at 0.
                        unsafe {
                            NtQueryInformationThread(
                                th,
                                THREAD_QUERY_SET_WIN32_START_ADDRESS,
                                &mut start_addr,
                                8,
                                std::ptr::null_mut(),
                            );
                        }
                        // Module lookup and suspicious check, as shown above.
                        let rip_mod = modules
                            .iter()
                            .find(|m| rip >= m.base as u64 && rip < (m.base + m.size) as u64)
                            .map(|m| m.name().to_string());
                        let start_mod = if start_addr != 0 {
                            modules
                                .iter()
                                .find(|m| {
                                    start_addr >= m.base as u64
                                        && start_addr < (m.base + m.size) as u64
                                })
                                .map(|m| m.name().to_string())
                        } else {
                            None
                        };
                        let suspicious =
                            rip_mod.is_none() || (start_addr != 0 && start_mod.is_none());
                        results.push(ThreadInfo {
                            tid,
                            rip,
                            start_address: start_addr,
                            rip_module: rip_mod,
                            start_module: start_mod,
                            suspicious,
                        });
                    }
                    let _ = unsafe { CloseHandle(th) };
                }
            }
        }
        if unsafe { Thread32Next(snap, &mut te) }.is_err() {
            break;
        }
    }
    let _ = unsafe { CloseHandle(snap) };
    Ok(results)
}
Because the snapshot is taken with TH32CS_SNAPTHREAD, the process ID argument (0 here) is ignored and the snapshot contains every thread on the system [2]. The th32OwnerProcessID comparison inside the loop is what restricts the scan to the target process.
Limitations and False Positives
This approach has known blind spots.
JIT compilers (.NET CLR, V8, LuaJIT) allocate executable memory and run threads from it, so those threads will appear suspicious by this metric. The same applies to some anti-tamper solutions that unpack code at runtime. In practice, allowlisting known JIT regions or correlating with the allocation source would reduce false positives: JIT pages are typically executable private memory (often PAGE_EXECUTE_READWRITE) allocated from inside the process by a known module, while classic shellcode injection comes from a cross-process VirtualAllocEx [5].
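The allowlisting idea can be sketched as a second filter applied to off-map addresses. Everything here is hypothetical: JitAllowlist, unexplained, and the sample ranges are invented for illustration, and how the JIT regions would be discovered in a real process is outside the scope of the sketch.

```rust
use std::ops::Range;

/// Hypothetical allowlist of address ranges known to hold JIT-generated code.
pub struct JitAllowlist {
    regions: Vec<Range<u64>>,
}

impl JitAllowlist {
    pub fn new(regions: Vec<Range<u64>>) -> Self {
        Self { regions }
    }

    /// True when addr falls inside any allowlisted half-open range.
    pub fn contains(&self, addr: u64) -> bool {
        self.regions.iter().any(|r| r.contains(&addr))
    }
}

/// Given addresses that resolved to no module, returns the ones that the
/// allowlist does not explain either.
pub fn unexplained(off_map_addrs: &[u64], allow: &JitAllowlist) -> Vec<u64> {
    off_map_addrs
        .iter()
        .copied()
        .filter(|addr| !allow.contains(*addr))
        .collect()
}
```

An allowlist of this kind trades false positives for a new blind spot, since code placed inside an allowlisted region would no longer be reported.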
Thread hijacking (SuspendThread, then SetThreadContext to redirect execution) will still show a legitimate start address; only the RIP will appear outside the map. SetThreadContext does not change the recorded start address, so the original entry point survives that kind of tampering. This cuts both ways: the RIP check can still catch a hijacked thread while it executes injected code, but a sophisticated attacker who hijacks an existing thread rather than creating a new one never trips the start address check. The recorded value should also not be treated as tamper-proof, since the "QuerySet" in the information class name indicates it can be written as well as read.
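The module map is a point-in-time snapshot. If a module is loaded or unloaded between the module enumeration and the thread scan, the results can be inconsistent. The current implementation does not lock the module list. The thread contexts have the same property: the scan reads them without suspending the threads, and Microsoft's documentation states that a valid context cannot be obtained for a running thread, so a captured RIP is best treated as a sample rather than an exact position.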
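One way to at least detect this race, sketched below under stated assumptions, is to enumerate modules before and after the thread scan and accept the result only when both enumerations agree. ModuleKey, same_module_set, and scan_with_stable_map are hypothetical names; the enumeration and scan steps are passed in as closures so the sketch stays independent of the Windows API.

```rust
/// Hypothetical identity of a module for comparison purposes.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ModuleKey {
    pub base: u64,
    pub size: u64,
}

/// True when both enumerations contain the same modules, in any order.
pub fn same_module_set(before: &[ModuleKey], after: &[ModuleKey]) -> bool {
    let mut a = before.to_vec();
    let mut b = after.to_vec();
    a.sort();
    b.sort();
    a == b
}

/// Runs scan between two enumerations and retries while the module set
/// changes underneath it. Returns None if no attempt saw a stable map.
pub fn scan_with_stable_map<T>(
    mut enumerate: impl FnMut() -> Vec<ModuleKey>,
    mut scan: impl FnMut(&[ModuleKey]) -> T,
    max_attempts: usize,
) -> Option<T> {
    for _ in 0..max_attempts {
        let before = enumerate();
        let result = scan(&before);
        let after = enumerate();
        if same_module_set(&before, &after) {
            return Some(result);
        }
    }
    None
}
```

This narrows the window rather than closing it: a module could still load and unload entirely between the two enumerations.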
This post was generated by an LLM based on code from Peregrine Anti-Cheat. All code snippets are from the actual repository. Claims about Windows internals are sourced from Microsoft documentation.
References
- [1] Microsoft, MODULEINFO structure
- [2] Microsoft, CreateToolhelp32Snapshot function
- [3] Microsoft, CONTEXT structure (x64)
- [4] Microsoft, NtQueryInformationThread function
- [5] Microsoft, VirtualAllocEx function