YARA in Memory: Signature Scanning with yara-x for Cheat Detection

This post covers how Peregrine scans a target process’s memory with the YARA-X engine, matching byte patterns and string combinations defined in standard YARA rules. The previous approach used a custom byte-pattern signature scanner; the current implementation replaces it entirely with yara-x, the Rust-native YARA engine maintained by VirusTotal [1]. This gives Peregrine access to the same expressive rule language used across the malware analysis and threat hunting industry, without maintaining a bespoke pattern matching engine.

Why YARA Over Custom Signature Scanning

Custom byte-pattern scanners work, but they carry maintenance cost. Every new pattern needs code changes or a custom binary format for the rule database. Matching logic is limited to what the developer implements: exact byte sequences, maybe wildcards, maybe some boolean combinations. As the number of signatures grows, the custom format becomes a liability.

YARA solves this with a well-defined rule language that has been the standard in malware research for over a decade [1]. A single YARA rule can combine ASCII and hex strings, regular expressions, boolean logic, byte offsets, and metadata. VirusTotal, CISA, and hundreds of threat intelligence teams publish and consume YARA rules as a common interchange format. Adopting YARA means Peregrine can leverage the entire ecosystem of existing rule-writing knowledge and tooling.

The specific implementation Patchi chose is yara-x [2], VirusTotal’s ground-up rewrite of the YARA engine in Rust. Unlike the original C library (which requires C bindings and careful memory management), yara-x is a native Rust crate that integrates cleanly with the existing Tauri application. The dependency is a single line in Cargo.toml:

# From: peregrine-tauri/src-tauri/Cargo.toml
yara-x = "0.13"

No C build toolchain, no bindgen, no unsafe FFI wrappers around a C library. The yara-x crate compiles as pure Rust and provides safe APIs for compiling rules and scanning byte buffers.

Loading YARA Rules at Runtime

Rules are loaded from a rules.yar file on disk rather than compiled into the binary. This means new detection signatures can be deployed by updating a text file without recompiling Peregrine. The loader searches multiple paths in priority order:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

fn load_rules() -> Result<yara_x::Rules, String> {
    let paths = [
        std::env::current_exe()
            .ok()
            .and_then(|p| p.parent().map(|d| d.join("rules.yar"))),
        Some(std::path::PathBuf::from("rules.yar")),
        Some(std::path::PathBuf::from(r"C:\Peregrine\rules.yar")),
        Some(std::path::PathBuf::from(r"E:\Peregrine\rules.yar")),
    ];
    for p in paths.iter().flatten() {
        if let Ok(src) = std::fs::read_to_string(p) {
            return yara_x::compile(src.as_str())
                .map_err(|e| format!("YARA compile error: {e}"));
        }
    }
    Err("rules.yar not found".into())
}

The function tries the directory containing the executable first, then the current working directory, then two fixed install paths. The search stops at the first file it can read: that file is compiled into a yara_x::Rules object with yara_x::compile, which parses the YARA source and produces a compiled representation ready for scanning. If the rule syntax is invalid, the compile error propagates to the caller rather than failing silently, and the remaining paths are not tried.
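The search order can be isolated from the YARA compile step. The following is a minimal sketch using only the standard library; first_readable is a hypothetical helper, not part of Peregrine. It returns the path alongside the contents so the caller could log which rules.yar was picked up.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical helper: returns the first candidate that can be read as
/// UTF-8 text, together with its contents. `None` entries are skipped.
fn first_readable(candidates: &[Option<PathBuf>]) -> Option<(PathBuf, String)> {
    candidates
        .iter()
        .flatten()
        .find_map(|p| fs::read_to_string(p).ok().map(|src| (p.clone(), src)))
}

fn main() {
    // Same priority idea as load_rules: next to the executable, then the CWD.
    let candidates = [
        std::env::current_exe()
            .ok()
            .and_then(|p| p.parent().map(|d| d.join("rules.yar"))),
        Some(PathBuf::from("rules.yar")),
    ];
    match first_readable(&candidates) {
        Some((path, src)) => {
            println!("would compile {} ({} bytes)", path.display(), src.len())
        }
        None => println!("rules.yar not found"),
    }
}
```

The returned source string would then be handed to yara_x::compile exactly as in load_rules.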

The Memory Walk: VirtualQueryEx Region by Region

The scanner needs to read every accessible region of the target process’s virtual address space. The technique is the same one used in VAD scanning, but performed from user mode rather than kernel mode: walk the address space with VirtualQueryEx [3], check each region’s state and protection, and read the committed, readable regions.

The implementation uses raw FFI to call VirtualQueryEx directly, with a manually defined MEMORY_BASIC_INFORMATION structure:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

#[repr(C)]
struct MemBasicInfo {
    base_address: usize,
    allocation_base: usize,
    allocation_protect: u32,
    _partition_pad: u32, // PartitionId (WORD) plus 2 bytes of alignment padding
    region_size: usize,
    state: u32,
    protect: u32,
    mem_type: u32,
}

#[link(name = "kernel32")]
extern "system" {
    fn VirtualQueryEx(
        process: *mut std::ffi::c_void,
        address: usize,
        buffer: *mut MemBasicInfo,
        length: usize,
    ) -> usize;
}

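The #[repr(C)] attribute gives the struct the C field layout of the Win32 MEMORY_BASIC_INFORMATION [4] that VirtualQueryEx fills in. The _partition_pad field covers the 16-bit PartitionId member added in recent Windows versions together with the two bytes of alignment padding that follow it, so region_size lands at the same offset as RegionSize.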
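Because a hand-written FFI struct is easy to get subtly wrong, a layout check is cheap insurance. The sketch below repeats the struct and measures its size and the offset of region_size; region_size_offset is a hypothetical helper. The expected numbers in the comments (48 bytes, offset 24) apply to 64-bit targets only.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[repr(C)]
struct MemBasicInfo {
    base_address: usize,
    allocation_base: usize,
    allocation_protect: u32,
    _partition_pad: u32, // PartitionId (u16) + 2 bytes of alignment padding
    region_size: usize,
    state: u32,
    protect: u32,
    mem_type: u32,
}

/// Hypothetical helper: byte offset of `region_size` from the struct start.
fn region_size_offset() -> usize {
    // All fields are plain integers, so an all-zero value is valid.
    let mbi: MemBasicInfo = unsafe { std::mem::zeroed() };
    let start = addr_of!(mbi) as usize;
    addr_of!(mbi.region_size) as usize - start
}

fn main() {
    // On a 64-bit target this reports size 48 and offset 24, matching the
    // 64-bit MEMORY_BASIC_INFORMATION layout.
    println!("size   = {}", size_of::<MemBasicInfo>());
    println!("align  = {}", align_of::<MemBasicInfo>());
    println!("offset = {}", region_size_offset());
}
```

Turning these checks into assertions in a unit test would catch an accidental field reorder before it turns into garbage region sizes at runtime.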

Several constants define which regions are worth scanning:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

const MEM_COMMIT: u32 = 0x1000;
const MEM_IMAGE: u32 = 0x1000000;
const MEM_PRIVATE: u32 = 0x20000;
const PAGE_GUARD: u32 = 0x100;
const PAGE_NOACCESS: u32 = 0x01;
const MAX_REGION_READ: usize = 16 * 1024 * 1024;

MEM_COMMIT indicates the region has physical storage (RAM or pagefile) allocated [4]. PAGE_GUARD and PAGE_NOACCESS mark regions that will fault on access [5]. The MAX_REGION_READ cap of 16 MB prevents the scanner from attempting to read extremely large regions that could cause memory pressure or long stalls.

The readability check is straightforward:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

fn is_readable(protect: u32) -> bool {
    let base = protect & 0xFF;
    base != 0 && base != PAGE_NOACCESS && (protect & PAGE_GUARD) == 0
}

Any committed region with a non-zero, non-NOACCESS, non-GUARD protection is treated as readable. The & 0xFF mask isolates the base protection value, since modifier flags such as PAGE_GUARD occupy higher bits. This is deliberately broad: unlike the VAD scanner, which focuses on executable regions, the YARA scanner checks all readable memory because cheat signatures may appear in data sections, configuration buffers, or heap allocations, not just executable code.
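A few concrete protection values make the filter easier to reason about. This sketch repeats is_readable and runs it over a handful of cases. The additional constants (PAGE_READONLY, PAGE_READWRITE, PAGE_EXECUTE_READ, PAGE_NOCACHE) are standard Win32 values that do not appear in the excerpt above.

```rust
const PAGE_NOACCESS: u32 = 0x01;
const PAGE_READONLY: u32 = 0x02;
const PAGE_READWRITE: u32 = 0x04;
const PAGE_EXECUTE_READ: u32 = 0x20;
const PAGE_GUARD: u32 = 0x100;
const PAGE_NOCACHE: u32 = 0x200;

fn is_readable(protect: u32) -> bool {
    // Low byte: base protection. Higher bits: modifiers such as PAGE_GUARD.
    let base = protect & 0xFF;
    base != 0 && base != PAGE_NOACCESS && (protect & PAGE_GUARD) == 0
}

fn main() {
    let cases = [
        ("no protection reported", 0),
        ("PAGE_NOACCESS", PAGE_NOACCESS),
        ("PAGE_READWRITE", PAGE_READWRITE),
        ("PAGE_EXECUTE_READ", PAGE_EXECUTE_READ),
        ("PAGE_READWRITE | PAGE_GUARD", PAGE_READWRITE | PAGE_GUARD),
        ("PAGE_READONLY | PAGE_NOCACHE", PAGE_READONLY | PAGE_NOCACHE),
    ];
    for (name, protect) in cases {
        println!("{:<30} 0x{:03X} -> {}", name, protect, is_readable(protect));
    }
}
```

Only the guard-page, no-access, and zero cases are rejected; a non-guard modifier such as PAGE_NOCACHE does not affect the result.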

The Scan Loop: Reading and Matching

The core of the scanner is the scan_process function. It opens the target process, creates a YARA scanner from the compiled rules, and walks every region:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

pub fn scan_process(pid: u32) -> Result<(Vec<SigMatch>, usize), String> {
    let rules = load_rules()?;
    let proc = ProcessHandle::open(pid).ok_or("OpenProcess failed")?;
    let handle_raw = proc.0.0;

    let mut scanner = yara_x::Scanner::new(&rules);
    let mut results = Vec::new();
    let mut addr: usize = 0;
    let mut bytes_scanned: usize = 0;

    loop {
        let mut mbi: MemBasicInfo = unsafe { std::mem::zeroed() };
        let ret = unsafe {
            VirtualQueryEx(handle_raw, addr, &mut mbi, std::mem::size_of::<MemBasicInfo>())
        };
        if ret == 0 { break; }

        let base = mbi.base_address;
        let size = mbi.region_size;

        if mbi.state == MEM_COMMIT
            && is_readable(mbi.protect)
            && size > 0
            && size <= MAX_REGION_READ
        {
            if let Some(data) = proc.read_memory(base, size) {
                bytes_scanned += data.len();

                let scan_results = scanner.scan(&data);
                if let Ok(scan_results) = scan_results {
                    for rule in scan_results.matching_rules() {
                        for pattern in rule.patterns() {
                            for m in pattern.matches() {
                                results.push(SigMatch {
                                    rule_name: rule.identifier().to_string(),
                                    address: format!("0x{:X}", base + m.range().start),
                                    region_protection: prot_str(mbi.protect).into(),
                                    region_type: type_str(mbi.mem_type).into(),
                                    match_length: m.range().len(),
                                });
                            }
                        }
                    }
                }
            }
        }

        // Advance to the next region; stop if the address wraps or fails to move forward.
        let next = base.wrapping_add(size);
        if next <= addr { break; }
        addr = next;
    }

    Ok((results, bytes_scanned))
}

For each region, the flow is: query its metadata with VirtualQueryEx; skip it if it is not committed, not readable, empty, or larger than MAX_REGION_READ; read its bytes with read_memory; then pass the buffer to scanner.scan(). The yara-x scanner evaluates all compiled rules against the buffer in a single call. Regions that fail to read and scans that return an error are skipped without being reported.
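The filtering and address arithmetic can be exercised without Windows or yara-x. The sketch below is a simplified model, not the Peregrine implementation: Region is a hypothetical in-memory stand-in for one VirtualQueryEx result plus its bytes, and find_all is a naive byte search standing in for scanner.scan().

```rust
const MEM_COMMIT: u32 = 0x1000;
const MEM_RESERVE: u32 = 0x2000;
const PAGE_NOACCESS: u32 = 0x01;
const PAGE_READWRITE: u32 = 0x04;
const PAGE_GUARD: u32 = 0x100;
const MAX_REGION_READ: usize = 16 * 1024 * 1024;

/// Hypothetical stand-in for one queried region and the bytes read from it.
struct Region {
    base: usize,
    state: u32,
    protect: u32,
    data: Vec<u8>,
}

fn is_readable(protect: u32) -> bool {
    let base = protect & 0xFF;
    base != 0 && base != PAGE_NOACCESS && (protect & PAGE_GUARD) == 0
}

/// Naive stand-in for the YARA engine: offsets of every `needle` occurrence.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(i, _)| i)
        .collect()
}

/// Returns absolute match addresses and the number of bytes scanned.
fn scan_regions(regions: &[Region], needle: &[u8]) -> (Vec<usize>, usize) {
    let mut hits = Vec::new();
    let mut bytes_scanned = 0;
    for r in regions {
        let size = r.data.len();
        if r.state == MEM_COMMIT
            && is_readable(r.protect)
            && size > 0
            && size <= MAX_REGION_READ
        {
            bytes_scanned += size;
            for off in find_all(&r.data, needle) {
                hits.push(r.base + off); // buffer offset -> absolute address
            }
        }
    }
    (hits, bytes_scanned)
}

fn main() {
    let regions = vec![
        Region { base: 0x10000, state: MEM_COMMIT, protect: PAGE_READWRITE, data: b"..MARKER..".to_vec() },
        Region { base: 0x20000, state: MEM_RESERVE, protect: 0, data: Vec::new() },
    ];
    let (hits, bytes) = scan_regions(&regions, b"MARKER");
    for addr in &hits {
        println!("match at 0x{:X}", addr);
    }
    println!("{} bytes scanned", bytes);
}
```

Modelling the walk this way keeps the skip conditions and the base-plus-offset calculation testable on any platform, separate from the FFI and the rule engine.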

When a rule matches, the loop iterates over the matching rules, each rule’s patterns, and each pattern’s matches, so a single rule hit can produce several SigMatch entries. The match offset within the buffer is added to the region’s base address to produce the absolute virtual address of the match. Each match is recorded as a SigMatch:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

#[derive(Debug, Clone, Serialize)]
pub struct SigMatch {
    pub rule_name: String,
    pub address: String,
    pub region_protection: String,
    pub region_type: String,
    pub match_length: usize,
}

The struct includes the rule name (which rule fired), the absolute address, the memory protection and type of the containing region, and the length of the matched pattern. The protection and type strings are human-readable labels:

// From: peregrine-tauri/src-tauri/src/detections/sigscan.rs

fn prot_str(p: u32) -> &'static str {
    match p & 0xFF {
        0x02 => "READONLY",
        0x04 => "READWRITE",
        0x08 => "WRITECOPY",
        0x10 => "EXECUTE",
        0x20 => "EXECUTE_READ",
        0x40 => "EXECUTE_READWRITE",
        0x80 => "EXECUTE_WRITECOPY",
        _ => "?",
    }
}

fn type_str(t: u32) -> &'static str {
    match t {
        MEM_IMAGE => "IMAGE",
        MEM_PRIVATE => "PRIVATE",
        0x40000 => "MAPPED",
        _ => "?",
    }
}

These functions map the raw Win32 protection constants [5] and memory type constants [4] to readable strings. A match in a PRIVATE/READWRITE region is more suspicious than one in an IMAGE/READONLY region, because image-backed read-only memory is typically a legitimate loaded module.
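One way to act on that observation is to order matches for review. The sketch below is purely illustrative: triage_priority and sort_for_triage are hypothetical helpers, not part of Peregrine. Only "PRIVATE before IMAGE" follows from the reasoning above; where MAPPED and unknown types fall is an assumption.

```rust
#[derive(Debug, Clone)]
struct SigMatch {
    rule_name: String,
    address: String,
    region_protection: String,
    region_type: String,
    match_length: usize,
}

/// Hypothetical priority, lower sorts first. The placement of MAPPED and
/// unknown types is an assumption made for this sketch.
fn triage_priority(m: &SigMatch) -> u8 {
    match m.region_type.as_str() {
        "PRIVATE" => 0,
        "MAPPED" => 1,
        "IMAGE" => 2,
        _ => 3,
    }
}

/// Stable sort: matches with equal priority keep their scan order.
fn sort_for_triage(matches: &mut [SigMatch]) {
    matches.sort_by_key(triage_priority);
}

fn main() {
    let mut matches = vec![
        SigMatch {
            rule_name: "PeregrineTestCheat".into(),
            address: "0x7FF600001000".into(),
            region_protection: "READONLY".into(),
            region_type: "IMAGE".into(),
            match_length: 25,
        },
        SigMatch {
            rule_name: "PeregrineTestCheat".into(),
            address: "0x1A2B0000".into(),
            region_protection: "READWRITE".into(),
            region_type: "PRIVATE".into(),
            match_length: 25,
        },
    ];
    sort_for_triage(&mut matches);
    for m in &matches {
        println!("{:?}", m);
    }
}
```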

Anatomy of a YARA Rule: PeregrineTestCheat

The rules.yar file shipped with Peregrine contains detection rules written in standard YARA syntax. The current test rule demonstrates the structure:

// From: rules.yar

rule PeregrineTestCheat {
    meta:
        description = "Peregrine test cheat marker strings"
        severity = "critical"
    strings:
        $marker = "PEREGRINE_CHEAT_MARKER_v1" ascii
        $config = "[cheat_config]" ascii
        $aimbot = "aimbot_fov=" ascii
        $esp    = "esp_enabled=" ascii
    condition:
        $marker and 2 of ($config, $aimbot, $esp)
}

A YARA rule has three sections [1]:

  • meta: Descriptive key-value pairs that do not affect matching. These carry metadata like severity levels, author attribution, or references.
  • strings: Named patterns to search for. These can be ASCII strings, wide (UTF-16) strings, hex byte sequences with wildcards, or regular expressions.
  • condition: A boolean expression over the defined strings. This is where YARA’s expressiveness shines.

The condition $marker and 2 of ($config, $aimbot, $esp) requires the exact marker string to be present, plus at least two of the three configuration-related strings. This is more expressive than a simple byte pattern search. A custom scanner would need specific code to implement “match string A and at least N of strings B, C, D.” In YARA, that logic is a single line.
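To make the comparison concrete, here is roughly what a hand-written equivalent of that one condition looks like. It is a sketch of the custom-scanner approach, not code from Peregrine; contains and matches_test_cheat are hypothetical names, and the naive search ignores everything else YARA provides (match offsets, modifiers, efficient multi-pattern search).

```rust
/// Naive substring test over raw bytes.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Hand-written equivalent of:
///     $marker and 2 of ($config, $aimbot, $esp)
fn matches_test_cheat(buf: &[u8]) -> bool {
    let marker = contains(buf, b"PEREGRINE_CHEAT_MARKER_v1");
    let optional: [&[u8]; 3] = [b"[cheat_config]", b"aimbot_fov=", b"esp_enabled="];
    // "2 of" means at least two of the listed strings are present.
    let hits = optional.iter().filter(|s| contains(buf, s)).count();
    marker && hits >= 2
}

fn main() {
    let payload = b"PEREGRINE_CHEAT_MARKER_v1\0[cheat_config]\naimbot_fov=2.5\nesp_enabled=1\n";
    println!("full payload matches: {}", matches_test_cheat(payload));
    println!("marker only matches:  {}", matches_test_cheat(b"PEREGRINE_CHEAT_MARKER_v1"));
}
```

Every change to the rule (a third required string, a different threshold) means editing and recompiling this function, which is the maintenance cost the YARA condition avoids.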

Real-world cheat detection rules would target known cheat frameworks, injector signatures, hook trampolines, or specific configuration patterns. The rule language supports hex patterns with wildcards ({ 48 8B ?? 48 83 ?? 10 [0-4] E8 }) for matching instruction sequences that vary slightly between builds, nocase for case-insensitive matching, and wide for UTF-16 encoded strings common in game engines. Because rules are text files, new signatures can be added without recompiling the scanner.
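The wildcard semantics are simple to illustrate. The sketch below matches the fixed-length prefix of the hex pattern above (48 8B ?? 48 83 ?? 10) against a byte buffer; it deliberately leaves out the [0-4] jump and the trailing E8, and it is not how yara-x implements matching. Pat and find_wildcard are hypothetical names.

```rust
/// One pattern position: `Some(b)` must equal the byte, `None` is `??`.
type Pat = Option<u8>;

/// Offsets at which `pattern` matches `haystack`, treating `None` as a wildcard.
fn find_wildcard(haystack: &[u8], pattern: &[Pat]) -> Vec<usize> {
    if pattern.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(pattern.len())
        .enumerate()
        .filter(|(_, window)| {
            window.iter().zip(pattern).all(|(byte, pat)| match pat {
                Some(expected) => expected == byte,
                None => true,
            })
        })
        .map(|(offset, _)| offset)
        .collect()
}

fn main() {
    // 48 8B ?? 48 83 ?? 10
    let pattern = [
        Some(0x48), Some(0x8B), None, Some(0x48), Some(0x83), None, Some(0x10),
    ];
    // Two byte sequences that differ only in the wildcard positions.
    let build_a = [0x90, 0x48, 0x8B, 0xC1, 0x48, 0x83, 0xEC, 0x10, 0xE8];
    let build_b = [0x48, 0x8B, 0xD9, 0x48, 0x83, 0xC4, 0x10];
    println!("build A offsets: {:?}", find_wildcard(&build_a, &pattern));
    println!("build B offsets: {:?}", find_wildcard(&build_b, &pattern));
}
```

Both buffers match even though the bytes in the wildcard positions differ, which is what lets a single hex rule survive minor differences between builds.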

The Test Binary: Planting a Known Marker

The repository includes cheat_yara.c, a test program that injects a known payload into a target process for the YARA scanner to find:

// From: test/cheat_yara.c

static const char cheat_payload[] =
    "PEREGRINE_CHEAT_MARKER_v1\0"
    "[cheat_config]\n"
    "aimbot_fov=2.5\n"
    "aimbot_smooth=0.8\n"
    "esp_enabled=1\n"
    "esp_box=1\n"
    "esp_health=1\n"
    "triggerbot_delay=45\n";

The payload contains exactly the strings the PeregrineTestCheat rule looks for: the marker, the config header, and the aimbot/ESP configuration keys. The test binary injects this payload into a target process using the standard injection sequence:

// From: test/cheat_yara.c

HANDLE hProc = OpenProcess(
    PROCESS_VM_OPERATION | PROCESS_VM_WRITE, FALSE, pid);
// ...

LPVOID remote = VirtualAllocEx(hProc, NULL, 4096,
    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// ...

SIZE_T written = 0;
WriteProcessMemory(hProc, remote, cheat_payload, sizeof(cheat_payload), &written);

The sequence is: OpenProcess [6] with PROCESS_VM_OPERATION and PROCESS_VM_WRITE access, VirtualAllocEx [7] to allocate a page of committed memory, and WriteProcessMemory [8] to copy the payload. The allocation uses PAGE_READWRITE rather than an executable protection because the payload is data, not code. This distinction matters: the YARA scanner finds the payload in a READWRITE region, not an executable one. Unlike the VAD scanner, which specifically targets executable private memory, the YARA scanner checks all readable regions and can detect cheat configuration data regardless of whether it sits next to executable code.

The test binary waits for user input before cleaning up, giving time to run the YARA scan:

// From: test/cheat_yara.c

printf("[YARA-TEST] Wrote %zu bytes at 0x%p\n", written, remote);
printf("[YARA-TEST] Click YARA in Peregrine to detect.\n");
printf("[YARA-TEST] Press Enter to cleanup.\n");
getchar();

VirtualFreeEx(hProc, remote, 0, MEM_RELEASE);
CloseHandle(hProc);

After the scan confirms detection, pressing Enter calls VirtualFreeEx [9] to release the remote allocation and CloseHandle to close the process handle.

How YARA Scanning Complements VAD Scanning

The VAD scanner and the YARA scanner answer different questions about the same address space.

VAD scanning asks: “Is there executable memory here that should not exist?” It finds regions with suspicious structural properties, specifically committed, executable, non-image-backed memory. The answer is binary: the region is structurally suspicious or it is not. VAD scanning does not examine the contents of the memory; it only looks at the metadata.

YARA scanning asks: “Does this memory contain known bad content?” It reads the actual bytes and matches them against signatures. A cheat configuration block sitting in a READWRITE heap allocation has perfectly normal structural properties but contains identifiable strings. YARA catches what VAD scanning cannot see.

The two complement each other:

  • VAD scanning catches structurally anomalous memory: manually mapped DLLs, reflectively loaded code, shellcode injections. These have abnormal protection and type flags but may not contain known signatures if the cheat is novel.
  • YARA scanning catches content-identified threats: known cheat frameworks, configuration formats, signature strings. These may reside in structurally normal memory regions.
  • Together, they cover both unknown threats with suspicious structure and known threats with identifiable content.

For context on how these detection modules fit into the broader system, see the architecture overview.

Limitations: Performance, Evasion, and Rule Maintenance

Scanning an entire address space is expensive. A typical 64-bit process can have hundreds of committed regions totaling hundreds of megabytes or more. The scanner reads every readable region of up to 16 MB and runs the full rule set against each one, so for a game process with a large working set it may read and scan gigabytes of memory. Each call to scan_process also reloads and recompiles rules.yar before the walk begins. The bytes_scanned counter returned by scan_process reports the total volume; on a large process, a single scan can take multiple seconds. This makes continuous scanning impractical. The YARA scan is best used as a periodic or on-demand check, not a real-time monitor.
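To decide how often a scan is affordable, it helps to turn bytes_scanned and wall-clock time into a throughput figure. The helper below is a hypothetical sketch, not part of Peregrine. The workload in main is a placeholder standing in for a call to scan_process, so the number it prints says nothing about real scan speed.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: MiB per second for a scan that covered
/// `bytes_scanned` bytes in `elapsed`. Returns 0.0 for a zero duration.
fn throughput_mib_s(bytes_scanned: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return 0.0;
    }
    (bytes_scanned as f64 / (1024.0 * 1024.0)) / secs
}

fn main() {
    let start = Instant::now();
    // Placeholder workload standing in for scan_process(pid).
    let data = vec![0u8; 8 * 1024 * 1024];
    let bytes_scanned = data.iter().filter(|b| **b == 0).count();
    let elapsed = start.elapsed();
    println!(
        "{} bytes in {:?} ({:.1} MiB/s)",
        bytes_scanned,
        elapsed,
        throughput_mib_s(bytes_scanned, elapsed)
    );
}
```

As a worked figure, 512 MiB scanned in 2 seconds is 256 MiB/s; at that rate a 2 GiB address space would take about 8 seconds per pass.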

Encryption and packing defeat content-based scanning. If a cheat encrypts its configuration in memory and only decrypts it into a short-lived buffer during use, the YARA rule will not match during the window when the data is encrypted. Similarly, packed or obfuscated cheat binaries may not contain recognizable strings until unpacked at runtime. This is a fundamental limitation of any signature-based approach: the scanner can only match what it can read in plaintext.

Rule maintenance is ongoing work. YARA rules are only as good as the signatures they contain. New cheats require new rules. Cheat developers who discover which strings trigger detection can trivially rename variables, change configuration formats, or split strings across multiple allocations to break exact-match rules. Hex patterns with wildcards are more resilient to minor variations, but maintaining an effective rule set against actively evolving cheats requires continuous analysis of new cheat samples.

The 16 MB region cap means very large allocations are skipped. The MAX_REGION_READ constant limits reads to 16 MB per region. Any single region larger than this is silently skipped. A cheat could theoretically allocate its payload inside a very large region to avoid scanning, though this would be an unusual allocation pattern that might trigger other heuristics.
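One possible mitigation, which Peregrine does not currently implement, is to read oversized regions in fixed-size chunks that overlap slightly, so a pattern straddling a chunk boundary is still seen whole in one of them. The sketch below only computes the chunk ranges; chunk_ranges and its parameters are hypothetical, and the overlap would need to be at least the longest pattern length minus one.

```rust
/// Hypothetical helper: splits `size` bytes into (offset, length) chunks of
/// at most `chunk` bytes, where consecutive chunks share `overlap` bytes.
fn chunk_ranges(size: usize, chunk: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(chunk > overlap, "chunk must be larger than overlap");
    let mut out = Vec::new();
    let mut off = 0;
    while off < size {
        let len = chunk.min(size - off);
        out.push((off, len));
        if off + len >= size {
            break; // this chunk reached the end of the region
        }
        off += chunk - overlap; // step back by `overlap` bytes
    }
    out
}

fn main() {
    // A 40 MiB region read in 16 MiB chunks with 4 KiB of overlap.
    let mib = 1024 * 1024;
    for (off, len) in chunk_ranges(40 * mib, 16 * mib, 4096) {
        println!("read offset 0x{:X}, length 0x{:X}", off, len);
    }
}
```

Matches found in the overlapping bytes would be reported twice, so a real implementation would also need to deduplicate by absolute address.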

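Cross-region matches are missed. Because each region is scanned independently, a pattern that spans two adjacent regions will not match, and a condition that combines several strings, such as the one in PeregrineTestCheat, is satisfied only when all the required strings fall within the same region. In practice, most meaningful signatures (strings, configuration blocks, code sequences) fit within a single allocation, but this is a theoretical gap.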
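The gap is easy to reproduce in isolation. In the sketch below, two byte buffers stand in for adjacent regions, and a naive search stands in for the YARA engine. The helper names are hypothetical. The marker is split across the boundary, so searching each buffer on its own finds nothing, while searching the joined bytes does.

```rust
/// Naive substring test over raw bytes.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Mirrors the scan loop: every region buffer is searched on its own.
fn found_per_region(regions: &[&[u8]], needle: &[u8]) -> bool {
    regions.iter().any(|r| contains(r, needle))
}

/// Searches the regions as one buffer. This is only meaningful when the
/// regions are actually contiguous in the address space.
fn found_when_joined(regions: &[&[u8]], needle: &[u8]) -> bool {
    contains(&regions.concat(), needle)
}

fn main() {
    let needle = b"PEREGRINE_CHEAT_MARKER_v1";
    let region_a: &[u8] = b"....PEREGRINE_CHE";
    let region_b: &[u8] = b"AT_MARKER_v1....";
    let regions = [region_a, region_b];
    println!("per-region: {}", found_per_region(&regions, needle));
    println!("joined:     {}", found_when_joined(&regions, needle));
}
```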


Drafted with LLM assistance from the Peregrine Anti-Cheat source code, reviewed and verified against the actual implementation.

References