Reversing IDA's Lumina Protocol

Complete specification from reverse engineering libida.so and the Lumina server

Background
Architecture Overview
Wire Protocol
Varint Encoding
CalcRel Hash
Metadata Format
Delta Encoding
Tag Reference
Implementation Notes
Reference Tables

Appendices

A: Complete Packet Structures
B: Memory Layouts (from RE)
C: Dazhbog Implementation References
D: Test Vectors

1. Background

Lumina is Hex-Rays' function signature sharing service. You query it with a function hash, you get back names, types, comments. The official server requires a license, and the protocol isn't documented anywhere.

I was using Lumen for a while, an open-source alternative. It worked, but queries were slow and the matching felt rudimentary. I wanted better relevance signals: name similarity scoring, function size proximity, popularity weighting, recency decay, co-occurrence patterns from binaries that share functions. So I reversed the protocol.

The result is Dazhbog. This document is everything I learned along the way: wire format, hash computation, metadata encoding. Validated against 5+ million records, tested with real IDA clients.

2. Architecture Overview

Lumina is conceptually simple. It's a key-value store where:

Key = MD5 hash of a function's "normalized" bytes (more on this later)
Value = metadata blob containing names, types, comments, stack frames

The protocol is binary RPC over TLS on port 443 (or 20667 for non-TLS). Client sends a request, server sends a response. No persistent connections, no streaming, no fancy stuff.

┌─────────────────┐                    ┌─────────────────┐
│    IDA Client   │                    │  Lumina Server  │
│                 │                    │                 │
│  ┌───────────┐  │     TLS/443        │  ┌───────────┐  │
│  │ libida.so │  │◄──────────────────►│  │  lumina   │  │
│  └───────────┘  │                    │  └───────────┘  │
│                 │                    │                 │
│  "What's this   │                    │  "That's        │
│   function?"    │                    │   malloc()."    │
└─────────────────┘                    └─────────────────┘

The conversation looks like this:

Client                Server
  │                      │
  ├───── HELO ──────────►│  "IDA 8.3, here's my license"
  │                      │
  │◄────── OK ───────────┤  "You have these features"
  │                      │
  ├─── PullMetadata ────►│  "Know these 50 hashes?"
  │                      │
  │◄── PullResult ───────┤  "Here's 23 matches"
  │                      │
  ├─── PushMetadata ────►│  "I analyzed these"
  │                      │
  │◄────── OK ───────────┤  "Stored"
  │                      │

Simple enough. The details are where it gets interesting.

3. Wire Protocol

Every packet has the same framing:

┌────────────────┬──────────┬─────────────────────┐
│ Length (4B)    │ Type (1B)│ Body (variable)     │
└────────────────┴──────────┴─────────────────────┘

Here's your first gotcha: that length field is big-endian. Fixed-width integers elsewhere in the protocol, like func_size in the metadata blob, are little-endian. I don't know why. Maybe someone at Hex-Rays thought it would be funny.

// The moment you realize the endianness is wrong
let len = u32::from_be_bytes(lenb) as usize;  // NOT from_le_bytes!
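
As a minimal sketch of the framing above, here is one way to split a frame off a byte buffer. `Frame` and `split_frame` are hypothetical names, and the sketch assumes the length field counts only the body bytes; the diagram doesn't say whether the type byte is included, so check that against captured traffic.

```rust
/// One decoded packet: message type plus a borrowed body.
#[derive(Debug, PartialEq)]
pub struct Frame<'a> {
    pub msg_type: u8,
    pub body: &'a [u8],
}

/// Splits one frame off the front of `buf`.
/// Returns the frame and the unconsumed tail, or `None` if `buf` is incomplete.
/// ASSUMPTION: `length` counts the body only, not the type byte.
pub fn split_frame(buf: &[u8]) -> Option<(Frame<'_>, &[u8])> {
    let header = buf.get(..5)?;
    // The length is read big-endian, as described above.
    let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let msg_type = header[4];
    let end = 5usize.checked_add(len)?;
    let body = buf.get(5..end)?;
    Some((Frame { msg_type, body }, &buf[end..]))
}
```

Returning the tail makes it easy to call the function in a loop when several packets arrive in one read.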

Message Types

Code	Name	Who Sends It	What It Does
0x0a	OK	Server	"Request succeeded"
0x0b	FAIL	Server	"Something broke" (includes error message)
0x0d	HELO	Client	Version negotiation + auth
0x0e	PullMetadata	Client	"Give me info for these hashes"
0x0f	PullResult	Server	"Here's what I found"
0x10	PushMetadata	Client	"Store this metadata"

There are a few more (0x18 for history deletion, 0x2f for history queries) but these are the ones that matter.
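
The table maps naturally onto an enum. This is only a sketch: `MsgType` is a hypothetical name, and it covers just the six codes listed above, handing any other byte back as an error.

```rust
use std::convert::TryFrom;

/// The six message codes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MsgType {
    Ok = 0x0a,
    Fail = 0x0b,
    Helo = 0x0d,
    PullMetadata = 0x0e,
    PullResult = 0x0f,
    PushMetadata = 0x10,
}

impl TryFrom<u8> for MsgType {
    type Error = u8; // the unrecognized code is handed back to the caller

    fn try_from(code: u8) -> Result<Self, Self::Error> {
        match code {
            0x0a => Ok(MsgType::Ok),
            0x0b => Ok(MsgType::Fail),
            0x0d => Ok(MsgType::Helo),
            0x0e => Ok(MsgType::PullMetadata),
            0x0f => Ok(MsgType::PullResult),
            0x10 => Ok(MsgType::PushMetadata),
            other => Err(other),
        }
    }
}
```

Keeping the unknown code in the error lets a server log it or answer with FAIL instead of dropping the connection.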

The HELO Dance

The handshake is straightforward:

┌─────────┬─────────┬────────┬──────┬──────┐
│ version │ license │ id[6]  │ user │ pass │
│ varint  │ len+data│ 6 bytes│ cstr │ cstr │
└─────────┴─────────┴────────┴──────┴──────┘
                             └──── v3+ ────┘

Protocol versions 1-6 are accepted. Anything higher gets rejected:

// From the server binary
if ( version > 6 )
{
    send_error("This server doesn't support version %d", version);
}

Version 3 added username/password auth. Version 5 added feature flags in the response. If you're building a server, just accept everything and return what IDA expects.

Size Limits (and Trust)

The server tracks whether a client is "trusted." Untrusted clients are limited to 8KB packets. Trusted clients can send up to 2GB.

// Untrusted? You get 8KB max
if ( !is_trusted && packet_len > 0x2000 )
{
    reject();
}

What makes a client trusted? Valid license data. For a private server, you probably want to just trust everyone, or implement your own auth.

4. Varint Encoding

IDA doesn't use protobuf varints. It doesn't use LEB128. It has its own encoding with three variants: pack_dw (16-bit), pack_dd (32-bit), and pack_dq (64-bit).

pack_dd (32-bit)

The most common variant:

First Byte	Format	Bytes Total	How to Decode
`0xxxxxxx`	1	1	Value is the byte itself (0-127)
`10xxxxxx`	2	2	`((b0 & 0x3F) << 8) | b1`
`110xxxxx`	3	5	Ignore b0, read next 4 bytes as BE u32
`111xxxxx`	4	1-5	Continuation encoding

Format 3 is the trap. See that 110xxxxx? The 5 bits in the first byte are not part of the value. They're just a format marker. The actual value is in the next 4 bytes, big-endian.

I spent an embarrassing amount of time debugging this because I assumed the first byte contributed bits to the value. It doesn't.

/// Decodes one pack_dd value and advances `data` past it.
/// Returns `None` on truncated or over-long input instead of panicking.
pub fn unpack_dd(data: &mut &[u8]) -> Option<u32> {
    let (&b0, rest) = data.split_first()?;
    *data = rest;

    if b0 & 0x80 == 0 {
        // Format 1: just the byte
        Some(b0 as u32)
    } else if b0 & 0x40 == 0 {
        // Format 2: 14 bits across 2 bytes
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u32) << 8) | (b1 as u32))
    } else if b0 & 0x20 == 0 {
        // Format 3: IGNORE b0, read 4 bytes big-endian
        // This is the gotcha. Don't use b0's bits.
        let b = data.get(..4)?;
        let v = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
        *data = &data[4..];
        Some(v)
    } else {
        // Format 4: continuation
        let mut value = (b0 & 0x1F) as u32;
        let mut shift = 5;
        loop {
            let (&b, rest) = data.split_first()?;
            *data = rest;
            // checked_shl yields None once shift reaches 32 (malformed input)
            value |= ((b & 0x7F) as u32).checked_shl(shift)?;
            if b & 0x80 == 0 { break; }
            shift += 7;
        }
        Some(value)
    }
}

pack_dq (64-bit)

Same idea, but format 3 can read either 4 or 8 bytes depending on a flag bit:

110x0xxx → read 4 bytes (32-bit value)
110x1xxx → read 8 bytes (64-bit value)
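
A hedged sketch of how that flag bit could be handled. It assumes formats 1 and 2 match pack_dd, that the 8-byte payload is big-endian like the 4-byte one, and it leaves out format 4 entirely; none of those three points is spelled out above, so treat them as assumptions.

```rust
/// Sketch of a pack_dq decoder covering formats 1-3 only.
/// ASSUMPTIONS: formats 1 and 2 match pack_dd, the 8-byte payload is
/// big-endian like the 4-byte one, and format 4 (111xxxxx) is not handled.
pub fn unpack_dq(data: &mut &[u8]) -> Option<u64> {
    let (&b0, rest) = data.split_first()?;
    *data = rest;

    if b0 & 0x80 == 0 {
        Some(b0 as u64)
    } else if b0 & 0x40 == 0 {
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u64) << 8) | b1 as u64)
    } else if b0 & 0x20 == 0 {
        // Bit 3 of the marker byte selects the payload width.
        let width = if b0 & 0x08 == 0 { 4 } else { 8 };
        let payload = data.get(..width)?;
        let mut value = 0u64;
        for &b in payload {
            value = (value << 8) | b as u64;
        }
        *data = &data[width..];
        Some(value)
    } else {
        None // format 4 is outside this sketch
    }
}
```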

Encoding Direction

Going the other way is simpler because you just pick the smallest format that fits:

fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

5. CalcRel Hash

How do you hash a function so that the same code at different addresses produces the same hash?

The Problem

Consider this x86 snippet:

func_example:
    push ebp              ; 55
    mov ebp, esp          ; 89 E5
    call some_func        ; E8 73 56 34 12  ← relative offset!
    mov eax, [global]     ; A1 21 43 65 87  ← absolute address!
    pop ebp               ; 5D
    ret                   ; C3

If you hash the raw bytes 55 89 E5 E8 73 56 34 12 A1 21 43 65 87 5D C3, you'll get a different hash every time the function is compiled at a different address, because those call and mov operands change.

The Solution: Placeholder Masks

IDA's processor modules implement something called ev_calcrel (event 0x52). For each instruction, it returns:

The raw instruction bytes
A mask indicating which bits are position-dependent

The mask semantics are:

0 = keep this bit (opcode, register, etc.)
1 = mask this bit (address/offset, set to zero)

Then you compute: normalized = raw & ~mask

Worked Example

Instruction      Raw Bytes         Mask              Normalized
───────────────  ────────────────  ────────────────  ────────────────
push ebp         55                00                55
mov ebp, esp     89 E5             00 00             89 E5
call +0x12345673 E8 73 56 34 12    00 FF FF FF FF    E8 00 00 00 00
mov eax, [abs]   A1 21 43 65 87    00 FF FF FF FF    A1 00 00 00 00
pop ebp          5D                00                5D
ret              C3                00                C3

Normalized stream: 55 89 E5 E8 00 00 00 00 A1 00 00 00 00 5D C3

Hash that with MD5 and you've got your position-independent function signature.
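
The masking step from the worked example can be sketched in a few lines. `normalize` is a hypothetical helper name; the MD5 step is left out because it needs an external crate.

```rust
/// Applies `normalized = raw & !mask` byte by byte.
/// Returns `None` if the two slices differ in length.
pub fn normalize(raw: &[u8], mask: &[u8]) -> Option<Vec<u8>> {
    if raw.len() != mask.len() {
        return None;
    }
    Some(raw.iter().zip(mask).map(|(&r, &m)| r & !m).collect())
}

/// Raw bytes and masks of the six instructions in the worked example.
pub const RAW: [u8; 15] = [
    0x55, 0x89, 0xE5, 0xE8, 0x73, 0x56, 0x34, 0x12,
    0xA1, 0x21, 0x43, 0x65, 0x87, 0x5D, 0xC3,
];
pub const MASK: [u8; 15] = [
    0x00, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF,
    0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00,
];
```

Calling `normalize(&RAW, &MASK)` should reproduce the normalized stream shown above, which is the input to the MD5 step.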

The Decompiler Output (Annotated)

Here's what the normalization loop looks like after decompilation. I've added comments because the original variable names are... not helpful.

// This is the core loop that builds the normalized byte stream
while ( 1 )
{
    mask_ptr = placeholder_mask;
    do
    {
        mask_byte = *mask_ptr;
        // THE CRITICAL LINE: raw AND (NOT mask)
        normalized_byte = raw_byte & ~mask_byte;
        output[out_idx] = normalized_byte;

        raw_byte >>= 8;
        // ... continues for each byte of instruction
    }
    while ( byte_idx < insn_len );
    // ... next instruction
}

That raw_byte & ~mask_byte is the entire algorithm. The rest is just iteration.

Architecture Coverage

Each processor module knows what to mask:

Architecture	What Gets Masked
x86/x64	Call/jmp offsets, absolute addresses, ModRM displacements
ARM	B/BL offsets, PC-relative loads (LDR rx, [pc, #offset])
ARM64	ADRP immediates, branch targets
MIPS	J/JAL targets, branch offsets

Multi-Chunk Functions

Functions can have multiple non-contiguous chunks (think: cold code, exception handlers). All chunks contribute to a single hash, iterated in order:

func_tail_iterator_set();  // Start iterating chunks
total_size = 0;

for ( i = 0; ; ++i )
{
    total_size += chunk_end - chunk_start;
    next_chunk = get_fchunk_1();
    if ( !next_chunk )
        break;
    // Process chunk...
}

6. Metadata Format

The hash gets you a lookup key. The value is a metadata blob containing everything IDA knows about the function: name, type signature, comments, stack frame layout, variable types.

Blob Structure

┌─────────────────┬─────────────────────────────────────────────┐
│ func_size (4B)  │ Tagged Blocks...                            │
│ little-endian   │                                             │
└─────────────────┴─────────────────────────────────────────────┘

First 4 bytes are the function size. Then comes a sequence of TLV (tag-length-value) blocks:

┌──────────────┬──────────────┬─────────────────────┐
│ Tag (varint) │ Size (varint)│ Data (Size bytes)   │
└──────────────┴──────────────┴─────────────────────┘

Tag Types

Tag	Name	What It Contains
1	FUNC_TYPE_INFO	Serialized function signature (tinfo_t)
3	FUNC_CMT	Function comment
4	FUNC_CMT_REP	Repeatable function comment
5	CMT_REGULAR	Inline comments (with locations)
6	CMT_REPEATABLE	Repeatable inline comments
7	EXTRA_CMT	Anterior/posterior comments
8	SP_DELTA	Stack pointer tracking
9	FRAME_DESC	Complete stack frame layout
10	VAR_TYPE_INFO	Variable type information
11	OP_INFO	Operand information (up to 8 per instruction)

Tag 2 is reserved/unused. Tags 5-8, 10, and 11 use delta encoding for locations (covered in the next section); tag 9 describes the whole frame and carries no location.

Parsing Tags

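use std::io;

pub struct TaggedBlock {
    pub tag: u32,
    pub data: Vec<u8>,
}

pub fn parse_tagged_blocks(data: &[u8]) -> io::Result<(u32, Vec<TaggedBlock>)> {
    // unpack_dd returns Option, so map "ran out of bytes" to an io::Error
    let eof = || io::Error::new(io::ErrorKind::UnexpectedEof, "truncated metadata blob");

    // First 4 bytes: function size
    let header = data.get(..4).ok_or_else(eof)?;
    let func_size = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    let mut remaining = &data[4..];
    let mut blocks = Vec::new();

    while !remaining.is_empty() {
        let tag = unpack_dd(&mut remaining).ok_or_else(eof)?;
        let size = unpack_dd(&mut remaining).ok_or_else(eof)? as usize;

        // Bounds-check the declared size before slicing
        let block_data = remaining.get(..size).ok_or_else(eof)?.to_vec();
        remaining = &remaining[size..];

        blocks.push(TaggedBlock { tag, data: block_data });
    }

    Ok((func_size, blocks))
}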
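
Going the other way, a blob can be assembled from the same pieces. This is a sketch rather than Dazhbog's actual code: `build_blob` is a hypothetical helper, and `pack_dd` is repeated from the varint section so the example stands alone.

```rust
/// Same encoder as in the varint section, repeated so the sketch stands alone.
fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Serializes `func_size` (little-endian) followed by (tag, payload) blocks.
pub fn build_blob(func_size: u32, blocks: &[(u32, &[u8])]) -> Vec<u8> {
    let mut out = func_size.to_le_bytes().to_vec();
    for (tag, payload) in blocks {
        out.extend(pack_dd(*tag));
        out.extend(pack_dd(payload.len() as u32));
        out.extend_from_slice(payload);
    }
    out
}
```

For example, `build_blob(16, &[(3, &b"hello"[..])])` yields the 11 bytes `10 00 00 00 03 05 68 65 6C 6C 6F`: a 4-byte size, tag 3, length 5, then the text.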

7. Delta Encoding

Tags 5-8, 10, and 11 store data at specific locations within the function. Rather than storing absolute offsets for each entry, Lumina uses delta encoding to save space.

How Delta Encoding Works

Entries are sorted by location. The first entry stores its absolute offset. Each subsequent entry stores the delta from the previous one.

Entry 1: chunk=0, offset=16      → encode: chunk=0, offset=16
Entry 2: chunk=0, offset=24      → encode: delta=8  (24-16)
Entry 3: chunk=0, offset=30      → encode: delta=6  (30-24)
Entry 4: chunk=1, offset=4       → encode: 0 (marker), chunk=1, offset=4
Entry 5: chunk=1, offset=12      → encode: delta=8  (12-4)

When the chunk changes, you emit a zero marker followed by the new chunk index and absolute offset.

Decoding

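// `val` is the first varint of every entry
let val = unpack_dd(&mut data)?;
if is_first_entry {
    // First entry: `val` is the chunk index, followed by the absolute offset
    chunk = val;
    offset = unpack_dd(&mut data)?;
    is_first_entry = false;
} else if val == 0 {
    // Chunk change marker
    chunk = unpack_dd(&mut data)?;
    offset = unpack_dd(&mut data)?;
} else {
    // Delta from previous
    offset += val;
}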
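
To see both directions together, here is a sketch that works on already-decoded varint values rather than bytes. `encode_locs` and `decode_locs` are hypothetical helpers; they model only the location fields (per-entry payloads are left out) and assume the input is sorted with strictly increasing offsets inside each chunk, since a delta of zero would be indistinguishable from the chunk-change marker.

```rust
/// A location inside a function: (chunk index, byte offset within the chunk).
pub type Loc = (u32, u32);

/// Encodes sorted locations into the value sequence described above.
/// ASSUMPTION: offsets strictly increase within a chunk.
pub fn encode_locs(locs: &[Loc]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut prev: Option<Loc> = None;
    for &(chunk, offset) in locs {
        match prev {
            // First entry: absolute chunk and offset
            None => out.extend_from_slice(&[chunk, offset]),
            // Same chunk: delta from the previous offset
            Some((pc, po)) if pc == chunk => out.push(offset - po),
            // New chunk: zero marker, then absolute chunk and offset
            Some(_) => out.extend_from_slice(&[0, chunk, offset]),
        }
        prev = Some((chunk, offset));
    }
    out
}

/// Inverse of `encode_locs`, mirroring the three decoding branches.
pub fn decode_locs(vals: &[u32]) -> Option<Vec<Loc>> {
    let mut it = vals.iter().copied();
    let mut out = Vec::new();
    let (mut chunk, mut offset) = (0u32, 0u32);
    let mut is_first_entry = true;
    while let Some(val) = it.next() {
        if is_first_entry {
            chunk = val;
            offset = it.next()?;
            is_first_entry = false;
        } else if val == 0 {
            chunk = it.next()?;
            offset = it.next()?;
        } else {
            offset = offset.checked_add(val)?;
        }
        out.push((chunk, offset));
    }
    Some(out)
}
```

Fed the five entries from the example, the encoder produces `[0, 16, 8, 6, 0, 1, 4, 8]`, and the decoder turns that back into the original locations.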

The +1 Encoding

Many fields can be negative (stack offsets, SP deltas). Varints are unsigned. Solution? Add 1 before encoding, subtract 1 after decoding.

Actual Value	Encoded Value
-1	0
0	1
1	2
n	n + 1

This shows up in:

SP delta values (tag 8)
Frame size and args size (tag 9)
Stack offsets in frame variables
Various operand fields

// Decoding
let encoded = unpack_dq(&mut data)?;
let actual = (encoded as i64) - 1;  // Don't forget the -1!

// Encoding
let encoded = (actual + 1) as u64;
pack_dq(encoded);

I found this by noticing stack offsets were always off by one.
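
A small pair of helpers keeps the adjustment in one place. The names are hypothetical, and wrapping arithmetic is my choice here so that extreme values cannot trigger an overflow panic.

```rust
/// Maps a signed value to its on-wire form: -1 becomes 0, 0 becomes 1, n becomes n + 1.
pub fn encode_plus1(actual: i64) -> u64 {
    actual.wrapping_add(1) as u64
}

/// Inverse of `encode_plus1`.
pub fn decode_plus1(encoded: u64) -> i64 {
    (encoded as i64).wrapping_sub(1)
}
```

Only -1 gets a compact encoding this way. Anything below -1 wraps to a very large unsigned value, so it would need the widest varint form.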

8. Tag Reference

Let's go through each tag format. These are based on actual data from millions of records.

Tag 1: Function Type Info

The serialized tinfo_t structure. IDA's internal type representation.

┌──────────────────┬────────────────────────────────┐
│ flags (1 byte)   │ serialized tinfo data          │
└──────────────────┴────────────────────────────────┘

The tinfo format is complex and out of scope here. Just store it opaquely.

Tags 3-4: Function Comments

Plain UTF-8 text, no null terminator:

┌────────────────────────────────────────────────────┐
│ comment text (UTF-8)                               │
└────────────────────────────────────────────────────┘

Tag 3 is non-repeatable (shown only at function start). Tag 4 is repeatable (shown at every xref).

Tags 5-6: Inline Comments

Comments at specific addresses within the function.

[location via delta encoding]
pack_dd(text_length)      ← includes null in count!
[text_bytes]              ← length-1 actual bytes

Gotcha: The length field includes a conceptual null terminator, but the actual bytes don't contain it. For "hello" (5 chars), length = 6, but you read 5 bytes.

let text_len = unpack_dd(&mut data)? as usize;
// The length counts a null terminator that is not stored
let actual_len = text_len.saturating_sub(1);
let text = &data[..actual_len];
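
Here is the same rule as a write/read pair, limited to the text field. Both function names are hypothetical, and the sketch assumes the text is UTF-8 and short enough that the length fits the 1-byte varint form.

```rust
/// Writes a comment string: a length that counts a null terminator,
/// followed by the text bytes without that null.
/// ASSUMPTION: the length fits the 1-byte varint form (text under 127 bytes).
pub fn write_comment_text(text: &str) -> Option<Vec<u8>> {
    let len_with_null = text.len() + 1;
    if len_with_null > 0x7f {
        return None; // longer strings need the multi-byte varint forms
    }
    let mut out = vec![len_with_null as u8];
    out.extend_from_slice(text.as_bytes());
    Some(out)
}

/// Reads the text back, returning it together with the unconsumed tail.
pub fn read_comment_text(data: &[u8]) -> Option<(&str, &[u8])> {
    let (&len_byte, rest) = data.split_first()?;
    if len_byte & 0x80 != 0 {
        return None; // multi-byte varint, not handled in this sketch
    }
    let actual_len = (len_byte as usize).saturating_sub(1);
    let text = std::str::from_utf8(rest.get(..actual_len)?).ok()?;
    Some((text, &rest[actual_len..]))
}
```

For "hello" this writes `06 68 65 6C 6C 6F`: the length byte says 6, but only 5 text bytes follow.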

Tag 7: Extra Comments

Comments above or below an instruction line.

[location via delta encoding]
pack_dd(anterior_length)
[anterior_bytes]          ← if length > 0
pack_dd(posterior_length)
[posterior_bytes]         ← if length > 0

Tag 8: SP Delta

Stack pointer tracking points for call analysis.

[location via delta encoding]
pack_dq(delta + 1)        ← +1 encoded!

Tag 9: Frame Description

The most complex tag. Complete stack frame layout.

pack_dq(frame_size + 1)   ← +1 encoded
pack_dq(args_size + 1)    ← +1 encoded, 0 means "unknown" (actual -1)
pack_dw(flags)
pack_dd(var_count)

For each variable:
  pack_dd(name_length)
  [name_bytes]
  pack_dq(offset + 1)     ← +1 encoded
  pack_dq(size)
  pack_dd(type_length)
  [type_bytes]
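
A sketch of reading just the four header fields of this layout. `FrameHeader`, `read_frame_header`, and `read_small` are hypothetical names. To stay short, `read_small` only understands the 1- and 2-byte varint forms, which this document shows as identical across pack_dw, pack_dd, and pack_dq; wider values make it return `None`.

```rust
/// Reads a varint in its 1- or 2-byte form only.
/// Wider forms are not handled here and yield `None`.
fn read_small(data: &mut &[u8]) -> Option<u64> {
    let (&b0, rest) = data.split_first()?;
    *data = rest;
    if b0 & 0x80 == 0 {
        Some(b0 as u64)
    } else if b0 & 0x40 == 0 {
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u64) << 8) | b1 as u64)
    } else {
        None
    }
}

#[derive(Debug, PartialEq)]
pub struct FrameHeader {
    pub frame_size: i64,
    pub args_size: i64, // -1 means "unknown"
    pub flags: u16,
    pub var_count: u32,
}

/// Decodes the fixed part of a tag 9 payload, leaving `data` at the first variable.
pub fn read_frame_header(data: &mut &[u8]) -> Option<FrameHeader> {
    Some(FrameHeader {
        frame_size: read_small(data)? as i64 - 1, // +1 encoded
        args_size: read_small(data)? as i64 - 1,  // +1 encoded
        flags: read_small(data)? as u16,
        var_count: read_small(data)? as u32,
    })
}
```

With the input `49 00 01 00` this gives frame_size=72, args_size=-1 (unknown), flags=1, and var_count=0.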

Tag 10: Variable Type Info

Per-instruction type information.

[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)

Tag 11: Operand Info

Per-instruction operand information (up to 8 operands per instruction).

[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)

9. Implementation Notes

Dazhbog is the implementation I built to validate this spec.

Storage Format

Records need more than just the metadata blob. You want:

The 128-bit hash key
Timestamp (for history/versioning)
Function name (for display without parsing metadata)
Popularity score (how many times it's been requested)

┌─────────────────────────────────────────────────────┐
│                 Header (12 bytes)                   │
├────────────┬────────────┬───────────────────────────┤
│ magic (4B) │ length (4B)│ checksum (4B)             │
│ "LMN1"     │            │                           │
├────────────┴────────────┴───────────────────────────┤
│                 Body (variable)                     │
│  key_lo (8B) | key_hi (8B) | timestamp (8B) | ...   │
│  name (variable) | metadata (variable)              │
└─────────────────────────────────────────────────────┘

Real Data Examples

From a database with 5+ million records, here's what typical entries look like:

Record: "_fetch_headers"
  Hash: 0x80df1f34c6f4cd3ecff5973f7ef61cb8
  Timestamp: 2025-11-18 15:17:54 UTC
  Popularity: 12
  Blocks:
    - tag=1 (FuncTypeInfo): 111 bytes
    - tag=9 (FrameDesc): frame_size=152, args=unknown
    - tag=10 (VarTypeInfo): 7 entries

Record: "patch_handler_697"
  Hash: 0xb6bf85ebed7a4cf7ff9cf6b12593ff54
  Blocks:
    - tag=1 (FuncTypeInfo): 26 bytes
    - tag=5 (Comment): chunk=0, off=58, "patch_697"
    - tag=6 (RepComment): chunk=0, off=58, "patch_697"
    - tag=9 (FrameDesc): frame_size=72
    - tag=10 (VarTypeInfo): 7 entries

Most records have tags 1, 9, and 10. Comments (tags 3-7) are less common. SP deltas (tag 8) are rare.

What Matters for Matching

The hash is the primary key. Everything else is metadata enrichment. When returning results:

Exact hash match is required
Most recent entry wins if there are duplicates
Popularity can be used for ranking when multiple functions in a binary match

10. Reference Tables

Varint Quick Reference

Encoding	First Byte	Total Bytes	Max Value
pack_dw	`0xxxxxxx`	1	127
pack_dw	`10xxxxxx`	2	16,383
pack_dw	`11xxxxxx`	3	65,535
pack_dd	`0xxxxxxx`	1	127
pack_dd	`10xxxxxx`	2	16,383
pack_dd	`110xxxxx`	5	4,294,967,295
pack_dd	`111xxxxx`	1-5	varies

Message Types

Code	Name	Direction
0x0a	OK	S→C
0x0b	FAIL	S→C
0x0d	HELO	C→S
0x0e	PullMetadata	C→S
0x0f	PullResult	S→C
0x10	PushMetadata	C→S

Tag Summary

Tag	Name	Has Location	+1 Encoded Fields
1	FUNC_TYPE_INFO	No	No
3	FUNC_CMT	No	No
4	FUNC_CMT_REP	No	No
5	CMT_REGULAR	Yes (delta)	No
6	CMT_REPEATABLE	Yes (delta)	No
7	EXTRA_CMT	Yes (delta)	No
8	SP_DELTA	Yes (delta)	Yes (delta value)
9	FRAME_DESC	No	Yes (sizes, offsets)
10	VAR_TYPE_INFO	Yes (delta)	Yes (value1 only)
11	OP_INFO	Yes (delta)	Yes (values)

Common Pitfalls

Problem	Symptom	Solution
Wrong endianness on packet length	Connection drops immediately	Use `from_be_bytes` for length only
Varint format 3 misparse	Huge garbage values	Don't use first byte's bits as value
Missing +1 decode	Stack offsets off by one	Subtract 1 after decode
String length includes null	Strings have trailing garbage	Read `length - 1` bytes
Delta encoding chunk change	Locations jump wildly	Check for 0 marker

Appendix A: Complete Packet Structures

For those implementing a client or server, here are the exact byte layouts.

HELO Packet (0x0d)

┌───────────────────────────────────────────┐
│ protocol_version (varint)                 │
├───────────────────────────────────────────┤
│ license_data_len (varint)                 │
├───────────────────────────────────────────┤
│ license_data (license_data_len bytes)     │
├───────────────────────────────────────────┤
│ license_id (6 bytes, fixed)               │
├───────────────────────────────────────────┤
│ username (cstring) ────────────── v3+     │
├───────────────────────────────────────────┤
│ password (cstring) ────────────── v3+     │
└───────────────────────────────────────────┘

HELO Response - OK (0x0a)

┌───────────────────────────────────────────┐
│ features (varint) ─────────────── v5+     │
└───────────────────────────────────────────┘

FAIL Response (0x0b)

┌───────────────────────────────────────────┐
│ error_code (varint)                       │
├───────────────────────────────────────────┤
│ error_message (cstring)                   │
└───────────────────────────────────────────┘

PullMetadata Request (0x0e)

┌───────────────────────────────────────────┐
│ flags (varint)                            │
├───────────────────────────────────────────┤
│ count (varint)                            │
├───────────────────────────────────────────┤
│ For each function:                        │
│   ┌───────────────────────────────────┐   │
│   │ md5_hash (16 bytes)               │   │
│   ├───────────────────────────────────┤   │
│   │ func_size (varint)                │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘
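
As an illustration of this layout, a request body could be assembled like so. `build_pull_request` is a hypothetical helper, and the sketch assumes every "varint" field here uses pack_dd, which the layout does not state explicitly. The framing header (length and type byte) is not included.

```rust
/// pack_dd as shown in the varint section, repeated so the sketch stands alone.
fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Builds a PullMetadata body from (md5, func_size) pairs.
/// ASSUMPTION: flags, count, and func_size are all pack_dd varints.
pub fn build_pull_request(flags: u32, funcs: &[([u8; 16], u32)]) -> Vec<u8> {
    let mut out = pack_dd(flags);
    out.extend(pack_dd(funcs.len() as u32));
    for (hash, func_size) in funcs {
        out.extend_from_slice(hash);
        out.extend(pack_dd(*func_size));
    }
    out
}
```

A single 200-byte function with flags=0 comes out as 20 bytes: `00 01`, the 16 hash bytes, then `80 C8` for the size.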

PullResult Response (0x0f)

┌───────────────────────────────────────────┐
│ found_count (varint)                      │
├───────────────────────────────────────────┤
│ For each result:                          │
│   ┌───────────────────────────────────┐   │
│   │ status (varint): 0=miss, 1=hit    │   │
│   ├───────────────────────────────────┤   │
│   │ If status == 1:                   │   │
│   │   ┌───────────────────────────┐   │   │
│   │   │ score (varint)            │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ name_len (varint)         │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ name (name_len bytes)     │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ metadata_len (varint)     │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ metadata (metadata_len)   │   │   │
│   │   └───────────────────────────┘   │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘

PushMetadata Request (0x10)

┌───────────────────────────────────────────┐
│ flags (varint)                            │
├───────────────────────────────────────────┤
│ idb_filepath_len (varint)                 │
├───────────────────────────────────────────┤
│ idb_filepath (idb_filepath_len bytes)     │
├───────────────────────────────────────────┤
│ input_filepath_len (varint)               │
├───────────────────────────────────────────┤
│ input_filepath (input_filepath_len bytes) │
├───────────────────────────────────────────┤
│ input_md5 (16 bytes)                      │
├───────────────────────────────────────────┤
│ hostname_len (varint)                     │
├───────────────────────────────────────────┤
│ hostname (hostname_len bytes)             │
├───────────────────────────────────────────┤
│ func_count (varint)                       │
├───────────────────────────────────────────┤
│ For each function:                        │
│   ┌───────────────────────────────────┐   │
│   │ func_md5 (16 bytes)               │   │
│   ├───────────────────────────────────┤   │
│   │ func_name_len (varint)            │   │
│   ├───────────────────────────────────┤   │
│   │ func_name (func_name_len bytes)   │   │
│   ├───────────────────────────────────┤   │
│   │ func_size (varint)                │   │
│   ├───────────────────────────────────┤   │
│   │ metadata_len (varint)             │   │
│   ├───────────────────────────────────┤   │
│   │ metadata (metadata_len bytes)     │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘

Appendix B: Memory Layouts (from RE)

These structures were recovered from reverse engineering libida.so. Useful if you're trying to understand the decompiler output.

Comment Entry (32 bytes)

struct comment_entry {
    int32_t  chunk_index;    // 0x00: Which function chunk (0 for main)
    int32_t  offset;         // 0x04: Byte offset within chunk
    char    *text;           // 0x08: Pointer to comment text
    size_t   length;         // 0x10: Text length (excluding null)
    size_t   capacity;       // 0x18: Allocated buffer size
};  // Total: 0x20 (32 bytes)

Extra Comment Entry (56 bytes)

struct extra_comment_entry {
    int32_t  chunk_index;     // 0x00
    int32_t  offset;          // 0x04
    char    *anterior_text;   // 0x08: Comment above instruction
    size_t   anterior_len;    // 0x10
    size_t   anterior_cap;    // 0x18
    char    *posterior_text;  // 0x20: Comment below instruction
    size_t   posterior_len;   // 0x28
    size_t   posterior_cap;   // 0x30
};  // Total: 0x38 (56 bytes)

SP Delta Entry (16 bytes)

struct sp_delta_entry {
    int32_t  chunk_index;    // 0x00
    int32_t  offset;         // 0x04
    int64_t  delta;          // 0x08: Stack pointer change at this point
};  // Total: 0x10 (16 bytes)
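
The stated totals for these three layouts can be sanity-checked by mirroring them with `#[repr(C)]` structs. This is only a sketch: the Rust type names are mine, and the sizes hold on a 64-bit target where pointers and `size_t` are 8 bytes.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct CommentEntry {
    pub chunk_index: i32,
    pub offset: i32,
    pub text: *const u8,
    pub length: usize,
    pub capacity: usize,
}

#[repr(C)]
pub struct ExtraCommentEntry {
    pub chunk_index: i32,
    pub offset: i32,
    pub anterior_text: *const u8,
    pub anterior_len: usize,
    pub anterior_cap: usize,
    pub posterior_text: *const u8,
    pub posterior_len: usize,
    pub posterior_cap: usize,
}

#[repr(C)]
pub struct SpDeltaEntry {
    pub chunk_index: i32,
    pub offset: i32,
    pub delta: i64,
}

/// Returns the three struct sizes for comparison with the stated totals.
/// ASSUMPTION: 64-bit target (8-byte pointers and usize).
pub fn entry_sizes() -> (usize, usize, usize) {
    (
        size_of::<CommentEntry>(),
        size_of::<ExtraCommentEntry>(),
        size_of::<SpDeltaEntry>(),
    )
}
```

On a 64-bit build `entry_sizes()` returns `(32, 56, 16)`, matching the 0x20, 0x38, and 0x10 totals.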

Frame Description

struct frame_desc {
    int64_t  frame_size;     // Total stack frame size
    int64_t  args_size;      // Size of arguments area (-1 if unknown)
    uint16_t flags;          // Frame flags
    uint32_t var_count;      // Number of local variables
    // Followed by var_count local_var_t entries
};

struct local_var_t {
    char    *name;           // Variable name
    size_t   name_len;
    int64_t  offset;         // Stack offset (can be negative)
    uint64_t size;           // Size in bytes
    uint8_t *type_info;      // Serialized tinfo_t
    size_t   type_len;
};

Operand Info Entry (2192 bytes)

This one's large because it supports up to 8 operands per instruction:

struct op_info_entry {
    int32_t  chunk_index;              // 0x00
    int32_t  offset;                   // 0x04
    uint64_t flags;                    // 0x08
    operand_info_t operands[8];        // 0x10: 8 operand slots × 272 bytes each
};  // Total: 0x890 (2192 bytes)

struct operand_info_t {
    uint64_t addr;           // 0x00
    uint64_t value1;         // 0x08
    uint64_t value2;         // 0x10
    uint32_t flags;          // 0x18
    uint8_t  reserved[244];  // Remaining fields (not fully mapped); 0x1C + 244 = 0x110
};  // Total: 0x110 (272 bytes)

Appendix C: Dazhbog Implementation References

If you're reading the Dazhbog source code, here's where to find things:

Feature	File	Lines	Notes
Packet read/write	src/lumina.rs	504-623	Async framing
HELO parsing	src/lumina.rs	150-197	Version detection
Varint decode	src/metadata.rs	58-114	All formats
Varint encode	src/lumina.rs	39-103	pack_dd, pack_dq
Tag parsing	src/metadata.rs	210-466	Main entry point
Comment parsing	src/metadata.rs	531-605	Tags 5, 6
Extra comment parsing	src/metadata.rs	607-680	Tag 7
SP delta parsing	src/metadata.rs	682-720	Tag 8
Frame parsing	src/metadata.rs	770-828	Tag 9
Delta encoding	src/metadata.rs	557-575	Shared logic
Record storage	src/engine/segment.rs	100-130	LMN1 format
Server RPC dispatch	src/server.rs	422-822	Message handling

Appendix D: Test Vectors

For validating your implementation.

Varint Encoding

Value       pack_dd output    pack_dq output
──────────  ────────────────  ────────────────
0           00                00
127         7F                7F
128         80 80             80 80
16383       BF FF             BF FF
16384       C0 00 00 40 00    C0 00 00 40 00
0xDEADBEEF  C0 DE AD BE EF    C0 DE AD BE EF
0xFFFFFFFF  C0 FF FF FF FF    C0 FF FF FF FF
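
One way to use the pack_dd column is a small table-driven check. The encoder is the one from the varint section; `hex`, `PACK_DD_VECTORS`, and `failing_vectors` are hypothetical names for this sketch.

```rust
/// pack_dd as shown in the varint section.
pub fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Formats bytes the way the table prints them, e.g. "C0 00 00 40 00".
pub fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect::<Vec<_>>().join(" ")
}

/// (value, expected pack_dd output) rows copied from the table above.
pub const PACK_DD_VECTORS: [(u32, &str); 7] = [
    (0, "00"),
    (127, "7F"),
    (128, "80 80"),
    (16383, "BF FF"),
    (16384, "C0 00 00 40 00"),
    (0xDEADBEEF, "C0 DE AD BE EF"),
    (0xFFFFFFFF, "C0 FF FF FF FF"),
];

/// Returns the values whose encoding disagrees with the table (ideally none).
pub fn failing_vectors() -> Vec<u32> {
    PACK_DD_VECTORS
        .iter()
        .filter(|(v, want)| hex(&pack_dd(*v)) != *want)
        .map(|(v, _)| *v)
        .collect()
}
```

An empty result from `failing_vectors()` means the encoder agrees with every row.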

Metadata Blob (Minimal)

Hex: 00 00 00 00
Meaning: func_size=0, no tags

Hex: 10 00 00 00 03 05 68 65 6C 6C 6F
     └─────┬────┘ │  │  └────┬─────┘
     func_size=16 │  │  "hello"
                tag=3│
                   size=5

Hex: 20 00 00 00 05 08 00 10 06 68 65 6C 6C 6F
     └─────┬────┘ │  │  │  │  │  └────┬─────┘
     func_size=32 │  │  │  │  │  "hello"
                  │  │  │  │  text_len=6 (includes null)
                  │  │  │  offset=16
                  │  │  chunk=0
                  │  size=8
                  tag=5

Frame Description

Hex: 09 05 80 99 00 01 00
     │  │  └─┬─┘ │  │  │
     │  │    │   │  │  var_count=0
     │  │    │   │  flags=0x0001
     │  │    │   args_size=-1 (+1 encoded as 0)
     │  │    frame_size=152 (+1 encoded: 153 → 80 99)
     │  size=5
     tag=9

Methodology Note

All protocol details were obtained through static reverse engineering of libida.so (IDA Pro's core library) and the Lumina server binary. The decompilation output was... not pretty. Variable names like v91, v89, v188 don't exactly document themselves.

Decompile, trace data flow, capture traffic, build, test, fix, repeat. Dazhbog has processed millions of records at this point.

Acknowledgments

Thanks to everyone who's poked at Lumina before. The IDA SDK docs helped where they existed.

Last updated: December 2025

License: This document is provided for educational and interoperability purposes. IDA Pro is a trademark of Hex-Rays SA.
