Reversing IDA's Lumina Protocol

Complete specification from reverse engineering libida.so and the Lumina server


Table of Contents

  1. Background
  2. Architecture Overview
  3. Wire Protocol
  4. Varint Encoding
  5. CalcRel Hash
  6. Metadata Format
  7. Delta Encoding
  8. Tag Reference
  9. Implementation Notes
  10. Reference Tables

Appendices


1. Background

Lumina is Hex-Rays' function signature sharing service. You query it with a function hash, you get back names, types, comments. The official server requires a license, and the protocol isn't documented anywhere.

I was using Lumen for a while, an open-source alternative. It worked, but queries were slow and the matching felt rudimentary. I wanted better relevance signals: name similarity scoring, function size proximity, popularity weighting, recency decay, co-occurrence patterns from binaries that share functions. So I reversed the protocol.

The result is Dazhbog. This document is everything I learned along the way: wire format, hash computation, metadata encoding. Validated against 5+ million records, tested with real IDA clients.


2. Architecture Overview

Lumina is conceptually simple. It's a key-value store where:

  • Key = MD5 hash of a function's "normalized" bytes (more on this later)
  • Value = metadata blob containing names, types, comments, stack frames

The protocol is binary RPC over TLS on port 443 (or 20667 for non-TLS). Client sends a request, server sends a response. No persistent connections, no streaming, no fancy stuff.

┌─────────────────┐                    ┌─────────────────┐
│    IDA Client   │                    │  Lumina Server  │
│                 │                    │                 │
│  ┌───────────┐  │     TLS/443        │  ┌───────────┐  │
│  │ libida.so │  │◄──────────────────►│  │  lumina   │  │
│  └───────────┘  │                    │  └───────────┘  │
│                 │                    │                 │
│  "What's this   │                    │  "That's        │
│   function?"    │                    │   malloc()."    │
└─────────────────┘                    └─────────────────┘

The conversation looks like this:

Client                Server
  │                      │
  ├───── HELO ──────────►│  "IDA 8.3, here's my license"
  │                      │
  │◄────── OK ───────────┤  "You have these features"
  │                      │
  ├─── PullMetadata ────►│  "Know these 50 hashes?"
  │                      │
  │◄── PullResult ───────┤  "Here's 23 matches"
  │                      │
  ├─── PushMetadata ────►│  "I analyzed these"
  │                      │
  │◄────── OK ───────────┤  "Stored"
  │                      │

Simple enough. The details are where it gets interesting.


3. Wire Protocol

Every packet has the same framing:

┌────────────────┬──────────┬─────────────────────┐
│ Length (4B)    │ Type (1B)│ Body (variable)     │
└────────────────┴──────────┴─────────────────────┘

Here's your first gotcha: that length field is big-endian. Everything else in the entire protocol is little-endian. I don't know why. Maybe someone at Hex-Rays thought it would be funny.

// The moment you realize the endianness is wrong
let len = u32::from_be_bytes(lenb) as usize;  // NOT from_le_bytes!
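
The framing above can be sketched as a small reader. This is a minimal sketch, not Dazhbog's actual code: `read_frame` and `max_body` are names made up here, and it assumes the length field counts only the body and not the type byte, which is worth checking against captured traffic.

```rust
use std::io::{self, Read};

/// Reads one frame: 4-byte big-endian length, 1-byte type, then the body.
/// Assumption: the length covers the body only, not the type byte.
pub fn read_frame<R: Read>(r: &mut R, max_body: usize) -> io::Result<(u8, Vec<u8>)> {
    let mut lenb = [0u8; 4];
    r.read_exact(&mut lenb)?;
    let len = u32::from_be_bytes(lenb) as usize; // big-endian, unlike everything else

    let mut ty = [0u8; 1];
    r.read_exact(&mut ty)?;

    // Refuse oversized bodies before allocating anything.
    if len > max_body {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok((ty[0], body))
}
```

Checking the length before allocating keeps a hostile length field from forcing a huge allocation.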

Message Types

Code  Name          Who Sends It  What It Does
────  ────────────  ────────────  ──────────────────────────────────────────
0x0a  OK            Server        "Request succeeded"
0x0b  FAIL          Server        "Something broke" (includes error message)
0x0d  HELO          Client        Version negotiation + auth
0x0e  PullMetadata  Client        "Give me info for these hashes"
0x0f  PullResult    Server        "Here's what I found"
0x10  PushMetadata  Client        "Store this metadata"

There are a few more (0x18 for history deletion, 0x2f for history queries) but these are the ones that matter.
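
The six codes map naturally onto an enum. This is a minimal sketch with names chosen here for illustration (`MsgType`, `from_code`, `is_request`); the history codes 0x18 and 0x2f are left out and fall through to `None`.

```rust
/// Message type codes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MsgType {
    Ok = 0x0a,
    Fail = 0x0b,
    Helo = 0x0d,
    PullMetadata = 0x0e,
    PullResult = 0x0f,
    PushMetadata = 0x10,
}

impl MsgType {
    /// Maps a wire code to a known type; unknown codes yield None.
    pub fn from_code(code: u8) -> Option<Self> {
        Some(match code {
            0x0a => Self::Ok,
            0x0b => Self::Fail,
            0x0d => Self::Helo,
            0x0e => Self::PullMetadata,
            0x0f => Self::PullResult,
            0x10 => Self::PushMetadata,
            _ => return None,
        })
    }

    /// True for messages the client sends, false for server replies.
    pub fn is_request(self) -> bool {
        matches!(self, Self::Helo | Self::PullMetadata | Self::PushMetadata)
    }
}
```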

The HELO Dance

The handshake is straightforward:

┌─────────┬─────────┬────────┬──────┬──────┐
│ version │ license │ id[6]  │ user │ pass │
│ varint  │ len+data│ 6 bytes│ cstr │ cstr │
└─────────┴─────────┴────────┴──────┴──────┘
                             └──── v3+ ────┘

Protocol versions 1-6 are accepted. Anything higher gets rejected:

// From the server binary
if ( version > 6 )
{
    send_error("This server doesn't support version %d", version);
}

Version 3 added username/password auth. Version 5 added feature flags in the response. If you're building a server, just accept everything and return what IDA expects.

Size Limits (and Trust)

The server tracks whether a client is "trusted." Untrusted clients are limited to 8KB packets. Trusted clients can send up to 2GB.

// Untrusted? You get 8KB max
if ( !is_trusted && packet_len > 0x2000 )
{
    reject();
}

What makes a client trusted? Valid license data. For a private server, you probably want to just trust everyone, or implement your own auth.
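
For a private server, the same check might look like the sketch below. The constant and function names are invented for illustration, and the exact trusted ceiling is an assumption: the text only says "up to 2GB", so `i32::MAX` is used here as a stand-in.

```rust
/// Untrusted limit from the server check above (8 KB).
pub const UNTRUSTED_MAX: usize = 0x2000;
/// Assumed trusted ceiling: roughly 2 GB, taken here as i32::MAX.
pub const TRUSTED_MAX: usize = 0x7FFF_FFFF;

/// Returns true when a packet of `packet_len` bytes should be accepted.
pub fn packet_allowed(is_trusted: bool, packet_len: usize) -> bool {
    let limit = if is_trusted { TRUSTED_MAX } else { UNTRUSTED_MAX };
    packet_len <= limit
}
```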


4. Varint Encoding

IDA doesn't use protobuf varints. It doesn't use LEB128. It has its own encoding with three variants: pack_dw (16-bit), pack_dd (32-bit), and pack_dq (64-bit).

pack_dd (32-bit)

The most common variant:

First Byte  Format  Total Bytes  How to Decode
──────────  ──────  ───────────  ──────────────────────────────────────
0xxxxxxx    1       1            Value is the byte itself (0-127)
10xxxxxx    2       2            ((b0 & 0x3F) << 8) | b1
110xxxxx    3       5            Ignore b0, read next 4 bytes as BE u32
111xxxxx    4       1-5          Continuation encoding

Format 3 is the trap. See that 110xxxxx? The 5 bits in the first byte are not part of the value. They're just a format marker. The actual value is in the next 4 bytes, big-endian.

I spent an embarrassing amount of time debugging this because I assumed the first byte contributed bits to the value. It doesn't.

pub fn unpack_dd(data: &mut &[u8]) -> Option<u32> {
    // Truncated input returns None instead of panicking on an index
    let (&b0, rest) = data.split_first()?;
    *data = rest;

    if b0 & 0x80 == 0 {
        // Format 1: just the byte
        Some(b0 as u32)
    } else if b0 & 0x40 == 0 {
        // Format 2: 14 bits across 2 bytes
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u32) << 8) | (b1 as u32))
    } else if b0 & 0x20 == 0 {
        // Format 3: IGNORE b0, read 4 bytes big-endian
        // This is the gotcha. Don't use b0's bits.
        let b = data.get(..4)?;
        let v = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
        *data = &data[4..];
        Some(v)
    } else {
        // Format 4: continuation
        let mut value = (b0 & 0x1F) as u32;
        let mut shift = 5u32;
        loop {
            let (&b, rest) = data.split_first()?;
            *data = rest;
            // Bail out once the shift amount reaches 32 bits
            value |= ((b & 0x7F) as u32).checked_shl(shift)?;
            if b & 0x80 == 0 { break; }
            shift += 7;
        }
        Some(value)
    }
}

pack_dq (64-bit)

Same idea, but format 3 can read either 4 or 8 bytes depending on a flag bit:

  • 110x0xxx → read 4 bytes (32-bit value)
  • 110x1xxx → read 8 bytes (64-bit value)
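
A decoder for this variant could be sketched as follows. It assumes formats 1, 2, and 4 behave exactly like pack_dd widened to 64 bits, which the text implies with "same idea" but does not spell out; only the format 3 flag bit comes directly from the patterns above.

```rust
/// Sketch of a pack_dq decoder following the layout described above.
/// Assumption: formats 1, 2 and 4 match pack_dd, widened to u64.
pub fn unpack_dq(data: &mut &[u8]) -> Option<u64> {
    let (&b0, rest) = data.split_first()?;
    *data = rest;

    if b0 & 0x80 == 0 {
        // Format 1: single byte
        Some(b0 as u64)
    } else if b0 & 0x40 == 0 {
        // Format 2: 14 bits across two bytes
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u64) << 8) | (b1 as u64))
    } else if b0 & 0x20 == 0 {
        // Format 3: bit 3 of the marker byte picks a 4- or 8-byte payload
        let width = if b0 & 0x08 == 0 { 4 } else { 8 };
        let payload = data.get(..width)?;
        // Fold the payload bytes together in big-endian order
        let value = payload.iter().fold(0u64, |acc, &b| (acc << 8) | (b as u64));
        *data = &data[width..];
        Some(value)
    } else {
        // Format 4: 5 bits from b0, then 7 bits per continuation byte
        let mut value = (b0 & 0x1F) as u64;
        let mut shift = 5u32;
        loop {
            let (&b, rest) = data.split_first()?;
            *data = rest;
            value |= ((b & 0x7F) as u64).checked_shl(shift)?;
            if b & 0x80 == 0 {
                return Some(value);
            }
            shift += 7;
        }
    }
}
```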

Encoding Direction

Going the other way is simpler because you just pick the smallest format that fits:

fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}
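
The 64-bit encoder can follow the same pattern. This is a sketch under one assumption: the marker bytes are 0xC0 for a 4-byte payload and 0xC8 (bit 3 set) for an 8-byte payload, which is the simplest choice consistent with the 110x0xxx / 110x1xxx patterns given for pack_dq.

```rust
/// Sketch of a pack_dq encoder that mirrors pack_dd for small values.
/// Assumption: 0xC0 marks a 4-byte payload, 0xC8 an 8-byte one, both big-endian.
pub fn pack_dq(v: u64) -> Vec<u8> {
    if v <= 0x7f {
        vec![v as u8]
    } else if v <= 0x3fff {
        vec![0x80 | (v >> 8) as u8, v as u8]
    } else if v <= u32::MAX as u64 {
        let mut out = vec![0xc0];
        out.extend_from_slice(&(v as u32).to_be_bytes());
        out
    } else {
        let mut out = vec![0xc8];
        out.extend_from_slice(&v.to_be_bytes());
        out
    }
}
```

For values that fit in 32 bits, the output matches pack_dd byte for byte.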

5. CalcRel Hash

How do you hash a function so that the same code at different addresses produces the same hash?

The Problem

Consider this x86 snippet:

func_example:
    push ebp              ; 55
    mov ebp, esp          ; 89 E5
    call some_func        ; E8 73 56 34 12  ← relative offset!
    mov eax, [global]     ; A1 21 43 65 87  ← absolute address!
    pop ebp               ; 5D
    ret                   ; C3

If you hash the raw bytes 55 89 E5 E8 73 56 34 12 A1 21 43 65 87 5D C3, you'll get a different hash every time the function is compiled at a different address, because those call and mov operands change.

The Solution: Placeholder Masks

IDA's processor modules implement something called ev_calcrel (event 0x52). For each instruction, it returns:

  1. The raw instruction bytes
  2. A mask indicating which bits are position-dependent

The mask semantics are:

  • 0 = keep this bit (opcode, register, etc.)
  • 1 = mask this bit (address/offset, set to zero)

Then you compute: normalized = raw & ~mask

Worked Example

Instruction      Raw Bytes         Mask              Normalized
───────────────  ────────────────  ────────────────  ────────────────
push ebp         55                00                55
mov ebp, esp     89 E5             00 00             89 E5
call +0x12345673 E8 73 56 34 12    00 FF FF FF FF    E8 00 00 00 00
mov eax, [abs]   A1 21 43 65 87    00 FF FF FF FF    A1 00 00 00 00
pop ebp          5D                00                5D
ret              C3                00                C3

Normalized stream: 55 89 E5 E8 00 00 00 00 A1 00 00 00 00 5D C3

Hash that with MD5 and you've got your position-independent function signature.
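
The worked example translates directly into code. The sketch below covers only the masking step; the function names are made up here, the masks would come from the processor module in real IDA, and the MD5 step is left out because the Rust standard library has no MD5 implementation.

```rust
/// Applies a placeholder mask: a mask bit of 1 clears the raw bit, 0 keeps it.
/// Returns None if the two slices differ in length.
pub fn normalize(raw: &[u8], mask: &[u8]) -> Option<Vec<u8>> {
    if raw.len() != mask.len() {
        return None;
    }
    Some(raw.iter().zip(mask).map(|(&r, &m)| r & !m).collect())
}

/// Normalizes each (raw, mask) instruction pair and concatenates the
/// results in order, producing the byte stream that gets hashed.
pub fn normalize_function(insns: &[(&[u8], &[u8])]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for (raw, mask) in insns {
        out.extend(normalize(raw, mask)?);
    }
    Some(out)
}
```

Feeding the six instructions from the table through `normalize_function` yields the normalized stream shown above.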

The Decompiler Output (Annotated)

Here's what the normalization loop looks like after decompilation. I've added comments because the original variable names are... not helpful.

// This is the core loop that builds the normalized byte stream
while ( 1 )
{
    mask_ptr = placeholder_mask;
    do
    {
        mask_byte = *mask_ptr;
        // THE CRITICAL LINE: raw AND (NOT mask)
        normalized_byte = raw_byte & ~mask_byte;
        output[out_idx] = normalized_byte;

        raw_byte >>= 8;
        // ... continues for each byte of instruction
    }
    while ( byte_idx < insn_len );
    // ... next instruction
}

That raw_byte & ~mask_byte is the entire algorithm. The rest is just iteration.

Architecture Coverage

Each processor module knows what to mask:

Architecture  What Gets Masked
────────────  ─────────────────────────────────────────────────────────
x86/x64       Call/jmp offsets, absolute addresses, ModRM displacements
ARM           B/BL offsets, PC-relative loads (LDR rx, [pc, #offset])
ARM64         ADRP immediates, branch targets
MIPS          J/JAL targets, branch offsets

Multi-Chunk Functions

Functions can have multiple non-contiguous chunks (think: cold code, exception handlers). All chunks contribute to a single hash, iterated in order:

func_tail_iterator_set();  // Start iterating chunks
total_size = 0;

for ( i = 0; ; ++i )
{
    total_size += chunk_end - chunk_start;
    next_chunk = get_fchunk_1();
    if ( !next_chunk )
        break;
    // Process chunk...
}

6. Metadata Format

The hash gets you a lookup key. The value is a metadata blob containing everything IDA knows about the function: name, type signature, comments, stack frame layout, variable types.

Blob Structure

┌─────────────────┬─────────────────────────────────────────────┐
│ func_size (4B)  │ Tagged Blocks...                            │
│ little-endian   │                                             │
└─────────────────┴─────────────────────────────────────────────┘

First 4 bytes are the function size. Then comes a sequence of TLV (tag-length-value) blocks:

┌──────────────┬──────────────┬─────────────────────┐
│ Tag (varint) │ Size (varint)│ Data (Size bytes)   │
└──────────────┴──────────────┴─────────────────────┘
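
Going the other direction, a blob can be assembled from a size and a list of blocks. This is a minimal sketch: `encode_blob` is a name invented here, it assumes both the tag and size varints are pack_dd values, and pack_dd is repeated from section 4 so the example compiles on its own.

```rust
/// pack_dd from section 4 (smallest format that fits).
fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Serializes a metadata blob: 4-byte little-endian func_size, then
/// one tag/size/data triple per block.
pub fn encode_blob(func_size: u32, blocks: &[(u32, &[u8])]) -> Vec<u8> {
    let mut out = func_size.to_le_bytes().to_vec();
    for (tag, data) in blocks {
        out.extend(pack_dd(*tag));
        out.extend(pack_dd(data.len() as u32));
        out.extend_from_slice(data);
    }
    out
}
```

A function of size 16 with a single tag 3 block holding "hello" comes out as 10 00 00 00 03 05 68 65 6C 6C 6F.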

Tag Types

Tag  Name            What It Contains
───  ──────────────  ────────────────────────────────────────────
1    FUNC_TYPE_INFO  Serialized function signature (tinfo_t)
3    FUNC_CMT        Function comment
4    FUNC_CMT_REP    Repeatable function comment
5    CMT_REGULAR     Inline comments (with locations)
6    CMT_REPEATABLE  Repeatable inline comments
7    EXTRA_CMT       Anterior/posterior comments
8    SP_DELTA        Stack pointer tracking
9    FRAME_DESC      Complete stack frame layout
10   VAR_TYPE_INFO   Variable type information
11   OP_INFO         Operand information (up to 8 per instruction)

Tag 2 is reserved/unused. Tags 5-8, 10, and 11 use delta encoding for locations (covered in the next section); tag 9 describes the whole frame and carries no location.

Parsing Tags

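use std::io;

pub fn parse_tagged_blocks(data: &[u8]) -> io::Result<(u32, Vec<TaggedBlock>)> {
    // Truncated input becomes an error instead of a panic
    let short = || io::Error::new(io::ErrorKind::UnexpectedEof, "truncated metadata");

    // First 4 bytes: function size
    let head = data.get(..4).ok_or_else(short)?;
    let func_size = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    let mut remaining = &data[4..];
    let mut blocks = Vec::new();

    while !remaining.is_empty() {
        // unpack_dd returns Option, so map None to an io::Error
        let tag = unpack_dd(&mut remaining).ok_or_else(short)?;
        let size = unpack_dd(&mut remaining).ok_or_else(short)? as usize;

        let block_data = remaining.get(..size).ok_or_else(short)?.to_vec();
        remaining = &remaining[size..];

        blocks.push(TaggedBlock { tag, data: block_data });
    }

    Ok((func_size, blocks))
}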

7. Delta Encoding

Tags 5-8, 10, and 11 store data at specific locations within the function. Rather than storing absolute offsets for each entry, Lumina uses delta encoding to save space.

How Delta Encoding Works

Entries are sorted by location. The first entry stores its absolute offset. Each subsequent entry stores the delta from the previous one.

Entry 1: chunk=0, offset=16      → encode: chunk=0, offset=16
Entry 2: chunk=0, offset=24      → encode: delta=8  (24-16)
Entry 3: chunk=0, offset=30      → encode: delta=6  (30-24)
Entry 4: chunk=1, offset=4       → encode: 0 (marker), chunk=1, offset=4
Entry 5: chunk=1, offset=12      → encode: delta=8  (12-4)

When the chunk changes, you emit a zero marker followed by the new chunk index and absolute offset.
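
The encoding side can be sketched as a function that turns sorted locations into the sequence of values to write. `encode_locations` is a name invented here; it returns plain `u32` values, each of which would then go through the varint encoder. It rejects a zero delta because that value is reserved as the chunk-change marker.

```rust
/// Turns sorted (chunk, offset) locations into the value sequence described
/// above. Returns None if offsets within a chunk are not strictly increasing.
pub fn encode_locations(locs: &[(u32, u32)]) -> Option<Vec<u32>> {
    let mut out = Vec::new();
    let mut prev: Option<(u32, u32)> = None;
    for &(chunk, offset) in locs {
        match prev {
            // First entry: absolute chunk and offset, no marker.
            None => out.extend([chunk, offset]),
            // Same chunk: store only the distance from the previous offset.
            Some((c, o)) if c == chunk => {
                let delta = offset.checked_sub(o).filter(|d| *d != 0)?;
                out.push(delta);
            }
            // New chunk: zero marker, then absolute chunk and offset.
            Some(_) => out.extend([0, chunk, offset]),
        }
        prev = Some((chunk, offset));
    }
    Some(out)
}
```

For the five entries in the example, this produces 0, 16, 8, 6, 0, 1, 4, 8.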

Decoding

// `val` is the varint that starts each entry
let val = unpack_dd(&mut data)?;
if is_first_entry {
    chunk = val;
    offset = unpack_dd(&mut data)?;
    is_first_entry = false;
} else if val == 0 {
    // Chunk change marker
    chunk = unpack_dd(&mut data)?;
    offset = unpack_dd(&mut data)?;
} else {
    // Delta from previous
    offset += val;
}
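
Wrapped up as a reusable piece, the decoding logic could look like the sketch below. `LocationCursor` and `advance` are hypothetical names; the `next_val` closure stands in for a varint read such as `unpack_dd`, so the example runs without a byte stream.

```rust
/// Tracks the current location while a tag's entries are decoded.
#[derive(Default)]
pub struct LocationCursor {
    started: bool,
    chunk: u32,
    offset: u32,
}

impl LocationCursor {
    /// Reads the next location. `next_val` stands in for a varint read.
    pub fn advance(&mut self, mut next_val: impl FnMut() -> Option<u32>) -> Option<(u32, u32)> {
        let val = next_val()?;
        if !self.started {
            // First entry: val is the chunk, followed by an absolute offset.
            self.chunk = val;
            self.offset = next_val()?;
            self.started = true;
        } else if val == 0 {
            // Chunk change marker: new chunk index, then absolute offset.
            self.chunk = next_val()?;
            self.offset = next_val()?;
        } else {
            // Delta from the previous offset in the same chunk.
            self.offset = self.offset.checked_add(val)?;
        }
        Some((self.chunk, self.offset))
    }
}
```

In a real parser, each call to `advance` would be followed by reading that entry's payload (comment text, SP delta, and so on) before moving to the next location.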

The +1 Encoding

Many fields can be negative (stack offsets, SP deltas). Varints are unsigned. Solution? Add 1 before encoding, subtract 1 after decoding.

Actual Value  Encoded Value
────────────  ─────────────
-1            0
0             1
1             2
n             n + 1

This shows up in:

  • SP delta values (tag 8)
  • Frame size and args size (tag 9)
  • Stack offsets in frame variables
  • Various operand fields

// Decoding
let encoded = unpack_dq(&mut data)?;
let actual = (encoded as i64) - 1;  // Don't forget the -1!

// Encoding
let encoded = (actual + 1) as u64;
pack_dq(encoded);

I found this by noticing stack offsets were always off by one.


8. Tag Reference

Let's go through each tag format. These are based on actual data from millions of records.

Tag 1: Function Type Info

The serialized tinfo_t structure. IDA's internal type representation.

┌──────────────────┬────────────────────────────────┐
│ flags (1 byte)   │ serialized tinfo data          │
└──────────────────┴────────────────────────────────┘

The tinfo format is complex and out of scope here. Just store it opaquely.

Tags 3 and 4: Function Comments

Plain UTF-8 text, no null terminator:

┌────────────────────────────────────────────────────┐
│ comment text (UTF-8)                               │
└────────────────────────────────────────────────────┘

Tag 3 is non-repeatable (shown only at function start). Tag 4 is repeatable (shown at every xref).

Tags 5 and 6: Inline Comments

Comments at specific addresses within the function.

[location via delta encoding]
pack_dd(text_length)      ← includes null in count!
[text_bytes]              ← length-1 actual bytes

Gotcha: The length field includes a conceptual null terminator, but the actual bytes don't contain it. For "hello" (5 chars), length = 6, but you read 5 bytes.

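let text_len = unpack_dd(&mut data)? as usize;
// The count includes a null terminator that is not on the wire
let actual_len = text_len.saturating_sub(1);
let text = &data[..actual_len];
data = &data[actual_len..];  // advance past the text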

Tag 7: Extra Comments

Comments above or below an instruction line.

[location via delta encoding]
pack_dd(anterior_length)
[anterior_bytes]          ← if length > 0
pack_dd(posterior_length)
[posterior_bytes]         ← if length > 0

Tag 8: SP Delta

Stack pointer tracking points for call analysis.

[location via delta encoding]
pack_dq(delta + 1)        ← +1 encoded!

Tag 9: Frame Description

The most complex tag. Complete stack frame layout.

pack_dq(frame_size + 1)   ← +1 encoded
pack_dq(args_size + 1)    ← +1 encoded, 0 means "unknown" (actual -1)
pack_dw(flags)
pack_dd(var_count)

For each variable:
  pack_dd(name_length)
  [name_bytes]
  pack_dq(offset + 1)     ← +1 encoded
  pack_dq(size)
  pack_dd(type_length)
  [type_bytes]

Tag 10: Variable Type Info

Per-instruction type information.

[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)

Tag 11: Operand Info

Per-instruction operand information (up to 8 operands per instruction).

[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)

9. Implementation Notes

Dazhbog is the implementation I built to validate this spec.

Storage Format

Records need more than just the metadata blob. You want:

  • The 128-bit hash key
  • Timestamp (for history/versioning)
  • Function name (for display without parsing metadata)
  • Popularity score (how many times it's been requested)

┌─────────────────────────────────────────────────────┐
│                 Header (12 bytes)                   │
├────────────┬────────────┬───────────────────────────┤
│ magic (4B) │ length (4B)│ checksum (4B)             │
│ "LMN1"     │            │                           │
├────────────┴────────────┴───────────────────────────┤
│                 Body (variable)                     │
│  key_lo (8B) | key_hi (8B) | timestamp (8B) | ...   │
│  name (variable) | metadata (variable)              │
└─────────────────────────────────────────────────────┘

Real Data Examples

From a database with 5+ million records, here's what typical entries look like:

Record: "_fetch_headers"
  Hash: 0x80df1f34c6f4cd3ecff5973f7ef61cb8
  Timestamp: 2025-11-18 15:17:54 UTC
  Popularity: 12
  Blocks:
    - tag=1 (FuncTypeInfo): 111 bytes
    - tag=9 (FrameDesc): frame_size=152, args=unknown
    - tag=10 (VarTypeInfo): 7 entries

Record: "patch_handler_697"
  Hash: 0xb6bf85ebed7a4cf7ff9cf6b12593ff54
  Blocks:
    - tag=1 (FuncTypeInfo): 26 bytes
    - tag=5 (Comment): chunk=0, off=58, "patch_697"
    - tag=6 (RepComment): chunk=0, off=58, "patch_697"
    - tag=9 (FrameDesc): frame_size=72
    - tag=10 (VarTypeInfo): 7 entries

Most records have tags 1, 9, and 10. Comments (tags 3-7) are less common. SP deltas (tag 8) are rare.

What Matters for Matching

The hash is the primary key. Everything else is metadata enrichment. When returning results:

  1. Exact hash match is required
  2. Most recent entry wins if there are duplicates
  3. Popularity can be used for ranking when multiple functions in a binary match
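
The first two rules can be sketched as a lookup over stored records. The `Record` struct and `best_match` function are illustrative stand-ins, not Dazhbog's actual types.

```rust
/// Minimal stand-in for a stored record; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub key: u128,      // MD5 hash of the normalized bytes
    pub timestamp: u64, // seconds since the Unix epoch
    pub name: String,
}

/// Returns the most recent record whose key matches exactly, if any.
pub fn best_match<'a>(records: &'a [Record], key: u128) -> Option<&'a Record> {
    records
        .iter()
        .filter(|r| r.key == key)
        .max_by_key(|r| r.timestamp)
}
```

A production store would answer this from an index rather than a linear scan; the sketch only shows the selection rule.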

10. Reference Tables

Varint Quick Reference

Encoding  First Byte  Total Bytes  Max Value
────────  ──────────  ───────────  ─────────────
pack_dw   0xxxxxxx    1            127
pack_dw   10xxxxxx    2            16,383
pack_dw   11xxxxxx    3            65,535
pack_dd   0xxxxxxx    1            127
pack_dd   10xxxxxx    2            16,383
pack_dd   110xxxxx    5            4,294,967,295
pack_dd   111xxxxx    1-5          varies

Message Types

Code  Name          Direction
────  ────────────  ─────────
0x0a  OK            S→C
0x0b  FAIL          S→C
0x0d  HELO          C→S
0x0e  PullMetadata  C→S
0x0f  PullResult    S→C
0x10  PushMetadata  C→S

Tag Summary

Tag  Name            Has Location  +1 Encoded Fields
───  ──────────────  ────────────  ────────────────────
1    FUNC_TYPE_INFO  No            No
3    FUNC_CMT        No            No
4    FUNC_CMT_REP    No            No
5    CMT_REGULAR     Yes (delta)   No
6    CMT_REPEATABLE  Yes (delta)   No
7    EXTRA_CMT       Yes (delta)   No
8    SP_DELTA        Yes (delta)   Yes (delta value)
9    FRAME_DESC      No            Yes (sizes, offsets)
10   VAR_TYPE_INFO   Yes (delta)   Yes (value1 only)
11   OP_INFO         Yes (delta)   Yes (values)

Common Pitfalls

Problem                            Symptom                        Solution
─────────────────────────────────  ─────────────────────────────  ────────────────────────────────────
Wrong endianness on packet length  Connection drops immediately   Use from_be_bytes for length only
Varint format 3 misparse           Huge garbage values            Don't use first byte's bits as value
Missing +1 decode                  Stack offsets off by one       Subtract 1 after decode
String length includes null        Strings have trailing garbage  Read length - 1 bytes
Delta encoding chunk change        Locations jump wildly          Check for 0 marker

Appendix A: Complete Packet Structures

For those implementing a client or server, here are the exact byte layouts.

HELO Packet (0x0d)

┌───────────────────────────────────────────┐
│ protocol_version (varint)                 │
├───────────────────────────────────────────┤
│ license_data_len (varint)                 │
├───────────────────────────────────────────┤
│ license_data (license_data_len bytes)     │
├───────────────────────────────────────────┤
│ license_id (6 bytes, fixed)               │
├───────────────────────────────────────────┤
│ username (cstring) ────────────── v3+     │
├───────────────────────────────────────────┤
│ password (cstring) ────────────── v3+     │
└───────────────────────────────────────────┘

HELO Response - OK (0x0a)

┌───────────────────────────────────────────┐
│ features (varint) ─────────────── v5+     │
└───────────────────────────────────────────┘

FAIL Response (0x0b)

┌───────────────────────────────────────────┐
│ error_code (varint)                       │
├───────────────────────────────────────────┤
│ error_message (cstring)                   │
└───────────────────────────────────────────┘

PullMetadata Request (0x0e)

┌───────────────────────────────────────────┐
│ flags (varint)                            │
├───────────────────────────────────────────┤
│ count (varint)                            │
├───────────────────────────────────────────┤
│ For each function:                        │
│   ┌───────────────────────────────────┐   │
│   │ md5_hash (16 bytes)               │   │
│   ├───────────────────────────────────┤   │
│   │ func_size (varint)                │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘
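
Building this body is mostly concatenation. The sketch below assumes every "varint" field in the layout is a pack_dd value, which the layout does not state explicitly; `build_pull_metadata` is a name invented here, and pack_dd is repeated from section 4 so the example compiles on its own.

```rust
/// pack_dd from section 4 (smallest format that fits).
fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Builds a PullMetadata body: flags, count, then (md5, func_size) pairs.
/// Assumption: all varint fields use pack_dd.
pub fn build_pull_metadata(flags: u32, funcs: &[([u8; 16], u32)]) -> Vec<u8> {
    let mut out = pack_dd(flags);
    out.extend(pack_dd(funcs.len() as u32));
    for (md5, func_size) in funcs {
        out.extend_from_slice(md5);
        out.extend(pack_dd(*func_size));
    }
    out
}
```

The result is only the body; it still needs the 4-byte length and the 0x0e type byte from the framing section in front of it.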

PullResult Response (0x0f)

┌───────────────────────────────────────────┐
│ found_count (varint)                      │
├───────────────────────────────────────────┤
│ For each result:                          │
│   ┌───────────────────────────────────┐   │
│   │ status (varint): 0=miss, 1=hit    │   │
│   ├───────────────────────────────────┤   │
│   │ If status == 1:                   │   │
│   │   ┌───────────────────────────┐   │   │
│   │   │ score (varint)            │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ name_len (varint)         │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ name (name_len bytes)     │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ metadata_len (varint)     │   │   │
│   │   ├───────────────────────────┤   │   │
│   │   │ metadata (metadata_len)   │   │   │
│   │   └───────────────────────────┘   │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘

PushMetadata Request (0x10)

┌───────────────────────────────────────────┐
│ flags (varint)                            │
├───────────────────────────────────────────┤
│ idb_filepath_len (varint)                 │
├───────────────────────────────────────────┤
│ idb_filepath (idb_filepath_len bytes)     │
├───────────────────────────────────────────┤
│ input_filepath_len (varint)               │
├───────────────────────────────────────────┤
│ input_filepath (input_filepath_len bytes) │
├───────────────────────────────────────────┤
│ input_md5 (16 bytes)                      │
├───────────────────────────────────────────┤
│ hostname_len (varint)                     │
├───────────────────────────────────────────┤
│ hostname (hostname_len bytes)             │
├───────────────────────────────────────────┤
│ func_count (varint)                       │
├───────────────────────────────────────────┤
│ For each function:                        │
│   ┌───────────────────────────────────┐   │
│   │ func_md5 (16 bytes)               │   │
│   ├───────────────────────────────────┤   │
│   │ func_name_len (varint)            │   │
│   ├───────────────────────────────────┤   │
│   │ func_name (func_name_len bytes)   │   │
│   ├───────────────────────────────────┤   │
│   │ func_size (varint)                │   │
│   ├───────────────────────────────────┤   │
│   │ metadata_len (varint)             │   │
│   ├───────────────────────────────────┤   │
│   │ metadata (metadata_len bytes)     │   │
│   └───────────────────────────────────┘   │
└───────────────────────────────────────────┘

Appendix B: Memory Layouts (from RE)

These structures were recovered from reverse engineering libida.so. Useful if you're trying to understand the decompiler output.

Comment Entry (32 bytes)

struct comment_entry {
    int32_t  chunk_index;    // 0x00: Which function chunk (0 for main)
    int32_t  offset;         // 0x04: Byte offset within chunk
    char    *text;           // 0x08: Pointer to comment text
    size_t   length;         // 0x10: Text length (excluding null)
    size_t   capacity;       // 0x18: Allocated buffer size
};  // Total: 0x20 (32 bytes)

Extra Comment Entry (56 bytes)

struct extra_comment_entry {
    int32_t  chunk_index;     // 0x00
    int32_t  offset;          // 0x04
    char    *anterior_text;   // 0x08: Comment above instruction
    size_t   anterior_len;    // 0x10
    size_t   anterior_cap;    // 0x18
    char    *posterior_text;  // 0x20: Comment below instruction
    size_t   posterior_len;   // 0x28
    size_t   posterior_cap;   // 0x30
};  // Total: 0x38 (56 bytes)

SP Delta Entry (16 bytes)

struct sp_delta_entry {
    int32_t  chunk_index;    // 0x00
    int32_t  offset;         // 0x04
    int64_t  delta;          // 0x08: Stack pointer change at this point
};  // Total: 0x10 (16 bytes)

Frame Description

struct frame_desc {
    int64_t  frame_size;     // Total stack frame size
    int64_t  args_size;      // Size of arguments area (-1 if unknown)
    uint16_t flags;          // Frame flags
    uint32_t var_count;      // Number of local variables
    // Followed by var_count local_var_t entries
};

struct local_var_t {
    char    *name;           // Variable name
    size_t   name_len;
    int64_t  offset;         // Stack offset (can be negative)
    uint64_t size;           // Size in bytes
    uint8_t *type_info;      // Serialized tinfo_t
    size_t   type_len;
};

Operand Info Entry (2192 bytes)

This one's large because it supports up to 8 operands per instruction:

struct op_info_entry {
    int32_t  chunk_index;              // 0x00
    int32_t  offset;                   // 0x04
    uint64_t flags;                    // 0x08
    operand_info_t operands[8];        // 0x10: 8 operand slots × 272 bytes each
};  // Total: 0x890 (2192 bytes)

struct operand_info_t {
    uint64_t addr;           // 0x00
    uint64_t value1;         // 0x08
    uint64_t value2;         // 0x10
    uint32_t flags;          // 0x18
    uint8_t  reserved[244];  // 0x1C: Remaining fields (not fully mapped)
};  // Total: 0x110 (272 bytes)

Appendix C: Dazhbog Implementation References

If you're reading the Dazhbog source code, here's where to find things:

Feature                File                   Lines    Notes
─────────────────────  ─────────────────────  ───────  ─────────────────
Packet read/write      src/lumina.rs          504-623  Async framing
HELO parsing           src/lumina.rs          150-197  Version detection
Varint decode          src/metadata.rs        58-114   All formats
Varint encode          src/lumina.rs          39-103   pack_dd, pack_dq
Tag parsing            src/metadata.rs        210-466  Main entry point
Comment parsing        src/metadata.rs        531-605  Tags 5, 6
Extra comment parsing  src/metadata.rs        607-680  Tag 7
SP delta parsing       src/metadata.rs        682-720  Tag 8
Frame parsing          src/metadata.rs        770-828  Tag 9
Delta encoding         src/metadata.rs        557-575  Shared logic
Record storage         src/engine/segment.rs  100-130  LMN1 format
Server RPC dispatch    src/server.rs          422-822  Message handling

Appendix D: Test Vectors

For validating your implementation.

Varint Encoding

Value       pack_dd output    pack_dq output
──────────  ────────────────  ────────────────
0           00                00
127         7F                7F
128         80 80             80 80
16383       BF FF             BF FF
16384       C0 00 00 40 00    C0 00 00 40 00
0xDEADBEEF  C0 DE AD BE EF    C0 DE AD BE EF
0xFFFFFFFF  C0 FF FF FF FF    C0 FF FF FF FF

Metadata Blob (Minimal)

Hex: 00 00 00 00
Meaning: func_size=0, no tags

Hex: 10 00 00 00 03 05 68 65 6C 6C 6F

10 00 00 00      ← func_size=16
03               ← tag=3
05               ← size=5
68 65 6C 6C 6F   ← "hello"

Hex: 20 00 00 00 05 08 00 10 06 68 65 6C 6C 6F

20 00 00 00      ← func_size=32
05               ← tag=5
08               ← size=8 (chunk + offset + text_len + 5 text bytes)
00               ← chunk=0
10               ← offset=16
06               ← text_len=6 (includes null)
68 65 6C 6C 6F   ← "hello"

Frame Description

Hex: 09 05 80 99 00 01 00

09       ← tag=9
05       ← size=5
80 99    ← frame_size=152 (+1 encoded: 153 = 0x99, 2-byte varint format)
00       ← args_size=-1 (+1 encoded as 0)
01       ← flags=0x0001
00       ← var_count=0

Methodology Note

All protocol details were obtained through static reverse engineering of libida.so (IDA Pro's core library) and the Lumina server binary. The decompilation output was... not pretty. Variable names like v91, v89, v188 don't exactly document themselves.

Decompile, trace data flow, capture traffic, build, test, fix, repeat. Dazhbog has processed millions of records at this point.


Acknowledgments

Thanks to everyone who's poked at Lumina before. The IDA SDK docs helped where they existed.


Last updated: December 2025

License: This document is provided for educational and interoperability purposes. IDA Pro is a trademark of Hex-Rays SA.