Complete specification from reverse engineering libida.so and the Lumina server
Table of Contents
- Background
- Architecture Overview
- Wire Protocol
- Varint Encoding
- CalcRel Hash
- Metadata Format
- Delta Encoding
- Tag Reference
- Implementation Notes
- Reference Tables
Appendices
- A: Complete Packet Structures
- B: Memory Layouts (from RE)
- C: Dazhbog Implementation References
- D: Test Vectors
1. Background
Lumina is Hex-Rays' function signature sharing service. You query it with a function hash, you get back names, types, comments. The official server requires a license, and the protocol isn't documented anywhere.
I was using Lumen for a while, an open-source alternative. It worked, but queries were slow and the matching felt rudimentary. I wanted better relevance signals: name similarity scoring, function size proximity, popularity weighting, recency decay, co-occurrence patterns from binaries that share functions. So I reversed the protocol.
The result is Dazhbog. This document is everything I learned along the way: wire format, hash computation, metadata encoding. Validated against 5+ million records, tested with real IDA clients.
2. Architecture Overview
Lumina is conceptually simple. It's a key-value store where:
- Key = MD5 hash of a function's "normalized" bytes (more on this later)
- Value = metadata blob containing names, types, comments, stack frames
The protocol is binary RPC over TLS on port 443 (or 20667 for non-TLS). Client sends a request, server sends a response. No persistent connections, no streaming, no fancy stuff.
┌─────────────────┐ ┌─────────────────┐
│ IDA Client │ │ Lumina Server │
│ │ │ │
│ ┌───────────┐ │ TLS/443 │ ┌───────────┐ │
│ │ libida.so │ │◄──────────────────►│ │ lumina │ │
│ └───────────┘ │ │ └───────────┘ │
│ │ │ │
│ "What's this │ │ "That's │
│ function?" │ │ malloc()." │
└─────────────────┘ └─────────────────┘
The conversation looks like this:
Client Server
│ │
├───── HELO ──────────►│ "IDA 8.3, here's my license"
│ │
│◄────── OK ───────────┤ "You have these features"
│ │
├─── PullMetadata ────►│ "Know these 50 hashes?"
│ │
│◄── PullResult ───────┤ "Here's 23 matches"
│ │
├─── PushMetadata ────►│ "I analyzed these"
│ │
│◄────── OK ───────────┤ "Stored"
│ │
Simple enough. The details are where it gets interesting.
3. Wire Protocol
Every packet has the same framing:
┌────────────────┬──────────┬─────────────────────┐
│ Length (4B) │ Type (1B)│ Body (variable) │
└────────────────┴──────────┴─────────────────────┘
Here's your first gotcha: that length field is big-endian. Everything else in the entire protocol is little-endian. I don't know why. Maybe someone at Hex-Rays thought it would be funny.
// The moment you realize the endianness is wrong
let len = u32::from_be_bytes(lenb) as usize; // NOT from_le_bytes!
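A minimal sketch of pulling one frame off a byte buffer, following the layout above. Two things here are assumptions: the helper name `split_frame` is made up for illustration, and the length field is taken to count only the body bytes (the layout above doesn't say whether the type byte is included).

```rust
/// Splits one frame off the front of `buf`.
/// Returns (message type, body, remaining bytes), or None if `buf` is incomplete.
/// ASSUMPTION: the length field counts body bytes only, not the type byte.
pub fn split_frame(buf: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let len = buf.get(..4)?;
    // The length is the one big-endian field in the protocol.
    let body_len = u32::from_be_bytes([len[0], len[1], len[2], len[3]]) as usize;
    let msg_type = *buf.get(4)?;
    let end = 5usize.checked_add(body_len)?;
    let body = buf.get(5..end)?;
    Some((msg_type, body, &buf[end..]))
}
```

Returning the unread tail makes it easy to call this in a loop over a receive buffer; a `None` just means "wait for more bytes".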
Message Types
| Code | Name | Who Sends It | What It Does |
|---|---|---|---|
| 0x0a | OK | Server | "Request succeeded" |
| 0x0b | FAIL | Server | "Something broke" (includes error message) |
| 0x0d | HELO | Client | Version negotiation + auth |
| 0x0e | PullMetadata | Client | "Give me info for these hashes" |
| 0x0f | PullResult | Server | "Here's what I found" |
| 0x10 | PushMetadata | Client | "Store this metadata" |
There are a few more (0x18 for history deletion, 0x2f for history queries) but these are the ones that matter.
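One way to carry the six codes from the table around in Rust is a plain enum. This is only a sketch: the type and method names are illustrative, and the history codes (0x18, 0x2f) are deliberately left unmapped.

```rust
/// Message type codes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MsgType {
    Ok = 0x0a,
    Fail = 0x0b,
    Helo = 0x0d,
    PullMetadata = 0x0e,
    PullResult = 0x0f,
    PushMetadata = 0x10,
}

impl MsgType {
    /// Maps a wire byte to a known message type; unmapped codes give None.
    pub fn from_byte(b: u8) -> Option<Self> {
        match b {
            0x0a => Some(Self::Ok),
            0x0b => Some(Self::Fail),
            0x0d => Some(Self::Helo),
            0x0e => Some(Self::PullMetadata),
            0x0f => Some(Self::PullResult),
            0x10 => Some(Self::PushMetadata),
            _ => None,
        }
    }

    /// True for the messages the table lists as sent by the server.
    pub fn is_server_message(self) -> bool {
        matches!(self, Self::Ok | Self::Fail | Self::PullResult)
    }
}
```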
The HELO Dance
The handshake is straightforward:
┌─────────┬─────────┬────────┬──────┬──────┐
│ version │ license │ id[6] │ user │ pass │
│ varint │ len+data│ 6 bytes│ cstr │ cstr │
└─────────┴─────────┴────────┴──────┴──────┘
└──── v3+ ────┘
Protocol versions 1-6 are accepted. Anything higher gets rejected:
// From the server binary
if ( version > 6 )
{
send_error("This server doesn't support version %d", version);
}
Version 3 added username/password auth. Version 5 added feature flags in the response. If you're building a server, just accept everything and return what IDA expects.
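The version rules above boil down to a small gate. The sketch below mirrors the server check (reject anything above 6) and records which optional fields a version carries; `HeloShape` and `helo_shape` are hypothetical names, not anything from the binary.

```rust
/// Which optional HELO fields a given protocol version carries.
#[derive(Debug, PartialEq, Eq)]
pub struct HeloShape {
    /// Username/password follow the license id (v3+).
    pub has_credentials: bool,
    /// The OK response carries feature flags (v5+).
    pub has_features: bool,
}

/// Mirrors the server-side check: versions above 6 are rejected.
pub fn helo_shape(version: u32) -> Result<HeloShape, String> {
    if version > 6 {
        return Err(format!("This server doesn't support version {}", version));
    }
    Ok(HeloShape {
        has_credentials: version >= 3,
        has_features: version >= 5,
    })
}
```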
Size Limits (and Trust)
The server tracks whether a client is "trusted." Untrusted clients are limited to 8KB packets. Trusted clients can send up to 2GB.
// Untrusted? You get 8KB max
if ( !is_trusted && packet_len > 0x2000 )
{
reject();
}
What makes a client trusted? Valid license data. For a private server, you probably want to just trust everyone, or implement your own auth.
4. Varint Encoding
IDA doesn't use protobuf varints. It doesn't use LEB128. It has its own encoding with three variants: pack_dw (16-bit), pack_dd (32-bit), and pack_dq (64-bit).
pack_dd (32-bit)
The most common variant:
| First Byte | Format | Bytes Total | How to Decode |
|---|---|---|---|
| 0xxxxxxx | 1 | 1 | Value is the byte itself (0-127) |
| 10xxxxxx | 2 | 2 | `((b0 & 0x3F) << 8) \| b1` |
| 110xxxxx | 3 | 5 | Ignore b0, read next 4 bytes as BE u32 |
| 111xxxxx | 4 | 1-5 | Continuation encoding |
Format 3 is the trap. See that 110xxxxx? The 5 bits in the first byte are not part of the value. They're just a format marker. The actual value is in the next 4 bytes, big-endian.
I spent an embarrassing amount of time debugging this because I assumed the first byte contributed bits to the value. It doesn't.
pub fn unpack_dd(data: &mut &[u8]) -> Option<u32> {
    // Every read is bounds-checked, so truncated input yields None instead of a panic.
    let (&b0, rest) = data.split_first()?;
    *data = rest;
    if b0 & 0x80 == 0 {
        // Format 1: just the byte
        Some(b0 as u32)
    } else if b0 & 0x40 == 0 {
        // Format 2: 14 bits across 2 bytes
        let (&b1, rest) = data.split_first()?;
        *data = rest;
        Some((((b0 & 0x3F) as u32) << 8) | (b1 as u32))
    } else if b0 & 0x20 == 0 {
        // Format 3: IGNORE b0, read 4 bytes big-endian
        // This is the gotcha. Don't use b0's bits.
        let bytes = data.get(..4)?;
        let v = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        *data = &data[4..];
        Some(v)
    } else {
        // Format 4: continuation
        let mut value = (b0 & 0x1F) as u32;
        let mut shift = 5;
        loop {
            let (&b, rest) = data.split_first()?;
            *data = rest;
            // Shift amounts of 32 or more are rejected rather than panicking
            value |= ((b & 0x7F) as u32).checked_shl(shift)?;
            if b & 0x80 == 0 { break; }
            shift += 7;
        }
        Some(value)
    }
}
pack_dq (64-bit)
Same idea, but format 3 can read either 4 or 8 bytes depending on a flag bit:
- 110x0xxx → read 4 bytes (32-bit value)
- 110x1xxx → read 8 bytes (64-bit value)
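A sketch of just that format-3 branch. It assumes the flag is bit 3 of the first byte (as the two patterns suggest) and that the payload is big-endian like pack_dd's format 3; `unpack_dq_format3` is an illustrative name, and the other formats are left out.

```rust
/// Decodes a pack_dq value whose first byte matches 110xxxxx (format 3).
/// Returns the value and the number of bytes consumed, or None if the
/// input is too short or the first byte belongs to another format.
/// ASSUMPTION: the payload is big-endian, mirroring pack_dd's format 3.
pub fn unpack_dq_format3(data: &[u8]) -> Option<(u64, usize)> {
    let b0 = *data.first()?;
    if b0 & 0xE0 != 0xC0 {
        return None; // not format 3
    }
    if b0 & 0x08 == 0 {
        // 110x0xxx: 4-byte payload
        let p = data.get(1..5)?;
        Some((u32::from_be_bytes([p[0], p[1], p[2], p[3]]) as u64, 5))
    } else {
        // 110x1xxx: 8-byte payload
        let p = data.get(1..9)?;
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(p);
        Some((u64::from_be_bytes(bytes), 9))
    }
}
```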
Encoding Direction
Going the other way is simpler because you just pick the smallest format that fits:
fn pack_dd(v: u32) -> Vec<u8> {
match v {
0..=0x7f => vec![v as u8],
0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
_ => {
let mut out = vec![0xc0];
out.extend_from_slice(&v.to_be_bytes());
out
}
}
}
5. CalcRel Hash
How do you hash a function so that the same code at different addresses produces the same hash?
The Problem
Consider this x86 snippet:
func_example:
push ebp ; 55
mov ebp, esp ; 89 E5
call some_func ; E8 73 56 34 12 ← relative offset!
mov eax, [global] ; A1 21 43 65 87 ← absolute address!
pop ebp ; 5D
ret ; C3
If you hash the raw bytes 55 89 E5 E8 73 56 34 12 A1 21 43 65 87 5D C3, you'll get a different hash every time the function is compiled at a different address, because those call and mov operands change.
The Solution: Placeholder Masks
IDA's processor modules implement something called ev_calcrel (event 0x52). For each instruction, it returns:
- The raw instruction bytes
- A mask indicating which bits are position-dependent
The mask semantics are:
- 0 = keep this bit (opcode, register, etc.)
- 1 = mask this bit (address/offset, set to zero)
Then you compute: normalized = raw & ~mask
Worked Example
| Instruction | Raw Bytes | Mask | Normalized |
|---|---|---|---|
| push ebp | 55 | 00 | 55 |
| mov ebp, esp | 89 E5 | 00 00 | 89 E5 |
| call +0x12345673 | E8 73 56 34 12 | 00 FF FF FF FF | E8 00 00 00 00 |
| mov eax, [abs] | A1 21 43 65 87 | 00 FF FF FF FF | A1 00 00 00 00 |
| pop ebp | 5D | 00 | 5D |
| ret | C3 | 00 | C3 |
Normalized stream: 55 89 E5 E8 00 00 00 00 A1 00 00 00 00 5D C3
Hash that with MD5 and you've got your position-independent function signature.
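The masking step is small enough to sketch directly. `normalize` is an illustrative name; the MD5 step is left out because Rust's standard library doesn't ship one, so you'd feed the returned bytes to whatever MD5 implementation you use.

```rust
/// Applies a placeholder mask: every bit set in `mask` is cleared in `raw`.
/// Returns None if the two slices differ in length.
pub fn normalize(raw: &[u8], mask: &[u8]) -> Option<Vec<u8>> {
    if raw.len() != mask.len() {
        return None;
    }
    // normalized = raw & ~mask, byte by byte
    Some(raw.iter().zip(mask).map(|(&r, &m)| r & !m).collect())
}
```

Running it over the worked example (raw bytes and masks concatenated across the six instructions) gives back the normalized stream shown above.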
The Decompiler Output (Annotated)
Here's what the normalization loop looks like after decompilation. I've added comments because the original variable names are... not helpful.
// This is the core loop that builds the normalized byte stream
while ( 1 )
{
mask_ptr = placeholder_mask;
do
{
mask_byte = *mask_ptr;
// THE CRITICAL LINE: raw AND (NOT mask)
normalized_byte = raw_byte & ~mask_byte;
output[out_idx] = normalized_byte;
raw_byte >>= 8;
// ... continues for each byte of instruction
}
while ( byte_idx < insn_len );
// ... next instruction
}
That raw_byte & ~mask_byte is the entire algorithm. The rest is just iteration.
Architecture Coverage
Each processor module knows what to mask:
| Architecture | What Gets Masked |
|---|---|
| x86/x64 | Call/jmp offsets, absolute addresses, ModRM displacements |
| ARM | B/BL offsets, PC-relative loads (LDR rx, [pc, #offset]) |
| ARM64 | ADRP immediates, branch targets |
| MIPS | J/JAL targets, branch offsets |
Multi-Chunk Functions
Functions can have multiple non-contiguous chunks (think: cold code, exception handlers). All chunks contribute to a single hash, iterated in order:
func_tail_iterator_set(); // Start iterating chunks
total_size = 0;
for ( i = 0; ; ++i )
{
total_size += chunk_end - chunk_start;
next_chunk = get_fchunk_1();
if ( !next_chunk )
break;
// Process chunk...
}
6. Metadata Format
The hash gets you a lookup key. The value is a metadata blob containing everything IDA knows about the function: name, type signature, comments, stack frame layout, variable types.
Blob Structure
┌─────────────────┬─────────────────────────────────────────────┐
│ func_size (4B) │ Tagged Blocks... │
│ little-endian │ │
└─────────────────┴─────────────────────────────────────────────┘
First 4 bytes are the function size. Then comes a sequence of TLV (tag-length-value) blocks:
┌──────────────┬──────────────┬─────────────────────┐
│ Tag (varint) │ Size (varint)│ Data (Size bytes) │
└──────────────┴──────────────┴─────────────────────┘
Tag Types
| Tag | Name | What It Contains |
|---|---|---|
| 1 | FUNC_TYPE_INFO | Serialized function signature (tinfo_t) |
| 3 | FUNC_CMT | Function comment |
| 4 | FUNC_CMT_REP | Repeatable function comment |
| 5 | CMT_REGULAR | Inline comments (with locations) |
| 6 | CMT_REPEATABLE | Repeatable inline comments |
| 7 | EXTRA_CMT | Anterior/posterior comments |
| 8 | SP_DELTA | Stack pointer tracking |
| 9 | FRAME_DESC | Complete stack frame layout |
| 10 | VAR_TYPE_INFO | Variable type information |
| 11 | OP_INFO | Operand information (up to 8 per instruction) |
Tag 2 is reserved/unused. Tags 5-8, 10, and 11 use delta encoding for locations (covered in the next section); tag 9 has no per-entry location.
Parsing Tags
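use std::io;

pub struct TaggedBlock {
    pub tag: u32,
    pub data: Vec<u8>,
}

// Shorthand for the error returned on truncated or malformed input
fn truncated() -> io::Error {
    io::Error::new(io::ErrorKind::UnexpectedEof, "truncated metadata blob")
}

pub fn parse_tagged_blocks(data: &[u8]) -> io::Result<(u32, Vec<TaggedBlock>)> {
    // First 4 bytes: function size
    let head = data.get(..4).ok_or_else(truncated)?;
    let func_size = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    let mut remaining = &data[4..];
    let mut blocks = Vec::new();
    while !remaining.is_empty() {
        // unpack_dd returns Option, so map None to an io::Error
        let tag = unpack_dd(&mut remaining).ok_or_else(truncated)?;
        let size = unpack_dd(&mut remaining).ok_or_else(truncated)? as usize;
        let block_data = remaining.get(..size).ok_or_else(truncated)?.to_vec();
        remaining = &remaining[size..];
        blocks.push(TaggedBlock { tag, data: block_data });
    }
    Ok((func_size, blocks))
}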
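Going the other direction, a blob can be assembled from the same pieces. This is a sketch rather than Dazhbog's actual writer: `build_blob` is a made-up name, and the local `pack_dd` only covers formats 1 to 3 as described in section 4.

```rust
/// Encodes a u32 with the pack_dd rules from section 4 (formats 1-3 only).
fn pack_dd(v: u32) -> Vec<u8> {
    match v {
        0..=0x7f => vec![v as u8],
        0x80..=0x3fff => vec![0x80 | (v >> 8) as u8, v as u8],
        _ => {
            let mut out = vec![0xc0];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

/// Serializes a metadata blob: 4-byte little-endian func_size, then one
/// (tag, size, data) block per entry in `blocks`.
pub fn build_blob(func_size: u32, blocks: &[(u32, &[u8])]) -> Vec<u8> {
    let mut out = func_size.to_le_bytes().to_vec();
    for &(tag, data) in blocks {
        out.extend(pack_dd(tag));
        out.extend(pack_dd(data.len() as u32));
        out.extend_from_slice(data);
    }
    out
}
```

For example, `build_blob(16, &[(3, &b"hello"[..])])` produces a function-comment blob with a single tag 3 block.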
7. Delta Encoding
Tags 5-8, 10, and 11 store data at specific locations within the function. Rather than storing absolute offsets for each entry, Lumina uses delta encoding to save space.
How Delta Encoding Works
Entries are sorted by location. The first entry stores its absolute offset. Each subsequent entry stores the delta from the previous one.
Entry 1: chunk=0, offset=16 → encode: chunk=0, offset=16
Entry 2: chunk=0, offset=24 → encode: delta=8 (24-16)
Entry 3: chunk=0, offset=30 → encode: delta=6 (30-24)
Entry 4: chunk=1, offset=4 → encode: 0 (marker), chunk=1, offset=4
Entry 5: chunk=1, offset=12 → encode: delta=8 (12-4)
When the chunk changes, you emit a zero marker followed by the new chunk index and absolute offset.
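The encoding rule can be sketched as a function that turns sorted locations into the sequence of numbers that would each be written as a varint. To keep it short, the per-entry payload (comment text, SP delta, and so on) that sits between locations in a real block is left out; `encode_locations` and `Loc` are illustrative names.

```rust
/// A location inside a function: (chunk index, byte offset within the chunk).
pub type Loc = (u32, u32);

/// Produces the varint *values* for a list of locations.
/// Input must be sorted, with strictly increasing offsets inside a chunk:
/// a delta of 0 would be indistinguishable from the chunk-change marker.
pub fn encode_locations(locs: &[Loc]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut prev: Option<Loc> = None;
    for &(chunk, offset) in locs {
        match prev {
            // First entry: absolute chunk and offset
            None => {
                out.push(chunk);
                out.push(offset);
            }
            // Same chunk: delta from the previous offset
            Some((prev_chunk, prev_offset)) if prev_chunk == chunk => {
                out.push(offset - prev_offset);
            }
            // New chunk: zero marker, then absolute chunk and offset
            Some(_) => {
                out.push(0);
                out.push(chunk);
                out.push(offset);
            }
        }
        prev = Some((chunk, offset));
    }
    out
}
```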
Decoding
let val = unpack_dd(&mut data)?; // first varint of the entry
if is_first_entry {
chunk = val;
offset = unpack_dd(&mut data)?;
is_first_entry = false;
} else if val == 0 {
// Chunk change marker
chunk = unpack_dd(&mut data)?;
offset = unpack_dd(&mut data)?;
} else {
// Delta from previous
offset += val;
}
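For a version that runs on its own, here is the same branch logic wrapped in a loop. It works on already-decoded varint values and ignores the per-entry payload, so it's a sketch of the location logic only; `decode_locations` is an illustrative name.

```rust
/// Rebuilds (chunk, offset) pairs from a sequence of decoded varint values.
/// Returns None if the sequence ends in the middle of an entry or an
/// offset overflows.
pub fn decode_locations(values: &[u32]) -> Option<Vec<(u32, u32)>> {
    let mut it = values.iter().copied();
    let mut out = Vec::new();
    let (mut chunk, mut offset) = (0u32, 0u32);
    let mut is_first_entry = true;
    while let Some(val) = it.next() {
        if is_first_entry {
            // First entry: absolute chunk, then absolute offset
            chunk = val;
            offset = it.next()?;
            is_first_entry = false;
        } else if val == 0 {
            // Chunk change marker
            chunk = it.next()?;
            offset = it.next()?;
        } else {
            // Delta from previous
            offset = offset.checked_add(val)?;
        }
        out.push((chunk, offset));
    }
    Some(out)
}
```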
The +1 Encoding
Many fields can be negative (stack offsets, SP deltas). Varints are unsigned. Solution? Add 1 before encoding, subtract 1 after decoding.
| Actual Value | Encoded Value |
|---|---|
| -1 | 0 |
| 0 | 1 |
| 1 | 2 |
| n | n + 1 |
This shows up in:
- SP delta values (tag 8)
- Frame size and args size (tag 9)
- Stack offsets in frame variables
- Various operand fields
// Decoding
let encoded = unpack_dq(&mut data)?;
let actual = (encoded as i64) - 1; // Don't forget the -1!
// Encoding
let encoded = (actual + 1) as u64;
pack_dq(encoded);
I found this by noticing stack offsets were always off by one.
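As standalone helpers, the two directions look like this. One assumption beyond the table: values below -1 are taken to wrap around as two's complement, which is what the casts in the snippet above imply; the function names are illustrative.

```rust
/// Applies the +1 bias before a value is written with pack_dq.
/// ASSUMPTION: values below -1 wrap as two's complement.
pub fn encode_plus1(actual: i64) -> u64 {
    actual.wrapping_add(1) as u64
}

/// Removes the +1 bias after a value is read with unpack_dq.
pub fn decode_plus1(encoded: u64) -> i64 {
    (encoded as i64).wrapping_sub(1)
}
```

Using the wrapping operations keeps both functions total, so `i64::MAX` and an encoded `u64::MAX` don't panic in debug builds.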
8. Tag Reference
Let's go through each tag format. These are based on actual data from millions of records.
Tag 1: Function Type Info
The serialized tinfo_t structure. IDA's internal type representation.
┌──────────────────┬────────────────────────────────┐
│ flags (1 byte) │ serialized tinfo data │
└──────────────────┴────────────────────────────────┘
The tinfo format is complex and out of scope here. Just store it opaquely.
Tags 3, 4: Function Comments
Plain UTF-8 text, no null terminator:
┌────────────────────────────────────────────────────┐
│ comment text (UTF-8) │
└────────────────────────────────────────────────────┘
Tag 3 is non-repeatable (shown only at function start). Tag 4 is repeatable (shown at every xref).
Tags 5, 6: Inline Comments
Comments at specific addresses within the function.
[location via delta encoding]
pack_dd(text_length) ← includes null in count!
[text_bytes] ← length-1 actual bytes
Gotcha: The length field includes a conceptual null terminator, but the actual bytes don't contain it. For "hello" (5 chars), length = 6, but you read 5 bytes.
let text_len = unpack_dd(&mut data)? as usize;
let actual_len = text_len.saturating_sub(1); // the counted null is not on the wire
let text = data.get(..actual_len)?; // None on truncated input
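Wrapped up as a helper that also advances the cursor, it might look like the sketch below. `take_text` is a hypothetical name, and it takes the already-decoded length so the block doesn't depend on a varint reader.

```rust
/// Reads a comment body whose declared length counts one extra null that
/// is not actually present in the stream, and advances `data` past it.
/// Returns None on truncated input or invalid UTF-8.
pub fn take_text<'a>(data: &mut &'a [u8], text_len: u32) -> Option<&'a str> {
    let actual_len = text_len.saturating_sub(1) as usize;
    let buf: &'a [u8] = *data;
    if buf.len() < actual_len {
        return None;
    }
    let (text, rest) = buf.split_at(actual_len);
    *data = rest;
    std::str::from_utf8(text).ok()
}
```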
Tag 7: Extra Comments
Comments above or below an instruction line.
[location via delta encoding]
pack_dd(anterior_length)
[anterior_bytes] ← if length > 0
pack_dd(posterior_length)
[posterior_bytes] ← if length > 0
Tag 8: SP Delta
Stack pointer tracking points for call analysis.
[location via delta encoding]
pack_dq(delta + 1) ← +1 encoded!
Tag 9: Frame Description
The most complex tag. Complete stack frame layout.
pack_dq(frame_size + 1) ← +1 encoded
pack_dq(args_size + 1) ← +1 encoded, 0 means "unknown" (actual -1)
pack_dw(flags)
pack_dd(var_count)
For each variable:
pack_dd(name_length)
[name_bytes]
pack_dq(offset + 1) ← +1 encoded
pack_dq(size)
pack_dd(type_length)
[type_bytes]
Tag 10: Variable Type Info
Per-instruction type information.
[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)
Tag 11: Operand Info
Per-instruction operand information (up to 8 operands per instruction).
[location via delta encoding]
[type flags byte]
pack_dq(value1 + 1)
9. Implementation Notes
Dazhbog is the implementation I built to validate this spec.
Storage Format
Records need more than just the metadata blob. You want:
- The 128-bit hash key
- Timestamp (for history/versioning)
- Function name (for display without parsing metadata)
- Popularity score (how many times it's been requested)
┌─────────────────────────────────────────────────────┐
│ Header (12 bytes) │
├────────────┬────────────┬───────────────────────────┤
│ magic (4B) │ length (4B)│ checksum (4B) │
│ "LMN1" │ │ │
├────────────┴────────────┴───────────────────────────┤
│ Body (variable) │
│ key_lo (8B) | key_hi (8B) | timestamp (8B) | ... │
│ name (variable) | metadata (variable) │
└─────────────────────────────────────────────────────┘
Real Data Examples
From a database with 5+ million records, here's what typical entries look like:
Record: "_fetch_headers"
Hash: 0x80df1f34c6f4cd3ecff5973f7ef61cb8
Timestamp: 2025-11-18 15:17:54 UTC
Popularity: 12
Blocks:
- tag=1 (FuncTypeInfo): 111 bytes
- tag=9 (FrameDesc): frame_size=152, args=unknown
- tag=10 (VarTypeInfo): 7 entries
Record: "patch_handler_697"
Hash: 0xb6bf85ebed7a4cf7ff9cf6b12593ff54
Blocks:
- tag=1 (FuncTypeInfo): 26 bytes
- tag=5 (Comment): chunk=0, off=58, "patch_697"
- tag=6 (RepComment): chunk=0, off=58, "patch_697"
- tag=9 (FrameDesc): frame_size=72
- tag=10 (VarTypeInfo): 7 entries
Most records have tags 1, 9, and 10. Comments (tags 3-7) are less common. SP deltas (tag 8) are rare.
What Matters for Matching
The hash is the primary key. Everything else is metadata enrichment. When returning results:
- Exact hash match is required
- Most recent entry wins if there are duplicates
- Popularity can be used for ranking when multiple functions in a binary match
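The first two rules can be sketched in a few lines. The `Record` struct here is hypothetical (its fields are just the ones listed under Storage Format) and says nothing about how Dazhbog actually lays records out in memory.

```rust
/// A stored record, reduced to the fields the matching rules need.
/// Hypothetical layout for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub hash: [u8; 16],
    pub timestamp: u64,
    pub popularity: u32,
    pub name: String,
}

/// Exact hash match is required; among duplicates the most recent entry wins.
pub fn best_match<'a>(records: &'a [Record], hash: &[u8; 16]) -> Option<&'a Record> {
    records
        .iter()
        .filter(|r| &r.hash == hash)
        .max_by_key(|r| r.timestamp)
}
```

A linear scan is only for illustration; a real store would look the hash up in an index first and apply the timestamp rule to the handful of duplicates.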
10. Reference Tables
Varint Quick Reference
| Encoding | First Byte | Total Bytes | Max Value |
|---|---|---|---|
| pack_dw | 0xxxxxxx | 1 | 127 |
| pack_dw | 10xxxxxx | 2 | 16,383 |
| pack_dw | 11xxxxxx | 3 | 65,535 |
| pack_dd | 0xxxxxxx | 1 | 127 |
| pack_dd | 10xxxxxx | 2 | 16,383 |
| pack_dd | 110xxxxx | 5 | 4,294,967,295 |
| pack_dd | 111xxxxx | 1-5 | varies |
Message Types
| Code | Name | Direction |
|---|---|---|
| 0x0a | OK | S→C |
| 0x0b | FAIL | S→C |
| 0x0d | HELO | C→S |
| 0x0e | PullMetadata | C→S |
| 0x0f | PullResult | S→C |
| 0x10 | PushMetadata | C→S |
Tag Summary
| Tag | Name | Has Location | +1 Encoded Fields |
|---|---|---|---|
| 1 | FUNC_TYPE_INFO | No | No |
| 3 | FUNC_CMT | No | No |
| 4 | FUNC_CMT_REP | No | No |
| 5 | CMT_REGULAR | Yes (delta) | No |
| 6 | CMT_REPEATABLE | Yes (delta) | No |
| 7 | EXTRA_CMT | Yes (delta) | No |
| 8 | SP_DELTA | Yes (delta) | Yes (delta value) |
| 9 | FRAME_DESC | No | Yes (sizes, offsets) |
| 10 | VAR_TYPE_INFO | Yes (delta) | Yes (value1 only) |
| 11 | OP_INFO | Yes (delta) | Yes (values) |
Common Pitfalls
| Problem | Symptom | Solution |
|---|---|---|
| Wrong endianness on packet length | Connection drops immediately | Use from_be_bytes for length only |
| Varint format 3 misparse | Huge garbage values | Don't use first byte's bits as value |
| Missing +1 decode | Stack offsets off by one | Subtract 1 after decode |
| String length includes null | Strings have trailing garbage | Read length - 1 bytes |
| Delta encoding chunk change | Locations jump wildly | Check for 0 marker |
Appendix A: Complete Packet Structures
For those implementing a client or server, here are the exact byte layouts.
HELO Packet (0x0d)
┌───────────────────────────────────────────┐
│ protocol_version (varint) │
├───────────────────────────────────────────┤
│ license_data_len (varint) │
├───────────────────────────────────────────┤
│ license_data (license_data_len bytes) │
├───────────────────────────────────────────┤
│ license_id (6 bytes, fixed) │
├───────────────────────────────────────────┤
│ username (cstring) ────────────── v3+ │
├───────────────────────────────────────────┤
│ password (cstring) ────────────── v3+ │
└───────────────────────────────────────────┘
HELO Response - OK (0x0a)
┌───────────────────────────────────────────┐
│ features (varint) ─────────────── v5+ │
└───────────────────────────────────────────┘
FAIL Response (0x0b)
┌───────────────────────────────────────────┐
│ error_code (varint) │
├───────────────────────────────────────────┤
│ error_message (cstring) │
└───────────────────────────────────────────┘
PullMetadata Request (0x0e)
┌───────────────────────────────────────────┐
│ flags (varint) │
├───────────────────────────────────────────┤
│ count (varint) │
├───────────────────────────────────────────┤
│ For each function: │
│ ┌───────────────────────────────────┐ │
│ │ md5_hash (16 bytes) │ │
│ ├───────────────────────────────────┤ │
│ │ func_size (varint) │ │
│ └───────────────────────────────────┘ │
└───────────────────────────────────────────┘
PullResult Response (0x0f)
┌───────────────────────────────────────────┐
│ found_count (varint) │
├───────────────────────────────────────────┤
│ For each result: │
│ ┌───────────────────────────────────┐ │
│ │ status (varint): 0=miss, 1=hit │ │
│ ├───────────────────────────────────┤ │
│ │ If status == 1: │ │
│ │ ┌───────────────────────────┐ │ │
│ │ │ score (varint) │ │ │
│ │ ├───────────────────────────┤ │ │
│ │ │ name_len (varint) │ │ │
│ │ ├───────────────────────────┤ │ │
│ │ │ name (name_len bytes) │ │ │
│ │ ├───────────────────────────┤ │ │
│ │ │ metadata_len (varint) │ │ │
│ │ ├───────────────────────────┤ │ │
│ │ │ metadata (metadata_len) │ │ │
│ │ └───────────────────────────┘ │ │
│ └───────────────────────────────────┘ │
└───────────────────────────────────────────┘
PushMetadata Request (0x10)
┌───────────────────────────────────────────┐
│ flags (varint) │
├───────────────────────────────────────────┤
│ idb_filepath_len (varint) │
├───────────────────────────────────────────┤
│ idb_filepath (idb_filepath_len bytes) │
├───────────────────────────────────────────┤
│ input_filepath_len (varint) │
├───────────────────────────────────────────┤
│ input_filepath (input_filepath_len bytes) │
├───────────────────────────────────────────┤
│ input_md5 (16 bytes) │
├───────────────────────────────────────────┤
│ hostname_len (varint) │
├───────────────────────────────────────────┤
│ hostname (hostname_len bytes) │
├───────────────────────────────────────────┤
│ func_count (varint) │
├───────────────────────────────────────────┤
│ For each function: │
│ ┌───────────────────────────────────┐ │
│ │ func_md5 (16 bytes) │ │
│ ├───────────────────────────────────┤ │
│ │ func_name_len (varint) │ │
│ ├───────────────────────────────────┤ │
│ │ func_name (func_name_len bytes) │ │
│ ├───────────────────────────────────┤ │
│ │ func_size (varint) │ │
│ ├───────────────────────────────────┤ │
│ │ metadata_len (varint) │ │
│ ├───────────────────────────────────┤ │
│ │ metadata (metadata_len bytes) │ │
│ └───────────────────────────────────┘ │
└───────────────────────────────────────────┘
Appendix B: Memory Layouts (from RE)
These structures were recovered from reverse engineering libida.so. Useful if you're trying to understand the decompiler output.
Comment Entry (32 bytes)
struct comment_entry {
int32_t chunk_index; // 0x00: Which function chunk (0 for main)
int32_t offset; // 0x04: Byte offset within chunk
char *text; // 0x08: Pointer to comment text
size_t length; // 0x10: Text length (excluding null)
size_t capacity; // 0x18: Allocated buffer size
}; // Total: 0x20 (32 bytes)
Extra Comment Entry (56 bytes)
struct extra_comment_entry {
int32_t chunk_index; // 0x00
int32_t offset; // 0x04
char *anterior_text; // 0x08: Comment above instruction
size_t anterior_len; // 0x10
size_t anterior_cap; // 0x18
char *posterior_text; // 0x20: Comment below instruction
size_t posterior_len; // 0x28
size_t posterior_cap; // 0x30
}; // Total: 0x38 (56 bytes)
SP Delta Entry (16 bytes)
struct sp_delta_entry {
int32_t chunk_index; // 0x00
int32_t offset; // 0x04
int64_t delta; // 0x08: Stack pointer change at this point
}; // Total: 0x10 (16 bytes)
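If you want to sanity-check a recovered layout, mirroring it as a `#[repr(C)]` struct and asserting its size is a cheap test. The Rust type below is my own mirror of the C struct above, not something taken from the binary.

```rust
/// Rust mirror of sp_delta_entry with C layout rules.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpDeltaEntry {
    pub chunk_index: i32, // 0x00
    pub offset: i32,      // 0x04
    pub delta: i64,       // 0x08
}

/// Size the compiler assigns to the mirrored layout.
pub const SP_DELTA_ENTRY_SIZE: usize = std::mem::size_of::<SpDeltaEntry>();
```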
Frame Description
struct frame_desc {
int64_t frame_size; // Total stack frame size
int64_t args_size; // Size of arguments area (-1 if unknown)
uint16_t flags; // Frame flags
uint32_t var_count; // Number of local variables
// Followed by var_count local_var_t entries
};
struct local_var_t {
char *name; // Variable name
size_t name_len;
int64_t offset; // Stack offset (can be negative)
uint64_t size; // Size in bytes
uint8_t *type_info; // Serialized tinfo_t
size_t type_len;
};
Operand Info Entry (2192 bytes)
This one's large because it supports up to 8 operands per instruction:
struct op_info_entry {
int32_t chunk_index; // 0x00
int32_t offset; // 0x04
uint64_t flags; // 0x08
operand_info_t operands[8]; // 0x10: 8 operand slots × 272 bytes each
}; // Total: 0x890 (2192 bytes)
struct operand_info_t {
uint64_t addr; // 0x00
uint64_t value1; // 0x08
uint64_t value2; // 0x10
uint32_t flags; // 0x18
uint8_t reserved[244]; // 0x1C: Remaining fields (not fully mapped); 0x110 - 0x1C = 244
}; // Total: 0x110 (272 bytes)
Appendix C: Dazhbog Implementation References
If you're reading the Dazhbog source code, here's where to find things:
| Feature | File | Lines | Notes |
|---|---|---|---|
| Packet read/write | src/lumina.rs | 504-623 | Async framing |
| HELO parsing | src/lumina.rs | 150-197 | Version detection |
| Varint decode | src/metadata.rs | 58-114 | All formats |
| Varint encode | src/lumina.rs | 39-103 | pack_dd, pack_dq |
| Tag parsing | src/metadata.rs | 210-466 | Main entry point |
| Comment parsing | src/metadata.rs | 531-605 | Tags 5, 6 |
| Extra comment parsing | src/metadata.rs | 607-680 | Tag 7 |
| SP delta parsing | src/metadata.rs | 682-720 | Tag 8 |
| Frame parsing | src/metadata.rs | 770-828 | Tag 9 |
| Delta encoding | src/metadata.rs | 557-575 | Shared logic |
| Record storage | src/engine/segment.rs | 100-130 | LMN1 format |
| Server RPC dispatch | src/server.rs | 422-822 | Message handling |
Appendix D: Test Vectors
For validating your implementation.
Varint Encoding
| Value | pack_dd output | pack_dq output |
|---|---|---|
| 0 | 00 | 00 |
| 127 | 7F | 7F |
| 128 | 80 80 | 80 80 |
| 16383 | BF FF | BF FF |
| 16384 | C0 00 00 40 00 | C0 00 00 40 00 |
| 0xDEADBEEF | C0 DE AD BE EF | C0 DE AD BE EF |
| 0xFFFFFFFF | C0 FF FF FF FF | C0 FF FF FF FF |
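To run these against your own encoder, the pack_dd column can be turned into a table-driven check. This is a sketch: `PACK_DD_VECTORS` and `first_mismatch` are names made up here, and the encoder under test is passed in as a closure.

```rust
/// (value, expected pack_dd bytes) pairs copied from the table above.
pub const PACK_DD_VECTORS: &[(u32, &[u8])] = &[
    (0, &[0x00]),
    (127, &[0x7F]),
    (128, &[0x80, 0x80]),
    (16383, &[0xBF, 0xFF]),
    (16384, &[0xC0, 0x00, 0x00, 0x40, 0x00]),
    (0xDEADBEEF, &[0xC0, 0xDE, 0xAD, 0xBE, 0xEF]),
    (0xFFFFFFFF, &[0xC0, 0xFF, 0xFF, 0xFF, 0xFF]),
];

/// Returns the first value whose encoding disagrees with the table, if any.
pub fn first_mismatch(encode: impl Fn(u32) -> Vec<u8>) -> Option<u32> {
    PACK_DD_VECTORS
        .iter()
        .find(|(value, want)| encode(*value).as_slice() != *want)
        .map(|(value, _)| *value)
}
```

Passing your `pack_dd` as the closure should give `None`; anything else names the first value to debug.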
Metadata Blob (Minimal)
Hex: 00 00 00 00
Meaning: func_size=0, no tags
Hex: 10 00 00 00 03 05 68 65 6C 6C 6F
- 10 00 00 00: func_size=16 (little-endian)
- 03: tag=3
- 05: size=5
- 68 65 6C 6C 6F: "hello"
Hex: 20 00 00 00 05 08 00 10 06 68 65 6C 6C 6F
- 20 00 00 00: func_size=32 (little-endian)
- 05: tag=5
- 08: size=8 (the 8 bytes that follow)
- 00: chunk=0
- 10: offset=16
- 06: text_len=6 (includes null)
- 68 65 6C 6C 6F: "hello" (5 bytes, no null on the wire)
Frame Description
Hex: 09 05 80 99 00 01 00
- 09: tag=9
- 05: size=5 (the 5 bytes that follow)
- 80 99: frame_size=152 (+1 encoded as 153, two-byte varint)
- 00: args_size=-1 (+1 encoded as 0)
- 01: flags=0x0001
- 00: var_count=0
Methodology Note
All protocol details were obtained through static reverse engineering of libida.so (IDA Pro's core library) and the Lumina server binary. The decompilation output was... not pretty. Variable names like v91, v89, v188 don't exactly document themselves.
Decompile, trace data flow, capture traffic, build, test, fix, repeat. Dazhbog has processed millions of records at this point.
Acknowledgments
Thanks to everyone who's poked at Lumina before. The IDA SDK docs helped where they existed.
Last updated: December 2025
License: This document is provided for educational and interoperability purposes. IDA Pro is a trademark of Hex-Rays SA.