A comprehensive reverse-engineering effort to understand and document Apple's Rosetta 2 binary translation technology.
Table of Contents
- Background
- What is Rosetta?
- What is Rosetta 2?
- How Apple Delivers Rosetta 2 in macOS
- Technical Architecture
- This Project
- File Structure
- Usage
- Progress
- References
Background
The Architecture Transition
In November 2020, Apple announced their first Apple Silicon Macs, marking a historic transition from Intel x86_64 processors to their own ARM-based M1 chips. This was Apple's third major architecture transition:
- 1994: Motorola 68k -> PowerPC
- 2006: PowerPC -> Intel x86
- 2020: Intel x86_64 -> Apple Silicon (ARM64)
Each transition required a binary translation solution to run existing software during the migration period. Rosetta 2 is Apple's most sophisticated binary translation system yet.
What is Rosetta?
Rosetta (2006-2011) was Apple's dynamic binary translator for the PowerPC-to-Intel transition, enabling PowerPC applications to run on Intel-based Macs.
Key Features:
- Dynamic Translation: Translated PowerPC code to x86 at runtime
- OS Integration: Shipped with Mac OS X 10.4 (Tiger) through 10.6 (Snow Leopard)
- Transparent Operation: Users launched PowerPC apps normally
- Performance Overhead: Noticeably slower than native code; figures of 20-50% are often quoted, but the overhead varied by workload
Rosetta was removed in Mac OS X 10.7 (Lion), completing the Intel transition.
What is Rosetta 2?
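Rosetta 2 is Apple's binary translation technology that enables applications compiled for Intel x86_64 Macs to run on Apple Silicon (ARM64) Macs. It combines ahead-of-time (AOT) translation with just-in-time (JIT) translation for code that only appears at runtime.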
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ User Application (x86_64) │
├─────────────────────────────────────────────────────────────┤
│ Rosetta 2 Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Translator │ │ Runtime │ │ System Call │ │
│ │ (AOT/JIT) │ │ Library │ │ Translation │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ macOS Kernel (ARM64) │
├─────────────────────────────────────────────────────────────┤
│ Apple Silicon Hardware │
└─────────────────────────────────────────────────────────────┘
Key Technologies
- Ahead-of-Time (AOT) Translation
  - Translates x86_64 binaries to ARM64 before execution, at install time or on first launch
  - Stores translated code in a cache for faster subsequent launches
  - Reduces runtime overhead compared to pure JIT translation
- Just-in-Time (JIT) Translation
  - Translates code blocks on demand during execution
  - Handles dynamically loaded code and self-modifying code
  - Maintains a translation cache for efficiency
- Instruction Set Translation
  - x86_64 -> ARM64 instruction mapping
  - SSE/AVX -> NEON vector instruction translation (AVX support depends on the macOS version)
  - x86_64 flags -> ARM64 condition codes
- System Call Translation
  - Translates x86_64 macOS syscalls to ARM64 equivalents
  - Handles different calling conventions
  - Manages register state across syscall boundaries
- Runtime Support
  - CPU feature detection emulation
  - Thread-local storage handling
  - Signal and exception handling
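The flag mapping listed above is not one-to-one. As a minimal sketch (the type and function names are hypothetical and not taken from the Rosetta 2 binaries), the following C code computes the flags of a 64-bit subtraction in both conventions. The carry flag is the notable difference: x86_64 sets CF when a borrow occurs, while ARM64 sets C when no borrow occurs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container used only for this illustration. */
typedef struct {
    bool n; /* negative / sign */
    bool z; /* zero */
    bool c; /* carry */
    bool v; /* signed overflow */
} flags4_t;

/* ARM64 NZCV as produced by SUBS/CMP: C is set when NO borrow occurs. */
flags4_t arm64_flags_sub(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    flags4_t f;
    f.n = (r >> 63) != 0;
    f.z = (r == 0);
    f.c = (a >= b);
    /* Overflow: operands differ in sign and the result's sign differs from a. */
    f.v = (((a ^ b) & (a ^ r)) >> 63) != 0;
    return f;
}

/* x86_64 SF/ZF/CF/OF as produced by SUB/CMP: CF is set when a borrow occurs. */
flags4_t x86_flags_sub(uint64_t a, uint64_t b)
{
    flags4_t f = arm64_flags_sub(a, b);
    f.c = !f.c; /* only the carry convention differs */
    return f;
}

/* Self-check: 3 - 5 borrows, so the two carry flags disagree. */
void flags_selfcheck(void)
{
    assert(x86_flags_sub(3, 5).c == true);
    assert(arm64_flags_sub(3, 5).c == false);
}
```

A translator therefore has to invert the carry, or pick the opposite condition code, whenever guest code consumes CF after a subtraction or comparison.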
How Apple Delivers Rosetta 2 in macOS
Installation Location
Rosetta 2 is located at:
/Library/Apple/usr/libexec/oah/
├── rosetta # Main translator binary
├── rosettad # Rosetta daemon
└── librosetta.* # Runtime libraries
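The oah directory name is commonly read as "Old Architecture Hardware", although Apple does not document the abbreviation. The same name was already used for the original Rosetta during the PowerPC transition.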
Automatic Installation
On Apple Silicon Macs, Rosetta 2 is not installed by default. It can be installed in two ways:
- First Launch Prompt: when an Intel-only app is opened for the first time, macOS offers to install Rosetta with the message:
  The "Rosetta" software is not installed on your Mac. Rosetta translates apps from Intel-based Macs for use on Apple Silicon Macs.
- Command-Line Installation:
  softwareupdate --install-rosetta --agree-to-license
Components Delivered
| Component | Description |
|---|---|
| RosettaLinux/rosetta | Core ARM64 binary containing translation engine |
| RosettaLinux/rosettad | System daemon managing translation services |
| debugserver -> /usr/libexec/rosetta/debugserver | Debugging support for translated processes |
| libRosettaRuntime | Runtime library linked during translation |
| translate_tool -> /usr/libexec/rosetta/translate_tool | Translation tool for building translated binaries |
Integration with macOS
- launchd Integration: Rosetta daemon runs as a system service
- Code Signing: Translated binaries are code-signed automatically
- Gatekeeper: Rosetta-translated apps pass security checks
- System Integrity Protection: Protected from modification
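A process can ask the kernel whether it is currently being translated. The sketch below uses the `sysctl.proc_translated` key that Apple documents for this purpose; the function name `process_is_translated` is our own, and on non-Apple systems the sketch simply reports "not translated".

```c
#include <errno.h>
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Returns 1 if the calling process runs under Rosetta 2 translation,
 * 0 if it runs natively, and -1 if the status cannot be determined.
 */
int process_is_translated(void)
{
#ifdef __APPLE__
    int translated = 0;
    size_t size = sizeof(translated);
    if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) == -1) {
        /* ENOENT means the key does not exist, so nothing is translated. */
        return (errno == ENOENT) ? 0 : -1;
    }
    return translated ? 1 : 0;
#else
    /* The sysctl key is macOS-only; treat other systems as native. */
    return 0;
#endif
}
```

This is useful when comparing the behavior of the same x86_64 test binary on Intel and Apple Silicon hosts.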
Technical Architecture
Translation Process
┌──────────────────────────────────────────────────────────────────┐
│ Phase 1: Binary Loading │
│ ─────────────────────────────────────────────────────────────── │
│ 1. Load x86_64 Mach-O binary │
│ 2. Parse segments, sections, symbols │
│ 3. Validate code signatures │
│ 4. Map into translation context │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase 2: AOT Translation │
│ ─────────────────────────────────────────────────────────────── │
│ 1. Disassemble x86_64 code sections │
│ 2. Translate instructions to ARM64 │
│ 3. Apply optimizations │
│ 4. Store in translation cache (/var/db/oah) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase 3: Runtime Execution │
│ ─────────────────────────────────────────────────────────────── │
│ 1. Load translated ARM64 code │
│ 2. Set up x86_64 emulation context │
│ 3. Handle JIT translations for dynamic code │
│ 4. Translate syscalls on-the-fly │
└──────────────────────────────────────────────────────────────────┘
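As a rough illustration of the first step of Phase 1, the sketch below checks whether a buffer starts with a thin 64-bit Mach-O header for x86_64. The constants match Apple's `<mach-o/loader.h>` and `<mach/machine.h>` but are redefined with an `SK_` prefix so the code also builds on other systems; the function name `sk_needs_translation` is hypothetical, and fat (universal) binaries are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from Apple's Mach-O headers, repeated for portability. */
#define SK_MH_MAGIC_64     0xfeedfacfu
#define SK_CPU_TYPE_X86_64 0x01000007u
#define SK_CPU_TYPE_ARM64  0x0100000cu

/* Same field layout as struct mach_header_64. */
typedef struct {
    uint32_t magic;
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
} sk_mach_header_64;

/*
 * Returns 1 if buf starts with a thin 64-bit Mach-O header for x86_64
 * (a translation candidate), 0 otherwise. Assumes the host has the
 * same byte order as the file (little-endian on both Intel and ARM64 Macs).
 */
int sk_needs_translation(const void *buf, size_t len)
{
    sk_mach_header_64 h;
    assert(buf != NULL);
    if (len < sizeof(h))
        return 0;
    memcpy(&h, buf, sizeof(h)); /* copy to avoid unaligned reads */
    return h.magic == SK_MH_MAGIC_64 && h.cputype == SK_CPU_TYPE_X86_64;
}
```

A real loader would continue by walking the `ncmds` load commands that follow the header to find segments and the code signature.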
Key Translation Challenges
- Register Mapping
  - x86_64 has 16 GPRs; ARM64 has 31 GPRs
  - x86_64 flags register -> ARM64 NZCV flags
  - RIP (instruction pointer) emulation
- Memory Ordering
  - x86_64: Strong memory ordering (TSO)
  - ARM64: Weak memory ordering by default
  - Translated code must preserve TSO semantics, either with explicit memory barriers or, on Apple Silicon, with the hardware TSO mode that Rosetta 2 is reported to enable for translated threads
- Vector Instructions
  - SSE (128-bit) -> NEON (128-bit), a largely direct mapping
  - AVX (256-bit) -> emulation with pairs of 128-bit NEON registers
  - Different exception handling for SIMD
- Calling Conventions
  - x86_64: First 6 integer args in registers (RDI, RSI, RDX, RCX, R8, R9)
  - ARM64: First 8 integer args in registers (X0-X7)
  - Different stack frame layouts
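The calling-convention difference above can be sketched as a small marshaling step. The code below is an illustration only: `guest_regs_t`, `marshal_args`, and the register enum are hypothetical names, and the sketch covers integer register arguments but not stack or floating-point arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guest registers, indexed by their x86_64 encoding number. */
enum { G_RAX, G_RCX, G_RDX, G_RBX, G_RSP, G_RBP, G_RSI, G_RDI,
       G_R8, G_R9, G_R10, G_R11, G_R12, G_R13, G_R14, G_R15, G_COUNT };

/* Hypothetical guest register file. */
typedef struct { uint64_t gpr[G_COUNT]; } guest_regs_t;

/* System V AMD64 integer argument order. */
static const int sysv_arg_regs[6] = { G_RDI, G_RSI, G_RDX, G_RCX, G_R8, G_R9 };

/*
 * Copies the first nargs integer arguments of a guest call into an
 * array laid out like X0..X7 (AAPCS64 order). Returns the number copied.
 */
int marshal_args(const guest_regs_t *g, int nargs, uint64_t host_x[8])
{
    assert(g != NULL && host_x != NULL);
    assert(nargs >= 0 && nargs <= 6); /* stack arguments are not handled */
    for (int i = 0; i < nargs; i++)
        host_x[i] = g->gpr[sysv_arg_regs[i]];
    return nargs;
}
```

A step like this is only needed where translated code calls into native ARM64 code; calls between translated functions can keep the guest convention.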
This Project
This repository contains reverse-engineered implementations of functions from the Rosetta 2 binaries. Through careful analysis and decompilation, we've identified and documented the semantic purpose of hundreds of functions.
Goals
- Educational: Understand how Rosetta 2 works internally
- Documentation: Create comprehensive documentation of translation techniques
- Implementation: Provide clean, well-documented C implementations
- Community: Share knowledge with the reverse-engineering community
What We've Accomplished
- 828 functions identified and named in the main rosetta binary
- 612 functions fully implemented with clean C code
- 66 categories of functionality documented
- Complete function name mappings with semantic names
Categories of Functions
| Category | Functions | Description |
|---|---|---|
| Entry Point | 1 | Rosetta initialization |
| FP/Vector Operations | ~20 | Floating-point and SIMD state management |
| SIMD Memory Operations | ~10 | memchr, memcmp, memcpy with SIMD |
| Vector Operations | ~30 | NEON vector arithmetic, comparison |
| Binary Translation | ~50 | x86_64 -> ARM64 instruction translation |
| Syscall Handlers | ~60 | System call translation and forwarding |
| Memory Management | ~20 | malloc, free, mmap wrappers |
| Hash Functions | ~5 | Address hashing for translation cache |
| String Operations | ~30 | SIMD-optimized string functions |
| Cryptographic Extensions | ~30 | AES, SHA, CRC32 passthrough |
| ELF Parsing | ~15 | Linux binary format support |
| Translation Cache | ~20 | AOT/JIT cache management |
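To make the "Hash Functions" and "Translation Cache" rows more concrete, here is a minimal sketch of a cache that maps a guest (x86_64) block address to translated host code. It uses open addressing with linear probing and a multiplicative hash; the structure, the hash constant, and all `tc_` names are illustrative assumptions, not the layout recovered from the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TC_SLOTS 1024u /* must be a power of two */

typedef struct {
    uint64_t guest_pc;  /* x86_64 address of the block; 0 marks an empty slot */
    void    *host_code; /* translated ARM64 code for that block */
} tc_entry_t;

typedef struct { tc_entry_t slot[TC_SLOTS]; size_t used; } tc_t;

/* Multiplicative (Fibonacci) hash reduced to a slot index. */
static uint32_t tc_hash(uint64_t pc)
{
    return (uint32_t)((pc * 0x9E3779B97F4A7C15ull) >> 32) & (TC_SLOTS - 1);
}

/* Inserts or updates a mapping. Returns 0 on success, -1 if the table is full. */
int tc_insert(tc_t *tc, uint64_t guest_pc, void *host_code)
{
    assert(tc != NULL && guest_pc != 0);
    uint32_t i = tc_hash(guest_pc);
    for (uint32_t n = 0; n < TC_SLOTS; n++, i = (i + 1) & (TC_SLOTS - 1)) {
        tc_entry_t *e = &tc->slot[i];
        if (e->guest_pc == 0 || e->guest_pc == guest_pc) {
            if (e->guest_pc == 0)
                tc->used++;
            e->guest_pc = guest_pc;
            e->host_code = host_code;
            return 0;
        }
    }
    return -1;
}

/* Returns the translated code for guest_pc, or NULL on a miss. */
void *tc_lookup(const tc_t *tc, uint64_t guest_pc)
{
    assert(tc != NULL);
    if (guest_pc == 0)
        return NULL;
    uint32_t i = tc_hash(guest_pc);
    for (uint32_t n = 0; n < TC_SLOTS; n++, i = (i + 1) & (TC_SLOTS - 1)) {
        const tc_entry_t *e = &tc->slot[i];
        if (e->guest_pc == guest_pc)
            return e->host_code;
        if (e->guest_pc == 0)
            return NULL; /* an empty slot ends the probe chain */
    }
    return NULL;
}
```

On a miss, a JIT would translate the block at `guest_pc`, insert the result, and then jump to it.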
File Structure
Core Files
Rosetta2/
├── README.md # This file
├── rosetta_decomp.c # Original decompilation (74,677 lines)
├── rosettad_decomp.c # Daemon decompilation (44,064 lines)
├── rosetta_refactored.c # Minimal wrapper (59 lines) - includes modular headers
├── rosetta_refactored.c.legacy # Legacy refactored code (19,302 lines) - archived
├── rosetta_refactored.h # Main header (1,215 lines)
├── rosetta_refactored_complete.c # Single-file implementation (2,686 lines)
├── rosetta_function_map.h # Function name mapping (828 functions)
├── rosettad_refactored.c # Daemon-side refactoring
└── SESSION_*.md # Session logs (30+ sessions)
Modular Translation Infrastructure (56 C files + 65 H files)
The translation infrastructure is fully modularized into categorized components:
├── rosetta_types.h # Base type definitions
├── rosetta_x86_decode.h/.c # x86 decoder
├── rosetta_arm64_emit.h/.c # ARM64 emitter
├── rosetta_translate_alu.h/.c # ALU translations
├── rosetta_translate_memory.h/.c # Memory translations
├── rosetta_translate_branch.h/.c # Branch translations
├── rosetta_translate_bit.h/.c # Bit manipulation
├── rosetta_translate_string.h/.c # String operations
├── rosetta_translate_special.h/.c # Special instructions
├── rosetta_translate_block.h/.c # Block translation coordinator
├── rosetta_translate_dispatch.h/.c # Instruction dispatch
├── rosetta_trans_dispatch.h/.c # Main dispatch layer
├── rosetta_trans_alu.h/.c # ALU emulation layer
├── rosetta_trans_mem.h/.c # Memory emulation layer
├── rosetta_trans_branch.h/.c # Branch emulation layer
├── rosetta_trans_bit.h/.c # Bit emulation layer
├── rosetta_trans_string.h/.c # String emulation layer
├── rosetta_trans_special.h/.c # Special instruction emulation
├── rosetta_trans_system.h/.c # System instruction emulation
├── rosetta_trans_neon.h/.c # NEON emulation layer
└── Makefile.modular # Modular build system
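The decoder and emitter modules listed above cooperate roughly as in the following sketch, which translates a single register-to-register `ADD` from x86_64 into one ARM64 instruction word. This is a simplified illustration: the function names are hypothetical and are not the project's API, and guest register i is assumed to live in host register Xi, which is not necessarily the mapping Rosetta 2 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encodes ARM64 "ADDS Xd, Xn, Xm" (shifted register, LSL #0).
 * ADDS is chosen because x86_64 ADD always updates the flags.
 */
static uint32_t a64_adds_x(unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 31 && rn < 31 && rm < 31);
    return 0xAB000000u | (rm << 16) | (rn << 5) | rd;
}

/*
 * Translates one "ADD r/m64, r64" in register-direct form:
 * a REX.W prefix (0x48..0x4F), opcode 0x01, and ModRM with mod == 3.
 * Returns the number of guest bytes consumed, or 0 if not recognized.
 */
size_t translate_add_rr(const uint8_t *code, size_t len, uint32_t *out)
{
    assert(code != NULL && out != NULL);
    if (len < 3)
        return 0;
    uint8_t rex = code[0], op = code[1], modrm = code[2];
    if ((rex & 0xF8) != 0x48 || op != 0x01 || (modrm >> 6) != 3)
        return 0;
    unsigned src = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8u : 0u); /* REX.R extends reg */
    unsigned dst = (modrm & 7) | ((rex & 0x01) ? 8u : 0u);        /* REX.B extends r/m */
    *out = a64_adds_x(dst, dst, src);
    return 3;
}
```

For example, the bytes `48 01 D8` (`add rax, rbx`) become the word `0xAB030000` (`adds x0, x0, x3`) under the assumed register mapping. The ARM64 carry and overflow flags after an addition match the x86_64 definitions; the auxiliary and parity flags of x86_64 have no NZCV counterpart and would need separate handling.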
Functional Modules
Core infrastructure and support modules:
├── rosetta_jit.h/.c # JIT compilation infrastructure
├── rosetta_exec.h/.c # Execution engine (NEW)
├── rosetta_init.h/.c # Initialization & FP registers (NEW)
├── rosetta_codegen.h/.c # Code generation primitives
├── rosetta_cache.h/.c # Translation cache (AOT/JIT)
├── rosetta_transcache.h/.c # Translation cache management
├── rosetta_context.h/.c # CPU context management
├── rosetta_runtime.h/.c # Runtime support
├── rosetta_hash.h/.c # Address hashing
├── rosetta_memmgmt.h/.c # Memory management
├── rosetta_memory_utils.h/.c # Memory utilities
├── rosetta_utils.h/.c # Utility functions
├── rosetta_string_utils.h/.c # String utilities
├── rosetta_trans_helpers.h/.c # Translation helpers
├── rosetta_refactored_helpers.h/.c # Refactoring helpers
└── rosetta_refactored_types.h # Refactored type definitions
SIMD/Vector Modules
SIMD and vector operation modules:
├── rosetta_simd.h/.c # SIMD operations
├── rosetta_simd_mem.h/.c # SIMD memory operations
├── rosetta_simd_mem_helpers.h/.c # SIMD memory helpers
├── rosetta_vector.h/.c # Vector operations
├── rosetta_refactored_vector.h/.c # Refactored vector ops
├── rosetta_jit_emit.h/.c # JIT emission
├── rosetta_jit_emit_simd.h/.c # SIMD JIT emission
├── rosetta_fp_translate.h/.c # FP translation
├── rosetta_fp_helpers.h/.c # FP helpers
├── rosetta_trans_neon.c # NEON translation
└── rosetta_string_simd.c # SIMD string operations
Syscall Modules
Guest syscall handling and translation:
├── rosetta_syscalls.h/.c # Syscall translation core
├── rosetta_syscalls_impl.h/.c # Syscall implementations
└── rosetta_crypto.h/.c # Crypto instructions (AES, SHA, CRC32)
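As a sketch of what guest syscall translation involves, the code below remaps a Linux x86_64 syscall request to its Linux AArch64 counterpart, assuming a Linux guest as handled by the RosettaLinux binaries. The syscall numbers come from the respective Linux ABIs; the table is a small excerpt, and the types and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A syscall request: number plus up to six register arguments. */
typedef struct { long nr; long args[6]; } syscall_req_t;

typedef struct { long guest_nr; long host_nr; } sysno_map_t;

/* Linux syscall numbers: x86_64 (guest) -> AArch64 (host). */
static const sysno_map_t sysno_table[] = {
    {   0,  63 }, /* read */
    {   1,  64 }, /* write */
    {   3,  57 }, /* close */
    {  39, 172 }, /* getpid */
    {  60,  93 }, /* exit */
    { 231,  94 }, /* exit_group */
    { 257,  56 }, /* openat */
};

/*
 * Fills *host from *guest. The six arguments are copied unchanged:
 * the guest passes them in RDI, RSI, RDX, R10, R8, R9 and the host
 * expects them in X0..X5, so only the register assignment differs.
 * Returns 0 on success, -1 if the guest number is not in the table.
 */
int translate_syscall(const syscall_req_t *guest, syscall_req_t *host)
{
    assert(guest != NULL && host != NULL);
    for (size_t i = 0; i < sizeof sysno_table / sizeof sysno_table[0]; i++) {
        if (sysno_table[i].guest_nr == guest->nr) {
            *host = *guest;
            host->nr = sysno_table[i].host_nr;
            return 0;
        }
    }
    return -1;
}
```

Some guest syscalls have no direct host equivalent; for instance, x86_64 `open` (number 2) does not exist on AArch64 and has to be expressed through `openat`, which is why a plain lookup table is only a starting point.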
Additional Modules
Additional translation and support modules:
├── rosetta_translate.h/.c # Translation core
├── rosetta_translate_alu_impl.h/.c # ALU implementation details
├── rosetta_translate_memory_impl.h/.c # Memory implementation details
├── rosetta_translate_branch_impl.h/.c # Branch implementation details
├── rosetta_translate_special_impl.h/.c # Special implementation details
└── rosetta_arm64_insns.h # ARM64 instruction definitions
Test Files
├── test_jit.c # JIT unit tests (737 lines)
└── test_translate.c # Translation tests (1,059 lines)
Total: 56 C files and 65 header files, roughly 150,000 lines of code
Build System
Prerequisites
- GCC or Clang with C11 support
- macOS or Linux (POSIX-compatible system)
Building the Library
# Build the static library using the modular Makefile
make -f Makefile.modular all
# This creates the librosetta.a static library
Building Individual Components
# Compile core modules
# (rosetta_types.h is a header pulled in via #include and needs no compile step)
gcc -c -I. -std=c11 rosetta_codegen.c
gcc -c -I. -std=c11 rosetta_jit.c
gcc -c -I. -std=c11 rosetta_x86_decode.c
gcc -c -I. -std=c11 rosetta_arm64_emit.c
# Compile translation modules
gcc -c -I. -std=c11 rosetta_translate_alu.c
gcc -c -I. -std=c11 rosetta_translate_memory.c
gcc -c -I. -std=c11 rosetta_translate_branch.c
gcc -c -I. -std=c11 rosetta_translate_block.c
gcc -c -I. -std=c11 rosetta_translate_dispatch.c
# Compile support modules
gcc -c -I. -std=c11 rosetta_cache.c
gcc -c -I. -std=c11 rosetta_context.c
gcc -c -I. -std=c11 rosetta_syscalls.c
gcc -c -I. -std=c11 rosetta_runtime.c
Running Tests
# Build and run JIT tests
make -f Makefile.modular test_jit
# Build and run translation tests
make -f Makefile.modular test_translate
Using as a Library
# Link against the static library
gcc -o my_app my_app.c -L. -lrosetta
# Or compile with source files directly
gcc -I. -o my_app my_app.c rosetta_*.c
Decompiled Source File Analysis
The original decompiled file rosetta_decomp.c contains string literals that reveal the original source code structure. These file names appear in assertion/error messages throughout the binary.
Note: The refactored code uses a different, more modular structure than the original.
Header Files (.h)
- Register.h
- TaggedPointer.h
- RedBlackTree.h
- TransactionalList.h
- Translator.h
- AssemblerBuffer.h
- BuilderBase.h
- IrBuilder_x86.h
C++ Source Files (.cpp)
- Repatch.cpp
- Decoder.cpp
- Fixup.cpp
- AssemblerHelpers.cpp
- Operand.cpp
- Opcode.cpp
- BasicBlock.cpp
- ThreadContextFcntl.cpp
- InitStack.cpp
- Thread.cpp
- ThreadContext.cpp
- ThreadContextRuntimeSignals.cpp
- ThreadContextVm.cpp
- VMAllocationTracker.cpp
- Vdso.cpp
- ProcMapsParser.cpp
- ThreadContextSignals.cpp
- ThreadContextSyscalls.cpp
- TranslationCacheAot.cpp
- TranslationCacheJit.cpp
- TranslationCache.cpp
- Translator.cpp
- RuntimeLibraryBridgeInternal.cpp
- TwoLevelOffsetMap.cpp
- DeltaCodedOffsetMap.cpp
C++ Header Files (.hpp)
- AssemblerBase.hpp
- TranslatorBase.hpp
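File names like the ones listed in this section can be recovered by scanning a binary for printable runs that end in a source-file extension, similar to running `strings` and filtering the output. The following is a minimal sketch of that idea; `find_source_names` and `NAME_MAX_LEN` are hypothetical names, and the character set accepted in a file name is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 64

/* Characters assumed to be valid inside a source file name. */
static int is_name_char(unsigned char c)
{
    return isalnum(c) || c == '_' || c == '.';
}

/* Returns 1 if the n-byte string s ends with suf and has a non-empty stem. */
static int has_suffix(const char *s, size_t n, const char *suf)
{
    size_t m = strlen(suf);
    return n > m && memcmp(s + n - m, suf, m) == 0;
}

/*
 * Scans buf for runs of name characters ending in ".cpp", ".hpp" or ".h"
 * and copies up to max_names of them into names. Returns the number found.
 */
size_t find_source_names(const unsigned char *buf, size_t len,
                         char names[][NAME_MAX_LEN], size_t max_names)
{
    size_t found = 0, i = 0;
    assert(buf != NULL && names != NULL);
    while (i < len && found < max_names) {
        size_t start = i;
        while (i < len && is_name_char(buf[i]))
            i++;
        size_t n = i - start;
        const char *s = (const char *)buf + start;
        if (n > 0 && n < NAME_MAX_LEN &&
            (has_suffix(s, n, ".cpp") || has_suffix(s, n, ".hpp") ||
             has_suffix(s, n, ".h"))) {
            memcpy(names[found], s, n);
            names[found][n] = '\0';
            found++;
        }
        if (n == 0)
            i++; /* skip a byte that cannot be part of a name */
    }
    return found;
}
```

The same name usually appears many times in assertion messages, so a practical tool would also deduplicate the results.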
Progress
Current Status
| Metric | Value |
|---|---|
| Total Functions | 828 |
| Functions Mapped | 828 (100%) |
| Functions Implemented | 600+ |
| Completion | ~75% |
| Categories Documented | 66 |
| Source Files | 56 C + 65 H |
| Total Lines of Code | ~150,000+ |
Modular Architecture (Complete)
The translation infrastructure is fully modularized into the following components:
| Module Category | Files | Description |
|---|---|---|
| Core Types | rosetta_types.h | Base type definitions |
| Execution Engine | rosetta_exec.h/.c | execute_translated, context switching |
| Initialization | rosetta_init.h/.c | init_translation_env, FP registers |
| x86 Decoding | rosetta_x86_decode.h/.c | x86_64 instruction decoder |
| ARM64 Emission | rosetta_arm64_emit.h/.c | ARM64 code emission |
| Code Generation | rosetta_codegen.h/.c | Code generation primitives |
| JIT Core | rosetta_jit.h/.c | JIT compilation infrastructure |
| Translation Cache | rosetta_cache.h/.c, rosetta_transcache.h/.c | Block caching (AOT/JIT) |
| Block Translation | rosetta_translate_block.h/.c | Basic block translation |
| Instruction Dispatch | rosetta_trans_dispatch.h/.c | Instruction dispatching |
| ALU Translation | rosetta_translate_alu.h/.c, rosetta_trans_alu.h/.c | Arithmetic/logic ops |
| Memory Translation | rosetta_translate_memory.h/.c, rosetta_trans_mem.h/.c | Load/store operations |
| Branch Translation | rosetta_translate_branch.h/.c, rosetta_trans_branch.h/.c | Control flow |
| Bit Translation | rosetta_translate_bit.h/.c, rosetta_trans_bit.h/.c | Bit manipulation |
| String Translation | rosetta_translate_string.h/.c, rosetta_trans_string.h/.c | String operations |
| Special Translation | rosetta_translate_special.h/.c, rosetta_trans_special.h/.c | Special instructions |
| System Translation | rosetta_trans_system.h/.c | System registers |
| NEON Translation | rosetta_trans_neon.c | SIMD/NEON operations |
| SIMD Ops | rosetta_simd.h/.c, rosetta_simd_mem.h/.c | SIMD operations |
| Vector Ops | rosetta_vector.h/.c | Vector operations |
| FP Translation | rosetta_fp_translate.h/.c, rosetta_fp_helpers.h/.c | Floating-point |
| JIT Emit | rosetta_jit_emit.h/.c, rosetta_jit_emit_simd.h/.c | JIT emission |
| Syscalls | rosetta_syscalls.h/.c, rosetta_syscalls_impl.h/.c | Syscall handling |
| Crypto | rosetta_crypto.h/.c | AES, SHA, CRC32 |
| Context | rosetta_context.h/.c | CPU context save/restore |
| Runtime | rosetta_runtime.h/.c | Runtime entry point |
| Memory Mgmt | rosetta_memmgmt.h/.c | Memory management |
| Utilities | rosetta_utils.h/.c, rosetta_string_utils.h/.c | Utility functions |
Total: 40+ modular components
Translation Coverage
| Category | Instructions |
|---|---|
| ALU | ADD, SUB, AND, OR, XOR, MUL, DIV, INC, DEC, NEG, NOT, SHL, SHR, SAR, ROL, ROR |
| Memory | MOV, MOVZX, MOVSX, MOVSXD, LEA, PUSH, POP, CMP, TEST |
| Branch | Jcc, JMP, CALL, RET, CMOVcc, SETcc, XCHG |
| Bit | BSF, BSR, POPCNT, BT, BTS, BTR, BTC |
| String | MOVS, STOS, LODS, CMPS, SCAS |
| Special | CPUID, RDTSC, SHLD, SHRD, CWD, CDQ, CQO, CLI, STI, NOP |
| SIMD | SSE, SSE2, SSE3, SSSE3, SSE4.x |
| FP | x87, SSE scalar FP |
| Crypto | AES-NI, SHA, CRC32 |
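Some instructions in the table have no single ARM64 counterpart. As an illustration, BSF and BSR from the "Bit" row can be expressed with the ARM64 RBIT and CLZ instructions. The sketch below models those two instructions in portable C, so it describes the semantics rather than the emitted code; the `emu_` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the ARM64 RBIT instruction (reverses bit order). */
static uint64_t rbit64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Portable stand-in for the ARM64 CLZ instruction; returns 64 for 0. */
static unsigned clz64(uint64_t x)
{
    unsigned n = 0;
    for (uint64_t bit = 1ull << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/*
 * BSF: index of the lowest set bit, computed as CLZ(RBIT(x)).
 * For a zero source x86_64 sets ZF and leaves the destination
 * undefined, so the caller has to handle that case separately.
 */
unsigned emu_bsf(uint64_t x)
{
    assert(x != 0);
    return clz64(rbit64(x));
}

/* BSR: index of the highest set bit, computed as 63 - CLZ(x). */
unsigned emu_bsr(uint64_t x)
{
    assert(x != 0);
    return 63u - clz64(x);
}
```

For the input 0x90 (binary 1001 0000), `emu_bsf` yields 4 and `emu_bsr` yields 7.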
Recent Sessions
| Session | Focus | Files Created/Modified |
|---|---|---|
| 61+ | Full Modularization | 35+ modular components |
| 61 | Translation Modularization | 6 translation modules + x86_decode enhancements |
| 60 | Translation Infrastructure | translate_block() core implementation |
| 59 | Syscall Implementation | Additional syscall handlers |
| 58 | Syscall Translation | I/O vector and network handlers |
| 57 | Memory Management | VM allocation tracker enhancements |
| 56 | SIMD Operations | Advanced SIMD translations |
| 55 | FP/SIMD | Floating-point instruction translation |
| 54 | Crypto Extensions | AES-NI passthrough implementation |
| 53 | Crypto Extensions | SHA and CRC32 instructions |
| 52 | String Operations | SIMD-optimized string functions |
| 51 | Vector Operations | NEON vector arithmetic |
| 50 | Vector Conversions | Floating-point conversions |
| 49 | Translation Cache | AOT/JIT cache management |
| 48 | JIT Core | JIT compilation infrastructure |
| 46-47 | Code Generation | x86_64 code generation helpers |
| 45 | Decode Helpers | ARM64 decode utilities |
References
Official Apple Documentation
Technical Resources
Related Projects
- FEX-Emu - Linux x86_64 on ARM64 emulator
- QEMU - Generic machine emulator
- Rosetta Linux - Community research project
Disclaimer
This project is for educational and research purposes only.
- Rosetta 2 is proprietary Apple software
- This project does not distribute Apple's binaries
- All code in this repository is written by Claude Code with Qwen 3.5.
- Do not use this project to circumvent Apple's security measures
License
MIT License - See LICENSE file for details.
Contributing
Contributions are welcome! Areas of interest:
- Implementing remaining functions
- Improving documentation
- Adding test cases
- Performance analysis
- Architecture diagrams
Last updated: February 2026