JQ-By-Example
AI-Powered JQ Filter Synthesis Tool
JQ-By-Example automatically generates jq filter expressions from input/output JSON examples using LLM-powered synthesis with iterative refinement.
Overview
JQ-By-Example solves a common developer problem: you know what JSON transformation you want, but writing the correct jq filter is tricky. Simply provide example input/output pairs, and JQ-By-Example will synthesize the filter for you.
Key Features:
- LLM-Powered Generation - Uses OpenAI, Anthropic, or compatible APIs to generate filter candidates
- Iterative Refinement - Automatically improves filters based on algorithmic feedback
- Verified Correctness - Executes filters against a real jq binary to verify outputs
- Detailed Diagnostics - Classifies errors (syntax, shape, missing keys, order) with partial scoring
- Safe Execution - Sandboxed jq execution with timeouts and output limits
- Production-Ready - Comprehensive edge-case handling, security auditing, structured logging
Installation
Prerequisites
- Python 3.10 or higher
- jq binary installed and available in PATH:
```shell
# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Windows (with Chocolatey)
choco install jq
```
Install JQ-By-Example
```shell
git clone https://github.com/nulone/jq-by-example.git
cd jq-by-example
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
```
Quick Start
Interactive Mode
Synthesize a filter from a single input/output example:
```shell
jq-by-example \
  --input '{"user": {"name": "Alice", "age": 30}}' \
  --output '"Alice"' \
  --desc "Extract the user's name"
```
Output:
```text
============================================================
[1/1] Solving: interactive
  Description: Extract the user's name
  Examples: 1
  Max iterations: 10
============================================================
✓ Task: interactive
  Filter: .user.name
  Score: 1.000
  Iterations: 1
  Time: 2.34s
============================================================
OVERALL SUMMARY
============================================================
Tasks: 1/1 passed (100.0%)
Total time: 2.34s
Average time per task: 2.34s
============================================================
```
Batch Mode
Run predefined tasks from a file:
```shell
# Run a specific task
jq-by-example --task nested-field

# Run all tasks
jq-by-example --task all

# With verbose output (shows iteration details)
jq-by-example --task all --verbose
```
CLI Options
```text
usage: jq-by-example [-h] [-t TASK] [--tasks-file TASKS_FILE] [--max-iters MAX_ITERS]
                     [--baseline] [-i INPUT] [-o OUTPUT] [-d DESC]
                     [--provider {openai,anthropic}] [--model MODEL] [--base-url BASE_URL]
                     [-v] [--debug]

AI-Powered JQ Filter Synthesis Tool

options:
  -h, --help            Show this help message and exit

Task Selection:
  -t TASK, --task TASK  Task ID to run, or 'all' to run all tasks
  --tasks-file TASKS_FILE
                        Path to tasks JSON file (default: data/tasks.json)

Iteration Control:
  --max-iters MAX_ITERS
                        Maximum iterations per task (default: 10)
  --baseline            Single-shot mode (max_iterations=1, no refinement)

Interactive Mode:
  -i INPUT, --input INPUT
                        Input JSON for interactive mode
  -o OUTPUT, --output OUTPUT
                        Expected output JSON for interactive mode
  -d DESC, --desc DESC  Task description for interactive mode

LLM Provider:
  --provider {openai,anthropic}
                        LLM provider type (default: from LLM_PROVIDER env or 'openai')
  --model MODEL         Model identifier (default: from LLM_MODEL env or provider default)
  --base-url BASE_URL   Base URL for OpenAI-compatible providers (default: from LLM_BASE_URL env)

Output Control:
  -v, --verbose         Enable verbose output (shows iteration details)
  --debug               Enable debug logging (shows detailed internal state)
```
Usage Examples
```shell
# Interactive mode - simple field extraction
jq-by-example -i '{"x": 42}' -o '42' -d 'Extract x'

# Interactive mode - array filtering
jq-by-example -i '[1,2,3,4,5]' -o '[2,4]' -d 'Keep only even numbers'

# Interactive mode - nested object access
jq-by-example \
  -i '{"data": {"users": [{"name": "Alice"}]}}' \
  -o '["Alice"]' \
  -d 'Extract all user names'

# Batch mode - run specific task
jq-by-example --task nested-field

# Batch mode - all tasks with verbose output
jq-by-example --task all --verbose

# Single-shot mode (no refinement) for baseline comparison
jq-by-example --task nested-field --baseline

# Custom tasks file
jq-by-example --task my-task --tasks-file my-tasks.json

# Debug mode for troubleshooting
jq-by-example --task nested-field --debug

# Limit iterations
jq-by-example --task filter-active --max-iters 5

# Use Anthropic provider
jq-by-example --provider anthropic --task nested-field

# Use specific model
jq-by-example --model gpt-4o-mini --task nested-field

# Use OpenRouter
jq-by-example --base-url https://openrouter.ai/api/v1 --model anthropic/claude-3.5-sonnet --task nested-field

# Use local Ollama
jq-by-example --base-url http://localhost:11434/v1 --model llama3 --task nested-field
```
How It Works
JQ-By-Example uses a deterministic oracle approach:
1. Generation: An LLM (GPT-4, Claude, or a compatible model) generates candidate jq filters based on your examples and description
2. Verification: Each filter is executed against the real jq binary with your input examples
3. Scoring: A deterministic algorithm compares actual vs. expected outputs, computing similarity scores (0.0 to 1.0)
4. Feedback: The algorithm classifies errors (syntax, shape, missing/extra elements, order) and generates actionable feedback
5. Refinement: The LLM receives the feedback and generates an improved filter
6. Iteration: Steps 2-5 repeat until a perfect match is found or limits are reached
This hybrid approach combines LLM creativity with deterministic verification, ensuring correctness while leveraging AI for filter synthesis.
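The loop above can be sketched in a few lines of Python. Note that `generator` and `reviewer` here are stand-in callables used for illustration, not the project's actual interfaces:

```python
def synthesize(task, generator, reviewer, max_iters=10):
    """Generate-verify-refine loop: keep the best-scoring filter seen so far."""
    best_filter, best_score, history = None, 0.0, []
    for _ in range(max_iters):
        candidate = generator(task, history)         # step 1: LLM proposes a filter
        score, feedback = reviewer(task, candidate)  # steps 2-4: run jq, score, diagnose
        history.append((candidate, score, feedback))
        if score > best_score:
            best_filter, best_score = candidate, score
        if best_score == 1.0:                        # perfect match: stop early
            break
    return best_filter, best_score, history
```

Because the reviewer is deterministic, a perfect score is a hard guarantee that the filter reproduced every expected output, regardless of what the LLM claims.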
Architecture
JQ-By-Example follows a modular architecture with clear separation of concerns:
```text
┌─────────┐
│   CLI   │  Entry point, argument parsing, output formatting
└────┬────┘
     │
     ▼
┌──────────────┐
│ Orchestrator │  Manages synthesis loop, tracks progress
└──┬─────────┬─┘
   │         │
   ▼         ▼
┌─────────┐  ┌──────────┐
│Generator│  │ Reviewer │  Filter evaluation & scoring
│  (LLM)  │  └────┬─────┘
└─────────┘       │
                  ▼
             ┌──────────┐
             │ Executor │  Sandboxed jq execution
             └──────────┘
```
Components
1. CLI (src/cli.py)
- Parses command-line arguments
- Loads tasks from JSON files
- Formats and displays results with progress indicators
- Tracks timing and generates summaries
2. Orchestrator (src/orchestrator.py)
- Manages the iterative refinement loop
- Coordinates between Generator and Reviewer
- Implements anti-stuck protocols:
- Duplicate filter detection (normalized)
- Stagnation detection (no improvement for N iterations)
- Max iteration limit
- Tracks best solution and complete history
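The anti-stuck checks might be implemented along these lines. This is an illustrative sketch; the class name and thresholds are assumptions, not the project's API:

```python
class AntiStuck:
    """Tracks duplicate filters and score stagnation across iterations."""

    def __init__(self, stagnation_limit=3):
        self.seen = set()
        self.best_score = 0.0
        self.stale_iters = 0
        self.stagnation_limit = stagnation_limit

    def is_duplicate(self, jq_filter):
        # Normalize whitespace so ".a  |  .b" and ".a | .b" count as the same filter
        key = " ".join(jq_filter.split())
        if key in self.seen:
            return True
        self.seen.add(key)
        return False

    def record(self, score):
        """Return True once no improvement has been seen for N iterations."""
        if score > self.best_score:
            self.best_score, self.stale_iters = score, 0
        else:
            self.stale_iters += 1
        return self.stale_iters >= self.stagnation_limit
```

Either signal (duplicate candidate or a stalled best score) tells the orchestrator to stop early rather than burn API calls on a loop that is going nowhere.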
3. Generator (src/generator.py)
- Interfaces with LLM providers (OpenAI, Anthropic, or compatible APIs)
- Builds prompts with task description, examples, and feedback history
- Extracts clean filter code from LLM responses
- Implements retry logic with exponential backoff
- Includes security features (API key never logged, input truncation)
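Retry with exponential backoff typically looks like the following. This is a generic sketch under the assumption that transient failures surface as `ConnectionError`; the project's exact retry code may differ:

```python
import random
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Call a flaky function, sleeping base_delay * 2**n (plus jitter) between tries."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The small random jitter keeps many concurrent clients from retrying in lockstep against an already struggling endpoint.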
4. Reviewer (src/reviewer.py)
- Evaluates generated filters against examples
- Computes similarity scores using:
- Jaccard similarity for lists
- Key/value matching for objects
- Exact matching for scalars
- Classifies errors by priority (SYNTAX → SHAPE → MISSING_EXTRA → ORDER)
- Generates actionable feedback for refinement
5. Executor (src/executor.py)
- Safely executes jq binary in subprocess
- Enforces resource limits (timeout, output size)
- Prevents shell injection (uses argument list, not shell)
- Handles jq errors and timeouts gracefully
6. Domain (src/domain.py)
- Defines core data structures (Task, Example, Attempt, Solution)
- Uses frozen dataclasses for immutability
- Type-safe with full type hints
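A sketch of what such frozen dataclasses might look like. The field names here are inferred from the task file format shown later in this README, not copied from `src/domain.py`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    input: object
    expected_output: object

@dataclass(frozen=True)
class Task:
    id: str
    description: str
    examples: tuple[Example, ...]  # tuple, not list, so the field stays immutable

@dataclass(frozen=True)
class Attempt:
    filter: str
    score: float
    feedback: str
```

`frozen=True` makes any assignment after construction raise `FrozenInstanceError`, which is what lets the orchestrator hand these objects around without defensive copying.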
Data Flow
1. User provides a task (JSON examples + description) via the CLI
2. CLI loads and validates the task, initializes components
3. Orchestrator starts the synthesis loop:
   - Iteration 1: calls the Generator with the task only
   - Generator queries the LLM API for a filter candidate
   - Reviewer evaluates the filter using the Executor
   - Executor runs the jq binary with the filter on each example
   - Reviewer computes scores and generates feedback
   - Iteration 2+: the Generator receives the history and feedback
   - The loop continues until a perfect match or limits are reached
4. Orchestrator returns a Solution with the best filter, score, and history
5. CLI displays formatted results with timing information
Error Classification
The reviewer classifies errors by priority (highest to lowest):
| Error Type | Description | Example | Score |
|---|---|---|---|
| `SYNTAX` | Invalid jq filter syntax | `invalid[[[` | 0.0 |
| `SHAPE` | Wrong output type | Expected `[]`, got `{}` | 0.0 |
| `MISSING_EXTRA` | Missing or extra elements/keys | Expected `[1,2,3]`, got `[1,2]` | 0.67 (Jaccard) |
| `ORDER` | Correct elements, wrong order | Expected `[1,2,3]`, got `[3,2,1]` | 0.8 |
| `NONE` | Perfect match | - | 1.0 |
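One way the priority order might translate into code. This is a simplified sketch; the project's actual classifier may be more nuanced:

```python
from enum import Enum

class ErrorKind(Enum):
    SYNTAX = "syntax"
    SHAPE = "shape"
    MISSING_EXTRA = "missing_extra"
    ORDER = "order"
    NONE = "none"

def classify(expected, actual, jq_failed=False):
    """Check error classes from highest to lowest priority."""
    if jq_failed:                            # jq itself rejected the filter
        return ErrorKind.SYNTAX
    if type(expected) is not type(actual):   # e.g. expected a list, got a dict
        return ErrorKind.SHAPE
    if expected == actual:
        return ErrorKind.NONE
    if isinstance(expected, list) and sorted(map(repr, expected)) == sorted(map(repr, actual)):
        return ErrorKind.ORDER               # same elements, different order
    return ErrorKind.MISSING_EXTRA
```

Checking the classes in priority order matters: a filter that fails to parse should be reported as a syntax error, not as "missing elements", so the LLM gets the most actionable feedback first.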
Scoring Algorithm
- Lists: Jaccard similarity = `|intersection| / |union|`
  - Special case: correct elements, wrong order = 0.8
- Dicts: `(key_similarity + value_match_ratio) / 2`
- Scalars: binary (1.0 for exact match, 0.0 for mismatch)
- Multiple examples: arithmetic mean of scores
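A sketch of these scoring rules in Python. Elements are canonicalized with `json.dumps` so unhashable values (dicts, lists) can go into sets; this is illustrative, not the project's exact code:

```python
import json

def _canon(values):
    """Canonicalize JSON values into hashable strings for set operations."""
    return {json.dumps(v, sort_keys=True) for v in values}

def score_list(expected, actual):
    e, a = _canon(expected), _canon(actual)
    if not e and not a:
        return 1.0
    if e == a and expected != actual:
        return 0.8                        # right elements, wrong order
    return len(e & a) / len(e | a)        # Jaccard similarity

def score_dict(expected, actual):
    keys_e, keys_a = set(expected), set(actual)
    if not keys_e and not keys_a:
        return 1.0
    key_sim = len(keys_e & keys_a) / len(keys_e | keys_a)
    matches = sum(expected[k] == actual[k] for k in keys_e & keys_a)
    value_ratio = matches / max(len(keys_e), 1)
    return (key_sim + value_ratio) / 2
```

For example, `score_list([1,2,3], [1,2])` gives 2/3 ≈ 0.67, matching the `MISSING_EXTRA` row in the table above, while `score_list([1,2,3], [3,2,1])` triggers the order special case and returns 0.8.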
Supported jq Patterns
JQ-By-Example works well with these common jq operations:
- Field extraction: `.foo`, `.user.name`, `.data.items[0]`
- Array operations: `.[]`, `.[0]`, `.[1:3]`, `.[-1]`
- Filtering: `select(.active == true)`, `select(.age > 18)`
- Mapping: `map(.name)`, `[.[] | .id]`
- Array construction: `[.items[].name]`
- Object construction: `{name: .user.name, email: .user.email}`
- Conditionals: `if .status == "active" then .name else null end`
- Null handling: `select(. != null)`, `.field // "default"`
- String operations: string interpolation, concatenation
- Arithmetic: addition, subtraction, comparison operators
- Type checking: `type`, `length`
Known Limitations
JQ-By-Example may struggle with these advanced jq features:
- Aggregations: `group_by()`, `reduce`, `min_by()`, `max_by()`
- Complex recursion: `recurse()`, `walk()`
- Variable bindings: complex `as $var` patterns
- Custom functions: `def` statements (blocked for security)
- Advanced array operations: `combinations()`, `transpose()`
- Path manipulation: `getpath()`, `setpath()`, `delpaths()`
- Format strings: `@csv`, `@json`, `@base64`
For these cases, you may need to write the filter manually or break down the task into simpler steps.
Model recommendations
| Task complexity | Recommended model | Speed |
|---|---|---|
| Simple filters (extract, select) | GPT-4o-mini, Claude Haiku | Fast |
| Medium (grouping, aggregation, recursion) | Claude Sonnet, GPT-4o | Fast |
| Complex algorithms (graph traversal, sorting) | DeepSeek R1 | Slow (minutes) |
Note: DeepSeek R1 solved topological sort and Dijkstra's shortest path in jq. Most users won't need this; standard models handle 95%+ of real-world tasks.
Supported Providers
| Provider | Status | Note |
|---|---|---|
| OpenAI | Stable | Default provider |
| Anthropic | Beta | Different API format |
| OpenRouter | Tested | OpenAI-compatible |
| Ollama | Alpha | Local only, requires setup |
Note: OpenAI is the default and most-tested provider. The others should work, but please report any issues you find.
Provider Setup
OpenAI (Default)
```shell
export OPENAI_API_KEY='sk-...'

# Optional: specify model (default: gpt-4o)
export LLM_MODEL='gpt-4o'
```
Anthropic
```shell
export LLM_PROVIDER='anthropic'
export ANTHROPIC_API_KEY='sk-ant-...'

# Optional: specify model (default: claude-sonnet-4-20250514)
export LLM_MODEL='claude-sonnet-4-20250514'
```
OpenRouter
```shell
export LLM_BASE_URL='https://openrouter.ai/api/v1'
export OPENAI_API_KEY="$OPENROUTER_API_KEY"
export LLM_MODEL='anthropic/claude-3.5-sonnet'
```
Note: Set the `OPENROUTER_API_KEY` environment variable to your OpenRouter API key before running.
Local (Ollama)
```shell
export LLM_BASE_URL='http://localhost:11434/v1'
export LLM_MODEL='llama3'
export OPENAI_API_KEY='dummy'  # Ollama doesn't require a real key
```
Together AI / Groq
```shell
# Together AI
export LLM_BASE_URL='https://api.together.xyz/v1'
export OPENAI_API_KEY='...'

# Groq
export LLM_BASE_URL='https://api.groq.com/openai/v1'
export OPENAI_API_KEY='gsk_...'
```
Task File Format
Tasks are defined in JSON format:
```json
{
  "tasks": [
    {
      "id": "nested-field",
      "description": "Extract the user's name from a nested object structure",
      "examples": [
        {
          "input": {"user": {"name": "Alice", "age": 30}},
          "expected_output": "Alice"
        },
        {
          "input": {"user": {"name": "Bob", "email": "bob@example.com"}},
          "expected_output": "Bob"
        }
      ]
    }
  ]
}
```
Guidelines for Good Tasks
- Provide 3+ examples for better generalization
- Include edge cases: empty arrays, null values, missing fields
- Be specific in descriptions: "Extract user names" vs "Transform data"
- Use diverse inputs: different structures help the LLM understand the pattern
- Test edge cases: null, empty arrays/objects, deeply nested (3+ levels), special characters in keys
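A minimal loader that enforces the structural parts of the format above can make the expected shape concrete. The `load_tasks` helper here is hypothetical, shown only for illustration:

```python
import json

def load_tasks(path):
    """Load a tasks file and validate the structure shown above."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    tasks = data.get("tasks")
    if not isinstance(tasks, list) or not tasks:
        raise ValueError("'tasks' must be a non-empty list")
    for task in tasks:
        for key in ("id", "description", "examples"):
            if key not in task:
                raise ValueError(f"task missing required key: {key!r}")
        for example in task["examples"]:
            if "input" not in example or "expected_output" not in example:
                raise ValueError(f"bad example in task {task['id']}")
    return tasks
```

Validating up front means a typo like `expected_ouput` fails loudly at load time instead of silently scoring every candidate filter as wrong.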
Built-in Tasks
The `data/tasks.json` file includes these example tasks:
| Task ID | Description | Difficulty | Expected Filter |
|---|---|---|---|
| `nested-field` | Extract `.user.name` | Easy | `.user.name` |
| `filter-active` | Filter where `active == true` | Medium | `[.[] \| select(.active == true)]` |
| `extract-emails` | Extract emails, skip null/missing | Medium | `[.[].email \| select(. != null)]` |
Troubleshooting
"jq binary not found"
Problem: JQ-By-Example can't locate the jq executable.
Solution: Ensure jq is installed and in your PATH:
```shell
# Check if jq is installed
which jq

# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Verify installation
jq --version
```
"API key required"
Problem: Missing API key environment variable.
Solution: Set the appropriate API key for your provider:
```shell
# For OpenAI
export OPENAI_API_KEY='sk-...'

# For Anthropic
export ANTHROPIC_API_KEY='sk-ant-...'

# Or use the generic variable
export LLM_API_KEY='...'

# Permanent (add to ~/.bashrc or ~/.zshrc)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.bashrc
source ~/.bashrc
```
"API request failed: DNS resolution failed"
Problem: DNS resolution failed for the API endpoint.
Solution:
- Check your internet connection
- Verify the API endpoint is correct:

  ```shell
  # For OpenAI
  curl -I https://api.openai.com/v1/chat/completions

  # For Anthropic
  curl -I https://api.anthropic.com/v1/messages
  ```

- If using a custom endpoint, check `LLM_BASE_URL`:

  ```shell
  export LLM_BASE_URL='https://api.openai.com/v1'
  ```
"API request timed out"
Problem: The API request exceeded its 60-second timeout, usually due to connection issues or server-side problems.
Solution:
- Check your internet connection
- Try again (transient network issues)
- Check your provider's service status
- Reduce task complexity (fewer examples, simpler description)
"Connection failed after 3 attempts"
Problem: Multiple retry attempts failed.
Solution:
- Verify the API endpoint is reachable:

  ```shell
  # For OpenAI
  curl https://api.openai.com/v1/chat/completions

  # For a custom endpoint
  curl "$LLM_BASE_URL/chat/completions"
  ```

- Check your firewall/proxy settings
- Try the `--debug` flag to see detailed error messages
Filter works in jq but not in JQ-By-Example
Problem: Your filter works when you run it manually with jq, but fails in JQ-By-Example.
Cause: JQ-By-Example runs jq with the `-M` (monochrome) and `-c` (compact output) flags.
Solution: Ensure your expected output matches compact JSON format:
```text
# Wrong: pretty-printed JSON
{
  "name": "Alice"
}

# Correct: compact JSON
{"name":"Alice"}
```
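If you generate expected outputs from Python, `json.dumps` with explicit separators produces the same compact form as jq's `-c` flag:

```python
import json

data = {"name": "Alice"}
pretty = json.dumps(data, indent=2)                # multi-line, pretty-printed
compact = json.dumps(data, separators=(",", ":"))  # compact, matches jq -c
print(compact)  # {"name":"Alice"}
```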
Low success rate or poor quality filters
Problem: Filters don't match expected outputs, or require many iterations.
Solution:
- Improve task description: Be specific about what transformation you want
- Add more examples: 3+ examples help the LLM generalize better
- Include edge cases: Empty arrays, null values, missing keys
- Simplify the task: Break complex transformations into smaller tasks
- Use verbose mode: run with `--verbose` to see iteration details and understand failures
Debug mode for troubleshooting
Enable debug logging to see detailed internal state:
```shell
jq-by-example --task my-task --debug
```
Debug mode shows:
- Full API request/response details (with truncation for security)
- Detailed scoring calculations
- Duplicate filter detection
- Stagnation counter progression
Security
JQ-By-Example implements production-ready security measures:
API Key Protection
- API keys are never logged (even in debug mode)
- Stored securely in environment variables
- Transmitted only via HTTPS headers
Input Sanitization
- Large inputs are truncated in logs (max 100 characters)
- Prevents accidental exposure of sensitive data in log files
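Log truncation of this kind is a small helper; the function name here is illustrative, not necessarily what `src/security.py` uses:

```python
def truncate_for_log(value, limit=100):
    """Shorten a value for logging, noting how much was cut."""
    text = str(value)
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} chars truncated]"
```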
Shell Injection Prevention
- jq filters are passed as subprocess arguments (not via a shell)
- No use of `shell=True` in subprocess calls
- Filters are never interpolated into shell commands
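The safe execution pattern looks roughly like this: the filter travels as a single argv element, so shell metacharacters in it are inert. A sketch only; the real executor adds more guards:

```python
import json
import subprocess

def run_filter(jq_filter, input_value, timeout=1.0, max_output=1_000_000):
    """Run jq on one input without a shell; raise on errors or oversized output."""
    proc = subprocess.run(
        ["jq", "-M", "-c", jq_filter],  # argument list: no shell=True anywhere
        input=json.dumps(input_value),
        capture_output=True,
        text=True,
        timeout=timeout,                # raises subprocess.TimeoutExpired on overrun
    )
    if proc.returncode != 0:
        raise RuntimeError(f"jq error: {proc.stderr.strip()}")
    if len(proc.stdout) > max_output:
        raise RuntimeError("jq output exceeded size limit")
    return proc.stdout.strip()
```

Even a malicious candidate filter like `"; rm -rf /"` is just a string argument to jq here: jq reports a parse error and nothing else runs.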
Resource Limits
- Timeout: 1 second per filter execution
- Max output: 1 MB per execution
- Prevents denial-of-service attacks and resource exhaustion
Edge Case Handling
Comprehensive test coverage for:
- Null input/output
- Empty arrays and objects
- Deeply nested structures (3+ levels)
- Special characters in keys (spaces, unicode, @, -)
- Large arrays (100+ items)
- Type mismatches and conversions
Development
Setup Development Environment
```shell
git clone https://github.com/nulone/jq-by-example.git
cd jq-by-example
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
Running Tests
```shell
# Run unit tests (no API key required)
pytest -m "not e2e"

# Run all tests including E2E (requires API key)
export OPENAI_API_KEY='your-key-here'
# or
export ANTHROPIC_API_KEY='your-key-here'
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_generator.py -v
```
Code Quality
```shell
# Type checking
mypy src

# Linting
ruff check src tests

# Formatting
ruff format src tests

# Run all checks (recommended before commit)
ruff check src tests && \
  ruff format --check src tests && \
  mypy src && \
  pytest -m "not e2e"
```
Project Structure
```text
jq-by-example/
├── src/
│   ├── cli.py              # CLI entry point
│   ├── orchestrator.py     # Synthesis loop coordinator
│   ├── generator.py        # LLM-based filter generation
│   ├── providers.py        # LLM provider abstractions (OpenAI, Anthropic)
│   ├── reviewer.py         # Filter evaluation & scoring
│   ├── executor.py         # Safe jq execution
│   ├── domain.py           # Core data structures
│   └── security.py         # Security utilities (log truncation)
├── tests/
│   ├── test_cli.py
│   ├── test_orchestrator.py
│   ├── test_generator.py
│   ├── test_reviewer.py
│   ├── test_executor.py
│   ├── test_domain.py
│   ├── test_edge_cases.py  # Production-ready edge cases
│   └── test_e2e.py         # End-to-end tests (require API key)
├── data/
│   └── tasks.json          # Example task definitions
├── pyproject.toml          # Project configuration
└── README.md               # This file
```
Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make your changes with tests
4. Ensure all checks pass:

   ```shell
   ruff check src tests
   ruff format --check src tests
   mypy src
   pytest -m "not e2e"
   ```

5. Commit with clear messages: `git commit -m "Add feature X"`
6. Push to your fork: `git push origin feature/my-feature`
7. Open a Pull Request
Code Style
- Type hints required for all public functions
- Docstrings required for all public functions and classes (Google style)
- 100 character line limit
- Follow existing patterns in codebase
- Add tests for all new features
- Security-first mindset (never log sensitive data)
License
MIT License - see LICENSE for details.
Acknowledgments
- jq - The excellent JSON processor by Stephen Dolan
- OpenAI - GPT models and API
- Anthropic - Claude models and API
JQ-By-Example - Because life's too short to debug jq filters manually.
