GitHub - dabit3/fabricate: An experimental research tool for fabricating GitHub personas with AI-generated repositories

An experimental research tool for fabricating GitHub personas with AI-generated repositories

⚠️ This is an experimental/research project. Use responsibly and ethically.

Overview

Fabricate creates synthetic GitHub activity by generating:

Multiple repositories with realistic code
Varied commit histories spanning configurable time periods
Code across different programming languages
Projects of varying complexity and scope

All code is generated using Anthropic's Claude API, creating unique and realistic-looking projects.

Installation

# Clone the repository
git clone https://github.com/yourusername/fabricate.git
cd fabricate

# Install with uv
uv venv
source .venv/bin/activate
uv pip install -e .

Or run directly without activating:

uv run fabricate generate

Configuration

Required Credentials

Anthropic API Key: Get from console.anthropic.com
GitHub Personal Access Token: Create at github.com/settings/tokens
- Required scopes: repo, delete_repo (optional, for cleanup)

Environment Variables

Set these environment variables or pass them as CLI arguments:

export FABRICATE_ANTHROPIC_API_KEY="sk-ant-..."
export FABRICATE_GITHUB_TOKEN="ghp_..."
export FABRICATE_GITHUB_USERNAME="your-username"  # Optional, auto-detected

Or create a .env file in the project root:

FABRICATE_ANTHROPIC_API_KEY=sk-ant-...
FABRICATE_GITHUB_TOKEN=ghp_...

Usage

Basic Generation

# Generate with random settings (1-50 repos, random history depth)
fabricate generate

# Specify exact count and languages
fabricate generate -l python -l javascript -l rust -r 10

# Custom history depth (commits spread over 2 years)
fabricate generate -d 730

# Custom commit range per repository
fabricate generate --min-commits 10 --max-commits 50

# Specify technologies and project types
fabricate generate -l nextjs -t tailwind -t prisma -c saas

# Full-stack apps with specific stack
fabricate generate -l nextjs -t supabase -t stripe -t shadcn -c "e-commerce platform"

Command Options

fabricate generate [OPTIONS]

Options:
  -a, --anthropic-key TEXT     Anthropic API key
  -g, --github-token TEXT      GitHub personal access token
  -l, --languages TEXT         Languages to use (can repeat)
  -r, --repos INTEGER          Number of repos (random 1-50 if not set)
  -d, --history-days INTEGER   History depth (random 30-1825 days if not set)
  -t, --tech TEXT              Technologies to use (can repeat, e.g., tailwind, prisma)
  -c, --category TEXT          Project types to build (can repeat, e.g., saas, cli_tool)
  --min-commits INTEGER        Min commits per repo (1-100)
  --max-commits INTEGER        Max commits per repo (1-100)
  -u, --github-username TEXT   GitHub username
  -w, --work-dir TEXT          Local work directory
  --no-push                    Don't push to GitHub (local only)
  --cleanup                    Remove local files after pushing
  --dry-run                    Show what would be created

Technologies (`-t, --tech`)

The -t flag lets you specify technologies, libraries, or tools that generated projects should use. Pass any technology you want — there's no predefined list.

# Frontend stack
fabricate generate -l nextjs -t tailwind -t shadcn -t framer-motion

# Backend with databases
fabricate generate -l python -t fastapi -t postgres -t redis -t docker

# Full-stack with auth and payments  
fabricate generate -l nextjs -t supabase -t clerk -t stripe

Categories (`-c, --category`)

The -c flag lets you specify what type of project to build. Pass any description — it's free-form text that guides the AI.

# Specific project types
fabricate generate -l rust -c cli_tool -c automation
fabricate generate -l nextjs -c saas -c dashboard

# Descriptive categories work too
fabricate generate -l python -c "data pipeline for ETL"
fabricate generate -l nextjs -c "e-commerce marketplace" -c "booking platform"

If neither flag is set, the generator picks randomly from defaults.

Other Commands

# Check GitHub connection status
fabricate status

# List existing repositories
fabricate list-repos
fabricate list-repos --prefix my-project

# Delete repositories (use with caution!)
fabricate delete repo-name-1 repo-name-2
fabricate delete --force repo-name

Examples

Create a Python-focused persona

fabricate generate \
  -l python \
  -r 8 \
  -d 365 \
  --min-commits 10 \
  --max-commits 40

Multi-language developer profile

fabricate generate \
  -l python \
  -l typescript \
  -l go \
  -l rust \
  -r 15 \
  -d 730

Test locally without pushing

fabricate generate \
  -l python \
  -r 2 \
  --no-push \
  -w ./test-repos

How It Works

Configuration: Parse input parameters for languages, repo count, and history depth
Concept Generation: Claude generates unique project concepts with names, descriptions, and features
Code Generation: For each repository:
- Generate initial project structure (README, config files, source code)
- Generate 5-37 subsequent commits with incremental changes
- Each commit includes realistic commit messages
Git Operations: Create local repos with properly timestamped commits
GitHub Push: Create remote repos and push with full history preserved

Generated Project Types

The system creates varied projects including:

CLI tools
Web APIs
Libraries/packages
Data processing utilities
Automation scripts
Games
Visualization tools
DevOps utilities
Machine learning projects

Complexity Levels

Low: 2-5 files, simple utilities or scripts
Medium: 5-15 files, small libraries or tools
High: 10-30 files, complex applications or frameworks

Project Structure

fabricate/
├── fabricate/
│   ├── __init__.py
│   ├── cli.py          # Click-based CLI
│   ├── config.py       # Pydantic models and settings
│   ├── generator.py    # Anthropic code generation
│   ├── git_ops.py      # Local git operations
│   ├── github_client.py # GitHub API client
│   └── persona.py      # Main orchestrator
├── main.py
├── pyproject.toml
├── requirements.txt
└── README.md

Limitations

API Costs: Code generation uses Anthropic API tokens
Rate Limits: GitHub has rate limits for repo creation
Quality: Generated code may not always compile/run correctly
Detection: Patterns may be detectable with analysis

Ethical Considerations

This tool is intended for:

Research into AI-generated code detection
Understanding GitHub activity patterns
Educational purposes about code generation

Do NOT use for:

Fraudulent job applications
Deceiving others about your experience
Any malicious purposes

License

MIT License - See LICENSE file

Contributing

This is an experimental research project. Issues and PRs welcome for improvements.