DataKit - Private Data Analysis Studio
What is DataKit?
DataKit is a browser-based data analysis platform that processes multi-gigabyte files locally. All processing happens in your browser - no data is sent to external servers.
Key Features
Data Import & Processing
- Local File Support: Import CSV, Excel (XLSX), JSON, and Parquet files directly in your browser
- Large File Handling: Process files up to several GBs efficiently using WASM
- Remote Data Sources: Connect to various cloud platforms:
- Amazon S3 buckets
- Google Sheets (public)
- HuggingFace datasets
- MotherDuck (cloud DuckDB)
- PostgreSQL databases
- Custom URLs
Data Preview & Inspection
- Interactive Grid View: Browse your data with sortable columns and resizable cells
- Smart Data Detection: Automatic detection of data types and formats
- Data Quality Analysis: Instant insights about:
- Missing values and null patterns
- Data type distributions
- Column statistics (mean, median, mode, etc.)
- Outlier detection
- Data quality scores
- Quick Overview Panel: Key metrics and patterns
- Export Options: Export filtered data or specific columns in various formats
SQL Query Engine
- DuckDB Integration: Full-featured SQL database running entirely in your browser
- Query Editor: Professional SQL editor with:
- Syntax highlighting and auto-completion
- Query history and favorites
- Smart query optimization suggestions
- Real-time error detection
- Schema Browser: Explore your data structure
- Query Templates: Pre-built queries for common operations
- Performance Optimization: Automatic query optimization for large datasets
- Results Management:
- Paginated results for smooth browsing
- Export query results to CSV
- Full-screen mode for focused work
AI Assistant
- Natural Language Queries: Ask questions in plain English
- SQL Generation: Automatically generate SQL from your questions
- Data Insights: Get AI-powered insights and recommendations
- Multiple AI Providers:
- DataKit Cloud AI (default)
- OpenAI GPT models
- Anthropic Claude
- Groq
- Local models via Ollama
- Context-Aware: Understands your data structure and suggests relevant queries
- Query Explanation: Plain-English explanations of complex SQL
Python Notebooks
- Interactive Notebooks: Write and execute Python code in cells
- DuckDB Bridge: Direct access to your loaded data via SQL
- Data Science Libraries: Pre-loaded with pandas, numpy, matplotlib, and more
- Hugging Face Transformers: Built-in support for ML models
- Variable Inspector: Track your Python variables and their values
- Package Manager: Install additional Python packages on demand
- Export Options: Save as Jupyter notebooks or PDF reports
- Templates: Start with pre-built templates for common analyses
Privacy & Security
Complete Privacy: All data processing happens in your browser. No data is sent to external servers unless you explicitly connect to cloud services.
No Installation Required: Works in any modern browser - Chrome, Firefox, Safari, or Edge.
Getting Started
- Open DataKit in your browser
- Import your data - drag & drop or click to browse
- Choose your workflow:
- Preview data in the grid view
- Write SQL queries in the Query tab
- Ask questions using the AI Assistant
- Write Python code in Notebooks
Supported File Formats
- CSV - Comma-separated values
- Excel - .xlsx and .xls files
- JSON - Including nested structures
- Parquet - Columnar storage format
- Remote Sources - S3, Google Sheets, PostgreSQL, and more
System Requirements
- Modern web browser (Chrome 90+, Firefox 88+, Safari 14+, Edge 90+)
- At least 4GB RAM (8GB+ recommended for large files)
- Desktop or laptop computer (mobile support coming soon)
📋 License
DataKit is available under a dual licensing model:
Open Source (AGPL-3.0)
- Free for open source projects
- Must disclose source code
- For learning and non-commercial use
Commercial License
- No source code disclosure required
- Enterprise self-hosting allowed
- Priority support included
- Custom features available
Need a commercial license? Enterprise and commercial users must obtain a commercial license.
Get your license: Contact Us at hello@datakit.page
Contributing
We welcome contributions! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
Support
- Documentation: DataKit Docs
- Issues: GitHub Issues
- Contact: Amin | LinkedIn
