Summary Forge Module
An intelligent tool that uses OpenAI's GPT-5 to forge comprehensive summaries of ebooks in multiple formats.
Repository: git@github.com:profullstack/summary-forge-module.git
Features
- 📚 Multiple Input Formats: Supports PDF, EPUB files, and web page URLs
- 🌐 Web Page Summarization: Fetch and summarize any web page with automatic content extraction
- 🤖 AI-Powered Summaries: Uses GPT-5 with direct PDF upload for better quality
- 📊 Vision API: Preserves formatting, tables, diagrams, and images from PDFs
- 🧩 Intelligent Chunking: Automatically processes large PDFs (500+ pages) without truncation
- 🛡️ Directory Protection: Prompts before overwriting existing summaries (use --force to skip)
- 📦 Multiple Output Formats: Creates Markdown, PDF, EPUB, plain text, and MP3 audio summaries
- 🃏 Printable Flashcards: Generates double-sided flashcard PDFs for studying
- 🖼️ Flashcard Images: Individual PNG images for web app integration (q-001.png, a-001.png, etc.)
- 🎙️ Natural Audio Narration: AI-generated conversational audio script for better listening
- 🗜️ Bundled Output: Packages everything into a convenient .tgz archive
- 🔄 Auto-Conversion: Automatically converts EPUB to PDF using Calibre
- 🔍 Book Search: Search Amazon by title using Rainforest API
- 📖 Auto-Download: Downloads books from Anna's Archive with CAPTCHA solving
- 💻 CLI & Module: Use as a command-line tool or import as an ESM module
- 🎨 Interactive Mode: Guided workflow with inquirer prompts
- 📥 EPUB Priority: Automatically prefers EPUB format (open standard, more flexible)
Installation
Global Installation (CLI)
pnpm install -g @profullstack/summary-forge-module
Local Installation (Module)
pnpm add @profullstack/summary-forge-module
Prerequisites
- Node.js v20 or newer
- Calibre (for EPUB conversion; provides the ebook-convert command)
# macOS
brew install calibre
# Ubuntu/Debian
sudo apt-get install calibre
# Arch Linux
sudo pacman -S calibre
- Pandoc (for document conversion)
# macOS
brew install pandoc
# Ubuntu/Debian
sudo apt-get install pandoc
# Arch Linux
sudo pacman -S pandoc
- XeLaTeX (for PDF generation)
# macOS
brew install --cask mactex
# Ubuntu/Debian
sudo apt-get install texlive-xetex
# Arch Linux
sudo pacman -S texlive-core texlive-xetex
CLI Usage
First-Time Setup
Before using the CLI, configure your API keys:
summary setup
This interactive command will prompt you for:
- OpenAI API Key (required)
- Rainforest API Key (optional - for Amazon book search)
- ElevenLabs API Key (optional - for audio generation, get key here)
- 2Captcha API Key (optional - for CAPTCHA solving, sign up here)
- Browserless API Key (optional)
- Browser and proxy settings
Configuration is saved to ~/.config/summary-forge/settings.json and used automatically by all CLI commands.
Managing Configuration
# View current configuration
summary config
# Update configuration
summary setup
# Delete configuration
summary config --delete
Note: The CLI uses configuration in this priority order:
- Environment variables (.env file)
- Configuration file (~/.config/summary-forge/settings.json)
Interactive Mode (Recommended)
summary interactive
# or
summary i
This launches an interactive menu where you can:
- Process local files (PDF/EPUB)
- Process web page URLs
- Search for books by title
- Look up books by ISBN/ASIN
Process a File
summary file /path/to/book.pdf
summary file /path/to/book.epub
# Force overwrite if directory already exists
summary file /path/to/book.pdf --force
summary file /path/to/book.pdf -f
Process a Web Page URL
summary url https://example.com/article
summary url https://blog.example.com/post/123
# Force overwrite if directory already exists
summary url https://example.com/article --force
summary url https://example.com/article -f
Features:
- Automatically fetches web page content using Puppeteer
- Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
- Saves web page as PDF for processing
- Generates clean title from page title or uses OpenAI to create one
- Prompts specifically optimized for web page content (ignores nav/ads/footers)
- Creates same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)
Search by Title
# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force      # Auto-select first result
summary title "Python" --source anna    # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force             # Auto-select and process
summary isbn 9780134685991 --source anna    # Use Anna's Archive

# Common Options:
#   --source <source>              Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results <number>     Maximum results to display (default: 10)
#   -f, --force                    Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from <year>             Filter by publication year from (e.g., 2020)
#   --year-to <year>               Filter by publication year to (e.g., 2024)
#   -l, --languages <languages>    Language filter, comma-separated (default: english)
#   -e, --extensions <extensions>  File extensions, comma-separated (case-insensitive, default: PDF)
#   --content-types <types>        Content types, comma-separated (default: book)
#   -s, --order <order>            Sort order: date (newest) or empty for relevance
#   --view <view>                  View type: list or grid (default: list)
#
# Anna's Archive Options (--source anna):
#   -f, --format <format>          Filter by format: pdf, epub, pdf,epub, or all (default: pdf)
#   -s, --sort <sort>              Sort by: date (newest) or empty for relevance (default: '')
#   -l, --language <language>      Language code(s), comma-separated (e.g., en, es, fr) (default: en)
#   --sources <sources>            Data sources, comma-separated (default: all sources)
#                                  Options: zlib, lgli, lgrs, and others
Look up by ISBN/ASIN
summary isbn B075HYVHWK
# Force overwrite if directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f
Help
summary --help
summary file --help
Programmatic Usage
JSON API Format
All methods now return consistent JSON objects with the following structure:
{
  success: true | false,  // Indicates if operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}
This enables:
- ✅ Consistent error handling - Check the success field instead of try-catch
- ✅ REST API ready - Direct JSON responses for HTTP endpoints
- ✅ Better debugging - Rich metadata in all responses
- ✅ Type-safe - Predictable structure for TypeScript users
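Because every method reports failure through the success field, callers can centralize that check in one place. The following is a minimal sketch that assumes only the { success, error, message } shape described above; unwrap is a hypothetical helper written for illustration and is not part of the module.

```javascript
// Hypothetical helper (not part of the module): convert a result object
// into its data fields, or throw if the operation did not succeed.
function unwrap(result) {
  if (!result || result.success !== true) {
    // `error` is only expected to be present when success is false
    throw new Error(result?.error ?? 'Unknown error');
  }
  // Strip the envelope fields and keep the method-specific data
  const { success, error, message, ...data } = result;
  return data;
}

// Stand-in result object shaped like the structure above
const data = unwrap({ success: true, archive: './book_bundle.tgz', message: 'done' });
console.log(data.archive);
```

A wrapper like this suits code that prefers exceptions; code that maps results straight onto HTTP responses can keep using the raw objects.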
Basic Example
import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);
const result = await forge.processFile('./my-book.pdf');

if (result.success) {
  console.log('Summary created:', result.archive);
  console.log('Files:', result.files);
  console.log('Costs:', result.costs);
} else {
  console.error('Processing failed:', result.error);
}
Configuration Options
import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',

  // Optional API keys
  rainforestApiKey: 'your-key',    // For Amazon search
  elevenlabsApiKey: 'sk-...',      // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',    // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key',   // For browserless.io

  // Processing options
  maxChars: 500000,                // Max chars to process
  maxTokens: 20000,                // Max tokens in output summary
  maxInputTokens: 250000,          // Max input tokens per API call (default: 250000 for GPT-5)

  // Audio options
  voiceId: '21m00Tcm4TlvDq8ikWAM', // ElevenLabs voice
  voiceSettings: {
    stability: 0.5,
    similarity_boost: 0.75
  },

  // Browser options
  headless: true,                  // Run browser in headless mode
  enableProxy: false,              // Enable proxy
  proxyUrl: 'http://proxy.com',    // Proxy URL
  proxyUsername: 'user',           // Proxy username
  proxyPassword: 'pass',           // Proxy password
  proxyPoolSize: 36                // Number of proxies in pool (default: 36)
});

const result = await forge.processFile('./book.epub');
console.log('Archive:', result.archive);
Search for Books
Using Amazon/Rainforest API
const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  rainforestApiKey: process.env.RAINFOREST_API_KEY
});

const searchResult = await forge.searchBookByTitle('Clean Code');
if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results:`);
console.log(searchResult.results.map(b => ({
  title: b.title,
  author: b.author,
  asin: b.asin
})));

// Get download URL
const url = forge.getAnnasArchiveUrl(searchResult.results[0].asin);
console.log('Download from:', url);
Using Anna's Archive Direct Search (No Rainforest API Required)
const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'  // Sort by newest
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  format: r.format,
  size: `${r.sizeInMB.toFixed(1)} MB`,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const md5 = searchResult.results[0].href.match(/\/md5\/([a-f0-9]+)/)[1];
  const downloadResult = await forge.downloadFromAnnasArchive(md5, '.', searchResult.results[0].title);

  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    console.log('Directory:', downloadResult.directory);
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}
Using 1lib.sk Search (Faster, No DDoS Protection)
const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.search1lib('LLM Fine Tuning', {
  maxResults: 10,
  yearFrom: 2020,
  languages: ['english'],
  extensions: ['PDF']
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  year: r.year,
  extension: r.extension,
  size: r.size,
  language: r.language,
  isbn: r.isbn,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );

  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);

    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}
Enhanced Error Handling:
The 1lib.sk download functionality includes robust error handling with automatic debugging:
- Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
- Debug HTML Capture: Saves page HTML when download button isn't found
- Link Analysis: Lists all links on the page for troubleshooting
- Detailed Error Messages: Provides actionable information for debugging
If a download fails, check the debug-book-page.html file in the book's directory for detailed page structure information.
API Reference
Constructor Options
new SummaryForge({
  // API Keys
  openaiApiKey: string,       // Required: OpenAI API key
  rainforestApiKey: string,   // Optional: For title search
  elevenlabsApiKey: string,   // Optional: For audio generation
  twocaptchaApiKey: string,   // Optional: For CAPTCHA solving
  browserlessApiKey: string,  // Optional: For browserless.io

  // Processing Options
  maxChars: number,           // Optional: Max chars to process (default: 400000)
  maxTokens: number,          // Optional: Max tokens in output summary (default: 16000)
  maxInputTokens: number,     // Optional: Max input tokens per API call (default: 250000 for GPT-5)

  // Audio Options
  voiceId: string,            // Optional: ElevenLabs voice ID (default: Brian)
  voiceSettings: object,      // Optional: Voice customization settings

  // Browser Options
  headless: boolean,          // Optional: Run browser in headless mode (default: true)
  enableProxy: boolean,       // Optional: Enable proxy (default: false)
  proxyUrl: string,           // Optional: Proxy URL
  proxyUsername: string,      // Optional: Proxy username
  proxyPassword: string,      // Optional: Proxy password
  proxyPoolSize: number       // Optional: Number of proxies in pool (default: 36)
})
Methods
All methods return JSON objects with { success, ...data, error?, message? } format.
Processing Methods
- processFile(filePath, asin?) - Process a PDF or EPUB file
  - Returns: { success, basename, markdown, files, archive, hasAudio, asin, costs, message, error? }
  - Example:
    const result = await forge.processFile('./book.pdf');
    if (result.success) {
      console.log('Archive:', result.archive);
      console.log('Costs:', result.costs);
    }
- processWebPage(url, outputDir?) - Process a web page URL
  - Returns: { success, basename, dirName, markdown, files, directory, archive, hasAudio, url, title, costs, message, error? }
  - Example:
    const result = await forge.processWebPage('https://example.com/article');
    if (result.success) {
      console.log('Summary:', result.markdown.substring(0, 100));
    }
Search Methods
- searchBookByTitle(title) - Search Amazon using Rainforest API
  - Returns: { success, results, count, query, message, error? }
  - Example:
    const result = await forge.searchBookByTitle('Clean Code');
    if (result.success) {
      console.log(`Found ${result.count} books`);
    }
- searchAnnasArchive(query, options?) - Search Anna's Archive directly
  - Returns: { success, results, count, query, options, message, error? }
  - Example:
    const result = await forge.searchAnnasArchive('JavaScript', {
      maxResults: 10,
      format: 'pdf',
      sortBy: 'date'
    });
    if (result.success) {
      console.log(`Found ${result.count} results`);
    }
- search1lib(query, options?) - Search 1lib.sk
  - Returns: { success, results, count, query, options, message, error? }
Download Methods
- downloadFromAnnasArchive(asin, outputDir?, bookTitle?) - Download from Anna's Archive
  - Returns: { success, filepath, directory, asin, format, message, error? }
  - Example:
    const result = await forge.downloadFromAnnasArchive('B075HYVHWK', '.');
    if (result.success) {
      console.log('Downloaded to:', result.filepath);
    }
- downloadFrom1lib(bookUrl, outputDir?, bookTitle?, downloadUrl?) - Download from 1lib.sk
  - Returns: { success, filepath, directory, title, format, message, error? }
- search1libAndDownload(query, searchOptions?, outputDir?, selectCallback?) - Search and download in one session
  - Returns: { success, results, download, message, error? }
Generation Methods
- generateSummary(pdfPath) - Generate AI summary from PDF
  - Returns: { success, markdown, length, method, chunks?, message, error? }
  - Methods: gpt5_pdf_upload, text_extraction_single, text_extraction_chunked
  - Example:
    const result = await forge.generateSummary('./book.pdf');
    if (result.success) {
      console.log(`Generated ${result.length} char summary using ${result.method}`);
    }
- generateAudioScript(markdown) - Generate audio-friendly narration script
  - Returns: { success, script, length, message }
- generateAudio(text, outputPath) - Generate audio using ElevenLabs TTS
  - Returns: { success, path, size, duration, message, error? }
- generateOutputFiles(markdown, basename, outputDir) - Generate all output formats
  - Returns: { success, files: {...}, message }
Utility Methods
- convertEpubToPdf(epubPath) - Convert EPUB to PDF
  - Returns: { success, pdfPath, originalPath, message, error? }
- createBundle(files, archiveName) - Create tar.gz archive
  - Returns: { success, path, files, message, error? }
- getCostSummary() - Get cost tracking information
  - Returns: { success, openai, elevenlabs, rainforest, total, breakdown }
Configuration
CLI Configuration (Recommended)
For CLI usage, run the setup command to configure your API keys:
summary setup
This saves your configuration to ~/.config/summary-forge/settings.json so you don't need to manage environment variables.
Environment Variables (Alternative)
For programmatic usage or if you prefer environment variables, create a .env file:
OPENAI_API_KEY=sk-your-key-here
RAINFOREST_API_KEY=your-key-here
ELEVENLABS_API_KEY=sk-your-key-here  # Optional: for audio generation
TWOCAPTCHA_API_KEY=your-key-here     # Optional: for CAPTCHA solving
BROWSERLESS_API_KEY=your-key-here    # Optional

# Browser Configuration
HEADLESS=true                        # Run browser in headless mode
ENABLE_PROXY=false                   # Enable proxy for browser requests
PROXY_URL=http://proxy.example.com   # Proxy URL (if enabled)
PROXY_USERNAME=username              # Proxy username (if enabled)
PROXY_PASSWORD=password              # Proxy password (if enabled)
PROXY_POOL_SIZE=36                   # Number of proxies in your pool (default: 36)
Or set them in your shell:
export OPENAI_API_KEY=sk-your-key-here
export RAINFOREST_API_KEY=your-key-here
export ELEVENLABS_API_KEY=sk-your-key-here  # Optional
Configuration Priority
When using the module programmatically, configuration is loaded in this order (highest priority first):
- Constructor options - Passed directly to new SummaryForge(options)
- Environment variables - From .env file or shell
- Config file - From ~/.config/summary-forge/settings.json (CLI only)
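The priority order above amounts to a layered merge in which higher-priority sources override lower ones. The sketch below illustrates that idea only: resolveConfig and the mapping from environment variable names to option names are assumptions made for this example, not the module's actual loader.

```javascript
// Illustrative only: merge three config sources, highest priority first.
// `resolveConfig` and the env-to-option mapping are assumptions for this sketch.
function resolveConfig(constructorOptions = {}, env = {}, fileConfig = {}) {
  const fromEnv = {
    openaiApiKey: env.OPENAI_API_KEY,
    rainforestApiKey: env.RAINFOREST_API_KEY,
    elevenlabsApiKey: env.ELEVENLABS_API_KEY,
  };
  // Drop undefined values so an unset source never masks a lower-priority one
  const defined = (obj) =>
    Object.fromEntries(Object.entries(obj).filter(([, v]) => v !== undefined));
  // Later spreads win, so the lowest-priority source goes first
  return { ...defined(fileConfig), ...defined(fromEnv), ...defined(constructorOptions) };
}

const config = resolveConfig(
  { openaiApiKey: 'sk-from-constructor' },
  { OPENAI_API_KEY: 'sk-from-env', RAINFOREST_API_KEY: 'rf-from-env' },
  { elevenlabsApiKey: 'el-from-file' }
);
console.log(config);
```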
Proxy Configuration (Recommended for Anna's Archive)
To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:
summary setup
When prompted:
- Enable proxy: Yes
- Enter proxy URL: http://your-proxy.com:8080
- Enter proxy username and password
Why use a proxy?
- ✅ Avoids IP bans from Anna's Archive
- ✅ USA-based proxies prevent geo-location issues
- ✅ Works with both browser navigation and file downloads
- ✅ Automatically applied to all download operations
Recommended Proxy Service:
We recommend Webshare.io for reliable, USA-based proxies:
- 🌎 USA-based IPs (no geo-location issues)
- ⚡ Fast and reliable
- 💰 Affordable pricing with free tier
- 🔒 HTTP/HTTPS/SOCKS5 support
Important: Use Static Proxies for Sticky Sessions
For Anna's Archive downloads, you need a static/direct proxy (not rotating) to maintain the same IP:
- In your Webshare dashboard, go to Proxy → List
- Copy a Static Proxy endpoint (not the rotating endpoint)
- Use the format: http://host:port (e.g., http://45.95.96.132:8080)
- Username format: dmdgluqz-US-{session_id} (session ID added automatically)
The tool automatically generates a unique session ID (1 to PROXY_POOL_SIZE) for each download to get a fresh IP, while maintaining that IP throughout the 5-10 minute download process.
Proxy Pool Size Configuration:
Set PROXY_POOL_SIZE to match your Webshare plan (default: 36):
- Free tier: 10 proxies → PROXY_POOL_SIZE=10
- Starter plan: 25 proxies → PROXY_POOL_SIZE=25
- Professional plan: 100 proxies → PROXY_POOL_SIZE=100
- Enterprise plan: 250+ proxies → PROXY_POOL_SIZE=250
The tool will randomly select a session ID from 1 to your pool size, distributing load across all available proxies.
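The session selection described above can be pictured as follows. This is a rough sketch, not the tool's code: pickSessionId and sessionUsername are hypothetical names, and the username layout simply follows the dmdgluqz-US-{session_id} pattern shown earlier.

```javascript
// Sketch: choose a session ID between 1 and poolSize (inclusive).
// `random` is injectable so the function can be tested deterministically.
function pickSessionId(poolSize, random = Math.random) {
  return Math.floor(random() * poolSize) + 1;
}

// Append the session ID to the base proxy username (hypothetical helper)
function sessionUsername(baseUsername, sessionId) {
  return `${baseUsername}-${sessionId}`;
}

const poolSize = Number(process.env.PROXY_POOL_SIZE ?? 36);
console.log(sessionUsername('dmdgluqz-US', pickSessionId(poolSize)));
```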
Smart ISBN Detection:
When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN:
- Real ISBNs (10 or 13 numeric digits): Searches by ISBN for precise results
- Amazon ASINs (alphanumeric): Searches by book title instead for better results
- This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
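A check along these lines could distinguish the two identifier types. It is a sketch under the rule stated above (10 or 13 numeric digits means ISBN); classifyIdentifier is a hypothetical name, and stripping hyphens and spaces first is an added assumption. Note that under this rule an ISBN-10 ending in the check character X would be treated as an ASIN.

```javascript
// Sketch of the ISBN/ASIN rule described above (hypothetical helper).
function classifyIdentifier(id) {
  // Assumption: tolerate hyphens and spaces in printed ISBNs
  const compact = String(id).replace(/[\s-]/g, '');
  // 10 or 13 numeric digits => ISBN; anything else is treated as an ASIN
  return /^(\d{10}|\d{13})$/.test(compact) ? 'isbn' : 'asin';
}

for (const id of ['9780134685991', 'B075HYVHWK']) {
  console.log(id, classifyIdentifier(id));
}
```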
Note: Rotating proxies (p.webshare.io) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.
Testing your proxy:
node test-proxy.js <ASIN>
This will verify your proxy configuration by attempting to download a book.
Audio Generation
Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.
Get ElevenLabs API Key: Sign up here for high-quality text-to-speech.
Features:
- Uses ElevenLabs Turbo v2.5 model (optimized for audiobooks)
- Default voice: Brian (best for technical content, customizable)
- Automatically truncates long texts to fit API limits
- Generates high-quality MP3 audio files
- Natural, conversational narration style
Output
The tool generates:
- <book_name>_summary.md - Markdown summary
- <book_name>_summary.txt - Plain text summary
- <book_name>_summary.pdf - PDF summary with table of contents
- <book_name>_summary.epub - EPUB summary with clickable TOC
- <book_name>_summary.mp3 - Audio summary (if ElevenLabs key provided)
- <book_name>.pdf - Original or converted PDF
- <book_name>.epub - Original EPUB (if input was EPUB)
- <book_name>_bundle.tgz - Compressed archive containing all files
Example Workflow
# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub
How It Works
- Input Processing: Accepts PDF or EPUB files (EPUB is converted to PDF)
- Smart Processing Strategy:
  - Small PDFs (<400k chars): Direct upload to OpenAI's vision API
  - Large PDFs (>400k chars): Intelligent chunking with synthesis
- AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
- Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
- Audio Generation: Optional TTS conversion using ElevenLabs
- Bundling: Creates a compressed archive with all generated files
Intelligent Chunking for Large PDFs
For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:
How it works:
- Analysis: Calculates optimal chunk size based on PDF statistics and GPT-5's token limits
- Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
- Page-Based Chunking: Splits PDF into logical chunks that fit within token limits
- Parallel Processing: Each chunk is summarized independently by GPT-5
- Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
- Quality Preservation: Maintains narrative flow and eliminates redundancy
Token Limit Handling:
- GPT-5 Input Limit: 272,000 tokens
- System Overhead: 20,000 tokens reserved for prompts and instructions
- Available Tokens: 250,000 tokens for content
- Safety Margin: 70% utilization to account for token estimation variance
- Chunk Size: ~565,000 characters per chunk (based on 3.5 chars/token estimate)
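The arithmetic behind these figures can be sketched as below. The constants are taken from the list above; the function names are hypothetical. Taken at face value, 250,000 tokens at 70% utilization and 3.5 chars/token gives 612,500 characters, somewhat above the ~565,000 quoted, so the tool presumably applies a further adjustment that is not described here. Treat the numbers as approximate.

```javascript
// Figures from the list above; function names are assumptions for this sketch.
const AVAILABLE_TOKENS = 250000;   // tokens left for content after overhead
const UTILIZATION_PERCENT = 70;    // safety margin
const CHARS_PER_TOKEN = 3.5;       // rough estimate

// Largest chunk, in characters, that stays inside the token budget
function maxChunkChars(availableTokens = AVAILABLE_TOKENS) {
  const usableTokens = (availableTokens * UTILIZATION_PERCENT) / 100;
  return Math.floor(usableTokens * CHARS_PER_TOKEN);
}

// Number of chunks needed to cover a document of `totalChars` characters
function chunkCount(totalChars, chunkChars) {
  return Math.ceil(totalChars / chunkChars);
}

console.log(maxChunkChars());            // 612500
console.log(chunkCount(1245678, 120000)); // 11
```

The second call uses the character count and chunk size from the example output that follows, and reproduces its 11 chunks.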
Benefits:
- ✅ Complete Coverage: Processes entire books without truncation
- ✅ High Quality: Each section gets full AI attention
- ✅ Seamless Output: Final summary reads as a unified document
- ✅ Cost Efficient: Optimizes token usage across multiple API calls
- ✅ Automatic: No configuration needed - works transparently
- ✅ Token-Aware: Respects API limits to prevent errors
Example Output:
📊 PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
📚 PDF is large - using intelligent chunking strategy
This will process the ENTIRE 523-page PDF without truncation
📐 Using chunk size: 120,000 chars
📦 Created 11 chunks for processing
Chunk 1: Pages 1-48 (119,234 chars)
Chunk 2: Pages 49-95 (118,901 chars)
...
✅ All 11 chunks processed successfully
🔄 Synthesizing chunk summaries into final comprehensive summary...
✅ Final summary synthesized: 45,678 characters
Why Direct PDF Upload?
The tool prioritizes OpenAI's vision API for direct PDF upload when possible:
- ✅ Better Quality: Preserves document formatting, tables, and diagrams
- ✅ More Accurate: AI can see the actual PDF layout and structure
- ✅ Better for Technical Books: Code examples and diagrams are preserved
- ✅ Fallback Strategy: Automatically switches to intelligent chunking for large files
Testing
Summary Forge includes a comprehensive test suite using Vitest.
Run Tests
# Run all tests
pnpm test
# Run tests in watch mode
pnpm test:watch
# Run tests with coverage report
pnpm test:coverage
Test Coverage
The test suite includes:
- ✅ 30+ passing tests
- Constructor validation
- Helper method tests
- PDF upload functionality tests
- API integration tests
- Error handling tests
- Edge case coverage
- File operation tests
See test/summary-forge.test.js for the complete test suite.
Flashcard Generation
Summary Forge includes powerful flashcard generation capabilities for study and review.
Printable PDF Flashcards
Generate double-sided flashcard PDFs optimized for printing:
import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown, { maxCards: 50 });
console.log(`Extracted ${extractResult.count} flashcards`);

// Generate printable PDF
const pdfResult = await generateFlashcardsPDF(
  extractResult.flashcards,
  './flashcards.pdf',
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    cardWidth: 3.5,   // inches
    cardHeight: 2.5,  // inches
    fontSize: 11
  }
);
console.log(`PDF created: ${pdfResult.path}`);
console.log(`Total pages: ${pdfResult.pages}`);
Individual Flashcard Images
Generate individual PNG images for each flashcard, perfect for web applications:
import { extractFlashcards, generateFlashcardImages } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown);

// Generate individual PNG images
const imageResult = await generateFlashcardImages(
  extractResult.flashcards,
  './flashcards',  // Output directory
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    width: 800,    // pixels
    height: 600,   // pixels
    fontSize: 24
  }
);

if (imageResult.success) {
  console.log(`Generated ${imageResult.images.length} images`);
  console.log('Files:', imageResult.images);
  // Output: ['./flashcards/q-001.png', './flashcards/a-001.png', ...]
}
Image Naming Convention:
- q-001.png, q-002.png, etc. - Question cards
- a-001.png, a-002.png, etc. - Answer cards
Use Cases:
- 🌐 Web-based flashcard applications
- 📱 Mobile learning apps
- 🎮 Interactive quiz games
- 📊 Study progress tracking systems
- 🔄 Spaced repetition software
Features:
- ✅ Clean, professional design with book title
- ✅ Automatic text wrapping for long content
- ✅ Customizable dimensions and styling
- ✅ SVG-based rendering for crisp quality
- ✅ Works in Docker (no native dependencies)
Flashcard Extraction Formats
The extractFlashcards function supports multiple markdown formats:
1. Explicit Q&A Format:
**Q: What is a closure?**
A: A closure is a function that has access to variables in its outer scope.
2. Definition Lists:
**Closure**
: A function that has access to variables in its outer scope.
3. Question Headers:
### What is a closure?
A closure is a function that has access to variables in its outer scope.
Examples
See the examples/ directory for more usage examples:
- programmatic-usage.js - Using as a module
- flashcard-images-demo.js - Generating flashcard images
Troubleshooting
Rate Limiting (1lib.sk)
If you encounter "Too many requests" errors from 1lib.sk:
Error Message:
Too many requests from your IP xxx.xxx.xxx.xxx
Please wait 10 seconds. support@z-lib.fm. Err #ipd1
Automatic Handling: The tool automatically detects rate limiting and:
- ✅ Waits the requested time (usually 10 seconds)
- ✅ Retries up to 3 times with exponential backoff
- ✅ Adds a 2-second buffer to ensure rate limit has cleared
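The automatic handling above could look roughly like this. It is a sketch, not the tool's implementation: the function names are hypothetical, and the way the requested wait, the exponential backoff, and the 2-second buffer are combined (requested wait doubled on each retry, plus the buffer) is one plausible reading of the list.

```javascript
// Sketch of the retry behaviour described above; names and the exact
// backoff formula are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Extract the wait time from a message such as "Please wait 10 seconds."
function parseWaitSeconds(message, fallback = 10) {
  const match = /wait\s+(\d+)\s+seconds?/i.exec(message);
  return match ? Number(match[1]) : fallback;
}

async function withRateLimitRetry(task, { maxRetries = 3, bufferMs = 2000, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const message = String(err?.message ?? err);
      const rateLimited = /too many requests/i.test(message);
      // Give up on other errors, or once the retry budget is spent
      if (!rateLimited || attempt >= maxRetries) throw err;
      const baseMs = parseWaitSeconds(message) * 1000;
      // Requested wait doubles on each retry, plus a fixed buffer
      await wait(baseMs * 2 ** attempt + bufferMs);
    }
  }
}
```

A caller would wrap a single download attempt, for example `await withRateLimitRetry(() => forge.downloadFrom1lib(url, '.'))`, assuming the rate-limit condition surfaces as a thrown error rather than a `{ success: false }` result.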
Manual Solutions:
- Wait a few minutes before trying again
- Use a different proxy session (the tool rotates through your proxy pool automatically)
- Switch to Anna's Archive: summary search "book title" --source anna
- Reduce concurrent requests if running multiple downloads
Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.
Download Button Not Found (1lib.sk)
If you encounter "Download button not found" errors when downloading from 1lib.sk:
- Check Debug Files: The tool automatically saves debug-book-page.html in the book's directory
  - Open this file to inspect the actual page structure
  - Look for download links or buttons that might have different selectors
- Review Error Output: The error message includes:
  - All selectors that were tried
  - List of links found on the page
  - Location of the debug HTML file
- Common Causes:
  - Z-Access/Library Access Page: Book page redirects to authentication page (most common)
  - Page structure changed (1lib.sk updates their site)
  - Book is deleted or unavailable
  - Session expired or cookies not maintained
  - Proxy issues preventing proper page load
- Solutions:
  - Recommended: Use Anna's Archive instead: summary search "book title" --source anna
  - Try the search1lib command separately to verify the book exists
  - Check if the book page loads correctly in a regular browser with the same proxy
  - Verify proxy configuration is working correctly
  - Try a different book from search results
- Known Issue - Z-Access Page: If you see links to library-access.sk or Z-Access page in the debug output, this means:
  - The book page requires authentication or special access
  - 1lib.sk's session management is blocking automated access
  - Workaround: Use Anna's Archive which has better automation support
Example Debug Output (Z-Access Issue):
❌ Download button not found on book page
Debug HTML saved to: ./uploads/book_name/debug-book-page.html
Found 6 links on page
First 5 links:
- https://library-access.sk (Z-Access page)
- mailto:blackbox@z-library.so (blackbox@z-library.so)
- https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)
Recommended Alternative:
# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna
IP Bans from Anna's Archive
If you're getting blocked by Anna's Archive:
- Enable proxy in your configuration:
  summary setup
- Use a USA-based proxy to avoid geo-location issues
- Test your proxy before downloading:
  node test-proxy.js B0BCTMXNVN
- Run browser in visible mode to debug:
  summary config --headless false
Proxy Configuration
The proxy is used for:
- ✅ Browser navigation (Puppeteer)
- ✅ File downloads (fetch with https-proxy-agent)
- ✅ All HTTP requests to Anna's Archive
Supported proxy formats:
- http://proxy.example.com:8080
- https://proxy.example.com:8080
- socks5://proxy.example.com:1080
- http://proxy.example.com:8080-session-<SESSION_ID> (sticky session)
Recommended Service: Webshare.io - Reliable USA-based proxies with free tier available.
Webshare Sticky Sessions:
Add -session-<YOUR_SESSION_ID> to your proxy URL to maintain the same IP:
http://p.webshare.io:80-session-myapp123
CAPTCHA Solving
When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:
- Sign up for 2Captcha: Get API key here
- Add to configuration: summary setup
- Enter your 2Captcha API key when prompted
The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.
Limitations
- Maximum PDF file size: No practical limit (intelligent chunking handles any size)
- GPT-5 uses default temperature of 1 (not configurable)
- Requires external tools: Calibre, Pandoc, XeLaTeX
- CAPTCHA solving requires 2captcha.com API key (optional)
- Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
- Anna's Archive may block IPs without proxy configuration
- Chunked processing uses text extraction (images/diagrams described in text only)
Roadmap
- [x] ISBN/ASIN lookup via Anna's Archive
- [x] Automatic download from Anna's Archive with CAPTCHA solving
- [x] Book title search via Rainforest API
- [x] CLI with interactive mode
- [x] ESM module for programmatic use
- [x] Audio generation with ElevenLabs TTS
- [x] Direct PDF upload to OpenAI vision API
- [x] EPUB format prioritization (open standard)
- [ ] Support for more input formats (MOBI, AZW3)
- [ ] Chunked processing for very large books (>100MB)
- [ ] Custom summary templates
- [ ] Web interface
- [ ] Multiple voice options for audio
- [ ] Audio chapter markers
- [ ] Batch processing multiple books
License
ISC
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.