@profullstack/summary-forge-module

Summary Forge Module

An intelligent tool that uses OpenAI's GPT-5 to forge comprehensive summaries of ebooks in multiple formats.

Repository: git@github.com:profullstack/summary-forge-module.git

Features

📚 Multiple Input Formats: Supports PDF, EPUB files, and web page URLs
🌐 Web Page Summarization: Fetch and summarize any web page with automatic content extraction
🤖 AI-Powered Summaries: Uses GPT-5 with direct PDF upload for better quality
📊 Vision API: Preserves formatting, tables, diagrams, and images from PDFs
🧩 Intelligent Chunking: Automatically processes large PDFs (500+ pages) without truncation
🛡️ Directory Protection: Prompts before overwriting existing summaries (use --force to skip)
📦 Multiple Output Formats: Creates Markdown, PDF, EPUB, plain text, and MP3 audio summaries
🃏 Printable Flashcards: Generates double-sided flashcard PDFs for studying
🖼️ Flashcard Images: Individual PNG images for web app integration (q-001.png, a-001.png, etc.)
🎙️ Natural Audio Narration: AI-generated conversational audio script for better listening
🗜️ Bundled Output: Packages everything into a convenient .tgz archive
🔄 Auto-Conversion: Automatically converts EPUB to PDF using Calibre
🔍 Book Search: Search Amazon by title using Rainforest API
📖 Auto-Download: Downloads books from Anna's Archive with CAPTCHA solving
💻 CLI & Module: Use as a command-line tool or import as an ESM module
🎨 Interactive Mode: Guided workflow with inquirer prompts
📥 EPUB Priority: Automatically prefers EPUB format (open standard, more flexible)

Installation

Global Installation (CLI)

pnpm install -g @profullstack/summary-forge-module

Local Installation (Module)

pnpm add @profullstack/summary-forge-module

Prerequisites

Node.js v20 or newer

Calibre (for EPUB conversion - provides ebook-convert command)

# macOS
brew install calibre

# Ubuntu/Debian
sudo apt-get install calibre

# Arch Linux
sudo pacman -S calibre

Pandoc (for document conversion)

# macOS
brew install pandoc

# Ubuntu/Debian
sudo apt-get install pandoc

# Arch Linux
sudo pacman -S pandoc

XeLaTeX (for PDF generation)

# macOS
brew install --cask mactex

# Ubuntu/Debian
sudo apt-get install texlive-xetex

# Arch Linux
sudo pacman -S texlive-core texlive-xetex

CLI Usage

First-Time Setup

Before using the CLI, configure your API keys:

summary setup

This interactive command will prompt you for:

OpenAI API Key (required)
Rainforest API Key (optional - for Amazon book search)
ElevenLabs API Key (optional - for audio generation; get a key at https://try.elevenlabs.io/oh7kgotrpjnv)
2Captcha API Key (optional - for CAPTCHA solving; sign up at https://2captcha.com/?from=9630996)
Browserless API Key (optional)
Browser and proxy settings

Configuration is saved to ~/.config/summary-forge/settings.json and used automatically by all CLI commands.

Managing Configuration

# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete

Note: The CLI will use configuration in this priority order:

Environment variables (.env file)
Configuration file (~/.config/summary-forge/settings.json)

Interactive Mode (Recommended)

summary interactive
# or
summary i

This launches an interactive menu where you can:

Process local files (PDF/EPUB)
Process web page URLs
Search for books by title
Look up books by ISBN/ASIN

Process a File

summary file /path/to/book.pdf
summary file /path/to/book.epub

# Force overwrite if directory already exists
summary file /path/to/book.pdf --force
summary file /path/to/book.pdf -f

Process a Web Page URL

summary url https://example.com/article
summary url https://blog.example.com/post/123

# Force overwrite if directory already exists
summary url https://example.com/article --force
summary url https://example.com/article -f

Features:

Automatically fetches web page content using Puppeteer
Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
Saves web page as PDF for processing
Generates clean title from page title or uses OpenAI to create one
Prompts specifically optimized for web page content (ignores nav/ads/footers)
Creates same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)

Search by Title

# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force  # Auto-select first result
summary title "Python" --source anna  # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force  # Auto-select and process
summary isbn 9780134685991 --source anna  # Use Anna's Archive

# Common Options:
#   --source <source>              Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results <number>     Maximum results to display (default: 10)
#   -f, --force                    Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from <year>             Filter by publication year from (e.g., 2020)
#   --year-to <year>               Filter by publication year to (e.g., 2024)
#   -l, --languages <languages>    Language filter, comma-separated (default: english)
#   -e, --extensions <extensions>  File extensions, comma-separated (case-insensitive, default: PDF)
#   --content-types <types>        Content types, comma-separated (default: book)
#   -s, --order <order>            Sort order: date (newest) or empty for relevance
#   --view <view>                  View type: list or grid (default: list)
#
# Anna's Archive Options (--source anna):
#   -f, --format <format>          Filter by format: pdf, epub, pdf,epub, or all (default: pdf)
#   -s, --sort <sort>              Sort by: date (newest) or empty for relevance (default: '')
#   -l, --language <language>      Language code(s), comma-separated (e.g., en, es, fr) (default: en)
#   --sources <sources>            Data sources, comma-separated (default: all sources)
#                                  Options: zlib, lgli, lgrs, and others

Look up by ISBN/ASIN

summary isbn B075HYVHWK

# Force overwrite if directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f

Help

summary --help
summary file --help

Programmatic Usage

JSON API Format

All methods now return consistent JSON objects with the following structure:

{
  success: true | false,  // Indicates if operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}

This enables:

✅ Consistent error handling - Check success field instead of try-catch
✅ REST API ready - Direct JSON responses for HTTP endpoints
✅ Better debugging - Rich metadata in all responses
✅ Type-safe - Predictable structure for TypeScript users
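
As a rough illustration of the "REST API ready" point, the sketch below maps a result object onto an HTTP-style response. The helper name `toHttpResponse` and the status codes are assumptions made for this example; they are not part of the module.

```javascript
// Hypothetical helper (not part of the module): maps a { success, ... } result
// onto a plain { status, body } pair that an HTTP handler could send.
function toHttpResponse(result) {
  if (result && result.success) {
    return { status: 200, body: result };
  }
  // Assumption: failures are reported as 500 with the error message passed through.
  const error = (result && result.error) || 'Unknown error';
  return { status: 500, body: { success: false, error } };
}

// Example with hand-written result objects shaped like the structure above.
const ok = toHttpResponse({ success: true, archive: './book_bundle.tgz' });
const failed = toHttpResponse({ success: false, error: 'File not found' });
console.log(ok.status, failed.status, failed.body.error);
```

Because every method reports failure through the `success` field, a handler like this needs no try-catch around the call itself.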

Basic Example

import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);

const result = await forge.processFile('./my-book.pdf');
if (result.success) {
  console.log('Summary created:', result.archive);
  console.log('Files:', result.files);
  console.log('Costs:', result.costs);
} else {
  console.error('Processing failed:', result.error);
}

Configuration Options

import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',
  
  // Optional API keys
  rainforestApiKey: 'your-key',      // For Amazon search
  elevenlabsApiKey: 'sk-...',        // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',      // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key',     // For browserless.io
  
  // Processing options
  maxChars: 500000,                  // Max chars to process
  maxTokens: 20000,                  // Max tokens in output summary
  maxInputTokens: 250000,            // Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio options
  voiceId: '21m00Tcm4TlvDq8ikWAM',  // ElevenLabs voice
  voiceSettings: {
    stability: 0.5,
    similarity_boost: 0.75
  },
  
  // Browser options
  headless: true,                    // Run browser in headless mode
  enableProxy: false,                // Enable proxy
  proxyUrl: 'http://proxy.com',     // Proxy URL
  proxyUsername: 'user',             // Proxy username
  proxyPassword: 'pass',             // Proxy password
  proxyPoolSize: 36                  // Number of proxies in pool (default: 36)
});

const result = await forge.processFile('./book.epub');
if (result.success) {
  console.log('Archive:', result.archive);
} else {
  console.error('Processing failed:', result.error);
}

Search for Books

Using Amazon/Rainforest API

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  rainforestApiKey: process.env.RAINFOREST_API_KEY
});

const searchResult = await forge.searchBookByTitle('Clean Code');
if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results:`);
console.log(searchResult.results.map(b => ({
  title: b.title,
  author: b.author,
  asin: b.asin
})));

// Get download URL
const url = forge.getAnnasArchiveUrl(searchResult.results[0].asin);
console.log('Download from:', url);

Using Anna's Archive Direct Search (No Rainforest API Required)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'  // Sort by newest
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  format: r.format,
  size: `${r.sizeInMB.toFixed(1)} MB`,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const md5 = searchResult.results[0].href.match(/\/md5\/([a-f0-9]+)/)[1];
  const downloadResult = await forge.downloadFromAnnasArchive(md5, '.', searchResult.results[0].title);
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    console.log('Directory:', downloadResult.directory);
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Using 1lib.sk Search (Faster, No DDoS Protection)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.search1lib('LLM Fine Tuning', {
  maxResults: 10,
  yearFrom: 2020,
  languages: ['english'],
  extensions: ['PDF']
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  year: r.year,
  extension: r.extension,
  size: r.size,
  language: r.language,
  isbn: r.isbn,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    
    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Enhanced Error Handling:

The 1lib.sk download functionality includes robust error handling with automatic debugging:

Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
Debug HTML Capture: Saves page HTML when download button isn't found
Link Analysis: Lists all links on the page for troubleshooting
Detailed Error Messages: Provides actionable information for debugging

If a download fails, check the debug-book-page.html file in the book's directory for detailed page structure information.
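
The selector-fallback idea can be sketched roughly as follows. This is not the module's code: the selector strings are placeholders (the actual six selectors are not listed in this README), and `findDownloadButton` is a hypothetical name.

```javascript
// Hypothetical selector list; the module's actual six selectors are not
// documented here.
const CANDIDATE_SELECTORS = ['a.download-button', 'a[href*="/dl/"]', 'button.download'];

// Tries each selector in order using an injected lookup function and returns
// the first hit, or a failure object listing everything that was tried.
async function findDownloadButton(querySelector, selectors = CANDIDATE_SELECTORS) {
  for (const selector of selectors) {
    const element = await querySelector(selector);
    if (element) {
      return { success: true, selector, element };
    }
  }
  return {
    success: false,
    error: 'Download button not found',
    tried: [...selectors],
  };
}
```

With Puppeteer, the lookup could be supplied as `(selector) => page.$(selector)`, which resolves to `null` when nothing matches.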

API Reference

Constructor Options

new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io
  
  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in output summary (default: 16000)
  maxInputTokens: number,    // Optional: Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio Options
  voiceId: string,           // Optional: ElevenLabs voice ID (default: Brian)
  voiceSettings: object,     // Optional: Voice customization settings
  
  // Browser Options
  headless: boolean,         // Optional: Run browser in headless mode (default: true)
  enableProxy: boolean,      // Optional: Enable proxy (default: false)
  proxyUrl: string,          // Optional: Proxy URL
  proxyUsername: string,     // Optional: Proxy username
  proxyPassword: string,     // Optional: Proxy password
  proxyPoolSize: number      // Optional: Number of proxies in pool (default: 36)
})

Methods

All methods return JSON objects with { success, ...data, error?, message? } format.

Processing Methods

processFile(filePath, asin?) - Process a PDF or EPUB file

Returns: { success, basename, markdown, files, archive, hasAudio, asin, costs, message, error? }

Example:

const result = await forge.processFile('./book.pdf');
if (result.success) {
  console.log('Archive:', result.archive);
  console.log('Costs:', result.costs);
}

processWebPage(url, outputDir?) - Process a web page URL
- Returns: { success, basename, dirName, markdown, files, directory, archive, hasAudio, url, title, costs, message, error? }
- Example:
```
const result = await forge.processWebPage('https://example.com/article');
if (result.success) {
  console.log('Summary:', result.markdown.substring(0, 100));
}
```

Search Methods

searchBookByTitle(title) - Search Amazon using Rainforest API

Returns: { success, results, count, query, message, error? }

Example:

const result = await forge.searchBookByTitle('Clean Code');
if (result.success) {
  console.log(`Found ${result.count} books`);
}

searchAnnasArchive(query, options?) - Search Anna's Archive directly

Returns: { success, results, count, query, options, message, error? }

Example:

const result = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'
});
if (result.success) {
  console.log(`Found ${result.count} results`);
}

search1lib(query, options?) - Search 1lib.sk
- Returns: { success, results, count, query, options, message, error? }

Download Methods

downloadFromAnnasArchive(asin, outputDir?, bookTitle?) - Download from Anna's Archive
- Returns: { success, filepath, directory, asin, format, message, error? }
- Example:
```
const result = await forge.downloadFromAnnasArchive('B075HYVHWK', '.');
if (result.success) {
  console.log('Downloaded to:', result.filepath);
}
```
downloadFrom1lib(bookUrl, outputDir?, bookTitle?, downloadUrl?) - Download from 1lib.sk
- Returns: { success, filepath, directory, title, format, message, error? }
search1libAndDownload(query, searchOptions?, outputDir?, selectCallback?) - Search and download in one session
- Returns: { success, results, download, message, error? }

Generation Methods

generateSummary(pdfPath) - Generate AI summary from PDF
- Returns: { success, markdown, length, method, chunks?, message, error? }
- Methods: gpt5_pdf_upload, text_extraction_single, text_extraction_chunked
- Example:
```
const result = await forge.generateSummary('./book.pdf');
if (result.success) {
  console.log(`Generated ${result.length} char summary using ${result.method}`);
}
```
generateAudioScript(markdown) - Generate audio-friendly narration script
- Returns: { success, script, length, message }
generateAudio(text, outputPath) - Generate audio using ElevenLabs TTS
- Returns: { success, path, size, duration, message, error? }
generateOutputFiles(markdown, basename, outputDir) - Generate all output formats
- Returns: { success, files: {...}, message }

Utility Methods

convertEpubToPdf(epubPath) - Convert EPUB to PDF
- Returns: { success, pdfPath, originalPath, message, error? }
createBundle(files, archiveName) - Create tar.gz archive
- Returns: { success, path, files, message, error? }
getCostSummary() - Get cost tracking information
- Returns: { success, openai, elevenlabs, rainforest, total, breakdown }

Configuration

CLI Configuration (Recommended)

For CLI usage, run the setup command to configure your API keys:

summary setup

This saves your configuration to ~/.config/summary-forge/settings.json so you don't need to manage environment variables.

Environment Variables (Alternative)

For programmatic usage or if you prefer environment variables, create a .env file:

OPENAI_API_KEY=sk-your-key-here
RAINFOREST_API_KEY=your-key-here
ELEVENLABS_API_KEY=sk-your-key-here  # Optional: for audio generation
TWOCAPTCHA_API_KEY=your-key-here      # Optional: for CAPTCHA solving
BROWSERLESS_API_KEY=your-key-here     # Optional

# Browser Configuration
HEADLESS=true                          # Run browser in headless mode
ENABLE_PROXY=false                     # Enable proxy for browser requests
PROXY_URL=http://proxy.example.com    # Proxy URL (if enabled)
PROXY_USERNAME=username                # Proxy username (if enabled)
PROXY_PASSWORD=password                # Proxy password (if enabled)
PROXY_POOL_SIZE=36                     # Number of proxies in your pool (default: 36)

Or set them in your shell:

export OPENAI_API_KEY=sk-your-key-here
export RAINFOREST_API_KEY=your-key-here
export ELEVENLABS_API_KEY=sk-your-key-here  # Optional

Configuration Priority

When using the module programmatically, configuration is loaded in this order (highest priority first):

Constructor options - Passed directly to new SummaryForge(options)
Environment variables - From .env file or shell
Config file - From ~/.config/summary-forge/settings.json (CLI only)
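
A minimal sketch of this precedence, assuming each source has already been read into a plain object. The function name `resolveConfig` is hypothetical and the module's real loader may differ.

```javascript
// Hypothetical sketch of the priority order described above. Later spreads
// win, so sources are listed from lowest to highest priority.
function resolveConfig(fileConfig = {}, envConfig = {}, constructorOptions = {}) {
  // Drop undefined values so an unset higher-priority key does not
  // overwrite a lower-priority value.
  const defined = (obj) =>
    Object.fromEntries(Object.entries(obj).filter(([, v]) => v !== undefined));
  return {
    ...defined(fileConfig),
    ...defined(envConfig),
    ...defined(constructorOptions),
  };
}

const resolved = resolveConfig(
  { openaiApiKey: 'from-file', headless: true },
  { openaiApiKey: 'from-env', proxyPoolSize: 10 },
  { proxyPoolSize: 25, headless: undefined }
);
console.log(resolved);
```

Here the API key comes from the environment, the pool size from the constructor, and `headless` falls through to the config file value.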

Proxy Configuration (Recommended for Anna's Archive)

To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:

summary setup

When prompted:

Enable proxy: Yes
Enter proxy URL: http://your-proxy.com:8080
Enter proxy username and password

Why use a proxy?

✅ Avoids IP bans from Anna's Archive
✅ USA-based proxies prevent geo-location issues
✅ Works with both browser navigation and file downloads
✅ Automatically applied to all download operations

Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:

🌎 USA-based IPs (no geo-location issues)
⚡ Fast and reliable
💰 Affordable pricing with free tier
🔒 HTTP/HTTPS/SOCKS5 support

Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not rotating) to maintain the same IP:

In your Webshare dashboard, go to Proxy → List
Copy a Static Proxy endpoint (not the rotating endpoint)
Use the format: http://host:port (e.g., http://45.95.96.132:8080)
Username format: dmdgluqz-US-{session_id} (session ID added automatically)

The tool automatically generates a unique session ID (1 to PROXY_POOL_SIZE) for each download to get a fresh IP, while maintaining that IP throughout the 5-10 minute download process.

Proxy Pool Size Configuration:

Set PROXY_POOL_SIZE to match your Webshare plan (default: 36):

Free tier: 10 proxies → PROXY_POOL_SIZE=10
Starter plan: 25 proxies → PROXY_POOL_SIZE=25
Professional plan: 100 proxies → PROXY_POOL_SIZE=100
Enterprise plan: 250+ proxies → PROXY_POOL_SIZE=250

The tool will randomly select a session ID from 1 to your pool size, distributing load across all available proxies.
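
The session selection can be pictured with a short sketch. It assumes the session ID is simply appended to the base username with a hyphen, as in the username format shown earlier; `pickSessionId` and `buildProxyUsername` are hypothetical names, not module exports.

```javascript
// Picks a session ID in the inclusive range 1..poolSize.
// `random` is injectable so the function can be tested deterministically.
function pickSessionId(poolSize = 36, random = Math.random) {
  return Math.floor(random() * poolSize) + 1;
}

// Appends the session ID to a base username, following the
// `<base>-<session_id>` pattern (e.g. dmdgluqz-US-7).
function buildProxyUsername(baseUsername, sessionId) {
  return `${baseUsername}-${sessionId}`;
}

const sessionId = pickSessionId(10);
console.log(buildProxyUsername('dmdgluqz-US', sessionId));
```

Setting the pool size larger than the number of proxies in your plan would produce session IDs with no proxy behind them, which is why PROXY_POOL_SIZE should match the plan.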

Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN:

Real ISBNs (10 or 13 numeric digits): Searches by ISBN for precise results
Amazon ASINs (alphanumeric): Searches by book title instead for better results
This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
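
The detection rule above could look roughly like this. It is a sketch of the stated rule (10 or 13 numeric digits), not the module's implementation; the function names and the stripping of hyphens and spaces are assumptions.

```javascript
// Follows the rule stated above: 10 or 13 numeric digits counts as an ISBN.
// Assumption: hyphens and spaces are stripped first. Note that this simple
// rule treats ISBN-10s ending in the check character "X" as non-ISBNs.
function isRealIsbn(identifier) {
  const cleaned = String(identifier).replace(/[-\s]/g, '');
  return /^(\d{10}|\d{13})$/.test(cleaned);
}

// Chooses the search strategy: ISBN search for real ISBNs, title search otherwise.
function chooseSearchQuery(identifier, bookTitle) {
  return isRealIsbn(identifier)
    ? { by: 'isbn', query: identifier }
    : { by: 'title', query: bookTitle };
}

console.log(chooseSearchQuery('9780134685991', 'Example Title'));
console.log(chooseSearchQuery('B075HYVHWK', 'Example Title'));
```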

Note: Rotating proxies (p.webshare.io) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.

Testing your proxy:

node test-proxy.js <ASIN>

This will verify your proxy configuration by attempting to download a book.

Audio Generation

Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.

Get ElevenLabs API Key: Sign up at https://try.elevenlabs.io/oh7kgotrpjnv for high-quality text-to-speech.

Features:

Uses ElevenLabs Turbo v2.5 model (optimized for audiobooks)
Default voice: Brian (best for technical content, customizable)
Automatically truncates long texts to fit API limits
Generates high-quality MP3 audio files
Natural, conversational narration style

Output

The tool generates:

<book_name>_summary.md - Markdown summary
<book_name>_summary.txt - Plain text summary
<book_name>_summary.pdf - PDF summary with table of contents
<book_name>_summary.epub - EPUB summary with clickable TOC
<book_name>_summary.mp3 - Audio summary (if ElevenLabs key provided)
<book_name>.pdf - Original or converted PDF
<book_name>.epub - Original EPUB (if input was EPUB)
<book_name>_bundle.tgz - Compressed archive containing all files

Example Workflow

# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub

How It Works

Input Processing: Accepts PDF or EPUB files and web page URLs (EPUB is converted to PDF; web pages are saved as PDF)
Smart Processing Strategy:
- Small PDFs (<400k chars): Direct upload to OpenAI's vision API
- Large PDFs (>400k chars): Intelligent chunking with synthesis
AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
Audio Generation: Optional TTS conversion using ElevenLabs
Bundling: Creates a compressed archive with all generated files

Intelligent Chunking for Large PDFs

For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:

How it works:

Analysis: Calculates optimal chunk size based on PDF statistics and GPT-5's token limits
Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
Page-Based Chunking: Splits PDF into logical chunks that fit within token limits
Parallel Processing: Each chunk is summarized independently by GPT-5
Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
Quality Preservation: Maintains narrative flow and eliminates redundancy
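
The page-based chunking step might be sketched as below. This is an illustration under simple assumptions (per-page character counts are already known, and chunks are filled greedily); `chunkPages` is a hypothetical name, and the tool's actual splitting logic may differ.

```javascript
// Groups consecutive pages into chunks whose total length stays within
// maxChars where possible. A single page longer than maxChars still gets
// its own chunk. `pageLengths[i]` is the character count of page i + 1.
function chunkPages(pageLengths, maxChars) {
  const chunks = [];
  let current = null;
  pageLengths.forEach((length, index) => {
    const page = index + 1;
    // Start a new chunk when adding this page would exceed the budget.
    if (!current || current.chars + length > maxChars) {
      current = { startPage: page, endPage: page, chars: 0 };
      chunks.push(current);
    }
    current.endPage = page;
    current.chars += length;
  });
  return chunks;
}

const chunks = chunkPages([2500, 2400, 2600, 2500, 2300], 5000);
chunks.forEach((c, i) =>
  console.log(`Chunk ${i + 1}: Pages ${c.startPage}-${c.endPage} (${c.chars} chars)`)
);
```

Each chunk would then be summarized separately, and the per-chunk summaries passed to a final synthesis call.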

Token Limit Handling:

GPT-5 Input Limit: 272,000 tokens
System Overhead: 20,000 tokens reserved for prompts and instructions
Available Tokens: ~250,000 tokens for content (272,000 minus the 20,000 reserved)
Safety Margin: 70% utilization to account for token estimation variance
Chunk Size: ~565,000 characters per chunk (based on 3.5 chars/token estimate)
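
As a worked check of these numbers, the sketch below multiplies the token budget by the utilization and the chars-per-token estimate. The formula is an assumption about how the figures combine. With the values exactly as listed it gives about 612,500 characters, somewhat above the ~565,000 quoted, so the tool's real calculation presumably includes rounding or overhead not spelled out here.

```javascript
// Figures taken from the list above.
const inputLimitTokens = 272000;
const systemOverheadTokens = 20000;
const availableTokens = 250000; // documented budget; 272,000 - 20,000 is 252,000
const utilization = 0.7;
const charsPerToken = 3.5;

// Assumed formula: usable tokens times the chars-per-token estimate.
const usableTokens = Math.round(availableTokens * utilization);
const chunkSizeChars = Math.round(usableTokens * charsPerToken);

console.log({
  headroomTokens: inputLimitTokens - systemOverheadTokens,
  usableTokens,
  chunkSizeChars,
});
```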

Benefits:

✅ Complete Coverage: Processes entire books without truncation
✅ High Quality: Each section gets full AI attention
✅ Seamless Output: Final summary reads as a unified document
✅ Cost Efficient: Optimizes token usage across multiple API calls
✅ Automatic: No configuration needed - works transparently
✅ Token-Aware: Respects API limits to prevent errors

Example Output:

📊 PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
📚 PDF is large - using intelligent chunking strategy
   This will process the ENTIRE 523-page PDF without truncation
📐 Using chunk size: 120,000 chars
📦 Created 11 chunks for processing
   Chunk 1: Pages 1-48 (119,234 chars)
   Chunk 2: Pages 49-95 (118,901 chars)
   ...
✅ All 11 chunks processed successfully
🔄 Synthesizing chunk summaries into final comprehensive summary...
✅ Final summary synthesized: 45,678 characters

Why Direct PDF Upload?

The tool prioritizes OpenAI's vision API for direct PDF upload when possible:

✅ Better Quality: Preserves document formatting, tables, and diagrams
✅ More Accurate: AI can see the actual PDF layout and structure
✅ Better for Technical Books: Code examples and diagrams are preserved
✅ Fallback Strategy: Automatically switches to intelligent chunking for large files

Testing

Summary Forge includes a comprehensive test suite using Vitest.

Run Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage

Test Coverage

The test suite includes:

✅ 30+ passing tests
Constructor validation
Helper method tests
PDF upload functionality tests
API integration tests
Error handling tests
Edge case coverage
File operation tests

See test/summary-forge.test.js for the complete test suite.

Flashcard Generation

Summary Forge includes powerful flashcard generation capabilities for study and review.

Printable PDF Flashcards

Generate double-sided flashcard PDFs optimized for printing:

import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown, { maxCards: 50 });
console.log(`Extracted ${extractResult.count} flashcards`);

// Generate printable PDF
const pdfResult = await generateFlashcardsPDF(
  extractResult.flashcards,
  './flashcards.pdf',
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    cardWidth: 3.5,   // inches
    cardHeight: 2.5,  // inches
    fontSize: 11
  }
);

console.log(`PDF created: ${pdfResult.path}`);
console.log(`Total pages: ${pdfResult.pages}`);

Individual Flashcard Images

Generate individual PNG images for each flashcard, perfect for web applications:

import { extractFlashcards, generateFlashcardImages } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown);

// Generate individual PNG images
const imageResult = await generateFlashcardImages(
  extractResult.flashcards,
  './flashcards',  // Output directory
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    width: 800,   // pixels
    height: 600,  // pixels
    fontSize: 24
  }
);

if (imageResult.success) {
  console.log(`Generated ${imageResult.images.length} images`);
  console.log('Files:', imageResult.images);
  // Output: ['./flashcards/q-001.png', './flashcards/a-001.png', ...]
}

Image Naming Convention:

q-001.png, q-002.png, etc. - Question cards
a-001.png, a-002.png, etc. - Answer cards
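
For a web app that needs to locate a card's two images, the naming convention can be reproduced with a small helper. `flashcardImageNames` is a hypothetical name for this example, and three-digit zero padding is inferred from the file names above.

```javascript
// Builds the question/answer file names for the card at a 1-based position,
// zero-padded to three digits as in the convention above.
function flashcardImageNames(position) {
  const id = String(position).padStart(3, '0');
  return { question: `q-${id}.png`, answer: `a-${id}.png` };
}

console.log(flashcardImageNames(1));
console.log(flashcardImageNames(42));
```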

Use Cases:

🌐 Web-based flashcard applications
📱 Mobile learning apps
🎮 Interactive quiz games
📊 Study progress tracking systems
🔄 Spaced repetition software

Features:

✅ Clean, professional design with book title
✅ Automatic text wrapping for long content
✅ Customizable dimensions and styling
✅ SVG-based rendering for crisp quality
✅ Works in Docker (no native dependencies)

Flashcard Extraction Formats

The extractFlashcards function supports multiple markdown formats:

1. Explicit Q&A Format:

**Q: What is a closure?**
A: A closure is a function that has access to variables in its outer scope.

2. Definition Lists:

**Closure**
: A function that has access to variables in its outer scope.

3. Question Headers:

### What is a closure?

A closure is a function that has access to variables in its outer scope.
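
To make the first format concrete, here is an illustrative parser for the explicit Q&A style only. It is a sketch written for this README, not the code behind extractFlashcards, and `extractExplicitQA` is a hypothetical name.

```javascript
// Illustrative parser for the explicit Q&A format only; the module's
// extractFlashcards implementation may work differently.
function extractExplicitQA(markdown) {
  // A bold "**Q: ...**" line followed by an "A: ..." line.
  const pattern = /^\*\*Q:\s*(.+?)\*\*[ \t]*\r?\nA:[ \t]*(.+)$/gm;
  const cards = [];
  for (const match of markdown.matchAll(pattern)) {
    cards.push({ question: match[1].trim(), answer: match[2].trim() });
  }
  return cards;
}

const sample = [
  '**Q: What is a closure?**',
  'A: A closure is a function that has access to variables in its outer scope.',
].join('\n');
console.log(extractExplicitQA(sample));
```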

Examples

See the examples/ directory for more usage examples:

programmatic-usage.js - Using as a module
flashcard-images-demo.js - Generating flashcard images

Troubleshooting

Rate Limiting (1lib.sk)

If you encounter "Too many requests" errors from 1lib.sk:

Error Message:

Too many requests from your IP xxx.xxx.xxx.xxx
Please wait 10 seconds. support@z-lib.fm. Err #ipd1

Automatic Handling: The tool automatically detects rate limiting and:

✅ Waits the requested time (usually 10 seconds)
✅ Retries up to 3 times with exponential backoff
✅ Adds a 2-second buffer to ensure rate limit has cleared
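
A rough sketch of this retry policy follows. How the requested wait, the exponential backoff, and the buffer combine is an assumption here, as are the name `withRateLimitRetry` and the `retryAfterSeconds` field; the tool's internal logic may differ.

```javascript
// Sketch of the retry policy described above. `attempt` is any async function
// returning { success, retryAfterSeconds? }; `sleep` is injectable for testing.
async function withRateLimitRetry(attempt, {
  maxRetries = 3,
  bufferSeconds = 2,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let result = await attempt();
  for (let retry = 0; retry < maxRetries && !result.success; retry += 1) {
    // Assumption: the wait is the server-requested time (default 10 s),
    // doubled on each retry, plus a fixed buffer.
    const requested = result.retryAfterSeconds ?? 10;
    const waitSeconds = requested * 2 ** retry + bufferSeconds;
    await sleep(waitSeconds * 1000);
    result = await attempt();
  }
  return result;
}
```

Under these assumptions, a request that is told to wait 10 seconds would pause 12 seconds before the first retry and 22 seconds before the second.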

Manual Solutions:

Wait a few minutes before trying again
Use a different proxy session (the tool rotates through your proxy pool automatically)
Switch to Anna's Archive: summary search "book title" --source anna
Reduce concurrent requests if running multiple downloads

Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.

Download Button Not Found (1lib.sk)

If you encounter "Download button not found" errors when downloading from 1lib.sk:

Check Debug Files: The tool automatically saves debug-book-page.html in the book's directory
- Open this file to inspect the actual page structure
- Look for download links or buttons that might have different selectors
Review Error Output: The error message includes:
- All selectors that were tried
- List of links found on the page
- Location of the debug HTML file
Common Causes:
- Z-Access/Library Access Page: Book page redirects to authentication page (most common)
- Page structure changed (1lib.sk updates their site)
- Book is deleted or unavailable
- Session expired or cookies not maintained
- Proxy issues preventing proper page load
Solutions:
- Recommended: Use Anna's Archive instead: summary search "book title" --source anna
- Try the search1lib command separately to verify the book exists
- Check if the book page loads correctly in a regular browser with the same proxy
- Verify proxy configuration is working correctly
- Try a different book from search results
Known Issue - Z-Access Page: If you see links to library-access.sk or Z-Access page in the debug output, this means:
- The book page requires authentication or special access
- 1lib.sk's session management is blocking automated access
- Workaround: Use Anna's Archive which has better automation support

Example Debug Output (Z-Access Issue):

❌ Download button not found on book page
   Debug HTML saved to: ./uploads/book_name/debug-book-page.html
   Found 6 links on page
   First 5 links:
   - https://library-access.sk (Z-Access page)
   - mailto:blackbox@z-library.so (blackbox@z-library.so)
   - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)

Recommended Alternative:

# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna

IP Bans from Anna's Archive

If you're getting blocked by Anna's Archive:

Enable proxy in your configuration: run summary setup and answer Yes when asked to enable the proxy
Use a USA-based proxy to avoid geo-location issues
Test your proxy before downloading:
```
node test-proxy.js B0BCTMXNVN
```
Run browser in visible mode to debug:
```
summary config --headless false
```

Proxy Configuration

The proxy is used for:

✅ Browser navigation (Puppeteer)
✅ File downloads (fetch with https-proxy-agent)
✅ All HTTP requests to Anna's Archive

Supported proxy formats:

http://proxy.example.com:8080
https://proxy.example.com:8080
socks5://proxy.example.com:1080
http://proxy.example.com:8080-session-<SESSION_ID> (sticky session)

Recommended Service: Webshare.io - Reliable USA-based proxies with free tier available.

Webshare Sticky Sessions: Add -session-<YOUR_SESSION_ID> to your proxy URL to maintain the same IP:

http://p.webshare.io:80-session-myapp123

CAPTCHA Solving

When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:

Sign up for 2Captcha: get an API key at https://2captcha.com/?from=9630996
Add it to your configuration by running summary setup
Enter your 2Captcha API key when prompted

The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.

Limitations

Maximum PDF file size: No practical limit (intelligent chunking handles any size)
GPT-5 uses default temperature of 1 (not configurable)
Requires external tools: Calibre, Pandoc, XeLaTeX
CAPTCHA solving requires 2captcha.com API key (optional)
Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
Anna's Archive may block IPs without proxy configuration
Chunked processing uses text extraction (images/diagrams described in text only)

Roadmap

[x] ISBN/ASIN lookup via Anna's Archive
[x] Automatic download from Anna's Archive with CAPTCHA solving
[x] Book title search via Rainforest API
[x] CLI with interactive mode
[x] ESM module for programmatic use
[x] Audio generation with ElevenLabs TTS
[x] Direct PDF upload to OpenAI vision API
[x] EPUB format prioritization (open standard)
[ ] Support for more input formats (MOBI, AZW3)
[ ] Chunked processing for very large books (>100MB)
[ ] Custom summary templates
[ ] Web interface
[ ] Multiple voice options for audio
[ ] Audio chapter markers
[ ] Batch processing multiple books

License

ISC

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.