A FastAPI-based Model Context Protocol (MCP) server that aggregates and serves your professional profile, projects, publications, and career data.
Includes a flexible scraper framework with optional built-in LLM-powered content extraction and enrichment, supporting multiple data sources like GitHub, Medium, LinkedIn, RSS feeds, and more.
For programmatic interaction with this server (e.g., via an LLM agent or external scripts), see the meMCP Skill.
## Features

- Three-tier access system (public, private, elevated) with token-based authentication
- Elevated access unlocks an LLM-powered endpoint for real-time interactions, like job interview prep or project brainstorming based on the data in the MCP
- The owner decides which endpoints are publicly available
- Aggregates data from GitHub, Medium, RSS feeds, sitemaps, LinkedIn, plain HTML scraping, or manual YAML files
- LLM-powered content extraction and enrichment (e.g., summarization, tag extraction, skill/technology classification)
- Entity graph with relationships
- RESTful API with advanced search and filtering
- Full MCP compliance with endpoints for index, coverage contract, prompts, tools, and resources:
  - `/prompts` returns a couple of example prompts that can be used to query the MCP programmatically (e.g., via LLM agents or external scripts)
  - `/mcp/tools` returns a list of available MCP tools that can be used for programmatic queries (e.g., via LLM agents or external scripts)
  - `/mcp/resources` returns a browsable list of data resources (e.g., entities, categories, technologies, skills, tags) mapped to REST endpoints for easy access and integration
- Calculates different metrics for `skills`, `technologies`, and `tags`:
  - `proficiency`: based on recency and duration of experience
  - `experience_years`: total years of experience
  - `project_count`: number of projects associated with the skill/technology/tag
  - `frequency`: how often the skill/technology/tag appears across all entities
  - `last_used`: date of the most recent entity associated with the skill/technology/tag
  - `diversity`: variety of contexts in which the skill/technology/tag is used (e.g., different categories or flavors)
  - `growth`: trend of usage over time (increasing, stable, decreasing)
  - `distribution`: distribution of a tag, skill, or technology across entity flavors, stage categories (job, education, ...), or oeuvre categories (coding, article, book, ...)
  - `relevance`: a composite score based on the above metrics, indicating the overall relevance of a skill/technology/tag in the profile
- Multi-language support (EN, DE, ...) with automatic detection and translation of content
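For illustration, here is a minimal sketch of how a composite `relevance` score could combine the other metrics. The weights and the normalization assumption are hypothetical, not the project's actual formula:

```python
# Hypothetical sketch of a composite relevance score. The weights below are
# illustrative assumptions; each input metric is assumed pre-normalized to [0, 1].
def relevance(metrics: dict) -> float:
    weights = {
        "proficiency": 0.3,
        "experience_years": 0.2,
        "project_count": 0.2,
        "frequency": 0.15,
        "diversity": 0.1,
        "growth": 0.05,
    }
    # Missing metrics contribute 0, so the score degrades gracefully
    return round(sum(w * metrics.get(name, 0.0) for name, w in weights.items()), 3)

score = relevance({"proficiency": 1.0, "experience_years": 0.5, "project_count": 0.8,
                   "frequency": 0.6, "diversity": 0.4, "growth": 0.2})
print(score)  # 0.7
```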
## Available Scrapers

| Connector | Purpose | Config Example |
|---|---|---|
| `github_api` | GitHub repos via API | `url: https://api.github.com/users/USERNAME/repos` |
| `rss` | RSS/Atom feeds | `url: https://example.com/feed.xml` |
| `medium_raw` | Raw HTML dump of Medium profile (bypasses Cloudflare) | `url: file://data/Medium.html` |
| `linkedin_pdf` | LinkedIn profile PDF export with smart YAML caching | `url: file:///data/linkedin_profile.pdf` |
| `sitemap` | Scrape multiple URLs from sitemap.xml | `url: https://example.com/sitemap.xml`<br>`cache-file: file://data/cache.yaml` |
| `html` | Scrape single HTML page | `url: https://example.com`<br>`cache-file: file://data/cache.yaml` |
| `manual` | Manual YAML data entry | `url: file://path/to/data.yaml` |
Note on Medium: The medium_raw connector extracts all articles from a saved HTML copy of your Medium profile page, while rss only gets the ~10 most recent posts from the RSS feed.
## Data Model
The MCP stores information as entities. There are three main types of entities (called flavors):
- `identity`: static information about you (name, bio, greetings, contact details, etc.)
- `stages`: career stages (education, jobs, projects, etc.) with timeframes and descriptions
- `oeuvre`: your work (code repos, articles, books, talks, etc.) with metadata and links
Entities carry additional metadata, like:

- `source`: refers to the configuration slug
- `source_url`: where the scraper fetched the data from
- `title`: if the particular source provides a title
- `description`: an LLM-enriched description of the entity, based on the original source content (e.g., repo description, article summary)
- `start_date` and `end_date`: for stages, if available (e.g., job duration, project timeline)
- `date`: for oeuvre, if available (e.g., publication date, repo creation date)
- `created_at` and `updated_at`: timestamps for when the entity was created and last updated in the database
- `llm_enriched`: boolean flag indicating whether the entity has been enriched with LLM-generated content (e.g., description)
- `llm_model`: the name of the LLM model used for enrichment, if applicable (e.g., "mistral-small:24b-instruct-2501-q4_K_M")
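For illustration, an enriched oeuvre entity carrying this metadata might look like the following (all field values here are hypothetical):

```python
# Hypothetical entity record using the metadata fields listed above.
entity = {
    "flavor": "oeuvre",
    "category": "coding",
    "source": "github",                       # configuration slug
    "source_url": "https://github.com/example/example-repo",
    "title": "example-repo",
    "description": "An LLM-enriched summary of the repository.",
    "date": "2023-05-01",                     # repo creation date
    "created_at": "2024-01-01T12:00:00Z",
    "updated_at": "2024-02-01T12:00:00Z",
    "llm_enriched": True,
    "llm_model": "mistral-small:24b-instruct-2501-q4_K_M",
}
print(sorted(entity))
```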
Each entity of stages can be classified into three categories:

- `education`
- `job`
- `other` (specified in the `description` field)

Entities of oeuvre can be classified into:

- `coding`
- `blog_post`
- `article`
- `book`
- `website`
- and more...
Beyond that, stages and oeuvre entities can be further classified to identify technologies, skills, and general tags.

Technologies describe what you worked with in a specific way:
- Programming languages (Python, JavaScript, etc.)
- Frameworks (React, Django, etc.)
- Tools (Docker, Git, etc.)
Skills describe what you did in a more general way, like:
- Data Analysis
- Project Management
- Public Speaking
- System Operations
Finally, tags capture general attributes that are not covered by the other two, like:
- Maze Runner
- Open Source
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```
### 2. Configure

Edit config.tech.yaml (infrastructure) and config.content.yaml (identity & sources) to set:

- Your static identity information
- LLM backend (remote AI provider, or local via e.g. Ollama)
- Data sources (GitHub, Medium, blogs, etc.)
### 3. Ingest Data

```bash
# Runs a full ingestion with all provided sources and utilizes LLM for content extraction
python ingest.py

# Force refresh (ignore cache)
python ingest.py --force

# Fast mode: fetch without LLM enrichment (skips PDF sources, uses .yaml cache if available)
python ingest.py --disable-llm

# LLM-only mode: enrich existing entities (run after --disable-llm)
python ingest.py --llm-only

# Process specific source only
python ingest.py --source github
python ingest.py --source medium

# Dry run (fetch but don't save to DB)
python ingest.py --dry-run
```
### 4. Calculate Metrics (Optional)

Calculate metrics for all skills, technologies, and tags:

```bash
# Calculate metrics for all tags
python recalculate_metrics.py

# Calculate only a specific tag type
python recalculate_metrics.py --type skill
python recalculate_metrics.py --type technology
python recalculate_metrics.py --type generic

# Force recalculation (ignore version)
python recalculate_metrics.py --force

# Verbose output with top 10 rankings
python recalculate_metrics.py --verbose
```
Metrics are automatically calculated during ingest.py runs, but you can recalculate them independently without re-fetching data.
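As an illustration of how a recency- and duration-based metric like `proficiency` could work, here is a hedged sketch; the formula and constants are assumptions, not the actual implementation in recalculate_metrics.py:

```python
import math
from datetime import date

# Hypothetical proficiency metric: the duration component saturates with
# experience, while recency decays exponentially after the skill was last used.
def proficiency(years_experience: float, last_used: date, today: date,
                half_life_years: float = 2.0) -> float:
    duration = 1.0 - math.exp(-years_experience / 3.0)   # saturates toward 1.0
    idle_years = max((today - last_used).days / 365.25, 0.0)
    recency = 0.5 ** (idle_years / half_life_years)      # halves every 2 idle years
    return round(duration * recency, 3)

print(proficiency(3.0, date(2024, 1, 1), date(2024, 1, 1)))  # 0.632
```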
### 5. Run Server

```bash
# Start the FastAPI server with uvicorn
uvicorn app.main:app --host 0.0.0.0 --port 8000

# Or with auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
Server will be available at: http://localhost:8000
## Configuration

### Basic Settings

Key settings in config.tech.yaml and config.content.yaml:

```yaml
server:
  host: 0.0.0.0
  port: 8000
  admin_token: "change-me-please"

llm:
  backend: ollama  # or groq
  model: mistral-small:24b-instruct-2501-q4_K_M

oeuvre:
  github:
    enabled: true
    connector: github_api
    url: https://api.github.com/users/YOUR_USERNAME/repos
  medium:
    enabled: true
    connector: medium_raw  # or rss
    url: file://data/Medium.html  # for medium_raw, or https://medium.com/feed/@username for rss
```
## Scraper Modules

### LinkedIn Profile (PDF Export)

LinkedIn does not allow API access to your profile data. Export your profile as a PDF and use the linkedin_pdf connector to extract structured data with an LLM.
**YAML Cache Workflow:**

- First run: parses the PDF with the LLM → creates a `<pdf_file>.yaml` cache
- Subsequent runs: loads from the YAML cache (unless the PDF was modified or `--force` is set)
- Re-parse triggers:
  - PDF modification date newer than the YAML `last_synced`
  - `--force` flag provided
- LLM enrichment: automatically checks each entity
  - If tags/skills/technologies are missing → re-enriches with the LLM
  - If already enriched → uses cached data
- Manual editing: edit the YAML to refine content (preserved across runs)
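The re-parse decision above can be sketched like this (a simplification of the assumed logic, not the project's exact code):

```python
import os
from datetime import datetime, timezone

# Re-parse the PDF when --force is given, or when the PDF file was
# modified after the YAML cache's last_synced timestamp.
def needs_reparse(pdf_path: str, last_synced: datetime, force: bool = False) -> bool:
    if force:
        return True
    mtime = datetime.fromtimestamp(os.path.getmtime(pdf_path), tz=timezone.utc)
    return mtime > last_synced
```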
```yaml
stages:
  connector: linkedin_pdf
  enabled: true
  url: file:///data/linkedin_profile.pdf  # LinkedIn export PDF file
  llm-processing: true
```
**How to get the LinkedIn PDF:**

1. Go to your LinkedIn profile
2. Click "More" → "Save to PDF"
3. Save as `data/linkedin_profile.pdf`
**First run:**

```bash
python ingest.py --source stages
```

This creates `data/linkedin_profile.pdf.yaml` with experience, education, and certifications.

**Edit and update:**

1. Edit `data/linkedin_profile.pdf.yaml` manually
2. Re-run `python ingest.py --source stages` (loads from YAML, enriches missing fields)

**Force re-parse:**

```bash
python ingest.py --source stages --force
```
### Medium.com

Medium does not allow scraping your complete story list. To capture all your Medium articles, scroll to the end of your list of stories (https://medium.com/me/stories?tab=posts-published), open the DOM inspector, copy the top node, and paste it into a file (e.g., data/Medium.html). Then use the medium_raw connector to extract all articles from this file.
The scraper will:

- Extract article URLs, titles, and dates from the saved HTML
- Fetch each article's full content using a headless browser (bypasses Cloudflare protection)
- Create a YAML cache at `data/Medium.html.yaml` for manual editing
Prerequisites: Install Playwright for headless browser support:

```bash
pip install playwright
playwright install chromium
```
The YAML cache will be used for future runs to avoid re-parsing the HTML file and re-fetching articles.
```yaml
medium_raw:  # can be named anything, but the `connector` must be `medium_raw`
  connector: medium_raw
  enabled: true
  url: file://data/Medium.html
  sub_type_override: article
  limit: 0  # 0 = all
```

You can optionally use the `rss` connector instead, but it only considers the ~10 most recent articles from the RSS feed:

```yaml
medium_rss:  # Optional: for incremental updates only
  connector: rss
  enabled: false
  url: https://medium.com/feed/@nickyreinert
  sub_type_override: article
  limit: 0  # 0 = all available (RSS typically has ~10 most recent)
  cache_ttl_hours: 168  # 7 days
  llm-processing: true
```
**Best practice**: Use `medium_raw` once to get your complete article history (metadata only), then switch to `rss` for ongoing updates with content.
### GitHub
Fetches repositories from the GitHub API with full pagination support.
Features:
- Automatic pagination (fetches ALL repos, not just first page)
- README fetching for richer descriptions (optional)
- Fork filtering (skips forks by default)
- Language and metadata extraction
- Stars and forks tracking
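The pagination behavior can be sketched as a loop that keeps requesting pages until an empty page comes back (a generic pattern; `fetch_page` below is a stand-in for the actual GitHub API call):

```python
# Generic page-collection loop: fetch_page(page_number) returns a list of
# items, and an empty list signals the last page.
def paginate(fetch_page, limit=0):
    items, page = [], 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
        if limit and len(items) >= limit:
            return items[:limit]  # apply `limit` across all pages
        page += 1
    return items
```

In the real connector, `fetch_page` would issue a GET against `https://api.github.com/users/USERNAME/repos?page=N`.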
```yaml
github:  # can be named anything, but the `connector` must be `github_api`
  connector: github_api
  enabled: true
  url: https://api.github.com/users/nickyreinert/repos
  sub_type_override: coding  # Override default sub_type (coding/blog_post/article/book/website/podcast/video)
  limit: 0  # 0 = all repos, otherwise integer limit (applied across all pages)
  fetch_readmes: true  # true = richer descriptions, slower
  include_forks: false  # false = skip forked repos (default)
  llm-processing: true
```
### Generic RSS/Atom Feeds

You can also add any RSS/Atom feed as a source. The parser extracts metadata and content, which can be enriched with an LLM to create a more detailed description of the article and to extract technologies, skills, and tags.
```yaml
my_blog:  # can be named anything
  connector: rss
  enabled: true
  url: https://myblog.com/feed.xml
  sub_type_override: blog_post
  limit: 0  # 0 = all available
  cache_ttl_hours: 168  # 7 days
  llm-processing: true
```
### Sitemap Scraper (Multi-Entity)
Parse sitemap.xml and scrape multiple pages. Each page becomes a separate entity.
**Cache File Workflow:**

- If `cache-file` is specified and exists: loads from the cache, skips fetching
- If the cache exists but LLM fields (tags, skills, technologies) are empty AND `llm-processing: true`: reprocesses with the LLM
- If the cache is missing: fetches pages and saves them to the cache file
- The cache file allows manual editing of extracted data without losing changes on subsequent runs
```yaml
my_blog:
  connector: sitemap
  enabled: true
  url: https://myblog.com/sitemap.xml
  limit: 0  # 0 = all, otherwise integer limit
  cache_ttl_hours: 168  # 7 days
  llm-processing: true
  cache-file: file://data/myblog_sitemap.yaml  # Optional: cache for manual editing
  connector-setup:
    post-title-selector: h1
    post-content-selector: article
    post-published-date-selector: 'time[datetime]'
    post-description-selector: 'meta[name="description"]'
```
### HTML Scraper (Single-Entity)
Scrape a single HTML page (e.g., landing page, project website, portfolio).
**Cache File Workflow:**

- Same as the sitemap connector, but creates a single entity
- Ideal for project websites, portfolios, and landing pages
```yaml
my_project_website:
  connector: html
  enabled: true
  url: https://myproject.com
  sub_type_override: website
  llm-processing: true
  cache-file: file://data/myproject.yaml  # Optional: cache for manual editing
  connector-setup:
    post-title-selector: h1
    post-content-selector: main
    post-description-selector: 'meta[name="description"]'
```
### Static Manual Data

You can provide a YAML file with manually curated entities. This is useful for static information that doesn't change often, or for data that cannot easily be scraped. The file should contain a list of entities with the same structure as the database entries. The data can be declared as stages or oeuvre flavor. This connector compares the file's modification date to the ingestion date to decide whether to reprocess the file with the LLM, based on the cache_ttl_hours setting.
```yaml
manual:  # can be named anything
  connector: manual
  enabled: true
  url: file://data/manual_data.yaml
  llm-processing: true
  cache_ttl_hours: 168  # 7 days (if the file is updated within this timeframe, it will be reprocessed with LLM)
```
The required structure of the manual_data.yaml file is as follows:
```yaml
entities:
  job1:
    flavor: stages
    category: job
    title: "Software Engineer at XYZ"
    description: "Worked on developing web applications using Python and React."
    company: "XYZ"
    start_date: "2020-01-01"
    end_date: "2022-12-31"
    skills: ["Python", "React", "Web Development"]
    technologies: ["Django", "Node.js", "AWS"]
    tags: ["Full-stack", "Remote"]
  project1:
    flavor: oeuvre
    category: coding
    title: "Personal Portfolio Website"
    description: "A personal website to showcase my projects and skills, built with Next.js and hosted on Vercel."
    url: "https://myportfolio.com"
    date: "2021-06-15"
    skills: ["Web Development", "UI/UX Design"]
    technologies: ["Next.js", "Vercel"]
    tags: ["Portfolio", "Open Source"]
```
## Backend

### LLM Selection

**Option 1: Groq**

```bash
pip install groq
export GROQ_API_KEY=gsk_...
# Update config.tech.yaml: llm.backend = groq
```
**Option 2: Ollama**

```bash
brew install ollama
ollama serve
ollama pull mistral-small:24b-instruct-2501-q4_K_M
# Update config.tech.yaml: llm.backend = ollama
```

### Running the Server
```bash
# Run with auto-reload
uvicorn app.main:app --reload

# Check health
curl http://localhost:8000/health

# View API documentation
open http://localhost:8000/docs
```
### Chat Connectors

Please refer to Connectors for how to connect this MCP server to various chat platforms (e.g., Slack, Discord) for real-time interactions.
## API Endpoints

- `GET /` - API index
- `GET /greeting` - Identity card
- `GET /entities` - All entities (paginated)
- `GET /categories` - Entity types + counts
- `GET /technology_stack` - Technologies used
- `GET /stages` - Career timeline
- `GET /work` - Projects & publications
- `GET /search?q=...` - Full-text search
- `GET /languages` - Translation coverage
Full API docs: http://localhost:8000/docs
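A minimal client sketch using only the standard library (it assumes the server from the Quick Start is running on localhost:8000):

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8000"

def search_url(query: str, base: str = BASE) -> str:
    # Build the /search?q=... URL with proper escaping
    return f"{base}/search?" + urllib.parse.urlencode({"q": query})

def search(query: str):
    # Perform the full-text search and decode the JSON response
    with urllib.request.urlopen(search_url(query)) as resp:
        return json.load(resp)

print(search_url("machine learning"))  # http://localhost:8000/search?q=machine+learning
```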
## Data Storage

- `db/profile.db` - SQLite database with all entities
- `.cache/` - Cached API responses and scraped content
- `data/linkedin_profile.pdf.yaml` - Auto-generated YAML cache for the LinkedIn profile (with LLM enrichment)
- `data/Medium.html.yaml` - Auto-generated YAML cache for Medium articles (with LLM enrichment)
- `data/*_sitemap.yaml` - Optional YAML caches for sitemap sources (if `cache-file` is configured)
- `data/<source>_export.yaml` - Exported entities by source for manual editing
**YAML Cache Benefits:**

- Manual editing without re-scraping
- Preserves LLM enrichment data (tags, skills, technologies)
- Tracks sync state with `last_synced` metadata
- Atomic writes prevent corruption
- Automatic detection of source file changes (PDF, HTML modification times)
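Atomic writes of this kind are typically implemented as write-to-temp-then-rename; here is a minimal sketch of that pattern (not the project's exact code):

```python
import os
import tempfile

# Write to a temp file in the target directory, then rename it over the
# destination; os.replace is atomic on POSIX filesystems, so readers never
# observe a half-written cache file.
def atomic_write(path: str, content: str) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```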
## Planned Features
- Add more connectors (e.g., Twitter, YouTube, personal website)
- Allow incoming and outgoing connections to other meMCP instances to build a network graph of professionals
## License
Personal project - configure and use as needed.