Hotnews
Hotnews is a high-performance news aggregator built with FastAPI for serving web requests. It aggregates news from a curated list of RSS feeds, scores them based on title uniqueness (lower score means more unique/hotter articles), and presents them in a clean interface.
Tech Stack
- Web Framework: FastAPI (for fast asynchronous HTTP handling)
- Templating: Jinja2
- RSS Parsing: feedparser
- Content Extraction: BeautifulSoup (bs4) with lxml
- Scoring: NumPy for similarity calculations
- Data Storage: JSON file (data/articles.json)
- Server: Uvicorn (ASGI server)
- Other Libraries: requests, python-dateutil, tldextract, user-agents
Features
- RSS Aggregation: Fetches articles from a wide range of tech, science, and news sources (configured in app/settings.py).
- Article Scoring: Scores articles based on title similarity to other articles (using sequence matching and the mean similarity score).
- Content Extraction: Automatically fetches and extracts the main content/paragraphs of articles using BeautifulSoup.
- Cleanup: Automatically removes articles older than 48 hours to keep the content fresh.
- Views: Three list views: hottest (lowest score), coldest (highest score), and newest articles.
- Individual Article Reading: Dedicated page to read full extracted content.
- About Page: Lists all source sites and article count.
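The scoring idea described above (sequence matching on titles, then a mean similarity per article) can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names are hypothetical, and it uses the standard library's `difflib.SequenceMatcher` and `statistics.mean` in place of NumPy so that it runs without extra dependencies.

```python
from difflib import SequenceMatcher
from statistics import mean


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two titles (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_titles(titles: list[str]) -> list[float]:
    """Score each title as its mean similarity to every other title.

    A lower score means the title is more unique (hotter).
    """
    scores = []
    for i, title in enumerate(titles):
        others = [title_similarity(title, t) for j, t in enumerate(titles) if j != i]
        # Assumption: a lone article has nothing to compare against, so score it 0.0.
        scores.append(mean(others) if others else 0.0)
    return scores


# Hypothetical sample titles: two near-duplicates and one unrelated story.
titles = [
    "New Python release improves startup time",
    "Python release improves startup time significantly",
    "Astronomers map a distant galaxy cluster",
]
scores = score_titles(titles)
```

With these sample titles, the unrelated astronomy story receives the lowest score, so it would rank as the hottest of the three.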
Setup & Installation
- Clone the repository:
  git clone <repository-url>
  cd hotnews
- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Data Directory: The application stores articles in data/articles.json. Ensure the data/ directory exists or it will be created automatically.
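As a rough sketch of how JSON-file storage with an automatically created directory might work, the example below saves and loads a list of articles. The helper names (`save_articles`, `load_articles`) and the article fields are assumptions for illustration; the real schema of data/articles.json is not documented here. The demo writes to a temporary directory rather than the project's data/ folder.

```python
import json
import tempfile
from pathlib import Path


def save_articles(articles: list[dict], data_file: Path) -> None:
    """Write articles as JSON, creating the parent directory if it is missing."""
    data_file.parent.mkdir(parents=True, exist_ok=True)
    data_file.write_text(json.dumps(articles, indent=2), encoding="utf-8")


def load_articles(data_file: Path) -> list[dict]:
    """Read articles from JSON, returning an empty list if the file does not exist."""
    if not data_file.exists():
        return []
    return json.loads(data_file.read_text(encoding="utf-8"))


# Demo in a temporary directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data" / "articles.json"
    before = load_articles(data_file)  # file and directory do not exist yet
    save_articles([{"id": "a1", "title": "Example"}], data_file)
    after = load_articles(data_file)
```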
Usage
Running the Web Server
You can run the web server using uvicorn:
uvicorn main:app --reload
Or directly with Python:
The server will start on http://127.0.0.1:8000.
Fetching News
To populate the data with the latest news, run the fetch script:
This script will:
- Fetch entries from all configured RSS feeds.
- Clean up articles older than 48 hours.
- Calculate similarity scores for articles.
- Fetch the full content for new articles.
Note: Run this script periodically (e.g., via cron or a scheduler) to keep the news updated.
Routes
- / - Hottest articles (most unique titles)
- /cold - Coldest articles (least unique titles)
- /new - Newest articles
- /read/{id} - Read individual article
- /about - About page with stats
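The hottest, coldest, and newest views differ only in how the stored articles are ordered. The sketch below shows one way that ordering could be done; the function names and the `score` and `published` fields are assumptions for illustration, and the FastAPI route handlers themselves are left out.

```python
from datetime import datetime


def hottest(articles: list[dict]) -> list[dict]:
    """Lowest score first: the most unique titles."""
    return sorted(articles, key=lambda a: a["score"])


def coldest(articles: list[dict]) -> list[dict]:
    """Highest score first: the least unique titles."""
    return sorted(articles, key=lambda a: a["score"], reverse=True)


def newest(articles: list[dict]) -> list[dict]:
    """Most recently published first (assumes ISO 8601 timestamps)."""
    return sorted(
        articles,
        key=lambda a: datetime.fromisoformat(a["published"]),
        reverse=True,
    )


# Hypothetical sample data.
articles = [
    {"id": "a", "score": 0.42, "published": "2024-01-03T12:00:00+00:00"},
    {"id": "b", "score": 0.10, "published": "2024-01-03T11:00:00+00:00"},
    {"id": "c", "score": 0.75, "published": "2024-01-02T20:00:00+00:00"},
]
```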
Configuration
- RSS Feeds: Edit app/settings.py to add or remove RSS feeds in the FEEDS list.
- Data File: Change DATA_FILE in app/settings.py to modify the storage location.
- Headers: Update HEADERS for web scraping requests.
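For orientation, a settings module with these three names might look roughly like the sketch below. Only the names FEEDS, DATA_FILE, and HEADERS and the data/articles.json path come from this README; the value shapes (a list of URL strings, a string path, a header dict) and the placeholder URLs are assumptions, so check the real app/settings.py before editing.

```python
# Illustrative sketch of app/settings.py; all values are placeholders.

# Assumed shape: a plain list of RSS feed URLs.
FEEDS = [
    "https://example.com/feed.xml",
    "https://example.org/rss",
]

# Storage location for fetched articles (path taken from this README).
DATA_FILE = "data/articles.json"

# Assumed shape: HTTP headers sent when fetching article pages.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)",
}
```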