GitHub - lucianmarin/hotnews: Daily hottest news aggregator.

Hotnews is a high-performance news aggregator that serves its web interface with FastAPI. It collects articles from a curated list of RSS feeds, scores each one by title uniqueness (a lower score means a more unique, "hotter" article), and presents the results in a clean interface.

Tech Stack

Web Framework: FastAPI (for fast asynchronous HTTP handling)
Templating: Jinja2
RSS Parsing: feedparser
Content Extraction: BeautifulSoup (bs4) with lxml
Scoring: NumPy for similarity calculations
Data Storage: JSON file (data/articles.json)
Server: Uvicorn (ASGI server)
Other Libraries: python-dateutil, tldextract, user-agents

Features

RSS Aggregation: Fetches articles from a wide range of tech, science, and news sources (configured in app/settings.py).
Article Scoring: Scores each article by the mean similarity of its title to the titles of other articles, computed with sequence matching.
Content Extraction: Automatically fetches and extracts the main content/paragraphs of articles using BeautifulSoup.
Cleanup: Automatically removes articles older than 48 hours to keep the content fresh.
Views: Three article listings: hottest (lowest score), coldest (highest score), and newest.
Individual Article Reading: Dedicated page to read full extracted content.
About Page: Lists all source sites and article count.
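
To make the scoring idea concrete, here is a minimal sketch of mean title similarity, not the project's actual code. It assumes `difflib.SequenceMatcher` for the sequence matching and uses the standard library's `statistics.fmean` in place of NumPy so that it runs without extra dependencies. The function names `title_similarity` and `score_titles` and the sample titles are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import fmean


def title_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two titles are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_titles(titles: list[str]) -> list[float]:
    """Score each title by its mean similarity to every other title."""
    scores = []
    for i, title in enumerate(titles):
        others = [
            title_similarity(title, other)
            for j, other in enumerate(titles)
            if j != i
        ]
        # A single title has nothing to compare against; treat it as unique.
        scores.append(fmean(others) if others else 0.0)
    return scores


# Hypothetical sample data: two near-duplicate titles and one outlier.
titles = [
    "New telescope spots distant galaxy",
    "New telescope spots a distant galaxy",
    "Local bakery wins pie contest",
]
scores = score_titles(titles)

# Lower score = more unique = hotter, so ascending order puts hottest first.
hottest_first = sorted(zip(scores, titles))
```

Under this scheme the two telescope titles pull each other's scores up, while the unrelated title keeps a low score and sorts to the top of the hottest view.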

Setup & Installation

Clone the repository:
```
git clone <repository-url>
cd hotnews
```

Create and activate a virtual environment:

```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Data Directory: The application stores articles in data/articles.json. The data/ directory is created automatically if it does not already exist.
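
As an illustration of JSON file storage with automatic directory creation, the following sketch loads and saves a dictionary of articles. It is an assumption about how such storage could work, not the project's implementation: the helper names `load_articles` and `save_articles` and the dictionary layout are hypothetical, and only the path data/articles.json comes from this README.

```python
import json
from pathlib import Path

# Path named in this README; everything else below is illustrative.
DATA_FILE = Path("data/articles.json")


def load_articles(path: Path = DATA_FILE) -> dict:
    """Return the stored articles, or an empty dict if the file is missing."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def save_articles(articles: dict, path: Path = DATA_FILE) -> None:
    """Write articles as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)
```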

Usage

Running the Web Server

You can run the web server using uvicorn:

```
uvicorn main:app --reload
```

Or directly with Python:

The server will start on http://127.0.0.1:8000.

Fetching News

To populate the data with the latest news, run the fetch script:

This script will:

Fetch entries from all configured RSS feeds.
Clean up articles older than 48 hours.
Calculate similarity scores for articles.
Fetch the full content for new articles.

Note: Run this script periodically (e.g., via cron or a scheduler) to keep the news updated.
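
The 48-hour cleanup step can be sketched as a simple filter over stored articles. This is a hedged example only: it assumes each article carries a `published` field holding an ISO 8601 timestamp with a UTC offset, which this README does not confirm, and `remove_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(hours=48)  # retention window stated in this README


def remove_stale(articles: dict, now: Optional[datetime] = None) -> dict:
    """Return only the articles published within the last 48 hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_AGE
    return {
        key: article
        for key, article in articles.items()
        # Assumed field: "published" is an ISO 8601 string with an offset.
        if datetime.fromisoformat(article["published"]) >= cutoff
    }
```

Passing `now` explicitly keeps the function deterministic, which makes the cutoff easy to check in a test.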

Routes

/ - Hottest articles (most unique titles)
/cold - Coldest articles (least unique titles)
/new - Newest articles
/read/{id} - Read individual article
/about - About page with stats

Configuration

RSS Feeds: Edit app/settings.py to add or remove RSS feeds in the FEEDS list.
Data File: Change DATA_FILE in app/settings.py to modify the storage location.
Headers: Update HEADERS for web scraping requests.
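
For orientation, a settings module with these three names might look like the sketch below. Only the names FEEDS, DATA_FILE, and HEADERS and the path data/articles.json come from this README. The feed URLs and the User-Agent string are placeholders, and treating FEEDS as a list of URL strings and HEADERS as a dictionary is an assumption.

```python
# app/settings.py (illustrative values only, not the project's real config)

# RSS feed URLs to aggregate. Placeholder URLs; assumed to be plain strings.
FEEDS = [
    "https://example.com/feed.xml",
    "https://example.org/rss",
]

# Where fetched articles are stored.
DATA_FILE = "data/articles.json"

# HTTP headers sent when scraping article pages. Placeholder User-Agent.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; hotnews-example/0.1)",
}
```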