AI AI Newsletter - Miloslav Homer


Keeping up with relevant news is hard and time-consuming. And we know that LLMs are great at summarizing, so I've built an app that creates newsletters.

Here's the code, here's the daily example, here's a weekly example. The PoC works: a CLI tool with these features:

  • RSS Feed sources
  • AI filtering for relevancy and summarization into a newsletter
  • Abstract AI client with an example Mistral implementation
  • Daily/weekly reporting
  • Subscriber logic
  • Multiple newsletters support with specific instructions for filtering and summary
  • Sending the issue to emails through Mailjet

If you think we've successfully automated journalism, think again. We'd have nothing without quality articles and source curation. Somebody needs to go out there and write/curate.

Is there enough value created so that we can name a price that would cover the tokens/hosting? Is there enough value to also cover the article costs? This produces little value and redirects a lot of it. This might make for a viable business model at the cost of a slow erosion of sources. But that's not a problem for Q2.


The printing press - how far we've come since then. (by Wendelin Jacober)

This project started with a LinkedIn post in September 2025. As the competition was live, I kept this article in the drawer for some time.

One of the entries for the competition was a newsletter generator that would summarize news into a daily/weekly newsletter. I like challenges, let's get our hands dirty.

I got a week to create a PoC, but I wanted to set myself up for success in case I continue with this. I'll do what I can, hoping to make good decisions along the way. The source code you'll find is mostly handcrafted, but it's definitely not 100% bio-organic.

This article aims to explain the solution in detail and highlight opportunities for improvement. I'm sorry for writing a long doc; I didn't have time to write a short one. There are missing parts, missing tests, weird errors, unhandled exceptions. I know. I know. I did this in about 12 hours total, so all inhibitions went out the window as I sprinted towards the result. Sorry about that.

Installation / Usage

If you prefer to jump straight into it, welcome! Start by cloning the repo from GitHub.

Please put a Mistral API key into the MISTRAL_AI_KEY (sic) environment variable. Optionally, if you wish to receive emails, head over to Mailjet and prepare the MAILJET_API_KEY and MAILJET_SECRET_KEY environment variables.
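For reference, a minimal environment setup might look like this (the variable names are as stated above; the values are placeholders):

```shell
# Required: Mistral API key (note the variable name used by the app)
export MISTRAL_AI_KEY="your-mistral-key"

# Optional: Mailjet credentials, only needed for email delivery
export MAILJET_API_KEY="your-mailjet-key"
export MAILJET_SECRET_KEY="your-mailjet-secret"
```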

The solution works with uv. Put all the files into a directory, and you should see the CLI help after entering uv run ai-newsletter.

I've tried to make the CLI intuitive, e.g. register a subscriber via uv run ai-newsletter subscribers register - you'll then be asked for a name and email.

The DB is provided, but you have the option of starting from scratch via the create-db command.

Requirements

This is what we got:

We are looking for creative and clever people who are not afraid to come up with their own ideas and show how they think. It's not just about the result - we are also interested in the path you took to get there. Choose one of the tasks below. A successfully delivered result is considered to be:

  • a summary document that describes the solution and theoretical basis.
  • a functional solution (simple frontend; functionality is more important than visual effect, and navigation through human communication is preferred over buttons).

👉 Therefore, add inspiration from the world to your solution - what interested you, what guided you, and what was your point of view. Your ideas can be the basis for real projects on which we will build the future of artificial intelligence together.

Join us, show your talent and maybe you will become part of the DiusAi team. 🚀

4. News feed (news digest)

Task:

Collect AI and Tech news sources (RSS, articles, blogs) and make a short daily/weekly summary:

  • what happened
  • why is it important
  • ensure automatic sending to selected members via WhatsApp.

Ok, RSS is great for this and I'll try to exploit it maximally. You'd be surprised how often you can find an RSS feed if you dig around enough. There are also services that translate blogs into RSS feeds. I hope you trust me that this is a solved problem and that RSS alone will get us very far.

Web Scraping

Going one step further is web scraping. In 2025, web scraping is not easy, as people have realized they might not want to allow such abuse, especially from bots that don't behave well. So I'll attempt to get the data, but I am prepared to fail hard if the sites won't allow me. The worst part? It might even work for you and me, but the minute you step into production it would fail, in accordance with Murphy's law. In other words, wrestling with anti-bot solutions is out of scope for me.

Delivery

The requirement I am purposefully ignoring is WhatsApp. Look, I've tried getting started, but it seems that this requires a business and a human review. I don't have the time for this. Another good joke was the "simple and transparent" pricing. It's neither.

If you're thinking about workarounds, do so at your own risk, as highlighted by one of the more popular projects. And indeed, people are having issues. This goes back to the point about web scraping - it's not 2005 anymore. I want to build a sustainable service, so I am steering clear of sketchy repos. You'll have to trust me that I can integrate with such an API if the need arises.

I am thinking about good old emails. I'll probably grab a free tier of one of the more popular services.

Source Gathering

There is an open question of whether to allow automated source (blogs, articles) gathering. I'd advise against this approach, as each user has their own goals and questions. And that's what we want - to help the users.

It's also more time consuming to implement as you'd need to:

  • give AI access to web search, e.g. by MCP,
  • separate good sources from SEO slop,
  • separate good sources from AI slop (new!!),
  • more is not always better - you can get lost in too many sources.

So I am rolling manual source gathering.

Additional Feature: Multiple Newsletters

The feature I am adding is support for multiple newsletters. This has many benefits:

  • I'd like to separate some topics (AI, Cybersecurity, Local news),
  • It enables A/B testing of prompts, sources and approaches,
  • Maybe you can even have a personalized news aggregator for each user,
  • Uh... It's a good idea generally? It is!

So yea, those are the requirements, let's get rolling.

Design and Modelling

This solution wants to be a Django application when it grows up. But growing up takes time, and time we don't have, so I am rolling with a CLI tool. Django is mostly about routing and templates anyway: putting together HTML templates, views, transfers and URLs takes time away from the interesting bits.

I've decided to invest in an ORM layer using SQLAlchemy, as we need to keep track of several entities:

  • A newsletter has a name, instructions/prompt to filter relevant articles and instructions/prompt to summarize what we have.
  • A subscriber also has a name. We also need some way of delivery.
  • A source has a type (like RSS) as well as a reference (URL since we're talking external sources mostly, but mailing lists might work well too).
  • Newsletter can have many sources and one source might be present in many newsletters. So I have a supporting model called SourceAssignmentModel.
  • Subscribers can subscribe to many newsletters and newsletters have many subscribers. This is captured in SubscriptionModel.

You'll notice that we're missing an Article model. That's on purpose, caching is hard and we probably won't need to revisit sources later anyway. I need to move fast.

The general algorithm is simple. Trigger newsletter creation, check all assigned sources for the given timeframe (day, week), use filter instructions to select only the relevant articles and then grab all articles and use summary instructions to produce a newsletter. Send this newsletter to all subscribers.
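In Python-shaped pseudocode, that loop might look like this. Every name below is an illustrative stand-in, not the repo's actual API:

```python
def build_issue(newsletter, sources, agent):
    """Sketch of the newsletter pipeline; all names are stand-ins."""
    # 1. Gather articles from all assigned sources for the timeframe
    articles = []
    for source in sources:
        articles.extend(source.fetch_articles(newsletter["timeframe"]))
    # 2. Keep only the relevant ones, per the filter instructions
    relevant = agent.filter_articles(articles, newsletter["filter_instructions"])
    # 3. Compile the relevant articles into an issue, per the summary instructions
    return agent.summarize(relevant, newsletter["summary_instructions"])
```

Delivery is deliberately not part of this function - see the note on separating concerns below.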

A good practice is to separate concerns, so I'll be separating delivery from summarization. That's actually it; there are options for further optimization, but there's no time.

Implementation

Python CLI: Click

I am doing what I know best - a Python-based CLI tool using Click. I am a fan of uv, so the project is centered around it. Sadly, I suffer from a seemingly incurable Java infection which I caught during my studies, so you'll find an OOP design within.
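A minimal Click skeleton in this spirit might look like the following. The command and option names are illustrative; the real CLI may differ:

```python
import click


@click.group()
def cli():
    """ai-newsletter style command group (illustrative sketch)."""


@cli.group()
def subscribers():
    """Subscriber management commands."""


@subscribers.command()
@click.option("--name", prompt="Name")
@click.option("--email", prompt="Email")
def register(name, email):
    """Prompt for the details, then register the subscriber."""
    click.echo(f"Registered {name} <{email}>")
```

Click's `prompt=` option gives the "you'll be asked for name and email" flow almost for free.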

API and CRUD

The CLI API flows from the model design and the relevant CRUD operations. Most of the ORM layer lives in the Clerk class, as I needed something to track all the boring details. Here's where I was able to make the most of AI for coding (check out my custom setup), as it's very boilerplate-y.

Data Sourcing

I've prepared the ground for extending to various source types. That includes an abstract data source class, a simple factory to instantiate sources from the DB, and an interface.
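The shape of that abstraction might look roughly like this. Class and function names are my guesses for illustration, not the repo's real ones:

```python
from abc import ABC, abstractmethod
from datetime import datetime


class DataSource(ABC):
    """Abstract source sketch; the repo's actual interface may differ."""

    @abstractmethod
    def fetch_articles(self, since: datetime) -> list:
        """Return all articles published after `since`."""


class RSSSource(DataSource):
    def __init__(self, url: str):
        self.url = url

    def fetch_articles(self, since):
        # A real implementation would call feedparser here
        return []


def source_from_row(row: dict) -> DataSource:
    """Simple factory keyed on the source type stored in the DB."""
    if row["type"] == "rss":
        return RSSSource(row["reference"])
    raise ValueError(f"unknown source type: {row['type']}")
```

Adding a new source type then means one new subclass and one new branch in the factory.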

All I want from a data source is the ability to fetch all articles within a given timeframe. I could have fallen into the timezone rabbit hole, but I've decided to avoid it by converting everything to UTC and ignoring the rest.
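As a sketch of that "convert everything to UTC" stance: RSS dates are usually RFC 2822 strings, which the standard library can normalize (the assumption here is that naive dates are already UTC):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc(date_string):
    """Best-effort parse of an RFC 2822 date (common in RSS) into UTC.

    Naive datetimes are assumed to already be UTC."""
    dt = parsedate_to_datetime(date_string)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```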

Currently there's only RSS (see above). There's a great Python feed-parsing lib called feedparser, so that's included.

AI Providers

Using the same OOP approach, I've abstracted an AI provider and provided a sample implementation for... Mistral. I started with them because I wanted to support an EU company, and I've had no reason to switch yet.

It should be quite easy to roll in other providers and play around with them. But I am not doing that right now, as I need to have something that works.
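To illustrate the abstraction (the names below are mine, not the repo's), a provider interface plus a token-free fake for local testing could look like:

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Illustrative provider interface; the actual class names are guesses."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt, return the model's text response."""


class FakeProvider(AIProvider):
    """A stand-in provider: useful in tests, costs zero tokens."""

    def complete(self, prompt):
        return f"[fake completion for: {prompt[:30]}]"
```

A Mistral (or any other vendor) implementation would subclass AIProvider and wrap the vendor's SDK behind `complete`.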

Agent

Finally, the interesting part. As always, let's start simple and add complexity when needed. I'll create the newsletter in two steps. First, I'll filter relevant articles from the heap, then I'll compile them into the newsletter. If the content is too long, it's better to summarize it, so that's another AI step to do.

I've given the agent two AI clients - a big one for newsletter generation and a small one for filtering and summarizing. This resulted in significant cost savings - one newsletter costs me around 5 cents now - and I didn't notice a massive drop in quality.

As you can see, I've used simple, but direct prompts. The solution is also ready to use different agents and the agents are ready to use different providers.

I was thinking about naive RAG, but as I've seen before, it somewhat falls apart when dealing with subtlety. I was also thinking about prioritization, but I'll be blunt with you - I'm sitting on a train on Thu 2nd Oct, coming back from a teambuilding. I can't tinker with this any more under this time constraint.

Practical Problems

Right, so the PoC is somewhat working, let's try a more robust run.

Curating Sources

As mentioned, we need to curate our sources. Here's a set I'm starting with:

Not all sources are created equal (the BBC is more reputable than Reddit). I was thinking about a reputation system where you'd assign a weight to each source to select the best articles for the newsletter. Sadly, I've got no time; let's jot it down as a potential improvement. In the meantime, I'll disable low-quality sources.

I am also sad to disable the Arxiv source - there are so many articles published daily. SO MANY. I've got no doubt that the quality is quite high (ok, maybe some doubt), but still: if you're that hardcore an AI enthusiast, you probably need to filter them down anyway.

Where does that leave me? Traditional journalism (BBC, Ars Technica) and reputable forums (the Hacker News front page). So obviously, I wasn't able to make journalists obsolete within a week. I still see added value in this solution.

You'll notice that I was able to get quite far with RSS - most sites support it directly, even if you have to dig around for it. Anthropic is an example of an external RSS feed - a 3rd party parses the site into a feed. Not stable, but serviceable for this PoC.

Datetimes and Timezones

As we want recent news, we have to deal with timezones. If you don't know, timezones are hard.

But time representations also vary, so I've included some quick-n-dirty, best-effort timezone parsing.

Caching

As I watched the solution at work, I noticed how much data I was shuffling. Of course, it would be better to cache it for performance reasons.

But I also see potential in caching for content reasons - sometimes you discover a story early and the rest of the sources only replay it. By comparing against older articles you might serve better content.

The issue is that this doesn't scale well as your heap of read articles skyrockets. There are still tricks - keeping only a fixed window is one of them.
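A fixed window of seen articles is cheap to sketch with the standard library (the size and IDs here are illustrative):

```python
from collections import deque


class SeenWindow:
    """Remember only the most recent `maxlen` article IDs."""

    def __init__(self, maxlen=1000):
        self._window = deque(maxlen=maxlen)

    def is_new(self, article_id):
        if article_id in self._window:
            return False
        self._window.append(article_id)  # oldest entry falls off automatically
        return True
```

The trade-off: a story older than the window is treated as new again, which is often acceptable for news.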

Sending

I've created an abstract distributor class that takes the issue to send and a list of subscribers. One instance I've implemented is a Mailjet distributor. What can I say - they have an API and I've called it successfully. Check the example email.

AI Cost

There's one silly little business issue with this product: you're paying for every token that you "read". And since we want to read a lot of news, the cost of generating these newsletters might not be trivial.

All of these experiments cost me around 3 EUR. But that was only one week of fiddling with the solution.

I am not sure how much would people pay for such a service, maybe there is a sweet spot to hit.

There are possibilities for improvement though - using a smaller, cheaper model for reading/filtering through articles yields massive savings, and the result seems to be ok. We can then use the big model to generate the newsletter from the already-processed sources. One reading pass for the same newsletter now costs me 0.05 EUR, which is much more pleasant.
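A back-of-the-envelope cost model makes the small-reader/big-writer split concrete. The prices below are made-up placeholders, not Mistral's actual rates:

```python
def issue_cost(n_articles, tokens_per_article, summary_tokens,
               small_price_per_mtok, big_price_per_mtok):
    """Illustrative cost split: a small model reads, a big model writes.

    Prices are per million tokens; all numbers are placeholders."""
    reading = n_articles * tokens_per_article * small_price_per_mtok / 1_000_000
    writing = summary_tokens * big_price_per_mtok / 1_000_000
    return reading, writing


# e.g. 100 articles at ~2,000 tokens each through a cheap reader,
# plus one 20,000-token generation pass through a bigger model
reading, writing = issue_cost(100, 2_000, 20_000, 0.2, 2.0)
```

The point the arithmetic makes: reading dominates token volume, so the reader's per-token price is the lever to pull.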

Closing Thoughts

This was a real sprint for me. I only got in around 3 hours of coding per day, as I have work and family duties, and I learned about the competition quite late.

I am happy that I was able to put the pieces together and generate at least something. There are plenty of opportunities for improvement. Plenty.

Hope you've enjoyed the writeup. And let me thank dius.ai for the challenge, I had loads of fun.

Looking back on this from 2026, I see plenty of opportunities for improvement: hosting it live, perhaps setting it up so that anyone can add their own sources for a personalized newsletter, refactoring the LLM calls into something more general and robust, maybe even letting users select their providers and models.

There's a case to be made for vibecoding this. And I agree - I might even try to replicate it, but purely with vibes. Oh well. Keep in mind that I did this in September 2025, when Claude Code was still quite new.

I still don't think that source curation and the general theme of a newsletter should be left to AI. I read a newsletter because it curates news with particular values in mind. Maybe AI could capture this, but you'd need to tell it explicitly.

Then there's the question of a sustainable business. Generating newsletters isn't free; one newsletter now costs me approximately 5 cents. But there's a catch - we have nothing without external articles, and we're taking those for free right now. I'm not even sure this solution can pay for itself in token terms alone. Paying for article licenses seems out of the question entirely. I really don't envy journalists in 2026.