Ask HN: What are some unusual but useful Python libraries you've discovered?
Hey everyone! I'm always on the lookout for new and interesting Python libraries that might not be well-known but are incredibly useful. Recently, I stumbled upon Rich for beautiful console output and Pydantic for data validation, which have been game-changers for my projects. What are some of the lesser-known libraries you've discovered that you think more people should know about? Share your favorites and how you use them!

Last year I posted a similar Ask HN; it got 11 comments: https://news.ycombinator.com/item?id=38505531.
I asked, "What lesser-known Python libraries do you wish people knew about?"
The suggestions there are worth looking up.
Don't miss DiskCache (https://github.com/grantjenks/python-diskcache).

I really like xmltodict (https://github.com/martinblech/xmltodict).
Despite the name, it works in both directions.
It is the most ergonomic library I have used for creating XML.
It has external type stubs: https://pypi.org/project/types-xmltodict/.

Since you have recently discovered Rich, you may want rich-argparse (https://github.com/hamdanal/rich-argparse).
It colorizes argparse CLIs with little effort from the user.

DeepDiff (https://github.com/seperman/deepdiff) has helped me with testing.
I needed to compare two nested data structures but ignore any differences in floats (timestamps).
DeepDiff let me do it:

Thanks so much for pzp!

I use https://github.com/litl/backoff for configurable backoff + retry. Context: the OpenAI API used to be super flaky back in the early days. I needed to retry my requests quite frequently, and I found this.

Backoff doesn't look bad. I have used two libraries for retrying in Python: retry (https://github.com/invl/retry) and Tenacity (https://github.com/jd/tenacity).
Tenacity actually helped me with a recent GPT-4o mini experiment,
when OpenRouter gave me an error after a couple hundred requests.
I can recommend Tenacity.
A downside is that the API is fairly complex and verbose.
You will probably look it up each time if you don't use it often (or rely on an IDE or LLM).

Backoff seems somewhat less flexible than Tenacity (https://github.com/litl/backoff/issues/125) but more concise.
Basic use requires as little code as retry and should be easy to remember.

"stamina" is a wrapper for tenacity with nice defaults.

I didn't know about stamina.
I like it a lot at first glance.
Thanks for telling me.

Compared to Backoff, it has docs, and you don't need to tell it each time you want exponential backoff.
(Exponential backoff is what you usually want when you don't want a fixed delay.)
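
For anyone who hasn't looked at what these libraries do, here is a minimal stdlib-only sketch of retrying with exponential backoff and jitter. It is not how Tenacity, Backoff, or stamina implement it; `retry_with_backoff` and all of its parameter names are made up for illustration.

```python
import functools
import random
import time


def retry_with_backoff(max_tries=5, base_delay=1.0, factor=2.0, max_delay=60.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Hypothetical decorator: retry a function, growing the wait after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries:
                        raise  # out of attempts: let the last error propagate
                    # Jitter: wait a random time up to the current (capped) delay.
                    sleep(random.uniform(0, min(delay, max_delay)))
                    delay *= factor  # the upper bound grows exponentially

        return wrapper

    return decorator
```

You would put `@retry_with_backoff(max_tries=4, exceptions=(ConnectionError,))` on the function that makes the flaky request. The `sleep` parameter is only there so the waiting can be swapped out in tests.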
You can use Tenacity directly when you need something more complex.

Happy user of Andrew Moffat's https://sh.readthedocs.io for over a decade.

For audio analysis:

Audio track separation: https://github.com/adefossez/demucs demucs works pretty well.

Awesome! Thanks.

Defopt generates a CLI from a function interface: https://defopt.readthedocs.io/en/stable/features.html The alternative is that either you maintain two interfaces with boilerplate, or you write a CLI only if that’s the first priority. Similar solutions exist, like fire. But fire’s CLI is like an afterthought, in the sense that it gives people a way to run things on the command line when they already know how to run them from Python.

I wrote some stuff a while back to try and bridge this kind of issue. I wrote a spec: https://gitlab.com/accidentallythecable-public/argstruct-spe... And additionally, a library to build CLI args and flags as well as API data from the same structure: https://gitlab.com/accidentallythecable-public/python-module... This allows you to build an argstruct file specifying commands, args, descriptions, etc. You can then run an arbitrary callback against the commands, or use argparser for the command line.

Opensoundscape for sound localization

This has been an absolute game changer: https://github.com/pytries/marisa-trie (succinct trie with fast lookup and minimal RAM)
Use case and more details in the wonderful book by Ian and Micha: High Performance Python

Exporting datasets - Tablib: https://tablib.readthedocs.io/en/stable/

pg8000 [1] is a Postgres library implemented in pure Python. I've spent enough time trying to get psycopg installed on macOS and Docker that I appreciate just being able to pip install it at any time.

genson for creating JSON schemas

- rich
- einops
- pytrees
- torchinfo
- joblib
- symbex
- nbdev

pzp (https://github.com/andreax79/pzp) is like fzf in pure Python to use in your programs.
Keep in mind it is currently version 0.0.x.
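I have found bugs, but I think it is just cool that it exists.

  from deepdiff import DeepDiff

  # run_session and config come from the commenter's own project.
  # exclude_types=(float,) leaves float values (the timestamps) out of the diff.
  diff = DeepDiff(
      run_session(config), run_session(config, force=True), exclude_types=(float,)
  )
  assert not diff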
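
If you just want the gist of the DeepDiff snippet above without the dependency, here is a rough stdlib-only sketch of comparing nested structures while skipping floats. It is not DeepDiff's implementation, and its real `exclude_types` handling likely has more nuance; `equal_ignoring_types` is a hypothetical helper, and treating any float as a match is an assumption of this sketch.

```python
def equal_ignoring_types(a, b, ignored=(float,)):
    """Hypothetical helper: deep equality that skips values of the ignored types."""
    if isinstance(a, ignored) or isinstance(b, ignored):
        return True  # assumption: an ignored-type value always counts as a match
    if isinstance(a, dict) and isinstance(b, dict):
        # Same keys, and every value matches recursively.
        return a.keys() == b.keys() and all(
            equal_ignoring_types(a[k], b[k], ignored) for k in a
        )
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Same sequence type and length, and every element matches recursively.
        return type(a) is type(b) and len(a) == len(b) and all(
            equal_ignoring_types(x, y, ignored) for x, y in zip(a, b)
        )
    return a == b
```

Unlike DeepDiff, this only returns True or False; it does not tell you where the structures differ.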