Gmail to SQLite

github.com

334 points by tehlike 4 days ago


unsnap_biceps - 4 days ago

I'm curious as to why you choose to break out specific headers in the schema.

For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.

If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.

For example

  CREATE TABLE IF NOT EXISTS "messages" (
    "id" INTEGER NOT NULL PRIMARY KEY, -- internal id
    "message_id" TEXT NOT NULL, -- Gmail message id
    "thread_id" TEXT NOT NULL, -- Gmail thread id
    "headers" JSON NOT NULL, -- JSON object of { "header": value },
    "subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
    ...
  );
  CREATE INDEX subjectidx on messages(subject);
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as

  ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
  CREATE INDEX dkimidx on messages(dkim);
  SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
or whatever you want.
terhechte - 4 days ago

I build something to visualize huge amounts of email (such as from Gmail) some years ago:

https://github.com/terhechte/postsack

the_mitsuhiko - 4 days ago

I really lament that you cannot sign in even with an application specific password any more and you need to get an oauth client and go through an oauth flow. It’s my email, but Google takes away an open standard even for myself to access it.

renegat0x0 - 4 days ago

I recently tried to integrate Gmail in my app [0], and I poured too much time on it. I decided it is not worth to support Gmail.

Gmail to SQLite describes 6 steps to get credentials working, but it is not true for me. After 6 steps:

- that Google said that my app was not published, so I published it

- Google said that app cannot be internal, because I am not a workspace user

- for external apps

- then it said I cannot use the app until it is verified

- in verification they wanted to know domain, address, other details

- they wanted to have my justification for scopes

- they wanted to have video explaining how the app is going to be used

- they will take some time to verify the data I provided them

It all looks like a maze of settings, where requiring any of users to go above the hoops required by Google is simply too much.

Links:

[0] https://github.com/rumca-js/Django-link-archive

oulipo - 4 days ago

What's the best open-source GMail backup software that exists? Someone has setup something like that? (also archiving attachments, etc)

TekMol - 4 days ago

Shouldn't this be "imap to sqlite" or something? Why tie it to one specific email provider?

bytter - 4 days ago

Funny... I did the same thing yesterday, just because I wanted to list my recipient emails by domain. Code is awful, but here it is: https://github.com/hugoferreira/gmail-sqlite-db

yread - 4 days ago

Would be nice to enable fulltext search as well

EvanAnderson - 4 days ago

I am reminded a little of the Postgres-backed IMAP server Archiveopteryx: https://github.com/aox/aox

The schema from AOX always looked really good to me, but I never have gotten to really giving it a try. I wanted to use it, primarily, to get analytics about my mail and for search (not a daily-driver IMAP server).

rantingdemon - 3 days ago

Interesting tool. I'm trying it out now. I had to jump through some hoops in Google's admin panel that probably had me creating some OAuth org for my personal account...

It is now syncing my messages, but very slowly. Some Async magic could probably be cool :)

ThinkBeat - 4 days ago

What is the cost for bandwidth here? As someone with a 40GB+ Gmail account, will I get billed for the transfer using this tool?

It is easy to fix though, since I can get Google Take Out (is that the name?) which I think is free and then parse file files once downloaded.

Still using this tool would be faster from a get it going perspective.

shinryuu - 3 days ago

Would have been nice if you supported google takeout with mbox instead.

hamburglar - 4 days ago

This looks great and simple. I’ll likely try it out. Any chance you’re working on including attachment metadata (and/or broken out access to the attachments themselves) in the future?

flas9sd - 4 days ago

having sqlite exporters for platforms is great help for archiving, but also general questions: I used https://github.com/ltdangle/mail2db to see how much mail volume I still receive monthly on a mail account that I want to move away from. A top10 of senders directed my un- and resubscribe actions.

vladgur - 4 days ago

Awesome!

Feature request: parse email content to extract unsubscribe links and allow me to unsubscribe from most frequent senders easily

alimbada - 4 days ago

I did something similar using Got Your Back and some C# hacked together in LinqPad to help me analyse my emails.

1vuio0pswjnm7 - 3 days ago

Wasn't there a period where one could get an XML feed of their Gmail, many years ago.

gitroom - 4 days ago

man, the whole gmail backup mess reminds me why i avoid locking myself into someone else's sandbox. figuring out what actually keeps me sticking with a platform even when i know all the downsides - is it just laziness or something deeper?

noer - 4 days ago

This is just a single table DB though? At that point, why not just export to a csv or dataframe or whatever and leverage analysis packages to analyze whatever you wanted to.

I admittedly might just not have or understand the use case nor have I thought about how large a Gmail account actually is so feel free to ignore if I'm missing something!

jokoon - 4 days ago

I would have preferred a script that parses the mail backup Google sends you.

I think it's a big eml file.

flashblaze - 3 days ago

Is there anything similar written in TypeScript?

pdimitar - 4 days ago

Would love a comparison to gbackup-rs[0].

To me having to install a tool through Python is a show-stopper.

[0] https://github.com/djipko/gbackup-rs

pdyc - 4 days ago

this is great if only there was a tool for whatsapp to sqlite it would make my data so much more useful

curtisszmania - 4 days ago

[dead]

frshOffTheBoat - 4 days ago

[dead]

einpoklum - 4 days ago

Let us stop using GMail:

* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.

* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.

* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.

* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).

* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).

and finally:

* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.