Transparent field level encryption for Django with pgcrypto PostgreSQL extension
github.com
A patch has been proposed for PostgreSQL to support Transparent Column Encryption natively. It didn't make it into v16, but hopefully it will be in v17.
https://www.postgresql.org/message-id/89157929-c2b6-817b-602...
Something I keep coming back to here, and not seeing easy solutions for, is a pure OSS Docker-level approach to volume encryption. No wrangling cryptfs on the host, just a special overlay mode that Docker and Compose understand. Getting it down to a few docker-compose.yml volume annotations like "encrypted: true" seems like it would open up encryption-at-rest to a lot of users without going full k8s. The threat model here is a bit limited versus other approaches, but I'd think it can go far for the bulk of pets out there.
I love pgcrypto and Postgres in general. In this case, I'm wondering why you'd perform such widespread encryption on a per-column basis. If there is that much to be encrypted, there's whole-disk encryption at rest. For network transport there's TLS. Many column-level encryptions/decryptions would seem to me to put an undue CPU load on the database instance(s) when that load could be spread horizontally more easily at the app tier.
If you encrypt within Django, all Postgres needs to worry about are bytea columns. If the concern is being able to effectively use the decrypted data in relational joins, I think back again to whole-disk encryption. To use this stuff, it has to be decrypted in memory anyway.
As a thought experiment, you could create expression indexes for fast lookup, but that leads to data leakage through index queries, and you're right back where you started, only with higher CPU load.
For per-user encryption, that also seems best/most flexible at the app tier.
In short, for a limited use case like saving passwords or an opaque data blob, pgcrypto within Postgres makes sense to me. As an overarching whole-database encryption strategy, I'm far less sure of its utility.
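To make the expression-index point above concrete, here is a minimal sketch of how a deterministic lookup value leaks information even when the column itself is encrypted. It swaps in an HMAC-SHA256 "lookup token" as the indexed expression, which is an assumption for illustration and not something the project or the comment specifies; `INDEX_KEY`, `lookup_token`, and `rows_sharing_a_value` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key for the lookup index (an assumption for this sketch);
# in practice it would be stored outside the database.
INDEX_KEY = b"example-index-key"


def lookup_token(value: str) -> str:
    """Deterministic keyed hash of the normalized plaintext.

    Equal plaintexts always produce equal tokens, which is what makes
    the token indexable and also what leaks equality.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()


# Pretend table rows as (id, token). The ciphertext column is left out
# because it plays no part in the leak.
rows = [
    (1, lookup_token("alice@example.com")),
    (2, lookup_token("bob@example.com")),
    (3, lookup_token("Alice@Example.com ")),
]


def rows_sharing_a_value(rows):
    """Group row ids by token and keep groups with more than one row."""
    by_token = {}
    for row_id, token in rows:
        by_token.setdefault(token, []).append(row_id)
    return [ids for ids in by_token.values() if len(ids) > 1]


# Someone who can read only the index, without any key, still learns
# which rows hold the same underlying value.
duplicates = rows_sharing_a_value(rows)
```

Anyone with read access to the index can run the same grouping, so the index reveals which rows share a value even though no plaintext is stored in it.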
So that people handling the servers are not tempted to look at them, so that backups don't contain them in clear text, so that exports must explicitly choose whether to decrypt them, etc.
Great callouts. It's about ergonomics, making default actions more safe, and reducing unsafe surface area even if it can't be fully removed.
These are Django fields, so you would only use them for your "sensitive columns" like payment information, emails, and other rarely used stuff.
I checked the README and I might have missed it, but this doesn't seem to be suggesting you replace every column with these; they are just helpers to make encrypting specific columns easier.
Because these are rarely used columns, or columns where you'd need to wait for an external API anyway (email, SMS, payment, etc.), the performance impact should be minimal. You wouldn't need these fields to be indexed.
The attack surface this addresses is the compromise of Postgres or its host, or mishandled backups. Preferably you'd be using this on top of full-disk encryption.
EDIT: The use of PGP is weird to me though, why not AES?
What's the benefit of this in place of using bcrypt or py-pgp? I really don't understand this trend of piling on abstraction for simple problems. It's basically a whole GitHub project that could be solved in 2 wrapper functions.
Having custom field classes for specific functionality that Just Work (TM) is idiomatic in Django. Be it for input widgets (phone number field and the like), validation and query capabilities (JSONField, JSONBField), or 3rd party integrations (S3-backed file storage), if it's stored in the database, sticking it in the ORM just jibes with how you use Django.
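For comparison, the "two wrapper functions" version might look roughly like the sketch below. It only builds parameterized SQL around pgcrypto's `pgp_sym_encrypt` and `pgp_sym_decrypt`; the helper names, the `id` primary key, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration, not the project's API.

```python
def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier before interpolating it."""
    if not name.isidentifier():
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def encrypted_insert(table: str, column: str, plaintext: str, key: str):
    """Build an INSERT that lets Postgres encrypt the value via pgcrypto.

    The target column is assumed to be of type bytea.
    """
    table, column = _check_identifier(table), _check_identifier(column)
    sql = f"INSERT INTO {table} ({column}) VALUES (pgp_sym_encrypt(%s, %s))"
    return sql, (plaintext, key)


def decrypted_select(table: str, column: str, row_id: int, key: str):
    """Build a SELECT that decrypts the column for one row (assumes an id column)."""
    table, column = _check_identifier(table), _check_identifier(column)
    sql = f"SELECT pgp_sym_decrypt({column}, %s) FROM {table} WHERE id = %s"
    return sql, (key, row_id)


# Hypothetical table and column names; the result would be passed to
# something like cursor.execute(sql, params).
sql, params = encrypted_insert("billing_card", "number", "4111111111111111", "secret")
```

What the field classes add on top of this is that every query path (forms, admin, serializers, lookups) goes through the same wrapping without each call site having to remember it.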
What are the performance implications of an application that uses this? I'd imagine constantly encrypting and decrypting would increase latency...
At least in my experience, you would only encrypt the fields that are sensitive.
So of course anything related to payments, possibly emails/IPs, that type of stuff.
In a lot of services you don't usually need to refer to these fields except on the settings pages and possibly a checkout page, things like that. So the username, preferences, and content of your site will likely be unencrypted.
Because you can avoid using those types of fields on most page loads/API calls, it should have minimal impact, especially since those places often involve an external API you have to wait for anyway (email, checkout, etc.).
Looks old and doesn't support a recent version of Django.
The company behind this project no longer exists.
I know as I worked there a decade ago :-)
It's OSS - send a PR for "recent version of Django", someone may even merge into upstream.