PostgreSQL Worst Practices (version FOSDEM PGDay 2017), by Ilya Kosmodemiansky

Best practices are just boring
  • Never follow them, try worst practices
  • Only those practices can really help you screw things up most effectively
  • PostgreSQL consultants are nice people, so try to make them happy

How does it work?
  • I have a list of a little more than 100 worst practices
  • I do not make this stuff up, all of them are real-life examples
  • I reshuffle my list every time before presenting and extract some of the examples
  • Well, there are some things I like more than others, so it is not a very honest shuffle

0. Do not use indexes (a test one!)
  • Basically, there is no difference between a full table scan and an index scan
  • You can check that: just insert 10 rows into a test table on your test server and compare
  • Nobody deals with more than 10-row tables in production!
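
A minimal sketch of why the 10-row test is the punchline: on a tiny table the planner prefers a sequential scan anyway, and only a realistically sized table shows the index being used (table and column names are made up):

    CREATE TABLE t (id integer PRIMARY KEY, payload text);
    INSERT INTO t SELECT g, md5(g::text) FROM generate_series(1, 10) g;
    ANALYZE t;
    EXPLAIN SELECT * FROM t WHERE id = 5;   -- Seq Scan: 10 rows fit on a single page

    INSERT INTO t SELECT g, md5(g::text) FROM generate_series(11, 1000000) g;
    ANALYZE t;
    EXPLAIN SELECT * FROM t WHERE id = 5;   -- Index Scan using t_pkey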

1. Use ORM
  • All databases share the same syntax
  • You must write database-independent code
  • Are there really any benefits based on database-specific features?
  • It is always good to learn a new complicated technology
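
One example of the kind of database-specific feature a lowest-common-denominator layer tends to hide: native upsert, available since PostgreSQL 9.5 (table and column names here are made up):

    CREATE TABLE counters (name text PRIMARY KEY, value bigint NOT NULL DEFAULT 0);

    INSERT INTO counters (name, value)
    VALUES ('page_views', 1)
    ON CONFLICT (name) DO UPDATE
        SET value = counters.value + EXCLUDED.value;   -- atomic, no SELECT-then-INSERT race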

2. Move joins to your application
  • Just select * from a couple of tables into the application written in your favorite programming language
  • Then join them at the application level
  • Now you only need to implement nested loop join, hash join and merge join, as well as a query optimizer and a page cache
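
For contrast, the one query the database would otherwise run for you, choosing a nested loop, hash or merge join on its own (schema is made up):

    SELECT o.id, o.total, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= now() - interval '1 day';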

3. Be in trend, be schema-less
  • You do not need to design the schema
  • You need only one table with two columns: id bigserial and extra jsonb
  • The JSONB datatype is pretty efficient in PostgreSQL, you can search in it just like in a well-structured table
  • Even if you put 100 MB of JSON in it
  • Even if you have 1000+ tps
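
The "schema" in question, sketched with a made-up table name; jsonb and GIN indexes are real, the performance claims above are the joke:

    CREATE TABLE everything (id bigserial PRIMARY KEY, extra jsonb);
    CREATE INDEX ON everything USING gin (extra);

    -- every lookup is now a containment search inside the blob
    SELECT id FROM everything WHERE extra @> '{"status": "active"}';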

4. Be agile, use EAV
  • You need only 3 tables: entity, attribute, value
  • At some point add the 4th: attribute_type
  • When it starts to get slow, just call those four tables The Core and add 1000+ tables with denormalized data
  • If that is not enough, you can always add value_version
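
A minimal sketch of the trio (column names are made up), plus what fetching just two attributes of one entity already looks like: one extra join per attribute.

    CREATE TABLE entity    (id bigserial PRIMARY KEY, kind text);
    CREATE TABLE attribute (id bigserial PRIMARY KEY, name text);
    CREATE TABLE value     (entity_id    bigint REFERENCES entity(id),
                            attribute_id bigint REFERENCES attribute(id),
                            val          text);

    -- two attributes of one entity = four joins; repeat per attribute
    SELECT v1.val AS price, v2.val AS colour
    FROM entity e
    JOIN value v1     ON v1.entity_id = e.id
    JOIN attribute a1 ON a1.id = v1.attribute_id AND a1.name = 'price'
    JOIN value v2     ON v2.entity_id = e.id
    JOIN attribute a2 ON a2.id = v2.attribute_id AND a2.name = 'colour'
    WHERE e.id = 42;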

5. Try to create as many indexes as you can
  • Indexes consume no disk space
  • Indexes consume no shared_buffers
  • There is no DML overhead when each and every column in a table is covered with a bunch of indexes
  • The optimizer will definitely choose your index once you have created it
  • Keep calm and create more indexes
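
If you suspect the bullets above are not entirely true, the standard statistics view shows how much space each index takes and whether it has ever been used:

    SELECT schemaname, relname, indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
           idx_scan                                      -- 0 means never used since the stats reset
    FROM pg_stat_user_indexes
    ORDER BY pg_relation_size(indexrelid) DESC;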

6. Always keep all your time series data
  • Time series data like tables with logs or session history should never be deleted, aggregated or archived; you always need to keep all of it
  • You will always know where to check if you run out of disk space
  • You can always call that Big Data
  • Solve the problem using partitioning... one partition per hour, or per minute
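
If you do end up partitioning, a saner granularity looks roughly like this; the declarative syntax below needs PostgreSQL 10+, in 2017 the same idea was spelled with inheritance (names are made up):

    CREATE TABLE session_history (
        session_id  bigint,
        happened_at timestamptz NOT NULL
    ) PARTITION BY RANGE (happened_at);

    CREATE TABLE session_history_2017_01 PARTITION OF session_history
        FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');

    -- retiring a month is then a cheap metadata operation, not a huge DELETE
    DROP TABLE session_history_2017_01;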

7. Turn autovacuum off
  • It is quite an auxiliary process, you can easily stop it
  • There is no problem at all with having 100 GB of data in a database which is 1 TB in size
  • 2-3 TB RAM servers are cheap, and IO is the fastest thing in modern computing
  • Besides, everyone likes Big Data
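
What "easily stopping" autovacuum looks like a few weeks later can be checked in the standard statistics view (the thresholds you worry about are up to you):

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;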

8. Keep master and slave on different hardware
  • That will maximize the probability of an unsuccessful failover
  • To make things worse, change only slave-related parameters on the slave, leaving defaults for shared_buffers etc.
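
A quick way to compare the settings that matter on both nodes before the failover rather than after it; the parameter list here is just a plausible starting point:

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                   'effective_cache_size', 'max_connections');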

9. Put an asynchronous replica in a remote DC
  • Indeed! That will maximize availability!
  • Especially if you put the replica on another continent
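
How far behind that intercontinental replica actually is can be measured on the standby itself:

    SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;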

10. Reinvent Slony
  • If you need to replicate some data to another database, try to implement it from scratch
  • That allows you to run into all the problems PostgreSQL has had since Slony was introduced

11. Use as many count(*) as you can
  • A figure like 301083021830123921 is very informative for the end user
  • If it changes a second later to 30108302894839434020, it is still informative
  • select count(*) from sometable is quite a lightweight query
  • Tuple estimates from pg_catalog can never be precise enough for you
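
For the record, the cheap alternative the last bullet dismisses: the planner's estimate is a single catalog lookup, refreshed by ANALYZE, while count(*) scans the table ('sometable' as in the slide):

    SELECT count(*) FROM sometable;            -- exact, and priced accordingly

    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'sometable';               -- estimate only, but practically free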

12. Never use graphical monitoring
  • You do not need graphs
  • Because it is an easy task to guess what happened yesterday at 2 a.m. using the command line and grep only

13. Never use Foreign Keys (use home-grown ones instead!)
  • Consistency control at the application level always works as expected
  • You will never get data inconsistency without constraints
  • Even if you already have a bullet-proof framework to maintain consistency, could that be a good enough reason to use it?
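
The declarative version the home-grown framework competes with, on a made-up schema:

    CREATE TABLE customers (id bigserial PRIMARY KEY, name text);
    CREATE TABLE orders    (id bigserial PRIMARY KEY, customer_id bigint);

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);

    -- an orphaned order now fails at INSERT time instead of in next month's report
    INSERT INTO orders (customer_id) VALUES (999999);   -- ERROR: violates foreign key constraint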

14. Always use text type for all columns
  • It is always fun to reimplement date or IP validation in your code
  • You will never mistakenly convert "12-31-2015 03:01AM" to "15:01 12 of undef 2015" using text fields
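
What proper types buy you, on a made-up table: the bad value is rejected at INSERT time instead of being silently reinterpreted later.

    CREATE TABLE logins (
        user_id   bigint,
        client_ip inet,
        logged_in timestamptz
    );

    INSERT INTO logins VALUES (1, 'not-an-ip', '2015-12-31 03:01');
    -- ERROR:  invalid input syntax for type inet: "not-an-ip"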

15. Always use improved "PostgreSQL"
  • Postgres is not a perfect database, and you are smart
  • All that annoying MVCC stuff, 32-bit xids and the autovacuum nightmare look the way they do because the hackers are old-school and lazy
  • Hack it the hard way, do not bother submitting your patch to the community, just put it into production
  • It is easy to maintain such a production system and keep it compatible with upcoming versions of the "not perfect" PostgreSQL

16. Postgres likes long transactions
  • Always call external services from stored procedures (like sending emails)
  • Oh, that one is arguable... It could work, if 100% of developers were familiar with the word timeout
  • Anyway, you can just start a transaction and go away for the weekend
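
Finding the transaction somebody started before leaving for the weekend, plus the setting (PostgreSQL 9.6+) that cuts such sessions short; the timeout value is just an example:

    SELECT pid, state, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_age DESC;

    SET idle_in_transaction_session_timeout = '5min';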

17. Load your data into PostgreSQL in a smart manner
  • Write your own loader, 100 parallel threads minimum
  • Never use COPY - it is specially designed for the task
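
The tool that is "specially designed for the task", in both its server-side and its psql client-side form; file path and table name are made up:

    COPY measurements FROM '/var/tmp/measurements.csv' WITH (FORMAT csv, HEADER);

    -- or from psql, without touching the server's filesystem:
    \copy measurements FROM 'measurements.csv' WITH (FORMAT csv, HEADER)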

18. Even if you want to back up your database...
  • Use replication instead of backups
  • Use pg_dump instead of backups
  • Write your own backup script
  • Make it as complicated as possible, combining all the external tools you know
  • Never perform a test recovery

Questions or ideas? Share your story!
ik@postgresql-consulting.com (I am preparing this talk to be open sourced)