Exploring Databases Visually

di.nmfay.com

146 points by dmfay 5 years ago · 28 comments

Reader

I implemented a DB visualisation tool like this for sqlite3, also using graphviz. It's called, unimaginatively, sqlite3todot [1]

The couple tricks it has that I really like are a) it shows the table itself [which isn't uncommon], but also b) has the ability to group up tables, which for me turns out to be a devastatingly killer feature that no other tool has.

When I'm working on a schema [especially one I've not seen before], one of the things that drives me insane is not being able to say "these six go together. Group them". Then you can simply look at arrows that enter or leave that grouping for your next exploration step. By the time I'm done, anything not-categorised stands out [it's simply in the space between groups]; any boxes that have an excess of arrows between them can mean my mental model is wrong [or the schema has some nonoptimalities].

There's an example here; it only has two groupings ["input" and "output"], but it immediately gets the point across: https://icculus.org/~chunky/stuff/sqlite3_example/sqlitesche...

[1] https://github.com/chunky/sqlite3todot

tejtm 5 years ago

Not sure I would have made mine if I had come across yours first.
https://github.com/TomConlin/SQLiteViz
- mro_name 5 years ago
  
  wondering finished. Thx!
  - tejtm 5 years ago
    
    I doubt it is "finished" in that it has only been asked to parse a case or two and if it broke I would have to choose between extending it or just trying the GPs implementation.
mro_name 5 years ago

I wonder if there's something like this operating on the DDL statements file. Anybody?
- chunkyks 5 years ago
  
  Just the ddl? Just execute it into a temporary database.
  - mro_name 5 years ago
    
    possible, yes. I'd prefer without the SQLite dependency, however.
    But not parsing SQL is a pro, indeed.
    
    chunkyks 5 years ago
    
    As dependencies go, sqlite isn't a biggie, that's one of its best features

simonw 5 years ago

I've been doing something similar to this using an Observable notebook - this one joins the sqlite_master table against the pragma_foreign_key_list() SQLite function and uses the returned data to show a DOT diagram:

https://observablehq.com/@simonw/datasette-table-diagram

It defaults to showing the tables from my GitHub example database but you can paste in the URL to another Datasette to see that one visualized instead - try https://latest.datasette.io/fixtures for example

tracyhenry 5 years ago

As part of my research, I'm also building a new UI for exploring a database using data visualizations and an ER diagram. Here is a half-baked demo: http://mondial.kyrixdemo.live/

Hopefully the UI is self-explanatory - if not try clicking on visual objects to perform drill down, or search something in the top righthand corner.

The underlying dataset is a public DBMS called MONDIAL: https://www.dbis.informatik.uni-goettingen.de/Mondial/mondia...

najarvg 5 years ago

This is fantastic, thanks for sharing! The unique ability to drill down into connected tables and visualize the relationships and data within is unique (for me at least). Are you building this as a personal tool or building it in a manner to apply on any database? Would love to know more
- tracyhenry 5 years ago
  
  It's intended to support general relational databases. But right now the code is kind of hacky with little documentation. We wrote a paper on it but it's still under review.
  You can checkout our Github repos if you are interested: https://github.com/tracyhenry/kyrix https://github.com/tracyhenry/kyrix-j
victor106 5 years ago

This is cool. The node label should display the name of the table for some reason it seems to be missing.
Is this open source?
- tracyhenry 5 years ago
  
  Yep currently you need to hover over a node to see the table name. We need to find a way to show the labels without generating too much clutter.
  Also, it is open source: https://github.com/tracyhenry/kyrix https://github.com/tracyhenry/kyrix-j
  - victor106 5 years ago
    
    Thank you.

meesterdude 5 years ago

I'm building a 3D database that lets you explore graph and relational data that you've created. For research or creating your own mind palaces - you can freely join & filter collections of information. But it's more like a 3D excel of sorts.

ichverstehe 5 years ago

I'd love to test this, at any stage. Email in profile.
johnknowles 5 years ago

I'm curious what you're using to build this. I've been playing around with a similar idea, and have been prototyping in Godot. Visually representing the connections/structures between items has been a sticking point, but I've settled on two main abstractions: stacks and surfaces, mapping to 1D and 2D arrangement. I've been hesitant to explore 3D, as it unlocks many novelties in the design of a micro-world; but I do think that a 3D space brings a lot of our instincts from the physical world into digital ones. Would be curious to hear how you're approaching the problem, if you feel like sharing.
- meesterdude 5 years ago
  
  For what i'm after, 3D is the only way to go. I'm building it in unity so you can store markdown, webpages, video and images within 3D space and relate them to each other. (so not really a tool for visualizing a production DB or anything)

pm90 5 years ago

Isn't this what schemaspy does? http://schemaspy.org/

dmfayOP 5 years ago

That looks like another full-scale ERD generator, with much more detailed structural information too. Those make useful references, but what I'm trying to reach is a way of understanding parts of a relationship graph better and more quickly by navigating it from the inside with the terminal as a REPL of sorts (or, if you like, a very limited "parser" as in interactive fiction or a MUD) instead of the holistic and potentially overwhelming birds-eye view of an ERD.

_wldu 5 years ago

I love graphs (it was my favorite topic in algorithms) and I think they can be applied to many subjects to help us understand and visualize problems and systems better. In particular, I think graphs could really help network security monitoring efforts to quickly identify anomalies.

Also, graphviz is really cool.

JoelJacobson 5 years ago

Foreign Keys are great!

Too bad they were not invented until SQL-89, and could thus not be considered when the JOIN syntax we all know was designed and formalized by ANSI in SQL-86.

Imagine if we could simply give the name of a foreign key, as an argument to JOIN, letting the query planner look up the foreign key columns, to do the join on.

The default names for foreign keys are not very user-friendly, due to the SQL standard unfortunate decision to require the foreign key name to be unique per namespace, and not just per table, which would have made more sense.

In PostgreSQL, foreign keys just need to be unique per table, so there we could give them names such as "customer" instead of "rental_customer_id_fkey".

Imagine if we instead of:

  SELECT
    language.name,
    COUNT(*)
  FROM rental
  JOIN customer
    ON customer.customer_id = rental.customer_id
  JOIN address
    ON address.address_id = customer.address_id
  JOIN city
    ON city.city_id = address.city_id
  JOIN country
    ON country.country_id = city.country_id
  JOIN inventory
    ON inventory.inventory_id = rental.inventory_id
  JOIN film
    ON film.film_id = inventory.film_id
  JOIN language
    ON language.language_id = film.language_id
  WHERE country.country = 'Sweden'
  GROUP BY language.name

Would write something like:

  SELECT
    language.name,
    COUNT(*)
  FROM rental
  JOIN rental->rental_customer_id_fkey AS customer
  JOIN customer->customer_address_id_fkey AS address
  JOIN address->address_city_id_fkey AS city
  JOIN city->city_country_id_fkey AS country
  JOIN rental->rental_inventory_id_fkey AS inventory
  JOIN inventory->inventory_film_id_fkey AS film
  JOIN film->film_language_id_fkey AS language
  WHERE country.country = 'Sweden'
  GROUP BY language.name

After renaming the foreign keys:

  ALTER TABLE rental RENAME CONSTRAINT rental_customer_id_fkey TO customer;
  ALTER TABLE rental RENAME CONSTRAINT rental_inventory_id_fkey TO inventory;
  ALTER TABLE customer RENAME CONSTRAINT customer_address_id_fkey TO address;
  ALTER TABLE address RENAME CONSTRAINT address_city_id_fkey TO city;
  ALTER TABLE city RENAME CONSTRAINT city_country_id_fkey TO country;
  ALTER TABLE inventory RENAME CONSTRAINT inventory_film_id_fkey TO film;
  ALTER TABLE film RENAME CONSTRAINT film_language_id_fkey TO language;

We could write this as:

  SELECT
    language.name,
    COUNT(*)
  FROM rental
  JOIN rental->customer
  JOIN customer->address
  JOIN address->city
  JOIN city->country
  JOIN rental->inventory
  JOIN inventory->film
  JOIN film->language
  WHERE country.country = 'Sweden'
  GROUP BY language.name

And if allowing such a "foreign key operator" to be chained, we could write:

  SELECT
    language.name,
    COUNT(*)
  FROM rental
  JOIN rental->customer->address->city->country
  JOIN rental->inventory->film->language
  WHERE country.country = 'Sweden'
  GROUP BY language.name

This is similar to "4.9 Reference types" in the SQL standard ISO/IEC 9075-2:2016(E), but it wouldn't require a separate REF column, it would merely use the existing foreign keys which we already have in well designed proper database schemas. We would just need to give them better names.

Q: How would we name foreign keys if there are two going to the same table?

A: Imagine having a "users" table with two columns "child_user_id" and "parent_user_id", both referencing "users". The foreign keys on such columns could simply be named "child" and "parent".

steve-chavez 5 years ago

> Too bad they were not invented until SQL-89, and could thus not be considered when the JOIN syntax we all know was designed and formalized by ANSI in SQL-86.
TIL!
> After renaming the foreign keys
The idea about using FK as a JOIN target is interesting. While developing a syntax for PostgREST resource embedding[1], I also reached the conclusion that FKs would be a convenient way to join tables(also suggested renaming them as you do here).
IIRC, self joins are still an issue with FK joining.
[1]: https://postgrest.org/en/v7.0.0/api.html#embedding-disambigu...
- JoelJacobson 5 years ago
  
  Hi Steve,
  Many thanks for making PostgREST, I'm using it for all my projects!
  I'm trying to understand the problem you see with self joins. Why would it be a problem? At the end of my comment, I gave an example with a users table, for such a table, an expression such as "FROM users JOIN users->parent" would join-in users two times, but two different rows of users, since "users->parent" would follow the foreign key "parent".
dmfayOP 5 years ago

You may be interested in the USING clause SQL-92 added for foreign keys where the referencing columns have the same name as the referenced columns: JOIN b USING (a_id). The only major RDBMS that doesn't support it is SQL Server afaik.
- JoelJacobson 5 years ago
  Yes, I know about USING, and you probably know about the one big issue it has; if you add a column name with the same name to some table included, then you get an error due to the ambiguity.
  Example:
  SELECT language.name, COUNT(*) FROM rental JOIN customer USING (customer_id) JOIN address USING (address_id) JOIN city USING (city_id) JOIN country USING (country_id) JOIN inventory USING (inventory_id) JOIN film USING (film_id) JOIN language USING (language_id) WHERE country.country = 'Sweden' GROUP BY language.name
  Works fine, but if we do
  ALTER TABLE rental ADD address_id integer;
  Then we get
  ERROR: common column name "address_id" appears more than once in left table
  The foreign key based join approach doesn't suffer from this problem, since there which columns to join on is explicit and stable.

Settings

Exploring Databases Visually

Keyboard Shortcuts