Exploring Databases Visually
di.nmfay.comI implemented a DB visualisation tool like this for sqlite3, also using graphviz. It's called, unimaginatively, sqlite3todot [1]
The couple tricks it has that I really like are a) it shows the table itself [which isn't uncommon], but also b) has the ability to group up tables, which for me turns out to be a devastatingly killer feature that no other tool has.
When I'm working on a schema [especially one I've not seen before], one of the things that drives me insane is not being able to say "these six go together. Group them". Then you can simply look at arrows that enter or leave that grouping for your next exploration step. By the time I'm done, anything not-categorised stands out [it's simply in the space between groups]; any boxes that have an excess of arrows between them can mean my mental model is wrong [or the schema has some nonoptimalities].
There's an example here; it only has two groupings ["input" and "output"], but it immediately gets the point across: https://icculus.org/~chunky/stuff/sqlite3_example/sqlitesche...
Not sure I would have made mine if I had come across yours first.
wondering finished. Thx!
I doubt it is "finished" in that it has only been asked to parse a case or two and if it broke I would have to choose between extending it or just trying the GPs implementation.
I wonder if there's something like this operating on the DDL statements file. Anybody?
Just the ddl? Just execute it into a temporary database.
possible, yes. I'd prefer without the SQLite dependency, however.
But not parsing SQL is a pro, indeed.
As dependencies go, sqlite isn't a biggie, that's one of its best features
I've been doing something similar to this using an Observable notebook - this one joins the sqlite_master table against the pragma_foreign_key_list() SQLite function and uses the returned data to show a DOT diagram:
https://observablehq.com/@simonw/datasette-table-diagram
It defaults to showing the tables from my GitHub example database but you can paste in the URL to another Datasette to see that one visualized instead - try https://latest.datasette.io/fixtures for example
As part of my research, I'm also building a new UI for exploring a database using data visualizations and an ER diagram. Here is a half-baked demo: http://mondial.kyrixdemo.live/
Hopefully the UI is self-explanatory - if not try clicking on visual objects to perform drill down, or search something in the top righthand corner.
The underlying dataset is a public DBMS called MONDIAL: https://www.dbis.informatik.uni-goettingen.de/Mondial/mondia...
This is fantastic, thanks for sharing! The unique ability to drill down into connected tables and visualize the relationships and data within is unique (for me at least). Are you building this as a personal tool or building it in a manner to apply on any database? Would love to know more
It's intended to support general relational databases. But right now the code is kind of hacky with little documentation. We wrote a paper on it but it's still under review.
You can checkout our Github repos if you are interested: https://github.com/tracyhenry/kyrix https://github.com/tracyhenry/kyrix-j
This is cool. The node label should display the name of the table for some reason it seems to be missing.
Is this open source?
Yep currently you need to hover over a node to see the table name. We need to find a way to show the labels without generating too much clutter.
Also, it is open source: https://github.com/tracyhenry/kyrix https://github.com/tracyhenry/kyrix-j
Thank you.
I'm building a 3D database that lets you explore graph and relational data that you've created. For research or creating your own mind palaces - you can freely join & filter collections of information. But it's more like a 3D excel of sorts.
I'd love to test this, at any stage. Email in profile.
I'm curious what you're using to build this. I've been playing around with a similar idea, and have been prototyping in Godot. Visually representing the connections/structures between items has been a sticking point, but I've settled on two main abstractions: stacks and surfaces, mapping to 1D and 2D arrangement. I've been hesitant to explore 3D, as it unlocks many novelties in the design of a micro-world; but I do think that a 3D space brings a lot of our instincts from the physical world into digital ones. Would be curious to hear how you're approaching the problem, if you feel like sharing.
For what i'm after, 3D is the only way to go. I'm building it in unity so you can store markdown, webpages, video and images within 3D space and relate them to each other. (so not really a tool for visualizing a production DB or anything)
Isn't this what schemaspy does? http://schemaspy.org/
That looks like another full-scale ERD generator, with much more detailed structural information too. Those make useful references, but what I'm trying to reach is a way of understanding parts of a relationship graph better and more quickly by navigating it from the inside with the terminal as a REPL of sorts (or, if you like, a very limited "parser" as in interactive fiction or a MUD) instead of the holistic and potentially overwhelming birds-eye view of an ERD.
I love graphs (it was my favorite topic in algorithms) and I think they can be applied to many subjects to help us understand and visualize problems and systems better. In particular, I think graphs could really help network security monitoring efforts to quickly identify anomalies.
Also, graphviz is really cool.
Foreign Keys are great!
Too bad they were not invented until SQL-89, and could thus not be considered when the JOIN syntax we all know was designed and formalized by ANSI in SQL-86.
Imagine if we could simply give the name of a foreign key, as an argument to JOIN, letting the query planner look up the foreign key columns, to do the join on.
The default names for foreign keys are not very user-friendly, due to the SQL standard unfortunate decision to require the foreign key name to be unique per namespace, and not just per table, which would have made more sense.
In PostgreSQL, foreign keys just need to be unique per table, so there we could give them names such as "customer" instead of "rental_customer_id_fkey".
Imagine if we instead of:
SELECT
language.name,
COUNT(*)
FROM rental
JOIN customer
ON customer.customer_id = rental.customer_id
JOIN address
ON address.address_id = customer.address_id
JOIN city
ON city.city_id = address.city_id
JOIN country
ON country.country_id = city.country_id
JOIN inventory
ON inventory.inventory_id = rental.inventory_id
JOIN film
ON film.film_id = inventory.film_id
JOIN language
ON language.language_id = film.language_id
WHERE country.country = 'Sweden'
GROUP BY language.name
Would write something like: SELECT
language.name,
COUNT(*)
FROM rental
JOIN rental->rental_customer_id_fkey AS customer
JOIN customer->customer_address_id_fkey AS address
JOIN address->address_city_id_fkey AS city
JOIN city->city_country_id_fkey AS country
JOIN rental->rental_inventory_id_fkey AS inventory
JOIN inventory->inventory_film_id_fkey AS film
JOIN film->film_language_id_fkey AS language
WHERE country.country = 'Sweden'
GROUP BY language.name
After renaming the foreign keys: ALTER TABLE rental RENAME CONSTRAINT rental_customer_id_fkey TO customer;
ALTER TABLE rental RENAME CONSTRAINT rental_inventory_id_fkey TO inventory;
ALTER TABLE customer RENAME CONSTRAINT customer_address_id_fkey TO address;
ALTER TABLE address RENAME CONSTRAINT address_city_id_fkey TO city;
ALTER TABLE city RENAME CONSTRAINT city_country_id_fkey TO country;
ALTER TABLE inventory RENAME CONSTRAINT inventory_film_id_fkey TO film;
ALTER TABLE film RENAME CONSTRAINT film_language_id_fkey TO language;
We could write this as: SELECT
language.name,
COUNT(*)
FROM rental
JOIN rental->customer
JOIN customer->address
JOIN address->city
JOIN city->country
JOIN rental->inventory
JOIN inventory->film
JOIN film->language
WHERE country.country = 'Sweden'
GROUP BY language.name
And if allowing such a "foreign key operator" to be chained,
we could write: SELECT
language.name,
COUNT(*)
FROM rental
JOIN rental->customer->address->city->country
JOIN rental->inventory->film->language
WHERE country.country = 'Sweden'
GROUP BY language.name
This is similar to "4.9 Reference types" in the SQL standard ISO/IEC 9075-2:2016(E), but it wouldn't require a separate REF column, it would merely use the existing foreign keys which we already have in well designed proper database schemas. We would just need to give them better names.Q: How would we name foreign keys if there are two going to the same table?
A: Imagine having a "users" table with two columns "child_user_id" and "parent_user_id", both referencing "users". The foreign keys on such columns could simply be named "child" and "parent".
> Too bad they were not invented until SQL-89, and could thus not be considered when the JOIN syntax we all know was designed and formalized by ANSI in SQL-86.
TIL!
> After renaming the foreign keys
The idea about using FK as a JOIN target is interesting. While developing a syntax for PostgREST resource embedding[1], I also reached the conclusion that FKs would be a convenient way to join tables(also suggested renaming them as you do here).
IIRC, self joins are still an issue with FK joining.
[1]: https://postgrest.org/en/v7.0.0/api.html#embedding-disambigu...
Hi Steve,
Many thanks for making PostgREST, I'm using it for all my projects!
I'm trying to understand the problem you see with self joins. Why would it be a problem? At the end of my comment, I gave an example with a users table, for such a table, an expression such as "FROM users JOIN users->parent" would join-in users two times, but two different rows of users, since "users->parent" would follow the foreign key "parent".
You may be interested in the USING clause SQL-92 added for foreign keys where the referencing columns have the same name as the referenced columns: JOIN b USING (a_id). The only major RDBMS that doesn't support it is SQL Server afaik.
Yes, I know about USING, and you probably know about the one big issue it has; if you add a column name with the same name to some table included, then you get an error due to the ambiguity.
Example:
Works fine, but if we doSELECT language.name, COUNT(*) FROM rental JOIN customer USING (customer_id) JOIN address USING (address_id) JOIN city USING (city_id) JOIN country USING (country_id) JOIN inventory USING (inventory_id) JOIN film USING (film_id) JOIN language USING (language_id) WHERE country.country = 'Sweden' GROUP BY language.name
Then we getALTER TABLE rental ADD address_id integer;
The foreign key based join approach doesn't suffer from this problem, since there which columns to join on is explicit and stable.ERROR: common column name "address_id" appears more than once in left table