Visualizing Uber and Lyft trips in San Francisco: more than 200K trips a day (medium.com)
Sure - PostgREST provides a RESTful API endpoint for any PostgreSQL database. It's kinda magic.
It's an interesting beast: it has almost no configuration at all, you just point it to one database schema, and it then uses the postgres permissions system to decide what tables & views to expose, and who gets to see them.
The only part of it that was a little tricky was creating multiple database users: one for anonymous visitors coming in off the internet, and another for internal staff who want to do more than just select some rows.
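In PostgreSQL terms that split boils down to something like this (role and schema names here are just examples, not necessarily the exact setup):

    -- the role PostgREST logs in as; it can switch to the others per request
    CREATE ROLE authenticator NOINHERIT LOGIN PASSWORD 'change-me';

    -- anonymous internet visitors: read-only on the exposed schema
    CREATE ROLE web_anon NOLOGIN;
    GRANT USAGE ON SCHEMA api TO web_anon;
    GRANT SELECT ON ALL TABLES IN SCHEMA api TO web_anon;

    -- internal staff: allowed to write as well
    CREATE ROLE staff NOLOGIN;
    GRANT USAGE ON SCHEMA api TO staff;
    GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA api TO staff;

    GRANT web_anon TO authenticator;
    GRANT staff TO authenticator;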
That is not actually correct. PostgREST does not "use the postgres permissions system to decide what tables & views to expose".
PostgREST exposes everything within a particular schema, and then every call gets translated into a query and executed. It's the database (PostgreSQL) that decides whether the query runs or raises a permissions error. PostgREST just kind of says "hey db, this is the current user. Now run this query for him".
At a high level, PostgREST is just a pure function that translates a REST call into a SQL query and executes it with the privileges of the user making the request. The database does all the heavy lifting.
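Roughly, something like this (table and role names made up, and not the exact SQL PostgREST emits):

    -- GET /trips?neighborhood=eq.Mission&select=pickup_hour,trip_count
    -- ends up as something like:
    SET LOCAL ROLE web_anon;   -- whatever role this request maps to
    SELECT pickup_hour, trip_count
      FROM api.trips
     WHERE neighborhood = 'Mission';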
Thanks for the clearer explanation -- you're correct, the database decides that. That's the beauty of it!
Cool stuff! We actually built a similar system to analyze the NYC taxi dataset (or any other geospatial dataset). We use the (PostgreSQL wire protocol compliant) HyPerSpace database in the backend [1]. Let us know if you ever run into performance problems with PostGIS.
[1] https://db.in.tum.de/downloads/publications/hyperspace.pdf
Wow, I'll take a look. Looks awesome! This particular dataset was very small because the agency which did the data collection aggregated everything into neighborhoods before they gave it to me. So, performance hasn't been a problem at all yet. We'll be tackling some larger datasets soon, so we'll see!
Very nice visualization and write-up! The tech details are pretty interesting. In particular, noting the back-end was done up in PostgREST. I've been itching for a project to kick the tires on that for a while.
Any more you can add about the experience of "writing" the back-end in PostgREST if you're hanging out in this thread, Billy?
Can you share what parts of PostgREST's way of doing things you found nice and what parts are lacking?
Nice things: it's a simple tool that does one thing well. As mentioned above, it lets the database itself decide on permissions. If you want SSL support, just put it behind an NGINX reverse proxy. And it lets you get creative with queries in the query-string part of your URL.
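To give a flavor of that query-string creativity (column names are made up for the example):

    -- GET /trips?trip_count=gte.1000&order=trip_count.desc&limit=10
    -- reads as "top 10 rows with trip_count >= 1000", i.e. roughly
    SELECT *
      FROM api.trips
     WHERE trip_count >= 1000
     ORDER BY trip_count DESC
     LIMIT 10;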
Lacking: I didn't really find anything lacking; it was able to do everything I needed. I'm a pretty basic end-user, not an expert. At first I was confounded by the limitation of one schema per PostgREST instance; but that's easy to work with or work around. I created a single "api" schema which contained nothing but views of tables that lived in other places. You can also just run more than one instance of PostgREST if you want to expose more than one schema.
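In case it helps anyone, that pattern is basically (names invented for the example):

    -- real tables live somewhere private
    CREATE SCHEMA private;
    CREATE TABLE private.trips (
        id            serial PRIMARY KEY,
        neighborhood  text,
        pickup_hour   int,
        trip_count    int
    );

    -- the schema PostgREST points at holds nothing but views
    CREATE SCHEMA api;
    CREATE VIEW api.trips AS
        SELECT neighborhood, pickup_hour, trip_count
          FROM private.trips;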
For your next PostgREST based project you might want to try https://github.com/subzerocloud/postgrest-starter-kit with the aid of https://github.com/subzerocloud/devtools
The single schema was discussed before. The idea behind it is that it's a simple mental model (what's in the api schema is what's accessible over HTTP), so it catches mistakes like exposing things you didn't mean to, especially since the webdev community is not very big on strict per-role database permissions.
One trick though: if you specify the schema as an empty string "", entities in the generated query will not be fully qualified, which gives the search_path the ability to do its work.
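The database side of that trick is just per-role search_path settings, e.g. (role and schema names are examples):

    -- with the PostgREST schema config set to an empty string, unqualified
    -- names in the generated query resolve through each role's search_path
    ALTER ROLE web_anon SET search_path = api_public;
    ALTER ROLE staff    SET search_path = api_internal;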
Also, exposing ONLY views and stored procedures is the recommended way.
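A tiny sketch of the stored-procedure side, since PostgREST serves functions in the exposed schema under /rpc/ (function and table names are illustrative):

    -- callable as POST /rpc/trips_in with body {"hood": "Mission"}
    CREATE FUNCTION api.trips_in(hood text)
    RETURNS SETOF api.trips
    LANGUAGE sql STABLE AS $$
        SELECT * FROM api.trips WHERE neighborhood = hood;
    $$;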