GraphQL: A data query language

code.facebook.com

274 points by dschafer 11 years ago · 83 comments

TheMakeA 11 years ago

Every time there's a new post about GraphQL, I become even more concerned that "GraphQL" was the wrong name to use publicly. It seems like most people assume it's just another combination of two buzzwords, Graph and QL, and incorrectly pattern match it.

In an attempt to help resolve this issue, I suggest that you think of it as ProductQL or ProductAPI instead.

It's not a storage layer. It's not really a query language. It's an alternative way to define an API that more closely matches the typical mental and product domains than REST.

It's critical to realize that you're implementing an API. GraphQL does not concern itself with sorting or fetching data as the name might lead you to initially believe. It has little to actually do with graphs (except they are more easily expressed). It is designed to be added on top of your application layer and business logic to provide a single, well defined API for your product and tools.

It primarily targets product developers. From the schema/API side, it is up to the server developer to decide what to expose and how, based on the product and any technical considerations that need to be made.

  • dwenzek 11 years ago

    No, I disagree. GraphQL is definitely a query language, since it lets you specify the pieces of data to extract from a whole data set.

    Compared to SQL, this is done in an unusual manner, by giving the shape of the data we want around some seed, but the approach is both expressive and efficient. In light of this query principle, it is perhaps unfortunate that GraphQL uses a different syntax for the query and the response. In this regard, Freebase's MQL is purer (http://wiki.freebase.com/wiki/MQL).

    As for "graph", I think the term applies too. Indeed, a GraphQL query defines a subgraph to be extracted from a whole graph, even if the tree shape of the response forces you to resort to IDs to close the loop.

    Compared to a graph traversal query language like Gremlin (https://en.wikipedia.org/wiki/Gremlin_%28programming_languag...), I find the pattern approach of GraphQL simpler to grasp, and even easier to implement efficiently in a distributed setting. Years ago I implemented a distributed graph database for Yahoo!, and the query language, based on shapes to be extracted around seeds, was a key design choice for an efficient architecture and short query delays.

    • TheMakeA 11 years ago

      Let me clarify a little: I think the name GraphQL for sure makes sense from a technical point of view. But from a marketing point of view, it has been a nightmare and makes it _unusually_ difficult for others to initially grasp.

      • dwenzek 11 years ago

        So what you mean is that the name GraphQL downplays one of its important aspects: the ability to query a service/product/app whatever the actual persistence mechanisms are, even if part of the response is computed on the fly.

        Now I better understand why you would prefer a name like ProductQL or even ProductAPI, to emphasise the intent for this tool to be the single point of interaction with a product from the outside.

  • cbsmith 11 years ago

    > In an attempt to help resolve this issue, I suggest that you think of it as ProductQL or ProductAPI instead.
    >
    > It's not a storage layer. It's not really a query language. It's an alternative way to define an API that more closely matches the typical mental and product domains than REST.

    That is a tremendous and useful insight.

  • devit 11 years ago

    I think "ObjectQL" or "NestedRPC" might be more appropriate: it's an IDL and protocol for a nested RPC facility for object-oriented systems.

    You start with a root query object and then call a bunch of methods on it of the client's choice and put the results in a dictionary.

    Then you do that recursively on the resulting objects until you have data consisting of nested dictionaries and lists containing only primitives, which is finally serialized as JSON and returned.
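
The recursive evaluation described above can be sketched roughly as follows. This is an illustration only, not how any real GraphQL server is implemented: the `Query` and `User` classes, their data, and the nested-dict representation of the selection set are all assumptions made for the example.

```python
import json


class User:
    """Hypothetical application object; each method plays the role of a field."""

    def __init__(self, name, friends):
        self._name = name
        self._friends = friends

    def name(self):
        return self._name

    def friends(self):
        return self._friends


class Query:
    """Hypothetical root query object."""

    def user(self):
        return User("Ann", [User("Bob", [])])


def execute(obj, selection):
    """Call each selected method on obj and collect the results in a dict."""
    result = {}
    for field, sub_selection in selection.items():
        value = getattr(obj, field)()
        result[field] = complete(value, sub_selection)
    return result


def complete(value, sub_selection):
    """Recurse until only primitives, dicts and lists remain."""
    if isinstance(value, list):
        return [complete(item, sub_selection) for item in value]
    if sub_selection:  # an object: keep resolving the nested selection
        return execute(value, sub_selection)
    return value  # a primitive leaf


# Selection set standing in for: { user { name friends { name } } }
selection = {"user": {"name": {}, "friends": {"name": {}}}}
response = json.dumps(execute(Query(), selection))
```

The response has the same nesting as the selection, which is the property several other comments in this thread point out.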

  • andy_ppp 11 years ago

    GraphQL:

    1) Give me data in the right shape for my problem please server.

    2) Server responds with data in a shape that matches what it was asked for.

    3) Update the store of your frontend and trigger any updates to UI (React does this automatically).

    Save countless hours of writing fine-grained REST APIs to support various different ways of requesting data or munging data into the right format on the client.

    And by the way we've made it super easy to bundle requests together in a nice way saving trips to the server.

  • adriancooney 11 years ago

    I'd have to disagree in that it's not a query language, there is definitely ability to query different parts of the data. I'm just unsure where the Graph part of the name came from. I was under the impression it was some play on the words sounding like "graphical" since the appearance of the query language in code is a very close depiction of the actual data returned.

    • TheMakeA 11 years ago

      > I'd have to disagree in that it's not a query language, there is definitely ability to query different parts of the data

      For sure. If I could still edit the post, I would probably change it to be something closer to "it's not really a general purpose query language like the name might imply"

      It's definitely a query language, but folks start looking for the sorting/filtering/etc and get confused.

  • Amrinder12345 11 years ago

    > Every time there's a new post about GraphQL

    • eggie 11 years ago

      And with good reason. I work on graph data structures and have had four people suggest I check out "GraphQL" for my work. Alas it has almost nothing to do with graphs. Every time I read about it I feel that Facebook tech has jumped the shark. I'll go back to my terascale rank/select dictionaries now...

      • aikah 11 years ago

        > Every time I read about it I feel that Facebook tech has jumped the shark

        No, Facebook does it on purpose, just like React has nothing to do with functional reactive programming. They are talented, not idiots; they know that a catchy name is useful to promote their tech even if it is misleading.

devit 11 years ago

How do GraphQL implementations avoid denial of service?

In other words, what stops anyone from easily disabling a website by making a few parallel extremely complex GraphQL requests that consume all CPU and I/O, and perhaps result in holding some locks for a very long time?

In normal APIs you can make sure most endpoints are cheap to run, and throttle, secure or otherwise control the ones that must be expensive, but that doesn't work if you expose a flexible layer like GraphQL (or SQL).

  • dschaferOP 11 years ago

    FB's GraphQL APIs are only used by our first-party applications; for third-party APIs (where you don't control the callers), it definitely gets trickier for the reasons you note. One option would be to do some analysis of the query in advance (effectively, assign each field a "cost"), and reject queries that have too high of a cost. The "cost" metric could basically be "around how many objects will this return", so in the

      user {friends {friends {friends {friends{id}} } } }
    
    case (which is the canonical example in FB's schema of a crazy query), we would note that there's a 5000 friend limit, and so that query would potentially query 5000^4 = 6.25e14 nodes, and based on that we would (hopefully) reject it.
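
A minimal sketch of the cost analysis suggested above, assuming the query has already been parsed into nested dicts. The `MAX_FANOUT` table, the `COST_LIMIT` threshold and the function names are hypothetical; only the 5000 friend limit comes from the comment.

```python
# Assumed worst-case number of objects each field can return.
# 5000 is the friend limit mentioned above; unlisted fields count as 1.
MAX_FANOUT = {"friends": 5000}

# Hypothetical threshold above which a query is rejected.
COST_LIMIT = 1_000_000


def estimated_nodes(selection):
    """Upper bound on the number of objects a selection could touch."""
    total = 0
    for field, children in selection.items():
        fanout = MAX_FANOUT.get(field, 1)
        total += fanout * max(1, estimated_nodes(children))
    return total


def check(selection):
    """Return the estimated cost, or raise if it exceeds COST_LIMIT."""
    cost = estimated_nodes(selection)
    if cost > COST_LIMIT:
        raise ValueError(f"query rejected: estimated {cost} nodes exceeds {COST_LIMIT}")
    return cost


# user {friends {friends {friends {friends {id}}}}}
crazy = {"user": {"friends": {"friends": {"friends": {"friends": {"id": {}}}}}}}

try:
    check(crazy)
except ValueError as err:
    print(err)
```

For the query above the estimate is 5000^4, matching the figure in the comment, so it is rejected before any data is fetched.
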
    • scrollaway 11 years ago

      This is a concern even for first-party apps, as you are not safe from a malevolent client. Or hell, your own client could have bugs which create some insanely expensive queries on, say, 1% of your devices: you didn't catch it in QA, and you end up pushing it to millions of devices for a nice DDoS.

    • devit 11 years ago

      Surely you know that anyone can access the binary code of your first-party applications (using a jailbroken iPhone and a rooted Android device), decompile it (using jd-gui, the Hex-Rays ARM decompiler, etc.) and arbitrarily use the APIs they expose, right?

  • lhorie 11 years ago

    The Youtube API has a concept of quotas, where requesting some types of information is more costly than others. I imagine you could implement a similar system that additively increases the cost of a query as it grows in complexity and fails if the quota is exceeded.

  • nstart 11 years ago

    It's the same here. Limit the number of results to 10 by default (for example) and allow the query to specify a number if it needs more. Then cap that number at a certain amount.
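
As a small sketch of that idea: the default of 10 is taken from the comment, while the cap of 100 and the function name are assumptions for illustration.

```python
DEFAULT_LIMIT = 10  # default page size suggested above
MAX_LIMIT = 100     # hypothetical server-side cap


def effective_limit(requested=None):
    """Clamp a client-requested result count to the server's bounds."""
    if requested is None:
        return DEFAULT_LIMIT
    return max(0, min(requested, MAX_LIMIT))
```

A resolver for a list field would call `effective_limit` on its count argument before touching the data store.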

    • devit 11 years ago

      There are more ways in GraphQL to create huge result sets than that though.

      For example, a query like "user {moviesWatchedByUser {usersWhoWatchedMovie {moviesWatchedByUser {usersWhoWatchedMovie ..." is allowed by GraphQL and will generate output with size exponential in the input size.

      You can also do "{a1: expensiveOperation, a2: expensiveOperation, a3: expensiveOperation, ..." and trigger expensiveOperation an arbitrary number of times (for each item in the list you apply that to).

      By using a sequence of fragments that include the next fragment more than once, it looks like you can trigger expensiveOperation an exponential number of times.

      It's not clear if there is a way to prevent all this without severely impacting usability (by warping the schema design and adding GraphQL limits to handle this) or reliability (by enforcing hardcoded low resource usage limits).
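
A small sketch of the fragment concern above, under the assumption that each spread sits under a differently aliased field so the executor cannot merge them. The fragment names and the table format are hypothetical; the code only counts how many times expensiveOperation would run, which is something a server could check before executing.

```python
from functools import lru_cache

# Hypothetical fragment table: each fragment lists what it spreads,
# one entry per aliased field. Each level includes the next one twice.
FRAGMENTS = {
    "F0": ("F1", "F1"),
    "F1": ("F2", "F2"),
    "F2": ("expensiveOperation", "expensiveOperation"),
}


@lru_cache(maxsize=None)
def count_calls(name):
    """Number of expensiveOperation calls triggered by spreading `name`."""
    if name == "expensiveOperation":
        return 1
    return sum(count_calls(child) for child in FRAGMENTS[name])


calls = count_calls("F0")
```

With three fragments the count is 2^3 = 8; each extra fragment doubles it while the query text grows by only one definition. Counting with memoization stays linear in the query size, so the check itself is cheap.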

      • jon-wood 11 years ago

        Personally I'd probably enforce this at the level of whatever services the GraphQL layer is calling, assuming you're using it as an aggregation layer for lower level services within your organisation.

        Otherwise, it should be possible to apply throttling to (for example) expensiveOperation in the same way that you would a RESTful API at the moment.

  • mandeepj 11 years ago

    You have to write functionality for this scenario. Similarly, you could also ask: how is SQL Server going to avoid DoS attacks?

  • thomasahle 11 years ago

    Perhaps you just analyze the queries and deny the ones that look too expensive?

rattray 11 years ago

I keep hearing people say things along the lines of, "Relay looks great, but I don't want to store my data in a graph". My impression is that GraphQL has nothing to do with graphs, really - it's a bit more like SchemaQL if anything. Could someone from the Facebook team clarify about the name?

  • dschaferOP 11 years ago

    Yep, GraphQL is agnostic as to how your data is stored. For example, https://github.com/graphql/swapi-graphql/ is a GraphQL schema that is backed by the swapi.co API. The examples at https://github.com/graphql/graphql-js/blob/master/src/__test... are backed by in-memory JSON objects. At Facebook, we have GraphQL types backed by data stored in a number of backends, including types backed by SQL tables.

    • iandanforth 11 years ago

      I'd like to read more about the backing data stores. If you're aggregating data across a lot of different stores, it seems you could easily add what looks like a tiny piece of data to your query, but, in truth, is much more expensive on the backend.

      • TheMakeA 11 years ago

        It's important to realize that you're essentially defining an API for your product. If something is expensive to access, you should either not expose it directly through your API, or add appropriate levels of caching to mitigate the costs.

  • nathancahill 11 years ago

    Even if your data isn't stored in a graph, it can be queried hierarchically, like a graph. Hence the name, GraphQL (as opposed to GraphDB).

  • forgotAgain 11 years ago

    It seems that the graph part is referring to how the input is evaluated one node at a time and applied to the output. The sub objects (child nodes) of the input are evaluated and then applied to the output. It's a graph in the sense that it's a tree being evaluated and built. I would think it's pretty optimal if you're using nosql stores that are cached heavily with good locality of data and you can send the request to the right server based on the first node of the input.

  • pluma 11 years ago

    It's not about graph databases but it is about graph data.

    But at the end of the day, most data can be considered graph data. It doesn't have to be represented as a formal graph.

Novex 11 years ago

Version Free sounds amazing - I can see the pros of being able to add/deprecate fields at the same API endpoint, but I find most of the reason we version our API is for field type changes as the data schema naturally evolves. We need to keep track of finer grained data in existing fields that wasn't originally thought of.

To take the Star Wars example at https://github.com/facebook/graphql and build on it. Let's say after this is deployed we need to expose a Planet's population as well. Now homePlanet goes from being a String to a Planet { name population } object.

This type change would break existing clients - the only real solution I can think of is introducing the planet object as PlanetDetails (essentially PlanetV2) and deprecating homePlanet, but that's just back to versioning.

I feel like there must be a better way to deal with it? Interestingly, the graphql format allows this to be differentiated (as the old API won't request an object), but there appears to be no provision to union two non-objects into a single field?

  • xxbondsxx 11 years ago

    Sure there are strong similarities to versioning, but I think the difference is in how callers get migrated. In GraphQL you can update each callsite incrementally. Imagine half the app has been converted to calling PlanetDetails {name} and the other half still calls homePlanet. That's totally fine -- the app will totally work, compile, run, everything.

    Whereas contrasting with REST versioning or traditional versioning, it becomes quite difficult to mix API versions internally (each callsite needs to specify their desired api version before specifying fields) or impossible outright. If the latter, you're then forced to migrate all at once from a given version to another, which requires a ton of coordination across teams and big scary "flip the switch" moments.

    This gets worse as you scale up your org, which is why GraphQL has served FB well.

  • Jweb_Guru 11 years ago

    The most interesting work on this subject that I've read is probably FQL's approach: http://categoricaldata.net/fql/tutorial.pdf#subsection.2.3

    It's totally unclear to me that GraphQL solves data migration problems.

  • dschaferOP 11 years ago

    Yeah, introducing a new field and deprecating the old one seems like the best option here. The nice thing is that while this introduces new functionality, there's still only one version of the server; if you query for `homePlanet`, you always get the name, if you query for `homePlanetDetails`, you always get the planet object. This is particularly useful for tooling, since the API response is a function only of the access token and the query.
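
A rough sketch of that approach, with one resolver table serving both the old and the new field. The record, its values and the helper names are made up for illustration; only the field names `homePlanet` and `homePlanetDetails` come from the comment.

```python
# Hypothetical in-memory record; the values are placeholders.
LUKE = {
    "name": "Luke Skywalker",
    "homePlanet": {"name": "Tatooine", "population": 200000},
}


def home_planet(character):
    """Deprecated field: still returns just the planet name."""
    return character["homePlanet"]["name"]


def home_planet_details(character):
    """New field: returns the planet object."""
    return dict(character["homePlanet"])


# One server version exposes both fields side by side.
RESOLVERS = {
    "name": lambda character: character["name"],
    "homePlanet": home_planet,
    "homePlanetDetails": home_planet_details,
}


def resolve(character, fields):
    """The response depends only on which fields the query asked for."""
    return {field: RESOLVERS[field](character) for field in fields}


old_client = resolve(LUKE, ["name", "homePlanet"])
new_client = resolve(LUKE, ["name", "homePlanetDetails"])
```

Old and new call sites can coexist in the same app, which is the incremental migration described earlier in this thread.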

bobbylox 11 years ago

Why invent syntax when you can specify the queries in JSON, too? MQL did it, and it was amazing: http://wiki.freebase.com/images/e/e0/MQLcheatsheet-081208.pd...

  • leebyron 11 years ago

    The GraphQL language syntax is useful because it naturally expresses more patterns than JSON can. However, GraphQL is eventually parsed into an AST, which is represented as JSON, so it's always possible to write that JSON directly.

    Ultimately syntax is useful to express domain-specific concepts in a terse way.

  • dwiel 11 years ago

    I still dream of the day that freebase's graphd or something like it is released. That was a really nice eco-system.

Geee 11 years ago

How about optimistic over-fetching? On Facebook it's very annoying to wait for earlier comments to load when they are fetched about 10 at a time and there's a 5-second latency on every request. Why not fetch everything with a single request (even when they aren't displayed in the current view)?

  • catshirt 11 years ago

    responsibility of the client, no? to dictate how many results to return.

    • Geee 11 years ago

      Ugh.. My comment was meant for the Relay thread, but anyway, it sounds like Relay/GraphQL is designed to not fetch anything that isn't actually rendered. In my own SPA apps, I very frequently over-fetch to reduce latency. I was hoping there would be a nice compatible way to manage this use case.

      I'm not too familiar with how this works, but maybe it's enough to query the additional data in a parent component or somewhere and it just works.

      • TheMakeA 11 years ago

        To expand on lgas' answer, you can certainly over fetch with Relay/GraphQL if your product needs to.

        The intent is to prevent accidental/unintentional over or under fetching that leads to bugs or poor experiences.

      • lgas 11 years ago

        It fetches whatever you ask for.

jaked89 11 years ago

What about inequality operators (>, >=, <, <=)?

What about complex predicates (and, or)?

The language seems rather limited.

  • dschaferOP 11 years ago

    When building out a GraphQL schema, the schema developer chooses which functionality to expose to the client. So rather than having the client do operations or predicates directly, the server declares what functionality is available, and might expose functionality that ordinarily would have used operators or predicates.

    For example, we might have the following query on Facebook's GraphQL schema:

      {
        user(id: 4) {
          followers(isViewerFriend: true, birthdaysInRange: {before: -2, after: 2} orderBy:NAME) {
            name
          }
        }
      }
    
    EDIT: fix code formatting

    Which fetches Zuck's followers, and filters it to only my friends, and only those friends whose birthdays are within two days of today, and then orders them by name.

    The `isViewerFriend`, `birthdaysInRange` and `orderBy` parameters were explicitly added to the API by the API developer for clients to use.

    So clients don't have the ability to do arbitrary operators, but we also know that the client is only using functionality in the API that the API developer chose explicitly to allow.
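
A minimal sketch of what such a server-declared field might look like on the resolver side. The data, the `followers` function and the `OrderBy` enum are hypothetical stand-ins, and `birthdaysInRange` is left out to keep the example short.

```python
from enum import Enum


class OrderBy(Enum):
    """Orderings the API developer chose to expose."""

    NAME = "name"


# Hypothetical follower records.
FOLLOWERS = [
    {"name": "Dana", "is_viewer_friend": True},
    {"name": "Alex", "is_viewer_friend": True},
    {"name": "Chris", "is_viewer_friend": False},
]


def followers(is_viewer_friend=None, order_by=None):
    """Resolver with explicit arguments instead of arbitrary predicates."""
    rows = FOLLOWERS
    if is_viewer_friend is not None:
        rows = [r for r in rows if r["is_viewer_friend"] == is_viewer_friend]
    if order_by is OrderBy.NAME:
        rows = sorted(rows, key=lambda r: r["name"])
    return [{"name": r["name"]} for r in rows]


result = followers(is_viewer_friend=True, order_by=OrderBy.NAME)
```

The client can only combine the arguments the function accepts, so every filter or ordering it uses maps to code the server author wrote on purpose.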

    • 15155 11 years ago

      Any idea when this will work with Relay?

      • masterj 11 years ago

        This is how Relay currently works

        • 15155 11 years ago

          You can't presently pass other filtering arguments to a relay query. Just IDs.

          Unless I am missing something, the above query wouldn't work in Relay.

          • TheMakeA 11 years ago

            You totally can. One restriction (that you might be confused with) is that Relay currently only supports root fields with a single argument (which would typically be an ID).

            But arbitrary arguments on any other field are totally supported.

thomasahle 11 years ago

Are there still problems with GET requests not allowing very much data in the request? Will I have to send this as a POST, or is that all 200x?

grandalf 11 years ago

This looks cool but I've grown to really like Cypher. Anyone know if graphql is similarly expressive?

  • thomasahle 11 years ago

    Cypher looks a lot like a form of SQL? At least if you are referring to this one: http://neo4j.com/developer/cypher-query-language/ It seems an entirely different beast, not something you'd want clients to have access to in a public API?

  • Zaheer 11 years ago

    GraphQL is more akin to a REST pattern and has nothing to do with the actual datastore.

    • grandalf 11 years ago

      True, but doesn't Facebook use it with a graph DB? The convention is to replace REST with something that maps naturally to data stored in a graph... sort of a structural match query that expects certain kinds of nodes/relationships (unless you manually map it to a relational or document DB).

aaroninsf 11 years ago

Why mirror the structure, but not syntactic details, of JSON (or YML) in the pretty printing?

It's true that we've passed peak colon as a culture, and need to start reimagining life in a post-colonial way, but...

  • dschaferOP 11 years ago

    GraphQL queries are hierarchical so that the response mirrors the structure of the query. We found that there were needs of the query (query parameters and directives, for example) that didn't feel ideally represented in JSON, which is why we have a different syntax.

    We've got a reference lexer and parser in JS at https://github.com/graphql/graphql-js/tree/master/src/langua..., and we have a parser in C++ with C and C++ APIs (that can be used to build a parser for other languages) at https://github.com/graphql/libgraphqlparser.

    • TheMakeA 11 years ago

      Anecdote: when Relay/GraphQL were first announced, we tried to get Relay without GraphQL by writing JSON. This had some advantages (a query was a valid/renderable response!)

      ...but we had some really ugly JSON. It was worth it for us to get the readable syntax to just use GraphQL.

      With client tooling like GraphiQL and editor plugins, it should just get better.

muruke 11 years ago

I started working on something very similar for .NET and EntityFramework 7 based on some other ideas I've implemented over the years.

https://github.com/lukemurray/EntityQueryLanguage

Super early days as I haven't had too much time on it, and now GraphQL has specs etc. I might support more of its syntax.

I actually build .NET expressions so you can execute things against any LINQ provider - in-memory, Entity Framework, or some other ORM

hokkos 11 years ago

Which databases have implementations? It seems that for now most of the implementations are in-memory. The direct Postgres SQL one seems dead, and there is only a JavaScript Mongo/Mongoose one, it seems.

orclev 11 years ago

This works fine for read-only use (i.e., a query interface), but what about non-idempotent endpoints? It seems like you'd need to provide BOTH a REST API and a GraphQL API, GraphQL for doing queries, and REST for everything else. Admittedly the query endpoints tend to be the ugliest ones in a REST API, but I'm not sure that ugliness justifies the extra complexity and overhead adopting something like GraphQL would entail (not to mention potential performance and DoS issues others have brought up).

underyx 11 years ago

Pretty off topic, sorry, but I found it pretty interesting that all of the 9 comments posted here so far have a question in them.

graffitici 11 years ago

Wasn't the specification for GraphQL already released? Are these just the official announcements, after the technical previews?

foo42 11 years ago

This reminds me a little of pattern matching (in my case in Elixir): I specify an (arbitrarily nested) structure with the keys I want and the variables I want those values unpacked into, and the pattern match binds the data to them. GraphQL feels almost like letting me do that from the outside to the inside of my service(s).

foxhedgehog 11 years ago

I wonder how access controls will work in GraphQL/Relay.

  • dschaferOP 11 years ago

    The GraphQL API acts as a layer atop application code; it assumes that the application code takes care of any access controls (since those access controls would apply to anyone querying that data, not just GraphQL). So there's nothing for access control built-in to GraphQL, but GraphQL can map to arbitrary access controls that exist in the application layer.

    The GraphQL server can pass down authentication information through the query using `rootValue` (for example, it might pass the OAuth access token that the client provided in the request), which the mapping from GraphQL-to-application-code can pass to the application code's access controls.
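
A rough sketch of that layering, with the access rule living in application code and the GraphQL-side resolver only forwarding the credentials. The token table, the email data, the rule itself and all function names are assumptions; the root value is modelled here as a plain dict.

```python
# Hypothetical application-layer data.
TOKENS = {"token-alice": "alice"}
EMAILS = {"alice": "alice@example.com", "bob": "bob@example.com"}


def app_get_email(viewer, user_id):
    """Application code: enforces access control for every caller."""
    if viewer != user_id:
        raise PermissionError("viewer may not read this email")
    return EMAILS[user_id]


def resolve_email(root_value, user_id):
    """GraphQL-to-application mapping: forwards auth info, adds no policy."""
    viewer = TOKENS.get(root_value["access_token"])
    return app_get_email(viewer, user_id)


own_email = resolve_email({"access_token": "token-alice"}, "alice")
```

Because the check sits in `app_get_email`, a REST endpoint or a background job calling the same function would be subject to the same rule.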

Liron 11 years ago

How does Facebook "subscribe" to their GraphQL queries so that everything in the UI updates in realtime?

Amrinder12345 11 years ago

Good

dreamdu5t 11 years ago

GraphQL is underwhelming and half-baked. Am I missing something? There's nothing stopping people from creating endpoints that serve exactly what the client needs, and you have to implement everything in your application logic anyway to make graphQL work.

Implementing API-specific functions in the query? That looks an awful lot like adhoc RPC endpoints, because the functionality is application-specific.

I just don't get the hype. Poor documentation, no robust reference implementation... nothing novel about the query language itself. Function calls in queries are pretty much just RPC (even if you say "declarative" a lot). And if it's about graph data it's incredibly limited compared to SPARQL.

  • andrewingram 11 years ago

    One of the key problems GraphQL aims to solve is the very idea you present in your first paragraph.

    The proliferation of ad-hoc REST endpoints is a serious problem. Standards like JSON API (http://jsonapi.org/) do a good job of mitigating the issue (and are probably a good first step for preparing your API for being encapsulated within a GraphQL server), but they are just a band-aid around the problem.

    We've all concluded that mobile apps (for now) should only be making one API request to get all data needed (if possible), so we end up constructing custom end-points per use-case that return exactly what's needed. Then a new version of the app comes along with slightly different data requirements, so we make a new end-point. We end up in a really bad place. Developer discipline is part of the problem, but the fact is that we're probably using the wrong technology.

    SOAP was awful, REST is much nicer. But GraphQL is a dream. It genuinely feels like every time I add a new piece of data to my schema, my productivity accelerates. A well-considered, well-maintained GraphQL server is a significant improvement over REST endpoints. GraphQL may not be the ultimate solution here, but it solves a lot of pain points I've been having for the last few years.

yzh 11 years ago

Shameless self-promotion: Gunrock, a high-performance GPU graph processing library: http://gunrock.github.io/gunrock/

  • mrinterweb 11 years ago

    I'm confused. What does your library have to do with GraphQL?

    • yzh 11 years ago

      We are trying to enable graph queries in our library too. Currently working on subgraph matching, but might expand to more general graph queries. I think in general, graphs with rich attributes are quite difficult to deal with on the GPU. Thanks for your interest.
