commitspark/graphql-api: GraphQL API to manage structured data in a Git repository

Introduction

Commitspark is a set of tools to manage structured data with Git through a GraphQL API.

This library provides the GraphQL API that allows reading and writing structured data (entries) from and to a Git repository.

Queries and mutations offered by the API are determined by a standard GraphQL type definition file (schema) inside the Git repository.

Entries (data) are stored using plain YAML text files in the same Git repository. No other data store is needed.

Installation

There are two common ways to use this library:

  1. By making GraphQL calls directly to the library as a code dependency in your own JavaScript / TypeScript / Node.js application.

    To do this, install the library with:

    npm i @commitspark/graphql-api

  2. By making GraphQL calls over HTTP to this library wrapped in a web server or Lambda function.

    You can find an example Node.js Express web server implementation here.

Installing Git provider support

This library is agnostic to where a Git repository is stored and relies on separate adapters for repository access. To access a Git repository, use one of the pre-built adapters listed below or build your own using the interfaces in this repository.

| Adapter | Description | Install with |
|---|---|---|
| GitHub | Provides support for Git repositories hosted on github.com | npm i @commitspark/git-adapter-github |
| GitLab (SaaS) | Provides support for Git repositories hosted on gitlab.com | npm i @commitspark/git-adapter-gitlab |
| Filesystem | Provides read-only access to files on the filesystem level | npm i @commitspark/git-adapter-filesystem |

Building your GraphQL API

Commitspark builds a GraphQL data management API with create, read, update, and delete (CRUD) functionality that is solely driven by data types you define in a standard GraphQL schema file in your Git repository.

Commitspark achieves this by extending the types in your schema file at runtime with queries, mutations, and additional helper types.

Let's assume you want to manage information about rocket flights and have already defined the following simple GraphQL schema in your Git repository:

# commitspark/schema/schema.graphql

directive @Entry on OBJECT

type RocketFlight @Entry {
    id: ID!
    vehicleName: String!
    payloads: [Payload!]
}

type Payload {
    weight: Int!
}

At runtime, when you send a GraphQL request to Commitspark, it adds the following queries, mutations, and helper types to your schema for the duration of request execution:

schema {
    query: Query
    mutation: Mutation
}

type Query {
    everyRocketFlight: [RocketFlight!]
    RocketFlight(id: ID!): RocketFlight
    _typeName(id: ID!): String
}

type Mutation {
    createRocketFlight(id: ID!, data: RocketFlightInput!, commitMessage: String): RocketFlight
    updateRocketFlight(id: ID!, data: RocketFlightInput!, commitMessage: String): RocketFlight
    deleteRocketFlight(id: ID!, commitMessage: String): ID
}

input RocketFlightInput {
    vehicleName: String!
    payloads: [PayloadInput!]
}

input PayloadInput {
    weight: Int!
}
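
The naming of the generated operations can be sketched in a few lines of TypeScript. This is only an illustration that assumes the pattern visible for `RocketFlight` above carries over to other `@Entry` types; `generatedOperationsFor` and its field names are hypothetical helpers and not part of the library.

```typescript
// Names of the operations Commitspark is assumed to generate for one @Entry type.
interface GeneratedOperations {
  everyQuery: string
  byIdQuery: string
  createMutation: string
  updateMutation: string
  deleteMutation: string
  inputType: string
}

// Hypothetical helper: derives operation names from the pattern shown
// for RocketFlight in the extended schema above.
function generatedOperationsFor(typeName: string): GeneratedOperations {
  return {
    everyQuery: `every${typeName}`,
    byIdQuery: typeName,
    createMutation: `create${typeName}`,
    updateMutation: `update${typeName}`,
    deleteMutation: `delete${typeName}`,
    inputType: `${typeName}Input`,
  }
}

const rocketFlightOperations = generatedOperationsFor('RocketFlight')
console.log(rocketFlightOperations)
```

Note that input types are also generated for types without `@Entry` (such as `PayloadInput` above), whereas queries and mutations appear only for the annotated `RocketFlight`.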

Making GraphQL calls

Let's now assume your repository is located on GitHub and you want to query for a single rocket flight.

The code to do so could look like this:

import { createAdapter } from '@commitspark/git-adapter-github'
import { createClient } from '@commitspark/graphql-api'

const gitHubAdapter = createAdapter({
    repositoryOwner: process.env.GITHUB_REPOSITORY_OWNER,
    repositoryName: process.env.GITHUB_REPOSITORY_NAME,
    accessToken: process.env.GITHUB_ACCESS_TOKEN,
})

const client = await createClient(gitHubAdapter)

const response = await client.postGraphQL(
    process.env.GIT_BRANCH ?? 'main',
    {
        query: `query ($rocketFlightId: ID!) {
          rocketFlight: RocketFlight(id: $rocketFlightId) {
            vehicleName
            payloads {
              weight
            }
          }
        }`,
        variables: {
            rocketFlightId: 'VA256',
        }
    },
)

const rocketFlight = response.data.rocketFlight
// ...

Technical documentation

createClient()

This function is used to create a Commitspark GraphQL API client instance.

The gitAdapter argument expects a Commitspark Git adapter instance, which the client then uses to access the Git repository.

Client

postGraphQL()

This function is used to make GraphQL requests.

Request execution is handled by ApolloServer behind the scenes.

The request argument expects a conventional GraphQL query and supports query variables as well as introspection.

getSchema()

This function allows retrieving the GraphQL schema extended by Commitspark as a string.

Compared to schema data obtained through GraphQL introspection, the schema returned by this function also includes directive declarations and annotations, allowing for development of additional tools that require this information.
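
As an illustration of the kind of tooling this enables, the sketch below scans a schema string for types annotated with `@Entry`. It assumes the string has the same shape as the schema files shown in this document. `findEntryTypeNames` is a hypothetical helper, and the regular expression is a simplification: a real tool should use a proper GraphQL parser.

```typescript
// Hypothetical helper: returns the names of all object types that carry
// the @Entry directive in a schema given as SDL text.
function findEntryTypeNames(schemaSdl: string): string[] {
  // Matches `type Name ... @Entry ... {` without crossing an opening brace.
  const pattern = /\btype\s+([A-Za-z_][A-Za-z0-9_]*)\b[^{]*@Entry\b[^{]*\{/g
  const names: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(schemaSdl)) !== null) {
    const name = match[1]
    if (name !== undefined) {
      names.push(name)
    }
  }
  return names
}

// Sample input modeled on the schemas in this document.
const sampleSchema = `
directive @Entry on OBJECT

type RocketFlight @Entry {
    id: ID!
    operator: Operator
}

type Operator @Entry {
    id: ID!
    fullName: String!
}

type Payload {
    weight: Int!
}
`

const entryTypeNames = findEntryTypeNames(sampleSchema)
console.log(entryTypeNames)
```

This works only because directive annotations are present in the string; schema data obtained through introspection would not contain them.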

Picking from the Git tree

As Commitspark is Git-based, all GraphQL requests support traversing the Git commit tree by setting the ref argument in library calls to one of the following:

  • commit hash,
  • branch name, or
  • tag name (lightweight or annotated)

This enables great flexibility, e.g. to use branches in order to enable data (entry) development workflows, to retrieve a specific (historic) commit where it is guaranteed that entries are immutable, or to retrieve entries by tag such as one that marks the latest reviewed and approved version in a repository.
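
The following sketch shows the same query being issued against a branch, a tag, and a commit hash. It is self-contained for illustration only: `ClientLike` is a hypothetical structural stand-in for the client, modeled on the `postGraphQL(ref, request)` call shown earlier, and the tag name and commit hash are made-up placeholders.

```typescript
interface GraphQLRequestLike {
  query: string
  variables?: Record<string, unknown>
}

interface GraphQLResponseLike {
  data?: Record<string, unknown> | null
  errors?: unknown[]
}

// Hypothetical minimal shape of the client, not the library's own type.
interface ClientLike {
  postGraphQL(ref: string, request: GraphQLRequestLike): Promise<GraphQLResponseLike>
}

const FLIGHT_QUERY = `query ($id: ID!) {
  rocketFlight: RocketFlight(id: $id) {
    vehicleName
  }
}`

// Runs the same query against whichever ref the caller picks.
function fetchFlightAt(client: ClientLike, ref: string, id: string): Promise<GraphQLResponseLike> {
  return client.postGraphQL(ref, { query: FLIGHT_QUERY, variables: { id } })
}

// Stand-in client that only records which refs were requested.
const seenRefs: string[] = []
const recordingClient: ClientLike = {
  postGraphQL(ref, _request) {
    seenRefs.push(ref)
    return Promise.resolve({ data: { rocketFlight: null } })
  },
}

const exampleRefs = [
  'main', // branch name
  'approved', // tag name (placeholder)
  '0123456789abcdef0123456789abcdef01234567', // commit hash (placeholder)
]
for (const ref of exampleRefs) {
  void fetchFlightAt(recordingClient, ref, 'VA256')
}
```

In a real application, the stand-in would be replaced by the client returned from `createClient()`.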

Writing data

Mutation operations work on branch names only and (when successful) each append a new commit on HEAD in the given branch.

To guarantee deterministic results, mutations in calls with multiple mutations are processed sequentially (see the official GraphQL documentation for details).
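
A request containing two mutations might be built as sketched below, using the `createRocketFlight` and `updateRocketFlight` mutations from the rocket flight example. `buildCreateThenUpdateRequest` is a hypothetical helper and the commit messages are placeholders. Going by the description above, a successful run would be expected to add two commits to the branch, one per mutation, in the order written.

```typescript
interface RocketFlightInput {
  vehicleName: string
  payloads?: { weight: number }[]
}

interface MutationRequest {
  query: string
  variables: Record<string, unknown>
}

// Hypothetical helper: one GraphQL document with two mutations that are
// processed sequentially, so the update runs after the create.
function buildCreateThenUpdateRequest(
  id: string,
  initial: RocketFlightInput,
  revised: RocketFlightInput,
): MutationRequest {
  return {
    query: `mutation ($id: ID!, $initial: RocketFlightInput!, $revised: RocketFlightInput!) {
      created: createRocketFlight(id: $id, data: $initial, commitMessage: "Add flight") {
        id
      }
      updated: updateRocketFlight(id: $id, data: $revised, commitMessage: "Revise flight") {
        vehicleName
      }
    }`,
    variables: { id, initial, revised },
  }
}

const twoStepRequest = buildCreateThenUpdateRequest(
  'VA256',
  { vehicleName: 'Ariane 5' },
  { vehicleName: 'Ariane 5 ECA', payloads: [{ weight: 6000 }] },
)
console.log(twoStepRequest.query)
```

The resulting object has the same `{ query, variables }` shape as the request passed to `postGraphQL()` earlier and would be sent with a branch name as the ref.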

Data model

The data model (i.e. schema) is defined in a single GraphQL type definition text file using the GraphQL type system.

The schema file must be located at commitspark/schema/schema.graphql inside the Git repository (unless otherwise configured in your Git adapter).

Commitspark currently supports the following GraphQL types:

  • type
  • union
  • enum

Data entries

To denote which data is to be given a unique identity for referencing, Commitspark expects the type to be annotated with the @Entry directive:

directive @Entry on OBJECT # Important: You must declare this for your schema to be valid

type MyType @Entry {
    id: ID! # Important: Any type annotated with `@Entry` must have such a field
    # ...
}

Note: As a general guideline, you should only apply @Entry to data types that meet one of the following conditions:

  • You want to independently create and query instances of this type
  • You want to reference or link to an instance of such a type from multiple other entries

This keeps the number of entries low and performance up.

Entry storage

Entries, i.e. instances of data types annotated with @Entry, are stored as YAML text files (.yaml) in the folder commitspark/entries/ of the given Git repository (unless otherwise configured in your Git adapter).

The filename (excluding file extension) constitutes the entry ID.

Entry files have the following structure:

metadata:
  type: MyType # name of type as defined in your schema
  referencedBy: [ ] # array of entry IDs that hold a reference to this entry
data:
#   ... fields of the type as defined in your schema

Serialization / Deserialization

References

References to types annotated with @Entry are serialized using a sub-field id.

For example, consider this variation of our rocket flight schema above:

directive @Entry on OBJECT

type RocketFlight @Entry {
    id: ID!
    operator: Operator
}

type Operator @Entry {
    id: ID!
    fullName: String!
}

An entry YAML file for a RocketFlight with ID VA256 referencing an Operator with ID Arianespace will look like this:

# commitspark/entries/VA256.yaml
metadata:
  type: RocketFlight
  referencedBy: [ ]
data:
  operator:
    id: Arianespace

The YAML file of the referenced Operator with ID Arianespace will then look like this:

# commitspark/entries/Arianespace.yaml
metadata:
  type: Operator
  referencedBy:
    - VA256
data:
  fullName: Arianespace SA

When this data is deserialized, Commitspark transparently resolves references to other @Entry instances, allowing for retrieval of complex, linked data in a single query such as this one:

query {
    RocketFlight(id: "VA256") {
        id
        operator {
            fullName
        }
    }
}

This returns the following data:

{
  "id": "VA256",
  "operator": {
    "fullName": "Arianespace SA"
  }
}
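
Conceptually, resolving a reference means replacing the stored `id` sub-field with the data of the entry it points to. The sketch below illustrates this on in-memory copies of the two entries above. It is not the library's implementation: `referenceFieldsByType` is a hypothetical lookup that a real implementation would derive from the schema, and the recursion does not guard against cyclic references.

```typescript
type EntryData = Record<string, unknown>

interface StoredEntry {
  type: string
  referencedBy: string[]
  data: EntryData
}

// In-memory stand-ins for the two YAML files shown above.
const entriesById: Record<string, StoredEntry> = {
  VA256: {
    type: 'RocketFlight',
    referencedBy: [],
    data: { operator: { id: 'Arianespace' } },
  },
  Arianespace: {
    type: 'Operator',
    referencedBy: ['VA256'],
    data: { fullName: 'Arianespace SA' },
  },
}

// Assumption: which fields of each type hold references to other entries.
const referenceFieldsByType: Record<string, string[]> = {
  RocketFlight: ['operator'],
  Operator: [],
}

// Returns the entry's data with references replaced by the referenced entries.
function resolveEntry(id: string): EntryData {
  const entry = entriesById[id]
  if (entry === undefined) {
    throw new Error(`No entry with ID "${id}" exists.`)
  }
  const resolved: EntryData = { id, ...entry.data }
  for (const field of referenceFieldsByType[entry.type] ?? []) {
    const reference = entry.data[field] as { id: string } | null | undefined
    if (reference) {
      resolved[field] = resolveEntry(reference.id)
    }
  }
  return resolved
}

const resolvedFlight = resolveEntry('VA256')
console.log(JSON.stringify(resolvedFlight))
```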

Unions

In our rocket example, let's assume we want to store information about a rocket's stages. Assuming there are two different types of rocket motors for a rocket stage, a stage could be modeled as a GraphQL union type Stage, allowing different concrete types LiquidRocketMotor or SolidRocketMotor to be added to a rocket's stages list:

directive @Entry on OBJECT

type Rocket @Entry {
    id: ID!
    stages: [Stage!]!
}

union Stage =
    | LiquidRocketMotor
    | SolidRocketMotor

type LiquidRocketMotor {
    fuelTemperature: Int!
}

type SolidRocketMotor {
    fuelMass: Int!
}

During serialization, concrete type instances are represented through an additional nested level of data, using the concrete instance's type name as field name:

# commitspark/entries/VA256.yaml
metadata:
  type: Rocket
  referencedBy: [ ]
data:
  stages:
    - LiquidRocketMotor:
        fuelTemperature: 21
    - SolidRocketMotor:
        fuelMass: 200000

We can now query for a rocket and its stages like this:

query {
    Rocket(id: "VA256") {
        id
        stages {
            __typename
            ... on LiquidRocketMotor {
                fuelTemperature
            }
            ... on SolidRocketMotor {
                fuelMass
            }
        }
    }
}

This returns the following schema-conformant result data where the additional level of nesting has been transparently removed:

{
  "id": "VA256",
  "stages": [
    {
      "__typename": "LiquidRocketMotor",
      "fuelTemperature": 21
    },
    {
      "__typename": "SolidRocketMotor",
      "fuelMass": 200000
    }
  ]
}
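
The removal of the extra nesting level can be pictured as the small transformation below. This is an illustrative sketch rather than the library's code; `unwrapUnionValue` is a hypothetical helper that assumes each serialized union value has exactly one key, the concrete type name.

```typescript
// A serialized union value: one key (the concrete type name) wrapping the fields.
type SerializedUnionValue = Record<string, Record<string, unknown>>

// Hypothetical helper: flattens the wrapper into the fields plus __typename.
function unwrapUnionValue(serialized: SerializedUnionValue): Record<string, unknown> {
  const typeNames = Object.keys(serialized)
  if (typeNames.length !== 1) {
    throw new Error('Expected exactly one concrete type name per union value')
  }
  const typeName = typeNames[0] as string
  return { __typename: typeName, ...serialized[typeName] }
}

// The `stages` list as stored in the YAML file above.
const serializedStages: SerializedUnionValue[] = [
  { LiquidRocketMotor: { fuelTemperature: 21 } },
  { SolidRocketMotor: { fuelMass: 200000 } },
]

const unwrappedStages = serializedStages.map(unwrapUnionValue)
console.log(JSON.stringify(unwrappedStages))
```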

Error handling

Instead of throwing errors, this library catches known error cases and returns error information for GraphQL calls via the errors response field. The type of error is indicated in error field extensions.code, with additional information in error field extensions.commitspark (where available). This allows API callers to determine the cause of errors and take appropriate action.

Example GraphQL response with error:

{
  "errors": [
    {
      "message": "No entry with ID \"SOME_UNKNOWN_ID\" exists.",
      "extensions": {
        "code": "NOT_FOUND",
        "commitspark": {
          "argumentName": "id",
          "argumentValue": "SOME_UNKNOWN_ID"
        }
      }
    }
  ]
}

This library returns the following error codes, in addition to the error codes of Git adapters as documented here:

| Error code | Description |
|---|---|
| BAD_USER_INPUT | Invalid input data provided by the caller |
| NOT_FOUND | Requested resource (entry, type, etc.) does not exist |
| BAD_REPOSITORY_DATA | Data in the repository is malformed or invalid according to schema |
| BAD_SCHEMA | Schema definition is malformed or invalid |
| IN_USE | Entry cannot be deleted because it is referenced by other entries |
| INTERNAL_ERROR | Internal processing error |
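
A caller might branch on these codes roughly as sketched below. The interfaces are hypothetical and only mirror the fields visible in the example response above. `summarizeErrors` and its message prefixes are illustrative, not part of the library.

```typescript
// Shape of one error, following the example response above.
interface GraphQLErrorLike {
  message: string
  extensions?: {
    code?: string
    commitspark?: Record<string, unknown>
  }
}

interface ErrorResponseLike {
  data?: unknown
  errors?: GraphQLErrorLike[]
}

// Hypothetical helper: turns each error into a short, code-specific summary.
function summarizeErrors(response: ErrorResponseLike): string[] {
  return (response.errors ?? []).map((error) => {
    const code = error.extensions?.code ?? 'UNKNOWN'
    switch (code) {
      case 'NOT_FOUND':
        return `Not found: ${error.message}`
      case 'BAD_USER_INPUT':
        return `Invalid input: ${error.message}`
      case 'IN_USE':
        return `Still referenced: ${error.message}`
      default:
        return `${code}: ${error.message}`
    }
  })
}

// The example response from above.
const notFoundResponse: ErrorResponseLike = {
  errors: [
    {
      message: 'No entry with ID "SOME_UNKNOWN_ID" exists.',
      extensions: {
        code: 'NOT_FOUND',
        commitspark: { argumentName: 'id', argumentValue: 'SOME_UNKNOWN_ID' },
      },
    },
  ],
}

const errorSummaries = summarizeErrors(notFoundResponse)
console.log(errorSummaries)
```

Because known error cases are reported in the response rather than thrown, checking the `errors` field after each call is the natural place for this kind of handling.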

License

The code in this repository is licensed under the permissive ISC license (see LICENSE).