Show HN: GitHub Action for repository traffic reporting

github.com

3 points by jgehrcke 3 years ago · 0 comments · 3 min read

Bonjour! I’d like to share a small project that I believe may be useful to some of you.

https://github.com/jgehrcke/github-repo-stats is a GitHub Action that I started building about 1.5 years ago, to overcome the 14-day limitation of the visitor/clone plots shown in GitHub's "Insights" tab.

Demo report: https://jgehrcke.github.io/ghrs-test/jgehrcke/covid-19-germa...

The project has some seemingly happy users, and I believe we managed to address some of the more embarrassing bugs. In other words, I believe this project is ready for a new wave of feedback, which is where you come in, hopefully :-).

When I researched the solution space back then, I found Actions that relied on pushing data to an external system (like S3). I thought it would be simpler to keep the time series data in the repository that the GitHub Action runs in. Also, I have opinions about data analysis and visualization, so I wanted to take full control of that. The goal from the start was to generate a self-contained report artifact on a daily basis (with good characteristics in terms of sharing and archiving), and to also persist that via git.

Quick technical description: this Action runs once per day. During each run, it

  - fetches traffic stats (visitor count etc., for the last 14 days) via GitHub’s HTTP API.
  - persists new data points (in the GitHub git repository that this Action runs in!).
  - generates a minimal HTML and PDF report, both of which are also persisted in the repo that this Action runs in.

The HTML/PDF documents can be exposed right away via GitHub pages. The PDF report contains vector graphics so you can zoom in much more than necessary (and if your boss needs a print then this might look good enough). The plots in the report are meant to show a long time frame. For example, here you can see for the last 1.5 years how the daily visitor count (and other metrics) evolved for jgehrcke/github-repo-stats: https://jgehrcke.github.io/ghrs-test/jgehrcke/github-repo-st...
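
To make the per-run steps above a bit more concrete, here is a rough sketch of the core idea: each run only sees a 14-day window, so the window has to be merged into the time series that is already stored. This is not the Action's actual code. The function names (views_url, parse_views, merge, to_csv) and the CSV layout are made up for illustration; the endpoint is GitHub's documented repository traffic "views" endpoint. The sketch also assumes that a newer snapshot should win for overlapping days, on the theory that the most recent day may still be incomplete when it is first fetched.

```python
from datetime import date, datetime

API_ROOT = "https://api.github.com"


def views_url(owner, repo):
    """URL of GitHub's traffic 'views' endpoint (daily buckets, last 14 days)."""
    return f"{API_ROOT}/repos/{owner}/{repo}/traffic/views"


def parse_views(payload):
    """Turn a decoded API response into {day: {"count": n, "uniques": m}}."""
    window = {}
    for item in payload.get("views", []):
        # Timestamps look like "2021-01-02T00:00:00Z".
        day = datetime.strptime(item["timestamp"], "%Y-%m-%dT%H:%M:%SZ").date()
        window[day] = {"count": item["count"], "uniques": item["uniques"]}
    return window


def merge(stored, fresh):
    """Merge a fresh 14-day window into the stored series.

    Assumption (hypothetical policy): for overlapping days the fresh
    value replaces the stored one. Days outside the window are kept.
    """
    merged = dict(stored)
    merged.update(fresh)
    return dict(sorted(merged.items()))


def to_csv(series):
    """Serialize the series to CSV text that could be committed via git."""
    lines = ["date,count,uniques"]
    for day, values in series.items():
        lines.append(f"{day.isoformat()},{values['count']},{values['uniques']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hand-written sample data, not real traffic numbers.
    stored = {
        date(2021, 1, 1): {"count": 10, "uniques": 4},
        date(2021, 1, 2): {"count": 3, "uniques": 2},
    }
    payload = {
        "views": [
            {"timestamp": "2021-01-02T00:00:00Z", "count": 7, "uniques": 5},
            {"timestamp": "2021-01-03T00:00:00Z", "count": 1, "uniques": 1},
        ]
    }
    print(views_url("jgehrcke", "github-repo-stats"))
    print(to_csv(merge(stored, parse_views(payload))), end="")
```

The actual HTTP request (with an API token) and the git commit are left out on purpose. The point is only that day-keyed data points make the daily runs idempotent: running twice on the same day does not duplicate anything.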

What I find really cool is that there are users who run this Action in a repository dedicated to reporting, for many target repositories. How to do that is described in the main README in the section 'Tracking multiple repositories ...'.

Interesting things I learned about through this project:

  - Vega/Altair for plotting
  - https://github.com/bats-core/bats-core for CLI testing
  - GitHub Actions: how to build them, how to do release management. I also learned that testing a GitHub Action is hard, especially given the heterogeneous set of environments that it might run in.

If you want to try this out then, besides the main README, a good place to start is this tutorial: https://github.com/jgehrcke/github-repo-stats/wiki/Tutorial

All of this was built rather pragmatically, and I want to keep it that way. There is a plethora of smaller and bigger things across all parts that are generally worth improving (docs, plots, architecture, cleanup), but before deciding what to touch next I'd love to hear your ideas. Thanks!
