GitHub - sadreck/Butler: GitHub Workflows Insights

If you have 2,000 repositories in your organisation, Butler can help you to identify:

- All workflows & actions
- All 3rd party actions, including unpinned & unpinnable actions
- All reusable workflows
  - Active workflows referencing reusable workflows/actions from archived repos
- Usage of missing actions and/or references to invalid tags/branches
- All runners across workflows, including unsupported ones
- All organisation & repo secrets and variables
  - Secrets & variables usage
  - Usage of secrets: inherit across workflows
- Workflows and actions that have invalid yaml files

Samples

Sample reports are available for organisations such as GitHub, OpenAI, Docker, and AWS Labs (not mobile friendly).

Usage

GitHub Tokens

Permissions

Scope	Permission	Classic PAT	Fine-Grained Token	GitHub App
Repo	`Agent Secrets`		Optional	Optional
Repo	`Agent Variables`		Optional	Optional
Repo	`Contents`		Required	Required
Repo	`Dependabot Secrets`		Optional	Optional
Repo	`Secrets`		Optional	Optional
Repo	`Variables`		Optional	Optional
Org	`Agent Secrets`		Optional	Optional
Org	`Agent Variables`		Optional	Optional
Org	`Dependabot Secrets`		Optional	Optional
Org	`Secrets`		Optional	Optional
Org	`Variables`		Optional	Optional
N/A	`repo`	Required
N/A	`admin:org`	Optional

Installation

# Create virtual environment
python3 -m venv venv
. venv/bin/activate
pip3 install -r requirements.txt

Usage

By default, Butler reads the PAT from the GITHUB_TOKEN environment variable.

Default Environment Variable

export GITHUB_TOKEN=ghp_wpB...

Using a Different Variable

export MY_TOKEN=ghp_wpB...

# Pass name via --token
python butler.py [...] --token "MY_TOKEN"

Using Multiple GitHub Tokens

export GITHUB_TOKEN_1=ghp_aaa...
export GITHUB_TOKEN_2=ghp_aaa...
...
export GITHUB_TOKEN_N=ghp_aaa...

python butler.py [...] --token "GITHUB_TOKEN_*"
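
How Butler expands the wildcard internally is not documented here. The sketch below is only an illustration of the idea, assuming glob-style matching of environment variable names; `resolve_tokens` is a hypothetical helper and not part of Butler.

```python
import fnmatch
import os


def resolve_tokens(pattern, environ=None):
    """Return the values of all environment variables whose names match pattern.

    Hypothetical helper: assumes shell-style wildcards such as GITHUB_TOKEN_*.
    """
    environ = os.environ if environ is None else environ
    # Sort by variable name so the resulting order is deterministic.
    names = sorted(name for name in environ if fnmatch.fnmatchcase(name, pattern))
    # Skip variables that are set but empty.
    return [environ[name] for name in names if environ[name]]


if __name__ == "__main__":
    tokens = resolve_tokens("GITHUB_TOKEN_*")
    # Print only the count so that no secret ends up in the terminal history.
    print(f"{len(tokens)} token(s) matched")
```

A quick check like this before a long download can confirm that every exported token is visible to the process.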

Using a GitHub App

export GITHUB_APP_KEY=$(cat /path/to/gh-app-key.pem)

# Pass the variable name via --gh-app-key
python butler.py [...] --gh-app-key "GITHUB_APP_KEY" --gh-app-installation-id "1234567" --gh-app-client-id "Iv23liR6..."

Workflow Data Collection

The first step is to collect all workflows and actions from repositories.

--repo REPO           Target formatted as: org, org/name, or org/name@branch. To load targets from file use an absolute path or a path starting with ./
--workflow WORKFLOW   Download specific workflows, extension is optional
--database DATABASE   Path to SQLite database to create or connect to
--resume-next         Resume downloads on server errors
--all-branches        Download all branches, only works with --repo
--all-tags            Download all tags, only works with --repo
--include-forks       Include forked repos when --repo is an org
--include-archived    Include archived repos when --repo is an org
--all-repos           Download all repos, including archived and forks
--threads THREADS     Enable multithreading
--verbose, -v         Debug output
--very-verbose, -vv   Trace output
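
The --repo target formats listed above (org, org/name, org/name@branch, or a file path) can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not Butler's own parsing code: `Target`, `parse_target`, and `is_target_file` are hypothetical names, and the file check assumes POSIX-style paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    org: str
    repo: Optional[str] = None
    ref: Optional[str] = None


def is_target_file(value: str) -> bool:
    """True when the value looks like a targets file rather than a repo target."""
    # Assumption: "absolute path" means a POSIX path starting with "/".
    return value.startswith(("./", "/"))


def parse_target(value: str) -> Target:
    """Parse 'org', 'org/name' or 'org/name@branch' into a Target."""
    value = value.strip()
    # Split on the first "@" so branch names such as "release/1.0" stay intact.
    name, _, ref = value.partition("@")
    org, _, repo = name.partition("/")
    if not org:
        raise ValueError(f"missing organisation in {value!r}")
    if "/" in name and not repo:
        raise ValueError(f"missing repository name in {value!r}")
    if "/" in repo:
        raise ValueError(f"unexpected extra path segment in {value!r}")
    if "@" in value and not (repo and ref):
        raise ValueError(f"a branch requires the org/name@branch form: {value!r}")
    return Target(org, repo or None, ref or None)


if __name__ == "__main__":
    for example in ("microsoft", "microsoft/vscode", "microsoft/vscode@main"):
        print(parse_target(example))
```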

Download Entire Org

python butler.py download --repo "microsoft" --all-repos --threads 10 --very-verbose --database microsoft.db

Download Single Repo

python butler.py download --repo "microsoft/vscode" --very-verbose --database microsoft-vscode.db

Download All Tags/Branches for a Repo

python butler.py download --repo "microsoft/vscode" --very-verbose --database microsoft-vscode.db --all-branches --all-tags

Organisation & Repository Secret Collection

This feature is optional and requires additional permissions (see the table above). Ideally, use a GitHub App installed in the organisation.

--org ORG             Organisation to download secrets and variables for
--database DATABASE   Path to SQLite database to create or connect to
--resume-next         Resume downloads on server errors
--threads THREADS     Enable multithreading

Example

python butler.py secrets_and_vars --org "microsoft" --database ./data/microsoft.db --very-verbose --gh-app-key ...

Data Processing

Once all workflows are collected, they need to be processed.

--database DATABASE   Path to SQLite database to create or connect to
--threads THREADS     Enable multithreading
--verbose, -v         Debug output
--very-verbose, -vv   Trace output

Example

python butler.py process --database ./microsoft.db --threads 10 --very-verbose

Report Generation

Finally, generate a report to view the results.

--database DATABASE   Path to SQLite database to create or connect to
--repo REPO           Repo to generate report from
--output OUTPUT       Location to store output files
--config CONFIG       Configuration file (defaults to default_config.yaml)
--custom-query-path CUSTOM_QUERY_PATH
                    Path to custom query yaml files

Default Report

python butler.py report --database ./microsoft.db --output ./report --repo "github"
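
The three stages (download, process, report) run in sequence against the same database, so they can be chained from a small wrapper. The sketch below only assembles the commands shown in this README; `build_pipeline` and `run_pipeline` are hypothetical helpers, and passing the organisation name as the report's --repo value is an assumption.

```python
import shlex
import subprocess


def build_pipeline(org, database, output, threads=10):
    """Return the download, process and report commands as argv lists."""
    base = ["python", "butler.py"]
    return [
        base + ["download", "--repo", org, "--all-repos",
                "--threads", str(threads), "--database", database],
        base + ["process", "--database", database, "--threads", str(threads)],
        # Assumption: the report is generated for the same organisation.
        base + ["report", "--database", database, "--output", output, "--repo", org],
    ]


def run_pipeline(commands):
    """Run each stage in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in build_pipeline("microsoft", "./microsoft.db", "./report"):
        print(shlex.join(command))
```

Using argv lists rather than one shell string avoids quoting problems when a database path or organisation name contains unusual characters.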

Use Custom Configuration

By default, the configuration used for generating reports is ./src/commands/report/default_config.yaml. To use a custom version, pass the --config argument.

python butler.py report --database ./microsoft.db --output ./report --repo "github" --config ./custom-config.yaml

Custom Queries

Default queries are stored in ./src/commands/report/queries. To add your own, write custom query files and pass their directory with --custom-query-path.

python butler.py report --database ./microsoft.db --output ./report --repo "github" --custom-query-path ./my-queries

Writing Custom Queries

For the custom query reference click here

# Only v2.0 is supported.
version: '2.0'
# Name of query, will appear as the hyperlink/title in the report.
name: 'Usages of Workflows in Archived Repos'
# Short description, will appear under the hyperlink/title in the report.
description: 'Usage of archived workflows and actions from non-archived ones'
# CSV/HTML filename that results will be written to.
filename: 'archived-workflows-usage'
# Group under which these results will appear in the report, supported values are:
#   * actions
#   * hygiene
#   * runners
#   * secrets
#   * workflows
group: 'workflows'
# SQL query, filtering by the organisation the report is being generated for can use the :org placeholder.
sql: |
  # Filter by org.
  SELECT * FROM organisations WHERE id = :org;

  # Filter by trusted orgs.
  SELECT * FROM organisations WHERE id NOT IN (:org, $_TRUSTED_ORGS_$);

  # Filter by runners.
  SELECT * FROM job_data jd WHERE jd.value NOT IN ($_UNSUPPORTED_RUNNERS_$);
# The keys to columns are the names that are returned from the query.
columns:
  # Hide a column.
  org_name: hide

  # This column must be in the results, like "SELECT name AS repo_name FROM repositories"
  repo_name:
    # Table header label.
    label: 'Repository'
    # Result values will be URL links and use the value of whichever column the 'link' property points to.
    type: 'link'
    link: 'repo_url'
    # Filtering available for the column, available values are:
    #   * list: Display a list of all values and allow text searches.
    #   * list-no-search: Display a list of all values and disable text searches.
    filters:
      column_control_alias: 'list'
  archived:
    label: 'Archived'
    # Value alignment; a Bootstrap class, one of:
    #   * text-start (default)
    #   * text-center
    #   * text-end
    align: 'text-center'
    # Display an icon based on a 1 or 0 value.
    type: 'icon'
    format:
      # Bootstrap class when value is 1.
      style_true: 'text-warning'
      # Bootstrap class when value is 0.
      style_false: 'text-info'
  category:
    # Map raw query values to hardcoded ones; '_' is a catch-all/default value (when omitted, the raw column value is displayed).
    value_mapping:
      actions: 'Actions'
      agents: 'Agents'
      dependabot: 'Dependabot'
      _: 'Default'
    popup:
      # Values will be links that open a popup listing the values of the column the 'field' property points to.
      # Values must be comma-separated and will be displayed as a list.
      title: 'Repositories'
      field: 'selected_repos'
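
The constraints described in the comments above (supported version, group values, alignment classes, and the '_' catch-all in value_mapping) can be checked before running a report. The following is a rough sketch that works on an already parsed query definition; `validate_query` and `map_value` are hypothetical helpers, and treating the listed top-level keys as mandatory is an assumption rather than documented Butler behaviour.

```python
SUPPORTED_VERSION = "2.0"
SUPPORTED_GROUPS = {"actions", "hygiene", "runners", "secrets", "workflows"}
SUPPORTED_ALIGNMENTS = {"text-start", "text-center", "text-end"}
# Assumption: every top-level key from the sample is required.
REQUIRED_KEYS = ("version", "name", "description", "filename", "group", "sql")


def validate_query(query):
    """Return a list of problems found in a parsed custom query definition."""
    errors = [f"missing '{key}'" for key in REQUIRED_KEYS if not query.get(key)]
    if "version" in query and str(query["version"]) != SUPPORTED_VERSION:
        errors.append(f"unsupported version: {query['version']}")
    if "group" in query and query["group"] not in SUPPORTED_GROUPS:
        errors.append(f"unsupported group: {query['group']}")
    for column, options in (query.get("columns") or {}).items():
        # Columns may be plain strings such as 'hide'; only dicts carry options.
        if isinstance(options, dict):
            align = options.get("align", "text-start")
            if align not in SUPPORTED_ALIGNMENTS:
                errors.append(f"column '{column}': unsupported align: {align}")
    return errors


def map_value(raw, value_mapping=None):
    """Apply value_mapping to a raw value, honouring the '_' catch-all."""
    if not value_mapping:
        return raw
    if raw in value_mapping:
        return value_mapping[raw]
    # Without a '_' entry the raw column value is shown unchanged.
    return value_mapping.get("_", raw)


if __name__ == "__main__":
    example = {
        "version": "2.0",
        "name": "Usages of Workflows in Archived Repos",
        "description": "Usage of archived workflows and actions from non-archived ones",
        "filename": "archived-workflows-usage",
        "group": "workflows",
        "sql": "SELECT * FROM organisations WHERE id = :org;",
        "columns": {"org_name": "hide", "archived": {"align": "text-center"}},
    }
    print(validate_query(example))
    print(map_value("agents", {"agents": "Agents", "_": "Default"}))
```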