GitHub Next | Visualizing a Codebase

Imagine this: you’re looking at a new codebase, and you want to find the code for a specific function.

For example, in the create-react-app codebase, how quickly can you find a test for react-dev-utils?

facebook/create-react-app

That wasn’t terribly difficult, but it also probably took a bit of time and exploration. Can we do better?

Instead of the typical folders & files view, we can create a visual representation of the code. Below, I've visualized the same repository, but instead of a directory structure, each file and folder is drawn as a circle: the circle’s color shows the type of file, and the circle’s size represents the size of the file.

[Interactive diagram of the facebook/create-react-app repository. Top-level folders: test, tasks, packages, docusaurus. File types in the color legend: .css, .html, .js, .json, .md, .png, .scss, .sh, .svg, .ts, .tsx. Each dot is sized by file size.]
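The data behind a diagram like this can be sketched in a few lines. The following is a minimal illustration in Python, not the code behind the diagrams in this article: it walks a directory, records each file's extension (for color) and byte size (for circle size), and sums sizes up the folder tree. The function names `scan_tree`, `total_size`, and `count_types` are hypothetical, and skipping the `.git` folder is an assumption.

```python
import os
from collections import Counter


def scan_tree(root):
    """Describe `root` as nested dicts: folders carry children, files carry type and size."""
    node = {"name": os.path.basename(os.path.abspath(root)), "children": []}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.name == ".git":
            continue  # assumption: repository metadata is not part of the picture
        if entry.is_dir(follow_symlinks=False):
            node["children"].append(scan_tree(entry.path))
        elif entry.is_file(follow_symlinks=False):
            ext = os.path.splitext(entry.name)[1].lower() or "(none)"
            node["children"].append(
                {"name": entry.name, "type": ext, "size": entry.stat().st_size}
            )
    return node


def total_size(node):
    """A folder's circle encloses its children, so its size is the sum of theirs."""
    if "children" not in node:
        return node["size"]
    return sum(total_size(child) for child in node["children"])


def count_types(node, counts=None):
    """Count files per extension, e.g. to build the color legend."""
    counts = Counter() if counts is None else counts
    for child in node.get("children", []):
        if "children" in child:
            count_types(child, counts)
        else:
            counts[child["type"]] += 1
    return counts
```

Calling `scan_tree(".")` from the root of a checkout gives a tree that a circle-packing layout could consume, with `type` mapped to color and `size` mapped to radius.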

This visualization gives enough of a “fingerprint” that viewers can glance at it and see the structure of the codebase. When we look at several codebases side-by-side, we can see how much variety there is between them:

paperjs/paper.js

Folders and files within the paperjs repo

numpy/numpy

Folders and files within the numpy repo

deepmind/alphafold

Folders and files within the alphafold repo

metafizzy/zdog

Folders and files within the zdog repo

Once you’re familiar with the visual language, it becomes much easier to see similarities, differences, and patterns across codebases.

Explore for yourself!

Try it out for yourself! Check out your own repositories or ones you’re curious about.

You can also create a direct link to your own repository.

But this website isn’t part of our current workflow. How could we integrate this visualization so that it becomes familiar enough to supplement our daily work?

Integrate into your own projects

If we add the diagram to our README, we can see it every time we work on the codebase. This kind of regular viewing can make us familiar with the shape of our codebase, giving us a baseline to detect and understand large changes in structure.

To make this easy to integrate, I built a GitHub Action to generate a diagram, and update it every time the codebase changes.

To use it, you just need to:

  1. Create a new GitHub Action by adding a .yml file inside the .github/workflows directory. For example: .github/workflows/create-diagram.yml

  2. Add the actions/checkout and githubocto/repo-visualizer Actions to that file

  3. Add the diagram image to your README: ![Visualization of the codebase](./diagram.svg)

  4. Once you push, you can watch the Action run in the Actions tab of your repository. Within a minute, you should have a visualization of your codebase in your README. Watch it update whenever the code is updated!
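As a rough sketch of steps 1 and 2, the workflow file might look something like the following. The workflow name, job name, `main` branch, and the version refs after each `@` are assumptions rather than values from this article, and the `permissions` block assumes the Action commits the diagram back to the repository; check the githubocto/repo-visualizer README for the currently recommended version and options.

```yaml
# .github/workflows/create-diagram.yml
name: Create diagram

on:
  push:
    branches:
      - main # assumption: your default branch is named main

# Assumption: the Action writes the diagram file back to the repo,
# so the workflow token needs write access to repository contents.
permissions:
  contents: write

jobs:
  create_diagram:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Update diagram
        uses: githubocto/repo-visualizer@main
```

With no extra options, this sketch relies on the diagram being written to diagram.svg at the repository root, which is the path the README image in step 3 points to.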

You can see an example of this in action in the githubocto/repo-visualizer-demo repository. Read more and check out the code at githubocto/repo-visualizer.

Potential future directions

I timeboxed my exploration, but there are many ways to continue exploring this space. A few in particular stood out to us as useful.

What files are connected?

When developing within a repo, it’s important to know how data flows from one file to the next. What files are imported into others, and what files stand alone?

To find these connections, I scanned the contents of each file for import statements, then linked that file with the one it imports from. There are often too many connections happening at once, so I only show connections from & to a file on hover.
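To illustrate the idea, here is a small Python sketch of that kind of scan. It is an approximation under stated assumptions, not the implementation used for the diagrams: it only looks at relative JavaScript/TypeScript imports, matches them with a regular expression instead of a parser, and guesses file extensions from a short list. The names `find_links` and `links_touching` are hypothetical.

```python
import posixpath
import re

# Matches `import x from "./y"`, `import "./y"`, `export ... from "./y"`
# and `require("./y")`. A regex is a rough stand-in for a real parser.
IMPORT_RE = re.compile(
    r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)['"]([^'"]+)['"]"""
)

# Assumption: imports may omit these extensions or point at a folder's index.js.
SUFFIXES = ("", ".js", ".ts", ".tsx", "/index.js")


def find_links(files):
    """files maps repo-relative path -> source text.

    Returns sorted (importer, imported) pairs for imports that resolve
    to another file in `files`.
    """
    links = set()
    for path, text in files.items():
        folder = posixpath.dirname(path)
        for spec in IMPORT_RE.findall(text):
            if not spec.startswith("."):
                continue  # package import, not a file in this repo
            base = posixpath.normpath(posixpath.join(folder, spec))
            for suffix in SUFFIXES:
                if base + suffix in files:
                    links.add((path, base + suffix))
                    break
    return sorted(links)


def links_touching(path, links):
    """The connections to show when hovering over `path`."""
    return [link for link in links if path in link]
```

Filtering with `links_touching` mirrors the hover behavior: the full set of links is computed once, but only the ones that start or end at the hovered file are drawn.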

Let’s look at how a few React.js animation libraries are organized.

Where are changes made?

So far, I’ve only looked at file size and type, but there are many other metrics that can tell us about our codebases.

For example, where in the codebase are the most recent changes? This could be helpful for quickly getting up-to-date after a break, or to see which parts of the codebase are being neglected.

In the useHooks.ts codebase, we can see which hooks were most recently edited (useLocalStorage, useCounted, & useBoolean), and which parts haven’t changed recently (the favicon, legacy code, and the useScript hook).
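One way to gather this kind of "last changed" data is to read it from the Git history. The sketch below is an assumption about how that could be done, not a description of how the diagram above was built: it asks `git log` for each commit's timestamp and changed files, then keeps the newest timestamp per file. The `commit:` prefix and the function names are hypothetical choices made for this example.

```python
import subprocess


def read_git_log(repo_dir):
    """Return `git log` output: one `commit:<unix time>` line per commit,
    followed by the paths that commit touched."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--format=commit:%ct"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def last_changed(log_text):
    """Map each file path to the timestamp of the newest commit touching it."""
    latest = {}
    timestamp = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("commit:"):
            timestamp = int(line[len("commit:"):])
        elif line and timestamp is not None:
            latest[line] = max(latest.get(line, 0), timestamp)
    return latest


def most_recent(latest, n=3):
    """The n most recently edited paths, newest first."""
    return sorted(latest, key=latest.get, reverse=True)[:n]
```

For example, `most_recent(last_changed(read_git_log(".")))` lists the three files edited most recently in the current checkout. Each timestamp could then be mapped to a color scale instead of file type.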

Git log

Or, we could look at which files change most often. This could be helpful for finding the most important files to keep an eye on, or, at the other end of the scale, for finding stale code.

For example, in the d3-geo codebase, the README file is always being updated, as well as the index.js file that imports all of the projections.
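Change frequency can be estimated from the same history by counting how many commits touched each file. The following Python sketch shows one way to do that, under the assumption that counting commits per path is a reasonable proxy for "changes most often"; the `commit:` prefix and the function names are hypothetical, and renamed files are counted under each name separately.

```python
import subprocess
from collections import Counter


def read_git_log(repo_dir):
    """Return `git log` output: one `commit:<hash>` line per commit,
    followed by the paths that commit touched."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--format=commit:%H"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def change_counts(log_text):
    """Count how many commits touched each file path."""
    counts = Counter()
    in_commit = False
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("commit:"):
            in_commit = True
        elif line and in_commit:
            counts[line] += 1
    return counts
```

`change_counts(read_git_log(".")).most_common(5)` would then list the five most frequently changed files, while the paths with the lowest counts are candidates for stale code.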

How has a codebase changed?

Now that we know our way around this visualization, we can start looking at changes over time. How has the structure grown over time? Does the code get updated one section at a time, or all at once?

Feedback

This is really the tip of the iceberg! I’ve taken an initial peek into how visualizing codebases could be helpful for developers day-to-day. We would love to see other explorations or hear your thoughts. Tweet us at @GitHubNext or send us an email at next@github.com.

✌️ ❤️

GitHub Next
Developer Experience Team