Ask HN: How do you discover features in unknown code bases?

26 points by hvasilev 3 years ago · 25 comments


I'm realizing that one of the reasons why I don't do a lot of additional hobby programming is that I'm missing a fundamental skill set that I never developed (in a reliable way) over the years. I think I don't know how to discover features that I'm interested in, in code bases that I'm unfamiliar with. Example: In Chromium I want to find the algorithm that is building the DOM. I'm not sure if that is even part of the code base (https://github.com/chromium/chromium). How would you personally approach this problem?

lukasgraf 3 years ago

Version control can be a big help.

Look through closed tickets in the issue tracker, and try to find a change (bugfix or new feature) that must have, given its nature, touched the functionality you're looking for. Then try to find the changeset(s) where that ticket's change was implemented.

With some luck, the changeset will include a modification to the part of the codebase you've been looking for.
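One way to automate the changeset hunt described above is to ask git which files were touched by commits whose messages mention a keyword, then rank those files by how often they appear. The following is a minimal sketch, not a definitive tool: the function names and the example keyword are hypothetical, and it assumes `git` is on the PATH and the repository is already cloned.

```python
import subprocess
from collections import Counter


def rank_touched_files(log_output: str) -> list[tuple[str, int]]:
    """Count how often each path appears in `git log --name-only` output."""
    counts = Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )
    # Most frequently touched files first; ties keep first-seen order.
    return counts.most_common()


def files_for_keyword(repo: str, keyword: str) -> list[tuple[str, int]]:
    """Rank files touched by commits whose message mentions `keyword`."""
    # --pretty=format: suppresses commit headers, leaving only file paths.
    result = subprocess.run(
        ["git", "-C", repo, "log", "-i", f"--grep={keyword}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return rank_touched_files(result.stdout)
```

Calling something like `files_for_keyword(".", "parser")` (the keyword is only an illustration) returns the paths that keep showing up in matching commits, which are good candidates for where the functionality lives.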

sillysaurusx 3 years ago

I would search WebKit, not chromium. But keep in mind that there’s a difference between “unknown code bases” and “one of the largest and most complex code bases ever created.” You’re asking, essentially, “in the Windows source code, where is the window layout algorithm?”

Stuff like that is certainly possible to find. But it requires a lot of time and dedication.

I would personally search for job listings for Chromium / Firefox, then use that knowledge to find someone who works on it. Then I’d ask them where it is.

But only in this specific case. My normal workflow is to build whatever it is I’m looking at, then change things until it breaks. It’s pretty quick to narrow down what I’m looking for at that point.

That doesn’t work here because building chrome requires close to a supercomputer.

EDIT: Actually, I would try to find a crash log related to the DOM. The stack trace will point you precisely to the code you’re interested in. Doing that is easier said than done, but I’ve pulled that trick a couple of times, so it seems worth mentioning.

  • hvasilevOP 3 years ago

    How did you know it was in WebKit or Blink? What was your thought process that led you to this conclusion? Was it prior knowledge?

    • sillysaurusx 3 years ago

      I tried to think of a less-unsatisfying answer, but it was just prior knowledge. All the major browsers use WebKit or a fork of WebKit, and WebKit had to be the thing that lays out the DOM since it’s not chrome-specific. (If you were looking for something specific to chrome, it’d probably be a lot harder to find. Or at least for me to find.)

      The other comments are good too: try searching the metadata for the code, like git logs, pull requests, etc.

      I suppose a new age solution might also be to type “// The DOM layout algorithm:” into Copilot and pray.

  • charcircuit 3 years ago

    >I would search WebKit, not chromium

    Why? Blink is a fork of part of WebKit so it would likely even be in the same file.

    • sillysaurusx 3 years ago

      Smaller surface area. More likely to find tests related to the DOM being built. And as you say, once you find it in WebKit, you can probably find it in chromium.

tacostakohashi 3 years ago

As well as poking around in the source code, do not discount non-source based approaches.

For example:

* Run chromium using strace, ltrace, gdb to see what's going on at runtime.

* Do some experiments / reverse engineering, treating the application / source code as a black box. Try different HTML input, inspect the DOM in chrome, possibly automate this process via selenium or something, and discover the runtime behavior of the algorithm that way.

The thing to keep in mind is that, for all you know, the DOM building algorithm is split across thousands of source files, or is in fact in some dependency and not in chromium itself, or is split across both. Presumably there is some particular aspect of the DOM building that you are interested in, so experiment with how that works, instead of trying to find / understand the entirety of chromium DOM building.

eurasiantiger 3 years ago

Clone the repository, open it and let my IDE build an index of the codebase, then use ctrl-p and/or "grep -r keyword ."
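For anyone who wants the recursive search in script form, here is a rough Python sketch of what "grep -r keyword ." does. The function name `grep_tree` is made up for illustration, and unlike plain grep this version skips the `.git` directory, which is an assumption about what is usually wanted.

```python
import os


def grep_tree(root: str, keyword: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line containing `keyword`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata, which is large and rarely relevant.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files pass through without crashing.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if keyword in line:
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file; move on
    return hits
```

On a code base the size of Chromium a dedicated indexed search will be much faster; this only shows the idea.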

biggedyb 3 years ago

Personally, I've never needed to unpick a full-blown standalone app like Chromium before. I'm usually working with repo code or dropped into some hulking, spaghetti legacy app that needs a bugfix, where there's no one left at the company who has any idea how it works and the documentation is _lacking_. If this helps, then good; otherwise, oops, sorry for the noise.

Breakpoint methodology wins for me, simple and true.

I imagine it like pathfinding the Minotaur's maze: you stand at the last place you recognise and can get back to (if that's literally the first active line, then that's fine), and put something there (breakpoint, print statement, log line), run it and check you still know where you are. Then put another down as far forward from that point as you can 'see'; if that's literally one operation step, then fine, spin it and check. Breakpoints are easily put down and just as easily cleared back up again. Keep only as many as you need to see which branch you are on.

Pretty soon you'll have run the damn thing so many times you'll know its bootstrapping and foibles, and they will be second nature. You'll start seeing how it's generally laid out, and you'll know where the main startup branchings are. When they leap into async or hidden 'rooms', log lines are perfect.

When the engine of it starts moving in your head, then is the time to start throwing breakpoints, prints or log lines in places that originally were completely unknown but now you have a feeling for. It's at this point you'll be bloody close to where you want to be.
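When a debugger is awkward to attach, the same breadcrumb idea can be approximated with a small logging decorator. This is only a sketch under stated assumptions: `breadcrumb`, `TRAIL`, and the three decorated functions are hypothetical stand-ins for code in an unfamiliar code base, not anything from a real project.

```python
import functools

TRAIL: list[str] = []  # breadcrumbs, in the order they were dropped


def breadcrumb(func):
    """Record each call to `func` so the path through the code can be replayed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRAIL.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical stand-ins for functions in an unfamiliar code base.
@breadcrumb
def load_config():
    return {"mode": "fast"}


@breadcrumb
def fast_path():
    return "fast"


@breadcrumb
def slow_path():
    return "slow"


def main():
    config = load_config()
    # The trail shows which branch actually ran.
    return fast_path() if config["mode"] == "fast" else slow_path()
```

After running `main()`, inspecting `TRAIL` shows which functions were reached and in what order, which is the "check you still know where you are" step in script form.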

Oh, and do future you a favour: at least jot down something as you're going through this. I find that this initial torchlighting is remarkably gratifying, but if you don't make notes, in six months' time it'll be completely gone, and you'll have to do between a quarter and a half of this all over again before the lights start coming on and you remember how it's laid out.

maattdd 3 years ago

It's hard. Normally I start with a word that I know is fairly unique to the domain I'm looking for (in your case maybe "cssselector"). And what you are looking for is in the Blink third party folder.

chimineycricket 3 years ago

Usually some kind of string search works. If it's frontend, then search for a string that appears in the feature. If it's backend, search for a table name, HTTP error messages, anything like that.

LostRick 3 years ago

I'm no expert, but I'm currently getting into some coding after a bit of a break. Usually there is documentation, but you never know how relevant it still is. For this example with chromium, it looks like each folder has a readme.md; one even links to a dev wiki/guide, and in there you can find google docs with a diagram of the architecture :) For other projects there might be a focus on doxygen, which collects the comments from the source files and puts them into, for example, HTML with class trees etc.

rad_gruchalski 3 years ago

I usually start by finding issues related to the code fragment I’m interested in. Those usually lead to pull requests in the code I’m interested in.

actually_a_dog 3 years ago

Read the tests.

  • eurasiantiger 3 years ago

    First you may need to find the appropriate tests, which is essentially the same problem OP describes.

    • actually_a_dog 3 years ago

      I would say it's a related, but much simpler problem. Often, codebases will have some sort of command (sometimes in a Makefile, or similar) to run the tests, which you can use to track down where the tests live. Failing that, tests are often found in files, functions, and directories containing the string 'test' in their names. Simple use of shell tools can accomplish the task of looking for those things in a semi-automated way.
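As a rough illustration of that semi-automated search, the sketch below lists every file and directory whose name contains "test". The function name `find_test_paths` is hypothetical, and it only looks at file and directory names, not at function names inside files.

```python
from pathlib import Path


def find_test_paths(root: str) -> list[str]:
    """List files and directories under `root` whose name contains 'test'."""
    base = Path(root)
    matches = [
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if "test" in path.name.lower()
    ]
    return sorted(matches)
```

The result is a sorted list of relative paths, which gives a quick map of where a project keeps its tests.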

simonblack 3 years ago

You have to know what it is that you want to find before you start discovering.

My way is to have a project. What that is is unimportant, but it needs to be big enough that you run into roadblocks.

Now you know what your weak point is, and what you need to learn to overcome that weak point. So now you also know what it is you need to search for in that code-base.

flamesofphx 3 years ago

Poke, probe, prod... See what changes, till you get a better idea... I mean, what else can you do when there's no comment/documentation and you're dealing with something like:

function C($a, $b, $ba, $bb, $c, $d = NULL) { /* insert random garbage with eval statement... */ }

charcircuit 3 years ago

You use code search. For chromium it's hosted at https://source.chromium.org/chromium.

You can use filters to narrow down the results to the right languages and paths.

nottorp 3 years ago

Reading and drawing out the structure as you read. Boxes with arrows or whatever.

A good source structure tool will save some time but you’re not getting away without doing your own reading anyway.

teeray 3 years ago

Search for pull requests that fix bugs in the area you’re interested in. They’ll point you towards the responsible sections of the codebase.

iFire 3 years ago

sourcetrail is still modern. It hasn't fully bitrotted.

baq 3 years ago

ripgrep

But to not leave a one word answer, start searching for a feature you know about and look around from where you find it. It might help to add a super small feature yourself - when it finally works, you’ll have some idea of how the code is structured and will be able to infer where other features would live if they were there. (That might take a bit more than one addition ;))

In chromium’s case, what you’re trying to do is more like reverse engineering… I’d start with a debugger.

  • pizza 3 years ago

    Yup

    - ripgrep when you know what you’re looking for already

    - “find usages” when you’ve stumbled upon something interesting sounding, and want to see where it gets invoked (JetBrains IDEs)

    - (gist.)github.com/search when you want to see how others have used some api
