A brief history of code search at GitHub
github.blog
I still think the change to index only repos that had activity during the last year [1] was the wrong call, and it potentially makes code search dangerous for those who do not understand this.
At least for me a common use case for org-wide code searches is to answer questions like "is the library x still used somewhere?"
To not have this include old potentially low-traffic repos is a very bad thing. I don't understand why they would do this for enterprise customers. Like we would pay extra to not have it be this way.
[1] https://github.blog/changelog/2020-12-17-changes-to-code-sea...
Unfortunately I lost trust in GitHub search a while ago. Can't find files, can't find an exact string reference. This can be dangerous in certain situations like you said. Hopefully I can start to gain trust in the new version when released but I don't really see myself using it again.
Same here. GitHub search sucks for authoritative, governance use cases. I got burned a few times and now just pull everything down myself and search it.
I think there are code search companies, but they are too expensive for me. I suppose some people value it more highly.
Comically, I need the old Google search appliance and just treat it as a web source for lots of my questions (who is using log4j, etc).
> I need the old Google search appliance
Or Google Code Search! Which turned out to be Russ Cox's platform for developing RE2.
Google Code Search was simply amazing. Killing Google Code Search really underlined how Google had given up on its pro-social agenda. I mean, here was a best-in-class and as yet unmatched service, neatly tailored to both the needs of developers and to Google's expertise and operations, and they killed it. Why? Because it wasn't going to make any money? Was there ever an expectation that it would?
The death of Google Books, when they removed countless scanned items, most of which were clearly out of copyright, was also painful. But I could understand how dealing with the barrage of copyright lawsuits was simply a bridge too far for Google. But Code Search? Why, oh, why!? :(
> just pull everything down myself and search it.
`grep -r` or `git-all grep` or something better?
I'm not who you replied to but I use ag and it's pretty fast (faster than grep I think)
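For anyone who wants the "pull everything down and search it" approach without installing anything, here is a minimal sketch in Python. It assumes every repo is already cloned as a direct subdirectory of one root folder; the function names `search_checkouts` and `repos_using` and the `SKIP_DIRS` list are made up for illustration, and it only does plain substring matching, so it will be much slower than ag or grep on a large org.

```python
import os
from pathlib import Path

# Assumption: directories we never want to search inside a checkout.
SKIP_DIRS = {".git", "node_modules"}


def search_checkouts(root, needle):
    """Yield (repo, relative_path, line_number, line) for each line containing needle.

    Assumes each direct subdirectory of `root` is one cloned repository.
    """
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root)
            if len(rel.parts) < 2:
                continue  # a file sitting directly in root belongs to no repo
            try:
                # errors="ignore" lets us scan binary or oddly encoded files without crashing.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            inner = Path(*rel.parts[1:]).as_posix()
                            yield rel.parts[0], inner, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file (permissions, broken symlink): skip it


def repos_using(root, needle):
    """Return the sorted names of repos with at least one matching line."""
    return sorted({repo for repo, *_ in search_checkouts(root, needle)})


if __name__ == "__main__":
    # Hypothetical layout: ./checkouts/<repo-name>/...
    for repo in repos_using("checkouts", "log4j"):
        print(repo)
```

The point of `repos_using` is the governance question from upthread ("is library x still used somewhere?"): it answers per repo, including old and forked ones, because nothing is filtered by activity.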
Yes, this seems like a bad move to me too. Plenty of repos may not change in a year but are still useful and relevant.
It won't search forked repositories either, which is a pain when trying to find something - you're better off pulling and grepping.
(If you have forked a long-dead project and are working on it, GitHub support can "decouple" it from the original and then you can search it.)
Sounds like it would have been very difficult to implement this without some excellent open source projects. Is GitHub planning to open-source other parts of the system?
=( I joined last week but still no preview
> Join the GitHub Code Search Waitlist. Access is limited during the technology preview of GitHub’s future code search. Sign up today for your chance to try it and give your feedback.
You’re already on the waitlist for GitHub Code Search! We’ll email you when we’ve enabled it on your account. Make sure your primary email address is up-to-date.
Hey, I'm on the GitHub code search team. We've had a ton of people join the waitlist, sorry it's taking so long to get through everyone! We're hoping to get most people access early in the new year. Stay tuned!
Is there a list I can sign up to stay on the old code search forever? I'm really sad about the loss of old repos - I found that invaluable for finding interesting projects to learn from.
It sounds like the old code search doesn’t include old repos either: https://github.blog/changelog/2020-12-17-changes-to-code-sea...
Maybe message someone internally?
Given that this is a "history" I can't believe they don't mention grep.app. It's pretty much the way to search all of GitHub.
There probably aren't as many drill-down features as the new official way, but speed and simplicity of UI more than make up for it.
grep.app is excellent, but I don't think you can search all of GitHub. On the front page it says "Search across a half million git repos".
You are right. It appears there is a cut-off for inclusion (perhaps a low star count, < 20?).
I personally don't care that much about wait time for search. For years I would have loved a simple grep that you could run from a browser. Even if it took a minute, that would still be faster than what I've had to do: manually clone the repo locally and run a grep on it on my machine.
I don't remember what my problems are with GitHub search, only that it almost never finds what I'm looking for. I've had to resort to cloning the repo and grepping for what I need more times than I can count.
Why don't they reuse CodeQL to provide a true code search using maybe a subset of CodeQL capabilities?
Was it always necessary to log in to use this feature?
For global code search it's been that way for quite a while