Show HN: Ghuser.io – Complete GitHub profiles
OP here. The idea came about because on your GitHub profile:
- I can't see all your code contributions since the beginning of time at a glance.
- For each of your pinned repos, I can't tell at a glance whether you just fixed a typo or wrote 90% of it.
ghuser.io tries to close this gap. You can show off your entire GitHub "portfolio", and it gets refreshed every day.
And hopefully this is temporary: the best thing for the community would be to have all of this directly on GitHub. The sooner ghuser.io can be switched off, the better ;)
This is already more popular than I expected: the queue of profile requests (an AWS SQS queue) already contains more than 10 users. At this point any new user will get the impression that nothing is happening when they request a profile.
I didn't expect to reach this point so fast (thanks!). I'll work on improving this right after this wave of new users. For now I'm just making sure that you all get your profiles.
Expect 10-20k visits if it rides the front page of HN, with anywhere between 100 and 2,000 users creating their profile.
Just requested mine! I'm not ready to abandon GitHub just yet; I use it for the "Hub", not the "Git", and this sort of change is just the kind of thing I'd like to see more of. Let's hope some of these ideas get integrated into GitHub itself.
I'm still optimistic that Microsoft will be keen to make some quality-of-life improvements when they take the wheel, as a show of good faith.
We had a chat with the "profile team" at GitHub recently, and they are doing a great job improving GitHub profiles as we speak :)
I'm the guy fixing the typos in everybody's README :)
Hehe :) And this is really useful; I hope my message didn't make it sound like the opposite. (Unfortunately our README is probably full of typos!)
Don't you tire of it after a while? I still do it from time to time but I've started to feel apathetic after seeing so many READMEs just put together as an afterthought.
damn it, you've been quicker at it, again...
Hi OP, seems nice, but "Get my profile" seems to attempt to load and never finishes. GitHub username: franciscop. Slashdot effect? Is my profile weird? I've been coding OSS on GitHub for ~8 years and I've got a quite public profile [1], so I might have quite a bit of data there.
Indeed, you're hitting what I just described in my other comment. Sorry about that; I really didn't think it would happen this fast (it's not the first time I've posted this on a forum).
Taking a few minutes now to display a temporary message that we're a bit overloaded.
This is a different type of forum: you're experiencing kind of a "hug of death". Be proud and move fast.
Thanks, my gh profile has become utterly useless due to overload.
Could you make this work for GitLab, Bitbucket, and maybe even just regular old git repos as well?
It would also be really cool if we could pull out basic private repo info as well (with the user's permission). E.g. worked on X private repos covering 60% Go, 20% JavaScript, etc.
I am also a bit confused as to why you would build this if you don't work at GitHub?
> Could you make this work for GitLab, Bitbucket, and maybe even just regular old git repos as well?
Yes, totally. Although we currently think the best thing for the community is for GitHub to improve their own profiles, making ghuser.io obsolete. Right now we'd rather do this with GitHub than against GitHub. But we don't know how much GitHub is willing to help us. (We need changes to the GitHub APIs for ghuser.io to be scalable.)
> private repo
Yeah, we could do that as well. Grant access to our public key and we're good to go.
> why you would build this if you don't work at GitHub?
Because Aurelien (OP) wanted/needed it, so he just built it.
To the both of us it's basically a fun experiment.
Also, for me it's cool because we use the web framework Reframe (https://github.com/reframejs/reframe), which I'm currently building.
Just so you know, you have a name collision with a popular ClojureScript SPA library, re-frame: https://github.com/Day8/re-frame
I love these profiles. At GitLab we're very open to contributions to enhance user profiles. One of the hard parts might be the performance of these pages; for that you'd probably need to compute metrics in advance in a background job.
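For example, here's a minimal sketch of that precompute-and-cache idea; the scheduler choice (node-cron), the stub computeMetrics, and the in-memory cache are all hypothetical:

```typescript
import cron from 'node-cron'; // hypothetical scheduler choice

// Precomputed metrics live in a cache so the profile page never
// does the expensive work at request time.
const metricsCache = new Map<string, unknown>();

async function computeMetrics(user: string): Promise<unknown> {
  // The slow part: crawl/aggregate the user's contribution data.
  return { user, computedAt: new Date().toISOString() }; // stub
}

// Recompute every user's metrics nightly at 03:00; rendering a
// profile page then only reads from the cache.
cron.schedule('0 3 * * *', async () => {
  for (const user of ['some-user' /* ...all users... */]) {
    metricsCache.set(user, await computeMetrics(user));
  }
});
```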
OP again: we're way overloaded. We already got 200 profile requests within an hour and we didn't see it coming. We're extremely thankful and sorry to disappoint you at the same time. We're closing the "Get your profile" feature for now and spinning up many EC2 instances as we speak, to process the 200 pending requests faster in parallel. We'll come back soon with a system that can handle this load.
Many many thanks!
Isn't 200 requests in an hour ... just three requests per minute? What is this service doing?
GitHub's API doesn't provide an exhaustive list of all your contributions. Instead we crawl GitHub's website, which is slow.
Details at https://github.com/AurelienLourot/github-contribs#how-does-i...
We are in talks with GitHub and they know that we are crawling GitHub.
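To give a feel for why that's slow, here's a rough sketch of the approach (not github-contribs' actual code; the URL shape and the parsing are assumptions): the public profile page can be queried one day of activity at a time, so a full history costs one request per day since account creation.

```typescript
// Rough sketch: fetch one day of a user's public activity and collect
// the repos touched. The URL shape and regex are assumptions; the point
// is the one-request-per-day cost that makes a full crawl take hours.
async function reposTouchedOn(user: string, day: string): Promise<string[]> {
  const res = await fetch(
    `https://github.com/${user}?tab=overview&from=${day}&to=${day}`
  );
  const html = await res.text();
  // Naively pull "owner/repo" slugs out of commit links in the HTML.
  const slugs = [...html.matchAll(/href="\/([\w.-]+\/[\w.-]+)\/commits/g)]
    .map((m) => m[1]);
  return [...new Set(slugs)];
}

// A full history then means calling reposTouchedOn() for every day since
// the account was created: thousands of sequential, rate-limited requests.
```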
> Instead we crawl GitHub's website which is computationally expensive
I'm surprised to hear that you're bottlenecking on CPU time. Could you verify that my understanding is correct? I would've thought your bottleneck would be networking and connectivity as you have to wait for GitHub to process all of the requests.
You're right. I edited my answer.
The unofficial GitHub API we are using is slow and rate-limited per IP. We spin up several servers to get several IPs and circumvent the rate limit.
(GitHub knows that we do that and we are in contact with them.)
If GitHub are okay with you using multiple IPs to get that data then it's not inherently expensive on their side for you to be using this.
Surely a rate-limit exception could be in order, then?
And perhaps you could help them alpha-test a new API endpoint that just so happens to include all the info rolled up as one URL :D
(Hmmmmm.... GraphQL....)
Looks like they're hitting a GitHub rate limit, rather than bottlenecking on CPU.
This is the bottleneck: https://github.com/AurelienLourot/github-contribs
It uses a very slow, unofficial GitHub API, so the initial crawl of a single profile takes several hours, and it's rate-limited per IP, so you need several machines/instances in order to parallelize. We plan to use AWS Fargate in the future. (We thought this future would be much farther away.)
Maybe it's cloning all GitHub repos of a user to compute activity?
We don't clone repos. Instead we use GitHub's API to get activity info.
E.g. https://api.github.com/repos/aurelienlourot/ghuser.io/contri...
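For illustration, here's a minimal sketch of turning that kind of response into a "wrote ~90% of it" figure; the field names match the documented GET /repos/{owner}/{repo}/contributors endpoint, but the helper itself is hypothetical:

```typescript
// Minimal sketch: derive a user's share of a repo's commits from the
// documented GET /repos/{owner}/{repo}/contributors endpoint.
interface Contributor {
  login: string;
  contributions: number; // commit count, per GitHub's REST API
}

async function contributionShare(
  owner: string,
  repo: string,
  user: string
): Promise<number> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/contributors`
  );
  const contributors = (await res.json()) as Contributor[];
  const total = contributors.reduce((sum, c) => sum + c.contributions, 0);
  const mine = contributors.find((c) => c.login === user)?.contributions ?? 0;
  return total ? mine / total : 0; // 0.9 ~= "wrote 90% of it"
}
```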
Would it be practical to use Puppeteer / headless Chrome from within a Google Cloud Function to do this? Then you could do literally millions of profiles per hour. The only requirement is that no single profile takes more than 540 seconds, and you could work around that by writing in the ability to export the execution state and send it off to a new function invocation when you're close to the deadline. This seems like a much easier problem if you can get out of VPS land and into Lambda/Cloud Functions, since it's something like $0.00001 per invocation.
https://cloud.google.com/blog/products/gcp/introducing-headl...
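Something like this sketch, presumably; the Express-style handler is what HTTP-triggered Cloud Functions give you, and the checkpointing near the deadline is the part you'd still have to write:

```typescript
import puppeteer from 'puppeteer';

const DEADLINE_MS = 540_000; // Cloud Functions' max execution time

// Sketch of one invocation crawling one profile with headless Chrome.
export async function crawlProfile(req: any, res: any): Promise<void> {
  const start = Date.now();
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(`https://github.com/${req.query.user}`, {
      waitUntil: 'networkidle2',
    });
    const html = await page.content(); // scrape contribution data from here

    if (Date.now() - start > DEADLINE_MS - 30_000) {
      // Near the deadline: serialize progress and hand it off to a fresh
      // invocation, as suggested above (omitted here).
    }
    res.status(200).send(`crawled ${html.length} bytes`);
  } finally {
    await browser.close();
  }
}
```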
That's amazingly cheap, but would each puppeteer instance have its own IP address? They're being rate limited on each IP.
I would think you get a random IP as they have millions of worker nodes, but I don't know.
You may want to delete my profile request, I have somewhere between 50k and 60k repositories and it tends to crash services. (user: sethwoodworth)
Haha! That is amazing! How, exactly, do you have so many?!
Can I clone the repo and run it locally against my own profile? I arrived late..
Not currently but we are thinking about it, see https://github.com/AurelienLourot/ghuser.io/issues/103
Check out https://sourcerer.io: GitHub and GitLab profiles with extensive timeline, language, library, and repo analysis. Customizable.
Eh, you were too honest in your FAQs; chances are that GitHub will now close the hole. I expect there is a reason your use case was not enabled in the API despite already being supported by the official frontend...
ghuser.io contributor here.
We talked with GitHub already. They think that what we are doing is cool and we agreed to tell them our roadmap and they tell us theirs.
We would be glad if GitHub copies us.
GitHub copying you! Now that's cool! Achievement unlocked.
Pretty bummed I missed the chance to try this! I had manually gone through my merged-PRs list before, trying to assemble a count of my contributions to each public project and come up with some sort of profile of the stuff I've participated in and contributed to, as I really didn't see any pages of this type doing that. This looks much more like what I'd like to see.
hopefully we'll get back soon with a scalable system and you'll be able to try it out :)
Given you disabled onboarding, is my onboarding request queued somewhere, or will I have to come back and re-request once you're over the hump?
We're working on serving all the requests that came in before we disabled onboarding (about 200 requests). If you give me your GitHub username, I can tell you whether you're among them :)
I am pretty sure I am not, as I received the message that onboarding is disabled.
If you have user requests in the logs, you should queue them up.
Am I understanding correctly that this is a manual process and not currently automated?
(And can I cancel my request so you can move on to others for now?)
Code-wise it is fully automated, but it's slow and it doesn't scale: we have to spin up new servers manually. As OP said, we'll need to make changes for ghuser.io to be scalable. Ideally GitHub adds an API that lists all your contribs, which is (obviously) not in our hands, but we are talking with GitHub.
It's cheaply automated to handle 10 profile requests per day, which is more than what we got in the past few months. So now we're giving it some human help, and we'll have to rethink the system.
What is your username? I'll cancel your request, thanks!
Seems like the perfect match for AWS Lambdas. I'd consider putting the crawling tasks into SQS and then triggering Lambdas to do single-crawl functions.
For better control over throttling and concurrency, you can leverage DynamoDB... I love it for controlling Lambdas, but not for storage.
If you need more power than a Lambda, you can do a similar process with EC2: populate SQS, trigger the Lambda to turn on an EC2 machine. Consider spot instances to save a ton of money.
If you need ideas, I'm sure HN readers would be glad to help from afar.
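For instance, the Lambda side of that SQS idea could look roughly like this (event types from @types/aws-lambda; crawlOneProfile is a hypothetical stand-in for the actual crawl):

```typescript
import { SQSEvent } from 'aws-lambda';

// Hypothetical stand-in for the actual per-profile crawl.
async function crawlOneProfile(username: string): Promise<void> {
  console.log(`crawling ${username}...`);
}

// SQS-triggered Lambda: each queued message is one profile request, and
// Lambda's reserved-concurrency setting acts as the throttle.
export async function handler(event: SQSEvent): Promise<void> {
  for (const record of event.Records) {
    const { username } = JSON.parse(record.body);
    await crawlOneProfile(username);
  }
}
```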