Show HN: Ghuser.io – Complete GitHub profiles
OP here. The idea came about because on your GitHub profile:
- I can't see all your code contributions since the beginning of time at a glance.
- For each of your pinned repos, I can't tell at a glance whether you just fixed a typo or wrote 90% of it.
ghuser.io tries to close this gap. You can show off your entire GitHub "portfolio", and it gets refreshed every day.
And hopefully this is temporary: the best thing for the community would be to have all of this directly on GitHub. The sooner ghuser.io can be switched off, the better ;)
This is already more popular than I expected: the queue of profile requests (an AWS SQS queue) already contains more than 10 users. At this point any new user will get the impression that nothing is happening when they request a profile.
I didn't expect to reach this point so fast (thanks!). I'll work on improving this right after this wave of new users. For now I'm just making sure that you all get your profiles.
Expect 10-20k visits if it rides the front page of HN, with anywhere between 100 and 2,000 users creating their profile.
Just requested mine! I'm not ready to abandon GitHub just yet; I use it for the "Hub", not the "Git", and this sort of change is just the kind of thing I'd like to see more of. Let's hope some of these ideas get integrated into GitHub itself.
I'm still optimistic that Microsoft will be keen to make some quality-of-life improvements when they take the wheel, as a show of good faith.
We had a chat with the "profile team" at GitHub recently, and they are doing a great job improving GitHub profiles as we speak :)
I'm the guy fixing the typos in everybody's README :)
Hehe :) And this is really useful; I hope my message didn't make it sound like the opposite. (Unfortunately our README is probably full of typos!)
Don't you tire of it after a while? I still do it from time to time but I've started to feel apathetic after seeing so many READMEs just put together as an afterthought.
damn it, you've been quicker at it, again...
Hi OP, seems nice, but "Get my profile" seems to attempt to load and never finishes. GitHub username: franciscop. Slashdot effect? Is my profile weird? I've been coding OSS on GitHub for ~8 years and I've got a quite public profile [1], so I might have quite a bit of data there.
Indeed, you're hitting what I just described in my other comment. Sorry about that; I really didn't think it would happen this fast (it's not the first time I've posted this on a forum).
Taking a few minutes now to display a temporary message that we're a bit overloaded.
This is a different type of forum: you're experiencing kind of a "hug of death". Be proud and move fast.
Thanks, my gh profile has become utterly useless due to overload.
Could you make this work for GitLab, Bitbucket, and maybe even just regular old git repos as well?
It would also be really cool if we could pull out basic private repo info as well (with the user's permission). E.g. worked on X private repos covering 60% Go, 20% JavaScript, etc.
I am also a bit confused as to why you would build this if you don't work at GitHub?
> Could you make this work for GitLab, Bitbucket, and maybe even just regular old git repos as well?
Yes, totally. Although we currently think the best thing for the community is for GitHub to improve their own profiles, making ghuser.io obsolete. Right now we'd rather do this with GitHub than against GitHub. But we don't know how much GitHub is willing to help us. (We need changes to the GitHub APIs for ghuser.io to be scalable.)
> private repo
Yeah, we could do that as well. Grant access to our public key and we're good to go.
> why you would build this if you don't work at GitHub?
Because Aurelien (OP) wanted/needed it, so he just built it.
To the both of us it's basically a fun experiment.
Also, for me it's cool because we use the web framework Reframe (https://github.com/reframejs/reframe), which I'm currently building.
Just so you know, you have a name collision with a popular ClojureScript SPA library, re-frame: https://github.com/Day8/re-frame
I love these profiles. At GitLab we're very open to contributions to enhance user profiles. One of the hard parts might be the performance of these pages; for that you'd probably need to compute metrics in advance in a background job.
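For example, here's a minimal sketch of that precompute-and-cache idea; the scheduler choice (node-cron), the stub computeMetrics, and the in-memory cache are all hypothetical:

```typescript
import cron from 'node-cron'; // hypothetical scheduler choice

// Precomputed metrics live in a cache so the profile page never
// does the expensive work at request time.
const metricsCache = new Map<string, unknown>();

async function computeMetrics(user: string): Promise<unknown> {
  // The slow part: crawl/aggregate the user's contribution data.
  return { user, computedAt: new Date().toISOString() }; // stub
}

// Recompute every user's metrics nightly at 03:00; rendering a
// profile page then only reads from the cache.
cron.schedule('0 3 * * *', async () => {
  for (const user of ['some-user' /* ...all users... */]) {
    metricsCache.set(user, await computeMetrics(user));
  }
});
```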
OP again: we're way overloaded. We already got 200 profile requests within an hour and we didn't see it coming. We're extremely thankful and sorry to disappoint you at the same time. We're closing the "Get your profile" feature for now and spinning up many EC2 instances as we speak, to process the 200 pending requests faster in parallel. We'll come back soon with a system that can handle this load.
Many many thanks!
Isn't 200 requests in an hour ... just three requests per minute? What is this service doing?
GitHub's API doesn't provide an exhaustive list of all your contributions. Instead we crawl GitHub's website, which is slow.
Details at https://github.com/AurelienLourot/github-contribs#how-does-i...
We are in talks with GitHub and they know that we are crawling GitHub.
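To give a feel for why that's slow, here's a rough sketch of the approach (not github-contribs' actual code; the URL shape and the parsing are assumptions): the public profile page can be queried one day of activity at a time, so a full history costs one request per day since account creation.

```typescript
// Rough sketch: fetch one day of a user's public activity and collect
// the repos touched. The URL shape and regex are assumptions; the point
// is the one-request-per-day cost that makes a full crawl take hours.
async function reposTouchedOn(user: string, day: string): Promise<string[]> {
  const res = await fetch(
    `https://github.com/${user}?tab=overview&from=${day}&to=${day}`
  );
  const html = await res.text();
  // Naively pull "owner/repo" slugs out of commit links in the HTML.
  const slugs = [...html.matchAll(/href="\/([\w.-]+\/[\w.-]+)\/commits/g)]
    .map((m) => m[1]);
  return [...new Set(slugs)];
}

// A full history then means calling reposTouchedOn() for every day since
// the account was created: thousands of sequential, rate-limited requests.
```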
> Instead we crawl GitHub's website which is computationally expensive
I'm surprised to hear that you're bottlenecking on CPU time. Could you verify that my understanding is correct? I would've thought your bottleneck would be networking and connectivity as you have to wait for GitHub to process all of the requests.
You're right. I edited my answer.
The unofficial GitHub API we are using is slow and rate-limited per IP. We spin up several servers to get several IPs and circumvent the rate limit.
(GitHub knows that we do that and we are in contact with them.)
If GitHub are okay with you using multiple IPs to get that data then it's not inherently expensive on their side for you to be using this.
Surely a rate-limit exception could be in order, then?
And perhaps you could help them alpha-test a new API endpoint that just so happens to include all the info rolled up as one URL :D
(Hmmmmm.... GraphQL....)
Looks like they're hitting a GitHub rate limit, rather than bottlenecking on CPU.
This is the bottleneck: https://github.com/AurelienLourot/github-contribs
It uses a very slow, unofficial GitHub API, so the initial crawl of a single profile takes several hours, and it's rate-limited per IP, so you need several machines/instances in order to parallelize. We plan to use AWS Fargate in the future. (We thought this future would be much farther away.)
Maybe it's cloning all GitHub repos of a user to compute activity?
We don't clone repos. Instead we use GitHub's API to get activity info.
E.g. https://api.github.com/repos/aurelienlourot/ghuser.io/contri...
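For illustration, here's a minimal sketch of turning that kind of response into a "wrote ~90% of it" figure; the field names match the documented GET /repos/{owner}/{repo}/contributors endpoint, but the helper itself is hypothetical:

```typescript
// Minimal sketch: derive a user's share of a repo's commits from the
// documented GET /repos/{owner}/{repo}/contributors endpoint.
interface Contributor {
  login: string;
  contributions: number; // commit count, per GitHub's REST API
}

async function contributionShare(
  owner: string,
  repo: string,
  user: string
): Promise<number> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/contributors`
  );
  const contributors = (await res.json()) as Contributor[];
  const total = contributors.reduce((sum, c) => sum + c.contributions, 0);
  const mine = contributors.find((c) => c.login === user)?.contributions ?? 0;
  return total ? mine / total : 0; // 0.9 ~= "wrote 90% of it"
}
```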
Would it be practical to use Puppeteer / headless Chrome from within a Google Cloud Function to do this? Then you could do literally millions of profiles per hour. The only requirement is that no single profile takes more than 540 seconds, and you could work around that by writing in the ability to export the execution state and send it off to a new function invocation when you're close to the deadline. This seems like a much easier problem if you can get out of VPS land and into Lambda/Cloud Functions, since it's something like $0.00001 per invocation.
https://cloud.google.com/blog/products/gcp/introducing-headl...
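Something like this sketch, presumably; the Express-style handler is what HTTP-triggered Cloud Functions give you, and the checkpointing near the deadline is the part you'd still have to write:

```typescript
import puppeteer from 'puppeteer';

const DEADLINE_MS = 540_000; // Cloud Functions' max execution time

// Sketch of one invocation crawling one profile with headless Chrome.
export async function crawlProfile(req: any, res: any): Promise<void> {
  const start = Date.now();
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(`https://github.com/${req.query.user}`, {
      waitUntil: 'networkidle2',
    });
    const html = await page.content(); // scrape contribution data from here

    if (Date.now() - start > DEADLINE_MS - 30_000) {
      // Near the deadline: serialize progress and hand it off to a fresh
      // invocation, as suggested above (omitted here).
    }
    res.status(200).send(`crawled ${html.length} bytes`);
  } finally {
    await browser.close();
  }
}
```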
That's amazingly cheap, but would each puppeteer instance have its own IP address? They're being rate limited on each IP.
I would think you get a random IP as they have millions of worker nodes, but I don't know.
You may want to delete my profile request, I have somewhere between 50k and 60k repositories and it tends to crash services. (user: sethwoodworth)
Haha! That is amazing! How, exactly, do you have so many?!
Can I clone the repo and run it locally against my own profile? I arrived late..
Not currently but we are thinking about it, see https://github.com/AurelienLourot/ghuser.io/issues/103
Check out https://sourcerer.io: GitHub and GitLab profiles with extensive timeline, language, library, and repo analysis. Customizable.
Eh, you were too honest in your FAQs; chances are that GitHub will now close the hole. I expect there is a reason your use case was not enabled in the API despite already being supported by the official frontend...
ghuser.io contributor here.
We talked with GitHub already. They think that what we are doing is cool and we agreed to tell them our roadmap and they tell us theirs.
We would be glad if GitHub copies us.
GitHub copying you! Now that's cool! Achievement unlocked.
Pretty bummed I missed the chance to try this! I had manually gone through my merged-PRs list before, trying to assemble a count of my contributions to each public project and come up with some sort of profile of the stuff I've participated in and contributed to, as I really didn't see any pages of this type doing that. This looks much more like what I'd like to see.
hopefully we'll get back soon with a scalable system and you'll be able to try it out :)
Given you disabled onboarding, is my onboarding request queued somewhere, or will I have to come back and re-request once you're over the hump?
We're working on serving all the requests that came in before we disabled onboarding (about 200 requests). If you give me your GitHub username, I can tell you whether you're among them :)
I am pretty sure I am not, as I received the message that onboarding is disabled.
If you have user requests in the logs, you should queue them up.
Am I understanding correctly that this is a manual process and not currently automated?
(And can I cancel my request so you can move on to others for now?)
Code-wise it is fully automated, but it's slow and it doesn't scale: we have to spin up new servers manually. As OP said, we'll need to make changes for ghuser.io to be scalable. Ideally GitHub adds an API that lists all your contribs, which is (obviously) not in our hands, but we are talking with GitHub.
It's cheaply automated to handle 10 profile requests per day, which is more than what we got in the past few months. So now we're giving it some human help, and we'll have to rethink the system.
What is your username? I'll cancel your request, thanks!
Seems like the perfect match for AWS Lambdas. I'd consider putting the crawling tasks into SQS and then triggering Lambdas to do single-crawl functions.
For better control over throttling and concurrency, you can leverage DynamoDB... I love it for controlling Lambdas, but not for storage.
If you need more power than a Lambda, you can do a similar process with EC2: populate SQS, trigger the Lambda to turn on an EC2 machine. Consider spot instances to save a ton of money.
If you need ideas, I'm sure HN readers would be glad to help from afar.
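For instance, the Lambda side of that SQS idea could look roughly like this (event types from @types/aws-lambda; crawlOneProfile is a hypothetical stand-in for the actual crawl):

```typescript
import { SQSEvent } from 'aws-lambda';

// Hypothetical stand-in for the actual per-profile crawl.
async function crawlOneProfile(username: string): Promise<void> {
  console.log(`crawling ${username}...`);
}

// SQS-triggered Lambda: each queued message is one profile request, and
// Lambda's reserved-concurrency setting acts as the throttle.
export async function handler(event: SQSEvent): Promise<void> {
  for (const record of event.Records) {
    const { username } = JSON.parse(record.body);
    await crawlOneProfile(username);
  }
}
```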