Show HN: Effort to clone unmaintained SourceForge projects to GitHub

a-sf-mirror.github.io

146 points by hydragit 11 years ago · 89 comments

pavlov 11 years ago

Why Github? Copying from one commercial provider to another doesn't solve the fundamental problem. Using git helps, but most of those old repos will never get cloned.

In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.

  • scrollaway 11 years ago

    > In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.

    So fix it in 10 years. Git makes that easy.

    Point me to a good alternative to github that matches all your ideals. A free alternative to github - free as in beer, unless you're willing to fund this effort yourself, of course?

    We migrated one of our projects from Sourceforge to Github, and all the stallmen came out of their rock to tell us how Github is evil, how Savannah is the only true alternative, pah. "Absolute freedom of software" is nice but it's not the only requirement. Savannah has the usability of a rusty wrench and will probably shut down without warning long before Github "turns evil".

    Some people are just so far detached from reality when suggesting that stuff isn't perfect. Github is pretty damn amazing. If you want to use foss alternative like Gitlab, more power to you, but that doesn't make them ideal in every situation.

    • yellowapple 11 years ago

      > Point me to a good alternative to github that matches all your ideals. A free alternative to github - free as in beer, unless you're willing to fund this effort yourself, of course?

      Gitlab? Savannah (of the nongnu.org variety)? Bitbucket? Gitorious? A basic VPS with SSH and git?

      • carussell 11 years ago

        The source for Bitbucket is no more available than that for GitHub.

        http://atlassian.bitbucket.org/

        • yellowapple 11 years ago

          The only explicitly-stated condition was "free as in beer", which BitBucket is for even private repos. BitBucket also has a self-hosted equivalent, though said equivalent is not, alas, free-as-in-beer.

          The implicit free-as-in-speech condition is adequately fulfilled by other alternatives, like (as I mentioned) Gitlab and (IIRC) GNU Savannah (along with - again - just installing git on an SSH-able server).

      • sytse 11 years ago

        At GitLab you are very welcome. Please let me know if you have any questions or concerns.

        • scrollaway 11 years ago

          Props guys, I think you do an incredible job. GitLab is in fact a reasonable alternative, I should have mentioned it in my post.

          The reason we went to Github rather than Sourceforge is because of the community. Nevertheless, I think it's fairly foolish to focus on the platform (in the case here of archiving) - it's just a host. Stuff goes on domain 1 instead of domain 2. In either case it's still open source, the archives are all there, git is decentralized and perfect for the job.

          • carussell 11 years ago

            Contradictory signals in your rationale here. It comes off as if you're filling in a post-hoc justification.

            Is it just an archive or isn't it? If it is, what's so important about the community, then?

            It's fine that you chose what you did, of course. It's just that in defending it, you can't seem to decide whether you want to have your cake or eat it.

            If you wanted, this would have been a perfectly fine reason to give: "We went with GitHub because it's just what we use mostly, and other stuff not so much."

            • skitzmikler 11 years ago

              Who fucking cares? He's having it both ways and you can't stop it? Your tone is offensive. Relax.

            • scrollaway 11 years ago

              You're very confused. I don't work with the Archive Team, I was offering my experience regarding our project's move from Sourceforge to Github and why GGGGGGGGP's (or something...) comment was way off base.

              • carussell 11 years ago

                I originally wrote my comment[1] allowing for the idea that you were an uninvolved bystander, until a reread of your comment strongly suggested that you were part of the project being discussed. I suppose I got confused when you began talking about "we" and "our project".

                1. The meat of which remains: if it's just an archive and the host is only being used as dumb storage, how is the "community" aspect of it a plus?

                • scrollaway 11 years ago

                  You're right, I shouldn't have used "We" in the original reply. This was my bad - my english still has rough edges.

          • sytse 11 years ago

            No problem, thanks for your kind comment. And feel free to update your post :)

  • goldfeld 11 years ago

    Ten years from now for all we know we could all have so much cheap storage and bandwidth and good, open p2p software that all coders get to archive their own full copy of github's repos. So the focus should be on getting today's job done now.

    • hk__2 11 years ago

      > open p2p software that all coders get to archive their own full copy of github's repos

      Do you mean git?

    • yellowapple 11 years ago

      Didn't we all say the same exact thing ten years ago?

      • minot 11 years ago

        Now we have SSD which means a small one-step-backward for storage. However, we will supposedly have 10TB SSD and beyond within a couple of years which should give some breathing room.

        Even with all this development, I doubt we will be able to have everything on Github locally on our computers. I imagine the typical Github project to be tiny -- probably tens of megabytes at the most so I'd say we can have all the projects that we care about available locally. One can only care about so much.

  • Klathmon 11 years ago

    Do you have any other suggestions? Hosting these repos on donated/personal machines is (IMO) significantly less likely to stand the test of time.

    At least with a commercial entity there is a bit more "trust" involved that they won't disappear out of the blue one day. And if the time comes that Github starts to collapse, the process can be repeated.

    Just because something isn't permanent doesn't mean it's pointless.

    • riffraff 11 years ago

      Savannah? I don't think it's a very good alternative, and I subscribe to the "we'll fix ten years' problems in ten years" view, but it doesn't have the same issues.

    • blazespin 11 years ago

      Why can't it stand the test of time? Wikipedia seems to do well.

    • wslh 11 years ago

      > Do you have any other suggestions?

      Archive.org ?

      And weird that nobody suggested the Bitcoin block chain. I don't think binaries are a good fit but source code doesn't require a lot of space. With the current and future block size it will take some time to make it happen.

      • joshstrange 11 years ago

        Archive.org (while an amazing resource created lovingly by amazing people) is not a great front-end for stuff like this. Github is very easy to get started with and excels at code hosting. As for the blockchain, that's a terrible idea. There are so many things wrong with it, including the cost to push all of that data into the blockchain and the fact that while source code can be small, it isn't always, and it can be orders of magnitude larger. Right now each block is 1MB and blocks take some 10 minutes for just 1 confirmation, so you are looking at < 1.7KBps (13.6Kbps) "upload" speeds. If you actually attempted this you would have to have some sort of header on each transaction to tie it all together, which lowers the speed even further. I'd bet money that if you started uploading, nodes would either ban you or the core devs would do something to stop the chain from being filled with shit that now has to get replicated to tens of thousands of machines across the globe.
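
The throughput figure above can be checked with a few lines of arithmetic. This is a rough sketch, assuming a 1 MB (decimal) block every 600 seconds with the entire block available for payload, which is optimistic. The function names and the 50 MB repository size are hypothetical, chosen only for illustration.

```python
import math

BLOCK_SIZE_BYTES = 1_000_000  # assumption: 1 MB block, decimal units
BLOCK_INTERVAL_S = 600        # assumption: one block roughly every 10 minutes


def max_upload_rate(block_size=BLOCK_SIZE_BYTES, interval=BLOCK_INTERVAL_S):
    """Return (bytes per second, bits per second) if every block were pure payload."""
    bytes_per_s = block_size / interval
    return bytes_per_s, bytes_per_s * 8


def hours_to_upload(total_bytes, block_size=BLOCK_SIZE_BYTES, interval=BLOCK_INTERVAL_S):
    """Hours needed to push total_bytes, one full block per interval."""
    blocks = math.ceil(total_bytes / block_size)
    return blocks * interval / 3600


if __name__ == "__main__":
    bytes_per_s, bits_per_s = max_upload_rate()
    print(f"{bytes_per_s / 1000:.2f} KB/s, {bits_per_s / 1000:.1f} kbit/s")
    print(f"{hours_to_upload(50_000_000):.1f} hours for a hypothetical 50 MB repository")
```

With decimal units the ceiling lands slightly under the 1.7 KB/s quoted above, and any per-transaction header would push the usable rate lower still.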

        • wslh 11 years ago

          I received a lot of downvotes but my comment was a bit ironic, since in many forums (even on HN in the past) when someone talked about backups many people suggested the block chain.

          I said: With the current and future block size it will take some time to make it happen.

          • ajkjk 11 years ago

            The absurdity of that suggestion is not absolved by the fact that someone else has said it before, or by the fact that you said it will take 'some time for it to happen'. It's completely not an option for the migration we're discussing.

            • wslh 11 years ago

              It is not absurdity if it is irony, and I clearly said that it can't be done now but may be in the future. I can't see the future, can you?

              • hughw 11 years ago

                It's not practical, but it's a worthy goal for blockchain computing, or some descendant of it. So I am glad you brought it up.

  • hydragitOP 11 years ago

    For now, Github is not ad-ridden as SourceForge is. Github is monetizing some repositories: https://github.com/pricing I don't know if they're sustainable, but from my naive point of view, closed-source projects on github pay for the hosting of open-source projects on github.

    • mdasen 11 years ago

      Looking at GitHub's business model, it looks a lot more sustainable than SourceForge's.

      My company uses GitHub Enterprise. Unless we have some sort of deal/discount above the built-in, we're paying over $30,000/year for it and we run it on our own servers. I'm guessing a lot of other companies do as well. Developers are quite used to using both git and GitHub and $30,000 is nothing if you have a hundred developers costing you $150k a piece (not just salary, but computers, benefits, desks/office space, payroll taxes, etc).

      SourceForge counted on their open-source stance limiting who would use their service and, by extension, limiting the resources they would need to serve those people. GitHub works the opposite way. They want everyone to think of GitHub as "the place I put stuff". Have a code snippet? Stick it on GitHub! Want a basic wiki for something vaguely code related? Create a GitHub repo just for the wiki! Collaborating with friends on a class project? GitHub! And then, years later, GitHub feels like second nature to you and you love it when employers are using it, paying GitHub tens of thousands a year for it.

      I'm not accusing GitHub of doing something nefarious to lock people into GitHub. Just noting that GitHub feels very familiar and that makes GitHub a very reasonable choice for companies who pay them money. Without that familiarity, the value of GitHub isn't the same. If you're a company spending millions per year, $30k is a drop in the bucket for software your developers are already familiar with and software that works well, is well supported, and can handle your problem.

      Yes, GitLab exists and has both open-source and enterprise versions, but I'm not sure that a business feels that differently about $5,000/year for a 100 person team and $25,000. I'm glad GitLab exists, I'm glad Bitbucket exists. They'll make sure that GitHub has to continue being great and they'll provide services to people that want something a bit different. But GitHub's business model seems pretty sound. The more people use GitHub for free, the more likely high-rollers are to pay for GitHub.

      I mean, the GitHub subscription per developer costs less than the additional money my company pays for Apple gear for developers. By targeting open source with a premium, free, non-ad driven product, GitHub opened the door to lucrative business sales. They seem like a sustainable business and it even seems like the free, open-source repositories are part of that business plan.

      I'm not saying that Apple gear is so overpriced or that it isn't a better platform to develop on, but we don't need retina displays to do our work. And many people argue that you don't want to force devs to work on a platform they're less productive on. The same applies to GitHub. If your devs are more productive or, heck, even happier or more comfortable using it, $250/year isn't something a company is going to blink at if it's paying $150k+ per dev - just as the company won't mind paying an extra $100, $500, or $1,000 in equipment for that dev.

    • johansch 11 years ago

      In my naive mind, I think it might be sustainable.

      (During the past week I paid for the first month of private github hosting on my personal CC for the company I work for. Will get it reimbursed and transition it to some company CC when I get the time.)

  • duskwuff 11 years ago

    Github may be a commercial provider, but at least it's a commercial provider based on an open protocol. If things do start going wrong at Github, escape is a "git clone" and a "git push" away.
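
The escape route described above can be sketched as two git invocations. This assumes `git` is installed; the URLs are placeholders and the helper names are hypothetical. It uses `--mirror` on both steps, rather than a plain clone and push, so that every branch and tag is carried over.

```python
import subprocess


def mirror_commands(source_url, dest_url, workdir="repo.git"):
    """Build the two git invocations that copy every ref from one host to another."""
    return [
        ["git", "clone", "--mirror", source_url, workdir],
        ["git", "-C", workdir, "push", "--mirror", dest_url],
    ]


def migrate(source_url, dest_url, workdir="repo.git"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in mirror_commands(source_url, dest_url, workdir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs; this only prints the commands, so nothing touches the network.
    for cmd in mirror_commands("https://github.com/example/project.git",
                               "git@new-host.example:example/project.git"):
        print(" ".join(cmd))
```

This moves the repository history only; issues and other hosted metadata live outside the git repository and would need a separate export.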

  • jlarocco 11 years ago

    Who cares?

    If that happens, the projects can be re-hosted somewhere else. For the time being Github is the best option.

    Sometimes the hypothetical situations free software people bring up hurt their cause more than they help.

  • readme 11 years ago

    I doubt that. Github is a paid service and has several enterprise level clients. If it's ever going to flip flop, there will be quite a few warnings beforehand.

  • hippich 11 years ago

    Somebody has to pay for git hosting. Who would be a better alternative in your opinion?

  • usaphp 11 years ago

    I like people like you, always slashing ideas and not suggesting your own...

TazeTSchnitzel 11 years ago

Why aren't you mirroring the binaries? These are vital for people in the future who do not have the time to set up a build environment for software from a decade ago.

I'd also echo the concerns of others about GitHub.

Proper archivists should do for SourceForge what they did for other projects. Archive Team, maybe? Looks like they have a wiki page: http://www.archiveteam.org/index.php?title=SourceForge

  • bentpins 11 years ago

    This was in progress, 830GB was downloaded before a Sourceforge guy popped onto the IRC and said he's ok with the archiving, but that the robots.txt should be respected. This would put things at a practical standstill. So the downloading was paused, I'm not really sure what's happened in the week since.

    Right now Xfire's videos, several URL shorteners' links, and Toshiba Support material are being archived. If you have spare cycles and bandwidth, and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through docker or a VM. http://archiveteam.org/index.php?title=Warrior

    • nadams 11 years ago

      Honestly I think ignoring robots.txt in this case is acceptable. Even if he programs in code to respect robots.txt - once the management at sourceforge get wind of what he is doing - what is stopping sourceforge from putting up robots.txt everywhere blocking him?

    • frik 11 years ago

      Sourceforge doesn't host the binaries themselves. Universities and others offer mirrors (like HEANET) for free!

      So the mirrors should just cut the upload write permission for Sourceforge and transfer it over to archive.org or ArchiveTeam.

  • hydragitOP 11 years ago

    Regarding binaries, I know these could be useful and I'd like to provide them, but I'm afraid a "not (yet?) very popular mirroring project" can't show why it should be trusted regarding binaries. After all, a known site like SF is untrustworthy, so why would an unknown site be any more trustworthy?

Osmium 11 years ago

Honestly, this is a serious issue for my field. There are so many obscure academic binaries hosted on SF... I hope someone manages to mirror them. [The fact that a lot of the scientific community is so backwards in adopting modern coding standards is another conversation for another day.]

estrabd 11 years ago

Sourceforge is on the radar here, but maybe it's time to step it up.

http://www.archiveteam.org/index.php?title=Fire_Drill

Update: seems others have linked to archiveteam.org, so maybe that's the best route. Is the OP part of the AT effort or do they know about each other? Maybe they should.

lcswi 11 years ago

Nice! But in my opinion it would be better to help Archive Team with their efforts!

jmkni 11 years ago

Nice.

I agree with what the others are saying, there's a lot of source code for solving obscure problems that is only on Sourceforge.

One example I found recently is a program called QLumEdit. I recently had to figure out how to work with EuLumdat files, and if it wasn't for the source code for this program on Sourceforge I would have been completely stumped (well not quite, but it would have taken me ages).

If SF goes down the toilet, a lot of knowledge goes with it so this is awesome to see!

If anybody is interested, I was converting this code from C++ to .net, my horrible hacky unrefactored effort is here - https://github.com/bumblebeeman/eulum.net

I am planning to make this code nicer, and develop it into a WPF app when I have time!

I am getting pretty close too, here is my .net generated version of the images this program produces: http://imgur.com/PCmpnJ2

ksherlock 11 years ago

That's great. I started doing that myself (my own git server, not github) for some projects I care about. This effort seems to include a very narrow list, though.

For CVS, though, I suspect cvs-fast-export [1] will do a better job than git-cvsimport.

1 http://www.catb.org/esr/cvs-fast-export/

jaytaylor 11 years ago

What about creating a torrent containing all these unmaintained SF projects (with binary downloads included)?

This would dramatically increase the odds that the content is never lost.

  • nextweek2 11 years ago

    The problem with torrents is the lack of incremental update support. If the base torrent gets updated it gets a new hash identifier. How do you know it's been changed, to ensure you get the latest version? When you do, the swarm effectively gets diluted because some are on the new version and some are on older versions.
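
The hash-identifier point can be illustrated with a sketch of how a BitTorrent v1 info-hash is derived: it is the SHA-1 of the bencoded `info` dictionary, which contains the hash of every piece, so changing any byte of the payload yields a different identifier. The minimal bencoder and the `info_hash` helper below are illustrative assumptions covering a single-file torrent with string keys, not a full implementation.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys (assumed to be str here) are emitted as byte strings in sorted order.
        items = sorted(((k.encode("utf-8"), v) for k, v in value.items()),
                       key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name, data, piece_length=16384):
    """Return the hex info-hash of a single-file torrent holding `data`."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()


if __name__ == "__main__":
    old = info_hash("sf-archive.tar", b"project-a" * 5000)
    new = info_hash("sf-archive.tar", b"project-a" * 5000 + b"project-b")
    print(old != new)  # the updated archive gets a different identifier
```

Because the identifier covers the whole payload, peers holding the old archive and peers holding the new one end up in separate swarms even when most of the bytes are shared.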

    • jaytaylor 11 years ago

      If the SF projects aren't being updated, then what's the issue? The information is, by definition, static.

frik 11 years ago

> Currently, for each cloned project, we mirror its CVS repository and its website.

Please add "SVN" (Subversion)

coliveira 11 years ago

This seems much more like a temporary fix, not really a solution. A few years from now GitHub can do the same thing that sf did. This after all seems to be the fate of commercial companies that explore open source, once they start to lose users to new competitors.

egsec 11 years ago

Note 1: Moving things to GitHub or elsewhere does not remove them from SourceForge. So SF can continue to host and enjoy links on unmaintained websites, search engines etc.

Note 2: If their business model is offering popular binaries and source, they can just copy these from other sites and repackage them. Open source software allows you to do this. If no one else is interested in bundling and monetizing, then they can buy traffic and still succeed.

Note 3: Remember that Academy Award-winning movie from 1943? Not so great in today's light. While perhaps one of the goals of the Internet and cheap storage is to keep a copy of everything, and it's often better not to re-invent the wheel, if something falls by the wayside and it's needed, it will be created.

Note 4: There are plenty of websites which catalog useful abandonware, that someone had to find a physical disk drive from. If the software has value, chances are someone will eventually repost it somewhere without a massive organized effort.

----

There is clearly value in moving some projects over to GitHub or elsewhere, but if some things are not migrated or moved, life will go on.
