DataJoy is shutting down
getdatajoy.com
Why don't companies feel comfortable "code dumping"? Just throw everything online as a tarball, and say "we aren't supporting this and we don't want to have anything to do with this, but here's the source."
DataJoy Co-founder here. A lot of the DataJoy code IS available (https://github.com/sharelatex/web-sharelatex/tree/datajoy). Our other product ShareLaTeX has an open-source version that you can run locally and is very similar to the version we host at sharelatex.com. DataJoy naturally shares a lot of code with ShareLaTeX (if you look at the two products, you'll see they're very similar). However, with DataJoy, we never got the product to a stage where we felt it made sense to invest time into 'good open source' (documentation, installation guides, etc), but the 'code dump open source' version has always been there.
The main thing that isn't open source with DataJoy is our backend for running code. At the moment this is so tied into Docker, S3, and how we deploy it in our infrastructure, that I don't think it would be much use to anyone else. The innovations here have been in how we deploy and provision it, not in the code itself.
If it's innovative, putting it up is still useful as a case study on how to perform said innovation. It's not bringing you any revenue anyway, so it's not like you'd lose anything. Just take out all of the keys, passwords, etc. from the repo!
I'm sorry to hear y'all are shutting down, I've been a big fan of Sharelatex for going on two years now and have always used it to build my resumes.
I think they are only closing down Datajoy.
Indeed, and one of the reasons is the success of ShareLaTeX means that it takes our team's whole attention to keep up with the growth of ShareLaTeX, and keep investing in feature development to keep up with demand. ShareLaTeX isn't going anywhere.
So that sounds like success (even though it's sad for datajoy users)! You guys tested the waters with 2 products, one has found product-market fit, and now you guys are focusing on growing that one.
> Why don't companies feel comfortable "code dumping"? Just throw everything online as a tarball, and say "we aren't supporting this and we don't want to have anything to do with this, but here's the source."
On every non-trivial project I have worked on, almost all of the following issues were present:
- It may contain configuration information.
- It may contain private keys or passwords.
- It may contain customer-specific code (if you maintain customer-specific features either via feature toggles or branches), which may leak information about your paying customers.
- It may have unintended copyright violations.
- It may contain software that is licensed in a way that makes it a copyright violation to distribute your software outside of your company (publishing it is distributing it). This may also apply if you distribute your sources without any (source or binary) parts of the proprietary dependency.
- It may fall under the export restrictions for cryptographic software (these have mostly been dropped, but not completely).
- It may directly or indirectly make your patent violations public (oh, you have them already, but nobody knows about them).
- It's of embarrassing quality.
- It may make it public that your company has defrauded its customers and/or users.
- It may make it public that your company has helped its customers commit fraud and/or other crimes (the RICO act makes this easier for law enforcement to follow up on).
I know this isn't a general rule, but based on my experiences with proprietary projects, even with the best of intentions most became a jenga-esque pile of hacks and shortcuts that few developers would want to show off in public even if the overall system works well.
> most became a jenga-esque pile of hacks and shortcuts that few developers would want to show off in public
Not just that they don't want to "show off", but even more that they don't think it will be useful.
It takes a non-zero amount of work to document a project such that it can be used and extended by someone who didn't write it.
I'm a proponent of publishing code that's useless, obsolete, buggy, poorly documented, etc, if the rights holders are so inclined. The reason is that I think it can still be useful as training data for, say, designing new programming languages/tools around coding patterns seen in the wild.
If you have a product people have paid money for, and say you're releasing it as open source because it isn't sustainable, the people looking for that code aren't typically interested in "training data".
I get your point that messy code may be useful for other purposes, but believe that the vast majority of users are looking for code to solve their problems, not "training data" or research.
That said, as you say, rights holders can do what they will.
Yep, I'm just saying that if someone can and wants to release their code, they shouldn't let "is this even useful to anyone?" stop them. It could turn out to be useful in ways that nobody has foreseen.
> The reason is that I think it can still be useful as training data for, say, designing new programming languages/tools around coding patterns seen in the wild.
Do you happen to have any evidence of that ever being done even once in the history of mankind?
It's a nice idea, but you should know by now the world isn't a slave to your desires.
Here is some evidence of mining source code for common structure and idioms. I didn't even have to dust off my bull whip: http://homepages.inf.ed.ac.uk/csutton/publications/idioms.pd...
I don't have evidence of this approach actually being used to inform language or tool design, but it's not like it's an outrageous concept that nobody's thought of before. Your hostility is perplexing.
Any examples of some successful open source dumps of proprietary projects? By dump I mean they just release the code with no docs or support.
The only one I can think of is id software, but they spent some time to clean it up I think. Also their products are games and already very popular.
No examples come to mind of someone just releasing a bunch of proprietary code without some effort involved. But people sometimes poke fun at the Apache Foundation as a place for companies to wash their hands of old code that they don't want to keep maintaining, and there are some success stories there.
As a part of a startup that just went under (by "just", I mean literally yesterday) they either can't release the code due to fiduciary responsibility to investors (it could be worth something, particularly if someone decides to dump more money in or the company somehow miraculously recovers) or because of contracts into which they have already entered with investors.
Even if they wanted to open source or just share the source, there could be some real effort involved in reviewing the code to ensure that nothing sensitive will be exposed.
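To make that review step a bit more concrete, here is a minimal Python sketch of a first-pass scan for obviously sensitive strings before a code dump. Everything in it is an assumption for illustration: the `SECRET_PATTERNS` table and the `find_secrets` helper are hypothetical names, and the three patterns are nowhere near a complete audit. It would catch only the most blatant cases, and a human review would still be needed.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
# A real pre-release review would need far more than this.
SECRET_PATTERNS = {
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    # AWS access key IDs are "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Quoted literals assigned to suspicious-looking names.
    "password_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key)\b\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look sensitive."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    sample = 'name = "demo"\npassword = "hunter2"\n'
    for lineno, name in find_secrets(sample):
        print(f"line {lineno}: possible {name}")
```

Note that a scan like this only looks at the current files; anything that was ever committed would still be sitting in the version control history, which is one reason a plain tarball of the latest tree can be the safer form of dump.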
Re the other comment about owning copyright on everything (e.g. due to components being licensed from other vendors), they could rip that out. Thinking about this triggered a memory: when the Descent 1 source was published, they excluded the sound library for this reason. Could be tough to excise something more integral though, I guess.
Often they don't own the copyright for everything they own. Also, wouldn't code be an asset to be sold in the event of liquidation?
Can you legally do this? Not sure how the legal terms around assets are normally structured...
If the code is your sole asset, wouldn't investors get access to it? This is kind of like lighting your office chairs on fire vs returning them to investors.
In this particular case, they have no external investors. They say in their announcement: "...but as a small company without external investment..."
I would be interested in hearing the opinions of people who've been in this situation, too. Seems like it could be beneficial to the community.
If my current company went under, I don't see who would put in the effort to sift through all 3rd party items we use for licenses to see if we're even allowed to do it.
Someone also owns the code, I guess, and they might want to use it for something else at some point.
Some products might have been hack jobs from the beginning that ballooned; they might work, but they could be total hack jobs that would be either A) worthless on their own or B) slightly embarrassing to show.
I've not been in the situation from a company perspective, but I have taken a product I made off the shelves, and the factors above came into play. I am sure there are more as well.
Was this 3rd party code that you bought or stuff you found on github? In the latter case, basically, you're saying that you didn't know and didn't care about compliance because no one could have found out?
> Was this 3rd party code that you bought or stuff you found on github? In the latter case, basically, you're saying that you didn't know and didn't care about compliance because no one could have found out?
That's a rather uncharitable interpretation.
I think it's more likely that they don't know if they're licensed to publish the code in question.
- Libraries included even before my arrival.
- Libraries licensed for usage but not for open sourcing unless X, Y and Z.
Etc.
I wish I had known about this. I've been learning the Python data analysis ecosystem recently and this would have been an excellent resource. Maybe visibility is your issue.
https://cloud.sagemath.com is pretty similar in functionality. (I work on SageMathCloud.)
I never hear about these companies until their shutdown announcement.
Perhaps if they spent more time becoming visible and getting people's attention, things would work out.
In our defence, this post has had more than twice the attention of our Show HN post! I think there's more drama in something like a shutdown and so it's more 'viral'. Getting people to take interest in a new product is much harder.
(Also, HN is not really a good target audience for us, so if you get news from here, it makes sense you'd see this but not the product itself).
With thousands and thousands of startups competing for your attention, it's not that easy!
In many markets, there are several well-funded startups that can afford to spend millions on marketing. That makes it even harder to get visibility as a bootstrapper.
I don't think that's fair. We live in an attention economy, where the rare few are good at getting noticed.
I'm curious what the founders / anyone else think went wrong? Especially compared to ShareLaTeX
The short answer is we didn't find product/market fit. It made some people happy, and was useful to some people, but it didn't make people go out and tell everyone they know to start using it. ShareLaTeX, on the other hand, was growing organically and had people singing its praises even when it would sometimes randomly lose 30 minutes' worth of your latest changes... (yes, really! That's very much fixed now though, don't worry). ShareLaTeX just filled a much deeper need for people. There are so many other Python/R options out there that we never filled a deep need with DataJoy.
The exception to that is in teaching. It did fill a big need there, but we never managed to make the business model work (long high touch sales cycles, but universities only willing to pay very low prices per class). We also never found a growth model for this.
If people are looking for a similar service I have used http://dataquest.io and I really have liked it.
They have a wonderful example library https://www.getdatajoy.com/examples/ that would be too bad to lose if they eventually shut down their site.
Anybody know of any similar alternatives to DataJoy? Something that basically just has an R or Python environment online. I've been using DataJoy, at least a little bit, basically every day for the past 6 months and I'm sad to see it go.
SageMathCloud (https://cloud.sagemath.com) using a Jupyter notebook with the R kernel, or a Sage worksheet in R mode. (Disclaimer: I work on this.)
SageMathCloud is somewhat similar in functionality to DataJoy + ShareLaTeX. ShareLaTeX is by the same people as DataJoy, and I think ShareLaTeX is not shutting down anytime soon. I had always wondered why they built DataJoy as a separate product, rather than just expanding the functionality of ShareLaTeX. In the case of SageMathCloud, I built something more like DataJoy first, then expanded the functionality to cover LaTeX typesetting, rather than making a separate product. Also, SageMathCloud is 100% open source.
I recently posted https://news.ycombinator.com/item?id=12169979 here which was really about having several separate products using similar technology but different names, versus having one big product.
DataJoy is also similar to http://sagecell.sagemath.org/, which requires no sign in and lets you run Python (and much more) directly from a website, is pretty battle tested at this point, and is entirely free to use and open source. Disclaimer: I pay for some hosting of SageCell.
Kaggle has a free R/Python/Julia/Jupyter environment with lots of code to browse. See:
https://www.kaggle.com/kernels https://www.kaggle.com/datasets
Yeah, I was using them for a while, but that's more of a paid tutorial than an open environment, right?
Ah, that sucks!
This is the first time I've heard of this – had I known a few weeks or months ago, I'd have jumped at the opportunity to learn how to use R.
Homejoy, Datajoy, killjoy?
Maybe the clue is in the name!