InfiniSQL

49 points by ahassan 12 years ago · 60 comments

A few things (I'm the author of InfiniSQL)

1) I include keystore-like stored procedures in the source. They do get/set with integer key and string val. I haven't done thorough benchmarking, but I expect them to outperform the other benchmark I've published, which is a considerably more complex workload

2) (camus2) agreed, nothing ever dies in IT. But roll back the clock a few years. How much noSQL would have come into existence if there had been a free xyzSQL that scaled across nodes, was fast, etc.? I believe the answer is that there'd be very few network-based noSQL systems for operational workloads if that had been the case.

3) jwatte: Yeah! Jagged edges too!

4) stephen24: Also, I intend to change the license from AGPL to GPL next time I push out some code. No excuse not to try it out.

5) siliconc0w: There's an architectural write-up at High Scalability: http://highscalability.com/blog/2013/11/25/how-to-make-an-in... -- I believe that the actor model architecture is distinct in InfiniSQL.

6) diwu1989: Yes and no. Yes, MemSQL is more mature. No,

(a) I'm not sure how MemSQL scales horizontally (especially since that was a feature added after v1 of their code was released), and,

(b) MemSQL isn't free software

7) itsbits: for now InfiniSQL is mainly for hackers and early adopters--the dependencies are pretty clearly documented but it requires some effort to work with in its current state

jsmthrowaway 12 years ago

Please consider the Apache License or some other license instead of the GPL. There are many organizations that cannot use any flavor of GPL, including LGPL, for legal reasons. You can debate the wisdom of that amongst yourselves, but alas, that's how it is in some places.
(And I really want to try this...)
- DannyBee 12 years ago
  
  "There are many organizations that cannot use any flavor of GPL, including LGPL, for legal reasons"
  To be clear, there are no legal reasons I can think of that would ever prevent internal use of LGPL/GPL software.
  You mean these companies (Apple, for example) have policies.
  Policies like this often change because someone decides the cost vs risk tradeoff is worth it.
  Changing a license because of bad policies of certain companies is not a great reason to change a license (in fact, it's, IMHO, an actively bad one).
  You really should only change licenses if you find the license you chose does not suit the needs of your users (and policies are not really needs).
  - philwelch 12 years ago
    
    I find that to be a strangely ideological response. Your prospective users' requirements are up to them to decide, not up to you. They're the ones who are going to decide whether or not to use your software.
    
    DannyBee 12 years ago
    
    ?? Of course they are up to the users to decide, but policies and needs are different. I'm curious, how do you think policies like this change?
    Most of the developers i've seen will happily sell you a commercial license if you don't like the license. After paying for it enough, most companies start to ask "well, actually, how risky is this, really?", and this is how policies change.
    In any case, my other point stands - there are no actual legal reasons to not use LGPL/GPL software internally. It would have zero legal impact.
    
    philwelch 12 years ago
    
    If InfiniSQL was an established incumbent where the choice was between living with GPL and buying a commercial license I would agree with you, but it's a newcomer where the main choice is whether to use it at all.
- mtravis 12 years ago
  
  I assume these shops have Linux in their environments, including the GNU toolchain. There must be some contradiction somewhere that I'm not aware of.
  Based on FSF feedback, I'm going to modify the license to include a Classpath-like exception. The intention is to allow people to write stored procedures that link against infinisql without triggering the copyleft. Only if the source to infinisql itself is modified (and distributed) will the copyleft apply.
  I'm curious to know the rationale against the GPL in general (not just the AGPL), and how those shops allow Linux & gnu toolchains in spite of their rule against the GPL.
  - philwelch 12 years ago
    
    Generally, Linux and the GNU toolchain are carefully managed exceptions and there is massive commercial pressure against continuing to use anything GPL-licensed. Linux itself is strong enough to hold out against this pressure, but other things like GCC are not, which is why there is so much work being invested into LLVM/Clang.
    
    DannyBee 12 years ago
    
    "massive commercial pressure against continuing to use anything GPL-licensed"
    I actually generally see just the opposite - even automakers, who are traditionally stalwarts about anything, are now starting to use GPL software in cars.
    "which is why there is so much work being invested into LLVM/Clang"
    This is a weird opinion, that i've seen a few times.
    This is not why LLVM was/is chosen, AFAIK. LLVM was/is chosen for greater control over destiny, a better platform, and a better community.
    If LLVM was GPL there is exactly one company that would theoretically stop contributing (admittedly, it's been about 2 months since i calculated the list of companies that contributed in the past year). I doubt that would actually happen, too (mainly because I asked once if they would)
    I was just at an LLVM social this evening, and not a single person there worked for a company that chose LLVM because of "massive commercial pressure against the GPL".
    
    philwelch 12 years ago
    
    LLVM may have been a poor example, but I'm not sure that justified downvoting my comment when there are, in fact, lots of companies with more restrictive policies against use of GPL software vs. other licenses. (Not even necessarily contribution, but even use). That GPL is allowed at all is a result of the fact that there are some essential GPL licensed projects with no good alternatives, like Linux. InfiniSQL is not one of them.
    
    stormbrew 12 years ago
    
    Considering that the adoption of llvm and clang came with Apple freezing the version of gcc it used/distributed at a version from before a particularly notable GPL version was applied to it (with other GNU projects, like bash, being similarly frozen), it would take a hell of an alternate explanation to dislodge the notion that Apple's endorsement of the project was significantly related to licensing.
    
    DannyBee 12 years ago
    
    This is an interesting rewrite of history. Chris gave a presentation at Apple and told them he could make a better compiler[1], and they hired him to work on LLVM (plus started to grow his team) well before they decided to stop contributing to GCC. There used to be parallel teams, each with about the same number of people. Hell, for a while, the LLVM team had less.
    Apple was frustrated they couldn't get what they wanted out of GCC, and were getting patches and designs constantly rejected. This, combined with getting control over their destiny, and having serious needs for a modular compiler frontend for XCode (and their design for a compiler server/etc for GCC got shot down), plus Chris demonstrating good performance results + trajectory, led to them choosing LLVM.
    But what do I know - I was there, in both communities, talking to the people who were involved in these decisions.
    Realistically, if the SVP/VP in charge of Apple's developer tools had decided GCC was still the way to go, they would be working on GCC, GPLv3 policy or not.
    Policies are not an end unto themselves.
    All of this is completely orthogonal to the freezing of the GCC version. They could get what they wanted out of it in the pre-GPLv3 versions, given their future plans were LLVM based anyway, so they didn't make an exception for GCC when they banned GPLv3.
    Of course, I'm not going to claim that apple didn't do other things more for licensing reasons, a lot of which can be explained by the desire to be able to share code between OS X and iOS in some places (and eventually, in a lot of places), and GPLv3 would have disastrous effects if they messed up. They calculated the eng cost, came up with "we have good alternatives, and can rewrite the rest", and did that, and banned GPLv3. However, they were making exceptions for years for certain pieces of software already. So if you had chosen any other example than LLVM, i'd probably agree with you. LLVM is just not a great example of "commercial pushback against GPL".
    Apple's dropping of Samba would be a good example, since that is directly the reason they dropped Samba.
    [1] One of my GCC friends walked out of this presentation complaining that he was selling them a bill of goods. Of course, he turned out to be wrong, but ...
    
    stormbrew 12 years ago
    
    I don't really think I've rewritten anything. Is anything factually wrong with my post? GCC's version in OSX' dev tools, along with bash and afaik all gnu binaries, was frozen at the last version available with gpl v2. This is plain fact.
    The conclusion I've drawn from it is that GPLv3 was a significant driver in the decision to seek out and drive forward a non-GPL compiler project. I didn't say it was the only factor, but I stand by my conclusion that it must have been a significant one.
    
    DannyBee 12 years ago
    
    The rewrite is the part where you continue to claim it was driven at all by licensing
    "The conclusion I've drawn from it is that GPLv3 was a significant driver in the decision to seek out and drive forward a non-GPL compiler project. I didn't say it was the only factor, but I stand by my conclusion that it must have been a significant one."
    I believe i've completely rebutted this statement with my response. I believe I accurately explained exactly what went into the decision to fund and use LLVM, and "seeking out and driving a non-GPL compiler project" was literally not on the list of things the decision makers (Ted, in this case) cared about.[1] If you have actual historical evidence to the contrary, that contradicts my explanation of what drove the decision to use LLVM, i'd love to hear it. So far what you've put forth is a single data point which I already explained, was, AFAIK, completely unrelated to the decision to use LLVM.
    Also, Apple/Chris first suggested merging LLVM and GCC (http://gcc.gnu.org/ml/gcc/2005-11/msg00888.html), which would seem an odd strategy if licensing was the huge driver you claim it was.
    Historically, the timeline isn't even close to right for your conclusion to be correct. Apple started seriously investing in LLVM in 2005, and the GCC GPLv3 switch didn't happen until 2009.
    So, basically, you are welcome to stand by your conclusion, but it's, well, wrong :)
    [1] In fact, Ted literally did not care about the licensing at all. They were considering using ICC as well, but this mostly got dropped after the switch to x86.
    
    stormbrew 12 years ago
    
    The timing is a fair point, though I think you ignore one factor: While gcc didn't switch to gplv3 until 2009, gplv3 was released in draft form in 2006.
    So I'll concede that it likely wasn't a direct cause of the move to and support of llvm, but clang didn't come along until later -- and after it must have been clear that gcc would eventually be gplv3 licensed.
    God knows GCC's codebase is a rats nest that few people really want to work with, but if it hadn't been for gplv3 do you really maintain that apple wouldn't have stuck with it as a frontend for longer? The early releases of clang (and gcc+llvm before it) were problematic for a lot of mac developers at the time, after all.
    
    DannyBee 12 years ago
    
    "but if it hadn't been for gplv3 do you really maintain that apple wouldn't have stuck with it as a frontend for longer? "
    Yes. Absolutely. It's been mentioned numerous times at conferences and other in-person meetings.
    Apple did not write clang because of GPLv3. They wrote clang because they needed something that was
    1. Faster than GCC
    2. Offered better diagnostics
    3. Could offer code completion and indexing for XCode.
    
    mtravis 12 years ago
    
    Thank you, Phil. I'm conflicted about this--I was recently convinced to move away from AGPL because of seemingly legitimate acceptance issues that I was previously unaware of. I feel good about using GPL instead of AGPL.
    But I'm conflicted about GPL vs Apache (or BSDish) in the sense that I'm getting the message that I have to bend over backwards just a little bit further before somebody, somewhere might be willing to use my software, maybe. Free isn't enough. I also have to let them fork it, keep it proprietary, wrap their own brand around it, before maybe they might consider using it.
    That said, I really want people to use it, and of course help me hack on it. But I'm conflicted.
    
    tracker1 12 years ago
    
    I say keep it GPL.. AGPL may be too far for many companies.. but GPL should be fine for the core product. As long as any protocols are well documented, and client libraries are under more permissive licenses, I don't see an issue with it.
    
    mtravis 12 years ago
    
    Cool. The wire protocol is PostgreSQL's, so they provide the clients (BSD).
    
    philwelch 12 years ago
    
    You can do what you want because it's your software. But from the open source policies I've seen companies use, there are generally three lists of licenses. The first list is "you can use any open source software that follows these licenses". BSD, MIT, Apache, etc. are on this list. The second list is "you have to get approval from Legal to use software with these licenses but we would generally prefer for you not to." GPLv2 is generally on this list. The third list is "don't even think about it", and GPLv3 and AGPL are on this list.
    My impression is that the second list exists solely because there exists GPLv2 licensed software with no viable alternatives to it. Unfortunately, your project is not one of them. It's your project so you can do whatever you want, but GPL is an obstacle to adoption in industry.
    
    DannyBee 12 years ago
    
    I'm very curious where you are getting this info.
    I own open source licensing policy at one very large company (which doesn't really work like you suggest), and am in contact with about 50-100 other open source counsel on a regular basis, and the only software most ban is AGPL (and a few other licenses which we aren't talking about here, as they are wildly uncommon).
    Most companies also do not treat GPLv2 and GPLv3 differently from a licensing perspective, only those that ship embedded devices do.
    At least, this is my experience. I'm curious where yours is coming from.
    
    philwelch 12 years ago
    
    I'm a developer at a company that uses a lot of open source software, and I've spoken to other developers at other companies as well. Where I work, it's required to get legal approval to use GPL software. Software under more permissive licenses may be imported and used freely at the discretion of the development team. So in my experience, InfiniSQL faces a much higher barrier to adoption due to GPL.
    Perhaps you work with companies where adopting technology stacks is more of a top-down decision where legal counsel is always involved. In those situations, GPL doesn't pose a particular barrier because all open source software faces that same barrier. But some companies give more autonomy to their developers, and in those cases there's a difference in overhead when managing GPL compliance.
    
    DannyBee 12 years ago
    
    "I'm a developer at a company that uses a lot of open source software, and I've spoken to other developers at other companies as well. Where I work, it's required to get legal approval to use GPL software. Software under more permissive licenses may be imported and used freely at the discretion of the development team. So in my experience, InfiniSQL faces a much higher barrier to adoption due to GPL."
    Interesting. We use about 8000 open source packages, and add roughly 90 a week right now.
    "Perhaps you work with companies where adopting technology stacks is more of a top-down decision where legal counsel is always involved. In those situations, GPL doesn't pose a particular barrier because all open source software faces that same barrier. But some companies give more autonomy to their developers, and in those cases there's a difference in overhead when managing GPL compliance."
    Actually, i work at a company (Google) where autonomy is given. People are free to use basically anything but AGPL. We simply tell them what will be required of them if they use it, and enforce that this happens.
    The overhead of GPL compliance is not any more than the overhead of any other license compliance, for us, in practice.
    You still have to do stuff for BSD and MIT anyway, so you need a process that knows what is going into shipping software.
    The short version is that:
    Overhead is kept low by doing it as part of the same check-in process as any other source code (IE you don't fill out some magical form and send it to lawyers), among other things.
    Shipping time is simple verification that nothing changed (and the build system will verify it anyway).
    My experience is that companies find GPL compliance overhead higher because they aren't doing the right thing for other licenses anyway. In particular, they never produce correct attribution for MIT/BSD/etc, so having to do "anything at all" is higher overhead.
    This experience comes from reviewing a large number of companies for acquisition :)
    
    philwelch 12 years ago
    
    Google has a lot more top-down technical mandates than some companies (programming languages for instance), and I'm not surprised they're more GPL-friendly either.
    I'm willing to concede that GPL is not a disadvantage to adoption by Google or any of the hundreds of startups Google might acquihire in a given quarter :)
    
    belorn 12 years ago
    
    It's always funny when people include the Apache license like that, given that one of the two significant changes made between GPLv2 and GPLv3 was copying the Apache license text regarding patents into the GPL. Sure, they did add a clause about patent agreements, but that is only relevant if you have patent agreements.
    The second change from GPLv2 to GPLv3 is the DRM clause, or the "you can't use a technical method to bypass the legal requirements". Again, only really relevant if the company uses DRM, but would be willing to use GPLv2. That is a very short list.
- zobzu 12 years ago
  
  So many reasons to keep GPL. They can use GPL just fine, it's just that they don't wanna contribute if they modify it.
  - justin66 12 years ago
    
    > So many reasons to keep GPL. They can use GPL just fine, it's just that they don't wanna contribute if they modify it.
    More charitably, they don't want to be _legally obligated_ to contribute if they modify it.
tintor 12 years ago

Regarding MemSQL: - we have just released v2.5 with full support for JSON and online ALTER TABLE across cluster - MemSQL performs great on both OLAP and OLTP - it scales well: we have several hundred node cluster in production at Zynga - license cost for startups is $1
- mtravis 12 years ago
  
  Congratulations!
  Do you have benchmark reports?

jacob019 12 years ago

I'm supposed to use the perl api for user and schema management? Perl holds a special place in my heart, but I'm not too excited about managing my database with it. How about an interactive console?

I'm currently using MySQL, how similar is the SQL syntax?

mtravis 12 years ago

On backlog to fix. But InfiniSQL is for hackers and early adopters at this stage.
The SQL support is documented (http://www.infinisql.org/docs/index/)
- coolsunglasses 12 years ago
  
  Hackers and early adopters are using Perl in 2013? Sure you aren't off by 12-15 years?
  - mtravis 12 years ago
    
    This you? http://favstar.fm/users/hipsterhacker
    Also, the main application is in C++. A python script launches the C++ daemons. Perl scripts are quick and dirty tests and deployment scripts. The main hacking I'm looking for is with C++, and I don't care so much if the other stuff gets re-implemented in some other language.
    
    coolsunglasses 12 years ago
    
    Nope, just a guy that fucks with databases.
    No API, got it.
- jacob019 12 years ago
  
  Awesome project and a killer concept. No one has been able to really solve relational database scalability yet. I'll have to study the implementation. I was just talking with some friends a few weeks ago about this problem and we concluded that if someone came up with a distributed relational database with decent scalable performance they would be very successful indeed. Will try it out and follow the progress. Hope it takes off.
  - diwu1989 12 years ago
    
    Have you tried Vertica? One of the big data projects my team did used more than 200 servers in a single Vertica cluster. At the enterprise OEM level, the pricing is actually really affordable. You should try out Vertica Community Edition, the free 3 node version.
    
    mtravis 12 years ago
    
    Vertica's a data warehouse. InfiniSQL is geared for OLTP.
    ---------
    Thanks, jacob019. Please fork/follow on github, twitter if you're into that, etc.
- arnorhs 12 years ago
  
  Did you mean to link to http://www.infinisql.org/docs/index ? I was getting an error on /docs/
  - mtravis 12 years ago
    
    Thanks, edited.

camus2 12 years ago

I believe the original subtitle is "Extreme Scale Transaction Processing". "The NoSQL killer" is kind of childish, nothing is going to kill anything.

yeukhon 12 years ago

Same thought, and with it being at an early stage, ugh. And there go at least a dozen competitors out there trying to be different from MongoDB. I am just sort of happy that in the SQL world we usually look at either MySQL or PostgreSQL (well, Oracle and SQL Server are probably more relevant to corporate web services)... but I think people are trying to migrate too.
- tracker1 12 years ago
  
  I think that even in a NoSQL-driven domain, a classic SQL-based RDBMS has a place. Certain types of load have acceptable levels of relaxed constraints.. and those can increase when your data is searched/read over 1000 times for every write. Joins are expensive, and even mirroring data to a NoSQL store has benefits over a pure RDBMS.
  I like document stores like MongoDB and RethinkDB and feel they are a great fit for most scenarios. I also feel that caching layers with Redis or Memcached can help...
  Cassandra is interesting in the primary storage space as well, and imho has resolved a lot of issues, while others remain. I'm interested to see if this database can get there faster than Cassandra/CQL can get to more parity with traditional SQL systems.
  While I appreciate the options, there is no one solution for everything... If you never break 100 simultaneous users, memory-mapped flat files and map/reduce could be sufficient.
ashah 12 years ago

sensationalism sells, probably why your "original" link was missed by poster

wimpycofounder 12 years ago

So...uh...how does it work? Anyone know if there is an architecture overview somewhere? And why there isn't a link to it on the damn front page?

jfim 12 years ago

From their documentation:
> InfiniSQL currently is an in memory database. This means that all records are stored in system memory, and not written to disk. This provides very high performance--but it also means that InfiniSQL currently lacks the property of Durability. If the power goes out, all data is gone. This limitation is temporary.
They do mention that they'll implement persistence, but that's likely to lower performance, as you're limited to how fast the write ahead log can be written, even if updates to on-disk structures are batched.
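To make the batching point concrete, here is a minimal sketch of a write-ahead log with group commit, assuming a single writer and a plain local file. Nothing here comes from InfiniSQL (which currently has no WAL at all); `GroupCommitLog`, `replay`, `demo` and the record format are hypothetical names picked for illustration.

```python
import os
import tempfile


class GroupCommitLog:
    """Append-only log that fsyncs once per batch instead of once per record."""

    def __init__(self, path, batch_size=4):
        self._file = open(path, "ab")
        self._batch_size = batch_size
        self._pending = []
        self.fsync_calls = 0

    def append(self, record: bytes) -> None:
        # Buffer the record; it is NOT durable until flush() has returned.
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        # Length-prefix each record so the log can be read back later.
        for record in self._pending:
            self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()
        os.fsync(self._file.fileno())  # the expensive step, paid once per batch
        self.fsync_calls += 1
        self._pending.clear()

    def close(self) -> None:
        self.flush()
        self._file.close()


def replay(path):
    """Read back every record that reached the log file."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records


def demo():
    """Write ten records in batches of four; return them and the fsync count."""
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    log = GroupCommitLog(path, batch_size=4)
    for i in range(10):
        log.append(f"update row {i}".encode())
    log.close()
    return replay(path), log.fsync_calls
```

The trade-off is the usual one: a transaction can only be acknowledged after the batch containing its record has been synced, so larger batches buy throughput at the cost of commit latency.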
They also mention:
> No sharding is necessary with InfiniSQL: it partitions data automatically across available hardware. Connect to any node, and all of the data is accessible.
I haven't looked at how joins are done across large tables that span over multiple nodes (or if it's even supported), but that's not likely to be fast either, for obvious reasons.
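For anyone wondering what "partitions data automatically" could look like, here is a rough sketch assuming plain hash partitioning on an integer key with string values. The docs quoted above don't say how InfiniSQL actually places rows, so `PartitionedStore`, `node_for` and the node names are made up for illustration.

```python
import zlib


class PartitionedStore:
    """Toy key-value store that routes each key to one of several nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # One in-memory dict per node stands in for that node's storage.
        self.data = {node: {} for node in self.nodes}

    def node_for(self, key: int) -> str:
        # crc32 gives the same answer on every run and every machine,
        # so any node can compute where a key lives without coordination.
        digest = zlib.crc32(str(key).encode())
        return self.nodes[digest % len(self.nodes)]

    def set(self, key: int, value: str) -> None:
        self.data[self.node_for(key)][key] = value

    def get(self, key: int):
        return self.data[self.node_for(key)].get(key)


store = PartitionedStore(["node-a", "node-b", "node-c"])
for k in range(100):
    store.set(k, f"val{k}")
```

Since every node can evaluate `node_for` locally, a client can connect anywhere and have the request forwarded to the owner. Note that a simple modulo scheme like this reshuffles most keys when the node count changes; a real system would need something smarter.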
- mtravis 12 years ago
  
  1) persistence: battery-backed UPS and synchronous replication. No WAL anywhere. I'm thinking about ways to do disk-based storage without synchronous IO, to provide decent performance with higher storage capacity
  2) no joins supported yet. However, the benchmark that I performed (on the blog) involves 3 updates across random nodes. I designed InfiniSQL specifically to perform multi-node transactions very well, because that's the Achilles' heel of every other distributed OLTP system. I plan to implement joins, but expect them to perform decently for the workload you describe.
  - jfim 12 years ago
    
    Gotcha, it's for OLTP, don't know how I missed that.
    Should be quite easy to do equijoins especially if you're joining a couple thousand rows at most at a time; it only gets hairier when you're joining all records of very large tables that don't necessarily fit in memory, which is not very OLTP-y.
    With regards to persistence, I'm really curious to hear how you're planning to have durability without writing something to disk on every transaction. It could work if you're relaxing the definition of durable to mean written to memory on at least $n$ nodes, though that's likely to be surprising to someone with a stricter definition of durable.
    Edit: By the way, it's really cool that you have a C++ implementation of actors, I'll have to look into it. Have you thought about turning that into a library?
    
    mtravis 12 years ago
    
    For durability, check out http://www.infinisql.org/docs/overview/#idp37053600
    I've thought about having an actor library, or minimally, to have the actor basis of InfiniSQL independent of specific workload, but haven't thought it through entirely. I'd be supportive of any efforts to that effect if you want to work on it!
sb057 12 years ago

Front page > Documentation > Overview
It practically is on the front page.

jbellis 12 years ago

Last week's discussion here: https://news.ycombinator.com/item?id=6795263

siliconc0w 12 years ago

Can you compare InfiniSQL to existing in-memory clustered relational database solutions like Galera?

diwu1989 12 years ago

I see this as fairly similar to memSQL, but less mature.

diger44 12 years ago

I actually thought this was another joke at first...

stephen 12 years ago

"Not just a teaser version". Nice!

glibgil 12 years ago

It uses 2pc so it won't really scale.

mtravis 12 years ago

I think you mean 2PL.
It does really scale, check out the benchmark report on the blog. http://www.infinisql.org/blog/2013/1112/benchmarking-infinis...
For deadlock-prone workloads, it will likely not be as good, admittedly.
I'm considering a variation on MVCC that gets around the single transactionid bottleneck, but the current implementation is based on 2PL. http://www.infinisql.org/docs/overview/#ftn.idp37098256
For concurrency management algorithms, there are no good ones. Only those that are less bad than others in some cases.
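Since 2PL and 2PC get mixed up a lot: 2PL (two-phase locking) is about concurrency control, not commit coordination. Below is a minimal single-process sketch of strict 2PL, assuming non-blocking lock requests. It is not InfiniSQL's implementation; `LockManager` and `Transaction` are hypothetical names.

```python
class LockManager:
    """Tracks shared and exclusive locks per key; requests never block."""

    def __init__(self):
        self._shared = {}     # key -> set of txn ids holding a shared lock
        self._exclusive = {}  # key -> txn id holding the exclusive lock

    def try_lock(self, txn_id, key, exclusive):
        owner = self._exclusive.get(key)
        if owner is not None and owner != txn_id:
            return False  # someone else is writing this key
        if exclusive:
            other_readers = self._shared.get(key, set()) - {txn_id}
            if other_readers:
                return False  # someone else is reading this key
            self._exclusive[key] = txn_id
        else:
            self._shared.setdefault(key, set()).add(txn_id)
        return True

    def release_all(self, txn_id):
        for key in [k for k, o in self._exclusive.items() if o == txn_id]:
            del self._exclusive[key]
        for key in list(self._shared):
            self._shared[key].discard(txn_id)
            if not self._shared[key]:
                del self._shared[key]


class Transaction:
    """Growing phase: take locks. Shrinking phase: release them all at commit."""

    def __init__(self, txn_id, locks):
        self.txn_id = txn_id
        self._locks = locks
        self._shrinking = False

    def lock(self, key, exclusive=False):
        if self._shrinking:
            raise RuntimeError("2PL violation: lock requested after release")
        return self._locks.try_lock(self.txn_id, key, exclusive)

    def commit(self):
        # Strict 2PL: every lock is held until the transaction finishes.
        self._shrinking = True
        self._locks.release_all(self.txn_id)
```

A `False` from `lock` leaves the caller to wait, retry or abort, which is where deadlock-prone workloads start to hurt.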
- MichaelGG 12 years ago
  
  Have you given any more thought to ... not multithreading it? Since you're scaling across servers, apply the same concept across cores. Presto, no more bottleneck on atomically incrementing an ID.
  - mtravis 12 years ago
    
    Good thinking, but I think that shifts the issue--namely, that each inter-thread message uses atomic compare and swap to create the message. I assume there'd be a similar bottleneck on the actor that generates the transactionid limited by the number of messages it can send & receive.
    Instead, a friend and I have been thinking about how to perhaps modify MVCC to work with distinct transactionid's per partition. Namely, I'm already generating what I call "subtransactionid"'s for each partition involved in a transaction. And those must be ordered for synchronous replication, so I think the way to implement a variation on MVCC may already be mostly there.
    I know I still owe you an architectural doc...fixin' ta, ya know.

itsbits 12 years ago

so many dependencies to install...

jwatte 12 years ago

Oooh! Shiny!
