32-bit version of KDB+ is now free for commercial use
kx.com
Check out Arthur Whitney's abridged manual for fun:
For anyone using the 32-bit version (from the Limits section):
22 Limits
Each database runs in memory and/or disk map-on-demand -- possibly partitioned. There is no limit on the size of a partitioned database but on 32-bit systems the main memory OLTP portion of a database is limited to about 1GB of raw data, i.e. 1/4 of the address space. The raw data of a main memory 64bit process should be limited to about 1/2 of available RAM.
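The "1/4 of the address space" figure works out as follows for a 32-bit process:

```latex
2^{32}\ \text{bytes} = 4\ \text{GiB}, \qquad \tfrac{1}{4} \times 4\ \text{GiB} = 1\ \text{GiB}
```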
This has come up here before, and the recent GNU APL news reminded me. In summary: if you have ever been curious about APL, or mildly suspicious of more conventional database approaches, you owe it to yourself to look at the concepts at work here, especially the primacy of columns over rows.
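To make "primacy of columns" concrete, here is a small C sketch contrasting a row layout (array of structs) with a column layout (struct of arrays). The trade fields are hypothetical; the only point is that a column-wise aggregate scans one contiguous array instead of striding over whole records.

```c
#include <stddef.h>

/* Row-oriented layout: each record's fields sit next to each other. */
struct trade_row {
    long long time;   /* hypothetical fields for illustration */
    double price;
    long long size;
};

/* Column-oriented layout: each field is its own contiguous array. */
struct trade_cols {
    size_t n;         /* number of rows */
    long long *time;
    double *price;
    long long *size;
};

/* Summing one field over rows steps across whole records... */
double total_price_rows(const struct trade_row *rows, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += rows[i].price;
    return total;
}

/* ...while the columnar version reads a single dense array. */
double total_price_cols(const struct trade_cols *t) {
    double total = 0.0;
    for (size_t i = 0; i < t->n; i++) total += t->price[i];
    return total;
}
```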
The & "where" operator in raw k has stayed with me over the years as a particularly inspired way to deal with column based data.
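A minimal C sketch of what monadic & ("where") does in k: given non-negative counts, it repeats each index that many times, so a 0/1 vector yields the indices of the true entries (in k, &0 1 0 1 gives 1 3). filter_greater is a hypothetical helper showing the mask, where, index pattern on a single column.

```c
#include <stdlib.h>

/* Sketch of k's monadic & ("where"): emit index i counts[i] times.
   For a 0/1 mask this is "the indices where the condition holds".
   The caller frees the result. */
size_t *where(const unsigned *counts, size_t n, size_t *out_len) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) total += counts[i];
    size_t *idx = malloc((total ? total : 1) * sizeof *idx);
    if (!idx) { *out_len = 0; return NULL; }
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        for (unsigned c = 0; c < counts[i]; c++) idx[k++] = i;
    *out_len = total;
    return idx;
}

/* Hypothetical column filter in the k style: build a 0/1 mask,
   take "where" of it, then index the column with the result. */
size_t filter_greater(const double *col, size_t n, double limit, double *out) {
    unsigned *mask = malloc((n ? n : 1) * sizeof *mask);
    if (!mask) return 0;
    for (size_t i = 0; i < n; i++) mask[i] = col[i] > limit;
    size_t m;
    size_t *idx = where(mask, n, &m);
    for (size_t k = 0; k < m; k++) out[k] = col[idx[k]];
    free(idx);
    free(mask);
    return m;
}
```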
For those of you curious about array-based / columnar programming languages, there's an APL/J/K reddit: http://www.reddit.com/r/apljk
It came up here before, but this time is different: it is now free for commercial use and is not restricted by timeouts or expiry.
Careful with these guys. I once built an open source implementation of the q language, and these guys immediately threatened to sue me, my employer, and our clients. The language is not that interesting, it's easy to reproduce, and these guys will threaten you if you prove this.
> The language is not that interesting
I would say the language is very interesting. It is probably not interesting enough to get sued for, though ....
I suspect times have changed - there are implementations that have been out there for years (https://github.com/kevinlawler/kona implements k3 with sprinkles of k4, and http://althenia.net/kuc implements an almost-k4 with a JIT and writable closures).
IIRC, when you did your implementation, k4 was still a "technology preview" and not their main product (or had just been released). I remember understanding the panic behind those actions, even though I totally disagree with them. (I didn't know about the threats, but I do remember seeing your implementation appear and disappear within a day, and assumed something was happening behind the scenes.)
and now they'll sue you for libel ;-)
For the unindoctrinated, KDB+ is an extremely fast, column-oriented, in-memory database. It's based on a language called Q and has been used at many banks to store exchange-related data.
The syntax is hard to read and easy to make mistakes in, considering how it overloads every letter of the alphabet as a command, but I think the extreme speed pays off.
K is only hard if you try to read it without first studying it. Looping is achieved through adverbs. The key to it is understanding what is a noun (data), verb (operator/function) and adverb (takes a verb, creates a new verb to be used infix). A verb with a noun to its right is a dyad if there is also a noun to its left, and is otherwise a monad. If it is needed, the monad can be specified by appending a colon to the right of the symbol. Fortunately, most kdb+ developers program in Q, which has a bunch of helper routines defined in k, and assigns monads to names such as neg x instead of -:x.
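To illustrate the noun/verb/adverb split, here is a C sketch that models verbs as function pointers and two adverbs, over (/, a fold, as in +/1 2 3 4) and each ('), as higher-order functions. C has no closures, so these take the verb as an extra argument rather than deriving a new verb the way k does; all names here are illustrative only.

```c
#include <stddef.h>

typedef long long (*dyad)(long long, long long);  /* verb with left and right nouns */
typedef long long (*monad)(long long);            /* verb with only a right noun */

long long plus(long long x, long long y) { return x + y; }
long long times(long long x, long long y) { return x * y; }
long long negate(long long x) { return -x; }      /* like k's -: or q's neg */

/* "over" (/): fold a dyad across a vector; assumes n >= 1. */
long long over(dyad f, const long long *v, size_t n) {
    long long acc = v[0];
    for (size_t i = 1; i < n; i++) acc = f(acc, v[i]);
    return acc;
}

/* "each" ('): apply a monad to every item of a vector. */
void each(monad f, const long long *v, size_t n, long long *out) {
    for (size_t i = 0; i < n; i++) out[i] = f(v[i]);
}
```

With v = 1 2 3 4, over(plus, v, 4) plays the role of +/v and over(times, v, 4) the role of */v.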
It's the same thing I said in the recent J discussion: J (and probably Q) is meant to be read with the help of a computer. Reading and writing J consists of incrementally building and decomposing expressions in a REPL. You have wonderful tools to visualize expression structure in the REPL, and you are expected to use them and to experiment with the expressions. You're not supposed to read it as prose; don't even try.
There is also an SQL-like interface in addition to the Q and K languages. It is probably an easier way to get started than diving into Q, if that is too daunting.
You're incorrect. No individual letters of the alphabet are commands. Every symbol on the keyboard however is an operator (excluding semicolon, braces, brackets, and parens, which operate as line/expression terminators, function definitions, function invocation/array access, and list definitions, respectively).
For anyone trying this out for the first time, Jeff Borror's Q for Mortals is the best guide out there: http://code.kx.com/wiki/JB:QforMortals2/contents
I'm not sure about the Q language, but their C API reads like an entry in the obfuscated C contest: http://kx.com/q/c/c/k.h
If you look closely, there's not much there, and it's actually easy to understand: it defines a variant struct and a bunch of accessors to the different types within the embedded union. He prefers short names, and, years later, Java finally recommends short variable names for lambdas too!
So, I guess I need to look closer than the pixels then:
typedef struct k0{signed char m,a,t;C u;I r;union{G g;H h;I i;J j;E e;F f;S s;struct k0*k;struct{J n;G G0[1];};};}*K;
Sorry, I guess I'm just not seeing the "not much there and actually easy to understand".
Whatever an 'H' is.
Well, it is kind of pointless to look at the K<->C interface without knowing K. Python.h would be about as understandable if you read it without knowing what a "class" is, and K does a lot of things in ways quite different from most languages.
To elaborate: K uses one-letter mnemonic codes for all of its basic storage types.
(Note how G, H, I, J follow each other?)
G = General = 8-bit unsigned int
H = sHort = 16-bit signed int
I = Integer = 32-bit signed int
J = bigger integer = 64-bit signed int
(Again, they are near each other.)
E = 32-bit floating point "rEal"
F = 64-bit Floating point
S = Symbol
K = "general list type", the central K language type
And that's mostly it; the last unnamed struct in the union (with fields "n" and "G0") is for vectors, n being the length and G0 being the data. The only other field you are ever going to need is "t" for type (saying which union member is actually in use). The rest are internal implementation details, but are also easy to remember: r = reference count; u = flags; m and a have something to do with memory mapping and allocation.
There are a few more basic types: b = boolean, t = time, d = date, z = datetime, p = timestamp, m = month, u = minute. They are merely different interpretations of the E, F, G, H, I, J, S, K members above; to access data from C, all you need is the list given above.
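To make the layout described above concrete, here is a minimal, self-contained C sketch that re-declares the quoted struct (with the pointer members written out as struct k0 *k and *K) and builds a 64-bit integer vector using the trailing-array idiom. make_long_vector and longs are hypothetical helpers, not part of k.h, which has its own constructors and accessor macros; the type code 7 is assumed to match kdb+'s long-vector code. Real code should include the shipped header instead.

```c
#include <stdlib.h>

/* Re-declaration of the k.h types discussed above (sketch only). */
typedef char C;           /* char */
typedef unsigned char G;  /* 8-bit unsigned */
typedef short H;          /* 16-bit signed ("sHort") */
typedef int I;            /* 32-bit signed */
typedef long long J;      /* 64-bit signed */
typedef float E;          /* 32-bit "rEal" */
typedef double F;         /* 64-bit Float */
typedef char *S;          /* Symbol */

typedef struct k0 {
    signed char m, a, t;  /* m, a: memory/allocation details; t: type */
    C u;                  /* flags */
    I r;                  /* reference count */
    union {
        G g; H h; I i; J j; E e; F f; S s;  /* atoms */
        struct k0 *k;                        /* nested K object */
        struct { J n; G G0[1]; };            /* vectors: length n, data from G0 on */
    };
} *K;

/* Hypothetical allocator (not the k.h API): header plus room for
   n 64-bit ints in the trailing array. */
K make_long_vector(J n) {
    K x = malloc(sizeof(struct k0) + (size_t)n * sizeof(J));
    if (!x) return NULL;
    x->m = x->a = 0;
    x->t = 7;   /* assumed to match kdb+'s long-vector type code (KJ) */
    x->u = 0;
    x->r = 0;
    x->n = n;
    return x;
}

/* Hypothetical accessor: view the vector's data as 64-bit ints. */
J *longs(K x) { return (J *)x->G0; }
```

The anonymous struct and union require C11 or a compiler extension; the G0[1] member is the classic pre-C99 way of writing what is now a flexible array member.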
interesting.
My thoughts on this area have changed a bunch, I think when I was young I was a lot more about cleverness and conciseness.
Now that I'm older and I've worked on a large variety of software systems, I am starting to believe that readability of code is one of the most important values. After all, you read the code a lot more than you write it.
I can say definitively that:
- I have often regretted using single-letter variables (outside of the loop 'i')
- I have deeply regretted using non-descriptive names
- I have never regretted using longer variable/method names
Nowadays, in an IDE, longer names don't even carry a typing penalty. Yeah, yeah, I know, Java, but it's a safe language, and in a world where I want to deliver working, correct, bug-free code, safety is more important than single-letter expressiveness.
After all, I don't think people hold up APL as good code.
> My thoughts on this area have changed a bunch, I think when I was young I was a lot more about cleverness and conciseness.
I'm almost the other way around. I always valued elegance, which often manifests itself in conciseness (but most conciseness is NOT an example of elegance). Followed by readability, which usually manifests itself in verbosity (though most verbosity is NOT actually readable). And when decision time came, I'd prefer verbose inelegant code to non-verbose concise code.
But then I spent some time using K. And I realized I need 100 to 1000 times fewer lines to achieve the same thing, AND it usually runs about as fast (despite being interpreted), AND I have fewer bugs, AND those bugs tend to be of one kind (off-by-one) and easy to find.
e.g., look at this example by Stevan Apter: http://nsl.com/k/t.k - 14 short lines implement an efficient (fast and memory-compact) in-memory database that includes joins, selects, inserts, deletes, aggregates, grouping and sorting.
To the uninitiated it looks like code golf, but this is actually very readable K if you know K. The definition order:where:{[t;f;o]{x y z x}/[_n;o;t f]} gives two names ("order" and "where") to an idiomatic K expression that takes a table "t" and paired lists "f" (of field names) and "o" (of re-index functions that can be used to filter or sort), and returns the resulting table after reindexing it, one pair at a time, by applying those functions to their respective fields. A sorting function would be "desc:>:" (that is, ">:", henceforth also named "desc"), which returns a sorting permutation for its argument. A filtering function would be "&3>" (pronounced "where 3 is larger than", and written "where 3>" in the Q dialect).
Now, this conciseness does NOT come from golfing. It comes from eschewing the now-obligatory object-oriented programming, sticking with "down to the metal" data types, and, rather than looking for a minimal base of operations plus endless compositions (as most languages do), using a wide base of operations and a precisely chosen set of compositions.
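Here is a rough C rendering of the order/where semantics described above, under one reading of the fold: the running index vector starts as all rows; each step reindexes the named column by it, applies the paired function to get a new index list, and composes that with the running index. The table type, desc, where_below_3 and order_where are hypothetical stand-ins, not a translation of Apter's t.k.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: named columns of doubles, all of length n. */
typedef struct { const char *name; double *data; } column;
typedef struct { size_t n; size_t ncols; column *cols; } table;

/* A re-index function: writes an index list into out, returns its length. */
typedef size_t (*reindexer)(const double *col, size_t n, size_t *out);

/* "desc": permutation that sorts largest first (stable insertion sort). */
size_t desc(const double *col, size_t n, size_t *out) {
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && col[out[j - 1]] < col[i]) { out[j] = out[j - 1]; j--; }
        out[j] = i;
    }
    return n;
}

/* "where 3>": indices whose value is below 3. */
size_t where_below_3(const double *col, size_t n, size_t *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++) if (3 > col[i]) out[m++] = i;
    return m;
}

static const column *find(const table *t, const char *name) {
    for (size_t c = 0; c < t->ncols; c++)
        if (strcmp(t->cols[c].name, name) == 0) return &t->cols[c];
    return NULL;
}

/* Fold (field, function) pairs over a running index vector.
   Returns the final row indices (caller frees); count in *out_n. */
size_t *order_where(const table *t, const char **fields, reindexer *fns,
                    size_t k, size_t *out_n) {
    size_t n = t->n, cap = n ? n : 1;
    size_t *idx = malloc(cap * sizeof *idx);
    size_t *step = malloc(cap * sizeof *step);
    double *view = malloc(cap * sizeof *view);
    if (!idx || !step || !view) {
        free(idx); free(step); free(view);
        *out_n = 0;
        return NULL;
    }
    for (size_t i = 0; i < n; i++) idx[i] = i;                /* start with every row */
    for (size_t s = 0; s < k; s++) {
        const column *c = find(t, fields[s]);
        if (!c) continue;                                     /* unknown field: skipped here */
        for (size_t i = 0; i < n; i++) view[i] = c->data[idx[i]];  /* column at current rows */
        size_t m = fns[s](view, n, step);                     /* apply the paired function */
        for (size_t i = 0; i < m; i++) step[i] = idx[step[i]];     /* compose with running index */
        memcpy(idx, step, m * sizeof *idx);
        n = m;
    }
    free(step);
    free(view);
    *out_n = n;
    return idx;
}
```

Filtering px with where_below_3 and then sorting the survivors by qty with desc mirrors the "where" then "order" usage described in the comment.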
It is true that reading/writing a K line takes 10-50 times as long as reading a C line. But I've been unable to consider modern software engineering anything but ludicrous when a different set of primitives (and experience) can get you the same results for 1/100 or 1/1000 of the size of the executable specification, and everyone puts a SEP field on it.
This convention is used throughout the kdb+ type system: i for 32-bit int, j for 64-bit int, h for 16-bit int, e for real, f for float, etc. Anyone who knows q will automatically recognize these types in the C API. They'll also recognize the ref count r and the type t, and experienced C programmers will recognize the trailing array idiom.
It definitely is strange to look at, but it's quite easy to understand if you know k/q.
H is a short in k/q, so H refers to a short here as well. This naming scheme is true for all of the above letters.
This is my favourite:
Removes clutter indeed... :)
  // remove more clutter
  #define O printf
  #define R return
  #define Z static
  ...
Short names make sense if they are easily understandable locally. This means either something extremely common throughout the codebase (I think the most common example is localisation wrappers for string literals, and those should ideally link easily to a clearer explanation, e.g. through a renaming import statement), or something defined clearly and used only within a very small area of code. This API is neither of those.
Short names also make sense if the same conventions appear over and over throughout the code base. While they can be quite opaque at first, they're very consistent, and people with experience in other APLs will recognize many of them.
Currently using MongoDB for my historical quote tick database. Do any peeps in trading who use KDB+ in production or for fun think it's expressive enough to write backtesting queries directly against it?
Apart from financial applications, what is it good for?
This would actually make a terrific replacement for something like redis when you need a more structured schema.
The q language is very powerful and expressive, an interesting mix of Lisp and APL. You can do really powerful analytics without writing tons of code.
You really have to see how fast KDB is compared to most nosql products out there.
Are there any open source projects or blog posts with examples of this?
They have a reference wiki with some example programs. Have a look at this page: http://code.kx.com/wiki/Cookbook/ProgrammingExamples
Almost 10 years ago, I did an undergrad independent study at NYU contributing to some PhDs' Query by Humming music search engine. We used q to query a kdb full of catchy-melody time series data -- short sequences of "is this pitch higher, lower, or the same as the last?" and "is this note short, long, or medium?" (and, of course, gobs upon tons of variations as we iterated!).
I barely did any q / kdb; only made a functional and usable UI, and did some prototyping of new ideas in other languages (Java, Max/MSP, Csound). I spent some time looking into q and was thoroughly baffled. Still am. It was really, really fast, though!
As I vaguely understand and can explain it, the k/q system made it easy to do fuzzy searches and deal with missing pieces of data. If the user missed a note, or our pitch detection failed, or our source data was bad, we were still able to find matches. (Yes, I wish I'd been able to understand this more at the time. Bygones, now...)
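The comment doesn't say how the matching was done in q. As a hedged illustration only, this C sketch encodes pitches as an up/down/repeat contour in the style of Parsons code and uses Levenshtein edit distance as a stand-in for fuzzy matching, so a dropped note costs a single edit. Durations (short/long/medium) are left out, and none of the function names come from the original project.

```c
#include <stddef.h>
#include <string.h>

/* Encode pitches as 'U' (up), 'D' (down) or 'R' (repeat) relative to
   the previous note. out needs room for n characters (n-1 symbols plus
   the terminator); n must be at least 1. */
void contour(const int *pitches, size_t n, char *out) {
    size_t k = 0;
    for (size_t i = 1; i < n; i++)
        out[k++] = pitches[i] > pitches[i - 1] ? 'U'
                 : pitches[i] < pitches[i - 1] ? 'D' : 'R';
    out[k] = '\0';
}

/* Levenshtein distance with a single rolling row; returns (size_t)-1
   if b is too long for the fixed buffer in this sketch. */
size_t edit_distance(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t row[256];
    if (lb >= 256) return (size_t)-1;
    for (size_t j = 0; j <= lb; j++) row[j] = j;
    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];   /* previous row's value at column j-1 */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];                       /* previous row, column j */
            size_t best = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
            if (up + 1 < best) best = up + 1;         /* deletion */
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1;  /* insertion */
            row[j] = best;
            diag = up;
        }
    }
    return row[lb];
}
```

For example, the tune 60 62 62 59 64 encodes as "URDU"; humming it with the repeated note missed gives "UDU", which is one edit away.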
It's great for basic data-analysis tasks, where you just want to slurp in a few CSV files, join them together, filter out some rows, and spit out the results.
Sure, you can do the same in R or Python, but the whole process is very quick and easy in q.