Impending kOS
(archive.vector.org.uk)

> Kdb+ has sharp elbows.
No shit. I used to work as a quant, and while I was an okay quant and a mediocre trader at best, I survived for three years in the industry because of my kdb+ proficiency: the firm I was at spent a couple of million dollars on kdb+, only to find out that most people could not wrap their heads around it, let alone debug it effectively.
My (former) colleagues were definitely smart people. In many ways, they were way smarter than me. But I somehow got a much better handle on kdb+'s idiosyncrasies, and my ability to stare at dense k/q code (usually no more than a dozen lines) and figure out what's wrong with it earned me a reputation as the "q guy" - and some level of job security.
The firm eventually phased out kdb+ completely after my boss and I left (we were the two proponents of kdb+).
I've worked in many shops that use kdb+, and the ones who really benefit are the ones who bothered to get some training on it rather than those who just assume they'll wing it somehow. Kx themselves have been running great intro workshops for a couple of years now. Some guys at the next desk attended one and came back buzzing with excitement at how they now saw through the noise. So the takeaway is: if you didn't bother to learn it, stop complaining that you don't understand it. It's not difficult to get when it is explained well, but you're not gonna get it just by staring at it.
>I've worked in many shops that use kdb+ and the ones who really benefit are the ones who bothered to get some training on it rather than those who just assume they'll wing it somehow.
Yea, I know all about the training and First Derivative. My employer also hired them.
In their defense, every First Derivative KDB+ consultant that I worked with was very sharp and an excellent teacher. They really knew their stuff, and First Derivative is no small part of what has made KDB+ so successful. However, even with their excellent pedagogy, most of my co-workers were totally lost/weren't willing to apply themselves to learn q/kdb+ well.
Here is another way to think about it: many people can't ever get their heads around certain conceptually difficult topics, say, measure theory or quantum physics. I don't think kdb+ is nearly as hard, but it seemed that way looking at my peers who were no slowpokes.
I initially read that as:
> ... came back buzzing with excrement
Why were you a proponent of kdb+?
For several reasons, some more legitimate than others.
1. kdb+ was (and maybe is) a good solution to the problem that we had: doing complex data manipulation/simple statistical calculations against billions of rows of time series data. Hadoop is the term du jour for data processing, but truth of the matter is that finance doesn't have really huge data. At best, it's a couple of terabytes, and most of the time, you are working with a small subset of it. Running KDB+ on a beefy server or two would usually do the job (rather well).
2. Maybe because I studied math, but I find k/q's vectorial/functional semantics appealing. I think the syntax is horrible, but the semantics are very neat.
3. Finally, because it helped me keep my job. It was rather amazing to me that all these Ph.D. statisticians that I worked with couldn't bring themselves to learn kdb+ effectively. Apparently this stuff can be very hard for even the smartest people (or maybe they thought it was such a niche skill with a low ROI).
It almost sounds like kdb needs an alternative syntax that is more human readable.
It has one: q. But once you get past the syntax, you realize that you also need to grok semantics different from those you are used to.
Some q is readable english - e.g., an expression like

    sum price where size>3

is (to the uninitiated) more readable than the equivalent k

    +/price@&size>3

but that only works for simple stuff. The (idiomatic!) computation of maximum-subarray-sum[0]

    |/0(0|+)\

becomes

    max over 0 (0 max +) scan

which is not more readable. And you can drive the point ad absurdum by making it even more verbose:

    max over zero (zero max plus) scan

The syntax seems like it is what stops you from understanding it because it is the first thing you meet. But it's the semantics that you need to grok, and the syntax just matches them.

I would find something like
    over(max, scan(0, max(+, 0)))
slightly more readable, even if that still gives me a higher-order-headache.
Because, y'know, it's confusing to have symbol salad with a weird mixture of infix and postfix operators, even more so if you have type raising (or the moral equivalent thereof) thrown in for good measure.
The q in your latter example is the same complexity in my book as a dense Python list comprehension. Could be worse.
The equivalent python is:

    mss = lambda x: max(scan(lambda a,b: max(0,a+b),0,x))

assuming a definition of "scan" (which is like "reduce", except it gives you all intermediate values), an example of which is:

    def scan(f,x0,x):
        r = [x0]
        for x1 in x:
            x0 = f(x0, x1)
            r.append(x0)
        return r

Note that the K is idiomatic whereas the python is (arguably) not. Of course, it could be worse; the advantage is that, much like math, the 9-char K version is pattern-matched by your eyes once you are familiar with it, whereas no other version presented here (or in almost any other language) can utilize that feature of your brain.

Oh goodness... don't do scan like that. Way easier (and more efficient) to use a generator:

    def scan(f, iterator, initial=0):
        yield initial
        yield from scan(f, iterator, f(initial, next(iterator)))

But always be suspect of your code if you are iterating and appending to a list. Likely there is a much better way.

Even with python2, you could make a non-recursive version that'd still be shorter than your scan and faster. Alternatively, you could pair a coroutine with that reduce function....
First, it is not equivalent - next() cannot apply to range() output, for example - you will need to do some iter() games and watch out for iteration order side effects if your values are iterators vs. lists.
Second, it is ~10% faster, but that speed difference disappears completely if you eliminate the namespace lookup (that is, add e.g. "o = r.append" before the loop, and call o() instead of r.append() inside the loop). It potentially uses less memory - but not the way you did it (unless Python 3 gained TCO when I wasn't looking. Did it?) - your formulation does not load the call stack, but it does create len(iterator) generators that - until the innermost StopIteration - all need to live somewhere on the heap. Recursive solutions without TCO are rarely good enough to replace iteration.
Even if you did it right, it's more efficient, but not significantly so timewise; and yes, it's slightly easier to use iterators in general. It is mostly space-efficient in general.
I think it is more idiomatic, though - and also Python2 compatible - to just replace references to 'r' with yield in my code, than using the recursive definition you gave above - which is more idiomatic in functional languages, but less in Python (and harder to debug in any language than the iterative version)
> First, it is not equivalent - next() cannot apply to range() output, for example - you will need to do some iter() games and watch out for iteration order side effects if your values are iterators vs. lists.
It uses generator/iteration semantics instead of list semantics. If you wrap the whole thing with a decorator-like function that does:

    def scan_wrapper(f, x0, x): return list(scan(f, iter(x), x0))

you get the exact same semantics. For most cases (including the one you cited), the alternate semantics are actually better, more flexible, and avoid requiring a list to be built in the first place. No problem with iteration order side effects either, unless your f() somehow invalidates your iterable... and you still have some potential exposure there in your original implementation.
> Second, it is ~10% faster...

> It potentially uses less memory...
Yeah, I think you are understating it to say the least. Not only are you using less memory, but you are saving having to rejuggle/resize the list all the time.
> (unless Python 3 gained TCO when I wasn't looking. Did it?)
I guess in a way it sort of did for the case of yield from: 'The iterator is run to exhaustion, during which time it yields and receives values directly to or from the caller of the generator containing the yield from expression (the "delegating generator").'
So, even without full on TCO (which is still possible... I'm not sure if they did it with yield from), you at least have direct pass through from the generator to the caller. Because of iterator semantics, that should mean that each of the generators gets created on an as needed basis and the previous generator should get destroyed right thereafter. It is possible though that it isn't quite doing it right, in which case I'll concede that I'm still allocating an N deep generator stack, but that is still likely to be more memory efficient because it isn't having to reallocate/resize/copy increasingly larger lists throughout the execution.
> recursive solutions without TCO are rarely good enough to replace iteration.
As I mentioned, you can do the recursive solution as well, and it has the advantage of working with old Python. Still simpler and still far more efficient (here it is with extra wrapping to keep the semantics the same):

    def scan(f, x0, x):
        def scan_helper(x0=x0):
            yield x0
            for x1 in x:
                x0 = f(x0, x1)
                yield x0
        return list(scan_helper())

> It is mostly space-efficient in general.

You say that like when doing statistical analysis space-efficiency isn't a concern...
> I think it is more idiomatic, though - and also Python2 compatible - to just replace references to 'r' with yield in my code, than using the recursive definition you gave above - which is more idiomatic in functional languages, but less in Python (and harder to debug in any language than the iterative version)
I was actually mostly getting at using yield instead of list append. I was just trying to express it as tersely as possible, which unsurprisingly became Python 3 and a functional style mechanism.
While I agree that often there is a struggle to understand functional programming, I think in this case it is very idiomatic Python (particularly since they defined "yield from" specifically for cases like this), and the code is very simple, readable, and easier to verify for correctness.
> Second, it is ~10% faster...

> It potentially uses less memory...
> Yeah, I think you are understating it to say the least.
I actually measured it. It was 10% faster with a call to 'r.append', and within 0.1% with the append lookup hoisted out of the loop, on 1000 external iterations over 50,000 list items, minimum of 3 runs, and it was inconsistent which version was faster. My scanned function was

    def f(x,y): return max(0,x+y)
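(For anyone who wants to reproduce this, a sketch of that kind of harness - my reconstruction, not the original benchmark code:)

    import timeit

    def f(x, y):
        return max(0, x + y)

    # The list-building scan from upthread.
    def scan_list(f, x0, xs):
        r = [x0]
        for x1 in xs:
            x0 = f(x0, x1)
            r.append(x0)
        return r

    xs = list(range(50000))
    # Best of 3 runs, as described above (100 iterations each to keep it quick).
    print(min(timeit.repeat(lambda: scan_list(f, 0, xs), number=100, repeat=3)))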
> You get the exact same semantics. For most cases (including the one you cited), the alternate semantics are actually better, more flexible, and avoid requiring a list to be built in the first place.
range() on Python3 is not a list, and the original works on it and yet you needed scan_wrapper(). I didn't try to fix it - I just tried to use your version and stumbled on the differences.
> but that is still likely to be more memory efficient because it isn't having to reallocate/resize/copy increasingly larger lists throughout the execution.
Instead you allocate a generator state each time. Surprisingly, it doesn't make a difference in practice (I've measured) - the python lists grow exponentially, so we have e.g. 20 allocations+copies instead of 1,000,000 generator state allocations. I would have expected the list to be faster - but there's no measurable difference.
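(You can watch that growth pattern directly - a quick sketch of mine, not from the thread:)

    import sys

    # Count how many distinct allocation sizes a growing list passes through.
    # CPython over-allocates geometrically, so a million appends trigger on the
    # order of a hundred reallocations, not a million.
    sizes = set()
    r = []
    for i in range(1000000):
        r.append(i)
        sizes.add(sys.getsizeof(r))
    print(len(sizes), "distinct allocation sizes")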
> As I mentioned, you can do the recursive solution as well, and it has the advantage of working with old Python. Still simpler and still far more efficient (here it is with extra wrapping to keep the semantics the same):
Recursive takes about 10 times more memory. Seriously - no TCO means call stacks are not free. Instead of one object which I stored, you store about 10 in each call frame!
But yes, your latest version is the idiomatic preferred version style, IMO: it's also the most efficient. (And unlike the earlier python3 functional, it doesn't need a helper to fix the INPUT)
> You say that like when doing statistical analysis space-efficiency isn't a concern...
Of course it's a concern. I'm not saying it is not useful, I'm saying it is overrated, especially when compared to recursive solutions, which -- in python -- cost about 10 times more in stack space than you save in list space.
> I was just trying to express it as tersely as possible, which unsurprisingly became Python 3 and a functional style mechanism.
... and taking much more memory in the process, though believing that you are using less. Not blaming you - python overlooks these things. One of K's nice features is that the costs are mostly evident on a cursory look.
> I think in this case it is very idiomatic Python (particularly since they defined "yield from" specifically for cases like this), and the code is very simple, readable, and easier to verify for correctness.
The recursive functional version is speedwise comparable, and spacewise much less efficient than my (admittedly, stupid) version. It is easier to prove formally, but harder to debug with a debugger because of the generator stack (which is a call stack, even though it does not exhaust the C call stack like regular calls do).
Can we agree that your latest scan(), if we dropped the two lines that have "scan_helper" in them, is the most efficient, most idiomatic, most (py2 and py3) compatible and clearest?
> Can we agree that your latest scan(), if we dropped the two lines that have "scan_helper" in them, is the most efficient, most idiomatic, most (py2 and py3) compatible and clearest?
Yeah. Actually, I just generally liked your feedback here.
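(For reference, a sketch of that agreed-on version - the scan above with the two scan_helper lines dropped - plus the maximum-subarray example from upthread:)

    # Plain generator scan: lazy, Python 2 and 3 compatible, no helper needed.
    def scan(f, x0, x):
        yield x0
        for x1 in x:
            x0 = f(x0, x1)
            yield x0

    # The maximum-subarray-sum example discussed earlier in the thread:
    mss = lambda xs: max(scan(lambda a, b: max(0, a + b), 0, xs))
    print(mss([1, -2, 3, 4, -1]))  # 7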
The last line of TFA reads like the beginning of some sort of movie in the drama/thriller category. "kOS is coming. Nothing will be the same afterwards."
That's what irritated me most.
What I'd like to understand is - what led the author to this particular conclusion? Is it the fact that this language is super expressive and concise? Is it that it routinely [1] outperforms its C counterparts even if it ultimately translates to C? Is the Z graphical interface so superior that it'll blow the pants off Cocoa and Quartz and X.org or Wayland or what have you? Why would one rewrite emacs or vim on it? I don't want some basic 4 line text editor - I would like to be productive. Why would Mozilla spend energy porting firefox to it? Or Google, chrome? Or bash?
Merely talking about the history of K/kdb+ and how brilliant its creator is simply doesn't help the reader understand why they should be excited about it. If that was the intention of this article, then the real points should've started after that line.
That would've been much more interesting.
[1] - No pun intended, of course
What you're actually seeing is testimony: people saying they are seeing something amazing, and they aren't very good at explaining what they saw.
Btw: k doesn't translate to C. It's actually a quite simple interpreter. The fact that it outperforms other languages so easily should be saying more about those languages than it should be saying anything about k.
Well, does there exist a technical analysis written by someone who understands k that explains why it is so much faster than X languages?
As I said, I think this is the wrong question.
The real question is why is X language so slow?
This is not intended to be glib: but I do not think I can put it more simply than that. X language is slow because it pulls in lots of library code that it doesn't actually need (to get a friendly interface), it has a lot of redundancy (because of the wrong abstraction level), and because it wastes memory (in order to have an API that links well with others). Or it's slow because it thinks B-Trees are really cool. Or because it has the wrong intrinsics. I don't know.
But I am convinced this is the discussion we need to be having.
It's not the wrong question. You just rephrased it. And in the process of doing so, you missed the point of my question. I emphasized the word "technical" because I was trying to politely ask for evidence. Evidence should be some combination of code, benchmarks and analysis.
The code should provide isomorphic samples from the languages (or implementations of languages) that are being tested. Ideally, the code samples should be idiomatic.
Benchmarks should test the aforementioned code such that performance comparisons can be made. (With some degree of accuracy.)
Analysis should summarize and draw tempered conclusions from the code and benchmarks. Trade offs should be documented.
The Computer Language Benchmarks Game[1] is a first approximation of this. It provides the first two things but has no analysis in prose.
Does such a thing for k exist? Even if it's a very rough blog post somewhere from 5 years ago?
I ask this because there's a lot of lofty claims being made in this thread, and the closest thing to evidence I've seen is, "trust us, it's made a lot of money doing [financial things]. oh, and it can update a million records in a second." Since I don't believe that a free lunch can exist, I am naturally inclined to reach the truth of the matter. I see that k is supposed to be wicked fast---at what cost? Where are the examples? Where is the analysis documenting this?
Ok... Now how do I run those programs such that they are comparable to the ones on the benchmark game?
Make a request at kparc.com for the research version of k (k5).
If that doesn't work, download q/kdb+ from kx.com (Windows/Linux/Mac OS X). It can run k4.
/ k4 implementations of the Bell Labs benchmarks are available http://kx.com/a/k/examples/bell.k
So would k be even faster if it were compiled to machine code rather than being interpreted? Would that involve an unacceptable speed versus space trade-off? Or am I missing some crucial reason why k code has to be interpreted?
According to wikipedia's uncited 'performance characteristics' section of their page on K (programming language):
> The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor.
Sounds like the overhead's acceptable?
Edit: the following page (2002) says
> Even though K is an interpreted language, the source code is somewhat compiled internally, but not into abstract machine code like Java or Python. The interpreter can be invoked interactively or non-interactively, and you can also compile a source file down to a binary file for product distribution, if you wish.
Thanks for the link to kuro5hin btw - it's a nice intro to k but also I've not been on k5 for years, I had no idea it was still going :)
It would be even faster, with a sufficiently smart compiler (a feasible one, not the mythical beast[0]). E.g., the function

    sumsquares:{+/a*a*a*a:1+!x}

which sums pow(a,4) for a from 1 to x (its argument), when interpreted, allocates a vector of length x, initializes it to the numbers 0..(x-1), then allocates another vector of length x, to which it copies the first vector while adding 1 to each element, and calls it a. Then it makes another vector of length x, puts a * a in the respective elements, etc. - and finally, sums the elements of the remaining vector.

If we can learn from numpy (the K equivalent) -> numexpr (same work, smarter about allocations) -> numba (jit compiler, avoids allocations), we can expect x2 faster speed from each "upgrade" - about x4 overall.
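(For intuition, a rough numpy analogue of that interpreted evaluation - my sketch, not the author's code - where each step allocates a fresh temporary vector the same way:)

    import numpy as np

    def sumsquares(x):
        a = 1 + np.arange(x)          # one vector for 0..x-1, another for 1+that
        return (a * a * a * a).sum()  # each * allocates a fresh temporary vector

    print(sumsquares(10))  # 1^4 + 2^4 + ... + 10^4 = 25333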
If that sounds like too little speedup for so much effort, you are not giving modern caches enough credit. Almost every single memory access here is in L1, as they are all sequential and perfectly predictable. So it's just 2-5x slower than doing the perfect register allocation with no memory access - but without the cost of ballooning code size (going out of L1 itself ...). Also, interpretation overhead is nearly nonexistent because operations on vectors are coded in tight cache-aware C loops.
Overall, in the interpreter/compiler tradeoff, K comes out far enough ahead of everything else that it did not make sense for them to invest in a sufficiently smart compiler. In fact, it was only a few months ago that they bothered to use SSE/AVX vector instructions for significant speedup. It is so far ahead of the curve (for its use cases) that doing so didn't really give any marketing advantage until recently (and I'm not sure it does now).
I don't think so. Many of K's primitives operate on arrays using stride 1 access. Being cache-friendly accounts for much of the performance gains.
In addition, K is written in ANSI C to maintain portability.
The same story told again -- its aim is to generate this magical atmosphere around a fast db engine and a language that's deliberately obfuscated to make the people who work in it feel smart, so they spread the word that it's the best. KDB/Q is a nice tool, not the holy grail they make it out to be. And it's not fast because they know something better - it's because it lacks almost any safety measure.
I totally agree. That's why I wrote a parser for the language to help me and my coworkers debug =) https://github.com/kiyoto/ungod
As I commented above, the syntax is horrible. But I still think it is an effective tool for certain problems.
I'm very interested in hearing more. What do you mean by lacking safety? As in, the DB invariants aren't enforced? Or lack of durability? Or as in, an easy-to-misuse query language?
And to help calibrate, what are your preferred languages and styles?
Here's the text editor they're talking about: http://www.kparc.com/edit.k
The code is, well, not the easiest to understand.
Here is the four line version: http://www.kparc.com/$/edit.k (lines beginning with / are comments)
Since I don't know how to read K, and it's not really very friendly to amateurs who're just scanning a program without a reference open... is this equivalent to a bare C program implementing an editor with no libraries, or is it relying on stuff the system provides?
I see cz cx and cv in the four-line version, which seem like they are referring to Ctrl-Z, Ctrl-X, etc. If that is the case, then there is a lot of behind the scenes support stuff to get from those four lines of code to a window on the screen or a buffer in a terminal.
I think the first line imports view.k: http://www.kparc.com/$/view.k
Obviously bug free! Regarding intelligibility / comprehensibility: it's somewhere between brainfuck and Perl's worst examples of line noise
Could someone who can parse (reverse engineer?) this write up an explanation? (if you exist)
I'll give this a shot. I'll try to explain what's in my mind as I read it as well.
First, get out the reference manual: http://kparc.com/k.txt and we'll do the first couple lines.
The sequence that goes f x applies f to x. This f is unary.

The sequence that goes x f y applies f to x and y. This f is binary (and just labelled verb).
Some things (adverbs) go f a x and apply f in some special way to x.
Last hint: You read this code from left to right (like english). Do not scan it.
Now let's dive in:
    c::a$"\n"

This says: c is a view of where a is a newline. That is, if "a" contains a file, then c is the offsets of each newline.

A view is a concept in k that is unusual in other languages: when "a" gets updated, "c" will automatically contain the new values. This is similar to how a cell in Excel can refer to other cells and reflect whatever the value of those cells happens to be.
    b::0,1+c

This says: b is a view of zero join one plus c. That is, where "c" contains the offsets of each newline, 1+c would contain the beginning of each new line. We join zero to the beginning because, of course, the beginning of the file is also the beginning of a line.
    d::(#c),|/-':b

This says: d is the view of the (count of c) joined with the max reduce of each-pair's difference of b. That sounds like a lot, but "each-pair's difference of b" (where b is the position of all the new lines) is also the length of each line, and the max reduce of that is the longest line. You might say that "d" is the dimensions of the file.
    i::x,j-b x:b'j

This says: i is a view of x (?) joined with j (?) minus b (the offset of the beginning of each line) applied to x, which is defined as the bin of j in b.

j hasn't been defined yet, but you can see that x is defined later in the definition and used to the left. This is because K (like APL) executes from right to left, just like other programming languages: you write a(b(c(d()))) in C and you're executing the rightmost code first (d), then applying that leftwards. We can do this with anything, including definitions.

The other strange thing is that we know that b is the offset of the beginning of each line, and yet we're applying something to it. This is because k does not distinguish between function application and array indexing. Think of all the times you write in JavaScript x.map(function(a) {return a[whatever] }) when you'd really just like to write x[whatever] -- k lets you do this. It's a very powerful concept.
On that subject of binning: b'j is going to find the index of the value of b that is smaller than or equal to j. Since we remember that b is the offset of the beginning of each line, then if j is an offset in the file, this will tell us which line it is on(!)
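(A rough Python analogue of that bin lookup, in case it helps - my sketch, with made-up names:)

    from bisect import bisect_right

    # Which line is offset j on, given b = offsets where each line begins?
    def line_of(b, j):
        return bisect_right(b, j) - 1

    b = [0, 6, 14]          # three lines starting at offsets 0, 6 and 14
    print(line_of(b, 7))    # 1: offset 7 falls on the second line (index 1)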
But we don't understand what j is yet; it's the next definition:
    j::*|k

This says: j is a view of the last (first of the reverse) of k. We don't know what k is yet.
    f:""

This says: f is defined as an empty string.
    g::a$f

This says: g is a view of the offsets of f (an empty string) in a. Initially this will be null, but other code will assign to f later, making g a value.

Next line.
    s::i&s|i-w-2

This is very straightforward; & is and, and | is or. While Excel doesn't let us use a cell in its own definition, k does: it means the old value of s. So this is literally: the new view of s is i and the old value of s or i minus w (?) minus 2. We don't know what w is yet.
    S:{s::0|x&d-w}

This is a function (lambda). x is the first argument. If we called S[5], it would be the same as setting s to zero or 5 and d (dimensions) minus w. Double-colon changes meaning here; it no longer means view, but set-global.
    px:{S s+w*x}

This requires some knowledge of g/z: http://kparc.com/z.txt

Note that px is defining a callback for when the pageup/pagedown routines are called. x will be 1 if we page down and -1 if we page up. It may now become clear that S is a setter of s that checks it somehow. When we understand what w is (later in the program) it will be absolutely clear, but pageup/pagedown are changing s by negative w when paging up, and positive w when paging down.
    wx:{S s+4*x}

Consulting the g/z documentation, we can see this has to do with wheels. Note we modify s again, relative to 4 times x; x is -1 on wheel-up and 1 on wheel-down. It becomes clear that s is modified by pageup/pagedown and the mouse wheel.
    cc:{9'*k_a}

Again: in the documentation, 9' stashes something in the clipboard. cc is a function that takes the first slice of offsets k (which we still don't understand) in a (the file). The g/z documentation says that cc is the callback for control-C. This is expected, as control-C traditionally copies things to the clipboard. Since the slice of offsets k in a is being saved to the clipboard, we may guess at this point that k will contain the current selection.

This process is time consuming, but that is to be expected: learning English took a while at first, and often required consulting various dictionaries. Eventually you got better at it and could read somewhat quickly.
I don't know if you want to try the next few lines yourself to see if you can get a feel for it, or if you want to try one of the less dense examples:
* http://www.kparc.com/$/view.k
* http://www.kparc.com/$/edit.k
... or if you want me to keep going like this, or if you want to ask a few questions. What are your thoughts?
I have one question. I learned some J some time ago, but never really talked about it with anyone, and so my programs - a few lines' scripts, really - were always written with long, meaningful variable names. I read your explanation and every time you wrote "we don't know what it is yet" I wondered "why the heck isn't it just appropriately named?". I mean, why is 'c' better than something like 'nl_pos' for example? I get it that reading J, K or APL programs requires some serious work and I'm ok with that, but why would I need to burden my short term memory with one- or two-letters identifiers on top of that?
This is an honest question, and I feel like there is some upside to those names I just keep missing. As I said, I'm not fluent in J, but while learning it I wrote and read quite a bit of it, and I only made it through some longer (like, longer than half a line!) examples thanks to a sheet of paper and sheer determination. I often was working through a fairly complicated expression and starting to see what it was about, only to be stopped by an 'x' or 'c': I then had to go back a couple of lines, read the definition of 'x' again, and retry parsing that line from the beginning, hoping that I would remember what 'x' was this time. I started taking notes for this reason (it worked quite well, I think).
Anyway, you seem to have no problems reading such code, so I figured I'd ask you: why and what is this style of naming good for, and what one needs to do to master it?
j is abstract. It seems like it should be easier if it is concrete, but I find that this tends to lead to assumptions and encourages scanning instead of reading.
When I sit down to a program, I have an idea about what I want to do. I don't have home/end keys on my keyboard and I note that pressing Fn-Left and Fn-Right is annoying because I don't usually push them (I use emacs sometimes). So when I sit down to edit I want to add emacs keys.
I can see at http://kparc.com/z.txt that hx handles home/end, and that cX is control-X, so I think it should be something like:

    ca:{hx 0 -1};ce:{hx 0 1}

Now I look at the characters spent: is it possible I could say this simpler? I don't think so, but I'd love someone to tell me.

How about delete? I think what I want to do is select the next character and remove it. So where is the cursor? Well, I remembered that hx moves the cursor, so I read its definition:
    hx:{L i+d*x}

Okay, so I remember d is "dimensions" and "i" is something, and L I haven't seen yet. So I go look at L:
    L:{K@B/0|x&d-1}

Now I have two new words to look up: K and B:
    B:{c[x]&y+b x}
    K:{J(;|\(*k),)[H]x}

Okay, B is straightforward: b is indexed by line, so b x returns the offset of line x. We add this to y (which is the second value of |x&d-1) and take the min of it and the offset for c. I might think at this point that B converts the x/y coordinate into a linear offset. But what about K? K is using J and H (but it turns out we only need J):
    J:{k::2#0|x&-1+#a}

And here we see the definition of k: x&-1+#a is the smallest of x and the second-to-last character in the file (a), and the max of that and zero (0|). This reads like "bounds check x to the shape of a". Taking 2 from that simply takes two values and assigns them to k, so if x is only one value, k will still have two.

k is the cursor selection.
From here, I can make an attempt at implementing delete. My first attempt looked something like:

    cd:{a::((0,k)_a),(((k+1),(#a)-(k+1)))_a}

Oh! But I've seen a lot of these terms before! Maybe I can do better. I note that kx is documented as the callback for keystrokes (http://kparc.com/z.txt) and can come up with:
    cd:{K j+!2;kx""}

This seems about as simple as I can make it. I'd like to see someone do better, but knowing that K is a setter for the cursor selection and j is the offset of the cursor, then j+!2 simply returns offset and offset+1; K will select it and kx"" will remove it.

Now maybe if k were called cursorSelection I might have gotten my first cd faster, but I would've missed the opportunity to see how these functions and variables were interconnected, and I might not have written the second one.
I noticed however, that not scrolling really helped this exercise. I just moved my eyes around, and jotted a few symbols down on my notepad. I feel like if I had to scroll or switch windows I would not have been able to do this.
As for your last question: What does one need to master it? For now, I would say practice. Try writing a program in as dense a manner as possible. Remove as much redundancy as you can. Read it and re-read it until you feel like it can be no more abstract.
I am working on a much better answer, but I hope that one will do for now.
My guess is that the language's "day job" uses many similar-but-ever-so-slightly-different intermediate variables which are difficult to descriptively name in any manner that facilitates understanding faster than just re-reading the definition. Long descriptive names have a concrete cost: they impair your ability to recognize visual patterns / turn common compositions into "pictographs" and they don't refine well: instead of gaining an index and adding a bit to the definition the author has to come up with a bundle of new similar-but-slightly-and-meaningfully-different names from which someone else (including their future self) will be able to reverse-engineer the definition.
The shortcut of using descriptive names just doesn't have the same ROI in all kinds of code. The first time I was forced to abandon my descriptive-naming ways was when I started writing finite element solvers. I don't think it's much of a stretch to believe that (some areas of) finance have the same cost/benefit profile. Since this is a desktop programming example it's relatively easy to come up with good descriptive names, so the short names are almost certainly a holdover.
Or it's just a macho thing. There's enough pixie dust floating around this press release that I wouldn't be surprised.
> why and what is this style of naming good for, and what one needs to do to master it?
It's bullshit. The letters are not letters, they're symbols. Replace them with JPGs of pokemon if you want. It's just as well if they were ancient hieroglyphs. If you forgot what they all meant, you have to read the code and keep track of what variable name contains what. Or make a note on some paper.
This is why one-letter variable names are an antipattern: when there's no rhyme or reason to what value is associated with what symbol (why is `c` the index of the newline and not `n`?), your brain isn't going to remember it, and you've instantly forgotten what that code does. And everyone after you needs to do the same thing: you need to re-remember how each line of code works each time you work with it. And more importantly, looking at one dense line of code provides no context as to what the code around it does, compared to C or Python where one function can give you a decent idea of what its use is.
It's even more of a nonsense issue because the actual name of the variable is just a number: every one-letter variable name maps to an integer. Why not support proper identifiers and replace them with unique integers at runtime? The same goes for whitespace: it doesn't need to be kept in the program at runtime, but makes the application infinitely more readable because units of logic can be grouped on their own line.
So to answer your question: no, this is not good for anything and no, you shouldn't try to master it because there are much more useful things that you could be doing with your life.
One reason for one-character names is the habit of these languages of sidestepping the naming problem. Naming things is one of the hardest problems in software. APL programs are ideally read as "variable X is ...", not "do this, then this, then this" - so appropriate names of variables would be something like "max of reduce by difference..." - hence the hardness of coming up with good names.
Another reason is the same as in math. When you write equations on a whiteboard - the long-sought golden standard of expressiveness - you don't use long names - at best you encode names with subscripts, indexes etc. But names themselves are usually pretty short. APL languages use the same rationale.
Just like you need to read carefully every line of a math equation to understand what's going on, you have to read carefully each symbol of programs in APL family languages. It's unusual for programmers who got used to more help along the line - but the vocabulary of all such languages is quite short and doesn't extend that often. Another reason why APL could be used for teaching math.
I don't agree that this style of naming is not good for anything; somehow, the majority of programmers in these languages agree with that. Regarding much more useful things -
"If you are interested in programming solutions to challenging data processing problems, then the time you invest in learning J will be well spent." (http://jsoftware.com/)
That was awesome, thanks for explaining that for us. It makes a lot of sense the way you explain it, and I quickly got the idea that you can make some powerful expressions this way.
The smooth creation of lists is I think one of the most important language features higher level languages have over lower level languages like C.
Just this thing:

    c::a$"\n"

That's all I needed to be convinced that modern languages should have similar view constructs. In ruby it'd be:

    c = a.map.with_index{|a,i| a == "\n" ? i : nil }.reject{|i| i.nil? }

Quite a mouthful, mainly because Ruby lacks a neat way to do '$'. But doing the same thing in C would really be awkward, and likely not as efficient unless you have some fancy code for building enumerators in C.

The view automatically gets updated whenever a gets updated. Every time you change a (directly or indirectly), c will automatically get updated.
Doing this generally in Ruby I think is impossible, but you might be able to get close if all your objects are based on ActiveModel::Dirty
> Doing this generally in Ruby I think is impossible
Not so much. At least with views over most Enumerables, it's quite possible in Ruby -- that's the whole reason that Enumerable::Lazy exists.
The existing File class doesn't quite support it because of the way its iterators are implemented (particularly, they are one way) but the class is easily extended to allow it, e.g.:
    class RewindFile < File
      def each
        rewind
        each_char {|x| yield x}
      end
    end

Then you can create a synced view that has the character positions of the newlines in a file like this:

    a = RewindFile.new "myfile.txt"
    c = a.lazy.each
         .with_index
         .find_all {|x,_| x=="\n"}
         .map {|_,y| y}

If I go `a=whatever` or `a[42]=whatever`, then `c` will not get updated if I have already consumed those values; I still need to "reset" c every time I update a.
That seems overly complicated for ruby if a is a file:

    c = []; a.lines{c << a.pos}

As far as $, you may know it from Regex as the new line indicator.

a is a mapped string. You could do:

    c=[];n=0;a.lines{|x|c<<(n+=x.size)-1}

but $"\n" wasn't special, and this allocates tons of memory. tinco's implementation is much closer to what k is actually doing.
Certainly you could wrap up the $ operator into some Ruby method?
Yes, Ruby is Turing complete, but that's missing the point.
The value K provides is the collection of operators like `$` that implement a high level language for the sorts of problems K programmers face.
If you went through and implemented all of those operators in Ruby and only used those instead of things like loops, your code would be "unreadable" to the standard Ruby programmer; essentially you'd be programming in a different language.
Sure, simply do:

    module Enumerable
      def dollar(c)
        map.with_index{|a,i| a == c ? i : nil }.reject{|i| i.nil? }
      end
    end

And then you could do:

    c = a.dollar("\n")

There is of course a reason this dollar method is not a part of the standard library. Its name makes no sense and it's oddly specific: how often would you want the indexes of matches to a character? Most modern languages don't like to work with indexes a lot, and in my day-to-day work I don't need indexes very frequently either. This, I guess, is just a thing that these finance/apl people do more often, so they have a specialized standard library.

(So when I said 'a neat way to do $' I meant it has no find_all_with_index method, which would make my implementation much cleaner)
Regarding find_all_with_index, you can do:
    .find_all.with_index { |item, index| ... }

except .find_all.with_index passes the index to the predicate block, but still only returns a view of the matching elements from the list.
What I think is being sought is more like:
    module Enumerable
      def find_indices
        each.with_index
            .find_all {|x,_| yield x}
            .map {|_,y| y}
      end
    end
Wow, that was fascinating. K looks utterly mind-expanding, thanks for breaking this down.
You obviously have some experience working with K, and it sounds like at least Javascript, too? K is so foreign I expect it has a lot of interesting thoughts locked up in there that maybe don't get the attention they deserve.
Would you say there are any "killer features" of the language / environment that you miss when working with more traditional languages?
And now with less snark :).
K/Q/kdb+ deploys a single executable and some additional k code (Q is written in k) in < 600KB uncompressed. Admin is very simple. Multicore use is trivial. Hot code updates/upgrades work (the interpreter uses the new definition on the next execution).

Code works the same (in most instances) for:

* a single file/database
* multiple files/tables
* column files (for each table)
* files on multiple machines (like MapReduce/Hadoop)

with no changes.
Killer feature
--------------
Not having to write:
    for(int i=0; i<n; i++) { ... }
which generally obscures the fact that I'm trying to do a map or fold (e.g., reduce) operation :).
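(In a language with those operations available - a quick Python sketch, mine, not from the thread - the intent stays visible:)

    xs = [1, 2, 3, 4]
    squares = [x * x for x in xs]       # map: no index bookkeeping
    total = sum(x * x for x in xs)      # fold/reduce
    print(squares, total)               # [1, 4, 9, 16] 30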
ES5
> c::a$"\n"
The code is full of these things, which look conceptually a lot like Perl one-liners (and I don't mean that in a negative way). But the trick to coding with these short idioms is to make sure your input is strict in a very specific way that fits your particular problem.
A nontrivial codebase could not be written in this way. A real-life editor would, for example, need the concept of active parts of files to edit files larger than fits in memory, the ability to parse different encodings and line endings without changing the on-disk format unless asked to, and many other things (and this just to implement the bare bones of an editor from the 80s).
This looks a lot like a macro language, with its many ready made functions for accessing ctrl-key sequences, cursor movements etc. Would it really hold up outside its problem domain?
> A real-life editor would, for example, need the concept of active parts of files to edit files larger than fits in memory
Why? Address space is cheap, and a is simply map'd to the file.
I don't have any text files bigger than 64 bits of address space. Do you?
> the ability to parse different encodings and line endings
vi doesn't do this. acme doesn't do this. I would agree it's a popular feature, but I still think it's a mistake: If every program on my computer has to continuously parse and deparse bytes into text, and be aware of all the different things every other operating system calls text, then it seems like a waste; it seems redundant. It's a waste of code, and of memory, and I think any mere editor that "needs" this lacks sufficient composition.
> A view is a concept in k that is unusual in other languages: When "a" gets updated, then "c" will automatically contain the new values.
It's actually not at all an unusual concept; it's well known from SQL, for instance, but Scala also has them (Scala views aren't views in the sense that you use the term, but the lazily-implemented transformers they provide produce views of the type under discussion), as does Ruby (via Enumerable::Lazy). In fact, while the name "views" isn't exactly commonly used for them outside of SQL (though, as noted, Scala uses the term for the construct through which one obtains them), they are pretty ubiquitous in modern languages.
A view can be thought of as a dependency expression. In version 3 of K there was a data-driven GUI. The dependencies allowed one to do Functional Reactive Programming (FRP) quite easily.
Unlike SQL, views in k are not limited to queries as dependency expressions. The expressions can be arbitrary.
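(A minimal sketch of the idea in Python - naive recompute-on-read rather than k's cached dependency update, and the names are mine:)

    # "c" behaves like a view over "a": read it after any update to "a"
    # and it reflects the new value, like a spreadsheet cell.
    class Doc:
        def __init__(self, text):
            self.a = text

        @property
        def c(self):
            # offsets of each newline in a
            return [i for i, ch in enumerate(self.a) if ch == "\n"]

    d = Doc("ab\ncd\n")
    print(d.c)    # [2, 5]
    d.a = "x\ny"
    print(d.c)    # [1]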
That's an understatement. If we are allowed code like this, I can write an editor in ONE line of JavaScript. :)
Note that his lines are < 60 chars.
Can you write this editor in one line of less than 6000 chars of javascript (without a "textarea")?
Why do you exclude textarea? It's a high-level functionality available in JS. That q code uses the high-level functionality of q. What's the difference?
No, it's part of webkit/blink/gecko, it's not part of JS. But that's not why I exclude textarea:
I exclude textarea for the same reason dictionaries define something without using the same word and base form.
And the argument is generally meaningless: You may just as well call emacs and say you are using "the operating system's high level functionality".
Does that k code implement cursor-movement / scrolling / etc itself, or is it relying on some sort of system functionality for it?
Someone skilled in writing compact JS probably can. People write amazing programs in 1k & 4k of JS: http://www.google.com/webhp?#q=javascript+4k
Yes.
There is, however, a fundamental difference between codegolfed K and codegolf in essentially any other language:
Codegolfed k uses 1-letter variable names, and drops white spaces (which, unfortunately, is kind of idiomatic). But everything else is idiomatic even for the non-golfers.
Codegolfed JS or C much more often than not is non-idiomatic in order to save all that space.
e.g., compare the K Sudoku solution from http://kparc.com/z/game.k to any other language, and you'll see that the K code is almost a generic solver (the 3, 9 and 10 stand for box size, board size and number of options, respectively, while setting up the constraints for the solver), whereas every other short solution (which usually comes in at twice to ten times as long) is extremely specific to the standard sudoku table.
That's not random - idiomatic K solutions tend to be more generic than other languages. The syntax and semantics gently push you towards more general and more efficient solutions compared to most languages, in that the more general (and/or efficient) solution is usually shorter and simpler to code than a specific case.
I know it sounds unbelievable. But you might believe it when you see example after example and realize that it's not just a "one trick pony". And still, most people surround it with a SEP field. I recommend spending the time needed to grok it instead. [Or APL or J; my personal preference is K, but to each his own]
It does not matter. This is not a programming language; languages are designed for human comprehension. Human comprehension is an irreducible part of the whole paradigm of programming, being necessary to understand the nearly always complex logic required to describe anything of value.
This is gibberish. It fails at the one task of a programming language: to translate something a human can understand into something a computer can understand, with as little confusion as possible.
No, it is just not a language for the masses who have gotten used to all programming language syntax being minor variations on the Algol/Pascal/C theme.
Much like math and physics, this is an extremely terse notation that is immensely useful to those who practice it, and looks like gibberish to those who have not attempted to study it. You could give the same comment about an article in any area of math whose notation you are unfamiliar with -- and you would be just as equally wrong.
Any language you don't know is gibberish—hence the proverb, "it's all Greek to me." Please don't dismiss the unfamiliar just because it's strange.
You owe it to yourself to read Iverson's Turing lecture, "Notation as a Tool of Thought". You can find it. Apparently the APL family can claim a relationship to math notation - which is no small claim. The Lisp family does it too, in a different way.
I think he does a better job than uglifyjs can
Open a new tab and enter this (sans quotes) into the address bar:
data:text/html, <html contenteditable>
Done :)
The APL family was developed to think about math and linear algebra in particular. Iverson's "Notation as a tool of thought" f'rinstance: http://www.jsoftware.com/papers/tot.htm
If you've worked with such things for your day job, exposure to an APL language is mind blowing in the same way as exposure to Lisp is. You'll rapidly find out that an awful lot of the numerics world is an ad-hoc reinvention of an APL language. Leading thinkers in the numerics world have noticed. Have a look at the Tensor type inside Torch7, or -idx- class in Lush (the same thing): they are, in fact, a sort of APL with a more conventional, aka painfully wordy, notation.
Writing an editor in an array language seems crazy, but then, writing a parallel processing system in a language that was designed to run applications in your web browser also seems crazy. If people had stuck with APL-style languages, well, databases, particularly distributed databases (Kx and 1010, both K-based systems, scale to petascale, and have for a long time), would suck less, as would CUDA programming. Their revival could make life easier in these problem domains.
Googled around, Kuro5hin (that's a name I haven't seen for some time) has a tutorial for K from 2002: http://www.kuro5hin.org/story/2002/11/14/22741/791
The download link at http://www.kparc.com/ asks for a password, so I'm not sure what's going on with that.
You have to have a personal invite to download the code atm. The whole article is telling you it is not available yet, but is coming.
I may be mistaken but I believe zokier is referring to a download link for k whereas you are referring to a download for kOS. No?
Right. The part of the page that says "download (then $chmod +x k)" and linking to http://kparc.com/download/k is password-protected. I just want to follow along some of this tutorial material. I'll try to do this with Kona.
The language looks fascinating. Check out Kona, an open source implementation: https://github.com/kevinlawler/kona
Interesting article. Found this (http://queue.acm.org/detail.cfm?id=1531242, submitted here https://news.ycombinator.com/item?id=8476120) interview with Arthur Whitney (from 2009) which is also really interesting.
> “It is a lot easier to find your errors in four lines of code than in four hundred.”
Looking at his code on http://www.kparc.com/edit.k I'd like to disagree with that statement
If I posted four lines of Chinese or Sanskrit, it's likely that native English speakers would disagree that they had much meaning either. However, this doesn't mean that those lines are inherently devoid of meaning or difficult to parse.
That is a fun experiment. I studied linguistics in college, and I do not think anyone ever discussed the textual density of different languages with the "same" content (the latter part would be its own terrifying chestnut if you have not studied machine translation and semantic eval - and good luck ever confirming such a statement).
I studied Arabic a lot, and Chinese about a year. I cannot speak to Chinese with only one hazy year under my belt, but I can speak to Arabic.
Because Arabic has lots of syntax realized at the morphological level, you can encode a whole sentence - subject (with declension inherent and gender variable), verb (conjugated to passive/active, past/present/future, standard/subjunctive) and direct object (declension inherent and gender variable) - all in one word, as we know them in English.
أضربه (a-dr-b-u): a (I), dr-b (hit), u (him/it) - I hit him (present tense).
And that is a super simple example. I have seen much more complicated sentences in one word, and even better in two or three. So, I hypothesized Arabic is very, very dense. I think Russian and others could be considered similar.
However, with this level of density (maybe we argue "compression" from a CS perspective) I noticed books and their translation were routinely about the same length in pages. Never identical mind you, but never something crazy like 50 pages more (I am guessing; it has been a long time since I made such an experiment and would have trouble agreeing with someone on what is significant).
Now, one could hypothesize a shitload about what this means, but computation is realized as the same "stuff" (machine code instructions) across programming languages, whereas no parallel exists for mapping human language to computation, as far as I know from my between-minor-and-major courseload in linguistics, specifically computational linguistics. If someone can contradict me, I would LOVE to read about measured cognition and language constructs.
It's important to separate spoken information density from written information density. Some languages win at one while losing at the other. Your arabic example was shorter than the equivalent english on paper, but longer when spoken (4 syllables vs 3).
In terms of information density per syllable, mandarin wins, with english coming in a close second. When speaking, english usually has more syllables per unit time than mandarin, so english has the highest spoken information density of any language. Japanese is on the opposite end of the spectrum. Despite having the highest syllabic rate, it has the lowest information density.[1]
For written information density, logographic languages win. This is pretty obvious if you've seen a Chinese or Japanese translation of something familiar, such as a Harry Potter book. They're ludicrously thin.
1. See the figures at the end of this paper: http://www.ddl.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegri...
This is very cool, man. Thanks for the link. It is so much fun when on HN and someone brings up a topic and someone throws out established research for said topic without much delay, no matter how big or small.
Like Apple fanbois have "there's an app for that", I love HN moments "Oh I got a citation for that" and for topics I would find very difficult to research at a cursory glance!
> When speaking, english usually has more syllables per unit time than mandarin, so english has the highest spoken information density of any language.
Of the seven languages in the study, using 20 specific short texts, that were originally written in English then translated (well?) in other languages.
They recognized this issue and accounted for it. From the paper:
Since the texts were not explicitly designed for detailed cross-language comparison, they exhibit a rather large variation in length. For instance, the lengths of the 20 English texts range from 62 to 104 syllables. To deal with this variation, each text was matched with its translation in an eighth language, Vietnamese (VI), different from the seven languages of the corpus. This external point of reference was used to normalize the parameters for each text in each language and consequently to facilitate the interpretation by comparison with a mostly isolating language (see below).
It shouldn't be particularly surprising that english comes out ahead. It has a huge vocabulary, tons of phonemes, and makes many parts of speech optional. It lacks tones, but would probably have to sacrifice some phonemes to stay comprehensible.
That just deals with the variation in length of the texts, not the effect of translation quality or other possible problems with the experiment, like written -> spoken conversion.
Russian is actually less dense than even english, but compensates it with flexibility. The phrase above could be written in a lot of different ways, which would emphasize different parts of the sentence, and give it a different tone.
"Written a lot of different ways" is also the case with Arabic, except for the one-word limitation, since obviously Subject-Verb-Object encoding in one word fixes the word order.
If we loosen that req, it gets more interesting. I assume Russian will line up with the following.
In Arabic, the default in formal Standard Arabic (not the dialects, that is another can of Bedouin worms) is Verb-Subject-Object. You can, however, have VSO, SVO, OSV, or OVS, depending on context. I think you decline and conjugate verbs in Arabic as you would in Russian, so you can probably play with the written form, emphasizing different parts as you suggest, in a similar way.
Am I way off? That is what I gathered from Russian/USSR republic kids I have befriended over the years. Not sure if that scans.
I disagree: Translations are rarely accused of being as good or as comprehensive as the original. The fact that you can tell a story in 300 pages in Arabic and 300 pages in English is irrelevant.
Iverson received a Turing Award[1] for his work on this subject.
Cool. Will definitely read more about this then.
Those crazy middle-easterners. How can they calculate with such terse number notation? As if 27 is more readable than XXVII! ;-)
I think that the issue is that just measuring simplicity in terms of number of lines is a bad metric. You can have extremely complex expressions in a single statement that are at least as hard to read and debug as an equivalent, much longer piece of code that employs temporary variables and single-purpose statements.
I disagree fundamentally.
I have noticed that every page I scroll causes a comprehension loss of around 90%, so in reading something that is 10 pagefuls long, I might only be able to reproduce a tiny part of the program.
Your mileage may vary.
I find that by not scrolling, and just moving my eyes, I rapidly absorb the program, and I find most bugs just by reading the code. This practice is absolutely impossible for me if I have to scroll very far, and made difficult by scrolling at all.
It is for this reason that I find simply counting the actual words to be an excellent estimate of complexity.
By the way: There are several temporary variables in that code; c:: creates a view called "c" which automatically updates whenever the dependent variables on the right side change.
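For readers who haven't seen views: a minimal q sketch (variable names invented for illustration):

  a:1 2 3
  c::sum a    / c is a view over a
  c           / 6
  a:10 20
  c           / recomputed on access: 30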
Yes, the research literature on software development has consistently found that code size is the best measurement of complexity and predictor of error rates. (Sorry I don't have citations handy but we've discussed this many times on HN, and there's a recent study in the book "Making Software" that adds to it.) What's interesting is how strongly this goes against what most people think they know about good programming and clear code.
I just had to troubleshoot a small helper app that took some HTTP input and wrote to a DB. The code was in C# and had about 10 files spread over 3 namespaces, plus a separate test-infrastructure project. All sorts of factory models were used to set up an "HTTP pipeline" and authentication modules. The problem I had to fix: after a server upgrade, authentication was broken.
After digging around for a while, I discovered there was no bug. The partner's client code had the auth disabled, and the previous server was misconfigured to not require auth. All of which would not have been a problem if the system just did an "if headers.auth != "Basic ..." - but buried in this forest of stuff, it was overlooked.
It seems that some developers just love their edifices. They build all this "infrastructure", expanding code by an order of magnitude or more. It's considered good and robust and so, so much writing online is dedicated to this pursuit. I think it gives those programmers a feeling of import, as if they're really architecting something, not just pushing a few form fields around.
Even on the line by line basis, it's shocking how they love verbosity. Type inference? Nope, that makes things too compact and hard to read. Higher order functions to wrap up common patterns? Too difficult to understand. I'm not sure if developers simply lack the tiny bit of extra intelligence, or if they've tried it and honestly concluded that overflowing verbosity is the key to readability. Either way, it's sad, and holding back progress slightly.
Right, there seems to be a group thinking like this and a group aggressing against it, and vice versa. I recently had a discussion about it, and the 'architecting' bunch (we need 20-layer-deep directories with 1000s of files with < 10 lines / file) keep shouting about maintainability. The problem is that after 25 years of professional coding in many different circumstances, I see that most good programmers are much quicker to understand the 'non architected' code (putting it in quotes because good code is not gibberish; it is architected, but not by randomly generating design patterns and applying them). The not-so-good programmers say that the 'architected' code is much more maintainable, but they take weeks or months longer to do anything worthwhile as they are 'grokking the architectural choices'.
If someone produces smaller and faster code than me, then I should want to learn from it. I wonder why other people have the exact opposite reaction.
Why do you think that is?
I think that "It's what I'm used to." is the main reason - intellectual comfort zone.
Having learned BASIC, FORTRAN and Pascal, C seemed like line noise - at first. As did PERL. And then k.
Btw, COBOL seemed "too verbose".
Once I actually started writing many k programs and then reading even more of them, I was able to recalibrate for the abstraction/density. I moved my intellectual comfort zone. Ironically, I was already there with mathematics. However, programming languages were different :).
Now, as a result, every time I have to read Java, I suffer from a kind of fatigue - having to read way too much code to glean the writer's intent. I just want them to get to the F'ing point.
N.B. - Mathematical literature/writing went through this same transition during the Renaissance. Equations were described in natural language (not unlike COBOL). A simple polynomial could require a paragraph of text to describe.
I'm not sure -- I know that after the fourth or fifth time solving a problem on projecteuler.net in 20 lines of code and seeing someone post a 1-line J/K solution, I went and downloaded J. I even managed to solve a few euler problems with it, which I regard as a large accomplishment for a novice. I like to tell people I've written a whole twenty or so lines of code in J!
I do enjoy learning about such things, but, for most of the work I do, performance is nowhere near at the top of the list of things I care about. Also in the past I've been burned by code that's small/fast but is otherwise utterly unmaintainable. I'm not saying that's the case here, but... past experience, and all that tends to color perceptions.
I think with a language like k or q, which appears to be purpose-built for certain types of problems, people look at it and get easily confused and discouraged because it's so different from all the more mainstream general-purpose programming languages they're used to. And it's a lot easier to put down something you don't understand than to admit you don't get it, or to spend lots of time learning something that may not be of much use to you. Kinda sucks, but it's often human nature.
> I think with a language like k or q, which appears to be purpose-built for certain types of problems,
The thing is, it's not purpose built, and it doesn't even appear to be if you suspend your disbelief. The only reason you'd think it is purpose built is "well, it can't be this short if it wasn't purpose built". But if you go over the manual and find special-purpose built-in operators, please tell us what they are.
e.g., to compute an average, you can use the function avg:{(+/x)%#x} - with the exception of parentheses, every character has an orthogonal function. Similarly, the maximum subarray sum solution mss:|/0(0|+)\ ; and there are many others. And it's not just math stuff - http://nsl.com has lots of other examples of many kinds -- and most importantly -- is an operating system + GUI not general enough?
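To make that orthogonality concrete, here is the same definition glossed piece by piece (k; the annotations are mine):

  avg:{(+/x)%#x}    / x is the implicit argument
                    / +/x is the sum of x ("plus over")
                    / #x is the count of x
                    / % divides the two
  avg 1 2 3         / 2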
How would you represent a graph and implement DFS in K?
If you are really interested, http://nsl.com/ is a treasure trove - quite a few of the examples are extremely well documented, some or not, but there's a wealth of information there.
Specifically about graphs, you can look at:
http://nsl.com/papers/order.htm - topological sorting
http://nsl.com/k/tarjan.q - strongly connected components
http://nsl.com/k/loop.q - find loops in graphs
I think in all of these the graph is represented either as a list of edges or a dictionary of node->(list of nodes that it has edges to)
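For a flavor of the dictionary representation, a small q sketch (graph and names invented; this computes reachability by converging rather than a true depth-first order):

  g:`a`b`c`d!(`b`c;enlist`d;enlist`d;`symbol$())   / node -> neighbours
  reach:{distinct x,raze g x}                      / one expansion step
  reach/[enlist`a]                                 / converge: `a`b`c`d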
Thanks, I've missed the SCC one. I will try to understand it. (The reason I've asked about DFS in particular is that it is inherently sequential. This SCC algorithm probably encapsulates some kind of DFS.)
K has interesting sequential goodies as well: over ("fold" in Lisp/Haskell, "reduce" in Python), and scan (the same, with all intermediate results returned as well). But it also has the unary ("monadic" in APL terminology) counterparts to these essentially binary operators, which I don't remember from Lisp or Haskell (but I'm neither a Lisper nor a Haskellite; they are probably there somewhere..)
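Concretely, in q/k:

  (+/)1 2 3 4    / over: fold +, gives 10
  (+\)1 2 3 4    / scan: running folds, gives 1 3 6 10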
Unary over is the "fixed point"/"converge" adverb, which does

  x <- f(x)

until x stabilizes (to within floating point tolerance if it is a float), returns to its first value, or goes through a requested number of iterations. The best example of this that I can think of is the K "flatten" idiom:

  ,//

read: "concat over, converge". That is, given a general list, it concatenates all its items, promoting atoms to one-element lists - thus flattening one level of the list; and then applies it again and again until there is no further change, thus flattening successive levels of the list. Is this the most efficient way to do this? No! In fact, for an unbalanced one-sided list it will do O(n^2) where n is the number of items, with the best (and idiomatic Lisp/Haskell) solution being O(n), although it's usually 100 chars rather than 3.
But the actual code orchestrated by these 3 chars behind the scenes is all tight C loops, so for small n it will beat complex solutions. And it is all of 3 self-describing, easily remembered, easily recognized, easily optimized (if Arthur ever cared ...) characters. If you care about worst case, you can easily code the standard Lisp/Haskell solution just as you would in those languages. See [0] for more.
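A quick illustration of the idiom (k; the example list is invented):

  ,//(1;(2;(3 4;5)))    / converges to the flat list 1 2 3 4 5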
The underlying computational model fits sequential, parallel, SIMD, and almost every other paradigm much better than all the popular programming languages. Unfortunately, there's a learning curve that puts off most people (and is perhaps insurmountable to some people who have no problem with Python, Java, C or PHP) - it's much more math-oriented.
[0] http://www.math.bas.bg/bantchev/place/k.html
edit: added [0] link and ref
Numbers in arrays can be treated as pointers, into that same array, giving a graph. It's just a question of context. If you were storing RDF triples in K you'd simply have an array for subjects, one for objects, one for predicates, and one of the URIs/text. Simply store the index of the URI/text item in each of the subject, object and predicate columns. An individual triple would be formed by the same index applied to the subject, object and predicate arrays.
DFS is then a variation of the more familiar functional style of tackling the problem where you have your end condition (i.e. something that matches what you're looking for) and failing that do something else (typically recursion).
I can't recall enough of the K syntax these days to actually implement that right now though, or if K has TCO.
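For illustration only, the parallel-array idea might look like this in q (data and names invented; a sketch, not code from the thread):

  names:`alice`knows`bob`carol   / interned URIs/text
  s:0 2                          / subject column: indexes into names
  p:1 1                          / predicate column
  o:2 3                          / object column
  flip names(s;p;o)              / decode both triples by plain indexing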
One benefit to a short program is that there's not much code to rewrite if you can't read something.
This doesn't happen very often, but I find the thought comforting.
It all depends on how you define "small and fast".
Comments obviously are not code, so it's reasonable to complain about lack of comments.
You suggested wordcount, I think wordcount is good, so it's reasonable to complain about single letter words rather than descriptive words.
uberalex's suggestion for reformatting wouldn't change the algorithm or speed. It would simply spread operations across more lines. That also seems like a reasonable thing to ask, to me. They can learn your method either way.
Edit: I mean, I'm sure fitting more on the screen is valuable, but people already know how to fit many times as much code onto a screen. They avoid it on purpose for whatever reason.
>I mean, I'm sure fitting more on the screen is valuable, but people already know how to fit many times as much code onto a screen. They avoid it on purpose for whatever reason.
I think this reason (whatever it happens to be) is probably wrong.
I don't understand it either. But it happens everywhere, not just in code. Most people are only interested in the "truth" and "facts" as long as it fits within their existing world view.
And things like K rarely do.
Hey, just read bytecode, then. As small and fast as you can get.
But, is number of lines a particularly good size measurement?
Is there evidence one way or the other on whether it's better to measure size with, say, number of lines, number of tokens, or number of nodes in a parse tree? or something else?
My understanding of the literature is that no one has found a better way to measure program complexity than lines of code. In particular, the fancier metrics (cyclomatic complexity and so on) don't add any value over simple LoC.
We've debated the merits of counting tokens before, but I don't recall anyone mentioning a study about it. In real programs—i.e. when you're dealing with idiomatic code as opposed to something designed to game a metric—I doubt that LoC, lexical length, and number of tokens differ much.
Scrolling doesn't bother me, but unnecessary code does. So long as I can see the algorithm on the screen that's fine. Love the kOS idea, keep on it!
Actually it is a good metric, certainly to the first order. Yes, you can have a line or three that is more complex than the rest but practically it isn't going to reduce the line count that much.
And token counts don't help as code that insists that each brace must be on its own line detracts from readability. For one thing it pushes the last bit of the function off the bottom of the screen meaning you have to scroll.
A line that is overly complex will eventually get rewritten.
I say this as someone who has written large bodies of code in Sigma 5 assembly, Fortran II and IV, Bliss-36, C, C++, and Lisp. Perhaps more to the point, these days I read large bodies of code measured in millions of lines. Lines of code dictates how long it will take to understand it.
Peter Norvig in PAIP (Paradigms of Artificial Intelligence Programming) gives some examples of small code and how it can be exceedingly clear.
It is a pity that they don't use Chinese symbols but invented their own. I really hope one day I can write programs in Chinese, i.e., programs are also valid Chinese, and have the same meaning.
Looks closer to BrainFuck[0] than any other language.
There's a contradiction I always see in these pieces: they talk a lot about the importance of using the right data structure. But these languages get their incredible conciseness by not giving you any choice about your data structures; their array type is hardcoded into the language, and if you want to use something else then your code balloons.
K basically has 3 data shapes:
atom (int, float, char, date, symbol, ...)
list (one dimensional array of atoms, dicts, flips or lists)
dict (a map from one list to another)
There's also a flip, which exchanges the first two indexes applied to an item (so, e.g., it effectively transposes a list of lists) but it is just sugar (both syntactic and semantic).
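A minimal sketch of the shapes in q (the wordier skin over k):

  a:42                          / atom
  l:10 20 30                    / list
  d:`x`y!(10 20 30;40 50 60)    / dict: keys mapped to values
  t:flip d                      / flip of a dict of equal-length lists is a table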
You can trust Whitney that all of these are properly implemented, including appends.
It's not often that you actually need more. I've discovered this after using K for a while, and going back to python.
Back in my pre-K (ha!) C++ and Python days, I had an awful lot of classes everywhere. After using K for a while, my Python and C both have much, much fewer (structs in C more often than in Python, as C is missing Python's dict). And the code has gotten much shorter and more efficient. Arguably, more readable as well. And I've essentially dropped C++ for C, because the extra complexity is just not worth it.
k/q really doesn't have to be this unreadable; that's just Arthur's style. Here's some code in C by him for comparison: http://kx.com/q/cs107/a.c
I tried cleaning it up a bit: https://gist.github.com/lukechampine/f54fce8fd756254cefb2
But the actual meaning of the program is still lost on me. I can only guess it has something to do with parsing files (note the checks for curly braces). Feeding it its own source code produces some output, but I have no idea what it actually modified.
It's a solution to CS107 assignment 1 : http://web.stanford.edu/class/cs107/assign1.html
Guess I could have just looked at the URL, huh. Well, it was fun trying to reverse Whitney's code anyhow.
Humorously, it looks like this code would have received a poor grade. It meets just about every standard for low quality outlined here: http://web.stanford.edu/class/cs107/landmarks.html, particularly "Fast code which doesn't work quite right." (Due to an off-by-one error, this code fails to properly reconstruct the example text.)
> Due to an off-by-one error, this code fails to properly reconstruct the example text
What's your example? It seems to work fine to me on "{all is well}{ell that en}{hat end}{t ends well}"
In case anybody is still interested, it's now up on http://kparc.com/cs107/readme
The original is 404ing, do you have a mirror?
That's the original C file, where did your "cleaned-up" version go?
Here is a style guide for K (http://nsl.com/papers/style.pdf) from 1995. Interestingly, most of the concepts still apply.
I believe the audience is developers using K in a commercial/production environment.
The biggest differences, compared to Arthur's style of writing, are:

* Less code on each line
* A separate comment column on each line
* Nominal use of spaces for readability
Q isn't unreadable; it's implemented in standard English words (http://code.kx.com/wiki/Reference). K is considered unreadable by many.
I was more referring to the almost exclusive use of single letter variable names. If one would use at least short words to name things, the implementation would be a whole lot more understandable.
Good lord, I need a drink now.
If that's true, why aren't there DSLs that compile to k?
I love K/Q and am using it in my startup. Thanks a lot to Kx for recently freeing up the 32-bit version. To use APL-like languages I have to really shift the way I design/model things. Most importantly, K may not be best for lots of developers working on the same thing; object-oriented languages fit better in that case. K projects often involve only one or two developers who model things in vector thinking (column-based thinking in the data domain), know exactly what to do and how to do it. Vector thinking is not suitable for all problems, but works really well when it fits. Btw, the new 3.2 version of kdb appears to be even faster than before. It also improves websocket integration and JSON data conversion. Very nice to integrate with nodejs/Qt.
We also use Forth, which I think is really another way of shifting your mind.
What kind of computation is done at your startup? I've used k/q for trading, graph analytics and computer vision (scene reconstruction).
I seriously thought that the article, and some of the code examples people are posting [0], were a joke! I need to go rethink my career.
That's certainly good - we should learn languages which are different enough to justify learning.
Your code in whatever language you work will benefit from that rethinking.
So K is a general purpose programming language? If the claims are true, why don't they submit some entries to the Computer Language Benchmarks Game?
It's proprietary.
<sarcasm>Yeah, that's why.</sarcasm>
Perhaps they asked if their K programs were wanted:
And…?
I seem to remember being asked about one of those array languages and saying no thanks.
Does a good formal introduction exist for K, or Q, or APL, or J, or any other languages in this family? Something with, you know, a syntax definition at least, and any kind of formal definition of the semantics.
The closest I could find is this [1] but "The model is expressed in SHARP APL", so from the start, it's circular.
It's not formal, but it is helpful to watch this series of videos by Martin Saurer on J:
https://www.youtube.com/watch?v=VSJpJt3c11c
It goes from solving some Euler problems to a full-blown web app in J.
For J, IMO, jsoftware.com has good resources: the vocabulary (the way the language is specified), a few textbooks (J for C Programmers, J Primer), essays, examples of short code... And the J forums are pretty helpful. Coming back to the question, that vocabulary is probably the closest thing to a formal definition.
I wonder what K would look like with a bit more reader-friendly syntax. I tried running a program that supposedly "will take a K expression and produce its English translation" (http://kx.com/a/k/examples/read.k), but either it doesn't work with kdb+/q or I can't figure out how to use it. Does anyone have some example output, or advice?
A language/kernel/db/ui to watch. Reading the comments here, it seems the language separates the wheat from the chaff; the average brogrammer won't be able to handle this, but many of us are very interested in exploring.
An OS this small is incredibly exciting to me.
Worked at Morgan Stanley's fixed income desk many years ago, as my first job after grad school, programming A+. It was a decidedly acquired taste. The language and style were utterly alien to anyone coming from a background in C (and, at that time, an early version of Java). While A+ was fast for manipulating arrays (much of what the trading floor needed), it seemed that doing GUI development with A+ was really pointless. The speed of the language didn't matter in GUI interactions, and it was very hard to understand other people's code.
Um, so what happened?
It's not done yet.
This summer, Pierre and I got kOS to boot directly into g (the graphical interface; formally called z) with ISR, keymap, modesetting, basic filesystem, etc, weighing in at around 100 lines of C. That was pretty exciting. It could probably be done with less with some deeper changes to Arthur's code, but it's still very useful to run k under Linux. Oleg made a silly little game in kOS.
Arthur and Oleg did some performance benchmarks pitting k against q (current kdb+), Postgres, some "popular RDBMS" (that I can't name), and MongoDB. It was impressive that k is so much faster than q, but it also really underscores the cost of the wrong data structure (and how hard it is to get the right one with SQL or MongoDB).
How much are you planning to opensource?
I realize you have a thriving commercial software company and that's cool. But...wow. This is exactly the sort of thing Alan Kay's team has been working on for the past five years, and you guys seem to be beating them to it, with a completely different approach. It would be pretty amazing to be able to dig into it, find out how the whole system works, and contribute.
I'm not actually employed by kx, so I can't say for certain.
I don't think Arthur's opposed to open sourcing bits though.
Would it run under Kona? [1]
No. A lot of the verbs have changed in k.
so would it work if Kona changed?
Kona is K3. K5, the subject of the article, is being invented now. I'm quite certain there is no plan to make Kona K5 compatible.
What hardware is your team currently targeting with kOS? An x86 virtual machine under something like VirtualBox seems to be a popular choice among developers of alternative operating systems, since it's a way of avoiding the diversity of PC hardware and the need for lots of drivers. So are you doing that? Or sticking to things that are pretty well standardized but outdated, like IDE and VGA as opposed to SATA and modern GPUs? Or are you targeting a particular subset of PC hardware?
Arthur, Oleg and Pierre use Asus UX31A; I have a MBA. I did the first kernel in qemu, but Pierre got an EFI boot going with Intel modesetting pretty quickly.
I'm more interested in smaller devices though (ARM, etc).
So is a laptop like the UX31A or the MBA actually usable as a laptop with kOS yet? I imagine power management is going to be complicated to implement, if you haven't gotten to that yet.
> I'm more interested in smaller devices though (ARM, etc).
Yes. I imagine a smartphone with a very low-power processor and a couple orders of magnitude less RAM than the mainstream models could have very impressive battery life.
Quote from the original article:
"Whitney sent Oleg and Pierre some of the C code he was working on, and notes on a problem he didn’t know how to solve. They emailed back a solution, coded in his style."
Did Pierre and Oleg think their solution out in standard code first and then make it Whitney-like, or did they find themselves thinking in Whitneyese straight away? I imagine their teacher may have noticed Whitney tendencies and that is what led to the original encouragement to make contact.
I do not think they code in what you call "standard code", but I don't know how "straight away" it was either.
Writing in a dense fashion facilitates thinking about the solution.
"Writing in a dense fashion facilitates thinking about the solution."
What I was thinking, thanks
So, I read about the fast interpreter and small language. How does one do something like this?
"Whitney’s strategy was to implement a core of the language – including the bits everyone thought most difficult, the operators and nested arrays – and use that to implement the rest of the language. The core was to be written in self-expanding C. As far as I know, the kdb+ interpreter is built the same way.
Unlike the tall skinny C programs in the textbooks, the code for this interpreter spills sideways across the page. It certainly doesn’t look like C."
Does this mean that the code is very unreadable C? Like a kind of code golf?
How would one replicate it to build a speedy interpreter? And what if I use Lua or Python instead?
There are two opposite schools of readable C.
The mainstream says: readable C has function and variable and type names that express meaning, so a function is read like a narrative with verbs, adjectives and nouns. The fact that this narrative scrolls over pages, is unimportant.
The APL/K/J school says: readable C has functions, variables, and types named with single letters, so that the totality of a function is short enough to fit in one glance - preferably, on one line that does not need a scrollbar. The function does exactly what it says, no more and no less; its intent is thus completely clear. To name it descriptively would ruin the ability to grasp the whole thing as a gestalt.
OK, so APL/K/J is for people who read compressed code..
But how does that relate to building a fast interpreter? Faster than C?
I hate to love KDB because it's a very expensive closed platform. But if you understand some of the concepts and how easy it is to achieve those concepts with a few lines of q code, you can do some brilliant things. Yes, KDB is a great data store and provides very quick methods for crunching that data with its vector-based approach.
However, whats really impressed me with KDB is that you can do so much more with it. In some banks it has effectively become the messaging middleware for connecting hundreds of disparate data sources. In addition to passing messages you get the data storage and analytics tools for free...
This seems to be a very fun language to write in, in the same way optimizing assembly code in the ancient times was fun: the age of skilled artisans. Unfortunately (for programmers - great for the rest), now is the time of the factory worker. Because of that, I don't think this language and its platform have a long future.
Its continued survival is a testament to the excellence of its marketing (as evidenced by this article!) rather than to its actual merits.
I think the opposite :) - the marketing of R is light years ahead. The language is very obscure, but so good that from time to time you're going to hear wonderful stories from the newly enlightened.
And with the rise of parallel programming, we're going to switch to a more vector-oriented way of describing and solving problems. Just like Lisp ideas are spreading everywhere in modern languages, APL ideas are also fruitful.
I'm interested in a kOS with a "god says ..." program built in ...
get Terry Davis on the job. Is he opposed to porting his holy code to other esoteric platforms?
> kOS is coming. Nothing will be the same afterwards.
There seems to be this strange idea going around that if we just get the right tool, everything else is going to change forever. I see this a lot with people trying to create IDEs that let non-programmers create programs without really knowing how to code.
But the thing is, most people just don't have anything worth coding. The problem isn't that the tools don't exist. They do, even if they aren't perfect. It's that making something that matters isn't an easy thing to do. And no tool can change that.
If I am from another planet, and say I don't know why programs are so big and slow and buggy, and the most complicated program you've seen me produce is a glorified calculator, it's too easy to be patronising and say, well, that's because you haven't done anything complicated.
However if I then show you a programming language, a database engine (similar in capability to SQL but around 1000x faster), a graphical desktop with icons, mouse, editors, filesystems, ISR, and so on- and it's still under 400 lines of C code, then maybe it's easier to have this conversation that I think we need to have: That maybe all you programmers have simply been doing it wrong, and there's something fundamentally wrong with the way you and everyone else programs computers.
I think most programmers take so much offence from the thesis: that everyone programs wrongly (or badly) and that a fundamental shift in the way we program could make programming much better, that it's been difficult to actually work on this problem. Just look at all the people who are complaining about the text editor being difficult to read without saying I can't read this, and I want to get better.
And I think it should be obvious: You've built bridges for thousands of years, so you expect you're pretty good at it now, and yet there are still improvements in bridgemaking today; why do you expect programming to be any different? Why do you have "best practices" for programming if you've only been doing it for a few decades and don't really know how to do it yet?
Maybe it's much easier to think we're working on giving non-programmers the ability to program, but we're not: we're trying to make programming suck less.
I'm hesitantly excited about this language, after reading this article. The power increase I've gained going from an "OO" imperative mindset to functional has been huge. The simple volume of code from other styles just boggles my mind. I can't understand why people prefer their verbose code with so much edifice. Making another leap like that sounds very promising.
At the same time, it does sound slightly off. What's the catch? In the text editor example, how capable is it? Often, such claims rely on a trick, like saying "<textarea/>" is a text editor in 11 characters. Or the tiny Haskell quicksort that's actually rather inefficient.
I think I'll try playing with it. How long should a Logo interpreter in K be?
I wanted to be able to type K and press a button and get the result, so the first bit of k code I wrote was the following:
  np::*((l$[=/k;(*i),0;i]),j)_a;ci:{{J y;x:3:. x;J y;kx x;K(y,j)}[np;j]}

When you type !10 and press control-I it prints the result where the cursor is and selects it. This way you can press backspace to delete the output and change the code easily. How much code should that take? Probably less than that (I didn't know about bin yet), but I think that's part of the learning process.
I think a minimal logo would only take a couple lines.
that's awesome. It's a one line REPL.
I think Douglas Adams' SEP[0] field is a good description of this. Most people ignore what doesn't fit with their logic - doubly so if fitting it in would show them that they have been wrong/wasted money/wasted time in the past. (The Upton Sinclair quote about "it's difficult for someone to understand something when their salary depends on not understanding it" is also relevant).
It's that way with religion, paradigms in medicine, and even chemistry/crystallography (see e.g. Shechtman vs Pauling[1]). It's human nature.
It is also worth pointing out that while this specific version of K is new, K itself is 20, and it is mostly purified APL, itself over 60 years old. People have been ignoring this since forever (APL found stellar success in the OR community in the 70s and 80s on its own, in trading floors in the 80s and 90s thanks to Arthur Whitney, but has since mostly disappeared)
[0] http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem#Ficti...
[1] http://www.theguardian.com/science/2013/jan/06/dan-shechtman...
Nothing encapsulates PNAS' worthlessness as a journal more than that fiasco.
If the SQL is I/O bound, the kdb version cannot be 1000x faster. And if it's not I/O bound, you're not pushing it right. (Note: compare columnar SQL here.) K can't make your I/O subsystem faster, and the new OS probably doesn't natively support 99.9% of existing high performance hardware.
The SQL was not I/O bound, per se - that is database/implementation-specific. The benchmarks were evaluated on "hot" data with plenty of RAM available (like any good production system).
Some databases don't manage that situation as well as they should. I.e. - they were developed in an era of small RAM & large disk.
> And if it's not I/O bound, you're not pushing it right.

That's exactly the point. k is "pushing it right" while the others don't do nearly as well.
> the new OS probably doesn't natively support 99.9% of existing high performance hardware.
At <500 LOC of ANSI C there's not much to port to new hardware, given a decent C compiler.
>And I think it should be obvious: You've built bridges for thousands of years, so you expect you're pretty good at it now, and yet there are still improvements in bridgemaking today
Right, and I'm not saying programming won't improve. It obviously has. No one wants to write a web app in C++ or assembly. What I'm specifically taking issue with is the idea that there is some sort of monumental change out there that is going to enable us to do... what exactly?... well no one can really answer that, but the claim is that it is big and exciting and will change everything.
VPRI (vpri.org) was already mentioned here. The goal of Alan Kay, as I understand it, is to reduce the complexity of computer systems by making them smaller. k shows you the potential results of this approach: doing things differently, you can have small and functional systems, so they can be easier to understand while remaining feature-rich.
When I can fit my program on the screen, I don't make any mistakes.
I think this is true of most programmers.
While "hello world" type programs tend to be the pedagogical example, kOS demonstrates that the complexity achievable in such a one-screen, low-defect program is much higher than people previously thought.
Not being facetious, but does this extend to using larger screens, multiple screens, wider columns or multiple windows, and smaller fonts? Or was one screen just meant as an estimate for 200 lines or so?
I think it is the act of scrolling or window-switching or head-moving that is the cost. To that end, I find using a MBA screen to be perfectly adequate for programming because it fits entirely in my field of view.
My screen is about 55 lines tall and maybe 170 characters wide. I sometimes shrink or increase the font slightly to make programs fit on screen.
I'm just a regular web and app developer, but would it be at all possible to write a layer on top of k that makes it at least more readable while retaining most of the speed advantage?
To me, part of being an engineer is not taking offense when being told there are potentially better ways to do things, but at the same time I can understand why some people jump to that kind of statement since at first glance it seems like a regression of sorts, especially when the industry has a continuing history of trying to lower the entry barrier. Even though underneath the programming concepts may be revolutionary, people may also be quick to balk at it when it's in such a form (hence the first question).
Industry does try to lower the barrier to entry. At the same time it strives to reach ever-higher peaks. The APL family of languages has a high barrier to entry - which is of course a problem - but even remaining a novice in the language, you can be routinely more efficient working with APL than working with another language. So that high barrier to entry is a disadvantage, but in exchange you get pretty high levels of efficiency in the same language, going from newbie to intermediate to advanced levels.
I think this "readable" bit is the theory behind q which you can download from kx.com; Some people think it is more readable. q/kdb+ supports websockets and has a built-in http server so it is very possible to make web apps with it.
However, I think there is value in the dense coding style that has nothing to do with its performance: it reduces bugs and helps you think about the problem.
You've sparked my interest, for sure. I'd love to pull it down and give it a whirl. Hell, one could conceivably write those things with K as well.
Thanks for the perspective. Good luck to you all with kOS and others.
Having studied the History of Mathematics, I think that the same discussion/argument must have occurred in Italy when transitioning from Roman to Arabic numerals.
Would you rather do:
IXDCCCLVI * VIIDCCCXLIX
or
9856 * 7849 ?
> similar in capability to SQL but around 1000x faster
If it's really similar in capabilities, why not strap an SQL parser on it and sell it? You can do it client-side to avoid wasting performance where it matters. Surely there's money to be made in the database business if you are an order of magnitude faster than everyone else -- let alone three.
You don't seem to be aware that Arthur did exactly that:
http://code.kx.com/wiki/Startingkdbplus/qlanguage http://www.kx.com/q/d/kdb+.htm http://www.kx.com/q/d/kdb+1.htm
It's a commercial product and I understand kx does pretty well.
I was not. Thank you. What is the source of the performance figures?
http://www.listbox.com/member/archive/107499/2014/05/sort/ti...
The list is subscriber-only, and I apologise that the information isn't very easy to get at.
I don't know, think bigger. The internet changed things forever, and before that computers. Revolutions have been happening at exponentially increasing rates. The explosion of software complexity we see today is perfectly ripe for innovation and maybe even could cause another revolution. The software industry isn't exactly mature when you take a longer time perspective.
Part of what made the internet such a big change to "things" was that it didn't attempt to replace "everything" wholesale. I'm fairly sure that form of the idea was tried -- for example, PC/GEOS -- but the idea that succeeded was the one that allowed the most people to try it out and see what it did.
I'm trying not to add more words to that paragraph, though I could/should.
Not knowing how to code is not really an issue because we can and do teach people how to code the same way we teach people how to count and do basic algebra. What we can't do well is teach people how to solve novel problems that they haven't encountered before.
The reason k, q, or other array languages like J (which I use) help, not teach, people to solve novel problems is that they give you abstractions to quickly play with. My take on this whole thing is that these array languages (APL, k, J, q) are a perfect fit for things that are essentially arrays - big data, images, etc... And to those who shy away from the seemingly 'noisy' syntax, I point them to mathematics: until you understand the sigma symbols and the others, it is noise too. Why are all the new languages - Julia and company - getting hot on 'vectorization'? Because GPUs and data are all array-based. To have the array as your lingua franca puts you in a good position for all sorts of attacks on modern computing needs. And for those who cannot live with the syntax and ask how you can create a dsl - you can. Here is a J example of computing averages:
You can set this up as:

  +/%#

or

  avg =: +/%#

How's that for a dsl? Which allows this now:

  tally =: #
  divide =: %
  sumall =: +/
  avg =: sumall divide tally
  avg 2 4 6
  4

EDIT: I forgot to add that these operations, as others in J or APL, work on scalars, vectors and arrays without any special typing or handling. See this presentation for at least the first 5 minutes to see J in action with explanation: http://www.infoq.com/presentations/j-language The conciseness allows you to start seeing the small patterns in the small expressions just like mathematics, hence bringing clarity and speed of abstraction to your concept manipulations.

I think part of the difficulty here is that it's almost like Haskell's point free style, but not quite. It isn't really clear where the arguments to avg go. It seems that it's meant to be a bit like this:

  avg list -> (sumall list) / (tally list)

I guess you just need to know how the argument you give to avg when you use it distributes over the functions that comprise the expression. The list argument to tally isn't a problem, but why does sumall get its own copy of it? Or is the execution model something else entirely?

In J there are two special ways to combine functions which are written using special syntax. Namely,
1) when you want to calculate f(y, g(y)) , you write (f g) y - this is "hook" of one argument (monadic, in J terms)
2) when you want to calculate f(x, g(y)) , you write x (f g) y - this is "hook" of two arguments (dyadic)
3) when you want to calculate f(g(y), h(y)) , you write (g f h) y - this is monadic "fork"
4) when you need f(g(x, y), h(x, y)) , you use x (g f h) y - dyadic fork
5) when you have a train - say, (a b c d e) x - longer than 3 elements, then you consider rightmost 3 functions (functions are called verbs in J) as a single fork - say, f - and then consider (a b f) x . So (a b c d e) x is
b(a(x), d(c(x), e(x)))
If the train length is even, the last operation becomes hook - so x (a b c d) y is
a(x, c(b(x, y), d(x, y)))
You can express any computation as a sufficiently complex train.
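If J isn't handy, here is a rough q analogue of the monadic fork (a hypothetical helper, not how J implements trains):

  fork:{[f;g;h;x] g[f x;h x]}   / (f g h) y  =  g(f(y), h(y))
  avg:fork[sum;%;count]         / the J fork  +/ % #  transcribed
  avg 2 4 6                     / 12 % 3 = 4f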
A mite of pedantry: the J train (a b c d) is a hook with a fork on the right, and so dyadically acts like
a(x, c(b(y), d(y)))
Roger himself has dismissed [0] hooks as an unfortunate result of J4's myriad train rules, made in the name of tacitable everything, which I lament because for some reason, tacit programming is just so much more satisfying than normally solving the problem.
[0]: http://www.jsoftware.com/jwiki/Essays/Hook%20Conjunction%3F
Yes, my mistake regarding 4-element dyadic train.
That takes some time to digest, but I guess is absolutely crucial to be able to do anything with this language.
Would you say that this is 90% of the reason it looks so incomprehensible?
No, it's not absolutely critical. You can do a lot of your own programming without using hooks and forks. Your programs will be somewhat simpler - and whenever you need to use a variable in several places, you'll have to explicitly name it - but still.
Incomprehensibility happens in part because all ASCII characters are used - so you have a lot of differently-looking symbols; because [ and { aren't paired with ] and }, and " isn't paired either; because you often have . and : as the second character of built-in entities, so you've got a lot of dots... APL used symbols invented explicitly for those purposes, but the price - a non-ASCII alphabet - was considered too high. So - no, I wouldn't say forks and hooks are the source of 90% of the incomprehensibility.
Sounds great! Where's the documentation on IPC / mutexes / threading?
> todo
> files, procs, tcp/ip, usb, ..
Oh.
E.g. - (k4/q) http://code.kx.com/wiki/Reference/hopen shows how to open a file/proc.
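The flavor, in q (host and port invented; hopen/hclose are the documented primitives):

  h:hopen`:localhost:5001   / open an IPC handle to a q process listening on 5001
  h"1+1"                    / synchronous request; returns 2
  hclose h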
The rest of the wiki is quite useful, including references and tutorials for q.
k5 uses operators where q uses words.
Btw, being able to map/reduce with 1000 procs on an 8GB Linux vm (using k5) is both useful and fun.
Kind of curious to play around with that text editor. Any chance of K running on OSX?
You can get kdb+ for free from here: http://kx.com/software-download.php
However, they've been working on a new version of k that's not publicly available yet and I suppose the kparc stuff requires that.
Hey Geo, if you read this, congrats :)
Tristan
joke: search jqk on google images
I don't buy this story. Maybe kdb is fast and great, but the article attempts to describe it as the work of a genius, better than anything else because it's 100 lines of code, doing its own memory management and running on bare metal. But in fact many other programs do their own memory management and can run on bare metal. The JVM and .Net do their own memory management, all database servers too, and many, many others. So what's left on the table? 500 lines of C code that's meant to change the world? I really doubt it (or these guys aren't using line breaks). We've read so many similar stories about Lisp and how it's the one language that can do everything in 5 lines of code, or how you can build an entire OS and all applications in Lisp - but where's that OS now?
What's left on the table is the efficient use of resources.
When I mentioned to a friend that the current version of K5 was a binary < 100KB, his response was "I don't believe it. I can't write HelloWorld in less than 100 KB!".
Access to more resources does not mean that one should be wasteful.
K benefits from a dedication to avoiding waste and duplication and an efficient use of mathematical concepts. The Fundamental New Computing Technologies project at VPRI has similar goals (an entire end-user system including "Office" apps in 40 KLOC or less).
Being able to prototype a multi-proc map/reduce algorithm in k with 1000 procs on a laptop with 8 GB RAM is quite nice.
> 500 lines of C code that's meant to change the world? I really doubt it (or these guys aren't using line breaks).

There are line breaks, undoubtedly. That said, I'm sure the code is concise - much like k code.
K3 was 1200 lines of code and included the language, windows(GUI), database, IPC, REPL (w/ simple debugger), FFI and OS interaction. The Windows executable was 320 KB.
Every ten years, some company comes around and claims that their functional/data flow/columnar/meta shortish is orders of magnitude better. Usually, the one thing they have going for them is that they focus on only a small subproblem. That lets them be small. Every demo is one of no edge cases, no exceptions, and no I/O errors. (Or they cram all that into some "standard" library.)
The real challenge is that, 99% of the time, requirements and integration is what kills you, not raw performance. For the cases where performance (or formal correctness, or whatever) matters, the main challenge is usually to convince the market that it's worth paying for, and then finding the right developer project match.
This is all false for APL. First, APL is right there in the history books with Fortran and Lisp - its "every ten years" became irrelevant already when it was used to analyze IBM 360 hardware and found some bugs which were fixed in time for shipping. Second, these languages are truly for very wide ranges of problems - it's the industry's curse that APL isn't used more widely; I guess the reason is that it's harder to learn. But you can use it for really everything. Who'd think JavaScript would be the language to write a Linux emulator? So it's less of a wonder that k is used for OS writing.