Kawa — fast scripting on the Java platform

Kawa is a general-purpose Scheme-based programming language that runs on the Java platform. It aims to combine the strengths of dynamic scripting languages (less boilerplate, fast and easy start-up, a read-eval-print loop or REPL, no required compilation step) with the strengths of traditional compiled languages (fast execution, static error detection, modularity, zero-overhead Java platform integration). I created Kawa in 1996, and have maintained it since. The new 2.0 release has many improvements.

Projects and businesses using Kawa include: MIT App Inventor (formerly Google App Inventor), which uses Kawa to translate its visual blocks language; HypeDyn, which is a hypertext fiction authoring tool; and Nü Echo, which uses Kawa for speech-application development tools. Kawa is flexible: you can run source code on the fly, type it into a REPL, or compile it to .jar files. You can write portably, ignoring anything Java-specific, or write high-performance, statically-typed Java-platform-centric code. You can use it to script mostly-Java applications, or you can write big (modular and efficient) Kawa programs. Kawa has many interesting features; below we'll look at a few of them.

Scheme and standards

Kawa is a dialect of Scheme, which has a long history in programming-language and compiler research, and in teaching. Kawa 2.0 supports almost all of R7RS (Revised^7 Report on the Algorithmic Language Scheme), the 2013 language specification. (Full continuations are the major missing feature, though there is a project working on that.) Scheme is part of the Lisp family of languages, which also includes Common Lisp, Dylan, and Clojure.

One of the strengths of Lisp-family languages (and why some consider them weird) is the uniform prefix syntax for calling a function or invoking an operator:

    (op arg1 arg2 ... argN)

If op is a function, this evaluates each of arg1 through argN, and then calls op with the resulting values. The same syntax is used for arithmetic:

    (+ 3 4 5)

and program structure:

    ; (This line is a comment - from semi-colon to end-of-line.)
    ; Define variable 'pi' to have the value 3.14.
    (define pi 3.14)

    ; Define single-argument function 'abs' with parameter 'x'.
    (define (abs x)
      ; Standard function 'negative?' returns true if argument is less than zero.
      (if (negative? x) (- x) x))

Having a simple regular core syntax makes it easier to write tools and to extend the language (including new control structures) via macros.
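As a small illustration of adding a control structure with a macro, here is a sketch that uses only the standard syntax-rules facility. The names my-while and count-up-to are made up for this example and are not part of Kawa or R7RS.

```scheme
;; my-while is a hypothetical name, chosen to avoid clashing with any built-in.
;; It expands into a named-let loop that re-tests the condition each time around.
(define-syntax my-while
  (syntax-rules ()
    ((_ condition body ...)
     (let loop ()
       (when condition
         body ...
         (loop))))))

;; Collect the integers 0 .. n-1 using the new control structure.
(define (count-up-to n)
  (let ((i 0) (acc '()))
    (my-while (< i n)
      (set! acc (cons i acc))
      (set! i (+ i 1)))
    (reverse acc)))

(count-up-to 3) ; evaluates to: (0 1 2)
```

Because the macro is hygienic, the loop name introduced by the expansion cannot collide with a variable called loop in the caller's code.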

Performance and type specifiers

Kawa gives run-time performance a high priority. The language facilitates compiler analysis and optimization. Flow analysis is helped by lexical scoping and the fact that a variable in a module (source file) can only be assigned to in that module. Most of the time the compiler knows which function is being called, so it can generate code to directly invoke a method. You can also associate a custom handler with a function for inlining, specialization, or type-checking.

To aid with type inference and type checking, Kawa supports optional type specifiers, which are specified using two colons. For example:

    (define (find-next-string strings ::vector[string] start ::int) ::string
      ...)

This defines find-next-string with two parameters: strings is a vector of strings, and start is a native (Java) int; the return type is a string.
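For a complete (if tiny) function in the same style, consider the following sketch. It is Kawa-specific, since the :: specifiers are not portable Scheme, and clamp is a hypothetical helper invented for this illustration.

```scheme
;; 'clamp' is a hypothetical helper, not part of Kawa.
;; All three parameters and the result are declared as native (Java) int.
(define (clamp value ::int low ::int high ::int) ::int
  (cond ((< value low) low)
        ((> value high) high)
        (else value)))

(clamp 15 0 10) ; evaluates to: 10
(clamp -3 0 10) ; evaluates to: 0
```

With the types declared, a call that passes a string where an int is expected is the kind of mistake the compiler has enough information to flag.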

Kawa also does a good job of catching errors at compile time.

The Kawa runtime doesn't need to do a lot of initialization, so start-up is much faster than for other scripting languages based on the Java virtual machine (JVM). The compiler is fast enough that Kawa doesn't use an interpreter. Each expression you type into the REPL is compiled on-the-fly to JVM bytecodes, which (if executed frequently) may be compiled to native code by the just-in-time (JIT) compiler.

Function calls and object construction

If the operator op in an expression like (op arg1 ... argN) is a type, then the Kawa compiler looks for a suitable constructor or factory method.

    (javax.swing.JButton "click here")
    ; equivalent to Java's: new javax.swing.JButton("click here")

If the op is a list-like type with a default constructor and has an add method, then an instance is created, and all the arguments are added:

    (java.util.ArrayList 11 22 33)
    ; evaluates to: [11, 22, 33]

Kawa allows keyword arguments, which can be used in an object constructor form to set properties:

    (javax.swing.JButton text: "Do it!" tool-tip-text: "do it")

The Kawa manual has more details and examples. There are also examples for other frameworks, such as for Android and for JavaFX.

Other scripting languages also have convenient syntax for constructing nested object structures (for example Groovy builders), but they require custom builder helper objects and/or are much less efficient. Kawa's object constructor does most of the work at compile-time, generating code as good as hand-written Java, but less verbose. Also, you don't need to implement a custom builder if the defaults work, as they do for Swing GUI construction, for example.

Extended literals

Most programming languages provide convenient literal syntax only for certain built-in types, such as numbers, strings, and lists. Other types of values are encoded by constructing strings, which are susceptible to injection attacks, and which can't be checked at compile-time.

Kawa supports user-defined extended literal types, which have the form:

    &tag{text}

The tag is usually an identifier. The text can have escaped sub-expressions:

    &tag{some-text&[expression]more-text}

The expression is evaluated and combined with the literal text. Combined is often just string-concatenation, but it can be anything depending on the &tag. As an example, assume:

    (define base-uri "http://example.com/")

then the following concatenates base-uri with the literal "index.html" to create a new URI object:

    &URI{&[base-uri]index.html}

The above example gets de-sugared into:

    ($construct$:URI $<<$ base-uri $>>$ "index.html")

The $construct$:URI is a compound name (similar to an XML "qualified name") in the predefined $construct$ namespace. The $<<$ and $>>$ are just special symbols to mark an embedded sub-expression; by default they're bound to unique empty strings. So the user (or library writer) just needs to provide a definition of the compound name $construct$:URI as either a procedure or macro, resolved using standard Scheme name lookup rules; no special parser hooks or other magic is involved. This procedure or macro can do arbitrary processing, such as construct a complex data structure, or search a cache.

Here is a simple-minded definition of $construct$:URI as a function that just concatenates all the arguments (the literal text and the embedded sub-expressions) using the standard string-append function, and passes the result to the URI constructor function:

    (define ($construct$:URI . args)
      (URI (apply string-append args)))
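To see why the marker symbols do no harm under plain concatenation, here is a portable sketch of the same idea. The names open-marker, close-marker, and construct-uri-string are stand-ins invented for this example; they model $<<$, $>>$, and the string-building half of $construct$:URI, and the sketch stops short of creating an actual URI object.

```scheme
;; Hypothetical stand-ins for Kawa's $<<$ and $>>$, modeled as empty strings.
(define open-marker "")
(define close-marker "")

;; Hypothetical stand-in for the string-building part of $construct$:URI.
(define (construct-uri-string . args)
  (apply string-append args))

(define base-uri "http://example.com/")

;; Mirrors the de-sugared call: the markers contribute nothing to the result.
(construct-uri-string open-marker base-uri close-marker "index.html")
; evaluates to: "http://example.com/index.html"
```

A smarter definition could use the markers to tell literal text from substituted values, which is what makes tag-specific quoting possible.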

The next section uses extended literals for something more interesting: shell-like process forms.

Shell scripting

Many scripting languages let you invoke system commands (processes). You can send data to the standard input, extract the resulting output, look at the return code, and sometimes even pipe commands together. However, this is rarely as easy as it is using the old Bourne shell; for example command substitution is awkward. Kawa's solution is two-fold:

  1. A "process expression" (typically a function call) evaluates to a Java Process value, which provides access to a Unix-style (or Windows) process.
  2. In a context requiring a string, a Process is automatically converted to a string comprising the standard output from the process.

A trivial example:

    #|kawa:1|# (define p1 &`{date --utc})

("#|...|#" is the Scheme syntax for nestable comments; the default REPL prompt has that form to aid cutting and pasting code.)

The &`{...} syntax uses the extended-literal syntax from the previous section, where the backtick is the 'tag', so it is syntactic sugar for

    ($construct$:` "date --utc")

where $construct$:` might be defined as:

    (define ($construct$:` . args) (apply run-process args))

This in turn translates into an expression that creates a gnu.kawa.functions.LProcess object, as you can see if you write it:

    #|kawa:2|# (write p1)
    gnu.kawa.functions.LProcess@377dca04

An LProcess is automatically converted to a string (or bytevector) in a context that requires it. This means you can convert to a string (or bytevector):

    #|kawa:3|# (define s1 ::string p1) ; Define s1 as a string.
    #|kawa:4|# (write s1)
    "Wed Jan  1 01:18:21 UTC 2014\n"
    #|kawa:5|# (define b1 ::bytevector p1)
    #|kawa:6|# (write b1)
    #u8(87 101 100 32 74 97 110 ... 52 10)

The display procedure prints the LProcess in "human" form, as an unquoted string:

    #|kawa:7|# (display p1)
    Wed Jan  1 01:18:21 UTC 2014

This is also the default REPL formatting:

    #|kawa:8|# &`{date --utc}
    Wed Jan  1 01:18:22 UTC 2014

We don't have room here to discuss redirection, here documents, pipelines, adjusting the environment, and flow control based on return codes, though I will briefly touch on argument processing and substitution. See the Kawa manual for details, and here for more on text vs. binary files.

Argument processing

Substituting the result of an expression into the argument list is simple with the &[] construct:

    (define my-printer (lookup-my-printer))
    &`{lpr -P &[my-printer] log.pdf}

Because a process is auto-convertible to a string, no special syntax is needed for command substitution:

    &`{echo The directory is: &[&`{pwd}]}

though you'd normally use this short-hand:

    &`{echo The directory is: &`{pwd}}

Splitting a command line into arguments follows shell quoting and escaping rules. Dealing with substitution depends on quotation context. The simplest case is when the value is a list (or vector) of strings, and the substitution is not inside quotes. In that case each list element becomes a separate argument:

    (define arg-list ["-P" "office" "foo.pdf" "bar.pdf"])
    &`{lpr &[arg-list]}

An interesting case is when the value is a string, and we're inside double quotes; in that case newline is an argument separator, but all other characters are literal. This is useful when you have one filename per line, and the filenames may contain spaces, as in the output from find:

    &`{ls -l "&`{find . -name '*.pdf'}"}

This solves a problem that is quite painful with traditional shells.
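The newline-only splitting rule can be modeled in a few lines of portable Scheme. This is a sketch of the rule as described above, not Kawa's implementation; split-lines is a hypothetical name, and skipping empty lines is an assumption made here for simplicity.

```scheme
;; Hypothetical helper: split text into one argument per line.
;; Spaces and other characters stay literal; only newline separates arguments.
;; Assumption: empty lines produce no argument.
(define (split-lines text)
  (let loop ((chars (string->list text)) (current '()) (result '()))
    ;; Push the line collected so far onto the result, unless it is empty.
    (define (flush)
      (if (null? current)
          result
          (cons (list->string (reverse current)) result)))
    (cond ((null? chars) (reverse (flush)))
          ((char=? (car chars) #\newline) (loop (cdr chars) '() (flush)))
          (else (loop (cdr chars) (cons (car chars) current) result)))))

(split-lines "./a b.pdf\n./c.pdf\n")
; evaluates to: ("./a b.pdf" "./c.pdf")
```

The filename containing a space survives as a single argument, which is exactly what word-splitting in a traditional shell gets wrong.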

Using an external shell

The sh tag uses an explicit shell, like the C system() function:

    &sh{lpr -P office *.pdf}

This is equivalent to:

    &`{/bin/sh -c "lpr -P office *.pdf"}

Kawa adds quotation characters in order to pass the same argument values as when not using a shell (assuming no use of shell-specific features such as globbing or redirection). Getting shell quoting right is non-trivial (in single quotes all characters except single quote are literal, including backslash), and not something you want application programmers to have to deal with. Consider:

    (define authors ["O'Conner" "de Beauvoir"])
    &sh{list-books &[authors]}

The command passed to the shell is the following:

    list-books 'O'\''Conner' 'de Beauvoir'

Having quoting be handled by the $construct$:sh implementation automatically eliminates common code injection problems. I intend to implement a &sql form that would avoid SQL injection the same way.
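The single-quote rule is easy to state but fiddly to apply by hand. The following portable sketch shows one way to do it; shell-quote and shell-command are hypothetical names for this illustration and are not how $construct$:sh is actually implemented.

```scheme
;; Hypothetical helper: wrap arg in single quotes for a Bourne-style shell.
;; Inside single quotes everything is literal, so an embedded single quote
;; is written as: close quote, backslash-escaped quote, reopen quote.
(define (shell-quote arg)
  (let loop ((chars (string->list arg)) (out '()))
    (if (null? chars)
        (string-append "'" (apply string-append (reverse out)) "'")
        (loop (cdr chars)
              (cons (if (char=? (car chars) #\')
                        "'\\''"
                        (string (car chars)))
                    out)))))

;; Hypothetical helper: build a command line from a name and raw arguments.
(define (shell-command name args)
  (apply string-append name
         (map (lambda (a) (string-append " " (shell-quote a))) args)))

(display (shell-command "list-books" '("O'Conner" "de Beauvoir")))
; prints: list-books 'O'\''Conner' 'de Beauvoir'
```

Quoting every argument unconditionally is slightly noisier than necessary, but it avoids having to decide which characters are safe.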

In closing

Some (biased) reasons why you might choose Kawa over other languages, concentrating on those that run on the Java platform: Java is verbose and requires a compilation step; Scala is complex, intimidating, and has a slow compiler; Jython, JRuby, Groovy, and Clojure are much slower in both execution and start-up. Kawa is not standing still: plans for the next half-year include a new argument-passing convention (which will enable ML-style patterns); full continuation support (which will help with coroutines and asynchronous event handling); and higher-level optimized sequence/iteration operations. I hope you will try out Kawa, and that you will find it productive and enjoyable.

Index entries for this article
GuestArticles: Bothner, Per