Language Field Trip: IDL

All aboard the school bus, we’re going on a field trip. Did you know there are things that might actually be worse than PHP? It’s true! It’s true and it causes me to doubt the goodness of the cosmos. If you are a tumblr URL purist, I apologize for the deviation from the strict theme of the PHP manual, but I promise there are truly some other masterpieces of program design to be unearthed.

A few years ago, as I was finishing up my degree, I tried very hard to get a job as a programmer for some radio astronomers because radio astronomy freakin’ rules. Unfortunately I graduated right into the very heart of the bad economy, so that didn’t pan out and now I’m a professional hacker or something (I’m not really sure). Preparing for the interviews, however, brought me into sustained contact with a commercial programming language environment called Interactive Data Language aka IDL. It’s for scientific programming, it has some neat things like built-in cartography data, and it’s terrible.

IDL dates to the late 1970s and it shows in every facet of its being. There is of course a reason anyone ever used it in the first place: it is a language oriented to efficient transforms of entire arrays, which is exactly what scientists working on datasets want. In modern times, languages like Python have filled this role about six bajillion times better, but the dark legacy of, well, legacy code lives on. There have been improvements in recent years – apparently it now has automatic GC(!) and a lot of new graph types – but most of the things I’ll point out here can’t be changed without breaking legacy code, and perusing the current code samples on the site does not make the language seem fundamentally improved.

To avoid the tedium of retypesetting several tables and code listings, this manual masterpiece is structured around screenshots taken from a book called Practical IDL Programming by Liam E. Gumley. It’s a bit dated but, as mentioned, they had a legacy problem then and they have a legacy problem now. (The current website of IDL does a good job of not making it at all obvious where the official documentation is. It’s here.) The screenshots constitute a very small portion of the overall book, mostly from chapter 2, used for critique purposes bla bla bla. (If you are in tumblr dashboard view, click/tap on any image thumbnail to expand all of them.)

Let’s begin with giving you a taste of what we’re dealing with: this is a while loop from a larger program.

I want to point out one thing in particular. on_ioerror sets a goto for any future IO errors within the current function scope (so why is the statement inside a while loop?). That should set the tone for how this language works. (For the record, I am a fan of a well-placed goto in low-level code; after all, sometimes I program in asm for fun.)

I don’t even have any idea what order I should present these in. It’s just a steady trickle of arbitrary WTF.

Tiny Integers

Quick! What’s the default integer size in a Big Data scientific programming environment? 64-bit, or do we cheap out and use 32-bit to align with the native width of more modest machines? Or do we define it to be the width of the currently executing machine?

Don’t be ridiculous! Integers are sixteen bit. 32 bits are for long types! (And note how it freely admits that Typecast Hell is a threat you must stand ever vigilant against, as though a programming language actively creating problems for you is simply how things are.)

There may be some vague idea that this is to align with 16-bits-per-pixel image storage formats which I presume were more dense on the ground in the 80s. Or maybe most scientists really did have 16-bit machines (can’t afford a VAX?) and the performance penalty of using larger integers was a huge problem. I don’t know, I wasn’t born yet. In any case, this is a wonderful inheritance passed down from generation to generation: having to remember in 2013 that all your low integer literals are being declared as signed 16-bit. Check this out:

That’s right! You have to remember to explicitly cast your literals if you want them to be comparable to a number north of about thirty-two thousand!
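For anyone who has never had the pleasure, here is a rough Python model of what a signed 16-bit integer does to a number that doesn't fit. This is not IDL code and `to_int16` is a name I made up for illustration; whether IDL wraps, errors, or promotes in any particular spot depends on context I'm not reproducing here. It only shows why roughly thirty-two thousand is the magic ceiling.

```python
def to_int16(value: int) -> int:
    """Wrap a Python int into the signed 16-bit range (two's complement)."""
    wrapped = value & 0xFFFF  # keep only the low 16 bits
    # Anything with the top bit set represents a negative number.
    return wrapped - 0x10000 if wrapped >= 0x8000 else wrapped


INT16_MAX = 32767  # largest value a signed 16-bit integer can hold

print(to_int16(INT16_MAX))      # 32767
print(to_int16(INT16_MAX + 1))  # -32768
print(to_int16(40000))          # -25536
```

Anything that wants to count past 32767 needs a wider type, which is why the explicit cast on the literal matters.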

Has your head hit the desk yet? Get a pillow. Trust me.

Odd and Even Booleans

Hey, you know what would save like, one whole opcode in the runtime’s boolean routine? If we only checked the lowest bit of an integer! BRILLIANT here is your Christmas bonus, Engineer Shortsighted! Oh no your Christmas bonus is an even number of dollars so if(ChristmasBonus) doesn’t evaluate to true.

So… 2.0 is true and 2 is false? – faithful follower @sakjur
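To make the rule concrete, here is a small Python sketch of the truth test as I understand it from the above: integers are true when their lowest bit is set, and floats are true when they are non-zero. The function name `idl_style_truth` is hypothetical and the float rule is my assumption based on the question above, so treat this as a model rather than a spec.

```python
def idl_style_truth(value):
    """Model of the truth test described above (an assumption, not IDL itself)."""
    if isinstance(value, float):
        return value != 0.0       # floats: any non-zero value counts as true
    if isinstance(value, int):
        return (value & 1) == 1   # integers: only the lowest bit is checked
    raise TypeError("this sketch only models ints and floats")


print(idl_style_truth(2))    # False: even integer
print(idl_style_truth(3))    # True: odd integer
print(idl_style_truth(2.0))  # True: non-zero float
```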

This has consequences that break perfectly sensible design patterns:

And standard library routines explicitly defy the linguistic definition of boolean:

And this is as good a place as any to mention that not setting a flag is not necessarily the same thing as setting the flag to false??? Apparently an example is /noclip in graph drawing. I dunno.

Procedures and Functions, Parameters and Keywords

IDL maintains a first-class distinction between procedures (which don’t return a value) and functions (which do), which I think most people see as kind of pointless these days; even C doesn’t care very much. This in and of itself is just a quirk, but the syntax for calling them is completely different and in the case of procedures is just weird:

IDL> procedurename, argument, argument

It’s just like… a comma-delimited list, floating in space? The name of the procedure is not differentiated from its arguments except by virtue of being first. It’s gross and I don’t see any reason it should be structured differently from function calls, which take a more typical name(arg, arg) style.

IDL also has a first-class distinction between mandatory arguments, called parameters, and optional arguments, called keywords (in contrast to what “keyword” means in most other languages). “Mandatory” is apparently a bit too strong of a term because “a well-written procedure or function will check that any mandatory input arguments are defined before doing anything else.” An apparently intentional misfeature is you can pass non-existent variables as arguments and expect them to suddenly have meaningful contents in the caller’s scope as a side effect of the function.

Of course, the language contains both pass-by-value and pass-by-reference, and which applies when is of course entirely consistent and intuitive!

I mean, it’s obvious to everyone here that an array which is a subset of another array is a fundamentally different type of data than an array that isn’t, right? Of course such rules would be totally different. (I’m contractually obligated to tell you that my best friend wants you to know this is also how Python does it. Well, I never claimed to want to marry Python, now did I! Edit: Except in Numpy, apparently, where it works the way I think is Right and True, which is probably why I thought Python was Right and True, as I’ve used Numpy for something before.)
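Since Python got dragged into this, here is a sketch of the analogous behavior with plain Python lists: hand a function the whole list and it can scribble on the caller's data, hand it a slice and it gets a copy. This is only the Python side of the comparison, with a made-up helper called `zero_out`; it is not a claim about exactly which cases IDL treats which way.

```python
def zero_out(values):
    """Overwrite every element of the list we were handed, in place."""
    for i in range(len(values)):
        values[i] = 0


whole = [1, 2, 3, 4]
zero_out(whole)          # the function sees the caller's own list
print(whole)             # [0, 0, 0, 0]

other = [1, 2, 3, 4]
zero_out(other[0:2])     # slicing a list builds a new list first
print(other)             # [1, 2, 3, 4]
```

Same function, same call syntax, and whether the caller's data changes depends on whether a subset was passed.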

Since procedures and functions are completely not the same thing, of course the error messages for not being able to find one are completely different:

Read that second one carefully and let the horror sink in: it cannot distinguish between an invalid function name and an uninitialized array. Unless of course you happened to use a single keyword argument to your invalid function name, in which case you get a third unique error message:

Arrays

This sounds reasonable in isolation:

But these are both true in the same language. You see, bad indexes in an array are less bad than lone wolf bad indexes. The companionship tames them.

We already hinted at this one: shipping syntax ambiguity, waking up the next morning, and shipping both ambiguity and non-ambiguity going forward.

Pointers

Yes, it’s a high level language. Yes, there are pointers. I suppose they’re really handles or something.

Accessing undefined variables through pointers: a critical and useful feature and definitely not a cause of interesting bugs.

Quirky?

Yucky.

Assorted Brain Damage

Followed by a code sample that explodes for x = 0 due to lack of short circuiting, of course.
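If short circuiting is unfamiliar, here is a Python sketch of the difference. Python's own `and` stops as soon as the left side is false; the hypothetical `eager_and` helper below stands in for an operator that evaluates both sides no matter what, which is the failure mode being described. The condition itself is my own example, not the one from the book.

```python
def eager_and(left, right):
    """Both arguments are already evaluated by the time we get here."""
    return bool(left) and bool(right)


def safe_check(x):
    # Short circuit: 1.0 / x is never evaluated when x is zero.
    return x != 0 and 1.0 / x > 0.5


def eager_check(x):
    # No short circuit: 1.0 / x is evaluated before eager_and even runs.
    return eager_and(x != 0, 1.0 / x > 0.5)


print(safe_check(0))  # False
try:
    eager_check(0)
except ZeroDivisionError:
    print("eager_check(0) divided by zero")
```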

The creat school of function naming thought - i.e. Ken Thompson’s Regret.

Excerpt from a much larger table – the implication being that there is no hexadecimal notation.

Wait, there are objects?! (And strings are limited to 32 kilobytes?!) THERE ARE OBJECTS?!?! AND YOU’RE JUST NOT GONNA MENTION THAT AGAIN IN OVER FIVE HUNDRED PAGES?!?!

“Don’t bother correctly specifying the expected input. That will just increase the rate at which malformed data is rejected instead of stuffed into places it doesn’t fit!”