A look at the J language: the fine line between genius and insanity
I’ve been looking for a decent TSDB (time-series database) for years now. I took a shot at writing one in Lush using HDF5 (as others have done), but my experiments raised more questions than they answered. I’m sure it can be done; I’m also sure it will be a compromise and a source of endless suffering. Since many people are using Q/KDB+ to store order book and tick data, I figured I’d have a look at the APL family of languages, in case the same trick is possible elsewhere. I’ve fiddled in Q before; it’s pretty good, and the APL-ness doesn’t scare me. Anyone who has fiddled with functional programming can do things in Q. The problem is, the price is not right for me at this stage.
The first APL I looked at was A+, by complete accident, because I’m writing some funny stuff for Taki on some self-regarding numskulls who call themselves by the same name. A+ is a venerable language, apparently still used at Morgan Stanley. It seems to be old school APL, using the wacky character set and everything. There’s something to be said for the wacky character set, but I don’t want to have to memorize keystrokes or deal with weird fonts, so I quickly moved on.
The second one I looked at was Kona, a clone of the K3 language (an earlier version of the language that Kx Systems’ KDB+ is based on; they’re now up to K4). Its C source code is intensely beautiful and very concise. Go look at it! Reading it will make you a better person, even if you don’t understand what is going on. I was hoping it would have some doodads built into it, and that I could recycle work done in K3 into K4/Q/KDB+, but it’s not there yet.
A wise old futures trader (who has been using old-school APL since the punched card era) told me about J and JDB. J is a close relative of K; both descend from APL. It’s now up to J7. JDB is a columnar database written in J; pretty much a free version of KDB+. I was expecting the usual dead-language experience, but found a small, friendly and patient community of very smart people. Not the mixture of programmy smartness and smarm you’ll find in Lisp-land. These guys are math smart, stats smart, programmy smart, and just plain smart smart! I guess if you write code that looks like line noise, you have to be smart. They’re also extremely helpful. I asked a fairly n00b question and got a remarkably thoughtful reply. They didn’t just feed me a rote answer; they gave a damn whether I understood what was going on, and what the right way to do it was. No tinfoil-helmeted one-true-wayism involved. These fellas have a good tool; if you’re interested, they’ll show you how to use it. If you’re not, well, that’s OK too. They certainly seem to have a sense of humour; self-deprecation is a nice break from the galloping narcissism of many programmer communities.
Community is important. Ecosystem is more important. What did I find there? The first thing I noticed is the installation package: it’s a shar archive. That caused me to cock an eyebrow: old school. The second thing I found is that I could not easily use my beloved emacs with the latest version of J (you could with the previous version). On the other hand, they wrote a very good integrated development environment, jgtk, with all the standard knobs and buzzers: CPAN-like package installer, debugger, source control hooks, console, project manager; the works. They also included an excellent tool (JHS) for running J in your browser. Why would you want to do that? Well, to go through the examples in the excellent tutorials, wiki pages and labs that it links to. My first impression is that both the browser and GTK IDEs are very good. I’m not used to them yet, but they are very well thought out. Most such things are thrown together in a slapdash way and have obvious flaws on initial inspection. This one has no obvious flaws.
Why no big flaws? I’m guessing it’s because this is a language that demands attention and mindfulness. Every character carries a lot of meaning. A line of J could replace a page of just about anything else. In any other language, your brain ain’t working half the time: instantiating things, making iterators go, building brain-dead switch statements, dealing with preposterous function call overhead or declarations, writing dumb helper patterns that are a tiny variation on something you have done 100 times before. With J you have to pay attention at all times. The line-noise look of the language makes you think better. I’m hoping it also makes you more productive; I figure a page of code a day is a decent amount of output, and a page of J will do a lot of useful work.
The language: I don’t know it very well yet, so I can’t say anything too clever. It is definitely a data-oriented language, assuming your data fits into an array. It has boxed cells for things that are not arrays. Being an array language, it also has built-in sparse arrays, which are a big help for serious numerics work. It does OO-type things using namespaces (J calls them locales) when you need that sort of thing. The “lists” are vectors, though since it’s an array language, you shouldn’t miss linked lists much. One thing about J and the APL languages which is fairly different: they are structured like a spoken language (at least, like an Indo-European language). J has verbs (more or less like functions), nouns (data), adverbs (things which change functions) and compound verbs. A verb can take an argument on its right (monadic use) or arguments on both sides (dyadic use), more or less like real-language verbs. Loops are very much deprecated: noun and verb rank dictate what happens when you want to “apply a function to many things,” and you can modify what happens by using conjunctions. This sounds trivial, but it’s not: it means you can use the same code on things of radically different “shape.”
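To make “rank” a little more concrete for readers coming from loop-based languages, here is a minimal Python sketch of the idea behind J’s rank conjunction (`"`): a verb meant for cells of rank k gets applied to every rank-k cell of its argument, and the results are reassembled in the surrounding frame. In J, for instance, `+/"1 i. 2 3` sums each row of the table 0 1 2 / 3 4 5 and gives 3 12. This is an illustration only, not how J implements it: nested Python lists stand in for J arrays, `rank_of` and `at_rank` are hypothetical helpers, and real J also handles dyads, fill and negative ranks.

```python
def rank_of(x):
    """Rank (number of axes) of a rectangular nested list; scalars have rank 0."""
    r = 0
    while isinstance(x, list):
        r += 1
        if not x:          # an empty list still counts as one axis
            break
        x = x[0]
    return r


def at_rank(verb, k):
    """Return a function applying `verb` to every rank-k cell of its argument.

    Loosely mimics the monadic case of J's rank conjunction: if the argument's
    rank is at most k, the verb sees the whole argument; otherwise we recurse
    into the leading axis (the "frame") and keep the results in that frame.
    """
    def apply(x):
        if rank_of(x) <= k:
            return verb(x)
        return [apply(cell) for cell in x]
    return apply


def total(vector):
    """A rank-1 'verb': sums a flat list of numbers."""
    return sum(vector)


row_sums = at_rank(total, 1)        # roughly J's  +/"1
negate = at_rank(lambda a: -a, 0)   # like J's monadic -, which has rank 0

# The same code works on arguments of different shapes:
print(row_sums([0, 1, 2]))                               # 3
print(row_sums([[0, 1, 2], [3, 4, 5]]))                  # [3, 12]
print(row_sums([[[1, 1], [2, 2]], [[3, 3], [4, 4]]]))    # [[2, 4], [6, 8]]
print(negate([[1, -2], [3, 4]]))                         # [[-1, 2], [-3, -4]]
```

Note that `total` itself never loops over rows or planes; the rank wrapper decides how deep to go, which is the point of the “same code, different shape” remark above.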
Learning: the best quick intro I’ve seen so far is the primer. Deeper (I’m only halfway through it myself) is J for C Programmers. Reading and altering code didn’t work for me out of the gate, as the syntax, which really does look like line noise, offers no familiar landmarks. It’s pretty easy to put bits of it together once you know the basics.
One thing which isn’t well documented in the learning process: foreigns and global settings. For example, 9!:3 ] 2 5 has found its way into my defaults, as it helpfully prints a sort of graphical s-expression of verb expansions (you can also get tree or parenthesized formats, but this “boxed” format makes the most sense to me). “Foreigns” like this are useful, and they’re an appealing way of controlling things, much more so than R’s options settings. Intuitive? No. Neither are R’s bewildering options, which are a continual source of misery. All manner of neat things are accessible this way. Want to know what the pool allocator is doing? Type 7!:3 ''. Want to know how much memory an object p uses? Type 7!:5 <'p'. It’s not obvious at first, but it is documented and helpful.
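One thing I found perplexing: the special forms used to define verbs. Why is
myverb=. 3 : 0 (stuff) equivalent to
myverb=. verb define (stuff)?
I guess it saves some typing, and you do get used to it, but that’s just WEIRD. It turns out verb and define are ordinary names in the standard library, bound to 3 and : 0, so the long form is just the short form spelled out; the number says what kind of thing you are defining (0 for a noun, 1 for an adverb, 2 for a conjunction, 3 for a verb, 4 for a dyadic verb). I’m assuming the numeric codes are historical stuff, programming IBM 360 registers directly or something, the way car/cdr used to mean something physical on the computing machine. There are lots of tricks like that; it would be nice if they were all documented in one handy place, with the more conversational alternatives.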
Language feel: J is metal. It’s spare and powerful, and for an interpreted language, it feels very close to the hardware. It’s small enough that you can understand its intestines, and most of it seems very lean and speedy. Memory management is malloc/free, with reference counting for arrays and a pool for smaller objects; it works very well, and thus far I haven’t been able to make it burp or embarrassingly run out of memory, even when working with data much larger than the memory on my laptop. I’m pretty good at running out of memory in R or Lisp; I’ll probably figure out a way to do so in J eventually, but so far, so good. Of course, J is excellent at dealing with large amounts of data on the disc without thinking about it too much, something most languages suck at. The FFI for calling foreign libraries also looks dirt simple.
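For anyone who hasn’t met the “reference counting plus a pool for small objects” combination, here is a toy Python model of the general technique. It is emphatically not J’s allocator (that is written in C and differs in detail); the class names `Pool` and `Array`, the power-of-two size classes and the 1024-byte pooling cutoff are all assumptions chosen for illustration, and Python’s own memory manager still owns the underlying bytes. The sketch only models the bookkeeping: small blocks are recycled from per-size free lists, and an array’s storage goes back to the pool when its last reference disappears.

```python
class Pool:
    """Toy size-class pool (illustrative assumption, not J's real design).

    Requests are rounded up to a power of two. Blocks up to `max_pooled`
    bytes are recycled through per-size free lists; larger blocks are simply
    dropped when freed, standing in for a plain malloc/free path.
    """

    def __init__(self, max_pooled=1024):
        self.max_pooled = max_pooled
        self.free_lists = {}   # size class -> list of recycled blocks
        self.fresh = 0         # blocks obtained from the "system"
        self.reused = 0        # blocks served from a free list

    @staticmethod
    def size_class(n):
        c = 1
        while c < n:
            c *= 2
        return c

    def alloc(self, n):
        c = self.size_class(n)
        if c <= self.max_pooled and self.free_lists.get(c):
            self.reused += 1
            return self.free_lists[c].pop()
        self.fresh += 1
        return bytearray(c)

    def free(self, block):
        c = len(block)
        if c <= self.max_pooled:
            self.free_lists.setdefault(c, []).append(block)
        # Larger blocks are not kept: the analogue of handing them to free().


class Array:
    """A reference-counted array header that returns its storage to the pool."""

    def __init__(self, pool, nbytes):
        self.pool = pool
        self.block = pool.alloc(nbytes)
        self.refs = 1

    def incref(self):
        self.refs += 1
        return self

    def decref(self):
        self.refs -= 1
        if self.refs == 0:
            self.pool.free(self.block)
            self.block = None


pool = Pool()
a = Array(pool, 100)    # rounded up to the 128-byte class; fresh block
b = a.incref()          # a second reference to the same array
a.decref()              # still referenced through b, so nothing is freed
b.decref()              # last reference gone: block returns to the 128 free list
c = Array(pool, 120)    # same size class, so the block is recycled
print(pool.fresh, pool.reused)   # 1 1
```

The design choice being modelled is that freeing is driven by the count rather than by a garbage-collection pass, so memory comes back as soon as nothing refers to an array, and common small sizes avoid a trip to the system allocator.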
Packages: there is a good set of packages. It isn’t anywhere near what CPAN or CRAN is, but it’s got some helpful tools, and they’re easily installed using the package installer in the IDE. One of note is the plot package, which produces output as nice as or nicer than what R does. Examples from the plot demo are below. I’m told the plot package is useful enough that it is used as an adjunct to Q, which lacks such a thing. There are several other plotting packages with different capabilities, though I can’t see myself exploring them much.
Other useful stuff I’ve explored a bit: an excellent profiler (load 'jpm'), the aforementioned data/JDB columnar database, a decent date/time class in types/datetime, various other database interfaces, some primitive optimization routines in math/deoptim, math/fftw for Fourier transforms, lapack (apparently not supported on 64-bit), tools for talking to R in stats/r, some basic statistical distributions in stats/distrib, a lint system, and the excellent set of “labs” designed to help the user learn J, generally while teaching some interesting piece of math.
Will I actually use this thing to solve useful problems? Hard to say at present; I’m having fun with it for now. The potential killer app which could keep me in J-land is the JDB database. I haven’t developed a test script to really put it through its paces yet, nor do I have a big machine capable of acting as a ticker plant, but early experiments are encouraging, and such things will eventually be explored more fully. It probably doesn’t offer any significant performance advantage over a home-made HDF5-type thing, and may even have drawbacks on very large data. On the other hand, most of the hard work is done, and that counts for a lot. I probably won’t be reaching for J to write new kinds of hidden Markov models (R has more doodads for that), but I might use it to code up a Kalman filter or two. Certainly, finding new ways of using it to talk to R will be mandatory (calling J from R, rather than the other way around, seems useful). If you’re interested in numerics or different programming paradigms, it is worth a look. There is a reason it has lasted as long as it has. It is really a shame things like Matlab ended up taking over this problem space; the APL family is a much more elegant solution to this sort of problem. Yes, J is kind of bonkers, but it’s a good kind of bonkers. Even if I never use it, J is a fascinating view into how a very smart group of folks solve hard problems.
Cool things to look at:
J dudes like puzzles.
Lots of helpful articles in the British APL Association’s publication, Vector
All kinds of educational J/mathematics essays