Locklin on science

A look at the J language: the fine line between genius and insanity

Posted in J, tools by Scott Locklin on September 18, 2012

I’ve been looking for a decent time series database (TSDB) for years now. I took a shot at writing one in Lush using HDF5 (as others have done), but the experiments I did raised more questions than they answered. I’m sure it can be done; I’m also sure it will be a compromise and a source of endless suffering. Since many people are using Q/KDB+ to store order book and tick data, I figured I’d have a look at the APL family of languages, in case the same trick is possible elsewhere. I’ve fiddled in Q before; it’s pretty good, and the APL-ness doesn’t scare me. Anyone who has fiddled with functional programming can do things in Q. The problem is, the price is not right for me at this stage.

The first APL I looked at was A+, by complete accident, because I’m writing some funny stuff for Taki on some self-regarding numskulls who call themselves by the same name. A+ is a venerable language, apparently still used at Morgan Stanley. It seems to be old school APL, using the wacky character set and everything. There’s something to be said for the wacky character set, but I don’t want to have to memorize keystrokes or deal with weird fonts, so I quickly moved on.

The second one I looked at was Kona, which is a copy of the K3 language (an earlier version of the thing that Kx Systems’ KDB+ is based on; they’re now up to K4). The C source code for this is intensely beautiful, and very concise. Go look at it! Reading this source will make you a better person, even if you don’t understand what is going on. I was hoping it would have some doodads built into it, and that I could recycle work done in K3 into K4/Q/KDB+, but it’s not there yet.

A wise old futures trader (who has been using old school APL since the punched card era) told me about J and JDB. J is an ancestor of K. It’s now up to J7. JDB is a columnar database written in J; pretty much a free version of KDB+. I was expecting the usual dead language experience, but found a small, friendly and patient community of very smart people. Not the mixture of programmy smartness and smarm you’ll find in Lisp-land. These guys are math smart, stats smart, programmy smart, and just plain smart smart! I guess writing code that looks like line noise means you have to be smart. They’re also extremely helpful. I mean, I asked a fairly n00b question, and got this in response. They didn’t just feed me a rote answer; they gave a damn whether I understood what was going on, and what the right way to do it was. No tinfoil-helmeted one-true-wayism involved. These fellas have a good tool; if you’re interested, they’ll show you how to use it. If you’re not, well, that’s OK too. They certainly seem to have a sense of humour; self-deprecation is a nice break from the galloping narcissism of many programmer communities.

Community is important. Ecosystem is more important. What did I find there? The first thing I noticed is the installation package; it’s a shar archive. That caused me to cock an eyebrow: old school. The second thing I found is that I could not use my beloved emacs in an easy way with the latest version of J (you can with the previous version). On the other hand, they wrote a very good interactive development environment, jgtk, which has all the standard knobs and buzzers: CPAN-like package installer, debugger, source control hooks, console, project manager; the works. They also included an excellent tool (JHS) for running J things in your browser. Why would you want to do that? Well, to go through examples in the excellent tutorials, wiki examples and labs that it links to. My first impression is that both the browser and GTK IDEs are very good. I’m not used to them yet, but they are very well thought out. Most such things are thrown together in a slapdash way, and have obvious flaws on initial inspection. This one has no obvious flaws.

Why no big  flaws? I’m guessing because this is a language that demands attention and mindfulness. Every character carries a lot of meaning. A line of J could replace a page of just about anything else. In any other language, your brain ain’t working half the time; instantiating things, making iterators go, building brain dead switch statements, dealing with preposterous function call overhead or declarations, writing dumb helper patterns that are a tiny variation on something you have done 100 times before. With J you have to pay attention at all times. The line-noise look of the language makes you think better. I’m hoping it also makes you more productive; I figure a page of code a day is a decent amount of output; a page of J will do a lot of useful work.

The language: I don’t know it very well yet, so I can’t say anything too clever. It is definitely a data oriented language, assuming your data fits into an array. It has boxed cells for things that are not arrays. Since it is an array language, it has built-in sparse arrays, which are a big help for serious numerics work. It also does OO type things using namespaces when you need that sort of thing. The “lists” are vectors, though since it’s an array language, you shouldn’t miss linked lists much. One thing about J and the APL languages which is fairly different: it is structured like a spoken language (at least, like an Indo-European language). It has verbs (more or less like functions), nouns (data), adverbs (things which change functions) and compound verbs. The verbs can function on things which come before or after, more or less like real-language verbs. Loops are very much deprecated: noun and verb rank dictate what happens when you want to “apply a function to many things,” and you can modify what happens by using conjunctions. This sounds trivial, but it’s not. This means you can use the same code on things of radically different “shape.”

Learning: the best quick intro I’ve seen so far is the primer. Deeper (I’m only halfway through myself) is J for C Programmers. Reading and altering code didn’t work for me out of the gate, as there are no familiar landmarks to the syntax, which really does look like line noise. It’s pretty easy to put bits of it together, once you know the basics.
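
To make the noun/verb/adverb/rank vocabulary above a bit more concrete, here is a minimal sketch that should run as a script or line by line at the console. The names t and mean are made up for illustration; everything else is a J primitive. The results noted in the NB. comments are what these sentences evaluate to.

```j
t =: i. 2 3          NB. noun: a 2-by-3 table with rows 0 1 2 and 3 4 5

- 5                  NB. verb with a noun after it (monadic): _5
7 - 5                NB. same verb with nouns on both sides (dyadic): 2

+/ 1 2 3 4           NB. adverb / modifies the verb + into "sum": 10

mean =: +/ % #       NB. compound verb: sum divided by count
mean 1 2 3 4         NB. 2.5

+/ t                 NB. sums the items (rows) of the table: 3 5 7
+/"1 t               NB. rank conjunction " applies sum to each row: 3 12
mean"1 t             NB. same verb, no loop, row by row: 1 4
mean t               NB. unmodified, it averages the rows: 1.5 2.5 3.5
```

Note that mean was written once for a list and then reused on a table with no loop and no change to its definition; the rank conjunction is what decides which cells it is applied to.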

One thing which isn’t well documented in the learning process: foreigns and global settings. For example, 9!:3 ] 2 5 has found its way into my defaults, as it helpfully prints out a sort of graphical s-expression of verb expansions (you can do it in tree or parenthesis format as well, but this “boxed” format makes the most sense to me). “Foreigns” like this are useful, and it’s an appealing way of controlling things, much more so than using R’s options settings. Intuitive? No. Neither are R’s bewildering options, which are a continual source of misery. There are all manner of neat things accessible in this way. For example, to see what the pool allocator is doing, type 7!:3 '' and to see how much memory an object p uses, type 7!:5 <'p'. It’s not obvious at first, but it is documented and helpful.
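
The foreigns mentioned above can be tried in a short session like the following. This is a sketch under assumptions: the names mean and p are made up for illustration, and 9!:2, 7!:2 and 6!:2 are extra foreigns not mentioned in the post, included only to show the general pattern. Exact numbers will differ by machine and J version, so no outputs are shown.

```j
9!:3 ] 2 5            NB. display verbs in boxed (2) and linear (5) form
9!:2 ''               NB. query the display setting just made

mean =: +/ % #
mean                  NB. typing a verb's name now shows its boxed structure

p =: i. 1000
7!:5 <'p'             NB. bytes used by the noun named p
7!:3 ''               NB. state of the memory pool
7!:2 '+/ i. 1000'     NB. bytes needed to execute a sentence (assumed extra)
6!:2 '+/ i. 1000'     NB. seconds taken to execute a sentence (assumed extra)
```

The pattern is always a pair of numbers joined by !: and applied to an argument, which is why an empty string '' shows up wherever the foreign needs no real input.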

One thing I found perplexing: the special forms used to define verbs. Why is:

myverb=. 3 : 0   (stuff) equivalent to

myverb=. verb define (stuff)

I guess it saves some typing, and you do get used to it, but that’s just WEIRD. I’m assuming this is historical stuff, programming IBM 360 registers directly or something, the way cdr/car used to mean something physical on the computing machine. There are lots of tricks like that; it would be nice if they were all documented in one handy place, with the more conversational alternatives.
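
For reference, the two spellings can be compared side by side in a small script. This is a sketch assuming a J7-era session with the standard library loaded (which is where the names verb and define come from); double, triple and halve are made-up names for illustration.

```j
verb                  NB. a name from the standard library: the noun 3
define                NB. also a library name: the conjunction : with 0 on its right

NB. numeric form: 3 means "verb", 0 means "the body follows, ending at a lone )"
double =: 3 : 0
2 * y
)

NB. conversational form: exactly the same thing with names substituted
triple =: verb define
3 * y
)

NB. one-line form: the body is given as a string instead of 0
halve =: 3 : 'y % 2'

double 1 2 3          NB. 2 4 6
triple 1 2 3          NB. 3 6 9
halve 1 2 3           NB. 0.5 1 1.5
```

Seen this way, verb define is a readable alias for the same colon conjunction rather than a separate special form.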

Language feel: J is metal. It’s spare and powerful, and for an interpreted language, it feels very close to the hardware. It’s small enough to understand the intestines, and most of it seems to be very lean and speedy. Memory management is malloc/free,  reference counting for arrays and a pool for smaller objects; it works very well and thus far I haven’t been able to make it burp or run out of memory embarrassingly, even when working with data much larger than the memory on my laptop. I’m pretty good at running out of memory in R or Lisp; I’ll probably figure out a way to do so in J eventually, but so far, so good. Of course, J is excellent at dealing with large amounts of data on the disc without thinking about it too much; something most languages suck at. The FFI also looks dirt simple, and it seems to have decent facilities for calling foreign libraries.

Packages: there is a good set of packages. It isn’t anywhere near what CPAN or CRAN is, but it’s got some helpful tools in it, and they’re easily installed using the package installer in the IDE. One of note is the plot package, which produces output as nice as or nicer than what R does. Examples from the plot demo below. I’m told the plot package is useful enough that it is used as an adjunct to Q, which lacks such a thing. There are several other plotting packages with different capabilities, though I can’t see myself exploring them much.

Other useful stuff I’ve explored a bit: an excellent profiler (load 'jpm'), the aforementioned data/JDB columnar database, a decent date/time class in types/datetime, various other database interfaces, some primitive optimization routines in math/deoptim, math/fftw for Fourier transforms, lapack (not supported on 64 bit apparently), tools for talking to R in stats/r, some basic statistical distributions in stats/distrib, a lint system, and the excellent set of “labs” designed to help the user learn about J, generally while teaching some interesting piece of math.

Will I actually use this thing to solve useful problems? Hard to say at present; I’m having fun with it for now. The potential killer app which could keep me in J-land is the JDB database. I haven’t developed a test script for it yet to really put it through its paces, nor do I have a big machine capable of acting as a ticker plant, but early experiments are encouraging, and such things will eventually be explored more fully. It probably doesn’t offer any significant performance advantages over a homemade HDF5 type thing, and probably even has drawbacks on very large data. On the other hand, most of the hard work is done, and that counts for a lot. I probably won’t be doing things like reaching for J to write new kinds of Hidden Markov models (R has more doodads for that), but I might use it to code up a Kalman filter or two. Certainly, finding new ways of using it to talk to R will be mandatory (calling J from R, rather than the other way around, seems useful). If you’re interested in numerics or different programming paradigms, it is worth a look. There is a reason it has lasted as long as it has. It is really a shame things like Matlab ended up taking over this problem space; the APL family is a much more elegant solution to this sort of problem. Yes, J is kind of bonkers, but it’s a good kind of bonkers. Even if I never use it, J is a fascinating view into how a very smart group of folks solve hard problems.

Cool things to look at:

http://www.jsoftware.com/papers/elegant2.htm

J dudes like puzzles.

Lots of helpful articles at the British APL Association’s publication, Vector

All kinds of educational J/mathematics essays

12 Responses

  1. a. bonser said, on September 19, 2012 at 2:20 pm

    Wow ! One of the best articles I’ve read in a long time and how serendipitous. I’ve been trying to find a language like APL for the better part of 6 months and you’ve brought up J. Java has always felt like trying to walk through a bog and Matlab always cries “Buy more stuff !”. BTW, excellent turn of phrase “It’s small enough to understand the intestines”; I’m stealing that.

  2. joatmon said, on September 19, 2012 at 4:21 pm

    Hey Scott, it’s great you’re enjoying J, we love helping others in our community to grow and learn. Another channel you might be aware of but didn’t mention that you might find useful is the #jsoftware IRC channel on Freenode. There is also a subreddit at reddit.com/r/apljk that has useful links posted and in the sidebar.

    Welcome and have fun!

    • Scott Locklin said, on September 19, 2012 at 8:08 pm

      Thanks!

  3. Adi S said, on September 19, 2012 at 4:22 pm

    > Why is:
    > myverb=. 3 : 0 (stuff) equivalent to
    > myverb=. verb define (stuff)

    Typing ‘verb’ and ‘define’ into the jijx window will be instructive here – they’re just names for 3 and :0 respectively :)

  4. dcaisen said, on September 19, 2012 at 7:37 pm

    was also recently looking at kdb vs hdf5 vs ? for storing a tick db and found this very interesting- ty

    • Scott Locklin said, on September 19, 2012 at 9:08 pm

      Ticks, J can do. Order books; I’m not so sure, though I intend to find out.
      HDF5 can do all this as well. I guess the advantages of J are development time and the idea that somebody else made decent engineering choices for you.

  5. Petri said, on September 23, 2012 at 9:11 pm

    “A+” for Arthur:

    http://kx.com/executive-team.php

    • Scott Locklin said, on September 23, 2012 at 9:16 pm

      He’s the man.

  6. D M said, on October 16, 2012 at 9:58 pm

    Would be great to get a follow-up at some point on your J experiences. I’m a heavy R user for data analysis (but using C++ for more back-end stuff).

    • Scott Locklin said, on October 16, 2012 at 10:31 pm

      Eventually.

  7. vikram krishnan (@eipiplusoneiso) said, on February 18, 2013 at 1:29 pm

    Hey Scott, enjoy your posts a lot. Have you had a look at Go:

    http://gigaom.com/2012/09/13/will-go-be-the-new-go-to-programming-language/

    It apparently is the next C++ killer (haven’t we heard that before). Apparently, it has been designed for high performance (static typing etc)

    • Scott Locklin said, on February 18, 2013 at 7:42 pm

      I glanced at it when it came out. It didn’t have some very obvious thing I need (I don’t remember what it was; probably floating point math), and it appeared to be … yet another Java clone with no obvious reason for existing. Sure, it makes some design choices which appear to be better than the ones Java did, but that’s no compelling reason to switch.
      That article is wrong: Ruby/Python have REPLs. Go doesn’t. That means it will never inhabit the same place in the ecosystem. Go is really competing with Java, C++ or Scala. Also, it bothers me that a Google employee picked such an unsearchable name, even though they seem to have fixed it in their back end.
      It’s worth noticing that Google tried to hire me about 6 months ago to sling … Objective-C. Why they use Objective-C, or think I do, I don’t know, but if they don’t believe in their own language well enough to use it, why should I? FWIW, Objective-C is a very good language; it is a shame it didn’t win over C++. Whoever is using it at Google is smart.

