rando Q vs J thoughts
“APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums.” -Edsger Dijkstra
TLDR: J is a larger and more powerful programming language than Q in many ways (as a language, not as a tool). Q is a lot easier to use, especially if your problem set falls into the broad category of timeseries/trading problems. J is more complete in that a lot of ideas are baked into it; it is a huge language in comparison to Q, but Q solves specific problems. This is natural, as J evolved directly from APL: it’s evident from looking at the language features that it was originally meant to be a sort of Matlab meets Real Programming Language (Matlab is hot garbage as a language; great for cranking out numerics though). This is a very general set of ambitions compared to what Q does, which is solve timeseries problems for banks.
When I talk about various features in Q, I figure some of it applies to K as well. As K4 is more or less undocumented, it’s difficult to say. BTW if anyone has a K4 manual I’d love to see it; I have everything from K3, and it probably ain’t that different (I assume the main difference is the dictionary behavior), but I sure would appreciate it. My understanding of K3 and what I know of K4 is that it is a much more general and paradigmy idea than Q is, though it is still bent towards being an implementation language for Q/KDB+.
One of the most important differences between them is the concept of rank and arrays. Q is, rather awkwardly, not really array oriented; its arrays are tables, which are dictionaries of equal length vectors. These are laid out very differently in memory from J arrays. Most sane people work with 2-d arrays, and tables work just fine for this purpose; hence the success of Pandas adding the table concept to NumPy’s Fortran arrays. If you want to do something mathematical involving tensors, aka higher-rank arrays, J has that capability baked in. This is probably due to J’s origin as a compilation of ideas in numerics and mathematics. Until recently J had stuff like the ability to take the numeric derivative of a function as a core part of the language, for example. You can do this in Q as well, but you have to write code to do it (now it’s a J addon as well, which is just as well, as the C bits that accomplished this were terrifying). Since J is more of a matlabby thing, the ability to do linear algebra directly on the arrays makes sense from a numeric linear algebra point of view. Of course you can do the same thing in Q/K, but something weird might happen if you’re taking a matrix inverse and have the wrong type on a column.
I spoke with Art once on J-rank over a brewski, and while I was fond of J/APL rank at the time and defended the idea (I had probably just done something clever -now I feel like a toddler who pooped his pants and showed it to daddy), I think in hindsight he’s absolutely right to dispense with the idea and make everything a table. Lawler said the same things, of course. A few mentats occasionally make arrays with rank > 2, and the applied math guy in me who knows about tensors and quaternions or whatever finds the flexibility reassuring, but IRL you don’t need those to solve the kinds of problems K and Q solve, and removing them saves a lot of time and mental overhead. In fact, very few people will ever in their lives use rank>2 arrays. I’ve done it, but it was mostly because I could. I find myself often worrying about rank in J. K removes this worry and does what most people expect, which more or less is to process “arrays” by column. If you’re one of the 1/1000 who want tables that act like rank>2 J arrays, you can write code which takes care of this, and that’s how it should be: for almost all data science and even engineering use cases (object rotation in space? I dunno), high-rank arrays are the corner case and K-style tables are the common case.
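To make the rank-vs-table contrast concrete, here’s a sketch in NumPy terms (the names are mine and purely illustrative):

```python
import numpy as np

# a rank-3 array: 2 "pages" of 3x4 matrices, the kind of thing J handles natively
a = np.arange(24).reshape(2, 3, 4)
pages_summed = a.sum(axis=0)          # reduce across pages; result has shape (3, 4)

# a Q-style "table": a dictionary of equal length columns, processed column-wise
table = {"price": np.array([10.0, 11.0, 12.5]),
         "size":  np.array([100, 250, 75])}
means = {k: v.mean() for k, v in table.items()}
```

Most day-to-day data work only ever needs the second shape of thing.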
Another difference is the profusion of types in Q. This is pretty handy; and while people might disagree with the dozen-odd different kinds of time/date types, when you’re dealing with times and dates, having these built in and doing something natural and useful with all the other verbs (since they’re various kinds of ints and floats under the covers) is pretty useful. You can also save lots of memory using various kinds of shorts, which you’d have to program yourself in J (probably in C if you want all the core verbs to do the right things). J has more functional abstraction than Q: you can do some wild stuff I don’t know how to do in Q, though you could probably build most of the functional abstraction using Q primitives. I think you’d have a harder time making a more exotic type system in J (I never tried this). You can always quote-escape a number type and run it in a namespace, but that’s not as clean as how it works in Q.

q.k -kind of interesting for lookin’ at
Namespaces: J’s locales were a revelation to me; at long last, scoping and namespaces done right. Q also has nice namespaces, similar in design I think, but handier. You can type a Q namespace at the cli and it will tell you what’s in it. In J you have to do a couple of steps to figure this out. As ridiculous as it sounds, typing _’s to get the locative in a J locale isn’t as easy as typing ‘.’ either. So, Q is handier. I’m not sure how the execution stack in Q namespaces works; probably the same. Also when Q pushes you into a different namespace, you can tell from the CLI, where with J you have to ask if you’ve been dumped into a namespace that isn’t the base locale or ‘z.’ Again, handier. You could probably jury-rig jqt to do the same thing; not sure about j-emacs. I think J locales might be more flexible and powerful (using locatives, etc), but the stuff they tell you about Q namespaces is simple enough that you probably won’t get into trouble. I think Q is a sort of inside out namespace in K, and I’m pretty sure there is a way to do that in J also for other superset languages (the jd database does a parsable quote escaped sublanguage), but I’ve never seen it done in the same way that Q is implemented.
Point free coding: You can do this in J, or you can do it more the Q way, where you define explicit input variables and manipulate them. Realistically, most of the time I want to do without point-free coding, excepting perhaps to eke out a bit more performance, in which case I can do it the Q way of defining explicit x’s and y’s and then use 13 : to make it point free, yet still readable to normies. More complicated expressions will unfortunately sometimes incur a performance hit, but as J is quite fast anyway it probably doesn’t matter. I think Q’s design decision to do without point-free coding is right, as it’s more readable without needing the dissect utility. The other thing J has which Q/K doesn’t: hooks and forks. Now I can deal with hooks and forks, especially knowing about [: @ and ~ (if I were better at long trains I feel like I wouldn’t need ~, but I do). Most of the time I can deal with short trains without resorting to () and other such ugliness, but consider, for example, this histogram maker:
histogram =: <: @ (#/.~) @ (i. @#@[ , I.)
This is the kind of thing I would write (I didn’t), as it uses lots of parens, @’s and tildes. Q is more direct; it basically acts like there are J-@’s everywhere, except you don’t have to write the bloody things (the @[ is different):
histogram:{@[(1+max x)#0;x;+;1]}
I dunno, maybe most people find these equivalently obtuse (the [;;;] stuff ends up doing a lot of interesting things that are not obvious, any more than [, is in J), but the Q version is a lot more straightforward to me; and this sort of parenthesis-@ soup makes me wish hooks and forks were a sublanguage in J rather than the normal way to do things. Just to give you an idea of what I mean: in J, you literally have to count verbs to figure out how the composite train works; then you have to look to the left and do it all over again at least once, because hooks and forks behave differently in dyadic and monadic use. If you write like the verb above, it’s not so bad, but I’ve seen insane stemwinders with no @’s and ()’s which, while I am sure their authors were proud of these wacky sentences, seem designed to troll the mere mortal who has to read the code. Granted, we now have the dissect toolkit to figure out what’s going on with these snarls, but, well, you need something called “dissect” to read the code.
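For comparison, the same scatter-add trick reads naturally in NumPy (a sketch; the function name is mine):

```python
import numpy as np

def histogram(x):
    # mirror of the Q amend @[(1+max x)#0; x; +; 1]:
    # start from (1 + max x) zeros, then bump the count at each index in x
    counts = np.zeros(1 + x.max(), dtype=int)
    np.add.at(counts, x, 1)   # unbuffered scatter-add, so repeated indices accumulate
    return counts

histogram(np.array([0, 1, 1, 3]))   # → array([1, 2, 0, 1])
```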
I think I know why the J guys designed it this way; this sort of functional form can allow for certain optimizations in both compiled and interpreted code: point-free code is more or less an AST already. K/Q almost certainly has something like this going on under the covers in its parser (Kerf did). Anyway, the Q stuff is both denser and easier to read for this kind of thing, and is less likely to wear out my 2 key typing @’s. I am definitely of the Art camp on this one; everything should be atop; hooks and forks should be some kind of sublanguage for mentats. I can do it, I just mostly wish I didn’t have to. I might change my mind one day: sometimes they just write themselves, but I think it’s an accident when this happens. I’m not smart enough to know how to write it without quote escapes, but it would be real neat to have a sort of J sublanguage implemented in J (in the same sense Q is implemented in K) which has these qualities.
Documentation: J’s NuVoc is quite excellent at this point, especially taken with the historical vocabulary (which, like the JfC book, might eventually get stale with the progress of the language). On the other hand, the online help is basically pointing you to NuVoc, which sucks (just include it with the J distrib for those bad network days, or people who don’t want to use the mouse -same as the rest of the documentation). Q has adequate (though not as good as NuVoc) web documentation on the Kx website, no documentation for K, and a number of canonical and decent textbooks. Q has various online help facilities you can add to larger projects; these are real lifesavers. Frankly, just being able to print what’s in a namespace by typing the namespace is amazing. J has a couple of books, some printed, some which come with the language. One thing which Q has is Nick Psaris’ books, which actually tell you how to organize code and how to use it to perform standard tasks: I’m not sure how people did things before his book existed, probably by working at First Derivatives or AquaQ. Jeff Borror’s book tells you how the language works, really in a very clear way, but I didn’t get as much sense of how to write something useful. There are no advanced books on how to use J; they are all unfortunately beginners books. This is a shame, as I have written fairly large projects in J, and I have discovered patterns for how to get large projects done, but I’m sure my way is informed by my use of other languages, and there are definitely more “J native” ways. I can see them implemented in jd, jod and other large J projects. Perhaps all of these sorts of idioms get discussed and debated somewhere, but I haven’t come across it. I suppose I am complaining about something that only exists for a few big languages (Scott Meyers’ books for his era of C++), and the comparison with Q isn’t fair, as J is less focused on the specific task of managing trade and quote data, but it seems like an opportunity.
Interprocess communication: the idea of having a Q process talk to another one (or lots of Q processes talking to each other) is pretty much built into the thing, and afaik it has been built into K from version 1. It’s part of the idiom, it’s how the thing generally gets used. I’ve done IPC in J, but whoo, man, it ain’t easy, and it’s obviously not designed for it. There’s probably a way to build out this functionality using existing facilities, but I bet in the process of doing so (see what I did there), some defects and weaknesses would be found. I haven’t used jd in anger since 2015 or so, but it’s possible jd has built this functionality out a bit. I know J has a lot of improvements in internal threading and SIMD either coming soon or in the latest release, which is also quite nice, but most of the time in Q-land you’re using a lot of Q processes (Q has always used some SIMD kind of thing to make the vectors sing).
Random thought:
/\.
is how we make something like exponential smoothing in J. Until recently this was quadratic. That’s not nice, and it shouldn’t ever have been that way (there is a math reason why it was: each adverb being linear seems fine, but you really should have special code to handle the combined case). Q doesn’t do this. The J team fixed this relatively recently, but it’s the type of odd thing you’re likely to run into in J and not in Q, because those warty bits were burned off a long time ago building finance doodads.
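The fix amounts to carrying the running value forward instead of re-reducing every prefix. A sketch of the two complexity classes in Python (not the J internals, just the idea):

```python
def ewma_quadratic(xs, a):
    # re-reduce every prefix from scratch, O(n^2) --
    # roughly what the old f/\ behavior amounted to
    out = []
    for i in range(len(xs)):
        s = xs[0]
        for x in xs[1:i + 1]:
            s = a * x + (1 - a) * s
        out.append(s)
    return out

def ewma_linear(xs, a):
    # carry the running smoothed value forward, O(n)
    out, s = [], xs[0]
    for i, x in enumerate(xs):
        s = x if i == 0 else a * x + (1 - a) * s
        out.append(s)
    return out
```

Both produce the same sequence of smoothed prefixes; only the work done differs.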
I guess the way K/Q looks, it’s a sort of well worn pattern for solving data science and trading problems in the financial services community. Art sawed off all the pieces that weren’t necessary or got in the way and you’re left with something that is simple and powerful and solves the problem.
Notation as a tool for thought: Wavelets in J
I’ve recently had a need for Wavelets in my work in J. There is no wavelet toolbox in J, so, I wrote my own (to be released to pacman and github, eventually, along with a version of this blog as a “lab”). This exercise was extremely valuable in two ways. One, it improved my J-ing: I am, and will probably always remain, relatively weak of the J sauce, though this exercise leveled up my game somewhat. Two, it vastly improved my understanding of the discrete wavelet transform algorithm. Arthur Whitney said something to the effect that you can really understand a piece of code when it all fits on one page. Well, it turns out that’s very true.
Wavelets are filters recursively applied to a timeseries. In the DWT, you’re effectively recursively subsampling and filtering a time series with a low pass filter and a high pass filter. The resulting elements are at different levels, with a remaining low pass filtered time series at the end. The operations at each level are the same. This is the well-known “pyramid algorithm” of Stephane Mallat.
In C, the way you would do the individual levels is by allocating a vector with half the number of elements, then filling up each element with the correct subsampled filtered values. In J, the way you do this is by projecting the elements and the filters in the correct way, then multiplying them out and getting the answer. J allocates more memory to do this, but it’s something that could be mitigated if it becomes a problem. As is often the case with J, the runtime of the J version, without spending a moment to optimize for space or time, is quite comparable to that of the C version; for the wavelet transform, within about a factor of 2 of the C results. Here’s what a wavelet level calculation looks like in C. I took it from the excellent waveslim package in R, FWIW:
void dwt(double *Vin, int M, int L, double *h, double *g,
         double *Wout, double *Vout)
{
  int n, t, u;
  for (t = 0; t < M / 2; t++) {
    u = 2 * t + 1;
    Wout[t] = h[0] * Vin[u];
    Vout[t] = g[0] * Vin[u];
    for (n = 1; n < L; n++) {
      u -= 1;
      if (u < 0) u = M - 1;
      Wout[t] += h[n] * Vin[u];
      Vout[t] += g[n] * Vin[u];
    }
  }
}
J doesn’t have functions; it has verbs. “dyad define” is for defining verbs which operate on two ‘nouns.’ The first one comes before the verb, the second one comes after the verb. They have implied names x (left noun) and y (right noun) inside the verb. “dyad define” is almost never used by J hackers, as it evaluates to the string ‘4 : 0’ (don’t ask me; you get used to it). Similarly, the monad define in the wdict verb evaluates to ‘3 : 0’, and monadic verbs only have a y (right) noun input. The other thing to remember about J: it evaluates from right to left, like Hebrew. The following verb evaluates the compiled C function above, using J’s very cool FFI.
LDWT=: '/path/to/dwt.so' NB. where your compiled C lives
dwtc=: dyad define
'hpf lpf'=.wdict x
Newy=.((#y)%2)#(2.2-2.2)
Wout=.((#y)%2)#(2.2-2.2)
cmd=. LDWT, ' dwt n *d i i *d *d *d *d'
cmd cd (y+2.2-2.2);(<.(#y));(#hpf);hpf;lpf;Wout;Newy
Wout;Newy
)
For purposes of this exercise, wdict is a simple dictionary that returns the boxed high pass and low pass filter. Boxing is literally the same thing as a struct in C, though in J the use of structs ends up being quite different. To save typing, I know I can derive the HPF from the LPF for a given wavelet type, which I accomplish with the HpLp verb. To get the High Pass filter from the Low Pass, you multiply the odd elements by -1, then reverse their order. So, in J, remembering J executes from right to left: a monadic # with a noun to the right is the tally operation; it returns the number of elements. i. generates an integer index, so i.# is a constructor of a sequence on the number of elements in y. [: is the cap, which makes the i.# group of verbs into a sequential operation, preventing it from being evaluated as a fork. _1 is -1, * is dyadic times as it is in most programming languages, and |. is the “reverse” verb. ;~ does magic which flips the result around and sticks a box between it and the original low pass value, which is represented by ]
NB. get the high pass from the low pass, return a box list of both
HpLp =: ] ;~ |. * _1 ^ [: i. #
wdict=: monad define
select. y
case. 'haar' do.
HpLp 2 # %: % ] 2
case. 'db4' do.
HpLp 0.482962913144534, 0.836516303737808, 0.224143868042013, _0.12940952255126
end.
)
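The same quadrature-mirror relation is a one-liner in most languages; here’s a Python sketch of what HpLp computes (the function name is mine, and sign conventions vary between wavelet texts):

```python
import math

def hp_from_lp(lpf):
    # reverse the low pass taps, then flip the sign of every other one:
    # hpf[i] = (-1)^i * lpf[n-1-i], matching the train |. * _1 ^ [: i. #
    return [((-1) ** i) * c for i, c in enumerate(lpf[::-1])]

haar_lpf = [1 / math.sqrt(2)] * 2
haar_hpf = hp_from_lp(haar_lpf)   # → [0.7071..., -0.7071...]
```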
If you compile the C piece and run it with the J FFI on a 2^n long vector, you’ll get the right answer. But the J implementation is cooler, and runs within a factor of 4 of the C.
oddx=: ] {~ ([: (] #~ 0 1 $~ #) [: i. [: # ]) -/ [: i. [: # [
dwt=: dyad define
'hpf lpf'=.wdict x
yvals=. hpf oddx y
(yvals +/ . * hpf);yvals +/ . * lpf
)
Since we used “dyad define,” we know that it takes left and right noun arguments (x and y in the verb). The way this one is used looks like this:
'db4'dwt myts
Same calling convention and comments apply to the C version. The real action of the dwt verb happens with oddx. oddx is a tacit verb. I wrote it this way for conciseness and speed. It’s not a very good tacit verb, because I am not a very good slinger of J sentences, but it does what I want it to. Unless someone smarter comes along and gives me a better idea, that’s what we’re stuck with. I am not going to give you a breakdown of how it works inside, but I will show you what it does, and how it helps encapsulate a lot of information in a very small place. oddx is a dyad; it takes x and y values. In tacit programming, the x’s and y’s are represented by [ and ] respectively, with [: as described above, as a sort of spacer between chunks. I will resist explaining all the pieces; feel free to examine it in J 8’s excellent sentence dissector. I have already explained the i. verb. The values 0:15 are created by i.16. So, a useful way to examine what this verb does is to run it on something like i.16.
(1,2)oddx i.16
1 0
3 2
5 4
7 6
Huh. So, it’s taking the y input, which is 0:15, and making two columns, which seem to be the odd and even elements from the front of the list… You might guess that it is taking the size of the x noun and operating on the y input. You would guess correctly. Let’s try it on something bigger:
(1,2,3,4)oddx i.16
1 0 15 14
3 2 1 0
5 4 3 2
7 6 5 4
9 8 7 6
11 10 9 8
13 12 11 10
15 14 13 12
Aha, now it is starting to make sense. Let’s try something bigger still;
(i.6)oddx i.16
1 0 15 14 13 12
3 2 1 0 15 14
5 4 3 2 1 0
7 6 5 4 3 2
9 8 7 6 5 4
11 10 9 8 7 6
13 12 11 10 9 8
15 14 13 12 11 10
Now you can see what it does. The columns alternate between odd and even values of y. The rows are the rotated versions of the first two. The more values you add, the deeper the rotation becomes. Why did I do this? Well, a wavelet is these very index elements, multiplied by the high pass and low pass filters. If you look at the last two lines of the dwt verb, you can see it:
yvals=. hpf oddx y
(yvals +/ . * hpf);yvals +/ . * lpf
yvals contains the rotated elements. The next line contains two pieces. The one on the right is the rotated yvalues array multiplied (+/ . * is matrix multiplication) by the low pass filter. The invocation of oddx happens with hpf, but it could take lpf; it is just using the size of the filter to construct the rotated matrix for multiplication. This is the wavelet-decimated y, which you need to reconstruct the time series. The other piece is the rotated yvalues multiplied by the high pass filter. This is the level 1 wavelet. They are separated by a ‘;’ which puts them in “boxed” form (remember: typeless structs), so the decimated yvalues and the level 1 wavelet are kept separate. There is probably a way of making this a one liner using forks and such, but at my present state of enlightenment, it’s not obvious, and this suits me just fine for clarity and runtime’s sake. Now, almost nobody cares about the level 1 wavelets; usually you want a couple of levels of wavelets, so you need a way to recursively apply this guy to the results. I used the power conjunction. Raul Miller fixed up my messy verb into a very clear adverb on the J-programming list.
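As an aside, the single-level construction is easy to mimic in NumPy (names are mine; the index arithmetic matches the u walk in the C loop, and the index matrix is the one oddx builds):

```python
import numpy as np

def oddx_idx(L, M):
    # row t holds the indices u = 2t+1, 2t, 2t-1, ... (mod M),
    # one column per filter tap
    t = np.arange(M // 2)[:, None]
    n = np.arange(L)[None, :]
    return (2 * t + 1 - n) % M

def dwt_level(lpf, hpf, y):
    Y = y[oddx_idx(len(lpf), len(y))]   # the rotated sample matrix
    return Y @ hpf, Y @ lpf             # (level 1 wavelet, decimated series)
```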
dwtL=: adverb define
:
'wn yn'=. m dwt y
wn; (<:x) (m dwtL)^:(x>1) yn
)
You’d call it like this
2 'db4' dwtL y
Adverbs return verbs based on modifiers. In this case the adverb returns a dyadic verb modified by m, which here is the kind of wavelet (‘db4’ and ‘haar’ are the only ones defined in the wdict monad above). dwtL is a dyad: x is the number of levels, y the time series. You can see dwt being used to calculate the decimated yn and the wn wavelet. The tricky piece is the last line. This is the power conjunction; the ^:(x>1) piece. Dyadic ‘>’ does what it does in regular programming languages (in this case anyway). It checks if x is > 1, and the ^: power conjunction calls dwtL recursively as long as this is true. More or less, this is a while loop. At the front of the dwtL call, you can see wn being prepended to the output with a ; to box it. You can also see a (<:x). This is a decrement on the level counter. So, the verb is called until the level is at 1. The wavelet levels are prepended, and you end up with a boxed set of all the wavelets from 1:Level, with the decimated y stuck at the end. Once you grasp how the power conjunction works, this is a really obvious and sweet implementation of Mallat’s “pyramid algorithm” for calculating the levels of wavelets.
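The recursion is the same in any language; here’s a self-contained Python sketch of the pyramid using the haar pair (illustrative, not a translation of the J code):

```python
import numpy as np

def haar_level(y):
    # one pyramid step with the haar pair: differences become the wavelet
    # coefficients, sums become the decimated series (both scaled by 1/sqrt 2)
    a = 2 ** -0.5
    even, odd = y[0::2], y[1::2]
    return a * (odd - even), a * (odd + even)

def dwt_levels(x, y):
    # peel off one level, then recurse on the decimated series while the
    # level counter x stays above 1 -- the role of ^:(x>1) in the J adverb
    w, v = haar_level(y)
    return [w, v] if x <= 1 else [w] + dwt_levels(x - 1, v)
```

Like the J adverb, it returns the wavelet levels in order with the decimated series stuck at the end.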
Taking the inverse wavelet transform is conceptually more difficult, since you’re stitching together the values of y with the different levels of wavelet. There is more than one way to skin this cat, but the way I was thinking when I wrote these verbs was “how do I go from the smaller/subsampled values to the next level up in y in the reverse pyramid algorithm?”
First, I needed to rearrange my mother wavelets to do the inverse. I needed a verb like this:
filtrot =: [: |: [: |. (2 ,~ 2 %~ [: # [) $ [: |. ]
You can see what it does if you apply it to i.4 or i.8
filtrot i.4
1 3
0 2
filtrot i.8
1 3 5 7
0 2 4 6
Next, I need a verb which rearranges the values of the input such that it is aware of the shape of the rotated mother wavelet pattern. This was hairy, but I did it with the following verb, which takes the rotated wavelet as the x value, and the yval or wavelet level as the y value:
drot =: ] {~ ([: i. [: # ]) +/ 0 , (_1 * [: # ]) + [: }. [: i. [: # [: {. [
You can see how it works like this
(filtrot i.4) drot i.8
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 0
(filtrot i.6) drot i.8
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 0
7 0 1
As you can see, the pattern is to rotate the original values in the opposite direction as you did with the wavelet transform.
Finally, I need to double these guys up, and element-wise multiply them to the correct number of rotated wavelets. For the double up part, ‘2 #’ is the correct verb. Notice it is # called dyadically. Dyadic overloadings of monadic verbs is confusing at first, but you get used to it.
2 # 2 # (filtrot i.4) drot i.8
0 1
0 1
1 2
1 2
2 3
2 3
3 4
3 4
4 5
4 5
5 6
5 6
6 7
6 7
7 0
7 0
Putting it all together with the element-wise multiplication, we use the verb “reducer.” I’m getting tired of explaining what all the pieces do, so I’ll write it in explicit form (13 : gives the explicit-to-tacit translation) to show all the implied x’s and y’s, and point you to the documents for $/shape:
13 : ' (2 # (x drot y)) * ((2*#y)$x)'
reducer=: (2 # drot) * [ $~ 2 * [: # ]
Finally combining into the following:
idwt=: dyad define
'wv yl' =. y
'hpf lpf'=. filtrot each wdict x
+/"1 (lpf reducer yl) + hpf reducer wv
)
So, the inverse transform runs the reducer on the wavelet, the decimated y, and then adds them together.
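To see the whole round trip in one place, here’s a self-contained Python sketch of a forward and inverse level (names are mine; the inverse is just the transpose of the forward operation, which reconstructs exactly because the filter pair is orthogonal):

```python
import numpy as np

def oddx_idx(L, M):
    # the forward transform's index walk: u = 2t+1, 2t, 2t-1, ... (mod M)
    return (2 * np.arange(M // 2)[:, None] + 1 - np.arange(L)[None, :]) % M

def dwt_level(lpf, hpf, y):
    Y = y[oddx_idx(len(lpf), len(y))]
    return Y @ hpf, Y @ lpf

def idwt_level(lpf, hpf, W, V):
    # scatter each coefficient's contribution back onto the sample
    # positions it was computed from
    idx = oddx_idx(len(lpf), 2 * len(W))
    y = np.zeros(2 * len(W))
    np.add.at(y, idx, W[:, None] * hpf + V[:, None] * lpf)
    return y

a = 2 ** -0.5
lpf, hpf = np.array([a, a]), np.array([a, -a])
y = np.arange(1.0, 9.0)
back = idwt_level(lpf, hpf, *dwt_level(lpf, hpf, y))   # recovers y
```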
The real sweet part is running the inverse on all the levels. When I first encountered the / adverb, I thought it was sort of like “apply.” This is wrong; what it does exactly is stick a copy of the verb which comes before it in between all the values of what comes after it. Since you’ve stacked all these wavelets together in a convenient way, you can use the / adverb to stick the inverse transform operator between all the wavelets. You need a few extra pieces to make it evaluate properly, but since J evaluates from right to left, and the decimated values are on the far left; this is a very satisfying inverse.
idwtL=: dyad define
idw =. [: < [: x&idwt ,
> idw/y
)
I’m sure dwtL and idwtL could be compressed into a one liner, and it could be read by someone familiar enough with tacit J programming. But this is a good level of abstraction for me. It’s remarkable that the performance is as good as it is, as I basically just wrote down the first verbs which occurred to me (the inverse is fairly inefficient: OCaml speed instead of C speed). I’ve used wavelets for years; they’re very handy things for signal processing, and I have even written code to calculate them before. Writing them down in J really brought my understanding of what they are to a new level. In an ideal world, people would write algorithms in this compressed form, so that mathematicians, engineers and computer scientists could communicate ideas more clearly and concisely. Alas, Iverson’s dream of notation as a tool of thought hasn’t yet caught on. While something like J is confusing to people who have been programming for years, there is nothing inherently difficult about it. Various educators have been teaching J to kids in grammar school. It’s actually very simple compared to something like C, and since it has many similarities to ordinary written language, for kids it might even be simpler. Maybe some day.
Shannon information, compression and psychic powers
Fun thing I found in the J-list, a game of code golf. Generate the following pattern with the least number of bytes of code:
_____*_____
_____*_____
____*_*____
___*___*___
__*_____*__
**_______**
__*_____*__
___*___*___
____*_*____
_____*_____
_____*_____
The J solution by Marshall Lochbaum is the best one I’ve seen, taking only 18 bytes of code.
'_*'{~4=+/~ 4<.|i:5
I’ll break this down into components for the folks who don’t speak J, so you can see what’s going on:
i:5 NB. -5 to 5;
_5 _4 _3 _2 _1 0 1 2 3 4 5
|i:5 NB. make the list positive
5 4 3 2 1 0 1 2 3 4 5
4<.|i:5 NB. cap it at 4
4 4 3 2 1 0 1 2 3 4 4
+/~ 4<.|i:5   NB. ~ is magic which flips the result around and feeds it to the
              NB. verb immediately to the left; the left hand +/ verb is the
              NB. + operator applied / across the list
8 8 7 6 5 4 5 6 7 8 8
8 8 7 6 5 4 5 6 7 8 8
7 7 6 5 4 3 4 5 6 7 7
6 6 5 4 3 2 3 4 5 6 6
5 5 4 3 2 1 2 3 4 5 5
4 4 3 2 1 0 1 2 3 4 4
5 5 4 3 2 1 2 3 4 5 5
6 6 5 4 3 2 3 4 5 6 6
7 7 6 5 4 3 4 5 6 7 7
8 8 7 6 5 4 5 6 7 8 8
8 8 7 6 5 4 5 6 7 8 8
4=+/~ 4<.|i:5 NB. Which of these is equal to 4?
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0 0
0 0 0 1 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 1 0 0
1 1 0 0 0 0 0 0 0 1 1
0 0 1 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0 0
0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
'_*'{~4=+/~ 4<.|i:5   NB. { is an array selector; ~ pops the previous result of
                      NB. 1's and 0's in front of the string '_*', which creates
                      NB. the required array of characters, and voila!
_____*_____
_____*_____
____*_*____
___*___*___
__*_____*__
**_______**
__*_____*__
___*___*___
____*_*____
_____*_____
_____*_____
I asked some facebook pals for a solution, and got some neat ones:
Python from Bram in 94 bytes:
for i in range(-5,6):print ''.join(['*_'[min(abs(i),4)+min(abs(j),4)!=4] for j in range(-5,6)])
A clever 100 byte Perl solution from Mischa (to be run with -Mbigint):
for$i(0..10){print($_%12==11?"\n":1<<$i*11+$_&41558682057254037406452518895026208?'*':'_')for 0..11}
Some more efficient solutions by Mischa in perl and assembler.
Do feel free to add clever solutions in the comments! I’d like to see a recursive solution in an ML variant, and I’m pretty sure this can be done in only a few C characters.
It’s interesting in that the display of characters is only 132 bytes (counting carriage returns, 121 without), yet it is hard in most languages to do this in under 100 bytes. The winner of the contest used a 73 byte shell script with gzipped contents attached. Pretty good idea and a praiseworthy hack. Of course, gzip assumes any old kind of byte might need to be packed, so it’s not the most efficient packing mechanism for this kind of data.
This gets me to an interesting point. What is the actual information content of the pattern? Claude Shannon told us this in 1948. The information content of a character in a string is approximately -log2 p, where p is the probability of that character; averaged over the alphabet this gives the Shannon entropy H = -Σ p log2 p bits per character.
If you run this string through a gizmo for calculating Shannon entropy (I have one in J, but use the R infotheo package), you’ll find out that it is completely specified by 3 bits sent 11 times, for a total of 33 bits or 4.125 bytes. This sounds completely insane, but look at this coding for the star pattern:
000 _____*_____
000 _____*_____
001 ____*_*____
010 ___*___*___
011 __*_____*__
100 **_______**
011 __*_____*__
010 ___*___*___
001 ____*_*____
000 _____*_____
000 _____*_____
So, if you were to send this sucker on the wire, an optimal encoding gives 33 bits total. Kind of neat that the J solution is only 144 bits. It is tantalizing that there are only 5 distinct messages here, leaving room for 3 more messages. The Shannon information calculation notices this; if you do the calculation, it’s only 2.2 bits. But alas, you need some kind of wacky pants quantum computer that hasn’t been invented yet to send fractions of bits, so you’re stuck with a 3 bit code.
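The arithmetic is easy to check (a sketch; the row codes are read off the table above):

```python
import math
from collections import Counter

# the eleven 3-bit row codes from the star pattern
rows = ["000", "000", "001", "010", "011", "100",
        "011", "010", "001", "000", "000"]
n = len(rows)
H = -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())
# H is about 2.19 bits per row: five distinct messages don't quite
# fill out the 3-bit code
```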
The amount of data that can be encoded in a bit is generally underestimated. Consider the 20 question game. Go click on the link; it’s a computer program that can “read your mind.” It’s an extremely clever realization of Shannon information for a practical purpose. The thing can guess what you’re thinking about by asking you 20 questions. You only give it 20 bits of information, and it knows what you’re thinking about. The reason this is possible is that 20 bits is actually a huge amount of information. If you divide the universe of things a person can think of into 2^20 pieces, each piece is small enough that the nearest guess is probably close enough to seem like the machine can read your mind. This is also how “mentalism” and psychic powers work. Psychics and Mentalists get their bits by asking questions, and either processing the answers, or noticing the audience reaction when they say something (effectively getting a bit when they’re “warm”). Psychics and TV mentalists also only have to deal with the limited number of things people worry about when they talk to psychics and TV mentalists.
Lots of machine learning algorithms work like optimal codes, mentalists and the 20 questions game. Decision trees, for example. Split the universe of learned things into a tree, which is effectively 1’s and 0’s (and of course, the split criterion at each node).
It’s little appreciated that compression algorithms are effectively a form of decision tree. In fact, they’re pretty good as decision trees, in particular for “online” learning, and sequence prediction. Look at a recent sequence, see if it’s like any past sequences. If it is a novel sequence, stick it in the right place in the compression tree. If it occurred before, you know what the probability of the next character in the string is by traversing the tree. Using such gizmos, you can generate artificial sequences of characters based on what is in the tree. If you encode “characters” as “words,” you can generate fake texts based on other texts. If you encode “characters” as “musical notes” you can generate fake Mozart based on Mozart. There are libraries out there which do this, but most folks don’t seem to know about them. I’ve noodled around with Ron Begleiter’s VMM code which does this (you can find some primitive R hooks I wrote here) to get a feel for it. There’s another R package called VLMC which is broadly similar, and another relative in the PST package.
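A minimal sketch of the idea, with the caveat that this uses a fixed context length rather than the variable-order trees the VMM library actually implements: store counts of what followed each recent context, then read next-symbol probabilities back out.

```python
from collections import defaultdict, Counter

def build_model(text, order=2):
    """Count, for each length-`order` context, what character came next."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model

def next_probs(model, context):
    """Probability of each next character, given a recent context."""
    counts = model[context]
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

model = build_model("abracadabra")
print(next_probs(model, "ab"))   # {'r': 1.0} -- "r" always followed "ab"
print(next_probs(model, "ra"))   # {'c': 1.0}
```

Swap characters for words or notes and sample from these distributions, and you have the fake-text and fake-Mozart generators described above.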
One of the interesting properties of compression learning is that if you have a good compression algorithm, you have a good forecasting algorithm. Compression is a universal predictor. You don’t need to do statistical tests to pick certain models; compression should always work, assuming the string isn’t pure noise. One of the main practical problems is picking a discretization (discretizations can themselves be seen or used as an ML algorithm). If the data is already discrete, a compression algorithm which compresses well on that data will also forecast well on it. Taking this a step further into crazytown, one can almost always think of a predictive algorithm as a form of compression. Consider something like EWMA: you’re reducing a potentially very large historical time series to one or two numbers.
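The EWMA point is easy to see in code: however long the series, the “compressed” representation is one running number plus the decay constant. A sketch under the usual recursive definition:

```python
def ewma(series, alpha=0.3):
    """Exponentially weighted moving average: the entire history is
    compressed into a single running state."""
    state = series[0]
    for x in series[1:]:
        state = alpha * x + (1 - alpha) * state
    return state

# A thousand observations, forecast by one number.
history = [float(i % 7) for i in range(1000)]
print(ewma(history))
```

That single `state` value is a lossy compression of the whole history, and it’s also your forecast for the next observation.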
While the science isn’t well established yet, and it certainly isn’t widely known (though the gizmo which guesses what google search you’re going to do is a prefix tree compression learner), it looks like one could use compression algorithms to do timeseries forecasting in the general case, hypothesis testing, decryption, classification, regression and all manner of interesting things involving data. I consider this idea to be one of the most exciting areas of machine learning research, and compression learners to be one of the most interesting ways of thinking about data. If I free up some time, I hope to write more on the subject, perhaps using Ron’s code for demonstrations.
Fun people to google on this subject:
Boris Ryabko & Son (featured in the ants thing)
The ubiquitous Thomas Cover
Ruins of forgotten empires: APL languages
One of the problems with modern computer technology: programmers don’t learn from the great masters. There is such a thing as a Beethoven or Mozart of software design. Modern programmers seem more familiar with Lady Gaga. It’s not just a matter of taste and an appreciation for genius. It’s a matter of forgetting important things.

talk to the hand that made APL
There is a reason I use “old” languages like J or Lush. It’s not a retro affectation; I save that for my suits. These languages are designed better than modern ones. There is some survivor bias here; nobody slings PL/I or COBOL willingly, but modern language and package designers don’t seem to learn much from the masters. Modern code monkeys don’t even recognize mastery; mastery is measured in dollars or number of users, which is a poor substitute for distinguishing between what is good and what is dumb. Lady Gaga made more money than Beethoven, but, like, so what?
Comparing, say, Kx Systems’ Q/KDB (80s technology which still sells for upwards of $100k a CPU, and is worth every penny) to Hive or Redis is an exercise in high comedy. Q does what Hive does. It does what Redis does. It does both, plus several other impressive things modern “big data” types haven’t thought of yet, and it does them better, using only a few pages of tight C code, and a few more pages of tight K code.

This man’s software is superior to yours
APL languages were developed a long time ago, when memory was tiny compared to the modern day, and disks much slower. They use memory wisely. Arrays are the basic data type, and most APL language primitives are designed to deal with arrays. Unlike the situation in many languages, APL arrays are just a tiny header specifying their rank and shape, and a big pool of memory. Figuring out what to do with the array happens when the verb/function reads the first couple of bytes of the header. No mess, no fuss, and no mucking about with pointless loops.
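A rough Python sketch of that layout, with hypothetical names throughout: the “array” is nothing but a shape header plus one flat pool of memory, and rank is just the length of the shape.

```python
from dataclasses import dataclass

@dataclass
class APLArray:
    """Toy model of an APL array: a tiny header plus a flat pool of memory."""
    shape: tuple   # the header; rank is just len(shape)
    data: list     # one flat, contiguous pool of values

    @property
    def rank(self):
        return len(self.shape)

# A 2x3 matrix carries the same six numbers as a vector of 6;
# only the header differs, so reshaping costs nothing.
m = APLArray(shape=(2, 3), data=[1, 2, 3, 4, 5, 6])
v = APLArray(shape=(6,), data=[1, 2, 3, 4, 5, 6])
print(m.rank, v.rank)   # 2 1
```

A verb reads the header, then marches down the flat pool; no per-element boxing, no pointer chasing.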
Code can be confusing if you don’t drink the APL Kool-Aid, but the concept of rank makes it very reusable. It also relegates idiotic looping constructs to the dustbin of history. How many more for() loops do you want to write in your lifetime? I, personally, would prefer to never write another one. Apply() is the right way for grown-assed men to do things. Bonus: if you can write an apply(), you can often parallelize it. With for(), you have to make too many assumptions.
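The apply() point, sketched in Python terms: the mapped version states no evaluation order, which is exactly why a process pool can be dropped in unchanged, whereas the for loop bakes the order in.

```python
def costly(x):
    return x * x

data = list(range(8))

# The for-loop way: evaluation order is hard-wired into the code.
looped = []
for x in data:
    looped.append(costly(x))

# The apply way: no order assumed, so something like
# multiprocessing.Pool.map can replace map() without touching costly().
mapped = list(map(costly, data))

print(looped == mapped)   # True
```

Same answer either way; only one of them promised the runtime anything about sequencing.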

Roger Hui, also constructed of awesomeness
One of the great tricks of the APL languages: using mmap instead of scanf. Imagine you have some big chunk of data. The dreary way most languages do things, you vacuum the data in with scanf, grab what is useful, and if you’re smart, throw away the useless bits. If you’re dealing with data which is bigger than core, you have to do some complex conga dance: splitting it up into manageable chunks, processing, writing the result out somewhere, then vacuuming it back in again. With mmap, you just point to the data you want. If it’s bigger than memory … so what? You can get at it as quickly as the file system gets it to you. If it’s an array, you can run regressions on big data without changing any code. That’s how the bigmemory package in R works. Why wasn’t this built into native R from the start? Because programmers don’t learn from the masters. Thanks a lot, Bell Labs!
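A stdlib-only sketch of the trick: write a binary file, then mmap it and pull values out in place. Nothing gets vacuumed in; the OS pages data as the code touches it. The file and record layout here are invented for the demo.

```python
import mmap
import struct
import tempfile

# Write a "big" binary file of doubles (a stand-in for real data).
values = [float(i) for i in range(1000)]
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack(f"{len(values)}d", *values))
    path = f.name

# mmap it: no read loop, no buffer copies; just point at the data.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Grab record 500 directly -- the OS pages in only what's touched.
    (x,) = struct.unpack_from("d", mm, 500 * 8)
    print(x)   # 500.0
    mm.close()
```

If the file were a thousand times bigger than RAM, the access line wouldn’t change; that’s the whole point.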

Fred Brooks, Larry Breed, Joey Tuttle, Arthur Whitney, Eugene McDonnell, Paul Berry: none of these men can be held responsible for inflicting the horrors of S+ on the world
This also makes timeseries databases simple. Mmap each column to a file; selects and joins are done by following indexes into the mapped columns. Using a file for each column saves memory on reads, since you usually only need one or a couple of them; most databases force you to read all the columns. When you get your data and close the files, the data image is still there. Fast, simple, and with a little bit of socket work, infinitely scalable. Sure, it’s not concurrent, and it’s not an RDBMS (though both can be added relatively simply). So what? Big data problems are almost all inherently columnar and non-concurrent; RDBMS features and concurrency should be an afterthought when dealing with data which is actually big, and, frankly, in general. “Advanced” databases such as Amazon’s Redshift (which is pretty good shit for something which came out a few months ago) are only catching onto these 80s era ideas now.
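A toy version of that design, with one file per column and mmap on the read path; the column names and types are invented for illustration:

```python
import mmap
import os
import struct
import tempfile

tmpdir = tempfile.mkdtemp()

# One file per column: "price" (doubles) and "size" (64-bit ints).
prices = [100.5, 101.0, 99.75, 102.25]
sizes = [10, 20, 15, 5]
for name, fmt, col in [("price", "d", prices), ("size", "q", sizes)]:
    with open(os.path.join(tmpdir, name), "wb") as f:
        f.write(struct.pack(f"{len(col)}{fmt}", *col))

def column(name, fmt):
    """Map a column file and decode it; only requested columns cost anything."""
    with open(os.path.join(tmpdir, name), "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        n = len(mm) // struct.calcsize(fmt)
        return list(struct.unpack(f"{n}{fmt}", mm))

# "select size where price > 100": two columns mapped; a table with
# fifty other columns would cost nothing extra.
result = [s for p, s in zip(column("price", "d"), column("size", "q")) if p > 100]
print(result)   # [10, 20, 5]
```

Add a socket in front of the query function and you have the skeleton of the scalable columnar store described above.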
Crap like Hive spends half its time reading the damn data in, using some godforsaken text format that is not a mmaped file. Hive wastefully writes intermediate files, and doesn’t use a column approach, forcing giant unnecessary disk reads. Hive also spends its time dealing with multithreaded locking horse shit. APL uses one thread per CPU, which is how sane people do things. Why have multiple threads tripping all over each other when a query is inherently one job? If you’re querying 1, 10 or 100 terabytes, do you really want to load new data into the schema while you’re doing this? No, you don’t. If you have new data streaming in, save it somewhere else, and do that save in its own CPU and process if it is important. Upload to the main store later, when you’re not querying the data. That’s the way Q does it.
The APL family also has a near-perfect level of abstraction for data science. Function composition is trivial, and powerful paradigms and function modifications via adverbs are available to make code terse. You can afflict yourself with for loops if that makes you feel better, but the terse code will run faster. APL languages are also interactive and interpreted: mandatory for dealing with data. Because APL languages are designed to fit data problems, and because they were among the first interpreters, there is little overhead to slow them down. As a result, J or Q code is not only interactive: it’s also really damn fast.
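A sketch of what adverbs buy you, in Python: higher-order functions that modify verbs. The names `over` and `compose` are mine, loosely echoing J’s `/` (insert) and `@:` (atop); this is an analogy, not J semantics.

```python
from functools import reduce

def over(verb):
    """Adverb: turn a two-argument verb into a fold over a list (like J's /)."""
    return lambda xs: reduce(verb, xs)

def compose(f, g):
    """Glue two verbs together, in the spirit of J's @: conjunction."""
    return lambda x: f(g(x))

plus = lambda a, b: a + b
total = over(plus)                        # "sum" falls out for free
mean = compose(lambda s: s / 5, total)    # assumes a length-5 input

print(total([1, 2, 3, 4, 5]))   # 15
print(mean([1, 2, 3, 4, 5]))    # 3.0
```

No loops anywhere: new verbs are built by modifying and gluing old ones, which is the terseness the paragraph above is talking about.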
It seems bizarre that all of this has been forgotten, except for a few old guys, deep pocketed quants, and historical spelunkers such as myself. People painfully recreate the past, and occasionally, agonizingly, come to solutions established 40 years ago. I suppose one of the reasons things might have happened this way is the old masters didn’t leave behind obvious clues, beyond, “here’s my code.” They left behind technical papers and software, but people often don’t understand the whys of the software until they run into similar problems.
Some of these guys are still around. You can actually have a conversation with mighty pioneers like Roger Hui, Allen Rose or Rohan J (maybe in the comments) if you are so inclined. They’re nice people, and they’re willing to show you the way. Data science types and programmers wanting to improve their craft and increase the power of their creations should examine the works of these masters. You’re going to learn more from studying a language such as J than you will studying the latest Hadoop design atrocity. I’m not the only one who thinks so; Wes McKinney of Pandas fame is studying J and Q for guidance on his latest invention. If you know J or Q, he might hire you. He’s not the only one. If “big data” lives up to its promise, you’re going to have a real edge knowing about the masters.
Start here for more information on the wonders of J.
http://conceptualorigami.blogspot.com/2010/12/vector-processing-languages-future-of.html