# Locklin on science

## Shannon information, compression and psychic powers

Posted in information theory, J by Scott Locklin on October 31, 2013

Fun thing I found in the J-list, a game of code golf. Generate the following pattern with the least number of bytes of code:

_____*_____
_____*_____
____*_*____
___*___*___
__*_____*__
**_______**
__*_____*__
___*___*___
____*_*____
_____*_____
_____*_____

The  J solution by Marshall Lochbaum is the best one I’ve seen, taking only 18 bytes of code.

'_*'{~4=+/~ 4<.|i:5


I’ll break this down into components for the folks who don’t speak J, so you can see what’s going on:

i:5
NB. -5 to 5;


_5 _4 _3 _2 _1 0 1 2 3 4 5

|i:5
NB. make the list positive


5 4 3 2 1 0 1 2 3 4 5

4<.|i:5
NB. cap it at 4


4 4 3 2 1 0 1 2 3 4 4

+/~ 4<.|i:5
NB. ~ is magic which flips the result around and feeds it to the
NB. verb immediately to the left.
NB. the left hand +/ verb is the + operator applied / across the
NB. list


8 8 7 6 5 4 5 6 7 8 8
8 8 7 6 5 4 5 6 7 8 8
7 7 6 5 4 3 4 5 6 7 7
6 6 5 4 3 2 3 4 5 6 6
5 5 4 3 2 1 2 3 4 5 5
4 4 3 2 1 0 1 2 3 4 4
5 5 4 3 2 1 2 3 4 5 5
6 6 5 4 3 2 3 4 5 6 6
7 7 6 5 4 3 4 5 6 7 7
8 8 7 6 5 4 5 6 7 8 8
8 8 7 6 5 4 5 6 7 8 8

4=+/~ 4<.|i:5
NB. Which of these is equal to 4?


0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0 0
0 0 0 1 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 1 0 0
1 1 0 0 0 0 0 0 0 1 1
0 0 1 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0 0
0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0

'_*'{~4=+/~ 4<.|i:5
NB. { is an array selector, ~ pops the previous result in 1's
NB. and 0's in front of  the string '_*', which creates the
NB. required array of characters and voila!


_____*_____
_____*_____
____*_*____
___*___*___
__*_____*__
**_______**
__*_____*__
___*___*___
____*_*____
_____*_____
_____*_____

I asked some facebook pals for a solution, and got some neat ones:

Python from Bram in 94 bytes:

for i in range(-5,6):print ''.join(['*_'[min(abs(i),4)+min(abs(j),4)!=4] for j in range(-5,6)])


A clever 100 byte Perl solution from Mischa (to be run with -Mbigint):

for$i(0..10){print($_%12==11?"\n":1<<$i*11+$_&41558682057254037406452518895026208?'*':'_')for 0..11}


Some more efficient solutions by Mischa in perl and assembler.

Do feel free to add clever solutions in the comments! I’d like to see a recursive solution in an ML variant, and I’m pretty sure this can be done in only a few C characters.

It’s interesting in that the display of characters is only 132 bytes (counting carriage returns, 121 without), yet it is hard in most languages to do this in under 100 bytes. The winner of the contest used a 73 byte shell script with gzipped contents attached. Pretty good idea and a praiseworthy hack. Of course, gzip assumes any old kind of byte might need to be packed, so it’s not the post efficient packing mechanism for this kind of data.

This gets me to an interesting point. What is the actual information content of the pattern? Claude Shannon told us this in 1948. The information content of a character in a string is approximately $\sum\limits_{i=1}^n {prob_i \log_2(\frac{1}{prob_i}})$

If you run this string through a gizmo for calculating Shannon entropy (I have one in J, but use the R infotheo package), you’ll find out that it is completely specified by 3 bits sent 11 times, for a total of 33 bits or 4.125 bytes. This sounds completely insane, but look at this coding for the star pattern:

000 _____*_____
000 _____*_____
001 ____*_*____
010 ___*___*___
011 __*_____*__
100 **_______**
011 __*_____*__
010 ___*___*___
001 ____*_*____
000 _____*_____
000 _____*_____

So, if you were to send this sucker on the wire, an optimal encoding gives 33 bits total. Kind of neat that the J solution is only 144 bits. It is tantalizing that there are only 5 distinct messages here, leaving room for 3 more messages. The Shannon information calculation notices this; if you do the calculation, it’s only 2.2 bits. But alas, you need some kind of wacky pants quantum computer that hasn’t been invented yet to send fractions of bits, so you’re stuck with a 3 bit code.

Lots of machine learning algorithms work like optimal codes, mentalists and the 20 questions game. Decision trees, for example. Split the universe of learned things into a tree, which is effectively 1’s and 0’s (and of course, the split criterion at each node).

It’s little appreciated that compression algorithms are effectively a form of decision tree. In fact, they’re pretty good as decision trees, in particular for “online” learning, and sequence prediction. Look at a recent sequence, see if it’s like any past sequences. If it is a novel sequence, stick it in the right place in the compression tree. If it occurred before, you know what the probability of the next character in the string is by traversing the tree. Using such gizmos, you can generate artificial sequences of characters based on what is in the tree. If you encode “characters” as “words,” you can generate fake texts based on other texts. If you encode “characters” as “musical notes” you can generate fake Mozart based on Mozart. There are libraries out there which do this, but most folks don’t seem to know about them. I’ve noodled around with Ron Begleiter’s VMM code which does this (you can find some primitive R hooks I wrote here) to get a feel for it. There’s another R package called VLMC which is broadly similar, and another relative in the PST package.

One of the interesting properties of compression learning is, if you have a good compression algorithm, you have a good forecasting algorithm.  Compression is a universal predictor. You don’t need to do statistical tests to pick certain models; compression should always work, assuming the string isn’t pure noise. One of the main practical problems with them is picking a discretization (discretizations … can also be seen or used as a ML algorithm). If the data is already discrete, a compression algorithm which compresses well on that data will also forecast well on the data. Taking this a step further into crazytown, one can almost always think of a predictive algorithm as a form of compression. If you consider something like EWMA, you’re reducing a potentially very large historical time series to one or two numbers.

While the science isn’t well established yet, and it certainly isn’t widely known (though the gizmo which guesses what google search you’re going to do is a prefix tree compression learner), it looks like one could use compression algorithms  to do timeseries forecasting in the general case, hypothesis testing, decryption, classification, regression and all manner of interesting things involving data. I consider this idea to be one of the most exciting areas of machine learning research, and compression learners to be one of the most interesting ways of thinking about data. If I free up some time, I hope to write more on the subject, perhaps using Ron’s code for demonstrations.

Fun people to google on this subject:

Boris Ryabko & Son (featured in the ants thing)

Paul Algoet

Nicolò Cesa-Bianchi

Gabor Lugosi

The ubiquitous Thomas Cover

## On the Empire of the Ants

Posted in brainz, information theory by Scott Locklin on July 2, 2013

The internet is generally a wasteland of cat memes and political invective. Once in a while it serves its original purpose in disseminating new ideas. I stumbled across Boris Ryabko‘s little corner of the web while researching compression learning algorithms (which, BTW, are much more fundamental and important than crap like ARIMA). In it, I found one of the nicest little curiosity driven  papers I’ve come across in some time. Ryabko and his coworker, Zhanna Reznikova, measured the information processing abilities of ants, and the information capacity of ant languages. Download it here. There was also a plenary talk at an IEEE conference you can download here.

In our degenerate age where people think cell phone apps are innovations,  it is probably necessary to explain why this is a glorious piece of work. Science is an exercise in curiosity about nature. It is a process. It sometimes involves complex and costly apparatus, or the resources of giant institutes. Sometimes it involves looking at ants in an ant farm, and knowing some clever math. Many people are gobsmacked by the technological gizmos used to do science. They think the giant S&M dungeons of tokomaks and synchro-cyclotrons are science. Those aren’t science; they’re tools. The end product; the insights into nature -that is what is important. Professors Ryabko and Reznikova did something a kid could understand the implications of, but no kid could actually do. The fact that they did it at all indicates they have the child-like curiosity and love for nature that is the true spirit of scientific enquiry. As far as I am concerned, Ryabko and Reznikova are real scientists. The thousands of co-authors on the Higgs paper; able technicians I am sure, but their contributions are a widows mite to the gold sovereign of Ryabko and Reznikova.

Theory: ants are smart, and they talk with their antennae. How smart are they, and how much information can they transfer with their antennae language? Here’s a video of talking ants from Professor Reznikova’s webpage:

Experiment: to figure out how much information they can transfer, starve some ants (hey, it’s for science), stick some food at random places in a binary tree, and see how fast they can tell the other ants about it. Here’s a video clip of the setup. Each fork in the path of a physical binary tree represents 1 bit of information, just as it does on your computer. Paint the ants so you know which is which. When a scout ant finds the food, you remove the maze, and put in place an identical one to avoid their sniffing the ant trails or the food in it.  This way, the only way for the other ants to find the fork the food was in is via actual ant communication. Time the ant communication between the scout ant and other foragers (takes longer than 30 seconds, apparently). Result: F. sanguinea can transmit around 0.74 bits a minute.  F. polyctena can do 1.1 bits a minute.

Experiment: to figure out if ants are smart, see if they can pass on maze information in a compressed way. LRLRLRLRLRLR is a lot simpler in an information theoretical sense than an equal length random sequence of lefts and rights. Telephone transmission and MP3 players have this sort of compression baked into them to make storage and transmission more efficient.  If ants can communicate directions for a regular maze faster than a random one, they’re kind of smart. Result: in fact, this turns out to be the case.

Experiment: to find out if ants are smart, see if they can count. Stick them in a comb or hub shaped maze where there is food at the end of one of the 25 or more forks (you can see some of the mazes here). The only way the poor ant can tell other ants about it is if he says something like “seventeenth one to the left.” Or, in the case of one of the variants of this experiment,  something more like”3 over from the one the crazy Russian usually puts the food in.” Yep, you can see it plain as pie in the plots: ants have a hard time explaining “number 30” and a much easier time of saying, “two over from the one the food is usually in.” Ants can do math.

The power of information theory is not appreciated as it should be. We use the products of it every time we fire up a computer or a cell phone, but it is applicable in many areas where a mention of “Shannon entropy” will be met with a shrug. Learning about the Empire of the Ants is just one example.

People in the SETI project are looking for  alien ham radios on other planets. I’ve often wondered why people think they’ll be able to recognize an alien language as such. Sophisticated information encoding systems look an awful lot like noise. The English language isn’t particularly sophisticated as an encoding system. Its compressibility indicates this. If I were an alien, I might use very compressed signals (sort of like we do with some of our electronic communications). It might look an awful lot like noise.

We have yet to communicate  with dolphins. We’re pretty sure they have interesting things to say, via an information theoretical result called Zipf’s law (though others disagree,  it seems likely they’re saying something pretty complex). There are  better techniques to “decompress” dolphin vocalizations than Zipf’s law: I use some of them looking for patterns in economic systems. Unfortunately marine biologists are usually not current with information theoretical tools, and the types of people who are familiar with such tools are busy working for the NSA and Rentech. Should I ever make my pile of dough and retire, I’ll hopefully have enough loot to strap a couple of tape recorders to the dolphins. It seems something worth doing.

The beautiful result of Ryabko and Reznikova points the way forward. A low budget, high concept experiment, done with stopwatches, paint and miniature plastic ant habitrails produced this beautiful result on insect intelligence. It is such a simple experiment, anyone with some time and some ants could have done it! This sort of “small science” seems rare these days; people are more interested in big budget things, designed to answer questions about minutae, rather than interesting things about the world around us. I don’t know if we have the spirit to do such “small science” in America any longer.  American scientists seem like bureaucratized lemmings, hypnotized by budgets, much like the poor ants are hypnotized by sugar water. The Rube-Goldberg nature of this experiment could only be done by a nation of curious tinkerers; something we no longer seem to have here.

Dolphin language could have been decoded decades ago. While it is sad that such studies haven’t been done yet, it leaves open new frontiers for creative young scientists today. Stop whining about your budget and get to work!