Locklin on science

Not all programmers are alike: a language rant

Posted in Clojure, Design by Scott Locklin on September 12, 2012

I came across this video presentation the other day. It’s an hour-long, weird-assed advocacy of Clojure by a guy (“Uncle Bob”) who has used OO programming for most of his professional life. This entire tirade is probably useless to anyone who has not watched it already. Since I’m annoyed that I spent a time-sliced hour of my life listening to it, I don’t recommend you listen to it either. This is possibly my most useless, therapeutic “I can’t believe he said that” WordPress post of all time.

http://skillsmatter.com/podcast/agile-testing/bobs-last-language/wd-4946

This guy gives an amusing talk, but he’s wrong in countless ways, and I have to talk about it. His premise started out reasonably well; there really hasn’t been much progress in language design over the years. Still, I grew progressively more angry while listening. He posits that Clojure could be “the last programming language.” While I am a fan and advocate of Clojure, I emphatically disagree with him.

  • He bags on graphical languages as not being a new programming paradigm likely to influence the future of coding. Such languages are already very good and widely used in fields dealing with data acquisition, control and analysis (Labview, Igor). Labview beats the snot out of anything else for building, say, custom spectrometer control and data acquisition systems. I know, because I had to do this. I’ve seen mooks try to do the same thing in C++ or whatever, and laugh scornfully at the result. It’s worth noticing that such graphical languages are also very easy to write in a mature Lisp with GUI hooks; you can find a nice one in the source code for Lush (packages/SN2.8/SNTools/BPTool if you’re interested; I’ve seen them in Common Lisp as well). Interface designers are bad at making them, but some day there will be more of these. Why there are not more people automating the dreary-assed LAMP/Rails stack with graphical languages, I don’t know. Probably because such drudges don’t know how to write a graphical language. This would actually be a very good Clojure application, once someone writes a native GUI for Clojure which compares to, say, Lush’s Ogre system (which, like everything else in Lush, is a small work of genius).
  • Programming paradigms are indeed useful for keeping idiots out of trouble, but a language is more useful if you can break paradigms, or switch to other paradigms when you need to. Sure, most weak-brained people who are over-impressed with their own cleverness shouldn’t try to break a paradigm, but sometimes you have to. I mean, macros are almost by definition broken paradigms, and that’s where a lot of Lisp magic happens. If you look at things that succeed, like C or C++ (or to a lesser extent, OCaml), there is a lot of paradigm breaking going on. Clojure is mostly functional, and partially parallel, but the ability to drop back into Java land is paradigm-breaking gold.
  • He thinks Clojure is an OO language. If you’ve hurt your eyes staring at C++, Java and UML for most of your career, Forth or APL probably looks object oriented. I am sure you could write some OO-style code in Clojure, but it would be breaking the programming paradigm, which he considers bad. I don’t consider that bad (though it will break parallelism); hell, I had a stab at array programming in Clojure. JBLAS array programming in Clojure is not a great fit, but the ability to do things like this is one of the things that makes Clojure useful.
  • Anyone who thinks, like this fella does, that garbage collected virtual machines are always a good idea has never done serious numerics, data acquisition or real-time work, which is half of what makes the world go around. Most people who consider themselves programmers are employed effectively selling underpants on the internet using LAMP. Therefore most people think that’s what programming is. To my mind, that’s not as important to civilization as keeping the power on and the phone company running. Sure, some of the power and phone companies run on virtual machines (Erlang is awesome, though slow): a lot of it doesn’t, and won’t ever, as long as we’re using von Neumann architectures and care about speed. Virtual machines are generally only optimized for what they are used for. People brag about how fast the JVM is; it’s not fast. Not even close to what I consider fast. For some things it is damn slow. Example: my ATLAS-based matrix wrappers beat parallel Colt on the JVM by factors of 10 or more. And that’s with the overhead of copying big matrices from Clojure/Java. And that’s after the JVM dudes have been working on array performance for … 20 years now? R* trees and kd-trees are preposterously slow on the JVM compared to the old libANN C++ library, or naive kd-tree implementations. Factors of 100k to 1E6. I may be wrong, but I’m guessing trees confuse the bejeepers out of the JVM (if some nerd becomes indignant at this assertion: you’re only allowed to comment if you have a kd-tree or R* tree running on the JVM within a factor of 100 of libANN for sorts and searches on dimensions > 5 and 100k+ rows). Sure, the JVM is modestly good at what it ends up being used for. What if I do other things? So don’t tell me “the last programming language” won’t have a compiler. A proper “last programming language” would work like OCaml: with a compiler when you need it, and a bytecode VM when you don’t.
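On the paradigm-breaking point above: a minimal sketch of what “dropping back into Java land” looks like from Clojure. This is bog-standard interop, nothing exotic:

```clojure
;; A mutable Java collection, driven straight from Clojure --
;; breaking the functional paradigm on purpose, and cheaply.
(def xs (java.util.ArrayList.))
(.add xs 1)
(.add xs 2)
(.add xs 3)
;; Clojure's seq functions consume the Java collection anyway:
(reduce + xs) ; => 6
```

The point isn’t that mutable ArrayLists are good; it’s that the escape hatch is one dot away when you need it.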

Of course, there will never be a “universal language.” Some languages are very good for specific purposes, and not so good in general. Some are useful because they have a lot of legacy code they can call. All languages have strengths and weaknesses. Some languages are vastly more powerful than others, and can’t be used by ordinary people. Human beings have hierarchies in their ability to program, just as they have hierarchies in their abilities to play basketball, chess or run. Part of it is personal character, lifestyle and willingness to take it to the next level. Part of it is innate. Anyone who tells you otherwise is selling something.

There is also the matter that “programming” is an overly broad word, kinda like “martial arts.” A guy like “Uncle Bob” who spends his time doing OO whatevers has very little to do with what I do. It’s sort of like comparing a guy who does Tai Chi to a guy who does Cornish Wrestling; both martial arts, but they’re different. My world is made of matrices and floating point numbers. His ain’t.

As for Clojure: it’s a very good language, but the main reason it is popular is the JVM roots, and the fact that Paul Graham is an excellent writer. The JVM roots make it popular because there are many bored Java programmers. They also make it more useful because it can call a bunch of useful Java. Finally, Clojure fills a vast gaping void in the Java ecosystem for a dynamically typed interactive language that can seamlessly call Java code that Java programmers already know about. REPL interactivity beats the living shit out of eclipse, even if you never do anything Lispy.

IMO, there are better-designed Lisps; Common Lisp probably is one (parts of it, anyway). On the other hand, design isn’t everything: Clojure is more useful to more people than Common Lisp. Consider the differences between lein and ASDF. Lein as a design is kinda oogly; it’s basically a shell script which Does Things. Yet, it works brilliantly, and is a huge win for the Clojure ecosystem. Common Lisp native ASDF is probably very well designed, but it is practically useless to anyone who isn’t already an ASDF guru. ASDF should be taken out back and shot.
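For anyone who hasn’t seen lein: this is roughly the entire build configuration it asks of you (project name and version here are made up for illustration). Compare the learning curve of this to bootstrapping ASDF from scratch:

```clojure
;; project.clj -- the whole build definition for a typical lein project.
;; Hypothetical project name; the keys are standard Leiningen.
(defproject my-numerics "0.1.0-SNAPSHOT"
  :description "example project, not a real library"
  :dependencies [[org.clojure/clojure "1.3.0"]])
```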
Clojure won’t be the last language. I forecast a decent future for Clojure. It will be used by Java programmers who need more power, and Lisp programmers who need useful libraries (it’s unbeatable for this, assuming you do the types of things that Java guys do). I will continue to invest in it, and use it where it is appropriate, which is lots of different places. I’ll invest in and use other tools when that is the right thing to do.

Foreshadowing: I’ve been playing around in APL land, and have been very impressed with what I have seen thus far.

Only fast languages are interesting

Posted in Clojure, Lush, tools by Scott Locklin on November 30, 2011

If this isn’t a Zawinski quote, it should be.

I have avoided the JVM my entire life. I am presently confronted with problems which fit in the JVM; JVM libraries, concurrency, giant data: all that good stuff. Rather than doing something insane like learning Java, I figured I’d learn me some Clojure. Why not? It’s got everything I need: JVM guts, lispy goodness; what is not to love?

Well, as it turns out, one enormous, gaping lacuna is Clojure’s numerics performance. Let’s say you want to do something simple, like sum up 3 million numbers in a vector. I do shit like this all the time. My entire life is summing up a million numbers in a vector. Usually, my life is like this:

(let* ((tmp (rand (idx-ones 3000000))))
  (cputime (idx-sum tmp)))

0.02

20 milliseconds to sum 3 million random numbers enclosed in a nice tight vector datatype I can’t get into too much trouble with. This is how life should be. Hell, let me show off a little:

(let* ((tmp (rand (idx-ones 30000000))))
  (cputime (idx-sum tmp)))

0.18

180 milliseconds to sum up 30 million numbers. Not bad. 60 times worse than I’d like it to be (my computer runs at 2GHz), but I can live with something like that.

Now, let’s try it in Clojure:

(def rands (repeatedly rand))
(def tmp (take 3000000 rands))
(time (reduce + tmp))

Java heap space
[Thrown class java.lang.OutOfMemoryError]

Restarts:
0: [QUIT] Quit to the SLIME top level

Backtrace:
0: clojure.lang.RT.cons(RT.java:552)
(blah blah blah java saying fuck you java blah)

Oh. Shit. Adding 3 million numbers makes Clojure puke. OK. How well does it do at adding, erm, 1/10 of that, using my piddly little default JVM with apparently not enough heap space (~130MB)?

(time (reduce + tmp)) "Elapsed time: 861.283 msecs"

Um, holy shit. Well, there is this HotSpot thing I keep hearing about…

 

user> (def ^doubles tmp (take 300000 rands))
user> (time (reduce + tmp))
  "Elapsed time: 371.451 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 107.619 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 46.096 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 43.776 msecs"

Great; now I’m only a factor of 20 away from Lush speed … assuming I run the same code multiple times, which has a probability close to zero. Otherwise, with a type hint, I’m a factor of 200 away.

Maybe I should try using Incanter? I mean, they’re using parallel Colt guts in that. Maybe it’s better? Them particle physicists at CERN are pretty smart, right?

user> (def tmp (sample-uniform 300000 :mean 0))
#'user/tmp
user> (time (sum tmp))
"Elapsed time: 97.398 msecs" 150158.83021894982
user> (def tmp (sample-uniform 3000000 :mean 0))
#'user/tmp
user> (time (sum tmp))
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user>

A bit of hope, then …. Yaaargh!

Let’s look into that heap issue: firing up jconsole and jacking into fresh swank and clojure repl processes, I see … this:

I can’t really tell what’s going on here. I don’t really want to know. But it seems pretty weird to me that an idle Clojure process is sitting around filling up the heap, then garbage collecting. Presumably this has something to do with lein swank (it doesn’t do it so much with lein repl). Either way, this isn’t the kind of thing I like seeing.

Now, I’m not being real fair to Clojure here. If I define my random vector as a list in Lush (which isn’t really fair to Lush), and do an apply + on it, the stack will blow up also. The point is, Lush has datatypes for fast numerics: it’s designed to do fast numerics. Clojure doesn’t have such datatypes, and as a result, its numeric abilities are limited.
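For what it’s worth, part of the heap blow-up is self-inflicted: def-ing the result of the take pins the head of the lazy sequence, so every boxed number stays reachable until the sum finishes. A sketch that folds as it generates — still boxed arithmetic, hence slow, but it won’t exhaust the heap:

```clojure
;; One-pass sum: the lazy seq is consumed as it is produced, so no var
;; retains its head and the collector can discard the boxed doubles.
(reduce + (repeatedly 3000000 rand))
```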

Clojure is neat, lein is very neat, and I’ve learned a lot about Java guts from playing with these tools. Maybe I can use it for glue code somewhere. I’m not going to be using it for numerics. Yeah, I probably should have listened to Mischa, but then if I had, I’d be writing things in numeric Perl.

 

Edit Add:

Thanks to Rob and Mike for showing me the way, and thanks everyone else for demonstrating my n00bness and 4am retardation.

(let [ds (double-array 30000000)]
  (dotimes [i 30000000] (aset ds i (Math/random)))
  (time (areduce ds i res 0.0 (+ res (aget ds i)))))

"Elapsed time: 65.018392 msecs"

 

I daresay, this makes Clojure “interesting,” or at least more interesting than it was a few hours ago. It would be nice if someone had already written some package which makes taking the sum of 3 million numbers a bit less of a chore (a la idx-sum). I mean, what’s going to happen when I have to multiply two matrices together?
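Lacking such a package, the areduce idiom above at least wraps up into something idx-sum-shaped. dsum here is my own name, not anything in a library:

```clojure
;; idx-sum-ish helper over a primitive double array: areduce compiles
;; down to a primitive loop, so there is no per-element boxing.
(defn dsum ^double [^doubles ds]
  (areduce ds i res 0.0 (+ res (aget ds i))))

;; usage: fill an array and sum it
(def ds (double-array 3000000))
(dotimes [i 3000000] (aset ds i (Math/random)))
(dsum ds)
```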