Choose your weapon: Matlab, R or something else?
I don’t want to use three tools for dealing with data, but it’s actually necessary right now. I don’t think it will be necessary forever.
Lush is my general purpose programming language. It’s insanely great. Parts of it are wonky and slow, and parts of it are broken or missing, but it’s a lisp, it’s fast where I need it, and I like it a lot. More on this in a future entry. I use Lush for speed and original research. If the complex algorithm I need hasn’t already been written in Matlab or R, I might as well write it in Lush. Lush is a high level language with low level speed when you need it. It would be perfect if it had more libraries. The only thing I might like better is OCaml/F#, and frankly, I find the type inferencer there gets in the way more than it helps. If they made an OCaml where you could turn the type safety off most of the time, that would be better. Or, I could just be like everyone else and use Python or Java for this sort of thing. Not that there is anything wrong with that.
Matlab would be my second choice for hacking out original research. Why Matlab? Matlab is reasonably fast, but one of the main value adds is that it is extremely intuitive if you’ve used Fortran or C, and if you don’t know how to do something, the help system is very informative. Matlab code is also extremely well supported. The debugger, profiler and editor are all excellent; some of the best I’ve used. Sure, someone will argue that they have a more powerful debugger, but Matlab’s is the handiest I’ve yet used. I don’t need to read a manual to use it; I just use it. Sure, Emacs is a way better editor, but it isn’t as handy as Matlab’s. You can use Matlab to do just about anything. I’ve used it to code up embedded systems using xPC Target and Real-Time Workshop. I’ve used it to code up trading systems, from data feed to broker interface. I’ve embedded it in Excel for end users. I’ve deployed it in enterprise software used by Fortune 100 companies. It’s amazingly useful stuff, especially if you have the proper toolbox to accomplish your tasks. You can build reasonably good numeric software with it as long as you don’t need fancy “programmy” features like concurrency. If Matlab had a way of making fast compiled code, it would be close to perfect for the type of thing I do. I wouldn’t bother with Lush any more, except when I was trying to write interpreter type things. Alas, Matlab’s way of doing this is to write code for your time critical pieces in C, and embed it into your code in a fairly laborious process. The only real drawbacks to Matlab are speed, plotting and expense.
What is R good for then? Well, R is free, so many academics use it to share their latest econometric or machine learning software with everyone else. As such, just about everything statistical under the sun exists in R. And it’s free! What is not to love? Well, sadly, there is plenty not to love about R. First off, there is speed. R doesn’t seem to have anything that makes it inherently slow for an interpreted language: it should be comparable to Matlab in this regard. But it’s slow enough that most people do their heavy work in other languages. Most of the modules written for it have most of their code written in C or Fortran. This is somewhat true of Matlab also, and for the same reasons, but Matlab has a trivial way of telling you what you need to speed up, so R will always end up slower in practice. Second, there is debugging. R is hard to debug. When something breaks, it doesn’t drop you into an interactive top level the way Matlab (or Lush, or Python or anything where you write Real Programs) does. That sucks a lot, and removes a bunch of the utility of using an interpreted language. Oh, sure, there is a debugger, but it is buggy, poorly documented, and doesn’t work in the simple way that Matlab’s does. Third, there is the syntax. Personally, I like the syntax; it’s a lot like OCaml. But most people don’t. What is more, the help system is very close to worthless if you’re trying to remember a simple command. People may say this is unfair, as I am just not used to R, but the fact is, I’ll never get as used to it as Matlab, and neither will anyone else. Oh, the help is OK for finding packages you want if you can think of the right keyword for them. But compared to Matlab, or even something like Lush, R’s online help is pretty worthless. Fourth, for programming, while it should be better than Matlab in many ways, I haven’t ever seen a legible R program which was over 100 lines. I don’t know how they manage this. Part of it is doubtless that the IDEs are rather bad.
I don’t know anyone who claims they can write good, large pieces of software for R. I once asked a guy how he wrote big pieces of software, and he said, “very carefully.”
This sounds pretty bad, but there are solid reasons to use R. For one thing, it’s free. There is a lot to be said for free. Among other things, if you want to give some code away for others to play with, R is going to be a better vehicle than distributing raw C or a Matlab package. For another thing, it has a tremendous amount of work done on various hard numeric problems, and installation is trivial: just press a button. Want to wire the latest AdaBoost up to your database and plot some nice results? Pretty easy in R. I might be able to do all this in Matlab, with the correct packages and so on, but in R, it’s the work of seconds. Another thing: it’s a lot easier to make fancy plots in R than it is in Matlab. Matlab’s plotting utility is from the dark ages. It’s insanely bad. You can abstract some of its badness away with objects, but … you shouldn’t have to. Finally, for interacting with data, R wins. Matlab’s matrix paradigm makes it easy to use, but data.frames are more powerful.
Here’s how my decision tree works. When I first heard about Benford’s law, I decided it was simple enough; I’d hack it out in Lush. I did. It worked, and I fiddled with it. Then I realized that a goodness of fit test against Benford’s distribution might be nice. I had chi-squared distributions already coded up in Lush, and some curve fitting stuff … but wiring it all together, then fiddling with the plotting routines: ugh. So, Google informed me that some nice statistician had done all that work for me in R. So I used R. Probably, someone did it in Matlab also (actually, someone did), but it’s a pain to fire up my Windows laptop with Matlab on it, so I just went with R. That’s what R is good for. At some point, I’ll get Lush talking to R, at which point I may cease using Matlab unless someone pays me to do so. The Lush-plus-R setup will never be as slick as Matlab, and I will miss all the great user productivity features that Matlab offers, but it will get the job done better and quicker, I think.
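For the curious, here is roughly what a goodness of fit check against Benford’s distribution looks like. This is a minimal sketch in Python, not the Lush code or the R package mentioned above; the function names are made up for illustration, and it assumes the plain Pearson chi-squared statistic on leading digits, where Benford’s law says digit d shows up with probability log10(1 + 1/d).

```python
import math
from collections import Counter
from decimal import Decimal

def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant digit of a nonzero number (hypothetical helper)."""
    # Going through the decimal string avoids float round-off from
    # repeatedly multiplying or dividing by 10.
    return Decimal(str(abs(x))).as_tuple().digits[0]

def benford_chi_squared(values):
    """Pearson chi-squared statistic of leading digits against Benford."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d, p in benford_expected().items():
        expected = n * p
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Critical value of chi-squared with 8 degrees of freedom at the 5% level.
CHI2_CRIT_8DF_05 = 15.507

if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(1000)]
    stat = benford_chi_squared(powers_of_two)
    print("statistic:", stat, "rejects Benford:", stat > CHI2_CRIT_8DF_05)
```

Nine digit classes give 8 degrees of freedom, so a statistic above about 15.507 rejects the Benford hypothesis at the 5% level. Getting this far is the easy part; the plotting and the fiddling are where a ready-made package earns its keep.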