Locklin on science

Please stop writing new serialization protocols

Posted in Design, fun by Scott Locklin on April 2, 2017

It seems that every day, some computer monkey comes up with a new and more groovy serialization protocol.

In the beginning, there was ASN.1 and XDR, and it was good. I think ASN.1 came first, and like many old things, it was very efficient. XDR was easier to use. At some point, probably before ASN.1, people noticed you could serialize things using stuff like s-expressions for a human-readable, JSON-like format.

Today, we have an insane profusion of serializers. CORBA (which always sucked), Thrift, Protocol Buffers, MessagePack, Avro, BSON, BERT, Property Lists, Bencode (Bram … how could you?), Hessian, ICE, Etch, Cap’n Proto (because he didn’t get it right the first time), SNAC, Dbus, MUSCLE, YAML, SXDF, XML-RPC, MIME, FIX, FAST, JSON, and the native serialization in Python, R, PHP, ROOT and Perl… Somehow this is seen as progress.

Like many modern evils, I trace this one to Java and Google. You see, Google needed a serialization protocol with versioning that worked across thousands of machines. They probably did the obvious thing of tinkering with XDR by sticking a required header on it which allowed for versioning, then noticed that Intel chips are not big-endian the way Sun chips were, and decided to write their own semi-shitty, versioned variant of XDR … along with their own (unarguably shitty) version of RPC. Everything has been downhill since then. Facebook couldn’t possibly use something written at Google, so they built “Thrift,” which hardly lives up to its name, but at least has a less shitty version of RPC in it. Java monkeys eventually noticed how slow XML was between garbage collects and wrote the slightly less shitty but still completely point-missing Avro. From there, every ambitious and fastidious programmer out there seems to have come up with something which suits their particular use case, but doesn’t really differ much in performance or capabilities from the classics.

The result of all this is that, instead of having a computer ecosystem where anything can talk to anything else, we have a veritable tower of Babel where nothing talks to anything else. Imagine if there were 40 competing and completely mutually unintelligible versions of HTML or text encodings: that’s how I see the state of serialization today. Having all these choices isn’t good for anything: it’s just anarchy. There really should be a one-size-fits-all minimal serialization protocol, the same way there is a one-size-fits-all network protocol which moves data around the entire internet, and a one-size-fits-all text encoding in UTF-8. You can have two flavors of the same thing: one S-exp-like form which a human can read, and one which is more efficient. I guess it should be little-endian, since we all live in Intel’s world now, but otherwise, it doesn’t need to do anything but run everywhere.
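To make the two-flavor idea concrete, here’s a toy sketch in Python (the record and the field layout are invented for illustration, not any real protocol): the same record as a human-readable JSON string, and as a fixed-layout little-endian binary blob via the struct module.

    import json
    import struct

    # A record we might want to ship between machines.
    record = {"symbol": "VIX", "price": 13.25, "volume": 4000}

    # Flavor 1: the human-readable form. Anyone can eyeball or grep this.
    human = json.dumps(record)

    # Flavor 2: an efficient binary form with a fixed, agreed-upon layout:
    # a 4-byte symbol, a little-endian double, and a little-endian uint32.
    wire = struct.pack("<4sdI", record["symbol"].encode(), record["price"], record["volume"])

    # Decoding is the same layout string run backwards.
    sym, price, volume = struct.unpack("<4sdI", wire)

    print(len(human), len(wire))   # the binary form is a fraction of the size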

IMO, this is a social problem, not a computer science problem. The actual problem was solved in the 80s with crap like XDR and S-expressions, which provide fast binary and human-readable/self-describing representations of data. Everything else is just commentary on this, and it only gets written because it’s kind of easy for a guy with a bachelor’s degree in CS to write one, and more fun for dorks than solving real problems like fixing bugs. Ultimately this profusion creates more problems than creating a new one solves: you have to make the generator/parser work on multiple languages and platforms, and each implementation on each language/platform will be of varying quality.

I’m a huge proponent of XDR, because it’s the first one I used (along with RPC and rpcgen), because it is Unixy, and because most of the important pieces of the internet and unix ecosystem were based on it. Make a little-endian superset of this with a JSON-style human semi-readable form and an optional self-description field, and you’ve solved all possible serialization problems which sane people are confronted with. People can then concentrate on writing correct super-XDR extensions to get all their weird corner cases covered, and I will not be grouchy any more.

It also bugs the hell out of me that people idiotically serialize data when they don’t have to (I’m looking at you, Spark jackanapes), but that’s another rant.

Oh yeah, I do like MessagePack; it’s pretty cool.
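For the curious, this is roughly what it looks like from Python, assuming the third-party msgpack package is installed; the record is just an example:

    # Requires the third-party msgpack package: pip install msgpack
    import json
    import msgpack

    record = {"symbol": "VIX", "price": 13.25, "volume": 4000}

    packed = msgpack.packb(record)       # compact, self-describing binary
    unpacked = msgpack.unpackb(packed)   # round-trips back to a dict

    print(len(json.dumps(record)), len(packed))  # MessagePack comes out noticeably smaller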

How to shoot down a stealth fighter

Posted in Design by Scott Locklin on January 20, 2017

Editorial note: I actually wrote most of this five years ago, but was reluctant to publish it for misguided patriotic reasons. Since people are starting to talk about it, I figure I might as well bring some more sense to the discussion.

I’ve already gone on record as being against the F-35. Now it’s time to wax nerdy as to why this is a dumb idea. I’m not against military spending. I’m against spending money on things which are dumb. Stealth fighters are dumb. Stealth bombers: still pretty dumb, but significantly less dumb.


I have already mentioned the fact that the thing is designed for too many roles. Aircraft should be designed for one main role, and, well, it’s fine to use them for something else if they work well for that. The recipe for success is the one which has historically produced good airplanes: the P-38 Lightning, the Focke-Wulf Fw-190, the F-4, the F-16, the Su-27, and the A-10. All of these were designed with one mission in mind. They ended up being very good at lots of different things. Multi-objective design optimization, though, is moronic, and gets us aircraft like the bureaucratic atrocity known as the F-111 Aardvark, whose very name doesn’t exactly evoke air combat awesomeness.

What is stealth? Stealth is a convergence of technologies which makes an aircraft electronically unobservable, primarily via radar. The anti-radar technology is two-fold: the skin of the aircraft can be radar-absorbent, but the main trick is to build the aircraft in a shape which scatters the radio energy away from the radar set which sent the signal. What is a fighter? A fighter is an aircraft that shoots down other aircraft. Fighters use guns, infrared-guided missiles and radar-guided missiles. Most modern radar-guided missiles work by pointing the missile more or less in the target direction, illuminating the target with radar (from the jet, or from the missile itself; generally from the missile itself these days), and launching. The wavelength of the missile and jet radar is dictated by the physical size of the missile or jet. The main purpose of radar-resistant technology for a stealth fighter is avoiding detection by enemy radar in the first place, but also defeating radar-guided air-to-air missiles.

Of course, what nobody will tell you: the air-to-air radar-guided missiles haven’t historically been very effective. The US has some of the best ones: the AMRAAM. They’ve only shot down 9 aircraft in combat thus far using this weapon; it has a kill probability of around 50% depending on who you ask. Previous generations of such missiles (the AIM-4, AIM-7 and Phoenix) were fairly abysmal. The AIM-4 was a complete failure. The AIM-7 was also a turkey in its early versions, with a 10% kill probability in the Vietnam War (later versions were better). The Phoenix never managed a combat success, despite several attempts, though it was somehow considered a program success anyway, mostly by paper-pushing war nerds. By and large, the venerable IR-guided Sidewinder works best. Amusingly, the Air Force thought the beyond-visual-range radar-guided air-to-air missile would make stuff like guns and dogfighting obsolete … back in the 1950s. They were so confident in this, most of the Vietnam-era fighters didn’t come equipped with guns. They were completely wrong then. They’re almost certainly wrong now as well. Yet, that is the justification for fielding the gold-plated turd known as the F-35; a fighter so bad, it can’t even outfight a 45-year-old design.

Oh. Well, stealthy planes can defeat the IR missiles that end up being used most of the time, right? No, actually. The stealthy technology can’t really defeat such missiles, which can now home in on a target which is merely warmer than the ambient air. I could build such a sensor using about $40 worth of parts from Digikey. All aircraft are warmer than the ambient air, even “stealthy” ones: friction with the air is basic physics, and there is no getting around it. So, if a stealth fighter is located at all, by eyesight, ground observers or low-frequency radars or whatever: an IR missile is a big danger. Worse, the planes which the US is most worried about are Russian made, and virtually all of them come with excellent IR detectors built into the airframe itself. Airplane nerds call this technology IRST, and the Russians are extremely good at it; they’ve had world-beating versions of this technology since the 1980s. Even ancient and shitty Russian jets come with it built into the airframe. The US seems to have mostly stopped thinking about it since the F-14. A few of the most recent F-18s have it strapped as an expensive afterthought to fuel tanks (possibly going live by 2018), and the F-35 (snigger) claims to have something which shoots sharks with laser-beam eyes at enemy missiles, but most of the combat-ready inventory lacks such sensors.

There is no immunity to gunfire, of course, so if you see a stealth fighter with your eyeballs, and are close enough to get on its six, you can shoot it down.

Now, it’s worth thinking a bit about the fighter role. What good is an invisible fighter? There are a couple of issues with the concept, which has never actually been usefully deployed in combat anywhere in all of history; the issues are also rarely spoken of. If you want to shoot down other jets with your stealth fighter, you have to find them first. The best way to find them is radar. Maybe you can do this with AWACS. But AWACS more or less assumes air superiority has already been established: they’re big lumbering things everyone can see, both because they have giant signatures to radar, and because they are emitting radar signals. Maybe you can turn on your stealth fighter’s radar briefly, and hope the enemy’s electronic warfare facilities can’t see it, or hope the passive radar sensors work. Either way, you had better hope it is a fairly big country, and it is dark outside, or someone could find your stealth fighter. People did a reasonable job of spotting planes with binoculars and telephones back in the day. Modern jets are a little more than twice as fast as WW-2 planes, but that’s still plenty of time to alert air defences. Invisibility to radar-guided missiles is only of partial utility; if you’re spotted, and your aircraft isn’t otherwise superior in air combat (the F-22 is), you stand a decent chance of being shot down. So, for practical use as a fighter, stealthiness is only somewhat theoretically advantageous. It’s really the attack/bomber role where stealthiness shines as a concept; mostly for taking out air defences on the ground.

The F-117 (which was a misnamed stealth attack aircraft, an actual use for the technology) was shot down in the Serbian war by a Hungarian baker by the name of Zoltan Dani. The way he did it was as follows: first, he had working radars. He managed this by only turning them on briefly, and moving them around a lot, to avoid wild-weasel bombing raids. He also used couriers and land-line telephones instead of radio to communicate with the rest of his command structure; he basically had no radio signal which could have been observed by US attack aircraft. He also had “primitive” hand-tuned low-frequency radars. Low frequency means long wavelength. Long wavelength means little energy is absorbed by the radar-absorbent materials, and, more importantly, almost none of it is scattered away from the radar receiver. Since the wavelength of a low-frequency radar is comparable to the size of the aircraft itself, the fine detail which scatters away modern centimeter-wavelength radars doesn’t have much effect on meter-wavelength radar. Mr Dani shot his SA-3 missiles up, guided them in using a joystick, and that was the end of the F-117, a trophy part of which now hangs in the garage of a Hungarian baker in Serbia.
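A back-of-the-envelope on why the long wavelength matters (the frequencies below are representative of VHF versus X-band, not the settings on Dani’s actual radar):

    c = 3.0e8  # speed of light, m/s

    # Representative frequencies, not Dani's actual radar settings.
    for name, freq_hz in [("VHF early-warning radar", 150e6),
                          ("X-band fire-control radar", 10e9)]:
        wavelength = c / freq_hz
        print(f"{name}: {wavelength:.2f} m")

    # VHF comes out around 2 m: comparable to the features of the airframe itself,
    # so the shaping tricks that scatter centimeter waves away do little.
    # X-band comes out around 0.03 m, which is what stealth shaping is optimized against.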


best hunting trophy ever

Similarly, if you want to shoot down stealth fighters, you need an integrated air defense system which uses long-wavelength radars to track targets, and you dispatch interceptors to shoot them down with IR missiles, guided in by the air defense radar. Which is exactly how the Soviet MiG-21 system worked. It worked pretty well in Vietnam. It would probably work well against F-35s, which are not as maneuverable as MiG-21s in a dogfight. The old MiG-21 certainly costs less; I could probably put a MiG-21 point defense system on my credit cards. Well, not really, but it’s something achievable by a resourceful individual with a bit of hard work. A small country (I dunno; Syria for example) can afford thousands of these things. The US probably can’t even afford hundreds of F-35s.

Maybe the F-35 is going to be an OK replacement for the F-117? Well, sorta. First off, it is nowhere near as stealthy. Its supersonic abilities are inherently unstealthy: sonic boom isn’t stealthy, afterburners are not stealthy, and supersonic flight itself is pretty unstealthy. It does have an internal “bomb bay.” You can stuff one 2000lb JDAM in it (or a 1000lb one in the absurd STOVL F-35B). The F-117 had twice the capacity, because it was designed to be a stealth attack plane from the get-go, and didn’t have to make any compromises to try to get it to do 10 other things. You could probably hang more bombs on an F-35’s ridiculously stubby little wings. But bombs hanging on a wing pylon make a plane non-stealthy. So do wing pylons. In clean, “stealthy” mode, the thing can only fly 584 miles to a target, making it, well, I guess something with short range and limited bomb-carrying capability might be useful. The F-117 had twice the range. So, an F-35 is about a quarter as effective in the attack role as the F-117 was, without even factoring in the fact that it has about twice the radar cross section of an F-117. It kind of sucks how the F-35 costs a lot more than the F-117, which was designed for this mission and was demonstrably more useful at it. It’s also rather confusing to me as to why we need 2000 such things if they ain’t fighters with a significant edge against, say, a late-model F-16 or Super Hornet. But then, I’m not a retired Air Force general working at Lockheed. I’m just some taxpayer in my underpants looking on this atrocity in complete disbelief.

There are three things which are actually needed by Air Force procurement. We have a replacement for the F-15 in the air superiority role: the F-22. It works, and it is excellent; far more effective than the F-35, cheaper and stealthier to boot. We can’t afford many of them, and they have problems with suffocating their pilots, but we do have them in hand. If it were up to me, I’d keep the stealthy ones we’ve got, make them attack planes, and build 500 more without the fancy stealth paint for air superiority and ground attack. It would be cheaper than the F-35, and more capable. Everyone will want to “update the computers.” Don’t.

The most urgent need is for a replacement for the F-16; a small, cheap fighter plane that can be used in the interceptor/air superiority role. The US needs it. So do the allies. It doesn’t need to be stealthy; stealth is more useful in the attack role. Building a better F-16 is doable: the Russian MiG-35 and the Dassault Rafale both manage it (maybe the Eurofighter Typhoon also, though it isn’t cheap). I’m sure the US could do even better if they’d concentrate on building a fighter, rather than a do-everything monstrosity like the F-35. I’m sure you can strap bombs to a super F-16 and use it in the attack role as well, once your stealth attack planes have taken out the local SAMs and your air superiority planes have taken out the fighters. Making a fighter plane with a bomb bay for stealth, though, is a loser idea. If I were king of the world: build a delta-winged F-16. The prototype already exists, and there was nothing wrong with the idea. It’s pathetic and disgusting that the national manufacturers simply can’t design even a small and cheap replacement for the ancient T-38 supersonic trainers. All of the postulated ones under consideration are foreign designs. The best one is actually a Russian design: the Yak-130.

The second need is a replacement for the F-117 for stealthy attacks on radar and infrastructure. The F-35 doesn’t even match the F-117 in this role. The F-22 almost does, but it is expensive and largely wasted on this role. I thought the Bird of Prey was a pretty good idea; something like that would serve nicely. Maybe one of the stealthy drones will serve this purpose. Whatever it is, you could build lots of them for the price of a few dozen F-35s.

Finally, we urgently need a decent attack plane for close air support. The F-35 and F-35B will be titanic failures in this role. They have neither the armor nor the endurance required for it. You could shoot one down with a large-caliber rifle firing rounds that cost $0.50. This one is dirt simple: even the A-10 is too complicated. Just build a propeller-driven thing. Build a turboprop A-1 Skyraider. The Tucano is too small to cover all the bases. Presumably someone can still build an F4U Corsair or F6F Hellcat, stick a turboprop in it, put some titanium plates around the cockpit, and shove a 30mm cannon in the schnozz. People build such things in their backyards. It shouldn’t be beyond the magnificent engineering chops of the present-day “Skunk Works” at Lockheed or one of the other major manufacturers. Using inflation on the A-1, or calculating such a device as approximately 1/4 of a C-130, you should be able to build one in the $5m range and have 30-50 of them for each F-35 they replace.

The entire concept of “Stealth Fighter” is mostly a fraud. Stealth bombers and tactical attack planes have a reasonable use case. Stealth fighters are ridiculous. The F-35 is a gold plated turd which should be flushed down the toilet.

The Parable of Zoltán Dani: Dragon Slayer

http://nationalinterest.org/blog/the-buzz/get-ready-china-the-us-navys-possible-stealth-killer-coming-18966

http://www.motherjones.com/politics/2000/01/pentagons-300-billion-dollar-bomb

http://www.ausairpower.net/APA-2009-01.html

http://www.popularmechanics.com/military/a8107/russian-made-tech-vs-americas-stealth-warplanes-13506974/

https://warisboring.com/the-pentagon-has-figured-out-how-to-hunt-enemy-stealth-fighters-3acf9d25cd44#.lm0ryanaa

The Leduc ramjet

Posted in big machines by Scott Locklin on November 29, 2016

I ran across this gizmo from looking at Yann LeCun’s google plus profile, and wondering at the preposterous gadget sitting next to him at the top. Figuring out what it was, I realized the genius of the metaphor. Yann builds things (convolutional networks and deep learning in general) which might very well be the Leduc ramjets of machine learning or even “AI” if we are lucky. Unmistakably Phrench, as in both French and physics-based, original in conception, and the rough outlines of something that might become common in the future, even if the engineering of the insides eventually changes.


René Leduc was working on practical ramjet engines back in the 1930s. His research was interrupted by the war, but he was able to test his first ramjet in flight in 1946. The ramjet seems like a crazy idea for a military aircraft; ramjets don’t work until the plane is moving. A ramjet is essentially a tube you squirt fuel into and light on fire. The fire won’t propel the engine forward unless there is already a great deal of air passing through. It isn’t that crazy if you can give it a good kick to get it into motion. Think about how practical supersonic aircraft worked from the 1950s on: they used afterburners. Afterburners, to some approximation, operate much like inefficient ramjets; you squirt some extra fuel into the afterburning section of the engine and get a nice increase in thrust. Leduc wasn’t the only ramjet guy in town; the idea was in the proverbial air, if not the literal air. Alexander Lippisch (the German designer responsible for the rocket-powered Komet, and whose delta-wing work led to the F-106 and the B-58 Hustler) had actually sketched a design for a supersonic coal-burning interceptor during WW-2, and his engine designer was eventually responsible for a supersonic ramjet built by another French company. The US also attempted a ramjet-powered nuclear cruise missile, the SM-64 Navaho, which looks about as bizarre as the Leduc ramjets.


Navaho SM-64

In fact, early naval anti-aircraft missiles such as the RIM-8 Talos used ramjets for their later stages as well. The bleeding-edge Russian air-to-air missile, the R-77, also uses ramjets, as does a whole host of extremely effective Russian antiship missiles. Ramjets can do better than rockets for long-range missilery: they are comparably simple, and hydrocarbon ramjets can have longer range than rockets. Sticking a man in a ramjet-powered airframe isn’t that crazy an idea. It works for missiles.

The Leduc ramjets didn’t quite work as a practical military technology, in part due to aerodynamic problems, in part because they needed turbojets to get off the ground anyway, but they were important in developing further French fighter planes. They were promising at the time and jam-packed with innovative ideas; the first generation of them was much faster in both climb and top speed than contemporary turbojets.


Ultimately, their technology was a dead end, but what fascinates about them is how different, yet familiar, they were. They look like modern aircraft from an alternate steampunk future. Consider a small detail of the airframe, such as the nose. The idea was that a canopy bubble would cause aerodynamic drag. Since ramjets operate best without any internal turbulence, the various scoops and side inlets you see in modern jets were non-starters. So they put the poor pilot in a little tin can in the front of the thing. The result was that the earliest Leduc ramjet (the 0.10) looked like a Flash Gordon spaceship. The pilot was buried somewhere in the intake and had only tiny little portholes for visibility.


Later models incorporated more visibility by making a large plexiglass tube for the pilot to sit in. Get a load of the look of epic Gallic bemusement on the pilot’s “avoir du cran” (having guts) mug:


faire la moue (pouting)


The later model shown above, the Leduc 0.22, actually had a turbojet which got it into the air. It was supposed to hit Mach 2, but never did: in part because the airframe didn’t take into account the “area rule” effect which made supersonic flight practical in early aircraft, but also in part because the French government withdrew funding from the project in favor of the legendary Dassault Mirage III, an aircraft so good it is still in service today.

The Leduc designs are tantalizing in that they were almost there. They produced 15,000 lbs of thrust, which was plenty for supersonic flight. A later ramjet fighter design, the Nord Griffon, actually achieved supersonic flight, more or less by using a more conventional-looking airframe. Alas, turbojets were ultimately less complex (and less interesting-looking), so they ended up ruling the day.


As I keep saying, early technological development and innovative technology often look very interesting indeed. In the early days people take big risks, and don’t really know what works right. Radios from the 1920s are completely fascinating, with doodads sticking out all over the place. Radios in the 50s and 60s, when it was all down to a science, were boring-looking (and radios today are invisible). Innovative technologies look a certain way. They look surprising to the eye, because they’re actually something new. They look like science fiction because, when you make something new, you’re basically taking science fiction and turning it into technology.


Numeric linear algebra code: an appreciation

Posted in Design by Scott Locklin on May 26, 2016

Most computer science guys and “software engineers” never deal with the floating point end of their computer machines. This is too bad, as the computer was invented basically to do floating point calculations, whether to build nuclear weapons or to intercept Rooskie bombers. The present use cases of watching TV, selling underpants on the internet, mining bitcoins, and sending email to the guy across the cubicle from you are later developments: all integer-oriented.

Because the subject is so old and obscure, you have to be a sort of historian of science to understand why things are the way they are on the floating point side of the computer machine. You also have to know Fortran, because that’s more or less what most of it is written in. In fact, you have to know old Fortran, because an awful lot of it was written a long time ago.

Much longer ago than this, even

It is an exaggeration to say everything you can do to simulate or model the world happens using numeric linear algebra of some kind, but it isn’t much of an exaggeration. Numeric linear algebra can be dense, meaning all numbers are defined and a matrix looks like a matrix in memory, or sparse, meaning most of the numbers are 0 and it looks like a hash table in memory. Most problems are inherently dense, but sparsification can make impossible things possible, and there are many data science applications (recommendation engines for example) where sparsity can be assumed if you think your matrix of data has any meaning to it.

You’d think this sort of thing is a trivial problem, but it isn’t. The reason is two-fold. One is that the operations are hard. A naive dense square matrix multiply with no trickery involved is O(n^3). If it’s dense, and you need to keep all three matrices around, it’s 3 * O(n^2) in memory. There is an additional problem here which “big O” doesn’t say much about. When you have big-O memory problems, you have IO problems. Your computer doesn’t operate directly on the memory chips you install in it; it operates on registers. While matrix multiply is just vector multiply and sum, the conga dance of doing this in a way that maximizes the data moving in and out of the registers, caches and main memory is extremely intricate and usually machine-dependent (because everyone’s computer has a slightly different memory profile).
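If you want to see the gap for yourself, here’s a crude Python comparison. Most of the gap below is interpreter overhead, but even a naive compiled triple loop loses badly to a blocked, cache-aware BLAS; exact numbers depend entirely on your machine.

    import time
    import numpy as np

    n = 200
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    # Naive triple loop: the same O(n^3) arithmetic, with terrible register/cache behavior.
    def naive_matmul(a, b):
        n = a.shape[0]
        c = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(n):
                    s += a[i, k] * b[k, j]
                c[i, j] = s
        return c

    t0 = time.time(); c1 = naive_matmul(a, b); t_naive = time.time() - t0
    t0 = time.time(); c2 = a @ b; t_blas = time.time() - t0

    assert np.allclose(c1, c2)
    print(f"naive: {t_naive:.2f}s  BLAS: {t_blas:.5f}s  ratio: {t_naive / t_blas:.0f}x")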

Most of you have never thought about these matters, because you haven’t tried to simulate a quantum mechanical system, or solved differential equations using finite element analysis. If you build your linear algebra library right, you can get factors of 100 or 1000 speedup. That’s not a big-O speedup, but it sure as hell matters. Computer dorks worry a lot about big-O. Of course, big-O matters very much for both memory and processor cycles. But other things matter too, and computer dorks rarely explicitly worry about them, in part because they have no way of really understanding when they’re happening. I’d posit for any interestingly “large” problem, you’re dealing with something I call “big-IO.” Once you get past having a correct linear algebra algorithm, big-IO is screechingly important on practical problems. It’s the difference between being able to solve a problem and not solve a problem.

I think I’ve convinced any sane reader that matrix multiplies are hard. But this is usually the easiest thing you have to deal with, as it is an actual array operation; something that can be done without explicit loops and branches. QR decompositions, Givens rotations, eigenvalues, eigenvectors, singular value decompositions, LU decompositions, inverses, partial inverses; these are more algorithmically challenging. Depending on the properties of the matrix you’re working with (and there’s no simple way to make matrices introspective to tell you this), there may be cheap solutions to these problems, and there may be very expensive ones. Lots of interesting problems are in the O(n^3) range. Regardless of what you use, you have “big-IO” bottlenecks everywhere.
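For the record, this is what those decompositions look like from Python via scipy, all of which bottom out in LAPACK routines under the hood; the matrix is just a random example.

    import numpy as np
    from scipy import linalg

    a = np.random.rand(300, 300)

    lu, piv = linalg.lu_factor(a)      # LU decomposition (LAPACK getrf)
    q, r = linalg.qr(a)                # QR decomposition (geqrf and friends)
    w, v = linalg.eig(a)               # eigenvalues/eigenvectors (geev)
    u, s, vt = linalg.svd(a)           # singular value decomposition (gesdd)
    ainv = linalg.inv(a)               # inverse (getrf + getri)

    # A symmetric matrix has cheaper, better-conditioned options available:
    sym = a + a.T
    w_sym, v_sym = linalg.eigh(sym)    # symmetric eigensolver (the syev family)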

Further complicating your problem, linear algebra on a computer also takes place on several types of floating point data: 32-bit floats, 64-bit doubles, and the short and long versions of complex numbers. Some problems are inherently only 32 bit, some are 64 bit; some are inherently real numbers, and some complex numbers. So already, we’re talking about every kind of operation having 4 different implementations, all of which must be written and maintained and optimized for your IO and data locality requirements.
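This is where the BLAS/LAPACK naming convention comes from: one routine per type, distinguished by an s/d/c/z prefix. A small illustration via scipy’s BLAS wrappers (the matrices are trivial examples):

    import numpy as np
    from scipy.linalg import blas

    # One operation, four type-specific implementations, following the BLAS/LAPACK
    # naming: s = single, d = double, c = single complex, z = double complex.
    a32 = np.ones((2, 2), dtype=np.float32)
    a64 = np.ones((2, 2), dtype=np.float64)
    c64 = np.ones((2, 2), dtype=np.complex64)
    c128 = np.ones((2, 2), dtype=np.complex128)

    r1 = blas.sgemm(1.0, a32, a32)    # single-precision real matrix multiply
    r2 = blas.dgemm(1.0, a64, a64)    # double-precision real
    r3 = blas.cgemm(1.0, c64, c64)    # single-precision complex
    r4 = blas.zgemm(1.0, c128, c128)  # double-precision complex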

 

Most dense linear algebra work presently happens in something called LAPACK. If you want to calculate eigenvectors in Matlab, Incanter/Clojure, Python, R, Julia and what have you: it’s almost certainly getting piped through LAPACK. If you’re working in C, C++ or Fortran, you’re still probably doing it via LAPACK. If you’re doing linear algebra in Java, you are a lost soul, and you should seriously consider killing yourself, but the serious people I know who do this, they allocate blobs of memory and run LAPACK on the blobs. The same thing was true 25 years ago. In fact, people have basically been running their numerics code on LAPACK for over 40 years, back when it was called EISPACK and LINPACK. Quite a lot of it conforms to … Fortran 66; a language invented 50 years ago which is specifically formatted for punched cards. That is an astounding fact.
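You can see the Fortran heritage by calling a LAPACK routine directly instead of through the friendly wrappers. A sketch using scipy’s low-level LAPACK interface (the matrix is just an example):

    import numpy as np
    from scipy.linalg import lapack

    a = np.random.rand(4, 4)

    # The raw LAPACK routine, Fortran naming and all: d = double, geev = general
    # eigenvalue problem. Real and imaginary parts come back as separate arrays,
    # and you get an integer "info" code instead of an exception, 1970s-style.
    wr, wi, vl, vr, info = lapack.dgeev(a)
    assert info == 0                   # 0 means the routine converged

    eigenvalues = wr + 1j * wi
    # np.linalg.eig(a) is doing essentially this for you under the covers.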


LAPACK: significantly older than this photo; still being used on your computer now

LAPACK is a sort of layer cake on top of something called the BLAS (“Basic Linear Algebra Subprograms”). I think the BLAS were originally designed as a useful abstraction so your eigendecomposition functions have more elementary operations in common with your QR decomposition functions. This made LAPACK better than its ancestors from a code management and complexity point of view. It also made LAPACK faster, as it allowed one to use machine-optimized BLAS when they are available, and BLAS are the type of thing a Fortran compiler could tune pretty well by itself, especially on old-timey architectures without complex cache latency. While most of these functions seem boring, they are in fact extremely interesting. Just to give a hint: there is a function in there for doing matrix multiply via the Winograd version of the Strassen algorithm, which multiplies matrices in O(N^2.8) instead of O(N^3). It’s the GEMMS family of functions if you want to go look at it. Very few people end up using GEMMS, because you have to be a numerical linear algebraist to understand when it is useful enough to justify the O(N^0.2) speedup. For example, there are exactly no LAPACK functions which call GEMMS. But GEMMS is there, implemented for you, just waiting for the moment when you might need to solve … I dunno, a generalized Sylvester equation where Strassen’s algorithm applies.
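You won’t get at GEMMS from the usual wrappers, but the idea behind it is easy to sketch: a toy recursive Strassen multiply in Python for power-of-two sizes, falling back to ordinary multiplication on small blocks. A real implementation worries about padding, numerical stability, and memory traffic; this is just to show where the 7-instead-of-8 multiplies come from.

    import numpy as np

    def strassen(a, b, cutoff=64):
        """Toy Strassen multiply for square matrices with power-of-two size.
        7 recursive multiplies instead of 8 gives O(n^log2(7)) ~ O(n^2.8)."""
        n = a.shape[0]
        if n <= cutoff:
            return a @ b                      # fall back to BLAS on small blocks
        h = n // 2
        a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
        b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
        m1 = strassen(a11 + a22, b11 + b22, cutoff)
        m2 = strassen(a21 + a22, b11, cutoff)
        m3 = strassen(a11, b12 - b22, cutoff)
        m4 = strassen(a22, b21 - b11, cutoff)
        m5 = strassen(a11 + a12, b22, cutoff)
        m6 = strassen(a21 - a11, b11 + b12, cutoff)
        m7 = strassen(a12 - a22, b21 + b22, cutoff)
        c = np.empty_like(a)
        c[:h, :h] = m1 + m4 - m5 + m7
        c[:h, h:] = m3 + m5
        c[h:, :h] = m2 + m4
        c[h:, h:] = m1 - m2 + m3 + m6
        return c

    a = np.random.rand(256, 256)
    b = np.random.rand(256, 256)
    assert np.allclose(strassen(a, b), a @ b)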

BLAS can be hyperoptimized for an individual machine by specialized packages like ATLAS or GotoBLAS (and its descendants BLIS and OpenBLAS). It’s usually worth a factor of 5 or so, so I do it, even though it is a pain in the ass. The packages that ship with your Linux distribution and call themselves ATLAS BLAS technically are ATLAS BLAS, but they’re never as fast as rolling your own. The original GotoBLAS were mostly hand-tuned assembler, meta-compiled into very fast BLAS for various permutations and combinations of supported architectures. They’re not called GotoBLAS because they use goto statements (though they do); they’re named after the author, Goto Kazushige, who appears to be some kind of superhuman mentat.
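If you want to know which BLAS your numpy was actually built against, it will tell you; the output format varies by numpy version, but it will name OpenBLAS, MKL, ATLAS, or a reference BLAS:

    import numpy as np

    # Prints the BLAS/LAPACK libraries this numpy build is linked against.
    np.show_config()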

LAPACK is intensely important. Computer people who are not physicists or electrical engineers have probably never given it a second thought. The fact that it’s old Fortran code has implications. The most obvious one is that Fortran arrays are column-major, just like linear algebra itself is. I have no idea why C decided to be row-major; probably because they weren’t doing linear algebra. There are ways of handling this, but it’s one more thing you have to keep track of, and it can prevent you from doing things on the C side without expensive memory copies. There is also the matter that, because old-school Fortran looks like stack allocation of arrays (in code appearance only), you have to do weird things like instantiate your arrays and bizarro-looking “workspaces” way up the calling pipeline when you’re calling from a C-like language. This is painful, and efforts have been made (LAPACKE) to make the API a little more friendly, with this done under the covers.
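From Python, the row-major/column-major mismatch looks like this; a small sketch with numpy (whether a given wrapper copies behind your back depends on the wrapper):

    import numpy as np

    # numpy defaults to C (row-major) order; Fortran/LAPACK want column-major.
    a_c = np.random.rand(1000, 1000)       # C-contiguous
    a_f = np.asfortranarray(a_c)           # column-major copy of the same data

    print(a_c.flags['C_CONTIGUOUS'], a_f.flags['F_CONTIGUOUS'])   # True True

    # Wrappers that hand memory straight to LAPACK generally want Fortran order;
    # feed them a C-order array and you may silently pay for a transpose/copy.
    # Allocating in Fortran order up front avoids that:
    b = np.zeros((1000, 1000), order='F')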


The problem with stuff like this: if you’ve already gone through the trouble of porting a version of C or Fortran LAPACK to your system, LAPACKE makes you do it all over again. The only thing you get out of this exercise is a nicer calling convention, except you have to write the code for your new and improved calling convention, and you don’t get anything extra from this work. So the old calling conventions persist, and probably will persist for another 20-30 years. People will probably be coding up errors that amount to “punched card rejected” in 2050 or 2060.

It’s weird that we still use LAPACK, but considering the difficulty of writing such a thing and moving it from place to place, not terribly surprising. There have been attempts at rewrites, some of which look pretty good to me. FLENS was one of them. I’ve never actually used it, though I have struggled with some vintage of its peculiar build system when examining a neural net package I was interested in. Considering that I found it easier to write my own neural net than to get an old Python package based on FLENS working, I don’t see this project taking over any time soon. While calling FLENS from C++/Python might have been sweet, it didn’t seem useful enough to justify its existence. Another attempt was libFLAME, which I have never attempted to use; it was claiming some performance improvements, which is a reason to think about using it. It was clean and easy to read and came with Goto-san’s enhanced BLAS. Unfortunately, the last major release was in 2010, and the last point release in March of 2014, so I’m presuming the authors are doing other things now. As a historical point, the libFLAME project actually employed Goto-san back when he was writing GotoBLAS. Unfortunately for the libFLAME team, he is now at Intel, presumably working on MKL.


For all this, you’d think there has been no progress in dense numeric linear algebra. This isn’t remotely true. It is one of the most interesting areas in computer science research at present. All kinds of crazy things have happened since LAPACK was written. They’re scattered to the winds, existing as scraps of C/Matlab/Fortran. In fact, I would even go so far as to say LAPACK doesn’t even contain the most interesting subset of available dense numeric linear algebra routines, though it inarguably contains the most useful subset. Just one obvious example of its deficiencies: the standard eigendecomposition routine in LAPACK calculates the complete eigendecomposition of a matrix. Very often, you only need the top eigenvectors, and that’s a lot cheaper; you can use a Lanczos algorithm. What does one do when this happens? Well, there’s this other ancient Fortran numeric linear algebra package called ARPACK out there. Usually, this gets used, very often for this one function. What about sparse methods? Well, that’s another package. In Fortran days it was IMSL or SPARSKIT. Nowadays, there are many choices.
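The scipy incarnation of this split is instructive: the dense LAPACK path computes everything, while the ARPACK path (Lanczos under the hood) hands back just the few eigenpairs you asked for. A sketch, with a random symmetric matrix standing in for a real problem:

    import numpy as np
    from scipy.linalg import eigh            # LAPACK: the full decomposition
    from scipy.sparse.linalg import eigsh    # ARPACK under the hood

    n = 1000
    a = np.random.rand(n, n)
    a = a + a.T                              # symmetric, so eigsh applies

    # Full eigendecomposition: all n eigenpairs, O(n^3) LAPACK work.
    w_all, v_all = eigh(a)

    # Just the 6 largest-magnitude eigenpairs via Lanczos iteration (ARPACK).
    w_top, v_top = eigsh(a, k=6, which='LM')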

If I had to bet on a future linear algebra package taking over, it would be libelemental. I only heard about this last year (thanks Ryan), and I have yet to use it for anything important, but it appears to be a very good place to stick new numeric linear algebra algorithms. The problem with it is that it depends on a lot of other stuff: LAPACK, libFLAME (optionally, for the good parts anyway), ScaLAPACK (old Fortran code for distributed matrix math) and the kitchen sink. It would be nice if they eventually replace these old pieces, but in the meanwhile, it has a decent framework for fetching and building some version of them. A very big upside to libelemental is it gives you access to a lot of useful old code via a nice modern C API and a modern C++ API. It also gives you access to some useful things built on top of these guys, such as non-negative matrix factorization and skeleton decomposition. It includes convex optimization routines, logistic regression, and quite a few other things that are nice to have kicking around. With luck, Jack Poulson, the primary author, will dedicate enough of his time to keep this going and turn it into something that lots of other code depends on out of the box. Seems like there is enough interest at Stanford to keep this moving forward.