Locklin on science

Technologies which did not live up to the hype

Posted in Progress by Scott Locklin on May 14, 2017

There are many, many false technological alleys which continue to be pimped as things worth investing time and money into.

  • Automotive Gas Turbines:
    In the 1960s, several car manufacturers built gas turbine cars. The Batmobile, for example. Turbines were supposed to be “jet age” machines of the future. You could get more fuel efficiency and energy per cubic meter out of the things, plus they were simpler in design (in theory, only one moving part), and easier to cool (they basically cool themselves). Unfortunately, gas turbines are lousy at acceleration in stop-and-go traffic. You might some day have one in your laptop, though.
  • Disco Space Colonies:
    Back in the 1970s, when America had just been exposed to its first real energy crisis, a fellow by the name of Gerard Kitchen O’Neill came up with the marvelous scheme of blasting enormous chunks of glass and metal into space (presumably not using fossil fuels), which would produce clean solar energy and beam it back to earth in the form of microwave radiation. O’Neill wasn’t just some looney with a Pete Rose bowl cut; he invented the particle storage ring. His disco space colony idea was related to this: he suspected that the “mass driver,” a sort of particle accelerator for large pieces of matter which he also invented, might one day be an important component in the construction of such colonies. He also proposed this immediately after the Apollo space program, which was a huge success, and made space travel look routine rather than extreme. He forgot to take into account that, at its peak, the Apollo program was consuming about 1% of the economic output of America. That was just to send a couple of freemasons to the Moon, let alone to send skyscrapers filled with space colonists, gigantor microwave guns and mongo solar panels made of unobtanium the equivalent distance. Amusingly, a guy named Eric Drexler was one of O’Neill’s protégés via an MIT conference on space colonization in the 1970s. I like to think Drexler learned from hanging out with O’Neill that one could make a career out of making scientific-sounding honking noises about impossible technology.
  • Nanotech:
    I’d first heard of nanotech as a science fiction plot device. I never gave it much thought until I was writing my Ph.D. thesis. Sitting in the library alone with my laptop in my own personal hell, I did a ton of procrastination reading. One of the things I read was Drexler’s alleged science book on nanotech: “Nanosystems: Molecular Machinery, Manufacturing, and Computation.” I figured it would be interesting and inspiring, as it was a very famous Ph.D. thesis, and of course, nano-stuff is gonna change the world, right? Hell, there was a ton of funding coming into my lab to set up the center for nanotechnology (name changed to the equally bullshittium “molecular foundry”); maybe I could stick out my hat and capture some low-hanging fruit! By the time I was done with this book, I had finally made up my mind to go into finance and applied mathematics. It is the sheerest science fiction. Almost every assertion he made about what is possible is wrong. Much of the “science” asserted as fact is obvious baloney. Many of the things he waves around as trivial violate the laws of thermodynamics. Matter simply doesn’t work the way he wants it to. Shortly after reading this book, I remember running into a very bright surface scientist at tea time who had gotten on board the mighty gravy train of nano-nonsense. I was all, “dude; Drexler is smoking crack!” My pal gave a world-weary moue, and agreed that one could make a living correcting Drexler. But the money was good, and there was interesting material science to be done under the rubric of “nano.”
  • Fuel Cells:
    Fuel cells are one of those ideas that have been around almost as long as regular chemical batteries: since the first half of the 1800s. The problem with fuel cells has been obvious since then. They’re hugely expensive, big and fragile, and they either require extremely clean fuels like liquid hydrogen, or they wear out fairly quickly. There isn’t much that technology can do about this, though mass production may lower the cost some. And nobody likes the idea of driving around with a bunch of hydrogen in the tank of their car.
  • Biotech:
    Biotech provides employment for a lot of my smart friends. None of them have been able to tell me what their work actually does for humanity. In the realm of human health, it has enabled enormously fat people to live longer and eat more sugar, by making Humulin cheaper than what they used to extract from dead racehorses. It also allows idiot bodybuilders to inject themselves with human growth hormone grown in toilet water, instead of HGH extracted from the pituitary glands of cadavers. There are also enormously expensive and mostly ineffective drugs used in certain kinds of cancer. In agriculture, it has provided some modest benefits, and created an entire industry of paranoids who think they’ll grow 8 heads if they eat genetically modified corn (which gets fed to cows anyway). While this could change in the future, I’ve been hearing about how biotech is going to change everything since Genentech was founded in 1976.
  • Stem Cell Research:
    Remember stem cell research? How we were going to cure Parkinson’s and chewing with your mouth open using stem cells? How the eeebil Jeebers creeps from the middle of the country were denying the progress of science by keeping the white coats with the test tubes from sticking embryos in a blender? Well, as I recall, nothing ever came of it. It’s not because it was banhammered (it mostly isn’t); it’s just not useful. I mean, it was politically useful: people who are politically religious (the ones who “fucking love science”) used it to beat up on people who are classically religious. But to first order, the political battle seems to have been the main contribution of stem cell research to human culture.
  • Quantum Computing:
    I opined that it was probably a big nothingburger 7 years ago, despite having myself expended some not-inconsiderable time thinking out the semiclassical dynamics of such a device. Nothing has happened since then to revise my opinion on the subject. It’s now been 32 years since David Deutsch had his big idea. He’ll most certainly die before a useful quantum computer exists. I probably will too, as will everyone reading this prediction, making me, alas, unable to collect on the bet. All you need do is look at history: people had working computers before von Neumann and other theorists ever noticed them. We literally have thousands of “engineers” and “scientists” writing software and doing “research” on a machine that nobody knows how to build. People dedicate their careers to a subject which doesn’t exist in the corporeal world. There isn’t a word for this type of intellectual flatulence other than the overloaded term “fraud,” but there should be.

I don’t think people should abandon all thought of any of the above subjects. Nor any of the abundant subjects which are presently grossly overrated by futurologists. I do think these historical examples should give any young researcher pause when it comes to devoting their lives to future boondoggles. Do you really want to work in the technological equivalent of macro-economics?

The more hype there is around a subject, and the more marketing personnel and quasi-academic mountebanks there are involved in promoting it, the less likely it is to be important or useful. The really important breakthroughs of the last 20-40 years (networking protocols, photolithography improvements, cryptography, various improvements in statistics, signal processing and linear algebra and such) have been relatively quiet occurrences.

If you want to make a difference in the world, learn some practical math, physics and chemistry. Ignore the wares of humbugs and quacks, keep your nose to the grindstone and read Phil Anderson (the greatest physicist of our era):

Feynman’s cryptic remark, “no one is that much smarter …,” to me, implies something Feynman kept emphasizing: that the key to his achievements was not anything “magical” but the right attitude, the focus on nature’s reality, the focus on asking the right questions, the willingness to try (and to discard) unconventional answers, the sensitive ear for phoniness, self-deception, bombast, and conventional but unproven assumptions.


Please stop writing new serialization protocols

Posted in Design, fun by Scott Locklin on April 2, 2017

It seems that every day, some computer monkey comes up with a new and more groovy serialization protocol.

In the beginning, there were ASN.1 and XDR, and it was good. I think ASN.1 came first, and like many old things, it was very efficient. XDR was easier to use. At some point, probably before ASN.1, people noticed you could serialize things using stuff like s-expressions for a human-readable, JSON-like format.
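For anyone who has never looked at the wire format, here is a minimal Python sketch of both styles applied to one made-up record (the record, its field names and the helper functions are hypothetical). The binary half follows XDR’s rules from RFC 4506 for integers and strings: an integer is four big-endian bytes, and a string is a four-byte length followed by the bytes, zero-padded to a four-byte boundary. The S-expression layout is an illustrative choice, not any particular standard.

```python
import json
import struct


def xdr_int(n: int) -> bytes:
    """XDR signed 32-bit integer: four bytes, big-endian (RFC 4506)."""
    return struct.pack(">i", n)


def xdr_string(s: str) -> bytes:
    """XDR string: 4-byte big-endian length, the bytes, then zero padding to a 4-byte boundary."""
    data = s.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_record_xdr(record_id: int, name: str) -> bytes:
    """Binary flavor: fields in a fixed, schema-known order with no field names on the wire."""
    return xdr_int(record_id) + xdr_string(name)


def encode_record_sexp(record_id: int, name: str) -> str:
    """Human-readable flavor: a self-describing S-expression (hypothetical layout)."""
    return f"(record (id {record_id}) (name {json.dumps(name)}))"


if __name__ == "__main__":
    print(encode_record_xdr(42, "locklin").hex())  # 0000002a000000076c6f636b6c696e00
    print(encode_record_sexp(42, "locklin"))       # (record (id 42) (name "locklin"))
```

The binary flavor carries no field names, so both ends must share the schema; the S-expression flavor describes itself at the cost of size. On a little-endian Intel machine the native layout of 42 is `struct.pack("<i", 42)`, i.e. `2a000000`, which is why XDR code on x86 has to byte-swap every integer.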

Today, we have an insane profusion of serializers: CORBA (which always sucked), Thrift, protocol buffers, Messagepack, Avro, BSON, BERT, Property Lists, Bencode (Bram … how could you?), Hessian, ICE, Etch, CapnProto (because he didn’t get it right the first time), SNAC, Dbus, MUSCLE, YAML, SXDF, XML-RPC, MIME, FIX, FAST, JSON, serialization in Python, R, PHP, ROOT and Perl… Somehow this is seen as progress.

Like many modern evils, I trace this one to Java and Google. You see, Google needed a serialization protocol with versioning that worked across thousands of machines. They probably did the obvious thing of tinkering with XDR by sticking a required header on it which allowed for versioning, then noticed that Intel chips are not big-endian the way Sun chips were, and decided to write their own semi-shitty versioning version of XDR … along with their own (unarguably shitty) version of RPC. Everything has been downhill since then. Facebook couldn’t possibly use something written at Google, so they built “Thrift,” which hardly lives up to its name, but at least has a less shitty version of RPC in it. Java monkeys eventually noticed how slow XML was between garbage collects and wrote the slightly less shitty but still completely missing the point Avro. From there, every ambitious and fastidious programmer out there seems to have come up with something which suits their particular use case, but doesn’t really differ much in performance or capabilities from the classics.

The result of all this is that, instead of having a computer ecosystem where anything can talk to anything else, we have a veritable tower of Babel where nothing talks to anything else. Imagine if there were 40 competing and completely mutually unintelligible versions of HTML or text encodings: that’s how I see the state of serialization today. Having all these choices isn’t good for anything: it’s just anarchy. There really should be a one-size-fits-all minimal serialization protocol, in the same way that there is a one-size-fits-all network protocol which moves data around the entire internet, and, for text, UTF-8. You can have two flavors of the same thing: one S-exp-like which a human can read, and one which is more efficient. I guess it should be little-endian, since we all live in Intel’s world now, but otherwise, it doesn’t need to do anything but run everywhere.

IMO, this is a social problem, not a computer science problem. The actual problem was solved in the 80s with crap like XDR and S-expressions, which provide fast binary and human-readable/self-describing representations of data. Everything else is just commentary on this, and it only gets written because it’s kind of easy for a guy with a bachelor’s degree in CS to write one, and more fun for dorks than solving real problems like fixing bugs. Ultimately, each new protocol creates more problems than it solves: you have to make the generator/parser work on multiple languages and platforms, and each implementation on each language/platform will be of varying quality.

I’m a huge proponent of XDR, because it’s the first one I used (along with RPC and rpcgen), because it is Unixy, and because most of the important pieces of the internet and Unix ecosystem were based on it. A little-endian superset of this with a JSON-style, human semi-readable form, and an optional self-description field, and you’ve solved all possible serialization problems which sane people are confronted with. People can then concentrate on writing correct super-XDR extensions to get all their weird corner cases covered, and I will not be grouchy any more.

It also bugs the hell out of me that people idiotically serialize data when they don’t have to (I’m looking at you, Spark jackanapes), but that’s another rant.

Oh yeah, I do like Messagepack; it’s pretty cool.

How to shoot down a stealth fighter

Posted in Design by Scott Locklin on January 20, 2017

Editorial note: I actually wrote most of this five years ago, but was reluctant to publish it for misguided patriotic reasons. Since people are starting to talk about it, I figure I might as well bring some more sense to the discussion.

I’ve already gone on record as being against the F-35. Now it’s time to wax nerdy as to why this is a dumb idea. I’m not against military spending. I’m against spending money on things which are dumb. Stealth fighters are dumb. Stealth bombers: still pretty dumb, but significantly less dumb.


I have already mentioned the fact that the thing is designed for too many roles. Aircraft should be designed for one main role, and, well, it’s fine to use them for something else if they work well for that. The recipe for success is the one which has historically produced good airplanes: the P-38 Lightning, the Focke-Wulf Fw-190, the F-4, the F-16, the Su-27, and the A-10. All of these were designed with one mission in mind. They ended up being very good at lots of different things. Multi-objective design optimization, though, is moronic, and gets us aircraft like the bureaucratic atrocity known as the F-111 Aardvark, whose very name doesn’t exactly evoke air combat awesomeness.

What is stealth? Stealth is a convergence of technologies which makes an aircraft electronically unobservable, primarily to radar. The anti-radar technology is two-fold: the skin of the aircraft can be radar absorbent, but the main trick is to build the aircraft in a shape which scatters the radio energy away from the radar set which sent the signal. What is a fighter? A fighter is an aircraft that shoots down other aircraft. Fighters use guns, infrared guided missiles and radar guided missiles. Most modern radar guided missiles work by pointing the missile more or less in the target direction, illuminating the target with radar (from the jet, or from the missile itself; generally from the missile itself these days), and launching. The wavelength of the missile and jet radar is dictated by the physical size of the missile or jet. The main purpose of radar-resistant technology for a stealth fighter is avoiding being detected in the first place by enemy radar, but also defeating radar guided air-to-air missiles.

Of course, what nobody will tell you: air-to-air radar guided missiles haven’t historically been very effective. The US has some of the best ones: the AMRAAM. They’ve only shot down 9 aircraft in combat thus far using this weapon; it has a kill probability of around 50%, depending on who you ask. Previous generations of such missiles (the AIM-4, AIM-7 and Phoenix) were fairly abysmal. The AIM-4 was a complete failure. The AIM-7 was also a turkey in its early versions, with a 10% kill probability in the Vietnam War (later versions were better). The Phoenix never managed a combat success, despite several attempts, though it was somehow considered a program success anyway, mostly by paper-pushing war nerds. By and large, the venerable IR guided Sidewinder works best. Amusingly, the Air Force thought the beyond-visual-range radar guided air-to-air missile would make stuff like guns and dogfighting obsolete … back in the 1950s. They were so confident in this, most of the Vietnam era fighters didn’t come equipped with guns. They were completely wrong then. They’re almost certainly wrong now as well. Yet that is the justification for fielding the gold-plated turd known as the F-35: a fighter so bad, it can’t even outfight a 45-year-old design.

Oh. Well, stealthy planes can defeat the IR missiles that end up being used most of the time, right? No, actually. The stealthy technology can’t really defeat such missiles, which can now home in on a target which is merely warmer than the ambient air. I could build such a sensor using about $40 worth of parts from Digikey. All aircraft are warmer than the ambient air, even “stealthy” ones: engines burn fuel, and friction with the air heats the skin. That’s basic physics. So, if a stealth fighter is located at all, by eyesight, ground observers or low frequency radars or whatever: an IR missile is a big danger. Worse, the planes which the US is most worried about are Russian made, and virtually all of them come with excellent IR detectors built into the airframe itself. Airplane nerds call this technology IRST, and the Russians are extremely good at it; they’ve had world beating versions of this technology since the 1980s. Even ancient and shitty Russian jets come with it built into the airframe. The US seems to have mostly stopped thinking about it since the F-14. A few of the most recent F-18s have it strapped to fuel tanks as an expensive afterthought (possibly going live by 2018), and the F-35 (snigger) claims to have something which shoots sharks with laser beam eyes at enemy missiles, but most of the combat ready inventory lacks such sensors.

There is no immunity to gunfire, of course, so if you see a Stealth fighter with your eyeballs, and are close enough to draw a 6, you can shoot it down.

Now, it’s worth thinking a bit about the fighter role. What good is an invisible fighter? There are a couple of issues with the concept, which has never actually been usefully deployed in combat anywhere in all of history, and these issues are rarely spoken of. If you want to shoot down other jets with your stealth fighter, you have to find them first. The best way to find them is with radar. Maybe you can do this with AWACS. AWACS aircraft more or less assume air superiority has already been established. They’re big lumbering things everyone can see, both because they have giant signatures to radar, and because they are emitting radar signals. Maybe you can turn on your stealth fighter’s radar briefly, and hope the enemy’s electronic warfare facilities can’t see it, or hope the passive radar sensors work. Either way, you had better hope it is a fairly big country, and it is dark outside, or someone could find your stealth fighter. People did a reasonable job of spotting planes with binoculars and telephones back in the day. Modern jets are a little more than twice as fast as WW-2 planes, but that’s still plenty of time to alert air defenses. Invisibility to radar guided missiles is only of partial utility; if you’re spotted, and your aircraft isn’t otherwise superior in air combat (the F-22 is), you stand a decent chance of being shot down. So, for practical use as a fighter, stealthiness is only somewhat theoretically advantageous. It’s really the attack/bomber role where stealthiness shines as a concept; mostly for taking out air defenses on the ground.

The F-117 (a misnamed stealth attack aircraft, which is an actual use for the technology) was shot down in the Serbian war by a Hungarian baker by the name of Zoltán Dani. The way he did it was as follows: first, he had working radars. He kept them working by only turning them on briefly, and moving them around a lot, to avoid Wild Weasel bombing raids. He also used couriers and land line telephones instead of radio to communicate with the rest of his command structure; he basically had no radio signal which could have been observed by US attack aircraft. He also had “primitive,” hand-tuned, low-frequency radars. Low frequency means long wavelength. Long wavelength means little energy is absorbed by the radar absorbent materials, and, more importantly, almost none of it is scattered away from the radar receiver. Since the wavelength of a low-frequency radar is comparable to the size of the aircraft itself, the fine detail which scatters away modern centimeter-wavelength radars doesn’t have much effect on meter-wavelength radar. Mr Dani launched his SA-3 missiles, guided them in using a joystick, and that was the end of the F-117, part of which now hangs as a trophy in the garage of a Hungarian baker in Serbia.
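The centimeter-versus-meter argument is just λ = c/f. A minimal sketch, assuming illustrative frequencies of 150 MHz for a VHF search radar and 10 GHz for an X-band fire-control radar (neither figure is taken from any particular Serbian or US system) and a hypothetical 1 m airframe feature such as an edge or fin:

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength in meters: lambda = c / f."""
    return SPEED_OF_LIGHT / freq_hz


# Illustrative frequencies only; these are assumptions, not the specs of any particular radar.
RADARS = {
    "VHF search radar (150 MHz)": 150e6,
    "X-band fire-control radar (10 GHz)": 10e9,
}
FEATURE_SIZE_M = 1.0  # hypothetical size of an airframe edge or fin, in meters

if __name__ == "__main__":
    for label, freq in RADARS.items():
        lam = wavelength_m(freq)
        # ratio >> 1: optical regime, where shaping can steer reflections away from the receiver
        # ratio ~ 1: resonance regime, where shaping loses much of its leverage
        print(f"{label}: wavelength {lam:.3f} m, feature/wavelength {FEATURE_SIZE_M / lam:.1f}")
```

When the feature/wavelength ratio is much greater than one (here roughly 33 at 10 GHz), reflections off flat facets are close to mirror-like, so shaping can aim them somewhere harmless. Near one (here roughly 0.5 at 150 MHz), the structure resonates and scatters energy broadly, so some of it comes back to the receiver.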


best hunting trophy ever

Similarly, if you want to shoot down stealth fighters, you need an integrated air defense system which uses long wavelength radars to track targets, and you dispatch interceptors to shoot them down with IR missiles, guided in by the air defense radar. Which is exactly how the Soviet MiG-21 system worked. It worked pretty well in Vietnam. It would probably work well against F-35s, which are not as maneuverable as MiG-21s in a dogfight. The old MiG-21 certainly costs less; I could probably put a MiG-21 point defense system on my credit cards. Well, not really, but it’s something achievable by a resourceful individual with a bit of hard work. A small country (I dunno; Syria, for example) can afford thousands of these things. The US probably can’t even afford hundreds of F-35s.

Maybe the F-35 is going to be an OK replacement for the F-117? Well, sorta. First off, it is nowhere near as stealthy. Its supersonic abilities are inherently unstealthy: sonic boom isn’t stealthy, afterburners are not stealthy, and supersonic flight itself is pretty unstealthy. It does have an internal “bomb bay.” You can stuff one 2000lb JDAM in it (or a 1000lb one in the absurd VTOL F-35B). The F-117 had twice the capacity, because it was designed to be a stealth attack plane from the get-go, and didn’t have to make any compromises to try to get it to do 10 other things. You could probably hang more bombs on an F-35’s ridiculously stubby little wings. But bombs hanging on a wing pylon make a plane non-stealthy. So do wing pylons. In clean, “stealthy” mode, the thing can only fly 584 miles to a target, making it, well, I guess something with short range and limited bomb-carrying capability might be useful. The F-117 had twice the range. So, with half the payload and half the range, an F-35 is about a quarter as effective in the attack role as the F-117 was, without even factoring in the fact that it has about twice the radar cross section of an F-117. It kind of sucks that the F-35 costs a lot more than the F-117, which was designed for, and is demonstrably more useful for, this mission. It’s also rather confusing to me why we need 2000 such things if they ain’t fighters with a significant edge against, say, a late-model F-16 or Super Hornet. But then, I’m not a retired Air Force general working at Lockheed. I’m just some taxpayer in my underpants looking on this atrocity in complete disbelief.

There are three things which are actually needed by Air Force procurement. We have a replacement for the F-15 in the air superiority role: the F-22. It works, and it is excellent; far more effective than the F-35, cheaper and stealthier to boot. We can’t afford many of them, and they have problems with suffocating their pilots, but we do have them in hand. If it were up to me, I’d keep the stealthy ones we got, make them attack planes, and build 500 more without the fancy stealth paint for air superiority and ground attack. It will be cheaper than the F-35, and more capable. Everyone will want to “update the computers.” Don’t.

The most urgent need is for a replacement for the F-16: a small, cheap fighter plane that can be used in the interceptor/air superiority role. The US needs it. So do the allies. It doesn’t need to be stealthy; stealth is more useful in the attack role. Building a better F-16 is doable: the Russian MiG-35 and the Dassault Rafale both manage it (maybe the Eurofighter Typhoon also, though it isn’t cheap). I’m sure the US could do even better if they’d concentrate on building a fighter, rather than a do-everything monstrosity like the F-35. I’m sure you can strap bombs to a super F-16 and use it in the attack role as well, once your stealth attack planes have taken out the local SAMs and your air superiority planes have taken out the fighters. Making a fighter plane with a bomb bay for stealth, though, is a loser idea. If I were king of the world: build a delta-winged F-16. The prototype already exists, and there was nothing wrong with the idea. It’s pathetic and disgusting that the national manufacturers simply can’t design even a small and cheap replacement for the ancient T-38 supersonic trainers. All of the postulated ones under consideration are foreign designs. The best one is actually a Russian design: the Yak-130.

The second need is a replacement for the F-117 for stealthy attack on radar and infrastructure. F-35 doesn’t even match the F-117 in this role. The F-22 almost does, but it is expensive and largely wasted on this role. I thought the Bird of Prey was a pretty good idea; something like that would serve nicely. Maybe one of the stealthy drones will serve this purpose. Whatever it is, you could build lots of them for the price of a few dozen F-35s.

Finally, we urgently need a decent attack plane for close air support. The F-35 and F-35B will be a titanic failure in this role. They have neither the armor nor the endurance required for this. You could shoot one down with a large caliber rifle shooting rounds that cost $0.50. This one is dirt simple: even the A-10 is too complicated. Just build a propeller driven thing. Build a turboprop A-1 Skyraider. The Tucano is too small to cover all the bases. Presumably someone can still build an F4U Corsair or F6F Hellcat and stick a turboprop in it, some titanium plates around the cockpit, and shove a 30mm cannon in the schnozz. People build such things in their backyards. It shouldn’t be beyond the magnificent engineering chops of the present day “Skunk Works” at Lockheed or one of the other major manufacturers. Adjusting the A-1’s price for inflation, or costing such a device at approximately 1/4 of a C-130, you should be able to build one in the $5m range and have 30-50 of them for each F-35 they replace.

The entire concept of “Stealth Fighter” is mostly a fraud. Stealth bombers and tactical attack planes have a reasonable use case. Stealth fighters are ridiculous. The F-35 is a gold plated turd which should be flushed down the toilet.


Predicting with confidence: the best machine learning idea you never heard of

Posted in machine learning by Scott Locklin on December 5, 2016

One of the disadvantages of machine learning as a discipline is the lack of reasonable confidence intervals on a given prediction. There are all kinds of reasons you might want such a thing, but I think machine learning and data science practitioners are so drunk with newfound powers, they forget where such a thing might be useful. If you’re really confident, for example, that someone will click on an ad, you probably want to serve one that pays a nice click-through rate. If you have some kind of gambling engine, you want to bet more money on the predictions you are more confident of. Or if you’re diagnosing an illness in a patient, it would be awfully nice to be able to tell the patient how certain you are of the diagnosis and what the confidence in the prognosis is.

There are various ad hoc ways that people do this sort of thing. The one you run into most often is some variation on cross validation, which produces an average confidence interval. I’ve always found this dissatisfying (and I feel the same about PAC approaches). Some people fiddle with their learners in hopes of making the prediction errors normally distributed, then build confidence intervals from that (or, for the classification version, use Platt scaling via logistic regression). There are a number of ad hoc ways of generating confidence intervals using resampling methods and generating a distribution of predictions. You’re kind of hosed, though, if your prediction is in online mode. Some people build learners that they hope will produce a sort of estimate of the conditional probability distribution of the forecast, e.g., quantile regression forests and friends. If you’re a Bayesian, or use a model with confidence intervals baked in, you may be in pretty good shape. But let’s face it, Bayesian techniques assume your prior is correct, and that new points are drawn from your prior. If your prior is wrong, so are your confidence intervals, and you have no way of knowing this. Same story with heteroscedasticity. Wouldn’t it be nice to have some tool to tell you how uncertain your prediction is when you’re not certain of your priors, or of your algorithm, for that matter?



Well, it turns out, humanity possesses such a tool, but you probably don’t know about it. I’ve known about this trick for a few years now, through my studies of online and compression based learning as a general subject. It is a good and useful bag of tricks, and it verifies many of the “seat of the pants” insights I’ve had in attempting to build ad-hoc confidence intervals in my own predictions for commercial projects. I’ve been telling anyone who listens for years that this stuff is the future, and it seems like people are finally catching on. Ryan Tibshirani, who I assume is the son of the more famous Tibshirani, has published a neat R package on the topic along with colleagues at CMU. There is one other R package out there and one in Python. There are several books published in the last two years. I’ll do my part in bringing this basket of ideas to a more general audience, presumably of practitioners, but academics not in the know should also pay attention.

The name of this basket of ideas is “conformal prediction.” The provenance of the ideas is quite interesting, and should induce people to pay attention. Vladimir Vovk is a former Kolmogorov student, who has had all kinds of cool ideas over the years. Glenn Shafer is also well known for his co-development of Dempster-Shafer theory, which is a brewing alternative to standard measure-theoretic probability theory which is quite useful in sensor fusion, and I think some machine learning frameworks. Alexander Gammerman is a former physicist from Leningrad, who, like Shafer, has done quite a bit of work in the past with Bayesian belief networks. Just to reiterate who these guys are: Vovk and Shafer have also previously developed a probability theory based on game theory which has ended up being very influential in machine learning pertaining to sequence prediction. To invent one new form of probability theory is clever. Two is just showing off! The conformal prediction framework comes from deep results in probability theory and is inspired by Kolmogorov and Martin-Löf’s ideas on algorithmic complexity theory.


The advantages of conformal prediction are manifold. These ideas assume very little about the thing you are trying to forecast, the tool you’re using to forecast or how the world works, and they still produce a pretty good confidence interval. Even if you’re an unrepentant Bayesian, using some of the machinery of conformal prediction, you can tell when things have gone wrong with your prior. The learners work online, and, with some modifications and considerations, with batch learning. One of the nice things about calculating confidence intervals as a part of your learning process is that they can actually lower error rates, and they can be used in semi-supervised learning as well. Honestly, I think this is the best bag of tricks since boosting; everyone should know about and use these ideas.

The essential idea is that a “conformity function” exists. Effectively you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work. Within the framework, the conformity function, whatever it may be, when used correctly can be guaranteed to give confidence intervals to within a probabilistic tolerance. The original proofs and treatments of conformal prediction, defined for sequences, are extremely computationally inefficient. The conditions can be relaxed in many cases, and the conformity function is in principle arbitrary, though good ones will produce narrower confidence regions. Somewhat confusingly, these good conformity functions are referred to as “efficient,” though they may not be computationally efficient.


The original research and proofs were done on so-called “transductive conformal prediction.” I’ll sketch this out below.

Suppose you have a data set Z := z_1,...,z_N, with z_i := (x_i,y_i), where x_i has the usual meaning of a feature vector and y_i is the variable to be predicted. If all N! possible orderings are equally likely, the data set Z is exchangeable. For the purposes of this argument, most data sets are exchangeable or can be made so. Call an unordered collection of points from Z (a multiset, so repeats are allowed) a “bag” B.

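The conformal predictor is \Gamma^{\epsilon}(Z,x) := \{y \mid p^{y} > \epsilon \}, where Z is the training set, x is a test object, p^{y} is a p-value measuring how well the candidate label y conforms with the training data, and \epsilon \in (0,1) is the significance level: the tolerated error rate, so the set contains the true label with probability at least 1-\epsilon. To compute p^{y} we need a nonconformity measure A(B,z_i), a function which measures how different a point z_i is from the points in the bag B.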

Example: if we have a forecast technique \phi(B) which works on exchangeable data, then a very simple nonconformity measure is the distance between the new point and the forecast based on the bag: A(B,z_i) := d(\phi(B), z_i).
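To make the example concrete, here is a minimal Python sketch of such a nonconformity measure. It assumes points are (x, y) pairs with x a tuple of floats, and it uses a k-nearest-neighbour average as the forecaster \phi(B) and absolute error as the distance d; both are illustrative choices, not anything the framework prescribes, and the function names are hypothetical.

```python
import math

def knn_forecast(bag, x, k=1):
    """phi(B): predict y at x as the mean label of the k nearest points in the bag.
    Hypothetical forecaster; any predictor fitted to the bag would do."""
    nearest = sorted(bag, key=lambda z: math.dist(z[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def nonconformity(bag, z, k=1):
    """A(B, z) := d(phi(B), z), with d the absolute error between forecast and label."""
    x, y = z
    return abs(y - knn_forecast(bag, x, k))

bag = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0)]
score = nonconformity(bag, ((1.1,), 2.5))  # nearest neighbour is (1.0,) with label 2.0
```

Larger scores mean the point looks stranger relative to the bag; a better forecaster gives smaller typical scores and hence narrower (more “efficient”) prediction regions.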

Simplifying the notation a little bit, let’s call A_i := A(B^{-i},z_i), where B^{-i} is the bag with z_i removed, and let z_n := (x, y) be the test object paired with the candidate label y. Because a bag forgets the order of its points, exchangeability means A_n is equally likely to occupy any position in the ranking of A_1,...,A_n (ignoring ties), so p^{y} can be defined from the nonconformity measures: p^{y} := \frac{\#\{i=1,\dots,n \mid A_i \geq A_n \}}{n}. This can be proved in a fairly straightforward way; you can find it in any of the books and most of the tutorials.
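The transductive recipe fits in a few lines of Python. This is an illustration under assumptions: points are (x, y) pairs with x a tuple of floats, the nonconformity measure is the absolute error of a 1-nearest-neighbour forecast (a hypothetical choice), and since \Gamma^{\epsilon} ranges over every possible label, the sketch only scans a finite list of candidate labels supplied by the caller. The helper names are made up for this example.

```python
import math

def nonconformity(bag, z):
    """A(B, z): absolute error of a 1-nearest-neighbour forecast built from the bag."""
    x, y = z
    nearest = min(bag, key=lambda w: math.dist(w[0], x))
    return abs(y - nearest[1])

def p_value(train, x, y):
    """p^y: fraction of the n augmented points at least as nonconforming as z_n = (x, y)."""
    augmented = train + [(x, y)]  # z_1, ..., z_n with the candidate point last
    n = len(augmented)
    # A_i = A(B^{-i}, z_i): score each point against the bag with that point removed
    scores = [nonconformity(augmented[:i] + augmented[i + 1:], augmented[i])
              for i in range(n)]
    return sum(1 for a in scores if a >= scores[-1]) / n

def conformal_set(train, x, candidates, epsilon):
    """Gamma^epsilon(Z, x): the candidate labels whose p-value exceeds epsilon."""
    return [y for y in candidates if p_value(train, x, y) > epsilon]
```

Every candidate label forces all n scores to be recomputed from scratch, which is exactly why the transductive version becomes prohibitive on real data sets.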

Practically speaking, this kind of transductive prediction is computationally prohibitive and not how most practitioners confront the world. Practical people use inductive prediction, where we use training examples and then see how they do in a test set. I won’t go through the general framework for this, at least this time around; go read the book or one of the tutorials listed below. For what it’s worth, one of the forms of Inductive Conformal Prediction is called Mondrian Conformal Prediction: a framework which allows for different error rates for different categories, hence all the Mondrian paintings I decorated this blog post with.
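The Mondrian idea is easiest to see in the inductive, label-conditional case. The sketch below is a hypothetical Python illustration, assuming you already have nonconformity scores for a held-out calibration set (examples not used to fit the model) and for the test object under each candidate label; the function names and the (count + 1)/(size + 1) p-value form are assumptions following the usual inductive convention. The only Mondrian ingredient is that each p-value is computed against calibration examples from the same category, here the same label.

```python
from collections import defaultdict

def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Label-conditional (Mondrian) inductive p-values.

    cal_scores[i]  : nonconformity score of calibration example i under its true label
    cal_labels[i]  : that example's label, which picks its Mondrian category
    test_scores[y] : nonconformity score of the test object if it were given label y
    """
    by_label = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_label[label].append(score)
    p = {}
    for y, alpha in test_scores.items():
        same = by_label[y]  # compare only against calibration points in the same category
        p[y] = (sum(1 for a in same if a >= alpha) + 1) / (len(same) + 1)
    return p

def mondrian_set(cal_scores, cal_labels, test_scores, epsilon):
    """Labels whose category-specific p-value exceeds epsilon.
    epsilon may be one float or a dict giving each label its own error rate."""
    p = mondrian_p_values(cal_scores, cal_labels, test_scores)

    def eps(y):
        return epsilon[y] if isinstance(epsilon, dict) else epsilon

    return sorted(y for y, pv in p.items() if pv > eps(y))
```

Because validity holds within each category, passing a dict such as `{"a": 0.05, "b": 0.2}` runs a different error rate per label, which is the point of the Mondrian construction.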


For many forms of inductive CP, the main trick is you must subdivide your training set into two pieces. One piece you use to train your model, the proper training set. The other piece you use to calculate your confidence region, the calibration set. You compute the nonconformity scores on the calibration set using the model fitted to the proper training set, and use them to size the confidence regions around that model’s predictions. There are other blended approaches. Whenever you use sampling or bootstrapping in your prediction algorithm, you have the chance to build a conformal predictor using the parts of the data not used by the base learner. So, favorites like Random Forest and Gradient Boosting Machines have potentially computationally efficient conformity measures. There are also flavors using a CV type process, though the proofs seem weaker for these. There are also reasonably computationally efficient inductive CP measures for KNN, SVM and decision trees. The inductive “split conformal predictor” has an R package associated with it defined for general regression problems, so it is worth going over in a little bit of detail.
For coverage of at least 1-\epsilon (that is, error rate \epsilon), using a prediction algorithm \phi and a training data set Z_i, i=1,...,n, randomly split the index set i=1,...,n into two subsets I_1, I_2, which, as above, we call the proper training set and the calibration set.

Train the learner using the data in the proper training set I_1:

\phi_{trained} := \phi(\{Z_i : i \in I_1\}). Then, using the trained learner, find the residuals in the calibration set:

R_i := |Y_i - \phi_{trained}(X_i)|, \quad i \in I_2
d := the kth smallest value in \{R_i : i \in I_2\}, where
k = \lceil (n/2 + 1)(1-\epsilon) \rceil
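As a worked example of the rank, assume a training set of n = 1000 points split evenly, so the calibration set I_2 holds 500 residuals, and a target error rate of \epsilon = 0.1. Because k indexes a sorted list, it is rounded up to an integer:

```latex
k = \left\lceil \left(\tfrac{1000}{2} + 1\right)(1 - 0.1) \right\rceil
  = \lceil 501 \times 0.9 \rceil
  = \lceil 450.9 \rceil
  = 451
```

So d is the 451st smallest of the 500 calibration residuals, one rank above their empirical 90th percentile (the 450th); the +1 accounts for the new point, which is exchangeable with the calibration points.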

The prediction interval for a new point x is [\phi_{trained}(x)-d, \phi_{trained}(x)+d].
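The split conformal recipe above can be sketched in plain Python. This is a minimal illustration, not the R package’s implementation: the base learner is a hypothetical one-dimensional least-squares line (`fit_line`), the function names are made up, and any other learner can be passed in through `fit`, since the guarantee rests only on the calibration residuals being exchangeable with the new point’s residual.

```python
import math
import random

def fit_line(xs, ys):
    """Hypothetical base learner phi: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def split_conformal(xs, ys, epsilon, fit=fit_line, seed=0):
    """Split conformal regression: returns a function mapping x to (lower, upper)."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)            # random split of the indices
    i1, i2 = idx[: n // 2], idx[n // 2:]        # proper training set I_1, calibration set I_2
    phi = fit([xs[i] for i in i1], [ys[i] for i in i1])
    # R_i = |Y_i - phi_trained(X_i)| on the calibration set, sorted ascending
    residuals = sorted(abs(ys[i] - phi(xs[i])) for i in i2)
    k = math.ceil((n / 2 + 1) * (1 - epsilon))  # rank of the residual to use
    if k > len(residuals):
        d = math.inf                            # too few calibration points: unbounded interval
    else:
        d = residuals[k - 1]                    # kth smallest residual (1-indexed)
    return lambda x: (phi(x) - d, phi(x) + d)
```

Usage: `pred = split_conformal(xs, ys, 0.1)` fits once, and `pred(x)` then returns the interval for any new x at the cost of a single prediction, which is the whole appeal over the transductive version.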

This type of thing may seem unsatisfying, as technically the bounds on it only exist for one predicted point. But there are workarounds using leave-one-out in the ranking. The leave-one-out version is a little difficult to follow in a lightweight blog, so I’ll leave it as an exercise for those interested to read more about it in the R documentation for the package.

Conformal prediction is about 10 years old now: still in its infancy. While forecasting with confidence intervals is inherently useful, the applications and extensions of the idea are what really tantalize me about the subject: new forms of feature selection, new forms of loss function which integrate the confidence region, new forms of optimization to deal with conformal loss functions, completely new and different machine learning algorithms, new ways of thinking about data and probabilistic prediction in general. Specific problems CP has had success with include face recognition, nuclear fusion research, design optimization, anomaly detection, network traffic classification and forecasting, medical diagnosis and prognosis, computer security, chemical properties/activities prediction, and computational geometry. It’s probably only been used on a few thousand different data sets. Imagine being at the very beginning of Bayesian data analysis, when things like the expectation maximization algorithm were just being invented, or at the beginning of neural nets before backpropagation: I think this is where the CP basket of ideas is at. It’s an exciting field at an exciting time, and while it is quite useful now, all kinds of great new results will come of it.

There is a website and a book. Other papers and books can be found in the usual way. This paper goes with the R package mentioned above, and is particularly clearly written for the split and leave-one-out conformal prediction flavors. Here is a presentation with some open problems and research directions if you want to get to work on something interesting. Only 19 packages on GitHub so far.

Get your Conformal Predictions here.