Editorial note: I actually wrote most of this five years ago, but was reluctant to publish it for misguided patriotic reasons. Since people are starting to talk about it, I figure I might as well bring some more sense to the discussion.
I’ve already gone on record as being against the F-35. Now it’s time to wax nerdy as to why this is a dumb idea. I’m not against military spending. I’m against spending money on things which are dumb. Stealth fighters are dumb. Stealth bombers: still pretty dumb, but significantly less dumb.
I have already mentioned the fact that the thing is designed for too many roles. Aircraft should be designed for one main role, and, well, it’s fine to use them for something else if they work well for that. The recipe for success is the one which has historically produced good airplanes: the P38 Lightning, the Focke-Wulf Fw-190, the F-4, the F-16, the Su-27, and the A-10. All of these were designed with one mission in mind. They ended up being very good at lots of different things. Multi-objective design optimization though, is moronic, and gets us aircraft like the bureaucratic atrocity known as the F-111 Aardvark, whose very name doesn’t exactly evoke air combat awesomeness.
What is stealth? Stealth is a convergence of technologies which makes an aircraft electronically unobservable, primarily via Radar. The anti-radar technology is two-fold: the skin of the aircraft can be radar absorbent, but the main trick is to build the aircraft in a shape which scatters the radio energy away from the radar set which sent the signal. What is a fighter? A fighter is an aircraft that shoots down other aircraft. Fighters use guns, infrared guided missiles and radar guided missiles. Most modern radar guided missiles work by pointing the missile more or less in the target direction, illuminating the target with radar (from the jet, or from the missile itself; generally from the missile itself these days), and launching. The wavelength of the missile and jet radar is dictated by the physical size of the missile or jet. The main purpose of radar-resistant technology for a stealth fighter is avoiding being detected in the first place by enemy radar, but also defeating radar guided air to air missiles.
Of course, what nobody will tell you: the air to air radar guided missiles haven’t historically been very effective. The US has some of the best ones; the AMRAAM. They’ve only shot down 9 aircraft in combat thus far using this weapon; it has a kill probability of around 50% depending on who you ask. Previous generations of such missiles (the AIM-4, AIM-7 and Phoenix) were fairly abysmal. The AIM-4 was a complete failure. The AIM-7, also a turkey in its early versions with a 10% kill probability in the Vietnam War (later versions were better). The Phoenix never managed a combat success, despite several attempts, though it was somehow considered a program success anyway, mostly by paper pushing war nerds. By and large, the venerable IR guided sidewinder works best. Amusingly, the Air Force thought the beyond visual range radar guided air to air missile would make stuff like guns and dogfighting obsolete … back in the 1950s. They were so confident in this, most of the Vietnam era fighters didn’t come equipped with guns. They were completely wrong then. They’re almost certainly wrong now as well. Yet, that is the justification for fielding the gold plated turd known as the F-35; a fighter so bad, it can’t even out fight a 45 year old design.
Oh. Well, stealthy planes can defeat the IR missiles that end up being used most of the time, right? No, actually. The stealthy technology can’t really defeat such missiles, which can now home in on a target which is merely warmer than the ambient air. I could build such a sensor using about $40 worth of parts from Digikey. All aircraft are warmer than the ambient air, even “stealthy” ones. Friction is one of the fundamental laws of physics. So, if a stealth fighter is located at all, by eyesight, ground observers or low frequency radars or whatever: an IR missile is a big danger. Worse, the planes which the US is most worried about are Russian made, and virtually all of them come with excellent IR detectors built into the airframe itself. Airplane nerds call this technology IRST, and the Russians are extremely good at it; they’ve had world beating versions of this technology since the 1980s. Even ancient and shitty Russian jets come with it built into the airframe. The US seems to have mostly stopped thinking about it since the F-14. A few of the most recent F-18s have it strapped as an expensive afterthought to fuel tanks (possibly going live by 2018), and the F-35 (snigger) claims to have something which shoots sharks with laser beam eyes at enemy missiles, but most of the combat ready inventory lacks such sensors.
There is no immunity to gunfire, of course, so if you see a Stealth fighter with your eyeballs, and are close enough to draw a 6, you can shoot it down.
Now, it’s worth thinking a bit about the fighter role. What good is an invisible fighter? There are a couple of issues with the concept, which has never actually been usefully deployed in combat anywhere in all of history. It is also rarely spoken of. If you want to shoot down other jets with your stealth fighter, you have to find them first. To find them, the best way to do it is using radar. Maybe you can do this with AWACS. AWACS somewhat assume air superiority has already been established. They’re big lumbering things everyone can see, both because they have giant signatures to radar, and because they are emitting radar signals. Maybe you can turn on your stealth fighter’s radar briefly, and hope the enemy’s electronic warfare facilities can’t see it, or hope the passive radar sensors work. Either way, you had better hope it is a fairly big country, and it is dark outside, or someone could find your stealth fighter. People did a reasonable job of spotting planes with binoculars and telephones back in the day. Modern jets are a little more than twice as fast as WW-2 planes, but that’s still plenty of time to alert air defences. Invisibility to radar guided missiles is only of partial utility; if you’re spotted, and your aircraft isn’t otherwise superior in air combat (the F-22 is), you stand a decent chance of being shot down. So, for practical use as a fighter, stealthiness is only somewhat theoretically advantageous. It’s really the attack/bomber role where Stealthiness shines as a concept; mostly for taking out air defences on the ground.
The F-117 (which was a misnamed stealth attack aircraft, an actual use for the technology) was shot down in the Serbian war by a Hungarian baker by the name of Zoltan Dani. The way he did it was as follows: first, he had working radars. He did this by only turning them on briefly, and moving them around a lot, to avoid wild-weasel bombing raids. He also used couriers and land line telephones instead of radio to communicate with the rest of his command structure; he basically had no radio signal which could have been observed by US attack aircraft. He also had “primitive” hand tuned low-frequency radars. Low frequency means long wavelength. Long wavelength means little energy is absorbed by the radar absorbent materials, and, more importantly, almost none of it is scattered away from the radar receiver. Since the wavelength of a low-frequency radar is comparable to the size of the aircraft itself, the fine detail which scatters away modern centimeter-wavelength radars doesn’t have much effect on meter-wavelength radar. Mr Dani shot his SA-3 missiles up, guided it in using a joystick, and that was the end of the F-117, a trophy part of which now hangs in the garage of a Hungarian baker in Serbia.
Similarly, if you want to shoot down stealth fighters, you need an integrated air defense system which uses long wavelength radars to track targets, and you dispatch interceptors to shoot them down with IR missiles, guided in by the air defense radar. Which is exactly how the Soviet Mig-21 system worked. It worked pretty well in Vietnam. It would probably work well against F-35’s, which are not as maneuverable as Mig-21’s in a dogfight. The old Mig-21 certainly costs less; I could probably put a Mig-21 point defense system on my credit cards. Well, not really, but it’s something achievable by a resourceful individual with a bit of hard work. A small country (I dunno; Syria for example) can afford thousands of these things. The US probably can’t even afford hundreds of F-35s.
Maybe the F-35 is going to be an OK replacement for the F-117? Well, sorta. First off, it is nowhere near as stealthy. Its supersonic abilities are inherently unstealthy: sonic boom isn’t stealthy, afterburners are not stealthy, and supersonic flight itself is pretty unstealthy. It does have an internal “bomb bay.” You can stuff one 2000lb JDAM in it (or a 1000lb one in the absurd VTOL F-35B). The F-117 had twice the capacity, because it was designed to be a stealth attack plane from the get go, and didn’t have to make any compromises to try to get it to do 10 other things. You could probably hang more bombs on an F-35’s ridiculously stubby little wings. But bombs hanging on a wing pylon make a plane non-stealthy. So do wing pylons. In clean, “stealthy” mode, the thing can only fly 584 miles to a target, making it, well, I guess something with short range and limited bomb carrying capability might be useful. The F-117 had twice the range. So, an F-35 is about a quarter as effective in the attack role as the F-117 was, without even factoring in the fact that it is only about a twice the radar cross section of an F-117. It kind of sucks how the F-35 costs a lot more than the F-117, which was designed for and demonstrably more useful for this mission. It’s also rather confusing to me as to why we need 2000 such things if they ain’t fighters with a significant edge against, say, a late model F-16 or Superhornet. But then, I’m not a retired Air Force General working at Lockheed. I’m just some taxpayer in my underpants looking on this atrocity in complete disbelief.
There are three things which are actually needed by Air Force procurement. We have a replacement for the F-15 in air superiority role: the F-22. It works, and it is excellent; far more effective than the F-35, cheaper and stealthier to boot. We can’t afford many of them, and they have problems with suffocating their pilots, but we do have them in hand. If it were up to me, I’d keep the stealthy ones we got, make them attack planes, and build 500 more without the fancy stealth paint for air superiority and ground attack. It will be cheaper than the F-35, and more capable. Everyone will want to “update the computers.” Don’t.
The most urgent need is for a replacement for the F-16; a small, cheap fighter plane that can be used in the interceptor/air superiority role. The US needs it. So do the allies. It doesn’t need to be stealthy; stealth is more useful in the attack role. Building a better F-16 is doable: the Russian MIG-35, and Dassault Rafale all manage it (maybe the Eurofighter Typhoon also, though it isn’t cheap). I’m sure the US could do even better if they’d concentrate on building a fighter, rather than a do-everything monstrosity like the F-35. I’m sure you can strap bombs to a super F-16 and use it in the attack role as well, once your stealth attack planes have taken out the local SAMS and your air superiority planes have taken out the fighters. Making a fighter plane with a bomb-bay for stealth, though, is a loser idea. If I were king of the world: build a delta winged F-16. The prototype already exists, and there was nothing wrong with the idea. It’s pathetic and disgusting that the national manufacturers simply can’t design even a small and cheap replacement for the ancient T-38 supersonic trainers. All of the postulated ones under consideration are foreign designs. The best one is actually a Russian design; the Yak-130.
The second need is a replacement for the F-117 for stealthy attack on radar and infrastructure. F-35 doesn’t even match the F-117 in this role. The F-22 almost does, but it is expensive and largely wasted on this role. I thought the Bird of Prey was a pretty good idea; something like that would serve nicely. Maybe one of the stealthy drones will serve this purpose. Whatever it is, you could build lots of them for the price of a few dozen F-35s.
Finally, we urgently need a decent attack plane for close air support. The F-35, and F-35B will be a titanic failure in this role. They have neither the armor nor endurance required for this. You could shoot it down with a large caliber rifle shooting rounds that cost $0.50. This one is dirt simple: even the A-10 is too complicated. Just build a propeller driven thing. Build a turboprop A-1 Skyraider. The Tucano is too small to cover all the bases. Presumably someone can still build a F4U Corsair or F6F Hellcat and stick a turboprop in it, some titanium plates around the cockpit, and shove a 30mm cannon in the schnozz. People build such things in their backyards. It shouldn’t be beyond the magnificent engineering chops of the present day “Skunk Works” at Lockheed or one of the other major manufacturers. Using inflation on the A-1 or calculating such a device as approximately 1/4 of a C-130, you should be able to build one in the $5m range and have 30-50 of them for each F-35 they replace.
The entire concept of “Stealth Fighter” is mostly a fraud. Stealth bombers and tactical attack planes have a reasonable use case. Stealth fighters are ridiculous. The F-35 is a gold plated turd which should be flushed down the toilet.
One of the disadvantages of machine learning as a discipline is the lack of reasonable confidence intervals on a given prediction. There are all kinds of reasons you might want such a thing, but I think machine learning and data science practitioners are so drunk with newfound powers, they forget where such a thing might be useful. If you’re really confident, for example, that someone will click on an ad, you probably want to serve one that pays a nice click through rate. If you have some kind of gambling engine, you want to bet more money on the predictions you are more confident of. Or if you’re diagnosing an illness in a patient, it would be awfully nice to be able to tell the patient how certain you are of the diagnosis and what the confidence in the prognosis is.
There are various ad hoc ways that people do this sort of thing. The one you run into most often is some variation on cross validation, which produces an average confidence interval. I’ve always found this to be dissatisfying (as are PAC approaches). Some people fiddle with their learners and in hopes of making sure the prediction is normally distributed, then build confidence intervals from that (or for the classification version, Platt scaling using logistic regression). There are a number of ad hoc ways of generating confidence intervals using resampling methods and generating a distribution of predictions. You’re kind of hosed though, if your prediction is in online mode. Some people build learners that they hope will produce a sort of estimate of the conditional probability distribution of the forecast; aka quantile regression forests and friends. If you’re a Bayesian, or use a model with confidence intervals baked in, you may be in pretty good shape. But let’s face it, Bayesian techniques assume your prior is correct, and that new points are drawn from your prior. If your prior is wrong, so are your confidence intervals, and you have no way of knowing this. Same story with heteroscedasticity. Wouldn’t it be nice to have some tool to tell you how uncertain your prediction when you’re not certain of your priors or your algorithm, for that matter?
Well, it turns out, humanity possesses such a tool, but you probably don’t know about it. I’ve known about this trick for a few years now, through my studies of online and compression based learning as a general subject. It is a good and useful bag of tricks, and it verifies many of the “seat of the pants” insights I’ve had in attempting to build ad-hoc confidence intervals in my own predictions for commercial projects. I’ve been telling anyone who listens for years that this stuff is the future, and it seems like people are finally catching on. Ryan Tibshirani, who I assume is the son of the more famous Tibshirani, has published a neat R package on the topic along with colleagues at CMU. There is one other R package out there and one in python. There are several books published in the last two years. I’ll do my part in bringing this basket of ideas to a more general audience, presumably of practitioners, but academics not in the know should also pay attention.
The name of this basket of ideas is “conformal prediction.” The provenance of the ideas is quite interesting, and should induce people to pay attention. Vladimir Vovk is a former Kolmogorov student, who has had all kind of cool ideas over the years. Glenn Shafer is also well known for his co-development of Dempster-Shafer theory, which is a brewing alternative to standard measure-theoretic probability theory which is quite useful in sensor fusion, and I think some machine learning frameworks. Alexander Gammerman is a former physicist from Leningrad, who, like Shafer, has done quite a bit of work in the past with Bayesian belief networks. Just to reiterate who these guys are: Vovk and Shafer have also previously developed a probability theory based on game theory which has ended up being very influential in machine learning pertaining to sequence prediction. To invent one new form of probability theory is clever. Two is just showing off! The conformal prediction framework comes from deep results in probability theory and is inspired by Kolmogorov and Martin-Lof’s ideas on algorithmic complexity theory.
The advantages of conformal prediction are many fold. These ideas assume very little about the thing you are trying to forecast, the tool you’re using to forecast or how the world works, and they still produce a pretty good confidence interval. Even if you’re an unrepentant Bayesian, using some of the machinery of conformal prediction, you can tell when things have gone wrong with your prior. The learners work online, and with some modifications and considerations, with batch learning. One of the nice things about calculating confidence intervals as a part of your learning process is they can actually lower error rates or use in semi-supervised learning as well. Honestly, I think this is the best bag of tricks since boosting; everyone should know about and use these ideas.
The essential idea is that a “conformity function” exists. Effectively you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work. Within the framework, the conformity function, whatever may be, when used correctly can be guaranteed to give confidence intervals to within a probabilistic tolerance. The original proofs and treatments of conformal prediction, defined for sequences, is extremely computationally inefficient. The conditions can be relaxed in many cases, and the conformity function is in principle arbitrary, though good ones will produce narrower confidence regions. Somewhat confusingly, these good conformity functions are referred to as “efficient” -though they may not be computationally efficient.
The original research and proofs were done on so-called “transductive conformal prediction.” I’ll sketch this out below.
Suppose you have a data set , with where has the usual meaning of a feature vector, and the variable to be predicted. If the different possible orderings are equally likely, the data set is exchangeable. For the purposes of this argument, most data sets are exchangeable or can be made so. Call the set of all bags of points from with replacement a “bag” .
The conformal predictor where is the training set and is a test object and is a defined probability of confidence in a prediction. If we have a function which measures how different a point is the bag set of .
Example: If we have a forecast technique which works on exchangeable data, , then a very simple function is the distance between the new point and the forecast based on the bag set. .
Simplifying the notation a little bit, let’s call where is the bag set, missing . Remembering that bag sets are sets of all the orderings of we can see that our can be defined from the nonconformity measures; This can be proved in a fairly straightforward way. You can find it in any of the books and most of the tutorials.
Practically speaking, this kind of transductive prediction is computationally prohibitive and not how most practitioners confront the world. Practical people use inductive prediction, where we use training examples and then see how they do in a test set. I won’t go through the general framework for this, at least this time around; go read the book or one of the tutorials listed below. For one it is worth, one of the forms of Inductive Conformal Prediction is called Mondrian Conformal Prediction; a framework which allows for different error rates for different categories, hence all the Mondrian paintings I decorated this blog post with.
For many forms of inductive CP, the main trick is you must subdivide your training set into two pieces. One piece you use to train your model, the proper training set. The other piece you use to calculate your confidence region, the calibration set. You compute the non-conformity scores on the calibration set, and use them on the predictions generated by the proper training set. There are
other blended approaches. Whenever you use sampling or bootstrapping in your prediction algorithm, you have the chance to build a conformal predictor using the parts of the data not used in the prediction in the base learner. So, favorites like Random Forest and Gradient Boosting Machines have computationally potentially efficient conformity measures. There are also flavors using a CV type process, though the proofs seem more weak for these. There are also reasonably computationally efficient Inductive CP measures for KNN, SVM and decision trees. The inductive “split conformal predictor” has an R package associated with it defined for general regression problems, so it is worth going over in a little bit of detail.
For coverage at confidence, using a prediction algorithm and training data set , randomly split the index into two subsets, which as above, we will call the proper training set and the calibration set .
Train the learner using data on the proper training set
. Then, using the trained learner, find the residuals in the calibration set:
the th smallest value in where
The prediction interval for a new point is
This type of thing may seem unsatisfying, as technically the bounds on it only exist for one predicted point. But there are workarounds using leave one out in the ranking. The leave one out version is a little difficult to follow in a lightweight blog, so I’ll leave it up as an exercise for those who are interested to read more about it in the R documentation for the package.
Conformal prediction is about 10 years old now: still in its infancy. While forecasting with confidence intervals is inherently useful, the applications and extensions of the idea are what really tantalizes me about the subject. New forms of feature selection, new forms of loss function which integrate the confidence region, new forms of optimization to deal with conformal loss functions, completely new and different machine learning algorithms, new ways of thinking about data and probabilistic prediction in general. Specific problems which CP has had success with; face recognition, nuclear fusion research, design optimization, anomaly detection, network traffic classification and forecasting, medical diagnosis and prognosis, computer security, chemical properties/activities prediction and computational geometry. It’s probably only been used on a few thousand different data sets. Imagine being at the very beginning of Bayesian data analysis where things like the expectation maximization algorithm are just being invented, or neural nets before backpropagation: I think this is where the CP basket of ideas is at. It’s an exciting field at an exciting time, and while it is quite useful now, all kinds of great new results will come of it.
There is a website and a book. Other papers and books can be found in the usual way. This paper goes with the R package mentioned above, and is particularly clearly written for the split and leave one out conformal prediction flavors. Here is a presentation with some open problems and research directions if you want to get to work on something interesting. Only 19 packages on github so far.
I ran across this gizmo from looking at Yann LeCun’s google plus profile, and wondering at the preposterous gadget sitting next to him at the top. Figuring out what it was, I realized the genius of the metaphor. Yann builds things (convolutional networks and deep learning in general) which might very well be the Leduc ramjets of machine learning or even “AI” if we are lucky. Unmistakably Phrench, as in both French and physics-based, original in conception, and the rough outlines of something that might become common in the future, even if the engineering of the insides eventually changes.
Rene Leduc was working on practical ramjet engines back in the 1930s. His research was interrupted by the war, but he was able to test his first ramjet in flight in 1946. The ramjet seems like a crazy idea for a military aircraft; ramjets don’t work until the plane is moving. A ramjet is essentially a tube you squirt fuel into which you light on fire. The fire won’t propel the engine forward unless there is already a great deal of air passing through. It isn’t that crazy if you can give it a good kick to get it into motion. If we stop to think about how practical supersonic aircraft worked from the 1950s on; they used afterburners. Afterburners to some approximation, operate much like inefficient ramjets; you squirt some extra fuel in the afterburning component of the engine and get a nice increase in thrust. Leduc wasn’t the only ramjet guy in town; the idea was in the proverbial air, if not the literal air. Alexander Lippisch (a German designer responsible for the rocket powered Komet, the F-106 and the B-58 Hustler) had actually sketched a design for a supersonic coal burning interceptor during WW-2, and his engine designer was eventually responsible for a supersonic ramjet built by another French company. The US also attempted a ramjet powered nuclear cruise missile, the SM-64 Navaho, which looks about as bizarre as the Leduc ramjets.
In fact, early nautical anti-aircraft missiles such as the Rim-8 Talos used ramjets for later stages as well. The bleeding edge Russian air to air missile, the R-77, also uses ramjets as does a whole host of extremely effective Russian antiship missiles. Ramjets can do better than rockets for long range missilery as they are comparably simple, and hydrocarbon ramjets can have longer range than rockets. Sticking a man in a ramjet powered airframe isn’t that crazy an idea. It works for missiles.
The Leduc ramjets didn’t quite work as a practical military technology, in part due to aerodynamic problems, in part because they needed turbojets to get off the ground anyway, but they were important in developing further French fighter planes. They were promising at the time and jam packed with innovative ideas; the first generation of them was much faster in both climb and final speed than contemporary turbojets.
Ultimately, their technology was a dead end, but what fascinates about them is how different, yet familiar they were. They look like modern aircraft from an alternate steampunk future. Consider a small detail of the airframe, such as the nose. The idea was a canopy bubble would cause aerodynamic drag. Since ramjets operate best without any internal turbulence, the various scoops and side inlets you see in modern jets were non starters. So they put the poor pilot in a little tin can in the front of the thing. The result was, the earliest Leduc ramjet (the 0.10) looked like a Flash Gordon spaceship. The pilot was buried somewhere in the intake and had only tiny little portholes for visibility.
Later models incorporated more visibility by making a large plexiglass tube for the pilot to sit in. Get a load of the look of epic Gaulic bemusement on the pilot’s “avoir du cran” mug:
The later model shown above, the Leduc 0.22, actually had a turbojet which got it into the air. It was supposed to hit Mach-2, but never did. In part because the airframe didn’t take into account the “area rule” effect which made supersonic flight practical in early aircraft. But also in part because the French government withdrew funding from the project in favor of the legendary Dassault Mirage III; an aircraft so good it is still in service today.
The Leduc designs are tantalizing in that they were almost there. They produced 15,000 lbs of thrust, which was plenty for supersonic flight. A later ramjet fighter design, the Nord Griffon actually achieved supersonic flight, more or less by using a more conventional looking airframe. Alas, turbojets were ultimately less complex (and less interesting looking) so they ended up ruling the day.
As I keep saying, early technological development and innovative technology often looks very interesting indeed. In the early days people take big risks, and don’t really know what works right. If you look at a radio from the 1920s; they are completely fascinating with doodads sticking out all over the place. Radios in the 50s and 60s when it was down to a science were boring looking (and radios today are invisible). Innovative technologies look a certain way. They look surprising to the eye, because they’re actually something new. They look like science fiction because, when you make something new, you’re basically taking science fiction and turning it into technology.
This dope got lucky in 2012, essentially using “take the mean” and was hailed as a prophet. He was wrong about virtually everything, and if someone were to make a table of his predictions over time and calculate Brier scores, I’m pretty sure he’ll get a higher score than Magic-8 ball (Brier scores, lower is better). Prediction is difficult, as the sage said, especially regarding the future. Claiming you can do prediction when you can’t is irresponsible and can actually be dangerous.
While he richly deserves to find his proper station in life as an opinionated taxi driver, this clown is unfortunately likely to be with us for years to come, bringing shame on the profession of quantitative analysis of data. We’ll be watching, Nate.