This post is inspired by the “metacademy” suggestions for “leveling up your machine learning.” They make some halfway decent suggestions for beginners. The problem is, these suggestions won’t give you a view of machine learning as a field; they’ll only teach you about the subjects of interest to authors of machine learning books, which is different. The level-3 and level-4 suggestions they make are not super useful either: they just reflect the tastes of the author.
The machine learning literature is vast, techniques are bewilderingly diverse, multidisciplinary and seemingly unrelated. It is extremely difficult to know what is important and useful. While “metacademy” has the horse sense to suggest reading some books, the problem is, there is no book which can even give you a survey of what is available, or make you aware of things which might be helpful. The best guide for the perplexed, in my not at all humble opinion, is Peter Flach’s introductory text, “Machine Learning: the Art and Science of Algorithms that Make Sense of Data,” which at least mentions some of the more obscure techniques and points to other resources. Most books are just a collection of the popular techniques. They all mention regression models, logistic regression, neural nets, trees, ensemble methods, graphical models and SVM type things. Most of the time, they don’t even bother telling you what each technique is actually good for, or when you should choose one approach over another (Flach does; that’s one of many reasons you should read his book). Sometimes I am definitely just whining that people don’t pay enough attention to the things I find interesting, or that I don’t have a good book or review article on the topic. Sleep deprivation will do that to a man. Sometimes I am probably putting together things that have no clearly unifying feature, perhaps because they’re “not done yet.” I figure that’s OK; subjects such as “deep learning” are also a bunch of ideas with no real unifying theme which aren’t done yet, and this doesn’t stop people from writing good treatments of the subject. Perhaps my list is a “send me review articles and book suggestions” cry for help, but perhaps it is useful to others as an overview of neat things.
Stuff I think is egregiously neglected in books, and in academia in unranked semi-clustered listing below:
Online learning: not the “Khan academy” kind, the “exposing your learners to data, one piece at a time, the way the human brain works” kind. This is hugely important for “big data” and timeseries, but there are precious few ML texts which go beyond mentioning the existence of online learning in passing. Almost all textbooks concentrate on batch learning. Realistically, when you’re dealing with timeseries or very large data sets, you’re probably doing things online in some sense. If you’re not thinking about how you’re exposing your learners to sequentially generated data, you’re probably leaving information on the table, or overfitting to irrelevant data. I can think of zero books which are actually helpful here. Cesa-Bianchi and Lugosi wrote a very interesting book on some recent proofs for online learners and “universal prediction” which strike me as being of extreme importance, though this is a presentation of new ideas, rather than an exposition of established ones. Vowpal Wabbit is a useful and interesting piece of software with OK documentation, but there should be a book which takes you from online versions of linear regression (they exist! I can show you one!) to something like Vowpal Wabbit. Such a book does not exist. Hell, I am at a loss to think of a decent review article, and the subject is unfortunately un-googleable, thanks to the hype over the BFD of “watching lectures and taking tests over the freaking internets.” Please correct me if I am wrong: I’d love to have a good review article on the subject for my own purposes.
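To show what I mean by online versions of linear regression, here is a toy sketch of my own (not from any text): least squares turned into a streaming learner via stochastic gradient descent, touching each observation exactly once and never holding the dataset in memory.

```python
import numpy as np

def online_linreg(stream, dim, lr=0.05):
    """Fit a linear model one observation at a time with SGD;
    the learner never sees the whole dataset at once."""
    w = np.zeros(dim)
    b = 0.0
    for x, y in stream:
        err = (w @ x + b) - y   # prediction error on this one example
        w -= lr * err * x       # gradient step on the weights
        b -= lr * err           # gradient step on the intercept
    return w, b

# toy stream: y = 2*x0 - 3*x1 + 1 plus a little noise, one row at a time
rng = np.random.default_rng(0)

def stream(n=5000):
    for _ in range(n):
        x = rng.normal(size=2)
        yield x, 2 * x[0] - 3 * x[1] + 1 + 0.01 * rng.normal()

w, b = online_linreg(stream(), dim=2)
print(w, b)  # close to [2, -3] and 1
```

Fancier online learners (Vowpal Wabbit included) are elaborations of this loop: adaptive learning rates, regularization, hashing tricks, but the one-example-at-a-time skeleton is the same.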
Reinforcement learning: a form of online learning which has become a field unto its own. One of the great triumphs of machine learning is teaching computers to win at Backgammon. This was done via a form of reinforcement learning known as TD-learning. Reinforcement learning is a large field, as it has been used with great success in control systems theory and robotics. The problem is, the guys who do reinforcement learning are generally in control systems theory and robotics, making the literature impenetrable to machine learning researchers and engineers. Something oriented towards non-robotics problems would be nice (Sutton and Barto doesn’t suffice here; Norvig’s chapter is the best general treatment I have thus far seen). There are papers on applications of the idea to problems which do not involve robots, but none which unify the ideas into something comprehensible and utile to a ML engineer.
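For the curious, the TD idea behind the Backgammon result fits in a few lines. Here is a toy illustration of my own (not anybody's reference code): TD(0) value estimation on the classic five-state random walk, where the learner bootstraps each state's value from the next state's current estimate.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0):
    """TD(0) value estimation on the classic 5-state random walk:
    start in the middle, step left or right at random; the right
    terminal pays reward 1, the left terminal pays 0."""
    V = [0.5] * 5                      # value estimates for states 0..4
    for _ in range(episodes):
        s = 2                          # start in the center state
        while True:
            s2 = s + random.choice((-1, 1))
            if s2 == 5:                # fell off the right edge: reward 1
                r, v2 = 1.0, 0.0
            elif s2 == -1:             # fell off the left edge: reward 0
                r, v2 = 0.0, 0.0
            else:
                r, v2 = 0.0, V[s2]
            V[s] += alpha * (r + gamma * v2 - V[s])  # the TD(0) update
            if s2 in (-1, 5):
                break
            s = s2
    return V

random.seed(0)
V = td0_random_walk()
print([round(v, 2) for v in V])  # heads toward [0.17, 0.33, 0.5, 0.67, 0.83]
```

TD-Gammon was this update applied to a neural network value function instead of a lookup table, plus self-play; the learning rule is the same.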
“Compression” sequence prediction techniques: this is another form of online learning, though it can also be done in batch mode. We’re all familiar with this; when google tries to guess what you’re going to search for, it is using a primitive form of this built on a data structure called the trie. Such ideas are related to standard compression techniques like LZW, and have deep roots in information theory and signal processing. Really, Claude Shannon wrote the first iterations of this idea. I can’t give you a good reference for this subject in general, though Ron Begleiter and friends wrote a very good paper on some classical compression learning implementations and their uses. I wrote an R wrapper for their Java lib if you want to fool around with their tool. Boris Ryabko and son have also written numerous interesting papers on the subject. Complearn is a presumably useful library which encapsulates some of these ideas, and is available everywhere Linux is sold. Some day I’ll expound on these ideas in more detail.
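To make the trie idea concrete, here is a toy sketch of a context-counting sequence predictor of my own devising, a crude cousin of the PPM-style learners Begleiter and friends benchmark (the function names and the back-off scheme here are mine, purely for illustration):

```python
from collections import Counter, defaultdict

def train(seq, max_order=3):
    """Count which symbol follows each context of length 0..max_order.
    This table is just a flattened prediction trie."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for k in range(min(i, max_order) + 1):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Guess the next symbol from the longest context we've actually
    seen -- a crude version of PPM-style back-off."""
    for k in range(min(len(history), max_order), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in counts:
            return counts[ctx].most_common(1)[0][0]
    return None

model = train("abracadabra abracadabra abracadabr")
print(predict(model, "abracadabr"))  # 'a' -- the trie has seen this context
```

Real compression learners (VMM, CTW, LZ-based) replace the "most common symbol" step with proper probability estimates and principled smoothing, but the count-contexts-then-back-off skeleton is the same.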
Time series oriented techniques in general: a large fraction of industry applications have a time component. Even in marketing problems dealing with survival techniques, there is a time component, and you should know about it. In situations where there are non-linear relationships in the time series, classical regression and time-series techniques will fail. In situations where you must discover the underlying non-linear model yourself, well, you’re in deep shit if you don’t know some time-series oriented machine learning techniques. There was much work done in the 80s and 90s on tools like recurrent ANNs and feedforward ANNs for starters, and there has been much work in this line since then. There are plenty of other useful tools and techniques. Once in a while someone will mention dynamic time warping in a book, but nobody seems real happy about this technique. Many books mention Hidden Markov Models, which are important, but they’re only useful when the data is at least semi-Markov, and you have some idea of how to characterize it as a sequence of well defined states. Even in this case, I daresay not even the speech recognition textbooks are real helpful (though Rabiner and Juang is OK, it’s also over 20 years old). Similarly, there are no review papers treating this as a general problem. I guess we TS guys are too busy raking in the lindens to write one.
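Since dynamic time warping keeps coming up, here is a minimal sketch of it (the textbook dynamic program; the variable names are mine). The point of DTW is that it matches shapes even when the timing is stretched, which Euclidean distance can't do.

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two sequences: cost of the
    cheapest monotone alignment, computed by dynamic programming."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # local mismatch cost
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match step for step
    return D[n][m]

s1 = [0, 1, 2, 3, 2, 1, 0]
s2 = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, stretched in time
print(dtw(s1, s2))  # 0.0: DTW sees them as the same shape
```

It's quadratic in sequence length, which is one of the reasons nobody seems real happy about it on big data; the usual fixes are windowing constraints and lower-bounding tricks.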
Conformal prediction: I will be surprised if anyone reading this has even heard of conformal prediction. There are no wikipedia entries. There is a website and a book. The concept is simple: it would be nice to put well-motivated error bars on a machine learning prediction. If you read the basic books, stuff like k-fold cross validation and the jackknife trick are the entire story. OK, WTF do I do when my training is online? What do I do in the presence of different kinds of noise? Conformal prediction is a step towards this, and hopefully a theory of machine learning confidence intervals in general. It seems to mostly be the work of a small group of researchers who were influenced by Kolmogorov, but others are catching on. I’m interested. Not interested enough to write one, as of yet, but I’d sure like to play with one.
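To give the flavor, here is a toy sketch of the split (inductive) variant; everything in it (the linear model underneath, the names) is my own illustrative choice, not canon. You fit on one half of the data, score absolute residuals on a held-out calibration half, and use their quantile as the interval half-width; under exchangeability the resulting intervals cover at the stated rate.

```python
import numpy as np

def split_conformal(x_tr, y_tr, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction intervals around a linear model."""
    A = np.c_[x_tr, np.ones(len(x_tr))]
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)   # fit on train half
    pred = lambda x: np.c_[x, np.ones(len(x))] @ coef
    scores = np.abs(y_cal - pred(x_cal))              # nonconformity scores
    k = int(np.ceil((1 - alpha) * (len(scores) + 1))) # conformal quantile rank
    q = np.sort(scores)[min(k, len(scores)) - 1]
    mu = pred(x_new)
    return mu - q, mu + q

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(400, 1))
y = 3 * x[:, 0] + rng.normal(scale=0.5, size=400)
lo, hi = split_conformal(x[:200], y[:200], x[200:], y[200:],
                         np.array([[0.5]]))
print(lo, hi)  # an interval around 1.5, width set by the noise quantile
```

Note the coverage guarantee holds no matter how lousy the underlying model is; a bad model just buys you wider intervals, which is exactly the honest behavior you want from error bars.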
ML in the presence of lots of noise: The closest thing to a book on it is the bizarro (and awesomely cool) “Pattern Theory: The Stochastic Analysis of Real World Signals” by Mumford and Desolneux, or perhaps something in the corpus of speech recognition and image processing books. This isn’t exactly a cookbook or exposition, mind you: more of a thematic manifesto with a few applications. Obviously, signal processing has something to say about the subject, but what about learners which are designed to function usefully when we know that most of the data is noise? Fields such as natural language processing and image processing are effectively ML in the presence of lots of noise and confounding signal, but the solutions you will find in their textbooks are specifically oriented to the problems at hand. Once in a while something like vector quantization will be reused across fields, but it would be nice if we had an “elements of statistical learning in the presence of lots of noise” type book or review paper. Such a book is missing in action, and other than the specific subfields mentioned above, there are no research groups which study the problem as an engineering subject. New stuff is happening all the time; part of the success of “Deep Learning” is attributable to the Drop Out technique to prevent overfitting. Random forests could be seen as a technique which genuflects at “ML in the presence of noise” without worrying about it too much. Marketing guys are definitely thinking about this. I know for a fact that there are very powerful learners for picking signal out of shitloads of noise: I’ve written some. It would have been a lot easier if somebody wrote a review paper on the topic. The available knowledge can certainly be systematized and popularized better than it has been.
Feature engineering: feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success. Sometimes the features are obvious; sometimes not. Much of the success of machine learning is actually success in engineering features that a learner can understand. I daresay document classification would be awfully difficult without tf-idf representation of document features.
Latent Dirichlet allocation is a form of “graphical model” which works wonders on such data, but it wouldn’t do a thing without tf-idf. [correction to this statement from Brendan below] Similarly, image processing has a bewildering variety of feature extraction algorithms which are of towering importance for that field: the SIFT descriptor, the GIST and HOG descriptors, the Hough transform, vector quantization, tangent distance [pdf link]. The Winner Take All hash [pdf link] is an extremely simple and related idea… it makes a man wonder if such ideas could be used in higher (or lower) dimensions. Most of these engineered features are histograms in some sense, but just saying “use a histogram” isn’t helpful. A review article or a book chapter on this sort of thing, thinking through the relationships of these ideas, and helping the practitioner to engineer new kinds of feature for broad problems would be great. Until then, it falls to the practitioner to figure all this crap out all by their lonesome.
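For reference, tf-idf itself is almost embarrassingly simple. Here is a toy sketch (my own, using raw term frequency and a plain log inverse document frequency; real implementations differ in smoothing and normalization details):

```python
import math
from collections import Counter

def tfidf(docs):
    """tf-idf weights for a list of tokenized documents: term frequency,
    scaled down by how many documents contain the term, so ubiquitous
    words stop dominating the representation."""
    N = len(docs)
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (c / len(d)) * math.log(N / df[t])
                    for t, c in tf.items()})
    return out

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tfidf(docs)
print(w[0]["the"])  # 0.0: appears in every document, carries no signal
print(w[0]["cat"])  # positive: a discriminative term
```

That's the whole trick: the engineering insight is in noticing that raw word counts are a terrible representation, not in any fancy math.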
Unsupervised and semi-supervised learning in general: almost all books, and even tools like R, inherently assume that you are doing supervised learning, or else you’re doing something real simple, like hierarchical clustering, kmeans or PCA. In the presence of a good set of features, or an interesting set of data, unsupervised techniques can be very helpful. Such techniques may be crucial. They may even help you to engineer new features, or at least reduce the dimensionality of your data. Many interesting data sets are only possible to analyze using semi-supervised techniques; recommendation engines being an obvious beneficiary of such tricks. “Deep learning” is also connected with unsupervised and semi-supervised approaches. I am pretty sure the genomics community does a lot of work with this sort of thing for dimensionality reduction. Supposedly Symbolic Regression (generalized additive models picked using genetic algorithms) is pretty cool too, and it’s in my org-emacs TODO lists to look at this more. Lots of good unsupervised techniques such as Kohonen Self Organizing Maps have fallen by the wayside. They’re still useful: I use them. I’d love a book or review article which concentrates on the topic, or just provides a bestiary of things which are broadly unsupervised. I suppose Olivier Chapelle’s book is an OK start for semi-supervised ideas, but again, not real unified or complete.
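Since I mentioned Kohonen Self Organizing Maps: here is a toy sketch of one (my own minimal version, with a linearly decaying learning rate and neighborhood width; serious implementations are cleverer about both). For each sample you find the best-matching unit on a 2-D grid of prototypes and pull it, and its grid neighbors, toward the sample.

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, lr0=0.5, sigma0=3.0):
    """Tiny Kohonen SOM: a 2-D sheet of prototype vectors bent to fit
    the data, giving a topology-preserving dimensionality reduction."""
    rng = np.random.default_rng(0)
    h, w = grid
    W = rng.normal(size=(h, w, data.shape[1]))   # prototype vectors
    yy, xx = np.mgrid[0:h, 0:w]                  # grid coordinates
    for t in range(iters):
        x = data[rng.integers(len(data))]        # one random sample
        frac = t / iters
        lr = lr0 * (1 - frac)                    # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5        # shrinking neighborhood
        d = ((W - x) ** 2).sum(axis=2)
        bi, bj = np.unravel_index(d.argmin(), d.shape)  # best-matching unit
        g = np.exp(-((yy - bi) ** 2 + (xx - bj) ** 2) / (2 * sigma ** 2))
        W += lr * g[:, :, None] * (x - W)        # pull BMU and neighbors
    return W

data = np.random.default_rng(1).normal(size=(500, 3))
W = train_som(data)
print(W.shape)  # (8, 8, 3): a 2-D sheet of prototypes in 3-D data space
```

The whole thing is unsupervised: no labels anywhere, yet you get a map you can stare at, cluster on, or feed downstream as an engineered feature.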
Personal background: I’ve flown Malaysian Airlines and declare it better and more civilized than any US airline. I’ve been to Ukraine on a business-vacation. I’m sympathetic to the aspirations of the long suffering Ukrainian people. I’m also sympathetic to the position of the Russian government with respect to Ukraine, which is, after all, sort of like their version of Canada, if Canada had annexed part of New England in 1991. I am not sympathetic to the claque of sinister war mongers and imperial Gauleiters in the US State department with respect to their activities in Ukraine and towards Russia. If I had my way, creeps like Vicky “fuck the EU” Nuland and Geoff Pyatt would be facing prison and the firing squad for what they’ve done over there. In my opinion, US policy towards Russia since the fall of the Soviet Union has been knavish, evil and disgusting. My opinion isn’t a mere slavophilic eccentricity; George Kennan, our greatest Cold War diplomat, said more or less the same things before he died.
If this was a shoot down by Donetsk separatists, and even if the Russians supplied the missiles to the separatists (who could have captured them from Ukrainian forces, or simply borrowed a couple from the local arms factories), this doesn’t make the Russians culpable for the tragedy. By that logic, the US is responsible for all the bad things done with weapons it supplies to its proxies, such as ISIS in Syria and Iraq, which is arguably worse. Certainly the US is responsible for the escalation of the situation in Ukraine. I say all this, because passions are high, and the war drums are beating. I am not a war monger, or apologist for anybody; in fact, I’m the closest thing you’re going to get to an unbiased observer in this disaster. I have no horse in this race. I wish they’d all learn to get along.
So, the Rooskies are now implying that a Ukrainian Su-25 may have shot down flight MH17. Facts and objective reality seem to be in short supply in Western coverage of the Ukraine crisis; I aim to supply some. I am going with the assumption that the Rooskies are telling the truth, and that there was indeed a Ukrainian Su-25 where they said there was. They said the Su-25 came within 2 to 3 miles of the 777.
Everyone agrees that the Boeing 777-200ER was flying over the separatist region at 33,000 feet. A Boeing 777’s cruising speed is about 560mph or Mach 0.84. Its mass is about 500,000 pounds, and it has a wingspan and length of about 200 feet each. The MH17 was flying from West to East, more or less.
The Su-25 Frogfoot is a ground attack aircraft; a modern Sturmovik or, if you like, a Rooskie version of the A-10 Warthog. The wingspan and length of the Su-25 is about 50 feet each, and the mass is about 38,000lbs with a combat load. The ceiling of an unladen Su-25 is about 23,000 feet. With full combat load, an Su-25 can only make it to 16,000 feet. This low combat ceiling was actually a problem in the Soviet-Afghanistan war; the hot air and the tall mountains made it less useful than it could have been. At altitude, the maximum speed of the unladen Su-25 is Mach 0.82; probably considerably lower with combat loads. For air to air armament, it has a pair of 30mm cannons and carries the R-60 missile. The Su-25 is also capable of carrying the K-13, though it is not clear that the Ukrainians deploy this missile on their Su-25s. For the sake of argument, we’ll talk about it anyway.
Since it was a Ukrainian Su-25, we can also assume it was heading West to East; more or less the same trajectory as flight MH17. It could have been traveling in some other trajectory, but we can already see the problem with an Su-25 intercepting a 777; it’s too low, and too slow. If you want to believe the crackpot idea that the Ukrainian government was a bunch of sinister schemers who shot down MH17 on purpose, an Su-25 is pretty much the worst armed military aircraft you can imagine for such a task. The Ukrainian air force has a dozen Su-27s and two-dozen Mig-29s perfectly capable of intercepting and shooting down a 777. They also have the Buk missile, and are capable of placing it somewhere near the Donetsk separatists if they wanted to make them look bad. So, the theory that the evil Ukrainians shot down a 777 with a Su-25 on purpose is … extremely unlikely.
Could an Su-25 have shot down a 777 by accident? Fog of war and all that? Perhaps they thought it was a Russian plane? Well, let’s see how likely that is. The weapons of the Su-25 capable of doing this are the cannons, the R-60 missile (and its later evolutions, such as the R-73E) and the K-13 missile.
Cannons: impossible. The Su-25 was at minimum 10,000 feet below the 777. This means simply pointing the cannon at the 777 without stalling would have been a challenge. The ballistic trajectory of the cannon fire would have made this worse. The GSh-30-2 cannon fires a round which travels at only 2800 feet per second, significantly lower than, say, the round fired by a .338 Lapua sniper rifle. Imagine trying to shoot down an airplane with a rifle, from 2-3 miles away using your eyeball, in a plane, at a ballistic angle. If the MH17 was somehow taken out by cannon fire, it will have obvious 30mm holes in the fuselage. None have been spotted so far.
K-13 missile: extremely unlikely. The K-13 is a Soviet copy of the 50s era AIM-9 Sidewinder; an infrared homing missile. Amusingly, the Soviets obtained the AIM-9 design during a skirmish between China and Taiwan in 1958; a dud got stuck in a Mig-17. It is not clear that the Ukrainian air force fields these weapons with their Su-25s; they’re out of date, and mostly considered useless. Worse, the effective range of a K-13 is only about 1.2 miles, putting the 777 out of effective range. Sure, a K-13 miiiight have made it to a big lumbering 777 with its two big, hot turbofans, but it seems pretty unlikely; a lucky shot. The K-13’s 16lb warhead is certainly capable of doing harm to a 777’s engines. Maybe it would have even taken out the whole airliner. Doubtful though.
R-60 missile: extremely unlikely. If a Su-25 was firing missiles at a 777, this is probably what it was using. The R-60 is also an IR guided missile, though some of the later models use radar proximity fuzing. Unlike the K-13, this is a modern missile, and it is more likely to have hit its target if fired. Why is it unlikely? Well, first off, it is unlikely the Ukrainian Su-25s were armed with them in the first place: these are ground attack planes, fighting in a region where the enemy has no aircraft. More importantly, the R-60 has a tiny little 6lb warhead, which is only really dangerous to fragile fighter aircraft. In 1988, an R-60 was fired at a BAe-125 in Botswana. The BAe-125 is a sort of Limey Lear jet, weighing a mere 25,000lbs; this aircraft is 20 times smaller than a 777 by mass. The BAe-125 was inconvenienced by the R-60, which knocked one of its engines off, but it wasn’t shot down; it landed without further incident. A 777 is vastly larger and more sturdy than any Limey Lear jet. People may recall the KAL007 incident where an airliner was shot down by a Soviet interceptor. The Su-15 Flagon interceptor which accomplished this used a brobdingnagian K-8 missile, with an 88lb warhead, which was designed to take out large aircraft. Not a shrimpy little R-60. The R-60 is such a pipsqueak of a missile, it is referred to as the “aphid.”
That’s it; those are the only tools available to the Su-25 for air to air combat. The other available weapons are bombs and air to surface missiles, which are even more incapable of shooting down anything which is 10,000 feet above the Su-25.
My guess as to what happened … somebody … probably the Donetsk separatists (the least experienced, least well trained, and least well plugged into a military information network), fired a surface to air missile at something they thought was an enemy plane. It could have been the Buk SA-11/17 with its 150lb warhead and 75,000 foot range, just like everyone is reporting. Another candidate is the Kub SAM, which is an underrated SAM platform also in use in that part of the world. Yet another possibility is the S-125 Pechora, which isn’t deployed in Ukraine or Russia, but it is probably still manufactured in the Donbass region. A less likely candidate is the S-75 Dvina (the same thing that took out Gary Powers), though the primitive guidance system and probable lack of deployed installations in Ukraine and Russia make this unlikely. The fact that the MH17 disappeared from radar at 33,000 feet, and the condition of the wreckage indicates it was something really big that hit flight MH17; not a piddly little aphid missile. The pictures of the wreckage don’t indicate any sort of little missile strike which might have knocked off an engine; it looks like the whole plane was shredded. Both engines came down in the same area, more or less in one piece.
Whatever it was, it wasn’t an Su-25. There is also no use going all “Guns of August” on the Russians over something that was very likely beyond their control. Here’s hoping all parties concerned learn to resolve their differences in a civilized manner.
Interesting links from the rumor mill (as they come in):
Update July 22:
Nobody else has yet noticed that Donetsk manufactures SAMs, or that there are several other potential sources and varieties of such weapons. The Russians are sticking with the Su-25 idea, and haven’t corroborated the Su-27 story, making it seem much less likely.
“Blame the Rooskie” war mongers would do well to remember the Vincennes incident, where the US shot down an Iranian air liner over Iranian airspace, killing a comparable number of innocent civilians.
According to the nation’s editorial pages, the modern era is characterized by international trade, the spread of “democracy” and high technology. Historians from the future will characterize the present as a squalid LED-lit beeping dark age where common sense went to die.
Today’s exhibit, brought to you thanks to their PR department: Vessyl. A $200 electric cup that allegedly tells you what you put in the cup, keeps track of how much of it flows into your gob, and sends messages to your nerd-dildo telephone criticizing your choice of beverages. The idea is to help people make “healthier choices” with respect to caffeine intake and liquid calories.
From their website:
“A key feature of the cup is fundamental hydration-tracking, estimating how much you need for peak hydration. You can tell if you need more water or not through what the company dubbed Pryme. As Business Insider explained, ‘You simply tilt the cup to activate the display. That blue light at the top means you’re fully hydrated. Throughout the day, that line will fluctuate.’”
Or, you could take a drink of water when you’re thirsty, you fucking dumbasses. Or should I say, dymasses. Just starting from the obvious: unless you also pee, shit, breathe and sweat in your dumb magic cup, Gauss’s Law dictates that a cup actually has no clue as to the state of hydration of the human container you’re pouring liquid in. What happens if I am exercising? What if I am in a desert, or the Antarctic? What if I contract another case of Ukrainian amoebic dysentery and am doubled over and shitting water and blood? What if I have a ‘drinking problem’ like the guy on Airplane?
No matter how good their models are or how many data scientists they hire: no magic cup can tell how well hydrated I am. Thirst, on the other hand, tends to work pretty well as a way of regulating hydration. Or, if optimal hydration is important, use the technique they use in the army and Burning Man and drink water until you pee clear.
There are other things obviously wrong with this muppet idea. For example: the concept that someone needs a $200 electric cup to tell them that drinking soda or liquor is going to make them into fat drunkards, or that coffee keeps you awake at night. What most fat people need is not a sensor in their cup, but a sensor in their mouth, with a loudspeaker which shouts insults at them every time they stick pasta, ho-hos and icecream in it. Fundamentally though: why do I need my cup to tell me what I put in my cup? I put it there. What kind of neurotic space cadet needs a $200 cup to tell them what they put in their cup?
The “quantified self movement” is one of the most godawful dorky things ever to have caused squeals of nerdy delight at gaseous TED Chautauquas. The fundamental idea behind such things is sound: muscle heads, coaches and athletes have kept food logs and workout notebooks for as long as there have been muscle heads, coaches, athletes and the ability to write things down. Data is useful, but no excel spreadsheet or preposterous algorithm is going to do the thinking for you. You have to find the patterns yourself. You almost always have to write the data down yourself as well. Finally, you have to do experiments which test for outcomes: A/B testing is actually kind of hard when performed on a human being. There are fancier ways to infer patterns than A/B testing (coaches use them instinctively), but the chances of average individuals using such statistical tools productively is approximately nil. Most people don’t even know where to start. All the “quantified self” thing does is attempt to give lazy people with too much money access to the ancient technology known as “a notebook,” which is far more general and useful. Emacs org mode if you want to get all technological. The results speak for themselves. Old school notebooks work better in achieving real world results. Technology is a distraction, and no amount of technology can make up for a lack of character.
Whatever problems this “vessyl” purports to solve are more effectively solved without the use of electrical devices. Fat people need to eat less, and stop drinking calories. Insomniacs should drink less or no coffee. No nerd dingus or $200 electric cups will be required. About the only genuine utility I can think of for this is using it to attempt to detect date rape drugs, and it doesn’t claim to be able to do that. Not that anyone would use it when they’d need such a thing, but at least it is a legitimate application of an alleged food sensor that costs $200.
But hey, don’t listen to me: listen to what “leaders” tell you:
Journalism, in the ideal world, is supposed to inform the citizenry of facts important to their well being. Modern journalism seems to involve issuing press releases from the oligarchical reptiles who are destroying Western Civilization. Maybe I am a naive fool, and it was always thus. Either way, Michael Lewis’s latest book lends credence to the view that he is a very modern journalist.
Lewis’s book purports to be about high frequency trading. He manages to write several hundred pages of gobbledygook without actually speaking to a High Frequency Trader (unless you count his incongruous encounter with poor Sergey Aleynikov). The story Lewis actually tells is one of incompetent sell side traders who started an exchange which serves the interests of wealthy buy siders and shady brokers.
Brad Katsuyama is the hero of the book. Lewis’s recent books use the dreary trope of the band of clever and plucky outsider misfits who take on the establishment. Katsuyama’s misfittery is that he’s an Asian who is good at sports, bad at math and computers; and even though he worked for a Wall Street bank, he went to a crappy Canadian school instead of Yale-vard. Among his misfit sidekicks are a potato-wog who is good at network ops, but who could never get a break on the street (I can relate). Also, a fat grouch from Brooklyn, a computer genius, a puzzle wizard and a few other guys who fade into the woodwork. They worked for RBC: a Canuck bank which is supposedly the least Wall Streety place on the Street.
The plucky outsiders in this story are not portrayed in a particularly flattering way. In fact, they come off as dimwitted incompetents. Katsuyama was an old school block shopping sell side trader. If you remember my previous pieces on HFT naysayers: Joe Saluzzi was also a sell side block shopper. Old fashioned sell side guys have obsolete jobs. Their jobs are to find liquidity for “buy side” customers buying into or liquidating a large position. Katsuyama’s anger at the idea that “the market is rigged” seems the simple rage of a man who has been assigned a task he is not qualified for. There are tales of him and his team wasting hundreds of thousands in RBC money executing bad trades to see what happens. They seemed shocked, shocked, that the market would move away from their ham-fisted dumpings of huge blocks of shares to someone else’s routing system.
Lewis keeps going on about how “nobody understood” any of this back in 2009, except for his plucky outsider heroes. If “nobody” understood it, how was I able to write about it on my blog in 2009? Over 100,000 people read my various blogs on HFT that year. If you were not among the elite group of more than 100,000 insiders who read blogs, any punter could have purchased the Larry Harris book “Trading and Exchanges” available on Amazon.com for $71.58 + tax. This is how I originally clued myself in (thanks FDAX-H). Larry’s book was published in 2002. In early 2010 Barry Johnson published the book, “Algorithmic Trading and DMA” which explains the profession dedicated to getting a good fill on the modern electronic trading landscape. So, in 2010, there was not only a job description, “algorithmic trader,” for getting a good buy-side fill, there was also a “how to” book on the subject. Such people perform the function that used to be done by sell side people like Katsuyama and Joe Saluzzi. Lewis repeatedly states that this was a mysterious topic and nobody was talking. Actually, it is an extremely well understood topic; library shelves groan with volumes dedicated to the subject.
No books were really needed; history and experience should suffice. Back in the days of pit traders, if you threw a huge order at the pit, you might get a fill on a couple of round lots. The rest of the pit is going to change their prices, because they figure anyone swinging 10,000 or 100,000 share orders around must be informed traders. If they’re informed traders, they need to pay for their immediacy. Informed traders may be criminal insider-trader creeps, they may be people with really good trading strategies; it doesn’t matter; they’re informed somehow: they know stuff. If the market maker doesn’t adjust their prices in front of an informed trader, the market maker will go bankrupt. That’s market economics 101. As I previously described it in 2009 in the Three Stooges of the High Frequency Apocalypse:
What happens when you buy something? … If you want it for cheap, you sit around and look at different markets (ebay, amazon, craigslist) until someone displays a price you find acceptable. If you want that “something” right now, you drive to a store and buy it. You’ll almost certainly pay a little more at the store, because they need to make enough money to pay employees to prevent barbarians from stealing everything, and to keep the lights on and other such things for your convenience. You can also generally return what you bought to the store much easier than to ebay or amazon. You’re paying for the immediacy (buy it now!) and liquidity (buy as many as you want!) provided by the store. This is a service which costs money.
Immediacy costs money. Markets have always moved prices away from large orders. Market participants have always been able to cancel or move a limit order. That’s one of the features of the limit order. If Katsuyama didn’t understand these simple facts, he had no business collecting a $2 million a year paycheck shopping blocks for his customers, because he didn’t understand the basics of his profession. It’s possible that Lewis simply misunderstood something Katsuyama explained to him. It’s also possible that Katsuyama is a shark who told Lewis a lot of bullshit to get good press for IEX. Either way, that leaves two possibilities: either Lewis is a credulous idiot who is not competent as a journalist, or Katsuyama is an idiot who was not competent as a trader. Take your pick.
Where it gets interesting is where Lewis claims bigshot buysider crybabies like Loeb and Einhorn never heard of any of this. They made it sound as if, back in the day when Loeb and Einhorn were paying 1/8 of a dollar spreads to knuckle-dragging pit orcs, no rock-ribbed he-man trader with 10lbs of undigested beef in his lower intestine would dare move his price away from where Loeb and Einhorn wanted it. Why, moving the price away from a big order: that’s un-American!
So … these “plucky underdogs” helped Katsuyama form a new stock exchange, IEX. They claim that no sort of nefarious activity is possible on IEX, because, well, “trust us!” Liquidnet’s average cross is 45,000 shares, over 100 times the vaunted liquidity figures provided by IEX. If I traded stocks, why should I trust IEX over Liquidnet? Because Michael Lewis says they’re honest guys? If I believe the tales of Michael Lewis, the founders of IEX are a collection of “traders” who do not know how to trade, and the market itself is owned by … buy side traders. He seems to give IEX sloppy wet kisses for honesty, yet sees nothing wrong with the fact that they’re owned by a bunch of buy side guys. They’re also owned by some unknown buy side guys, which does not inspire confidence. Buy side guys, if they’re good at their jobs, are informed traders. Nobody wants to trade against informed traders. Everyone wants to trade against noise traders.
IEX has simple order types; limit, midpoint, fill or kill and market: I approve of this. On the other hand:
“IEX follows a price-priority model first, then by displayed order second. Then comes broker priority, which means a broker will always trade with itself first, which Katsuyama described as “free internalization.” He explained that brokers do not pay IEX to trade should an order be matched against another order from that same broker. This, he added, offers brokers incentive to trade in IEX.”
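The priority scheme described in that quote is easy to pin down concretely. Here is a minimal sketch, in Python, of how a resting-order queue would be sorted under “price first, displayed second, then broker priority.” The brokers, prices, and class names are hypothetical illustrations, not IEX’s actual matching engine:

```python
# Hypothetical sketch of the priority model described in the quote:
# price priority first, displayed orders second, then broker priority
# (a broker's own resting order jumps ahead of other brokers' orders
# at the same price and display status). Not IEX's real engine.
from dataclasses import dataclass

@dataclass
class RestingOrder:
    broker: str
    price: float
    displayed: bool
    arrival: int  # sequence number; earlier = higher time priority

def priority_key(order, incoming_broker, side="buy"):
    """Sort key for resting orders facing an incoming order.
    Lower tuple sorts first; for resting buys, higher price wins."""
    price_rank = -order.price if side == "buy" else order.price
    return (
        price_rank,                                   # 1. price priority
        0 if order.displayed else 1,                  # 2. displayed before hidden
        0 if order.broker == incoming_broker else 1,  # 3. broker priority
        order.arrival,                                # 4. then time
    )

book = [
    RestingOrder("GS",  10.00, True,  1),
    RestingOrder("MS",  10.01, False, 2),
    RestingOrder("GS",  10.01, True,  3),
    RestingOrder("JPM", 10.01, True,  4),
]
# An incoming sell from GS: GS's displayed 10.01 order fills first
# (broker priority), then JPM's displayed 10.01, then MS's hidden
# 10.01, then the inferior 10.00 bid.
queue = sorted(book, key=lambda o: priority_key(o, incoming_broker="GS"))
```

The third tuple element is the “free internalization” part: the same-broker order leapfrogs equally-priced, equally-displayed orders from everyone else.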
Hey now, wait a minute. Internalization and broker priority is pretty much the same thing as dark crossing, which Lewis was trying to tell us was bad. Now it’s supposed to be OK when Goldman does it? Later, Lewis actually quotes Katsuyama saying there were only a few brokers acting in their customers’ interests:
“Ten,” Katsuyama said. (IEX had dealings with 94.) The 10 included RBC, Bernstein and a bunch of even smaller outfits that seemed to be acting in the best interests of their investors. “Three are meaningful,” he added: Morgan Stanley, J. P. Morgan and Goldman Sachs.
I think this is the crux of this story: according to Michael Lewis and Katsuyama, we’re supposed to trust people like Einhorn who have been convicted of insider trading, people who are suspected of insider trading (buy side is by definition rife with this; particularly firms that do merger arb and special events), J.P. Morgan, Goldman and Morgan Stanley: we’re supposed to trust these guys more than we’re supposed to trust a bunch of tiny little market making firms who had been inconveniencing them by taking away some of their flow. Lewis tries to make this seem like a battle between the underdog “good guys” and the evil establishment. To believe this, you’d have to believe that Goldman Sachs and people like Einhorn are underdogs, rather than the actual establishment. To believe this, you’d have to believe the tiny industry of HFT traders actually rules the world and buys off congressmen and the SEC more than … J.P. Morgan and Goldman.
To give you a sense of scale: the largest HFT firm I know of, KCG, has operating cash flows of $140 million a year and a modest market cap of $1.4 billion (betcha didn’t know it was a publicly traded company: Lewis certainly doesn’t mention it). JPM has operating cash flows of $100 billion a year, almost a trillion on the balance sheets, and a quarter trillion or so in market cap. David Einhorn is personally worth $1.25 billion dollars. KCG’s entire market cap is only slightly more than that, and it employs 1200 people. Yet, somehow the HFT firms are the evil establishment, and JPM and Einhorn are … the plucky underdogs standing up for truth, justice and market makers not changing their quotes when some reptilian oligarch dumps 200,000 shares of YoyoDyne on the market.
Yeah, I might believe that. I might believe that if I were a dribbling retard.
Doing a bit of investigation into who owns IEX: we have the $13.2 billion activist shareholder fund Pershing Square, owned by Bill Ackman, another “underdog” worth $1.2 billion. We have the $6.7 billion Senator Investment Group. Scoggin Capital is only worth $1.8 billion; they do distressed debt and mergers, and have managed to have only one down year in 25. Another investor is venture capitalist Jim Clark, net worth $1.4 billion. He is particularly noteworthy as being a pal of Michael Lewis, and almost certainly the guy who made the introductions to the “flash boys” at IEX. Brandes Investment Partners is an old $29 billion AUM politically influential money management firm doing value investments, and is run by another billionaire. Third Point, a hedge fund with $15 billion, also working in special situations aka “distressed debt and mergers,” is run by Danny Loeb (who also miraculously has only one down year). Another investor in IEX is a little place called Capital Group Companies, one of the biggest buy side investors in the world, with $1.15 trillion AUM. Capital Group has been more or less scientifically proven to be one of the most powerful and influential corporations in the world.
You get the idea: IEX is not owned by plucky underdogs. It is owned by very rich and powerful “buy side” people. People who find the present system of liquidity provision inconvenient. Buy side has always found liquidity providers inconvenient; at the very least, they had to pay old school “sell side” traders like Katsuyama to work the trades for them. There wasn’t much they could do about it until now. Now that they own almost everything, they can open their own damn stock exchange and buy some cheap brokerage flow. That, and unleash Michael Lewis, the FBI and the New York Attorney General on the peasants who make them pay for liquidity.
I don’t think IEX and their investors represent the interests of “the little guy” at all. The actual little guy (aka people like me) does pretty well making small orders with the present system. If you believe Lewis’s book, the thing we’re supposed to be worried about is telegraphing a big buy or sell by routing your order to several different exchanges. The thing is, “the little guy” doesn’t make large buy or sell orders, and unless he does, what Lewis describes is impossible. The people IEX benefits are exclusively preposterously wealthy buy side people. That and the brokerages who get to trade against the pieces of their flow that they want. Pardon me if I notice that such people aren’t exactly tribunes of the people. What’s actually going on here is the brokers are, as usual, taking the flow. They’re giving up some of the leftovers to the buy side guys, who also pocket the exchange fees. If you’re worried about flow or think the present system of liquidity provision is somehow predatory: this is a buzzard and a hyena sharing a carcass.
I have no dog in this race: I’m not an HFT, I have never taken a dime from any exchange, and I haven’t so much as executed a stock trade in 4 years. Everything I’ve read, and all the traders (buy side and otherwise) I’ve spoken with, seem to think that HFT market makers have improved things from the pre-decimalization bad old days of pit traders who got their jobs because they went to the right New York City high schools. I know for a fact that HFT market making as a business is nowhere near as profitable as it was even a few years ago. This is exactly what you would expect when you have lots of smart people competing in a not-so-profitable business. I don’t think the use of computers makes markets any more inherently dangerous, any more than the use of computers in automobiles makes them more inherently dangerous. If you asked me what I thought the worst thing about the present system was, it would be the profusion of weird order types. That is something IEX, to their credit, gets right. There are actual frauds in HFT, just like there are in any other business involving money, from the Avon lady on up the food chain. The worst HFT tort I can think of is the practice of “quote stuffing.” Lewis (of course) never mentions this, and I have read nothing which indicates IEX is ready for it.
I know a few HFT type people. One of ‘em might even be as rich as Michael Lewis. So far, all the ones I have met are clever and decent people, and I figure whatever they’ve managed to earn by the sweat of their brows, they deserve it. I’m not real pleased with the idea of a small group of decently paid, politically helpless nerds being the fall guys for a bunch of crooked oligarchs who don’t want to pay for their liquidity.
Speaking of which: FREE SERGEY
This review by a trader lists 15 more technical inaccuracies in the book. He also noticed that broker priority is shady business if we’re talking about helping “the little guy” here.
This trader gives a really great review.