To my mind, the most mathematically interesting thing about roulette is the betting system you should use to maximize your wins. Bet sizing systems are important in all probabilistic games, and the types of lessons learned from a winning game of roulette are the same types of lessons you need to learn in betting on other things, like success in trading, or having an edge on the wiener dog races. The nice thing about a game of roulette is it is relatively easy to characterize your edge. Most people’s edge over the roulette wheel is negative, so you should not bet. If you built one of the computer gizmos I went over in part 2, you have a positive edge over the roulette wheel.
We know from results in information theory that sequential bets in the presence of an edge should be sized according to the Kelly Criterion to maximize bankroll growth rate:

f* = edge / odds

or, in more probabilistic terms,

f* = (bp − q) / b

where p is the probability of success, q = 1 − p, and b is the payout odds (b:1).
It’s probably not immediately obvious why this is so, but consider a biased coin toss at even odds ($1 payoff for $1 bet). If your coin’s edge is 100%, you gain money fastest by betting your whole bankroll. If you have 0% edge, you shouldn’t bet anything. If you have a 1% edge, you should bet 1% of your bankroll.
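As a sanity check on those numbers, here is a minimal Python sketch of the Kelly fraction for a bet at b:1 odds; the function name and the probabilities below are my own illustration, not from Kelly or Thorp:

```python
def kelly_fraction(p, b=1.0):
    """Kelly fraction f* = (bp - q)/b for win probability p at b:1 payout odds."""
    q = 1.0 - p
    return (b * p - q) / b

# The even-odds (b = 1) cases from the text:
assert abs(kelly_fraction(1.0) - 1.0) < 1e-12    # 100% edge: bet the whole bankroll
assert abs(kelly_fraction(0.5) - 0.0) < 1e-12    # no edge: bet nothing
assert abs(kelly_fraction(0.505) - 0.01) < 1e-12 # 1% edge: bet 1% of bankroll
```

At even odds the formula collapses to f* = p − q = 2p − 1, which is why "bet your edge" is the right rule of thumb for the biased coin.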
Daniel Bernoulli came up with the same fraction long before Kelly, by maximizing the geometric mean.
Kelly’s original paper figured this out by modeling how a bettor would place bets assuming he had insider information transmitted as a binary code over a noisy wire; a beautiful way of thinking about predictions in the presence of noise. Kelly is a guy I wish had lived longer. He dropped dead at the young age of 41; in his short life he was a Naval aviator in WW-2, invented computer speech synthesis, made huge contributions to information theory, and mentored important mathematicians (Elwyn Berlekamp, who went on to run Axcom/Rentech, based in part on Kelly’s insights). He had the kind of life that would be considered hyperbole if he were a character in a science fiction novel. They make big men in Texas. Kelly was a giant.
I’ve been known to take sadistic glee in making fun of economists. One of the most mockable economists in American history is (Nobelist -the Swedes have dry humor) Paul Samuelson. One could write entire books on the ways in which Samuelson was a scoundrel and a numskull who set back human knowledge by decades. One fact will suffice for this essay: Samuelson didn’t believe in Kelly betting. Explaining why he thought this, and why he’s wrong would be pointless; debugging an economist’s faulty thought processes is as pointless as explaining why a crazy lady is breaking dishes in the kitchen. If you’re interested, Ed Thorp is your man here also.
Following Ed Thorp’s original essay in the Gambling Times, as good little experimental physicists, we need to build up an error budget to figure out our edge. Thorp breaks down the errors in his and Shannon’s Roulette system into several kinds.
- E1 Rotor speed measurement error
- E2 Ball speed measurement error
- E3 Ball rotor path randomness
- E4 Ball stator path randomness
- E5 Fret scatter
- E6 Rotor tilt (discovered by Shannon and Thorp)
Uncorrelated errors add in quadrature (as the square root of the sum of squares), so the total error budget is

E = √(E1² + E2² + E3² + E4² + E5² + E6²)
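As a quick illustration of adding uncorrelated errors in quadrature, here is a Python sketch; the six magnitudes below are made-up placeholders, not Thorp's actual measurements:

```python
import math

# Hypothetical error magnitudes E1..E6 (placeholder values, not Thorp's
# numbers), in arbitrary but consistent units.
errors = [0.3, 0.4, 0.1, 0.2, 0.15, 0.05]

# Uncorrelated errors add as the square root of the sum of squares.
total_error = math.sqrt(sum(e * e for e in errors))
print(total_error)
```

Note that the total is dominated by the largest terms: squaring makes the small error sources nearly irrelevant, which is why an error budget tells you where to spend your engineering effort.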
The Thorp/Shannon roulette system had a 44% edge on the most favored number; single-number payouts in Vegas are 35:1, making the correct Kelly bet on one number 0.44/35 ≈ 0.0126 of bankroll. Since nobody in 1960s Vegas suspected the mathematical machinators of having a physics edge on the wheel, they were able to place larger bets on parts of the quadrant. Thorp describes this as “diversification” in his exposition; another way of thinking about it is that he’s just playing more games at once. A friend and former customer explained his trend-following method as working in much the same way: the more bets you place, the more likely you’ll hit a winning trend.
Kelly betting isn’t a perfect solution in all cases; fixed fraction betting has certain disadvantages when you can’t exactly characterize your edge, or the payout odds, or you have a limited number of bets before you have to cash in your chips. However, in the case of a machine to beat Roulette, it’s difficult to think of a better technique.
Of course, Kelly betting and things like it figure in other sorts of betting; people do use it in markets where it is appropriate. Supposedly it was part of Axcom/Rentech’s early secret sauce, and certainly folks who have thought about trading need a bet sizing and risk management strategy that makes sense. Kelly is often a good place to start, depending on your situation. But that’s a topic for another blog post. One more post is coming on modern techniques to beat Roulette, including the one I came up with in 2010 (which, in case you were holding your breath, didn’t really work, which is why I have to work, and am willing to talk about such things in blogs).
Kelly criterion resources
Kelly’s original paper: J. L. Kelly, Jr., “A New Interpretation of Information Rate,” Bell System Technical Journal, 1956.
The common method for interning strings breaks in fantastic ways. In Kerf, we’ve taken the old method and revised it for success in the current generation of languages.
If you’ve forgotten what string interning is from your time with Java, it’s a simple way of ensuring that any string appears only once. So for instance, if I have ten objects whose type is the string “lion”, I don’t have ten copies of the string “lion” in memory. What I have instead is a single copy of “lion” and then ten links which all point to the single copy somehow.
Most often these links are pointers (raw addresses in memory). We’ll discuss how this breaks soon. The reference copy of “lion” is never allowed to be moved, changed, or released. It’s permanent. The other implementation detail to figure out is how to keep the strings unique. The next time we create a “lion” string we need to trade our string in for a link, and what is usually done is that the reference copy is stored in a hashtable or some other deduplicating data structure. This lets us figure out that lion already exists, and when we perform the check we can walk away with our link at the same time. If the phrase is “bullmation”, and it doesn’t already exist, then we can add the initial copy to the data structure at the same time.
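A toy version of that scheme in Python shows the mechanics; the names here are my own illustration (CPython ships its own sys.intern, but the point is the deduplicating table):

```python
# A bare-bones intern table: a dict maps each string to its single
# reference copy, and intern() hands back that copy (the "link")
# instead of the caller's duplicate.
_intern_table = {}

def intern(s):
    """Return the canonical copy of s, adding s as the reference copy if new."""
    canonical = _intern_table.get(s)
    if canonical is None:
        _intern_table[s] = s   # first sighting: s becomes the reference copy
        canonical = s
    return canonical

a = intern("lion")
# "".join builds a distinct string object with equal contents,
# so only the intern table can make these the same object:
b = intern("".join(["li", "on"]))
assert a is b   # both names now point at the single reference copy
```

The dict lookup does double duty, exactly as described above: it detects that “lion” already exists and hands back the link in the same operation.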
Every now and then I run into a piece of technology which I find completely mind boggling. Something that shouldn’t really exist, but does anyway. The Tatra 603 is one of these things.
For one thing, it’s a communist automobile from the former Czechoslovakia, released in 1956. You know: the communists, the people who brought us the Trabant and the Lada. The first thing you notice is that, unlike the Trabant or Lada, or even a Skoda, the Tatra is pretty.
Looking under the hood, well, you’ll find … nothing, because it’s a rear engined car, like an old Porsche. Looking in the trunk, you find … an air cooled V-8 which is insane and amazing. The only air cooled cars most people ever see are Porsches. So basically what we have here is a 6-passenger Porsche with a rumbley motor in it.
Apparently it handled like a giant Porsche also. It was also hand-made like an old Porsche. It was only a 100 horsepower V-8, but it was also a light car with a stick shift. Sort of like one of the 1930s era Jaguar sedans, except with a rear engine and the power curve of a V-8, rather than a straight six.
This mind-blowing 1962 communist ad for the Tatra 603 … well, gear heads have to promise to take 13 minutes of their lives to watch this. First off, consider the fact that this was a car only allowed to high communist officials, who got professional chauffeurs. I guess high communist officials just sat around all day and watched 13-minute-long commercials about the glorious products of the people’s Tatra factory. Second … I mean, look at the driving insanity. Road hogging, drifting in a rear-engined car, reckless Steve McQueen style hot-dogging (I’ve been on the very same roads; these guys are nuts), off-road mud-bogging, outrunning them silly Boss Hoggski policemen, hill climbing, driving on sidewalks, and doing doughnuts in Český Krumlov: they even rolled the damn car down a hill and drove away, just to show it could be done. What the hell, communist bloc leaders? Either these guys had more fun being communist officials than any other group of people in all of human history … or I don’t know what to think. Either way, try to imagine any of this in an American car ad at any point in history. And then remember this was communism; communism was never sold as a fun ideology; it was a grim and serious ideology covered in human blood. Just skip to the middle if you don’t have the same amount of free time as a high communist party official.
The vague resemblance to the VW bug is no coincidence. The 1930s Tatras were innovators in streamlined cars. The Tatra-77 was a direct ancestor, and the designer (Paul Jaray) was involved with Zeppelin design before he started fooling with cars. The aerodynamics of old Tatras were often better than modern cars, and the VW bug design was lifted directly from Tatra economy cars such as the V570 and the T97.
The communists had only been running the country for a few years when this thing came out in 1956, so it’s really an old capitalist/Paul Jaray design that ended up being made by commies, but it’s pretty damn cool that they kept it going until 1976. Also, the commercial makes me want to study dialectical materialism, so I can have a chauffeur and decorous, refined bimbo to drive around like a maniac with. I’m presuming that everyone in the car was completely schnockered on pivo and slivovitz, and am just a bit disappointed they weren’t all smoking like chimneys through the whole adventure.
One of the most powerful features of Kerf is its combinators. Kerf is a vector language: you can operate on vectors and matrices as if they were natural data units, achieving interpreter speedups over looping constructs. While everyone is used to for loops in an interpreter, interpreters do very badly on them, which is why people programming in R and Matlab are always encouraged to use the vector constructs where available. If you think about what goes on inside various kinds of interpreter, you realize that a lot is happening inside a for loop.
Depending on how the interpreter is implemented, you may have to parse each line in the loop for every iteration in the loop; you have to evaluate test conditions, maintain state and so on.
Even if your loop is trivial and does some element-wise operation on vectors, it ends up going slower than it would as a vector operation. Interpreters need to check things, build stacks, and move things around in a for loop.
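A rough Python illustration of that overhead: the explicit for loop below does the same million additions as the built-in sum(), which loops in C, and the interpreted loop reliably comes out slower:

```python
import timeit

setup = "a = list(range(1_000_000))"

loop_stmt = """
total = 0
for x in a:
    total += x
"""

vec_stmt = "total = sum(a)"

# Time both versions; sum() avoids the per-iteration interpreter work
# (fetching bytecodes, maintaining loop state, boxing intermediate values).
t_loop = timeit.timeit(loop_stmt, setup=setup, number=5)
t_vec = timeit.timeit(vec_stmt, setup=setup, number=5)
print(f"loop: {t_loop:.3f}s  builtin: {t_vec:.3f}s")
```

The exact ratio depends on the interpreter, but the gap is the same phenomenon the R and Matlab folks are warning about.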
Compilers can generally optimize down to the metal, so that’s one way out. In Kerf, you can use combinators to help the interpreter avoid these problems. For example, summation amounts to putting + in between all the elements of a vector. Fold puts the function to its left in between all of the elements of the thing on its right.
timing(1)
a:rand(1000000,1.0)
+ fold a
  1ms
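For readers without Kerf, the fold idea can be sketched in Python with functools.reduce, minus the vectorized speed:

```python
import functools
import operator

# reduce() puts the operator "between" the elements, the way Kerf's
# fold does with +: reduce(add, [1.0, 2.5, 3.5]) is 1.0 + 2.5 + 3.5.
a = [1.0, 2.5, 3.5]
total = functools.reduce(operator.add, a)
assert total == 7.0

# In practice you'd reach for the vectorized primitive (here, sum()),
# which does the same fold in compiled code:
assert total == sum(a)
```

The combinator expresses the whole computation in one shot, so the interpreter can hand it off to fast compiled machinery instead of stepping through a loop body.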
This is a trivial application, but it illustrates the power of the idea, and its speed in action. Tell the interpreter to do a lot of things at once, and it will be done at close to machine speed, even if there is no compiled primitive to accomplish the task. With a good set of combinators you can achieve the satori known as “No Stinking Loops.”
Sum is a trivial example (there is a primitive in Kerf for this), but one which illustrates the power …. continued at the Kerf blog