Locklin on science

Machine learning & data science: what to worry about in the near future

Posted in machine learning by Scott Locklin on July 9, 2018

Henry Kissinger  recently opined about machine learning. OK, he used the ridiculously overblown phrase “AI” rather than “machine learning” but the latter is what he seemed to be talking about. I’m not a fan of the old reptile, but it is a reasonably thoughtful piece of gaseous bloviation from a politician. Hopefully whoever wrote it for him was well compensated.


There are obvious misapprehensions here; for example, noticing that chess programs are pretty good. You’d expect them to be good by now; we’ve been doing computer chess since 1950. To put this in perspective; steel belted radial tires and transistor radios were invented 3 years after computer chess -we’re pretty good at those as well. It is very much worth noting the first important computer chess paper (Shannon of course) had this sentence in it:

“Although of no practical importance, the question is of theoretical interest, and it is hoped that…this problem will act as a wedge in attacking other problems—of greater significance.”

The reality is, computer chess largely hasn’t been a useful wedge in attacking problems of greater significance.  Kissinger also mentioned Alpha Go; a recent achievement, but it is something which isn’t conceptually much different from TD-Gammon;  done in the 1990s.

Despite all the marketing hype coming out of Mountain View, there really hasn’t been much in the way of conceptual breakthroughs in machine learning since the 1990s.  Improvements in neural networks have caused excitement, and the ability of deep learning to work more efficiently on images is an improvement in capabilities. Stuff like gradient boost machines have also been a considerable technical improvement in usable machine learning. They don’t really count as big conceptual breakthroughs; just normal improvements for a field of engineering that has poor theoretical substructure. As for actual “AI” -almost nobody is really working on this.

Nonetheless, there has been progress in machine learning and data science. I’m betting on some of the improvements having a significant impact on society, particularly now that the information on these techniques is out there and commodified in reasonably decent software packages. Most of these things have not been spoken about by government policy maker types like Kissinger, and are virtually never mentioned in dopey “news” articles on the subject, mostly because nobody bothers asking people who do this for a living.

I’d say most of these things haven’t quite reached the danger point for ordinary people who do not live in totalitarian societies, though national security agency type organizations and megacorps are already using these techniques or could be if they weren’t staffed with dimwits. There are also areas which we are still very bad at, which are to a certain extent keeping us safe.

The real dangers out there are pretty pedestrian looking, but people don’t think through the implications. I keep using the example, but numskull politicians were harping on the dangers of Nanotech about 15 years ago, and nothing came of that either. There were obvious dangerous trends happening in the corporeal world 15 years ago which had nothing to do with nanotech. The obesity rate was an obvious problem back then, whether from chemicals in the environment, the food supply, or the various cocktails of mind altering pharmies that fat people need to get through the day. The US was undergoing a completely uncommented upon and vast demographic, industrial and economic shift. Also, there was an enormous real estate bubble brewing. I almost think numskull politicians talk about bullshit like nanotech to avoid talking about real problems. Similarly politicians and marketers prefer talking about “AI” to issues in data science which may cause real problems in society.

The biggest issue we face has a real world example most people have seen by now. There exist various systems for road toll collection. To replace toll takers, people are encouraged to get radio tags for their car like “ezpass.” Not everyone will have one of these, so the government’s choices are to continue to employ toll takers, removing most of the benefit of having such tools, or use an image recognition system to read license plates, and send people a bill. The technology which underlies this system is pretty much what we’re up against as a society. As should be obvious: not many workers were replaced. Arguably none were; though uneducated toll takers were somewhat replaced by software engineers. The real danger we face from this system isn’t job replacement; it is Orwellian dystopia.

Here is a list of  obvious dangers in “data science” I’m flagging over the next 10-20 years as worth worrying about as a society.

1) Face recognition software  (and to a lesser extent Voice Recognition) is getting quite good. Viola Jones  (a form of boosted machine) is great at picking out faces, and sticking them in classifiers which label them has become routine. Shitbirds like Facebook also have one of the greatest self-owned labeled data sets in the world, and are capable of much evil with it. Governments potentially have very good data sets also. It isn’t quite at the level where we can all be instantly recognized, like, say with those spooky automobile license plate readers, but it’s probably not far away either. Plate readers are a much simpler problem; one theoretically mostly solved in the 90s when Yann LeCun and Leon Bottou developed convolutional nets for ATM machines.


2) Machine learning  and statistics on large data is getting quite respectable. For quite a while I didn’t care that Facebook, google and the advertisers had all my data, because it was too expensive to process it down into something useful enough to say anything about me. That’s no longer true. Once you manage to beat the data cleaning problems, you can make sense of lots of disparate data. Even unsophisticated old school stuff like éclat is pretty helpful and various implementations of this sort of thing are efficient enough to be dangerous.

3) Community detection. This is an interesting bag of ideas that has grown  powerful over the years. Interestingly I’m not sure there is a good book on the subject, and it seems virtually unknown among practitioners who do not specialize in it. A lot of it is “just” graph theory or un/semi-supervised learning of various kinds.


4) Human/computer interfaces are getting better. Very often a machine learning algorithm is more like a filter that sends vastly smaller lists of problems for human analysts to solve. Palantir originated to do stuff like this, and while very little stuff on human computer interfaces is open source, the software is pretty good at this point.

5) Labels are becoming ubiquitous. Most people do supervised learning, which … requires labels for supervision. Unfortunately with various kinds of cookies out there, people using nerd dildos for everything, networked GPS, IOT, radio tags and so on; there are labels for all kinds of things which didn’t exist before. I’m guessing as of now or very soon, you won’t need to be a government agency to track individuals in truly Orwellian ways based on the trash data in your various devices; you’ll just need a few tens of millions of dollars worth of online ad company. Pretty soon this will be offered as a service.
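For readers who have not met the "old school" frequent itemset mining mentioned in item 2, here is a minimal sketch of the Eclat idea in Python. It is an illustration under assumptions, not any particular library's implementation: the function name `eclat`, the toy shopping baskets, and the support threshold are all made up for the example. The trick is the vertical layout: store, for each item, the set of transaction ids it appears in, and get the support of a larger itemset by intersecting those sets.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset
    that appears in at least `min_support` transactions."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs; each tidset is already
        # the set of transactions containing prefix + item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Only combine with later items so each itemset is built once.
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    narrower.append((other, shared))
            if narrower:
                extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"beer", "chips", "diapers"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"beer", "chips", "diapers", "salsa"},
]
patterns = eclat(baskets, min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Nothing here is clever, which is the point: set intersections are cheap, so this style of mining scales to a lot of disparate data once the cleaning is done.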


Ignorance of these topics is keeping us safe

1) Database software is crap. Databases are … OK for some purposes; they’re nowhere near their theoretical capabilities in solving these kinds of problems. Database researchers are, oddly enough, generally not interested in solving real data problems. So you get mediocre crap like Postgres; bleeding edge designs from the 1980s. You have total horse shit like Spark, laughably insane things like Hive, and … sort of OK designs like bigtables… These will keep database engineers and administrators employed for decades to come, and prevent the solution of all kinds of important problems. There are people and companies out there that know what they’re doing. One to watch is 1010 data; people who understand basic computing facts, like “latency.” Hopefully they will be badly managed by their new owners. The engineering team is probably the best to beat this challenge. The problem with databases is multifold: getting at the data you need is important. Keeping it close to learning algorithms is also important. None of these things are done well by any existing publicly available database engines. Most of what exists in terms of database technology is suitable for billing systems, not data science. Usually people build custom tools to solve specific problems; like the high frequency trader guys who built custom data tee-offs and backtesting frameworks instead of buying a more general tool like Kx. This is fine by me; perpetual employment. Lots of companies do have big data storages, but most of them still can’t get at their data in any useful way. If you’ve ever seen these things, and actually did know what you were doing, even at the level of 1970s DBA, you would laugh hysterically. Still, enough spergs have built pieces of Kx type things that eventually someone will get it right.


2) Database metadata is hard to deal with. One of the most difficult problems for any data scientist is the data preparation phase. There’s much to be said about preparation of data, but one of the most important tasks in preparing data for analysis is joining data gathered in different databases. The very simple example is the data from the ad server and the data from the sales database not talking to each other. So, when I click around Amazon and buy something, the imbecile ad-server will continue to serve me ads on the thing that Amazon knows it has already sold me. This is a trivial example: one that Amazon could solve in principle, but in practice it is difficult and hairy enough that it isn’t worth the money for Amazon to fix this (I have a hack which fixes the ad serving problem, but it doesn’t solve the general problem). This is a pervasive problem, and it’s a huge, huge thing preventing more data being used against the average individual. If “AI” were really a thing, this is where it would be applied. This is actually a place where machine learning potentially could be used, but I think there are several reasons it won’t be, and this will remain a big impediment to tracking and privacy invasions in 20 years. FWIW, back to my ezpass license plate photographer thing: it meant sticking a billing system in with at least two government databases for every state that something like ezpass works in. Unless they all used the same system (possible), it was a clever thing which hits this bullet point.

3) Most commonly used forms of machine learning require many examples. People have been concentrating on Deep Learning, which almost inherently requires many, many examples. This is good for the privacy minded; most data science teams are too dumb to use techniques which don’t require a lot of examples. These techniques exist; some of them have for a long time. For the sake of this discussion, I’ll call these “sort of like Bayesian” -which isn’t strictly true, but which will shut people up. I think it’s great the average sperglord is spending all his time on Deep Learning which is 0.2% more shiny, assuming you have Google’s data sets. If a company like Google had techniques which required few examples, they’d actually be even more dangerous.

4) Most people can only do supervised learning. (For that matter, non-batch learning terrifies most “data scientists” -just like Kalman filters terrify statisticians even though it is the same damn thing as linear regression). There is some work on stuff like reinforcement learning being mentioned in the funny papers. I guess reinforcement learning is interesting, but it is not really all that useful for anything practical. The real interesting stuff is semi-supervised, unsupervised, online and weak learning. Of course, all of these things are actually hard, in that they mostly do not exist as prepackaged tools in R you can use in a simple recipe. So, the fact that most domain “experts” are actually kind of shit at machine learning is keeping us safe.
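The aside in item 4 that a Kalman filter is "the same damn thing as linear regression" can be checked with a small sketch. The assumptions are mine: the state is the pair (intercept, slope), it is treated as constant (no process noise), the measurement noise variance is 1, and the prior is very wide. Under those assumptions the Kalman update reduces to recursive least squares. The function names and the toy data are made up for illustration.

```python
def kalman_regression(xs, ys, prior_var=1e6, noise_var=1.0):
    """Fit y = a + b*x one observation at a time with a Kalman filter
    whose state (a, b) is assumed constant (no process noise)."""
    a, b = 0.0, 0.0
    # Symmetric 2x2 state covariance P, starting from a wide (diffuse) prior.
    p00, p01, p11 = prior_var, 0.0, prior_var
    for x, y in zip(xs, ys):
        # Observation row is h = [1, x]; compute P h^T.
        ph0 = p00 + p01 * x
        ph1 = p01 + p11 * x
        # Innovation variance s = h P h^T + R.
        s = ph0 + x * ph1 + noise_var
        # Kalman gain K = P h^T / s.
        k0, k1 = ph0 / s, ph1 / s
        # Correct the state by the gain times the prediction error.
        resid = y - (a + b * x)
        a += k0 * resid
        b += k1 * resid
        # Covariance update P <- P - K (P h^T)^T, valid because P is symmetric.
        p00 -= k0 * ph0
        p01 -= k0 * ph1
        p11 -= k1 * ph1
    return a, b


def batch_ols(xs, ys):
    """Ordinary least squares for y = a + b*x, computed all at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data, roughly y = 1 + 2x with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]
online = kalman_regression(xs, ys)
batch = batch_ols(xs, ys)
print("online:", online)
print("batch: ", batch)
```

The two estimates should agree up to the small shrinkage introduced by the finite prior variance. The online version never needs to see the whole data set at once, which is what makes non-batch learning useful.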



A shockingly sane exposition of what to expect from machine learning, which I even more shockingly found on a VC’s website:



I don’t want to work on your shitty blockchain project: especially you, Facebook

Posted in fun, privacy by Scott Locklin on May 24, 2018

At the moment, I appear to be some kind of unicorn. I’m a no bullshit dozen year veteran of using math and machine learning to solve  business problems. I’ve also got some chops in blockchain which I am considerably more humble about. I am a real life machine learning blockchain guy. I don’t actually ride to work on a unicycle while wearing silver pants, but I probably could get away with it. As such, recruiters looking to cash in on the blockchain chuckwagon  seem  unable to leave me alone, despite my explicitly asking them to do so.


It boggles my mind that there even exist recruiters for blockchain. After the blockchain annus mirabilis of 2017, anyone who knows a few useful things about the subject is almost certainly productively employed and probably fairly unconcerned with stuff like money. I’d posit that any blockchain type who can’t find productive employment on socially useful projects or isn’t in danger of financial independence either  doesn’t feel like working, doesn’t care about money or doesn’t actually know anything about blockchain. In the former cases you can’t recruit them, and in the latter case, you really shouldn’t.

Of course there is no shortage of faux “experts” who wouldn’t know a Merkle-tree from a KD-tree. Usually these same “experts” were or would have been touting themselves as “AI” or machine learning thought leaders a few months prior, and IoT, augmented reality, clean tech, “dat cloud” and … I don’t remember what the litany of marketing diarrhea was being squirted out of Silly Con Valley’s corporate orifices before then. I have better things to use that brain cell for.

On the off chance that someone who is competent in this subject were looking for a job, there are obvious places to go. The crypto currency exchanges are decent places that will  incubate many new ventures; Gemini would be my pick. Their exchange is technologically far and away the best there is, and based on my experiences so far, it’s also the best run. There is good reason for this; the Winkelvii struck me as a couple of smart, honest and diligent guys. Better than the exchanges are the companies and foundations running the various blockchain projects themselves. Crypto investment funds will be an interesting place to make a buck. Right now it’s shooting fish in a barrel and there are a lot of morons doing it, but some of them are going to accumulate tremendous wealth, and there are direct, obvious and not so obvious ways a blockchain expert can help them do this. Or, start your own blockchain project. There is much work to do, and even though it is more difficult to fund new projects than last year, good projects will be funded, and now is the time. Whatever solutions win either already exist or they will shortly.  Other decent ideas: one of the big accounting firms, the banks, various corporate contributors to hyperledger fabric.

Of course, I don’t want any of this: I’m exactly where I want to be. I am helping good people fix the internet and save it from corporate weasels. Every day I get up and help do my bit to make things better. It’s a nice feeling. Problems are pretty interesting too.

But if I did want another job, the very last place on earth I would work is Facebook. Facebook is corporate syphilis. I keep telling them this. I even went through the process of quitting their service and wrote a whole blog on it. They don’t listen. It’s almost like they don’t give a shit when people tell them things. I was polite the first time, joking they could have my services if they buy my company. No more.

When I say Facebook is corporate syphilis, I am not engaging in hyperbole. I consider tobacco companies to be more ethical and serving a higher social purpose. Tobacco companies employ factory workers, farmers, shopkeepers and … they keep doctors in business. Tobacco is more sociable than Facebook; smokers must meet face to face now that they are banished to the outdoors. Hell, smoking is probably physiologically healthier than spending hours a day noodling with your nerd dildo on ‘tardbook; at least you get up and walk around once an hour. Supposedly nicotine is a prophylactic against Parkinson’s disease, even if the most popular delivery method does kind of give you cancer. Facebook isn’t prophylactic against anything but having a life. Unlike Facebook, some people want and enjoy nicotine. Nobody in the history of the human race has ever decided they want something like Facebook in their lives. “Gee I want a fraudulent advertising service that ruins and commodifies my relationships, wastes my time, makes me depressed, decays the moral fiber of entire civilizations, causes mass hysteria, spies on me and sells me out for pocket change, is as addictive as heroin, is the bones of a hellscape surveillance state and is impossible to live without in the modern world; SIGN ME UP YO.”

Even gambling syndicates serve a higher social purpose than Facebook. The gambling rackets provide subsidies for entertainment, jobs for hundreds of thousands of decent working class people, and they somehow manage to employ more and more interesting applied math types than Facebook does. Facebook has all of the addictive and time wasting qualities of gambling, applied to more people, causing more social corrosion and employing fewer people. Facebook really is corporate syphilis.



Their excuse for existence is that Facebook “brings people together.”  CBS news used to bring people together; everyone would watch 60 minutes and talk about it at the water cooler. Facebook is a narcissism factory which causes moral panics, ridiculous rumor propagation, argument between friends, social fragmentation, alienation and even mass suicide. It’s also so obviously rotting the social fabric of the internet and society at large, even the debauched whores in the media are noticing. Facebook’s walled garden is wrecking the economics of the content providers and entertainers that make the internets interesting and worthwhile. It’s run by opportunistic mountebanks and sinister robots who … well, assuming they aren’t actual comic book villains, they sure do a reasonable impersonation. The PR these yoyos get is at best Stalinistic nonsense; at worst, people just sucking up to money and power. Speaking of Stalinism, Facebook employs literal former Stasi agents to censor and snitch on people for … saying things. Think about that. They expect me to work for a company that employs East German Secret Police; in precisely the same capacity as they were used in the former East German Workers paradise. I wonder what their dental plan is like? Maybe the one described in Marathon Man?



The recruiters (4 so far counting outside contractors) tell me there is some little Eichmann at Facebook who suffers under the delusion I would work in their cubicle jonestown. I will not. Not as long as I have a kidney I can sell to Ukrainian kidney merchants,  hands to shovel shit, or a sword to fall on. Facebook needs blockchain and machine learning people the same way they need a Manhattan project on biological warfare.

I am no boy scout, but I do still harbor a vague moral sense. Facebook is bad and anyone who works there who is not an active saboteur or malingerer should be deeply ashamed of themselves.   The only way I will ever return to their once pleasant campus (it was pleasant when Sun Microsystems was there) is at the head of a column of tanks.


Edit add: look at what they came out with today; a press release from their own internal ministry of truth. I’m going to assume it is either the Demons they keep in the basement, or the electroshock therapy they administer in the “art rehab center” which causes the total lack of self awareness which makes crap like this possible: https://newsroom.fb.com/news/2018/05/facing-facts-facebooks-fight-against-misinformation/


Dynamite Cruiser Vesuvius

Posted in big machines by Scott Locklin on February 16, 2018

The 1800s were a time of revolution in technology. Everyone knows about the  H.M.S. Dreadnought, which made all other proto-battleships obsolete. There were a few false starts along these lines which were also interesting. One of the most hyped ones, at least as hyped as “stealth ships” or the “littoral combat ship” was the idea of the Dynamite Cruiser.

The Dynamite Cruiser Vesuvius was the only example of the kind. It was about as high tech as they come. Instead of using explosives to launch projectiles, it used compressed air. This made the first salvo completely silent. The main innovation was the brobdingnagian 15″ guns, which shot enormous quantities of explosive at the enemy. It was much faster than conventional ships, being lightly armored (only 900 tons, compared to an average of 4000 tons for a typical warship of its class) and equipped with enormous engines. The idea was to sneak up on the enemy, silently lob a couple of tons of dynamite on them and stealthily slip away. Back in the day, the perfidious Yankee’s idea was to build an enormous fleet of cheap Dynamite Cruisers to challenge the European domination of the seas.

lookit dem gunz

In those days, filling shells with high explosives was a tricky business. To place this technology in historical context: the fact that it used an electric detonator was considered a really big deal. This thing was commissioned in 1890; a time when electricity and magic were pretty close to indistinguishable. We don’t have a parallel today, simply because technology has not advanced since 1970 or so, but imagine being in 1970 and being told you’d be able to carry a cell phone some day; that’s about the same as electrically detonated shells in 1890.

Black powder was still the main propellant used in launching shells back in those days: “smokeless” powders like cordite had not quite been invented yet. Guncotton was still high technology stuff of science fiction (in fact, this thing used a form of guncotton in its shells). Lots of early explosive shells would just explode inside their guns. So, early battle ships either used low explosives or solid shells. Everyone knew about dynamite: it was the new wonder technology of the age. More stable forms of high explosive which could survive launch via cannon hadn’t been discovered yet: picric acid explosive shells were some years away. TNT wasn’t used in shells until 1902. The main idea of the Dynamite Cruiser was to launch the fairly unstable explosive by a sort of aerial torpedo so it wouldn’t blow up inside the launching vessel’s cannon.


The fact that nobody has ever heard of the Vesuvius means it probably had a few problems. First problem: since it was a pneumatically launched projectile, it couldn’t use a gun turret. It was impossible to build air hoses which could withstand much pressure and be flexible enough to rotate. To this day, torpedo tubes only point forward or aft for the same reason. This meant the Vesuvius had to point itself at the enemy and hope that the bobbing of the sea didn’t bounce the point of impact around too much: a futile hope. It was also an extremely structurally unsound boat. It was built like a streamlined yacht. But the designers managed to forget about the enormous cannon it sported in the front. This made it almost impossible to maneuver. In fact, it made the thing so structurally unsound, the bolts that held it together would explosively shear in choppy water. The tanks and compressors which drove the cannon took up so much space in the boat, there wasn’t much room for people to do useful work. It was so cramped, it could only carry 30 shells. It also had a tiny beam, which, while useful for making for a good top speed, made it incredibly unstable as a gun platform; it rolled enormously and at a period of once every two seconds. Not good at all for a gun platform.

dem air valves

It was eventually used in the Spanish-American War to bombard Cuba. It did succeed in scaring the crap out of the Spaniards, since they didn’t hear the report of the guns before it was raining humid dynamite. However, whatever damage it caused was accidental. The ineffectiveness of dynamite bombardment was rapidly realized, so the mighty Dynamite Cruiser was relegated to courier duties. Eventually it was refitted as an ordinary torpedo boat, and then ignominiously sold for scrap.

I don’t know if there are any lessons to be learned from the Vesuvius. I guess the main one is a weapons system should be used in combat or something close to it before it is declared the latest thing. If we want to compare this giant leap forward in technology to modern American naval vessels, there is the LCS, which is incredibly silly and can barely remain afloat. Perhaps the Naval drone is more comparable in being “advanced,” expensive and completely untried. Or perhaps the government actually consists of anointed military genius Frederick Barbarossa types and I’ve been taking too much advantage of California’s legal marijuana crop in 2018.

I originally read about this thing in a Patton essay. Perhaps the best way to close is with what Patton said.

“When Samson slew the Philistines with the jawbone of an ass, he probably created such a vogue for the weapon that throughout the world no prudent donkey dared to bray. Certainly the advent of the atomic bomb was not half as startling as the initial appearance of gunpowder. In my own lifetime, I remember two inventions, or possibly three, which were supposed to stop war; namely the dynamite cruiser Vesuvius, the submarine, and the tank. Yet, wars go blithely on and will when our great-grandchildren are very old men.”

Decoupling from fakebook

Posted in privacy by Scott Locklin on October 5, 2017

I was around for the glory days of the internet: the 90s and early 2000s. Back then it was truly what it was supposed to be; a decentralized network where you could find all kinds of interesting data and interact with people who share obscure interests with you. The browser was organized to help you, rather than monetize you for evil megacorporations. And there was plenty of stuff that wasn’t browser intermediated. Very little of the remaining internet is anything like early libertarian internet. /chan probably comes closest, with some of the blockchain projects being in the correct utopian spirit. There is nothing inherent in modern day internets which prevents us from having decentralized social networks; a protocol which does this could be built directly into browsers, but nobody has done it yet, so the interwebs decay into the corporate surveillance dystopia we have today.

I’ve always disliked Facebook as a company. Zuckerberg stole the idea from the Winklevoss entity, and they even lifted the blue and white color scheme and layout from Friendster. I continued to use it for much too long as a way of sharing pictures with my friends and family, a chat application, a sort of recent cache of things I’m interested in, and a way of keeping touch with distant relatives and people I went to grammar school with. Reading Tim Wu’s “The Attention Merchants” finally made me realize there is no reason to use it, and lots of good reasons not to. One big reason not to continue: it’s a waste of time. You only get so much time on earth, and real human interaction is vastly more important than wasting even a few minutes a day on fake human interaction.

One of the sinister things about it is having as an audience hundreds of people you barely know (and if your privacy settings aren’t set to maximum; the entire world). You begin to censor yourself. While this is natural in any community; these people are not really your community.  There is no existing actual community where your Aunt Sadie, three of your ex girlfriends, a half dozen people you knew in the third grade, your second boss and some guy you met at a party once all watch your every interaction. Such an agglomeration of people is actually a nightmare.

Social networks should not be owned by profit-making companies; in this situation you are the product, and your very being is strip mined for nickels and dimes. It is inherently and trivially wrong to do this. We know now that some people catch depression from logging into this corporate dystopia. Some of the finest minds of our generation have worked very hard to make FB as addictive and misery spreading as a slot machine.

Sharing data with your friends, something the internet should be used for, is more difficult without companies like this, but it can be done; Diaspora, Riot/Matrix.org, Mastodon, Telegram, Signal all exist and I encourage people who need this sort of thing to use them. People who want to keep my contact information in a handy place; use LinkedIn (which isn’t as obnoxious or time wasting as FB, but is still obnoxious), or find me here.



Even examining FB on their merits as a business: the ads they’ve served me have been a joke from the beginning. “Become a Physics Teacher” was an early and hilarious regular one. I’m pretty sure my Ph.D. in that subject (which was in my profile) qualifies me for such a job without any additional training.  Subsequent ones have been similarly ridiculous; they serve me ads for dishwasher soap (don’t own a dishwasher), money for “refugees” (sorry, I’ve read “Italy and her Invaders” and know how this story ends), NBA (don’t care about sportsball), potato chips (make my own) and various objects I’ve already purchased on the internet, generally from the same company serving a facebook ad. The one overt ad I clicked on in my entire FB career was for a home CRISPR kit, and I didn’t buy it.

These ads are annoying in that they are incorrect, but they’re also annoying in that FB is tracking my browsing in sites that have nothing to do with FB activities. It also offends my engineering sensibilities that Amazon or ebay pays FB for a display ad for stuff they know I have already purchased.  Yes, I understand why this happens: their purchase database doesn’t talk to the ad server, and yes, Amazon can afford to do this, but why should FB get paid even a penny CPM for this? There is also compelling evidence their click traffic is mostly fake. Weird things certainly happen when my non-secure browser window is open to a FB tab; I wouldn’t put it past them. We also know unambiguously that their metrics are science fiction.

If you want to follow me into the unFBing abyss; a checklist for you.

  • For normies who use phone-apps that rely on FB for identity; fix that first. Since I have never and will never do this, it wasn’t a consideration for me; best of luck.
  • Download your data if you want it for something. I did. Some of the links and photos will be amusing later. Some of this data may be useful in the event that some kind hearted software engineer actually creates a useful decentralized social network which doesn’t treat its users as cattle to be exploited.
  • Delete your data. They make it really hard to do this, which is one of the reasons I don’t want to persist in using their shitty software. It’s also really hard to get at old data, and their reminders of what  thing I said or did 4 years ago are not helpful. I wanted to use this  and this to help assist in doing so, but they were flakeypants. So I moved on to:
  • Delete your account. Supposedly it will be fully deleted from backups and such in a couple of months. I think EU regulations require a hard delete,  but it isn’t in their T&C. You will get a hilariously misformatted message like this


BR BR BR!!!!

Next up: getting google out of my life as well.


Review and summary of Wu’s book