Locklin on science

Obvious and possible software innovations nobody does

Posted in tools by Scott Locklin on April 1, 2021

There are a number of things that people theoretically know how to do, but which aren’t possible because of how software gets made. Some of these are almost forgotten, but there are at least examples of all of them in existence.

  1. Automated FFI parsers. In 2021 I should be able to point any interpreted language at a C include file and have all the functions described in it turned into reasonably safe FFIed function calls, complete with autogenerated documentation. For example, if I want javascript calls to libsodium, I shouldn’t have to write anything; javascript knows about C APIs. I’m not asking for runtimes to talk to each other, you can keep up the insipid RPC-serialization conga dance for that. I’m just asking for a technology that encapsulates C (and Fortran and …. maybe C++) function calls and makes them accessible to other runtimes without actually doing any work. Of course parsers that do useful things are hard; people would rather write new serialization protocols. There will always be exceptions where such things don’t work, but you should be able to do 95% of the work using metaprogramming. Crap that runs on the JVM; same story -not only could you technically parse .h files and turn them into JNI, you should be able to have all your hooks into Clojure or Scala or whatever without writing anything. Clojure at least seems well equipped to do it, but I’m pretty sure this hasn’t happened yet. You see pieces of this idea here and there, but like everything else about modernity, they suck.
  2. While I’m talking about FFIs to high level languages, how about a VM that recognizes that it is not a unique snowflake, and that sometimes you have to call a function which may allocate memory outside its stack or something similarly routine but insane. Most VM designs I’ve seen are basically just student exercises; why not assume the outside world exists and has useful things to say? I think Racket has some good ideas in this domain, but I’m pretty sure it could be done better and there should be a higher standard.
  3. Cloud providers should admit they’re basically mainframes and write an operating system instead of the ad-hoc collection of horse shit they foist on developers. Imagine if the EC2 were as clean as, I dunno, z/OS, which has more or less been around since the 1960s. That would be pretty cool. I could read a single book instead of 100 books on all the myriad tools and services and frameworks offered by Oligarch Bezos. He would be hailed as a Jobs-like technical innovator if he had some of his slaves do this, and he would be remembered with gratitude, rather than as the sperdo who dumped his wife for sexorz with lip filler Cthulhu. There’s no excuse for this from an engineering perspective; Bezos was smart enough to know he was going to do timesharing, he was also smart enough to constrain the spaghetti into something resembling an OS. Same story with all the other cloud services. Really, they should all run like Heroku and you’d never notice they were there. You could also draw flowcharts for most of this shit and replace devops with something that looks like labview. Nobody will do that either, as innovation in core software engineering, or even learning from the past in core software engineering is basically dead.
  4. Front ends could be drag and drop native GUIs instead of electron apps. There are still examples of this around, but it seems to be a dying paradigm. It’s fascinating to me that people find it easier to write a pile of React and HTML on top of electron rather than dragging and dropping native widgets for a framework like we did in the old days. Literally this was possible on a 286 PC/XT running DOS; it worked great, looked great, had fewer problems. You know why it doesn’t get done? Because doing it is kind of hard, and electron apps are “easy” in that there are tons of cheap, fungible engineers with those skills.  In general native GUI frameworks are shit and they almost never include a GUI to develop them in. Even if you made something not as shitty as electron; maybe something that took 10mb instead of 500mb and didn’t gobble up all memory on your system that would be amazing. This is completely possible. People used to make GUI frameworks which did more than electron apps, looked better and fit in the tens of kilobytes range.
  5. Compilers and interpreters should learn how modern computers work. Pretty much all compilers and interpreters think computers are a PDP-11 stack machine. There are consequences to this everyone knows about: security is fairly execrable. There are other consequences though! For example, the fact that memory is godawful slow and there are multiple cache speeds is a very serious performance problem unless you’re dealing with trivial amounts of memory. There are no compilers which can help you with this, unless you count meta-compilers on limited problems like ATLAS-BLAS or FFTW. There are a few interpreted languages whose designers have awareness of this and at least don’t fight the OS over these facts, or attempt to insist they’re really running on a PDP-11.
  6. Operating systems don’t have to look like your crazy hoarder aunt’s house. I know it’s hard to believe, but in my lifetime there were excellent multitasking operating systems with superior GUIs, networking, development toolchains, RTOS subsystems, cryptography that made the NSA nervous, and they all fit on a 70mb tape drive, and they would support something like 20 people checking their email and compiling Fortran for general relativity calculations from emacs terms. Meanwhile, my phone needs a constant diet of gigabyte upgrades to continue functioning reliably as a fucking telephone; telephones theoretically don’t even need a single transistor. Even my linux machines are ridiculously bloated and seem to require daily updates and patches. Why does shit like DPDK exist? Because your OS is stuck in the 1990s when ethernet was 10mbps. There’s zero reason or excuse for this, other than modern programmers are like your crazy hoarder aunt because storage is cheap and competent coder time is expensive. Clean OS design has a lot of follow on benefits, such as rare patching, higher security and lower maintenance in general. I have 4 objects in my house that require constant OS upgrades (used to be 5, but my macbook committed suicide after an “OS upgrade” so I now use it as a paperweight), not including my TV or my car; make a cleaner OS and life actually gets better instead of everyone being a sort of shitty IT slave to keep their refrigerator and telephone running. Instead of a nice OS, current year innovation is the open source “code of conduct” -apparently hoping you’ll attract enough people mentally ill enough to work for free, but sane enough to do useful work; arguably a narrow demographic.
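To make item 1 concrete, here is a toy sketch of the idea in Python: read C prototypes out of header text and turn them into typed ctypes calls with no hand-written wrappers. Everything named here (HEADER, CTYPES, PROTO, bind) is hypothetical and made up for illustration. The regex only copes with trivial prototypes whose parameters are all named, and ctypes.CDLL(None) assumes a POSIX system where libc symbols are visible in the running process. A real tool would need an actual C parser, struct handling, and some story for ownership of memory.

```python
import ctypes
import re

# Hypothetical mini "header": three real libc prototypes, written out by hand.
HEADER = """
size_t strlen(const char *s);
int abs(int n);
long labs(long n);
"""

# Assumed mapping from the few C type spellings this toy understands to ctypes.
CTYPES = {
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "size_t": ctypes.c_size_t,
    "const char *": ctypes.c_char_p,
}

# Matches "<return type> <name>(<args>);" for simple one-line prototypes only.
PROTO = re.compile(r"^\s*([\w\s\*]+?)\s*\b(\w+)\s*\(([^)]*)\)\s*;", re.M)


def bind(header, lib):
    """Return {name: callable} with argtypes/restype filled in from the header."""
    funcs = {}
    for ret, name, args in PROTO.findall(header):
        fn = getattr(lib, name)
        fn.restype = CTYPES[" ".join(ret.split())]
        params = [a.strip() for a in args.split(",") if a.strip() not in ("", "void")]
        # Drop the trailing parameter name: "const char *s" becomes "const char *".
        fn.argtypes = [CTYPES[re.sub(r"\w+$", "", p).strip()] for p in params]
        funcs[name] = fn
    return funcs


libc = ctypes.CDLL(None)  # POSIX assumption: exposes the process's libc symbols
api = bind(HEADER, libc)
print(api["strlen"](b"libsodium"), api["abs"](-7), api["labs"](-9))
```

The point is not that forty lines of regex solve the problem; it is that the type information needed for a reasonably safe binding is already sitting in the header, and the hard 5% is everything the regex skips.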

The funny thing is, the same people who absolutely insist that the Church Turing thesis means muh computer is all-powerful simulator of everything, or repeat the fantasy that AI will replace everyone’s jobs will come up with elaborate reasons why these things listed above are too hard to achieve in the corporeal world, despite most of them being solved problems from the VLSI era of computer engineering. The reality is they’re all quite possible, but nobody makes money doing them. Engineers are a defeated tribe; it’s cheaper to hire an “AI” (Alien or Immigrant) slave to write the terraform or electron front end rather than paying clever engineers well enough to build themselves useful tooling to make them more productive and the world a better place. Consumers will suck it up and buy more memory, along with planned obsolescence, keeping the hardware industry in business. Computers aren’t for making your life easier; they’re for surveillance and marketing, and for manufacturers a consumer good they hope you buy lots of add-ons and  upgrades for, and which wears out as soon as possible.

Woo for its own sake

Posted in Design, tools by Scott Locklin on January 8, 2021

Software development is a funny profession. It covers people who do stuff ranging from register twiddling in device drivers and OS guts, to people who serve web content, to “big data” statisticians, to devops infrastructure, to people who write javascript and html front ends on electron apps. To a certain extent, software engineering is grossly underpaid. If software engineers were allowed to capture more of the value they create, we’d have vastly fewer billionaires and more software engineers with normal upper middle class lifestyles, such as houses owned in the clear and successful reproductive lifecycles. The underpaid are often compensated in self esteem.

By “compensated in self esteem” I don’t mean they have high self esteem; I mean the manager saying “dude yer so fookin smart brah” kind. This is the same brainlet payment system in place in the present day “hard sciences” with people writing bullshit papers nobody cares about, or, like, journalists and other “twitter activists” who believe themselves to be intellectual workers rather than the snitches and witch hunters they actually are. Basically, nerd gets a pat on the head instead of a paycheck.

Once in a while, independent minded programmers demand more. They may or may not be “so fookin smart,” but they think they are. Their day jobs consist of unpleasant plumbing tasks, keeping various Rube Goldberg contraptions functioning and generally eating soylent and larva-burgers and claiming to like it. As such, most programmers long to do something fancy, like develop a web server based on Category Theory, or write a stack of really cool lisp macros for generating ad server callbacks, or add some weird new programming language of dubious utility to an already complex and fragile stack.

Allowing your unicycle-riding silver pants mentat to write the prototype in Haskell to keep him from getting a job at the Hedge Fund may make some HR sense. But if you’re going to rewrite the thing in Java so a bunch of offshore midwits can keep it running, maybe the “adulting” thing to do is just write it in Java in the first place.

I’m not shitting on Haskell in particular, though there is an argument to be made for looking askance at using it in production. Haskell is mostly a researchy/academicy language. I don’t know, but I strongly suspect its run of the mill libraries dealing with stuff like network and storage is weak and not fully debugged. Why do I suspect this? In part from casual observation, but also from sociology. Haskell is a fancy language with people doing fancy things in it. One of the valuable things about popular but boring languages is that the code has been traversed many times, and routine stuff you’re likely to use in production is probably well debugged. This isn’t always true, but it’s mostly true. The other benefit to boring languages is people concentrate on the problem, rather than the interesting complexities of the language itself.

You see it in smaller ways too; people who feel like every line of code has to be innovative: new elliptic curves, new network protocols, new block ciphers, new ZKP systems, all bolted onto a crucial money oriented application that would have been really cool, and had a much smaller attack surface, if they had bestowed only one innovation on it. I guess this sort of thing is like bike-shedding or Yak-shaving, but it’s really something more perverse. If you have a job doing shit with computers, you are presumably solving real world problems which someone pays for. Maybe, you know, you should solve the problem instead of being a unicycle riding silver pants juggling chainsaws.

You see a lot of it in the cryptocurrency community, in part because there is enough money floating around, the lunatics are often running the asylum, in part for its undeserved reputation as being complicated (it’s just a shared database with rules and checksums; Bram more or less did the hard part in the summer of 2000 while my buddy Gerald was sleeping on his couch). For example: this atrocity by Gnosis. Gnosis is an interesting project which I hope is around for a long time. They’re doing a ton of very difficult things. Recently they decided to offer multi-token batch auctions. Why? I have no freaking idea. It’s about as necessary and in demand as riding to work in silver pants on a unicycle. Worse though: from an engineering perspective, it involves mixed integer programming, which is, as every sane person knows, NP-hard.

This is a danger in putting software developers or programmers in charge. These guys are often child-like in their enthusiasm for new and shiny things. Engineers are different: they’re trying to solve a problem. Engineers understand it’s OK to solve the problem with ephemeral, trashy, but fast-to-market solutions if the product manager is going to change it all next week. Engineers also plan for the future when the software is critical infrastructure that lives and fortunes may depend on. Engineers don’t build things that require mixed integer programming unless it’s absolutely necessary to solve a real world problem. If they juggle on unicycles, they do it on their own time; not at work.

Consider an engineering solution for critical infrastructure from a previous era; that of providing motive power for small fishing boats. Motors were vastly superior to sail for this task. In the early days of motorized fishing, in some cases until fairly recently, there was no radio to call for help if something goes wrong. You’re out there in the vastness on your own; possibly by yourself, with nothing but your wits and your vessel. There’s probably not much in the way of supply lines when you’re at shore either. So the motors of the early days were extremely reliable. Few, robust moving parts, simple two stroke semi diesel operation, runs on any fuel, requires no electricity to start; just an old fashioned vaporizing torch which runs on your fuel; in a pinch you could start a fire of log books. You glance at such a thing and you know it is designed for robust operation. Indeed the same engines have been used more or less continuously for decades; they only turn at 500 rpm, and drive the propeller directly rather than through a gearbox.

Such engines are useful enough they remain in use to this day; new ones of roughly this design are still sold by the Sabb company in Norway. They’re not as environmentally friendly or fuel efficient as modern ones (though close in the latter measure), but they’re definitely more reliable where it counts. When you look at this in the engine room, you are filled with confidence Mr. Scott will keep the warp drives running. If you find some jackass on a unicycle back there (who will probably try to stick a solar powered Stirling engine in the thing), maybe not so much.

I don’t think long term software engineering looks much different from this. Stuff you can trust looks like a giant one-piston semidiesel. You make it out of well known, well traversed and well tested parts. There are a couple of well regarded essays on the boringness yet awesomeness of golang. Despite abundant disagreement I think there is a lot to that. Nobody writes code in golang because of its extreme beauty or interesting abstractions. It is a boring garbage collected thing that looks like C for grownups, or Java not designed by 90s era object oriented nanotech fearing imbeciles. I think it bothers a lot of people that it’s not complicated enough. I’m not shilling for it, but I think anyone who overlooks it for network oriented coding because it’s boring or they think it’s “slow” because it doesn’t use functors or borrow checkers or whatever is a unicycle riding idiot though. Again looking at blockchain land; Geth (written in golang) has mostly been a rock, where the (Rust) Parity team struggles to maintain parity with feature roll outs and eventually exploded into multiple code bases the last time I checked. There’s zero perceptible performance difference between them.

There’s a Joel Spolsky piece on (the Peter Seibel interview with) JWZ which I have always related to on the complexification of the software process:

One principle duct tape programmers understand well is that any kind of coding technique that’s even slightly complicated is going to doom your project. Duct tape programmers tend to avoid C++, templates, multiple inheritance, multithreading, COM, CORBA, and a host of other technologies that are all totally reasonable, when you think long and hard about them, but are, honestly, just a little bit too hard for the human brain.

Sure, there’s nothing officially wrong with trying to write multithreaded code in C++ on Windows using COM. But it’s prone to disastrous bugs, the kind of bugs that only happen under very specific timing scenarios, because our brains are not, honestly, good enough to write this kind of code. Mediocre programmers are, frankly, defensive about this, and they don’t want to admit that they’re not able to write this super-complicated code, so they let the bullies on their team plow away with some godforsaken template architecture in C++ because otherwise they’d have to admit that they just don’t feel smart enough to use what would otherwise be a perfectly good programming technique FOR SPOCK. Duct tape programmers don’t give a shit what you think about them. They stick to simple basic and easy to use tools and use the extra brainpower that these tools leave them to write more useful features for their customers.

I don’t think this captures the perverseness and destructiveness of people who try to get fancy for no reason, nor do I think JWZ was a “duct tape programmer” -he was an engineer, and that’s why his products actually shipped.

I say this as an aficionado of a couple of fancy and specialized languages I use on a regular basis. I know that it is possible to increase programmer productivity through language choice, and often times, runtime performance really doesn’t suffer. Languages like OCaml, APL and Lisp have demonstrated that small teams can deliver complex high performance software that works reliably. Delphi and Labview are other examples of high productivity languages; the former for its amazing IDE, and the latter for representing state machines as flow charts and providing useful modules for hardware. The problem is that large teams probably can’t deliver complex high performance software that works reliably using these tools. One also must pay a high price up front in learning to deal with them at all, depending on where you come from (not so much with Labview). From a hiring manager or engineer’s perspective, the choice to develop in a weird high productivity language is fraught. What happens if the thing crashes at 4 in the morning? Do you have enough spare people that someone can be raised on the telephone to fix it? What if it’s something up the dependency tree written by an eccentric who is usually mountaineering in the Alps? For mission critical production code, the human machine that keeps it running can’t be ignored. If your mentat gets hit by a bus or joins the circus as a unicycle juggler and the code breaks in production you’re in deep sheeyit. The idea that it won’t ever break because muh technology is retarded, and the towers of jelly that are modern OS/language/framework stacks are almost without exception going to break when you update things.

 

The “don’t get fancy” maxim applies in spades to something like data science. There are abundant reasons to just use Naive Bayes in production code for something like sentiment analysis. It’s easy to debug and it has a trivial semi-supervised mode using the EM algorithm if you’re short of data. For unsupervised clustering or decomposition it’s hard to beat geometric approaches like single-linkage/dbscan or PCA. For regression or classification models, linear regression is pretty good, or gradient boost/random forest/KNN. Most of the time, your real problem is shitty data, so using the most accurate tool is completely useless.
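For a sense of how little there is to debug, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing, in plain Python. The function names (train, predict) and the four training sentences are invented for illustration, and whitespace tokenization is an assumption; production code would want real tokenization and a lot more data.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts prediction needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return (best_label, log_scores); words outside the vocabulary are ignored."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for word in text.lower().split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Toy, made-up training data.
docs = [
    ("great product love it", "pos"),
    ("love the fast shipping", "pos"),
    ("terrible product broke fast", "neg"),
    ("hate it total waste", "neg"),
]
model = train(docs)
print(predict(model, "love it")[0], predict(model, "total waste")[0])
```

Every number the model uses is a count you can print and eyeball, which is most of the argument for using it.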

Using the latest tool is even worse. 99 times out of 100, the latest woo in machine learning is not an actual improvement over existing techniques. 100% of the time it is touted as a great revolution because it beat some other technique … on a carefully curated data set. Such results are trumpeted by the researcher because …. WTF else do you expect them to do? They just spent a year or two developing a new technique; the professor is trying to get tenure or be a big kahuna, and the student is trying to get a job by being expert in the new technique. What are they going to tell you? That their new technique was kind of dumb and worthless?

I’ve fallen for this a number of times now; I will admit my sins. I fooled around a bit with t-SNE while I was at Ayasdi, and I could never get it to do anything sane. I just assumed I was a moron who couldn’t use this advanced piece of technology. No, actually, t-SNE is kind of bullshit; a glorified random number generator that once in a while randomly finds an interesting embedding. SAX looked cool because it embodied some ideas I had been fooling around with for almost a decade, but even the author admits it is horse shit. At this point when some new thing comes along, especially if people are talking about it in weeb-land forums, I pretty much ignore it, unless it is being touted to me by a person who has actually used it on a substantive problem with unambiguously excellent results. Matrix profiles look like one of these; SAX dude dreamed it up, and like SAX, it appears to be an arbitrary collection of vaguely common sense things to do that’s pretty equivalent to any number of similar techniques dating back over the last 40 years.

There are innovations in data science tools. But most of them since boosting are pretty marginal in their returns, or only apply to corner cases you’re unlikely to encounter.  Some make it easier to see what’s going on, some find problems with statistical estimators, but mostly you’re going to get better payoff by getting better at the basics. Everyone is so in love with woo, the guy who can actually do a solid estimate of mean differences is going to provide a lot more value than the guy who knows about the latest PR release from UC Riverside.
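As an illustration of the basics, here is a rough sketch of a mean difference estimate between two samples: the difference, its standard error without assuming equal variances (Welch style), and an interval. The function name and the sample numbers are invented. The interval uses a normal quantile, which is an assumption that is too optimistic for samples this small; the Welch-Satterthwaite degrees of freedom are returned so a t quantile can be looked up instead.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference(a, b, confidence=0.95):
    """Return (diff, (low, high), df) for mean(a) - mean(b), unequal variances allowed."""
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    diff = mean(a) - mean(b)
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    # Large-sample (normal) quantile; swap in a t quantile with df for small samples.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se), df


# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4]
diff, (low, high), df = mean_difference(group_a, group_b)
print(round(diff, 3), round(low, 3), round(high, 3), round(df, 2))
```

The interesting part is rarely the formula; it is checking whether the two samples are comparable at all, which no library call will do for you.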

Good old numerical linear algebra, which everyone roundly ignores, is a more interesting subject than machine learning in current year.  How many of you know about using CUR decompositions in your PCA calculations? Ever look at some sloppy PCA and wonder which rows/columns produced most of the variance? Well, that’s what a CUR decomposition is. Obviously looking at the top 3 most important of each isn’t going to be as accurate as looking at the regular PCA, but it sure can be helpful. Nuclear Norm and non-negative matrix factorizations all look like they do useful things. They don’t get shilled; just quietly used by engineering types who find them helpful.
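A rough sketch of the "which rows/columns produced most of the variance" question, in plain Python. This is only the selection step: it ranks rows and columns by their share of the total squared deviation after centering each column, which is the quantity norm-based CUR variants sample on. Picking the top k deterministically rather than sampling is my simplification, the function names and data are made up, and a full CUR decomposition would also build the linking matrix U (typically with pseudo-inverses), which is not shown here.

```python
def variance_shares(matrix):
    """Return (row_shares, col_shares) of total squared deviation after column centering."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - col_means[j] for j in range(n_cols)] for row in matrix]
    total = sum(x * x for row in centered for x in row)
    row_shares = [sum(x * x for x in row) / total for row in centered]
    col_shares = [sum(row[j] ** 2 for row in centered) / total for j in range(n_cols)]
    return row_shares, col_shares


def top_k(shares, k):
    """Indices of the k largest shares, largest first."""
    return sorted(range(len(shares)), key=lambda i: shares[i], reverse=True)[:k]


# Made-up data: column 1 and row 3 carry nearly all of the spread.
data = [
    [1.0, 10.0, 0.1],
    [1.1, 20.0, 0.1],
    [0.9, 30.0, 0.2],
    [1.0, 80.0, 0.1],
]
row_shares, col_shares = variance_shares(data)
print(top_k(col_shares, 2), top_k(row_shares, 1))
```

Unlike a principal component, the answer is an actual row or column of your data, which is what makes it useful for figuring out where sloppy results came from.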


I’m tooling up a small machine shop again, and it makes me wonder what shops for the creation of physical mechanisms would look like if this mindset were pervasive. The archetypical small shop has always had a lathe in it. Probably the first thing after you get tired of hacksawing up material; a bandsaw or powered hacksaw. Small endmill, rotary sharpener, and you’re off to the races; generally building up more tooling for whatever steam engines, clocks or automatons you feel like building. I’m imagining the archetypical unicycle-juggler buying a shop full of solid printers and weird CNC machines and forgetting to buy cutters, hacksaws, files and machinist squares. As if files and machinist squares are beneath them in current year.

Just as good alternatives to big-five theories of personality

Posted in five minute university, models by Scott Locklin on December 24, 2020

It is a source of irritation to me that there exist ridiculously worthless and wrong psychological models in widespread use. Big five sends me into dangerous blood pressure levels. It’s preposterous and obviously only says something about the obsessions of the WEIRD substrate it allegedly applies to, more than it says anything about the diversity of personality among human beings. When I say big-five is worthless, I don’t only mean it only applies to WEIRD people, though that’s observably true; I mean it pertains to states of mind rather than permanent characteristics. It also is pretty worthless in predicting behavior, which is the only useful thing about psychometrics. I don’t care what people are feeling like when they take a test unless that maps directly onto long term behavioral patterns. Otherwise, it’s just checking in; “hey how you doin’ today?”

Five factor tests are essentially bags of words that respondents are asked to agree or disagree with. The assumption is that the bag of words forms a basis set for describing human personalities. I have no doubts that they cluster very well under linear regression at least on WEIRD personalities. The problem is such models don’t have much power to explain actual human psychological variance.

Self testing, my results are all over the map. For example I took the thing and got this, this afternoon:

Addressing them one by one: for an extrovert, I surely do spend a lot of time by myself. I’m funny and do well at parties, but my natural set point is sitting on a mountain somewhere with a book. I’ll cop to “emotional stability” in that I’m fairly unflappable, though at various times in my life I was probably pretty neurotic. Locklin the disagreeable? Certainly I don’t suffer fools gladly. I’m also the dickhead who checks in on people to make sure they’re doing OK and who notices when they’re not; disagreeable people don’t do that. Conscientious; whatever -totally varies over time; there are multiple 5 year periods of my life where I did nothing but chase women and drink heavily. I do usually pick things up off the floor, and go through vast map-reduce phases of gather/sort, though sometimes my desk looks like a junk pile. Intellect/Imagination aka “Openness” -this one is most hilarious of all. It’s true, I revel in matters of the mind, I enjoy travel, art and I like messing with new ideas. While I’m fairly creative in my thinking, I’m also extremely traditional in my thinking: something that doesn’t compute with psychologists, who obviously don’t read much history or know who Ezra Pound or Lemaître were. Or, for that matter Freeman Dyson or Heisenberg or Mendel or Celine or Ernst Junger or Dali …. the list is endless -particularly among artistic and scientific giants. None of this is capable of predicting, say, who I voted for in the last election, or how likely I am to check in on the nice old lady upstairs. It’s just a bunch of shaggy dog stories and stereotypes about self regarding white college students in America in the mid to late 20th century.

[Image caption: another bad model mapped onto other cultures]

I think pretty much anything is better than this; for example, the Hippocrates theory that men come in Phlegmatic, Choleric, Sanguine and Melancholic flavors is obviously better from a behavioral point of view, as they relate to how people behave. I don’t think those clusters map onto anything real, but I know people who exemplify all of these archetypes. Particularly people in Latin countries, more or less where the idea originated in ancient times.

There is also the Japanese blood type personality test. I only know a few Japanese people, and only well enough to know they take this idea seriously. I know that the English language wiki on the subject dismisses it as superstition, where the wiki link on big-five is treated with gaping credulity, and that seems to me, well, rather culturally insensitive. I’m willing to bet Japanese blood personality is more real and possibly more useful in Japan than big-five is in the US.

There are many things that matter which five-factor tests are completely blind to, for example: energy level. Some people vibrate with energy and enthusiasm. It has nothing to do with *any* of the five factors. It probably has something to do with thyroid activity and physical fitness. Dominance -some people dominate the room, and some have to be in charge otherwise they lose their shit; others go with the flow. Secretiveness; some people are not particularly forthcoming and you have no idea what they’re up to; they may even become anxious if you pry. They’re not necessarily up to anything shady, that’s just how some people are. Spooks love hiring such people. Curiosity: some people are curious about all kinds of things; other people really like sports or whatever fills up their hours.  Curious people tend to make better scientists, engineers, mechanics and detectives. Sociopathy; imagine you forgot to look for this in a life partner or cofounder -five factor doesn’t think it’s of any importance at all, because muh factors. Self reliance: some people don’t like getting help from others, other people seem to enjoy being dependent parasites. Character;  some people do as they say and say as they do. According to the five factor model, character has something to do with cleaning your room, or how likely you are to execute on a plan. Well, I’m here to tell you these are completely unrelated traits. There are deceptive, evil assholes who clean their rooms and can execute plans well, and people of the absolute highest character who live like slobs and are disorganized and lazy. Courage: some people don’t mind having grenades thrown at them all day; others wet the bed at the idea of walking around in the woods by themselves without a covid diaper on their face. Thrill seeking: some people may or may not be courageous, but seek sensory stimulation; others prefer a boring life and purchase lots of insurance. 
Beyond that: impulsivity is a trait many display, and others do not. You may be impulsive, a physical coward and thrill seeking: people like this exist -you meet them all the time. Five-factor will simply lump them all in with other unrelated populations of people such as one encounters on college campuses and in the clerical jobs they mostly matriculate to later. All of these are absolutely critical to people’s self conception and how they behave in the actual observable world. Modern psychology pretty much ignores them.

I think Cattell’s 16 factor test might measure more important things. However whenever I take the thing I always get a bullseye. Does this mean I have no personality, or does it mean it doesn’t measure my personality well? I think it might be a good start from a behavioral point of view, but it seems to be fairly unpopular among psychologist types. Cattell of course started out with training in the physical sciences, which is why he presumably thinks like me; wanting to make maps to observable behaviors.

The Minnesota Multiphasic Personality Inventory (MMPI) is an old spook-developed thing more or less designed to ascertain how fucked up you are. I think it’s reasonably useful for filtering out WEIRD types who might be mentally ill, or, like, evil, and things like it should probably be more widely used. This despite the fact that, in America anyway, the prevalence of personality disorders is approaching 10%. Seems useful to me even if you can only catch half of them. Tolerance of crazy and evil people is one of the worst things about modernity.

Myers Briggs I do not consider a better model; it’s astrology tier. Nobody else seems to take it seriously either, except for the people who sell the tests, and the credulous people who pass them around because they’re fun. There are other crummy ones out there; one is called DISC, and it seems to be universally reviled by academic psychology researchers, despite it being invented by the creator of Wonder Woman. I don’t know why they hate it so much; doesn’t seem much worse than five factor -maybe oriented towards winnowing out people who might be good at sales, which, unlike five-factor, is at least an ambition to be useful to somebody. Also inventing Wonder Woman is pretty cool.

Psychology is mostly a profoundly silly basket of shaggy dog stories masquerading as a serious subject; it gets sillier by the decade. The five factor test is one of the tools the psychologists seem most proud of, but it’s really just a demonstration of how intellectually bankrupt they are. Anyone who has actually understood the linear regression tool knows you can have five “good” factors and understand absolutely nothing about how the universe works. After all, butter production in Bangladesh, US cheese production and sheep population in the US and Bangladesh is an absolutely superb three factor model for the S&P500 [Leinweber’s famous PDF]. Since these mere three factors explain 99% of the variance in the S&P500, isn’t this a better model than five-factor?
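A minimal sketch of the data mining effect behind the sheep-and-butter joke, assuming nothing but pure noise: generate one random walk as the "index", then search a pile of unrelated random walks for the one that "explains" it best. This is a one-factor simplification, not Leinweber's actual three-factor regression or his data; the function names, the seed and the sizes are all made up for illustration.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of independent Gaussian steps: a series with no real signal."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R^2 of a one-factor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)

def best_spurious_fit(n_points=40, n_candidates=500, seed=0):
    """Return (R^2 of the first candidate, best R^2 found by searching all candidates)."""
    rng = random.Random(seed)
    target = random_walk(n_points, rng)  # stand-in for the index
    scores = [r_squared(random_walk(n_points, rng), target)
              for _ in range(n_candidates)]
    return scores[0], max(scores)

first, best = best_spurious_fit()
print(f"first candidate R^2 = {first:.3f}, best of 500 = {best:.3f}")
```

The best fit found by searching can never be worse than the first candidate, and none of the candidates has anything to do with the target. A high in-sample fit on its own tells you how hard somebody looked, not how the universe works.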

We laugh at the idea that sheep, cheese and butter predict the S&P500, then credulously accept the idea that psychologists have somehow nailed it with the five factor model because “muh variance” on some arbitrary data set of a ridiculously censored population sample. It’s not that I don’t think studying human behavior is interesting; it is one of the most interesting subjects there is. It’s just that psychological researchers are a bunch of doofuses.

Data is not the new oil: a call for a Butlerian Jihad against technocrat data ding dongs

Posted in econo-blasphemy, machine learning, Progress by Scott Locklin on November 5, 2020

I tire of the dialog on “big data” and “AI.” AI is an actual subject, but as used in marketing and press releases and in the babbling by ideologues and think tank dipshits, the term is a sort of grandiose malapropism meaning “statistics and machine learning.” As far as I can tell “big data” just means the data at one point lived in something other than a spreadsheet.

“BigDataAI” ideology is a continuation of the program of the technocratic managerial “elite.” To those of you who are unfamiliar with the work of James Burnham, there is a social class of technocratic “experts” who have largely taken over the workings of society in the West; a process which took place in the first half of the 20th century. While there have always been bureaucrats in civilized societies, the ones since around the time of Herbert Hoover have aped “scientific” solutions even where no such thing is actually possible. This social class of bureaucrats has had some mild successes; the creation of the American highway system, public health initiatives against trichinosis, US WW-2 production. But they have mostly discredited themselves for decades: aka the shitty roads in America, the unaffordable housing in major urban centers, a hundred million fat diabetics, deindustrialization because muh free market reasons, the covidiocy and most recently, the failure of every noteworthy technocrat in the world’s superpower to predict election outcomes and even its ability to honestly count its votes. Similar social classes interested in central planning also failed spectacularly in the Soviet Union, and led to the cultural revolution in China. There are reasons both obvious and deep as to why these social classes have failed.

The obvious reason is that mandarinates are inherently prone to corruption when there are no consequences for their failures. Bureaucrats are wielders of power and have the extreme privilege of collecting a pension on the public expense. Various successful cultures had different ways of keeping them honest; the Prussians and pre-Soviet Russian bureaucrats recruited from honor cultures. Classical China and the early Soviets did it via fear. The Soviet Union actually worked pretty well when the guys from Gosplan could be sent to the Gulag for their failings (or because Stalin didn’t like their neckties -keeps them on their toes). It progressively fell apart as it grew more civilized; by the 1980s, nobody was afraid of the late night knock on the door, and the Soviet system fell apart when the US faked like it was going to build ridiculous space battleships. The rise of China has largely been the story of bureaucratic reforms by Deng where accountability (and vigorous punishment for malefactors) were the order of the day. Singapore makes bureaucrats meet regularly with their constituents; seems reasonable -don’t know why every society doesn’t make this a requirement. It is beyond question that the American equivalent of the Gosplan mandarinate is almost unimaginably corrupt at this point, and the country is falling apart as a result.

While it gives policy-makers a sense of agency having a data project, consider that there isn’t a single large scale data project beyond the search engine that has improved the lives of human beings. Mind you, the actual civilizational utility of the search engine is highly questionable. What improvement in human living standards has come of the advent of google in the last 20 years? The only valuable content on the internet is stuff made by human beings. Google effectively steals or destroys most of the revenue of content creators who made the stuff worth looking at in the first place. Otherwise, library science worked just fine without blue haired Mountain View dipshits running SVD on a link graph. INSPEC (more or less; dmoz for research) is 120 years old and is still vastly better for research than google scholar. Science made more progress back then, between 1898 and 2005 or so when google more or less replaced it: and the news wasn’t socially toxic clickfarming idiocy back when the CIA censored the news instead of google komissars with facial piercings. These days google even sucks at being google; I generally have more luck with runaroo or just going directly to things on internet archive.

If “AIBigData” were so wonderful, you’d see its salutary effects everywhere. Instead, a visit to the center of these ideas, San Francisco, is a visit to a real life dystopia. There are thousands of data projects which have made life obviously worse for people. Pretty much all of nutrition and public health research post discovery of vitamins, and statisticians telling people not to drink toilet water is worthless or actively harmful (look at all those fat people waddling around). Most biomedical research is false, and most commonly prescribed drugs are snake oil or worse. Various “pre-crime” models used to justify setting bail or prison sentences are an abomination. The advertising surveillance hellscape we’ve created for ourselves is both aesthetically awful and a gigantic waste of time. The intelligence surveillance hellscape we’ve created mostly keeps its crimes secret, and does nothing obviously helpful. Annoying advertising invading every empty space; I don’t want to watch ads to pump gas or get money from my ATM machine. Show me something good these dorks have done for us; I’m not seeing it. Most of it is moronic overfitting to noise, evil or both.

It’s less obvious but can’t be stated often enough: often “there is no data in your data.” The technocracy’s mathematical tools boil down to versions of the t-test being applied to poorly sampled and/or heteroskedastic data where they may not be meaningful. The hypothesis under test may not have a meaningful null no matter how much data you collect. When they talk about “AI” I think it’s mostly aspirational; a way out of heteroskedasticity and actual randomness. It’s not; there are no “AI” t-tests in common use by these knuckleheads, and if there were, the upshot wouldn’t look that much different from 1970s era stats results. When they talk about big data, they don’t talk about $\frac{1}{\sqrt{n}}$, or issues like ROC curves and bias-variance tradeoff. They certainly never talk about data which is heteroskedastic or simply random, which is most of it.
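For anyone who hasn't stared at the $\frac{1}{\sqrt{n}}$ term lately, here is a rough sketch of what it implies for the best possible case: independent, identically distributed Gaussian noise with a fixed variance, which is exactly what real data mostly isn't. The function names, seed and sample sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Textbook standard error of the mean for n i.i.d. draws with std dev sigma."""
    return sigma / math.sqrt(n)

def empirical_spread_of_mean(n, trials, rng, sigma=1.0):
    """Std dev of the sample mean, estimated by repeating the experiment `trials` times."""
    means = []
    for _ in range(trials):
        draws = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(sum(draws) / n)
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed so the run is repeatable
small = empirical_spread_of_mean(25, 200, rng)
large = empirical_spread_of_mean(2500, 200, rng)

print("theory:   ", standard_error(1.0, 25), standard_error(1.0, 2500))
print("simulated:", round(small, 4), round(large, 4))
```

A hundred times more data buys roughly a tenfold reduction in error, and that is when the data is as well behaved as data gets. With heteroskedastic or badly sampled data even this modest bargain is not guaranteed.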

In reality, data collection is mostly useless. In intelligence work, in marketing, political work: most of it is completely useless, and collecting it and acting on it is a sort of cargo cult for DBAs, cloud computing saleslizards, technocratic managerial nerds, economists, Nate Silver and other such human refuse. Once in a while it pays off. More often, the technocrat will take credit when things go his way and make complicated excuses when they don’t; just look at Nate Silver’s career for example; a clown with a magic 8-ball. There’s an entire social class of “muh science” nerds who think it a sort of moral imperative to collect and act on data even if it is obviously useless. The very concept that their KPIs and databases might be filled with the sheerest gorp …. or that you might not be able to achieve marketing uplift no matter what you do… doesn’t compute for some people.

Technocratic data people are mostly parasitic vermin and their extermination, while it would cut into my P/L, would probably be good for society. At the very least we should make their salaries proportional to (1 - Brier score); that will require them to put error bars on their predictions, reward the competent and bankrupt the useless. Really though, they should all be sent to Idaho to pick potatoes. Or ….
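The (1 - Brier) pay rule is easy enough to sketch. This assumes the usual binary-outcome Brier score (mean squared error between forecast probabilities and 0/1 outcomes, so it runs from 0 for perfect to 1 for confidently wrong every time); `scaled_salary` and the base salary figure are hypothetical names and numbers for illustration, not anybody's actual compensation scheme.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally many forecasts and outcomes, at least one")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def scaled_salary(base_salary, forecasts, outcomes):
    """Pay proportional to (1 - Brier): full pay for a perfect record, nothing for the worst."""
    return base_salary * (1.0 - brier_score(forecasts, outcomes))

outcomes = [1, 0, 1, 1]           # what actually happened
oracle = [1.0, 0.0, 1.0, 1.0]     # always right, always certain
hedger = [0.5, 0.5, 0.5, 0.5]     # the magic 8-ball that says "maybe"
blowhard = [0.0, 1.0, 0.0, 0.0]   # always certain, always wrong

for name, forecasts in [("oracle", oracle), ("hedger", hedger), ("blowhard", blowhard)]:
    print(name, scaled_salary(100_000, forecasts, outcomes))
```

Note that the permanent 50/50 hedger still collects three quarters of base pay under this rule, so in practice you would probably want to measure against a baseline forecaster rather than against zero.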