Locklin on science

Obvious and possible software innovations nobody does

Posted in tools by Scott Locklin on April 1, 2021

There are a number of things that people theoretically know how to do, but which don’t get done because of how software gets made. Some of these are almost forgotten, but there is at least an example of each of them in existence.

  1. Automated FFI parsers. In 2021 I should be able to point any interpreted language at a C include file and have all the functions described in it turned into reasonably safe FFIed function calls, complete with autogenerated documentation (there’s a hand-written sketch of what the generated output might look like after this list). For example, if I want javascript calls to libsodium, I shouldn’t have to write anything; javascript runtimes already know how to talk to C APIs. I’m not asking for runtimes to talk to each other; you can keep up the insipid RPC-serialization conga dance for that. I’m just asking for a technology that encapsulates C (and Fortran and… maybe C++) function calls and makes them accessible to other runtimes without actually doing any work. Of course parsers that do useful things are hard; people would rather write new serialization protocols. There will always be exceptions where such things don’t work, but you should be able to do 95% of the work using metaprogramming. Crap that runs on the JVM: same story. Not only could you technically parse .h files and turn them into JNI, you should be able to have all your hooks into Clojure or Scala or whatever without writing anything. Clojure at least seems well equipped to do it, but I’m pretty sure this hasn’t happened yet. You see pieces of this idea here and there, but like everything else about modernity, they suck.
  2. While I’m talking about FFIs to high level languages, how about a VM that recognizes that it is not a unique snowflake, and that sometimes you have to call a function which may allocate memory outside its stack or something similarly routine but insane. Most VM designs I’ve seen are basically just student exercises; why not assume the outside world exists and has useful things to say? I think Racket has some good ideas in this domain, but I’m pretty sure it could be done better and there should be a higher standard.
  3. Cloud providers should admit they’re basically mainframes and write an operating system instead of the ad-hoc collection of horse shit they foist on developers. Imagine if EC2 were as clean as, I dunno, z/OS, whose lineage has more or less been around since the 1960s. That would be pretty cool. I could read a single book instead of 100 books on all the myriad tools and services and frameworks offered by Oligarch Bezos. He would be hailed as a Jobs-like technical innovator if he had some of his slaves do this, and he would be remembered with gratitude, rather than as the sperdo who dumped his wife for sexorz with lip filler Cthulhu. There’s no excuse for this from an engineering perspective; Bezos was smart enough to know he was going to do timesharing, and he was smart enough to have constrained the spaghetti into something resembling an OS. Same story with all the other cloud services. Really, they should all run like Heroku and you’d never notice they were there. You could also draw flowcharts for most of this shit and replace devops with something that looks like labview. Nobody will do that either, as innovation in core software engineering, or even learning from the past in core software engineering, is basically dead.
  4. Front ends could be drag and drop native GUIs instead of electron apps. There are still examples of this around, but it seems to be a dying paradigm. It’s fascinating to me that people find it easier to write a pile of React and HTML on top of electron rather than dragging and dropping native widgets in a framework like we did in the old days. This was literally possible on a 286 running DOS; it worked great, looked great, and had fewer problems. You know why it doesn’t get done? Because doing it is kind of hard, and electron apps are “easy” in that there are tons of cheap, fungible engineers with those skills. In general native GUI frameworks are shit, and they almost never include a GUI to develop them in. Even making something not as shitty as electron, maybe something that took 10mb instead of 500mb and didn’t gobble up all the memory on your system, would be amazing. This is completely possible. People used to make GUI frameworks which did more than electron apps, looked better, and fit in the tens-of-kilobytes range.
  5. Compilers and interpreters should learn how modern computers work. Pretty much all compilers and interpreters think computers are a PDP-11 stack machine. There are consequences to this everyone knows about: security is fairly execrable. There are other consequences though! For example, the fact that memory is godawful slow and that there are multiple cache speeds is a very serious performance problem unless you’re dealing with trivial amounts of memory (there’s a short demonstration of this after the list). There are no compilers which can help you with this, unless you count meta-compilers on limited problems like ATLAS-BLAS or FFTW. There are a few interpreted languages whose designers are aware of these facts and at least don’t fight the OS over them, or attempt to insist they’re really running on a PDP-11.
  6. Operating systems don’t have to look like your crazy hoarder aunt’s house. I know it’s hard to believe, but in my lifetime there were excellent multitasking operating systems with superior GUIs, networking, development toolchains, RTOS subsystems, and cryptography that made the NSA nervous, and they all fit on a 70mb tape drive, and they would support something like 20 people checking their email and compiling Fortran for general relativity calculations from emacs terms. Meanwhile, my phone needs a constant diet of gigabyte upgrades to continue functioning reliably as a fucking telephone; telephones theoretically don’t even need a single transistor. Even my linux machines are ridiculously bloated and seem to require daily updates and patches. Why does shit like DPDK exist? Because your OS is stuck in the 1990s when ethernet was 10mbps. There’s zero reason or excuse for this, other than that modern programmers are like your crazy hoarder aunt because storage is cheap and competent coder time is expensive. Clean OS design has a lot of follow-on benefits, such as rarer patching, higher security, and lower maintenance in general. I have 4 objects in my house that require constant OS upgrades (used to be 5, but my macbook committed suicide after an “OS upgrade” so I now use it as a paperweight), not including my TV or my car; make a cleaner OS and life actually gets better, instead of everyone being a sort of shitty IT slave to keep their refrigerator and telephone running. Instead of a nice OS, current year innovation is the open source “code of conduct”: apparently hoping you’ll attract enough people mentally ill enough to work for free, but sane enough to do useful work; arguably a narrow demographic.
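
To make item 1 concrete, here is roughly what the generated bindings could look like for a single libsodium call. This is a hand-written Python/ctypes sketch, assuming libsodium is installed; the whole point of the item is that a parser should be able to emit this boilerplate, plus a docstring, straight from the .h file:

```python
import ctypes, ctypes.util

# Hand-written stand-in for what an automated FFI generator might emit.
sodium = ctypes.CDLL(ctypes.util.find_library("sodium"))
sodium.sodium_init()

# int crypto_generichash(unsigned char *out, size_t outlen,
#                        const unsigned char *in, unsigned long long inlen,
#                        const unsigned char *key, size_t keylen);
sodium.crypto_generichash.argtypes = [
    ctypes.c_char_p, ctypes.c_size_t,
    ctypes.c_char_p, ctypes.c_ulonglong,
    ctypes.c_char_p, ctypes.c_size_t,
]
sodium.crypto_generichash.restype = ctypes.c_int

def generichash(msg: bytes, outlen: int = 32) -> bytes:
    """BLAKE2b hash of msg; this docstring is what the generator would scrape."""
    out = ctypes.create_string_buffer(outlen)
    if sodium.crypto_generichash(out, outlen, msg, len(msg), None, 0) != 0:
        raise RuntimeError("crypto_generichash failed")
    return out.raw

print(generichash(b"hello").hex())
```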
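
And a minimal demonstration of the memory-hierarchy point in item 5, in Python/numpy: two sums over the same number of doubles, differing only in stride. Nothing in the language or the compiler surfaces the difference; the hardware just quietly makes one of them several times slower:

```python
import numpy as np
import timeit

n = 1 << 20                    # one million doubles per sum
x = np.random.rand(16 * n)     # ~128 MB backing array, far bigger than cache

contig = x[:n]      # contiguous: 8 doubles per 64-byte cache line
strided = x[::16]   # 128-byte stride: every load touches a fresh cache line

print("contiguous:", timeit.timeit(contig.sum, number=100))
print("strided:   ", timeit.timeit(strided.sum, number=100))
# Same million additions either way; on typical hardware the strided
# version is several times slower, purely from cache misses.
```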

The funny thing is, the same people who absolutely insist that the Church-Turing thesis means muh computer is an all-powerful simulator of everything, or repeat the fantasy that AI will replace everyone’s jobs, will come up with elaborate reasons why the things listed above are too hard to achieve in the corporeal world, despite most of them being solved problems from the VLSI era of computer engineering. The reality is they’re all quite possible, but nobody makes money doing them. Engineers are a defeated tribe; it’s cheaper to hire an “AI” (Alien or Immigrant) slave to write the terraform or electron front end than to pay clever engineers well enough to build themselves useful tooling to make them more productive and the world a better place. Consumers will suck it up and buy more memory, and put up with planned obsolescence, keeping the hardware industry in business. Computers aren’t for making your life easier; they’re for surveillance and marketing, and, for the manufacturers, a consumer good they hope you buy lots of add-ons and upgrades for, and which wears out as soon as possible.

31 Responses

  1. Alex said, on April 1, 2021 at 9:56 pm

    At the risk of being banned for shilling, I will shamelessly shill for Urbit, which fixes point number 6.
    I think most of the problems you enumerate begin with how bloated and centralized everything became in the course of this last decade.

    A massive number of bad decisions were made in the name of “Cloud computing”.
    Another issue was outsourcing engineering efforts to Bangalore, which got us a lot of mediocre shit and subpar engineers writing mission-critical code. If people had been less mediocre when it came to writing software, we wouldn’t be where we are: with a Cambrian explosion of languages that get differentiated by who has the wokest CoC.

    While I’m not a fan of Curtis Yarvin, I think one of the best things we could do is burn it all down and start all over. Though this idea sounds rather bizarre, in the long run I can’t see another solution.

    • Walt said, on April 7, 2021 at 4:08 pm

      A solution is proposed here. Maybe RISC-V will get us there. IDK.

  2. Ben Gimpert said, on April 1, 2021 at 10:18 pm

    For decades, all I have wanted is “nice with money.” So I can run “$ nice PID 120.00” and spend 120 dollars to make PID complete sooner. Yes this is impossible in general, but implementing adult logging & a few heuristics could get us most of the way there.

    • Scott Locklin said, on April 2, 2021 at 8:55 am

      I spent $5k on a threadripper. I think if I did that a lot, I’d buy two.

      If oligarch Bezos cared about humble numbers merchants like us, he’d have written an OS with #pragma mapfor support for big chunks and little chunks. Of course he don’t care; interferes with selling underpants on his time share machines.

  3. anonymous said, on April 2, 2021 at 2:53 am

    I’m actually using a computer I built in my first year of graduate school so … 8 years old at this point. I do most of my work in Linux Mint. It hasn’t “slowed down” at all, really, even though Mint==Ubuntu and bloats a bit. I haven’t really done dist-upgrades either except once.

    There were one or two lightweight linux variants I installed on old laptops my sister dumped. They’re reasonably fast, but the laptops have abused beat-up keyboards and dead batteries, so I only use them for display on some projects.

    I was actually thinking a bit about how LAPACK and the fortran idiom worked internally (all memory allocation done upfront, and a working memory buffer passed down the sequence of function calls), and came up with a similar style of coding for this C library I wrote to do large-integer math (I know, not innovative, but this one is mine and I understand it now). I actually had to write the thing 3 times and go back to read the Handbook of Applied Cryptography to dump my naive gradeschool-operation algorithms. Now I have something that runs faster than the compiled libraries underlying python’s large-integer math by about a factor of 2-10. Which is nice, because some operations are O(N^3)-ish (O(N^2.5)-ish if you use HAC).

    Perhaps it’s a reinvented wheel, but it’s *my* wheel, and now I can apply the patterns to other things. (And yes, I’ve already been yelled at for daring to presume I’m smart enough to code cryptography primitives. They’re working according to my functional tests, so 😛 )
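
    The allocate-up-front idiom transplants to other languages too. A toy Python/numpy sketch of the pattern (hypothetical step() function, purely for illustration): the caller allocates workspace once and threads it through every call, so the hot loop never touches the allocator.

    ```python
    import numpy as np

    def step(a, b, work, out):
        # One iteration: all temporaries live in caller-provided buffers.
        np.matmul(a, b, out=work)   # work <- a @ b, no fresh allocation
        np.add(out, work, out=out)  # out <- out + work, in place

    n = 512
    a, b = np.random.rand(n, n), np.random.rand(n, n)
    work = np.empty((n, n))   # workspace allocated exactly once, up front
    out = np.zeros((n, n))
    for _ in range(100):      # the hot loop never calls malloc
        step(a, b, work, out)
    ```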

  4. anonymous said, on April 2, 2021 at 3:16 am

    Regarding #1, try SWIG.

    source: http://swig.org/

    • Scott Locklin said, on April 2, 2021 at 8:47 am

      Didn’t know it was still around. Wonder if it works better these days.

      • ahgamut said, on April 3, 2021 at 7:53 pm

        I’ve found pybind11 to be pretty useful for wrapping C++ APIs to Python. It’s the Python analogue of Rcpp: it does the necessary template metaprogramming magic, and while I’ve occasionally gotten lost debugging some memory-related stuff (it is still C++ after all), it’s straightforward and convenient.

  5. George W. said, on April 2, 2021 at 3:53 am

    > encapsulates C (and Fortran and …. maybe C++) function calls and makes them accessible to other runtimes without actually doing any work.

    I could see this being useful in smaller projects but…

    Aside from being difficult and having no economic incentive, it seems like there would be some feasibility issues in creating an FFI parser. A pesky “engineering detail,” so to speak.

    Each language would need to maintain an implementation of the FFI parser, or some protocol to use it. A parser that worked *most* of the time could make for more painful debugging. Maybe this isn’t a big deal in smaller projects where the alternative is also more debugging. [Just speculatively shitposting, I don’t know anything about SE…]

    Any of these ideas are better than the shit that engineers work on nowadays.

    • Raul Miller said, on April 2, 2021 at 12:21 pm

      Sure, and conceptually each language already has a parser implemented for it, though there are definitely some “baggage attached” issues with using that work.

      The problem, always, seems to be that automation requires non-automated effort. People want to automate automation, but we probably should be automating labor-intensive critical tasks that involve actual real world problems.

      Like, for example: we have thousands of programming languages, most of which probably started as student projects in some college course on parsers. But we have an extreme shortage of trash dump recycling equipment, let alone any sort of meaningful theoretical framework characterizing how to build such things.

      (I am recapping Locklin here: “Most VM designs I’ve seen are basically just student exercises; why not assume the outside world exists and has useful things to say?”)

      • Scott Locklin said, on April 2, 2021 at 1:57 pm

        BTW wasn’t there a tool for J which accomplished some of this with the old way of doing FFI at least? One of my uses for J is to screw around with libraries that look useful; would be amazing to have something that does it even more quickly.

        • Raul Miller said, on April 2, 2021 at 2:59 pm

          Sure, J has a fairly straightforward way of using FFIs.

          However, J punts on the issue of extracting type signatures from the foreign language, and asks the programmer to provide those type signatures (and to identify the calling convention if you don’t want the default; that’s mostly just a Windows issue, where __stdcall vs __cdecl is a thing).
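
          (For comparison, Python’s ctypes makes the same bargain: the header is never consulted, and the programmer declares the signature by hand.)

          ```python
          import ctypes, ctypes.util

          # The signature lives in the programmer's head, not in a parsed header.
          libm = ctypes.CDLL(ctypes.util.find_library("m"))
          libm.cos.argtypes = [ctypes.c_double]
          libm.cos.restype = ctypes.c_double
          print(libm.cos(0.0))  # 1.0
          ```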

          • Scott Locklin said, on April 2, 2021 at 8:39 pm

            I vaguely recall that 6.x J had something which helped do the type signatures. I was yakking about the LAPACK FFI with Bill Lam at one point, and he came up with api/lapacke, which I think he generated with that code (though maybe it was just copy pasta).

            • Raul Miller said, on June 20, 2021 at 2:37 pm

              Probably worth leaving a note here that C headers do not typically include complete type signatures for the libraries they represent. That information might be represented in the libraries themselves, but headers only provide the definitions required to reference the libraries from C code. In the general case (for example, ‘printf’) C does not define a systematic type system, and/or deliberately leaves a variety of issues to be defined by the implementation (which is part of why some people hate the language, and part of why it’s relatively easy to port to new hardware and/or build operating systems with).

              Mainframes were clean because they also strictly constrained the hardware.

              Meanwhile, an issue with types is that, while they started as a mechanism for characterizing the use of a particular spot in memory or a parameter, they have also evolved (devolved?) into a way of characterizing the use of numbers. And keeping things simple enough for that to be automated winds up with “community efforts” which first create barriers between different uses of the same numbers, and later construct mechanisms for leaking enough information past those barriers.

              And then there’s the mechanics of popularity and the temporary nature of fads.

              Which perhaps gets us into the meat of this April 1 post…

  6. Luke McCarthy said, on April 2, 2021 at 1:34 pm

    The Zig programming language has the ability to seamlessly call C by just including a header file. Of course, it achieves this by having the entirety of LLVM’s clang embedded within it. There are some shortcomings due to the limited expressiveness of C’s type system; you can get better results writing wrappers by hand (for example: is T* a pointer to a single T, a fixed-size array, a zero-terminated array, or an array whose size is specified by a different argument? Is it nullable? There’s no way to express this in C).
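
    (To illustrate: here are the defensible readings of one and the same C parameter when binding by hand, shown in Python’s ctypes. The header alone cannot tell you which is meant, and the counted-array case is literally indistinguishable from the plain pointer.)

    ```python
    import ctypes

    # Plausible Python declarations for a single C "unsigned char *" argument:
    as_string  = ctypes.c_char_p                  # zero-terminated string
    as_pointer = ctypes.POINTER(ctypes.c_ubyte)   # pointer to one T... or many
    as_fixed   = ctypes.c_ubyte * 32              # fixed-size 32-byte array
    as_counted = ctypes.POINTER(ctypes.c_ubyte)   # length hides in another argument
    ```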

    • Scott Locklin said, on April 2, 2021 at 1:51 pm

      Zig looked pretty cool. Doesn’t scratch any of my itches though.
      I insist you should be able to figure this stuff out from the code, even though I know it’s a hard problem. We have ding dongs claiming you’re going to replace a goddamned lawyer, doctor or policeman using “AI”; surely they can build a parser which deals with pointer problems. If nothing else: generate the wrapper, then call up an IDE that says “whadayawant here on this vague pointer.”

      • Loup Vaillant said, on June 20, 2021 at 10:34 am

        There are two problems when automating an FFI. The first is parsing C despite macros, but that’s the easy one. The hard one is the impedance mismatch.

        Different languages have different capabilities, different sympathies, different idioms. Imagine you were to translate `crypto_key_exchange()` from my Monocypher: https://monocypher.org/manual/key_exchange

        There are 3 arguments, all pointers to bytes. One is not const, so we could guess it’s the output parameter. I have written the pointers in array form so we could guess they should point to fixed size buffers. But how would you know that the output parameter isn’t also an input? How would you know that those three buffers represent 3 kinds of keys, that should never be mixed?

        I could automate *part* of the FFI, but I don’t want a C flavoured OCaml, or a C flavoured Python, or a C flavoured Lua. Once I have the low level functions worked out, I need to write a high-level, idiomatic interface. Here’s another example: https://monocypher.org/manual/hash How would I port the incremental interface to a language that supports streams? That would certainly look different from explicitly calling `update()` on every chunk of data.

        I don’t believe FFI automation is such a worthy problem to solve. Instead, I’d rather concentrate on making interfaces that are easy to call from other languages. No clever macros, avoid exposing complicated state machines, give explicit hooks for initialisation and destruction… Also, narrow your API down to C, or something similarly weak and constrained. Rich languages are awesome when they keep to themselves, but I don’t even want to try to have them talk to each other directly. We need some lowest common denominator first.
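
        (A sketch of that two-layer split in Python’s ctypes, assuming a Monocypher 3.x shared library with the void-returning crypto_key_exchange signature. The low-level half is what automation could plausibly emit; the high-level half encodes facts no parser can see in the types:)

        ```python
        import ctypes

        # Low-level layer: mechanically derivable from the header.
        m = ctypes.CDLL("libmonocypher.so")  # assumed to be on the loader path
        m.crypto_key_exchange.argtypes = [ctypes.c_char_p] * 3
        m.crypto_key_exchange.restype = None

        # High-level layer: hand-written, because the 32-byte sizes, which key
        # is which, and "output is write-only" live in the docs, not the types.
        def key_exchange(your_secret_key: bytes, their_public_key: bytes) -> bytes:
            if len(your_secret_key) != 32 or len(their_public_key) != 32:
                raise ValueError("keys must be 32 bytes")
            shared = ctypes.create_string_buffer(32)
            m.crypto_key_exchange(shared, your_secret_key, their_public_key)
            return shared.raw
        ```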

        • Scott Locklin said, on June 20, 2021 at 1:00 pm

          Well, I certainly don’t think FFI automation is easy, though at this point it really should be. I just think it would get done if we actually believed in progress and valued programmer time; it’s an obvious and possible thing to do.

          I also think it’s ridiculous that people (present company presumably excluded) talk about AI singularities or quantum apocalypto or software eating everything, but will come up with dozens of reasons some nice-for-software-engineers thing like FFI automation or sub 100gb operating systems are impossible.

  7. Igor Bukanov said, on April 2, 2021 at 1:47 pm

    It is not surprising that drag and drop GUI builders have sort of disappeared. They are still available, but are not used much except maybe for initial prototyping.

    It was easy when one could assume a given screen resolution and size. Then assembling a few screens that looked nice using drag and drop was straightforward. But the companies making those GUI builders never figured out how to keep things looking reasonable across a vast set of screen sizes and resolutions, with the need to rearrange the GUI on screen rotation, all while supporting touch interfaces for users with big fingers. There are various layout managers/controllers in different languages and frameworks, but none are fully automated; one needs a non-trivial amount of code to deal with all the corner cases. And once one needs programming, the whole drag-and-drop thing becomes a nuisance.

    Then, even for cases like a GUI for a custom business application that runs totally locally on a computer with a known screen size (if such a thing still exists), it is just easier to write PHP sitting on localhost, backed by an SQLite database, writing straightforward HTML, than to assemble a GUI with drag-and-drop and then figure out how to glue it to the business logic.

    • Scott Locklin said, on April 2, 2021 at 2:06 pm

      I’m sure it’s a hard problem, but it does get solved in chromium compositor guts somehow! Sort of anyway!
      There was a dialect of Rebol called Red which claimed to be able to do cross platform stuff at least, and which looked capable of the drag and drop. Of course it was all thin wrappers on wx and the community were all turkeys, so there was that.

      • Igor Bukanov said, on April 2, 2021 at 4:37 pm

        It is solved in Chromium because sites are either a simple mixture of text, minimal navigation controls, and simple UI, or else they use rather complex CSS rules or HTML tables, specific to a particular UI screen, to reposition everything. Screen rotation is especially hard.

        Drag and drop works nicely when everything is fixed: there is a good notion of where to drop things, with quick shortcuts for alignment and precise sizing. The moment one needs things to reposition themselves, one has to add, for example, quite a few spring-like connectors between buttons, and annotations like “keep these buttons together on resize, but scale that other one”, etc. And adding that with drag and drop is not faster than just writing the code.
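
        (In code, those “springs” usually come out as layout weights. A minimal Tkinter sketch: weight=1 marks the stretchy rows/columns, weight=0 stays fixed, and none of this is visible on a drag-and-drop canvas.)

        ```python
        import tkinter as tk

        root = tk.Tk()
        root.columnconfigure(0, weight=0)  # label column: fixed width
        root.columnconfigure(1, weight=1)  # entry column: the "spring"
        root.rowconfigure(1, weight=1)     # text row absorbs extra height

        tk.Label(root, text="Name:").grid(row=0, column=0, sticky="w")
        tk.Entry(root).grid(row=0, column=1, sticky="ew")  # stretch sideways
        tk.Text(root).grid(row=1, column=0, columnspan=2, sticky="nsew")
        tk.Button(root, text="OK").grid(row=2, column=1, sticky="e")

        root.mainloop()
        ```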

  8. zardoz said, on April 2, 2021 at 8:17 pm

    > 1. Automated FFI parsers

    These do exist, and have for a while. See http://swig.org/. Personally, I wouldn’t use this, since it’s an annoying dependency to have, and I wouldn’t trust it to get everything right. But if you want it, it’s there…

    > 2. … how about a VM that recognizes that it is not a unique snowflake, and that sometimes you have to call a function which may allocate memory outside its stack or something similarly routine but insane … ?

    It’s often ideological. For example, Sun didn’t bother writing something better than JNI for Java because “pure Java” programs were best. You don’t want to be “impure,” do you?

    But there are some genuine technological reasons why interoperability is hard. For example, Golang has a different view of what a thread is than C does, so you will necessarily lose some efficiency calling one from the other. Java wants to be able to move things in memory at arbitrary times, so if you try to pass a Java object to C, you need to somehow lock it in place for the duration of the call.

    > 3. Cloud providers should admit they’re basically mainframes and write an operating system instead of the ad-hoc collection of horse shit they foist on developers…

    Google tried this with Google App Engine. Nobody used it because nobody wanted to rewrite their shit.

    Anyway, Bezos makes money from AWS no matter what OS you use, so why should he pick sides? Even Microsoft gave up on forcing their OS on everyone in the cloud (mostly). And now they’re making bank.

    > 4. Front ends could be drag and drop native GUIs instead of electron apps.

    Sure, but then the company would have to hire more expensive C or C++ developers rather than Javascript developers. And they’d still have to hire the Javascript developers to do the web version of the app, which everyone wants these days. Why should they do this when people are willing to put up with Electron? Slack made a zillion dollars with an electron app. Case closed for most business types.

    > 5. Compilers and interpreters should learn how modern computers work.

    It wouldn’t matter, though, because modern computers hide their guts from the operating system. Intel CPUs aren’t going to check with you before deciding how many instructions to execute in parallel. Your flash drive exposes an old-fashioned block interface that doesn’t say anything about its write-ahead log or how much onboard memory it has.

    > 6. Operating systems don’t have to look like your crazy hoarder aunt’s house.

    To be fair, after a few decades of incremental changes, ANYTHING would look like your crazy hoarder aunt’s house. There are a few people still living the dream and trying to re-invent the operating system. The Fuchsia project is probably the most serious one. Too bad it’s controlled by the pink-haired freaks at Google.

    Alternately, you could use TempleOS, if you want something designed by a different flavor of mentally ill supervillain.

    • zardoz said, on April 2, 2021 at 9:34 pm

      Also, on the topic of supervillains writing operating systems, Curtis Yarvin (yes, THAT Yarvin) wrote Urbit, which seems to include a re-imagining of the operating system. Among other things. https://en.wikipedia.org/wiki/Urbit

  9. Joel said, on April 4, 2021 at 8:29 pm

    Hi Scott, would you write about Palantir sometime?

    Thanks

  10. Lin Pengcheng said, on June 20, 2021 at 8:26 am

    You maybe need try “The Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model”: https://github.com/linpengcheng

  11. Terry Carmen said, on June 20, 2021 at 2:44 pm

    You’ve seen behind the curtain. It’s time to retire and do something you actually enjoy.

    For you and those like us, software will be a disgusting horror from now on.

  12. DaveFCook said, on June 20, 2021 at 4:00 pm

    Thank god somebody brought all this up. I agree with all of Scott’s points. The suits have taken over technology, and Facebook users think they’re tech people. We live in non-interesting times.

  13. Randall said, on June 21, 2021 at 6:07 am

    The modern z/OS thing is kinda fun. I don’t think this is what you were going for, but to me old-school Google App Engine seems the closest thing to a recent try: its APIs are _the_ APIs for files and so on (your target was not just a weird Linux box), you’re paying for capacity but not machines per se, the services you call out to (cache, datastore, etc.) are also auto-scaling-y, and there’s enough there to build some kind of app especially after they added background tasks, etc.

    Lambda/Fargate seem to be the spiritual successors, both variations on “your target is a Linux container” but with finer-grained scaling and billing and, in Lambda’s case, language runtimes. Even using other AWS APIs isn’t easier/cleaner inside Lambda than from anywhere else, and the other services you might call out to are not as auto-scaling-y as in the original GAE (e.g. you usually pay for cache or DBs by the node).

    Maybe there are deep reasons the GAE/super-Heroku model doesn’t pan out. Maybe users are too addicted to programming with bare Linux as the target, or if users pay for something besides machine-hours the price has to be set with a bunch of worst-case assumptions. I wouldn’t mind seeing more folks try. If you do get it right, billing for higher-level services rather than machine time leaves you more room to invent ways to deliver the same amount of service more efficiently and keep the difference, or improve the service and attract/keep people that way.

