Locklin on science

Open problems in Robotics

Posted in brainz, Open problems by Scott Locklin on July 29, 2020

Robotics is one of those things the business funny papers regularly wonder about; it seems like consumer robotics is a revolutionary trillion dollar market which is perpetually 20 years away, more or less like nuclear fusion.

I had contemplated fiddling with robotics in hopes of building something that would do a useful science-fictiony thing, like go fetch me a beer from the refrigerator. Seemed like a nice way of fucking around with math and the machine shop, and ending up with something cool and useful to fiddle with. To do this, my beer fetching robot would have to navigate my potentially cluttered apartment to the refrigerator, open the door, look for the arbitrarily shaped/sized beer bottle amidst the ketchup bottles, jars of herring, broccoli and other such irrelevant objects, move things out of the way, grasp the bottle and return to me. After conversing with a world-renowned expert in autonomous vehicles, a subset of robotics, I was informed that this isn’t really possible. All the actions I described above are open problems. Sure, you could do some ridiculous workaround that makes it look like autonomous behavior. I could also train a monkey or a dog to do the same thing, or get up and get the damn beer myself.

There really aren’t any lists of open problems in robotics, I am assuming because it would be a depressingly long litany. I figured I would assemble one; one which I assume will be gratuitously incomplete and occasionally wrong, but which makes up for all that by actually existing. Like my list of open problems in physics and astronomy, I could very well be wrong about some of these, or behind the times, since my expertise consists of google and 5-10 year old conversations with a cool dude between deadlifts, but it seems worth doing.

  1. Motion planning is an actual area of research, with its own journals, schools of thought, experts and sets of open problems. Things like “how do I get my robot from point A to point B without falling into a canyon, getting stuck, or generally failing to deal with obstacles” are not solved problems. Even things like a model of where the robot is with respect to its surroundings: totally an open problem. How to know where your manipulator is in space, and how to get it somewhere else: open problem. Obviously beer fetching robots need to do all kinds of motion planning. Any potential solution will be ad-hoc and useless for the general case of, say, fetching a screw from a bin in the machine shop.
  2. Multiaxis singularities: this one blew my mind. Imagine you have a robot arm bolted to the ground. You want to teach the stupid thing to paint a car or something. There are actual singularities possible in the equations of motion; it is more or less an underconstrained problem. I guess there are workarounds for this at this point, but they all have different tradeoffs. It’s as open a problem as motion planning on a macro scale.
  3. Simultaneous Localization and Mapping, SLAM for short. When you enter a room, your brain knows exactly where your body is, and makes a map of the surroundings. Robots have a hard time with this. There are any number of solutions to the problem, but ultimately the most useful one is to make a really good map in advance. Having a vague map, a topological map, or some kind of prior on the environment: these are all completely different problems which seem like they should have a common solution, but don’t. While there are solutions to some problems available, they’re not general, and definitely not turn-key to the point where there is a SLAM module you can buy for your robot. I could program my beer robot to know all about my room, but there will always be new obstacles (a pair of shoes, a book) which aren’t in its model. It needs SLAM to deal.
  4. Lost Robot Problem. Related: if I wake up and my friends have moved my bed to another room, we’ll all have a laugh. Most robots won’t know what to do if they lose track of their location. They need a strategy to deal with this, and the strategies are not general. It’s extremely likely I turn on my beer robot in different positions and locations in the room, and it will have to deal with that. Now imagine I put it somewhere else in the apartment building.
  5. Object manipulation and haptic feedback. Hugely not done yet. The human hand is an amazing thing, and robot manipulators are nowhere near being able to manipulate with haptic feedback or even simply manipulate real world objects based on visual recognition. Even something like picking up a stationary object with a simple graspable plane is a huge unsolved problem people publish on all the time. My beer robot could have a special manipulator designed to grasp a specific kind of beer bottle, or a lot of models of shapes of beer bottles, but if I ask the same robot to fetch me a carrot or a jar of mayo, I’m shit out of luck.
  6. Depth estimation. A sort of subset of object manipulation; you’d figure this would be pretty simple for a robot with binocular vision, or even one with the ability to poke at an object and watch it move. It’s very much an open problem. Depth estimation is a problem for my beer-fetching robot even if the beer is in the same place in the refrigerator every time (the robot won’t be, depending on its trajectory).
  7. Position estimation of moving objects. If you can’t tell how far away an object is, you’re sure going to have a hard time estimating what a moving object is doing. Lt. Data ain’t gonna be playing baseball any time soon. If my beer robot had a human-looking bottle opener, it would need a technology like this.
  8. Affordance discovery: how to predict what an object will do when you interact with it. In my example, the robot would need a model for how objects are likely to behave as it moves them aside while searching my refrigerator for a beer bottle.
  9. Scene understanding: this one should be obvious. We’re just at the point where image recognition is useful: I drove an Audi on the autobahn which could detect and somewhat adhere to the lines on the highway. I’m pretty sure it eventually would have detected the truck stopped in the middle of the road in front of me, but despite the fairly trivial “you’re going to turn into road pizza” if(object_in_front) {apply_brake} level of understanding required, it showed no evidence of being capable of even that much reasoning. Totally open problem. I’ll point out that the humble housefly has no problem understanding the concept of “shit in front of you; avoid,” making robots and Audi brains vastly inferior to the housefly. Even putting the obvious problem aside: imagine your robot is tasked with getting me a beer out of the refrigerator and there is a bottle of ketchup obscuring the beer. The robot will be unable to deal, even with 3-d models of the concepts of beer bottle and ketchup bottle, which would be absurdly complex to program the robot with.
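To give a flavor of problem 1: the textbook version of motion planning, graph search on a known occupancy grid, is easy enough to sketch. This is a toy illustration, not anything a real robot would ship; the open part of the problem is everything it ignores (maps that are wrong, robots that aren't points, obstacles that move):

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 4-connected occupancy grid (1 = obstacle).

    Returns the shortest path from start to goal as a list of (row, col)
    cells, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nbr[0] < rows and 0 <= nbr[1] < cols and grid[nbr[0]][nbr[1]] == 0:
                heapq.heappush(frontier, (cost + 1 + h(nbr), cost + 1, nbr, path + [nbr]))
    return None  # no route: the beer stays in the fridge
```

Note how brittle even this is: the grid has to be known, static, and correct in advance, which is precisely what problems 3 and 4 below say you don't get.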


several of the above problems illustrated



There’s something called the Moravec paradox which I’ve mentioned in the past.

“it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility”

Robotics embodies the Moravec paradox. There’s a sort of corollary to this that people who work in the tiny field of “actual AI” (as opposed to ML ding dongs who got above their station) used to know about. This was before the marketing departments of google and other frauds made objective thought about this impossible. The idea is that intelligence and consciousness arose spontaneously out of biological motion control systems.

I think the idea comes from Roger Sperry, but whatever; it used to be widely known and at least somewhat accepted. Those biological motion control systems exist even on a microscopic level; even unicellular creatures like the paramecium, or primitive animals without real nervous systems like the hydra, are capable of solving problems that we can’t solve even in the general case with the latest NVIDIA supercomputer. While robotics is a noble calling and the roboticists solve devilishly hard problems, animal behavior ought to give a big old hint that they’re not doing it right.



Guys like Rodney Brooks seemed to accept this and built various robots that would learn how to walk using primitive hardware and feedback oriented ideas rather than programmed ideas. There was even a name for this: “Nouvelle AI.” No idea what happened to those ideas; I suppose they were too hard to make progress on, though the early results were impressive looking. Now Dr Brooks has a blog where he opines hilarious things, like that flying cars and “real soon now” autonomous vehicles are right around the corner.

I’ll go out on a limb and say I think current year Rodney Brooks is wrong about autonomous vehicles, but I think 80s Rodney Brooks was probably on the right path. Maybe it was too hard to go down the correct path: that’s often the way. We all know emergent systems are super important in all manner of phenomena, but we have no mathematics or models to deal with them. So we end up with useless horse shit like GPT-3.

It’s probably the case that, at minimum, a genuine “AI” would need to have a physical form and be capable of interacting with its environment. Many of the proposed algorithmic solutions to the problems listed above are NP-hard problems. To me, this implies that crap involving computers such as we use is wrong. We do approximately solve NP-hard problems in other ways all the time; you can do it with soap bubbles, but the design of the “computer” is vastly different from the von Neumann machine: it’s an analog machine where we don’t care about infinite accuracy.
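To make the soap-bubble point concrete, here is a crude sketch of solving an NP-hard problem approximately, by relaxing toward a low-energy configuration rather than exactly enumerating: simulated annealing on a tiny travelling-salesman instance. The cooling schedule and parameters are made up for illustration; this is not a serious solver:

```python
import math
import random

def anneal_tsp(cities, steps=20000, seed=0):
    """Approximate a travelling-salesman tour by simulated annealing.

    Like the soap bubble: we settle into a good-enough configuration
    instead of exactly solving an NP-hard problem.
    """
    rng = random.Random(seed)
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    tour_len = lambda t: sum(dist(cities[t[i]], cities[t[(i + 1) % len(t)]])
                             for i in range(len(t)))
    tour = list(range(len(cities)))
    cur = tour_len(tour)
    temp = 1.0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(cities)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt style reversal
        cand_len = tour_len(cand)
        # always accept improvements; accept some uphill moves while hot
        if cand_len < cur or rng.random() < math.exp(-(cand_len - cur) / max(temp, 1e-12)):
            tour, cur = cand, cand_len
        temp *= 0.9995  # cool slowly
    return tour, cur
```

No guarantees, no exactness, and no von Neumann-style insistence on infinite accuracy; that's the point of the analogy.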

You can see some of this in various proposed neuromorphic computing models: it’s abundantly obvious that nothing like stochastic gradient descent or contrastive divergence is happening in biological neurons. Spiking models like a liquid state machine are closer to how a primitive nervous system works, and they’re fairly difficult to simulate on von Neumann hardware (some NPC is about to burble “Church-Turing thesis” at me: don’t). I think it likely that many robot open problems could be solved using something more like a simulacrum of a simple nervous system than writing python code in ROS.
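For the curious, the simplest spiking building block, a leaky integrate-and-fire neuron, really is just a few lines. This is a toy sketch with made-up parameters, nothing like a full liquid state machine, but note there is no gradient descent anywhere:

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9, dt=1.0):
    """Leaky integrate-and-fire neuron, the crudest spiking model.

    The membrane potential leaks toward zero, integrates the input, and
    emits a spike (resetting to zero) when it crosses the threshold.
    Returns a 0/1 spike train the same length as the input.
    """
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = leak * v + dt * i_t   # leak, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # reset after spiking
        else:
            spikes.append(0)
    return spikes
```

Information lives in the timing of the spikes, which is exactly what makes these models awkward to simulate efficiently on conventional hardware.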

But really, all I know about robotics is that it’s pretty difficult.

14 Responses


  1. asciilifeform said, on July 29, 2020 at 5:04 pm

    The Von Neumann model of computation is simply crippling, even in “crunching” (e.g. chemical modeling) applications that have nothing to do with mechanical mules. Observe that 1950s rocketry was ~100% driven by analog servo controllers (and without even bothering to e.g. “simulate neuron” — worked great.)

    On the hardware side, the robotic animals are only superficially (i.e. for rube audience) biomimetic — actual animals do not have hinges for joints, and their prime mover is finely-controlled muscle tissue, rather than motors supplying gigantic torque directly at a joint. (But there is ~0 incentive to develop proper electronic muscle, because of 1st paragraph.)

    Not even to mention that “fetch beer from fridge” is not a market that can cover the R&D outlay for anything in this vein. “Bigdog” et al are simply boondoggles, rather than commercial products in the usual sense. They are not designed to work — but to impress the gullible; in the fine tradition of “Eliza”.

    • Scott Locklin said, on July 29, 2020 at 6:21 pm

      I really have no idea what “computer scientists” in current year get up to. Seems like it should be what they’re doing! I guess they’d rather pay theorists to tell them new architectures aren’t worth doing, than pay people to innovate.

      • asciilifeform said, on July 29, 2020 at 7:13 pm

        >…what “computer scientists” in current year get up to…

        They’re… cranking out GBs of LaTeX, pushing out LPUs, killing trees, earning diplomas and tenure, precisely like the “humanities” fluff wankers — but as abusers of equations and plots rather than of words. See e.g. the anon commenter’s links — primo examples.

        Meanwhile the only “innovations” involved are in new methods to part the fool from his money. And even there, not much in the way of novelty: the academic “jam tomorrow” racket is essentially the same today as it was during the reign of Reagan. Albeit featuring ever-larger dollar sums and ever-smaller outlay on physical props.

  2. Anonymous said, on July 29, 2020 at 5:58 pm

    One area that’s seen very good progress is perception (depth estimation, etc), driven in part by deep learning. Most autonomous vehicle projects use a combination of laser range finding (lidar scanning over 1M points a second) with cameras & radar that all complement each other. LIDAR gives accurate depth returns, but suffers at detecting objects at long range (>200m) when the point cloud returned is sparse, and gets spooked by fog, snow, rain, foliage, etc. High res cameras can detect cars, pedestrians, etc at long range, but aren’t as accurate at depth perception. Radar can see through the fog & rain, but has other issues.
    One major research direction is fusing all of this together inside of neural networks (instead of generating separate outputs for each sensor modality and then fusing with a tracking algorithm / optimization method), which gets better results than having separate pipelines and fusing after [1].
    The going strategy in some of these AV projects is to use larger neural net backbones [2] coupled with more data, assuming it will continue to scale [3] (what I imagine you’d take issue with).
    Now there are some obvious problems, like DL’s lack of common sense [4]. Secondly, this uses a tonne of compute. There’s a reason Waymo is using minivans: current AV test vehicles are approaching a petaflop of compute, faster than any supercomputer in existence in 2005. The power required is enough that it would potentially handicap the range (presuming they have an electric drivetrain).
    The other issue is hand engineering what data is transferred from the detection pipeline to the motion planning pipeline. In addition to location & velocity, planners are now starting to propagate turn signal information, but this requires reworking the entire pipeline. Imaginably even more things will need to be propagated, making this process very hand-engineered and potentially infeasible. However, the alternative to this is terrifying: end to end deep learned models that go from visual inputs to control outputs. There are some startups doing this [5], but I wonder how a fully black box pipeline would get regulatory sign off. The founder & CEO is a recent Cambridge PhD grad who at least seems humble about the limits of deep learning [6] and willing to address the job displacement issues [7], but we’ll see how their approach turns out.
    I really doubt an autonomous vehicle that works at all latitudes and in all conditions will be feasible until (if ever) a general AI is developed to handle the long tail of events. Even proving that an AV is safer than a human driver may take on the order of ~100M miles, after all the R&D work is complete [8].
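For contrast with in-network fusion, the classical "separate pipelines, fuse afterward" baseline the comment describes can be as simple as precision-weighted averaging of per-sensor depth estimates. A toy sketch with a hypothetical function name and invented numbers:

```python
def fuse_measurements(estimates):
    """Inverse-variance (precision-weighted) fusion of independent depth
    estimates, e.g. one from lidar and one from a camera.

    Each estimate is a (value, variance) pair. Returns the fused value and
    its variance; the fused variance is always smaller than any input's.
    """
    precisions = [1.0 / var for _, var in estimates]
    fused = sum(z * p for (z, _), p in zip(estimates, precisions)) / sum(precisions)
    fused_var = 1.0 / sum(precisions)
    return fused, fused_var
```

The trusted (low-variance) sensor dominates, which is why a confident-but-wrong sensor model is so dangerous in this scheme, and part of why people want to learn the fusion instead.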

    Formerly state of the art paper on fusing LIDAR with camera returns, by Uber ATG’s chief scientist. She got her PhD in autonomous robotics in 2006 & her earlier papers use GPs and other non deep learning techniques, so I think DL actually works better for this application.
    Latest SOTA object detector from Google: arXiv:1911.09070
    arXiv:1712.00409

    Here’s the Waymo guy talking about a bunch of kids who stole a stop sign & were carrying it on bikes, and how it caused problems for their car: https://youtu.be/Q0nGo2-y0xY?t=355

    • Scott Locklin said, on July 29, 2020 at 6:08 pm

      Thanks for a meaty and useful comment. I figure Lidar makes depth perception a solved problem in a sort of “cheating” sense (aka my beer robot would have eyes but no lidar). But … mapping the Lidar identified objects to the DL identified objects to whatever ontology or internal map seems like an open problem.

      I remember the last time I was looking at the SBIR funny papers in 2015 or so, data fusion was still a huge problem for the F35. I assume they have some problems solved well enough to lob missiles at bad guys, but that’s a lot different than the beer robot rooting around in my refrigerator.

  3. Igor Bukanov said, on July 29, 2020 at 8:17 pm

    A few days ago, out of curiosity, I tried Tesla 3 autopilot on a mountain road in Norway. I switched it off after a few minutes.

    The first problem was that it tried to stick to the middle of the lane even through the many turns around the mountain, resulting in a feeling of nausea. Any sensible driver will smooth that out by driving towards the edges of the lane. Then at one point the lane turned into two to allow for overtaking. The car “perceived” that only a few meters before the marking separating the new lanes appeared in the middle of the old lane, and then rather violently moved to the left, not right as one is supposed to do. That was enough.

    On the other hand the cruise control with radar tracking of the car in front is fantastic. It literally sticks to the car and brakes properly and smoothly if the car in front slows down. But then I suspect that it is just solving physical equations, not machine learning.
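The "just solving physical equations" reading is plausible: radar-following cruise control can in principle be done with classical feedback control, no learning anywhere. A hypothetical PD-style sketch with invented gains and limits, purely to illustrate the idea:

```python
def follow_controller(gap, gap_rate, desired_gap=30.0, kp=0.5, kd=1.2):
    """Toy PD controller for radar cruise control.

    gap:      measured distance to the lead car (m)
    gap_rate: rate of change of the gap (m/s); positive means it is opening
    Returns a commanded acceleration (m/s^2): speed up if too far back,
    brake if closing too fast, clamped to plausible limits.
    """
    error = gap - desired_gap           # positive: we are too far back
    accel = kp * error + kd * gap_rate  # proportional + derivative terms
    return max(min(accel, 2.0), -5.0)   # clamp to comfort/braking limits
```

Smooth braking falls out of the derivative term; nothing here needs a neural network, which fits the observed behavior.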

  4. Raul Miller said, on July 29, 2020 at 8:24 pm

    Yes, if I were inclined to push this field forwards, I would focus on the “cyborg” or “human enhancement” side of things rather than the “completely autonomous” side of things.

    The issues that could be tackled include:

    (*) Gathering additional information about the environment (I remember a guy using ultrasound to locate pipes, for example),

    (*) Dealing safely with unpleasant or maybe even dangerous things,

    (*) Using leverage or other mechanisms to move heavy things.

    All of these would involve tradeoffs, and (especially in the early stages), frustrations. Some approaches would also look completely dorky and/or invite other criticisms from instant experts.

    There might also be some use for “learning algorithms” (neural nets, genetic search, or maybe even just regular searching and/or sorting and/or statistics — the stuff that we hear the “AI” label get applied to), but that’s only going to take you so far.

    • Rickey said, on July 30, 2020 at 2:34 am

      That is already being done with aerial drones such as Reaper, remotely operated underwater vehicles (ROVs) and planetary rovers. Even forklifts apply if you want to get really basic. It seems the best approach right now is to extend human reach and perception so a person can act at a distance through extended control without having to go into harm’s way. Money would be better spent developing better interface and communications systems. Maybe I am missing something, but fully autonomous robots seem to be a solution looking for a problem.

  5. pindash91 said, on July 29, 2020 at 11:59 pm

    I know of one person who is doing exactly this and building artificial reflexes; check it out. He came to this conclusion by realizing that the brain is not a computer but a giant gland attached to reflexes. https://jlettvin.github.io/gaze/gaze.html

  6. Bruce said, on July 30, 2020 at 7:06 am

    A while back, I tried an echo state network to solve a difficult prediction problem. It worked well enough to drive home that gradient descent isn’t really needed in neural network learning. I suspect the filtered randomness of the ESN could play a role in biological networks.

    On a different note, the old Hebbian notion that biological neurons which fire together wire together seems to be making a minor comeback. 90 year old Bernard Widrow, who worked on early neural nets in the 50s and 60s, has published quite a few recent papers on his Hebbian-LMS method, which does seem to be successful at unsupervised clustering.
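For readers who haven't seen it, the bare "fire together, wire together" idea in its simplest stabilized form is Oja's rule. This is a generic textbook sketch, not Widrow's actual Hebbian-LMS algorithm, but it shows unsupervised learning with no loss gradient in sight:

```python
import numpy as np

def hebbian_step(w, x, lr=0.01):
    """One step of Oja-stabilized Hebbian learning: the Hebb term y*x,
    plus a decay term so the weights don't blow up."""
    y = w @ x                        # post-synaptic activity
    return w + lr * y * (x - y * w)  # Oja's rule

def learn_principal_direction(data, steps=2000, seed=0):
    """Apply the rule to random samples; the weight vector drifts toward
    the data's principal component -- unsupervised clustering's cousin."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=data.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        w = hebbian_step(w, data[rng.integers(len(data))])
    return w
```

Everything is local to one neuron and its inputs, which is what makes Hebbian schemes biologically plausible in a way backpropagation isn't.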

    • Scott Locklin said, on July 30, 2020 at 1:35 pm

      I had some interesting results from ESNs as well. At some point I’d like to fool around with LSM simulators as well.

      I hadn’t heard of Hebbian-LMS; that’s pretty neat.

      I figure fooling around with the latest DL atrocity in Keras is a lot less interesting than going over these old ideas that we now have more computational power to throw at. Reservoir computing is weird in that it works at all; somehow that doesn’t seem to make anyone curious.

    • Raul Miller said, on July 31, 2020 at 7:21 pm

      When you say “gradient descent isn’t really needed” — do you mean “not needed for some basic functionality” or do you mean “did not measurably improve X” (where X is some specific quality or qualities, such as learning rate or degree of final convergence, etc)?

      (I am a big fan of minimalism, but I am also acutely aware of its limitations, so I want to understand…)

      • Scott Locklin said, on August 1, 2020 at 3:37 pm

        I think he means (or at least I agree with the statement) something like “neural nets do interesting things without gradient descent approaches.” An ESN for example is a random matrix with a linear regression output layer (matrix inverse), and it does amazing things.
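That description, a fixed random reservoir with nothing trained but a linear readout, fits in a few lines of numpy. A minimal sketch with arbitrary sizes and scalings (and least squares in place of an explicit matrix inverse):

```python
import numpy as np

def esn_fit(u, y, n_res=200, seed=0, spectral_radius=0.9):
    """Minimal echo state network: fixed random reservoir, trained readout.

    Drives the reservoir with input signal u, then fits a linear readout
    to target y. Returns the readout's predictions on the training signal.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, n_res)        # input weights: random, never trained
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))  # reservoir: random, never trained
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # tame the dynamics
    x = np.zeros(n_res)
    states = []
    for u_t in u:                               # run the input through the reservoir
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    X = np.array(states)
    W_out, *_ = np.linalg.lstsq(X, np.asarray(y), rcond=None)  # the only trained part
    return X @ W_out
```

All the "learning" is one linear regression; the random recurrent part just provides a rich pool of nonlinear features of the input history.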

        • Raul Miller said, on August 2, 2020 at 11:02 pm

          Gradient descent is about training the network. So I am going to guess that you are saying that ESN does not train the random matrix but instead it just keeps those initial random values? And that the “training” consists of building that output layer for the random matrix?
