Tesla’s CV Approach to Autonomous Driving Built an Unassailable Lead in FSD

softmax.substack.com

11 points by bald 5 years ago · 45 comments

porphyra 5 years ago

Tesla fanboys seem to frequently forget the following facts:

* People using lidar also use deep learning.

* Lidar and HD maps are totally orthogonal concepts with nothing to do with each other. Lidar helps you avoid running into trucks without HD maps. Camera-based methods can use HD maps too.

* Lidars are less affected by rain and snow than cameras are (thanks to larger optical aperture, multiple returns, and faster "shutter speed" due to nanosecond pulses rather than 1/30 s exposures).

* Tesla's "big data" is not more effective than a concerted active learning data collection campaign with a moderate-sized fleet.

* HD maps are very cheap to create.

* Lidars may in fact be cheaper than the GPU you're going to need to run your monocular depth network, while being more accurate and more robust.
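
A "concerted active learning data collection campaign" here means labeling only the frames the current model is least sure about, rather than hoovering up everything the fleet sees. A minimal uncertainty-sampling sketch (all names and numbers are illustrative, not any company's actual pipeline):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, predict, budget):
    """Rank unlabeled frames by model uncertainty; label only the top `budget`."""
    ranked = sorted(pool, key=lambda frame: entropy(predict(frame)), reverse=True)
    return ranked[:budget]

# Toy "model": confident on frames 0 and 2, unsure on frame 1.
fake_preds = {0: [0.98, 0.01, 0.01], 1: [0.4, 0.3, 0.3], 2: [0.9, 0.05, 0.05]}
picked = select_for_labeling([0, 1, 2], lambda f: fake_preds[f], budget=1)
print(picked)  # only the most uncertain frame goes to the labelers
```

The point is that labeling budget, not raw fleet mileage, is usually the binding constraint.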

  • frisovv 5 years ago

    FWIW, you can just buy HD Maps: https://www.tomtom.com/products/hd-map/

  • mam2 5 years ago

    > Tesla's "big data" is not more effective than a concerted active learning data collection campaign with a moderate-sized fleet.

    Provided any competitor can build a team of people with as much focus and resources as Tesla. And even if they are equal, big data from real drivers beats everything else.

tmotwu 5 years ago

I could not get through the entire article with straight face. It's riddled with fallacies and misconceptions on the problems currently facing self driving. It was especially this horrible take that put the credibility of this article into question:

> "A simple analogy can illustrate the conceptual difference between Computer Vision and LiDAR. Imagine two students, where one is just cramming and memorizing the content (LiDAR), while the other one is trying to really understand the material and truly learn it (Tesla FSD). The student that learned the material (Tesla FSD) will be able to answer the exam questions correctly, even if the questions on the exam are swapped, the questions are rephrased, or new components are added to the questions, while the student that memorized the content (LiDAR) will likely fail the exam."

Tesla's advantage isn't even its computer vision systems - computer vision models don't exactly scale like language transformers (where larger, sparser parameter counts make for better models). There haven't been any significant advancements in vision models since Faster R-CNN or YOLO, which are close to five years old now. Especially if you want to compare Tesla's SotA against Waymo, which has an army of Captcha labelers and a large plethora of example images.

The goldmine is in identifying and navigating around rare edge cases - data that can only be obtained with hundreds of thousands of hours of real-world driving. It has very little to do with the correct set of camera and calibration configurations. The novel research is deep into safety verification strategies. Like: how can we predict human or object behavior using milliseconds of an object's prior movement? Or how can we use human arm movements and eye pupils to determine if a pedestrian or other driver is distracted? Can we avoid an accident if we have a better behavioral understanding of the scene?

dr_faustus 5 years ago

What lead? The main advantage of Tesla is that they have the guts to put beta quality software in cars and get away with it (from a regulatory perspective) up to now.

As can be witnessed in loads of videos on YouTube, the current FSD betas still pretty much require full attention (I would say even more attention than driving yourself, because you have to watch the environment AND the behaviour of Autopilot). And these demos are on American roads, which are among the easiest to drive in the world (generally very wide, lots of traffic lights, large intersections, etc.).

Bad weather conditions (which are a thing outside of southern California) are even more problematic. So while the tech might or might not be more advanced than what other car companies have in their labs, reliably it's still pretty much an improved cruise control, and in that regard not that far ahead of the competition (all premium brands have that in one way or another).

  • alkonaut 5 years ago

    > American roads which are among the easiest to drive in the world

    This. Driving in Arizona or Nevada is possible for a child. It's not pouring down, and the roads are often so wide that there is a lane in each direction! Often there are road markings painted on the road that aren't snowed over. It's easy mode.

    I’d like to see the best efforts in some Italian alleys where you are 50/50 to have to reverse because you face another car, or a rainy single lane country road in England at night. Anything where the driver actually has to interact with other drivers, understanding eye contact, waves, social norms (who reverses? Is he in a hurry? Is she angry? Is that car parked or waiting too? Is that wave meaning I should go or is it just an angry gesture?) and not just the environment.

    These things (narrow alleys, roads snowed over 6 months, oncoming traffic in the same lane) aren’t edge cases. It’s what “driving” is in many places.

  • jsight 5 years ago

    I'm not convinced that the systems currently being fielded by their competitors are better than beta quality either. Somehow they get away with it.

jsight 5 years ago

Given the early stage, I find the idea of anyone having an "unassailable lead" to be laughable.

The idea that the one with such a lead is Tesla is even more laughable, and I say that as someone that actually likes some significant elements of Tesla's approach. Although I pretty strongly dislike some aspects too.

  • Someone 5 years ago

    Even if they had perfect FSD, that lead wouldn’t be unassailable.

    On the contrary, it would tell would-be competitors that the goal can be reached, and what it would approximately cost.

    They would do the math and see whether that’s worth it financially. Even if it weren’t (say because Tesla would license the tech for pennies), I think the likes of China still would spend the money to create a competitor.

  • rich_sasha 5 years ago

    This. Ford had an unassailable lead on making cars. Now it’s just one of many. Or Intel until recently.

NovemberWhiskey 5 years ago

There seems to be a gigantic non-sequitur here, where the good performance of convolutional neural networks for image recognition is taken as evidence that visual sensors suffice for full driving autonomy and that LIDAR-based approaches are heading in the wrong direction.

There doesn't seem to be any evidence for the proposition that "if only we can train the model with more data it'll suddenly be good enough". There's an extremely long tail of scenarios, operating in different weather conditions, in different environments, at different times of day, and the driving problem is adaptive - the network needs to predict how others will respond to its own behavior.

The authors dismiss the idea that sensor fusion may provide more additive capability than train-with-m0ar-data but don't actually seem to provide any basis for that.

adflux 5 years ago

Another dumb post by someone who believes Tesla operates in a vacuum... Tesla is a very small player in terms of cars produced. Volkswagen has started outselling Tesla in some European countries. And don't forget about GM and Toyota. No lead is unassailable in a highly competitive market like car manufacturing. And it's not like there's no competition from the tech side either. Microsoft, Apple and Google are all working on making self-driving cars. Any combination of said tech companies with a large car manufacturer would be a serious threat.

  • johnmorrison 5 years ago

    What portion of Volkswagens sold are outfitted with an array of HD cameras passively collecting data?

    • adflux 5 years ago

      Volkswagen sells about 10 million cars per year. If the only advantage Tesla has is more miles recorded, please tell me how Tesla's lead can't be beaten by a company which outsells it 20 to 1.

soVeryTired 5 years ago

Has the author done any serious work in computer vision? Imagenet is one thing, but there are many many problems that need to be solved before a vision-based vehicle can run reliably. I worked at a self-driving company for a while, and they struggled with issues like multiple target tracking and reasoning about occluded vehicles.

And in a sense, vision is the easy part. There are whole areas like prediction and motion planning where the current state of the art isn't really up to scratch.

verdverm 5 years ago

hmm, I think the author missed that Waymo drove in snow 3 years ago [0], obviously outside of Phoenix. They also don't seem to understand that it's CV + lidar, not CV versus lidar... I'd be surprised to see a CV-only system handle snow like in the Waymo example.

The rest of the article does not hold up once you realize the author is in an either-or mindset and thinks Waymo is not using CV, which it is, with a vertically integrated vision stack and arguably more experience at scaling ML.

If you want to see actual FSD (without a human in the driver seat), this Waymo beta user has been video documenting on a regular basis: https://www.youtube.com/playlist?list=PL-13jt3ZPb7X6qJTo_MEn...

Having seen the "self-driving" displays from both, Waymo's shows more information, which gives me greater trust in the system. By contrast, there are Tesla videos with some scary moments.

[0] https://www.engadget.com/2018-05-08-waymo-snow-navigation.ht...

  • hellotomyrars 5 years ago

    Yeah. The author here is just wrong.

    Tesla's bet on CV is the biggest thing holding it back and everyone else is using both. Does Tesla have the best and most sophisticated CV tech? Probably. Is that better than having LIDAR and CV? Probably not.

  • bigtones 5 years ago

    I agree with this. Humans have a tough time driving in the snow for quite a few reasons, but the difficulty our eyes (which are fantastic technology) have in differentiating objects covered in white snow is a big part of it, so cameras and CV are going to have a much tougher time than the combination of human brain and eye.

    On the other hand, driving in the snow is a very, very small portion of most driving commutes, so maybe the solution is for the car to automatically turn off autonomous driving in the snow. The warmer half of the world would not care that it does not work in the snow.

    • sudosysgen 5 years ago

      You can make cameras that can differentiate objects covered in white snow.

      It's just going to be quite expensive: you would need quite good dynamic range and resolution, and I'd guess depth from defocus and parallax helps.

      But that's with ~40 bits per pixel of dynamic range (17-18 effective bits per channel), at fairly high resolutions.
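
      As a sanity check on those figures: dynamic range scales at about 6.02 dB per bit, so 17-18 effective bits per channel is on the order of 100+ dB, versus roughly 72 dB for a typical 12-bit sensor. A quick back-of-envelope:

```python
import math

# Each extra bit of sensor depth doubles the representable range,
# i.e. adds 20 * log10(2) ~= 6.02 dB of dynamic range.
def dynamic_range_db(bits):
    return 20.0 * math.log10(2.0 ** bits)

for bits in (12, 14, 17):  # typical sensors vs. the figure quoted above
    print(bits, round(dynamic_range_db(bits), 1))
```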

  • invisible 5 years ago

    I'm a fan of both approaches, but Waymo is very heavily reliant on Lidar (and uses CV for augmenting). They have been working on this problem for 12 years now and have just launched in Phoenix, using high definition maps, lidar, and a suite of other sensors. They have <1000 vehicles. Zero fatalities attributed to Waymo.

    Tesla has been working on this problem for somewhere close to half that time, and with far fewer resources initially. Tesla has a million cars on the road across the entire world and only uses basic sensors and CV. Six fatalities have been attributed to Tesla (and the driver).

    So there really isn't a lot of data to go on for who is "winning" but Tesla has much more driver data. The fatalities are so low compared to the miles driven that it's difficult to really know if Waymo could achieve a better result.

    • sudosysgen 5 years ago

      Waymo relies more heavily on lidar simply because lidar provides so much more data. You'd expect any algorithm to rely more on the better data source.

      • invisible 5 years ago

        This isn't a counterargument, but sometimes more data isn't better data. I think lidar is super impressive in the right conditions, but debris in the air can be considered a bird even if it's a paper bag. If Waymo is having to still use CV to validate that it's a paper bag, then aren't they having to solve both problems? (Not that that is a bad thing, but it is at least two problems. I'm not certain who will achieve a working system, or if any of the players today will at all.)

josefresco 5 years ago

"Immediately, the Bolt EUV is, in my experience, the best autonomous driving experience available today."

"But for my money right now, Super Cruise beats Tesla’s Autopilot in real-world usage."

https://electrek.co/2021/03/05/chevy-bolt-super-cruise-autop...

shusson 5 years ago

> Why is Computer Vision (using neural networks) superior to LiDAR?

It would be better to compare LiDAR to CMOS sensors. You can apply computer vision techniques to either.

  • NovemberWhiskey 5 years ago

    The point here really is that fundamentally you have a simultaneous localization and mapping (SLAM) problem.

    It's a problem that benefits from incorporation of data from multiple different sources that do not have correlated errors. The way the authors focus purely on the computer vision aspects is idiotic; I know nothing about how Tesla approaches this but I almost guarantee you that they also consume map data, vehicle kinematics and so on when updating their model.

    There is no world in which adding data that allows you to discriminate between (to pick a not-entirely-random example) the white roof of an overturned semi that represents an obstacle and a bright patch of sky doesn't help enormously. The suggestion that LIDAR is only relevant to pre-mapped areas is ... bizarre and nonsensical.

    The Tesla bet is just that "good enough" can be achieved with fewer sensor sources. That's it.
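
    The uncorrelated-errors point can be made concrete with the textbook inverse-variance fusion of two independent Gaussian measurements: the fused variance is always below either sensor's alone. A toy sketch with made-up numbers:

```python
# Fusing two independent noisy range estimates (say, camera depth and
# lidar) by inverse-variance weighting. Numbers are illustrative only.

def fuse(z1, var1, z2, var2):
    """Optimal linear fusion of two independent Gaussian measurements."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)  # strictly smaller than var1 and var2
    return fused, fused_var

# Camera says 50 m with high uncertainty; lidar says 48 m with low uncertainty.
z, var = fuse(50.0, 4.0, 48.0, 0.25)
print(round(z, 2), round(var, 3))  # estimate lands near lidar, variance drops
```

This is exactly why a second sensor with uncorrelated errors (the overturned-semi-vs-bright-sky case) helps even when each sensor alone is imperfect.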

mrjet 5 years ago

I have spent the last six years working in self-driving and disagree with the article. I am an autonomy engineer at a major self-driving focused company.

First: the choice is between “lidar and cameras” and “cameras alone.” I am not aware of any contenders who are using only lidar. That means the only downside to using lidar is cost.

Second: the article is incorrect. Lidar is extremely reliable for detecting dogs, pedestrians, and anything else you can think of. For lidars with sufficient intensity sensitivity, you can even read the text on signs.

Here’s a list of some tradeoffs for available sensors.

Cost: Sterling Anderson said in a talk at MIT a few years ago “there is no unobtanium in lidar.” Making lidar cheap is a matter of manufacturing scale. Not a matter of new physics. Cameras are still much cheaper and will remain so for some time. This alone might justify choosing cameras for consumer vehicles. The game-changing imaging radars that exist are not cheap.

Long-Tail Events: On a camera-based system without depth sensors, the vehicle must react based on correct identification of obstacles. Consider an image of a pedestrian painted onto the road. A system with depth sensors will see that it is flat and will not need to stop.

Depth estimation with multiple cameras leaves a lot to be desired. It is bad for untextured objects. Poor illumination conditions will prevent texture from being visible to the cameras. Poor illumination conditions have no effect on lidar/radar.

I would not bet my life on an estimated depth from a monocular camera, no matter how many layers the DNN has.
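
To make the multi-camera limitation concrete: stereo depth is Z = f * B / d, so a fixed one-pixel disparity (matching) error produces a depth error that grows roughly with the square of the range. A quick sketch with hypothetical camera parameters:

```python
# Stereo depth from disparity: Z = focal_px * baseline_m / disparity_px.
# The focal length and baseline below are hypothetical, not any real rig.

def stereo_depth(focal_px, baseline_m, disparity_px):
    return focal_px * baseline_m / disparity_px

f_px, baseline = 1000.0, 0.5
for true_depth in (10.0, 50.0, 100.0):
    d = f_px * baseline / true_depth                  # true disparity here
    biased = stereo_depth(f_px, baseline, d - 1.0)    # same scene, 1 px error
    print(true_depth, round(biased - true_depth, 2))  # error blows up at range
```

Untextured surfaces and poor illumination make the disparity error itself larger, which compounds the problem; lidar measures range directly and skips the matching step entirely.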

Weather: Lidar works fine in the rain and snow. Degraded, but fine. Radar works fine in the rain and snow. Cameras can be made to work well, especially if placed in enclosures that self-clean. ATG’s vehicles famously made “whooshing” sounds as their pneumatic lens-cleaners forced water off of their camera lenses.

Time-of-Day: Visible-light cameras will do poorly. Every system I have seen has degraded camera performance at night. Some systems include an NIR channel to help. You cannot bring enough onboard illumination to compete with the Sun. Most lidars choose a wavelength that leaves them completely unaffected day or night. Ouster has different noise characteristics during the day, but not enough to matter.

Range: At long distances, no commercial sensor can beat the angular resolution of cameras. This is where they shine most. That’s why you see highway-focused systems emphasize cameras so much. Blackmore was a promising path to enabling highway-range lidar capability, but they were bought by Aurora years ago now.

It is possible that cameras are completely sufficient. It is possible that Tesla is even ahead. But this article’s reasons won’t be the causal factor.

The company that builds a functional autonomous car will introduce the largest sea change in transportation since the automobile. The value delivered by each car will be massive. An additional $8,000 for a single lidar is not a dealbreaker. And that's at today's cost for a nice Hesai; by 2025 it will be smaller still. In 2016, the only good lidar on the market was the Velodyne HDL-64, which cost $80k. Pucks failed too often.

bpodgursky 5 years ago

I don't know how this will pan out, but I've seen a lot of criticism of Tesla "only" using vision, as if it's a ridiculous concept that will never work.

But humans drive... only using vision.

Maybe it's not possible for Tesla to get true FSD using only video data given current technology, but the idea that it's laughable doesn't make any sense to me. People drive in new environments, using only their eyes, all the time.

  • soVeryTired 5 years ago

    It's laughable given the current state of computer vision. CV has made enormous progress over the past decade, but it's still brittle and probably shouldn't be relied on in a life-or-death situation.

  • paulryanrogers 5 years ago

    Human brains are massively more powerful per watt and trained for years to respond to visual stimuli.

  • solidasparagus 5 years ago

    > But humans drive... only using vision.

    No they don't. Driving uses multiple senses. Sound is the most obvious one, but other less-well-known senses almost certainly play a role as well.

benjohnson 5 years ago

We will need better-than-vision to drive safely - witness all the humans that pile up in multi-car crashes when it gets foggy or snowy.

Lidar, or something like it, will have to be part of the equation given that the best visual processing computers of all time (human brains) don’t get enough data from their systems to make good choices in bad weather.

  • VBprogrammer 5 years ago

    Wouldn't the simple solution here be to limit speed such that the car can stop within the distance that can be seen to be clear ahead? I mean, this is exactly how you are supposed to drive, but few actually do (which leads to perverse situations where you have to drive faster than you safely can to avoid someone driving into the back of you at 70 mph in heavy fog).
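
    "Stop within the distance you can see" does pin down a maximum speed directly: solve reaction_time * v + v^2 / (2 * decel) = visible_distance for v. A rough sketch, with deceleration and reaction time as illustrative assumptions:

```python
import math

def max_safe_speed(visible_m, decel=6.0, reaction_s=1.0):
    """Largest v (m/s) satisfying reaction_s * v + v**2 / (2 * decel) = visible_m.

    decel ~6 m/s^2 (dry braking) and a 1 s reaction time are assumptions,
    not measured values; solve the quadratic v**2/(2a) + t*v - d = 0 for v.
    """
    a_coef = 1.0 / (2.0 * decel)
    disc = reaction_s ** 2 + 4.0 * a_coef * visible_m
    return (-reaction_s + math.sqrt(disc)) / (2.0 * a_coef)

for vis in (20.0, 50.0, 150.0):   # heavy fog -> clear conditions
    print(vis, round(max_safe_speed(vis) * 3.6, 1))  # km/h
```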

    • cameldrv 5 years ago

      This is essentially impossible. Just for starters you’d never be able to pass another car in the opposite lane or pass a pedestrian standing near the curb at any reasonable speed. You’re forced to assume the other guy won’t suddenly do something crazy.

      • VBprogrammer 5 years ago

        I think making inferences about what the rational actors you know of will do is quite a different problem to dealing with situations where there is a limit to what you can see e.g. fog or a corner for that matter.

        • cameldrv 5 years ago

          My point is that in an overwhelming number of situations, it's not sufficient to merely identify the locations and possible trajectories of all of the moving objects. If you assume everyone will act "rationally", you will end up killing someone every few days. If you assume everyone will do the worst possible thing, you'll never get anywhere.

          This situation forces you to predict behavior reasonably accurately. There's a big not-so-long tail of people doing stupid things. If you drive in a bar area late at night, you will fairly often encounter drunk people in the street. You may notice some swerving on the road or walking erratically and infer that they might be distracted or drunk or a small child and give them a lot of extra room.

          Human drivers typically develop something of a sixth sense observing people on the road and know who it's safe to drive past at high speed and who needs lower speed and more room. If humans didn't have this, the road would be a lot more dangerous.

          AVs can compensate for this by having (potentially) faster reaction times, and sensors that can see longer distances. There are a lot of scenarios though where better prediction is more important than either of these. Behavior prediction is a very active area of research in the AV world.
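
          As a baseline for how weak pure kinematics is here: a constant-velocity extrapolation from a few recent positions captures none of the "drunk vs. sober" signal, which is exactly why learned behavior prediction is such an active research area. A minimal illustrative sketch:

```python
# Constant-velocity prediction: finite-difference a velocity from the last
# two observed positions and extrapolate. Real AV stacks use learned,
# multi-modal predictors; this is just the naive kinematic baseline.

def predict(track, horizon_s, dt):
    """track: list of (x, y) sampled every dt seconds; extrapolate ahead."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return (x1 + vx * horizon_s, y1 + vy * horizon_s)

# Pedestrian observed at 10 Hz, drifting toward the roadway (y shrinking):
track = [(0.0, 5.0), (0.1, 4.9), (0.2, 4.8)]
print(predict(track, horizon_s=2.0, dt=0.1))
```

A swerving drunk violates the constant-velocity assumption within a second, so anything beyond a short horizon needs a model of intent, not just motion.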

    • sliken 5 years ago

      Heh, indeed, and that is why there are huge pileups when it's unexpectedly wet, slippery, icy, foggy, etc.

      Clearly, if people drove for the conditions, you would not have giant pileups when the weather changes.

  • verdverm 5 years ago

    Note both Waymo and Tesla have sensor fusion systems. I believe both are using CV and (ultrasonic?) radar, Waymo is additionally including Lidar.

    https://thelastdriverlicenseholder.com/2020/03/04/waymo-reve...

    • sliken 5 years ago

      Tesla uses CV, radar, and ultrasonic. But ultrasonic is just for parking type measurements, up to 36" or so.

      Tesla did just apply for approval to use 60-64 GHz band for radar, not clear to me how that will change the range/performance they get from the sensors.

  • Someone 5 years ago

    You don’t need lidar to make the good choice to slow down significantly in a snowstorm or to not drive the car at all when you encounter a heavy one.

    I also doubt human brains are the best at visual processing. Certainly at subtasks, quite a few animals are better.

    In particular, I would think the night vision of nocturnal animals is better at that “detecting cars in a snowstorm” task than middle-aged humans wearing not-quite-correct glasses.

  • bpodgursky 5 years ago

    > (human brains) don’t get enough data from their systems to make good choices in bad weather

    Why is your bar for deploying self-driving cars "far better than human performance"? We let people drive in bad conditions.

    • verdverm 5 years ago

      Ascribing fault and settling insurance claims are some likely reasons it will take better-than-human performance.

      • dogma1138 5 years ago

        No it won’t. Insurance doesn’t need better-than-human performance; it just needs to be able to quantify the risk and price the policy accordingly.

        If there is no driver, that already lowers the risk even at merely-human capability, because you remove one potential human casualty from the equation when it comes to ride services.

        If AVs turn out to be even slightly better at avoiding pedestrian casualties, that reduces the risk further still.

        And even if the premiums are more expensive because of higher risk, the question will be whether they cost more than the $30-50K per year you would have to pay a full-time driver in the West these days, at a minimum.

        If Uber has to pay $20-30K per car per year in insurance, it will still be cheaper for them than using drivers. And that’s at about 10 times the average car insurance cost right now.
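
        Spelling out that back-of-envelope (every figure is a rough midpoint of the ranges quoted above, not actuarial data):

```python
# Per-car annual cost comparison from the comment above; all numbers are
# midpoints of the commenter's rough ranges, not real data.
av_insurance = 25_000   # midpoint of the $20-30K premium guess
driver_cost = 40_000    # midpoint of the $30-50K full-time driver figure
savings = driver_cost - av_insurance
print(savings)  # positive: the AV is still cheaper per car per year
```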
