Safety Impact


Making roads safer

The trust and safety of the communities where we operate are paramount to us. That’s why we’re voluntarily sharing our safety data.

The data to date indicates the Waymo Driver is already making roads safer in the places where we currently operate. Specifically, the data below demonstrates that the Waymo Driver is better than humans at avoiding crashes that result in injuries — both of any severity and specifically serious ones — as well as those that lead to airbag deployments.

This hub compares the Waymo Driver’s Rider-Only (RO) crash rates to human crash benchmarks for surface streets. It leverages best practices in safety impact analysis and builds upon dozens of Waymo’s safety publications, providing an unprecedented level of transparency within the autonomous driving industry. By sharing our data and methodologies, we also invite you to join us as we push for advancements in measuring safety impact.

The data displayed on this webpage is updated regularly, in line with NHTSA’s Standing General Order (SGO) reporting timelines.

Rider-only (RO) miles driven

Through September 2025, Waymo has driven 127M rider-only miles without a human driver

The Waymo Driver has over one hundred million miles of real-world driving experience. This dashboard shows rider-only miles – miles that Waymo has driven without a human driver – in cities where we operate our ride-hailing service, Waymo.

Learn about our methodology

Waymo Driver compared to human benchmarks

This table shows how many fewer RO crashes Waymo had (regardless of who was at fault) than human drivers with the average benchmark crash rate would be expected to have driving the same distance in the areas we operate. Results have been rounded to the nearest whole number.
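The comparison described above reduces to a simple counterfactual calculation. The sketch below illustrates it using the all-locations any-injury rates published on this page and the roughly 127M RO miles reported above; because the published rates are rounded, the result lands near, not exactly on, the figures shown on the page:

```python
# Counterfactual comparison: how many crashes would human drivers with the
# benchmark rate have had over Waymo's actual mileage, and how many fewer
# did Waymo actually have?

def crash_reduction(waymo_ipmm: float, benchmark_ipmm: float, miles: float):
    """Return (expected human crashes, Waymo crashes, fewer, percent reduction)."""
    million_miles = miles / 1e6
    expected_human = benchmark_ipmm * million_miles  # counterfactual human crashes
    actual_waymo = waymo_ipmm * million_miles        # Waymo crashes at its observed rate
    fewer = expected_human - actual_waymo
    pct = 100.0 * fewer / expected_human
    return expected_human, actual_waymo, fewer, pct

# All-locations any-injury-reported rates from the tables on this page,
# over roughly the 127M rider-only miles reported above.
_, _, fewer, pct = crash_reduction(waymo_ipmm=0.74, benchmark_ipmm=3.97, miles=127e6)
print(f"{fewer:.0f} fewer injury crashes, an {pct:.0f}% reduction")  # ≈ 410 fewer, 81% reduction
```

The unrounded rates Waymo uses internally yield the slightly different "411 fewer" figure shown above; the mechanics are the same.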

Learn about our methodology

Compared to an average human driver over the same distance in our operating cities, the Waymo Driver had:

90% Fewer serious injury or worse crashes (27 fewer)

82% Fewer crashes with an airbag deployment in any vehicle (173 fewer)

81% Fewer injury-causing crashes (411 fewer)

Crash reductions involving injuries to Vulnerable Road Users

92% Fewer pedestrian crashes with injuries (48 fewer)

83% Fewer cyclist crashes with injuries (28 fewer)

80% Fewer motorcycle crashes with injuries (20 fewer)

Waymo Driver compared to human benchmarks

 Airbag deployments, any injury

The graphs below show how many fewer incidents (crashes) per million miles (IPMM) Waymo had compared to human drivers with the benchmark crash rate. The error bars represent 95% confidence intervals for the IPMM estimate.

The reductions are shown for all locations combined and separately for individual cities. 

The comparisons in Atlanta are not shown here due to Waymo’s limited mileage, which means the results are not yet statistically significant.

Learn about our methodology

Serious Injury or Worse Crash Rates

Location | Waymo IPMM | Benchmark IPMM
All Locations | 0.02 | 0.23
Phoenix | 0.02 | 0.12
San Francisco | 0.05 | 0.47
Los Angeles | 0.00 | 0.14
Austin | 0.00 | 0.15

Any-Injury-Reported Crash Rates

Location | Waymo IPMM | Benchmark IPMM
All Locations | 0.74 | 3.97
Phoenix | 0.58 | 2.04
San Francisco | 0.88 | 7.91
Los Angeles | 0.90 | 2.39
Austin | 0.63 | 3.30

Airbag Deployment in Any Vehicle Crash Rates

Location | Waymo IPMM | Benchmark IPMM
All Locations | 0.31 | 1.66
Phoenix | 0.28 | 1.38
San Francisco | 0.39 | 2.26
Los Angeles | 0.27 | 1.18
Austin | 0.16 | 2.46

Airbag Deployment in Waymo Vehicle Crash Rate

Location | Waymo IPMM | Benchmark IPMM
All Locations | 0.06 | 1.12
Phoenix | 0.05 | 0.97
San Francisco | 0.10 | 1.31
Los Angeles | 0.00 | 0.95
Austin | 0.00 | 2.07

Waymo Driver compared to human benchmarks

Percent difference in crash rate

The graphs below show the percent difference between the Waymo and human benchmark crash rates by location, with 95% confidence intervals. A negative number means the Waymo Driver reduced crashes compared to the human driver. Confidence intervals that do not cross 0% mean the percent difference is statistically significant.

The percent reductions and confidence intervals show that the Waymo Driver has a large, statistically significant, reduction in crash rates compared to the human benchmark across many outcomes and locations.

The comparisons in Atlanta are not shown here due to Waymo’s limited mileage, which means the results are not yet statistically significant.

Learn about our methodology

Waymo crash rate percent difference to benchmark

Location | Airbag Deployment, Any Vehicle | Airbag Deployment, Waymo Vehicle | Any Injury Reported | Serious Injury or Worse
All Locations | -81.55% | -95.11% | -81.37% | -89.81%
Phoenix | -79.53% | -94.50% | -71.44% | -85.09%
San Francisco | -82.89% | -92.16% | -88.93% | -88.94%
Los Angeles | -76.62% | -100.00% | -62.25% | -100.00%
Austin | -93.59% | -100.00% | -80.86% | -100.00%

Percent of Waymo Driver collisions with <1mph change in velocity

(Delta-V <1mph)

Delta-V measures the change in velocity during a collision. It is another way to investigate crash severity and is one of the most important predictors of injury risk in vehicle-to-vehicle crashes.

This graph shows the percentage of SGO-reported crashes where the maximum Delta-V (from either the Waymo vehicle or other vehicle) was less than 1 mph—meaning the collision resulted in a <1mph change in velocity. A Delta-V less than 1 mph usually results in only minor damage (dents and scratches). This graph includes vehicle-to-vehicle and single vehicle crashes, but not crashes with pedestrians, cyclists, and motorcyclists.

Delta-V is estimated using an impulse-momentum crash model with inputs measured by the Waymo vehicle’s sensor system. Note: Comparable human benchmarks for <1mph Delta-V are currently not possible to estimate with high certainty.
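The impulse-momentum idea can be illustrated with a simplified sketch. This is not Waymo's actual estimator: it assumes a perfectly plastic (common final velocity), collinear vehicle-to-vehicle impact, and the masses and speeds are hypothetical:

```python
MPH_PER_MS = 2.23694  # conversion factor, m/s to mph

def delta_v_plastic(m1_kg: float, v1_ms: float, m2_kg: float, v2_ms: float) -> float:
    """Delta-V of vehicle 1 for a perfectly plastic, collinear collision:
    both vehicles leave the impact at the momentum-conserving velocity
    v_f = (m1*v1 + m2*v2) / (m1 + m2)."""
    v_f = (m1_kg * v1_ms + m2_kg * v2_ms) / (m1_kg + m2_kg)
    return abs(v_f - v1_ms)

# Hypothetical low-speed rear-end: a stationary 2000 kg vehicle is struck
# by an 1800 kg vehicle closing at 0.6 m/s.
dv_mph = delta_v_plastic(2000.0, 0.0, 1800.0, 0.6) * MPH_PER_MS
print(f"Delta-V of struck vehicle: {dv_mph:.2f} mph")  # well under 1 mph
```

A Delta-V this small falls into the "minor damage" regime described above.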

Learn about our methodology

% of SGO Collisions with less than 1mph change in velocity (Delta-V <1mph)

Location | % Crashes with Delta-V <1 mph
ALL AREAS | 45%
SF | 47%
PHX | 42%
LA | 43%
ATX | 51%

Waymo Driver compared to human benchmarks by crash type

These graphs show how many fewer RO (rider-only) crashes Waymo had (regardless of who was at fault) than human drivers with the average benchmark crash rate would be expected to have driving the same distance in the areas we operate. Crashes were classified into one of 11 crash types; the results shown are for all locations combined. Data for individual cities is available in the download section.

Bars labeled with a percent difference are statistically significant.

Learn about our methodology

Airbag Deployment in Any Vehicle Crashes

Crash Type Group | Events (Benchmark) | Events (Waymo)
V2V LATERAL | 11 | 1 (-91%)
V2V INTERSECTION | 118 | 8 (-93%)
V2V HEAD-ON | 5 | 3
V2V F2R | 26 | 17
SINGLE VEHICLE | 25 | 0 (-100%)
SECONDARY CRASH | 13 | 9
ALL OTHERS | 5 | 1

Any-Injury-Reported Crashes

Crash Type Group | Events (Benchmark) | Events (Waymo)
V2V LATERAL | 34 | 8 (-76%)
V2V INTERSECTION | 198 | 8 (-96%)
V2V F2R | 76 | 44 (-42%)
SINGLE VEHICLE | 34 | 2 (-94%)
SECONDARY CRASH | 25 | 8 (-68%)
PEDESTRIAN | 51 | 4 (-92%)
MOTORCYCLE | 24 | 5 (-80%)
CYCLIST | 35 | 6 (-83%)
ALL OTHERS | 10 | 3 (-72%)

By making detailed information about crashes and miles driven publicly accessible, Waymo’s transparency will not only support independent research but foster public trust. We hope other companies developing and deploying automated driving systems follow suit.

David Zuby, Chief Research Officer, Insurance Institute for Highway Safety (IIHS)

Methodology

  • Methodology

    • Despite the public availability of crash data for both human-driven and autonomous vehicles, drawing meaningful comparisons between the two is challenging. To ensure a fair comparison, there are a number of factors that should be taken into consideration. Here are some of the most important:

      • AV and human data have different definitions of a crash. AV operators like Waymo must report any physical contact that results or allegedly results in any property damage, injury, or fatality, while most human crash data require at least enough damage for the police to file a collision report.
      • Not all human crashes are reported. NHTSA estimates that 60% of property damage crashes and 32% of injury crashes aren’t reported to police (Blincoe et al. 2023). In contrast, AV companies report even the most minor crashes in order to demonstrate the trustworthiness of autonomous driving on public roads.
      • Focus should be put on injury-causing crashes. Low-speed crashes are the most frequent type of crash, but they typically result only in minor property damage that can be quickly repaired. In traffic safety, the greatest emphasis is placed on reducing the highest-severity crashes, which can result in injuries.
      • It’s important to look at rates of events (incidents per mile) instead of absolute counts. As Waymo grows its operations, more driving miles bring more total collisions. It’s therefore critical to account for miles driven when calculating incident rates: without doing so, incidents may appear to be increasing even while the rate of incidents is going down.
      • All streets within a city are not equally challenging. Waymo’s operations have expanded over time, and, because Waymo operates as a ride-hailing service, the driving mix largely reflects user demand. The results on this data hub show human benchmarks reported in Scanlon et al. (2024) and extended upon in Kusano et al. (2025) that are adjusted to account for differences in driving mix using a method described by Chen et al. (2024). See the “Human Benchmarks” section below for more details.

      Waymo has used industry best-practices to make a fair comparison between AV and human data sources that is presented on this webpage. This analysis is described more below, and in even more depth in several of Waymo’s safety publications.
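The underreporting adjustment mentioned above amounts to scaling a police-reported rate up by the fraction of crashes that never make it into police data. This is a sketch of that arithmetic, not Waymo's exact pipeline, and the reported rate used here is hypothetical:

```python
def adjust_for_underreporting(reported_ipmm: float, unreported_fraction: float) -> float:
    """If a fraction of crashes never appears in police data, the true rate
    is the reported rate divided by the fraction that IS reported."""
    if not 0.0 <= unreported_fraction < 1.0:
        raise ValueError("unreported_fraction must be in [0, 1)")
    return reported_ipmm / (1.0 - unreported_fraction)

# Hypothetical police-reported injury-crash rate of 2.7 IPMM, with the 32%
# injury-crash underreporting figure cited above (Blincoe et al. 2023):
adjusted = adjust_for_underreporting(2.7, 0.32)
print(f"adjusted benchmark: {adjusted:.2f} IPMM")  # 2.7 / 0.68 ≈ 3.97 IPMM
```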

    • Waymo’s data is derived from crashes reported under NHTSA’s Standing General Order (SGO) and uses the same criteria as described in Kusano et al. (2024) and Kusano et al. (2025).

      We are intentionally using publicly available data to allow other researchers to replicate the results. To link the data shown on this dashboard to NHTSA’s published SGO data, researchers can download a list of SGO report IDs and boolean membership in each outcome group in the download section below. Comparisons of crash rates for the outcomes listed below and additional outcomes described in the release notes are also available for download.

      We compare Waymo’s crash rate to human benchmarks across several different types of crashes:

      Any-injury-reported
      • Description: A crash where any road user is injured as a result of the crash.
      • Waymo Data*: Any SGO-reported crash where the field “Highest Injury Severity Alleged” is “Minor”, “Moderate”, “Serious”, or “Fatality”. Crashes with “Unknown” reported severity are also included when the SGO narrative mentions injuries of unknown severity.
      • Human Benchmark: Police-reported crashed-vehicle rate where at least one road user had a reported injury. A 32% underreporting adjustment was applied according to Blincoe et al. (2023).

      Airbag deployment in any vehicle
      • Description: A crash where an airbag deploys in any vehicle involved in the crash.
      • Waymo Data*: Any SGO-reported crash where “Any Air Bags Deployed?” is “Yes” for either the subject vehicle (SV) or a counterparty (CP). Crashes are also included when a review of relevant data (e.g., video) finds an airbag deployed in a third-party vehicle.
      • Human Benchmark: Police-reported crashed-vehicle rate where any vehicle involved in the crash had an airbag deployment. No underreporting adjustment was applied.

      Airbag deployment in Waymo vehicle
      • Description: A crash where an airbag deploys in the Waymo vehicle involved in the crash.
      • Waymo Data*: Any SGO-reported crash where “Any Air Bags Deployed?” is “Yes” for the subject vehicle (SV).
      • Human Benchmark: Police-reported crashed-vehicle rate where airbag deployment occurred in the vehicle. No underreporting adjustment was applied.

      Serious injury or worse
      • Description: A crash where any road user is seriously injured or killed as a result of the crash.
      • Waymo Data*: Police reports were requested through public information requests for any SGO crash where “Highest Injury Severity Alleged” was “Serious” or “Fatality”. The crash was included if the police report indicated that any person involved had an “incapacitating” (“A”) or “killed” (“K”) injury severity.
      • Human Benchmark: Police-reported crashed-vehicle rate where any person in the crash had a police-reported injury of “incapacitating” (“A”) or “killed” (“K”). No underreporting adjustment was applied.

      *Based on initial data submitted as part of the NHTSA Standing General Order 2021-01
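Based on the field mappings above, the outcome grouping can be sketched as simple predicates over an SGO record. The field names mirror the SGO fields quoted above, but the record layout here is a hypothetical simplification (including the `narrative_mentions_injury` flag); the serious-injury-or-worse outcome is omitted because it additionally requires police-report confirmation:

```python
# Injury severity values that count toward the any-injury-reported outcome.
INJURY_LEVELS = {"Minor", "Moderate", "Serious", "Fatality"}

def classify_outcomes(record: dict) -> dict:
    """Map a (simplified) SGO crash record to boolean outcome groups."""
    severity = record.get("Highest Injury Severity Alleged")
    # Hypothetical helper flag standing in for a narrative review step.
    narrative_injury = record.get("narrative_mentions_injury", False)
    sv_airbag = record.get("SV Any Air Bags Deployed?") == "Yes"
    cp_airbag = record.get("CP Any Air Bags Deployed?") == "Yes"
    return {
        "any_injury_reported": severity in INJURY_LEVELS
                               or (severity == "Unknown" and narrative_injury),
        "airbag_any_vehicle": sv_airbag or cp_airbag,
        "airbag_waymo_vehicle": sv_airbag,
    }

example = {"Highest Injury Severity Alleged": "Minor", "SV Any Air Bags Deployed?": "No"}
print(classify_outcomes(example))
```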

    • The human benchmark data are the same as reported in Scanlon et al. (2024) and extended in Kusano et al. (2025). These benchmarks are derived from state police-reported crash records and Vehicle Miles Traveled (VMT) data in the areas where Waymo currently operates RO services at scale (Phoenix, San Francisco, Los Angeles, and Austin). The benchmarks were constructed to include only the crashes and VMT corresponding to passenger vehicles traveling on the types of roadways Waymo operates on (excluding freeways). The any-injury-reported benchmark also used a 32% underreporting correction (based on NHTSA’s Blincoe et al., 2023 study) to adjust for crashes not reported by humans. The serious injury or worse (referred to as “suspected serious injury+” in the papers) and airbag deployment benchmark rates used the observed crashes without an underreporting correction.

      All streets within a city are not equally challenging. If Waymo drives more frequently in more challenging parts of the city that have higher crash rates, its crash rate may differ from what it would be in quieter areas. The benchmarks reported by Scanlon et al. are at a city level, not for specific streets or areas. The human benchmarks shown on this data hub were therefore adjusted using a method described by Chen et al. (2024) that models the effect of spatial distribution on crash risk. The methodology adjusts the city-level benchmarks to account for the unique distribution of Waymo’s driving. The result of the reweighting is a set of human benchmarks that are more representative of the areas of the city where Waymo drives the most, which improves data alignment between the Waymo and human crash data. Achieving the best possible data alignment, given the limitations of the available data, is part of the newly published Retrospective Automated Vehicle Evaluation (RAVE) best practices (Scanlon et al., 2024b). This spatial dynamic benchmark approach described by Chen et al. (2024) was also used in Kusano et al. (2025).
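The reweighting idea reduces to a mileage-weighted average of zone-level human rates. The following is a toy illustration of the Chen et al. (2024) adjustment, with invented zone names, rates, and mileages:

```python
def dynamic_benchmark(zone_rates: dict, waymo_miles: dict) -> float:
    """Weight each zone's human crash rate (IPMM) by the share of Waymo
    miles driven in that zone, yielding a benchmark representative of
    where Waymo actually drives."""
    total = sum(waymo_miles.values())
    return sum(zone_rates[z] * waymo_miles[z] / total for z in waymo_miles)

# Invented zones: human IPMM per zone, and Waymo miles driven in each.
zone_rates = {"downtown": 6.0, "residential": 2.0, "arterial": 3.5}
waymo_miles = {"downtown": 5e6, "residential": 3e6, "arterial": 2e6}

print(f"reweighted benchmark: {dynamic_benchmark(zone_rates, waymo_miles):.2f} IPMM")
```

If Waymo's miles skew toward the higher-risk downtown zone, the reweighted benchmark is pulled above the unweighted city average, which is the point of the adjustment.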

    • Confidence intervals for Incidents Per Million Miles (IPMM) crash rates were computed using the Poisson exact method. The confidence intervals for the percent reduction used the Clopper-Pearson binomial method described in Nelson (1970). Both were assessed at a 95% confidence level. These are the same methods as described in Kusano et al. (2023).

      There is no perfect “apples-to-apples” comparison between human and AV data available today. The benchmarks and comparisons on this page represent the current state of the art in human and AV data sources and in the research in this field. The serious injury or worse and airbag deployment benchmarks do not have an underreporting correction for the human data because there is no estimate of underreporting for airbag-deployment crashes, although it is likely that underreporting is greater in human crash data than in AV crash data. The any-injury-reported benchmark does use an underreporting correction from Blincoe et al. (2023), which is based on multiple analyses of national police-report and insurance crash data and a national phone survey. It is not straightforward to compute confidence intervals on the any-injury-reported underreporting estimate because it is derived from multiple sources. There is also evidence that underreporting may differ between localities, meaning a national estimate may not fully represent underreporting in the cities Waymo operates in.

      See Scanlon et al. (2024) and Kusano et al. (2024) for a more comprehensive discussion of the limitations of these results:

      1. Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2024). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. Traffic Injury Prevention, 25(sup1), S51-S65.
      2. Kusano, K. D., Scanlon, J. M., Chen, Y. H., McMurry, T. L., Chen, R., Gode, T., & Victor, T. (2024). Comparison of Waymo Rider-only crash data to human benchmarks at 7.1 million miles. Traffic Injury Prevention, 25(sup1), S66-S77.
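The Poisson exact interval mentioned above can be sketched in pure Python by numerically inverting the Poisson CDF (a minimal sketch; production analyses would use a statistics library, and the 4-million-mile exposure below is invented):

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), via the running-term recurrence."""
    term = total = math.exp(-mu)
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def poisson_exact_ci(k: int, alpha: float = 0.05):
    """Exact (Garwood) confidence interval for a Poisson count k, by bisection."""
    def invert(cdf_at: int, target: float) -> float:
        lo, hi = 0.0, 10.0 * (k + 10)
        for _ in range(200):          # CDF is decreasing in mu, so bisect
            mid = (lo + hi) / 2
            if poisson_cdf(cdf_at, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else invert(k - 1, 1 - alpha / 2)
    upper = invert(k, alpha / 2)
    return lower, upper

# A rate CI is just the count CI scaled by exposure: e.g. 2 crashes over a
# hypothetical 4 million miles -> IPMM interval.
lo, hi = poisson_exact_ci(2)
print(f"IPMM 95% CI: [{lo / 4:.2f}, {hi / 4:.2f}]")  # ≈ [0.06, 1.81]
```

Note how wide the interval is at only 2 observed events; this is the same effect the FAQ below discusses in terms of statistical power.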

Frequently Asked Questions

  • 1. Are the results trustworthy?

      • Although comparing crash rates boils down to 4 simple counts – crashes and miles for the Automated Driving System (ADS) and for a benchmark – there are many decisions about the study design and data sources that can affect the outcome. Safety impact research is a well-used tool in the vehicle safety literature, dating back to safety advances like electronic stability control and automated emergency braking. ADS, which are responsible for the entire dynamic driving task, present some unique challenges, and as a result the RAVE Checklist was published as a consensus of research best practices for ADS safety impact research. The checklist, which is being developed into an international standard, lays out the best practices for conducting safety impact studies of ADS like those presented on the Safety Impact Data Hub. The research that underpins the data hub is designed to comply with the RAVE Checklist (see the online appendix of Kusano et al., 2025, for a conformance assessment of the methods against the checklist’s requirements).

      • Waymo’s safety impact research is based on reporting required by the National Highway Traffic Safety Administration (NHTSA) Standing General Order (SGO). All Automated Driving System (ADS) operators (a technical term for automated vehicle operators like Waymo) must adhere to the SGO and report all crashes meeting its requirements within the specified reporting windows. NHTSA has the authority to investigate and take corrective action if it believes there are inconsistencies in an ADS operator’s SGO reports. The SGO reporting requirements include crashes with minor damage, a lower reporting threshold (more minor crashes are included) than traditional police-reported and insurance crash databases. All crashes where any injury is alleged to have occurred or any airbag is deployed – the outcomes the data hub results focus on – must be reported as part of the SGO. Therefore, given the stringent reporting requirements and the operational policies of Waymo’s fleet, it is highly unlikely that any crashes resulting in these outcomes occurred but are not included. For reference, NHTSA reports (Blincoe et al., 2023) that underreporting for human-driven vehicle crashes is 69.7% of property damage crashes and 31.9% of injury crashes. Waymo reports all known crashes detected by a highly capable sensor suite, making its reporting far more complete.

        Because Waymo is using police-reported data to derive benchmarks, only crashes where the Waymo vehicle was in transport and was contacted during the crash are included in the comparison to the benchmark. In police-reported data, vehicles that are not contacted during a collision sequence are not counted as vehicles in the crash. Therefore, including Waymo crashes reported in the SGO where there was no contact with the Waymo vehicle (which may be reported under the SGO due to alleged contribution to the crash) would overcount the Waymo crash rate relative to the benchmark. Similarly, the Waymo vehicle is sometimes parked in a valid parking space while waiting to serve future trips. The ADS software is active, but the vehicle is in park and in a valid parking space (either a marked space or within 18 inches of a curb for on-street parking). In police-reported data, parked vehicles like these are also not included in the vehicle count (parked vehicles are considered fixed objects).

      • Aligning the Automated Driving System (ADS) and human crash data is one of the most important dimensions of a fair apples-to-apples comparison, and an important step in aligning data is settling on a consistent definition of a “crash.” Waymo’s safety impact research uses past safety evaluation research as a starting point to pick crash outcomes that can be reliably identified in both ADS and human data sources. The most prevalent and reliable human crash data sources are police report databases. Not all human crashes are reported to police, especially minor ones. More serious crashes that result in airbag deployments or injury (whether any injury or serious injury and worse) are more relevant to assessing safety than those that result in small amounts of property damage.

        Even though we believe the serious injury or worse, airbag deployment, and any-injury-reported outcomes are the most relevant to assessing safety, we still track and report minor collision rates; comparisons to additional benchmarks (for example, any property damage or injury, and police-reported crashes) are available in the downloads section of the data hub website.

      • The Waymo safety impact research uses several approaches to align driving conditions between the human benchmark and Waymo driving: (a) using human data from the counties where Waymo operates and (b) a location-based, dynamic benchmark adjustment. Driving differs from city to city, and not all roads or driving conditions are equally risky. To capture local crash risk, Waymo’s safety impact research uses state-maintained crash and vehicle miles traveled (VMT) data sources restricted to the counties where Waymo currently operates. Even within a county, human crash rates differ by area; denser parts of cities generally have higher crash rates than less densely populated areas. To capture this effect, Waymo’s safety impact results use a dynamic benchmark adjustment that weights the human benchmark proportionally to the miles the Waymo service drives in each area (see Kusano et al., 2025, and Chen et al., 2025, for details). By comparing Waymo driving to benchmark driving from the same locations, many driving condition effects are implicitly accounted for. Our research has shown that crash rates vary substantially by geographic location, which is why we advise against using a national average benchmark for comparison with Waymo’s driving.

        Better aligning the benchmark crash rates with the Waymo driving environment through local crash data and the dynamic adjustment accounts for many, but not all, factors that may affect crash risk. For example, the cities Waymo currently operates in do not have appreciable snowfall, so neither the Waymo nor the human benchmark data include this type of inclement weather. Chen et al. (2025) found that time of day affects crash rates (crash rates late at night are generally higher than during the day). The bottleneck in accounting for more factors when aligning the benchmark and Waymo data is often a lack of data on human driving exposure. For example, the VMT data used for the dynamic benchmark is provided as an annual average, so it cannot be used to adjust for time of day. We are investigating other data sources that could provide human data to further align the benchmark and Waymo data.

      • The results on the safety impact data hub compare Waymo’s crash performance to the current human driving fleet in the areas where Waymo operates, using best practices to align the Waymo and human crash data. This comparison answers the research question, “What is the effect of Waymo’s driving on the status quo?” This is the most basic question researchers ask when a new vehicle technology is being developed and deployed (for example, automated emergency braking or electronic stability control), and this type of status quo comparison demonstrates the potential of a vehicle technology to improve traffic safety.

        Some of Waymo’s other research has investigated comparisons to other populations. For example, in prior research and in our prospective safety determination methodologies examining collision avoidance performance, we compare the Waymo Driver’s performance against a “non-impaired with eyes on the conflict (NIEON)” driver. There are methodological challenges with creating a comparable crash rate version of this benchmark, because the VMT for a NIEON-like driver is not readily available – primarily because human drivers are not always in a NIEON state when driving. In other work from Swiss Re in collaboration with Waymo (in peer review), Waymo’s third-party claims rates have been compared to those of human drivers driving latest-generation vehicles. This represents another, higher-performing subset of human-driven vehicles, because latest-generation vehicles typically have improved safety features.

        Another potentially enlightening comparison could be with other driving populations, like taxis or human ride-hailing. Today, there are no publicly available (and therefore independently verifiable) data sources for quantifying crashes and VMT for these special populations across a wide range of outcomes, as there are for general police report and public VMT databases. Another benchmark that would represent a higher expectation could be a non-impaired-driver benchmark. While this can be a valuable comparison, it does not provide an assessment of the reduction relative to the status quo crash rate. As with the special-population rates, it is difficult to produce a local estimate of both the number of impaired crashes and impaired VMT. These are challenging but valuable areas of further research as new data sources become available.

      • Injury outcomes can be measured in a variety of ways. We did not want our analysis to focus narrowly on the occupants of the Waymo vehicle, because doing so could undercount Waymo’s safety impact by omitting crashes that injure people outside the Waymo vehicle. So we chose a crash-level outcome that looks at the maximum injury sustained by any person involved across the entire crash sequence. A maximum, crash-level injury score is common practice in automotive safety research, and is often provided directly in police reports as an entry field.

    • It may seem like the miles driven by Waymo (hundreds of millions) pale in comparison to the billions of miles driven in the cities where Waymo operates, or the trillions of miles driven annually in the entire United States. When comparing the rates of two populations, however, the conclusions you can draw from the data are governed by what is called statistical power. The question being answered by the Safety Impact Data Hub is: are the Waymo and benchmark crash rates different? The inputs to this calculation are the numbers of crashes and miles driven for the Waymo and benchmark populations, modeled using a Poisson distribution, the most common distribution for count data.

      An example of this problem is counting the number of students who do not pass an exam. In a school district, say that 300 out of 1,000 students taking the same test do not pass (3 per 10 test takers). One could ask whether Class A of 20 students performed differently than the district on this test (for this simplified example, assume passing is independent of being in Class A). Say Class A had 10 of its 20 students not pass (5 per 10 test takers), double the district rate. When we use a Poisson confidence interval, however, the not-pass rate in a class of 20 is not statistically different from the district average at the 95% confidence level.

      If we instead compare Class A to an entire state of 100,000 students (with the same rate of 3 per 10 test takers, or 30,000 not passing), the 95% confidence intervals are almost identical to those from the comparison with the district (300 out of 1,000). This means that, for this comparison, the uncertainty in the small number of observations in Class A (only 20 students) far outweighs the uncertainty in the larger population. Now take another class, Class B, where only 1 of 20 students did not pass (0.5 per 10 test takers). Applying 95% confidence intervals, Class B does have a statistically different pass rate from the district average (and likewise compared to the state).

      This example shows that when comparing rates of events in two populations where one is much larger than the other (measured in test takers, or miles driven), two things drive statistical significance: (a) the number of observations in the smaller population (more observations = significance sooner) and (b) the size of the difference in rates (bigger difference = significance sooner).
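The school example can be checked numerically with an exact conditional test, in the spirit of the Clopper-Pearson interval cited in the methodology: condition on the total number of failures and ask whether the class's share of them is consistent with its share of test takers. This is a sketch that treats the class and district counts as independent, as the simplified example assumes:

```python
import math

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact binomial CI for a proportion x/n, by bisection on the CDF."""
    def invert(cdf_at: int, target: float) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(100):          # CDF is decreasing in p, so bisect
            mid = (lo + hi) / 2
            if binom_cdf(cdf_at, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if x == 0 else invert(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else invert(x, alpha / 2)
    return lower, upper

def rates_differ(class_fail: int, class_n: int, district_fail: int, district_n: int) -> bool:
    """Condition on total failures: is the class's share of failures
    inconsistent with its share of test takers at the 95% level?"""
    expected_share = class_n / (class_n + district_n)
    lo, hi = clopper_pearson(class_fail, class_fail + district_fail)
    return not (lo <= expected_share <= hi)

print(rates_differ(10, 20, 300, 1000))  # Class A: False (not significantly different)
print(rates_differ(1, 20, 300, 1000))   # Class B: True  (significantly different)
```

The outcome matches the narrative above: Class A's doubled failure rate is not distinguishable from the district's with only 20 students, while Class B's much lower rate is.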

      Now consider another experiment with Waymo data. The figure below keeps the number of Waymo airbag-deployment-in-any-vehicle crashes (34) and VMT (71.1 million miles) constant while assuming different orders of magnitude of miles driven in the human benchmark population (benchmark rate of 1.649 incidents per million miles with 17.8 billion miles traveled). The point estimate is that Waymo has 71% fewer of these crashes than the benchmark. The confidence intervals (sometimes called error bars) show the uncertainty in this reduction at a 95% confidence level (the standard in most statistical testing). If the error bars do not cross 0%, then from a statistical standpoint we are 95% confident the result is not due to chance, which we refer to as statistical significance. This “simulation” shows the effect on statistical significance of varying the VMT of the benchmark population. The comparison would be statistically significant even if the benchmark population had fewer miles driven than the Waymo population (10 million miles). Furthermore, as long as the human benchmark has more than 100 million miles, there is almost no discernible difference in the confidence intervals of the comparison. This means that comparisons in large US cities (based on billions of miles) are no different from a statistical perspective than a comparison to the entire US annual driving (trillions of miles). As in the school test example, Waymo has driven enough miles (tens to hundreds of millions) and the reductions are large enough (70%-90%) that statistical significance can be achieved.

      [Graph: ADS reduction vs. human benchmark Vehicle Miles Traveled]
    • This analysis leverages the methodology and human benchmarks introduced in Scanlon et al. (2024), Kusano et al. (2024), and Kusano et al. (2025).

      These research papers have been published in peer-reviewed, scientific journals.

      Citations:

      • Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2024). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. Traffic Injury Prevention, 25(sup1), S51-S65.

      • Kusano, K. D., Scanlon, J. M., Chen, Y. H., McMurry, T. L., Chen, R., Gode, T., & Victor, T. (2024). Comparison of Waymo Rider-only crash data to human benchmarks at 7.1 million miles. Traffic Injury Prevention, 25(sup1), S66-S77.

      • Kusano, K. D., Scanlon, J. M., Chen, Y. H., McMurry, T. L., Gode, T., & Victor, T. (2025). Comparison of Waymo Rider-Only Crash Rates by Crash Type to Human Benchmarks at 56.7 Million Miles. Traffic Injury Prevention, 26(sup1), S8–S20. https://doi.org/10.1080/15389588.2025.2499887.

      In peer review, a research paper is submitted to a journal, and anonymous researchers with expertise in the area review it and suggest improvements. The peer review process has long stood as the gold standard for publishing research: it requires the research to be described in sufficient detail to reproduce the results, and the conclusions to be supported by those results. The methods used on the Safety Impact Data Hub are the same as in the peer-reviewed papers, which provides transparency in the methods. As is customary in academic publishing, we also often release pre-prints of the publications while the articles are undergoing peer review, a current best practice, with the goal of disseminating our work and inviting comment from the scientific community.

    • Yes, the results on the Data Hub can be reproduced using publicly available data. As we wrote in 1.1.2, all Waymo crash counts are based on events reported as part of the NHTSA Standing General Order (SGO). Additionally, the raw data used to generate all the statistics on the data hub are provided as CSV file downloads – allowing any researcher or other third party to replicate and verify the results. This includes the number of miles driven in each location (CSV1), the SGO case identification and outcome categories for each case included in the analysis (CSV2), comparisons to the benchmark crash rates aggregated by location, outcome, and crash type (CSV3), and the miles driven in geographic locations in the city used for the dynamic location adjustment (CSV4). The methods used in the data hub are based on peer-reviewed papers that are open access (see question 1.3 for citations).

    • A crashed vehicle rate or vehicle-level rate is computed by counting the number of vehicles involved in crashes at a certain outcome level and dividing by the population-level VMT. For the Waymo crashes, the crashed vehicle rate is computed as the number of Waymo vehicles in crashes with a given outcome level divided by the total Rider-Only (RO) miles traveled by Waymo. For the benchmark, it is the total number of vehicles involved in crashes of a certain outcome in police report data divided by the total population VMT. 

      Another available metric is a crash-level rate (i.e., the number of crashes per population VMT). To illustrate why comparing a crash-level benchmark to the vehicle-level rate of an Automated Driving System (ADS) fleet creates a unit mismatch that can lead to incorrect conclusions, consider a simple hypothetical example. A benchmark population contains two vehicles that each drive 100 miles before crashing into each other (2 crashed vehicles, 1 crash, 200 population VMT). The crash-level rate is 0.5 crashes per 100 miles (1 crash / 200 miles), while the vehicle-level rate is 1 crashed vehicle per 100 miles (2 crashed vehicles / 200 miles). This is akin to deriving benchmarks from police-reported crash data, where on average 1.8 vehicles are involved in each crash and VMT is estimated across all vehicles. Now consider an ADS population with 1 vehicle that also travels 100 miles before being involved in a crash with a vehicle outside the population. This is akin to how data is collected for ADS fleets: the total ADS fleet VMT is recorded, along with crashes involving an ADS vehicle. For the ADS fleet, the crashed-vehicle (vehicle-level) rate is 1 crashed vehicle per 100 miles. If an analysis incorrectly compares the crash-level benchmark rate of 0.5 crashes per 100 miles to the ADS vehicle-level rate of 1 crashed vehicle per 100 miles, the conclusion would be that the ADS fleet crashes at twice the benchmark rate. In reality, the ADS rate of 1 crashed vehicle per 100 miles is no different from the benchmark crashed-vehicle rate, in which each individual vehicle was involved in 1 crash per 100 miles traveled.

      Difference between Crash- and Vehicle-level Rates: Using crashed vehicles in human benchmark data instead of crashes gives a fair apples-to-apples comparison to ADS collision data.

      This mistake of comparing a crash-level rate to a vehicle-level rate is easy to make when using aggregate statistics, because summary statistics provided by research agencies often list the number of crashes rather than the number of vehicles involved in crashes. For example, Scanlon et al. (2024) reported that nationally there were 5,930,496 police-reported crashes in 2022, involving 10,528,849 crashed vehicles. The total national VMT for 2022 was 3.2 trillion miles. This means the crash-level rate for the US is 1.9 crashes per million miles, while the vehicle-level rate is 3.3 crashed vehicles per million miles.
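The unit-mismatch arithmetic above can be checked in a few lines. This is a sketch reproducing both the hypothetical example and the 2022 national figures; the variable names are ours.

```python
# Hypothetical benchmark from the text: two vehicles each drive 100 miles,
# then crash into each other (1 crash, 2 crashed vehicles, 200 VMT).
bench_crashes, bench_vehicles, bench_vmt = 1, 2, 200
crash_level = bench_crashes / bench_vmt      # 0.5 crashes per 100 miles
vehicle_level = bench_vehicles / bench_vmt   # 1 crashed vehicle per 100 miles

# ADS fleet: one vehicle drives 100 miles, then crashes with an
# out-of-fleet vehicle; only fleet VMT and fleet involvements are counted.
ads_vehicle_level = 1 / 100

# The mismatched comparison wrongly doubles the apparent ADS rate ...
print(ads_vehicle_level / crash_level)    # 2.0
# ... while the like-for-like vehicle-level comparison shows parity.
print(ads_vehicle_level / vehicle_level)  # 1.0

# 2022 US national figures from Scanlon et al. (2024):
crashes, crashed_vehicles = 5_930_496, 10_528_849
vmt_million_miles = 3.2e6  # 3.2 trillion miles, in millions of miles
print(round(crashes / vmt_million_miles, 1))           # 1.9 crashes / M mi
print(round(crashed_vehicles / vmt_million_miles, 1))  # 3.3 vehicles / M mi
```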

      Another common metric in traffic safety is injured people per VMT (i.e., a person-level rate). As a population-level measure of the burden of crashes, a person-level rate has merit. However, several practical and interpretive issues make it a poor choice when comparing one population to another, as is done in the Safety Impact Data Hub. A person-level rate for an ADS fleet operating in mixed traffic will appear to decrease as fleet size (or penetration) increases, even if the crash involvement rate stays the same. Because crashes often involve multiple vehicles, the larger the fleet, the more likely it is that multiple ADS vehicles are involved in the same crash, which decreases the person-level rate (same number of people involved in the crash, more VMT). This means that early in testing, the person-level rate of the ADS fleet would appear higher than the benchmark even if the ADS was involved in a similar number of crashes as the benchmark population. To address this bias, one could compute a fractional person-level rate, defined as the total people involved in a crash at a given outcome divided by the number of vehicles in the crash. Although this addresses the multiple-vehicle bias, it creates a different bias in interpretation: the fractional person-level rate weights crashes involving fewer vehicles more heavily than crashes that happen to involve multiple vehicles. There is also a practical limitation in that the NHTSA Standing General Order, the most comprehensive source of ADS crashes, reports only the maximum injury severity in the crash, not the number of injured occupants at given severity levels, so it is not possible to compute a person-level rate from the SGO data today. This limitation also applies to some state crash databases, where only maximum severity is reported. Because of these potential biases in interpretation and reporting limitations, a vehicle-level rate is preferable to a person-level rate when comparing ADS and benchmark crash rates.

    • Mathematically, crashes per mile and miles between crashes are inverses of one another (to convert between them, divide 1 by the rate). However, there are important reasons that crash rates should be presented as crashes per mile, as stated in the RAVE checklist recommendations. The crashes-per-mile metric has a linear relationship with the number of events, whereas miles between crashes has a non-linear one, which makes changes in the rates harder to compare. Similar difficulties have been noted for other measurements, such as vehicle fuel efficiency (miles per gallon vs. gallons per 100 miles).

      As stated in the RAVE checklist: “Consider one ADS that has a miles per incident rate of 1 million miles per crash compared to a benchmark of 750,000 miles per crash. Another ADS has a 500,000 miles per crash rate compared to a benchmark of 250,000 miles per crash. In both instances, the difference in miles driven per crash is 250,000, giving the illusion that the difference in performance is similar. Contrary to this, the former comparison shows an ADS that reduces the number of crashes per mile by 25% (1 IPMM vs 1.33 IPMM), while the latter reduces the number of crashes per mile by 50% (2 IPMM vs 4 IPMM). Because the incidents per exposure units rates are linearly proportional to the number of events and the exposure unit per incident rates are not linearly related, it is not readily apparent that the relative rates are more difficult to compare.”

      Graph of expected incidents over 2 million miles vs. million miles per incident (MMPI), compared with a graph of expected incidents over 2 million miles vs. incidents per million miles (IPMM).
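The RAVE checklist example quoted above can be reproduced directly. This is a sketch; the helper function and variable names are ours.

```python
def ipmm(miles_per_crash: float) -> float:
    """Convert miles between crashes to incidents per million miles."""
    return 1e6 / miles_per_crash

# RAVE checklist example: both comparisons differ by 250,000 miles per
# crash, yet imply very different relative crash-rate reductions.
ads_a, bench_a = 1_000_000, 750_000
ads_b, bench_b = 500_000, 250_000

reduction_a = 1 - ipmm(ads_a) / ipmm(bench_a)
reduction_b = 1 - ipmm(ads_b) / ipmm(bench_b)
print(f"ADS A: {reduction_a:.0%} fewer crashes per mile")  # 25%
print(f"ADS B: {reduction_b:.0%} fewer crashes per mile")  # 50%
```

The identical 250,000-mile gaps hide a twofold difference in relative benefit, which is why the data hub reports rates in crashes per million miles.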
  • 2. What do these results mean?

      • The research shows that the Waymo Driver is safer than the overall human driver population in the same geographical areas where it operates, measured by the number of crashes of a given outcome per vehicle mile traveled. The research has focused on comparing the Waymo Driver’s safety performance to the entire collection of human-driven vehicles within the same geographical area. The human crash rate can be thought of as the “status quo” of driving for that area. This comparison is used in safety impact analysis to determine how effective the introduction of Waymo’s technology is compared to the status quo.

        Using the overall human driver crash rate is commonplace, made possible by the longstanding data reporting practices present almost universally throughout the United States and much of the world. There is a strong historical precedent for examining year-over-year trends and systemic challenges facing entire geographic driver populations. While crash data offers some insight into particular subsets (e.g., by vehicle type or driver intoxication), the corresponding VMT data typically lacks such fine-grained granularity. For example, to compare crash rates by driver intoxication, you would need to know, or estimate, the VMT driven by intoxicated drivers. Significantly less effort has been dedicated to isolating specific subsets of drivers for comprehensive crash risk analysis.

      • While a statistically significant reduction represents a safety benefit (i.e., there are fewer crashes), a claim regarding “safe enough” is made through Waymo’s Safety Framework and Safety Case before the release of an ADS configuration. The goal of safety impact is not to establish what is or is not a reasonable level of safety for an automated driving system. Waymo uses its Safety Framework to determine safety readiness according to approval guidelines for a given software release candidate. Additionally, independent analysis of the appropriateness of such a process is conducted through the Safety Case. A safety case is a formal way to explain how an ADS developer determines that its system is safe enough to be deployed on public roads without a human driver. The safety case includes evidence to formally determine absence of unreasonable risk. It involves an explanation of the system, the methodologies and metrics used to validate it, and the actual results of validation tests. In turn, the retrospective evidence provided by the Safety Impact Data Hub serves a post-deployment validation role for the Safety Framework and Safety Case. This cycle of continuously building confidence in the Safety Framework and Safety Case processes also builds confidence that the process will yield similar safety impact results as Waymo expands into new areas.

      • Most of the safety impact research uses crashes from all Rider-Only (RO) miles accumulated to-date. Waymo’s driving miles over time have greatly increased, such that the more recent data makes up a larger proportion of the Waymo driving miles than the older miles. Similar to the FAQ on “why aren’t the comparisons of Waymo Rider-Only driving to the benchmark crash rates divided into more categories?”, dividing the driving miles into smaller portions reduces the statistical power of the analysis, which is a common limitation noted in other safety critical fields. 

        Grouping multiple software releases, or even data from different manufacturers, is commonplace in the safety impact research done on past safety systems. For example, the research from the Insurance Institute for Highway Safety and the PARTS consortium often looks at a technology like Automated Emergency Braking or Lane Departure Prevention and groups several manufacturers together to determine the overall impact of a technology. Similarly, the Waymo safety impact research shows the overall impact of the Waymo Driver. As more driving mileage is accumulated, there is an opportunity to investigate Waymo’s safety impact on smaller time periods.

        Waymo’s safety impact research strives to answer the research question “what is Waymo’s safety impact over current human-driven vehicle crash rates?” (the status quo). A slightly different, and equally as important, question is “how is Waymo confident that new software and hardware releases are safe?” To answer this second question, Waymo has developed a Safety Framework and Safety Case Approach. In short, they evaluate Waymo’s performance on every new candidate configuration using a collection of methodologies that span vehicle architecture, driving behavior, and operational layers against acceptance criteria.

      • Waymo has published a wide range of benchmarks, including fatal crash involvement, that will be used in future evaluations. Currently, there is not enough Waymo VMT to detect statistical significance in the areas that we drive, and therefore we do not separate out a fatalities-only category in our reporting. The Waymo Driver is also inherently designed to mitigate or eliminate the top causes of fatal collisions according to the latest NHTSA data: speeding, impaired and distracted driving, and unbelted passengers. The “serious injury or worse” category includes both serious injuries and fatalities. All other crash outcome categories also include fatalities.

        Waymo’s approach has been to (a) proactively publish benchmarks, methodology, and intended analytical lenses, (b) perform evaluations on those established benchmarks when previously completed power analyses indicate that significance may be detected, and (c) publish the findings on our data hub and in scientific publications. 

        As has been the case for many safety innovations in the history of vehicle safety, there are other ways to determine the potential of a technology before it is widely deployed and miles are accumulated. For example, our research that reconstructed fatal crashes involving human drivers in Chandler, AZ found the Waymo Driver avoided 100% of simulated, fatal crashes when it was the initiator, and 82% of collisions even when it was the responder. This type of study, when paired with Waymo’s safety readiness determination process, shows that the Waymo Driver has a tremendous potential to reduce serious and fatal injuries.

      • In automotive safety research, injuries at or above a certain level are commonly studied. In the case of our analysis, “serious injury or worse” includes both suspected serious injuries (denoted as “A”-level or incapacitating injuries on the KABCO scale used on police reports in the US) and fatal injuries (denoted as “K”-level injuries on the KABCO scale). Waymo has published benchmarks that include “K”-level crashes, any fatality, as its own category. This outcome is not currently reported as part of the Safety Impact Data Hub, but we intend to add this outcome at a later date.  

        If we looked at “serious injury” (only “A”-level injuries) by itself, we could potentially introduce a type of exclusion bias. For example, if a treatment were to create only fatal outcomes and very few suspected serious injury outcomes, it could erroneously lead to the conclusion the treatment is much safer than it is, because the “fatal” injuries were not being counted. By adding in the “at or above” stipulation, we avoid this potential fallacy. 
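To make this exclusion bias concrete, here is a toy numerical illustration. All counts below are invented for illustration only; they are not Waymo or benchmark data.

```python
# Suppose a hypothetical "treatment" shifts most serious ("A"-level)
# injury outcomes into fatal ("K"-level) outcomes. Counts are invented.
baseline = {"K": 2, "A": 10}
treated = {"K": 8, "A": 2}

# A-only metric: looks like an 80% reduction in serious injuries.
a_only_ratio = treated["A"] / baseline["A"]  # 0.2

# A-or-worse metric: barely any reduction, exposing the shift to fatalities.
a_or_worse_ratio = (treated["K"] + treated["A"]) / (baseline["K"] + baseline["A"])

print(f"A-only: {1 - a_only_ratio:.0%} reduction")  # 80%
print(f"A-or-worse: {1 - a_or_worse_ratio:.0%} reduction")
```

Counting only “A”-level injuries would credit this treatment with a large benefit even though it made outcomes worse overall, which is exactly what the “at or above” stipulation prevents.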

      • The Waymo safety impact research investigates the difference in crash rates between Waymo vehicles and human drivers that drive in comparable areas. Remote Assistance enables the Waymo Driver to contact a human agent for additional information to help contextualize its surroundings in certain challenging or uncommon situations. Remote Assistance has been part of the design of the Waymo Driver since the beginning, and it is indeed part of the reason Waymo has been able to scale its operations safely. Waymo’s Remote Assistance program has received an independent, third-party audit showing that it conforms to industry best practices and that the system is indeed autonomous under those practices (1, 2).

      • The Waymo Driver is currently driving millions of miles per week. The Waymo Driver has the cumulative experience of hundreds of human lifetimes of driving, when taking into account miles driven on the road and in simulation. At this scale, many challenging situations, like pedestrians suddenly appearing from behind a parked car or another vehicle running a red light, happen regularly. If the Waymo Driver could not handle many of the challenging situations that humans handle throughout their lifetime, then Waymo’s crash rates would not be so much lower than those of human drivers.

        Waymo uses its Safety Framework to determine safety readiness according to approval guidelines for a given software release candidate. Additionally, independent analysis of the appropriateness of such a process is conducted through the Safety Case. A safety case is a formal way to explain how an ADS developer determines that its system is safe enough to be deployed on public roads without a human driver. The safety case includes evidence to formally determine absence of unreasonable risk. It involves an explanation of the system, the methodologies and metrics used to validate it and the actual results of validation tests.

      • The results of Waymo’s safety impact research show that, compared to the current status quo of human-driven vehicles, Waymo has fewer injury-causing crashes per vehicle mile traveled. Part of the benefit is that there is sometimes no one in the Waymo vehicle (e.g., while the vehicle is traveling to or from a depot to charge or between serving riders). It is important to note that the metrics examined by Waymo’s safety impact research consider an injury to any person involved in the crash sequence, whether or not the person is inside a Waymo vehicle. This includes vulnerable road users, such as pedestrians and cyclists, and the occupants of other vehicles involved in a crash. Therefore, even if there is some benefit from the Waymo vehicle sometimes being unoccupied, it is unlikely this alone explains Waymo’s large reduction in injury-causing crashes (the vehicle could be unoccupied all the time and still get in crashes that injure people outside the vehicle). Other outcomes, like the airbag deployment metrics, are not affected by Waymo vehicle occupancy: the Waymo vehicle airbags will fire regardless of whether the vehicle is occupied. The magnitude of the airbag reduction compared to the benchmark is similar to the injury-causing reduction, increasing confidence that the observed benefits are not highly dependent on Waymo vehicle occupancy.

      • Aligning the Waymo and human crashes and driving is one of the most important factors for making a fair “apples-to-apples” comparison of crash rates (see question 1.1 for more details on alignment). 

        We have focused our early efforts on three components that we believe are informative for safety evaluation. 

        • Collision severity - establishing multiple levels ranging from police-reported to fatality.  

        • Crash type (shown below) - the typology we selected was based on prior research from NHTSA highlighting the most challenging driving scenarios. 

        • Road type - We break down crash rates by surface streets and freeways. We currently have only limited miles on freeways, so we focus on surface streets only, but we plan to differentiate between the two road type groupings in future publications when the VMT enables a statistical comparison. 

        Diagram explaining crash types: Cyclist, Motorcycle, Pedestrian, Secondary Crash, Single Vehicle, Vehicle-to-Vehicle Backing, Vehicle-to-Vehicle Front-to-Rear, Vehicle-to-Vehicle Opposite Direction, Vehicle-to-Vehicle Intersection, and Vehicle-to-Vehicle Lateral.

        We are actively working to expand the analytical lenses we apply to evaluating the Waymo Driver. However, we are generally limited by the available human crash data. Waymo is actively relying on publicly available crash and mileage data. This data has limited information about the specifics of each individual human crash. Conversely, Waymo’s data is rich with information due to our constant monitoring of VMT and our ability to capture each crash with our wide array of sensors. To expand our analysis, we are continuously investigating the usage of new data sources with more fine-grained information, and looking toward the broader community for analyses and data that can help support the research.

      • All crashes involving Waymo vehicles operating in Rider-Only (RO) configuration are included in the safety impact analysis. Therefore, the collision risk of the Waymo vehicle stopping on a roadway and another vehicle subsequently colliding with the stopped Waymo vehicle is included in the safety impact. These types of stopped vehicle crashes are also included in the human benchmark.

      • This analysis included all collisions, regardless of the party at fault and Waymo’s responsibility. Moreover, the question of fault in causing or contributing to a collision is a legal determination. That said, the recent peer reviewed study led by Swiss Re showed that over 3.8 million miles, the Waymo Driver reduced the frequency of property damage insurance claims by 76% and completely eliminated bodily injury claims compared to human drivers.

        Citation:

        • Di Lillo, L., Gode, T., Zhou, X., Atzei, M., Chen, R., & Victor, T. (2024). Comparative safety performance of autonomous-and human drivers: A real-world case study of the Waymo Driver. Heliyon, 10(14). https://doi.org/10.1016/j.heliyon.2024.e34379

        A subsequent study using insurance claims data, currently under peer review, found that the Waymo RO service had similarly large reductions compared to humans over 25 million miles driven. In addition to an overall human benchmark, this new study also introduces a “new model year” vehicle benchmark: the newest vehicles (defined as model years 2018 to 2021) had lower property damage and bodily injury claims rates than the overall population. Waymo had an 88% reduction in property damage claims and a 92% reduction in bodily injury claims compared to the overall population, and an 86% reduction in property damage claims and a 90% reduction in bodily injury claims compared to the new model year benchmark. All of these differences were statistically significant.

      • Today, Waymo’s service is comparable to human ride-hailing services. The data show Waymo reduces serious injury or worse, airbag deployment, and any-injury-reported crashes by more than 80%. For the introduction of Waymo to lead to a net increase in crashes, Waymo would need to increase overall VMT by over 80%, which does not seem like a realistic assumption. Many studies show that overall VMT and the number of vehicles on the road can be greatly reduced by the introduction of shared autonomous vehicles (for example 1, 2, 3, 4, 5).

      • Traffic safety is a public health issue and the 2030 Agenda for Sustainable Development has set an ambitious target of reducing road traffic deaths and injuries worldwide by 50% by 2030. A study by the RAND Corporation modeled Automated Driving System (ADS) deployments under several assumptions, including a system that had a crash rate that is only marginally lower than current humans or waiting years to deploy a system with a much lower crash rate than humans. The findings were that more harm could be prevented by deploying earlier. 

        Waymo has a safety framework and safety case approach that has a top level goal of deploying a Rider-Only (RO) system that has an absence of unreasonable risk (AUR). This safety case goal is accomplished by decomposing the possible hazards of the system by several dimensions, setting acceptance criteria, and assessing both the claims and evidence before deploying. This process is designed to ensure the Waymo Driver is acceptably safe before deploying.

      • We do not need to choose one technology or policy initiative to combat the traffic safety crisis. Automated vehicles, like the Waymo Driver, are one of many tools available for improving traffic safety. Waymo is committed to the Safe System approach and Vision Zero, which pursue safety through multiple complementary improvements: safer roads, safer speeds, safer vehicles, safer road users, and safer post-crash care. Many improvements to safety (investments in safer roads, setting safe speed limits, enforcing existing traffic laws, improving seat belt compliance, and reducing impaired driving, to name a few) will also make riding in a Waymo safer. Waymo, like much of the industry, is a privately funded company, so we as a society can support the expansion of automated vehicles without taking away from other safety improvements.

        Automated vehicles present a unique opportunity compared to other safety technologies because of their relatively large safety impact over human driving. For example, Automated Emergency Braking reduces rear-end striking crashes (which are only about a quarter of all crashes) by approximately 50%. In comparison, the Waymo Driver reduces crashes resulting in any-injury-reported by approximately 80% across all crash modes, including intersection and VRU crashes, where current active safety technologies are not yet significantly reducing crashes.

      • The Safe System approach, based on the global Vision Zero movement, is a systematic method that aims to eliminate serious and fatal injuries in the road transportation system. Waymo’s Automated Vehicles provide a valuable tool in the Safe System toolkit because they are designed to follow principles of Vision Zero. Waymo requires seat belt use by all occupants. Waymo is designed to follow the speed limit and uses vehicles with the latest passive safety features. 

      • In this analysis, we use publicly available data — specifically, Waymo’s crash reports submitted under NHTSA’s Standing General Order (SGO) — to enable other researchers to replicate the results. The data displayed on this webpage undergoes consistent updates aligned with the NHTSA SGO reporting timelines.

        In addition to new data being published, we may update the methodology used to do comparisons between the Waymo RO (Rider Only) service and human benchmarks. The best practices in retrospective safety impact is an evolving science. When we do make changes in methodology, we will communicate those changes and their effects on the results and interpretation of the data. For more details, see the release notes documents available in the downloads section.

      • Zip code information is important for analyzing and understanding collisions. Data after June 2025 does not include zip codes because this field was removed from the NHTSA SGO reporting form (see SGO amendment 3). The zip code field was added back to the SGO reporting form in September 2025, so SGO events reported after September again include zip codes in the data download file.

Safety Research

We’re actively conducting studies and publishing peer-reviewed findings on our safety methodologies, performance data, and more

Read our publications
  • Miles per Geo

    Total miles driven in each location (through September 2025)

    Download CSV

  • Crashes with SGO identifier and group membership

    Police-reported, any-injury-reported, airbag deployment, serious injury or worse, delta-V < 1 mph and other relevant collision information: day, location, zip code (through September 2025)

    Download CSV

  • Collision count and comparisons to benchmarks by outcome and location

    Aggregated by outcome and location (through September 2025)

    Download CSV

  • Geographic distribution of benchmark and Waymo RO miles

    Human benchmark crash counts for different outcome levels, human vehicle miles traveled (VMT), and Waymo RO miles reported by S2 cell through June 2024. This information can be used to reproduce the dynamic benchmark adjustments.

    Download CSV

  • Release Notes

    A description of changes to the data and methodologies used on the data hub, links to historical data, and data dictionaries.

    Download PDF