Spreadsheet of San Francisco Bay Area Covid-19 Data and Charts
docs.google.com
California unfortunately has a huge backlog of pending test results. The cause seems to be that the private labs (Quest in particular) accepted test samples and built up a huge backlog of the earlier, manually processed ones. Other labs would push back if their queue got too long. The newer samples are run on the Roche high-speed machines.
Do you happen to have a source for the cause of the backlog? Not doubting you, just curious to read more information.
See the link to the Atlantic mentioned by boyd.
here you go: https://covidtracking.com/data/state/california
I’m asking for a source on why the backlog happened. Unless I’m missing something, that just shows the number of pending cases.
They accepted more tests than they had the ability to quickly process. They also appear to have accepted many tests that required use of a lower throughput assay before switching to higher throughput testing on Roche 8800s: https://www.theatlantic.com/health/archive/2020/03/next-covi...
Thanks, googling found this[1] statement by Quest that they have a backlog of 115,000 tests, down from 160,000 on March 25th. It would be interesting to know what percentage of the California backlog is contained within that.
[1] https://newsroom.questdiagnostics.com/COVIDTestingUpdates
Where is the most important metric? Daily tested vs tested positive stats.
I don't believe this is a reliable metric.
Who gets tested is a moving target. Stanford recently did a free-for-all testing binge in order to collect data, but has finished that and is now restricting tests to people with specific risk factors.
The first time I tried to get a test from another provider I just wasn't able to; they didn't know of anywhere that would test me without hospitalization-type symptoms.
So testing is uneven and not very available; any stats need to include some measure of the criteria for getting a test in the first place.
In other words, there is likely an enormous population with no symptoms or mild symptoms who couldn't get tested if they tried.
After two video appointments with separate providers I was able to get tested yesterday and the result came back negative about 22 hours later. It took me about 8 hours of effort and time to get that done, a luxury many people do not have.
> After two video appointments with separate providers I was able to get tested yesterday and the result came back negative about 22 hours later. It took me about 8 hours of effort and time to get that done, a luxury many people do not have.
Is there any value in people self-selecting into personal choice testing? You could get infected tomorrow, for instance...
If we wanted a full picture of community spread we'd need a top-down random sample, not self-selection, no?
Aren't there ways of turning self-selection populations into random sample populations for statistical purposes? (It has just been a while since I have had to think of these things).
But really we want more than just accurate statistics, we want to minimize damage. Any increase in testing is good testing, and triaging testing to highest risk individuals makes sense when your capacity is limited.
The consequence, though, is that reported statistics are often just wrong: they are skewed towards more severe outcomes, and comparisons between dates are flawed without much additional information.
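On the question of turning a self-selected sample into something closer to a random one: one standard approach is post-stratification, where each group's observed rate is reweighted by that group's share of the real population. A minimal sketch, assuming made-up numbers and a hypothetical helper name `poststratified_rate`. It only corrects for variables you actually measured (age here); it cannot fix selection on symptoms within a group.

```python
def poststratified_rate(sample, population_share):
    """Reweight per-stratum positive rates by each stratum's population share.

    sample: stratum -> (tested, positive)
    population_share: stratum -> fraction of the real population (sums to 1)
    """
    total = 0.0
    for stratum, share in population_share.items():
        tested, positive = sample[stratum]
        total += share * (positive / tested)
    return total

# Made-up illustrative numbers: older people are over-represented among the tested.
sample = {"under_50": (100, 5), "50_plus": (400, 60)}
population_share = {"under_50": 0.7, "50_plus": 0.3}

# Raw rate pools everyone tested, so it inherits the skew of who got tested.
raw = sum(p for _, p in sample.values()) / sum(t for t, _ in sample.values())
adjusted = poststratified_rate(sample, population_share)
print(f"raw {raw:.1%}, adjusted {adjusted:.1%}")
```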
It's one of the most reliable metrics we have, and a lot better than just tests or just confirmed cases.
After you have this info, you can compare the rates to what kind of testing policies the areas have, and make some initial conclusions.
The one I pay attention to is the daily growth rate of confirmed cases. It can't cover people who aren't tested, but it approximates the velocity of the problem's magnitude. And over time it shows the acceleration, which reflects how we are improving the situation, or not...
https://paroj.github.io/arewedeadyet/#rate
The good news is that the US has gone from a 30+% daily growth rate 10 days ago down to a 15% growth rate and falling. We need to keep falling into the negative rates to solve this problem.
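For anyone who wants to compute this themselves, here is a minimal sketch of a daily growth rate taken from a cumulative case series. The case counts are made-up illustrative numbers, not real data, and `daily_growth_rates` is a hypothetical helper name.

```python
def daily_growth_rates(cumulative):
    """Return the day-over-day fractional growth of a cumulative case series."""
    rates = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        # A previous total of zero has no defined growth rate.
        rates.append((curr - prev) / prev if prev else None)
    return rates

# Made-up cumulative totals, one entry per day.
cases = [100, 130, 169, 200, 230]
rates = daily_growth_rates(cases)
print([f"{r:.0%}" for r in rates])
```

Note that the growth rate of a cumulative total bottoms out at zero (no new cases). A negative rate only makes sense for a series that can shrink, such as new cases per day or active cases.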
Me too. It has some noise from variation in testing rate, but it's directionally accurate.
For example, the Bay Area has recently been seeing some days with single-digit growth rates. Shelter in place IS WORKING, but it's going to take time, and we may need some additional measures. I was just reading it may also be spread in the air from breathing.
Honestly, the most important metric is deaths, and from what I can see, the SF Bay Area has done relatively well in that metric. No overcrowded hospitals, for example.
To me the hospitalization rate is most important.
* Overcrowded hospitals are what lead to large jumps in fatality rates.
* It only lags the date of infection by about a week.
* It also isn't subject to external factors like availability of tests. (Though availability of hospital beds is a factor later on)
Yes, this is very important. But one should not forget the average hospitalization time either (which will go down once we have clear procedures for treating COVID at different stages).
Agreed! Still not available much on a county level in the San Francisco Bay Area :(
In the end, yes deaths are most important, but in order to make temporal decisions that affect that number, we need infection-related data first.
yes, but it lags 2-3 weeks behind confirmed cases
This is not released by most counties, unfortunately.
There are some limited stats to the very far right side of the "SF Bay Area Actuals" sheet.
Anecdotally, Bay Area is seeing <10% positivity rates
Another important data point to assess testing is Case Fatality Rate (CFR). This is about 2.5% in the SF Bay Area.
In other places with higher testing, such as Australia, the CFR is 0.6% or less. This implies that the true number of cases is 4-5 times higher... probably a lot more.
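The arithmetic behind that multiplier, as a rough sketch. It assumes the underlying fatality rate is the same in both places and that the lower CFR is close to the true rate; both are assumptions, and the inputs are just the figures quoted above.

```python
def implied_undercount(cfr_observed, cfr_reference):
    """Factor by which cases would be undercounted if cfr_reference were the true rate."""
    return cfr_observed / cfr_reference

# 2.5% (SF Bay Area) versus 0.6% (Australia), as quoted above.
factor = implied_undercount(0.025, 0.006)
print(f"about {factor:.1f}x as many true cases as confirmed")
```

If the reference CFR is itself inflated by missed cases, the real multiplier would be higher still.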
It seems like this disease is so successful because of a significant symptom-free-but-contagious period followed by a small percentage of very serious symptoms.
That's what a pandemic needs. If it is very deadly very quickly it kills its transmission vectors before they can transmit. If it is entirely symptom free, it is very evolutionarily successful, but no one cares because there aren't any negative effects.
There is an "optimum" of disease characteristics for maximum damage and we seem to be experiencing one.
The bottom line is that it seems to be very difficult to prevent a majority of the world population from getting this disease and the result is going to be a global fatality rate of somewhere in the neighborhood of 1%.
What put it into perspective for me is the CDC estimate of up to 25% cases being entirely asymptomatic [1], and data from Iceland shows 50% of those tested were asymptomatic at time of testing [2].
It will be hard to trace and isolate if this is the case.
[1] https://sfist.com/2020/04/01/cdc-director-coronavirus-25-per... [2] https://nationalpost.com/news/world/in-iceland-free-coronavi...
The Diamond Princess numbers are 11 deaths out of 712 cases, with 82 still outstanding (15 serious or critical).
The CFR should end up being about 1.5% (or possibly somewhat higher).
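A quick sketch of where that range comes from, using only the numbers quoted above. The "worst case" line assumes every serious or critical patient dies, which is a deliberately pessimistic bound and not a prediction.

```python
deaths = 11
cases = 712
outstanding = 82   # cases not yet resolved
serious = 15       # subset of outstanding in serious or critical condition

cfr_so_far = deaths / cases                      # counts all outstanding cases as survivors
cfr_resolved = deaths / (cases - outstanding)    # resolved cases only
cfr_worst = (deaths + serious) / cases           # pessimistic bound (assumption)

print(f"{cfr_so_far:.2%} so far, {cfr_resolved:.2%} of resolved, {cfr_worst:.2%} worst case")
```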
Cruise ship passenger demographics might not be representative of the general population.
It's currently a little under 5% worldwide. There are on the order of 1 million cumulative cases and a bit under 50 thousand deaths.
https://www.who.int/emergencies/diseases/novel-coronavirus-2...
Doesn't account for lack of testing, of course.
That also doesn't account for the exponential growth in number of cases; the people dying now are out of a much smaller cohort of confirmed cases in the past.
Deaths / (Deaths + Recoveries) would be more like it, and that's a scary number.
That's a very good point; unfortunately the WHO doesn't have a "closed case" statistic that I can see.
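To make the difference between the two ratios concrete, here is a small sketch. The case and death counts are the rough worldwide figures quoted above; the recovery count is a made-up placeholder, since no recovery figure is given in this thread.

```python
def naive_cfr(deaths, confirmed):
    """Deaths divided by all confirmed cases, including unresolved ones."""
    return deaths / confirmed

def closed_case_fatality(deaths, recoveries):
    """Deaths divided by resolved cases only: Deaths / (Deaths + Recoveries)."""
    return deaths / (deaths + recoveries)

confirmed = 1_000_000   # "on the order of 1 million", as quoted above
deaths = 50_000         # "a bit under 50 thousand", as quoted above
recoveries = 200_000    # hypothetical placeholder, not a sourced figure

print(f"naive: {naive_cfr(deaths, confirmed):.1%}")
print(f"closed cases: {closed_case_fatality(deaths, recoveries):.1%}")
```

The naive ratio understates things while cases are growing fast, because many confirmed patients have not had time to die or recover. The closed-case ratio overstates things if recoveries are reported more slowly than deaths. Neither accounts for untested infections.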
The conspicuous lack of realistic infection data from India, coupled with the extreme challenges to containment and control there (just due to the sheer crowding), is frightening, regardless of whether the poor data is intentional or just because India is a hard place to coordinate.
That the published infection and mortality rates are so low strains credulity in the extreme, especially when much smaller-population countries at similar proximity to the equator but greater distance from China have higher case rates (e.g. Brazil, Ecuador, the UAE).
I developed this for myself but data junkies trying to get a feel for what is happening with the coronavirus spread across the San Francisco Bay Area will appreciate it.
I am updating it regularly.
Where are you getting the raw data? I'm extracting it from the New York Times dataset for my own graphing. They have the data for all counties in the US. I've been meaning to automate the graphing, but for now I'm doing it manually.
I wish you had the new cases per day graphed for all the bay area counties because that is what I monitor.
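A minimal sketch of automating the new-cases-per-day step, assuming a county-level CSV with the columns `date,county,state,fips,cases,deaths` and cumulative counts in `cases`. The sample rows are made up for illustration, and `new_cases_per_day` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed layout; replace with the real file contents.
SAMPLE = """date,county,state,fips,cases,deaths
2020-03-30,San Francisco,California,06075,10,0
2020-03-31,San Francisco,California,06075,14,0
2020-04-01,San Francisco,California,06075,21,1
2020-03-30,Alameda,California,06001,5,0
2020-03-31,Alameda,California,06001,9,0
"""

def new_cases_per_day(csv_text, counties, state="California"):
    """Map county -> list of (date, new cases), derived from cumulative counts."""
    cumulative = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["state"] == state and row["county"] in counties:
            cumulative[row["county"]].append((row["date"], int(row["cases"])))
    result = {}
    for county, series in cumulative.items():
        series.sort()  # ISO dates sort chronologically as strings
        result[county] = [
            (date, cases - prev_cases)
            for (_, prev_cases), (date, cases) in zip(series, series[1:])
        ]
    return result

daily = new_cases_per_day(SAMPLE, {"San Francisco", "Alameda"})
for county, series in daily.items():
    print(county, series)
```

The first day of each county has no previous total to subtract, so it is dropped from the output.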
Raw data was originally from SF Chronicle, but they removed their timelapse view so I am now getting it direct from county websites. Stanford Open Data project also has a reasonable historical dataset that comes from the county websites.
I'll add a new cases graph for each county.
@andfrob, just saw your comment about SF Chronicle removing their timelapse view.
We made one here from the NYT dataset on MintData [1]:
(note: I think we need to update the cumulative counter, we'll be fixing that shortly)
@andfrob happy to get you free/unlimited access to MintData if you're interested in making similar visualizations, please DM me if this would be helpful.
Is anyplace in the Bay Area sharing stats by zipcode?
For example, San Diego has zipcode breakdown here: https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs...
Skimming the health department web sites of various counties, it doesn't look like it. Most of them just provide the basic cases and deaths numbers. Whoever compiled this at the Stanford Open Data site (https://opendata.stanforddaily.com/#/datasets/covid19_bayare...) might be doing so manually.
SF County - https://www.sfdph.org/dph/alerts/coronavirus.asp
San Mateo County - https://www.smchealth.org/coronavirus
Alameda County excluding Berkeley - http://www.acphd.org/2019-ncov.aspx
Berkeley - https://www.cityofberkeley.info/coronavirus/
Santa Clara County - https://www.sccgov.org/sites/phd/DiseaseInformation/novel-co...
Marin County has an (ominously named) dashboard - https://coronavirus.marinhhs.org/surveillance
That would be so useful to identify hotspots within counties. Also information about new cases such as if they are working from home or are considered "essential workers". How are transmissions occurring despite all the stay at home efforts? That would help us all tighten up our collective defenses. Hopefully the governments are doing at least some rudimentary "contact tracing" efforts and we're just not getting to see the data.
Austin has a decent breakdown here: http://www.austintexas.gov/covid19
Not that I am aware of. And the counties vary a lot in terms of the data they provide. The best is Napa County.
Very cool. By the way did you intend for the Y-axis on the "Days since 100 cases" chart to be "Days since 100 cases"? It seems like the Y-axis is "cases" and the X-axis is "Days since 100 cases".
Thanks, fixed!
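For anyone building the same kind of chart, this is one way to line up a series on "days since 100 cases", with days on the X-axis and cases on the Y-axis. A sketch with made-up numbers; `days_since_threshold` is a hypothetical helper name.

```python
def days_since_threshold(cumulative, threshold=100):
    """Return (day_index, cases) pairs, where day 0 is the first day at or above threshold."""
    for start, value in enumerate(cumulative):
        if value >= threshold:
            return list(enumerate(cumulative[start:]))
    return []  # the series never reached the threshold

# Made-up cumulative totals for one county.
points = days_since_threshold([20, 60, 110, 150, 210])
print(points)
```

Applying this to every county puts them all on a common X-axis regardless of when each outbreak started.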
Very helpful. Thank you for sharing!
Have you been able to find data on # of tests carried out?
Very, very limited data on the Bay Area. Under the "SF Bay Area Actuals" sheet, if you scroll all the way to the right you will see what I have been able to find.
California does report them on aggregate, but the purpose of this sheet was to focus on the Bay Area.
Is anyplace in the Bay Area sharing stats by age brackets?
Santa Clara County (South Bay Area) shares a dashboard with cases by age group and deaths by age group: https://www.sccgov.org/sites/phd/DiseaseInformation/novel-co...
Design an evacuation plan for San Francisco. You have 15 minutes.
Rename it Oakland.