Statistical Formulas for Programmers (2013)

evanmiller.org

239 points by Tomte 10 days ago


armanboyaci - 10 days ago

>Being able to apply statistics is like having a secret superpower.

I totally agree with this sentence. But if you ask for my opinion, merely knowing a list of statistical formulas is not very helpful. Most of the time people don't remember the underlying assumptions, so there is a fair chance they will apply the formulas in inappropriate situations.

I recommend watching these two YouTube videos. The presenters advocate using simulation/bootstrapping/shuffling methods instead of memorizing formulas.

Jake Vanderplas - Statistics for Hackers https://www.youtube.com/watch?v=Iq9DzN6mvYA

John Rauser - Statistics Without the Agonizing Pain https://www.youtube.com/watch?v=5Dnw46eC-0o
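As a taste of the shuffling approach the talks advocate, here is a minimal sketch of a two-sided permutation test for a difference in means. The function name, the 10,000-shuffle default, and the sample data are my own illustrative choices, not code from either talk. Note that no distributional formula is involved: the null distribution comes from reshuffling the group labels.

```python
import random
from statistics import mean


def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """Two-sided permutation (shuffling) test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so we
    shuffle the pooled data, re-split it, and count how often the shuffled
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The +1 counts the observed labelling itself, so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical page-load times (ms) for two variants.
control = [212, 198, 240, 225, 207, 233, 219]
variant = [190, 185, 201, 176, 199, 188, 194]
print(permutation_test(control, variant))
```

The returned value is an estimated p-value; more shuffles give a more precise estimate at the cost of runtime.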

mcphage - 10 days ago

The article "How Not To Sort By Average Rating" by the same author (and also linked in this article) is really good, and definitely changed my thinking about any kind of "sort by best to worst" list: https://www.evanmiller.org/how-not-to-sort-by-average-rating...
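For reference, that article's recommendation is to rank items by the lower bound of the Wilson score confidence interval for the proportion of positive ratings, rather than by the raw average. A minimal Python sketch follows; the function name, the z = 1.96 default (a 95% interval), and the sample items are illustrative assumptions, not code from the article.

```python
from math import sqrt


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    z = 1.96 corresponds to a 95% two-sided interval.
    """
    if total == 0:
        return 0.0  # no ratings yet: rank at the bottom
    phat = positive / total
    z2 = z * z
    centre = phat + z2 / (2 * total)
    margin = z * sqrt(phat * (1 - phat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Hypothetical items: (positive ratings, total ratings).
items = {"A": (1, 1), "B": (90, 100), "C": (5, 10)}
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
print(ranked)  # → ['B', 'C', 'A']
```

Item A has a perfect 100% average from a single rating, yet it ranks last: one vote says very little, so its lower bound stays low.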

Terr_ - 10 days ago

I think I avoid imposter syndrome in some areas, but Not Enough Real Math is definitely a weak spot.

When people start talking about eigenvalues, I'm just a business-rule caveman with a little discrete-math unga bunga.

This kind of statistical stuff falls somewhere in-between.

bob1029 - 10 days ago

I'd add z-score (standard score) to your tool belt. The ability to identify or reject outliers is invaluable when trying to stabilize real-world business processes.

For example, if you are building heuristics that determine if a customer's bank account is "reasonably active", you may not want to consider very small transactions unless that is typical activity for a given customer.
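A minimal sketch of the idea in Python, assuming the check runs against each customer's own transaction history so that "unusual" is relative to that customer. The function names, the 3.0 threshold (a common rule of thumb), and the sample history are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standard score of each value: (x - mean) / sample standard deviation."""
    if len(values) < 2:
        return [0.0 for _ in values]
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical: nothing stands out
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=3.0):
    """Return values whose |z| exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical history for one customer: mostly 100-unit payments, one tiny one.
history = [100] * 20 + [5]
print(flag_outliers(history))  # → [5]
```

One caveat: with n values and the sample standard deviation, no |z| can exceed (n - 1)/sqrt(n), so a threshold of 3 can never fire on fewer than 11 data points. The mean and standard deviation are also pulled toward the outliers they are meant to detect.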

TheHideout - 10 days ago

FYI, using this stuff without understanding statistical power (test power) is dangerous: it can lead to bad decisions made with false confidence.
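To make power concrete, here is a rough sketch using the normal approximation for a two-sided, two-sample test with equal group sizes and a known common standard deviation. The function names and the standardized effect size parameter (Cohen's d) are illustrative assumptions; an exact t-test power calculation gives slightly different numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized difference in means (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)  # mean of the test statistic under H1
    # Probability of landing in either rejection tail when H1 is true.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def n_for_power(effect_size, target=0.8, alpha=0.05):
    """Smallest per-group sample size reaching the target power."""
    if effect_size == 0:
        raise ValueError("power never exceeds alpha when the effect is zero")
    n = 2
    while two_sample_power(effect_size, n, alpha) < target:
        n += 1
    return n


print(round(two_sample_power(0.5, 20), 2))  # → 0.35
print(n_for_power(0.5))  # → 63
```

With 20 per group, a medium-sized real effect is missed about two times out of three, which is exactly how a "no significant difference" result gets over-interpreted.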

gpderetta - 9 days ago

Also: "Common statistical tests are linear models (or: how to teach stats)"[1]. Also also, bootstrapping is a superpower.

[1] https://lindeloev.github.io/tests-as-linear/
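On the bootstrapping point, a minimal percentile-bootstrap sketch in Python. The function name, defaults, and sample data are hypothetical; the percentile method is the simplest variant and can be inaccurate for small samples or skewed statistics.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic.

    Resamples the data with replacement, recomputes the statistic each time,
    and reads the interval off the sorted resampled estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical response times (ms) with a couple of slow outliers.
latencies = [120, 135, 128, 410, 142, 131, 125, 138, 390, 129, 133, 127]
print(bootstrap_ci(latencies))
```

The same function works unchanged with `stat=statistics.median`, which is the appeal: no per-statistic formula is needed.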

snitzr - 10 days ago

Why isn't 7 greater than 5?

cmdrmac - 10 days ago

This is certainly a very useful resource - even for a seasoned data scientist!
