Statistical Formulas for Programmers (2013)

evanmiller.org

240 points by Tomte 2 months ago


armanboyaci - 2 months ago

>Being able to apply statistics is like having a secret superpower.

I totally agree with this sentence. But if you ask for my opinion, merely knowing a list of statistical formulas is not very helpful. Most of the time, people don't remember the underlying assumptions, so there is a fair chance they will apply a formula in a situation where it doesn't hold.

I recommend watching these two YouTube videos. The presenters advocate using simulation/bootstrapping/shuffling methods instead of memorizing formulas.

Jake Vanderplas - Statistics for Hackers https://www.youtube.com/watch?v=Iq9DzN6mvYA

John Rauser - Statistics Without the Agonizing Pain https://www.youtube.com/watch?v=5Dnw46eC-0o
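
For readers who want a feel for the shuffling approach before watching the talks, here is a minimal sketch of a permutation test on the difference between two group means. It is not taken from either video; the function name `permutation_test` and its parameters are made up for illustration, and it assumes the two groups are exchangeable under the null hypothesis.

```python
import random

def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled data, then split it back into two groups of the original sizes.
        rng.shuffle(pooled)
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        # Count shuffles that look at least as extreme as the real data.
        if diff >= observed:
            hits += 1
    return hits / n_shuffles
```

The returned fraction is the share of random relabelings that produce a gap at least as large as the observed one. No formula has to be remembered; the only assumption to keep in mind is that the group labels would be interchangeable if there were no real effect.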

mcphage - 2 months ago

The article "How Not To Sort By Average Rating" by the same author (and also linked in this article) is really good, and definitely changed my thinking about any kind of "sort by best to worst" list: https://www.evanmiller.org/how-not-to-sort-by-average-rating...
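
The approach that article recommends is to sort by the lower bound of the Wilson score confidence interval rather than by the raw average. A rough Python sketch of that lower bound follows; the function name `wilson_lower_bound` and the default `z=1.96` (roughly 95% confidence) are choices made here for illustration, not code from the article.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion of positive ratings."""
    if total == 0:
        return 0.0  # no ratings yet, so no evidence either way
    phat = positive / total
    denom = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denom
```

With this score, an item with 1 positive rating out of 1 ranks below an item with 90 positive out of 100, because a single rating leaves much more uncertainty about the true proportion.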

Terr_ - 2 months ago

I think I avoid imposter syndrome in some areas, but Not Enough Real Math is definitely a weak spot.

When people start talking about eigenvalues, I'm just a business-rule caveman with a little discrete-math unga bunga.

This kind of statistical stuff falls somewhere in-between.

bob1029 - 2 months ago

I'd add z-score (standard score) to your tool belt. The ability to identify or reject outliers is invaluable when trying to stabilize real-world business processes.

For example, if you are building heuristics that determine if a customer's bank account is "reasonably active", you may not want to consider very small transactions unless that is typical activity for a given customer.
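
As a sketch of that idea, the snippet below scores a transaction against one customer's own history. The names `z_score` and `is_unusual` and the threshold of 3 standard deviations are assumptions for illustration only, and the history needs at least two values for the sample standard deviation to be defined.

```python
import statistics

def z_score(value, history):
    """Number of sample standard deviations between value and the mean of history."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # raises StatisticsError if history has fewer than 2 points
    if sd == 0:
        return 0.0  # all past values identical; treat as "no spread" rather than divide by zero
    return (value - mean) / sd

def is_unusual(value, history, threshold=3.0):
    """Flag values more than `threshold` standard deviations from this customer's mean."""
    return abs(z_score(value, history)) > threshold
```

Because the mean and standard deviation come from the individual customer's past transactions, a tiny payment is flagged for a customer who normally moves large amounts but not for one whose typical activity is small. One caveat: extreme values already in the history inflate both the mean and the standard deviation, so the score works best on a history that is reasonably clean.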

TheHideout - 2 months ago

FYI, using this stuff without understanding Test Power is dangerous and can lead to making bad decisions with false confidence.
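
One way to get a feel for power without a formula is to simulate it: generate many fake experiments with a known true effect and count how often the test detects it. The sketch below does this for a two-proportion z-test, which is just a convenient example chosen here. The function names, the sample sizes, and the conversion rates in the usage note are all hypothetical.

```python
import math
import random
from statistics import NormalDist

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups all-success or all-failure: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulated_power(p1, p2, n, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated experiments (n per group) in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Draw the number of successes in each group from the assumed true rates.
        x1 = sum(rng.random() < p1 for _ in range(n))
        x2 = sum(rng.random() < p2 for _ in range(n))
        if two_proportion_p_value(x1, n, x2, n) < alpha:
            rejections += 1
    return rejections / n_sims
```

Calling `simulated_power(0.10, 0.12, 100)` shows how rarely 100 users per group would detect a lift from 10% to 12%, which is the situation where a "no significant difference" result says very little.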

gpderetta - 2 months ago

Also: "Common statistical tests are linear models (or: how to teach stats)"[1]. Also also, bootstrapping is a superpower.

[1] https://lindeloev.github.io/tests-as-linear/
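
For anyone who hasn't seen bootstrapping, here is a minimal sketch of a percentile bootstrap confidence interval. It is not taken from the linked page; `bootstrap_ci` and its parameters are hypothetical names, and the percentile method shown is the simplest variant rather than the most accurate one.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]
```

The same function works for a median or any other statistic by passing a different `stat`, for example `bootstrap_ci(data, stat=statistics.median)`, which is much of the appeal: the procedure stays the same when the formula for the standard error would not.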

snitzr - 2 months ago

Why isn't 7 greater than 5?

cmdrmac - 2 months ago

This is certainly a very useful resource - even for a seasoned data scientist!
