Beautiful Probability

readthesequences.com

71 points by codeAligned 2 years ago · 66 comments

birdofhermes 2 years ago

As other commenters have pointed out, any given introductory chapter in a book on Bayesian statistics, including Jaynes', is better exposition than this. I found _Probability Theory: The Logic of Science_ very easy to follow and very well written.

I had a similar experience when I finally found a copy of Barbour’s _The End of Time_ and discovered, much to my chagrin, that it wasn’t nearly as mystical or complicated as EY makes it seem in the Timeless Physics “sequence”. Barbour’s account was much more readable and much easier to understand.

Yudkowsky just isn’t that great of a popular science writer. It’s not his specialty, so this shouldn’t be surprising.

  • topologie 2 years ago

    100% with you on Jaynes and Barbour.

    Jaynes' book is a game changer, but I particularly love that you mentioned Barbour and his work.

    On Barbour's work: apart from being an incredibly interesting book, I was amazed that he was a sort of "outsider", writing papers and books "on his own" (or at least outside of academia) while making money through technical translations. That's just a really clever way to be able to explore any interesting avenues one might find. Einstein had the right idea too...

    (Sadly, it's also something that wouldn't be as feasible nowadays, but who knows...)

  • xelxebar 2 years ago

    Jaynes is great, but The Logic of Science is a bit rough around the edges, with lots of errata. Jaynes died when the book was really just a very rough draft plus notes. Bretthorst had to go in and turn it into something publishable, not an enviable task by any means.

    Here's a list of errata and commentary, collected by a fan: https://ksvanhorn.com/bayes/jaynes/index.html.

    • topologie 2 years ago

      Thank you for this!

      I had spotted some errors here and there, but it's always good to have them in one place.

      I think we're all in the same boat: even with those rough edges, Jaynes' book is a kind of transformative experience for anyone who has already been "conditioned" by other probability texts.

      For example, for me Feller is a great intro to "start working with Probability," but Jaynes is where one starts actually "thinking in Probability."

      The whole Maximum Entropy thing was mind blowing for me.

  • lalaithion 2 years ago

    Here's a link: http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...

    And if you want to read what he has to say on the optional stopping problem, you can scroll down to page 196 (166 in the book's own page numbering) to the heading "6.9.1 Digression on optional stopping".

    I don't personally think Jaynes is much easier to read than Yudkowsky, but he's definitely more rigorous.

AbrahamParangi 2 years ago

I'm confused in that I don't see how this is troubling. Yes, the two experimenters rolled dice and got the same result, but it's as if one of them was rolling a 6-sided die and the other a 20-sided one. Each experiment is not a result per se but a sample from a distribution.

How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken. This set of paths is different in each case, which means the inference we make must also be different.

There is no inconsistency. The confusion seems to be in assuming that the experimental result was a true statement about the nature of the world rather than a true statement about simply what happened.

edit: This seems to me to be a specific case of a general class of difficult thinking where you ask yourself: "what are all the worlds that I might be in that are consistent with what I'm presently observing".

  • lalaithion 2 years ago

    If you see two people roll a d20 and get a 20, you get to say "wow, that was unlikely" to both of them, even if one of them privately admits they were going to quickly re-roll their die if they got below a 10. What matters is their actual behavior (identical in the example) not their intentions. The d6 vs d20 version is different because their behavior is different.

    • AbrahamParangi 2 years ago

      Let's imagine that we ran it as a simulation and we ran it a million times. The two people would have a different distribution of results. If you ignore the intention, you ignore reality as if that intention were not a part of it.

      Do you not notice that your inference is less accurate using this line of reasoning? Does that not suggest that it's simply wrong?

      • lalaithion 2 years ago

        What do you mean by 'results'?

        They would not have different distributions of results on their first die roll.

        They would have different distributions of results on their reported die roll.

        If I am looking at their first die roll, the fact that they would have different reported die rolls doesn't matter!
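
        A quick simulation sketch of this (all parameters here are arbitrary illustrations): both rollers' first rolls come up 20 about 1/20 of the time, and only the reported rolls diverge.

```python
import random

random.seed(0)
N = 100_000  # number of simulated pairs of rolls

first_honest, first_reroller = [], []
reported_honest, reported_reroller = [], []

for _ in range(N):
    a = random.randint(1, 20)
    first_honest.append(a)
    reported_honest.append(a)  # the honest roller reports the first roll

    b = random.randint(1, 20)
    first_reroller.append(b)
    # the other roller quietly re-rolls once if the first roll is below 10
    reported_reroller.append(b if b >= 10 else random.randint(1, 20))

frac = lambda rolls: rolls.count(20) / len(rolls)
# First rolls: both fractions are near 1/20.
# Reported rolls: the re-roller's fraction of 20s is inflated (about 0.0725).
```

The first-roll distributions are identical; the intent only shows up in the reported distribution.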

      • lalaithion 2 years ago

        Here’s another example:

        Say you have a lazy researcher. They flip a coin, and if it comes up heads, they do the experiment. If it comes up tails, they just write down a random number.

        If you _only get access_ to the final number, then you should discount what they wrote down – it’s 50% likely to be fake.

        If you do 1,000,000 simulations of this, it’s useless 50% of the time.

        But if you know the result of the coin flip, it doesn’t matter whether they would have generated a nonsense number in a different timeline, or that they’re not reliably accurate. _You know_ they’re reliably accurate in _this case_, so you can trust their data.
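
        A sketch of the lazy-researcher setup (the "true value" and the fake-number range are made up for illustration):

```python
import random

random.seed(1)
TRUE_VALUE = 42  # hypothetical quantity the honest experiment would measure

reports = []
for _ in range(10_000):
    heads = random.random() < 0.5          # coin flip
    value = TRUE_VALUE if heads else random.randint(0, 99)  # real vs fabricated
    reports.append((heads, value))

# Unconditionally, roughly half the reports are fabricated...
overall_accuracy = sum(v == TRUE_VALUE for _, v in reports) / len(reports)

# ...but conditioned on seeing the coin land heads, every report is accurate.
heads_reports = [v for h, v in reports if h]
heads_accuracy = sum(v == TRUE_VALUE for v in heads_reports) / len(heads_reports)
```

Knowing the coin flip screens off the counterfactual fabrication entirely.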

      • usgroup 2 years ago

        This is well put. Coincidentally, in the example the results are the same, but they need not be. Given repeated experiments with the same intentions, one may expect different distributions.

        However, one could just move the argument up a level and manufacture a case of different intentions leading to the same distributions and then ask the same question.

        • lalaithion 2 years ago

          Imagine you have a machine that rolls a d20 and lies if the die comes up 1-19, and tells the truth on a 20. Should you trust this machine usually? No. But if you can _see that the die comes up 20_ then you should trust it. The fact that it sometimes might lie doesn't mean that you should distrust the machine if you can see that in this case it's telling the truth.

        • kgwgk 2 years ago

          > Coincidentally in the example the results are the same , but they need not be.

          The question is whether we should draw different conclusions when the results are the same. I don’t think that anyone has any issues with drawing different conclusions when the results are different!

    • ninthcat 2 years ago

      Unlikely in what probability space? We only see one version of reality so the probabilities that we assign to any outcome are based on a prior choice of probability space. That is why the researchers' intent matters.

      • lalaithion 2 years ago

        Both events have the same probability of happening: 1/20. The fact that the researcher intended to do something in a reality that didn't happen isn't relevant.

        • ninthcat 2 years ago

          If you want to know whether a drug is more effective than placebo, the answer to that question depends on both the data collected in a study and the initial study design. There’s a reason why it’s meaningless to say “that was unlikely” after somebody says they were born on January 1, or after getting a two-factor code that is the same number six times. There’s nothing special about those particular events except for the fact that we noticed them. Since we live in a single instance of the universe where they have already happened, they have probability 1. At the same time, on any given instance they have probability 1/365ish or 1/10000. The difference between these two interpretations of the probability is the same difference as having a good experimental design vs a flawed experimental design where you repeat the experiment until you get the results you want to see.

          • pdonis 2 years ago

            > a flawed experimental design where you repeat the experiment until you get the results you want to see.

            But the Bayesian point is that, if you use Bayesian statistics, this doesn't work. Except by outright lying about their experimental protocol or the data that was actually collected (for example, only reporting the successful trial at the end and not all the failed ones that preceded it), an experimenter cannot "fool" you into accepting a hypothesis not justified by the data. They can point to the one successful trial all they want, and make up stories about how the previous failed trials were somehow different, and the Bayesian simply does not care. The Bayesian just looks at the entire corpus of data, finds that it doesn't support the hypothesis, and that's it.

      • AbrahamParangi 2 years ago

        Yes, indeed.

  • derbOac 2 years ago

    I had the same reaction.

    We don't actually care at all about what happened in the two experiments per se, we care about the information provided by the experiments about future or other events.

    If somehow we learned that both experiments were totally unreplicable, a product purely of that time and location with no implications for anything else before or since, we wouldn't care about them except maybe as a historical curiosity.

    Intentionality is a red herring; what matters is our expectation about what might be observed if we were to repeat the experiments again.

    In that sense, there's variability in the second experiment's results due to sample size being random. So we interpret and infer based on that potential experiment we could do, not what happened to be observed at a particular moment.

    I'm also confused about what this has to do with Bayesian versus non-Bayesian inference as you could approach either experiment from either paradigm, and there are different forms of Bayesianism, including nonsubjective Bayesianism.

    • kgwgk 2 years ago

      > We don't actually care at all about what happened in the two experiments per se, we care about the information provided by the experiments about future or other events.

      How can the experiments provide relevant information other than through what happened?

      If what happened is exactly the same (first patient with such and such characteristics had this outcome, etc.) what information can be provided by the things that didn’t happen in either?

      How could it matter that the things that didn’t happen in one experiment are different from the things that didn’t happen in the other when we are interested in the information provided by what did happen?

      We don't actually care at all about the distribution of things that could have happened per se, we care about the information provided by the experiments about future or other events.

  • pdonis 2 years ago

    > Each experiment is not a result per se but a sample from a distribution.

    But what distribution? What is this "distribution" that we are taking a sample from?

    The frequentist says: because the two experimenters have different intentions, the experiments they ran are samples from different distributions.

    But the Bayesian says: the experimenter's intentions can't affect things like how dice rolls come out or how well a given treatment works on a given patient. The actual "distribution" is the set of all factors that do affect how the dice rolls come out or how well the treatment works on each patient. And those factors are the same for both experimenters; their different intentions don't affect that. So both sets of data are samples from the same distribution, not different ones.

    > How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken.

    If you're going to state it this way, then the Bayesian response is: "all courses your experiment could have taken" has nothing to do with the experimenter's intentions. The experimenters can't magically make the physical world and the biology of humans work differently depending on what stopping criterion they choose. And the physical world and the biology of humans is what determines "the courses your experiment could have taken".

    In other words, when the frequentist makes up "distributions" based on the experimenter's stopping criterion, they are, whether they admit it (or even realize it) or not, making a claim about how the physical world and the biology of humans works that is obviously false.

    • AbrahamParangi 2 years ago

      This seems to assume that intentions "don't count" in some way, as if they were nonphysical, whereas unless you presume a supernatural soul, they are as physical as any other part of the experiment.

      • pdonis 2 years ago

        > This seems to assume that intentions "don't count" in some way, as if they were nonphysical

        Not nonphysical: just not part of the physical degrees of freedom that can affect things like how die rolls come up or how well a given treatment works on a patient.

        The experimenter's intentions (not about the stopping criterion, but about other things) can of course be upstream physical causes, so to speak, of things like what the actual process of the treatment is, and that can, of course, affect how well the treatment works. But in the scenario under discussion, all those things are stipulated to be the same in both experiments. And once that is specified, whatever physical variation corresponds to the variation in the experimenters' intentions cannot affect the results.

        • psychoslave 2 years ago

          > Not nonphysical: just not part of the physical degrees of freedom that can affect things like how die rolls come up or how well a given treatment works on a patient.

          For a die that is not a concern (unless animism is taken into consideration), but when humans are on both sides of the equation, how do you get rid of all the social and psychological effects that implies, including the placebo effect and the desire to see the study bend in some direction, be it at some unconscious level?

          • pdonis 2 years ago

            > when humans are on both sides of the equation, how do you get rid of all the social and psychological effects that implies, including the placebo effect and the desire to see the study bend in some direction, be it at some unconscious level?

            You don't. But again, in the scenario under discussion, these are stipulated to be the same in both cases. (Or more precisely, the underlying unconscious factors involved have the same distribution in both cases.) So again, these kinds of "intentions" don't make the distributions different in the two cases.

            Another comment is relevant here. The whole point of things like double blind studies in medicine is to make it the case that, whatever unconscious factors are involved along the lines you describe, they don't change the underlying distribution from which the sampled results are drawn. In the scenario as described in the article, it was assumed that all of these precautions were taken. That is part of the reason for the article's statement that the experimenters' intentions about the stopping criterion can't affect the results.

            Of course if you know that in a particular case, those precautions were not taken, that changes how you view the results. But Bayesian analysis can cover this case too: you just expand your hypothesis space and your prior to include things like "the experimenters unconsciously influenced the results in different ways because of their different stopping criterion". The article excludes this possibility in the scenario it describes, but in the real world, yes, we know it is possible to have study designs that don't eliminate this failure mode, and our analysis should allow for that in cases where the study design was such that it might have happened.

            • psychoslave 2 years ago

              > But again, in the scenario under discussion, these are stipulated to be the same in both cases.

              That's not what I read. The two studies imply that they are conducted under two very different mindsets, which will most likely also mean people receive different human treatment. At that point, the statistics you get out of it are almost ornamental. Sometimes the most significant information to extract from a description is not the most obvious one, the one pointed at as the thing you can quantify and draw comparisons from.

              • pdonis 2 years ago

                > The two studies imply that they are conducted under two very different mindsets, which will most likely also mean people will receive a different human treatment.

                Even if the studies are conducted using the standard double blind protocol for medical and social science research, where the experimenters who have the different stopping criteria are not involved in any of the actual experimental activities? They don't do the random assignment of patients to treatment and control groups; they don't administer any of the actual treatments or placebos; they don't interact with the patients or the treatment personnel in any way during the experiment; and they don't inform anyone who is actually involved in the experimental activities of their intentions, regarding stopping criterion or anything else.

                You're saying that even if all this is done--and, as I've said, this is standard procedure in medical and social science research--it is still impossible to prevent the experimenters' intentions--which aren't known to anyone else involved--from affecting the results? Remember that, as I've already said, the whole point of these double blind protocols is to prevent the experimenters' intentions from affecting the results. You're saying that's a fool's errand?

  • kgwgk 2 years ago

    The question is whether we should draw different conclusions from one set of observations depending not just on what we are observing but also on different ways to define "what are all the worlds that I might be in that are consistent with what I'm presently observing".

usgroup 2 years ago

So you know when you believe something and then you update your belief because you get some evidence?

Yeah, and then you stack some beliefs on top of that.

And then you discover the evidence wasn’t actually true. Remind me again what the normative Bayesian update looks like in that instance.

Unfortunately it’s turtles all the way down.

  • lalaithion 2 years ago

        P(B|I saw E, P) = P(I saw E|B,P) * P(B|P) / P(I saw E|P)
    
        P(B|E was false, I saw E, P) = P(E was false|B,I saw E,P) * P(B|P,I saw E) / P(E was false|P, I saw E)
    
    This is a pretty basic application of Bayes' theorem.
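    A numeric instantiation of the two updates above (all probabilities here are made up for illustration):

```python
# Prior and likelihoods (illustrative numbers only)
p_b = 0.5                            # P(B | P)
p_saw_e = {True: 0.9, False: 0.3}    # P(I saw E | B, P), keyed by truth of B
p_false = {True: 0.05, False: 0.5}   # P(E was false | B, I saw E, P)

# First update: I saw E
num = p_saw_e[True] * p_b
den = num + p_saw_e[False] * (1 - p_b)
p_b_after_seeing = num / den         # belief in B rises to 0.75

# Second update: E turned out to be false
num2 = p_false[True] * p_b_after_seeing
den2 = num2 + p_false[False] * (1 - p_b_after_seeing)
p_b_final = num2 / den2              # belief in B falls back below the prior
```

The retraction is just another conditioning step; no special machinery is needed.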

    • usgroup 2 years ago

      Love it: p(I saw E) and p(I didn’t really see E).

      Just move the argument one level down: “I saw E is false”, and it turns out so is “E is false”. So then? Add “E was false was false”?

      Turtles all the way down.

      At some point something has to be “true” in order to conditionalise on it.

      • drdeca 2 years ago

        I believe you can condition on the probability of a proposition.

        For example, if you are in a fairly dark room and you observe, with 90% confidence, a red object, then (IIRC) you can do P(X | 90% confidence in red object) = 90% * P(X | see red object) + 10% * P(X | do not see red object).

        I would think that, in principle, this allows all observations to be fallible without any kind of “infinite regress” problem? You just apply the same kind of process each time.
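
        This is essentially Jeffrey conditioning; a toy sketch with made-up conditional probabilities:

```python
# Jeffrey conditioning: update on an uncertain observation by mixing the two
# posteriors, weighted by confidence in the observation.
p_x_given_red = 0.8       # assumed P(X | see red object)      (made-up number)
p_x_given_not_red = 0.2   # assumed P(X | do not see red object)
confidence = 0.9          # confidence that a red object was really seen

p_x = confidence * p_x_given_red + (1 - confidence) * p_x_given_not_red
# p_x = 0.9 * 0.8 + 0.1 * 0.2 = 0.74
```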

      • psychoslave 2 years ago

        Yes, sure. Here are a few truths that never disappointed me:

        There is an absolute universal truth.

        Absolute universal truth, as a whole, is unreachable even to the most intelligent and resourceful human that will ever exist.

  • jawarner 2 years ago

    Real world systems are complicated. In theory, you could do belief propagation to update your beliefs through the whole network, if your brain worked something like a Bayesian network.

    • biomcgary 2 years ago

      Natural selection didn't wire our brains to work like a Bayesian network. If it had, wouldn't it be easier to make converts to the Church of Reverend Bayes? /s

      Alternatively, brains ARE Bayesian networks with hard-coded priors that cannot be changed without CRISPR.

  • nerdponx 2 years ago

    > you discover the evidence wasn’t actually true

    Not really going to vouch for the normative Bayesian approach, but you might just consider this new (strong) evidence for applying an update.

    • crdrost 2 years ago

      The precise claim (I believe) is that the prior update which you had, made some assumptions about the correct way to phrase your perceptions.

      That is, you say, for the update, "the probability that this trial came out with X successes given everything else that I take for granted, and also that the hypothesis is true" vs. "the probability that this trial came out with X successes given everything else that I take for granted, and also that the hypothesis is false." So you actually say in both cases the fragment, "this trial came out with X successes."

      What happens if it didn't really? Well, the proper Bayesian approach is to state that you phrased this fragment wrong. You actually needed to qualify "the probability that I saw this trial come out with X successes given ...", and those probabilities might have been different than the trial actually coming out with X successes.

      OK but what happens if that didn't really, either. Well, the proper Bayesian approach is to state that you phrased the fragment doubly wrong. You actually needed to qualify it as "the probability that I thought I saw this trial come out with X successes given...". So now you are properly guarded, like a good Bayesian, against the possibility that maybe you sneezed while you were reading the experiment results and even though you saw 51, it got scrambled in your head and you thought you saw 15.

      OK but what happens if that didn't really, either either. You thought that you thought that you saw something, but actually you didn't think you saw anything, because you were in The Matrix or had dementia or any number of other things that mess with our perceptions of ourselves. So you, good Bayesian that you wish to be, needed to qualify this thing extra!

      The idea is that Bayesianism is one of those "if all you have is a hammer you see everything as a nail" type of things. It's not that you can't see a screw as a really inefficient nail, that is totally one valid perspective on screwness. It's also not that the hammer doesn't have any valid uses. It does, it's very useful, but when you start trying to chase all of human rationality with it, you start to run into some really weird issues.

      For instance, the proper Bayesian view of intuitions is that they are a form of evidence (because what else would they be). They are extremely reliable when they point to lawlike metaphysical statements (otherwise we have trouble with "1 + 1 = 2" and "reality is not self-contradictory" and other metaphysical laws that we take for granted), but correspondingly unreliable when we intuit things other than metaphysical laws, such as the existence of a monster in the closet, a murderer hiding under the bed, or that the only explanation for our missing (actually misplaced) laptop is that someone must have stolen it in the middle of the night. You need to do this to build up the "ground truth" that lets you get to the vanilla epistemology stuff you then take for granted, like "okay, we can run experiments to try to figure out stuff about the world, and those experiments say that the monster in the closet isn't actually there."

  • cyanydeez 2 years ago

    This just sounds like logical Tetris.

4bpp 2 years ago

I think there is a simple solution to the thought experiment in the beginning, ignoring the paragraphs upon paragraphs of EY liking the sound of his own voice: the information content of each experiment consists of more than just the stated number of patients tested and the success rate. In particular, each experiment report I notice is strong evidence that someone actually used humanity's limited resources to perform that experiment, and slightly less strong evidence that they actually followed the stated procedure. Therefore, the completion of the "stop when I have a high enough success rate" experiment should cause me to update in favour of there being people with the means and motivation to actually run such an experiment, and hence make it more likely that at this very moment there are other research groups out there that are like 1000 patients in and have not yet gotten their 60% success rate.

lalaithion 2 years ago

From _Probability Theory: The Logic of Science_:

> Then the possibility seems open that, for different priors, different functions r(x1,..., xn) of the data may take on the role of sufficient statistics. This means that use of a particular prior may make certain particular aspects of the data irrelevant. Then a different prior may make different aspects of the data irrelevant. One who is not prepared for this may think that a contradiction or paradox has been found.

I think this explains one of the confusions many commenters have: for an experimenter who repeats observations until they reach their desired ratio r/(n-r), the ratio r/(n-r) is not a sufficient statistic! But for an experimenter with a pre-registered n, the ratio r/(n-r) is a sufficient statistic. However, in either case,

> We did not include n in the conditioning statements in p(D|θ I) because, in the problem as defined, it is from the data D that we learn both n and r. But nothing prevents us from considering a different problem in which we decide in advance how many trials we shall make; then it is proper to add n to the prior information and write the sampling probability as p(D|nθ I). Or, we might decide in advance to continue the Bernoulli trials until we have achieved a certain number r of successes, or a certain log-odds u = log[r/(n − r)]; then it would be proper to write the sampling probability as p(D|rθ I) or p(D|uθ I), and so on. Does this matter for our conclusions about θ?

> In deductive logic (Boolean algebra) it is a triviality that AA = A; if you say: ‘A is true’ twice, this is logically no different from saying it once. This property is retained in probability theory as logic, since it was one of our basic desiderata that, in the context of a given problem, propositions with the same truth value are always assigned the same probability. In practice this means that there is no need to ensure that the different pieces of information given to the robot are independent; our formalism has automatically the property that redundant information is not counted twice.
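
Jaynes' point about stopping rules can be checked directly: for the same observed n and r, the fixed-n (binomial) and stop-at-r-successes (negative binomial) sampling distributions differ only by a constant factor in θ, so any prior yields the same posterior. A sketch, with n and r chosen arbitrarily:

```python
from math import comb

def binom_lik(theta, n, r):
    # fixed-n design: P(r successes in n trials | theta)
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def negbinom_lik(theta, n, r):
    # stop-at-r-successes design: P(the r-th success lands on trial n | theta)
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

n, r = 100, 70
ratios = [binom_lik(t, n, r) / negbinom_lik(t, n, r) for t in (0.3, 0.5, 0.7, 0.9)]
# The ratio is the constant n/r, independent of theta, so the two designs
# give likelihoods proportional in theta and hence identical posteriors.
```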

  • roenxi 2 years ago

    That seems a bit long-winded, since this situation is a direct result of Bayes' theorem. It seems to me equivalent to say:

    Bayes' theorem holds because it can be proven. Therefore, situations can be constructed where considering identical data without considering priors gives nonsense conclusions. For example, if we happen to know as a prior that P(outcome of experiment is a certain ratio) = P(experiment is completed), then that must be considered when interpreting the results.

psychoslave 2 years ago

> Think laws, not tools

But laws are tools, and aesthetic intellectual elegance is an epiphenomenal bonus, or a means to keep the human psyche motivated and focused away from all the other attention sinks that life throws at it.

And that applies to "law" in both its judicial and scientific senses.

d0mine 2 years ago

The Bayesian approach sounds like a religion (one true way).

There is nothing unusual about different mathematical methods/models producing different results. For example, the number of roots of the same quadratic equation may depend on "private" thoughts, such as whether complex roots are of interest (sometimes they are, sometimes they aren't). All models are wrong; some are useful.

  • lalaithion 2 years ago

    > the number of roots even for the same quadratic equation may depend on "private" thoughts such as whether complex roots are of interest

    You are confusing ambiguity in a problem statement, due to the imprecision of human language, with two well-specified, identical experimental results yielding different conclusions due to the intentions of the human carrying out the experiments.

    Is arithmetic a religion because there's "one true way" of adding integers?

    • d0mine 2 years ago

      It is not about human language being imprecise. I can formulate the question precisely in mathematical language with the exact same result: a different number of roots is possible for different formulations of the problem over the same "physical" setup (the coefficients of the quadratic equation).

      The Map is not the Territory.

      Different maps can be useful. No true map.

    • kevindamm 2 years ago

      I can think of at least two ways to add integers: the categorical way, which applies a mapping from the set into itself, and the set-theoretic way, which deals with unwrapping and rewrapping successor relations. The latter is sometimes resorted to in heavily relational contexts like Datalog.

      • lalaithion 2 years ago

        Yes, this is addressed in the original article... there are multiple "lawful" ways of adding integers which all give the same results, and likewise in probability all "lawful" ways of analyzing data should give the same results. If you have two different ways of adding numbers which give different results, one is not lawful.

  • pdonis 2 years ago

    > Bayesian approach sounds like a religion (one true way).

    Only about the things that can be mathematically proven. Which is just like any other branch of math.

    It is true that some Bayesians (and EY can be argued to be among them) like to talk as though Bayesian computation is a drop-in replacement for your brain. Of course it isn't, and Bayesianism, like any mathematical approach, should be taken with a good-sized dose of humility. As Bertrand Russell said, to the extent that mathematical propositions refer to reality, they are not certain, and to the extent that they are certain, they do not refer to reality.

  • pdonis 2 years ago

    > the number of roots even for the same quadratic equation may depend on "private" thoughts such as whether complex roots are of interest

    No. The number of roots that you care about might depend on your private thoughts; but the number of roots itself does not. It's a mathematical fact. It just might not be a mathematical fact that you actually care about. But what you care about is not part of math.

    • lupire 2 years ago

      Why are you ignoring the quaternion roots? 3x3 matrix roots?

      • pdonis 2 years ago

        Normally "quadratic equation" means "over the complex numbers" (and the mention of "complex roots" in the post I responded to bears out that interpretation).

        But yes, different mathematical models can give different answers for things like "number of roots of an equation". But that doesn't mean math depends on "private thoughts". It just means you need to specify which mathematical model you are talking about.

  • biomcgary 2 years ago

    One of my priors: "a group of people who look like a faith-based community, but claim not to be one, should not be trusted".

  • usgroup 2 years ago

    Yeah I’d agree at some depth. We don’t talk enough about integers, rationals and real numbers and what they imply for our “normative rationality” or “epistemological commitment”. But aside from the integers, everything else is totally suspicious.

randomsolutions 2 years ago

I use Bayesian methods often, but this is just religious. Bayesian methods are just that: tools, methods for approaching a problem.

There are no laws for applying probability to the real world. To think so puts too much faith in your models. Remember, all models are wrong. Applying probability to the real world requires a host of assumptions, regardless of the methods you use.

Frequentist and Bayesian methods have different goals; both have their place.

For a counterweight to the strong likelihood principle find discussions of Larry Wasserman: https://youtu.be/Z-YvWyM6dRQ?si=qwzRiaPbj9ruiUEv

And for a balanced discussion for why both are great see Michael Jordan: https://youtu.be/HUAE26lNDuE?si=cwg6wpRS1gXL6r1Y

jawarner 2 years ago

Isn't that Edwin T. Jaynes' example just p-hacking? If only 1 out of 100 experiments produces a statistically significant result, and you only report the one, I would intuitively consider that evidence to be worth less. Can someone more versed in Bayesian statistics better explain the example?

  • skulk 2 years ago

    I find the original discussion to be far more interesting than whatever I just read in TFA: https://books.google.com.mx/books?id=sLz0CAAAQBAJ&pg=PA13&lp...

    • abeppu 2 years ago

      > One who thinks that the important question is: "Which quantities are random?" is then in this situation. For the first researcher, n was a fixed constant, r was a random variable with a certain sampling distribution. For the second researcher, r/n was a fixed constant (approximately), and n was the random variable, with a very different sampling distribution. Orthodox practice will then analyze the two experiments in different ways, and will in general draw different conclusions about the efficacy of the treatment from them.

      But so then the data _are_ different between the two experiments, because they were observing different random variables -- so why is it concerning if they arrive at different conclusions? In fact, the _fact that the 2nd experiment finished_ is also an observation on its own (e.g. if the treatment was in fact a dangerous poison, perhaps it would have been infeasible for the 2nd researcher to reach their stopping criteria).

    • usgroup 2 years ago

      Yeah generally Jaynes book is very nice and easy to read for this sort of material.

  • Terr_ 2 years ago

    I think the point is that the different planned stopping rules of each researcher--their subjective thoughts--should not affect what we consider the objective or mathematical significance of their otherwise-identical process and results. (Not unless humans have psychic powers.)

    It's illogical to deride one of those two result-sets as telling us less about the objective universe just because the researcher had a different private intent (e.g. "p-hacking") for stopping at n=100.

    _________________

    > According to old-fashioned statistical procedure [...] It’s quite possible that the first experiment will be “statistically significant,” the second not. [...]

    > But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher’s private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information—or simply do the wrong calculation—for the two methods to arrive at different answers.

  • lalaithion 2 years ago

    If you have two researchers, and one is "trying" to p-hack by repeating an experiment with different parameters, and one is trying to avoid p-hacking by preregistering their parameters, you might expect the paper published by the latter one to be more reliable.

    However, if you know that the first researcher just happened to get a positive result on their first try (and therefore didn't actually have to modify parameters), Bayesian math says that their intentions didn't matter, only their result. If, however, they did 100 experiments and chose the best one, then their intentions... still don't matter! but their behavior does matter, and so we can discount their paper.

    Now, if you _only_ know their intentions but not their final behavior (because they didn't say how many experiments they did before publishing), then their intentions matter because we can predict their behavior based on their intentions. But once you know their behavior (how many experiments they attempted), you no longer care about their intentions; the data speaks for itself.
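    The "data speaks for itself" claim can be checked numerically. Below is a minimal sketch with hypothetical numbers (70 cures out of 100 patients), comparing a fixed-n design against a design that stops at the 70th cure. The binomial and negative binomial likelihoods differ only by a constant factor, so under the same prior the normalized posteriors coincide:

```python
from math import comb

def posterior(const, r, n, grid):
    # unnormalized posterior under a flat prior: const * theta^r * (1-theta)^(n-r)
    unnorm = [const * t**r * (1 - t)**(n - r) for t in grid]
    z = sum(unnorm)
    return [u / z for u in unnorm]

r, n = 70, 100
grid = [i / 1000 for i in range(1, 1000)]

# Researcher A: fixed n = 100 patients (binomial likelihood)
post_a = posterior(comb(n, r), r, n, grid)
# Researcher B: kept treating patients until the 70th cure (negative binomial)
post_b = posterior(comb(n - 1, r - 1), r, n, grid)

# The leading constants cancel in normalization: the posteriors agree
assert all(abs(a - b) < 1e-12 for a, b in zip(post_a, post_b))
```

    Only the combinatorial constant out front differs between the two designs, and it cancels when you normalize; that is the likelihood principle in action.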

  • usgroup 2 years ago

Well, no, because it's talking about either a fixed sample size or stopping when a % total is reached. Neither implies a favourable p-value necessarily.

    I think the author means to say that it’s two methods incidentally equivalent in the data they collect that may draw different conclusions based on their initial assumptions. Question is how do you make coherent sense of it.

    At level 1 depth it’s insightful.

    At level 2 depth it’s a straw man.

    At level 3 depth, just keep drinking until you’re back at level 1 depth.

    • tech_ken 2 years ago

      > The other ... decided he would not stop until he had data indicating a rate of cures definitely greater than 60%

      I believe that "definitely greater than 60%" is supposed to imply that the researcher is stopping when the p-value of their HA (theta>=60%) is below alpha, so an optional stopping (ie. "p-hacking") situation.
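      A quick simulation shows why frequentists worry about that stopping rule: under a true null, testing for significance after every observation inflates the false-positive rate well above the nominal 5%. This is a sketch with made-up parameters (a fair coin, a two-sided z-test, samples up to 500):

```python
import math
import random

def optional_stop_trial(p=0.5, max_n=500, z_crit=1.96, min_n=10):
    """Flip a fair coin, testing after every flip; stop and declare
    'significance' as soon as the z-statistic clears the threshold."""
    heads = 0
    for n in range(1, max_n + 1):
        heads += random.random() < p
        if n >= min_n:
            z = (heads - n * p) / math.sqrt(n * p * (1 - p))
            if abs(z) > z_crit:
                return True  # "significant", despite a true null
    return False

random.seed(0)
trials = 2000
hits = sum(optional_stop_trial() for _ in range(trials))
rate = hits / trials
# A fixed-n test rejects ~5% of the time under the null; checking
# after every flip rejects far more often.
print(f"false-positive rate under optional stopping: {rate:.2%}")
```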

mturmon 2 years ago

This essay is so weird to read. The author is extremely passionate, yet also claiming to be simply rational. He’s throwing terminology around (Dutch book, ZF) but seems unaware of the limits of the approach he advocates.

There are so many cracks in the Bayesian edifice promoted in TFA!

These problems are well-known in the Theories of Probability community [1] (which is only a subset of the larger set of theorists recognizing the limits of mechanical Bayesian reasoning in decision problems).

Here are a couple.

(1)

Bayesian approaches force you to assign a sharp probability to every event. How do we map any event to a sharp probability? E.g., I need to give a number for the probability of rain tomorrow, a non-repeating event. How do I get that number? Not through relative frequencies, since the event doesn't repeat. If two people give different numbers, how do we decide who is right?

This problem is what Peter Walley has called the “Bayesian dogma of precision.” [2]
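To make the "dogma of precision" concrete, here is a toy prior-sensitivity sketch with made-up data (3 rainy days out of 10). A precise Bayesian must commit to one prior; a set of equally reasonable Beta priors instead yields an interval of posterior answers, which is the basic move of imprecise probability:

```python
# Prior sensitivity as a stand-in for "imprecise" probability:
# same data, a set of Beta priors, a range of posterior answers.
def beta_posterior_mean(a, b, successes, failures):
    # posterior mean of a Beta(a, b) prior updated on binomial data
    return (a + successes) / (a + b + successes + failures)

rainy, dry = 3, 7  # hypothetical: 3 rainy days out of 10
priors = [(0.5, 0.5), (1, 1), (2, 2), (5, 1)]  # a credal set of Beta priors
means = [beta_posterior_mean(a, b, rainy, dry) for a, b in priors]
lo, hi = min(means), max(means)
print(f"posterior P(rain) ranges over [{lo:.3f}, {hi:.3f}]")
```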

(2)

As noted above in an aside, we have a hard time computing probabilities. This is a practical problem that we all are aware of, but often discount.

In what we could call CMP (Conventional Mathematical Probability - Kolmogorov’s axioms) we typically can’t even correctly enumerate the sample space. We’re always forgetting something, so our models are too confident. (In the “Dutch book” analogy alluded to in TFA, we are following the axioms but are somehow always losing money, in a very real sense.)
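For readers who haven't met the classic Dutch book argument the parenthetical refers to: an agent whose degrees of belief violate the probability axioms can be sold bets, at the agent's own prices, that lose money no matter what happens. A toy version with hypothetical numbers:

```python
# Toy Dutch book: the agent's degrees of belief violate
# P(A) + P(not A) = 1, pricing both of these $1 bets at 0.60.
price_rain = 0.60  # agent pays this for a bet worth $1 if it rains
price_dry = 0.60   # agent pays this for a bet worth $1 if it doesn't

for it_rains in (True, False):
    payout = 1.0  # exactly one of the two bets pays off, either way
    agent_net = payout - (price_rain + price_dry)
    print(f"rain={it_rains}: agent nets {agent_net:+.2f}")
```

The complaint above is the converse and harder problem: even an agent who obeys the axioms can bleed money if the sample space itself is mis-specified.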

Related to this problem of computing probabilities, we don't have a rigorous way to determine when two real-world events are independent. Yet we constantly invoke independence to construct models. Kolmogorov's 1933 manuscript was clear on this problem. [3]

Not satisfied with this, we go on to hypothesize conditional independence relationships in order to feed our complex “rational” Bayesian machine. It’s thirsty for numbers, and we just make them up!

*

This all sounds somewhat hypothetical. It’s not. In my day job, I compute supposed Bayesian credible intervals for various physical variables.

The people downstream who use those variables to assimilate into physical models typically multiply our credible intervals by 2. My friend across the lab has it even worse: they multiply his Bayesian intervals by 3.

This is not a well-functioning machine.

[1] E.g., https://isipta23.sipta.org/, or https://plato.stanford.edu/entries/imprecise-probabilities/#...

[2] https://issuu.com/impreciseprobabilities/docs/imprecise_prob..., first paragraph, although the whole short article is on-point

[3] from memory, the quote is something like, "determining the conditions under which events may be judged independent is one of the major outstanding problems in the theory of probability"

bdjsiqoocwk 2 years ago

Meaningless drivel.
