Most scientifically inclined people are reasonably aware that one of the major divides in research is that correlation≠causation: having discovered some relationship between various data X and Y (not necessarily Pearson’s r, but any sort of mathematical or statistical relationship, whether a humble r or an opaque deep neural network’s predictions), we do not know how Y would change if we manipulated X. Y might increase, decrease, do something complicated, or remain implacably the same. This point can be made by listing examples of correlations where we intuitively know changing X should have no effect on Y, because the relationship is spurious: the number of churches in a town may correlate with the number of bars, but we know that’s because both are related to the town’s population; the number of pirates may inversely correlate with global temperatures (but we know pirates don’t control global warming, and it’s more likely that something like economic development leads to the suppression of piracy but also to CO2 emissions); sales of ice cream may correlate with snake bites or violent crime or deaths from heat-stroke (but of course snakes don’t care about sabotaging ice cream sales); thin people may have better posture than fat people, but sitting upright does not seem like a plausible weight-loss plan1; wearing XXXL clothing clearly doesn’t cause heart attacks, although one might wonder whether diet soda causes obesity; the more firemen are around, the worse fires are; judging by the grades of tutored vs non-tutored students, tutors would seem to be harmful rather than helpful; black skin does not cause sickle-cell anemia, nor, to borrow an example from Pearson2, would black skin cause smallpox or malaria; more recently, part of the psychology behind linking vaccines with autism is that many vaccines are administered to children at the same age at which autism would start becoming apparent (or should we blame organic food sales?); height & vocabulary, or foot size & math skills, may correlate strongly (in children); national chocolate consumption correlates with Nobel prizes3, as do borrowing from commercial banks & buying luxury cars & serial killers/mass-murderers/traffic fatalities4; moderate alcohol consumption predicts increased lifespan and earnings; the role of storks in delivering babies may have been underestimated; children and people with high self-esteem have higher grades & lower crime rates etc, so “we all know in our gut that it’s true” that raising people’s self-esteem “empowers us to live responsibly and that inoculates us against the lures of crime, violence, substance abuse, teen pregnancy, child abuse, chronic welfare dependency and educational failure” -- unless, perhaps, high self-esteem is caused by high grades & success, boosting self-esteem has no experimental benefits, and it may even backfire?
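None of these examples require any arrow from X to Y at all; a lurking common cause is enough. To make that concrete, here is a minimal simulation sketch (the churches/bars setup and every number in it are invented for illustration): a “population” variable drives both counts, producing a strong observational correlation and a large naive regression slope, even though the interventional effect of one count on the other is zero by construction.

```python
# A minimal simulation sketch (all numbers invented): a common cause Z
# ("town population") makes X ("churches") and Y ("bars") correlate strongly
# even though neither causes the other, so changing X would not change Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

population = rng.lognormal(mean=9, sigma=0.7, size=n)      # confounder Z
churches = population / 1_000 + rng.normal(0, 1, size=n)   # X := f(Z) + noise
bars     = population / 800   + rng.normal(0, 1, size=n)   # Y := g(Z) + noise; no arrow X -> Y

# Observational association: strong positive correlation.
print("r(churches, bars) =", round(float(np.corrcoef(churches, bars)[0, 1]), 3))

# A naive regression of Y on X recovers a large "effect" of churches on bars...
slope = float(np.polyfit(churches, bars, deg=1)[0])
print("naive regression slope =", round(slope, 3))

# ...but the interventional effect is zero by construction: bars were generated
# from population alone, so building more churches (changing X with Z held
# fixed) would leave Y untouched.
```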
“The development of the theory of testing has been much influenced by the special problem of simple dichotomy, that is, testing problems in which H₀ and H₁ have exactly one element each. Simple dichotomy is susceptible of neat and full analysis (as in Exercise 7.5.2 and in § 14.4), likelihood-ratio tests here being the only admissible tests; and simple dichotomy often gives insight into more complicated problems, though the point is not explicitly illustrated in this book. Coin and ball examples of simple dichotomy are easy to construct, but instances seem rare in real life. The astronomical observations made to distinguish between the Newtonian and Einsteinian hypotheses are a good, but not perfect, example, and I suppose that research in Mendelian genetics sometimes leads to others. There is, however, a tradition of applying the concept of simple dichotomy to some situations to which it is, to say the best, only crudely adapted. Consider, for example, the decision problem of a person who must buy, f₀, or refuse to buy, f₁, a lot of manufactured articles on the basis of an observation x. Suppose that i is the difference between the value of the lot to the person and the price at which the lot is offered for sale, and that P(x | i) is known to the person. Clearly, H₀, H₁, and N are sets characterized respectively by i > 0, i < 0, i = 0. Analysis of this, and similar, problems has recently been explored in terms of the minimax rule, for example by Sprowls [S16] and a little more fully by Rudy [R4], and by Allen [A3]. It seems to me natural and promising for many fields of application, but it is not a traditional analysis. On the contrary, much literature recommends, in effect, that the person pretend that only two values of i, i₀ > 0 and i₁ < 0, are possible and that the person then choose a test for the resulting simple dichotomy. The selection of the two values i₀ and i₁ is left to the person, though they are sometimes supposed to correspond to the person’s judgment of what constitutes good quality and poor quality -- terms really quite without definition. The emphasis on simple dichotomy is tempered in some acceptance-sampling literature, where it is recommended that the person choose among available tests by some largely unspecified overall consideration of operating characteristics and costs, and that he facilitate his survey of the available tests by focusing on a pair of points that happen to interest him and considering the test whose operating characteristic passes (economically, in the case of sequential testing) through the pair of points. These traditional analyses are certainly inferior in the theoretical framework of the present discussion, and I think they will be found inferior in practice.”
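Savage’s “simple dichotomy” is the one setting where the machinery is clean: both hypotheses fully specify the sampling distribution, and the Neyman-Pearson lemma makes likelihood-ratio tests the only admissible ones, exactly as he says. A toy sketch of what such a test looks like (the coin biases and data below are invented, standing in for his “coin and ball examples”):

```python
# A toy simple-vs-simple likelihood-ratio test (the Neyman-Pearson setting
# Savage describes); the coin biases and data are invented for illustration.
# H0: bias p0 = 0.5 versus H1: bias p1 = 0.6 -- both fully specify the distribution.
from scipy.stats import binom

p0, p1 = 0.5, 0.6
n, k = 100, 61            # n flips, k heads observed

# Likelihood ratio of H1 to H0 at the observed count.
lr = binom.pmf(k, n, p1) / binom.pmf(k, n, p0)
print(f"likelihood ratio L(H1)/L(H0) = {lr:.2f}")

# By the Neyman-Pearson lemma, the admissible tests reject H0 exactly when this
# ratio exceeds a threshold; with p1 > p0 the ratio is increasing in k, so that
# is the same as rejecting when k exceeds a cutoff set by the desired error rate.
alpha = 0.05
cutoff = binom.ppf(1 - alpha, n, p0)   # smallest c with P(K > c | H0) <= alpha
print("reject H0" if k > cutoff else "retain H0", f"(cutoff = {cutoff:.0f})")
```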
“One reason why the directional null hypothesis (H₀₂: μ_g ≤ μ_b) is the appropriate candidate for experimental refutation is the universal agreement that the old point-null hypothesis (H₀: μ_g = μ_b) is [quasi-] always false in biological and social science. Any dependent variable of interest, such as I.Q., or academic achievement, or perceptual speed, or emotional reactivity as measured by skin resistance, or whatever, depends mainly upon a finite number of “strong” variables characteristic of the organisms studied (embodying the accumulated results of their genetic makeup and their learning histories) plus the influences manipulated by the experimenter. Upon some complicated, unknown mathematical function of this finite list of “important” determiners is then superimposed an indefinitely large number of essentially “random” factors which contribute to the intragroup variation and therefore boost the error term of the statistical significance test. In order for two groups which differ in some identified properties (such as social class, intelligence, diagnosis, racial or religious background) to differ not at all in the “output” variable of interest, it would be necessary that all determiners of the output variable have precisely the same average values in both groups, or else that their values should differ by a pattern of amounts of difference which precisely counterbalance one another to yield a net difference of zero. Now our general background knowledge in the social sciences, or, for that matter, even “common sense” considerations, makes such an exact equality of all determining variables, or a precise “accidental” counterbalancing of them, so extremely unlikely that no psychologist or statistician would assign more than a negligibly small probability to such a state of affairs.”
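Meehl’s argument is easy to check numerically. The sketch below (a hypothetical linear model with invented parameters) draws a few dozen “strong” determiners whose group means differ by small, haphazard amounts: the net group difference is essentially never exactly zero, so the point null is false and some finite sample size always suffices to reject it.

```python
# A sketch of Meehl's argument (all distributions hypothetical): when an outcome
# is some function of many determiners whose group means differ by small,
# uncoordinated amounts, the net group difference is essentially never exactly
# zero -- so the point null mu_g = mu_b is false, and a large enough sample
# will reject it.
import numpy as np

rng = np.random.default_rng(0)
k = 50                                        # number of "strong" determiners
weights = rng.normal(0, 1, k)                 # unknown function approximated as a weighted sum
group_shifts = rng.normal(0, 0.1, k)          # tiny, haphazard mean differences between the groups

true_diff = float(weights @ group_shifts)     # net difference in the output variable
outcome_sd = float(np.sqrt(weights @ weights + 1.0))   # determiner variance plus "random" error term

print(f"net group difference: {true_diff:+.3f} (tiny relative to sd {outcome_sd:.1f}, but not 0)")

# Sample size per group for a two-sample z-test to reject at |z| >= 1.96 with
# roughly 80% power: n ≈ 2 * ((1.96 + 0.84) * sd / diff)^2.
n_needed = 2 * ((1.96 + 0.84) * outcome_sd / abs(true_diff)) ** 2
print(f"n per group for ~80% power: about {n_needed:,.0f}")
# The exact counterbalancing Meehl describes (true_diff == 0.0) has probability
# zero here, so the required n is always finite: collect enough data and the
# point null falls.
```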