Talk:Student's t-test

This is the talk page for discussing improvements to the Student's t-test article.
This is not a forum for general discussion of the subject of the article.


Statistics Top‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Top	This article has been rated as Top-importance on the importance scale.

Mathematics High‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-priority on the project's priority scale.

Assumptions

Maybe I'm missing something, but it seems like the assumptions section is extremely wrong. The underlying distributions do *not* need to be normal. The statistics' (i.e., sample average) distributions need to be normally distributed, and they will be, according to the Central Limit Theorem. 70.35.57.149 (talk) 19:13, 7 March 2017 (UTC)[reply]

My understanding is that you are right, mostly. Only for small samples do we need the sample(s) to follow a normal distribution, when the mean (numerator) and standard error (denominator) won't automatically be normally distributed according to the CLT. And this is the situation where t-tests are most important, because when the samples are large enough for the CLT to apply, they're also large enough for the t-distribution to converge to the Z-distribution. I think this ought to be mentioned (although my authority for this is a statistician friend - I'm still looking for a published statement about it). Then the bit that describes how to test a sample for normality brings a special irony, because a test (like the Shapiro-Wilk or Kolmogorov-Smirnov) for normality is more likely to reject the null hypothesis of normality as the sample size becomes larger, and this is exactly when you don't need to worry so much about normality! RMGunton (talk) 15:45, 13 February 2019 (UTC)[reply]
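The small-sample versus large-sample point above can be checked with a quick simulation. The following is a minimal sketch, not taken from the article or any cited source: it assumes exponentially distributed data (chosen only as an example of a skewed, non-normal distribution), a one-sample two-sided test at the 5% level, and hard-coded table values for the critical points. The function name `rejection_rate` is hypothetical.

```python
import math
import random

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (standard table values, hard-coded to avoid third-party dependencies).
T_CRIT = {4: 2.776445, 99: 1.984217}

def rejection_rate(n, trials=10000, seed=0):
    """Estimate how often a true null hypothesis (mean = 1) is rejected
    when the data are exponential rather than normal."""
    rng = random.Random(seed)
    crit = T_CRIT[n - 1]
    rejections = 0
    for _ in range(trials):
        x = [rng.expovariate(1.0) for _ in range(n)]  # true mean is 1
        mean = sum(x) / n
        var = sum((v - mean) ** 2 for v in x) / (n - 1)  # sample variance
        t = (mean - 1.0) / math.sqrt(var / n)
        if abs(t) > crit:
            rejections += 1
    return rejections / trials

if __name__ == "__main__":
    print("n = 5:  ", rejection_rate(5))
    print("n = 100:", rejection_rate(100))
```

If the argument above holds, the estimated rejection rate for n = 100 should sit close to the nominal 0.05, while the rate for n = 5 should deviate from it more noticeably.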

The sample mean need not be normally distributed either. Sketch of proof: Efron (1969) (Student's t-Test Under Symmetry Conditions) shows in Section 1 that a proof by Fisher (1925) (Applications of "Student's" Distribution) for the normal case actually only uses the 'sphericity / rotational invariance / orthogonal invariance' of the normal distribution of individual observations for the t-test to control size (Type I error). So, orthogonal invariance of the distribution of X := (X_1, X_2, ..., X_n) is sufficient. This absolutely does not imply that the sample mean is normally distributed, so normality of the sample mean is not necessary. For (counter)example, if n = 3 then it follows from Archimedes' Hat-Box Theorem that a random variable distributed uniformly over the unit sphere (which is clearly orthogonal invariant) has a sample mean that follows a uniform distribution. NWK2 (talk) 14:31, 3 June 2021 (UTC)[reply]
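The n = 3 counterexample above can be illustrated numerically. This is a rough sketch under stated assumptions, not part of Efron's or Fisher's argument: it draws points uniformly from the unit sphere by normalizing a Gaussian vector, then checks (a) that the two-sided t-test with 2 degrees of freedom rejects about 5% of the time, and (b) that the sample mean behaves like a uniform variable on [-1/√3, 1/√3]. The helper names are hypothetical.

```python
import math
import random

def uniform_on_sphere(rng, n=3):
    """Draw a point uniformly from the unit sphere in R^n
    by normalizing a standard Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def t_statistic(x):
    """One-sample t statistic for the null hypothesis mean = 0."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return mean / math.sqrt(var / n)

def simulate(trials=20000, seed=0):
    rng = random.Random(seed)
    half_width = 1 / math.sqrt(3)  # the sample mean lies in [-1/sqrt(3), 1/sqrt(3)]
    t_crit = 4.302653              # two-sided 5% critical value, t with 2 df
    rejections = 0
    below = 0
    for _ in range(trials):
        x = uniform_on_sphere(rng)
        if abs(t_statistic(x)) > t_crit:
            rejections += 1
        # For a uniform variable on [-a, a], P(value < a/2) = 0.75.
        if sum(x) / len(x) < half_width / 2:
            below += 1
    return rejections / trials, below / trials

if __name__ == "__main__":
    print(simulate())
```

The first returned value should be near 0.05 (the test controls size) and the second near 0.75 (consistent with a uniform, and hence non-normal, sample mean).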

I added a "dubious" tag to the assumptions section. I agree that the distribution does not need to be normal. I further think that the variance does not have to follow a chi-squared distribution. Even if part of it is true, it sounds very misleading. I included a Shapiro-Wilk test in an official document before running the t-test, partly because of this Wikipedia page.

Should these two assumptions be deleted entirely, or should one or both be substituted with some other statements in order to not be misleading? 38.104.28.226 (talk) 16:10, 17 October 2022 (UTC)[reply]

I think it should be clarified that weaker conditions suffice, but that normality or the CLT is also sufficient. These conditions are much easier to understand, and we have to remember that this page should emphasize accessibility over giving the weakest possible sufficient conditions for the test to apply. Chalkson (talk) 13:31, 6 September 2025 (UTC)[reply]

Is s the SEM or the SD?

s is used as the SEM when defining the test statistic.

s is used as the SD in the equations related to Slutsky's Theorem. Chris Andrews (talk) 13:53, 25 January 2024 (UTC)[reply]
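For reference, one common convention (an assumption here, not necessarily the notation the article currently uses) reserves s for the sample standard deviation and writes the standard error of the mean as s/√n. Under that convention the one-sample statistic and the Slutsky-type limit use the same symbol consistently:

```latex
% Assumed convention: s is the sample SD; the SEM is s / \sqrt{n}.
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\mathrm{SEM} = \frac{s}{\sqrt{n}},
\qquad
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}

% With s \to \sigma in probability, Slutsky's theorem gives, under H_0,
\frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} \xrightarrow{d} \mathcal{N}(0, 1)
```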

Student History

I want to give some more context to my Oct 2025 edit (https://en.wikipedia.org/w/index.php?title=Student%27s_t-test&oldid=1318357185). The reference that I deleted was added on 13 April 2020 in https://en.wikipedia.org/w/index.php?title=Student%27s_t-test&oldid=950691109, inside a large edit of the history section. The dubious website seems to be the source where LilyLoaf got the text from. The website itself (https://www.tdistributiontable.com/) appears to be machine-generated from other sources, since the logical flow of the text is fragmented and unlikely to have been written by a human. Searching for key phrases like "a more general form as Pearson Type IV" only found similar text in https://dx.doi.org/10.2139/ssrn.2872158, a 2016 article on statistical distributions in Bitcoin that gives a brief summary of the t-distribution. My interpretation is that the author had read the source that the website drew from. I point this out here since it is not clear that the text added to the history section is LilyLoaf's own text. Rizzerwiki (talk) 16:01, 23 October 2025 (UTC)[reply]