:Talk:Confidence interval

{{WikiProject banner shell|class=C|vital=yes|1=

}}

{{User:MiszaBot/config

|archiveheader = {{aan}}

|maxarchivesize = 150K

|counter = 4

|minthreadsleft = 5

|minthreadstoarchive = 2

|algo = old(90d)

|archive = Talk:Confidence interval/Archive %(counter)d

}}

{{User:HBC Archive Indexerbot/OptIn

|target=/Archive index |mask=/Archive <#> |leading_zeros=0 |indexhere=yes

}}

{{Press|

|author = Margaret Talbot

|title = Elon Musk Also Has a Problem with Wikipedia

|date = March 4, 2025

|org = The New Yorker

|url = https://www.newyorker.com/news/the-lede/elon-musk-also-has-a-problem-with-wikipedia

|lang =

|quote = Some articles on math and science, though they may be technically correct, can be almost impenetrable for the general reader. (Look up the statistical term “confidence interval,” which I had occasion to do recently, and see if you are as flummoxed as I was.)

|archiveurl =

|archivedate =

|accessdate = March 19, 2025

}}

With regards to the approachability of this article

Why not use the Simple English version of this complicated article (link below)?

It seems more accessible for the average reader than the in-depth one here.

https://simple.wikipedia.org/wiki/Confidence_interval

DC (talk) 14:26, 30 March 2016 (UTC)

Thank you for providing the link to the simple.wikipedia.org page. I found it to be more accessible just as you said. Thank you! -Anon 14:54 UTC, 15 Nov 2020

Wishlist

Replace pictures showing intervals around bars. These pictures are ugly. They're also misleading, because of course you can't just assume any old interval is a CI. There's even a caption explaining this. Ridiculous! Just get rid of it. Here's a better idea: a picture with both standard error and confidence interval bars. You can show how they're related, how to interpret them, and why it's important to understand which one you're looking at.

Explain the relationship with Bayesian intervals. It's painfully obvious that confidence intervals and Bayesian intervals are often similar. When and why does that happen?

Give rules of thumb for interpreting CIs on graphs. I see two CIs and maybe they overlap, or they're just touching, or they're separated by some distance—does that mean anything?

FRuDIxAFLG (talk) 15:07, 22 March 2025 (UTC)

: These are very sensible things to have added. The problem will be to find decent reliable sources to justify new text. I could start by writing some text here, without any citations to the literature. How about someone else start by giving a literature reference (or several references) where all issues are covered? Richard Gill (talk) 03:22, 1 April 2025 (UTC)

Proposed summary for technical prose

I've been using Google's Gemini 2.5 Pro Experimental large language model to create summaries for the most popular articles with {{tl|Technical}} templates. This article, Confidence interval, has such a template above the entire article. Here is the paragraph summary at grade 5 reading level which Gemini 2.5 Pro suggested:

:A confidence interval is like making a guess range for an unknown number about a whole group, like the average height of all kids in a town. Since you can't measure everyone, you measure a smaller group (a sample) and use math to create a range. You also pick how sure you want to be, like 95% sure. This means if you used this math method many times with different samples, about 95 out of 100 of the guess ranges you create would include the true number for the whole group. The size of the guess range depends on things like how many kids you measure (more kids usually means a smaller, better range), how much their heights differ (more difference means a wider range), and how sure you want to be (being more sure makes the range wider).

While I have read and may have made some modifications to that summary, I am not going to add it to the article because I want other editors to review, revise if appropriate, and add it instead. This is an experiment with a few dozen articles initially to see how these suggestions are received, and after a week or two, I will decide how to proceed. Thank you for your consideration. Cramulator (talk) 12:33, 2 April 2025 (UTC)
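:A rough sketch of the last sentence of the suggested summary, assuming the simplest textbook case: a normal-approximation interval for a mean with a known standard deviation, whose half-width is z × sd / √n. The function name <code>half_width</code> and the numbers (heights in centimetres with a spread of 10 or 20, samples of 100 or 400 kids) are made up for illustration and do not come from the article.

```python
import math
from statistics import NormalDist

def half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation interval for a mean: z * sd / sqrt(n).

    Assumes the standard deviation sd is known; illustrative only.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)

base = half_width(sd=10, n=100)                      # hypothetical baseline
more_kids = half_width(sd=10, n=400)                 # larger sample
more_spread = half_width(sd=20, n=100)               # heights differ more
more_sure = half_width(sd=10, n=100, confidence=0.99)

print(f"baseline:      +/- {base:.2f}")
print(f"4x the sample: +/- {more_kids:.2f}")    # narrower
print(f"2x the spread: +/- {more_spread:.2f}")  # wider
print(f"99% level:     +/- {more_sure:.2f}")    # wider
```

:Under these assumptions, quadrupling the sample halves the width, doubling the spread doubles it, and raising the confidence level widens it, which matches the three claims in the summary.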

:That text is not bad. What is Grade 5 reading level? Richard Gill (talk) 13:31, 2 April 2025 (UTC)

::At some point I read or was told that the World Book Encyclopedia was written for fifth grade readers. Whether that is true or not, asking for summaries without specifying reading levels was often producing very technically worded results which seemed graduate level, so I had to specify something. Unfortunately not all of them turned out as good as this one. I've posted 68 of them for the most popular articles with the template, and am not going to post any more until I have at least some concrete ideas for how to do better. Cramulator (talk) 13:48, 2 April 2025 (UTC)

:::Sorry, I simply don't know what "Grade 5" means. I do recall it is US terminology. I'm guessing it means kids at the end of primary school. Age 10 or 11? Richard Gill (talk) 13:57, 2 April 2025 (UTC)

::::Fifth grade, yes. Cramulator (talk) 14:06, 2 April 2025 (UTC)

: What do you think about the two issues with the article which are flagged at the beginning? I think issue 1 has now been fixed. I guess your proposal is aimed at issue 2. Richard Gill (talk) 13:36, 2 April 2025 (UTC)

::Since when are "many reverts and fixes" a sign of problems and not the normal, expected progression of Wikipedia articles on difficult topics? And getting confidence intervals right is absolutely difficult. There are a lot of popular shortcuts which are only rough approximations, or have specific assumptions that are rarely checked, or both. Cramulator (talk) 13:52, 2 April 2025 (UTC)

:::I take that as agreement that the flags can be deleted? Richard Gill (talk) 13:55, 2 April 2025 (UTC)

::: I have boldly reduced the two flags to one. Richard Gill (talk) 14:09, 2 April 2025 (UTC)

::::I think the two bulleted lists are highly frowned on as well. I believe there is a tag for that, {{tlx|prose|section}}. I decided I wasn't going to edit any of the 68 articles I posted these suggestions on, for fear of conflict, at least not for a week, but I'll get back to this one eventually. Cramulator (talk) 14:16, 2 April 2025 (UTC)

Use of LLM needs to be discussed by the entire Wikipedia community, not just those interested in one article. You can start at WP:Village pump. Sundayclose (talk) 15:26, 2 April 2025 (UTC)

I am retracting this and the other LLM-generated suggestions due to clear negative consensus [https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(miscellaneous)&oldid=1283927858#I_boldly_put_LLM-generated_summary_suggestions_on_the_talk_pages_of_the_68_most_popular_articles_with_Technical_templates at the Village Pump]. I will be posting a thorough postmortem report in mid-April to the source code release page. Thanks to all who commented on the suggestions both negatively and positively, and especially to those editors who have manually addressed the overly technical cleanup issue on six, so far, of the 68 articles where suggestions were posted. Cramulator (talk) 22:42, 4 April 2025 (UTC)

Suggestion for a simpler introduction for the general public

A 95% confidence interval is a tool to help us understand uncertainty in data. It shows how sure we are about a number we get from a study or survey. It gives us a range where we think the true answer is likely to be.

For example, let’s say we ask 1,000 people if they like ice cream, and 60% say yes. We can’t ask everyone in the world, so we use a confidence interval to say, “We’re pretty sure the real number of ice cream lovers is between 57% and 63%.” That range (57% to 63%) is the 95% confidence interval.

“95% confidence” means that if we ran the same survey 100 times and computed a range each time, we would expect about 95 of those ranges to contain the true answer. It doesn’t mean there’s a 95% chance the true answer is in any one range; it's about how the method works over many tries. 128.189.175.141 (talk) 22:39, 4 April 2025 (UTC)
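:The ice cream numbers above can be checked with a short sketch. It assumes the simple normal-approximation (Wald) interval for a proportion, which is only one of several methods and may not be the one the commenter had in mind. The function names <code>wald_interval</code> and <code>coverage</code>, the simulated "true" proportion of 0.6, and the number of repeated surveys are all hypothetical.

```python
import math
import random

Z_95 = 1.96  # approximate 97.5th percentile of the standard normal

def wald_interval(successes, n, z=Z_95):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# The survey from the comment: 600 of 1,000 respondents say yes.
low, high = wald_interval(600, 1000)
print(f"{low:.3f} to {high:.3f}")  # about 0.570 to 0.630

def coverage(true_p, n, repeats, seed=0):
    """Fraction of simulated surveys whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        yes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_interval(yes, n)
        hits += lo <= true_p <= hi
    return hits / repeats

# Each simulated survey produces a different interval; the 95% describes
# how often the procedure succeeds, not any single interval.
print(coverage(true_p=0.6, n=1000, repeats=2000))
```

:The second printed value should land close to 0.95. In the simulation the true proportion is known and fixed, so each individual interval either contains it or does not.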

Introduction

I reverted the change to the lede by an unknown author. It is incorrect to say, "we can say that we can be 95% confident that the range of 2-4 hours contains the true value of daily screen time for Vancouverites (which is 3.5 hours)". Also, it does not make sense to say, "we can be 95% confident that the interval either contains or does not contain the true value."

The previous text correctly states that the confidence interval reflects the long-run reliability of the method used to generate the interval. The word 'confidence' has a specific technical meaning in the context of confidence intervals and it is misleading to use it in a way that non-technical readers are likely to misunderstand. To say, "we can be 95% confident" suggests a probability or a Bayesian degree of credence. A confidence interval does not in general tell us the probability or the appropriate degree of credence that attaches to the interval covering the parameter of interest.

The article on Credible interval actually does a better job of explaining the difference between confidence intervals and credible intervals. Dezaxa (talk) 15:14, 7 April 2025 (UTC)

More on the Interpretation

I have removed some inaccurate or misleading statements from recent edits. Please do not edit this article unless you understand the difference between a confidence interval and a credible interval. Editors of this article have been working for several years to remove inaccurate statements about the interpretation of a confidence interval, but people continue to put them back in.

As the article correctly states, when we speak of a 95% confidence interval, the 95% probability relates to the probability of repeated samples yielding intervals that cover the parameter. It is not correct in general that a given realized interval has a 95% probability of covering the parameter. In fact, it is straightforward to construct examples where the probability that a given interval covers a parameter is very different from the confidence level.

It would be a good idea to read these articles before making edits relating to the interpretation of confidence intervals.

Morey, Richard D.; Hoekstra, Rink; Rouder, Jeffrey N.; Lee, Michael D.; Wagenmakers, Eric-Jan (2016). "The fallacy of placing confidence in confidence intervals". Psychonomic Bulletin & Review. 23 (1): 103–123. doi:10.3758/s13423-015-0947-8. https://link.springer.com/article/10.3758/s13423-015-0947-8 PMC 4742505. PMID 26450628.

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://link.springer.com/article/10.3758/s13423-013-0572-3

Morey, R. D., Hoekstra, R., Rouder, J. N., & Wagenmakers, E. J. (2015). Continued misinterpretation of confidence intervals: response to Miller and Ulrich. Psychon Bull Rev. 2015 Nov 30;23:131–140. doi: 10.3758/s13423-015-0955-8

https://link.springer.com/article/10.3758/s13423-015-0955-8 Dezaxa (talk) 12:23, 13 April 2025 (UTC)
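:One way to see the point above about realized intervals is a small simulation of a standard toy construction. Take two observations drawn uniformly from within 0.5 of an unknown parameter θ; the interval from the smaller to the larger observation covers θ in half of all repetitions, so it is a 50% confidence interval. The sketch assumes θ = 0, and the function name <code>simulate</code> and the width cut-offs 0.5 and 0.1 are chosen only for illustration.

```python
import random

def simulate(repeats=100_000, theta=0.0, seed=0):
    """Coverage of [min, max] for two draws from Uniform(theta - 0.5, theta + 0.5)."""
    rng = random.Random(seed)
    total_hits = 0
    wide = wide_hits = 0      # realized intervals wider than 0.5
    narrow = narrow_hits = 0  # realized intervals narrower than 0.1
    for _ in range(repeats):
        x1 = rng.uniform(theta - 0.5, theta + 0.5)
        x2 = rng.uniform(theta - 0.5, theta + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        hit = lo <= theta <= hi
        total_hits += hit
        if hi - lo > 0.5:
            wide += 1
            wide_hits += hit
        elif hi - lo < 0.1:
            narrow += 1
            narrow_hits += hit
    return {
        "overall": total_hits / repeats,
        "wide": wide_hits / wide,
        "narrow": narrow_hits / narrow,
    }

result = simulate()
print(result)
```

:Over all repetitions the coverage is close to 0.5, as the confidence level promises. Among the intervals wider than 0.5 it is exactly 1, because both observations lie within 0.5 of θ and so cannot be more than 0.5 apart on the same side of it. Among the very narrow intervals it is far below 0.5. The confidence level describes the whole procedure, not any particular subset of realized intervals.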

Weather forecast analogy

I'm unsure of how helpful the weather forecast analogy{{--}}added in [https://en.wikipedia.org/w/index.php?title=Confidence_interval&diff=prev&oldid=1278387038 this edit] by {{u|FRuDIxAFLG}}{{--}}given in {{slink|Confidence interval#Common misunderstandings|nopage=y}} is. The article currently reads:

{{blockquote|text=A 95% confidence level does not mean that for a given realized interval there is a 95% probability that the population parameter lies within the interval (i.e., a 95% probability that the interval covers the population parameter). This distinction can be understood by analogy: If the weather forecast is accurate 95% of the time, it does not follow that it is accurate today with 95% probability. For instance, maybe forecasts of sun are 99% accurate and forecasts of rain are 80% accurate. Then, having seen today's forecast, one would be either 99% or 80% confident, but never 95% confident.}}

Does this not miss the point? The statement {{tq|having seen today's forecast, one would be either 99% or 80% confident, but never 95% confident}} seems rather misleading to me: on the day for which the weather is forecast, we can simply observe the actual weather. At that point, it is no longer a question of probability or confidence, but of fact: the prediction is either right, or it isn't.

Even if the analogy did map cleanly onto confidence intervals (which I'm not sure it does), it seems to suggest that we can assign a probability to whether a specific realised interval contains the true parameter value, which is precisely the misconception it is trying to correct. Pink Bee (talk) 14:52, 16 April 2025 (UTC)
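:For what it is worth, the arithmetic in the quoted analogy is at least self-consistent. A quick check, assuming forecasts are only ever "sun" or "rain" (the variable names are mine, not from the article):

```python
from fractions import Fraction

acc_sun = Fraction(99, 100)   # accuracy of "sun" forecasts, from the quote
acc_rain = Fraction(80, 100)  # accuracy of "rain" forecasts, from the quote
overall = Fraction(95, 100)   # overall accuracy, from the quote

# overall = share_sun * acc_sun + (1 - share_sun) * acc_rain, solved for share_sun
share_sun = (overall - acc_rain) / (acc_sun - acc_rain)

print(share_sun)         # 15/19
print(float(share_sun))  # roughly 0.79
```

:So the figures fit together if about 79% of forecasts are for sun. That only confirms the numbers are coherent; it does not settle whether the analogy maps onto confidence intervals.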

:If this particular analogy isn't to everyone's liking then we can change it. But clearly something is needed to help people understand what is going on here. This talk page is evidence of that.

:Anyway, of course you can assign a probability to whether the parameter is in a particular interval; you just need to use a Bayesian approach. That's a weird thing to do, but who cares if it's weird?

:FRuDIxAFLG (talk) 17:38, 16 April 2025 (UTC)

::I think we should probably avoid {{tq|weird}}ness in a section about common misunderstandings, in favour of giving the simplest possible explanation. The primary audience for this section is likely to be people who have these misconceptions, and people who can appreciate the differences between Bayesian and frequentist perspectives may not be well-represented in that group.

::I agree that an example or analogy or some other intuitive explanation is necessary here. I like the example given in [https://journals.sagepub.com/doi/10.1177/201010581001900316 this source] under "Common misunderstanding of the true meaning of confidence intervals". Maybe we could do something along those lines{{--}}what do you think? Pink Bee (talk) 18:47, 16 April 2025 (UTC)

:I'm inclined to agree that the example is not particularly helpful. The fundamental point is that a confidence interval is a frequentist concept. As such, any use of the term probability must be understood as a long-run frequency, and a frequency requires a reference class. In the case of confidence intervals, the reference class is the class of intervals that would be obtained from repeated experiments. Once a single experiment has been performed and a particular realized interval has been calculated, that interval can be thought of as belonging to many different reference classes, which is why it is potentially misleading to think of it as having a probability of covering the parameter of interest. It is a probability only in the specific sense that it is drawn from a reference class of intervals from hypothetical repeated experiments. Dezaxa (talk) 10:14, 18 April 2025 (UTC)

:I have [https://en.wikipedia.org/w/index.php?title=Confidence_interval&diff=prev&oldid=1286582728 replaced] the example with one based on that in the source I mentioned [https://en.wikipedia.org/wiki/Talk:Confidence_interval#c-Pink_Bee-20250416184700-FRuDIxAFLG-20250416173800 here]. I've made it as approachable as I can, but I'd appreciate it if someone with more statistics knowledge than me could check and edit it as needed to make sure it's still technically accurate. Pink Bee (talk) 20:24, 20 April 2025 (UTC)

::My only problem with the example is it's too easy. Usually we don't know the true mean, but even then the usual misconceptions can be dangerous.

::FRuDIxAFLG (talk) 21:32, 20 April 2025 (UTC)

:::I take your point, and I agree that if the factory premise were the setup for an exercise, giving the true mean would make it pointless. But this is an example, not an exercise, and I don't see how we can demonstrate the fallacy quite as simply without just giving specific values. The risk of saying "if the true mean is x then ___, or if it is y then ___" after giving a specific CI is that it permits the misunderstanding that the population parameter is somehow random, which would justify making probability statements about it.

:::That said, this is WP:NOTTEXTBOOK, so maybe I should stop worrying about how best to teach this. I have [https://en.wikipedia.org/w/index.php?title=Confidence_interval&diff=prev&oldid=1286604962 removed] the true mean for now. Pink Bee (talk) 23:07, 20 April 2025 (UTC)