difference between two population means

When we are reasonably sure that the two populations have nearly equal variances, we use the pooled variances test. Do the data suggest that the true average concentration in the bottom water is different from that of the surface water? All that is needed is to know how to express the null and alternative hypotheses, the formula for the standardized test statistic, and the distribution that it follows. Hypotheses concerning the relative sizes of the means of two populations are tested using the same critical value and \(p\)-value procedures that were used in the case of a single population. What is the standard error of the estimate of the difference between the means? If so, then the following formula for a confidence interval for \(\mu _1-\mu _2\) is valid. The number of observations is 15 in the first sample and 12 in the second sample. To apply the formula for the confidence interval, proceed exactly as was done in Chapter 7. We demonstrate how to find this interval using Minitab after presenting the hypothesis test.
The result is a confidence interval for the difference between two population means. Hypothesis tests and confidence intervals for two means can answer research questions about two populations or two treatments that involve quantitative data. The only difference is in the formula for the standardized test statistic. It measures the standardized difference between two means. The children ranged in age from 8 to 11. We have our usual two requirements for data collection. Dependent samples: the samples are dependent (also called paired data) if each measurement in one sample is matched or paired with a particular measurement in the other sample. Therefore, if checking normality in the populations is impossible, then we look at the distribution in the samples.

Since we may assume the population variances are equal, we first have to calculate the pooled standard deviation: \begin{align} s_p&=\sqrt{\frac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}}\\ &=\sqrt{\frac{(10-1)(0.683)^2+(10-1)(0.750)^2}{10+10-2}}\\ &=\sqrt{\dfrac{9.261}{18}}\\ &=0.7173 \end{align} The test statistic is then: \begin{align} t^*&=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\\ &=\dfrac{42.14-43.23}{0.7173\sqrt{\frac{1}{10}+\frac{1}{10}}}\\&=-3.398 \end{align}

The critical T-value comes from the T-model, just as it did in Estimating a Population Mean. Again, this value depends on the degrees of freedom (df). From an international perspective, the difference in US median and mean wealth per adult is over 600%. Since the population standard deviations are unknown, we can use the t-distribution and the formula for the confidence interval of the difference between two means with independent samples: \((\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2,\,df}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\), where \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means and \(s_p\) is the pooled standard deviation. Children who attended the tutoring sessions on Wednesday watched the video without the extra slide.
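The pooled computation above can be checked numerically. The following is a minimal sketch that uses only the summary statistics quoted in the text; the helper name `pooled_sd` is hypothetical, and the multiplier 2.878 is taken from the text (a \(t\) table value with 18 degrees of freedom) rather than computed.

```python
import math

# Summary statistics quoted in the worked example (two samples of size 10)
n1, n2 = 10, 10
xbar1, xbar2 = 42.14, 43.23
s1, s2 = 0.683, 0.750


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation; assumes equal population variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


sp = pooled_sd(n1, s1, n2, s2)

# Standard error of the difference between the two sample means
se = sp * math.sqrt(1 / n1 + 1 / n2)

# Pooled t statistic for H0: mu1 - mu2 = 0
t_star = (xbar1 - xbar2 - 0) / se

# 99% confidence interval; 2.878 is the table multiplier quoted in the text
t_mult = 2.878
diff = xbar1 - xbar2
ci = (diff - t_mult * se, diff + t_mult * se)

print(f"s_p = {sp:.4f}, t* = {t_star:.3f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a statistics library available, the multiplier could be computed from the \(t\)-distribution instead of being read from a table.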
Now, we need to determine whether to use the pooled t-test or the non-pooled (separate variances) t-test. The sample sizes will be denoted by \(n_1\) and \(n_2\). Now we can apply all we learned for the one sample mean to the difference (Cool!). When developing an interval estimate for the difference between two population means, the sample sizes \(n_1\) and \(n_2\) can be different. (The actual value is approximately \(0.000000007\).) The population standard deviations are unknown. With a significance level of 5%, we reject the null hypothesis and conclude there is enough evidence to suggest that the new machine is faster than the old machine.

The confidence interval is \(\bar{x}_1-\bar{x}_2\pm t_{\alpha/2}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\), where \(t_{\alpha/2}\) comes from the t-distribution using the degrees of freedom above. Here this gives \((42.14-43.23)\pm 2.878(0.7173)\sqrt{\frac{1}{10}+\frac{1}{10}}\).

The first three steps are identical to those in Example \(\PageIndex{2}\). For two population means, the test statistic is the difference between \(\bar{x}_1-\bar{x}_2\) and \(D_0\), divided by the standard error. Estimating the difference between two populations with regard to the mean of a quantitative variable. We can be more specific about the populations.

\[H_a: \mu _1-\mu _2>0\; \; @\; \; \alpha =0.01 \nonumber \]

\[Z=\frac{(\bar{x}_1-\bar{x}_2)-D_0}{\sqrt{\frac{s_{1}^{2}}{n_1}+\frac{s_{2}^{2}}{n_2}}}=\frac{(3.51-3.24)-0}{\sqrt{\frac{0.51^{2}}{174}+\frac{0.52^{2}}{355}}}=5.684 \nonumber \]

Figure \(\PageIndex{2}\): Rejection Region and Test Statistic for Example \(\PageIndex{2}\).

If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure.
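As a check on the large-sample statistic above, here is a minimal sketch that recomputes \(Z\) from the quoted summary statistics and then the right-tailed \(p\)-value. The variable names are illustrative, and the normal tail probability is obtained from the standard-library function `math.erfc`, which is one of several ways to evaluate it.

```python
import math

# Summary statistics from the Z formula in the text
xbar1, s1, n1 = 3.51, 0.51, 174
xbar2, s2, n2 = 3.24, 0.52, 355
d0 = 0.0  # hypothesized difference under H0

# Standard error of the difference for large, independent samples
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Standardized test statistic
z = ((xbar1 - xbar2) - d0) / se

# Right-tailed p-value P(Z > z) for H_a: mu1 - mu2 > 0,
# using P(Z > z) = 0.5 * erfc(z / sqrt(2))
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Z = {z:.3f}, p-value = {p_value:.1e}")
```

The resulting \(p\)-value is of the same order as the "actual value" of about \(0.000000007\) mentioned in the text, far below \(\alpha = 0.01\).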
In this example, the response variable is concentration and is a quantitative measurement. The alternative is left-tailed, so the critical value is the value \(a\) such that \(P(T<a)=\alpha\). If we replace \(>\) with \(\neq\) in \(H_1\) in the example above, would the decision rule change? \(H_0: \mu_1-\mu_2=0\) against \(H_a: \mu_1-\mu_2\neq 0\). We draw a random sample from Population \(1\) and label the sample statistics it yields with the subscript \(1\). Test at the \(1\%\) level of significance whether the data provide sufficient evidence to conclude that Company \(1\) has a higher mean satisfaction rating than does Company \(2\). The same five-step procedure used to test hypotheses concerning a single population mean is used to test hypotheses concerning the difference between two population means. The 99% confidence interval is (-2.013, -0.167).

Each population has a mean and a standard deviation. Let \(n_1\) be the sample size from population 1 and let \(s_1\) be the standard deviation of the sample from population 1. It is common for analysts to establish whether there is a significant difference between the means of two different populations. Interpret the confidence interval in context. Given this, there are two options for estimating the variances for the independent samples: pooled and non-pooled (separate) variances. When to use which? The level of significance is 5%. Considering a nonparametric test would be wise. We assume that \(\sigma_1^2=\sigma_2^2=\sigma^2\), with \(H_0: \mu_1-\mu_2=0\). Compare the time that males and females spend watching TV. The samples must be independent, and each sample must be large: \(n_1\geq 30\) and \(n_2\geq 30\). Replacing \(>\) with \(\neq\) in \(H_1\) would change the test from a one-tailed test to a two-tailed test. The response variable is GPA and is quantitative. When we take the two measurements to make one measurement (i.e., the difference), we are now back to the one sample case!
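The critical value approach for the large-sample test at the \(1\%\) level can be sketched as follows. This is an illustration under the assumption that the Company \(1\) and Company \(2\) summary statistics are the ones that appear in the \(Z\) formula (3.51, 0.51, 174 and 3.24, 0.52, 355); it also builds a 99% interval for \(\mu_1-\mu_2\) with the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Assumed summary statistics (those used in the Z formula in the text)
xbar1, s1, n1 = 3.51, 0.51, 174
xbar2, s2, n2 = 3.24, 0.52, 355
alpha = 0.01

# Both samples must be large for the normal approximation
assert n1 >= 30 and n2 >= 30

diff = xbar1 - xbar2
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = diff / se

# Right-tailed test: reject H0 when Z exceeds the upper-alpha critical value
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_crit

# Two-sided 99% confidence interval uses the upper alpha/2 point instead
z_half = NormalDist().inv_cdf(1 - alpha / 2)
ci = (diff - z_half * se, diff + z_half * se)

print(f"critical value = {z_crit:.3f}, reject H0: {reject_h0}")
print(f"99% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note the design choice: the one-tailed test puts all of \(\alpha\) in one tail, while the interval splits \(\alpha\) across two tails, so the two multipliers differ.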
Since the mean \(\bar{x}_1\) of the sample drawn from Population \(1\) is a good estimator of \(\mu _1\) and the mean \(\bar{x}_2\) of the sample drawn from Population \(2\) is a good estimator of \(\mu _2\), a reasonable point estimate of the difference \(\mu _1-\mu _2\) is \(\bar{x}_1-\bar{x}_2\). Therefore, the test statistic is: \(t^*=\dfrac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}=\dfrac{0.0804}{\frac{0.0523}{\sqrt{10}}}=4.86\). An informal check for this is to compare the ratio of the two sample standard deviations. The decision rule would, therefore, remain unchanged. Are these independent samples? Perform the test of Example \(\PageIndex{2}\) using the \(p\)-value approach. The summary statistics are: The standard deviations are 0.520 and 0.3093 respectively; both the sample sizes are small, and the standard deviations are quite different from each other. The mean glycosylated hemoglobin for the whole study population was \(8.97\pm 1.87\). For example, instead of considering the two measures separately, we take the before-diet weight and subtract the after-diet weight. We only need the multiplier. The p-value, critical value, rejection region, and conclusion are found similarly to what we have done before. In the context of the problem we say we are \(99\%\) confident that the average level of customer satisfaction for Company \(1\) is between \(0.15\) and \(0.39\) points higher, on this five-point scale, than that for Company \(2\).
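The paired test statistic quoted above reduces to a one-sample computation on the differences. A minimal sketch, using only the summary values from the text (\(\bar{d}=0.0804\), \(s_d=0.0523\), \(n=10\)); the variable names are illustrative:

```python
import math

# Summary statistics of the paired differences, as quoted in the text
d_bar = 0.0804  # mean of the differences
s_d = 0.0523    # standard deviation of the differences
n = 10          # number of pairs
mu_d0 = 0.0     # hypothesized mean difference under H0

# Standard error of the mean difference
se_d = s_d / math.sqrt(n)

# Paired t statistic; it follows a t-distribution with n - 1 degrees of freedom
t_star = (d_bar - mu_d0) / se_d
df = n - 1

print(f"t* = {t_star:.2f} with {df} degrees of freedom")
```

Because the pairs are collapsed to a single column of differences, the degrees of freedom depend on the number of pairs, not on the total number of measurements.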
