Lecture 4
Confidence Intervals for All Occasions
Quantitative Methods Module I
Gwilym Pryce
g.pryce@socsci.gla.ac.uk

Notices:
Register
Class Reps and Staff Student committee.

Aims & Objectives
Aim
To consider the appropriate confidence interval procedures for a range of situations.
Objectives
By the end of this session, students should be able to:
Run confidence intervals on 2 means;
Run confidence intervals on proportions.

Introduction
SPSS can produce confidence intervals for the mean when you have the original data:
go to Analyze, Descriptive Statistics, Explore.
But it is not so useful when you have only summary information,
i.e. when you are only given the mean, s.d. and n,
or when you want a CI for something other than the mean of one population:
e.g. if you want a CI for the difference between 2 means;
e.g. if you want a CI for a proportion (particularly if you want to use the more robust Wilson method).
In situations like these you either need to be familiar with the appropriate formulas or you need to know how to use the custom macros. This lecture introduces both.

Plan
1. CI for two independent means
1.1 Pooled Variances
1.2 Different Variances
2. CI for two paired means
3. CI for one proportion
4. CI for two proportions
5. Sample size determination

CI for two independent means
Sometimes we want to compare the means of two independent populations.
E.g. sample mean height from a population of girls of a particular age vs sample mean height from a population of boys.
Is the difference between the means a freak result arising from sampling variation?
Or does it reflect true differences in height between the population of boys and the population of girls?
One way of tackling this quandary is to estimate the confidence interval for the difference in the two means.
This will tell us the range of likely values for that difference in the whole population.
The following calculations assume that the two populations (and hence the two samples) are independent,
i.e. someone in the first population cannot occur in the second.
This is distinct from situations where the researcher observes the same person before and after a treatment (for such experiments we use a Paired Samples Confidence Interval).
There are two formulas for calculating the confidence interval for comparing two population means:
one assumes equal (or homogeneous) variances across the two populations,
the other assumes unequal (or heterogeneous) variances across the two populations.
Later on in the course we shall look at hypothesis tests that help us decide whether or not the variances are the same (e.g. Levene's test).

1.1 Pooled variance (see M&M p.537)
The confidence interval for the difference between two population means is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and t* is the critical value of the t distribution with n1 + n2 - 2 degrees of freedom.
Alternatively, we can use the macro command:
CI_S2Mp
Small Independent Samples CI for difference between 2 means (pooled variance, M&M p.538)
The syntax for the command is entered as follows:
CI_S2Mp n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?) c=(?).
E.g. mean height of girls in our sample of 10 = 100cm (s.d. = 30cm), and the mean height of 12 boys is 94cm (s.d. = 31cm). All are the same age.
To find the 95% confidence interval for the difference in population means we would enter the following:
CI_S2Mp n1=(10) n2=(12) x_bar1=(100) x_bar2=(94) s1=(30) s2=(31) c=(.95).
which results in a very wide interval.
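As a rough cross-check on the pooled-variance procedure, here is a minimal Python sketch. It is not the CI_S2Mp macro itself: the function name pooled_ci is made up for illustration, and the critical value t* = 2.086 (95% confidence, 20 degrees of freedom) is assumed to have been read from a standard t table rather than computed.

```python
from math import sqrt


def pooled_ci(n1, n2, x_bar1, x_bar2, s1, s2, t_star):
    """CI for mu1 - mu2, assuming equal population variances.

    t_star is the critical t value for n1 + n2 - 2 degrees of freedom,
    looked up in a t table for the chosen confidence level (an assumption
    of this sketch: nothing here computes it for you).
    """
    df = n1 + n2 - 2
    # Pooled variance: a weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    err = t_star * se
    diff = x_bar1 - x_bar2
    return diff - err, diff + err


# Girls vs boys heights: 10 girls (mean 100, s.d. 30), 12 boys (mean 94, s.d. 31).
lower, upper = pooled_ci(10, 12, 100, 94, 30, 31, t_star=2.086)
print(f"95% CI for the difference: {lower:.2f} to {upper:.2f} cm")
```

With these inputs the interval runs from roughly -21.3cm to 33.3cm. It comfortably includes zero, so a 6cm difference in sample means is well within what sampling variation alone could produce.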

1.2 Different Variance (see M&M p.532)
The confidence interval for the difference between two population means is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where, df = min[n1 - 1, n2 - 1]
Alternatively, we can use the macro command:
CI_S2Md
Small Independent Samples CI for differences between 2 means, different variances (M&M p.532).
Arguments (the values supplied to the macro) are entered in the same way as for CI_S2Mp:
CI_S2Md n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?) c=(?).
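The different-variances calculation can be sketched in Python in the same spirit, using the girls/boys heights figures from section 1.1. Again this is only an illustration, not the CI_S2Md macro: unequal_ci is a hypothetical name, and t* = 2.262 is the tabulated 95% value for df = min[n1 - 1, n2 - 1] = 9, taken from a t table.

```python
from math import sqrt


def unequal_ci(n1, n2, x_bar1, x_bar2, s1, s2, t_star):
    """CI for mu1 - mu2 without assuming equal population variances.

    t_star is the critical t value for df = min(n1 - 1, n2 - 1),
    looked up in a t table (assumed, not computed here).
    """
    # Each sample contributes its own variance, scaled by its own size.
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    err = t_star * se
    diff = x_bar1 - x_bar2
    return diff - err, diff + err


# Girls vs boys heights again: df = min(10 - 1, 12 - 1) = 9, so t* = 2.262.
lower, upper = unequal_ci(10, 12, 100, 94, 30, 31, t_star=2.262)
print(f"95% CI for the difference: {lower:.2f} to {upper:.2f} cm")
```

Because the conservative df of 9 gives a larger t* than the pooled df of 20, this interval (about -23.5cm to 35.5cm) is somewhat wider than the pooled one.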
Applying CI_S2Md to our girl/boy heights difference in means example:
CI_S2Md n1=(10) n2=(12) x_bar1=(100) x_bar2=(94) s1=(30) s2=(31) c=(.95).

2. CI for two paired means (see M&M pp.501-503)
Suppose we have two sets of observations on the same individuals:
as in a before and after trial,
our two samples are said to be paired.
We can:
compute the mean & s.d. of the difference between the two sets of results
e.g. average improvement & s.d. of improvement
apply the one sample confidence interval for the mean procedure.
If large sample use: CI_L1M n=(?) x_bar=(?) s=(?) c=(?).
If small sample use: CI_S1M n=(?) x_bar=(?) s=(?) c=(?).
E.g. Mean Quality of Life score for 100 amputees: ave. improvement since amputation = 5.3; s.d. of improvement = 4.2.
CI_S1M n=(100) x_bar=(5.3) s=(4.2) c=(0.99).
Small sample confidence interval for the population mean
        N     X_BAR       TIL        SE       ERR     LOWER     UPPER
100.00000   5.30000   2.62641    .42000   1.10309   4.19691   6.40309
The experiment (!) has produced a fairly narrow interval for the improvement score, even at the 99% confidence level.
NB lower bound is positive, so amputation is likely to be beneficial on average in the population.
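The paired procedure boils down to two steps: reduce the two sets of results to one column of differences, then run a one-sample interval on that column. A minimal Python sketch follows. The function names and the small before/after data set are hypothetical; the t* values are assumptions taken from the macro output above (2.62641 for 99%, df = 99) and from a t table (4.604 for 99%, df = 4).

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(before, after):
    """Reduce paired observations to n, mean and s.d. of the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return len(diffs), mean(diffs), stdev(diffs)


def one_sample_ci(n, x_bar, s, t_star):
    """One-sample CI for the mean from summary statistics.

    t_star is the critical t value for n - 1 degrees of freedom
    (supplied by the caller, not computed here).
    """
    se = s / sqrt(n)
    err = t_star * se
    return se, err, x_bar - err, x_bar + err


# Amputee example from the slide: n = 100, mean improvement 5.3, s.d. 4.2.
se, err, lower, upper = one_sample_ci(100, 5.3, 4.2, t_star=2.62641)
print(f"SE={se:.5f} ERR={err:.5f} LOWER={lower:.5f} UPPER={upper:.5f}")

# Hypothetical raw scores for five people, purely to show the first step.
before = [12.0, 15.5, 9.0, 14.0, 11.5]
after = [17.0, 19.0, 15.5, 18.0, 18.5]
n, d_bar, s_d = paired_summary(before, after)
print(one_sample_ci(n, d_bar, s_d, t_star=4.604))
```

The first call reproduces the SE, ERR, LOWER and UPPER columns of the macro output to five decimal places.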

3. CI for one proportion
Suppose 3,314 out of a sample of 17,096 students reveal that they are binge drinkers (M&M p.572ff). Find the 95% confidence interval for the proportion of binge drinkers.
CI_L1P n=(17096) x=(3314) c=(.95).
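To see both methods side by side, here is a minimal Python sketch that computes the traditional (normal approximation) interval and the Wilson score interval for these figures. It is an illustration rather than the CI_L1P macro: the function names are made up, and the macro may use a slightly different Wilson variant (for instance the "plus four" estimate described in M&M), which gives the same answer to the precision shown at this sample size.

```python
from math import sqrt
from statistics import NormalDist


def z_star(c):
    """z value leaving the central proportion c of the standard normal."""
    return NormalDist().inv_cdf(0.5 + c / 2)


def traditional_ci(n, x, c):
    """Traditional interval: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p = x / n
    err = z_star(c) * sqrt(p * (1 - p) / n)
    return p - err, p + err


def wilson_ci(n, x, c):
    """Wilson score interval (assumed variant, see the note above)."""
    z = z_star(c)
    p = x / n
    shrink = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / shrink
    err = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / shrink
    return centre - err, centre + err


for name, (lower, upper) in [
    ("Traditional", traditional_ci(17096, 3314, 0.95)),
    ("Wilson", wilson_ci(17096, 3314, 0.95)),
]:
    print(f"{name}: {100 * lower:.3f}% to {100 * upper:.3f}%")
```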
As it happens, there is very little difference between the Traditional and Wilson methods in this particular example.
Using the latter method, we estimate with 95% confidence that between 18.799% and 19.984% of college students are frequent binge drinkers.

4. CI for two proportions
See M&M p.589

5. Sample size determination
Suppose you want to estimate the average weight of 5 year olds with a margin of error e of 2 pounds when you apply a 95% confidence interval.
Sample size necessary for estimating the population mean with the desired accuracy will be given by:
$$n = \left(\frac{z^{*} s}{e}\right)^2$$
Sample size necessary for estimating the population proportion with a desired level of accuracy would be:
$$n = \left(\frac{z^{*}}{e}\right)^2 p^{*}(1 - p^{*})$$
Where p* is your guesstimate of the population proportion.
Example:
For your PhD, you want to estimate the mean hourly wage rate of unskilled labour in Easterhouse within $0.10 at the 95% confidence level. A 1987 study (large sample size) by the Department of Employment resulted in a standard deviation of 0.85. Using this as an approximation for s, compute the necessary sample size to arrive at the desired level of accuracy.
The maximum allowable error e = 0.1
The z* value for 95% confidence interval = 1.96
Our best estimate of the population s.d. s = 0.85
Entering these values in the formula gives:
$$n = \left(\frac{1.96 \times 0.85}{0.1}\right)^2 = 277.56$$
Round up to 278 to ensure our sample size is large enough.
Using the N1_L1M syntax:
N1_L1M
Sample size for desired margin of error for the mean (M&M p.425).
N1_L1M e=(0.1) c=(0.95) s=(0.85) .
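Both sample size formulas are easy to check by hand or in a few lines of Python. The sketch below is illustrative only: n_for_mean and n_for_proportion are hypothetical names, not the macros, and the proportion example (margin of error 0.03, guesstimate p* = 0.5) is made up to show the second formula in use.

```python
from math import ceil
from statistics import NormalDist


def z_star(c):
    """z value leaving the central proportion c of the standard normal."""
    return NormalDist().inv_cdf(0.5 + c / 2)


def n_for_mean(e, c, s):
    """Smallest n giving margin of error e for a mean, given s.d. s."""
    return ceil((z_star(c) * s / e) ** 2)


def n_for_proportion(e, c, p_star):
    """Smallest n giving margin of error e for a proportion.

    p_star is your guesstimate of the population proportion;
    0.5 is the most cautious choice because it maximises p*(1 - p*).
    """
    return ceil((z_star(c) / e) ** 2 * p_star * (1 - p_star))


# Easterhouse wage example: e = 0.1, 95% confidence, s = 0.85.
print(n_for_mean(e=0.1, c=0.95, s=0.85))

# Hypothetical proportion example: within 3 percentage points, p* = 0.5.
print(n_for_proportion(e=0.03, c=0.95, p_star=0.5))
```

Always round up rather than to the nearest whole number, since rounding down would leave the margin of error slightly larger than the one you asked for.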

Summary: in this session we have looked at:
1. CI for two independent means
2. CI for two paired means
3. CI for one proportion
4. CI for two proportions
5. Sample size determination

FAQ on SE & CIs:
Q1/ Is the "standard error" the same as the "margin of error"?
A/ No. The "Standard Error" has a very precise statistical meaning:
it is "the standard deviation of the sampling distribution of the mean (or proportion)".
That is, it is the name we give to the amount sample means will vary from sample to sample.
If sample means don't vary much from sample to sample (i.e. the "sampling distribution of the mean" is fairly peaked), then the standard deviation of means (i.e. the "SE of the mean") will be small.
If, on the other hand, sample means do vary considerably from sample to sample (i.e. the "sampling distribution of the mean" is well spread, fairly flat) then we will find that the SE of the mean will be large.
Note that when we refer to a "sampling distribution" we refer to the distribution of means from repeated samples OF THE SAME SIZE.
I.e. each sample we take has the same number of observations.
In other words, there will be a different sampling distribution for each sample size.
Hence, for each sampling distribution there will be a different standard deviation ("standard error").
As you might expect, the larger the sample size, the more peaked the sampling distribution, and the smaller the standard error.
The sampling distribution we are interested in for a particular problem will of course be the one defined by the size of the sample we are dealing with at the time.
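One way to convince yourself of this is to simulate it. The sketch below draws many samples of a given size from a made-up normal population (mean 100, s.d. 15, both arbitrary assumptions), records the mean of each sample, and reports the standard deviation of those means. The function name simulated_se and all the settings are hypothetical, chosen only for illustration.

```python
import random
from math import fsum, sqrt
from statistics import stdev


def simulated_se(sample_size, n_samples=1000, mu=100.0, sigma=15.0, seed=1):
    """Standard deviation of sample means across repeated samples.

    Every sample has the same size, as the definition of a sampling
    distribution requires. mu and sigma describe an invented population.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(fsum(sample) / sample_size)
    return stdev(means)


# Compare the simulated SE with the textbook value sigma / sqrt(n).
for n in (10, 50, 200):
    print(n, round(simulated_se(n), 2), round(15.0 / sqrt(n), 2))
```

The simulated figures should shrink as the sample size grows and sit close to the population s.d. divided by the square root of n, which is the relationship used later in this FAQ.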
"Margin of error", on the other hand, is a much looser term.
It usually refers to how much our estimate (e.g. of the population mean) is likely to differ from the true value.
If we want our margin of error to be small, we have to use a large sample.
The two concepts are not unrelated, however:
How close our sample estimate of the population mean will be to its true value will be determined by how much variation there is in sample means between samples.
So the smaller the SE, the more accurate our estimate will be, and the smaller our margin of error.

Q2/ Is it possible to read the standard error as an individual figure by itself, e.g. 5.3, without having the sample details? Compared to 1.9, which one would you say is a higher standard error?
Suppose we are looking at the height of girls and boys in cm.
Let's also assume that the samples we have for boys and girls are the same size.
If for boys, the SE = 5.3cm, then we are saying that, on average, sample means vary by 5.3cm from the true population mean (which happens to equal the mean of all sample means).
If for girls, on the other hand, sample means tend to vary only by 1.9cm from the population mean, then we know that the sampling distribution of mean height is much flatter for boys than for girls.
I.e. mean height varies from sample to sample a lot more for boys than for girls.
This suggests that, for a given sample size, we shall be able to make a more accurate prediction of the population mean height of girls than of the population mean height of boys.

Q3/ It bothers me that an error can be inaccurate given a small sample size. Errors ARE inaccurate, how can it NOT be inaccurate? Only in statistics, right?
The problem is that we rarely know what the standard error of the mean is.
Think for a moment why this might be.
If the SE of the mean is the "standard deviation of means across repeated samples" then you'd think that the only way we can calculate it is by taking repeated samples.
Strictly speaking, the only way to arrive at the true value of the SE is in fact to take an infinite number of samples!
So even if we could afford to take 100 samples, the standard deviation of all the means we have calculated would still only be an ESTIMATE of the true value of the standard error.
However, in practice we usually only have enough time and money to take a single sample.
Our dilemma is that we somehow have to estimate from a single sample what the variation might be of means from repeated samples!
All is not lost, however, because it turns out that the standard deviation of our single sample is related to the SE of the mean.
That is, the variation of the actual values of our variable within a particular sample is related to the variation of the mean of that variable from sample to sample.
E.g. Average grade received for Quantitative Methods.
If you had access to data on all previous classes, you could calculate the average grade for each class.
The sampling distribution of the mean would simply be the histogram of the means you have calculated for each class.
Now, what we are saying is that if you don't in fact have access to data on all previous classes, but only the current class, then the variation in marks amongst your colleagues in your year (the standard deviation of individual grades) will tell you something about how much the average grade is likely to vary from year to year (the standard error of the mean).
It won't be a perfect predictor but it's the best we can do.
What we do know is that the amount by which the average grade varies from year to year will depend on the size of class in each year (which we assume constant across all years).
If the size of the class in each year is 500, then the average grade will be pretty similar across years. If the class size in each year is only 10, then the average grade will vary considerably from year to year.
So, to account for the effect of sample size, our estimate of the standard error of the mean would be equal to the standard deviation of grades amongst your colleagues, divided by the square root of the number of students in the class.
For example, if the standard deviation of grades is 15 marks, and the size of the class is 50, then your estimate of the standard error of the mean would be 15/7.07 = 2.12. That is, you reckon that the mean grade in each year typically varies by 2.12 marks or so around the mean of all grades from all years (the "population mean").
This statement is still rather vague, however, since we have said "typically".
It would be nice if we could give a probability to this.
That is, we'd like to say something like, that we are 95% sure that the average grade across all years lies between a and b.
But how can we work out where 95% of sample means lie?
To do this, we make use of the fact that the sampling distribution is approximately normal (Central Limit Theorem). This means we can translate our knowledge of the sampling distribution (i.e. our estimate of how flat it is, the SE, which we have estimated to be 2.12) into the appropriate "margin of error".
This margin of error is found by multiplying our estimated standard error by the z score associated with the central 95% of z values, which turns out to be 1.96. So, 1.96 multiplied by 2.12 gives you a margin of error of about 4.16 marks.
We haven't said yet what the average grade in your year is. Let's say it's 68 (you're a bright bunch!).
Therefore, we can be 95% sure that the average grade across all years is 68 plus or minus a margin of error of 4.16. I.e. we can be 95% sure that the population mean grade lies between 64 and 72, or thereabouts.
The important assumption here, of course, is that the current class of students constitutes a simple random sample of all students in all years.
This would not be the case if, as some claim, students are gradually getting more intelligent (due to improvements in diet, preschool education, and, apparently, computer games and TV!).
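The whole chain of reasoning in this answer (s.d. of one sample, then standard error, then margin of error, then interval) fits in a few lines of Python. This is a minimal sketch using the grades figures from the example; the function name mean_ci_from_summary is hypothetical, and it uses the exact z value rather than the rounded 1.96, so the last decimal place may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_from_summary(x_bar, s, n, c=0.95):
    """Large-sample CI for a mean from summary statistics only."""
    se = s / sqrt(n)                       # estimated standard error
    z = NormalDist().inv_cdf(0.5 + c / 2)  # about 1.96 when c = 0.95
    moe = z * se                           # margin of error
    return se, moe, x_bar - moe, x_bar + moe


# Class of 50, s.d. of grades 15 marks, average grade 68.
se, moe, lower, upper = mean_ci_from_summary(x_bar=68, s=15, n=50)
print(f"SE = {se:.2f}, margin of error = {moe:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

Note how the standard error (about 2.12) and the margin of error (about 4.16) are different numbers: the second is the first scaled up by z* to reach the chosen confidence level.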
TEquation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.0Oh+'0P!hp $
DP
\hpLecture 2 Calculating z ScoresGraduate SchoolOC:\Program Files\Microsoft Office\Templates\Presentation Designs\Dad`s Tie.potFaculty32uMicrosoft PowerPointoso@ /q@^Q@m>Ûc
G;
@&@ &&#TNPP2OMia
&
TNPP &&TNPP@
&F !@ !@&&(&&&%$$$$&&D&T0$B$A$@$;$3$*$!$$$$$BB$&&<q&$o$?$===
>@@?oo$&&&T.$$$ $&&k&($$p$nmlllmnp$&&&3f0$$$$$$$$$$$$
$&&&3f<$$$!"#$&&H&T0$F$E$D$?$7$.$%$$$$$FF$&&@v&$t$C$AAA
BDDCtt$&&&3f.$$$$&&o&($$t$rqpppqrt$&&&3f2$$$$$$$$$$$$
$&&&<$$$!"#$&&&0$$$$$$$$$$$$$&&&3f$$$
$&&c&.$$f$edcccdef$&&)&$'#$'%%$%%&'#&&J&3f2$H$G$F$A$9$0$'$ $$$$
HH$&&Ci&3f<$f$F$DDEFFEDDDEFfghhhggfffff!f"f#f$&&&&&&$$$$vvv$&&&&:$lI&mS9yS+&&&&
&&&&:$lI&mS9yS+&&$$$$vvv$&
&&&
(&&&$
$$%%$%%((&&&&@$zaF*w;zcN:(
&&&&&&
&&&&@$zaF*w;zcN:(
&&&&$
$$%%$%%((&
&&&&a4&w@
eww 0wf &,!4& &L4& @"Arial
ww 0wf . 2
1.3 @Times New Romanww 0wf 3f.2
W: Lecture 4". 3f.2
:Confidence Intervals for '. 3f.2
:
All Occasions**. 3f@"Arial
#ww 0wf .2
>SSS I . .2
oGwilym Pryce .@"Arial
nww 0wf .2
www.gpryce.com
.c e "Systemf
!&TNPP &. . 2
V... .
!"uation Equation.30,Microsoft Equation 3.00(MindManager Document 8Mindjet.MindManager.Document0MindManager Map0Equation Equation.30,Microsoft Equation 3.00
Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Twww.gpryce.comF/0DTimes New Romant\ )0tY0$DArialNew Romant\ )0tY0$" DWingdingsRomant\ )0tY0$0DCourier Newmant\ )0tY0$1@DSymbol Newmant\ )0tY0$
`.
@n?" dd@ @@``
D
!"#$%&'()*+,2$j{c1}X%Y$2$d5tO5:ސ2$sD^ogaCiV"2$`y5txyvxb$"qpD<6YӰb$:Zd$XØc#l2$8qrFCWCY!2$lFI[DԼ+NvT$b$$,;P%R$b@&`s4$$$$$$$$$$$$$$$2$Zӿ>g!m.xGS2$U>pI0o7sXT$$$$$$$2$GYu`N
[2$4 ꐄHM5l\2$7}ܥ7
^2$HKOH k0@8ʚ;2Nʚ;g4^d^d )0h%ppp@<4!d!d0$<4dddd0$<4dddd0$VN___PPT90(e?O=_i0Lecture 4Confidence Intervals for All Occasions116"SSS I
Gwilym Pryce
www.gpryce.com (# "T
0!Notices:0Register
Class Reps and Staff Student committee.1 (Aims & ObjectivesAim
To consider the appropriate confidence interval procedures for a range of situations.
Objectives
By the end of this session, students should be able to:
Run confidence intervals on 2 means;
Run confidence intervals on proportions.hW9NW 9 N uIntroduction
SPSS can produce confidence intervals for the mean when you have the original data
go to Analyze, Descriptive Statistics, Explore
But its not so useful when you have only summary information.
I.e. when you are only given the mean, s.d. & n
& or when you want a CI for something other than the mean of one population
E.g. if you want a CI for the difference between 2 means;
E.g. if you want a CI for a proportion (particularly if you want to use the more robust Wilson method)
In situations like these you either need to be familiar with the appropriate formulas or you need to know how to use the custom macros& this lecture introduces both.SZ0Z>Z0ZN
!"#$%&'()*+,./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~W
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQSTUVXYZ[\]^_`abcdefghijklmnopqrstuvwxy{}~Root EntrydO)m>ÛR PicturesuCurrent User#5SummaryInformation(!PowerPoint Document(;?lDocumentSummaryInformation80Equation Equation.30,Microsoft Equation 3.00(MindManager Document 8Mindjet.MindManager.Document0MindManager Map0Equation Equation.30,Microsoft Equation 3.00
Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Equation Equation.30,Microsoft Equation 3.00Twww.gpryce.comF/0DTimes New Romant\ )0tY0$DArialNew Romant\ )0tY0$" DWingdingsRomant\ )0tY0$0DCourier Newmant\ )0tY0$1@DSymbol Newmant\ )0tY0$
`.
@n?" dd@ @@``
@+
!"#$%&'()*+2$j{c1}X%Y$2$d5tO5:ސ2$sD^ogaCiV"2$`y5txyvxb$"qpD<6YӰb$:Zd$XØc#l2$8qrFCWCY!2$lFI[DԼ+NvT$b$$,;P%R$b@&`s4$$$$$$$$$$$$$$$2$Zӿ>g!m.xGS2$U>pI0o7sXT$$$$$$$2$GYu`N
[2$4 ꐄHM5l\2$7}ܥ7
^2$HKOH k0@8ʚ;2Nʚ;g4^d^d )0h%ppp@<4!d!d0$<4dddd0$<4dddd0$VN___PPT90(e?O=_i0Lecture 4Confidence Intervals for All Occasions116"SSS I
Gwilym Pryce
www.gpryce.com (# "T
0!Notices:0Register
Class Reps and Staff Student committee.1 (Aims & ObjectivesAim
To consider the appropriate confidence interval procedures for a range of situations.
Objectives
By the end of this session, students should be able to:
Run confidence intervals on 2 means;
Run confidence intervals on proportions.hW9NW 9 N uIntroduction
SPSS can produce confidence intervals for the mean when you have the original data
go to Analyze, Descriptive Statistics, Explore
But its not so useful when you have only summary information.
I.e. when you are only given the mean, s.d. & n
& or when you want a CI for something other than the mean of one population
E.g. if you want a CI for the difference between 2 means;
E.g. if you want a CI for a proportion (particularly if you want to use the more robust Wilson method)
In situations like these you either need to be familiar with the appropriate formulas or you need to know how to use the custom macros& this lecture introduces both.SZ0Z>Z0ZNZZZS### #####>0Nc
_
Plan1. CI for two independent means
1.1 Pooled Variances
1.2 Different Variances
2. CI for two paired means
3. CI for one proportion
4. CI for two proportions
5. Sample size determination< n  n eCI for two independent means" "(SSometimes we want to compare the means of two independent populations.
E.g. sample mean height from a population of girls of a particular age vs sample mean height from a population of boys.
is the difference between the means a freak result arising from sampling variation?
or does it reflect true differences in height between the population of boys and the population of girls?
One way of tackling this quandary is to estimate the confidence interval for the difference in the two means.
This will tell us the range of likely values for that difference the in the whole population. IZzZZpZ`ZI"z""p"`"k
The following calculations assumes that the two populations (and hence the two samples) are independent:
i.e. someone in the first population cannot occur in the second.
This is distinct from situations where the researcher observes the same person before and after a treatment (for such experiments we use a Paired Samples Confidence Interval).
There are two formulas for calculating the confidence interval for comparing two population means:
one assumes equal (or homogeneous) variances across the two populations,
the other assumes unequal (or heterogeneous) variances across the two populations.
Later on in the course we shall look at hypothesis tests that help us decide on whether or not the variances are the same (e.g. Levene s test).id\""""""""d"""G"
"*"m$1.1 Pooled variance (see M&M p.537)&%( VThe confidence interval for the difference between two population means is given by:
l,Alternatively, we can use the macro command:CI_S2Mp
Small Independent Samples CI
for difference between 2 means
(pooled variance M&M p.538)
The syntax for the command is entered as follows:
CI_S2Mp n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?) c=(?).
ZZZ3ZGZZ
Z3""">""nE.g. mean height of girls in our sample of 10 = 100 cm (s.d. = 30cm), and the mean height of 12 boys is 94cm (sd = 31cm). All are the same age. v " " G" " " ""n!To find 95% confidence interval for the difference in population means we would enter the following:
CI_S2Mp n1=(10) n2=(12) x_bar1=(100) x_bar2=(94) s1=(30) s2=(31) c=(.95).
which results in a v. wide interval:
^eZ&e"Y""'"o&1.2 Different Variance (see M&M p.532)'(VThe confidence interval for the difference between two population means is given by:
p,Alternatively, we can use the macro command:dCI_S2Md
Small Independent Samples CI for differences between 2 means
different variances (M&M p.532).
Arguments1 are entered in the same way as for CI_S2Mp:
CI_S2Md n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?) c=(?).
1 Argument = Independent variable determining the value of function (OED)
j
"<"*
"6"""NqApplying CI_S2Md to our girl/boy heights difference in means example:2I $c$9"$OCI_S2Md n1=(10) n2=(12) x_bar1=(100) x_bar2=(94) s1=(30) s2=(31) c=(.95).
(LLLf/2. CI for two paired means (see M&M p.501503)0 Suppose we have two sets of observations on the same individuals:
as in a before and after trial,
our two samples are said to be paired
We can:
compute the mean & s.d. of the difference between the two sets of results
e.g. average improvement & s.d of improvement
apply the one sample confidence interval for the mean procedure.
If large sample use: CI_L1M n=(?) x_bar=(?) s=(?) c=(?).
If small sample use: CI_S1M n=(?) x_bar=(?) s=(?) c=(?).BZLZ ZKZ2ZAZZBL K2A #rue.g. Mean Quality of Life score for 100 amputees: ave. improvement since amputation = 5.3; s.d. of improvement = 4.2.vv 2ACI_S1M n=(100) x_bar=(5.3) s=(4.2) c=(0.99).
Small sample confidence interval for the population mean
N X_BAR TIL SE ERR LOWER UPPER
100.00000 5.30000 2.62641 .42000 1.10309 4.19691 6.40309
The experiment (!) has produced a fairly narrow interval for the improvement score, even at the 99% confidence level
NB lower bound is positive, so amputation likely to beneficial on average in population.8ZZuZYZu X
{Yg3. CI for one proportion Suppose 3,314 out of a sample of 17,096 students reveal that they are binge drinkers (M&M p. 572ff), find the 95% confidence interval for the proportion of binge drinkers.
CI_L1P n=(17096) x=(3314) c=(.95).
ZUU""#""s
As it happens, there is very little difference between the Traditional and Wilson methods in this particular example.
Using the latter method, we estimate with 95% confidence that between 18.799% and 19.984% of college students are frequent binge drinkers. 4xx h4. CI for two proportions
See M&M p.589[5. Sample size determinationbSuppose you want to estimate the average weight of 5 year olds with a margin of error e of 2 pounds when you apply a 95% confidence interval.
Sample size necessary for estimating the population mean with the desired accuracy will be given by:
Sample size necessary for estimating the population proportion with a desired level of accuracy would be: kVl4
+\Example:For your PhD, you want to estimate the mean hourly wage rate of unskilled labour in Easterhouse within $0.10 at the 95% confidence level. A 1987 study (large sample size) by the Department of Employment resulted in a standard deviation of 0.85. Using this as an approximation for s, compute the necessary sample size to arrive at the desired level of accuracy.DmgO T
]
The maximum allowable error e = 0.1
The z* value for 95% confidence interval = 1.96
Our best estimate of the population s.d. s = 0.85
Entering these values in the formula gives:
\[ n = \left( \frac{1.96 \times 0.85}{0.1} \right)^{2} = 277.56 \]
Round up to 278 to ensure our sample size is large enough.
Using the N1_L1M syntax:
N1_L1M
Sample size for desired margin of error for the mean (M&M p.425).
N1_L1M e=(0.1) c=(0.95) s=(0.85) .
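The same calculation can be sketched in Python. This is an illustration only: the function names are hypothetical, and the proportion example (a margin of 3 percentage points with a guess of 0.5) uses made-up inputs that do not come from the lecture.

```python
from math import ceil
from statistics import NormalDist


def z_star(c):
    """Critical z value for a central confidence level c (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + c / 2)


def n_for_mean(e, s, c=0.95):
    """Smallest n giving a margin of error of at most e for a mean."""
    return ceil((z_star(c) * s / e) ** 2)


def n_for_proportion(e, p_star, c=0.95):
    """Smallest n giving a margin of error of at most e for a proportion,
    where p_star is a prior guess at the population proportion."""
    return ceil((z_star(c) / e) ** 2 * p_star * (1 - p_star))


# Easterhouse wage example: e = 0.1, s = 0.85, 95% confidence
print(n_for_mean(e=0.1, s=0.85, c=0.95))

# Hypothetical proportion example: e = 0.03, guess p* = 0.5
print(n_for_proportion(e=0.03, p_star=0.5, c=0.95))
```

Rounding up with `ceil` rather than to the nearest integer matters: rounding down would give a margin of error slightly larger than the one requested. Guessing p* = 0.5 is the cautious choice, since p*(1 - p*) is largest there.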
Summary: in this session we have looked at:
1. CI for two independent means
2. CI for two paired means
3. CI for one proportion
4. CI for two proportions
5. Sample size determination
Reading:
Chapter 4 of Pryce (2005) Inference and Statistics in SPSS
M&M 4th Ed.
section 6.3 and exercises for 6.3
Sections 6.1 (p. 415-429); 7.1 and 7.2. Chapter 8.
FAQ on SE & CIs:
Q1/ Is the "standard error" the same as the "margin of error"?
A/ No. The "Standard Error" has a very precise statistical meaning:
SE is "the standard deviation of the sampling distribution of the mean (or proportion)".
That is, it is the name we give to the amount sample means will vary from sample to sample.
If sample means don't vary much from sample to sample (i.e. the "sampling distribution of the mean" is fairly peaked), then the standard deviation of means (i.e. the "SE of the mean") will be small.
If, on the other hand, sample means do vary considerably from sample to sample (i.e. the "sampling distribution of the mean" is well spread, i.e. fairly flat) then we will find that the SE of the mean will be large.
Note that when we refer to a "sampling distribution" we refer to the distribution of means from repeated samples OF THE SAME SIZE.
I.e. each sample we take has the same number of observations.
In other words, there will be a different sampling distribution for each sample size.
Hence, for each sampling distribution there will be a different standard deviation ("standard error").
As you might expect, the larger the sample size, the more peaked the sampling distribution, and the smaller the standard error.
The sampling distribution we are interested in for a particular problem will of course be the one defined by the size of the sample we are dealing with at the time.
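The link between sample size and standard error can be seen in a small simulation. The sketch below is illustrative only: the population (normal with mean 60 and s.d. 15), the number of repetitions, and the function name `simulated_se` are all assumptions chosen for the example. It draws many samples of the same size, takes the mean of each, and reports the standard deviation of those means alongside the textbook value sigma divided by the square root of n.

```python
import random
from math import sqrt
from statistics import stdev


def simulated_se(n, mu=60.0, sigma=15.0, reps=2000, seed=1):
    """Standard deviation of the means of `reps` samples, each of size n."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return stdev(means)


for n in (10, 50, 200):
    print(n, round(simulated_se(n), 2), round(15.0 / sqrt(n), 2))
```

Each sample size has its own sampling distribution, and the simulated spread shrinks as n grows. Because only a finite number of samples is drawn, the simulated figure is itself an estimate of the true standard error.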
"Margin of error", on the other hand, is a much looser term.
It usually refers to how far our estimate (e.g. of the population mean) is likely to differ from the true value.
If we want our margin of error to be small, we have to use a large sample.
The two concepts are not unrelated, however:
How close our sample estimate of the population mean will be to its true value will be determined by how much variation there is in sample means between samples.
So the smaller the SE, the more accurate our estimate will be, and the smaller our margin of error.
Q2/ What scale is the SE measured in? Is it possible to read the standard error as an individual figure by itself, e.g. 5.3, without having the sample details? Compared to 1.9, which one would you say is a higher standard error?
Suppose we are looking at the height of girls and boys in cm.
Let's also assume that the samples we have for boys and girls are the same size.
If for boys, the SE = 5.3cm, then we are saying that, on average, sample means vary by 5.3cm from the true population mean (which happens to equal the mean of all sample means).
If for girls, on the other hand, sample means tend to vary only by 1.9cm from the population mean, then we know that the sampling distribution of mean height is much flatter for boys than for girls.
I.e. Mean height varies from sample to sample a lot more for boys than for girls.
This suggests that, for a given sample size, we shall be able to make a more accurate prediction of the population mean height of girls than of the population mean height of boys.
Q3/ It bothers me that an error can be inaccurate given a small sample size. Errors ARE inaccurate, how can it NOT be inaccurate? Only in statistics, right?
The problem is that we rarely know what the standard error of the mean is.
Think for a moment why this might be.
If the SE of the mean is the "standard deviation of means across repeated samples" then you'd think that the only way we can calculate it is by taking repeated samples.
Strictly speaking, the only way to arrive at the true value of the SE is in fact to take an infinite number of samples!
So even if we could afford to take 100 samples, the standard deviation of all the means we have calculated would still only be an ESTIMATE of the true value of the standard error.
In practice we usually only have enough time and money to take a single sample.
Our dilemma is that we somehow have to estimate from a single sample what the variation might be of means from repeated samples!
All is not lost, however, because it turns out that the standard deviation of our single sample is related to the SE of the mean.
That is, the variation of the actual values of our variable within a particular sample is related to the variation of the mean of that variable from sample to sample.
E.g. Average grade received in SSS1.
If you had access to data on all previous classes, you could calculate the average grade for each class.
The sampling distribution of the mean would simply be the histogram of the means you have calculated for each class.
Now, what we are saying is that if you don't in fact have access to data on all previous classes, but only the current class, then the variation in marks amongst your colleagues in your year (the standard deviation of individual grades) will tell you something about how much the average grade is likely to vary from year to year (the standard error of the mean).
It won't be a perfect predictor but it's the best we can do.
What we do know is that the amount by which the average grade varies from year to year will depend on the size of class in each year (which we assume constant across all years).
If the size of the class in each year is 500, then the average grade will be pretty similar across years. If the class size in each year is only 10, then the average grade will vary considerably from year to year.
So, to account for the effect of sample size, our estimate of the standard error of the mean would be equal to the standard deviation of grades amongst your colleagues, divided by the square root of the number of students in the class.
For example, if the standard deviation of grades is 15 marks, and the size of the class is 50, then your estimate of the standard error of the mean would be 15/7.07 = 2.12. That is, you reckon that the mean grade in each year typically varies by 2.12 marks or so around the mean of all grades from all years (the "population mean").
This statement is still rather vague, however, since we have said "typically".
It would be nice if we could give a probability to this.
That is, we'd like to be able to say something like: we are 95% sure that the average grade across all years lies between a and b.
But how can we work out where 95% of sample means lie?
To do this, we make use of the fact that the sampling distribution is approximately normal (Central Limit Theorem), which means we can translate our knowledge of the sampling distribution (i.e. our estimate of how flat it is, the SE, which we have estimated to be 2.12) into the appropriate "margin of error".
This margin of error is found by multiplying our estimated standard error by the z score associated with the central 95% of z values, which turns out to be 1.96. So, 1.96 multiplied by 2.12 gives you a margin of error of about 4.16 marks.
We haven't said yet what the average grade in your year is. Let's say it's 68 (you're a bright bunch!).
Therefore, we can be 95% sure that the average grade across all years is 68 plus or minus a margin of error of 4.16. I.e. we can be 95% sure that the population mean grade lies between 64 and 72, or thereabouts.
The important assumption here, of course, is that the current class of students constitutes a simple random sample of all students in all years.
This would not be the case if, as some claim, students are gradually getting more intelligent (due to improvements in diet, preschool education, and, apparently, computer games and TV!).
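The chain of steps in the class-grade example (standard deviation, then standard error, then margin of error, then interval) can be written out as a short Python sketch. The variable names are illustrative; the inputs (s.d. of 15 marks, class of 50, average of 68) are those used in the example, and the interval is the sample mean plus or minus 1.96 standard errors.

```python
from math import sqrt
from statistics import NormalDist

s = 15.0       # standard deviation of grades in the current class
n = 50         # class size
x_bar = 68.0   # average grade in the current class

se = s / sqrt(n)                  # estimated standard error of the mean
z = NormalDist().inv_cdf(0.975)   # z score bounding the central 95% (about 1.96)
moe = z * se                      # margin of error
lower, upper = x_bar - moe, x_bar + moe

print(f"SE = {se:.2f}, margin of error = {moe:.2f}, "
      f"95% CI = ({lower:.1f}, {upper:.1f})")
```

The standard error and the margin of error are different quantities: the margin of error is the standard error scaled up by the z score for the chosen confidence level.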