*=====================================================================.
*HOME COMPUTER Syntax for Lab 1 of Quantitative Methods 1.
*3 Oct 05, v6.
*=====================================================================.

*NB: it is assumed in this file that you are using your own PC (as opposed to a lab PC).
*It is also assumed that you have placed the data files (available from the Teaching page of www.gwilympryce.co.uk) .
*in a folder called C:\STATISTICS.
*If the files have been placed in an alternative folder, you will need to change the syntax below accordingly.

*=====================================================================.
*1.2.1 Example 1.3.3a How to Create a Histogram in SPSS (Pryce, p.1-24).
*=====================================================================.

*Open the householddata.sav file.
GET FILE='C:\STATISTICS\householddata.sav'.

GRAPH
  /HISTOGRAM=Dprice
  /TITLE= 'Histogram of House Purchase Price'
  /FOOTNOTE= 'Source: Hypothetical House Price Data'.

GRAPH
  /HISTOGRAM=Bincbasic
  /TITLE= 'Histogram of Basic Income'
  /FOOTNOTE= 'Source: Hypothetical Basic Income Data'.

*Edit the graphs by double clicking on them.

*=====================================================================.
*1.2.2 Example 1.5.6a Computing the Sample Mean in SPSS (Pryce, p.1-33).
*=====================================================================.

*Open the employees.sav data.
GET FILE='C:\STATISTICS\employees.sav'.

*Compute the average number of employees who worked in the firms you sampled.
*The simplest way to compute the mean of a variable is to use the Descriptives function.
DESCRIPTIVES VARIABLES=size
  /STATISTICS=MEAN .

*=====================================================================.
*1.2.3 Example 1.5.6b Computing the Sample Standard Deviation in SPSS (Pryce, p.1-35).
*=====================================================================.

*Open the employees.sav data.
GET FILE='C:\STATISTICS\employees.sav'.

*Compute the standard deviation of the number of employees who worked in the firms you sampled.
DESCRIPTIVES VARIABLES=size
  /STATISTICS= STDDEV.

*If you want to calculate the variance as well, simply add this to the list of statistics you ask SPSS to compute.
DESCRIPTIVES VARIABLES=size
  /STATISTICS= STDDEV VARIANCE.

*=====================================================================.
*1.2.4 Example 1.5.7 How to obtain a Summary of File Info (Pryce, p. 1-37).
*=====================================================================.

*Open up the employees.sav dataset.
GET FILE='C:\STATISTICS\employees.sav'.

*Click on File, Display Data File Information, Working File, or simply run the following syntax.
DISPLAY DICTIONARY.

*=====================================================================.
*1.2.5 Exercise 2.2 Understanding & Calculating Areas under a Density Curve (Pryce, p.2-14).
*=====================================================================.

*=====================================================================.
*1.2.6 Example 2.6a Sampling Distribution of Means (Pryce p. 2-17).
*=====================================================================.

*You have the selling prices of all properties on the south side of Glasgow sold in the second half of 2004.
*i.e. you have information on the population, a total of 3,731 sales.
*1. Open the Glasgow_houseprices_pop_2004q3q4.sav file.
*2. Run descriptives on the sellingprice variable to calculate the population mean.
*3. Run a histogram on the population selling price to verify that house prices do indeed have a non-normal distribution.
*4. Take a random sample of 100 prices from this population and calculate the mean selling price.
*5. Take repeated random samples of 100 properties.
*6. Calculate the mean selling price of each sample.
*7. Plot a histogram of all the sample means.

*1. Open the Glasgow_houseprices_pop_2004q3q4.sav file.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*2. Run descriptives on the sellingprice variable.
DESCRIPTIVES VARIABLES=sellingprice .

*3. Run a histogram on the population selling price to verify that house prices do indeed have a non-normal distribution.
GRAPH
  /HISTOGRAM=sellingprice
  /TITLE= 'Histogram of the Population of House Prices'.

*4. Take a random sample of 100 prices from this population and calculate the mean selling price.
*Highlight all three lines below and run as one command.
TEMPORARY.
SAMPLE 100 from 3731.
DESCRIPTIVES VARIABLES=sellingprice .

*5. Take repeated random samples of 100 properties.
*6. Calculate the mean selling price of each sample.
*7. Plot a histogram of all the sample means.
*You can do this by repeating step 4 above (making a note of the sample mean in each repetition), or you can use the CLT macro.
*### NB ### You need to be able to access your H: drive to be able to run this macro.
*To use the CLT macro, first highlight all of the program below and run as one command by pressing CTRL+R.

*.....................................................................................................................................................................................
*CLT macro for HOME use.
DEFINE CLT (variable = !ENCLOSE('(',')')
           /nsample  = !ENCLOSE('(',')')
           /Npop     = !ENCLOSE('(',')')
           /reps     = !ENCLOSE('(',')') ).
!DO !L = 1 !TO !reps.
- TITLE !reps Repeated Samples of size !nsample .
- temporary.
- sample !nsample from !Npop.
- MATRIX.
- GET VARIABLE / VARIABLES = !variable.
- COMPUTE N = NROW(VARIABLE).
- COMPUTE I = MAKE(n,1,1).
- COMPUTE X_BAR = (1/N)*(TRANSPOS(I) * VARIABLE).
- SAVE {X_BAR} / OUTFILE =!CONCAT('CLT__', !variable, '_sample', !L, '.sav') /VARIABLES = X_BAR.
- END MATRIX.
!DOEND.
GET FILE= !CONCAT('CLT__', !variable, '_sample', '1.sav').
!DO !J = 2 !TO !reps.
- ADD FILES /FILE=* /FILE=!CONCAT('CLT__', !variable, '_sample', !J, '.sav').
- EXECUTE.
!DOEND.
SAVE / OUTFILE =!CONCAT('CLT__n', !nsample, !variable, '_sample', 'ALL', !reps, '.sav') .
TITLE !reps Repeated Samples of size !nsample .
GRAPH /HISTOGRAM=X_BAR /TITLE= 'Histogram of Sample Means from Repeated Samples'.
TITLE !reps Repeated Samples of size !nsample .
DESCRIPTIVES VARIABLES=X_BAR /STATISTICS=MEAN STDDEV MIN MAX .
!ENDDEFINE.
*.....................................................................................................................................................................................

*The CLT macro allows you to draw multiple random samples, and plots a histogram of the means of those samples.
*The macro works by taking a random sample from the file currently in memory, .
*computing the mean and saving the mean of that sample as a separate single-cell data file in your current folder.
*It repeats this until the desired number of samples has been extracted and then combines all the single-cell data files into a single column called X_BAR .
*and saves it as a new dataset.
*It then runs a histogram on the column of sample means contained in the X_BAR variable, and finally computes a table of descriptives.

*To run the macro on sellingprice based on a sample size of 100, a population of 3731 and 120 repetitions, you first need to open your data set.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*Then run the following syntax.
CLT variable=(sellingprice) nsample=(100) Npop=(3731) reps=(120).

*where the items in parentheses can be changed according to the desired specification.
*In this case, the variable we are interested in is sellingprice.
*So this is the name of the variable we have entered in the first set of brackets.
*We want to extract samples of size 100, so we enter ‘100’ in the second set of brackets.
*We need to tell SPSS how large our population is. In this case it is 3,731, which we enter in the third set of brackets.
*Finally, we enter the number of repetitions (the number of samples we want to draw) in the fourth set of brackets.
*Let’s go for 120.

*### NB ### the CLT macro as run above will create 120 files on your H: drive.
*you should delete these using My Computer once you have completed the exercise since subsequent exercises will create more files .
*and you don't want to clog up your H: drive.

*=====================================================================.
*1.2.7 Exercise 2.6b Impact on CLT of Reducing Sample Size (Pryce, p.2-20).
*=====================================================================.

*Open up the Glasgow_houseprices_pop_2004q3q4.sav file and re-run the CLT command with 60 observations.
*Copy and paste the histogram of means into a word processor, then re-run the CLT command again with 30 observations.
*Repeat for 20 observations, 10 and 5. Your syntax should look like this.

*Sample size of 60.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
CLT variable=(sellingprice) nsample=(60) Npop=(3731) reps=(120).

*Sample size of 30.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
CLT variable=(sellingprice) nsample=(30) Npop=(3731) reps=(120).

*Sample size of 20.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
CLT variable=(sellingprice) nsample=(20) Npop=(3731) reps=(120).

*Sample size of 10.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
CLT variable=(sellingprice) nsample=(10) Npop=(3731) reps=(120).

*Sample size of 5.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
CLT variable=(sellingprice) nsample=(5) Npop=(3731) reps=(120).

*=====================================================================.
*1.2.8 Exercise 2.8 Proportions and the CLT (Pryce, p.2-23).
*=====================================================================.

*Open the Glasgow_houseprices_pop_2004q3q4.sav file.
GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*Create a new variable called ‘over100k’ which equals zero if the house price is less than or equal to £100,000 .
*and equals one if the house price is greater than £100,000.
COMPUTE over100k = 0.
IF (sellingprice > 100000) over100k = 1.
EXECUTE.

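*A compact variant, offered as a sketch rather than as part of the original lab.
*A logical expression in COMPUTE evaluates to 1 when true and 0 when false.
*Unlike the COMPUTE/IF pair above, it should leave over100k system-missing where sellingprice is missing.
COMPUTE over100k = (sellingprice > 100000).
EXECUTE.
*A frequency table gives a quick check that only the values 0 and 1 occur.
FREQUENCIES VARIABLES=over100k.
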
*Alternatively, you could use the RECODE command.
RECODE sellingprice (Lowest thru 100000=0) (100000.01 thru Highest=1) INTO over100k .
EXECUTE .

*The mean of this 0/1 variable is the proportion of properties priced over £100K, so the sample mean can be treated as a sample proportion.
DESCRIPTIVES VARIABLES=over100k .

*Run a histogram on over100k.
GRAPH
  /HISTOGRAM=over100k
  /TITLE= 'Histogram: Proportion of House Prices > £100k'.

*Does it look as you expected? Now obtain the sampling distribution of the over100k variable using the CLT syntax.
*(let the sample size = 100, and use two hundred repetitions).
CLT variable=(over100k) nsample=(100) Npop=(3731) reps=(200).

*=====================================================================.
*1.2.9 Additional Exercise: Sampling Distribution of Mean Landvalue .
*(NB: this exercise uses one of the standard SPSS datafiles which come with SPSS 13).
*=====================================================================.

*Open the Home sales [by neighborhood].sav dataset, which is in the C:\Program Files\SPSS\ folder on the lab computers .
*(adjust the path below if SPSS is installed in a different folder on your PC).
GET FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.

*1. Compare the histogram of the population land value .
*with the sampling distribution of mean land value.
*2. Compare the population mean land value .
*with the mean-of-means (i.e. the average value of the sampling distribution of mean land value).
*3. Attempt the exercise using sample sizes of 100, 30, and 5. Draw 140 samples.

*1. and 2.: First compute the population mean and run the population histogram.
GET FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval
  /STATISTICS=MEAN STDDEV.
GRAPH
  /HISTOGRAM=landval
  /TITLE= 'Histogram: landval'.

*3. Then run the CLT syntax for the required sample sizes, based on 140 repetitions in each instance.

*Sample size of 100.
GET FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(100) Npop=(2440) reps=(140).

*Sample size of 30.
GET FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(30) Npop=(2440) reps=(140).

*Sample size of 5.
GET FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(5) Npop=(2440) reps=(140).

*=====================================================================.
*End of exercises.
*=====================================================================.