*=====================================================================.
*HOME COMPUTER Syntax for Lab 1 of Quantitative Methods 1.
*3 Oct 05, v6.
*=====================================================================.

*NB: this file assumes that you are using your own PC (as opposed to a lab PC).
*It also assumes that you have placed the data files (available from the Teaching page of www.gwilympryce.co.uk).
*in a folder called C:\STATISTICS.
*If the files have been placed in an alternative folder, you will need to change the syntax below accordingly.
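
*A possible shortcut, not part of the original lab: make C:\STATISTICS the working folder.
*This sketch assumes your SPSS release supports the CD command; if it does not, skip this line.
*Files saved under relative names (such as those the CLT macro writes further down) should then land in that folder.

  CD 'C:\STATISTICS'.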

*=====================================================================.
*1.2.1 Example 1.3.3a How to Create a Histogram in SPSS (Pryce, p.1-24).
*=====================================================================.

*Open the householddata.sav file .  

  GET FILE='C:\STATISTICS\householddata.sav'.

  GRAPH  /HISTOGRAM=Dprice  /TITLE= 'Histogram of House Purchase Price'
  /FOOTNOTE= 'Source: Hypothetical House Price Data'.

  GRAPH   /HISTOGRAM=Bincbasic  /TITLE= 'Histogram of Basic Income'
  /FOOTNOTE= 'Source: Hypothetical Basic Income Data'.

*Edit the graphs by double clicking on them.

*=====================================================================.
*1.2.2 Example 1.5.6a Computing the Sample Mean in SPSS (Pryce, p.1-33).
*=====================================================================.

* Open the employees.sav data.  
  
   GET FILE='C:\STATISTICS\employees.sav'.

*Compute the average number of employees who worked in the firms you sampled.  

*The simplest way to compute the mean of a variable is to use the Descriptives function.
 
  DESCRIPTIVES VARIABLES=size  /STATISTICS=MEAN .


*=====================================================================.
*1.2.3 Example 1.5.6b Computing the Sample Standard Deviation in SPSS (Pryce, p.1-35).
*=====================================================================.

* Open the employees.sav data. 
  
   GET FILE='C:\STATISTICS\employees.sav'.

*Compute the standard deviation of employees who worked in the firms you sampled.

  DESCRIPTIVES VARIABLES=size  /STATISTICS= STDDEV.

*If you want to calculate the variance as well, simply add this to the list of statistics you ask SPSS to compute.

  DESCRIPTIVES VARIABLES=size  /STATISTICS= STDDEV   VARIANCE.




*=====================================================================.
*1.2.4 Example 1.5.7 How to obtain a Summary of File Info (Pryce, p. 1-37).
*=====================================================================.

*Open up the employees.sav dataset.

  GET FILE='C:\STATISTICS\employees.sav'.

*Click on File, Display Data File Information, Working File, or simply run the following syntax.

  DISPLAY DICTIONARY.


*=====================================================================.
*1.2.5 Exercise 2.2 Understanding & Calculating Areas under a Density Curve (Pryce, p.2-14).
*=====================================================================.

 
*=====================================================================.
*1.2.6 Example 2.6a Sampling Distribution of Means (Pryce p. 2-17).
*=====================================================================.

*You have the selling prices of all properties on the south side of Glasgow sold in the second half of 2004.  
*i.e. you have information on the population, a total of 3,731 sales.  

*1. Open the Glasgow_houseprices_pop_2004q3q4.sav file.
*2. Run descriptives on the sellingprice variable to calculate the population mean.
*3. Run a histogram on the population selling price to verify that house prices do indeed have a non-normal distribution.
*4. Take a random sample of 100 prices from this population and calculate the mean selling price.  
*5. Take repeated random samples of 100 properties.
*6. Calculate the mean selling price of each sample.
*7. Plot a histogram of all the sample means.

*1. Open the Glasgow_houseprices_pop_2004q3q4.sav file.

  GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*2. Run descriptives on the sellingprice variable. 

  DESCRIPTIVES   VARIABLES=sellingprice .

*3. Run a histogram on the population selling price to verify that house prices do indeed have a non-normal distribution.

  GRAPH /HISTOGRAM=sellingprice /TITLE= 'Histogram of the Population of House Prices'.
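
*A possible extra check, not part of the original lab: overlay a normal curve on the population histogram.
*This sketch assumes the (NORMAL) keyword of GRAPH /HISTOGRAM is available in your SPSS release.
*A clear mismatch between the bars and the curve suggests a non-normal distribution.

  GRAPH /HISTOGRAM(NORMAL)=sellingprice /TITLE= 'Population of House Prices with Normal Curve'.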

*4. Take a random sample of 100 prices from this population and calculate the mean selling price.  
*Highlight all three lines below and run them together.

  TEMPORARY.
  SAMPLE 100 from 3731.
  DESCRIPTIVES   VARIABLES=sellingprice .
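
*A hedged extra, not part of the original lab: fix the random number seed so that the same sample is drawn on every run.
*The seed value 20051003 is an arbitrary choice, and the sketch assumes the default random number generator is in use.
*Highlight all four lines below and run them together.

  SET SEED=20051003.
  TEMPORARY.
  SAMPLE 100 from 3731.
  DESCRIPTIVES   VARIABLES=sellingprice .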
 
*5. Take repeated random samples of 100 properties.
*6. Calculate the mean selling price of each sample.
*7. Plot a histogram of all the sample means.

*You can do this by repeating step 4 above (making a note of the sample mean in each repetition), or you can use the CLT macro.

*### NB ### The macro saves its files in your current folder, so you need write access to that folder to run it.

*To use the CLT macro, first highlight all of the program below and run as one command by pressing CTRL+R.

*.....................................................................................................................................................................................
*CLT macro for HOME use.
DEFINE CLT (variable = !ENCLOSE('(',')') /nsample = !ENCLOSE('(',')') /Npop = !ENCLOSE('(',')') /reps = !ENCLOSE('(',')') ).
!DO !L = 1 !TO !reps.
- TITLE !reps Repeated Samples of size !nsample .
- temporary.
- sample !nsample from !Npop.
- MATRIX.
- GET VARIABLE / VARIABLES = !variable.
- COMPUTE N = NROW(VARIABLE).
- COMPUTE I = MAKE(n,1,1).
- COMPUTE X_BAR = (1/N)*(TRANSPOS(I) * VARIABLE).
- SAVE {X_BAR} / OUTFILE =!CONCAT('CLT__', !variable, '_sample', !L, '.sav')  /VARIABLES = X_BAR.
- END MATRIX.
!DOEND.
GET FILE= !CONCAT('CLT__', !variable, '_sample', '1.sav').
!DO !J = 2 !TO !reps.
- ADD FILES /FILE=*
/FILE=!CONCAT('CLT__', !variable, '_sample', !J, '.sav').
- EXECUTE.
!DOEND.
SAVE / OUTFILE =!CONCAT('CLT__n', !nsample, !variable, '_sample', 'ALL', !reps, '.sav') .
TITLE !reps Repeated Samples of size !nsample .
GRAPH /HISTOGRAM=X_BAR /TITLE= 'Histogram of Sample Means from Repeated Samples'.
TITLE !reps Repeated Samples of size !nsample .
DESCRIPTIVES VARIABLES=X_BAR /STATISTICS=MEAN STDDEV MIN MAX .
!ENDDEFINE.
*.....................................................................................................................................................................................

*The CLT macro draws repeated random samples and plots a histogram of the means of those samples.
*It takes a random sample from the file currently in memory and computes the sample mean.
*It saves that mean as a separate single-cell data file in your current folder.
*It repeats this until the desired number of samples has been drawn.
*It then combines all the single-cell data files into a single column called X_BAR and saves it as a new dataset.
*Finally, it runs a histogram on the X_BAR column of sample means and computes a table of descriptives.

*To run the macro on sellingprice based on a sample size of 100, a population of 3731 and 120 repetitions, first open your data set.

  GET FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*Then run the following syntax.

  CLT variable=(sellingprice) nsample=(100) Npop=(3731) reps=(120).

*where the items in parentheses can be changed according to the desired specification.
*In this case, the variable we are interested in is sellingprice.  
*So this is the name of the variable we have entered in the first set of brackets.  
*We want to extract samples of size 100, so we enter ‘100’ in the second set of brackets.  
*We need to tell SPSS how large our population is.  In this case it is 3,731, which we enter in the third set of brackets.  
*Finally, we enter the number of repetitions – the number of samples we want to draw – in the fourth set of brackets.  
*Let’s go for 120.

*### NB ### The CLT macro as run above will create 120 single-cell files, plus one combined file, in your current folder.
*You should delete these using My Computer once you have completed the exercise, since subsequent exercises will create more files.
*Otherwise the folder will soon become cluttered.
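
*A possible alternative to deleting the files by hand, not part of the original lab: the ERASE command.
*This sketch assumes the files sit in the current working folder under the names the macro builds.
*For example, the first sample of sellingprice would be saved as CLT__sellingprice_sample1.sav.
*Repeat the line below, changing the sample number, for each file you want to remove.

  ERASE FILE='CLT__sellingprice_sample1.sav'.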

*=====================================================================.
*1.2.7 Exercise 2.6b Impact on CLT of Reducing Sample Size (Pryce, p.2-20).
*=====================================================================.

*Open the Glasgow_houseprices_pop_2004q3q4.sav file and re-run the CLT command with a sample size of 60.
*Copy and paste the histogram of means into a word processor, then re-run the CLT command with a sample size of 30.
*Repeat for sample sizes of 20, 10 and 5. Your syntax should look like this.

*Sample size of 60.
  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
  CLT variable=(sellingprice) nsample=(60) Npop=(3731) reps=(120).

*Sample size of 30.
  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
  CLT variable=(sellingprice) nsample=(30) Npop=(3731) reps=(120).

*Sample size of 20.
  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
  CLT variable=(sellingprice) nsample=(20) Npop=(3731) reps=(120).

*Sample size of 10.
  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
  CLT variable=(sellingprice) nsample=(10) Npop=(3731) reps=(120).

*Sample size of 5.
  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.
  CLT variable=(sellingprice) nsample=(5) Npop=(3731) reps=(120).

 	 	 

*=====================================================================.
*1.2.8 Exercise 2.8 Proportions and the CLT (Pryce, p.2-23).
*=====================================================================.

*Open the Glasgow_houseprices_pop_2004q3q4.sav. 

  GET  FILE='C:\STATISTICS\Glasgow_houseprices_pop_2004q3q4.sav'.

*Create a new variable called ‘over100k’ which equals zero if the house price is less than or equal to £100,000.
*and equals one if the house price is greater than £100,000.

  COMPUTE over100k = 0.
  IF(sellingprice > 100000) over100k = 1.
  EXECUTE.

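*Alternatively, you could use the RECODE command.
*RECODE applies the first matching range, so a price of exactly 100000 is coded 0 and anything above it is coded 1.

  RECODE  sellingprice  (Lowest thru 100000=0)  (100000 thru Highest=1)  INTO  over100k .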
  EXECUTE .

*If you calculate the average of this variable it will give you the proportion of properties over £100K, so you can treat it as a proportion.

  DESCRIPTIVES   VARIABLES=over100k  .
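
*A possible cross-check, not part of the original lab: a frequency table of over100k.
*The valid percent reported for the value 1 should equal the mean from DESCRIPTIVES expressed as a percentage.

  FREQUENCIES VARIABLES=over100k .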

*Run a histogram on over100k.  

  GRAPH /HISTOGRAM=over100k /TITLE= 'Histogram: Proportion of House Prices > £100k'.

*Does it look as you expected? Now obtain the sampling distribution of the over100k variable using the CLT syntax.
*(let the sample size = 100, and use two hundred repetitions).  

  CLT   variable=(over100k)   nsample=(100)   Npop=(3731)   reps=(200).


*=====================================================================.
*1.2.9 Additional Exercise: Sampling Distribution of Mean Landvalue .
*(NB: this exercise uses one of the standard SPSS datafiles which come with SPSS 13).
*=====================================================================.
*Open the Home sales [by neighborhood].sav dataset, which is in the C:\Program Files\SPSS\ folder on the lab computers.

GET  FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.

*1. Compare the histogram of the population land value .
*with the sampling distribution of mean land value.

*2. Compare the population mean land value .
*with the mean-of-means (i.e. the average value of the sampling distribution of mean land value).

*3. Attempt the exercise using sample sizes of 100, 30, and 5. Draw 140 samples.

*1. and 2.: First compute the population mean and run the population histogram.

GET  FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval  /STATISTICS=MEAN STDDEV.
GRAPH /HISTOGRAM=landval /TITLE= 'Histogram: landval'.

*3. Then run the CLT syntax for the required sample sizes, based on 140 repetitions in each instance.

*Sample size of 100.
GET  FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval  /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(100) Npop=(2440) reps=(140).

*Sample size of 30.
GET  FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval  /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(30) Npop=(2440) reps=(140).

*Sample size of 5.
GET  FILE='C:\Program Files\SPSS\Home sales [by neighborhood].sav'.
DESCRIPTIVES VARIABLES=landval  /STATISTICS=MEAN STDDEV.
CLT variable=(landval) nsample=(5) Npop=(2440) reps=(140).



*=====================================================================.

*End of exercises.

*=====================================================================.