2. What’s in this PowerPoint?
• Why learn statistics?
• Two Perspectives of Statistics
• Descriptive Statistics
• Inferential Statistics
3. Why is my evil lecturer forcing me to learn statistics?
4. Why oh why?
• What do you learn in this class?
Research
• What is research?
To answer some interesting questions
• How do you answer the research questions?
Collect data
Explain & analyze the data
• Numbers = data
6. So you’ve formed a hypothesis…
• Let’s identify the variables
• For example:
Research Question
• Is there a relationship between gender and English
competence?
Hypothesis
• There is a correlation between gender and English
competence
Variables?
9. Measuring Variables
Variables
• Categorical
Binary: only 2 categories
Nominal: more than 2 categories
Ordinal: categories with a logical order; the size of the difference between categories is not meaningful
• Continuous
Interval: equal intervals represent equal differences
Ratio: differences are meaningful and there is a clear, natural zero
10. So what levels of measurement are our variables?
Gender
• Categorical?
Binary? Male vs. Female
Nominal? Male vs. Female vs.
Gay vs. Lesbian
Ordinal? No!
• Continuous?
Interval? No!
Ratio? No!
English Competence
• Categorical?
Binary? No..
Nominal? No..
Ordinal? Beginner vs.
Intermediate vs. Advanced
(but…)
• Continuous?
Interval? GPA 1.5-4
Ratio? 0-100
11. But why do we need to know these?
• Statistics is about explaining the data in meaningful ways and in as much detail as possible
Meaningful
• Clear (female is not male; a GPA of 3.00 > 1.50, but someone with a GPA of 3.00 is not twice as smart) → descriptive statistics
Detailed
• More accurate analyses, more accurate explanation of the population → inferential statistics
12. Golden Rule
• Aim for a higher level of measurement
From lowest to highest: Binary, Nominal, Ordinal, Interval, Ratio (preferred)
14. Data – what is it?
• In quantitative research, data mostly consist of numbers, or of words that are converted to numbers (such as in discourse analysis)
15. How to prepare your data?
• Use tools!
Calculator – um, really?
MS Excel
SPSS
• Why Excel?
Ubiquitous
Often already installed
Easy to use
Can be converted to SPSS for more detailed analyses
16. Preparing the Data in MS Excel
• Open the file “Statistics-Complete.xls”
• Columns = variables
• Rows = cases
• Cell address
Columns are lettered: A, B, C, …
Rows are numbered: 1, 2, 3, …
Example: A2 = column A, row 2
• First row = names of the variables (for analysis)
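The same layout (first row = variable names, each later row = one case) can be sketched outside Excel. This is only an illustration: the values below are made up and are not taken from the course file.

```python
import csv
import io

# Hypothetical sample: the first row holds the variable names,
# every following row is one case (one student).
raw = """gender,competence
male,72
female,81
female,65
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# A row (case) maps each variable name to that case's value.
first_case = rows[0]

# A column (variable) is the same field collected across all cases.
competence = [int(row["competence"]) for row in rows]
```

Reading by variable name rather than by position is the reason the first row needs clean variable names.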
18. Two perspectives
• Descriptive Statistics
To describe or summarize the data
Covers the results of the collected data only
• Inferential Statistics
To make inferences about the population from
the data (sample)
20. How do you describe data?
Data description
• By itself (size)
Frequency (how many / how often)
Percentage (how big)
• Against each other
Central tendency (how they are placed): Mean, Median, Mode
Dispersion (how they are spread): Low vs. High, Range, Standard Deviation
• Against the population
Normal distribution
Kurtosis
Skewness
21. Let’s learn and practice
• See the file “Statistics-complete.xls”
• You will find the data for the variables “gender” and
“competence”
• Variable in columns, cases in rows
• Variable naming rules (for exporting to SPSS)
Short and explanatory
Must be unique
No spaces or blanks, and none of the characters !, ?, ' or *
Must begin with a letter; the remaining characters can be letters, digits, full stops, or the symbols @, #, _ or $
Cannot end with a full stop or an underscore
Not case sensitive
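A minimal sketch of how the naming rules above could be checked automatically before exporting. The function names are hypothetical, and the check covers only the character rules listed on this slide, not every restriction SPSS may apply.

```python
import re

# First character: a letter. Remaining characters: letters, digits,
# full stops, or the symbols @ # _ $. The lookbehind rejects names
# that end with a full stop or an underscore.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*(?<![._])")


def is_valid_name(name: str) -> bool:
    """Check one variable name against the character rules on the slide."""
    return NAME_PATTERN.fullmatch(name) is not None


def find_duplicates(names: list[str]) -> set[str]:
    """Return names that clash once case is ignored (names are not case sensitive)."""
    seen: set[str] = set()
    clashes: set[str] = set()
    for name in names:
        key = name.lower()
        if key in seen:
            clashes.add(key)
        seen.add(key)
    return clashes
```

For example, "eng_score1" passes, while "eng score", "1score" and "score_" are rejected, and "Gender" clashes with "gender".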
22. Using Formulas in MS Excel
• Go to the “Formulas” tab
Click the fx icon, “Insert Function”
• Go to the formula bar
Click the fx icon, then choose from the dropdown menu
• Type “=” in the formula bar, followed by the formula
(a pop-up text will guide you on how the formula should be written)
23. How do you describe data? By Itself
• Frequency – how many? How often?
A.k.a. tallies: counting up the number of things or people in different categories
• Raw frequencies
COUNT – the number of cases (e.g. how many
cases)
COUNTIF – the number of cases based on certain
conditions (e.g. how many males/females)
SUM – the total of certain numbers (e.g. combining
2 variables)
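For comparison, here is roughly what the three functions do, sketched in Python. The data are hypothetical, and the variable names are chosen only for this illustration.

```python
# Hypothetical data: one entry per case.
gender = ["male", "female", "female", "male", "female"]
competence = [72, 81, 65, 58, 90]

# COUNT: how many cases hold a numeric value.
n_cases = sum(isinstance(score, (int, float)) for score in competence)

# COUNTIF: how many cases meet a condition.
n_female = sum(1 for g in gender if g == "female")

# SUM: the total of the values.
total_score = sum(competence)
```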
24. How do you describe data? By Itself
• Group Sum/Percentage – how big?
Raw frequencies can be converted into
percentages
Graphical display of data (e.g. pie charts)
Other ways to display data (histogram, line chart)
• How?
Group the data using COUNTIF
Insert a chart using the “Insert” tab | “Column” or “Pie”
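Converting raw frequencies into percentages is a single division per group. A small sketch with made-up data:

```python
from collections import Counter

# Hypothetical data: one entry per case.
gender = ["male", "female", "female", "male", "female"]

counts = Counter(gender)          # raw frequency per group
total = sum(counts.values())      # number of cases overall

# Percentage = group frequency / total * 100
percentages = {group: 100 * n / total for group, n in counts.items()}
```

These percentages are the numbers a pie chart would display as slices.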
25. How do you describe your data? Against each other
• Central Tendency – how are they placed
among each other?
The tendency of a set of numbers to cluster
around a particular value (Brown)
What are they?
• Mean
• Mode
• Median
26. How do you describe your data? Against each other
Mean
A.k.a. average
Sum of all values in a distribution divided by the
number of values
AVERAGE
27. How do you describe your data? Against each other
Mode
• The most frequently occurring value in a set of numbers
• MODE
28. How do you describe your data? Against each other
Median
• The middle value
• The data need to be sorted from lowest to highest
• MEDIAN
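The three measures of central tendency can be cross-checked with Python's standard `statistics` module. The scores below are hypothetical.

```python
import statistics

# Hypothetical competence scores, one per case.
scores = [50, 60, 60, 70, 80]

mean_score = statistics.mean(scores)      # like AVERAGE: sum / number of values
mode_score = statistics.mode(scores)      # like MODE: most frequent value
median_score = statistics.median(scores)  # like MEDIAN: middle value (sorts internally)
```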
29. How do you describe your data? Against each other
• Dispersion
To what extent the individual values vary away
from the central tendency
What are they?
• Low-High
• Range
• Standard Deviation
30. How do you describe your data? Against each other
Low-High
• The lowest and the highest values
• MIN, MAX
Range
• The highest – the lowest + 1
• Input the MIN and MAX and calculate
Standard Deviation
• To what extent a set of scores varies in
relation to the mean
• STDEV
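A short sketch of the dispersion measures, again with hypothetical scores. It follows the range definition given on this slide (highest - lowest + 1); note that many textbooks define the range without the + 1. `statistics.stdev` computes the sample standard deviation, which is assumed here to be what STDEV reports.

```python
import statistics

# Hypothetical competence scores, one per case.
scores = [50, 60, 60, 70, 80]

lowest, highest = min(scores), max(scores)  # MIN, MAX

# Range as defined on the slide; drop the "+ 1" for the more common definition.
score_range = highest - lowest + 1

# Sample standard deviation (divides by n - 1).
sd = statistics.stdev(scores)
```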
31. How do you describe your data? Against the population
Normal Distribution – how representative are they?
A.k.a. Bell Curve
How the values usually disperse in a real population
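Share of values in each band of the curve (M = mean, SD = standard deviation):
-3 SD to -2 SD: 2.14%
-2 SD to -1 SD: 13.59%
-1 SD to M: 34.13%
M to +1 SD: 34.13%
+1 SD to +2 SD: 13.59%
+2 SD to +3 SD: 2.14%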
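These band percentages come from the area under the standard normal curve. As a quick check, they can be reproduced with the error function; the helper names below are made up for this sketch.

```python
import math


def normal_cdf(z: float) -> float:
    """Share of a standard normal distribution lying at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def band_percentage(low_sd: float, high_sd: float) -> float:
    """Percentage of values between two points measured in SDs from the mean."""
    return 100 * (normal_cdf(high_sd) - normal_cdf(low_sd))


bands = [(-3, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3)]
percentages = [round(band_percentage(low, high), 2) for low, high in bands]
```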
32. How do you describe your data? Against the population
Kurtosis
• How peaked or flat the curve is
• The more positive, the more peaked
Skewness
• A few values are much larger or smaller than the
typical values found in the data set
• Negative vs. positive
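A rough sketch of how skewness and kurtosis can be computed from the data. It uses the simple moment-based formulas, which is an assumption of this example: spreadsheet functions such as SKEW and KURT apply small-sample corrections, so their results will differ slightly.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """0 for symmetric data; positive when a few values are much larger."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """0 for a normal curve; positive = more peaked, negative = flatter."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3
```

For example, `skewness([1, 1, 1, 1, 10])` is positive because one value is much larger than the rest, while `skewness([1, 2, 3, 4, 5])` is 0.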
33. Checking Normality in MS Excel
• Create the bins (percentile cut points for your data)
• Sort your data from the lowest to the highest
• Number the cases (nth data point), e.g. 81 is the 20th data point
34. Using Normality Percentage
1. Remember the percentages of normality and their cumulative percentages
• 2.14% (lowest): cumulative 2.14%
• 13.59% (low): cumulative 15.73% (2.14 + 13.59)
• 68.26% (mid): cumulative 83.99% (2.14 + 13.59 + 68.26)
• 13.59% (high): cumulative 97.58%
• 2.14% (highest): cumulative 100% (rounded up from 99.72%)
2. Convert the data to meet the percentages of normality (e.g. the file has 20 data cases, so 20 is 100%, 19.516 is 97.58%, and so on).
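The arithmetic in steps 1 and 2 can be sketched as follows, assuming 20 data cases as in the example. Treating the last band as exactly 100% follows the slide; the raw sum is 99.72%.

```python
band_percentages = [2.14, 13.59, 68.26, 13.59, 2.14]
n_cases = 20  # number of data cases in the example

# Step 1: running (cumulative) percentages.
cumulative = []
running = 0.0
for p in band_percentages:
    running += p
    cumulative.append(round(running, 2))

# The slide treats the final band as reaching 100%.
cumulative[-1] = 100.0

# Step 2: the position in the sorted data that each percentage corresponds to.
positions = [round(n_cases * c / 100, 3) for c in cumulative]
```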
35. Using Normality Percentage
3. Identify the bin numbers (cut points)
E.g. 100% is the 20th data case (81 in the file); 97.58% is approximately the 19th data case (79)
4. Count how many times the data occur within the bin numbers [FREQUENCY]: 46-47 pts = 1 time, 46-52 pts = 2 times, and so on; the final bin, 81, should be 20 times
5. Count the number of data in each bin: under 47 is 1 score, 47-52 is 2 scores, and so on.
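Steps 4 and 5 amount to counting scores against the cut points, first cumulatively and then per bin. Below is a minimal sketch; the scores and cut points are invented for illustration and are not the contents of the course file. Each bin counts the scores above the previous cut point and at or below its own, which is assumed here to match how FREQUENCY groups values.

```python
from bisect import bisect_right


def cumulative_counts(sorted_scores, cut_points):
    """Number of scores at or below each cut point."""
    return [bisect_right(sorted_scores, cut) for cut in cut_points]


def per_bin_counts(sorted_scores, cut_points):
    """Number of scores above the previous cut point and at or below this one."""
    cumulative = cumulative_counts(sorted_scores, cut_points)
    previous = [0] + cumulative[:-1]
    return [now - before for before, now in zip(previous, cumulative)]


# Hypothetical sorted scores (20 cases) and cut points.
scores = sorted([46, 50, 52, 55, 58, 60, 61, 63, 64, 66,
                 67, 68, 70, 71, 73, 74, 76, 78, 79, 81])
cut_points = [47, 52, 75, 79, 81]

running_totals = cumulative_counts(scores, cut_points)
bin_sizes = per_bin_counts(scores, cut_points)
```

The last running total always equals the number of cases, which is a handy check that no score was left out.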
36. Using Mean (Average) & Standard Deviation
1. Remember that normality can be calculated using the average +/- the standard deviation (-3 to 3)
2. Calculate the bin numbers using the formulas:
M +/- (3*SD)
M +/- (2*SD)
M +/- (1*SD)
• Follow Steps 4 & 5 of “Using Normality Percentage”
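The bin numbers from the M +/- (k*SD) formulas can be sketched like this. The function name is hypothetical, and the sample standard deviation is assumed, as with STDEV.

```python
import statistics


def sd_cut_points(scores):
    """Cut points at M - 3SD, M - 2SD, M - 1SD, M + 1SD, M + 2SD, M + 3SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (assumption)
    return [mean + k * sd for k in (-3, -2, -1, 1, 2, 3)]


# Hypothetical competence scores.
cut_points = sd_cut_points([50, 60, 60, 70, 80])
```

The resulting list is already in ascending order, so it can be used directly as the bin numbers.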
37. Generating the histogram
1. Select the data under ‘number of data’
2. In the menu bar, click Insert | Column | 2D-Column
3. To make the histogram clearer, click the whole histogram, then right-click and choose ‘Select Data’
• In ‘Horizontal (Category) Axis Labels’, click ‘Edit’
• In the ‘Axis Label Range’ bar, select the bin numbers, then click ‘OK’ and ‘OK’
4. To add the trendline, select a bar (yellow or green) and click ‘Add Trendline’
• In ‘Trendline Options’, select ‘Polynomial’ and adjust the order (1/2/3/4) until it shows the normality line
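To see the shape of the distribution without a chart, the per-bin counts can also be drawn as a plain-text histogram. This is only a rough stand-in for the Excel chart; the function name and the sample values are made up.

```python
def text_histogram(labels, counts):
    """One line per bin: the bin label, then one '#' per data case."""
    return [f"{label:>4} | {'#' * count}" for label, count in zip(labels, counts)]


# Hypothetical bin numbers and per-bin counts.
for line in text_histogram([47, 52, 75, 79, 81], [1, 2, 13, 3, 1]):
    print(line)
```

A roughly normal data set shows short bars at both ends and the longest bar in the middle.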
38. Too complicated? Let’s try the smart way
• Activate Add-ins for Statistical Procedures
39. Smart Way…
• Once activated, you should have something
like this in your Menu:
40. How to do descriptive statistics?
• Menu | Data Analysis | Descriptive Statistics
• Select the data range that you want as an
Input Range
• Select the output range
• Tick Summary Statistics
• Voila!