Lecture One
· Statistics:
· Descriptive:
· What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).
· How? Tabular, graphical, and numerical.
· Inferential:
· (Probability) sample → population (of interest)
· Sample statistic → population parameter
· How? Confidence interval, hypothesis testing.
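A minimal Python sketch of the inferential logic, assuming a simulated population (the numbers are invented, not real data): draw a random sample, compute the sample statistic, and use it to estimate the population parameter with an approximate 95% confidence interval. The 1.96 critical value is the large-sample normal approximation, and all variable names are illustrative.

```python
import math
import random
import statistics

# Hypothetical population of 10,000 incomes (in $1,000s); simulated, not real data.
random.seed(1)
population = [random.gauss(50, 15) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter (unknown in real research)

# Draw a simple random sample (a probability sample) of n = 200 cases.
n = 200
sample = random.sample(population, n)
xbar = statistics.mean(sample)  # sample statistic
s = statistics.stdev(sample)    # sample standard deviation
se = s / math.sqrt(n)           # estimated standard error of the mean

# Approximate 95% confidence interval using the normal critical value 1.96.
lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(f"parameter mu = {mu:.2f}")
print(f"statistic xbar = {xbar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

In practice `mu` is unknown; it is computed here only so the sample statistic can be compared with it. Rerunning with different seeds shows that roughly 95% of such intervals cover `mu`.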
· Variable: a characteristic that varies across cases. That is, different cases have different values. [draw data structure]
· Unit of analysis: the entity (e.g., individual, department, college) that the statement/variable is about. To see why this matters, consider the following hypothetical example:
· Are female applicants less likely to get admitted than male applicants at the department level?
· Are female applicants less likely to get admitted than male applicants at the college level?
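The following Python sketch uses invented admission counts (not real data) to show why the unit of analysis matters: the same records can give opposite answers at the department level and at the college level when women apply mostly to the department with the lower admission rate. This reversal is commonly called Simpson's paradox. The department names and the helper `rate` are hypothetical.

```python
# Invented counts for illustration: (applicants, admitted) by department and gender.
admissions = {
    "Dept A": {"female": (20, 13), "male": (80, 48)},  # less selective department
    "Dept B": {"female": (80, 20), "male": (20, 4)},   # more selective department
}


def rate(applied, admitted):
    """Share of applicants who were admitted."""
    return admitted / applied


# Department level: compare women and men within each department.
dept_rates = {
    dept: {gender: rate(*counts) for gender, counts in groups.items()}
    for dept, groups in admissions.items()
}

# College level: pool applicants across departments, then compare.
totals = {}
for groups in admissions.values():
    for gender, (applied, admitted) in groups.items():
        prev_applied, prev_admitted = totals.get(gender, (0, 0))
        totals[gender] = (prev_applied + applied, prev_admitted + admitted)
college_rates = {gender: rate(*counts) for gender, counts in totals.items()}

for dept, rates in dept_rates.items():
    print(f"{dept}: female {rates['female']:.0%}, male {rates['male']:.0%}")
print(f"College: female {college_rates['female']:.0%}, male {college_rates['male']:.0%}")
```

With these counts, women have the higher admission rate in each department (65% vs. 60% in Dept A; 25% vs. 20% in Dept B) but the lower rate at the college level (33% vs. 52%).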
· Scale of measurement:
· In a dataset, variables are usually coded with numeric values, but those values serve different functions.
· Nominal: values are used for differentiation only. Race, gender, marital status, etc.
· Ordinal: the rank order of values is meaningful, but distances between values are not. Life satisfaction, level of support, etc.
· Interval: distances between values are meaningful. HDI, IQ, temperature (°F or °C), etc.
· Ratio: meaningful distances plus a true zero, i.e., 0 means “none”. Income in dollars, number of children, etc.
· Usually, we call both interval and ratio variables continuous variables.
· Why important? It determines which statistical techniques are appropriate.
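Because the level of measurement determines which summaries make sense, a small Python sketch can make the point concrete. The function `center` and the coded data below are hypothetical; the mapping (mode for nominal, median for ordinal, mean for interval/ratio) is the usual textbook rule of thumb, not a complete list of appropriate techniques.

```python
import statistics


def center(values, level):
    """Return a measure of center suited to the level of measurement.

    Simplified rule of thumb for illustration (an assumption, not an exhaustive guide):
    mode for nominal, median for ordinal, mean for interval/ratio.
    """
    if level == "nominal":
        return statistics.mode(values)        # codes are labels only
    if level == "ordinal":
        return statistics.median_low(values)  # order matters; returns an observed category
    if level in ("interval", "ratio"):
        return statistics.mean(values)        # distances between values are meaningful
    raise ValueError(f"unknown level of measurement: {level!r}")


# Hypothetical coded data (invented for illustration).
marital_status = [1, 2, 2, 3, 2, 1]        # 1 = married, 2 = never married, 3 = divorced
life_satisfaction = [2, 4, 5, 3, 4, 4, 1]  # 1 = very dissatisfied ... 5 = very satisfied
num_children = [0, 2, 1, 3, 2]

print(center(marital_status, "nominal"))
print(center(life_satisfaction, "ordinal"))
print(center(num_children, "ratio"))
```

Nothing stops you from computing the mean of `marital_status` (about 1.83), but the number has no meaning because the codes are only labels. `median_low` is used for ordinal data so the result is always an actual category rather than an average of two categories.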
· [optional] Reliability and validity issues:
· Reliability: consistency across repeated measurements.
· Validity:
· Construct validity: are we measuring what we want to measure?
· Internal validity: causality = association + temporal order + lack of spuriousness.
· External validity: generalizability.
· Reliability and construct validity are criteria for evaluating measurement.
· Internal and external validity are criteria for evaluating research designs.
· There are other kinds of validity, and the terms mean different things to different researchers.
· More on variables:
· Why variables? Population heterogeneity == the fundamental truth of social science.
· Essentialism vs. population thinking: is variation REAL?
· Data structure: row == case, column == variable
· What method to use: depends on the scale of measurement/type of the variable(s)
· What it is about: unit of analysis
· Variables in relationship: independent variable (a.k.a. predictor, explanatory variable, exogenous variable) → dependent variable (a.k.a. response, outcome, endogenous variable)
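The data structure (row == case, column == variable) can be sketched in plain Python. The column names and values are invented for illustration, and the unit of analysis here is the individual respondent. A column whose value is identical for every case is a constant, not a variable.

```python
# Hypothetical data matrix: each row is a case (a respondent),
# each column is a variable. Values are invented for illustration.
columns = ["id", "educ_years", "income_k", "survey_year"]
rows = [
    ("R1", 12, 38, 2020),
    ("R2", 16, 55, 2020),
    ("R3", 14, 41, 2020),
    ("R4", 16, 62, 2020),
]


def column(name):
    """Return one variable: its values across all cases."""
    j = columns.index(name)
    return [row[j] for row in rows]


def varies(values):
    """True if different cases take different values."""
    return len(set(values)) > 1


x = column("educ_years")  # candidate independent variable
y = column("income_k")    # candidate dependent variable (income in $1,000s)

for name in columns[1:]:
    status = "variable" if varies(column(name)) else "constant"
    print(f"{name}: {status}")
```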
· Univariate description:
· The choice of methods (mainly) depends on the level of measurement.
· Frequency Table: title, frequency count (f), relative frequency (p), cumulative relative frequency (only meaningful when values are ordered)
· Bar Chart: title, axes, labels, bars, (spacing b/w bars)
· Histogram: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness)
· Contingency Table (two variables): cross-classification, cell counts, marginal totals, row/column percentages.
· To be continued…
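The tabular methods above can be sketched in plain Python. The survey item, the applicant data, and the helper names `frequency_table` and `crosstab` are hypothetical; the sketch computes f, p, and cumulative p for one ordinal variable, then cell counts, marginal totals, and row percentages for a two-variable contingency table. Bar charts and histograms are omitted to keep the example free of plotting libraries.

```python
from collections import Counter

# Hypothetical responses to an ordinal survey item (invented data).
ORDER = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
responses = [
    "Agree", "Neutral", "Agree", "Strongly agree", "Disagree",
    "Agree", "Neutral", "Strongly disagree", "Agree", "Strongly agree",
]


def frequency_table(values, categories):
    """Rows of (category, f, p, cumulative p).

    Cumulative p is only meaningful when the categories have an order
    (ordinal or higher), so `categories` must be listed in that order.
    """
    counts = Counter(values)
    n = len(values)
    table, cum_f = [], 0
    for cat in categories:
        f = counts[cat]
        cum_f += f
        table.append((cat, f, f / n, cum_f / n))
    return table


def crosstab(pairs, row_cats, col_cats):
    """Cross-classify (row, column) pairs: cell counts, marginals, row percentages."""
    cells = Counter(pairs)
    row_totals = {r: sum(cells[(r, c)] for c in col_cats) for r in row_cats}
    col_totals = {c: sum(cells[(r, c)] for r in row_cats) for c in col_cats}
    row_pct = {
        r: {c: (100 * cells[(r, c)] / row_totals[r] if row_totals[r] else 0.0)
            for c in col_cats}
        for r in row_cats
    }
    return cells, row_totals, col_totals, row_pct


freq = frequency_table(responses, ORDER)
for cat, f, p, cum_p in freq:
    print(f"{cat:<18} f={f:>2}  p={p:.2f}  cum p={cum_p:.2f}")

# Hypothetical (gender, admitted) pairs for ten applicants.
pairs = ([("female", "yes")] * 3 + [("female", "no")] * 2
         + [("male", "yes")] * 2 + [("male", "no")] * 3)
cells, row_totals, col_totals, row_pct = crosstab(pairs, ["female", "male"], ["yes", "no"])
for r in ["female", "male"]:
    print(r, {c: f"{row_pct[r][c]:.0f}%" for c in ["yes", "no"]}, "n =", row_totals[r])
print("column totals:", col_totals)
```

Row percentages answer "within each gender, what share was admitted?", which is the comparison usually wanted when gender is treated as the independent variable.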
· Bivariate relationship:
· Association: X is associated/correlated with Y.
· Influence: X has an impact on Y.
· Causality: X causes Y.
· (Randomized Controlled) Experiment: random assignment
· Observational Study: association/influence + temporal order + no spuriousness
· Important implication: association or influence ≠ causality
· A third variable Z might complicate the observed bivariate relationship b/w X and Y:
· Spuriousness: the observed XY relationship is due to a common cause Z.
· Interaction/specification/modification: the observed XY relationship differs across groups (levels) of Z.
· Mediation/mechanism/intervening: the observed XY relationship operates through Z (X → Z → Y).
Practice Problems
1. Identify the level of measurement for each of the following variables:
(A) Type of residence (i.e., dorm, off-campus apartment, condominium, parent’s home, or other)
(B) Height in inches
(C) The rating of the overall quality of a textbook on a scale from “Excellent” to “Poor”
(D) Lab section
(E) Level of measurement
2. For each of the following research questions, identify the unit of analysis, the independent variable, and the dependent variable:
(A) Are social movement participation rates higher in countries that have a longer history of democratic rule?
(B) Do school districts with more highly educated teachers tend to have higher standardized test scores among the students in the district?
3. Assume that the answers to the research questions in Problem 2 are both yes. Consider the following additional variables that go with the above questions:
(A) median household income of the country
(B) median household income of the school district
For each research question, decide whether the additional variable is most likely to be a source of spuriousness, a possible mechanism, or a possible modifier of the relationship implied in Problem 2. Explain each answer in one or two sentences, and draw a diagram showing how each additional variable is related to the independent and dependent variables from the corresponding part of Problem 2.
[Note that mechanism, mediator, and intervenor are synonyms, as are interaction, specification, and modification.]