Lecture One
· Statistics:
· Descriptive:
· What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).
· How? Tabular, graphic, and numerical.
· Inferential:
· (Probability) sample → population (of interest)
· Sample statistic → population parameter
· How? Confidence interval, hypothesis testing.
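To make the step from sample statistic to population parameter concrete, here is a minimal sketch of a 95% confidence interval for a population mean. The sample values and the function name `mean_confidence_interval` are hypothetical. The interval uses the large-sample normal approximation (z = 1.96), which is a simplifying assumption; for a sample this small a t critical value would usually be preferred.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper) for the population mean, normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation (n - 1 denominator) over sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical sample: weekly study hours reported by 10 students.
hours = [12, 15, 9, 20, 14, 11, 17, 13, 16, 13]
lower, upper = mean_confidence_interval(hours)
print(f"sample mean = {statistics.mean(hours):.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The sample mean is the statistic; the interval is a statement about the unknown population parameter.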
· Variable: things that vary. That is, different cases have different values. [draw data structure]
· Unit of analysis: what the statement/variable is about. To see why this matters, consider the following hypothetical example:
· Are female applicants less likely to get admitted than male applicants at the department level?
· Are female applicants less likely to get admitted than male applicants at the college level?
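The two questions above can have different answers on the same data. The sketch below uses made-up counts for two hypothetical departments (A and B); the numbers are chosen only to illustrate the point and do not come from any real admissions data.

```python
# Hypothetical counts: (department, gender) -> (applicants, admitted).
applications = {
    ("A", "male"): (80, 48),
    ("A", "female"): (20, 14),
    ("B", "male"): (20, 4),
    ("B", "female"): (80, 24),
}

def admission_rate(rows):
    """Admitted / applicants, pooled over the given (applicants, admitted) pairs."""
    applicants = sum(a for a, _ in rows)
    admitted = sum(m for _, m in rows)
    return admitted / applicants

# Unit of analysis = department: compare genders within each department.
for dept in ("A", "B"):
    for gender in ("male", "female"):
        rate = admission_rate([applications[(dept, gender)]])
        print(f"department {dept}, {gender}: {rate:.0%}")

# Unit of analysis = college: pool the departments, then compare genders.
for gender in ("male", "female"):
    rows = [v for (d, g), v in applications.items() if g == gender]
    print(f"college, {gender}: {admission_rate(rows):.0%}")
```

With these made-up counts, women are admitted at a higher rate than men within each department, yet at a lower rate once the departments are pooled, because most women apply to the more selective department B.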
· Scale of measurement:
· Variables are always coded using numeric values, but values have different functions.
· Nominal: values are used for differentiation only. Race, gender, marital status, etc.
· Ordinal: ranking order of values is meaningful. Life satisfaction, level of support, etc.
· Interval: distance between values is meaningful. HDI, IQ, temperature, etc.
· Ratio: true zero, i.e., 0 means “none”. Income in dollars, number of children, etc.
· Usually, we call both interval and ratio variables continuous variables.
· Why important? It determines what technique to use.
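As one illustration of how scale determines technique, the sketch below picks a measure of center by scale of measurement. The mapping (mode for nominal, median for ordinal, mean for interval/ratio) is a common textbook convention assumed here, not something spelled out in these notes; the function name `center` and all data values are hypothetical.

```python
import statistics

def center(values, scale):
    """Pick a measure of center permitted by the scale of measurement."""
    if scale == "nominal":
        return statistics.mode(values)        # most frequent category
    if scale == "ordinal":
        return statistics.median_low(values)  # middle rank, always an observed value
    if scale in ("interval", "ratio"):
        return statistics.mean(values)        # arithmetic mean relies on distances
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical codes: 1 = single, 2 = married, 3 = divorced.
marital_status = [1, 2, 2, 3, 2, 1]
# Hypothetical 1-5 ranking of life satisfaction.
life_satisfaction = [1, 2, 2, 3, 4, 5, 5]
# Hypothetical incomes in dollars.
income = [30_000, 45_000, 52_000, 61_000]

print(center(marital_status, "nominal"))
print(center(life_satisfaction, "ordinal"))
print(center(income, "ratio"))
```

All three variables are stored as numbers, but averaging the marital status codes would produce a value with no interpretation, which is why the scale has to be known before a technique is chosen.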
· [optional] Reliability and validity issues:
· Reliability: consistency across repeated measurements.
· Validity:
· Construct validity: are we measuring what we want to measure?
· Internal validity: causality = association + temporal order + lack of spuriousness.
· External validity: generalizability.
· Reliability and construct validity are criteria for evaluating measurements.
· Internal and external validity are criteria for evaluating research designs.
· There are more kinds of validity, and different researchers define them differently.
· More on variable:
· Why variable? Population heterogeneity == the fundamental truth of social science.
· Essentialism vs. Population thinking: is variation REAL?
· Data structure: row == case, column == variable
· What method to use: depends on the scale of measurement/type of the variable(s)
· What it is about: unit of analysis
· Variables in relationship: independent variable (a.k.a. predictor, explanatory variable, exogenous variable) → dependent variable (a.k.a. response, outcome, endogenous variable)
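The row == case, column == variable layout can be sketched with plain Python structures. All variable names and values below are hypothetical, and treating education as the independent variable and income as the dependent variable is an assumption made only for illustration.

```python
# Each row (dict) is one case; each key is one variable (column).
data = [
    {"id": 1, "years_of_education": 12, "income": 31_000},
    {"id": 2, "years_of_education": 16, "income": 52_000},
    {"id": 3, "years_of_education": 18, "income": 64_000},
]

# Reading one row gives every variable for a single case.
first_case = data[0]

# Reading one column gives a single variable across all cases.
education = [row["years_of_education"] for row in data]  # independent variable (X)
income = [row["income"] for row in data]                 # dependent variable (Y)

print(first_case)
print(education, income)
```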
· Univariate description:
· The choice of methods (mainly) depends on the level of measurement.
· Frequency Table: title, frequency count ( f), relative frequency ( p), relative cumulative frequency
· Bar Chart: title, axes, labels, bars, (spacing b/w bars)
· Histogram: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness)
· Contingency Table: cross-classification, cell counts, marginal totals, row/column percentages.
· To be continued…
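The frequency table described above can be built in a few lines. This is a minimal sketch on a hypothetical sample; the function name `frequency_table` and the data are assumptions for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, f, p, cumulative p), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        f = counts[value]          # frequency count
        running += f               # cumulative count so far
        rows.append((value, f, f / n, running / n))
    return rows

# Hypothetical data: number of children reported by 10 respondents.
children = [0, 2, 1, 2, 3, 0, 2, 1, 2, 1]

print("Number of children (hypothetical sample, n = 10)")
print(f"{'value':>5} {'f':>3} {'p':>6} {'cum. p':>7}")
for value, f, p, cum_p in frequency_table(children):
    print(f"{value:>5} {f:>3} {p:>6.2f} {cum_p:>7.2f}")
```

The cumulative column only makes sense when the values can be ordered, so it applies to ordinal variables and above, not to nominal ones.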
· Bivariate relationship:
· Association: X is associated/correlated with Y.
· Influence: X has an impact on Y.
· Causality: X causes Y.
· (Randomized Controlled) Experiment: random assignment
· Observational Study: association/influence + temporal order + no spuriousness
· Important implication: association or influence ≠ causality
· A third variable Z might complicate the observed bivariate relationship b/w X and Y:
· Spuriousness: the observed XY relationship is due to a common cause Z.
· Interaction/specification/modification: the observed XY relationship differs across groups defined by Z.
· Mediation/mechanism/intervening: th