## Lecture One

· __Statistics__:

· __Descriptive__:

· What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).

· How? Tabular, graphic, and numerical.

· __Inferential__:

· (Probability) sample → population (of interest)

· Sample statistic → population parameter

· How? Confidence interval, hypothesis testing.
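The descriptive/inferential split can be illustrated with a short sketch. It uses Python's standard `statistics` module and a hypothetical sample; the helper names `describe` and `mean_ci` are illustrative only. `describe` summarizes the sample itself (center and dispersion), while `mean_ci` treats the sample mean as a statistic that estimates the population mean. It assumes the large-sample normal approximation (z = 1.96 for 95%); a t critical value would be more appropriate for a sample this small.

```python
import math
import statistics

def describe(values):
    """Descriptive: center (mean) and dispersion (sample SD) of one variable."""
    return statistics.mean(values), statistics.stdev(values)

def mean_ci(values, z=1.96):
    """Inferential: approximate 95% CI for the population mean.

    Assumes the normal approximation: mean +/- z * s / sqrt(n).
    """
    n = len(values)
    m, s = describe(values)
    half_width = z * s / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical sample of years of schooling (illustrative numbers only)
sample = [12, 16, 12, 14, 18, 12, 16, 11, 13, 16]
print(describe(sample))  # sample statistics
print(mean_ci(sample))   # interval estimate of the population parameter
```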

· __Variable__: a characteristic that varies; that is, different cases have different values. [draw data structure]

· __Unit of analysis__: the entity (e.g., an individual applicant, a department, a college) that a statement or variable describes. To see why this matters, consider the following hypothetical example:

· Are female applicants less likely to get admitted than male applicants *at the department level*?

· Are female applicants less likely to get admitted than male applicants *at the college level*?
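The two questions can have opposite answers. The sketch below uses made-up counts for two hypothetical departments, A and B (the numbers and helper names are assumptions for illustration, not real admissions data). In each department women are admitted at a higher rate, yet after pooling to the college level women are admitted at a lower rate, because in this hypothetical most women apply to the harder department.

```python
# Hypothetical counts: (admitted, applicants) by department and gender.
data = {
    "A": {"female": (16, 20), "male": (60, 80)},  # easier department
    "B": {"female": (20, 80), "male": (4, 20)},   # harder department
}

def rate(admitted, applied):
    return admitted / applied

# Department level: compare women and men within each department.
for dept, groups in data.items():
    f = rate(*groups["female"])
    m = rate(*groups["male"])
    print(f"Dept {dept}: female {f:.0%}, male {m:.0%}")

# College level: pool the departments first, then compare.
def pooled(gender):
    admitted = sum(groups[gender][0] for groups in data.values())
    applied = sum(groups[gender][1] for groups in data.values())
    return rate(admitted, applied)

print(f"College: female {pooled('female'):.0%}, male {pooled('male'):.0%}")
```

With these counts the department-level rates favor women (80% vs. 75%, 25% vs. 20%), while the college-level rates favor men (36% vs. 64%).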

· __Scale of measurement__:

· Variables are usually coded with numeric values, but those values serve different functions depending on the scale.

· __Nominal__: values are used for differentiation only. Race, gender, marital status, etc.

· __Ordinal__: ranking order of values is meaningful. Life satisfaction, level of support, etc.

· __Interval__: distance between values is meaningful. HDI, IQ, temperature, etc.

· __Ratio__: true zero, i.e., 0 means “none”. Income in dollars, number of children, etc.

· Usually, we call both interval and ratio variables __continuous variables__.

· Why is this important? The scale of measurement determines which statistical techniques are appropriate.
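As one concrete case of "scale determines technique", the sketch below picks a measure of center by scale, following the common textbook convention (mode for nominal, median for ordinal, mean for interval/ratio). The `Scale` enum and the `center` and `ratio_is_meaningful` helpers are hypothetical names; the codes in the usage lines are made up.

```python
import statistics
from enum import Enum

class Scale(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def center(values, scale):
    """Return a measure of center that is meaningful for the given scale."""
    if scale is Scale.NOMINAL:
        # Codes only differentiate categories: report the most frequent one.
        return statistics.mode(values)
    if scale is Scale.ORDINAL:
        # Order is meaningful but distances are not: use an observed middle value.
        return statistics.median_low(values)
    # Interval and ratio: distances are meaningful, so averaging is allowed.
    return statistics.mean(values)

def ratio_is_meaningful(scale):
    """Statements like 'twice as much' require a true zero."""
    return scale is Scale.RATIO

marital = [1, 2, 2, 3, 2]            # hypothetical codes: 1=single, 2=married, 3=divorced
satisfaction = [1, 3, 4, 4, 5, 2]    # hypothetical 1-5 ordinal rating
children = [0, 2, 1, 3]              # ratio: 0 means "no children"
print(center(marital, Scale.NOMINAL))
print(center(satisfaction, Scale.ORDINAL))
print(center(children, Scale.RATIO))
```

`median_low` is used for ordinal data so that the result is always one of the observed categories rather than an average of two codes.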

· [optional] Reliability and validity issues:

· __Reliability__: consistency across repeated measurements.

· __Validity__:

· __Construct validity__: are we measuring what we want to measure?

· __Internal validity__: does the design support a causal conclusion? Causality = association + temporal order + lack of spuriousness.

· __External validity__: generalizability.

· __Reliability__ and __construct validity__ are criteria for evaluating measurements.

· __Internal__ and __external validities__ are criteria used to evaluate research designs.

· There are other kinds of validity, and the terms mean different things to different researchers.
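One common way to quantify reliability as "consistency across repeated measurements" is test-retest reliability: give the same item to the same cases twice and correlate the two sets of scores. The sketch below computes a Pearson correlation by hand on hypothetical scores; the function name and data are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical: six respondents answer the same life-satisfaction item twice.
time1 = [3, 5, 2, 4, 4, 1]
time2 = [3, 4, 2, 5, 4, 1]
r = pearson_r(time1, time2)
print(round(r, 2))
```

A high correlation indicates consistent measurement, but it says nothing about construct validity: an item can be answered consistently and still measure the wrong thing.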

· More on __variable__:

· Why variables? Population heterogeneity == the fundamental truth of social science.

· Essentialism vs. Population thinking: is variation REAL?

· Data structure: row == case, column == variable

· What method to use: depends on the scale of measurement/type of the variable(s)

· What it is about: unit of analysis

· Variables in relationship: __independent variable__ (a.k.a. predictor, explanatory variable, exogenous variable) → __dependent variable__ (a.k.a. response, outcome, endogenous variable)
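The "row == case, column == variable" structure can be sketched in plain Python as a list of records. The respondents, variable names, and the choice of education as X and income as Y are hypothetical, chosen only to show how a column is pulled out of the case-by-variable layout.

```python
# Each row is a case (a hypothetical respondent); each key is a variable.
cases = [
    {"id": 1, "gender": "female", "education_years": 16, "income": 52000},
    {"id": 2, "gender": "male",   "education_years": 12, "income": 38000},
    {"id": 3, "gender": "female", "education_years": 14, "income": 45000},
]

def column(rows, variable):
    """Extract one variable (a column) across all cases (rows)."""
    return [row[variable] for row in rows]

# Hypothetical roles: education as the independent variable (X),
# income as the dependent variable (Y). Gender is nominal; both
# education_years and income are ratio-scale (0 means "none").
x = column(cases, "education_years")
y = column(cases, "income")
print(x)
print(y)
```

Here the unit of analysis is the individual respondent, because each row describes one person.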

· Univariate description:

· The choice of methods (mainly) depends on the level of measurement.

· __Frequency Table__: title, frequency count (*f*), relative frequency (*p*), relative cumulative frequency

· __Bar Chart__: title, axes, labels, bars, (spacing b/w bars)

· __Histogram__: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness)

· __Contingency Table__ (bivariate): cross-classification of two variables, cell counts, marginal totals, row/column percentages.

· To be continued…
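The frequency-table columns listed above (*f*, *p*, relative cumulative frequency) and the row percentages of a contingency table can be computed with `collections.Counter`. This is a sketch under assumptions: the helper names, the category order, and the data are all hypothetical. Cumulative *p* only makes sense when the categories have a real order (ordinal or higher).

```python
from collections import Counter

def frequency_table(values, order):
    """Rows of (value, f, p, cumulative p) for the categories in `order`."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0.0
    for value in order:
        f = counts[value]      # frequency count
        p = f / n              # relative frequency
        cum += p               # relative cumulative frequency
        rows.append((value, f, p, cum))
    return rows

def crosstab_row_percent(pairs):
    """Contingency table of (row, column) pairs, as row percentages."""
    cells = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)   # row marginal totals
    cols = sorted({c for _, c in pairs})
    return {
        r: {c: 100 * cells[(r, c)] / row_totals[r] for c in cols}
        for r in sorted(row_totals)
    }

# Hypothetical ordinal variable: level of support.
support = ["low", "high", "medium", "medium", "high", "medium", "low", "high"]
table = frequency_table(support, ["low", "medium", "high"])
for value, f, p, cum in table:
    print(f"{value:<7} f={f}  p={p:.3f}  cum p={cum:.3f}")

# Hypothetical pairs for a 2 x 2 cross-classification.
pairs = [("female", "admitted"), ("female", "rejected"), ("male", "admitted"),
         ("male", "admitted"), ("female", "rejected"), ("male", "rejected")]
xtab = crosstab_row_percent(pairs)
print(xtab)
```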

· Bivariate relationship:

· __Association__: X is associated/correlated with Y.

· __Influence__: X has an impact on Y.

· __Causality__: X causes Y.

· (Randomized Controlled) Experiment: random assignment

· Observational Study: association/influence + temporal order + no spuriousness

· Important implication: association or influence ≠ causality
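A small simulation can make "association ≠ causality" concrete. In this sketch (all names and parameters are assumptions), a binary common cause Z drives both X and Y, and X has no effect on Y at all. X and Y are still clearly correlated overall (the theoretical correlation under these settings is 0.5), but within each level of Z the correlation is close to zero. The exact printed values depend on the seed.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)                       # fixed seed for reproducibility
n = 10_000
z = [rng.randint(0, 1) for _ in range(n)]     # common cause (binary)
x = [2 * zi + rng.gauss(0, 1) for zi in z]    # X depends only on Z
y = [2 * zi + rng.gauss(0, 1) for zi in z]    # Y depends only on Z, not on X

overall = pearson_r(x, y)
within = {
    level: pearson_r([xi for xi, zi in zip(x, z) if zi == level],
                     [yi for yi, zi in zip(y, z) if zi == level])
    for level in (0, 1)
}
print(round(overall, 2), {k: round(v, 2) for k, v in within.items()})
```

Holding Z constant (looking within its levels) is what removes the association here; an observational study has to rule out such common causes before making a causal claim.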

· A third variable Z might complicate the observed bivariate relationship b/w X and Y:

· __Spuriousness__: the observed XY relationship is due to a common cause Z.

· Interaction/specification/__modification__: the observed XY relationship differs across groups (levels) of Z.

· __Mediation__/mechanism/intervening: th