## Lecture One

· __Statistics__:

· __Descriptive__:

· What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).

· How? Tabular, graphic, and numerical.

· __Inferential__:

· (Probability) sample → population (of interest)

Sample statistic → population parameter

· How? Confidence interval, hypothesis testing.
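The step from sample statistic to population parameter can be sketched in a few lines. This is a minimal illustration only: the population is simulated, the income figures are invented, and the interval uses the large-sample normal approximation, which is an assumption of this sketch rather than something the notes specify.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of interest: 10,000 simulated incomes (made-up values).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]
mu = statistics.mean(population)      # population parameter (normally unknown)

# Probability sample: every case has the same chance of being selected.
sample = random.sample(population, 400)
x_bar = statistics.mean(sample)       # sample statistic
s = statistics.stdev(sample)          # sample standard deviation

# 95% confidence interval for the mean, using the normal approximation.
z = NormalDist().inv_cdf(0.975)       # roughly 1.96
margin = z * s / math.sqrt(len(sample))
ci = (x_bar - margin, x_bar + margin)

print(f"population parameter (mu): {mu:.0f}")
print(f"sample statistic (x_bar):  {x_bar:.0f}")
print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f})")
```

In real research only the sample is observed; the parameter is computed here only because the population is simulated.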

· __Variable__: things that vary. That is, different cases have different values. [draw data structure]

· __Unit of analysis__: what the statement/variable is about. To see why this matters, consider the following hypothetical example:

· Are female applicants less likely to get admitted than male applicants *at the department level*?

· Are female applicants less likely to get admitted than male applicants *at the college level*?
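A small numerical sketch shows how the two questions can have opposite answers. All counts below are invented for illustration, and the department names are hypothetical; the point is only that rates computed per department and rates computed for the whole college describe different units of analysis.

```python
# Hypothetical counts, invented for illustration: (applicants, admitted).
data = {
    "Dept A": {"female": (20, 14), "male": (80, 48)},
    "Dept B": {"female": (80, 24), "male": (20, 4)},
}


def rate(applicants, admitted):
    """Admission rate as a proportion."""
    return admitted / applicants


# Department level: one admission rate per sex within each department.
dept_rates = {
    dept: {sex: rate(*counts) for sex, counts in by_sex.items()}
    for dept, by_sex in data.items()
}

# College level: pool applicants and admits across departments first.
college_rates = {}
for sex in ("female", "male"):
    applicants = sum(data[dept][sex][0] for dept in data)
    admitted = sum(data[dept][sex][1] for dept in data)
    college_rates[sex] = rate(applicants, admitted)

for dept, rates in dept_rates.items():
    print(dept, {sex: round(r, 2) for sex, r in rates.items()})
print("College", {sex: round(r, 2) for sex, r in college_rates.items()})
```

With these made-up numbers, women have the higher admission rate in each department but the lower rate at the college level, because most women applied to the department that admits fewer applicants.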

· __Scale of measurement__:

· Variables are always coded using numeric values, but values have different functions.

· __Nominal__: values are used for differentiation only. Race, gender, marital status, etc.

· __Ordinal__: ranking order of values is meaningful. Life satisfaction, level of support, etc.

· __Interval__: distance between values is meaningful. HDI, IQ, temperature, etc.

· __Ratio__: true zero, i.e., 0 means “none”. Income in dollars, number of children, etc.

· Usually, we call both interval and ratio variables __continuous variables__.

· Why important? It determines what technique to use.
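As a rough sketch of how the scale of measurement constrains technique, the snippet below pairs each scale with the measures of center that are conventionally considered meaningful for it. The mapping and the sample codes are assumptions made for illustration (a common textbook convention), not a rule stated in these notes.

```python
import statistics

# Hypothetical coded responses for five cases.
marital = [1, 2, 2, 3, 2]        # nominal: 1=single, 2=married, 3=divorced
satisfaction = [1, 3, 4, 4, 5]   # ordinal: 1=very dissatisfied ... 5=very satisfied
income = [20_000, 35_000, 40_000, 40_000, 90_000]  # ratio: dollars

# Assumed convention: which measures of center make sense for each scale.
APPROPRIATE = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

FUNCS = {
    "mode": statistics.mode,
    "median": statistics.median,
    "mean": statistics.mean,
}


def summarize(values, scale):
    """Return only the measures of center appropriate for the given scale."""
    return {name: FUNCS[name](values) for name in APPROPRIATE[scale]}


print(summarize(marital, "nominal"))
print(summarize(satisfaction, "ordinal"))
print(summarize(income, "ratio"))
```

Python would happily compute a mean of the marital status codes, but the result would be meaningless because the codes only differentiate categories.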

· [optional] Reliability and validity issues:

· __Reliability__: consistency across repeated measurements.

· __Validity__:

· __Construct validity__: are we measuring what we want to measure?

· __Internal validity__: causality = association + temporal order + lack of spuriousness.

· __External validity__: generalizability.

· __Reliability__ and __construct validity__ are criteria for evaluating measurements.

· __Internal__ and __external validity__ are criteria for evaluating research designs.

· There are other kinds of validity, and researchers do not always use the terms in the same way.

· More on __variable__:

· Why variable? Population heterogeneity == the fundamental truth of social science.

· Essentialism vs. Population thinking: is variation REAL?

· Data structure: row == case, column == variable

· What method to use: depends on the scale of measurement/type of the variable(s)

· What it is about: unit of analysis

· Variables in relationship: __independent variable__ (a.k.a. predictor, explanatory variable, exogenous variable) → __dependent variable__ (a.k.a. response, outcome, endogenous variable)

· Univariate description:

· The choice of methods (mainly) depends on the level of measurement.

· __Frequency Table__: title, frequency count (*f*), relative frequency (*p*), relative cumulative frequency

· __Bar Chart__: title, axes, labels, bars, (spacing b/w bars)

· __Histogram__: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness)

· __Contingency Table__: cross-classification, cell counts, marginal totals, row/column percentages.

· To be continued…
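The components of a frequency table listed above can be computed directly. The sketch below uses invented survey responses and a hypothetical helper named `frequency_table`; the categories are listed in their ranking order so that the cumulative column is meaningful.

```python
from collections import Counter

# Hypothetical responses to one ordinal survey item.
responses = [
    "agree", "neutral", "agree", "disagree", "agree",
    "neutral", "disagree", "agree", "agree", "neutral",
]
order = ["disagree", "neutral", "agree"]  # ranking order of the categories


def frequency_table(values, categories):
    """Build rows with count (f), relative frequency (p), and cumulative p."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for cat in categories:
        f = counts[cat]
        cumulative += f
        rows.append({"value": cat, "f": f, "p": f / n, "cum_p": cumulative / n})
    return rows


table = frequency_table(responses, order)

print("Frequency table: agreement with a hypothetical statement")
print(f"{'value':<10}{'f':>4}{'p':>8}{'cum p':>8}")
for row in table:
    print(f"{row['value']:<10}{row['f']:>4}{row['p']:>8.2f}{row['cum_p']:>8.2f}")
```

For a nominal variable the cumulative column would usually be dropped, since the categories have no meaningful order.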

· Bivariate relationship:

· __Association__: X is associated/correlated with Y.

· __Influence__: X has an impact on Y.

· __Causality__: X causes Y.

· (Randomized Controlled) Experiment: random assignment

· Observational Study: association/influence + temporal order + no spuriousness

· Important implication: association or influence ≠ causality

· A third variable Z might complicate the observed bivariate relationship b/w X and Y:

· __Spuriousness__: the observed XY relationship is due to a common cause Z.

· Interaction/specification/__modification__: the observed XY relationship differs across groups (levels) of Z.

· __Mediation__/mechanism/intervening: X affects Y through Z, i.e., Z lies on the path from X to Y.
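One way to examine the role of a third variable is to compare the overall XY association with the association inside each group of Z. The counts below are invented, and X, Y, and Z are treated as binary purely to keep the sketch short.

```python
# Hypothetical counts: for each group of Z and each value of X,
# (number of cases, number of cases with Y = 1).
data = {
    "Z=low": {"X=0": (80, 16), "X=1": (20, 4)},
    "Z=high": {"X=0": (20, 16), "X=1": (80, 64)},
}


def prop_y(n, y_count):
    """Proportion of cases with Y = 1."""
    return y_count / n


def xy_difference(groups):
    """P(Y=1 | X=1) minus P(Y=1 | X=0), pooling over the given Z groups."""
    totals = {}
    for x in ("X=0", "X=1"):
        n = sum(data[z][x][0] for z in groups)
        y_count = sum(data[z][x][1] for z in groups)
        totals[x] = prop_y(n, y_count)
    return totals["X=1"] - totals["X=0"]


overall = xy_difference(list(data))                # ignoring Z
within = {z: xy_difference([z]) for z in data}     # holding Z constant

print(f"overall difference: {overall:.2f}")
for z, diff in within.items():
    print(f"difference within {z}: {diff:.2f}")
```

With these numbers the overall difference is sizable but disappears within each group of Z. That pattern fits spuriousness if Z is a common cause of X and Y, and it fits mediation if Z comes between X and Y, so the tables alone cannot tell the two apart; the temporal or causal ordering of the variables has to settle it. If the within-group differences were unequal to each other, that would instead suggest interaction.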

### Practice Problems

1. Identify the level of measurement for each of the following variables:

(A) Type of residence (i.e., dorm, off-campus apartment, condominium, parents’ home, or other)

(B) Height in inches

(C) The rating of the overall quality of a textbook on a scale from “Excellent” to “Poor”

(D) Lab section

(E) Level of measurement

2. For each of the following research questions, identify the unit of analysis, the independent variable, and the dependent variable:

(A) Are social movement participation rates higher in countries that have a longer history of democratic rule?

(B) Do school districts with more highly educated teachers tend to have higher standardized test scores among the students in the district?

3. Assume that the answers to the research questions in Problem 2 are both yes. Consider the following additional variables that go with the above questions:

(A) median household income of the country

(B) median household income of the school district

For each research question, decide whether the corresponding additional variable is most likely to be a source of spuriousness, a possible mechanism, or a possible modifier of the relationship implied in Problem 2. Explain each answer in one or two sentences and draw a diagram showing how the additional variable is related to the independent and dependent variables from the matching part of Problem 2.

[Note that *mechanism*, *mediator*, and *intervening variable* are synonyms, and that *interaction*, *specification*, and *modification* are synonyms.]