The arithmetic average — add all observations and divide by the count. Best estimate of the population mean μ.
When
Always the starting point. Pair with the standard error to quantify precision.
Ch.3 Standard Deviation (conceptual)
$s = \sqrt{\dfrac{\sum (Y_i - \bar{Y})^2}{n - 1}}$
How spread out data are around the mean. Dividing by n−1 corrects for bias (Bessel's correction).
When
Quantifying variability. Used in t-tests, CIs, and ANOVA.
Ch.3 Standard Deviation (computational)
$s = \sqrt{\dfrac{\sum Y_i^2 - n\bar{Y}^2}{n - 1}}$
Algebraically equivalent but avoids rounding errors when computing by hand. ΣYᵢ² = square each value then sum.
When
Preferred for hand calculations from raw data.
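To check that the conceptual and computational formulas agree, here is a minimal sketch in Python. The function names and the data values are made up for illustration; `statistics.stdev` from the standard library serves as an independent cross-check.

```python
import math
import statistics


def sd_conceptual(values):
    """Sum the squared deviations from the mean, divide by n - 1, take the root."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1))


def sd_computational(values):
    """Use the sum of squared values minus n times the squared mean."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum(y * y for y in values)  # square each value, then sum
    return math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))


# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sd_conceptual(data))
print(sd_computational(data))
print(statistics.stdev(data))  # library version, also divides by n - 1
```

All three calls should print the same value up to floating-point rounding.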
Ch.4 Standard Error of the Mean
$\mathrm{SE}_{\bar{Y}} = \dfrac{s}{\sqrt{n}}$
How much Ȳ varies from sample to sample. Larger n → smaller SE. Because SE shrinks with √n, halving the SE (doubling precision) requires quadrupling n.
When
Building confidence intervals for μ, or as the t-test denominator.
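The square-root relationship between n and SE can be illustrated with a short sketch. The helper name `se_from_sd` and the numbers are assumptions chosen for the example, not values from the course.

```python
import math


def se_from_sd(s, n):
    """Standard error of the mean from a sample standard deviation s and sample size n."""
    return s / math.sqrt(n)


s = 2.0  # hypothetical sample standard deviation
se_small = se_from_sd(s, 25)
se_large = se_from_sd(s, 100)  # four times as many observations

print(se_small, se_large)
```

With the same s, going from n = 25 to n = 100 cuts the SE in half.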
Probability Distributions
Ch.7 Binomial Distribution
$\Pr[X = x] = \dbinom{n}{x} p^x (1 - p)^{n - x}$
Probability of exactly x successes in n independent trials, each with probability p. Mean = np, Variance = np(1−p).
When
Counting successes in a fixed number of binary trials. E.g. # of mutant offspring.
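A minimal sketch of the binomial formula, using `math.comb` for the binomial coefficient. The values n = 5 and p = 0.25 are hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.25  # hypothetical: 5 offspring, each mutant with probability 0.25

probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * pr for x, pr in enumerate(probabilities))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probabilities))

print(probabilities)
print(mean, n * p)                # mean should match np
print(variance, n * p * (1 - p))  # variance should match np(1 - p)
```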
Ch.8 Poisson Distribution
$\Pr[X = x] = \dfrac{e^{-\mu} \mu^x}{x!}$
Probability of x events when the average rate is μ. Key property: mean = variance = μ.
When
Counting rare, random events per unit time or space. Test fit with χ² GOF.
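The mean = variance property can be checked numerically. This sketch assumes a hypothetical rate μ = 2 and truncates the infinite sum at x = 59, where the remaining probability is negligible.

```python
import math


def poisson_pmf(x, mu):
    """Probability of exactly x events when the average rate is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


mu = 2.0  # hypothetical average rate
support = range(60)  # truncation point is an assumption; the tail beyond it is tiny

mean = sum(x * poisson_pmf(x, mu) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in support)

print(mean, variance)  # both should be very close to mu
```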
Ch.10 Normal Distribution
$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
The bell curve, described by mean μ and variance σ². About 68% of values lie within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.
When
Describing symmetric continuous measurements. Many parametric tests assume normally distributed errors/residuals, or normality within groups, especially for small samples.
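The density and the 68/95/99.7 rule can be sketched with the standard library. The function names are illustrative. The probability of falling within k standard deviations of the mean is computed from the error function as erf(k/√2), a standard identity that is not part of the original sheet.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal density at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```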
Ch.5 Bayes' Theorem
$\Pr[A \mid B] = \dfrac{\Pr[B \mid A] \cdot \Pr[A]}{\Pr[B]}$
Updates probability of A given evidence B. Pr[A] = prior, Pr[B|A] = likelihood, Pr[A|B] = posterior.
When
Reversing conditional probabilities. E.g. given a positive test, actual probability of disease?
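The disease-testing question can be worked through as a sketch. The prevalence, sensitivity, and false-positive rate below are invented for illustration. Pr[B] is expanded with the law of total probability, a step the formula above leaves implicit.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Pr[A | B] from Pr[A], Pr[B | A], and Pr[B | not A]."""
    # Pr[B] = Pr[B | A] Pr[A] + Pr[B | not A] Pr[not A]
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have the disease, the test detects
# 95% of true cases, and 5% of healthy people test positive anyway.
p_disease_given_positive = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

print(p_disease_given_positive)
```

Under these assumed numbers the posterior stays well below 50%, because the disease is rare and false positives outnumber true positives.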
Confidence Intervals
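Ch.4 CI for the Mean
$\bar{Y} \pm \mathrm{SE}_{\bar{Y}} \cdot t_{\alpha(2),\,\mathrm{df}}$
95% CI (α = 0.05): if the sampling were repeated many times, about 95% of the resulting intervals would contain the true μ. df = n−1.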
When
After estimating a mean. Report as: Ȳ = X (95% CI: lower, upper).
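A sketch of the interval calculation follows. The standard library has no t distribution, so the critical value is passed in as looked up from a t table. The data are hypothetical, and 2.365 is the two-tailed 5% critical value for df = 7.

```python
import math
import statistics


def mean_ci(values, t_crit):
    """Confidence interval for the mean, given a t critical value from a table."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample, n = 8
t_crit = 2.365  # t for alpha(2) = 0.05 and df = n - 1 = 7, from a t table

lower, upper = mean_ci(data, t_crit)
print(f"mean = {statistics.mean(data)} (95% CI: {lower:.2f}, {upper:.2f})")
```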
Ch.12 CI for Difference in Means
$(\bar{Y}_1 - \bar{Y}_2) \pm \mathrm{SE}_{\bar{Y}_1 - \bar{Y}_2} \cdot t_{\alpha(2),\,\mathrm{df}}$
Interval for the true difference μ₁ − μ₂. If it excludes 0, the difference is significant at level α.
When
After a two-sample t-test to report the plausible range of the difference.
Ch.7 Agresti-Coull (Proportion CI)
$\tilde{p} = \dfrac{X + 2}{n + 4}, \qquad \tilde{p} \pm 1.96 \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 4}}$
Better than the Wald interval. Adding 2 phantom successes and 2 failures stabilizes the interval near 0 or 1.
When
Estimating a population proportion p. Always preferred over Wald CI in BIOL 300.
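The Agresti-Coull calculation is short enough to sketch directly. The counts X = 7 and n = 20 are hypothetical, and the function name is illustrative.

```python
import math


def agresti_coull_ci(successes, n, z=1.96):
    """Approximate 95% CI for a proportion using the Agresti-Coull adjustment."""
    p_adj = (successes + 2) / (n + 4)  # add 2 phantom successes and 2 phantom failures
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - half_width, p_adj + half_width


lower, upper = agresti_coull_ci(successes=7, n=20)  # hypothetical counts
print(lower, upper)
```

The interval is centred on the adjusted proportion (X + 2)/(n + 4), not on the raw proportion X/n.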
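Ch.11 CI for the Variance
$\dfrac{\mathrm{df} \cdot s^2}{\chi^2_{\alpha/2,\,\mathrm{df}}} \le \sigma^2 \le \dfrac{\mathrm{df} \cdot s^2}{\chi^2_{1-\alpha/2,\,\mathrm{df}}}$
Uses two χ² critical values (asymmetric interval because χ² is skewed). Each subscript is the area to the right of the critical value, so the larger critical value gives the lower limit. df = n−1.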
When
Estimating population variance σ² directly from a single sample.
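A sketch of the variance interval, with the two χ² critical values supplied from a table since the standard library does not provide them. The sample variance is hypothetical. The values 19.02 and 2.70 are the critical values for df = 9 with right-tail areas 0.025 and 0.975.

```python
def variance_ci(s_squared, df, chi2_right_small_area, chi2_right_large_area):
    """CI for the population variance from two chi-square critical values.

    chi2_right_small_area: critical value with area alpha/2 to its right (the larger value).
    chi2_right_large_area: critical value with area 1 - alpha/2 to its right (the smaller value).
    """
    lower = df * s_squared / chi2_right_small_area
    upper = df * s_squared / chi2_right_large_area
    return lower, upper


s_squared = 4.0  # hypothetical sample variance
df = 9           # n = 10 observations

lower, upper = variance_ci(s_squared, df, 19.02, 2.70)
print(lower, upper)
```

The limits sit unevenly around s², which reflects the skew of the χ² distribution.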
Chi-Square & Proportions
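Ch.8 Chi-Square Statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
Larger χ² = more departure from H₀. For goodness of fit, df = (number of categories − 1) − (number of parameters estimated from the data); for an r × c contingency table, df = (r − 1)(c − 1). Assumptions: expected count > 1 in all cells; no more than 20% of cells have expected count < 5.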
When
Testing fit to a theoretical distribution (GOF) or independence in a contingency table.
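A goodness-of-fit calculation can be sketched as below. The observed counts and the 1:2:1 expected ratio are hypothetical, and no parameters are estimated from the data in this example.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [30, 50, 20]           # hypothetical counts in three categories
proportions = [0.25, 0.50, 0.25]  # hypothetical 1:2:1 ratio under H0

total = sum(observed)
expected = [p * total for p in proportions]

stat = chi_square(observed, expected)
df = len(observed) - 1 - 0  # categories - 1, minus 0 estimated parameters

print(stat, df)
```

The statistic is then compared against the χ² critical value for the computed df.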
Ch.9 Odds Ratio
$\mathrm{OR} = \dfrac{a \cdot d}{b \cdot c}$
From a 2×2 table (a=top-left, b=top-right, c=bottom-left, d=bottom-right). OR > 1: event more likely in group 1.
When
Measuring association between two binary variables in a 2×2 contingency table.
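Ch.9 CI for Odds Ratio
$\ln(\mathrm{OR}) \pm Z_{\alpha} \cdot \mathrm{SE}[\ln(\mathrm{OR})]$
The CI is built on the log scale (where ln(OR) is approximately normal), then both limits are exponentiated to return to the OR scale. For a 95% CI, Z = 1.96. If the CI excludes 1, the association is significant.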
When
Reporting uncertainty around an estimated odds ratio. If CI includes 1, no significant association.
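The odds ratio and its interval can be sketched together. The 2×2 counts are hypothetical. The sheet does not give the standard error of ln(OR), so the code assumes the usual large-sample formula √(1/a + 1/b + 1/c + 1/d).

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a=top-left, b=top-right, c=bottom-left, d=bottom-right."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate CI for the odds ratio, built on the log scale and exponentiated."""
    log_or = math.log(odds_ratio(a, b, c, d))
    # Assumed large-sample standard error of ln(OR); not stated in the sheet.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


a, b, c, d = 20, 10, 80, 90  # hypothetical counts

estimate = odds_ratio(a, b, c, d)
lower, upper = odds_ratio_ci(a, b, c, d)
print(estimate, lower, upper)
```

Because the interval is symmetric only on the log scale, the estimate is the geometric mean of the two limits rather than their midpoint.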
t-Tests
Ch.11 One-Sample t
$t = \dfrac{\bar{Y} - \mu_0}{s / \sqrt{n}}$
Tests H₀: μ = μ₀. Large |t| means Ȳ is far from μ₀ in SE units. df = n−1.
When
Comparing a sample mean to a specific hypothesized value. One group, one mean.
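A sketch of the one-sample t statistic, using hypothetical data and a hypothetical null value μ₀ = 3. The function returns the statistic and its degrees of freedom; looking up the P-value is left to a t table or statistics software.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t, df) for testing H0: mu = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - mu0) / se
    return t, n - 1


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
t, df = one_sample_t(data, mu0=3.0)

print(t, df)
```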
Ch.12 Pooled Sample Variance
$s_p^2 = \dfrac{\mathrm{df}_1 s_1^2 + \mathrm{df}_2 s_2^2}{\mathrm{df}_1 + \mathrm{df}_2}$
Weighted average of both groups' variances. Valid only when assuming σ₁² = σ₂².
When
First step in the pooled two-sample t-test.
Ch.12 SE for Pooled Two-Sample t
$\mathrm{SE}_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
Standard error of the difference in means, assuming equal variances.
When
Part of pooled two-sample t-test, after computing sp².
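The pooled steps chain together: pooled variance first, then the standard error, then the interval for the difference in means. The sketch below uses hypothetical summary statistics and takes the t critical value from a table (2.069 for a two-tailed α of 0.05 with df = 23).

```python
import math


def pooled_variance(s1_squared, n1, s2_squared, n2):
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * s1_squared + df2 * s2_squared) / (df1 + df2)


def se_difference(sp_squared, n1, n2):
    """Standard error of the difference in means, assuming equal variances."""
    return math.sqrt(sp_squared * (1 / n1 + 1 / n2))


def difference_ci(mean1, mean2, se_diff, t_crit):
    """CI for mu1 - mu2, given a t critical value from a table."""
    diff = mean1 - mean2
    return diff - t_crit * se_diff, diff + t_crit * se_diff


# Hypothetical summary statistics for two groups.
n1, mean1, s1_squared = 10, 12.0, 4.0
n2, mean2, s2_squared = 15, 10.0, 6.0

sp_squared = pooled_variance(s1_squared, n1, s2_squared, n2)
se_diff = se_difference(sp_squared, n1, n2)
t_crit = 2.069  # alpha(2) = 0.05, df = n1 + n2 - 2 = 23, from a t table
lower, upper = difference_ci(mean1, mean2, se_diff, t_crit)

print(sp_squared, se_diff, lower, upper)
```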