Formula Sheet

Descriptive Statistics

4 formulas

Ch. 3 · Sample Mean

\bar{Y} = \dfrac{\sum Y_i}{n}

The arithmetic average — add all observations and divide by the count. Best estimate of the population mean μ.

When

Always the starting point. Pair with the standard error to quantify precision.

Ch. 3 · Standard Deviation (conceptual)

s = \sqrt{\dfrac{\sum(Y_i - \bar{Y})^2}{n-1}}

How spread out the data are around the mean. Dividing by n−1 rather than n (Bessel's correction) makes s² an unbiased estimator of σ².

When

Quantifying variability. Used in t-tests, CIs, and ANOVA.

Ch. 3 · Standard Deviation (computational)

s = \sqrt{\dfrac{\sum Y_i^2 - n\bar{Y}^2}{n-1}}

Algebraically equivalent, and faster by hand because it needs only ΣYᵢ² (square each value, then sum) and Ȳ. Keep Ȳ unrounded here: nȲ² magnifies any rounding in Ȳ.

When

Preferred for hand calculations from raw data.

Ch. 4 · Standard Error of the Mean

SE_{\bar{Y}} = \dfrac{s}{\sqrt{n}}

How much Ȳ varies from sample to sample. Larger n → smaller SE. Doubling precision requires quadrupling n.

When

Building confidence intervals for μ, or as the t-test denominator.
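Below is a minimal Python sketch of the four descriptive formulas, using only the standard library. The data list is hypothetical. The sketch also confirms that the conceptual and computational SD forms agree.

```python
import math

def sample_mean(ys):
    """Sum of the observations divided by the count n."""
    return sum(ys) / len(ys)

def sd_conceptual(ys):
    """Sample SD from squared deviations around the mean, with the n - 1 divisor."""
    ybar = sample_mean(ys)
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1))

def sd_computational(ys):
    """Same SD via the shortcut: sum of squared values minus n * mean squared."""
    n = len(ys)
    ybar = sample_mean(ys)
    return math.sqrt((sum(y * y for y in ys) - n * ybar ** 2) / (n - 1))

def standard_error(ys):
    """Standard error of the mean: s / sqrt(n)."""
    return sd_conceptual(ys) / math.sqrt(len(ys))

# Hypothetical measurements, for illustration only.
ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(ys), sd_conceptual(ys), sd_computational(ys), standard_error(ys))
```

With these values, Ȳ = 5, and both SD forms give √(32/7) ≈ 2.14.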

Probability Distributions

4 formulas

Ch. 7 · Binomial Distribution

\Pr[X=x] = \binom{n}{x}\, p^{\,x}\,(1-p)^{n-x}

Probability of exactly x successes in n independent trials, each with probability p. Mean = np, Variance = np(1−p).

When

Counting successes in a fixed number of binary trials. E.g. # of mutant offspring.
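The following is a short Python sketch of the binomial formula, with illustrative values n = 10 and p = 0.25. Summing over the whole distribution checks that the mean is np and the variance is np(1−p). `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """Pr[X = x]: exactly x successes in n independent trials, each with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Illustrative values: n = 10 trials, p = 0.25 per trial.
n, p = 10, 0.25
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The distribution's mean and variance should match n*p and n*p*(1 - p).
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
print(mean, n * p)
print(variance, n * p * (1 - p))
```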

Ch. 8 · Poisson Distribution

\Pr[X=x] = \dfrac{\mu^x \cdot e^{-\mu}}{x!}

Probability of x events when the average rate is μ. Key property: mean = variance = μ.

When

Counting rare, random events per unit time or space. Test fit with χ² GOF.
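Here is a Python sketch of the Poisson formula that numerically checks the mean = variance = μ property. It uses an illustrative μ = 3.2 and truncates the infinite sum at x = 60, where the remaining tail probability is negligible.

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson count with mean mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

mu = 3.2  # illustrative mean count per unit of time or space
# Truncate the infinite support; for mu = 3.2 the tail beyond x = 60 is negligible.
xs = range(61)
probs = [poisson_pmf(x, mu) for x in xs]
mean = sum(x * pr for x, pr in zip(xs, probs))
variance = sum((x - mean) ** 2 * pr for x, pr in zip(xs, probs))
print(mean, variance)  # both should be very close to mu
```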

Ch. 10 · Normal Distribution

f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\dfrac{(x-\mu)^2}{2\sigma^2}}

The bell curve, described by mean μ and variance σ². About 68% of values lie within 1σ of μ, about 95% within 2σ (exactly 95% within 1.96σ), and about 99.7% within 3σ.

When

Describing symmetric continuous measurements. Many parametric tests assume normally distributed errors/residuals, or normality within groups, especially for small samples.
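The sketch below is a Python version of the normal density, plus a check of the 68–95–99.7 rule. It relies on the standard-library error function and the identity Pr[|X − μ| < kσ] = erf(k/√2).

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    var = sigma ** 2
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def prob_within(k):
    """Pr[|X - mu| < k * sigma] for any normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(k, round(prob_within(k), 4))
```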

Ch. 5 · Bayes' Theorem

\Pr[A|B] = \dfrac{\Pr[B|A]\cdot\Pr[A]}{\Pr[B]}

Updates probability of A given evidence B. Pr[A] = prior, Pr[B|A] = likelihood, Pr[A|B] = posterior.

When

Reversing conditional probabilities, e.g., finding the probability of disease given a positive test result.
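Here is a minimal Python sketch of the disease-testing case. It uses hypothetical numbers: 1% prevalence, 99% sensitivity and a 5% false-positive rate. The denominator Pr[B] is expanded with the law of total probability.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Pr[disease | positive test] via Bayes' theorem.

    Pr[B] (a positive test) is expanded with the law of total probability:
    Pr[B] = Pr[B|A] Pr[A] + Pr[B|not A] Pr[not A].
    """
    pr_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / pr_positive

# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate.
print(bayes_posterior(0.01, 0.99, 0.05))  # about 0.167
```

Even with an accurate test, a low prior keeps the posterior near 1/6.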

Confidence Intervals

4 formulas

Ch. 4 · CI for the Mean

\bar{Y} \pm SE_{\bar{Y}} \cdot t_{\alpha(2),\, df}

95% CI (α = 0.05): if sampling were repeated many times, about 95% of such intervals would contain the true μ. df = n−1.

When

After estimating a mean. Report as: Ȳ = X (95% CI: lower, upper).
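The following is a Python sketch of the CI for a mean. It assumes that you supply the critical value t_{α(2),df} from a t table. The value 2.262 used here is the two-tailed 5% critical t for df = 9, and the data are hypothetical.

```python
import math

def ci_mean(ys, t_crit):
    """Ybar +/- t_crit * SE, where t_crit is t_alpha(2),df for df = n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    se = s / math.sqrt(n)
    return ybar - t_crit * se, ybar + t_crit * se

# Hypothetical sample of n = 10; 2.262 is the two-tailed 5% critical t for df = 9.
data = [4.1, 5.0, 4.8, 5.3, 4.6, 5.1, 4.9, 5.4, 4.7, 5.2]
lower, upper = ci_mean(data, t_crit=2.262)
print(f"mean = {sum(data) / len(data):.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```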

Ch. 12 · CI for Difference in Means

(\bar{Y}_1 - \bar{Y}_2) \pm SE_{\bar{Y}_1-\bar{Y}_2} \cdot t_{\alpha(2),\, df}

Interval for the true difference μ₁ − μ₂. If it excludes 0, the difference is significant at level α.

When

After a two-sample t-test to report the plausible range of the difference.

Ch. 7 · Agresti-Coull (Proportion CI)

\tilde{p} = \dfrac{X+2}{n+4}, \quad \tilde{p} \pm 1.96\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}

Gives better coverage than the Wald interval, especially for small n or for p near 0 or 1. Adding 2 phantom successes and 2 phantom failures pulls the estimate toward 0.5, which stabilizes the interval.

When

Estimating a population proportion p. Always preferred over Wald CI in BIOL 300.
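Below is a Python sketch that contrasts the Wald and Agresti-Coull intervals for a hypothetical 0 successes in 20 trials. The clipping of limits to [0, 1] is an assumption added here (a common convention), not part of the formula above.

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald interval, shown only for comparison."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def agresti_coull_ci(x, n, z=1.96):
    """Agresti-Coull: add 2 successes and 2 failures, then build the interval around p-tilde."""
    p_tilde = (x + 2) / (n + 4)
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # Assumption: limits are clipped to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)

# Hypothetical: 0 successes in 20 trials.
print(wald_ci(0, 20))           # zero-width interval at 0
print(agresti_coull_ci(0, 20))  # lower limit clipped to 0
```

The Wald interval collapses to a single point when X = 0, while Agresti-Coull still gives a usable upper limit (about 0.19).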

Ch. 11 · CI for the Variance

\dfrac{df \cdot s^2}{\chi^2_{\alpha/2,\, df}} \leq \sigma^2 \leq \dfrac{df \cdot s^2}{\chi^2_{1-\alpha/2,\, df}}

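Uses two χ² critical values, so the interval is asymmetric (χ² is right-skewed). χ²_{α/2, df} is the critical value with α/2 of the distribution to its right (the larger value), so it gives the lower limit. df = n−1. Assumes the population is normal.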

When

Estimating population variance σ² directly from a single sample.
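Here is a Python sketch of the variance CI. It assumes that both χ² critical values come from a table. For a 95% CI with df = 9, these are 19.02 (α/2 in the upper tail) and 2.70 (1 − α/2 in the upper tail). The sample variance is hypothetical.

```python
def ci_variance(s2, df, chi2_upper, chi2_lower):
    """CI for sigma^2 from a sample variance s2 with df = n - 1.

    chi2_upper: critical value with alpha/2 in the upper tail (gives the LOWER limit).
    chi2_lower: critical value with 1 - alpha/2 in the upper tail (gives the UPPER limit).
    """
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Hypothetical: n = 10 (df = 9), s^2 = 4.0; table values for a 95% CI with df = 9.
low, high = ci_variance(4.0, 9, chi2_upper=19.02, chi2_lower=2.70)
print(low, high)
```

The interval (about 1.89 to 13.3) is far from symmetric around s² = 4.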

Chi-Square & Proportions

3 formulas

Ch. 8 · Chi-Square Statistic

\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}

Larger χ² = more departure from H₀. For GOF, df = (number of categories − 1) − (number of parameters estimated from the data); for an r × c contingency table, df = (r − 1)(c − 1). Assumptions: no expected count below 1, and no more than 20% of cells with an expected count below 5.

When

Testing fit to a theoretical distribution (GOF) or independence in a contingency table.
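The following Python sketch computes the χ² statistic and checks the expected-count rules of thumb. The 3:1 goodness-of-fit counts are hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts_ok(expected):
    """Rule of thumb: no E below 1, and at most 20% of E below 5."""
    if any(e < 1 for e in expected):
        return False
    return sum(e < 5 for e in expected) <= 0.2 * len(expected)

# Hypothetical GOF test of a 3:1 ratio with 400 offspring.
observed = [290, 110]
expected = [300, 100]
print(chi_square(observed, expected), expected_counts_ok(expected))
```

Here no parameters are estimated from the data, so df = 2 − 1 = 1.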

Ch. 9 · Odds Ratio

\widehat{OR} = \dfrac{a \cdot d}{b \cdot c}

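From a 2×2 table (a = top-left, b = top-right, c = bottom-left, d = bottom-right), with group 1 in the first row (or column) and the event in the first column (or row). OR > 1: the odds of the event are higher in group 1.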

When

Measuring association between two binary variables in a 2×2 contingency table.

Ch. 9 · CI for Odds Ratio

\ln(\widehat{OR}) \pm Z \cdot SE[\ln(\widehat{OR})], \quad SE[\ln(\widehat{OR})] = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}

The CI is built on the log scale (where ln(OR) is approximately normal), and its limits are then exponentiated. Z = 1.96 for a 95% CI. If the CI excludes 1, the association is significant.

When

Reporting uncertainty around an estimated odds ratio. If CI includes 1, no significant association.
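Below is a Python sketch of the odds ratio with its log-scale CI, using SE[ln(OR)] = √(1/a + 1/b + 1/c + 1/d) and Z = 1.96. The 2×2 counts are hypothetical, and all four cells must be nonzero for this formula.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a CI built on the log scale and back-transformed."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    return or_hat, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Hypothetical 2x2 table: a = 20, b = 10, c = 10, d = 20.
or_hat, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(or_hat, lower, upper)
```

Because the interval is symmetric on the log scale, the product of the two limits equals OR².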

t-Tests

7 formulas

Ch. 11 · One-Sample t

t = \dfrac{\bar{Y} - \mu_0}{s / \sqrt{n}}

Tests H₀: μ = μ₀. Large |t| means Ȳ is far from μ₀ in SE units. df = n−1.

When

Comparing a sample mean to a specific hypothesized value. One group, one mean.
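Here is a minimal Python sketch of the one-sample t statistic and its df, using a hypothetical sample and a hypothesized μ₀ = 4. The P-value or critical-value lookup is left to a t table.

```python
import math

def one_sample_t(ys, mu0):
    """t = (Ybar - mu0) / (s / sqrt(n)), returned with df = n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return (ybar - mu0) / (s / math.sqrt(n)), n - 1

ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
t, df = one_sample_t(ys, mu0=4.0)
print(t, df)  # compare |t| to the critical t for df = 7
```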

Ch. 12 · Pooled Sample Variance

s_p^2 = \dfrac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}

Weighted average of both groups' variances. Valid only when assuming σ₁² = σ₂².

When

First step in the pooled two-sample t-test.

Ch. 12 · SE for Pooled Two-Sample t

SE_{\bar{Y}_1-\bar{Y}_2} = \sqrt{s_p^2\!\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}

Standard error of the difference in means, assuming equal variances.

When

Part of the pooled two-sample t-test, after computing sₚ².

Ch. 12 · Two-Sample t (Pooled)

t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1-\bar{Y}_2}}

Tests H₀: μ₁ = μ₂ assuming equal variances. df = n₁ + n₂ − 2.

When

Comparing two independent group means when σ₁² = σ₂² is plausible.
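The following Python sketch chains the three pooled steps: pooled variance, then SE, then t. It uses the standard-library `statistics` module and toy data.

```python
import math
import statistics

def pooled_two_sample_t(y1, y2):
    """Pooled two-sample t, assuming equal population variances."""
    n1, n2 = len(y1), len(y2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooled variance: df-weighted average of the two sample variances.
    sp2 = (df1 * statistics.variance(y1) + df2 * statistics.variance(y2)) / (df1 + df2)
    # Standard error of the difference in means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(y1) - statistics.mean(y2)) / se
    return t, df1 + df2, se

t, df, se = pooled_two_sample_t([1, 2, 3], [4, 5, 6])  # toy data
print(t, df, se)
```

The returned `se` is the same SE used in the CI for μ₁ − μ₂.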

Ch. 12 · Welch's Two-Sample t

t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

Does NOT assume equal variances. More robust than pooled version. Uses Welch-Satterthwaite df.

When

Default choice for two independent means — use unless told variances are equal.

Ch. 12 · Welch-Satterthwaite df

df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}

Approximate df for Welch's t-test. Always round down. Will be between min(n₁,n₂)−1 and n₁+n₂−2.

When

Needed to look up the critical t-value in Welch's t-test.
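Below is a Python sketch of Welch's t together with the Welch-Satterthwaite df, computed on toy data with unequal spread. The function returns the unrounded df, and the usage line rounds it down.

```python
import math
import statistics

def welch_t(y1, y2):
    """Welch's t and the Welch-Satterthwaite df (unrounded)."""
    v1 = statistics.variance(y1) / len(y1)  # s1^2 / n1
    v2 = statistics.variance(y2) / len(y2)  # s2^2 / n2
    t = (statistics.mean(y1) - statistics.mean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(y1) - 1) + v2 ** 2 / (len(y2) - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [10, 20, 30])  # toy data with unequal spread
print(t, df, math.floor(df))  # round df down before using a t table
```

Here df ≈ 2.06, close to the lower bound min(n₁, n₂) − 1 = 2, because the smaller group has by far the larger variance.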

Ch. 12 · Variance Ratio F-Test

F = \dfrac{s_1^2}{s_2^2}

Tests H₀: σ₁² = σ₂² (equal variances). Put the larger variance in the numerator and call that group 1, so df₁ = n₁−1 (numerator) and df₂ = n₂−1 (denominator).

When

Tests the null hypothesis that two normal populations have the same variance; it is very sensitive to departures from normality. Do NOT use it to decide whether to use Welch's t-test.

Regression & Correlation

19 formulas

Ch. 16 · Sum of Cross-Products

SP_{xy} = \sum(X_i-\bar{X})(Y_i-\bar{Y}) = \sum XY - \dfrac{(\sum X)(\sum Y)}{n}

Measures how X and Y covary. Computational form avoids rounding error.

When

First step in regression and correlation. Compute alongside SS_x.

Ch. 16 · Sum of Squares for X

SS_x = \sum(X_i-\bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}

Total variability in predictor X. Note: (ΣX)²/n is the correction factor.

When

Required for slope b, correlation r, and SE_b.

Ch. 17 · SS Total (for Y)

SS_{Total} = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n}

Total variability in the response variable Y. Used to partition variance into regression and residual.

When

Computing r² and MS_residual in regression.

Ch. 17 · SS Regression

SS_{reg} = b \cdot SP_{xy}

Variability in Y explained by the linear relationship with X.

When

Computing r² = SS_regression / SS_total.

Ch. 17 · SS Residual

SS_{res} = SS_{Total} - SS_{reg}

Variability in Y NOT explained by X. Used to compute MS_residual.

When

Computing MS_residual = SS_residual / (n−2).

Ch. 17 · Regression Slope

b = \dfrac{SP_{xy}}{SS_x}

Change in predicted Y per 1-unit increase in X. Sign = direction, magnitude = rate. df = n−2.

When

The key parameter in linear regression.

Ch. 17 · Regression Intercept

a = \bar{Y} - b\bar{X}

Predicted Y when X = 0. Often meaningless if X = 0 is outside the data range. Fitted line: Ŷ = a + bX.

When

After computing b. Needed to make predictions.

Ch. 17 · Residual Mean Square

MS_{res} = \dfrac{SS_{res}}{n-2}

Average squared deviation from the fitted line. Smaller = better fit. df = n−2.

When

Computing SE_b and assessing model fit.

Ch. 17 · Standard Error of Slope

SE_b = \sqrt{\dfrac{MS_{res}}{SS_x}}

Precision of the estimated slope b.

When

Testing H₀: β = 0, or building a CI for the true slope.

Ch. 17 · CI for Slope

b \pm t_{\alpha(2),\, df} \cdot SE_b

Confidence interval for the true slope β. df = n−2. If CI excludes 0, slope is significant.

When

Reporting uncertainty in a regression slope estimate.

Ch. 17 · t-Test for Slope

t = \dfrac{b - \beta_0}{SE_b}

Tests H₀: β = β₀ (usually β₀ = 0, i.e. no linear relationship). df = n−2.

When

Testing whether the linear slope differs from a null hypothesized value (often 0).
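Here is a Python sketch that runs the regression formulas on this sheet in order: SP_xy, SS_x, SS_Total, b, a, SS_reg, SS_res, MS_res, SE_b, t (for β₀ = 0) and r². The five data points are hypothetical.

```python
import math

def simple_regression(xs, ys):
    """Least-squares regression using the sums-of-squares formulas on this sheet."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_total = sum(y * y for y in ys) - sum_y ** 2 / n
    b = sp_xy / ss_x                   # slope
    a = sum_y / n - b * sum_x / n      # intercept: Ybar - b * Xbar
    ss_reg = b * sp_xy                 # variation in Y explained by X
    ss_res = ss_total - ss_reg         # unexplained variation
    ms_res = ss_res / (n - 2)
    se_b = math.sqrt(ms_res / ss_x)
    return {
        "a": a, "b": b, "r2": ss_reg / ss_total,
        "ms_res": ms_res, "se_b": se_b,
        "t": b / se_b,                 # t for H0: beta = 0
        "df": n - 2,
    }

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(simple_regression(xs, ys))
```

For these data, Ŷ = 2.2 + 0.6X, r² = 0.6 and MS_res = 0.8 with df = 3.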

Ch. 17 · CI for Predicted Value

\hat{Y} \pm t_{\alpha(2),\, df} \cdot SE_{\hat{Y}}, \quad SE_{\hat{Y}} = \sqrt{MS_{res}\!\left(\dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{SS_x}\right)}

Confidence interval for the mean response at a given X; df = n−2. SE_Ŷ accounts for uncertainty in both a and b, and it grows as X moves away from X̄.

When

Estimating the mean Y at a specific X value.

Ch. 17 · Coefficient of Determination (r²)

r^2 = \dfrac{SS_{reg}}{SS_{Total}}

Proportion of variance in Y explained by X. r² = 0.80 means 80% of variability is accounted for.

When

Reporting goodness of fit. Ranges 0 to 1.

Ch. 16 · Pearson Correlation

r = \dfrac{SP_{xy}}{\sqrt{SS_x \cdot SS_y}}

Standardized linear association. Ranges −1 to +1. Measures only linear relationships.

When

Measuring strength/direction of linear association between two continuous variables.

Ch. 16 · Standard Error of r

SE_r = \sqrt{\dfrac{1 - r^2}{n-2}}

Use for testing H₀: ρ = 0 via t = r / SE_r, df = n−2. Not for CIs — use Fisher's z instead.

When

Hypothesis testing for correlation only.
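The following Python sketch computes Pearson's r from the sums of squares and then tests H₀: ρ = 0 with t = r / SE_r. The data are hypothetical. On the same data, this t equals the t for testing β = 0 in simple linear regression.

```python
import math

def pearson_r(xs, ys):
    """r = SP_xy / sqrt(SS_x * SS_y), using the computational sums."""
    n = len(xs)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)

def t_test_r(r, n):
    """t = r / SE_r with SE_r = sqrt((1 - r^2) / (n - 2)); df = n - 2."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se_r, n - 2

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t, df = t_test_r(r, len(xs))
print(r, t, df)
```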

Ch. 16 · Fisher's z-Transform

z = \tfrac{1}{2}\ln\!\dfrac{1+r}{1-r}, \quad \sigma_z = \dfrac{1}{\sqrt{n-3}}

Transforms r to approximately normal z. Build CI on z-scale, back-transform: r = (e²ᶻ−1)/(e²ᶻ+1).

When

Building confidence intervals for the Pearson correlation r.
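Below is a Python sketch of the Fisher z confidence interval for a hypothetical r = 0.5 from n = 30. It transforms r to z, builds a normal interval with σ_z = 1/√(n−3), and then back-transforms the limits with the formula above (equivalent to tanh).

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """CI for rho: transform r to z, build a normal CI, then back-transform."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z (same as math.atanh(r))
    sigma_z = 1 / math.sqrt(n - 3)
    lo_z, hi_z = z - z_crit * sigma_z, z + z_crit * sigma_z

    def back(zz):
        # r = (e^(2z) - 1) / (e^(2z) + 1), i.e. tanh(z)
        return (math.exp(2 * zz) - 1) / (math.exp(2 * zz) + 1)

    return back(lo_z), back(hi_z)

print(fisher_ci(0.5, 30))  # hypothetical r = 0.5 from n = 30
```

The resulting interval is asymmetric around r, which is why the CI is built on the z scale.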

Ch. 13 · Spearman Rank Correlation

r_s = 1 - \dfrac{6\sum d_i^2}{n^3 - n}

Non-parametric correlation using ranks; dᵢ = rank(Xᵢ) − rank(Yᵢ). Measures monotonic association. This shortcut formula is exact only when there are no tied ranks.

When

Data violate normality or are ordinal. Identify-only in BIOL 300.
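Here is a Python sketch of the shortcut formula, restricted to data without ties. The helper `ranks` is hypothetical and written only for this illustration.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rs(xs, ys):
    """Shortcut r_s = 1 - 6*sum(d^2) / (n^3 - n); valid only without ties."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("tied values: compute Pearson r on the ranks instead")
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n ** 3 - n)

print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

A perfectly monotonic but nonlinear relationship, such as Y = X³, still gives r_s = 1.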

Ch. 17 · Pooled MS_error (Comparing Regressions)

(MS_e)_p = \dfrac{(SS_e)_1 + (SS_e)_2}{(df_e)_1 + (df_e)_2}

Combines residual variance from two separate regressions, assuming they share the same σ². Used when comparing slopes.

When

Testing whether two regression lines have the same slope (H₀: β₁ = β₂).

Ch. 17 · SE for Difference in Slopes

SE_{b_1-b_2} = \sqrt{\dfrac{(MS_e)_p}{SS_{x_1}} + \dfrac{(MS_e)_p}{SS_{x_2}}}

Standard error of the difference between two regression slopes. Uses pooled residual MS.

When

Building a t-test or CI for the difference between two slopes: t = (b₁−b₂) / SE_{b₁−b₂}, with df = (df_e)₁ + (df_e)₂ = n₁ + n₂ − 4.
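The following Python sketch compares two slopes from summary statistics of two separate regressions: b, SS_e, df_e = n − 2 and SS_x for each line. The summary values are hypothetical.

```python
import math

def compare_slopes(b1, ss_e1, df_e1, ss_x1, b2, ss_e2, df_e2, ss_x2):
    """t-test of H0: beta1 = beta2 using the pooled residual mean square."""
    ms_pooled = (ss_e1 + ss_e2) / (df_e1 + df_e2)
    se_diff = math.sqrt(ms_pooled / ss_x1 + ms_pooled / ss_x2)
    return (b1 - b2) / se_diff, df_e1 + df_e2

# Hypothetical summaries from two separate regressions (each df_e = n - 2).
t, df = compare_slopes(0.6, 2.4, 3, 10.0, 0.0, 2.4, 3, 10.0)
print(t, df)
```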

ANOVA

6 formulas

Ch. 15 · Grand Mean

\bar{Y} = \dfrac{\sum n_i \bar{Y}_i}{N}

Overall mean weighted by group size. N = total sample size.

When

Computing MS_groups and R² in ANOVA.

Ch. 15 · MS Between Groups

MS_{groups} = \dfrac{\sum n_i(\bar{Y}_i - \bar{Y})^2}{k-1}

Between-group variability weighted by n. k = number of groups, df = k−1. Large = groups spread far apart.

When

Numerator of F. Reflects signal.

Ch. 15 · MS Within Groups (Error)

MS_{error} = \dfrac{\sum s_i^2(n_i-1)}{N-k}

Pooled within-group variance. Also called MS_within or s²_pooled. df = N−k. Assumes equal variances.

When

Denominator of F and Tukey-Kramer SE. Reflects noise.

Ch. 15 · F-Statistic

F = \dfrac{MS_{groups}}{MS_{error}}

Signal-to-noise ratio. Under H₀, F ≈ 1. df₁ = k−1, df₂ = N−k.

When

Testing H₀: all group means equal. Use for 3+ groups.
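Below is a Python sketch of a one-way ANOVA built from the grand mean, MS_groups, MS_error, F and R² formulas in this section. The groups are toy data.

```python
import statistics

def one_way_anova(groups):
    """Return F, (df_groups, df_error), MS_error and R^2 for a list of samples."""
    k = len(groups)
    ns = [len(g) for g in groups]
    big_n = sum(ns)
    means = [statistics.mean(g) for g in groups]
    grand = sum(n * m for n, m in zip(ns, means)) / big_n  # weighted grand mean
    ss_groups = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    ss_error = sum((n - 1) * statistics.variance(g) for n, g in zip(ns, groups))
    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (big_n - k)
    r2 = ss_groups / (ss_groups + ss_error)  # SS_total = SS_groups + SS_error
    return ms_groups / ms_error, (k - 1, big_n - k), ms_error, r2

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # toy data
```

With only two groups, F equals the square of the pooled two-sample t.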

Ch. 15 · ANOVA R²

R^2 = \dfrac{SS_{groups}}{SS_{total}}

Proportion of variance explained by group membership (η²). Effect size measure.

When

After significant F-test. Report as effect size.

Ch. 15 · Tukey-Kramer q

q = \dfrac{\bar{Y}_i - \bar{Y}_j}{SE}, \quad SE = \sqrt{s^2_{pooled}\!\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}

Post-hoc pairwise comparison that controls the family-wise error rate. s²_pooled = MS_error. Compare |q| to the critical q value from Table F, using k groups and N−k df.

When

After significant ANOVA, to find which specific pairs differ.
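Here is a Python sketch that computes q for every pair of groups from hypothetical group means, group sizes and MS_error. It assumes that you then compare each |q| with a critical value looked up by hand, since the sketch does not compute one.

```python
import itertools
import math

def tukey_kramer_q(means, ns, ms_error):
    """q for every pair of groups; compare |q| to a tabled critical q (k groups, N - k df)."""
    results = {}
    for i, j in itertools.combinations(range(len(means)), 2):
        se = math.sqrt(ms_error * (1 / ns[i] + 1 / ns[j]))
        results[(i, j)] = (means[i] - means[j]) / se
    return results

# Hypothetical summaries: three groups of 3 with MS_error = 1.0.
for pair, q in tukey_kramer_q([2.0, 5.0, 8.0], [3, 3, 3], ms_error=1.0).items():
    print(pair, round(q, 3))
```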

BIOL 300 · UBC