BIOL 300 Practice Hub/Hypothesis Test Generator

Formula Sheet

Descriptive Statistics

Ch.3 Sample Mean

\bar{Y} = \dfrac{\sum Y_i}{n}

The arithmetic average — add all observations and divide by the count. Best estimate of the population mean μ.

When

Always the starting point. Pair with the standard error to quantify precision.

Ch.3 Standard Deviation (conceptual)

s = \sqrt{\dfrac{\sum(Y_i - \bar{Y})^2}{n-1}}

How spread out data are around the mean. Dividing by n−1 corrects for bias (Bessel's correction).

When

Quantifying variability. Used in t-tests, CIs, and ANOVA.

Ch.3 Standard Deviation (computational)

s = \sqrt{\dfrac{\sum Y_i^2 - n\bar{Y}^2}{n-1}}

Algebraically equivalent but avoids rounding errors when computing by hand. ΣYᵢ² = square each value then sum.

When

Preferred for hand calculations from raw data.

Ch.4 Standard Error of the Mean

SE_{\bar{Y}} = \dfrac{s}{\sqrt{n}}

How much Ȳ varies from sample to sample. Larger n → smaller SE. Doubling precision requires quadrupling n.

When

Building confidence intervals for μ, or as the t-test denominator.
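
The four descriptive formulas above can be checked against each other in a few lines. This is a minimal sketch: the function name `describe` and the data values are hypothetical, chosen only for illustration.

```python
import math

def describe(y):
    """Return the mean, both forms of the standard deviation, and the SE."""
    n = len(y)
    mean = sum(y) / n
    # Conceptual form: squared deviations from the mean
    s_conceptual = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    # Computational form: sum of squared values minus n * mean^2
    s_computational = math.sqrt((sum(v * v for v in y) - n * mean ** 2) / (n - 1))
    # Standard error of the mean
    se = s_conceptual / math.sqrt(n)
    return mean, s_conceptual, s_computational, se

# Hypothetical measurements (illustration only)
mean, s1, s2, se = describe([2.0, 4.0, 4.0, 6.0])
```

Both forms of s should agree up to floating-point rounding, which is a quick way to catch an arithmetic slip in a hand calculation.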

Probability Distributions

Ch.7 Binomial Distribution

\Pr[X=x] = \binom{n}{x}\, p^{\,x}\,(1-p)^{n-x}

Probability of exactly x successes in n independent trials, each with probability p. Mean = np, Variance = np(1−p).

When

Counting successes in a fixed number of binary trials. E.g. # of mutant offspring.
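
As a rough illustration of the binomial formula, the sketch below evaluates it directly. The function name `binomial_pmf` and the values n = 4, p = 0.25 are assumptions made for this example, not course data.

```python
import math

def binomial_pmf(x, n, p):
    """Pr[X = x] for n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical cross: 4 offspring, each mutant with probability 0.25
n, p = 4, 0.25
pr_two_mutants = binomial_pmf(2, n, p)

# The probabilities over x = 0..n sum to 1, and the mean equals n * p
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
```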

Ch.8 Poisson Distribution

\Pr[X=x] = \dfrac{\mu^x \cdot e^{-\mu}}{x!}

Probability of x events when the average rate is μ. Key property: mean = variance = μ.

When

Counting rare, random events per unit time or space. Test fit with χ² GOF.
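
The mean = variance property can be checked numerically. A minimal sketch, assuming a hypothetical rate μ = 2 and truncating the infinite sum at 60 terms (the remaining tail is negligible at this rate):

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] when events occur at average rate mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

mu = 2.0  # hypothetical rate, illustration only
probs = [poisson_pmf(x, mu) for x in range(60)]

# Mean and variance computed from the (truncated) distribution
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
```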

Ch.10 Normal Distribution

f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\dfrac{(x-\mu)^2}{2\sigma^2}}

The bell curve, described by mean μ and variance σ². 68% within 1σ, 95% within 2σ, 99.7% within 3σ.

When

Describing symmetric continuous measurements. Many parametric tests assume normally distributed errors/residuals, or normality within groups, especially for small samples.

Ch.5 Bayes' Theorem

\Pr[A|B] = \dfrac{\Pr[B|A]\cdot\Pr[A]}{\Pr[B]}

Updates probability of A given evidence B. Pr[A] = prior, Pr[B|A] = likelihood, Pr[A|B] = posterior.

When

Reversing conditional probabilities. E.g. given a positive test, actual probability of disease?
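
The disease-test question can be worked through as below. This is a sketch under stated assumptions: the prevalence, sensitivity, and false-positive rate are made-up numbers, and Pr[B] is expanded with the law of total probability, which the formula above leaves implicit.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Pr[A|B], with A = has disease and B = tests positive."""
    # Pr[B] = Pr[B|A] Pr[A] + Pr[B|not A] Pr[not A]
    pr_b = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / pr_b

# Hypothetical values: 1% prevalence, 90% sensitivity, 5% false positives
p_disease_given_positive = posterior(0.01, 0.90, 0.05)
```

With these illustrative numbers the posterior is roughly 15%, far below the 90% sensitivity, because the disease is rare.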

Confidence Intervals

Ch.4 CI for the Mean

\bar{Y} \pm SE_{\bar{Y}} \cdot t_{\alpha(2),\, df}

95% CI (α = 0.05): if the study were repeated many times, about 95% of the intervals built this way would contain the true μ. df = n−1.

When

After estimating a mean. Report as: Ȳ = X (95% CI: lower, upper).
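
A minimal sketch of the interval calculation follows. The critical value is passed in rather than computed, on the assumption that it is read from a t table; the function name `ci_mean` and the data are hypothetical.

```python
import math

def ci_mean(y, t_crit):
    """Confidence interval for the mean, given a two-tailed critical t."""
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    se = s / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical data with n = 4, so df = 3; 3.182 is the tabled
# two-tailed critical t for alpha = 0.05 and df = 3
lower, upper = ci_mean([2.0, 4.0, 4.0, 6.0], t_crit=3.182)
```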

Ch.12 CI for Difference in Means

(\bar{Y}_1 - \bar{Y}_2) \pm SE_{\bar{Y}_1-\bar{Y}_2} \cdot t_{\alpha(2),\, df}

Interval for the true difference μ₁ − μ₂. If it excludes 0, the difference is significant at level α.

When

After a two-sample t-test to report the plausible range of the difference.

Ch.7 Agresti-Coull (Proportion CI)

\tilde{p} = \dfrac{X+2}{n+4}, \quad \tilde{p} \pm 1.96\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}

Better than the Wald interval. Adding 2 phantom successes and 2 failures stabilizes the interval near 0 or 1.

When

Estimating a population proportion p. Always preferred over Wald CI in BIOL 300.
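
The Agresti-Coull interval is short enough to sketch directly. The function name and the counts (8 successes in 16 trials) are hypothetical.

```python
import math

def agresti_coull(x, n, z=1.96):
    """Approximate 95% CI for a proportion (z = 1.96)."""
    p_tilde = (x + 2) / (n + 4)
    half_width = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, p_tilde - half_width, p_tilde + half_width

# Hypothetical sample: 8 successes out of 16 trials
p_tilde, lower, upper = agresti_coull(8, 16)
```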

Ch.11 CI for the Variance

\dfrac{df \cdot s^2}{\chi^2_{\alpha/2,\, df}} \leq \sigma^2 \leq \dfrac{df \cdot s^2}{\chi^2_{1-\alpha/2,\, df}}

Uses two χ² critical values (asymmetric interval because χ² is skewed). df = n−1.

When

Estimating population variance σ² directly from a single sample.
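
The sketch below shows which critical value goes with which bound. It assumes the subscript on χ² denotes the area to the right of the critical value, so χ²_{α/2} is the larger of the two; the parameter names and the sample values are hypothetical, and the critical values are read from a χ² table rather than computed.

```python
def ci_variance(s2, n, chi2_upper_tail, chi2_lower_tail):
    """CI for sigma^2 from a sample variance s2 with df = n - 1.

    chi2_upper_tail: critical value with area alpha/2 to its right
    chi2_lower_tail: critical value with area 1 - alpha/2 to its right
    """
    df = n - 1
    # Dividing by the larger critical value gives the lower bound
    return df * s2 / chi2_upper_tail, df * s2 / chi2_lower_tail

# Hypothetical sample: s^2 = 4.0, n = 10 (df = 9); tabled values for a 95% CI
lower, upper = ci_variance(4.0, 10, chi2_upper_tail=19.02, chi2_lower_tail=2.70)
```

The interval is not centred on s², which reflects the skew of the χ² distribution.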

Chi-Square & Proportions

Ch.8 Chi-Square Statistic

\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}

Larger χ² = more departure from H₀. For goodness-of-fit, df = (categories − 1) minus the number of parameters estimated from the data; for a contingency table, df = (rows − 1)(columns − 1). Assumptions: no expected count below 1, and no more than 20% of cells with expected count < 5.

When

Testing fit to a theoretical distribution (GOF) or independence in a contingency table.
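
A goodness-of-fit calculation might look like the sketch below. The counts are hypothetical (100 offspring tested against a 3:1 ratio), and no parameters are estimated from the data, so df = categories − 1.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 70 vs 30 observed, 3:1 ratio expected for n = 100
observed = [70, 30]
expected = [75.0, 25.0]

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # no parameters estimated from the data
```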

Ch.9 Odds Ratio

\widehat{OR} = \dfrac{a \cdot d}{b \cdot c}

From a 2×2 table (a=top-left, b=top-right, c=bottom-left, d=bottom-right). OR > 1: event more likely in group 1.

When

Measuring association between two binary variables in a 2×2 contingency table.

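Ch.9 CI for Odds Ratio

\ln(\widehat{OR}) \pm Z_{\alpha(2)} \cdot SE[\ln(\widehat{OR})]

The CI is built on the log scale (where ln(OR) is approximately normal), then exponentiated. Z is the two-tailed standard normal critical value (1.96 for a 95% CI), and SE[ln(OR)] = √(1/a + 1/b + 1/c + 1/d). If the CI excludes 1, the association is significant.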

When

Reporting uncertainty around an estimated odds ratio. If CI includes 1, no significant association.
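
The log-scale construction can be sketched as follows. The cell counts are hypothetical, and the standard error uses the usual large-sample formula √(1/a + 1/b + 1/c + 1/d), which is an addition here and assumes no cell count is zero.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio from a 2x2 table with an approximate 95% CI."""
    or_hat = (a * d) / (b * c)
    # Large-sample SE of ln(OR); requires all four cells to be non-zero
    se_ln = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_or = math.log(or_hat)
    # Build the interval on the log scale, then exponentiate both limits
    return or_hat, math.exp(ln_or - z * se_ln), math.exp(ln_or + z * se_ln)

# Hypothetical 2x2 table: a = 20, b = 10, c = 10, d = 20
or_hat, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

The resulting interval is symmetric around ln(OR) but not around OR itself.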

t-Tests

Ch.11 One-Sample t

t = \dfrac{\bar{Y} - \mu_0}{s / \sqrt{n}}

Tests H₀: μ = μ₀. Large |t| means Ȳ is far from μ₀ in SE units. df = n−1.

When

Comparing a sample mean to a specific hypothesized value. One group, one mean.

Ch.12 Pooled Sample Variance

s_p^2 = \dfrac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}

Weighted average of both groups' variances. Valid only when assuming σ₁² = σ₂².

When

First step in the pooled two-sample t-test.

Ch.12 SE for Pooled Two-Sample t

SE_{\bar{Y}_1-\bar{Y}_2} = \sqrt{s_p^2\!\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}

Standard error of the difference in means, assuming equal variances.

When

Part of pooled two-sample t-test, after computing sp².

Ch.12 Two-Sample t (Pooled)

t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1-\bar{Y}_2}}

Tests H₀: μ₁ = μ₂ assuming equal variances. df = n₁ + n₂ − 2.

When

Comparing two independent group means when σ₁² = σ₂² is plausible.
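
The three pooled steps (s_p², then the SE, then t) chain together as in this sketch. The function name and the summary statistics are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Pooled two-sample t from group sizes, means, and variances."""
    df1, df2 = n1 - 1, n2 - 1
    # Step 1: pooled variance, weighted by degrees of freedom
    sp2 = (df1 * var1 + df2 * var2) / (df1 + df2)
    # Step 2: standard error of the difference in means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Step 3: test statistic with df = n1 + n2 - 2
    t = (mean1 - mean2) / se
    return t, df1 + df2, sp2, se

# Hypothetical summaries: n = 10 per group, means 12 and 10, variances 4 and 6
t, df, sp2, se = pooled_t(10, 12.0, 4.0, 10, 10.0, 6.0)
```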

Ch.12 Welch's Two-Sample t

t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

Does NOT assume equal variances. More robust than pooled version. Uses Welch-Satterthwaite df.

When

Default choice for two independent means — use unless told variances are equal.

Ch.12 Welch-Satterthwaite df

df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}

Approximate df for Welch's t-test. Always round down. Will be between min(n₁,n₂)−1 and n₁+n₂−2.

When

Needed to look up the critical t-value in Welch's t-test.
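
Welch's t and its degrees of freedom can be computed together, as in this sketch. The function name and the summary statistics are hypothetical.

```python
import math

def welch_t(n1, mean1, var1, n2, mean2, var2):
    """Welch's t with Welch-Satterthwaite df, rounded down."""
    v1, v2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, math.floor(df)

# Hypothetical summaries: n = 10 per group, means 12 and 10, variances 4 and 6
t, df = welch_t(10, 12.0, 4.0, 10, 10.0, 6.0)
```

For these values the unrounded df is about 17.3, so the rounded-down df of 17 falls between min(n₁, n₂) − 1 = 9 and n₁ + n₂ − 2 = 18, as expected.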

Ch.12 Variance Ratio F-Test

F = \dfrac{s_1^2}{s_2^2}

Tests H₀: σ₁² = σ₂² (equal variances). Place the larger variance in the numerator. df₁ = n₁−1, df₂ = n₂−1.

When

A test for the null hypothesis that two normal populations have the same variance. Do NOT use it to decide whether to use Welch's t-test.

Regression & Correlation

Ch.16 Sum of Cross-Products

SP_{xy} = \sum(X_i-\bar{X})(Y_i-\bar{Y}) = \sum XY - \dfrac{(\sum X)(\sum Y)}{n}

Measures how X and Y covary. Computational form avoids rounding error.

When

First step in regression and correlation. Compute alongside SS_x.

Ch.16 Sum of Squares for X

SS_x = \sum(X_i-\bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}

Total variability in predictor X. Note: (ΣX)²/n is the correction factor.

When

Required for slope b, correlation r, and SE_b.

Ch.17 SS Total (for Y)

SS_{Total} = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n}

Total variability in the response variable Y. Used to partition variance into regression and residual.

When

Computing r² and MS_residual in regression.

Ch.17 SS Regression

SS_{reg} = b \cdot SP_{xy}

Variability in Y explained by the linear relationship with X.

When

Computing r² = SS_regression / SS_total.

Ch.17 SS Residual

SS_{res} = SS_{Total} - SS_{reg}

Variability in Y NOT explained by X. Used to compute MS_residual.

When

Computing MS_residual = SS_residual / (n−2).

Ch.17 Regression Slope

b = \dfrac{SP_{xy}}{SS_x}

Change in predicted Y per 1-unit increase in X. Sign = direction, magnitude = rate. df = n−2.

When

The key parameter in linear regression.

Ch.17 Regression Intercept

a = \bar{Y} - b\bar{X}

Predicted Y when X = 0. Often meaningless if X = 0 is outside the data range. Fitted line: Ŷ = a + bX.

When

After computing b. Needed to make predictions.

Ch.17 Residual Mean Square

MS_{res} = \dfrac{SS_{res}}{n-2}

Average squared deviation from the fitted line. Smaller = better fit. df = n−2.

When

Computing SE_b and assessing model fit.

Ch.17 Standard Error of Slope

SE_b = \sqrt{\dfrac{MS_{res}}{SS_x}}

Precision of the estimated slope b.

When

Testing H₀: β = 0, or building a CI for the true slope.

Ch.17 CI for Slope

b \pm t_{\alpha(2),\, df} \cdot SE_b

Confidence interval for the true slope β. df = n−2. If CI excludes 0, slope is significant.

When

Reporting uncertainty in a regression slope estimate.

Ch.17 t-Test for Slope

t = \dfrac{b - \beta_0}{SE_b}

Tests H₀: β = β₀ (usually β₀ = 0, i.e. no linear relationship). df = n−2.

When

Testing whether the linear slope differs from a null hypothesized value (often 0).

Ch.17 CI for Predicted Value

\hat{Y} \pm t_{\alpha(2),\, df} \cdot SE_{\hat{Y}}

Confidence interval for the mean response at a given X. SE_Ŷ accounts for uncertainty in both a and b.

When

Estimating the mean Y at a specific X value.

Ch.17 Coefficient of Determination (r²)

r^2 = \dfrac{SS_{reg}}{SS_{Total}}

Proportion of variance in Y explained by X. r² = 0.80 means 80% of variability is accounted for.

When

Reporting goodness of fit. Ranges 0 to 1.
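
The regression quantities build on one another in a fixed order, which the sketch below follows using the computational forms. The function name, the dictionary keys, and the five data points are hypothetical; the slope test shown is for H₀: β = 0.

```python
import math

def linear_regression(x, y):
    """Least-squares line and related quantities via the sheet's formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_total = sum(b * b for b in y) - sum_y ** 2 / n
    b = sp_xy / ss_x                      # slope
    a = sum_y / n - b * sum_x / n         # intercept
    ss_reg = b * sp_xy                    # explained variability
    ss_res = ss_total - ss_reg            # unexplained variability
    ms_res = ss_res / (n - 2)
    se_b = math.sqrt(ms_res / ss_x)
    return {
        "b": b, "a": a, "ss_reg": ss_reg, "ss_res": ss_res,
        "ms_res": ms_res, "se_b": se_b,
        "t": b / se_b,                    # tests H0: beta = 0, df = n - 2
        "r2": ss_reg / ss_total,
    }

# Hypothetical data (illustration only)
fit = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For these points SP_xy = 6, SS_x = 10, and SS_Total = 6, giving b = 0.6, a = 2.2, and r² = 0.6.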

Ch.16 Pearson Correlation

r = \dfrac{SP_{xy}}{\sqrt{SS_x \cdot SS_y}}

Standardized linear association. Ranges −1 to +1. Measures only linear relationships. SS_y is the sum of squares for Y, the same quantity as SS_Total in regression.

When

Measuring strength/direction of linear association between two continuous variables.

Ch.16 Standard Error of r

SE_r = \sqrt{\dfrac{1 - r^2}{n-2}}

Use for testing H₀: ρ = 0 via t = r / SE_r, df = n−2. Not for CIs — use Fisher's z instead.

When

Hypothesis testing for correlation only.

Ch.16 Fisher's z-Transform

z = \tfrac{1}{2}\ln\!\dfrac{1+r}{1-r}, \quad \sigma_z = \dfrac{1}{\sqrt{n-3}}

Transforms r to approximately normal z. Build CI on z-scale, back-transform: r = (e²ᶻ−1)/(e²ᶻ+1).

When

Building confidence intervals for the Pearson correlation r.
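
The correlation formulas split into a test (using SE_r) and an interval (using Fisher's z), and the sketch below keeps them separate. The function names and data are hypothetical, and the critical value 1.96 is an assumption corresponding to a 95% interval.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the computational sums."""
    n = len(x)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)

def correlation_inference(r, n, z_crit=1.96):
    """t statistic for H0: rho = 0 and a Fisher-z CI for rho."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    t = r / se_r                                  # df = n - 2
    z = 0.5 * math.log((1 + r) / (1 - r))         # Fisher's z
    sigma_z = 1 / math.sqrt(n - 3)

    def back_transform(zv):
        return (math.exp(2 * zv) - 1) / (math.exp(2 * zv) + 1)

    lower = back_transform(z - z_crit * sigma_z)
    upper = back_transform(z + z_crit * sigma_z)
    return t, lower, upper

# Hypothetical data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t, lower, upper = correlation_inference(r, len(x))
```

With only five points the interval is very wide and includes 0, even though r itself is fairly large.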

Ch.16 Spearman Rank Correlation

r_s = 1 - \dfrac{6\sum d_i^2}{n^3 - n}

Non-parametric correlation using ranks. dᵢ = rank(Xᵢ) − rank(Yᵢ). Measures monotonic association.

When

Data violate normality or are ordinal. Identify-only in BIOL 300.
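
For reference, the rank-difference formula can be sketched as below. It assumes there are no tied values, since the simple ranking helper does not average tied ranks; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n ** 3 - n)

# Hypothetical data (illustration only)
r_s = spearman([10, 20, 30, 40, 50], [1.2, 0.9, 2.5, 2.0, 3.1])
```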

Ch.17 Pooled MS_error (Comparing Regressions)

(MS_e)_p = \dfrac{(SS_e)_1 + (SS_e)_2}{(df_e)_1 + (df_e)_2}

Combines residual variance from two separate regressions, assuming they share the same σ². Used when comparing slopes.

When

Testing whether two regression lines have the same slope (H₀: β₁ = β₂).

Ch.17 SE for Difference in Slopes

SE_{b_1-b_2} = \sqrt{\dfrac{(MS_e)_p}{SS_{x_1}} + \dfrac{(MS_e)_p}{SS_{x_2}}}

Standard error of the difference between two regression slopes. Uses pooled residual MS.

When

Building a t-test or CI for the difference between two slopes: t = (b₁−b₂) / SE_{b₁−b₂}.

ANOVA

Ch.15 Grand Mean

\bar{Y} = \dfrac{\sum n_i \bar{Y}_i}{N}

Overall mean weighted by group size. N = total sample size.

When

Computing MS_groups and R² in ANOVA.

Ch.15 MS Between Groups

MS_{groups} = \dfrac{\sum n_i(\bar{Y}_i - \bar{Y})^2}{k-1}

Between-group variability weighted by n. k = number of groups, df = k−1. Large = groups spread far apart.

When

Numerator of F. Reflects signal.

Ch.15 MS Within Groups (Error)

MS_{error} = \dfrac{\sum s_i^2(n_i-1)}{N-k}

Pooled within-group variance. Also called MS_within or s²_pooled. df = N−k. Assumes equal variances.

When

Denominator of F and Tukey-Kramer SE. Reflects noise.

Ch.15 F-Statistic

F = \dfrac{MS_{groups}}{MS_{error}}

Signal-to-noise ratio. Under H₀, F ≈ 1. df₁ = k−1, df₂ = N−k.

When

Testing H₀: all group means equal. Use for 3+ groups.

Ch.15 ANOVA R²

R^2 = \dfrac{SS_{groups}}{SS_{total}}

Proportion of variance explained by group membership (η²). Effect size measure.

When

After significant F-test. Report as effect size.

Ch.15 Tukey-Kramer q

q = \dfrac{\bar{Y}_i - \bar{Y}_j}{SE}, \quad SE = \sqrt{s^2_{pooled}\!\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}

Post-hoc pairwise comparison controlling family-wise error. s²_pooled = MS_error. Compare to Table F critical value.

When

After significant ANOVA, to find which specific pairs differ.
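
The ANOVA formulas can be run from group summaries alone, as in this sketch. The function names and the three groups are hypothetical, and SS_total is taken as SS_groups + SS_error, which is the standard partition rather than something stated on this sheet.

```python
import math

def one_way_anova(groups):
    """groups: list of (n_i, mean_i, variance_i) tuples."""
    k = len(groups)
    big_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / big_n
    ss_groups = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_error = sum(s2 * (n - 1) for n, _, s2 in groups)
    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (big_n - k)
    return {
        "grand_mean": grand_mean,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "F": ms_groups / ms_error,                  # df = (k - 1, N - k)
        "R2": ss_groups / (ss_groups + ss_error),   # SS_groups / SS_total
    }

def tukey_q(mean_i, n_i, mean_j, n_j, ms_error):
    """Tukey-Kramer q for one pair of groups (s2_pooled = MS_error)."""
    se = math.sqrt(ms_error * (1 / n_i + 1 / n_j))
    return (mean_i - mean_j) / se

# Hypothetical summaries: three groups of 5 with equal variances
groups = [(5, 10.0, 4.0), (5, 12.0, 4.0), (5, 14.0, 4.0)]
result = one_way_anova(groups)
q_3_vs_1 = tukey_q(14.0, 5, 10.0, 5, result["ms_error"])
```

Here MS_groups = 20 and MS_error = 4, so F = 5 with df = (2, 12); the q value would then be compared with the tabled critical value.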

Quick Reference

Tests you must calculate

Ch.4 95% Confidence Interval for Mean
Ch.7 Binomial Test
Ch.8 Chi-Square Goodness-of-Fit Test
Ch.8 Poisson Goodness-of-Fit Test
Ch.9 Chi-Square Contingency Test
Ch.11 One-Sample t-Test
Ch.12 Two-Sample t-Test (Pooled Variance)
Ch.12 Paired t-Test
Ch.15 One-Way ANOVA
Ch.16 Pearson Correlation Coefficient
Ch.17 Linear Regression (Slope t-Test)

Identify-only tests

Ch.13 Mann-Whitney U Test
Ch.15 Kruskal-Wallis Test
Ch.16 Spearman Rank Correlation
Ch.9 Fisher's Exact Test
Ch.18 ANCOVA (Analysis of Covariance)
Ch.18 Multifactor ANOVA

Key formulas

t = (Ȳ − μ₀) / (s / √n)
χ² = Σ (O−E)² / E
F = MS_groups / MS_error
r = SP_xy / √(SS_x · SS_y)
b = SP_xy / SS_x
SE_b = √(MS_res / SS_x)