Reference

Formula Sheet

Each formula is followed by a plain-English explanation and guidance on when to use it.

Descriptive Statistics

4 formulas
Ch. 3 · Sample Mean
\bar{Y} = \dfrac{\sum Y_i}{n}

The arithmetic average — add all observations and divide by the count. Best estimate of the population mean μ.

When

Always the starting point. Pair with the standard error to quantify precision.

Ch. 3 · Standard Deviation (conceptual)
s = \sqrt{\dfrac{\sum(Y_i - \bar{Y})^2}{n-1}}

How spread out data are around the mean. Dividing by n−1 corrects for bias (Bessel's correction).

When

Quantifying variability. Used in t-tests, CIs, and ANOVA.

Ch. 3 · Standard Deviation (computational)
s = \sqrt{\dfrac{\sum Y_i^2 - n\bar{Y}^2}{n-1}}

Algebraically equivalent to the conceptual form, but avoids rounding each deviation when computing by hand. ΣYᵢ² means square each value, then sum.

When

Preferred for hand calculations from raw data.

Ch. 4 · Standard Error of the Mean
SE_{\bar{Y}} = \dfrac{s}{\sqrt{n}}

How much Ȳ varies from sample to sample. Larger n → smaller SE. Doubling precision (halving the SE) requires quadrupling n.

When

Building confidence intervals for μ, or as the t-test denominator.
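
As a rough illustration of the four descriptive formulas above, the sketch below computes the mean, both forms of the standard deviation, and the standard error for a small made-up sample. The data values are hypothetical and were chosen only so the arithmetic is easy to check by hand.

```python
import math

# Hypothetical sample (not from the course), chosen for easy hand-checking.
y = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(y)

# Sample mean: sum of observations divided by the count.
mean = sum(y) / n

# Conceptual form: sum of squared deviations from the mean, divided by n - 1.
ss_dev = sum((yi - mean) ** 2 for yi in y)
s_conceptual = math.sqrt(ss_dev / (n - 1))

# Computational form: sum of squared values minus n * mean^2, divided by n - 1.
sum_sq = sum(yi ** 2 for yi in y)
s_computational = math.sqrt((sum_sq - n * mean ** 2) / (n - 1))

# Standard error of the mean.
se_mean = s_conceptual / math.sqrt(n)

print(mean, s_conceptual, s_computational, se_mean)
```

Here the sum of squared deviations is 32 under either form (232 − 8·5²), so both versions of s agree.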

Probability Distributions

4 formulas
Ch. 7 · Binomial Distribution
\Pr[X=x] = \binom{n}{x}\, p^{\,x}\,(1-p)^{n-x}

Probability of exactly x successes in n independent trials, each with probability p. Mean = np, Variance = np(1−p).

When

Counting successes in a fixed number of binary trials. E.g. # of mutant offspring.
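
A minimal sketch of the binomial formula, assuming Python's standard `math.comb` for the binomial coefficient. The function name `binomial_pmf` and the example numbers are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: 2 successes in 5 trials with p = 0.5.
prob = binomial_pmf(2, 5, 0.5)          # 10 * 0.25 * 0.125 = 0.3125

# The probabilities over x = 0..n should sum to 1.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))
print(prob, total)
```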

Ch. 8 · Poisson Distribution
\Pr[X=x] = \dfrac{\mu^x \cdot e^{-\mu}}{x!}

Probability of x events when the average rate is μ. Key property: mean = variance = μ.

When

Counting rare, random events per unit time or space. Test fit with χ² GOF.
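
The Poisson formula and its "mean = variance = μ" property can be checked numerically. This is only a sketch: the function name `poisson_pmf`, the rate μ = 2, and the truncation of the sum at x = 60 are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x events when the average rate is mu."""
    return mu ** x * exp(-mu) / factorial(x)

mu = 2.0                                   # hypothetical average rate
xs = range(61)                             # truncated support; the tail beyond 60 is negligible here

approx_mean = sum(x * poisson_pmf(x, mu) for x in xs)
approx_var = sum((x - mu) ** 2 * poisson_pmf(x, mu) for x in xs)
print(poisson_pmf(0, mu), approx_mean, approx_var)
```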

Ch. 10 · Normal Distribution
f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\dfrac{(x-\mu)^2}{2\sigma^2}}

The bell curve, described by mean μ and variance σ². Roughly 68% of values fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.

When

Describing symmetric continuous measurements. Many parametric tests assume normally distributed errors/residuals, or normality within groups, especially for small samples.
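
The density formula and the 68/95/99.7 rule can be sanity-checked with the standard library's `statistics.NormalDist`. The helper name `normal_pdf` is an assumption for this sketch; `NormalDist` is used only to compare against and to get areas under the curve.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density written exactly as in the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

std = NormalDist(mu=0.0, sigma=1.0)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}
print(normal_pdf(0.3, 0.0, 1.0), std.pdf(0.3), within)
```

The exact areas are slightly different from the rounded rule: 2σ covers a little more than 95%, which is why 1.96 is used as the 95% critical value.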

Ch. 5 · Bayes' Theorem
\Pr[A|B] = \dfrac{\Pr[B|A]\cdot\Pr[A]}{\Pr[B]}

Updates probability of A given evidence B. Pr[A] = prior, Pr[B|A] = likelihood, Pr[A|B] = posterior.

When

Reversing conditional probabilities. E.g. given a positive test, actual probability of disease?
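
A worked version of the disease-testing example might look like the sketch below. All three input numbers (prevalence, sensitivity, false-positive rate) are hypothetical, and Pr[B] is expanded with the law of total probability, which is a step the formula above leaves implicit.

```python
# Hypothetical inputs, for illustration only.
prior = 0.01            # Pr[disease]
sensitivity = 0.90      # Pr[positive | disease]
false_positive = 0.05   # Pr[positive | no disease]

# Pr[positive] via the law of total probability.
pr_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: Pr[disease | positive].
posterior = sensitivity * prior / pr_positive
print(pr_positive, posterior)
```

With these made-up numbers the posterior is about 0.15, far below the test's 90% sensitivity, because the disease is rare to begin with.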

Confidence Intervals

4 formulas
Ch. 4 · CI for the Mean
\bar{Y} \pm SE_{\bar{Y}} \cdot t_{\alpha(2),\, df}

For a 95% CI, α = 0.05: if the study were repeated many times, about 95% of the resulting intervals would contain the true μ. df = n−1.

When

After estimating a mean. Report as: Ȳ = X (95% CI: lower, upper).

Ch. 12 · CI for Difference in Means
(\bar{Y}_1 - \bar{Y}_2) \pm SE_{\bar{Y}_1-\bar{Y}_2} \cdot t_{\alpha(2),\, df}

Interval for the true difference μ₁ − μ₂. If it excludes 0, the difference is significant at level α.

When

After a two-sample t-test to report the plausible range of the difference.

Ch. 7 · Agresti-Coull (Proportion CI)
\tilde{p} = \dfrac{X+2}{n+4}, \quad \tilde{p} \pm 1.96\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}

Better than the Wald interval. Adding 2 phantom successes and 2 failures stabilizes the interval near 0 or 1.

When

Estimating a population proportion p. Always preferred over Wald CI in BIOL 300.
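
The Agresti-Coull recipe can be sketched as a small function. The name `agresti_coull` and the example counts (8 successes in 20 trials) are hypothetical; the 1.96 multiplier is the one given in the formula, so the result is a 95% interval.

```python
import math

def agresti_coull(x, n, z=1.96):
    """Return (p_tilde, lower, upper) for x successes in n trials."""
    p_tilde = (x + 2) / (n + 4)                       # add 2 successes and 2 failures
    half_width = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, p_tilde - half_width, p_tilde + half_width

# Hypothetical data: 8 successes out of 20 trials.
p_tilde, lower, upper = agresti_coull(8, 20)
print(p_tilde, lower, upper)
```

Note that the interval is centered on p̃ = 10/24, not on the raw proportion 8/20.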

Ch. 11 · CI for the Variance
\dfrac{df \cdot s^2}{\chi^2_{\alpha/2,\, df}} \leq \sigma^2 \leq \dfrac{df \cdot s^2}{\chi^2_{1-\alpha/2,\, df}}

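Uses two χ² critical values (asymmetric interval because χ² is skewed). The subscripts give the area in the right tail, so χ²(α/2) is the larger critical value and yields the lower bound. df = n−1.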

When

Estimating population variance σ² directly from a single sample.

Chi-Square & Proportions

3 formulas
Ch. 8 · Chi-Square Statistic
\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}

Larger χ² = more departure from H₀. df = (categories − 1) minus parameters estimated. Assumptions: no cell has an expected count below 1, and no more than 20% of cells have an expected count below 5.

When

Testing fit to a theoretical distribution (GOF) or independence in a contingency table.
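
A minimal sketch of the χ² statistic for a goodness-of-fit setting. The function name `chi_square` and the observed and expected counts are made up for illustration; looking up the critical value is left out.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: three categories expected to be equally common.
observed = [10, 20, 30]
expected = [20, 20, 20]

stat = chi_square(observed, expected)   # 100/20 + 0/20 + 100/20
df = len(observed) - 1                  # no parameters estimated from the data
print(stat, df)
```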

Ch. 9 · Odds Ratio
\widehat{OR} = \dfrac{a \cdot d}{b \cdot c}

From a 2×2 table (a=top-left, b=top-right, c=bottom-left, d=bottom-right). OR > 1: event more likely in group 1.

When

Measuring association between two binary variables in a 2×2 contingency table.

Ch. 9 · CI for Odds Ratio
\ln(\widehat{OR}) \pm Z_\alpha \cdot SE[\ln(\widehat{OR})]

The CI is built on the log scale (where ln(OR) is approximately normal), then exponentiated. If CI excludes 1, association is significant.

When

Reporting uncertainty around an estimated odds ratio. If CI includes 1, no significant association.
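
The odds ratio and its log-scale interval can be sketched as follows. Two things here are assumptions not stated on this sheet: the standard error is taken to be the usual large-sample form sqrt(1/a + 1/b + 1/c + 1/d), and Z = 1.96 is used for a 95% interval. The cell counts are hypothetical.

```python
import math

# Hypothetical 2x2 table: a = top-left, b = top-right, c = bottom-left, d = bottom-right.
a, b, c, d = 10, 20, 5, 40

odds_ratio = (a * d) / (b * c)

# Assumed standard error of ln(OR): the common large-sample formula.
se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96                                   # assumed 95% critical value
ln_or = math.log(odds_ratio)
ci_lower = math.exp(ln_or - z * se_ln_or)  # back-transform from the log scale
ci_upper = math.exp(ln_or + z * se_ln_or)
print(odds_ratio, ci_lower, ci_upper)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio after exponentiating.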

t-Tests

7 formulas
Ch. 11 · One-Sample t
t = \dfrac{\bar{Y} - \mu_0}{s / \sqrt{n}}

Tests H₀: μ = μ₀. Large |t| means Ȳ is far from μ₀ in SE units. df = n−1.

When

Comparing a sample mean to a specific hypothesized value. One group, one mean.

Ch. 12 · Pooled Sample Variance
s_p^2 = \dfrac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}

Weighted average of both groups' variances. Valid only when assuming σ₁² = σ₂².

When

First step in the pooled two-sample t-test.

Ch. 12 · SE for Pooled Two-Sample t
SE_{\bar{Y}_1-\bar{Y}_2} = \sqrt{s_p^2\!\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}

Standard error of the difference in means, assuming equal variances.

When

Part of pooled two-sample t-test, after computing sp².

Ch. 12 · Two-Sample t (Pooled)
t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1-\bar{Y}_2}}

Tests H₀: μ₁ = μ₂ assuming equal variances. df = n₁ + n₂ − 2.

When

Comparing two independent group means when σ₁² = σ₂² is plausible.
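
The three pooled steps (pooled variance, standard error, t) chain together as in this sketch. The summary statistics are hypothetical, and the critical-value lookup is omitted.

```python
import math

# Hypothetical summary statistics for two independent groups.
mean1, var1, n1 = 12.0, 4.0, 10
mean2, var2, n2 = 10.0, 9.0, 15

df1, df2 = n1 - 1, n2 - 1

# Step 1: pooled sample variance (df-weighted average of the two variances).
sp2 = (df1 * var1 + df2 * var2) / (df1 + df2)

# Step 2: standard error of the difference in means.
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Step 3: t statistic with df = n1 + n2 - 2.
t_pooled = (mean1 - mean2) / se_diff
df_pooled = n1 + n2 - 2
print(sp2, se_diff, t_pooled, df_pooled)
```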

Ch. 12 · Welch's Two-Sample t
t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

Does NOT assume equal variances. More robust than pooled version. Uses Welch-Satterthwaite df.

When

Default choice for two independent means — use unless told variances are equal.

Ch. 12 · Welch-Satterthwaite df
df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}

Approximate df for Welch's t-test. Always round down. Will be between min(n₁,n₂)−1 and n₁+n₂−2.

When

Needed to look up the critical t-value in Welch's t-test.
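
Welch's t and the Welch-Satterthwaite df can be computed together from summary statistics, as sketched below. The numbers are hypothetical; the df is rounded down as the note above instructs.

```python
import math

# Hypothetical summary statistics for two independent groups.
mean1, var1, n1 = 12.0, 4.0, 10
mean2, var2, n2 = 10.0, 9.0, 15

v1 = var1 / n1          # s1^2 / n1
v2 = var2 / n2          # s2^2 / n2

# Welch's t: no pooling of variances.
t_welch = (mean1 - mean2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom, rounded down.
df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df_welch = math.floor(df_exact)
print(t_welch, df_exact, df_welch)
```

For these values the unrounded df is just under 23, so it rounds down to 22, which lies between min(n₁, n₂) − 1 = 9 and n₁ + n₂ − 2 = 23.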

Ch. 12 · Variance Ratio F-Test
F = \dfrac{s_1^2}{s_2^2}

Tests H₀: σ₁² = σ₂² (equal variances). Place the larger variance in the numerator. df₁ = n₁−1, df₂ = n₂−1.

When

A test for the null hypothesis that two normal populations have the same variance. Do NOT use it to decide whether to use Welch's t-test.

Regression & Correlation

19 formulas
Ch. 16 · Sum of Cross-Products
SP_{xy} = \sum(X_i-\bar{X})(Y_i-\bar{Y}) = \sum XY - \dfrac{(\sum X)(\sum Y)}{n}

Measures how X and Y covary. Computational form avoids rounding error.

When

First step in regression and correlation. Compute alongside SS_x.

Ch. 16 · Sum of Squares for X
SS_x = \sum(X_i-\bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}

Total variability in predictor X. Note: (ΣX)²/n is the correction factor.

When

Required for slope b, correlation r, and SE_b.

Ch. 17 · SS Total (for Y)
SS_{Total} = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n}

Total variability in the response variable Y. Used to partition variance into regression and residual.

When

Computing r² and MS_residual in regression.

Ch. 17 · SS Regression
SS_{reg} = b \cdot SP_{xy}

Variability in Y explained by the linear relationship with X.

When

Computing r² = SS_regression / SS_total.

Ch. 17 · SS Residual
SS_{res} = SS_{Total} - SS_{reg}

Variability in Y NOT explained by X. Used to compute MS_residual.

When

Computing MS_residual = SS_residual / (n−2).

Ch. 17 · Regression Slope
b = \dfrac{SP_{xy}}{SS_x}

Change in predicted Y per 1-unit increase in X. Sign = direction, magnitude = rate. df = n−2.

When

The key parameter in linear regression.

Ch. 17 · Regression Intercept
a = \bar{Y} - b\bar{X}

Predicted Y when X = 0. Often meaningless if X = 0 is outside the data range. Fitted line: Ŷ = a + bX.

When

After computing b. Needed to make predictions.

Ch. 17 · Residual Mean Square
MS_{res} = \dfrac{SS_{res}}{n-2}

Average squared deviation from the fitted line. Smaller = better fit. df = n−2.

When

Computing SE_b and assessing model fit.

Ch. 17 · Standard Error of Slope
SE_b = \sqrt{\dfrac{MS_{res}}{SS_x}}

Precision of the estimated slope b.

When

Testing H₀: β = 0, or building a CI for the true slope.

Ch. 17 · CI for Slope
b \pm t_{\alpha(2),\, df} \cdot SE_b

Confidence interval for the true slope β. df = n−2. If CI excludes 0, slope is significant.

When

Reporting uncertainty in a regression slope estimate.

Ch. 17 · t-Test for Slope
t = \dfrac{b - \beta_0}{SE_b}

Tests H₀: β = β₀ (usually β₀ = 0, i.e. no linear relationship). df = n−2.

When

Testing whether the linear slope differs from a null hypothesized value (often 0).

Ch. 17 · CI for Predicted Value
\hat{Y} \pm t_{\alpha(2),\, df} \cdot SE_{\hat{Y}}

Confidence interval for the mean response at a given X. SE_Ŷ accounts for uncertainty in both a and b.

When

Estimating the mean Y at a specific X value.

Ch. 17 · Coefficient of Determination (r²)
r^2 = \dfrac{SS_{reg}}{SS_{Total}}

Proportion of variance in Y explained by X. r² = 0.80 means 80% of variability is accounted for.

When

Reporting goodness of fit. Ranges 0 to 1.
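
The regression formulas build on one another, so it can help to see them run in order on one small data set. The sketch below uses hypothetical (X, Y) pairs and the computational forms given on this sheet; it stops at the t statistic for H₀: β = 0 and does not look up a critical value.

```python
import math

# Hypothetical (X, Y) pairs chosen so the sums are easy to verify by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Building blocks (computational forms).
sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # 66 - 60
ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                    # 55 - 45
ss_total = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n                # 86 - 80

# Slope and intercept of the fitted line Y-hat = a + bX.
b = sp_xy / ss_x
a = sum(y) / n - b * sum(x) / n

# Partition of variability in Y.
ss_reg = b * sp_xy
ss_res = ss_total - ss_reg
ms_res = ss_res / (n - 2)

# Precision of the slope, test statistic for H0: beta = 0, and r^2.
se_b = math.sqrt(ms_res / ss_x)
t_slope = (b - 0) / se_b
r_squared = ss_reg / ss_total
print(b, a, ms_res, se_b, t_slope, r_squared)
```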

Ch. 16 · Pearson Correlation
r = \dfrac{SP_{xy}}{\sqrt{SS_x \cdot SS_y}}

Standardized linear association. Ranges −1 to +1. Measures only linear relationships. SS_y is the sum of squares for Y, the same quantity as SS_Total.

When

Measuring strength/direction of linear association between two continuous variables.

Ch. 16 · Standard Error of r
SE_r = \sqrt{\dfrac{1 - r^2}{n-2}}

Use for testing H₀: ρ = 0 via t = r / SE_r, df = n−2. Not for CIs — use Fisher's z instead.

When

Hypothesis testing for correlation only.

Ch. 16 · Fisher's z-Transform
z = \tfrac{1}{2}\ln\!\dfrac{1+r}{1-r}, \quad \sigma_z = \dfrac{1}{\sqrt{n-3}}

Transforms r to approximately normal z. Build CI on z-scale, back-transform: r = (e²ᶻ−1)/(e²ᶻ+1).

When

Building confidence intervals for the Pearson correlation r.
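
The transform, interval, and back-transform steps can be sketched as below. The function names, the example values (r = 0.5, n = 28), and the use of 1.96 as the 95% critical value are assumptions for illustration.

```python
import math

def fisher_z(r):
    """Fisher's z-transform of a correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def back_transform(z):
    """Inverse transform: r = (e^(2z) - 1) / (e^(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for the correlation, built on the z scale."""
    z = fisher_z(r)
    sigma_z = 1 / math.sqrt(n - 3)
    return back_transform(z - z_crit * sigma_z), back_transform(z + z_crit * sigma_z)

# Hypothetical example: r = 0.5 from n = 28 pairs, so sigma_z = 1/5.
lower, upper = fisher_ci(0.5, 28)
print(lower, upper)
```

The resulting interval is not symmetric around r, which is the reason SE_r is not used for confidence intervals.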

Ch. 13 · Spearman Rank Correlation
r_s = 1 - \dfrac{6\sum d_i^2}{n^3 - n}

Non-parametric correlation using ranks. dᵢ = rank(Xᵢ) − rank(Yᵢ). Measures monotonic association.

When

Data violate normality or are ordinal. Identify-only in BIOL 300.
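
A minimal sketch of the rank-difference formula, assuming no tied values (the simple ranking helper below does not average tied ranks). The helper names and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's r_s from squared rank differences (no-ties formula)."""
    n = len(x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_sq / (n ** 3 - n)

# Hypothetical data: Y ranks are [2, 1, 4, 3, 5], so the sum of d^2 is 4.
x = [10, 20, 30, 40, 50]
y = [3.1, 2.0, 5.5, 4.2, 9.9]
r_s = spearman(x, y)       # 1 - 24/120
print(r_s)
```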

Ch. 17 · Pooled MS_error (Comparing Regressions)
(MS_e)_p = \dfrac{(SS_e)_1 + (SS_e)_2}{(df_e)_1 + (df_e)_2}

Combines residual variance from two separate regressions, assuming they share the same σ². Used when comparing slopes.

When

Testing whether two regression lines have the same slope (H₀: β₁ = β₂).

Ch. 17 · SE for Difference in Slopes
SE_{b_1-b_2} = \sqrt{\dfrac{(MS_e)_p}{SS_{x_1}} + \dfrac{(MS_e)_p}{SS_{x_2}}}

Standard error of the difference between two regression slopes. Uses pooled residual MS.

When

Building a t-test or CI for the difference between two slopes: t = (b₁−b₂) / SE_{b₁−b₂}.

ANOVA

6 formulas
Ch. 15 · Grand Mean
\bar{Y} = \dfrac{\sum n_i \bar{Y}_i}{N}

Overall mean weighted by group size. N = total sample size.

When

Computing MS_groups and R² in ANOVA.

Ch. 15 · MS Between Groups
MS_{groups} = \dfrac{\sum n_i(\bar{Y}_i - \bar{Y})^2}{k-1}

Between-group variability weighted by n. k = number of groups, df = k−1. Large = groups spread far apart.

When

Numerator of F. Reflects signal.

Ch. 15 · MS Within Groups (Error)
MS_{error} = \dfrac{\sum s_i^2(n_i-1)}{N-k}

Pooled within-group variance. Also called MS_within or s²_pooled. df = N−k. Assumes equal variances.

When

Denominator of F and Tukey-Kramer SE. Reflects noise.

Ch. 15 · F-Statistic
F = \dfrac{MS_{groups}}{MS_{error}}

Signal-to-noise ratio. Under H₀, F ≈ 1. df₁ = k−1, df₂ = N−k.

When

Testing H₀: all group means equal. Use for 3+ groups.

Ch. 15 · ANOVA R²
R^2 = \dfrac{SS_{groups}}{SS_{total}}

Proportion of variance explained by group membership (η²). Effect size measure.

When

After significant F-test. Report as effect size.
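
The ANOVA quantities above can be computed in sequence from raw group data, as in this sketch. The three groups are hypothetical, and SS_total is taken as SS_groups + SS_error, which is the standard partition rather than a formula listed on this sheet.

```python
from statistics import mean, variance

# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

k = len(groups)
sizes = [len(g) for g in groups]
N = sum(sizes)
group_means = [mean(g) for g in groups]
group_vars = [variance(g) for g in groups]      # sample variances (n - 1 denominator)

# Grand mean: group means weighted by group size.
grand_mean = sum(n_i * m_i for n_i, m_i in zip(sizes, group_means)) / N

# Between-group and within-group mean squares.
ss_groups = sum(n_i * (m_i - grand_mean) ** 2 for n_i, m_i in zip(sizes, group_means))
ms_groups = ss_groups / (k - 1)
ss_error = sum(v_i * (n_i - 1) for v_i, n_i in zip(group_vars, sizes))
ms_error = ss_error / (N - k)

# F ratio and R^2 (assumes SS_total = SS_groups + SS_error).
f_stat = ms_groups / ms_error
r_squared = ss_groups / (ss_groups + ss_error)
print(grand_mean, ms_groups, ms_error, f_stat, r_squared)
```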

Ch. 15 · Tukey-Kramer q
q = \dfrac{\bar{Y}_i - \bar{Y}_j}{SE}, \quad SE = \sqrt{s^2_{pooled}\!\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}

Post-hoc pairwise comparison controlling family-wise error. s²_pooled = MS_error. Compare to Table F critical value.

When

After significant ANOVA, to find which specific pairs differ.

BIOL 300 · UBC