Hypothesis Testing Explained: A Complete Step-by-Step Stats Guide

By Pankaj Kumar | Last Updated: July 12, 2026

Introduction
The 5-Step Hypothesis Testing Framework
Worked Example 1: One-Sample T-Test (Engineering Context)
Worked Example 2: Two-Sample T-Test (A/B Testing Context)
Worked Example 3: Chi-Square Test (Categorical Data)
Common Mistakes & How to Avoid Them
Software Walkthroughs
Practice Problems
Key Takeaways

Key Takeaways

All hypothesis tests follow the same five-step framework, regardless of test type.
The p-value is the probability of observing data at least as extreme as yours if H₀ were true; it is not the probability that H₀ is true.
Choose your test based on data type: means use t-tests; categorical counts use chi-square.
State hypotheses before seeing data — setting them after invalidates the test.
Rejecting H₀ provides evidence for H₁; it does not prove H₁ is true.

Hypothesis testing intimidates most students because it feels abstract. You’re asked to “test” something, but the logic remains fuzzy. The truth: hypothesis testing is a structured decision-making process that becomes intuitive once you follow the five-step framework. This guide walks you through each step with real worked examples, software implementations, and common mistakes to avoid.

Students working through statistics problems often find that a solid grasp of discrete mathematics, particularly logic and set theory, builds the foundation that makes hypothesis testing click.

The 5-Step Hypothesis Testing Framework

All hypothesis tests follow the same logical structure, regardless of test type. Master this framework and you can apply it to any test.

Step 1: State the Hypotheses (H₀ and H₁)

Every hypothesis test begins with two competing claims about the population parameter.

Null Hypothesis (H₀): The default assumption—usually “no effect” or “no difference.”

Example: μ = 100 (the population mean equals 100)
Example: p₁ = p₂ (the two populations have equal proportions)
Always contains “=” (equality)

Alternative Hypothesis (H₁): Your research claim, i.e., the statement you hope the data will support.

Two-tailed: H₁: μ ≠ 100 (different, either direction)
One-tailed (left): H₁: μ < 100 (less than)
One-tailed (right): H₁: μ > 100 (greater than)
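
To make the effect of the tail choice concrete, here is a minimal sketch that converts one test statistic into a p-value under each of the three alternatives. It assumes a z statistic (standard normal reference distribution) for simplicity; the function name `p_value_from_z` and the value z = 1.80 are hypothetical.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """Convert a z statistic into a p-value for the chosen alternative hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":  # H1: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "less":  # H1: parameter < null value
        return std_normal.cdf(z)
    if alternative == "greater":  # H1: parameter > null value
        return 1 - std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


z = 1.80  # hypothetical test statistic
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: p = {p_value_from_z(z, alt):.4f}")
```

The same statistic gives a two-sided p-value exactly twice the matching one-sided value, which is why the direction has to be fixed before looking at the data.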

Key Decision: Should you state your claim as H₀ or H₁?

Best practice: State your claim as H₁ (the alternative). Here’s why: If evidence supports H₁, you have a stronger result (“We found evidence for…”) than if you fail to reject H₀ (“We didn’t find evidence against…”).

Common Mistake: Choosing between hypotheses after seeing the data. This is “cart before the horse” and invalidates your test. Hypotheses must be determined beforehand.

Step 2: Choose Significance Level (α)

The significance level (alpha) is your Type I error tolerance—the probability of falsely rejecting a true null hypothesis.

Standard: α = 0.05 (5% false positive rate tolerated)
Conservative: α = 0.01 (1% false positive rate, stricter)
Lenient: α = 0.10 (10%, less common)

In Plain Language: If α = 0.05, you accept a 5% chance of claiming an effect exists when H₀ is actually true.

When to adjust α:

Medical/safety testing: Use α = 0.01 (lower tolerance for false positives)
Exploratory research: Can use α = 0.10
Standard: α = 0.05
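
As a rough illustration of what each α means in practice, the sketch below converts α into the cutoff a test statistic must exceed. It assumes a two-sided z-test, and `two_sided_critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha):
    """Return the |z| cutoff beyond which a two-sided z-test rejects H0."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    cutoff = two_sided_critical_z(alpha)
    print(f"alpha = {alpha:.2f} -> reject H0 when |z| > {cutoff:.3f}")
```

A smaller α pushes the cutoff further out, so the sample has to deviate more from H₀ before the result counts as significant.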

Step 3: Select Test Statistic

The test statistic is a single number calculated from your sample data that summarizes evidence against H₀. Different data types require different tests.

Those running analyses in data science tutoring sessions frequently encounter all of these test types in real-world datasets.

When to use each test:

Data Type	Test	Formula	Example
1 mean vs. population	One-sample t-test	t = (x̄ – μ₀) / (s/√n)	Is average engineer height 5’10”?
2 independent means	Two-sample t-test	t = (x̄₁ – x̄₂) / √(s₁²/n₁ + s₂²/n₂)	Do men and women earn differently?
2 paired means	Paired t-test	t = d̄ / (s_d/√n)	Does weight change before/after diet?
2 categorical variables	Chi-square test	χ² = Σ(O – E)² / E	Is product preference independent of gender?
1 proportion vs. population	One-sample z-test	z = (p̂ – p₀) / √(p₀(1-p₀)/n)	Is defect rate 5%?

How to choose:

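Comparing means of continuous data → t-test (or z-test if the population σ is known)
Comparing counts or proportions across categories → chi-square test
Population σ unknown (the usual case) → t-test, at any sample size
Large sample (n ≥ 30) → z-test gives nearly the same answer, but the t-test remains valid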
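
The formulas in the table can be sketched as small Python functions. The function and argument names are illustrative, and each function returns only the test statistic, not a p-value. The defect-rate numbers in the usage lines are hypothetical.

```python
import math


def one_sample_t(x_bar, mu0, s, n):
    """t = (x̄ - μ0) / (s / √n)"""
    return (x_bar - mu0) / (s / math.sqrt(n))


def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)"""
    return (x_bar1 - x_bar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def paired_t(d_bar, s_d, n):
    """t = d̄ / (s_d / √n), where d is the within-pair difference"""
    return d_bar / (s_d / math.sqrt(n))


def one_proportion_z(p_hat, p0, n):
    """z = (p̂ - p0) / √(p0(1 - p0) / n)"""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def chi_square(observed, expected):
    """χ² = Σ (O - E)² / E over all cells"""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical usage: 16 defects in 200 units, tested against a 5% defect rate
z = one_proportion_z(p_hat=16 / 200, p0=0.05, n=200)
print(f"z = {z:.2f}")
```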

Step 4: Calculate Test Statistic and Find P-value

This is mechanical computation, mostly done by software. The test statistic measures how far your sample result is from the null hypothesis value, expressed in standard errors.

P-value interpretation: The p-value is the probability of observing your sample result (or more extreme) if H₀ were true.

Small p-value (< α, e.g., 0.05): Result would be unlikely under H₀ → evidence against H₀
Large p-value (≥ α): Result is consistent with H₀ → H₀ remains plausible

The p-value is not the probability that H₀ is true. This is the most common misconception.
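
One way to see what "as extreme or more extreme if H₀ were true" means is to simulate it. The sketch below is an illustration only: it assumes normally distributed data with a known population standard deviation, and the numbers (null mean 100, σ = 2.3, n = 25, observed mean 101.0) are hypothetical. The function name `simulated_p_value` is also an assumption.

```python
import random
import statistics


def simulated_p_value(observed_mean, mu0, sigma, n, reps=20_000, seed=0):
    """Estimate a two-sided p-value by repeatedly sampling in a world where H0 is true."""
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - mu0)
    at_least_as_extreme = 0
    for _ in range(reps):
        # Draw one sample of size n from the null distribution
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu0) >= observed_gap:
            at_least_as_extreme += 1
    # Fraction of null-world samples at least as far from mu0 as the observed one
    return at_least_as_extreme / reps


p = simulated_p_value(observed_mean=101.0, mu0=100.0, sigma=2.3, n=25)
print(f"Simulated two-sided p-value: {p:.3f}")
```

The estimate is a statement about how often data like yours arise when H₀ holds. Nothing in the simulation measures how likely H₀ itself is.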

Step 5: Make Decision and Interpret Results

Decision rule:

If p-value < α: Reject H₀ (statistically significant result)
If p-value ≥ α: Fail to reject H₀ (not statistically significant)

Interpretation language matters:

✓ Correct: “We reject H₀ and conclude there is significant evidence for H₁.”
✗ Incorrect: “We proved H₁” or “H₀ is false”

✓ Correct: “We failed to reject H₀; insufficient evidence for H₁.”
✗ Incorrect: “H₀ is true” or “No effect exists”
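
The decision rule and the recommended wording can be captured in a few lines. This is a minimal sketch; the function name `decide` and the exact message text are illustrative.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule and return conservatively worded language."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return f"Reject H0 (p = {p_value:.4f} < alpha = {alpha}): significant evidence for H1."
    return f"Fail to reject H0 (p = {p_value:.4f} >= alpha = {alpha}): insufficient evidence for H1."


print(decide(0.012))
print(decide(0.20))
```

Note that neither message says "proved" or "accepted", and a p-value exactly equal to α falls on the "fail to reject" side.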

Worked Example 1: One-Sample T-Test (Engineering Context)

Scenario: A metal rod manufacturer claims rods average 100 mm in length. Quality control tests a random sample of 25 rods to verify this claim.

Data:

Sample mean: x̄ = 101.5 mm
Sample std. dev: s = 2.3 mm
Sample size: n = 25
Claimed population mean: μ₀ = 100 mm

Step 1: State Hypotheses

H₀: μ = 100 (rods average 100 mm)
H₁: μ ≠ 100 (rods differ from 100 mm, two-tailed)

Step 2: Choose α = 0.05

Step 3: Select One-Sample T-Test

Step 4: Calculate Test Statistic

t = (x̄ – μ₀) / (s / √n)
t = (101.5 – 100) / (2.3 / √25)
t = 1.5 / (2.3 / 5)
t = 1.5 / 0.46
t = 3.26

Degrees of freedom: df = n – 1 = 24

Find p-value: Using t-distribution table or software with t = 3.26, df = 24, two-tailed:
p-value ≈ 0.0033

Step 5: Make Decision

p-value (0.0033) < α (0.05) → Reject H₀

Interpretation: “The sample provides strong evidence that the mean rod length differs from the claimed 100 mm (t(24) = 3.26, p = 0.0033). Quality control should investigate the manufacturing process.”
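
A minimal sketch that recomputes this example from the summary statistics using only the Python standard library. The helper names `t_pdf` and `t_two_sided_p` are illustrative. The p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package, so treat it as an approximation.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the density between -|t| and |t|."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_b = total * h / 3
    return 1 - 2 * area_zero_to_b  # the density is symmetric about zero


# Summary statistics from the rod example
x_bar, mu0, s, n = 101.5, 100.0, 2.3, 25
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
p_value = t_two_sided_p(t_stat, df)
print(f"t({df}) = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
```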

Understanding probability distributions is essential context for interpreting these results — the guide on choosing the right probability distribution for engineers covers the underlying theory in depth.

Worked Example 2: Two-Sample T-Test (A/B Testing Context)

Scenario: An e-commerce company tests two website designs to see if Design B increases average order value. They randomly assign customers to Design A (current) or Design B (test) and track average orders.

Data:

Design A: n₁ = 150, x̄₁ = $52.40, s₁ = $18.20
Design B: n₂ = 150, x̄₂ = $58.75, s₂ = $19.80

Step 1: State Hypotheses

H₀: μ₁ = μ₂ (no difference in order value between designs)
H₁: μ₁ ≠ μ₂ (designs differ in order value, two-tailed)

Step 2: Choose α = 0.05

Step 3: Select Two-Sample T-Test

Step 4: Calculate Test Statistic

t = (x̄₁ – x̄₂) / √(s₁²/n₁ + s₂²/n₂)
t = (52.40 – 58.75) / √((18.20²/150) + (19.80²/150))
t = -6.35 / √(2.208 + 2.614)
t = -6.35 / √4.822
t = -6.35 / 2.196
t = -2.89

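Degrees of freedom: df ≈ n₁ + n₂ – 2 = 298 (the Welch–Satterthwaite approximation gives df ≈ 296, which leaves the p-value essentially unchanged)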

Find p-value: Using t-distribution with t = -2.89, df ≈ 298, two-tailed:
p-value ≈ 0.0041

Step 5: Make Decision

p-value (0.0041) < α (0.05) → Reject H₀

Interpretation: “Design B significantly increases average order value by $6.35 compared to Design A (t(298) = -2.89, p = 0.0041). This provides strong evidence to recommend rolling out Design B.”
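
The calculation can be sketched from the summary statistics as follows. The function name `welch_t_and_df` is illustrative. The degrees of freedom use the Welch–Satterthwaite formula, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because df is in the hundreds. Expect it to come out slightly smaller than an exact t-based p-value.

```python
import math
from statistics import NormalDist


def welch_t_and_df(mean1, sd1, n1, mean2, sd2, n2):
    """Return the unequal-variance t statistic and its approximate degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard error of each mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df


# Design A vs. Design B summary statistics
t_stat, df = welch_t_and_df(52.40, 18.20, 150, 58.75, 19.80, 150)

# Normal approximation to the two-tailed p-value (adequate for large df only)
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, df = {df:.1f}, approximate two-tailed p = {p_approx:.4f}")
```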

Students who struggle with the mathematical mechanics here may find it useful to revisit foundational material — this guide on calculus for engineering majors addresses the quantitative reasoning skills that underpin statistical computation.

Worked Example 3: Chi-Square Test (Categorical Data)

Scenario: A university wants to know if student satisfaction with campus facilities differs by class year (freshmen, sophomores, juniors, seniors).

Survey Results:

Class	Satisfied	Unsatisfied	Total
Freshman	45	55	100
Sophomore	52	48	100
Junior	58	42	100
Senior	62	38	100
Total	217	183	400

Step 1: State Hypotheses

H₀: Class year and satisfaction are independent (no association)
H₁: Class year and satisfaction are associated (not independent)

Step 2: Choose α = 0.05

Step 3: Select Chi-Square Test of Independence

Step 4: Calculate Expected Frequencies and χ² Statistic

Expected frequency formula: E = (Row Total × Column Total) / N

For Freshman-Satisfied: E = (100 × 217) / 400 = 54.25
For Freshman-Unsatisfied: E = (100 × 183) / 400 = 45.75

(Continuing for all cells…)

Chi-square statistic:
χ² = Σ (Observed – Expected)² / Expected
χ² = (45-54.25)²/54.25 + (55-45.75)²/45.75 + … = 6.64

Degrees of freedom: df = (rows – 1) × (columns – 1) = (4-1) × (2-1) = 3

Find p-value: Using chi-square distribution with χ² = 6.64, df = 3:
p-value ≈ 0.084

Step 5: Make Decision

p-value (0.084) ≥ α (0.05) → Fail to reject H₀

Interpretation: “There is insufficient evidence at the 5% level of an association between class year and satisfaction with campus facilities (χ²(3) = 6.64, p = 0.084). Satisfaction rises from 45% among freshmen to 62% among seniors in this sample, but the trend is not statistically significant.”
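
The full calculation, including every expected frequency, can be sketched with the standard library alone. The variable names are illustrative. The helper `chi2_sf_df3` uses a closed-form upper-tail probability that holds only for exactly 3 degrees of freedom, so it is not a general-purpose chi-square function.

```python
import math

# Observed counts: rows are class years, columns are (satisfied, unsatisfied)
observed = [[45, 55], [52, 48], [58, 42], [62, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total x column total) / N for every cell
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


assert df == 3  # the closed form above is valid only for df = 3
p_value = chi2_sf_df3(chi2)
print(f"chi-square({df}) = {chi2:.2f}, p = {p_value:.3f}")
```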

Running chi-square and t-tests in MATLAB tutoring sessions is a common application — MATLAB’s Statistics and Machine Learning Toolbox handles all the test types covered in this guide.

Common Mistakes and How to Avoid Them

Mistake 1: Setting Hypotheses After Seeing Data

What students do: Calculate sample statistics, then write hypotheses based on results.

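Why it’s wrong: Choosing hypotheses to match a pattern you have already seen makes a “significant” result far more likely even when H₀ is true, so the stated α no longer describes your real false positive rate.

How to fix it: Write hypotheses BEFORE data analysis. A hypothesis is a prediction fixed in advance, which the data then support or fail to support.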
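
A small simulation shows the cost of picking the direction after seeing the data. This is a sketch under stated assumptions: H₀ is true in every simulated study, the data are normal with known σ = 1, and a one-tailed z-test at α = 0.05 is used. The function name `false_positive_rates` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def false_positive_rates(n=30, reps=20_000, alpha=0.05, seed=1):
    """Compare a pre-specified one-tailed test with one whose direction is chosen post hoc."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    prespecified_hits = 0
    post_hoc_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 (mean 0) is true
        z = fmean(sample) * math.sqrt(n)  # z statistic with sigma = 1
        if z > cutoff:  # direction "greater" fixed before the study
            prespecified_hits += 1
        if abs(z) > cutoff:  # direction picked to match the sample's sign
            post_hoc_hits += 1
    return prespecified_hits / reps, post_hoc_hits / reps


prespecified, post_hoc = false_positive_rates()
print(f"Pre-specified direction: {prespecified:.3f}")
print(f"Direction chosen after seeing data: {post_hoc:.3f}")
```

Under these assumptions the post hoc rate comes out at roughly twice the nominal α, because either tail can now trigger a rejection.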

Mistake 2: Misinterpreting P-Values

Wrong: “The p-value is the probability H₀ is true” (0.05 = 5% chance H₀ true)

Correct: “The p-value is the probability of observing this data (or more extreme) if H₀ were true”

Example: p = 0.03 means “there’s a 3% chance we’d see results this extreme if H₀ were true,” NOT “3% chance H₀ is true.”

Mistake 3: Confusing Test Selection

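Wrong: Using z-test for small samples (n < 30) when the population σ is unknown

Correct: Use the t-test whenever σ is estimated from the sample; reserve the z-test for a known population σ or for large samples, where the two tests agree closely.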

Wrong: Using t-test for categorical data (proportions)

Correct: Use chi-square for categorical; z-test or binomial for single proportion.

Students who work through these distinctions in SPSS tutoring sessions often find that running the tests in software reinforces which procedure applies to which data type.

Mistake 4: Ignoring Type I and II Errors

Type I Error (False Positive): Rejecting H₀ when it’s actually true.

Probability = α (your chosen significance level)
Example: Concluding a drug works when it doesn’t

Type II Error (False Negative): Failing to reject H₀ when it’s actually false.

Probability = β
Example: Concluding a drug doesn’t work when it does

Key insight: For a fixed sample size, you can’t minimize both errors simultaneously: lowering α increases β. Choose α based on the consequences of each error; to reduce both, increase the sample size.
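
The trade-off can be computed directly in a simple case. The sketch below assumes a right-tailed z-test with known σ, and the inputs (true effect of 1.0, σ = 2.3, n = 25) are hypothetical. The function name `type_ii_error` is also illustrative.

```python
import math
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """Beta for a right-tailed z-test when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff under H0
    shift = effect * math.sqrt(n) / sigma  # true effect measured in standard errors
    # Probability the statistic lands below the cutoff even though H0 is false
    return std_normal.cdf(z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=1.0, sigma=2.3, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Holding n fixed, each step down in α raises β. Rerunning with a larger n lowers β at every α.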

Mistake 5: Saying “Prove” or “Accept” H₀

Wrong: “We proved H₁” or “We accept H₀”

Correct: “We reject H₀ in favor of H₁” or “We fail to reject H₀”

Hypothesis testing provides evidence, not proof.

If you’ve recently struggled with a stats exam, the recovery strategies in this guide on what to do after failing a midterm apply directly to statistics courses as well.

Software Walkthroughs

Excel: One-Sample T-Test

Data in cells A2:A26 (25 rod lengths)

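Excel’s T.TEST function compares two data ranges, so it cannot test one sample against a fixed value. Compute the statistic and p-value directly instead.

Formulas:

Cell C1 (t statistic): =(AVERAGE(A2:A26)-100)/(STDEV.S(A2:A26)/SQRT(COUNT(A2:A26)))
Cell C2 (two-tailed p-value): =T.DIST.2T(ABS(C1), COUNT(A2:A26)-1)

Where:
– A2:A26 = data range
– 100 = hypothesized mean
– T.DIST.2T = two-tailed t-distribution probability
– COUNT(A2:A26)-1 = degrees of freedom (n – 1)

Result: C2 displays the p-value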

R: Two-Sample T-Test

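# Simulate order values using the example's means and standard deviations.
# Illustrative only: simulated data will not reproduce the worked example's exact statistics.
set.seed(42)
design_a <- rnorm(150, mean = 52.4, sd = 18.2)
design_b <- rnorm(150, mean = 58.75, sd = 19.8)

# Perform two-sample t-test (t.test defaults to Welch's unequal-variance test)
result <- t.test(design_a, design_b)

# View results
print(result)
# Shows: t-statistic, df, p-value, confidence interval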

Python: Chi-Square Test

import numpy as np
from scipy.stats import chi2_contingency

# Create contingency table (rows: class years; columns: satisfied, unsatisfied)
data = np.array([[45, 55], [52, 48], [58, 42], [62, 38]])

# Perform chi-square test of independence
chi2, p_value, dof, expected = chi2_contingency(data)

print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

Analysts who run these tests regularly in Stata tutoring sessions will recognize that Stata’s ttest and tabulate commands produce equivalent output with minimal syntax.

Practice Problems

Problem 1

A coffee shop claims its espresso shots average 30 mL. A customer measures 12 shots: mean = 28.5 mL, SD = 1.8 mL. Test at α = 0.05.

Solution: t(11) = -2.88, p ≈ 0.015. Reject H₀. Shots are significantly smaller than claimed.

Problem 2

Two teaching methods are tested. Method A: n = 40, mean = 75, SD = 12. Method B: n = 40, mean = 79, SD = 13. Test at α = 0.05.

Solution: t ≈ -1.43, p ≈ 0.16. Fail to reject H₀. No significant difference.

Problem 3

Survey data: Does preference for Product X differ by age group?

Younger: 70 prefer, 30 don’t. Older: 50 prefer, 50 don’t. Test at α = 0.05.

Solution: χ²(1) ≈ 8.33, p ≈ 0.004 (without continuity correction). Reject H₀. Significant association between age group and preference.

For students who want to see how these statistical reasoning skills connect to broader mathematical problem-solving, this post on A-Level Further Maths momentum and impulse illustrates how structured frameworks apply across quantitative disciplines.
