Coverage
Code coverage is a measure of how much of your code is executed when you run your tests. It helps you identify untested parts of your codebase and can guide you in writing more comprehensive tests.
Code coverage metrics can be interpreted in various ways, and expectations around coverage vary widely across teams and organizations.
Some teams aim for high coverage percentages (e.g., 80% or higher), while others focus on specific areas of the codebase that are critical to the application’s functionality. It’s important to remember that high coverage does not guarantee the absence of bugs, and low coverage does not necessarily indicate poor quality.
Many use coverage primarily to identify untested code rather than to celebrate tested code. This approach focuses on examining what’s missing rather than the overall percentage.
At its core, code coverage answers the question: “What parts of my code have been tested?” But it doesn’t answer the equally important question: “How well has my code been tested?” This distinction is crucial for properly interpreting and using coverage data in professional settings.
In industry contexts, code coverage serves several purposes:
- Quality assurance: Higher coverage generally correlates with fewer bugs reaching production, though this relationship isn’t perfect.
- Technical debt identification: Low-coverage areas often indicate legacy code or functionality that isn’t well understood by the current team.
- Onboarding aid: Coverage reports help new team members identify critical paths through the codebase and understand testing expectations.
- Continuous integration gate: Many organizations set minimum coverage thresholds that must be maintained for code to be merged.
- Refactoring safety net: High coverage provides confidence when refactoring, as tests will catch most regressions.
Coverage Metrics
Different coverage metrics measure different aspects of code execution. Understanding each metric’s strengths and limitations is essential for effective use.
Line Coverage
Line coverage (sometimes called statement coverage, even though the two are not an exact match) is the most basic and commonly used metric, reporting the percentage of code lines executed during testing. While intuitive, it has limitations: a line with complex logic might be "covered" even if only one code path through it is tested.
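As a sketch of this limitation (using a hypothetical function), a single test fully "covers" the following one-liner while leaving one of its two behaviors unverified:

```js
import { expect, test } from 'vitest'

// One line, two behaviors: a single call marks the whole line as covered.
const fee = (amount) => (amount > 100 ? amount * 0.01 : 5)

test('charges a flat fee for small amounts', () => {
  expect(fee(50)).toBe(5) // the amount > 100 behavior is never exercised
})
```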
Line coverage is often used as the default metric because it’s easy to understand and communicate across technical and non-technical stakeholders. Many teams start with targets around 70-80% line coverage.
Function Coverage
Function coverage measures the percentage of functions or methods executed during testing. This metric is valuable for identifying completely untested functionality but says nothing about how thoroughly each function is tested.
Function coverage is particularly valuable in microservice architectures or libraries, where ensuring each public API endpoint has at least minimal testing is critical.
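As a minimal illustration (a hypothetical module and suite), a test suite that touches only one of two exported functions reports 50% function coverage, flagging the other as completely untested:

```js
import { expect, test } from 'vitest'

export function placeOrder(items) {
  return { status: 'placed', count: items.length }
}

export function cancelOrder(orderId) {
  return { status: 'cancelled', orderId }
}

// Only placeOrder is executed, so function coverage is 50%:
// cancelOrder shows up as completely untested.
test('places an order', () => {
  expect(placeOrder(['book']).status).toBe('placed')
})
```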
Branch Coverage
Branch coverage measures the percentage of possible branches (decision points) in the code that have been executed. This provides deeper insight than line coverage by ensuring that both outcomes of conditional statements are tested.
```js
if (userIsAuthenticated) {
  // Branch 1
  return dashboard();
} else {
  // Branch 2
  return loginPage();
}
```
For full branch coverage, tests must exercise both the authenticated and unauthenticated paths.
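A pair of tests along these lines would achieve that; the `route` wrapper is a hypothetical stand-in for the surrounding code:

```js
import { expect, test } from 'vitest'

function route(userIsAuthenticated) {
  if (userIsAuthenticated) {
    return 'dashboard' // Branch 1
  } else {
    return 'loginPage' // Branch 2
  }
}

test('authenticated users reach the dashboard', () => {
  expect(route(true)).toBe('dashboard')
})

test('unauthenticated users are sent to the login page', () => {
  expect(route(false)).toBe('loginPage')
})
```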
There are further metrics as well, such as statement, path, MC/DC, toggle, and mutation coverage; we look at several of them below.
Condition Coverage
Condition coverage measures whether each boolean subexpression has been evaluated to both true and false. This becomes especially important with compound conditions.
```js
if (isAdmin || (isEditor && contentBelongsToUser)) {
  // Allow edit
}
```
Full condition coverage requires tests that evaluate each subexpression (`isAdmin`, `isEditor`, and `contentBelongsToUser`) to both true and false values.
We can achieve this with four tests:
| Test | isAdmin | isEditor | contentBelongsToUser | Outcome |
| --- | --- | --- | --- | --- |
| 1 | true | not evaluated | not evaluated | true |
| 2 | false | true | true | true |
| 3 | false | true | false | false |
| 4 | false | false | not evaluated | false |
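Expressed as Vitest tests against a hypothetical `canEdit` helper that wraps the compound condition, the four rows above could look like this:

```js
import { expect, test } from 'vitest'

const canEdit = (isAdmin, isEditor, contentBelongsToUser) =>
  isAdmin || (isEditor && contentBelongsToUser)

// JavaScript's short-circuit evaluation skips the conditions
// marked "not evaluated" in the table above.
test('1: admins can always edit', () => {
  expect(canEdit(true, false, false)).toBe(true)
})

test('2: editors can edit their own content', () => {
  expect(canEdit(false, true, true)).toBe(true)
})

test("3: editors cannot edit others' content", () => {
  expect(canEdit(false, true, false)).toBe(false)
})

test('4: non-editors cannot edit', () => {
  expect(canEdit(false, false, false)).toBe(false)
})
```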
Condition coverage is particularly valued in high-reliability domains like fintech, healthcare, and safety-critical systems, where edge cases in complex logic can have significant consequences.
Path Coverage
Path coverage measures the percentage of possible execution paths through a function that have been tested. This is the most comprehensive but also the most difficult to achieve, as the number of paths grows exponentially with code complexity.
Complete path coverage is rarely targeted except in the most critical systems. Instead, teams often focus on covering critical paths and high-risk scenarios identified through risk analysis.
Critical paths refer to the most important execution paths in a program that are essential to its core functionality, business logic, or safety requirements. These paths often handle high-risk, high-value, or frequently executed operations.
Here’s an example that has four possible execution paths and six branches:
```mermaid
graph TD
    A[Start] --> B{isAdmin}
    B -->|true| C[Grant full access]
    B ==>|false| D{isAuthor}
    D ==>|true| E{isPublished}
    D -->|false| F[Access denied]
    E ==>|true| G[Grant read-only access]
    E -->|false| H[Grant edit access]
    C --> I[End]
    F --> I
    G ==> I
    H --> I

    linkStyle 2,3,5,9 stroke:#16a34a, color:#16a34a

    classDef covered fill:#9f9,stroke:#484,stroke-width:2px
    classDef uncovered fill:#f99,stroke:#844,stroke-width:2px
    classDef partial fill:#ff9,stroke:#884,stroke-width:2px

    class B,D,E,G covered
    class C,F,H uncovered
```
We outline one of the possible paths (in green bold), which will be executed if `isAdmin` is false, `isAuthor` is true, and `isPublished` is true.
In this example, the path coverage is 25% (1 out of 4 paths) and the branch coverage is 50% (3 out of 6 branches).
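In code, the decision logic from the diagram might look like this hypothetical sketch; the single covered path corresponds to `accessLevel(false, true, true)`:

```js
function accessLevel(isAdmin, isAuthor, isPublished) {
  if (isAdmin) {
    return 'full access'
  }
  if (isAuthor) {
    // Published content can only be read; drafts can still be edited.
    return isPublished ? 'read-only access' : 'edit access'
  }
  return 'access denied'
}
```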
Modified Condition/Decision Coverage (MC/DC)
MC/DC is an advanced metric used in safety-critical systems that ensures each condition in a decision independently affects the outcome. It’s required by standards like DO-178C in aviation software.
MC/DC is primarily used in regulated industries where software failures could lead to physical harm, such as aerospace, automotive, and medical devices.
Given this example:
```js
if (isAdmin || (isEditor && contentBelongsToUser)) {
  // Allow edit
}
```
Here is a set of five tests that achieves MC/DC, i.e., demonstrates that each condition can independently affect the outcome of the decision:
| Test | isAdmin | isEditor | contentBelongsToUser | Outcome |
| --- | --- | --- | --- | --- |
| 1 | true | false | false | true |
| 2 | false | false | false | false |
| 3 | false | true | true | true |
| 4 | false | true | false | false |
| 5 | false | false | true | false |
Coverage Targets and Industry Standards
Coverage targets vary widely across industries and organizations:
- Startups and web applications: Often target 60-80% line coverage, focusing on business-critical paths (that’s if they test at all)
- Enterprise software: Typically aims for 70-90% line and branch coverage
- Financial services: Often requires 80-95% branch coverage for core transaction systems. Standards like ISO/IEC 25010 (software quality) and regulations (e.g., PCI DSS for payment systems) indirectly drive such high targets.
- Healthcare software: Coverage of 80-95% is common, with a focus on condition coverage to ensure complex logic (e.g., diagnostic algorithms) is thoroughly tested.
- Safety-critical systems: Can require 100% MC/DC coverage for the highest criticality components
Coverage Tools
Vitest supports two different coverage providers: `v8` (the default) and `istanbul` (instrumented).
Instrumented means that the code is transformed to add coverage tracking, which can be more accurate but also slower. `v8` relies on the JavaScript engine to track coverage, which can be faster but less accurate in some cases.
By default, Vitest uses `v8`. You can run your test suite with `--coverage` to generate a coverage report:

```bash
npx vitest --coverage
```
If you want to use `istanbul`, you can set the coverage `provider` option in your `vitest.config.ts` file:
```ts
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'istanbul', // or 'v8'
    },
  },
})
```
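Recent Vitest versions also let you enforce minimum coverage as part of the run, which is useful for the CI gates discussed below; the exact option shape depends on your Vitest version, so treat this as a sketch:

```ts
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      // Fail the test run if coverage drops below these values.
      thresholds: {
        lines: 80,
        branches: 70,
      },
    },
  },
})
```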
Let’s try it out on examples from previous lectures.
Mutation Testing
Mutation testing is an advanced testing technique that evaluates the quality of your test suite by introducing small, deliberate faults (mutations) into your code and then checking if your tests can detect these changes. Unlike traditional coverage metrics that only measure which code is executed, mutation testing assesses how effective your tests are at identifying incorrect behavior.
- Mutation generation: The mutation testing framework creates variants of your code by applying mutation operators. These operators make small syntactic changes like replacing `+` with `-`, changing `true` to `false`, or removing statements.
- Test execution: Each mutant (modified version of your code) is tested against your existing test suite.
- Analysis: If your tests fail when run against a mutant, that mutation is considered “killed” (good). If tests pass despite the mutation, the mutation “survives” (bad), indicating a weakness in your test suite.
Traditional coverage metrics can be misleading—code might be executed but not actually verified. Consider this function and test:
```js
function isAdult(age) {
  return age >= 18;
}

test('returns true if age >= 18', () => {
  const result = isAdult(18); // Missing assertion!
});
```
This achieves 100% line coverage but doesn’t verify the function’s behavior. Mutation testing would identify this issue by creating mutations like:
```js
function isAdult(age) {
  return age < 18; // '>=' changed to '<'
}
```
Since the test would still pass with this mutation, it reveals that the test isn’t actually checking the function’s output.
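Adding real assertions, including an input on the other side of the boundary, kills this mutant:

```js
import { expect, test } from 'vitest'
// isAdult as defined above

test('returns true if age >= 18 and false otherwise', () => {
  expect(isAdult(18)).toBe(true)  // kills the '<' and '>' mutants
  expect(isAdult(17)).toBe(false) // kills a '<=' mutant at the boundary
})
```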
Mutation testing is particularly valuable for catching weak assertions or missing edge cases, but it’s not meant to test every possible code change.
One of the most popular mutation testing tools for JavaScript is Stryker. It integrates with various testing frameworks, including Vitest.
To install Stryker, run:
```bash
npm init stryker
```
You can specify Vitest as the test runner during setup. Stryker will create a configuration file (`stryker.conf.json`) where you can customize options like mutation operators, test framework, and coverage thresholds.
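For orientation, a minimal configuration might look like the sketch below; the `mutate` glob is an assumption about your project layout:

```json
{
  "testRunner": "vitest",
  "reporters": ["html", "clear-text", "progress"],
  "coverageAnalysis": "perTest",
  "mutate": ["src/**/*.js"]
}
```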
Now that the configuration is set up, you can run Stryker with:
```bash
npx stryker run
```
This will execute your tests against the generated mutants and provide a report on how many were killed or survived.
We can try it out on some examples from previous lectures.
Possible Mutations
Mutation testing frameworks typically support a variety of mutation operators or mutators, which are the rules for generating mutants.
Stryker’s mutators are designed to be practical and representative of common errors, but they don’t exhaustively cover every syntactic or semantic alteration possible in JavaScript.
The number of possible mutations in a codebase grows exponentially with code complexity. For example, a single function with multiple conditionals and operators could theoretically have thousands of unique mutations if every combination were considered.
Some mutants are equivalent, meaning they produce the same behavior as the original code despite syntactic changes; such mutants can never be killed and have to be identified and excluded by hand.
Computational cost
Mutation testing can be computationally expensive, especially for large codebases or complex applications. Each mutant requires a full test suite execution, which can lead to long feedback loops.
To mitigate this, consider the following strategies:
- Selective mutation: Focus on high-risk areas of your codebase or specific modules rather than running mutation tests on the entire codebase.
- Parallel execution: Use parallel test runners to distribute the workload across multiple CPU cores or machines.
- Incremental mutation testing: Only run mutation tests on code that has changed since the last run, rather than the entire codebase (see the example after this list).
- Test suite optimization: Ensure your test suite is efficient and fast. Use techniques like test parallelization, mocking, and minimizing external dependencies to reduce execution time.
- Mutation sampling: Instead of running all possible mutations, randomly select a subset to test. This can provide a good indication of test suite quality without the full computational cost.
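As an example of the incremental strategy, recent Stryker versions can reuse the results of a previous run and only re-test mutants affected by your changes:

```bash
npx stryker run --incremental
```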
Property-Based Testing and Fuzzing
Property-based testing is a testing technique where you define properties or invariants that your code should satisfy, and the testing framework generates random inputs to verify these properties. This approach can uncover edge cases and unexpected behavior that traditional example-based tests might miss.
Property-based testing is particularly useful for functions with complex input domains or side effects, as it can explore a wide range of scenarios without requiring exhaustive test case definitions.
Fuzzing is a related technique that involves generating random or semi-random inputs to test software, aiming to uncover crashes, bugs, or vulnerabilities by stressing the system in unexpected ways. Fuzzing is often used in security testing to identify vulnerabilities in software.
A common property-based testing library for JavaScript is fast-check, which integrates with Vitest. It allows you to define properties and automatically generates test cases to verify them. It also supports fuzzing.
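As a minimal sketch (assuming fast-check is installed alongside Vitest), here is a property stating that reversing an array twice yields the original array; fast-check generates many random arrays trying to falsify it and shrinks any counterexample it finds:

```js
import { expect, test } from 'vitest'
import fc from 'fast-check'

test('reversing an array twice yields the original array', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      expect([...arr].reverse().reverse()).toEqual(arr)
    })
  )
})
```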
Best Practices
Code coverage is an invaluable tool in the modern developer’s toolkit, but its true value comes from how it’s interpreted and applied.
Coverage as a Guide, Not a Goal
In mature development organizations, coverage is viewed as a tool for identifying testing gaps rather than as an end in itself. This perspective shift is important—the goal isn’t to achieve an arbitrary percentage but to use coverage data to guide testing efforts toward untested or under-tested code paths.
Remember: high coverage doesn’t guarantee high-quality tests. A test suite can achieve 100% coverage with tests that don’t verify correct behavior—they simply execute code without meaningful assertions.
Risk-Based Coverage Targets
Modern industry practices often involve setting different coverage targets for different parts of the codebase based on risk:
- High-risk code: Core business logic, authentication, payment processing, etc. (90%+ branch coverage)
- Medium-risk code: UI controllers, data transformations (70-80% coverage)
- Low-risk code: Configuration, logging, etc. (50-60% coverage)
This approach allocates testing resources more efficiently than a one-size-fits-all target.
Coverage Debt Management
When working with legacy codebases, teams often implement a “coverage debt” strategy:
- Establish current coverage as the baseline
- Require all new code to meet higher coverage standards
- When modifying legacy code, improve its coverage
- Gradually increase overall coverage over time
This approach prevents technical debt from growing while acknowledging it’s impractical to immediately retrofit tests onto all legacy code.
Integration with CI/CD
Most enterprise development workflows integrate coverage checks into CI/CD pipelines:
- Coverage reports generated automatically on every build
- Trends tracked over time using tools like SonarQube or Codecov
- Pull requests that decrease coverage flagged for review
- Coverage thresholds enforced as merge gates
Additional Resources
- Mutation-testing our JavaScript SDKs by Sentry Engineering
- A Practical Tutorial on Modified Condition/Decision Coverage by NASA
- Why Property-Based by Nicolas Dubien