
Software Testing Fundamentals

Testing has always been a vital part of software development.

Even in the early stages of learning to program, before formal software testing concepts were introduced, you likely tested your code by providing input and comparing the actual behavior to your expectations.

This process—comparing actual behavior against expected behavior—is the essence of software testing.

Manual testing, where a person (such as a developer, tester, or end-user) runs the code, inputs data, and observes the behavior, still plays an important role in software development.

However, in modern development, “testing” typically refers to automated testing. Automated testing involves running a suite of tests against a codebase, where each test automatically executes part or all of the code and compares the actual behavior to the expected behavior.

Automated testing is usually driven by developers, who design and implement these tests to validate the software they are building.

Understanding the fundamentals of automated software testing is essential: why we test, the types of tests we can perform, and how they function at a high level. These fundamentals will be explored in this discussion.

Functional vs Non-Functional Tests

Just as requirements can be functional or non-functional, tests can be categorized into two corresponding types:

  • Functional Tests: These tests validate that the software behaves as expected. They check whether the software meets its functional requirements, such as correctly processing user input or returning the expected output.
  • Non-Functional Tests: These tests evaluate the software’s performance, security, usability, and other non-functional aspects. They ensure that the software meets its non-functional requirements, such as response time, scalability, and reliability.

White Box vs Black Box Testing

In addition to functional and non-functional tests, tests can also be classified based on their approach:

  • White Box Testing: This approach involves testing the internal workings of the software. Testers have access to the code and can design tests based on the software’s structure, algorithms, and logic. White box testing is often used for unit tests, where the focus is on individual functions or components.
  • Black Box Testing: This approach treats the software as a “black box”, where the internal workings are not visible to the tester. Testers focus on the inputs and outputs of the software without knowledge of its internal code. Black box testing is commonly used for functional tests, where the emphasis is on verifying that the software meets its requirements.

Waterfall vs Agile Testing

Depending on the software development methodology, testing can be integrated into the development process in different ways:

  • Waterfall Testing: In the waterfall model, testing is typically performed after the development phase. This means that all code is written before any testing occurs, which can lead to discovering issues late in the process.
  • Agile Testing: In agile methodologies, testing is integrated throughout the development process. Tests are written alongside the code, and continuous testing is performed to ensure that new features do not introduce bugs. This approach allows for faster feedback and more efficient bug detection.

In their book “Agile Testing”, Janet Gregory and Lisa Crispin describe the Agile Testing Quadrants, which categorize different types of tests based on their purpose and focus. The quadrants help teams understand the various testing activities needed to ensure software quality:

Diagram source (Mermaid):
quadrantChart
title Agile Testing Quadrants
x-axis Supporting the Team --> Critique the Product
y-axis Technology-Facing --> Business-Facing
quadrant-1 Manual
quadrant-2 Automated & Manual
quadrant-3 Automated
quadrant-4 Tools
Functional Tests: [0.25, 0.85] radius: 0
Examples: [0.25, 0.8] radius: 0
Story Tests: [0.25, 0.75] radius: 0
Prototypes: [0.25, 0.7] radius: 0
Simulations: [0.25, 0.65] radius: 0
Exploratory: [0.75, 0.85] radius: 0
Testing Scenarios: [0.75, 0.8] radius: 0
Usability Testing: [0.75, 0.75] radius: 0
"UAT (User Acceptance Testing)": [0.75, 0.7] radius: 0
Alpha/Beta: [0.75, 0.65] radius: 0
Unit Tests: [0.25, 0.35] radius: 0
Component Tests: [0.25, 0.3] radius: 0
"Performance & Load Testing": [0.75, 0.35] radius: 0
Security Testing: [0.75, 0.3] radius: 0
'ility' Testing: [0.75, 0.25] radius: 0

The “ility” testing category refers to non-functional tests that evaluate various aspects of the software, such as maintainability, interoperability, compatibility, reliability, installability, and safety. Which “ilities” matter depends on the specific context of the software or business.

Why Perform Software Testing?

The primary goal of automated software testing is to prevent software bugs from reaching production and affecting users.

The cost of software bugs can be staggering. Depending on the report, inadequate software testing costs the US economy anywhere from tens of billions to over two trillion dollars annually, not to mention serious non-economic impacts like loss of life.

Beyond preventing damage, automated testing offers other compelling benefits, especially for developers.

Well-designed automated tests give us confidence that our code will work as intended. This confidence allows us to modify our code without excessive worry about breaking it. Good tests enable us to add features, refactor code, and rely on the tests to quickly catch mistakes before they reach users.

The process of implementing tests can also clarify what you want your software to do, leading to better-designed systems by forcing you to consider use cases, boundary conditions, and architecture decisions.

Investing time in writing good tests upfront can save time later by reducing the effort needed to maintain your software.

For these reasons, we want to become proficient software testers. As we learn, remember that while good tests increase our confidence, they don’t guarantee bug-free code. As Edsger Dijkstra said:

“Testing shows the presence, not the absence of bugs.”

What Makes a Good Test?

A good automated software test has several key characteristics that ensure its effectiveness and reliability. Let’s explore these features in detail:

  • Focus on a Single Behavior: A good test targets one specific behavior within the software, such as calling a function, invoking an API method, or interacting with a specific UI element (e.g., a form or button). This ensures clarity and precision.
  • Based on Specific Input: The test should use well-defined inputs, such as function parameters, API values, or user interactions (e.g., submitting a form). This helps isolate the behavior being tested.
  • Observable Results: The behavior being tested must produce measurable results, such as a function’s return value, an API response, or a visible change in the UI (e.g., a form submission).
  • Known Expected Results: The expected outcome for the given input must be clearly defined. This allows the test to determine whether the behavior is correct.
  • Controlled Environment: Tests should run in a predictable and isolated environment, such as a single process on a dedicated machine, to ensure consistency.

The primary purpose of a software test is to compare the actual results against the expected results. If they match, the test passes; otherwise, it fails.
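
Concretely, a test with all of these characteristics can be as small as the following sketch (Vitest-style syntax is assumed; Jest is nearly identical, and formatPrice is a hypothetical function under test):

import { expect, test } from "vitest";
import { formatPrice } from "./formatPrice"; // hypothetical unit under test

test("formatPrice renders cents as dollars", () => {
  // Single behavior (one function call) with a specific, well-defined input.
  const actual = formatPrice(1999);
  // Observable result compared against a known expected result:
  // if they match, the test passes; otherwise it fails.
  expect(actual).toBe("$19.99");
});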

Tests Should Be Unchanging

Good tests are designed to capture the correct behavior of the software and should remain unchanged as long as that behavior stays consistent. Once a test is written, it should not need to be modified.

For example, refactoring, adding new features, or fixing bugs should not alter the software’s existing, correct behaviors. A good test ensures that these modifications do not unintentionally impact the software’s current functionality.

If existing tests need to be updated during refactoring or feature additions, it may indicate that the changes are unintentionally affecting the software’s behavior. Poorly written tests, however, may require updates, which highlights the importance of writing high-quality tests from the start.

Avoid Testing Implementation Details

When writing tests, it can be tempting to rely on internal implementation details, but this approach should generally be avoided.

For instance, consider a feature where a dialog opens when a user clicks a button. The mechanism controlling this dialog might involve a boolean variable named showDialog, which is set to true when the button is clicked. Testing this by verifying that showDialog is true after the button click relies on implementation details.

This approach is problematic because it makes tests brittle. If the code is refactored—such as renaming or removing showDialog—the test would fail and require modification, even if the functionality remains unchanged.

The best practice is to write tests that interact with the software as a user would, using only public APIs and observing visible changes in the UI. For example, instead of checking the value of showDialog, a better test would verify that the dialog is actually displayed in the UI after the button is clicked. This could involve checking for the presence of the dialog’s text or other visible elements.
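
A sketch of this approach, assuming a React component tested with React Testing Library on a Jest/Vitest-style runner with @testing-library/jest-dom matchers (HelpButton and its labels are hypothetical):

import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { expect, test } from "vitest";
import { HelpButton } from "./HelpButton"; // hypothetical component

test("clicking the button displays the dialog", async () => {
  render(<HelpButton />);

  // Interact the way a user would: find the button by its accessible name.
  await userEvent.click(screen.getByRole("button", { name: /help/i }));

  // Assert on what the user can see, not on internal state like showDialog.
  expect(screen.getByRole("dialog")).toBeVisible();
});

If showDialog is later renamed or replaced during a refactor, this test keeps passing as long as the dialog still appears.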

When Testing Implementation Details Is Necessary

While avoiding implementation details is ideal, there are scenarios where testing them is unavoidable. This is particularly true when critical behavior cannot be observed through the public interface.

For example, you might need to test whether a data access function retrieves data from a cache instead of a database. If this behavior is not evident from the function’s return value, examining the function’s internals may be required.
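
One way to sketch such a test is to inject a fake database and count how often it is queried (Vitest’s vi.fn test double is used; getUser and its injected dependencies are hypothetical):

import { expect, test, vi } from "vitest";
import { getUser } from "./users"; // hypothetical: checks a cache before the db

test("getUser serves the second call from the cache", async () => {
  const db = { fetchUser: vi.fn().mockResolvedValue({ id: 1, name: "Ada" }) };
  const cache = new Map();

  await getUser(1, { db, cache }); // first call: cache miss, queries the db
  await getUser(1, { db, cache }); // second call: should come from the cache

  // The internal detail under test: the database was hit only once.
  expect(db.fetchUser).toHaveBeenCalledTimes(1);
});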

Throughout this course, we will explore tools and techniques for writing tests that simulate user interactions and focus on observable behaviors, while minimizing reliance on internal implementation details.

The Developer User Vs. The End User

When we say you should “write tests that interact with the software as a user would”, remember that there can be two different kinds of users:

  • The End User: a person who interacts directly with the running software, such as by clicking buttons and typing text.
  • The Developer User: another developer who uses your code in their own code, such as by incorporating a class you wrote into their application.

Regardless of which type of user your code has (or if it has both, like a reusable UI component), always implement tests using the interface that user would use. This means clicks and typing for end users, and public method calls for developer users. Don’t test implementation details if you can avoid it.
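
For a developer user, “the interface that user would use” means the public API. A minimal sketch (Stack is a hypothetical class; Vitest-style syntax):

import { expect, test } from "vitest";
import { Stack } from "./stack"; // hypothetical class under test

test("pop returns the most recently pushed value", () => {
  const stack = new Stack<number>();
  stack.push(1);
  stack.push(2);

  // Exercise only the public methods a consuming developer would call;
  // never reach into private fields such as an internal array.
  expect(stack.pop()).toBe(2);
});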

Determinism and Flaky Tests

A key characteristic of effective automated software tests is determinism. A good test should consistently produce the same result—either passing or failing—when run under identical conditions, such as the same code and inputs.

Tests that behave unpredictably, sometimes passing and sometimes failing without any changes to the code or test, are referred to as flaky tests. These tests can erode developers’ confidence in the test suite and waste valuable time as they try to diagnose the cause of the failures.
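
A common source of flakiness is depending on uncontrolled inputs, such as the current time. A sketch (greet is a hypothetical function that returns a time-of-day greeting; Vitest-style syntax):

import { expect, test } from "vitest";
import { greet } from "./greet"; // hypothetical: "Good morning" before noon

// Flaky: passes before noon and fails after, with no change to the code.
test("greet says good morning (flaky)", () => {
  expect(greet(new Date())).toBe("Good morning");
});

// Deterministic: the input time is pinned, so the result never varies.
test("greet says good morning at 9am", () => {
  expect(greet(new Date("2024-01-01T09:00:00"))).toBe("Good morning");
});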

To ensure reliable tests, it is crucial to avoid creating flaky tests whenever possible. Strategies for preventing flakiness will be discussed later in the course.

That said, some degree of test flakiness may be unavoidable in specific scenarios. For example, a test that relies on a network service might fail due to uncontrollable factors like network instability.

In such cases, the impact of flakiness can be minimized by automatically rerunning any failing tests. This approach helps ensure that transient issues do not disrupt the overall testing process.

Different Types Of Automated Software Tests

We’ll focus on three main types of automated software tests: unit tests, integration tests, and end-to-end tests. These types are distinguished by their scope, or how much code is being tested. The difference between a unit test and an integration test is often a matter of degree, but the difference between an integration test and an end-to-end test is more pronounced.

Google classifies tests based on their size. Small tests run in a single process, medium tests run on a single machine, and large tests run wherever they want. This is a little different from the unit/integration/end-to-end classification, but it is a useful way to think about the size of tests.

A more common metric is the scope of the test, which refers to how much code is being validated (not executed). The three main types of tests are:

Unit Tests

A unit test focuses on validating a small, specific piece of code, such as a single function or class. The term “unit test” originates from the idea of testing individual “units” of code.

For instance, a unit test might verify that a function returns the correct output for a given set of inputs or that a form’s submit handler properly validates user input.

Unit tests are typically small and straightforward to implement. Developers often write multiple unit tests to evaluate a unit of code under various conditions, referred to as test cases.

For example, when testing a shipping address form, unit tests might check that the form:

  • Accepts a valid address.
  • Rejects invalid inputs (e.g., missing ZIP code or city name).
  • Handles unusual inputs gracefully (e.g., a city name that is 257 characters long).

Tests for rare or edge-case inputs are known as boundary cases and are essential to include in a comprehensive test suite.

By covering a wide range of scenarios, unit tests provide confidence that the code behaves as expected in different situations.
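
As a sketch, unit tests for a hypothetical validateAddress function covering these cases might look like this (Vitest-style syntax):

import { describe, expect, test } from "vitest";
import { validateAddress } from "./address"; // hypothetical unit under test

describe("validateAddress", () => {
  const valid = { street: "1 Main St", city: "Springfield", zip: "12345" };

  test("accepts a valid address", () => {
    expect(validateAddress(valid).valid).toBe(true);
  });

  test("rejects a missing ZIP code", () => {
    expect(validateAddress({ ...valid, zip: "" }).valid).toBe(false);
  });

  test("handles a boundary case: a 257-character city name", () => {
    const longCity = "x".repeat(257);
    expect(validateAddress({ ...valid, city: longCity }).valid).toBe(false);
  });
});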

However, the primary limitation of unit tests (as humorously illustrated by many clever gifs) is that they do not verify whether different units of code work together correctly. This is where integration tests come into play.

Integration Tests

An integration test has a broader scope than a unit test and focuses on validating the interactions between multiple software components.

For instance, while a unit test might check that a shipping address form properly validates input, an integration test would verify that the form successfully passes a valid address to a shipping cost calculator, which then computes the correct shipping cost. This test specifically ensures the interaction between the form and the shipping cost calculator works as expected.

Another example of an integration test could involve verifying that a user’s shipping address is correctly stored in the database after the form is submitted. This would validate the interaction between the address form and the database.
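
A sketch of the first example, feeding the real output of one hypothetical module into another instead of testing each in isolation (Vitest-style syntax):

import { expect, test } from "vitest";
import { parseAddressForm } from "./addressForm";   // hypothetical
import { calculateShippingCost } from "./shipping"; // hypothetical

test("a submitted address yields a shipping cost", () => {
  const address = parseAddressForm({
    street: "1 Main St",
    city: "Springfield",
    zip: "12345",
  });
  const cost = calculateShippingCost(address);

  expect(cost).toBeGreaterThan(0);
});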

By testing how different parts of the code interact, integration tests provide greater confidence in the overall functionality of the system compared to unit tests.

However, integration tests are inherently more complex to implement and maintain due to their wider scope and the dependencies between components.

As a result, balancing the use of unit tests and integration tests is essential. We will explore strategies for achieving this balance later in the course.

End-To-End (E2E) Tests

End-to-end tests (also known as functional tests) have the broadest scope, validating large, interconnected portions of the codebase as they work together. These tests often cover entire pages or user flows and may involve verifying the behavior of multiple application processes running on different machines (e.g., a web client, web/API server, and database server).

End-to-end tests evaluate an application’s behavior through its user interface and are typically executed using a “robot” that simulates user actions, such as clicking buttons and typing input.

For example, an end-to-end test for an e-commerce application might validate the entire checkout process. The “robot” would simulate a user clicking the “checkout” button, entering shipping and payment information, clicking the “complete checkout” button, and receiving a transaction confirmation.

The implementation of an end-to-end test encodes the expected behavior of the application at each step. The testing “robot” then verifies that the actual behavior matches these expectations.
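
As a sketch, an end-to-end checkout test written with Playwright (an assumed tool choice; the URL, labels, and messages are hypothetical) might look like this:

import { expect, test } from "@playwright/test";

test("a user can complete checkout", async ({ page }) => {
  await page.goto("https://shop.example.com/cart"); // hypothetical URL

  // The testing "robot" clicks and types the way a real user would.
  await page.getByRole("button", { name: "Checkout" }).click();
  await page.getByLabel("Shipping address").fill("1 Main St, Springfield, 12345");
  await page.getByLabel("Card number").fill("4242 4242 4242 4242");
  await page.getByRole("button", { name: "Complete checkout" }).click();

  // Verify the observable outcome: a confirmation is displayed.
  await expect(page.getByText("Order confirmed")).toBeVisible();
});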

While end-to-end tests provide the highest level of confidence by simulating real user interactions with the entire application, they are also the most expensive to implement, run, and maintain.

Despite their cost, end-to-end tests are invaluable because they ensure the application behaves as intended for real users, offering the greatest assurance of reliability.

Other Types Of Automated Software Tests

While this course focuses on unit, integration, and end-to-end tests, there are other types of automated software testing worth mentioning:

  • Performance Tests: These tests ensure that a piece of software uses resources (e.g., runtime or memory) efficiently and stays within acceptable limits. For example, a performance test might measure how long a piece of code takes to execute. If the execution time exceeds a predefined threshold, the test fails. A minimal sketch follows this list.
  • Load Tests: These tests evaluate whether an application can maintain its performance under heavy usage. Load tests often simulate a large number of users interacting with the application simultaneously. If the application slows down significantly or encounters errors during the test, it may indicate the need for code optimization or larger architectural changes, such as adding more servers.
  • Static Tests: These tests analyze the code without executing it. They can catch syntax, typos, type errors, style violations, security risks, and other issues before the code is run. Static tests are often performed by a linter integrated into the code editor, which provides immediate feedback to developers as they write code, or by a static analyzer.
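
As promised above, a minimal sketch of a performance test: it fails a hypothetical sortLargeList function if it exceeds an assumed time budget (Vitest-style syntax):

import { expect, test } from "vitest";
import { sortLargeList } from "./sort"; // hypothetical function under test

test("sortLargeList stays within its time budget", () => {
  const input = Array.from({ length: 100_000 }, () => Math.random());

  const start = performance.now();
  sortLargeList(input);
  const elapsedMs = performance.now() - start;

  // The 200 ms threshold is an assumed budget; timing checks like this can
  // be flaky on shared hardware, so budgets need generous margins.
  expect(elapsedMs).toBeLessThan(200);
});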

Balancing Different Types Of Automated Tests

Most production software projects benefit from a combination of unit, integration, and end-to-end tests. However, determining the ideal proportion of each type to balance their advantages and disadvantages has been a topic of debate.

The Testing Pyramid

One widely recognized model for balancing these tests is the testing pyramid:

From top to bottom, the pyramid’s layers are:

  • E2E Tests: few, fancy, user-obsessed, slowest, and pricey.
  • Integration Tests: connecting the dots; moderately complex and expensive, slower.
  • Unit Tests: fast, focused, and cheap.

The testing pyramid emphasizes that unit tests are the fastest and least expensive to write, run, and maintain. Therefore, they should make up the majority of your test suite. As you move up the pyramid to integration and end-to-end tests, which are slower and more expensive, the number of these tests should decrease.

In summary, the pyramid suggests:

  • Unit Tests: The majority of your tests, as they are fast and cost-effective.
  • Integration Tests: A smaller proportion, focusing on interactions between components.
  • End-to-End Tests: The fewest, as they are the slowest and most expensive.

The Testing Trophy

Another model, the testing trophy, offers a different perspective:

The testing trophy suggests that most tests should be integration tests, with unit tests forming the next-largest proportion and end-to-end tests being the smallest. Additionally, the trophy highlights the importance of static tests, which analyze code without executing it.

Proponents of the testing trophy argue that it better reflects the confidence gained from each type of test. Confidence generally increases as you move up the pyramid or trophy, with end-to-end tests providing the highest assurance.

They also point out that advancements in testing tools have made integration tests faster and easier to maintain, making it more practical to rely on a larger suite of integration tests.

Choosing the Right Balance

Both models provide valuable insights, but the ideal balance depends on your project’s specific needs. The key is to leverage the strengths of each type of test while minimizing their weaknesses to create a reliable and efficient testing strategy.

Test-Driven Development (TDD)

Test-Driven Development (TDD) is a software development methodology that emphasizes writing automated tests before implementing the corresponding code. This approach follows a repeating cycle known as the “red, green, refactor” cycle.

Diagram source (Mermaid):
flowchart TD
START:::hidden e1@==>|"Update list<br>as we learn"| A
e1@{ animate: true }
A[Write a list of test cases] ==> B[Pick one test case]
B ==> C[Write the test]
C ==>|Test Fails| D[Write code to make it pass]
D ==>|Test Passes| E[Refactor the code]
E ==>|Test Passes| B
style C fill:#f87171, stroke:#dc2626
linkStyle 3 stroke:#dc2626, color:#dc2626
linkStyle 4,5 stroke:#16a34a, color:#16a34a
style D fill:#4ade80, stroke:#16a34a
style E fill:#60a5fa, stroke:#2563eb

The TDD cycle consists of three key steps:

  • Red: Start by writing a test that defines the behavior of the next feature you plan to implement. Since the feature does not yet exist, the test fails, producing a “red” error message.
  • Green: Write the minimum amount of code required to make the test pass. At this stage, focus solely on functionality, not on code design or optimization.
  • Refactor: Once the test passes, improve the code’s structure and readability without changing its behavior. The test should continue to pass after refactoring; if it fails, the refactoring introduced an issue that needs to be addressed.

This cycle is repeated for each new feature or behavior you want to implement.
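
As a sketch of a single pass through the cycle (Vitest-style syntax; the add function is hypothetical):

// Red: write the test first. It fails because ./math does not exist yet.
import { expect, test } from "vitest";
import { add } from "./math";

test("add sums two numbers", () => {
  expect(add(2, 3)).toBe(5);
});

// Green: in math.ts, write just enough code to make the test pass:
//   export function add(a: number, b: number): number { return a + b; }
// Refactor: tidy the implementation; the test must keep passing.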

Benefits of TDD

One of the primary advantages of TDD is that writing tests first forces you to think critically about your code’s interface and behavior before implementation. During the “red” phase, you must decide:

  • How the code will be called.
  • What behavior it will exhibit.
  • How it will communicate its results.

By designing the test first, you approach your code from the perspective of its user (in this case, the test itself). This often results in code with a clear, simple, and user-friendly interface.

Working in Small Increments

A key to successful TDD is working in small, manageable increments. Start by writing a single test with a narrow scope, then implement just enough code to make that test pass. Repeat this process until the entire feature is complete. Breaking down problems into smaller steps makes the development process more manageable and reduces the risk of introducing errors.

To gain a better understanding of working in small increments with TDD, I highly recommend reading the “A TDD Example” section of the “Test-Driven Development” chapter in The Art of Agile Development.

Avoiding Common Pitfalls

TDD does not require writing all tests upfront. Attempting to do so is considered an anti-pattern, as it can lead to wasted effort if requirements change. Instead, focus on one test at a time, iterating through the “red, green, refactor” cycle for each feature.

Behavior-Driven Development (BDD)

Behavior-Driven Development (BDD) is an extension of TDD that emphasizes collaboration between developers, testers, and non-technical stakeholders. BDD encourages writing tests in a language that is easily understood by all parties involved, making it easier to communicate requirements and expectations.
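
Dedicated BDD tools (such as Cucumber) express scenarios in a Given/When/Then format that stakeholders can read; the same spirit can be approximated in an ordinary test runner, as in this sketch (Cart is a hypothetical class; Vitest-style syntax):

import { describe, expect, it } from "vitest";
import { Cart } from "./cart"; // hypothetical class under test

describe("Given a cart containing one item", () => {
  it("When the item is removed, Then the cart is empty", () => {
    const cart = new Cart();
    cart.add({ sku: "ABC-123", quantity: 1 });
    cart.remove("ABC-123");
    expect(cart.isEmpty()).toBe(true);
  });
});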

Limits of Automated Testing and the Importance of Manual Testing

While automated software testing is a fundamental part of modern development, it’s important to understand its limitations. Automated testing cannot handle every scenario, and manual testing remains essential in certain cases.

Manual testing is particularly valuable when tasks require human creativity or judgment.

For example, security testing often involves uncovering complex and subtle vulnerabilities. This is an area where human exploratory testing excels compared to automated tools. Exploratory testing relies on human intuition and creativity to identify vulnerabilities that automated systems might overlook. Once a vulnerability is discovered, an automated test can be created to ensure the issue is permanently resolved.

Similarly, evaluating the quality of subjective elements—such as search results, audio, or video—is difficult to automate. These tasks often depend on human judgment, making systematic manual testing the most effective approach for such assessments.

Additional Readings