Maintainability

Software maintainability is a critical quality attribute that determines the long-term viability of any software system. It refers to the ease with which software can be understood, modified, and extended throughout its lifecycle. While initial development often captures the spotlight, studies show that maintenance activities account for 60-80% of total software costs over a system’s lifetime.

Maintainable software exhibits several key characteristics:

Readability (i.e., understandable): clear structure and consistent conventions that help new developers quickly grasp the codebase
Modularity (i.e., modifiable and extensible): allowing changes to be made with minimal risk of introducing unintended side effects and enabling new functionality to be added without architectural overhauls
Testability: well-defined components that can be verified independently
Resiliency: the ability handle and recover from problems effectively, avoiding cascading failures

These characteristics are not just technical niceties; they have profound implications for team productivity, software quality, and business agility.

In April 2018, TSB Bank migrated 5.4 million customer accounts from Lloyds Banking Group’s legacy IT system to a new platform, Proteo4UK, provided by its parent company, Sabadell. The migration was rushed, with inadequate testing and poor due diligence on the supplier’s capabilities.

The new system lacked proper capacity management controls, and non-functional testing (e.g., performance and stress tests) was incomplete. After the migration, the platform collapsed, locking 1.9 million customers out of online and mobile banking, exposing sensitive data, and causing some customers to see others’ accounts.

It took up to 8 months to fully restore services. The incident resulted in a £48 million fine from the UK’s Financial Conduct Authority (FCA) for breaching data protection regulations, and TSB’s CEO resigned. The bank’s reputation suffered, with customers losing trust in its ability to manage their data securely.

Technical Debt

The significance of maintainability becomes apparent when examining the consequences of its neglect. Poorly maintained systems accumulate what Ward Cunningham termed “technical debt”: shortcuts and suboptimal solutions that, like financial debt, accrue interest over time.

As this debt compounds, teams experience decreased velocity, increased defect rates, and growing resistance to change. What might have taken hours in a well-maintained system could require weeks in one plagued by maintenance issues.

Understanding Technical Debt

Technical debt occurs when development teams choose a quick, expedient solution over a better approach that would take longer to implement. Just as financial debt has principal and interest payments, technical debt has immediate benefits but ongoing costs:

Principal: The original suboptimal code or design decision.
Interest: The ongoing cost of working around, maintaining, and extending the flawed solution.
Compound Interest: As debt accumulates, it becomes increasingly expensive to make changes.

Technical debt can be categorized into four quadrants based on two axes: reckless vs. prudent and inadvertent vs. deliberate. This framework helps teams understand the nature of their technical debt and prioritize remediation efforts.

Loading diagram...

Source

quadrantChart
  title Technical Debt Quadrants
  x-axis Reckless --> Prudent
  y-axis Inadvertent --> Deliberate
  quadrant-1 Deliberate-Prudent
  quadrant-2 Deliberate-Reckless
  quadrant-3 Inadvertent-Reckless
  quadrant-4 Inadvertent-Prudent
  Shortcuts taken: [0.25, 0.85] radius: 0
  despite knowing: [0.25, 0.8] radius: 0
  better approaches: [0.25, 0.75] radius: 0
  Strategic decisions: [0.75, 0.85] radius: 0
  made with full awareness: [0.75, 0.8] radius: 0
  Poor practices: [0.25, 0.35] radius: 0
  due to lack of: [0.25, 0.3] radius: 0
  knowledge: [0.25, 0.25] radius: 0
  Learning debt: [0.75, 0.35] radius: 0
  that becomes apparent: [0.75, 0.3] radius: 0
  over time: [0.75, 0.25] radius: 0

Deliberate-Prudent

// Quick fix to meet critical deadline
function calculateTax(amount) {
  // TODO: Replace with proper tax service integration after Q4 release
  return amount * 0.08; // Hardcoded tax rate for MVP
}

Deliberate-Reckless

// "We don't have time for design patterns"
class OrderManager {
  processOrder(order) {
    // 200 lines of mixed responsibilities
    // Database logic, email sending, inventory updates all in one method
  }
}

Inadvertent-Prudent

// Original implementation seemed fine, but now we know better
class User {
  validateEmail(email) {
    return email.includes('@'); // Too simplistic, learned better validation later
  }
}

Inadvertent-Reckless

// No understanding of best practices
let userData = {}; // Global variable used throughout application
function updateUser(name, email) {
  userData.name = name;
  userData.email = email;
  // Direct DOM manipulation mixed with business logic
  document.getElementById('userName').innerHTML = name;
}

Measuring Technical Debt

Measuring technical debt is challenging, as it often involves subjective assessments of code quality and maintainability. However, several metrics can help quantify the impact of technical debt.

Code Quality Metrics

We have already discussed one code quality metric, code coverage, which measures the percentage of code executed during automated tests.

Other common code quality metrics include:

Cyclomatic Complexity: Measures the number of linearly independent paths through a program’s source code. Higher complexity indicates more difficult-to-maintain code.
Code Churn: Measures the amount of code added, modified, or deleted over time, for a given piece of code. High churn can indicate instability or frequent changes, i.e., design issues.
Code Duplication: Measures the percentage of code that is duplicated across the codebase. High duplication can lead to increased maintenance effort, i.e., refactoring opportunities.

DORA Metrics: Measuring Delivery Performance

The DORA (DevOps Research and Assessment) metrics provide a comprehensive view of how technical debt impacts software delivery performance:

Deployment Frequency: How often your team deploys code to production. High technical debt often reduces deployment frequency as teams become more cautious about changes.
Lead Time for Changes: The time from code commit to production deployment. Technical debt increases this metric as changes require more careful testing and coordination.
Change Failure Rate: The percentage of deployments that cause failures in production. Systems with high technical debt typically have higher failure rates due to poor modularity and testing.
Time to Restore Service: How long it takes to recover from failures. Technical debt makes systems harder to understand and debug, increasing recovery time.

Take the DORA Quick Check to better understand how these metrics apply to your team and how different industries compare.

SPACE Metrics: Understanding Developer Experience

The SPACE framework complements DORA by examining how technical debt affects developer productivity and well-being:

Satisfaction: Measures how developers feel about their work, tools, and team environment. It includes job satisfaction, sense of accomplishment, and well-being. Correlates with productivity. Example metrics:
- Developer survey scores on job satisfaction.
- Retention rates or turnover.
- Net Promoter Score (NPS) for internal tools or processes.
Performance: Focuses on the outcomes of development work, emphasizing quality and impact over quantity. It looks at the value delivered to users or the business. Example metrics:
- Reliability (e.g., system uptime or error rates).
- Customer satisfaction with features.
- Business impact (e.g., revenue from a feature or adoption rates).
Activity: Tracks the volume or frequency of development actions, providing context for what developers are doing. It’s not a direct measure of productivity but helps understand workflows. Example metrics:
- Number of commits or pull requests.
- Code review frequency.
- Tickets closed or stories completed.
Communication: Evaluates how effectively teams collaborate and share knowledge. This includes clarity and quality of interactions. Example metrics:
- Time to resolve code review comments.
- Frequency of team interactions (e.g., meetings or chat activity).
- Documentation quality or contribution rates.
Efficiency: Measures how smoothly work flows with minimal waste or delays. It focuses on reducing friction in processes. Example metrics:
- Time from code commit to deployment (cycle time).
- Number of interruptions or context switches.
- Percentage of time spent on productive tasks vs. administrative overhead.

Technical debt negatively impacts all SPACE dimensions: frustrated developers work in hard-to-understand codebases, reduced performance due to careful navigation of brittle systems, increased communication overhead to coordinate risky changes, and constant context switching to handle debt-related issues.

Managing Technical Debt

Managing technical debt requires a proactive approach that balances short-term needs with long-term sustainability. Here are some strategies:

Create a technical debt backlog with items categorized by:
- Impact: How much does this debt slow down development?
- Risk: What’s the probability and severity of failure?
- Effort: How much work is required to address it?
Allocate 20% of development capacity to addressing technical debt.
Use the “Boy Scout Rule” — leave code better than you found it.

When NOT to Address Technical Debt

While addressing technical debt is crucial, there are times when it may not be the right priority:

End-of-Life Systems: Don’t invest in systems being retired within 6-12 months.
Stable, Rarely-Changed Code: If code works reliably and changes infrequently, debt may be acceptable.
Experimental Features: For prototypes or A/B tests, perfect code may be premature optimization.
Resource Constraints: During critical business periods, defer non-critical debt work.

Strategies for Teams

To effectively manage technical debt, teams should:

Make Debt Visible: Use code comments, documentation, and issue trackers to highlight areas of technical debt.
Regular Debt Review: Hold monthly “debt retrospectives” to review accumulating debt areas, celebrate successfully paid-down debt, and adjust debt allocation based on team velocity
Educate the Team: Train teams to recognize debt patterns and make informed trade-offs between speed and quality.

The key to managing technical debt is treating it as a business concern, not just a technical one. Like financial debt, some technical debt is strategic and manageable, while excessive debt can cripple an organization’s ability to respond to market changes and customer needs.

SOLID

Maintaining software effectively requires both technical practices and organizational commitment. At the technical level, principles like SOLID guide developers toward more maintainable architectures:

Single Responsibility Principle (SRP)
Open/Closed Principle (OCP)
Liskov’s Substitution Principle (LSP)
Interface Segregation Principle (ISP)
Dependency Inversion Principle (DIP)

Single Responsibility Principle (SRP)

The Single Responsibility Principle states that a class should have only one reason to change, meaning it should only have one job or responsibility. This reduces complexity and makes the code easier to understand and maintain. It reduces the risk of unintended side effects when changes are made, as each class is focused on a single task.

Here’s an example that violates SRP:

class User {
  constructor(name, email) {
    this.name = name;
    this.email = email;
  }

  saveUser() {
    // Logic to save user to a database
  }

  sendEmail(message) {
    // Logic to send an email
  }
}

In this example, the User class has two responsibilities: managing user data and sending emails. If we need to change the email logic, we risk affecting the user data management.

Refactor the code to separate responsibilities into two classes:

class User {
  constructor(name, email) {
    this.name = name;
    this.email = email;
  }

  saveUser() {
    // Logic to save user to a database
  }
}
class EmailService {
  sendEmail(user, message) {
    // Logic to send an email to the user
  }
}

This refactoring adheres to SRP by creating a dedicated EmailService class for email-related functionality, allowing each class to focus on its specific responsibility. This also makes it easier to test and maintain each class independently, as changes to email logic won’t affect user data management, and vice versa.

Open/Closed Principle (OCP)

The Open/Closed Principle states that software entities (classes, modules, functions, etc.) should be open for extension but closed for modification. This means you can add new functionality without changing existing code, reducing the risk of introducing bugs.

Here’s an example that violates OCP:

function calculateDiscount(customer) {
  if (customer.type === "regular") {
    return customer.total * 0.1; // 10% discount
  } else if (customer.type === "premium") {
    return customer.total * 0.2; // 20% discount
  } else if (customer.type === "vip") {
    return customer.total * 0.3; // 30% discount
  }
  return 0;
}

To add a new customer type (e.g., “student”), you must modify the calculateDiscount function, violating OCP. This can lead to errors and requires retesting the entire function.

Refactor the code to adhere to OCP by using polymorphism:

class Customer {
  constructor(total) {
    this.total = total;
  }

  calculateDiscount() {
    return 0; // Default implementation
  }
}
class RegularCustomer extends Customer {
  calculateDiscount() {
    return this.total * 0.1; // 10% discount
  }
}
class PremiumCustomer extends Customer {
  calculateDiscount() {
    return this.total * 0.2; // 20% discount
  }
}
class VipCustomer extends Customer {
  calculateDiscount() {
    return this.total * 0.3; // 30% discount
  }
}
class StudentCustomer extends Customer {
  calculateDiscount() {
    return this.total * 0.15; // 15% discount
  }
}

In this refactored code, each customer type is represented by a separate class that extends the base Customer class. Each subclass implements its own calculateDiscount method. This allows you to add new customer types without modifying existing code, adhering to the Open/Closed Principle. You can simply create a new subclass for any new customer type, and the existing code remains unchanged.

Liskov’s Substitution Principle (LSP)

The Liskov’s Substitution Principle states that objects of a superclass should be replaceable with objects of a subclass without affecting the correctness of the program. This means that subclasses should extend the behavior of the superclass without changing its expected behavior.

Here’s an example that violates LSP:

class Rectangle {
  constructor(width, height) {
    this.width = width;
    this.height = height;
  }

  setWidth(width) {
    this.width = width;
  }

  setHeight(height) {
    this.height = height;
  }

  getArea() {
    return this.width * this.height;
  }
}

class Square extends Rectangle {
  constructor(size) {
    super(size, size); // calls the parent constructor
  }

  // Override to enforce square properties
  setWidth(width) {
    this.width = width;
    this.height = width; // Square must have equal sides
  }

  setHeight(height) {
    this.width = height;
    this.height = height; // Square must have equal sides
  }
}

While this might seem perfectly logical in mathematical terms, it violates LSP because a Square cannot be substituted for a Rectangle without changing the expected behavior. For example, if you set the width of a Square, it also changes the height, which is not the case for a Rectangle.

To adhere to LSP, we redesign the system so subclasses can be substituted for the base class without breaking expected behavior. Here, we use a Shape base class with an abstract getArea method, and both Rectangle and Square implement it correctly.

class Shape {
  getArea() {
    throw new Error("Method 'getArea()' must be implemented.");
  }
}
class Rectangle extends Shape {
  constructor(width, height) {
    super();
    this.width = width;
    this.height = height;
  }

  getArea() {
    return this.width * this.height;
  }
}
class Square extends Shape {
  constructor(size) {
    super();
    this.size = size;
  }

  getArea() {
    return this.size * this.size;
  }
}

In this refactored code, both Rectangle and Square extend the Shape class and implement the getArea method correctly. If we have to substitute a Square or Rectangle for a Shape, it will work as expected without breaking the program’s correctness. This adheres to Liskov’s Substitution Principle.

Interface Segregation Principle (ISP)

The Interface Segregation Principle states that no client should be forced to depend on methods it does not use. This means that interfaces should be small and specific to the clients that use them, rather than large and general-purpose.

Here’s an example that violates ISP:

class Printer {
  print() { throw new Error('Must implement'); }
  scan() { throw new Error('Must implement'); }
  fax() { throw new Error('Must implement'); }
}

// Simple printer forced to implement methods it doesn't support
class BasicPrinter extends Printer {
  print() {
    // Logic to print a document
  }

  scan() {
    throw new Error('This printer cannot scan');
  }

  fax() {
    throw new Error('This printer cannot fax');
  }
}

In this example, the Printer class has methods for printing, scanning, and faxing documents. If a client only needs to print documents, it is forced to depend on the scan and fax methods, which it does not use.

To adhere to ISP, we can create smaller, more focused interfaces:

class Printable {
  print() { throw new Error('Must implement print'); }
}

class Scannable {
  scan() { throw new Error('Must implement scan'); }
}

class Faxable {
  fax() { throw new Error('Must implement fax'); }
}

class Stapleable {
  staple() { throw new Error('Must implement staple'); }
}

// Simple printer only implements what it needs
class BasicPrinter extends Printable {
  print() {
    // Logic to print a document
  }
}

In this refactored code, we have separate classes for Printable, Scannable, and Faxable. Each class has a single responsibility and clients can depend only on the interfaces they need. If a client only needs to print documents, it can use the Printable class without being forced to depend on scanning or faxing methods. This adheres to the Interface Segregation Principle, promoting better maintainability and flexibility in the codebase.

Dependency Inversion Principle (DIP)

The Dependency Inversion Principle states that high-level modules should not depend on low-level modules; both should depend on abstractions. This means that the design should be such that high-level components are not tightly coupled to low-level components, allowing for easier changes and testing.

Here’s an example that violates DIP:

// Low-level module
class Database {
  saveOrder(order) {
    // Database-specific logic
  }
}

// High-level module
class OrderProcessor {
  constructor() {
    this.database = new Database(); // Direct dependency on concrete Database
  }

  processOrder(order) {
    this.database.saveOrder(order); // Tightly coupled to Database
  }
}

In this example, the OrderProcessor class directly depends on the concrete Database class, violating DIP. If we want to change the database implementation (e.g., switch to a different database technology), we must modify the OrderProcessor class, which is not ideal.

To adhere to DIP, we can introduce an abstraction (interface) for the database and make the OrderProcessor depend on that abstraction instead of a concrete implementation:

// Abstraction (interface-like class)
class Storage {
  saveOrder(order) {
    throw new Error("Method 'saveOrder()' must be implemented.");
  }
}

// Low-level module: Database implementation
class DatabaseStorage extends Storage {
  saveOrder(order) {
    // Database-specific logic
  }
}

// Low-level module: File system implementation
class FileStorage extends Storage {
  saveOrder(order) {
    // File system-specific logic
  }
}

// High-level module
class OrderProcessor {
  constructor(storage) {
    this.storage = storage; // Depend on abstraction, not concrete class
  }

  processOrder(order) {
    this.storage.saveOrder(order); // Use the injected storage
  }
}

In this refactored code, we have an abstract Storage class that defines the saveOrder method. The OrderProcessor class now depends on this abstraction rather than a concrete implementation. We can inject any storage implementation (e.g., DatabaseStorage, FileStorage) into the OrderProcessor constructor, allowing for greater flexibility and easier testing. This adheres to the Dependency Inversion Principle, promoting better maintainability and reducing coupling between high-level and low-level modules.

In the example above, we could further enhance the design by using a Factory Pattern to create the appropriate storage implementation based on configuration or runtime conditions. This allows us to encapsulate the instantiation logic and keep the OrderProcessor class focused on its primary responsibility.

Here’s a brief example of how a Factory Pattern could be applied:

class StorageFactory {
  static createStorage(type) {
    if (type === 'database') {
      return new DatabaseStorage();
    } else if (type === 'file') {
      return new FileStorage();
    }
    throw new Error('Unknown storage type');
  }
}
// Usage
const storageType = 'database'; // This could come from configuration
const storage = StorageFactory.createStorage(storageType);
const orderProcessor = new OrderProcessor(storage);

Case Study: Pinnacle Insurance and the Legacy Mainframe Dilemma

This is a fictional case study for illustrative purposes, based on real-world experience from yours truly.

Background

Pinnacle Insurance, a Chicago-based insurer founded in 1975, serves 2 million policyholders with life, auto, and home insurance, generating $5 billion annually. Its 500-strong IT department, led by CTO Maria Alvarez, maintains “CoreSys,” a suite of IBM z/OS mainframe systems programmed in COBOL since the 1990s. CoreSys handles policy administration, claims processing, underwriting, and billing, processing 50,000 transactions daily with 99.99% uptime. COBOL’s performance excels, executing complex transactions in milliseconds, surpassing many modern languages for batch processing. Its operational costs are low, with software expenses at $2 million annually (vs. $10 million for hardware and IBM licensing) and minimal resource usage (10% CPU at peak).

Outside the mainframe, Pinnacle is a Microsoft shop, using SQL Server for databases, C# for backend development, and ASP.NET for web interfaces in non-core applications like customer relationship management. CoreSys integrates poorly with these systems, requiring custom middleware. All new IT employees undergo a mandatory two-week COBOL training program, but its basic coverage leaves hires like junior developer Priya Sharma unprepared for CoreSys’s complexity.

The insurance industry is evolving, with InsureTech startups offering real-time quoting and mobile apps. Pinnacle’s leadership aims to launch a digital platform, but CoreSys’s limitations raise questions about modernization, technical debt, and developer productivity.

The Situation

CoreSys comprises 2 million lines of COBOL code across 12 modules. While COBOL’s speed and low software costs keep it efficient, technical debt is mounting. Only 40% of CoreSys is documented, with critical knowledge held by veterans like lead architect Tom Reynolds, nearing retirement. The 500,000-line policy administration module takes weeks to modify due to tight coupling and no automated tests. Batch processing prevents real-time data access for a planned mobile app, and linking CoreSys to Microsoft SQL Server requires costly middleware. The two-week COBOL training fails to equip new hires for CoreSys’s monolithic codebase, slowing onboarding.

The IT team, averaging 45 years old, includes 200 COBOL experts, with 30% planning to retire within five years. Annual mainframe costs total $15 million. Software releases occur monthly, with the third week dedicated to extensive manual testing. Deployments happen overnight on Fridays, with engineers on standby Saturdays to monitor outcomes. Critical bugs must be fixed over the weekend; otherwise, the release is rolled back. Minor bugs are deferred for later patches. Despite this rigor, one in four releases fails, risking regulatory fines. Preparing changes takes 4-6 weeks, slowed by manual reviews and testing. When failures occur, restoring service takes two days on average, as basic logs offer little insight without modern monitoring tools.

Developers are frustrated, rating job satisfaction low due to outdated COBOL tools, weekend standbys, and the disconnect between mainframe and Microsoft systems. The slow pace of delivering new features leads to customer complaints about delayed services. Much of the team’s time is spent reworking failed releases, leaving little room for innovation. Onboarding new hires takes six months due to knowledge silos and inadequate training. Manual testing and batch job monitoring frequently interrupt developers’ focus, reducing their efficiency.

The Challenge

CEO David Kim wants a mobile app by 2027 for real-time quotes and policy management. Maria Alvarez’s team must decide:

Maintain CoreSys: Add observability (e.g., IBM Z monitoring), automate testing, and enhance COBOL training.
Incremental Modernization: Refactor key modules and use APIs to link with Microsoft systems.
Full Migration: Replace CoreSys with a cloud-native platform (e.g., Azure, C#), leveraging Pinnacle’s Microsoft expertise.
Hybrid Approach: Migrate non-critical functions to Azure, keeping CoreSys for transactions.

Maintaining CoreSys leverages COBOL’s speed and low costs but struggles with real-time needs and manual processes. Migration aligns with the Microsoft stack but risks $50 million and years of effort. Technical debt and low morale (10% turnover) complicate the decision.

Recent Incident

A regulatory update to the billing module failed, overcharging 5,000 customers. The Saturday standby team traced the issue to an undocumented COBOL subroutine, but fixing it required Tom Reynolds’ expertise, exposing knowledge gaps. A weekend rollback cost $500,000, intensifying modernization debates.

Reflections

After reading the case study, answer the following questions:

Trade-Offs: How do COBOL’s low costs, high performance, and monthly release cycle affect migration decisions? Compare to cloud costs and Microsoft alignment.
Migration Risks and Maintenance Constraints: What risks does migration to Azure/C# pose? How does maintaining CoreSys impact agility?
Technical Debt Analysis: Identify CoreSys’s technical debt. How does it affect maintainability and Microsoft integration?
Technical Debt Prioritization Plan: Propose a debt prioritization plan, considering the release process. Which delivery or team metrics would improve?
Team Experience and Morale Issues: How do COBOL training, manual testing, and weekend standbys impact team morale and efficiency? Suggest improvements (e.g., automation).
Talent Retention Strategy: How can Pinnacle retain talent like Priya and reduce reliance on experts like Tom?
Recommended Approach: Recommend an approach for Maria Alvarez’s team. Justify using delivery and team metrics, plus Microsoft context.
Action Plan: Outline a 6-month plan to address debt and support the mobile app, considering the release cycle.

Consider pros and cons, costs, developer experience, hiring, and risks. Use metrics to support your recommendations. Discuss the impact of technical debt on maintainability.

Additional Resources

The SOLID Principles, Explained with Motivational Posters
What is the difference between Factory and Strategy patterns?
Flyvbjerg, Bent and Budzier, Alexander and Aaen, Jon and Keil, Mark and Zottoli, M., The Uniqueness of IT Cost Risk: A Cross-Group Comparison of 23 Project Types (May 08, 2025). Available at SSRN: https://ssrn.com/abstract=5247223 or http://dx.doi.org/10.2139/ssrn.5247223
Developer productivity with Dr. Nicole Forsgren (creator of DORA, co-creator of SPACE) by The Pragmatic Engineer (Podcast)