When Machines Break: A Guide to Controlling Reactive Maintenance Costs

February 1, 2026
Dr.-Ing. Simon Spelzhausen

Reactive maintenance is a strategy where repairs are performed only after equipment failure. While often viewed as inefficient, it is a financially valid choice for non-critical assets where replacement costs are low. This guide details how to manage risks, control emergency spend, and triage incidents effectively within a broader maintenance strategy.

Definition: Reactive maintenance is the practice of restoring an asset to operational condition only after it has broken down or malfunctioned.

Interested in reducing unplanned downtime? Explore Makula’s preventive maintenance features.

What is Reactive Maintenance?

In the world of facility and asset management, reactive maintenance is often the default setting for organisations just starting out. It goes by several names, including emergency maintenance management, breakdown maintenance, and run-to-failure.

Regardless of the terminology, the core concept remains the same: you do not intervene until the machine or asset ceases to function. Unlike preventive maintenance, which relies on calendar-based schedules, or predictive maintenance, which relies on sensor data, reactive maintenance uses failure as the only trigger for action.

While often demonised as "fighting fires," a calculated reactive approach is actually a crucial component of a balanced maintenance portfolio. The secret lies in distinguishing between uncontrolled chaos and a deliberate strategy. When applied correctly, it allows maintenance teams to focus their limited resources on critical assets while letting less important equipment run its course.

When Reactive Maintenance is Acceptable

Implementing a blanket proactive strategy for every single light bulb, door handle, or office chair in a facility is financially ruinous. A run-to-failure strategy is perfectly acceptable, and often smarter, in specific scenarios where the cost of prevention outweighs the cost of the cure.

Smart maintenance managers utilise reactive strategies when:

  • Assets are Non-Critical: The failure of the asset does not impact production lines, safety, or core business operations. If a break room coffee machine fails, production continues; if the main conveyor belt snaps, the factory stops. Reactive maintenance is suitable for the former.
  • Low Capital Cost: The cost of replacing the item is negligible compared to the cost of maintaining it. Spending hours inspecting operational widgets worth £10 is a waste of skilled labour. It is more economical to replace them upon failure.
  • Redundancy Exists: You have a backup system in place that switches on automatically (or with minimal manual intervention), allowing you to fix the primary unit without urgency. The redundancy buffers the operational risk.
  • End of Life: The asset is scheduled for disposal or replacement soon. Investing in preventive care for a machine due to be scrapped in two months yields no return on investment (ROI).
  • Random Failure Patterns: Some electronics fail randomly and do not exhibit wear-and-tear signs that preventive maintenance could catch. In these instances, scheduled inspections add no value.

Cost and Risk Breakdown

While the upfront cost of reactive maintenance is zero (you spend nothing until it breaks), the backend costs can be substantial if not managed. Understanding these cost drivers is essential for effective emergency maintenance management.

The hidden costs often spiral because reactive work is urgent. You are paying for speed and convenience rather than efficiency. Below is a breakdown of where the money actually goes during a reactive incident.

Incident Cost Example: Hydraulic Pump Failure

| Cost Driver | Description | Estimated Cost Impact |
|---|---|---|
| Downtime | Production stoppage while the machine is offline. | £500 - £10,000+ per hour |
| Overtime Labour | Technicians are called in after hours or at weekends. | 1.5x to 2x standard hourly rate |
| Expedited Parts | Rush shipping or courier fees for spares not in stock. | 20% - 50% premium on parts |
| Collateral Damage | When a part fails violently, it may damage other components. | Variable (often high) |
| Safety Risks | Rushed repairs often bypass standard safety checks. | Potential fines or injury claims |
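
To make the breakdown concrete, here is a minimal Python sketch that adds up the three cost drivers with a simple multiplier. All figures in the example (6 hours of downtime at £2,000 per hour, a £40 hourly labour rate, £1,200 of parts) are illustrative assumptions chosen to sit inside the ranges above, not measured data. Collateral damage and safety risks are left out because they cannot be reduced to a fixed rate.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    downtime_hours: float       # hours the machine was offline
    downtime_rate: float        # lost production per hour (GBP)
    labour_hours: float         # technician hours spent on the repair
    labour_rate: float          # standard hourly rate (GBP)
    overtime_multiplier: float  # e.g. 1.5 for after-hours work
    parts_list_price: float     # normal price of the spares (GBP)
    expedite_premium: float     # e.g. 0.25 for a 25% rush surcharge


def incident_cost(i: Incident) -> dict[str, float]:
    """Split one reactive incident into downtime, labour and parts costs."""
    downtime = i.downtime_hours * i.downtime_rate
    labour = i.labour_hours * i.labour_rate * i.overtime_multiplier
    parts = i.parts_list_price * (1 + i.expedite_premium)
    return {
        "downtime": downtime,
        "labour": labour,
        "parts": parts,
        "total": downtime + labour + parts,
    }


# Hypothetical hydraulic pump failure (all numbers are assumptions).
pump = Incident(
    downtime_hours=6,
    downtime_rate=2000,
    labour_hours=8,
    labour_rate=40,
    overtime_multiplier=1.5,
    parts_list_price=1200,
    expedite_premium=0.25,
)
costs = incident_cost(pump)
print(f"Total: £{costs['total']:,.2f}")  # Total: £13,980.00
```

In this example downtime accounts for £12,000 of the £13,980 total, which is why reducing repair time usually matters more than trimming labour or shipping costs.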

To manage these costs, you must have strict governance. You cannot simply let things break; you must be prepared for when they break. This preparation is what separates a "run-to-failure" strategy from simple negligence.

Emergency Triage Workflow

When a critical failure occurs, chaos is the enemy. An effective reactive strategy requires a disciplined triage workflow to determine if a breakdown is a true emergency or just a nuisance. Without triage, technicians may rush to fix a minor issue while a critical line remains down.

Step-by-Step Triage Process

  1. Incident Reporting: An operator flags the issue via the Computerised Maintenance Management System (CMMS) or helpdesk.
  2. Initial Assessment: The maintenance supervisor reviews the report immediately. Key questions: Is this a safety hazard? Is production stopped?
  3. Classification:
    • P1 (Emergency): Immediate threat to safety, environment, or total production. Dispatch immediately.
    • P2 (Urgent): Impairs performance or quality, but operation continues. Schedule within 24 hours.
    • P3 (Routine): Non-critical. Add to the backlog for the next available slot.
  4. Resource Allocation: Check for spare parts and technician availability. If parts are missing for a P1, initiate emergency procurement.
  5. Execution: Perform the repair.
  6. Root Cause Analysis (RCA): For P1 incidents, determine why the asset failed so that a recurrence can be prevented. (See [Art #3: Root Cause Analysis Basics] for more on this).
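
The classification step can be sketched as a small Python function. The field names on `IncidentReport` are hypothetical and simply mirror the supervisor's assessment questions; a real CMMS will expose its own fields.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    P1 = "Emergency: dispatch immediately"
    P2 = "Urgent: schedule within 24 hours"
    P3 = "Routine: add to backlog"


@dataclass
class IncidentReport:
    # Hypothetical fields, filled in during the initial assessment.
    safety_hazard: bool
    environmental_threat: bool
    production_stopped: bool
    performance_or_quality_impaired: bool


def classify(report: IncidentReport) -> Priority:
    """Map an assessed incident onto the P1/P2/P3 scheme."""
    # Any threat to safety, environment or total production is an emergency.
    if (report.safety_hazard
            or report.environmental_threat
            or report.production_stopped):
        return Priority.P1
    # The asset still runs, but output or quality suffers.
    if report.performance_or_quality_impaired:
        return Priority.P2
    return Priority.P3


noisy_gearbox = IncidentReport(
    safety_hazard=False,
    environmental_threat=False,
    production_stopped=False,
    performance_or_quality_impaired=True,
)
print(classify(noisy_gearbox).name)  # P2
```

Checking the P1 conditions first means a report that is both a safety hazard and a quality issue can never be downgraded to P2.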

Spares and Vendor Strategies

A run-to-failure strategy is risky if you do not have the right parts on the shelf. If you choose to let an asset fail, you must ensure the replacement can be carried out swiftly to minimise the Mean Time To Repair (MTTR).

The "Just-in-Case" Inventory

For assets designated as reactive, maintain a "min-max" inventory level. You do not need a warehouse full of spares, but you do need the critical consumables that frequently break.

  • Consumables: Belts, fuses, seals, and hoses should always be on hand.
  • Rotables: Keep refurbished motors or pumps ready to swap out, allowing you to repair the broken one offline without time pressure.
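
A min-max rule is simple enough to express in a few lines of Python. This sketch assumes one common convention: reorder when stock on hand plus stock on order falls to or below the minimum, and order enough to reach the maximum. The function name and the example levels are illustrative.

```python
def reorder_quantity(on_hand: int, on_order: int,
                     min_level: int, max_level: int) -> int:
    """Return how many units to order under a min-max policy.

    Assumed convention: reorder when the stock position (on hand plus
    already on order) is at or below min_level, and top up to max_level.
    """
    if min_level > max_level:
        raise ValueError("min_level must not exceed max_level")
    position = on_hand + on_order
    if position <= min_level:
        return max_level - position
    return 0


# Drive belts with min 2 and max 6 (illustrative levels).
print(reorder_quantity(on_hand=2, on_order=0, min_level=2, max_level=6))  # 4
print(reorder_quantity(on_hand=3, on_order=0, min_level=2, max_level=6))  # 0
```

Counting stock already on order avoids placing a duplicate order while a delivery is in transit.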

Vendor SLAs

For specialised equipment where you do not hold stock, you need strong Service Level Agreements (SLAs) with vendors. Reliance on external support is a major vulnerability in emergency maintenance management.

  • Emergency Call-out: Define the maximum response time in the contract (e.g., 4 hours).
  • Part Availability: Ensure suppliers guarantee stock of critical components for your older machinery.
  • After-Hours Support: Verify that your vendor answers the phone at 2 AM on a Sunday.
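
A contractual response time is only useful if you check it after each call-out. The sketch below is one way to do that in Python, using the 4-hour example figure from the list above; the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Example contractual limit; replace with the value in your own SLA.
MAX_RESPONSE = timedelta(hours=4)


def response_breached(called_at: datetime, arrived_at: datetime,
                      max_response: timedelta = MAX_RESPONSE) -> bool:
    """Return True if the vendor took longer than the agreed response time."""
    return (arrived_at - called_at) > max_response


call = datetime(2026, 2, 1, 2, 0)      # Sunday, 02:00
arrival = datetime(2026, 2, 1, 6, 30)  # technician on site at 06:30
print(response_breached(call, arrival))  # True: 4.5 hours exceeds 4
```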

KPIs and Dashboards

You cannot improve what you do not measure. Even if you are operating reactively, you need data to ensure you aren't bleeding cash. Dashboards provide visibility into the health of your reactive strategy.

Key Performance Indicators (KPIs) to Track:

  • Cost Per Incident: The total cost of labour, parts, and downtime for a single repair. This helps justify moving an asset from reactive to preventive later if the costs become too high.
  • Mean Time to Repair (MTTR): How long does it take to get the asset back online once it fails? In a reactive environment, lowering MTTR is the primary goal.
  • Percentage of Emergency Work: Ideally, emergency work should stay below 10-20% of total maintenance hours. If this creeps higher, your reactive strategy is becoming chaotic and eating into your planned work.
  • Stockout Rate: How often do reactive repairs get stalled because a part isn't available? High stockout rates indicate a failure in your spares strategy.
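
As a rough illustration, the four KPIs can be computed from a list of work orders. The `WorkOrder` fields below are hypothetical stand-ins for whatever your CMMS exports, and the sample data is invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WorkOrder:
    # Hypothetical fields; map them to your own CMMS export.
    reactive: bool            # True for emergency/breakdown work
    labour_hours: float       # technician hours booked
    hours_to_restore: float   # failure to back online (reactive only)
    total_cost: float         # labour + parts + downtime (GBP)
    stalled_for_parts: bool   # repair waited on a missing spare


def reactive_kpis(orders: list[WorkOrder]) -> dict[str, float]:
    """Compute the four reactive-maintenance KPIs for a period."""
    reactive = [o for o in orders if o.reactive]
    total_hours = sum(o.labour_hours for o in orders)
    if not reactive or total_hours == 0:
        raise ValueError("need at least one reactive order and some hours")
    reactive_hours = sum(o.labour_hours for o in reactive)
    stalled = sum(1 for o in reactive if o.stalled_for_parts)
    return {
        "cost_per_incident": mean(o.total_cost for o in reactive),
        "mttr_hours": mean(o.hours_to_restore for o in reactive),
        "emergency_work_pct": 100 * reactive_hours / total_hours,
        "stockout_rate": stalled / len(reactive),
    }


orders = [
    WorkOrder(True, 4, 6, 3000, True),
    WorkOrder(True, 2, 2, 1000, False),
    WorkOrder(False, 34, 0, 500, False),  # planned work
]
kpis = reactive_kpis(orders)
print(kpis["emergency_work_pct"])  # 15.0
```

Here emergency work is 6 of 40 booked hours (15%), the MTTR is 4 hours, the average cost per incident is £2,000, and half of the reactive repairs were stalled by a missing part.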

Monitoring these metrics helps you decide when to shift gears. See [Art #1: Maintenance Metrics That Matter] for a deeper dive into setting up these dashboards.

Decision Rubric: When to Allow Run-to-Failure

How do you formally decide if a machine should be left to break? You need a decision rubric. This prevents emotional decision-making ("I think we should just leave it") and replaces it with logic based on data and risk.

The Decision Logic

Use this logic flow to categorise your assets:

  1. Safety Check: Does failure risk injury or environmental breach?
    • Yes: Must be Preventive/Predictive.
    • No: Continue.
  2. Criticality Check: Does failure stop production or service delivery?
    • Yes: Must be Preventive.
    • No: Continue.
  3. Cost Analysis: Is (PM Cost per year) > (expected Repair Cost + Downtime Cost per year)?
    • Yes: Run-to-Failure is approved.
    • No: Schedule Maintenance.
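
The logic flow translates directly into a short Python function. The `failures_per_year` parameter is an assumption added here so that both sides of the cost comparison cover the same period; with the default of 1.0 it reduces to the comparison exactly as written in step 3. The function name and return labels are illustrative.

```python
def maintenance_strategy(safety_or_env_risk: bool,
                         stops_production: bool,
                         pm_cost_per_year: float,
                         repair_cost: float,
                         downtime_cost: float,
                         failures_per_year: float = 1.0) -> str:
    """Apply the three-step rubric to one asset."""
    # Step 1: safety and environmental risk always rule out run-to-failure.
    if safety_or_env_risk:
        return "preventive/predictive"
    # Step 2: critical assets are maintained proactively.
    if stops_production:
        return "preventive"
    # Step 3: compare yearly prevention cost with expected yearly failure cost.
    expected_failure_cost = failures_per_year * (repair_cost + downtime_cost)
    if pm_cost_per_year > expected_failure_cost:
        return "run-to-failure"
    return "scheduled maintenance"


# Break room coffee machine: £300/year to service, £120 to replace, no downtime cost.
print(maintenance_strategy(False, False, 300, 120, 0))  # run-to-failure
```

Because the safety and criticality checks run first, a cheap asset can never be approved for run-to-failure on cost alone if its failure could injure someone or stop production.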

Implementation Controls & Governance

Adopting a reactive approach for certain assets is not an excuse for negligence. You need strict controls to ensure "reactive" doesn't become "reckless".

Standard Operating Procedures (SOPs)

Even for reactive repairs, you need SOPs. Technicians should not be guessing how to fix a broken boiler under pressure. Accessible, clear guides ensure that when things break, they are fixed correctly and safely. Digital SOPs accessible via mobile tablets are ideal for this.

Emergency Kits

For known reactive assets (like facility plumbing or lighting), keep "Crash Kits" ready. These are pre-packed boxes with the tools and parts needed for the most common failures. This eliminates the time spent searching for tools when the pressure is on.

Safety Checks

Emergency repairs carry a high risk of accidents because people are rushing. Implement a mandatory "Take 5" safety pause before any emergency work begins. Technicians must verify energy isolation (Lockout/Tagout) before touching the broken asset. Speed should never compromise safety.

Training Drills

Just as you have fire drills, you should have maintenance drills for critical failures. Simulate a major breakdown and time how long it takes your team to diagnose, locate parts, and "fix" the issue. This highlights bottlenecks in your workflow before a real emergency occurs.

Slash Emergency Repairs and Save Thousands Today

Every hour your machines sit idle costs money. With Makula, you can triage incidents instantly, track spares accurately, and empower your team to act fast. Stop reactive chaos before it hits your bottom line.


Frequently Asked Questions

What is reactive maintenance?
Reactive maintenance is a strategy where maintenance is performed only after equipment fails or malfunctions. It requires rapid response protocols to minimise downtime and is often applied to non-critical or low-cost assets.

When is reactive maintenance acceptable?
Reactive maintenance is acceptable for non-critical assets, low-cost items, equipment with redundancy, end-of-life assets, or machines with random failure patterns where preventive maintenance provides little ROI.

What are the hidden costs of reactive maintenance?
Hidden costs include production downtime, overtime labour, expedited parts shipping, collateral damage to other equipment, and safety risks from rushed repairs.

How should emergency incidents be triaged?
Use a step-by-step workflow: incident reporting, initial assessment, classification (P1 emergency, P2 urgent, P3 routine), resource allocation, execution, and root cause analysis for critical failures.

How should spares and vendors be managed?
Maintain "min-max" inventory for consumables and rotables. Use strong SLAs with vendors for specialised equipment, ensuring fast emergency response, parts availability, and after-hours support.

Which KPIs should be tracked?
Track Cost Per Incident, Mean Time to Repair (MTTR), Percentage of Emergency Work, and Stockout Rate to monitor performance, control costs, and decide when to shift assets to preventive maintenance.

What controls keep reactive maintenance safe?
Implement SOPs for repairs, maintain emergency kits, enforce mandatory safety checks, and conduct training drills to ensure reactive maintenance is safe, controlled, and effective.

Dr.-Ing. Simon Spelzhausen
Co-founder and Chief Product Officer

Dr.-Ing. Simon Spelzhausen is an engineering expert with a proven track record of driving business growth through innovative solutions, and has further refined his expertise through his experience at Volkswagen.