Barclays: Proactive SLA Breach Alerting for VaR Reporting

1. Context and Background

In an investment bank, Value at Risk (VaR) is a critical daily metric used by Risk Managers to assess market exposure. VaR is calculated as of the previous business day's close of business (T-1 COB) and must be delivered within strict timelines to support:

  • Regulatory compliance
  • Risk oversight
  • Trading and capital decisions

The VaR calculation depends on multiple upstream data feeds, each with defined SLAs. A delay or failure in any single feed can block the entire batch, leading to late or missed VaR reporting.

2. Problem Statement

Problem:

SLA breaches in upstream feeds were often detected only after the VaR batch was already delayed, leaving Run-the-Bank (RTB) support teams with little time to act.

Why this was critical:

  • VaR is time-sensitive and market-critical
  • Late detection caused panic-driven firefighting
  • Downstream systems were also impacted
  • No proactive visibility into which feed caused the delay

3. Existing Workflow (Before the Change)

  1. VaR batch runs for T-1 COB
  2. Multiple feeds load sequentially
  3. If a feed fails or breaches its SLA:
    • The issue is often discovered only at the end of the process
  4. RTB teams react late, under time pressure
  5. VaR delivery and downstream processes are delayed

This was a reactive system, heavily dependent on manual investigation.

4. User Personas

Primary Users

Run-the-Bank (RTB) Support Teams

  • Responsible for feed monitoring and issue resolution
  • Need early signals to act quickly

Secondary Users

Risk Managers & Downstream Consumers

  • Depend on timely VaR delivery
  • Impacted indirectly by upstream delays

5. Root Cause Analysis (PM Thinking)

The core issue was a lack of real-time SLA visibility.

Key gaps:

  • SLA performance was calculated but not surfaced proactively
  • No immediate alert when a breach occurred
  • No prioritisation of issues based on business criticality
  • No simple signal indicating system health

6. Opportunity Identification

This was an opportunity to shift from:

Reactive incident handling → Proactive risk prevention

By detecting SLA breaches as soon as they occurred, RTB teams could:

  • Act earlier
  • Resolve issues before VaR timelines were impacted
  • Reduce operational stress and escalation cycles

7. Proposed Solution

Feature: SLA Breach Monitoring & Alerting System

A traffic-light–based alerting mechanism was introduced to monitor feed completion against SLAs and notify RTB teams immediately upon breach.

Key Capabilities

1. SLA Tracking

  • SLA defined per feed
  • Daily comparison of actual completion time vs SLA threshold

2. Traffic Light Status

  • 🟢 Green: Feed completed within SLA
  • 🟡 Amber: Approaching SLA threshold
  • 🔴 Red: SLA breached
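
The traffic-light rules above can be sketched as a small classifier. This is only an illustration, not the production logic: the function name `classify_feed`, the 15-minute amber window, and the choice to show a still-running feed with plenty of time left as green are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


def classify_feed(
    sla_deadline: datetime,
    now: datetime,
    completed_at: Optional[datetime] = None,
    amber_window: timedelta = timedelta(minutes=15),  # assumed warning window
) -> Status:
    """Return the traffic-light status of one feed at time `now`."""
    if completed_at is not None:
        # Feed has finished: compare completion time with the SLA deadline.
        return Status.GREEN if completed_at <= sla_deadline else Status.RED
    if now > sla_deadline:
        # Feed is still running and the deadline has passed.
        return Status.RED
    if sla_deadline - now <= amber_window:
        # Feed is still running and the deadline is close.
        return Status.AMBER
    # Feed is still running with time to spare (assumed to display as green).
    return Status.GREEN
```

In practice the amber window would be tuned per feed, which is the lever for keeping warnings early enough to be useful without becoming noise.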

3. Real-Time Notifications

  • Automated alerts sent to RTB teams via:
    • Email
    • Internal messaging channels (e.g., Teams)

4. Actionability

  • Alert clearly identified:
    • Affected feed
    • SLA breached
    • Potential impact on VaR timelines
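
As a rough sketch of what an actionable alert could carry, the example below builds a message from the three items listed above. The class `BreachAlert`, its field names, and the idea of expressing VaR impact as minutes remaining before the VaR deadline are hypothetical choices for illustration; the real alert format is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BreachAlert:
    feed_name: str          # affected feed
    sla_deadline: datetime  # the SLA that was breached
    detected_at: datetime   # when the breach was detected
    var_deadline: datetime  # assumed: when the VaR report itself is due

    def minutes_overdue(self) -> int:
        """Whole minutes between the SLA deadline and detection."""
        return int((self.detected_at - self.sla_deadline).total_seconds() // 60)

    def minutes_to_var_deadline(self) -> int:
        """Whole minutes left before the VaR deadline at detection time."""
        return int((self.var_deadline - self.detected_at).total_seconds() // 60)

    def render(self) -> str:
        """Plain-text body suitable for email or a chat message."""
        return (
            f"[RED] SLA breach: {self.feed_name}\n"
            f"SLA deadline: {self.sla_deadline:%H:%M}, "
            f"overdue by {self.minutes_overdue()} min\n"
            f"Time left before VaR deadline: {self.minutes_to_var_deadline()} min"
        )
```

Putting the remaining time to the VaR deadline in the message is one way to let the recipient judge urgency without opening another system.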

8. Why This Was the Right Solution

  • Minimal disruption to existing systems
  • No change to VaR calculation logic
  • Focused on visibility and early intervention
  • Aligned with bank-wide operational risk principles

This was a low-risk, high-impact internal product enhancement.

9. Success Metrics

North Star Metric

On-Time VaR Delivery Rate
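
One plausible way to compute this metric is the share of reporting days on which VaR was delivered at or before its deadline. The function name and the input shape (pairs of delivery time and deadline) are assumptions for the sketch.

```python
from datetime import datetime


def on_time_delivery_rate(deliveries: list[tuple[datetime, datetime]]) -> float:
    """Fraction of deliveries made at or before the deadline.

    Each item is a (delivered_at, deadline) pair for one reporting day.
    """
    if not deliveries:
        raise ValueError("no deliveries to measure")
    on_time = sum(
        1 for delivered_at, deadline in deliveries if delivered_at <= deadline
    )
    return on_time / len(deliveries)
```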

Input Metrics

  • Number of SLA breaches detected proactively
  • Average time between SLA breach and RTB action
  • Reduction in manual investigation time
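
The second input metric, average time between SLA breach and RTB action, could be derived from timestamped events along these lines. The event shape (breach detection time paired with the first recorded RTB action) is an assumption; how those timestamps are captured is not covered here.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_action(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Average gap between breach detection and the first RTB action.

    Each item is a (breach_detected_at, first_action_at) pair.
    """
    if not events:
        raise ValueError("no breach events to measure")
    gaps_in_seconds = [
        (action_at - breach_at).total_seconds() for breach_at, action_at in events
    ]
    return timedelta(seconds=mean(gaps_in_seconds))
```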

Outcome Metrics

  • Reduction in VaR delays caused by upstream feeds
  • Reduction in downstream processing delays
  • Improved operational stability during EOD processing

Guardrail Metrics

  • False-positive alert rate
  • Alert fatigue for RTB teams

10. Impact & Results (Qualitative, Safe to Share)

  • RTB teams received early visibility into feed issues
  • Panic-driven escalations were significantly reduced
  • Issues were resolved earlier in the processing window
  • Downstream systems experienced fewer cascading delays
  • Overall confidence in VaR delivery improved

11. Risks & Trade-offs

  • Too many alerts → mitigated with threshold tuning
  • Alert fatigue → mitigated via traffic-light prioritisation
  • Dependency on SLA accuracy → SLAs reviewed and standardised

12. Final Impact Statement (Portfolio-Ready)

By introducing proactive SLA breach alerting with a traffic-light system, the VaR reporting pipeline shifted from reactive firefighting to early risk mitigation. This significantly improved operational resilience, reduced last-minute escalations, and helped ensure timely delivery of one of the bank’s most critical risk metrics.
