Runbook: Performance Degradation

Last Updated: 2026-02-21
Severity: Medium
Estimated TTR: 1 hour
Owner: Development Team

Symptoms

  • Rota generation takes longer than expected
  • API response times increased
  • Users report slow page loads
  • High CPU or memory usage on server

Detection

  • Alert: SlowCalculationPerformance (p95 > 5s)
  • Dashboard: Application Performance Dashboard
  • Query: Check APM tools (Datadog/NewRelic) for increased latency
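
A quick way to sanity-check the p95 figure outside the APM tooling is to compute it directly from a sample of request latencies. This is a minimal nearest-rank sketch; the sample values are made up, and the 5s threshold mirrors the SlowCalculationPerformance alert:

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (seconds)."""
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct% of samples at or below it
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

# Illustrative latencies pulled from request logs (seconds)
latencies = [0.8, 1.2, 0.9, 6.1, 1.0, 1.1, 0.7, 5.4, 1.3, 0.9]
p95 = percentile(latencies, 95)
if p95 > 5.0:  # same threshold as the alert
    print(f"p95={p95:.1f}s exceeds the 5s threshold")
```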

Diagnosis Steps

  1. Check current response times:

    # Check recent request logs
    docker compose logs web -f --tail=100 | grep "calculation"
    
    # Use Django Debug Toolbar or APM to identify slow queries
    docker compose exec web python manage.py shell
    >>> from django.db import connection
    >>> from django.db import reset_queries
    >>> from calculations.date_utils import get_bank_holidays_in_year
    >>> reset_queries()
    >>> get_bank_holidays_in_year(2024)
    >>> len(connection.queries)
    
    Expected: < 20 queries per calculation, p95 < 5s
    If different: May have N+1 query issues or missing indexes

  2. Check server resource usage:

    # CPU and memory
    docker stats --no-stream
    
    # Database connections
    docker compose exec postgres psql -U rota rota_cc -c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"
    
    Expected: CPU < 70%, Memory < 80%, < 100 DB connections
    If different: Resource exhaustion may be causing slowdowns

  3. Run performance benchmarks:

    # Run benchmarks to compare against baseline
    docker compose exec web pytest calculations/tests/benchmarks/ --benchmark-only --benchmark-compare=<baseline_file>
    
    Expected: Within 10% of baseline
    If different: Performance regression detected
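
pytest-benchmark can enforce this tolerance automatically via `--benchmark-compare-fail=mean:10%`; the check it performs amounts to the following (sketch, with illustrative timings):

```python
def is_regression(baseline_mean, current_mean, tolerance=0.10):
    """True if the current mean is more than `tolerance` slower than the baseline."""
    return current_mean > baseline_mean * (1 + tolerance)

# Illustrative numbers: a 2.0s baseline allows up to 2.2s before failing
assert is_regression(2.0, 2.5)       # 25% slower -> regression
assert not is_regression(2.0, 2.1)   # 5% slower -> within tolerance
```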

  4. Check for long-running queries:

    # PostgreSQL slow query log
    docker compose logs postgres -f --tail=100
    
    # Or check pg_stat_statements (note: on PostgreSQL 13+ the columns are
    # total_exec_time and mean_exec_time instead of total_time and mean_time)
    docker compose exec postgres psql -U rota rota_cc -c "SELECT query, calls, total_time, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;"
    
    Expected: No queries taking > 1s
    If different: May need query optimization or indexing

Root Causes

Cause | Likelihood | How to Confirm
--- | --- | ---
Large date range calculations | High | Check if users are requesting 3+ year ranges
High clinician count | High | Check if calculations cover 100+ clinicians
Missing database indexes | Medium | Check query execution plans
N+1 query problems | High | Review code for loops that issue per-row queries
Cache not working | Medium | Check cache hit rates

Resolution Steps

For Large Date Range Calculations

  1. Check current cache configuration:

    docker compose exec redis redis-cli INFO stats | grep keyspace
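
    `keyspace_hits` and `keyspace_misses` in the INFO stats section give the cache hit rate directly. A small helper to compute it from the raw output (a sketch; the sample string stands in for real redis-cli output):

```python
def cache_hit_rate(info_text):
    """Compute hit rate from `redis-cli INFO stats` output: hits / (hits + misses)."""
    stats = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            stats[key] = value.strip()
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0

# Illustrative INFO output fragment
sample = "keyspace_hits:9500\r\nkeyspace_misses:500\r\n"
rate = cache_hit_rate(sample)  # 0.95 -> cache is working well
```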
    

  2. Implement or optimize result caching:

    # Ensure calculations use caching
    from django.core.cache import cache

    def calculate_with_cache(clinician_id, start_date, end_date):
        cache_key = f"calc_{clinician_id}_{start_date}_{end_date}"
        result = cache.get(cache_key)
        if result is None:  # "if not result" would recompute falsy-but-valid results
            result = perform_calculation(clinician_id, start_date, end_date)
            cache.set(cache_key, result, timeout=3600)
        return result
    

  3. Verify: Response time improved to acceptable levels

For High Clinician Count

  1. Implement batch processing:

    # Process clinicians in fixed-size batches
    def batch_process_clinicians(clinicians, batch_size=25):
        clinicians = list(clinicians)
        for i in range(0, len(clinicians), batch_size):
            batch = clinicians[i:i + batch_size]
            process_batch(batch)
    

  2. Use Celery for async processing:

    # Offload large calculations to background tasks
    from tasks.celery_app import app
    
    @app.task
    def calculate_clinicians_async(clinician_ids, start_date, end_date):
        for clinician_id in clinician_ids:
            calculate_for_clinician(clinician_id, start_date, end_date)
    

  3. Verify: Large calculations complete within acceptable time

For Missing Database Indexes

  1. Identify slow queries and add indexes:

    # Create migration
    docker compose exec web python manage.py makemigrations --empty your_app
    
    # In the migration file (CREATE INDEX CONCURRENTLY cannot run inside a
    # transaction, so the migration must be marked non-atomic):
    from django.db import migrations

    class Migration(migrations.Migration):
        atomic = False
        dependencies = [('your_app', 'previous_migration')]

        operations = [
            migrations.RunSQL(
                "CREATE INDEX CONCURRENTLY idx_clinician_shift_dates ON config_shift(clinician_id, date);",
                reverse_sql="DROP INDEX CONCURRENTLY IF EXISTS idx_clinician_shift_dates;",
            ),
        ]
    

  2. Verify: Query times improved

For N+1 Query Problems

  1. Identify problematic code:

    # Look for patterns like:
    for clinician in clinicians:
        shifts = clinician.shift_set.all()  # N+1!
    

  2. Fix with select_related/prefetch_related:

    # Use prefetch_related for reverse FK
    clinicians = Clinician.objects.prefetch_related('shift_set').all()
    
    # Or use bulk queries
    shifts = Shift.objects.filter(clinician__in=clinicians)
    

  3. Verify: Reduced query count

Verification

After applying fix, verify:

  - [ ] p95 response time < 5s
  - [ ] No performance regression in benchmarks
  - [ ] CPU and memory usage normal
  - [ ] User complaints stopped

Prevention

  • Set up continuous benchmarking in CI/CD
  • Implement query review in code review process
  • Use Django Debug Toolbar in development
  • Regular performance testing before releases
  • Monitor and alert on query performance metrics

Escalation

  • If unresolved after 2 hours, escalate to: Tech Lead
  • On-call contact: See on-call roster
  • Related runbooks: calculation_failures.md, celery_queue_issues.md
  • Related alerts: SlowCalculationPerformance, HighDatabaseQueryDuration