# Runbook: Performance Degradation
Last Updated: 2026-02-21 | Severity: Medium | Estimated TTR: 1 hour | Owner: Development Team
## Symptoms
- Rota generation takes longer than expected
- API response times increased
- Users report slow page loads
- High CPU or memory usage on server
## Detection
- Alert: `SlowCalculationPerformance` (p95 > 5s)
- Dashboard: Application Performance Dashboard
- Query: check APM tools (Datadog/New Relic) for increased latency
## Diagnosis Steps
1. Check current response times:

   ```bash
   # Check recent request logs
   docker compose logs web -f --tail=100 | grep "calculation"
   ```

   ```python
   # Use Django Debug Toolbar or APM to identify slow queries
   # (run inside: docker compose exec web python manage.py shell)
   >>> from django.db import connection, reset_queries
   >>> from calculations.date_utils import get_bank_holidays_in_year
   >>> reset_queries()
   >>> get_bank_holidays_in_year(2024)
   >>> len(connection.queries)
   ```

   Expected: < 20 queries per calculation, p95 < 5s
   If different: may have N+1 query issues or missing indexes
2. Check server resource usage:

   Expected: CPU < 70%, memory < 80%, < 100 DB connections
   If different: resource exhaustion may be causing slowdowns
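If no monitoring agent is handy, a rough host-side snapshot can be taken from a Python shell on the server. This is a minimal sketch, assuming a Unix host (`os.getloadavg` is Unix-only); the 0.7 threshold mirrors the CPU expectation above, and memory/DB-connection checks are left to your monitoring stack:

```python
import os
import shutil


def resource_snapshot(load_threshold_per_core=0.7):
    """Rough resource check: recent load average vs. core count, plus
    free disk space. Memory and DB connections are better read from
    monitoring tools, so they are deliberately omitted here."""
    cores = os.cpu_count() or 1
    load1, load5, load15 = os.getloadavg()  # Unix-only
    disk = shutil.disk_usage("/")
    return {
        "cpu_busy": load5 / cores > load_threshold_per_core,
        "load": (load1, load5, load15),
        "disk_free_pct": 100 * disk.free / disk.total,
    }


print(resource_snapshot())
```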
3. Run performance benchmarks:

   Expected: within 10% of baseline
   If different: performance regression detected
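If no benchmark harness is wired up yet, a quick `timeit` comparison against a recorded baseline can stand in. The workload and baseline figure below are placeholders, not the project's real benchmark:

```python
import timeit

# Placeholder workload standing in for a real rota calculation.
def calculation_under_test():
    return sum(i * i for i in range(10_000))

BASELINE_SECONDS = 0.01  # recorded on a known-good build (placeholder value)


def regression_check(tolerance=0.10, repeats=5):
    """Return (per_call_seconds, regressed). `regressed` is True when the
    best of `repeats` timed runs is more than `tolerance` (10%) slower
    than the recorded baseline, matching the expectation above."""
    per_call = min(timeit.repeat(calculation_under_test, number=10, repeat=repeats)) / 10
    return per_call, per_call > BASELINE_SECONDS * (1 + tolerance)


elapsed, regressed = regression_check()
print(f"per-call: {elapsed:.4f}s regressed: {regressed}")
```

Taking the minimum of several repeats reduces noise from other processes on the host.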
4. Check for long-running queries:

   Expected: no queries taking > 1s
   If different: may need query optimization or indexing
## Root Causes
| Cause | Likelihood | How to Confirm |
|---|---|---|
| Large date range calculations | High | Check if users requesting 3+ year ranges |
| High clinician count | High | Check if calculations for 100+ clinicians |
| Missing database indexes | Medium | Check query execution plans |
| N+1 query problems | High | Review code for loops with queries |
| Cache not working | Medium | Check cache hit rates |
## Resolution Steps
### For Large Date Range Calculations
1. Check current cache configuration:

2. Implement or optimize result caching:

   ```python
   # Ensure calculations use caching
   from django.core.cache import cache


   def calculate_with_cache(clinician_id, start_date, end_date):
       cache_key = f"calc_{clinician_id}_{start_date}_{end_date}"
       result = cache.get(cache_key)
       if result is None:  # `is None`, so falsy results are still cached
           result = perform_calculation(clinician_id, start_date, end_date)
           cache.set(cache_key, result, timeout=3600)
       return result
   ```
3. Verify: response time improved to acceptable levels
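The caching pattern above can be exercised outside Django by swapping in a dict-backed stand-in for `django.core.cache`; names here are illustrative, and the point is that the expensive path runs only once per key:

```python
class DictCache:
    """Minimal stand-in for django.core.cache (ignores timeout)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value, timeout=None):
        self._store[key] = value


cache = DictCache()
calls = []  # records how often the expensive path actually runs


def perform_calculation(clinician_id, start_date, end_date):
    calls.append(clinician_id)
    return f"rota for {clinician_id}"  # placeholder result


def calculate_with_cache(clinician_id, start_date, end_date):
    cache_key = f"calc_{clinician_id}_{start_date}_{end_date}"
    result = cache.get(cache_key)
    if result is None:
        result = perform_calculation(clinician_id, start_date, end_date)
        cache.set(cache_key, result, timeout=3600)
    return result


calculate_with_cache(1, "2024-01-01", "2024-12-31")
calculate_with_cache(1, "2024-01-01", "2024-12-31")
print(len(calls))  # → 1: second call was served from cache
```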
### For High Clinician Count
1. Implement batch processing:

2. Use Celery for async processing:

3. Verify: large calculations complete within acceptable time
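The batch-processing step can be sketched as follows. This assumes a flat list of clinician IDs; in production each batch would typically be dispatched as a Celery task (e.g. `calculate_batch.delay(batch)`) rather than processed inline, and the per-clinician calculation here is a placeholder:

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items. (itertools.batched needs
    Python 3.12+, so this helper works on older interpreters too.)"""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def calculate_in_batches(clinician_ids, batch_size=25):
    results = {}
    for batch in batched(clinician_ids, batch_size):
        # In the real system each batch could be handed to Celery:
        #   calculate_batch.delay(batch)
        for cid in batch:
            results[cid] = f"rota for {cid}"  # placeholder calculation
    return results


results = calculate_in_batches(range(120), batch_size=25)
print(len(results))  # → 120 clinicians processed, in 5 batches of ≤ 25
```

Batching bounds memory per unit of work and, combined with Celery, lets batches run in parallel across workers instead of blocking one web request.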
### For Missing Database Indexes
1. Identify slow queries and add indexes:

   ```bash
   # Create migration
   docker compose exec web python manage.py makemigrations --empty your_app
   ```

   ```python
   # In the migration file:
   from django.db import migrations


   class Migration(migrations.Migration):
       # CREATE INDEX CONCURRENTLY cannot run inside a transaction,
       # so the migration must opt out of Django's default atomicity.
       atomic = False

       dependencies = [('your_app', 'previous_migration')]

       operations = [
           migrations.RunSQL(
               "CREATE INDEX CONCURRENTLY idx_clinician_shift_dates "
               "ON config_shift(clinician_id, date);",
               reverse_sql="DROP INDEX CONCURRENTLY idx_clinician_shift_dates;",
           ),
       ]
   ```
2. Verify: query times improved
### For N+1 Query Problems
1. Identify problematic code:

2. Fix with `select_related`/`prefetch_related`:

3. Verify: reduced query count
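The N+1 pattern, and the single-query fix, can be illustrated with an in-memory `sqlite3` database standing in for the ORM (the schema and names are illustrative, echoing the `config_shift` table above). In Django, the equivalent fix is `Shift.objects.select_related("clinician")`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clinician (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shift (id INTEGER PRIMARY KEY, clinician_id INTEGER, date TEXT);
    INSERT INTO clinician VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO shift VALUES (1, 1, '2024-06-01'), (2, 2, '2024-06-01'),
                             (3, 1, '2024-06-02');
""")

query_count = 0

def run(sql, params=()):
    """Execute a query and count it, like connection.queries does."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# N+1: one query for the shifts, then one more per shift for its clinician.
query_count = 0
shifts = run("SELECT id, clinician_id, date FROM shift")
for _, clinician_id, _ in shifts:
    run("SELECT name FROM clinician WHERE id = ?", (clinician_id,))
n_plus_one = query_count  # 1 + 3 shifts = 4 queries

# Fix: join once — roughly what select_related does under the hood.
query_count = 0
run("""SELECT shift.id, shift.date, clinician.name
       FROM shift JOIN clinician ON clinician.id = shift.clinician_id""")
fixed = query_count  # 1 query

print(n_plus_one, fixed)  # → 4 1
```

The query count grows linearly with rows in the naive loop but stays constant with the join, which is why step 1's `len(connection.queries)` check is the fastest way to spot this class of bug.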
## Verification
After applying the fix, verify:

- [ ] p95 response time < 5s
- [ ] No performance regression in benchmarks
- [ ] CPU and memory usage normal
- [ ] User complaints stopped
## Prevention
- Set up continuous benchmarking in CI/CD
- Implement query review in code review process
- Use Django Debug Toolbar in development
- Regular performance testing before releases
- Monitor and alert on query performance metrics
## Escalation
- If unresolved after 2 hours, escalate to: Tech Lead
- On-call contact: See on-call roster
## Related Issues
- Related runbooks: `calculation_failures.md`, `celery_queue_issues.md`
- Related alerts: `SlowCalculationPerformance`, `HighDatabaseQueryDuration`