
Runbook: Data Inconsistency

Last Updated: 2026-02-21
Severity: High
Estimated TTR: 2 hours
Owner: Development Team

Symptoms

  • Calculated values don't match expected results
  • Shift counts don't add up correctly
  • Deficiency values are incorrect
  • Leave balances are wrong

Detection

  • Alert: Manual report or user complaint
  • Dashboard: Data Quality Dashboard (if configured)
  • Query: Run data validation scripts
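
The "validation scripts" referenced above are project-specific. As a minimal sketch of what such a check can do, the snippet below recomputes a shift count from raw records and compares it to a stored aggregate; the record shape and status names are illustrative, not the project's actual schema:

```python
from datetime import date

def validate_shift_count(shifts, stored_count, start, end):
    """Recompute the worked-shift count for a date window and compare
    it to the stored aggregate; returns (consistent, recomputed)."""
    recomputed = sum(
        1 for s in shifts
        if start <= s["date"] <= end and s["status"] == "WORKED"
    )
    return recomputed == stored_count, recomputed

# Illustrative records; a real script would pull these from the Shift table.
shifts = [
    {"date": date(2024, 1, 5), "status": "WORKED"},
    {"date": date(2024, 1, 6), "status": "CANCELLED"},
    {"date": date(2024, 2, 1), "status": "WORKED"},  # outside the window
]
consistent, recomputed = validate_shift_count(
    shifts, stored_count=1, start=date(2024, 1, 1), end=date(2024, 1, 31)
)
```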

Diagnosis Steps

  1. Identify specific inconsistent data:

    # Check shift counts for specific clinician
    docker compose exec web python manage.py shell
    >>> from datetime import date
    >>> from config.models import Clinician, Shift
    >>> from calculations.shift_counter import ShiftCounter
    >>> clinician = Clinician.objects.get(id=<id>)
    >>> counter = ShiftCounter()
    >>> counter.count_worked_shifts(clinician, date(2024, 1, 1), date(2024, 1, 31))
    

  2. Verify source data:

    # Check shift records in database
    docker compose exec web python manage.py shell
    >>> from config.models import Shift
    >>> shifts = Shift.objects.filter(clinician_id=<id>, date__gte='2024-01-01', date__lte='2024-01-31')
    >>> for s in shifts:
    ...     print(s.date, s.type, s.duration, s.status)
    
    Expected: Shifts match the expected schedule
    If different: Source data may be incorrect

  3. Check for race conditions:

    # Check audit logs for concurrent modifications
    docker compose exec web python manage.py shell
    >>> from config.models import AuditLog
    >>> logs = AuditLog.objects.filter(
    ...     entity_type='Shift',
    ...     action__in=['CREATE', 'UPDATE']
    ... ).order_by('-timestamp')[:50]
    
    Expected: Sequential modifications
    If different: May indicate race conditions

  4. Verify calculation logic:

    # Run calculation with debug logging
    docker compose exec web python manage.py shell
    >>> import logging
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> # Re-run calculation
    
    Expected: Calculation follows the expected logic
    If different: A logic bug may exist

Root Causes

Cause                                  Likelihood  How to Confirm
Incomplete migration                   High        Check that all migrations are applied
Race condition in concurrent updates   Medium      Check audit logs for timing
Cache invalidation issue               Low         Clear the cache and recheck
Business rule misunderstanding         High        Compare expected vs. actual calculation logic
Database constraint bypass             Low         Check data integrity constraints

Resolution Steps

For Incomplete Migration

  1. Check migration status:

    docker compose exec web python manage.py showmigrations
    

  2. Apply any missing migrations:

    # Create backup first
    docker compose exec web python manage.py dumpdata > backup_before_migration.json
    
    # Apply migrations
    docker compose exec web python manage.py migrate
    
    # Verify data integrity
    docker compose exec web python manage.py check
    

  3. Verify: Data consistency restored

For Race Conditions

  1. Identify affected records:

    # Find audit entries that share an exact timestamp (possible concurrent writes)
    docker compose exec web python manage.py shell
    >>> from config.models import AuditLog
    >>> from django.db.models import Count
    >>> duplicates = AuditLog.objects.values('entity_id', 'timestamp')\
    ...     .annotate(count=Count('id'))\
    ...     .filter(count__gt=1)
    

  2. Implement locking for critical operations:

    # Use select_for_update() to prevent race conditions
    from django.db import transaction
    
    with transaction.atomic():
        clinician = Clinician.objects.select_for_update().get(id=<id>)
        # Perform calculation and update
        clinician.save()
    

  3. Verify: No new race conditions detected
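
The duplicate-timestamp heuristic from step 1 can also be expressed without Django, for quick offline analysis of exported audit entries; the field names here are illustrative:

```python
from collections import Counter
from datetime import datetime

def find_concurrent_writes(entries):
    """Flag (entity_id, timestamp) pairs that occur more than once --
    a heuristic for two writers racing on the same record."""
    counts = Counter((e["entity_id"], e["timestamp"]) for e in entries)
    return [key for key, n in counts.items() if n > 1]

entries = [
    {"entity_id": 7, "timestamp": datetime(2024, 1, 5, 9, 0, 0)},
    {"entity_id": 7, "timestamp": datetime(2024, 1, 5, 9, 0, 0)},  # same instant
    {"entity_id": 8, "timestamp": datetime(2024, 1, 5, 9, 1, 0)},
]
suspects = find_concurrent_writes(entries)
```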

For Cache Issues

  1. Clear all caches:

    # WARNING: FLUSHALL deletes every key in all Redis databases
    docker compose exec redis redis-cli FLUSHALL
    

  2. Restart services to clear in-memory cache:

    docker compose restart web
    docker compose restart celery_worker
    

  3. Recalculate affected data:

    # Trigger recalculation for affected period
    docker compose exec web python manage.py shell
    >>> from calculations.tasks import recalculate_clinician
    >>> recalculate_clinician(clinician_id=<id>, start_date=..., end_date=...)
    

  4. Verify: Data now consistent
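
To confirm the cache really was the culprit, compare the cached value against a fresh recomputation. A sketch, with a plain dict standing in for the real cache backend and illustrative key names:

```python
def check_cache_consistency(cache_get, recompute, key):
    """Return (consistent, cached, fresh): a mismatch points at a stale
    cache rather than bad source data."""
    cached = cache_get(key)
    fresh = recompute(key)
    return cached == fresh, cached, fresh

# A dict stands in for the real cache backend in this sketch.
cache = {"deficiency:42": 8}
consistent, cached, fresh = check_cache_consistency(
    cache.get, lambda key: 10, "deficiency:42"
)
```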

For Business Rule Issues

  1. Document expected behavior:

    # Create a test case encoding the expected behavior
    def test_specific_business_rule():
        clinician = create_test_clinician()
        result = calculate_deficiency(clinician, start_date, end_date, as_of_date)
        assert result == expected_value
    

  2. Update calculation logic if needed:

    # Fix calculation to match business rules
    def calculate_deficiency(clinician, start_date, end_date, as_of_date):
        # Updated logic here
        pass
    

  3. Verify: All tests pass
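
As an illustration only (the real rule lives with the business), a deficiency calculation often reduces to a non-negative shortfall against the expected hours for the period:

```python
def calculate_deficiency_hours(expected_hours, worked_hours):
    """Illustrative rule: deficiency is the non-negative shortfall of
    worked hours against the expectation for the period."""
    return max(expected_hours - worked_hours, 0)

# e.g. a clinician expected to work 160 hours who logged 152
deficiency = calculate_deficiency_hours(160, 152)
```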

Verification

After applying the fix, verify:

  - [ ] Inconsistent data corrected
  - [ ] Validation scripts pass
  - [ ] No new inconsistencies reported
  - [ ] Audit logs show expected behavior

Prevention

  • Add database constraints for critical data
  • Implement proper transaction isolation
  • Add validation checks before data commits
  • Use locking for critical operations
  • Regular data integrity audits
  • Comprehensive test coverage for business rules
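
"Validation checks before data commits" can be as simple as a guard that rejects obviously inconsistent records before they are saved; the field names below are illustrative:

```python
def validate_before_commit(record):
    """Return a list of validation errors; an empty list means the
    record is safe to persist."""
    errors = []
    if record.get("leave_balance", 0) < 0:
        errors.append("leave_balance must be non-negative")
    if record.get("worked_shifts", 0) > record.get("scheduled_shifts", 0):
        errors.append("worked_shifts cannot exceed scheduled_shifts")
    return errors

good = validate_before_commit(
    {"leave_balance": 5, "worked_shifts": 3, "scheduled_shifts": 4}
)
bad = validate_before_commit(
    {"leave_balance": -1, "worked_shifts": 5, "scheduled_shifts": 4}
)
```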

Escalation

  • If unresolved after 4 hours, escalate to: Tech Lead
  • If data loss possible, escalate immediately: Senior Developer
  • On-call contact: See on-call roster
  • Related runbooks: calculation_failures.md, performance_degradation.md
  • Related migrations: Check migration log for recent schema changes