
Backup and Restore

This guide covers backup and restore procedures for the RotaCC system. It is intended for anyone responsible for keeping the system running -- whether that is a developer, a sysadmin, or an on-call responder.

Overview

The system provides two independent backup mechanisms:

|                      | JSON (dumpdata)                              | pg_dump                               |
|----------------------|----------------------------------------------|---------------------------------------|
| What it captures     | Application data from specific Django models | Complete PostgreSQL database          |
| Format               | Gzip-compressed JSON (.json.gz)              | Gzip-compressed SQL (.sql.gz)         |
| Selective restore    | Yes -- pick individual models                | No -- restores the entire database    |
| Requires PostgreSQL  | No                                           | Yes                                   |
| Recommended for      | Quick snapshots, migrating specific data     | Full disaster recovery                |

Use pg_dump for your primary backups. It is the safer, more complete option. JSON backups are useful when you need to move individual models between environments or inspect backup contents by hand.

Backups are stored under the backups/ directory at the project root. Every backup is tracked in the database via a BackupMetadata record (see Backup Metadata below).

Automatic Backups

Two Celery beat tasks run automatically:

| Task                     | Schedule               | What it does                               |
|--------------------------|------------------------|--------------------------------------------|
| daily_pg_dump_backup     | Every day at 03:00 UTC | Creates a pg_dump backup                   |
| cleanup_old_backups_task | Every day at 04:00 UTC | Deletes old backups using tiered retention |

These are configured in rota/settings/base.py under CELERY_BEAT_SCHEDULE.
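A beat schedule entry for the daily backup task looks roughly like the sketch below. This is illustrative only -- the actual entries live in rota/settings/base.py and may name tasks or times differently.

```python
# Hypothetical CELERY_BEAT_SCHEDULE fragment for the daily backup task.
# The task path matches the one referenced in Troubleshooting below;
# everything else is an assumption about the real settings file.
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "daily-pg-dump-backup": {
        "task": "tasks.backup_tasks.daily_pg_dump_backup",
        "schedule": crontab(hour=3, minute=0),  # 03:00 UTC daily
    },
}
```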

Backup failure alerting

If the daily pg_dump fails, the task:

  1. Retries up to 3 times with exponential backoff (~60 s, ~120 s, ~240 s).
  2. Sends an alert email to all admins who have opted in to system failure notifications.
  3. Logs the failure for investigation.

If you receive a backup failure email, check the Celery worker logs and the backups/ directory for disk space issues.
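The retry schedule above is a standard doubling backoff. A minimal sketch of the delay calculation, assuming a 60-second base (the real task's backoff parameters may differ):

```python
# Illustrative backoff schedule: ~60 s before the first retry,
# doubling on each of the 3 retries described above.
BASE_DELAY_S = 60
MAX_RETRIES = 3

def retry_delays(base=BASE_DELAY_S, retries=MAX_RETRIES):
    """Return the delay (in seconds) before each retry attempt."""
    return [base * 2 ** attempt for attempt in range(retries)]

print(retry_delays())  # [60, 120, 240]
```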

Automatic cleanup and retention policy

The cleanup task applies different rules depending on backup type:

pg_dump backups (tiered retention):

| Tier                 | Retention                                  |
|----------------------|--------------------------------------------|
| Daily                | Keep all backups within the last 30 days   |
| Weekly               | Keep one backup per ISO week for 12 weeks  |
| Monthly              | Keep one backup per month for 12 months    |
| Older than 12 months | Delete                                     |

JSON backups (simple retention):

  • Keep everything within the last 6 months; delete the rest.
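The tier boundaries can be sketched as a simple age classification. This is an illustration of the policy described above, not the cleanup task's actual code (which keys weekly/monthly tiers to ISO weeks and calendar months rather than plain day counts):

```python
from datetime import datetime, timedelta

def tier_for(backup_time: datetime, now: datetime) -> str:
    """Classify a pg_dump backup by age into the retention tiers above.
    Illustrative approximation: 12 months is treated as 365 days."""
    age = now - backup_time
    if age <= timedelta(days=30):
        return "daily"      # keep everything in this window
    if age <= timedelta(weeks=12):
        return "weekly"     # keep one per ISO week
    if age <= timedelta(days=365):
        return "monthly"    # keep one per month
    return "expired"        # eligible for deletion

now = datetime(2026, 5, 7)
print(tier_for(datetime(2026, 5, 1), now))   # daily
print(tier_for(datetime(2026, 3, 1), now))   # weekly
print(tier_for(datetime(2025, 9, 1), now))   # monthly
print(tier_for(datetime(2024, 1, 1), now))   # expired
```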

Manual Backups

Run the management command from the project root:

uv run python manage.py pg_dump_backup

Expected output:

Creating pg_dump backup...
pg_dump backup created successfully
  File: backup_2026-05-07_030000.sql.gz
  Path: /path/to/project/backups/backup_2026-05-07_030000.sql.gz
  Size: 1.24 MB

Add a description to help identify the backup later:

uv run python manage.py pg_dump_backup --description "Before clinician import"

Prerequisites:

  • The database engine must be PostgreSQL.
  • The pg_dump command must be available on the system (postgresql-client package).
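The backup filename embeds the creation timestamp. A small sketch of generating and parsing that pattern, assuming the format shown in the example output above:

```python
from datetime import datetime

# Assumed from the example output: backup_2026-05-07_030000.sql.gz
FILENAME_FORMAT = "backup_%Y-%m-%d_%H%M%S.sql.gz"

def backup_filename(ts: datetime) -> str:
    """Build a pg_dump backup filename for a given timestamp."""
    return ts.strftime(FILENAME_FORMAT)

def backup_timestamp(name: str) -> datetime:
    """Recover the creation timestamp from a backup filename."""
    return datetime.strptime(name, FILENAME_FORMAT)

print(backup_filename(datetime(2026, 5, 7, 3, 0, 0)))
# backup_2026-05-07_030000.sql.gz
```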

JSON backup

For an application-level backup of Django model data:

uv run python manage.py backup_data

Expected output:

Creating backup...
Backup created successfully
  File: backup_20260507_143000.json.gz
  Path: /path/to/project/backups/backup_20260507_143000.json.gz
  Size: 256.00 KB

Common options:

# Add a description
uv run python manage.py backup_data --description "Pre-migration backup"

# Back up only specific models
uv run python manage.py backup_data --models config.Shift,config.LeaveRequest

# Exclude audit logs to save space
uv run python manage.py backup_data --no-include-audit

# Show per-model record counts
uv run python manage.py backup_data --verbose

# Copy the backup to a custom location
uv run python manage.py backup_data --output /tmp/rota-backup.json.gz

# Attribute the backup to a user
uv run python manage.py backup_data --user admin

Manual cleanup

Preview what would be deleted:

# Simple retention (default 6 months)
uv run python manage.py cleanup_old_backups --dry-run

# Tiered retention for pg_dump backups
uv run python manage.py cleanup_old_backups --tiered --dry-run

Actually delete old backups:

# Delete JSON backups older than 6 months
uv run python manage.py cleanup_old_backups

# Delete using tiered retention for pg_dump, plus 6-month JSON cleanup
uv run python manage.py cleanup_old_backups --tiered

# Custom retention period for JSON backups
uv run python manage.py cleanup_old_backups --retention-months 12

Restoring

The restore procedure depends on the backup type.

Restoring from a pg_dump backup

This replaces the entire database. Use this for full disaster recovery.

Step 1: Identify the backup.

Find the backup ID from the admin interface (see Admin Interface) or query the database:

uv run python manage.py shell -c "
from backup_restore.models import BackupMetadata
for b in BackupMetadata.objects.filter(backup_type='pg_dump').order_by('-timestamp')[:5]:
    print(f'{b.id}  {b.timestamp}  {b.filename}  {b.get_file_size_display()}')
"

Step 2: Run a dry run (optional but recommended).

There is no dry-run mode for pg_dump restore. Instead, verify the backup file exists and the metadata looks correct in the admin interface.

Step 3: Restore the backup.

You must provide the backup UUID and the actual database name as a safety check:

uv run python manage.py pg_restore_backup \
  --backup-id <uuid> \
  --database-name rota_db \
  --confirm

Expected output:

Creating safety backup before restore...
Safety backup created: backup_2026-05-07_143500.sql.gz
Enabling maintenance mode...
Maintenance mode enabled. Non-staff users will see 503 page.
Restoring from backup: backup_2026-05-07_030000.sql.gz
Database restored successfully!
Maintenance mode disabled.

The restore process automatically:

  1. Creates a safety backup of the current database before overwriting anything.
  2. Enables maintenance mode (non-staff users see a 503 page).
  3. Restores the database from the backup file.
  4. Disables maintenance mode (even if the restore fails).
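Step 4 above is the important safety property: maintenance mode comes back off even when the restore raises. The pattern is a try/finally guard, sketched here with stand-in functions (the real command's maintenance-mode hooks are internal and may look different):

```python
from contextlib import contextmanager

events = []  # records the order of operations, for illustration

def enable_maintenance_mode():
    events.append("enabled")

def disable_maintenance_mode():
    events.append("disabled")

@contextmanager
def maintenance_window():
    """Ensure maintenance mode is lifted even if the restore fails."""
    enable_maintenance_mode()
    try:
        yield
    finally:
        disable_maintenance_mode()  # runs on success and on failure

# Simulate a restore that fails partway through:
try:
    with maintenance_window():
        raise RuntimeError("restore failed")
except RuntimeError:
    pass

print(events)  # ['enabled', 'disabled']
```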

Restoring from a JSON backup

This replaces application data for the models included in the backup.

Step 1: Validate the backup without restoring.

uv run python manage.py restore_data --input backups/backup_20260507_143000.json.gz --dry-run

Expected output:

Validating backup file...
Backup file is valid
  Version: 1.1
  Timestamp: 2026-05-07T14:30:00+00:00
  File size: 262144 bytes

Backup contents:
  - SystemConfiguration: 1 records
  - User: 5 records
  - Clinician: 12 records
  ...

[DRY RUN] No data was restored
To actually restore, run again with --confirm flag

Always run --dry-run first. It validates the file structure and shows you exactly what is in the backup.

Step 2: Perform the restore.

uv run python manage.py restore_data \
  --input backups/backup_20260507_143000.json.gz \
  --confirm

You can add --verbose for an interactive confirmation prompt and per-model output.

Important warnings:

  • The restore deletes all existing data for each model before inserting the backup data.
  • The entire restore runs inside an atomic transaction. If anything fails, all changes are rolled back.
  • Models are restored in dependency order to satisfy foreign key constraints.
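Dependency ordering is a topological sort over foreign-key relationships. A sketch using the standard library's graphlib -- the model names and dependency map below are illustrative, not the restore command's actual metadata:

```python
from graphlib import TopologicalSorter

# Hypothetical map: each model lists the models its foreign keys
# point at. The real restore derives this from Django model metadata.
DEPENDENCIES = {
    "User": [],
    "Clinician": ["User"],
    "Shift": ["Clinician"],
    "LeaveRequest": ["Clinician"],
}

order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)  # User before Clinician, Clinician before Shift/LeaveRequest
```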

Backup Metadata

Every backup is tracked in the BackupMetadata model (backup_restore.BackupMetadata). This is what each field records:

| Field           | Purpose                                                        |
|-----------------|----------------------------------------------------------------|
| id              | UUID primary key -- use this to reference a specific backup    |
| filename        | Name of the backup file on disk                                |
| file_path       | Full filesystem path to the backup file                        |
| timestamp       | When the backup was created                                    |
| description     | Optional human-readable description                            |
| backup_type     | json or pg_dump                                                |
| models_included | List of model names (JSON backups only; empty for pg_dump)     |
| record_counts   | Per-model record counts (JSON backups only; empty for pg_dump) |
| file_size_bytes | Size of the backup file on disk                                |
| created_by      | The user who created the backup (null for automated backups)   |
| is_valid        | Whether the backup file passed validation                      |
| created_at      | When the metadata record was created                           |

Querying backup history

From a Django shell:

from backup_restore.models import BackupMetadata

# List the 10 most recent backups
for b in BackupMetadata.objects.order_by('-timestamp')[:10]:
    print(f"{b.timestamp}  {b.backup_type:7}  {b.get_file_size_display():>10}  {b.filename}")

# Count backups by type
from django.db.models import Count
BackupMetadata.objects.values('backup_type').annotate(count=Count('id'))

# Find backups created by a specific user
BackupMetadata.objects.filter(created_by__username='admin')

# Find backups older than 90 days
from django.utils import timezone
from datetime import timedelta
cutoff = timezone.now() - timedelta(days=90)
BackupMetadata.objects.filter(timestamp__lt=cutoff).count()

Admin Interface

The backup system provides a web interface in the Django admin for staff users.

Viewing backups

Navigate to Django Admin and look under Backup Restore for Backup Metadata. The list view shows:

  • Filename and timestamp
  • Models included (first 3 shown, with a count of remaining)
  • File size (human-readable)
  • Validity status
  • Who created the backup

You can filter by validity and date, and search by filename or description.

Downloading backups

From the backup list, each backup has a download action that serves the backup file directly from disk.

Creating a JSON backup

From the backup list page, the "Create Backup" button opens a form where you can:

  • Add a description
  • Select specific models to include (or leave empty for all models)

Uploading a backup

The "Upload Backup" button allows you to upload a backup file from another system. Accepted file types:

  • .json.gz -- JSON backups
  • .sql.gz -- pg_dump backups
  • .sql -- uncompressed SQL dumps

Maximum upload size is 100 MB (configurable via MAX_BACKUP_UPLOAD_SIZE in settings).

The system validates the file on upload and rejects anything that does not look like a valid backup.

Restoring a backup

Each backup in the list has restore actions:

JSON restore: Click "Restore", review the backup contents and validation details, type CONFIRM in the text field, and submit.

pg_dump restore: Click "PostgreSQL Restore" (only shown for pg_dump backups). You must type the actual database name and check a confirmation checkbox. The system creates a safety backup before proceeding.

Disaster Recovery

If something goes wrong with the database -- a failed migration, accidental data deletion, or corruption -- follow this procedure.

Step 1: Assess the situation

Determine the scope of the problem:

  • Is the application still running? Can users log in?
  • Is the database responding?
  • What changed recently? (Check the audit log if accessible.)

Step 2: Create a safety backup

Before doing anything else, take a backup of the current state even if it is damaged:

uv run python manage.py pg_dump_backup --description "Pre-recovery snapshot of current state"

If pg_dump fails because the database is in a bad state, try a JSON backup of whatever models are accessible:

uv run python manage.py backup_data --description "Emergency partial backup"

Step 3: Choose a backup to restore

List recent backups:

uv run python manage.py shell -c "
from backup_restore.models import BackupMetadata
for b in BackupMetadata.objects.filter(backup_type='pg_dump', is_valid=True).order_by('-timestamp')[:10]:
    print(f'{b.id}  {b.timestamp}  {b.filename}  {b.get_file_size_display()}')
"

Pick the most recent valid backup from before the problem occurred.

Step 4: Restore

For a full restore, use the pg_dump procedure:

uv run python manage.py pg_restore_backup \
  --backup-id <uuid> \
  --database-name rota_db \
  --confirm

For a partial restore (specific models only), use the JSON procedure with --dry-run first:

uv run python manage.py restore_data \
  --input backups/<filename>.json.gz \
  --dry-run

uv run python manage.py restore_data \
  --input backups/<filename>.json.gz \
  --confirm --verbose

Step 5: Verify

After restoring, check the following:

  1. Can you log in? Open the site and log in with an admin account.
  2. Are clinicians present? Check the clinician list in the admin.
  3. Are recent shifts visible? Look at the rota for the current period.
  4. Is Celery running? Check that the Celery worker and beat processes are healthy.
  5. Run a validation query:
uv run python manage.py shell -c "
from config.models import Clinician, Shift
from django.contrib.auth import get_user_model
User = get_user_model()
print(f'Users: {User.objects.count()}')
print(f'Clinicians: {Clinician.objects.count()}')
print(f'Shifts: {Shift.objects.count()}')
"

Compare these counts against the record_counts stored in the backup metadata.
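The comparison is a dict diff between the stored record_counts and the counts you just queried. A minimal sketch with hypothetical numbers:

```python
# Illustrative check: expected counts come from the backup's
# BackupMetadata.record_counts; actual counts from the live database.
expected = {"User": 5, "Clinician": 12, "Shift": 340}
actual = {"User": 5, "Clinician": 12, "Shift": 338}

mismatches = {
    model: (expected[model], actual.get(model))
    for model in expected
    if actual.get(model) != expected[model]
}
print(mismatches)  # {'Shift': (340, 338)}
```

An empty result means every model restored the expected number of records; anything else is worth investigating before declaring recovery complete.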

Step 6: Document

Record what happened, which backup was restored, and the timestamp of both the incident and the recovery. The restore process creates an audit log entry automatically, but a human note in your incident log is also valuable.

Troubleshooting

Backup creation fails with "pg_dump command not found"

Install the PostgreSQL client tools:

# Ubuntu / Debian
sudo apt-get install postgresql-client

# CentOS / RHEL
sudo yum install postgresql

# Docker -- add to your Dockerfile
RUN apt-get update && apt-get install -y postgresql-client

Backup creation fails silently (Celery task)

Check:

  1. Celery worker logs: look for the task name tasks.backup_tasks.daily_pg_dump_backup.
  2. Disk space on the backups/ directory: df -h /path/to/project/backups/.
  3. File permissions: the Celery worker process needs write access to backups/.
  4. Database connectivity: can the worker reach PostgreSQL?

Restore fails with a validation error

The backup file may be corrupted, or the data model may have changed since the backup was created. Try:

  1. Use --dry-run to see if the file parses at all.
  2. Check whether any new migrations have been applied since the backup was taken. You may need to roll back migrations to match the backup's schema.
  3. Use a more recent backup that matches the current schema.

"Database name mismatch" during pg_restore

The --database-name argument must exactly match the NAME value in DATABASES['default'] in your settings. Check your current database name:

uv run python manage.py shell -c "
from django.conf import settings
print(settings.DATABASES['default']['NAME'])
"

Backup file is missing from disk

The BackupMetadata record exists but the file has been deleted or moved. You can clean up orphaned metadata by deleting the record from the admin interface.

Passwords after restore

User passwords are stored in backups as Django password hashes (e.g., pbkdf2_sha256$...). After restoring, users can log in with their original passwords. Passwords are never stored in plaintext.