Trust
Business Continuity & Disaster Recovery
This policy describes how Bella keeps the platform running through provider outages, software failures, and operator error, and how customer data is restored if it is corrupted, deleted, or rendered unavailable.
1. Targets
Bella targets a Recovery Time Objective (RTO) of 4 business hours for core application availability following a material service disruption, and a Recovery Point Objective (RPO) of 24 hours for database-backed customer data, reflecting the current daily backup cadence. These are operational targets, not contractual SLA guarantees. Bella documents its restoration procedures and will test restoration periodically as part of its SOC 2 readiness program. Where a third-party provider outage affects availability, recovery depends in part on that provider's restoration timeline.
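The link between a daily backup cadence and a 24-hour RPO can be shown with a short, illustrative sketch. It assumes that each backup completes at a known timestamp and that a restore always uses the newest backup taken before the failure. The names `RPO`, `data_loss_window`, and `within_rpo` are hypothetical and do not describe Bella's actual tooling.

```python
from datetime import datetime, timedelta

# Assumption: the 24-hour RPO stated in this policy.
RPO = timedelta(hours=24)

def data_loss_window(backup_times, failure_time):
    """Time between the newest backup taken at or before failure_time and the failure.

    Restoring that backup loses everything written inside this window.
    """
    prior = [t for t in backup_times if t <= failure_time]
    if not prior:
        raise ValueError("no backup precedes the failure")
    return failure_time - max(prior)

def within_rpo(backup_times, failure_time, rpo=RPO):
    """True if restoring the newest prior backup keeps data loss within the RPO."""
    return data_loss_window(backup_times, failure_time) <= rpo

# Hypothetical daily backups at 02:00; the failure occurs just before the next run.
backups = [datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2, 2, 0)]
failure = datetime(2024, 1, 3, 1, 30)
print(data_loss_window(backups, failure))  # 23:30:00
print(within_rpo(backups, failure))        # True
```

If the 2 January backup had failed silently, the window would grow to 47.5 hours, roughly twice the RPO. Because a silent backup failure can double the exposure like this, restores need to be tested rather than assumed.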
2. Backup strategy
- Database — PostgreSQL is backed up daily. Backups are encrypted at rest with provider-managed keys.
- Retention — backups are retained per the Data Retention & Disposal Policy: in-window backups remain available for restore; aged backups are deleted on schedule.
- Object storage — uploaded files and generated artifacts are stored on durable object storage with provider-managed redundancy.
- Configuration — deployment configuration, environment variables, and infrastructure definitions are version-controlled and reproducible. Production state is documented in enough detail to rebuild the environment from scratch using these version-controlled artifacts.
- Source code — production code is hosted in a managed repository with redundant storage and a documented restore path.
3. Restoration testing
Bella commits to at least one full restore test per calendar year as part of the SOC 2 readiness program. Each test restores a recent backup into an isolated environment and verifies schema integrity and a representative sample of customer data. The first scheduled restore test is targeted to occur within the SOC 2 readiness window; its methodology and results will be documented and used to update this policy.
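The verification step (schema integrity plus a representative data sample) can be sketched as follows. This is a minimal illustration, not Bella's procedure: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, simulates backup and restore with `sqlite3.Connection.backup`, and assumes each checked table has an integer `id` primary key. The helper names (`schema_of`, `row_digest`, `verify_restore`) are hypothetical.

```python
import hashlib
import random
import sqlite3

def schema_of(conn):
    """Map each table name to its CREATE TABLE statement."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)

def row_digest(conn, table, row_id):
    """SHA-256 of one row's values (table name is assumed trusted, not user input)."""
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return hashlib.sha256(repr(row).encode()).hexdigest()

def verify_restore(reference, restored, table, sample_size=100, seed=0):
    """Check schema equality, then compare a random sample of rows by digest."""
    if schema_of(reference) != schema_of(restored):
        return False
    ids = [r[0] for r in reference.execute(f"SELECT id FROM {table}")]
    sample = random.Random(seed).sample(ids, min(sample_size, len(ids)))
    return all(row_digest(reference, table, i) == row_digest(restored, table, i) for i in sample)

# Simulated backup/restore: copy an in-memory database with sqlite3's backup API.
reference = sqlite3.connect(":memory:")
reference.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
reference.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",), ("Linus",)])
reference.commit()
restored = sqlite3.connect(":memory:")
reference.backup(restored)
print(verify_restore(reference, restored, "customers"))  # True
```

Comparing digests of sampled rows keeps the check cheap on large tables. A missing or altered row in the restored copy produces a different digest, so it fails the check.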
4. Failure classes and responses
- Single host failure — the supervised process manager restarts affected services automatically. Most application failures recover within minutes without intervention.
- Database corruption — restore from the most recent clean backup, then replay only those changes made since that backup that have been verified as clean. Affected tenants are notified per the Incident Response Policy.
- Provider outage (compute) — Bella's primary provider offers redundant zones, and the platform is engineered to tolerate single-zone failures. A regional or provider-wide outage is handled according to the provider's stated incident process, and Bella posts customer-impacting notifications on the status page once they are identified. A fully automated, real-time status page is on the operational roadmap.
- Provider outage (third-party subprocessor) — an outage at Cloudflare affects all public ingress to Bella because Cloudflare fronts every public endpoint; in that case, recovery depends on Cloudflare's stated incident timeline. Outages at integration providers (Intuit, Plaid, Twilio, Telnyx, Stripe, OpenAI, Anthropic) degrade only the features that depend on the affected provider; core application functions and customer data access continue to operate for users already inside an authenticated session. Each integration's failure mode is documented in code (e.g., Plaid sync degrades to manual upload, Twilio degrades to delayed SMS).
- Catastrophic data loss — restore from the latest verified backup. Where data written after that backup can be recovered from audit logs or upstream sources, it is replayed on top of the restored database. Customers are notified per the Incident Response Policy.
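The integration-provider degradation described in the failure classes above (for example, Plaid sync degrading to manual upload) follows a common fallback pattern. The sketch below is generic and hypothetical: `ProviderUnavailable`, `SyncResult`, and `sync_bank_transactions` are illustrative names, `fetch` stands in for a real provider call, and no Plaid SDK functions are used or implied.

```python
from dataclasses import dataclass
from typing import Callable, List

class ProviderUnavailable(Exception):
    """Hypothetical error an integration client raises when its provider is down."""

@dataclass
class SyncResult:
    mode: str    # "automatic" or "manual_upload"
    detail: str

def sync_bank_transactions(fetch: Callable[[], List[dict]]) -> SyncResult:
    """Try an automatic bank sync; fall back to manual upload if the provider is down.

    `fetch` stands in for a call to the bank-data provider (e.g. Plaid).
    """
    try:
        transactions = fetch()
    except ProviderUnavailable:
        # Degrade only this feature; the rest of the application keeps working.
        return SyncResult("manual_upload", "Bank sync is unavailable; upload a statement file instead.")
    return SyncResult("automatic", f"Imported {len(transactions)} transactions.")

def provider_down() -> List[dict]:
    raise ProviderUnavailable("simulated outage")

print(sync_bank_transactions(lambda: [{"amount": 12.5}, {"amount": -3.0}]).detail)
print(sync_bank_transactions(provider_down).mode)  # manual_upload
```

The outage is caught at the boundary of the one feature that depends on the provider. Because of that, authenticated sessions and access to customer data are unaffected.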
5. Continuity of personnel
Operational responsibility for Bella's platform rests with the platform engineering lead, who is also the policy owner, with documented runbooks accessible to a small, named set of engineers. If the policy owner is unavailable, a named secondary engineer makes operational decisions until the owner returns. Critical credentials can be recovered through a documented procedure that does not require the policy owner's presence.
6. Communication during disruption
- Material disruption is posted on the public status page as soon as it is identified.
- Customers receive email notifications for incidents that materially affect their use of the platform, with updates at meaningful milestones.
- Subprocessor-driven outages are clearly attributed to the provider when known, and Bella's remediation timeline is published alongside the provider's.
7. Roles and responsibilities
- Policy owner — Bella platform engineering lead.
- Annual review — this policy and the RTO / RPO targets are reviewed at least once per calendar year.
- Exercise — at least one BC/DR exercise per year, paired with the incident response tabletop.