The Database Backup Checklist

For a SaaS product, your users' data and your intellectual property are your most valuable resources. In most cases, your code is already hosted on an external platform like GitHub and, thanks to Git, distributed across all of your developers' machines. But what about your database?

You most likely already have a database backup script in place, or you rely on the automated backups of AWS or another cloud provider. That's a good start, but on its own it's still risky, and it's likely you'll lose data one day.

It doesn't just happen to others: plenty of big, respected companies have already lost some of their customers' data. Backups aren't something you spend time on until it's too late. For some insight, read GitLab's postmortem of their database outage.

Here's a checklist of the features your database backup strategy needs to implement to make sure you never lose your data. Of course, DBacked implements all of them.

  • Cron jobs are not fail-proof. It's easy to forget to configure them again after a server migration. An external service should make sure your cron job has started and completed. You can use a service like Healthchecks.io or roll your own (a minimal sketch follows this list).
    DBacked checks that a backup was started within the configured frequency and sends you an email if it's late.
  • If you use a tool like pg_dump, mysqldump or mongodump to create a dump of your database, you should make sure the command exited without an error. If it runs in a cron job, you won't be notified when it fails. There are multiple reasons why the program can fail, like not having enough free space on your server to store the backup, or changing the credentials on your database and forgetting to update them in your script. You can integrate this check into your backup script and send an email when an error is detected (sketched in code after this list).
    DBacked checks for errors while executing the dumper program. If one is detected, the dumper's error output is saved and the system sends you an email.
  • An automated system needs to check what you backed up. You could be backing up an empty staging database because you forgot to change the credentials in your script. A simple safeguard is to enforce a minimum size for your backups and check that there isn't a huge size difference between two consecutive backups (see the size-check sketch after this list). You should also restore a backup once in a while to verify its contents manually.
    DBacked implements these checks and sends you an alert if a potential problem is detected. Every 3 months, we'll send you a reminder to check the contents of your last backup.
  • You will provide your database credentials to the tool you use to back up your database, so you should be sure no backdoor has been added to its code. You need to build your own script or only execute open-source code that has been reviewed by the community and is actively maintained.
    The DBacked agent code (what you execute on your server) is open source and available on GitHub.
  • All your cold storage should be encrypted with a secure algorithm and a strong key before it leaves your server. Even if you use HTTPS to upload to S3, for example, not encrypting your files means anyone with access to your AWS account can read your data. Under the GDPR, if your customer data leaks from unencrypted cold storage, you could be fined for not securing it properly (an encryption sketch follows this list).
    DBacked uses RSA and AES to asymmetrically encrypt your backup before sending it to the storage servers. The implementation relies on Node.js's crypto module, has been audited multiple times and is documented in the agent README.
  • It's easy to store your backups on your database server, but it means that if the server dies (SSDs and hard drives fail often), you lose everything at once. Your backup storage should also be redundant across multiple geographic zones: in case of a natural disaster, if your database server and your backup server are in the same datacenter, you'll lose all your data.
    DBacked uses Amazon S3, which guarantees 99.999999999% durability, enough for your backups. It also replicates your files across multiple data centers.
  • Not every protocol implements a way to detect corruption during transfer. For example, FTP doesn't provide a way to confirm the integrity of a file (SFTP does). If a backup is corrupted, you won't notice until you need to restore it. You should compute a checksum (MD5 is fine for this) of your file before uploading it, then compare it to the checksum of the uploaded file (see the Content-MD5 sketch after this list).
    DBacked sends your file to S3 in chunks and uses the Content-MD5 header to ask S3 to verify the MD5 checksum of each chunk. If an error is detected, the chunk is discarded and uploaded again.
  • It shouldn't be possible to delete your backups by accident. That means any deletion operation should be revertible within a long enough time frame (2 days minimum), and you should be alerted when it happens.
    The only way to delete a backup in DBacked is for it to fall outside the retention period. You can accidentally change the retention period to a very small value, but you'll receive an email alert and your backups will only be deleted after two days. If you increase the retention period during that window, they won't be deleted.
  • By definition, a backup should be restorable, but to prevent long downtime of your service, it should also be fast to restore. And it should be easy to restore, so that you don't depend on a single person on your team (who may be unavailable) to do it (a streaming-restore sketch follows this list).
    The DBacked restore process is packaged in the agent: you only need to execute the dbacked restore --last-backup command to start it. The backup is downloaded and streamed directly to your database. In most cases, it will max out your server's internet connection.
  • Most database systems, including PostgreSQL, MySQL and MongoDB, recommend that you stop the database before making a copy of its files. The reason is that the files can be modified by a database write while you are copying them, resulting in a corrupted backup. You should use a program like pg_dump, mysqldump or mongodump, or a frozen snapshot of your filesystem if it supports it.
    DBacked uses pg_dump, mysqldump or mongodump, depending on your database type, to make sure no inconsistent state can be captured.
  • Amazon RDS provides an automated backup system, but it's limited to a retention of 35 days and a frequency of one backup per day. The automated backups are deleted when the database is deleted (which someone can do by accident). Also, the backups are encrypted with a key you don't own. If you want to rely only on the backups provided by RDS, you should create a script that manually snapshots your RDS instance and shares these snapshots with another AWS account used solely for this purpose (see the RDS snapshot sketch after this list). This protects you from an attacker getting access to your main AWS account.
    DBacked can be used alongside the AWS RDS backup system to make sure your data is always backed up and no accidental deletion can happen. You can easily install it on a small EC2 instance.
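To make these points concrete, here are a few sketches in TypeScript (the DBacked agent itself is Node.js, so the examples stay in that ecosystem). First, the dead man's switch from the cron item: a wrapper that pings a monitoring check before and after each run. The check URL is a placeholder (Healthchecks.io issues one per check), and the global fetch requires Node 18+.

```typescript
// Dead man's switch: ping an external monitor around each backup run.
// CHECK_URL is a placeholder; Healthchecks.io issues one URL per check.
const CHECK_URL = "https://hc-ping.com/your-check-uuid";

async function runWithHeartbeat(backup: () => Promise<void>): Promise<void> {
  await fetch(`${CHECK_URL}/start`); // tell the monitor the backup started
  try {
    await backup();
    await fetch(CHECK_URL); // success ping; a missing ping triggers an alert email
  } catch (err) {
    await fetch(`${CHECK_URL}/fail`); // explicit failure ping
    throw err;
  }
}
```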
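Next, checking the dumper's exit code. This sketch assumes pg_dump is on the PATH and reads its connection settings from the usual PG* environment variables; wire the rejection into whatever alerting you already use (nodemailer, a webhook, ...).

```typescript
import { spawn } from "node:child_process";

// Run pg_dump and surface any failure, keeping stderr for the alert email.
function dumpDatabase(outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const dumper = spawn("pg_dump", ["--format=custom", "--file", outputPath]);
    let stderr = "";
    dumper.stderr.on("data", (chunk) => { stderr += chunk; });
    dumper.on("error", reject); // e.g. pg_dump binary not found
    dumper.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`pg_dump exited with code ${code}:\n${stderr}`));
    });
  });
}
```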
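The size sanity check can be as small as this. The 1 MiB floor and the 2x ratio are illustrative thresholds, not recommendations; tune them to your own database.

```typescript
import { statSync } from "node:fs";

const MIN_BACKUP_BYTES = 1024 * 1024; // anything under 1 MiB is suspicious here
const MAX_SIZE_RATIO = 2;             // flag a 2x jump or drop between two runs

function checkBackupSize(path: string, previousSize: number | null): void {
  const size = statSync(path).size;
  if (size < MIN_BACKUP_BYTES) {
    throw new Error(`Backup is only ${size} bytes: wrong database or empty dump?`);
  }
  if (previousSize !== null) {
    const ratio = Math.max(size, previousSize) / Math.min(size, previousSize);
    if (ratio > MAX_SIZE_RATIO) {
      throw new Error(`Backup size changed ${ratio.toFixed(1)}x since the last run`);
    }
  }
}
```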
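For encryption, DBacked documents its exact pipeline in the agent README; below is a generic sketch of the same hybrid RSA + AES pattern using Node's built-in crypto module. A fresh AES key encrypts the dump, and your RSA public key (2048-bit or larger) wraps that AES key, so only the private key holder can read the backup.

```typescript
import { createCipheriv, publicEncrypt, randomBytes } from "node:crypto";
import { createReadStream, createWriteStream, writeFileSync } from "node:fs";
import { pipeline } from "node:stream/promises";

// Hybrid encryption: a one-off AES-256 key for the data, RSA for the key.
async function encryptBackup(path: string, rsaPublicKeyPem: string): Promise<void> {
  const aesKey = randomBytes(32); // 256-bit key, never reused
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", aesKey, iv);
  await pipeline(createReadStream(path), cipher, createWriteStream(`${path}.enc`));
  // Ship the RSA-wrapped IV + key alongside the ciphertext.
  writeFileSync(`${path}.key`, publicEncrypt(rsaPublicKeyPem, Buffer.concat([iv, aesKey])));
}
```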
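For transfer integrity, S3 can verify a checksum for you. This sketch uses the AWS SDK v3 and a whole-object upload for brevity (DBacked uploads in chunks); the bucket and key names are placeholders.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// S3 recomputes the MD5 server-side and rejects the upload (BadDigest) on mismatch.
async function uploadWithChecksum(path: string): Promise<void> {
  const body = readFileSync(path);
  const md5 = createHash("md5").update(body).digest("base64");
  await new S3Client({}).send(new PutObjectCommand({
    Bucket: "my-backup-bucket",     // placeholder
    Key: "backups/latest.dump.enc", // placeholder
    Body: body,
    ContentMD5: md5,
  }));
}
```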
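A restore can stream the dump straight from storage into the database without staging it on disk. This sketch skips the decryption step and assumes a custom-format PostgreSQL dump; the bucket, key and database names are placeholders.

```typescript
import { spawn } from "node:child_process";
import { pipeline } from "node:stream/promises";
import type { Readable } from "node:stream";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Pipe the dump from S3 directly into pg_restore (decryption step elided).
async function restoreLatestBackup(): Promise<void> {
  const { Body } = await new S3Client({}).send(new GetObjectCommand({
    Bucket: "my-backup-bucket", // placeholder
    Key: "backups/latest.dump", // placeholder
  }));
  const restore = spawn("pg_restore", ["--dbname", "mydb", "--clean"], {
    stdio: ["pipe", "inherit", "inherit"],
  });
  await pipeline(Body as Readable, restore.stdin!);
}
```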
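Finally, the RDS item: taking a manual snapshot and sharing it with a second, backup-only AWS account, so that a compromised primary account can't wipe everything. AWS SDK v3 again; the instance identifier and account ID are placeholders.

```typescript
import {
  RDSClient,
  CreateDBSnapshotCommand,
  ModifyDBSnapshotAttributeCommand,
  waitUntilDBSnapshotAvailable,
} from "@aws-sdk/client-rds";

// Snapshot the instance, then let a backup-only account copy it for itself.
async function snapshotAndShare(): Promise<void> {
  const client = new RDSClient({});
  const snapshotId = `manual-backup-${Date.now()}`;
  await client.send(new CreateDBSnapshotCommand({
    DBInstanceIdentifier: "my-production-db", // placeholder
    DBSnapshotIdentifier: snapshotId,
  }));
  await waitUntilDBSnapshotAvailable(
    { client, maxWaitTime: 3600 },
    { DBSnapshotIdentifier: snapshotId },
  );
  // Grant "restore" to the backup account; it should then copy the snapshot
  // into its own storage, out of reach of the primary account.
  await client.send(new ModifyDBSnapshotAttributeCommand({
    DBSnapshotIdentifier: snapshotId,
    AttributeName: "restore",
    ValuesToAdd: ["123456789012"], // placeholder backup-only account ID
  }));
}
```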

Creating a backup strategy that implements all these points can be long and costly. Some open-source projects can help you with this, but they still require monitoring on your part.

DBacked provides a hosted database backup solution for PostgreSQL, MySQL and MongoDB that covers every point above and is configured in 5 minutes. Why wait?

Back up before it's too late

Register and install DBacked on your server in 5 minutes and be sure you'll never lose your clients' data.

Open-source plan on GitHub. 30-day trial, no card needed