Introduction DR & DRP 📝
Today, we will be discussing our data redundancy (DR) strategy and disaster recovery plan (DRP) for small to medium-sized enterprise (SME) applications, which are typically monolithic apps running on a single server. The goal of a disaster recovery plan is to mitigate major issues that break the app and render it unusable. The plan should recover the application as quickly as possible so that users can continue to use the app as usual.
Data redundancy is what makes the recovery plan achievable: without a second copy of the data, there is nothing to restore from. In simple terms, DR here means a data backup in any form.
Data Redundancy 🗂
According to Techopedia, data redundancy means “a condition created within a database or data storage technology in which the same piece of data is held in two separate places.” Data redundancy can occur accidentally, or it can be created deliberately for backup and recovery purposes. In this post, we will be discussing the deliberate kind.
Usually, when there is a larger budget available, we can spin up a second server that clones the main database server and periodically syncs its data. For the sake of cost efficiency, though, we took a different approach.
Here’s our simple and cost-effective DR strategy:
- Set up a simple storage service (AWS S3, Google Cloud Storage, etc.)
- Periodically dump the main database
- Send those dumps to your chosen storage service provider
- Done 😁
This may not be the best approach for creating a data redundancy/backup strategy, but it balances the main purpose of DR against cost efficiency. Setting up a simple storage service is way cheaper than spinning up another database server.
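The dump-and-upload steps listed above can be sketched roughly as follows. This is a minimal illustration, not our exact script: it assumes a PostgreSQL database dumped with `pg_dump`, AWS S3 accessed through the third-party `boto3` package, and a hypothetical `backups/<db>/<timestamp>.dump` key layout. The function names are made up for this example.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_key(db_name: str, now: datetime) -> str:
    """Build a sortable object key (hypothetical layout for this sketch)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return f"backups/{db_name}/{stamp}.dump"


def dump_command(db_name: str, out_file: Path) -> list[str]:
    """pg_dump invocation in custom format (assumes PostgreSQL)."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", db_name]


def run_backup(db_name: str, bucket: str, workdir: Path) -> str:
    """Dump the database, upload the dump to S3, and return the object key."""
    key = backup_key(db_name, datetime.now(timezone.utc))
    out_file = workdir / Path(key).name
    # check=True raises if pg_dump fails, so a broken dump is never uploaded.
    subprocess.run(dump_command(db_name, out_file), check=True)

    import boto3  # third-party; imported lazily so the helpers work without it

    boto3.client("s3").upload_file(str(out_file), bucket, key)
    out_file.unlink()  # remove the local copy once it is stored remotely
    return key
```

A scheduler such as cron can then call `run_backup` at whatever interval fits the amount of data loss the client can tolerate.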
Although it may seem simple, we still need some automated storage management for the backups. For that purpose, we need a retention strategy that cleans up older backups that are no longer needed. At @elanode, we always try to balance our clients’ needs, industry best practices, and costs, so this DR approach is our de facto strategy for achieving reliable backups within the client’s budget and needs.
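As an illustration of such a cleanup rule, here is a small sketch that decides which backups are safe to delete. The policy itself is an assumption for this example: always keep the newest `keep_latest` backups, and delete anything else older than `max_age_days`. The function name and default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def backups_to_delete(timestamps, now, keep_latest=7, max_age_days=30):
    """Return the backup timestamps that may be removed, newest first.

    A backup is kept if it is among the `keep_latest` newest ones or is
    younger than `max_age_days`; everything else is returned for deletion.
    """
    newest_first = sorted(timestamps, reverse=True)
    protected = set(newest_first[:keep_latest])
    cutoff = now - timedelta(days=max_age_days)
    return [ts for ts in newest_first if ts not in protected and ts < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    daily = [now - timedelta(days=i) for i in range(40)]  # 40 daily backups
    for ts in backups_to_delete(daily, now):
        print("would delete", ts.date())
```

Keeping a minimum number of backups regardless of age means a stalled backup job never leads to the last good copies being deleted. Storage providers often offer built-in expiry rules as well (for example, S3 lifecycle rules), which may make a custom script unnecessary.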
Disaster Recovery Plan 📖
The recovery plan is a bit more complex than the DR strategy above. The DRP consists of some manual steps, a pre-configured server base image, automated scripts, application images, containers, etc. We try to make our application setup as consistent as possible to reduce differences between production deployments. Note that this plan is meant for SME apps with low to medium traffic that can tolerate application downtime to a certain extent.
At a high level, with estimated times, here’s our usual plan for disaster recovery of a single-server monolithic full-stack application:
- Decide whether the server is unrecoverable in a short amount of time (~3 mins.)
- Spin up a new server with our base image & automated setup scripts (~10 mins.)
- At the same time, gather the latest data backups
- Redirect the DNS to the new server (fast, since the records use a 300-second TTL)
- Install/redirect deployment pipelines for the app image & containers to the new server (~5 mins.)
- Restore data backups by automated scripts (~5 mins.)
- Test the newly deployed application (~5 mins.)
Of course, this is not the fastest or most sophisticated plan. However, in reality, most applications can tolerate some downtime. Based on our experience, this should be sufficient since the plan is meant to be the last resort for resolving major application issues.
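To get a feel for the total downtime, the estimates from the steps above can simply be added up. The sketch below makes two assumptions that are not part of the plan itself: gathering backups fully overlaps with spinning up the server, and the DNS change takes a full 300-second TTL before the next step starts. Treat the result as a rough upper bound, not a measured figure.

```python
# (step, estimated minutes, runs in parallel with another step?)
PLAN = [
    ("decide the server is unrecoverable", 3, False),
    ("spin up a new server", 10, False),
    ("gather the latest data backups", 0, True),  # no estimate given; overlaps with spin-up
    ("redirect DNS (300-second TTL)", 300 / 60, False),
    ("redirect deployment pipelines", 5, False),
    ("restore data backups", 5, False),
    ("test the deployed application", 5, False),
]


def estimated_downtime_minutes(steps):
    """Sum the steps that run one after another; parallel steps add nothing."""
    return sum(minutes for _, minutes, parallel in steps if not parallel)


if __name__ == "__main__":
    print(f"~{estimated_downtime_minutes(PLAN):.0f} minutes")  # ~33 minutes
```

Under these assumptions the plan comes out at roughly half an hour of downtime.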
We have to weigh multiple factors here, chiefly client needs, budget, and cost-effectiveness. With those in mind, this is by far our best overall approach. If there is a larger budget available, we can try other options such as periodically taking server snapshots, setting up a disaster recovery center with backup servers, and load balancing to the backup servers. However, these options are for bigger and higher-traffic apps, and that’s a different story!