Disaster Recovery: We’ve Done Tier Zero, How Do We Start Tier One?

Posted April 25, 2024 by Kevin Finch

Kevin Finch

Kevin works with clients to help them establish and improve their Disaster Recovery and Business Continuity programs, and to help make their companies more agile and resilient. Before joining Sayers, Kevin established and led Business Continuity Program Offices at several Fortune 100 companies in Financial Services, Manufacturing, Logistics, and Insurance.

Lately I have talked to a lot of companies that have decided to take their Disaster Recovery (DR) program seriously. Maybe they haven’t done a full-blown Business Impact Analysis (BIA), but they do have a good idea of what the most critical parts of their infrastructure are. Many of these companies have invested the resources to make sure their most critical pieces of infrastructure (networking, storage, virtual server environment, desktop environment, Active Directory, etc.) are protected with some level of redundancy and restorability. These are usually the systems with the most dependencies and the lowest tolerance for downtime, and most businesses can identify them without doing a BIA. Most people call this their “Tier Zero” environment, which makes sense because it’s the tier of infrastructure that all the other tiers are built on.

From a resiliency point of view, Tier Zero is the best place to start for a lot of companies. Protecting that base infrastructure is intuitive, and it’s where you get the most bang for the buck from the resources you have when you’re starting to make your business more resilient. Tier Zero infrastructure is also usually the easiest thing to get into your budget, because protecting it is easy to justify.

“Don’t be so concerned about the size of your next step…. The direction is what matters!”
Joseph “DJ Run” Simmons

But it’s really the next step where lots of businesses pause or lose their direction. When companies expand the scope of their DR program beyond those dozen or so “most critical” Tier Zero systems and start to look at several dozen “very critical” Tier One systems, they’re not sure where to begin. Tier Zero systems are generally owned by a single team, so it’s easy to isolate your DR efforts. Tier One systems, on the other hand, can be both technically and politically complex for a lot of companies because they often come with lots of dependencies and stakeholders. That can be intimidating. Plus, how do we decide which systems are the most important? Which ones have the lowest tolerance for downtime, and which ones have the least tolerance for lost data? Do we have contractual obligations to keep some of these applications running? Which ones will hurt our reputation, cost us money, or make us lose customers if they go down? There are a lot of questions to ask, and the answers aren’t always clear-cut. I completely understand why a lot of companies can “lose their way” at this point in building up their program.

That lack of direction is why it’s important to knuckle down and finally finish your BIA. When done correctly, a BIA shows you the parts of your business that are most affected by those Tier One applications. It shows you which applications you depend on most and gives you realistic figures for your tolerance for system downtime and data loss. A good BIA can even help you prioritize your IT spending to make the most of the resources you allocate to your DR program (a topic I’m presenting at a conference in May of 2024).

Now, I understand a traditional BIA can be big and scary, and can require a tremendous effort. Best practices say you should look at every aspect of your business, and every application in your environment, and fully enumerate all of those interdependencies. I’m not going to say to go against best practices, but I am going to say that many businesses can see a lot of value in doing a fraction of the work.

If I wanted to protect my business better after those Tier Zero systems and applications were protected, I would take a hard look at which parts of the company had the absolute least tolerance for a business interruption. (We can document the reasons for their low tolerance later. Those reasons will vary by industry, market conditions, contractual obligations, and a number of other factors unique to our business.) Once we’ve identified those top three to five critical business processes, we can list the critical systems that those processes use. Then we prioritize those systems as the first ones to protect as Tier One systems for recovery.
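Purely as an illustration (this is not a tool from Sayers or a formal BIA method), the shortcut above can be sketched in a few lines of Python. The process names, systems, and tolerance hours below are made up, and `pick_critical_processes` and `tier_one_candidates` are hypothetical helpers: the sketch picks the processes with the least tolerance for interruption and then lists the systems they use, most widely shared first.

```python
from collections import Counter

# Hypothetical inventory for illustration only: each business process,
# the hours of interruption it can tolerate, and the systems it depends on.
PROCESSES = {
    "Order processing": {"tolerance_hours": 2, "systems": ["ERP", "Payment gateway", "Email"]},
    "Customer support": {"tolerance_hours": 4, "systems": ["CRM", "Phone system", "Email"]},
    "Shipping": {"tolerance_hours": 8, "systems": ["WMS", "ERP", "Carrier portal"]},
    "Payroll": {"tolerance_hours": 72, "systems": ["HR system", "ERP"]},
    "Marketing": {"tolerance_hours": 120, "systems": ["CMS", "Email"]},
}


def pick_critical_processes(processes, top_n=3):
    """Return the top_n processes with the least tolerance for interruption."""
    ranked = sorted(processes, key=lambda name: processes[name]["tolerance_hours"])
    return ranked[:top_n]


def tier_one_candidates(processes, critical):
    """Count how many critical processes use each system, most widely shared first."""
    usage = Counter()
    for name in critical:
        # set() so a system listed twice for one process is only counted once
        usage.update(set(processes[name]["systems"]))
    # Sort by usage count (descending), then alphabetically for a stable order.
    return sorted(usage.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    critical = pick_critical_processes(PROCESSES, top_n=3)
    print("Critical processes:", critical)
    for system, count in tier_one_candidates(PROCESSES, critical):
        print(f"{system}: used by {count} critical process(es)")
```

With this made-up data, ERP and Email land at the top of the list because two of the three most time-sensitive processes depend on them, while systems used only by Payroll or Marketing drop out of the Tier One list entirely.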

Taking things a step further, I would look at the systems we just identified as Tier One and perform a quick Gap Analysis on their recoverability. By that I mean I would list them all and verify that recovery infrastructure is in place to bring each system back online if the business needs it. We probably haven’t done complete recovery testing on these systems yet, so we might have to make a “best guess” on recoverability, but hopefully that best guess will serve us for the time being. Now we have an idea of what our Tier One systems are and how quickly they can be recovered.
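A quick Gap Analysis like this can be as simple as a spreadsheet, but here is a minimal Python sketch of the same idea, assuming you record three things per Tier One system: the downtime the business can tolerate, whether recovery infrastructure exists, and a best-guess recovery time (or nothing, if nobody knows). The `SystemRecovery` fields, the `find_gaps` helper, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemRecovery:
    name: str
    required_hours: float              # tightest downtime tolerance of the processes using it
    has_recovery_infra: bool           # is there somewhere to bring it back online?
    estimated_hours: Optional[float]   # best-guess recovery time; None if unknown


def find_gaps(systems):
    """Return (name, reason) pairs for systems that likely can't be recovered in time."""
    gaps = []
    for s in systems:
        if not s.has_recovery_infra:
            gaps.append((s.name, "no recovery infrastructure"))
        elif s.estimated_hours is None:
            gaps.append((s.name, "recovery time unknown"))
        elif s.estimated_hours > s.required_hours:
            gaps.append((s.name, f"estimated {s.estimated_hours:g}h vs. required {s.required_hours:g}h"))
    return gaps


# Hypothetical Tier One list with made-up, best-guess numbers.
EXAMPLE = [
    SystemRecovery("ERP", required_hours=2, has_recovery_infra=True, estimated_hours=6),
    SystemRecovery("Email", required_hours=2, has_recovery_infra=True, estimated_hours=1),
    SystemRecovery("Payment gateway", required_hours=2, has_recovery_infra=False, estimated_hours=None),
    SystemRecovery("CRM", required_hours=4, has_recovery_infra=True, estimated_hours=None),
]

if __name__ == "__main__":
    for name, reason in find_gaps(EXAMPLE):
        print(f"GAP  {name}: {reason}")
```

Treating an unknown recovery time as a gap is a deliberate choice in this sketch: until a system has been tested, “we don’t know” is itself a finding worth bringing to the people who allocate DR resources.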

“When the next step is unclear, the best way to figure it out is to take action. Constant motion is the key to execution.”
Scott Belsky, American Entrepreneur

I’m not saying that this is absolutely the best approach, and frankly, it doesn’t follow best practices to the letter. However, with just a short exercise, you can filter your list of Tier One systems down from (probably) over 100 to a couple dozen, and assess whether you have any realistic chance of recovering those systems within a useful timeframe. With that information in hand, you can start to make realistic decisions about allocating additional resources to protecting those systems and applications. You will probably also find that many systems and applications you designate as Tier One for one business unit are widely used elsewhere in the company. Increasing the recoverability of those systems maximizes the value of your efforts, because you’re protecting systems that are used by many areas of your business.

If you can’t perform a full BIA, and you aren’t sure what steps to take after you’ve protected all your Tier Zero systems, an abbreviated analysis like this can be just the thing to get the ball rolling again. It can be a quick, imperfect solution to a very difficult problem. Still not sure what to do to make the most impact? Stuck in “analysis paralysis” with application dependency mapping? Sayers is here to help. Our Business Resiliency team has decades of experience in all aspects of building and maturing programs just like yours, including unraveling tricky business application dependencies. We’re here to help you make the most of your team’s time and effort.
