Disaster Recovery Planning

Conclusion: 

A disaster recovery (DR) plan that is understood, agreed, and jointly owned by all elements of the organisation is essential in preparing for a disaster event. An effective DR plan will focus on managing the risk associated with completing a successful restoration and recovery in a time, and to a level of effectiveness, acceptable to the business.

To ensure it is effective at mitigating the risks associated with restoring and resuming services after a disaster event, the DR plan must also clearly identify how the plan is to be verified, thereby reducing the risk of not completing a successful disaster recovery.

The key focus of the DR plan must always be restoring the delivery of business functions. The technical delivery may be from on-premises ICT services, outsourced providers, or Cloud. Regardless of how technology is delivered to the business, managing the impact of an ICT disaster event needs a verified plan!

Conclusion

As businesses grow more dependent on ICT to perform effectively, many organisations face increased risk associated with the ability of ICT to provide service continuity. ICT downtime means business is negatively impacted. Many organisations believe the DRP is a problem that is ICT’s to solve. Whilst ICT will lead the planning and do a lot of the heavy lifting when a disaster occurs, it can only be successful with the assistance and collaboration of its business partners. It will be the business that sets the priorities for restoration and accepts the risk.

Both business and ICT need to be comfortable that the disaster recovery (DR) plan has been verified to ensure a reasonable expectation that recovery will be successful.

The Latest

18 March 2021: Veeam released a report which suggests that 58% of backups fail. After validating these claims, and from the direct experiences of our advisors who have been CIOs or infrastructure managers in previous years, IBRS accepts there is merit in Veeam’s claim.

The real question is, what to do about it, other than buying into Veeam’s sales pitch that its backups give greater reliability?

Why it’s Important

Sophisticated ransomware attacks are on the rise. So much so that IBRS issued a special alert on the increasing risks in late March 2021. Such ransomware attacks specifically target backup repositories. This means creating disconnected or highly protected backups is more important than ever. The only guarantee for recovery from ransomware is a combination of well-structured backups and a well-rehearsed cyber incident response plan.

However, protecting the backups is only useful if those backups can be recovered. IBRS estimates around 10-12% of backups fail to fully recover, which measures a slightly different, but more important, situation than the one touted by Veeam. Even so, this failure rate is still far too high, given the heightened risk from financially motivated ransomware attacks.

Who’s impacted

  • CIO
  • Risk Officers reporting to the board
  • CISO
  • Infrastructure leads

What’s Next?

IBRS has identified that better practice for backup must include regular, unannounced practice runs to recover critical systems from backups. These tests should simulate, as closely as possible, the events that could lead to a recovery situation: critical system failures, malicious insiders and ransomware. Just as organisations need to rehearse cyber incident responses, they also need to thoroughly test their recovery regime.
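
One small part of such a practice run can be automated: checking that what came back from the backup matches what went in. The following is a minimal sketch in Python, assuming a file-level backup and that a checksum manifest was captured from the source system at backup time. The function names (build_manifest, verify_restore) are hypothetical and are not drawn from any backup product. It does not replace a full recovery rehearsal, which also needs to prove that the restored system actually starts and serves the business.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Compare a restored directory against the manifest taken at backup time.

    Returns a list of problems; an empty list means every file in the
    manifest was restored with identical content.
    """
    restored = build_manifest(restored_root)
    problems = []
    for name, expected in manifest.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != expected:
            problems.append(f"corrupt: {name}")
    return problems
```

In a rehearsal, the manifest would be built from the live system before the backup, the backup restored to an isolated location, and verify_restore run against that location. Any non-empty result counts as a failed recovery for reporting purposes.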

Related IBRS Advisory

  1. Maintaining disaster recovery plans
  2. Ransomware: Don’t just defend, plan to recover
  3. Running IT-as-a-Service Part 59: Recovery from ransomware attacks
  4. Ransomware, to pay or not to pay?
  5. ICT disaster recovery plan challenges
  6. Testing your business continuity plan

The Latest

28 March 2021: AWS has a history of periodically lowering the costs of storage. But even with this typical behaviour, its recent announcement of an elastic storage option that shaves 47% off current service prices is impressive. Or is it?

The first thing to realise is that the touted savings are not an apples-for-apples comparison. AWS’s new storage offering is cheaper because it resides in a single zone, rather than being replicated across multiple zones. In short, the storage has a higher risk of being unavailable, or even of being lost through an outright failure.

Why it’s Important

AWS has not hidden this difference. It makes it clear that the lower cost comes from less redundancy. Yet this architectural nuance may be overlooked when looking at ways to optimise Cloud costs.

One of the major benefits of moving to Platform-as-a-Service offerings is the increased resilience and availability of the architecture. Cloud vendors, including AWS, do suffer periodic failures within zones. Examples include the AWS Sydney outage in early 2020 and the Sydney outage in 2016 which impacted banking and e-commerce services.  

But it is important to note that even though some of Australia’s top companies were effectively taken offline by the 2016 outage, others just sailed on as if little had happened. The difference is how these companies had leveraged the redundancies available within Cloud platforms. Those that saw little impact to operations when AWS Sydney went down had selected redundancies in all aspects of their solutions.
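
The effect of redundancy on availability can be illustrated with simple arithmetic. The sketch below is in Python and rests on two loud assumptions: the per-zone availability figure is purely illustrative, not an AWS number, and zone failures are treated as independent, which real outages do not always respect. Under those assumptions, a service that stays up while at least one of n zones is up has availability 1 - (1 - a)^n.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is available."""
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return 1.0 - (1.0 - zone_availability) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of unavailability per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figure only: 99.9% availability per zone is an assumption.
    for zones in (1, 2, 3):
        availability = combined_availability(0.999, zones)
        hours = expected_downtime_hours(availability)
        print(f"{zones} zone(s): {availability:.9f} available, {hours:.6f} h/year down")
```

Because correlated failures (for example, a region-wide event) break the independence assumption, the multi-zone figures should be read as an upper bound on the benefit rather than a forecast.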

Who’s impacted

  • Cloud architects
  • Cloud cost/contract specialists
  • Applications architects
  • Procurement leads

What’s Next?

The lesson from previous Australian AWS outages is that organisations need to carefully match the level of redundancy they pay for to the risk of specific application downtime. This new announcement shows that significant savings (in this case 47%) are possible by accepting a greater risk profile. However, while this may be attractive from a pure cost optimisation/procurement perspective, it also needs to be tempered with an analysis of the worst-case scenario, such as multiple banks being unable to process credit card payments in supermarkets for an extended period.
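
One way to frame that trade-off is as an expected-cost comparison. The sketch below is in Python; only the 47% discount comes from the announcement discussed above. The outage probabilities and impact figures are hypothetical placeholders that each organisation would need to replace with its own estimates, and the function names are illustrative.

```python
def single_zone_cost(multi_zone_cost: float, discount: float) -> float:
    """Annual storage cost after applying the single-zone discount."""
    return multi_zone_cost * (1.0 - discount)


def expected_annual_cost(storage_cost: float,
                         outage_probability: float,
                         outage_impact: float) -> float:
    """Storage cost plus the probability-weighted cost of an outage."""
    return storage_cost + outage_probability * outage_impact


def single_zone_is_cheaper_overall(multi_zone_cost: float,
                                   discount: float,
                                   p_outage_multi: float,
                                   p_outage_single: float,
                                   outage_impact: float) -> bool:
    """True if the discounted option still wins once outage risk is priced in."""
    multi = expected_annual_cost(multi_zone_cost, p_outage_multi, outage_impact)
    single = expected_annual_cost(
        single_zone_cost(multi_zone_cost, discount), p_outage_single, outage_impact
    )
    return single < multi


if __name__ == "__main__":
    # Hypothetical inputs: 100,000 per year multi-zone storage, and a 0.1%
    # versus 2% annual chance of a damaging outage. Only 0.47 is from the source.
    for impact in (100_000, 5_000_000):
        verdict = single_zone_is_cheaper_overall(100_000, 0.47, 0.001, 0.02, impact)
        print(f"outage impact {impact:>9,}: single zone cheaper overall = {verdict}")
```

With these placeholder numbers the discounted storage wins for a low-impact application and loses for a high-impact one, which is the point: the answer depends on the application, not on the headline discount. An expected-value view also understates worst-case scenarios, so it complements rather than replaces the worst-case analysis.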

Related IBRS Advisory

  1. VENDORiQ: AWS second data centre in Australia
  2. Post COVID-19: Four new BCP considerations
  3. Running IT-as-a-Service Part 55: IBRS Infrastructure Maturity Model

IBRS advisor Dr Wissam Raffoul, who specialises in transforming IT groups into service organisations, said legacy tech stacks had a lot of 'single point failures' which could bring whole systems to their knees.


Conclusion: At the start of 2020, businesses had carefully-devised strategies in place which had been put together the year before. The onslaught of the global pandemic has either put these strategies to the test or caused them to be scrapped completely. The coronavirus has imposed changes everywhere we look and across different industries. Some businesses were forced to close shop. Others have been on a path of fast-tracked innovation and transformation. Before the pandemic, organisational behaviour had been structured to usher in growth and expansion. Although these are still valid goals, another factor has been added and that is survival.

With an economic crisis looming, consumer behaviour will inevitably change. Building and rebuilding the business requires its executives to be resilient and agile. A change in mindset is key. Alternative perspectives are relevant in pivoting in this new normal. After the period of adjustment has set in, managing IT may look different from how things were previously done.

Conclusion: The disaster recovery plan (DRP) should be seen as significantly more than a technical document for IT resources to be accessed only in times of crisis restoration. Use regular IT DRP updates and testing as a valuable marketing tool and keep the DRP ready for when disaster strikes.

A recently released survey revealed nearly one-quarter of all respondents cited lack of budget as a major challenge for BCP/DRP funding. This challenge will be even more daunting after the anticipated post-coronavirus budget cuts, so it is critical to remember that the DRP need not only be technically sound; it should also contain information useful to a non-technical audience, so that it can be attached to requests for the funding needed to keep it current.

Conclusion: IT services are critical to reducing the impact of pandemics on public health, jobs and the overall wellbeing of nations. To prepare IT for this challenge, organisations should:

  • Embed pandemics management into their business continuity plans
  • Define fallback strategies to operate during pandemics
  • Plan the transition to the normal mode of operations when the time comes

Being prepared: IBRS has created a BCP checklist to help you create and/or update your business continuity plan.

This diagram is to be used in the following ways:

  • A checklist to ensure all BCP steps have been actioned and/or updated as required
  • An easy reminder to keep key supporting documents to the BCP current, including:
    • Enterprise risk frameworks
    • Business impact analysis documents
    • Evacuation and lockdown procedures
    • Recovery plans and testing of these plans
    • IT disaster recovery plans
    • Communication plans
    • Regular executive reporting

Conclusion: Australian organisations must have strong disaster recovery plans, be it for natural disasters or man-made disasters. The plans need to deal with the protection and recovery of facilities, IT systems and equipment. It is also critical that the plan deals with the human side of the impact of a disaster on the workforce. What planning needs to be done, what testing will be done, what will happen during a disaster and what needs to be done after a disaster?

This planning can be complex and confronting. Whilst testing the failover of IT systems can be relatively straightforward, testing the effectiveness of the workforce side of a plan will be difficult, and may even disturb employees who may prefer to think “surely it will never happen to us”.

Conclusion: When considering Cloud-based email (Microsoft or Google), organisations should critically re-evaluate the need for third-party Email Archive add-ons. Since Cloud-based email has virtually unlimited mailbox capacity, the archive/storage management features of third-party Email Archive add-ons may not be needed.

For many organisations the native compliance and eDiscovery features in Cloud-based email are satisfactory and will rapidly mature and improve over time. Organisations that are very large, highly regulated, or at risk of litigation should evaluate the benefit of the more comprehensive, and more polished, third-party Email Archive add-ons, whether Cloud or on-premises.

Conclusion: While many IT organisations believe that using public IaaS (e.g. AWS, Microsoft Azure, Google) to host business applications is a cost-effective strategy, they still need to manage the hosted environment themselves or select an external service provider to manage it for them. To this end, it is critical to understand the current service management maturity level prior to choosing an in-house or outsourced solution. This note provides a self-assessment service management maturity model to create a solid foundation for selecting sourcing options. IBRS recommends that IT organisations with maturity level 3 or higher retain the service management function in-house, whereas IT organisations below maturity level 3 should outsource the service management function.

Conclusion: 80% of traditional outsourcing contracts established in Australia during the last 25 years were renewed with the same service provider. However, with the emergence of public Cloud, IT organisations should examine the feasibility and cost-effectiveness of migrating to public Cloud prior to renewing the existing outsourcing contracts.

Conclusion: Disaster recovery continues to be an issue for many clients. Approaches based on tape have the benefit of low cost, but recovery often takes too long to meet the business’ requirements. The popular new approach of replicating data to a secondary data centre enables rapid recovery but at a cost which is prohibitive for some applications or smaller organisations.

An emerging third approach is to use Cloud infrastructure (IaaS) as a warm standby. This is attractive both in terms of cost and recovery time and can also be used as a strategic stepping stone for adopting IaaS.

Conclusion: With most organisations now completely dependent on IT systems for their day-to-day operations, and ongoing viability, ensuring the availability and recoverability of these systems is one of the IT organisation’s most important responsibilities. However, like many other forms of insurance, disaster recovery planning is not seen to be urgent by IT or the business, and often fails to meet the requirements of the business.

IT executives need to look for the early warning signs that their disaster recovery plan is compromised, and if found, take action to defuse this ticking time-bomb that could blow up their career.

Conclusion: Most branch office data is poorly protected by the organisation’s existing backup strategy. Recent improvements in network connectivity, and the commoditisation of advanced deduplication techniques, fundamentally change the landscape and make highly automated, reliable and cost-effective branch office backup affordable to most organisations.

Organisations with extensive branch office data that is not adequately protected should re-evaluate their branch office backup strategy.

Conclusion: Organisations with existing Business Continuity Plans (BCPs) may find them to be a poor fit when dealing with the unique circumstances surrounding a pandemic. The chief characteristic is massively depleted numbers of available workers, with as many as 25-40% of staff absent throughout the entire government and business eco-system. Those without effective plans face the prospect of severe disablement that may take many months of recovery. For them, urgent action is required to draft pandemic-specific BCPs or to modify, then test, existing BCPs.

Conclusion: Consistent with its belief that the global financial crisis has heralded a new era in IT, IBRS has identified a series of management maxims to serve as a source of reference for IT executives navigating economic uncertainty.

Conclusion: In our November 2008 survey we found many organisations are using archiving to manage their rapidly growing unstructured data. On further in-depth research we found that these archiving projects are mostly IT driven, focused on silos of data, and are largely limited to automating storage tiering (HSM) to control storage costs. While this is a sensible starting point, IT organisations could extract more value from archiving by offering enterprise search and eDiscovery to the data owners.

Conclusion: Many organisations do not distinguish between backup and archive and assume their backup data is also their archival data. This makes the backup environment overly complex and difficult to operate and creates a very poor archival platform.

Organisations that separate these processes find that backups shrink significantly, resulting in much smaller backup windows and much faster recovery times. This also enables the archival data to be optimised to meet desired business requirements, such as cost, retrieval time, compliance and discovery.

Conclusion: In recessionary economies, as in war, values and behaviours change in response to the times. Formerly valued business success factors may no longer apply; management thinking once considered outmoded may now have new relevance. At an organisational level, focus is likely to be on the lower strata of Maslow’s hierarchy of needs. Indeed, C-level executives will be appraised on their ability to contribute to meeting these needs.

Conclusion: Economic downturns alter organisational dynamics and can herald changes in the executive power hierarchy. IT can be particularly vulnerable if seen as a cost centre and order taker. As economic forecasts darken, a common scenario is for the balance of power to swing to the CFO. Then, an economic austerity agenda is usually pursued, characterised by a program of across-the-board cost cuts that have Chief Executive imprimatur.

The financial press has begun using the term GFC as a short form for the Global Financial Crisis. Whilst it is outside the scope of this paper to speculate on the length and socio-economic effects of the GFC, there is no doubt that its impact will be experienced widely across business sectors and indeed within government. As consumer confidence recedes, corporate earnings shrink and revenue forecasts are revised downward, nothing is more certain than IT budgets being trimmed in 2009.

Conclusion: The International Organization for Standardization (ISO) has just released a new International Standard that focuses on Disaster Planning for IT. This new standard reflects the changed/outsourced IT world. It provides guidelines for information technology disaster recovery services as part of business continuity management that apply to both “in-house” and “outsourced” ICT environments. This new approach for Disaster Recovery (DR) Standards should stimulate organisations to re-examine their IT DR plans to ensure that they meet current best practice and that the processes they are using to maintain their DR planning are satisfactory.

Conclusions: While there is now an increasing emphasis on Business Continuity Management (BCM), many organisations still focus on disaster recovery planning. Unwisely they restrict their focus to restoring IT infrastructure, giving only a “cursory nod” towards a more holistic business orientation that focuses on all critical business operations. Some create an artificial air of confidence by developing their business continuity plans and then not proving them. Others have little appreciation of the quality of their Business Continuity Plans (BCP) and whether or not they meet good practice. In all these cases there can be no assurance that the BCPs will be of any practical use if and when they are needed. The outcome will be, at least, serious and could be catastrophic.

In most businesses, regardless of size or industry, formal business continuity and/or disaster recovery planning is consistently under-funded and generally neglected by management. The business risks associated with this attitude can be very high but are not understood. Those plans that are in place simply don’t work. This is not surprising since disaster recovery hasn’t been given sufficient consideration, ensuring that plans are rarely tested (if ever) and equally rarely updated to reflect changes in process, technology or applications. In an emergency, there are many continuity requirements within an organisation’s business and services covering processes, facilities, and personnel. IT and a range of business units across the whole organisation must work together, both in planning for continuity and in its execution.