Discovery Risk Mitigation – Preservation

Data Preservation

Should It Stay or Should It Go?

A terrible wordplay on The Clash’s 1982 classic perhaps, but whether to retain or dispose of data is the core theme of this, the last piece in our Discovery Risk Mitigation series of blogs.

You’ll recall from the previous discussion that Information Governance is increasingly critical both to achieving an improved response to discovery events, such as investigations, litigation and data subject access requests (DSARs), and to mitigating the inherent risk that data poses to a business if not correctly managed.

I’ve intimated in previous articles in this series that there is often a reluctance to dispose of corporate data. This may be for fear that it is a valuable knowledge asset, because there is a lack of clarity around what constitutes compliant retention for the organisation, or, most likely, because it is just extremely hard work to establish visibility over the data estate after a history of poor information governance. Many legacy applications simply don’t have a management layer providing oversight of the data they hold. More modern Cloud-based platforms do, but that’s only relevant to more recent content. Remember also that different industries have different regulations. Couple these factors with the perception that storage is cheap and it’s perhaps easy to understand why some companies just retain everything… forever.

But what truly is the cost of adopting such an unsophisticated (dare I suggest, crude) approach?

Indeed, storage is (relatively) cheap these days. But it’s certainly not free and we’re generating data at an ever-expanding rate, so it all adds up. Remember also that disclosure potentially applies to all the data in your estate, so if you choose to retain it all, it could become subject to an expensive eDiscovery exercise at some later stage. So, in addition to the actual cost of storage, perhaps the real cost of data retention should be measured using these two metrics:

  1. The cost per TB to identify, collect, process and disclose data from the information estate, an exercise that can be very disruptive, pulling organisational resources away from their operational day-to-day responsibilities; and
  2. The risk to a business that something has unnecessarily been retained that could adversely impact the outcome of an investigation or legal process, or damage company reputation.

I fully acknowledge that neither is easy to quantify but my intention is to illustrate that the potential cost of indiscriminate data retention is far higher than just the cost of physical storage.
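To make the first metric a little more concrete, here is a minimal sketch of how storage cost and expected discovery cost might be combined into one annual figure. Every name and every number in it is a hypothetical assumption for illustration only, not a benchmark, and the second metric (the risk of something damaging having been retained) is deliberately left out because it resists this kind of arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RetentionCostAssumptions:
    """Hypothetical inputs; substitute figures from your own estate."""
    storage_cost_per_tb_year: float        # what you pay to keep 1 TB for a year
    discovery_cost_per_tb: float           # identify, collect, process, disclose 1 TB
    discovery_probability_per_year: float  # chance the data is swept into a matter


def expected_annual_cost(volume_tb: float, a: RetentionCostAssumptions) -> float:
    """Storage cost plus the probability-weighted cost of a discovery exercise."""
    storage = volume_tb * a.storage_cost_per_tb_year
    expected_discovery = (
        volume_tb * a.discovery_cost_per_tb * a.discovery_probability_per_year
    )
    return storage + expected_discovery


if __name__ == "__main__":
    # Purely illustrative values.
    assumptions = RetentionCostAssumptions(
        storage_cost_per_tb_year=200.0,
        discovery_cost_per_tb=10_000.0,
        discovery_probability_per_year=0.05,
    )
    print(f"Expected annual cost: {expected_annual_cost(100, assumptions):,.0f}")
```

Even with a modest assumed probability of discovery, the discovery term can outweigh the storage term, which is the point being made above.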

The discussion reverts once more to good data governance. To reiterate, at Salient we are strongly of the opinion that you should only retain data:

  • For as long as you are mandated to do so to remain compliant with legislation covering your industry; and/or
  • For as long as it is realistically likely to have some residual value as a knowledge asset (which we believe generally decays over time).
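The two conditions above can be expressed as a simple rule. The sketch below is an illustration only: the function name and its parameters are hypothetical, and in practice the mandated date would come from qualified advice on your industry's regulations while the residual-value flag would come from your own assessment.

```python
from datetime import date
from typing import Optional


def should_retain(
    today: date,
    mandated_until: Optional[date],
    has_residual_value: bool,
) -> bool:
    """Retain only while a mandate applies and/or the item still has value.

    mandated_until is None when no regulation requires the item to be kept.
    """
    under_mandate = mandated_until is not None and today <= mandated_until
    return under_mandate or has_residual_value


if __name__ == "__main__":
    today = date(2024, 6, 1)
    # Mandate expired and no residual value: a candidate for disposition.
    print(should_retain(today, date(2023, 12, 31), has_residual_value=False))
```

Anything for which this returns False is a candidate for disposition rather than something to be deleted automatically; a legal hold, for example, would need to be checked separately.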

So, having made that leap of faith, what should your retention policies be and where should you keep the data?

Retention Policies

This piece is not intended to be a set of hard and fast rules that prescribe retention and disposition policies but rather to serve as a reminder of good practice. To that end, it would be advisable to seek qualified advice regarding the regulations governing retention in your industry and to consider those when establishing and implementing a policy.

Your data will undoubtedly fall into various categories for which different policies will apply, so the information governance discipline of understanding the nature of the data, its lineage (where it came from and what process created it) and where it resides, will pay dividends here.

There are many aspects to consider, but singling out a few of interest:

Knowledge Assets

When it comes to the retention and value of knowledge assets, we’re into a highly subjective area and the answers will vary industry by industry, company by company and probably at an even more granular level than that. Again, some independent advice may prove helpful to challenge whatever the accepted wisdom is, but it’s unlikely that all of these assets have equal value, nor should they all be retained indefinitely. There is no absolute right answer here, but looking at when any ‘perceived’ asset was last accessed and by whom will probably prove insightful, perhaps making those hard decisions a little less daunting.
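As a rough illustration of the "last accessed and by whom" check, the sketch below filters a list of access records down to those untouched for longer than a chosen period. The record structure, field names and threshold are all hypothetical; it assumes you can already export this information from the system in question, which for many legacy platforms is the hard part.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class AccessRecord(NamedTuple):
    """Hypothetical export row: one perceived knowledge asset."""
    path: str
    last_accessed: datetime
    last_accessed_by: str


def stale_assets(
    records: List[AccessRecord],
    as_of: datetime,
    max_idle_days: int,
) -> List[AccessRecord]:
    """Return assets not accessed within max_idle_days, oldest first."""
    cutoff = as_of - timedelta(days=max_idle_days)
    stale = [r for r in records if r.last_accessed < cutoff]
    return sorted(stale, key=lambda r: r.last_accessed)


if __name__ == "__main__":
    records = [
        AccessRecord("bids/2016-tender.docx", datetime(2020, 5, 1), "j.smith"),
        AccessRecord("handbook/current.pdf", datetime(2023, 11, 1), "a.jones"),
    ]
    for r in stale_assets(records, datetime(2024, 1, 1), max_idle_days=365):
        print(r.path, r.last_accessed.date(), r.last_accessed_by)
```

The output is a prompt for a conversation with the named users, not a disposal list: an asset nobody has opened in years may still be subject to a retention mandate.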

Cloud Content

As noted earlier, not all Line of Business (LOB) applications include features and functionality for managing data retention and disposition, although modern Cloud-based systems, such as the Microsoft 365 platform, now have in-place preservation capabilities and tools to help you classify and better understand the nature and hence the value or compliance category of content.

But you should check your contracts with Cloud-based providers to ensure that they can actually implement your policies. It would be frustrating, to say the least, to implement a seven-year retention policy only to find that data is not actually removed when that period expires and, as a result, becomes legitimately subject to discovery.

Dark Data

These modern platforms will assist you with retention policies for data being generated now and in the future. However, what about the legacy data that resides in archives, file servers and backups? This constitutes the so-called Dark Data that we posit will provide the greatest challenges and headaches, and that may lead to the blunt instrument of retaining or removing it all!

How do you even start to assess this legacy estate? As suggested earlier, it is a considerable task and overhead in terms of time and money, which has no immediate perceived operational value, but absolutely should be top of mind for compliance, risk and legal officers and ultimately the C-suite.

Tooling and solutions are now coming to market which assist with this sometimes monumental task. These can provide the missing management layer over legacy LOB systems and additionally allow you to both qualify and quantify the data for retention, and then either manage it in place or move it once and for all into more cost-effective and more appropriate long-term storage.

Universal Archive

Data that you’re retaining for compliance reasons is likely to be for long periods and needs to be immutable (i.e. stored with confidence that it cannot be subsequently manipulated). It’s also highly likely that it will rarely, if ever, be accessed or even queried, until its disposition. That may not be quite the case for knowledge assets but if you’ve conducted your assessment of their real value realistically and assigned retention policies accordingly, I suspect it would hold true. 

Retaining data for the long term within LOB applications is not a hugely practical or desirable solution. For a start, it may well ‘bloat’ the system, slow it down and impact the performance for active data transactions. It may also be an expensive way to keep the data (i.e. in terms of storage, licensing and maintenance) and as mentioned previously, the system may not even have disposition capabilities.  

And we can say with confidence that whatever you do decide to keep, make sure you’re only keeping it once! Duplicating information would be little short of madness in terms of cost and other overheads.

So why not look to consolidate all the data that you need to retain within a cloud tenant of your own in a Universal Archive? This could hold email data from legacy users, archives retrieved from third-party systems that are expensive and difficult to search or analyse, and any other aged content from LOB systems (having been through that analysis process), all in one place and within your own estate. That way, you always retain control of the data and can more readily apply and act on retention and disposition policies in the one place. You are also better placed to ensure single instancing of data.
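To illustrate what single instancing means in practice, here is a minimal sketch that groups items by a SHA-256 hash of their content so that byte-identical copies can be spotted. The function name and the in-memory dictionary input are hypothetical simplifications, and this is not a description of how any particular archive product works. Real systems typically normalise content first (for example, email headers that differ between mailboxes), which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(items: Dict[str, bytes]) -> List[List[str]]:
    """Group item names whose content is byte-for-byte identical.

    items maps a name (e.g. a file path) to its raw content.
    Returns one sorted list of names per set of duplicates.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for name, content in items.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    sample = {
        "mailbox-a/report.pdf": b"quarterly figures",
        "mailbox-b/report.pdf": b"quarterly figures",
        "fileshare/notes.txt": b"meeting notes",
    }
    for group in find_duplicates(sample):
        print("Same content:", ", ".join(group))
```

Keeping one stored copy per hash, with references back to each original location, is one common way to retain the context of every copy without paying to store the content more than once.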

And moreover, you can include the Universal Archive within your broader data map. This supports discovery exercises and DSARs by reducing the number of sources you have to consider or search, and you can even start to mine intelligence from the data using the ever-growing range of Artificial Intelligence tools that are available. Salient Preserve is such a Universal Archive, which leverages the very low-cost Microsoft Azure Blob Storage back end.

But to bring the discussion back to investigatory and eDiscovery processes, it has been suggested in earlier articles that the work product from such exercises does itself constitute data that should be subject to proper governance. And despite having previously suggested that knowledge assets decay with time and that you should be circumspect in what you retain, I’m going to appear to contradict myself by suggesting that the time and money invested in forensically collecting, processing and making decisions on the data can often warrant its retention. In some jurisdictions, you are even obliged to retain it for an extended period of time at the conclusion of a matter. Either way, it’s worth establishing the value of these assets to avoid discarding them if there’s a realistic chance that you’d only have to repeat the exercise at a later date and at great expense. For example, retaining a forensic image of an ex-CEO’s laptop could be a prudent step to take, depending on the circumstances of their departure.

In conclusion, there are no black and white answers, only shades of grey, and hopefully this series of articles has been helpful in examining some of the issues. If this last piece has stimulated further questions for you regarding data preservation or how you might automate legacy and dark data analysis, or if you’d like to find out more about Salient Preserve, please contact us here.