It’s estimated that 70–80% of litigation cost is spent on document review, with another significant portion going to eDiscovery hosting, often charged on a per-gigabyte basis and sometimes for extended periods. That makes containing data volumes to what is strictly necessary an imperative in managing overall eDiscovery costs, and one of the greatest areas of opportunity for eDiscovery cost reduction.
There are several avenues to address this data volume challenge. Our personal favourite (when time allows) is proactive information governance (making sure you are only retaining what is strictly necessary or has true value). Once an investigation has started, however, the most effective way to reduce data volumes for review and hosting is by using AI to intelligently cull irrelevant, duplicate and otherwise non-responsive data.
Here’s how that works.
How to use AI to filter the junk and find the gems
There are two main ways artificial intelligence can be used to streamline data collection and reduce data volumes.
Unsupervised Machine Learning
Unsupervised machine learning tools, such as clustering, entity extraction and communications analysis, group and classify information by concept based on automated content analysis.
This is helpful for two reasons. Firstly, it enables entire categories of information to be excluded from the investigation. Secondly, it provides useful context (often presented visually) that helps investigators get a better handle on what is and isn’t important and target their search more effectively.
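To make this concrete, here is a minimal sketch of concept clustering using scikit-learn (an assumption chosen for illustration; production eDiscovery platforms use richer features and visual interfaces). It groups documents by TF-IDF similarity and prints the most characteristic terms in each cluster, which is the raw material behind the cluster views investigators use to exclude whole categories at once.

```python
# A minimal sketch of concept clustering for document culling.
# Assumes scikit-learn; real platforms use richer pipelines and UIs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Quarterly invoice attached for payment processing",
    "Lunch on Friday? The usual place at noon",
    "Board minutes re: the proposed asset transfer",
    "Your package has shipped - track your delivery",
    "Draft share purchase agreement for review",
    "Weekly newsletter: top stories in tech",
]

# Represent each document by the words that distinguish it.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Group documents into broad concepts (cluster count is a tuning choice).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Show the top terms per cluster so a reviewer can decide whether an
# entire cluster (e.g. newsletters) can be excluded from the matter.
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(kmeans.cluster_centers_):
    top = centroid.argsort()[::-1][:3]
    print(f"Cluster {i}: {', '.join(terms[t] for t in top)}")
```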
Supervised Machine Learning
Supervised machine learning, where a model learns from human-labelled examples, is particularly valuable in privilege review. Excluding privileged information from discovery is vital for a number of reasons, but it can also have a considerable effect on reducing data volumes.
Of course, privilege identification and review is notoriously nuanced and prone to grey areas, making it difficult to train AI to make accurate, independent privilege distinctions. However, AI can still be extremely effective when paired with human reviewers, accelerating and improving the process while offering an additional level of quality control.
This collaborative process involves reviewers making binary choices about pieces of information the AI surfaces and presents to them: whether all or part of a document is relevant, privileged, or related to another issue in the matter. The machine learning model then “learns” from these decisions, prioritising its search to surface the documents with the highest probability of relevance first.
The result is a far higher “hit rate”, much earlier in the process.
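As an illustration of the underlying mechanics (a simplified sketch using scikit-learn, not the actual Reveal-Brainspace implementation), the loop below trains a classifier on a handful of reviewer decisions and then ranks the unreviewed documents by predicted probability of relevance, so the likeliest hits surface first.

```python
# A simplified sketch of review prioritisation via supervised learning.
# Assumes scikit-learn; commercial TAR tools use more advanced models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Documents a human reviewer has already labelled (1 = relevant).
reviewed_docs = [
    "Termination clause in the supplier contract",
    "Company picnic photos from last summer",
    "Indemnity obligations under the master agreement",
    "Cafeteria menu for the week",
]
labels = [1, 0, 1, 0]

unreviewed_docs = [
    "Limitation of liability in the draft contract",
    "Parking arrangements for visitors",
]

# Learn from the reviewer's binary decisions.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(reviewed_docs)
model = LogisticRegression().fit(X_train, labels)

# Rank unreviewed documents so probable hits are reviewed first.
scores = model.predict_proba(vectorizer.transform(unreviewed_docs))[:, 1]
for doc, p in sorted(zip(unreviewed_docs, scores), key=lambda t: -t[1]):
    print(f"{p:.2f}  {doc}")
```

In practice this loop repeats: each new reviewer decision retrains the model and re-ranks the queue, which is why the hit rate climbs so quickly.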
So-called ‘portable’ models are available out of the box. What’s more, with solutions such as the Reveal-Brainspace platform, which underpins the Salient eDiscovery service, those standard models can be refined to make them even more applicable to a particular jurisdiction or geography, say, and saved for future use, along with any entirely bespoke models you build from scratch.
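Under the hood, saving a refined model for reuse can be as simple as serialising the fitted pipeline (a generic sketch using joblib; how Reveal-Brainspace persists portable models internally is not something we replicate here).

```python
# A generic sketch of saving a refined model for reuse on a future
# matter. Uses joblib for serialisation; platform internals differ.
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["Legal advice on the merger", "Office move logistics"]
labels = [1, 0]  # 1 = privileged, per a human reviewer

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(docs), labels)

# Persist the fitted pipeline so the next matter starts warm.
joblib.dump({"vectorizer": vectorizer, "model": model}, "portable_model.joblib")

# Later (or elsewhere): load and score new documents without retraining.
saved = joblib.load("portable_model.joblib")
score = saved["model"].predict_proba(
    saved["vectorizer"].transform(["Draft advice from outside counsel"])
)[:, 1][0]
print(f"Privilege probability: {score:.2f}")
```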
More ways to drive down eDiscovery costs
AI and machine learning are evolving at an impressive pace, but they aren’t the only tools in today’s data-volume-reducing arsenal. In our experience, the most effective way to reduce litigation costs relating to excessive data volumes is through proactive information governance.
That means taking active measures to remove ROT (redundant, obsolete and trivial data) and mapping your data estate so you know exactly what resides where. It also means having skilled users with the right tools in place to complete in-place early data assessment (EDA) at the very start of the investigation process.
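As a small illustration of the ROT-removal step, exact duplicates can be identified by content hashing before anything is collected for review (a minimal sketch; real information governance tooling also handles near-duplicates and email threading).

```python
# A minimal sketch of exact-duplicate detection via content hashing.
# Real IG tooling also catches near-duplicates and threaded emails.
import hashlib
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by the SHA-256 hash of their content."""
    groups: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups.setdefault(digest, []).append(path)
    # Keep only hashes that appear more than once: true duplicates.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_duplicates(".").items():
        print(f"{digest[:12]}  x{len(paths)}: {[str(p) for p in paths]}")
```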
Equally important, however, is embracing the mindset that every eDiscovery engagement is an opportunity to streamline the next. This continuous improvement is a core component of Salient’s cost-conscious (and cost-reducing) approach to eDiscovery.
We don’t just provide our clients with expert skills and innovative technology. We also offer actionable insights after every engagement to leave you better informed and better equipped to handle whatever eDiscovery challenges the future may bring. Find out more about how we solve eDiscovery challenges.
Read more of our series on practical AI for eDiscovery
Practical AI for eDiscovery: today, tomorrow and in future
We’re still a long way from knowing where artificial intelligence will lead us. But preparing for that mysterious future shouldn’t stop us from making the most of what we have here and now.
1. Intelligent culling - a critical component of cost-effective eDiscovery
The most effective way to reduce data volumes for review and hosting is by using AI to intelligently cull irrelevant, duplicate and otherwise non-responsive data.
2. Using AI to improve inclusion
How do you find what you need when you don’t necessarily know what you’re looking for? With the help of AI and an investigative mindset, it’s possible to automatically expose leading indicators from within a much larger dataset.
3. Expectation vs reality: what Generative AI really offers eDiscovery
Is generative AI really the next frontier in eDiscovery? How much of the hype is grounded in reality? We explore practical applications for GenAI in eDiscovery.
4. Fine-tuning Generative AI for eDiscovery
AI may be powerful, but it still requires human input to deliver high-quality results. We share our GenAI learnings around how context and prompt influence output and how GenAI output can be fine-tuned.