Building Machine Learned Models – Why the Law Firm is Ideally Placed to Add Value

December 18, 2020

As the name implies, data science depends on data, and that is certainly true of machine learning modelling in eDiscovery. Whether the learning is supervised or unsupervised, the machine has to cut its teeth on relevant, rich data before it can build something that reliably delivers accurate results and so meets the objective of accelerating the EDRM process.

The potential for AI technologies to wade through vast volumes of electronic documents and communications is increasingly being recognised. The reality, though, is that unless we have pre-built models trained on relevant content, we have to build afresh each time, which rather nullifies the advantage of letting the machine do the heavy lifting.

Terminology in property law, for example, differs widely across the world, and whilst the underlying meaning may be largely the same, a model built from content in one jurisdiction may fail miserably in another. The same probably holds for industry-specific terminology, which is not readily transferable, and the challenge is compounded across multiple languages or where regional variations or lexicons exist.

The Opportunity for the Trusted Advisor

With this in mind, and against a backdrop of growing privacy and confidentiality concerns, finding publicly available, representative training data sets can be a problem in almost any field. After all, when did you last sit through a demonstration of an eDiscovery technology that didn’t use the aged Enron data set?

This is where the law firm can add value.

Law firms, in their position as trusted advisors, often have access to client content within their document management estate, or as part of an eDiscovery or litigation exercise they are conducting. It is this access that provides the key to the problem described above.

Computers don’t do language. They do numbers. Think, therefore, of building an AI model as the process of converting language (with contextual insight) into corresponding mathematical constructs. Whether the learning is supervised (i.e. with human intervention) or unsupervised (algorithmically looking for previously undetected patterns), in simple terms the resulting model decides which side of a mathematical divide a new example falls on, and attaches a measure of confidence to that decision. By that stage, any client confidential materials that contributed to the building of the model are totally obfuscated, represented only by a mathematical vector.
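To make the idea of a "mathematical divide" a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any particular eDiscovery product works: the four training sentences, the stop-word list and the function names are all invented for the example, and a simple logistic regression stands in for whatever algorithm a real platform would use. Note too that this toy version keeps a word-to-index vocabulary alongside the numeric weights; what a production model retains will depend on the tooling.

```python
import math
import re
from collections import Counter

# Hypothetical, invented review decisions: 1 = relevant, 0 = not relevant.
TRAINING = [
    ("The tenant breached the lease covenant", 1),
    ("The landlord served notice under the lease", 1),
    ("Office lunch menu for Friday", 0),
    ("The car park is closed on Friday", 0),
]
# Assumed stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "an", "for", "on", "in", "is", "under", "of", "to"}


def tokenize(text):
    """Lower-case the text and drop the stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def vectorize(text, vocabulary):
    """Turn language into numbers: one word count per vocabulary slot."""
    vector = [0.0] * len(vocabulary)
    for word, count in Counter(tokenize(text)).items():
        if word in vocabulary:  # words never seen in training are ignored
            vector[vocabulary[word]] = float(count)
    return vector


def probability(vector, weights, bias):
    """Logistic function of the weighted sum: the chance of 'relevant'."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, vocabulary, epochs=200, learning_rate=0.5):
    """Fit the dividing line with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(vocabulary), 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = vectorize(text, vocabulary)
            error = label - probability(x, weights, bias)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def classify(text, vocabulary, weights, bias):
    """Return the side of the divide and the confidence in that side."""
    p = probability(vectorize(text, vocabulary), weights, bias)
    return ("relevant", p) if p >= 0.5 else ("not relevant", 1.0 - p)


words = sorted({w for text, _ in TRAINING for w in tokenize(text)})
vocabulary = {word: index for index, word in enumerate(words)}
weights, bias = train(TRAINING, vocabulary)

label, confidence = classify("The tenant disputes the lease notice", vocabulary, weights, bias)
print(label, round(confidence, 2))
```

The trained model here is just `weights` and `bias`, a list of numbers and one more number. The new sentence is scored against those numbers rather than against the original documents.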

Building a Solution

By adopting the latest technologies, and through the normal process of eDiscovery analysis and review, the law firm can build machine-learned models, either for generic use or specific to its specialist areas of practice, industries and jurisdictions. These models are transferable and can be re-used on later matters. Not only can this accelerate subsequent reviews, it also overcomes the problem of having to rebuild models for every engagement, and so strengthens the added value that the firm can deliver.

Vendors can and do provide “out of the box” models, and it may prove advantageous to use one of these as a quick start to seed your own. For example, a model trained to detect bullying behaviour may be sufficiently generic to identify some relevant content in a data set, but local language variations between, say, American English and British English may mean that it is not as effective as you might want. By building on top of that existing model with positive and negative feedback, gathered simply as a function of your review process, a more pertinent model can evolve and be saved.
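The seed-and-refine approach can be sketched in a few lines of Python. Everything here is assumed for illustration: `VENDOR_MODEL` is a made-up stand-in for a vendor's starting model, the review decisions are invented, and the update rule is a basic logistic-regression step rather than anything a specific product is known to use.

```python
import copy
import json
import math
import re

# Hypothetical stand-in for a vendor's "out of the box" bullying model,
# built (we assume) mostly from American English content.
VENDOR_MODEL = {"bias": -2.0, "weights": {"jerk": 2.0, "loser": 2.0, "idiot": 1.5}}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def score(model, text):
    """Probability that the text is relevant under the given model."""
    z = model["bias"] + sum(model["weights"].get(t, 0.0) for t in tokenize(text))
    return 1.0 / (1.0 + math.exp(-z))


def apply_feedback(model, text, is_relevant, learning_rate=0.5):
    """Nudge the model towards one reviewer decision (positive or negative)."""
    error = (1.0 if is_relevant else 0.0) - score(model, text)
    for token in tokenize(text):
        model["weights"][token] = model["weights"].get(token, 0.0) + learning_rate * error
    model["bias"] += learning_rate * error


# Start from a copy so the vendor's seed model is left untouched.
firm_model = copy.deepcopy(VENDOR_MODEL)

# Invented reviewer decisions from a British English matter.
REVIEW_DECISIONS = [
    ("You useless muppet", True),
    ("Stop being such a git", True),
    ("Are you free for a meeting", False),
]
for text, is_relevant in REVIEW_DECISIONS:
    apply_feedback(firm_model, text, is_relevant)

before = score(VENDOR_MODEL, "What a muppet")
after = score(firm_model, "What a muppet")
print(round(before, 2), round(after, 2))

# The refined model can be saved for re-use on a later matter.
saved = json.dumps(firm_model)
```

Three decisions only nudge the score for the British phrasing upwards; in a real review the model would keep moving as thousands of coding decisions accumulate.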

If you’d like to find out more about how Salient Discovery can help you with building machine-learned models for eDiscovery and Cognitive Analytics purposes, contact Salient eDiscovery today.

