
Proof of concept: OCR & analysis of scanned/handwritten documents

Salient uses AI OCR technology to overcome extreme document quality issues, demonstrating effective analysis and classification capabilities on the extracted results.

Client Sector: Law
Technology Used: AI OCR, Power BI, Reveal

The Challenge

Salient was approached by a law firm with approximately 20,000 agreements that needed to be categorised and searched for specific clauses. The firm's biggest concern was the poor quality of the documents: many were scans of hard copies and included handwritten notes.

The firm had already attempted to analyse and categorise the documents using a number of alternative platforms. However, the extraction technologies of these platforms could not adequately handle the grainy scans, varying orientations and handwritten notes prevalent amongst the source material. The client’s challenge to Salient was to provide a proof of concept to show that our technology could succeed where others could not.

Challenges

  • Identification of clauses and classification of 20,000 documents
  • Poor quality documents including handwritten notes
  • Proof of concept required to demonstrate capability of Reveal platform

Results

  • The AI OCR engine used can process more than 15,000 pages per minute and supports 150+ languages
  • The client was able to use the Reveal platform to identify required clauses to a higher degree of accuracy than previously

Our solution

Salient successfully demonstrated that, using our solution, the client would be able to accurately identify specific clauses within a far higher percentage of the agreements than previously, despite the poor quality of the source material.

They would also be able to easily prioritise specific categories of agreement for review using the overlaid metadata fields. This would enable them to meet their own client’s expectations by delivering results on high-priority agreements first.

To improve the accuracy of the review, we proposed an Artificial Intelligence (AI) Optical Character Recognition (OCR) engine with handwriting recognition. Tests showed that this could successfully extract high-quality, searchable text from the client’s agreements for further analysis. It could also process in excess of 15,000 pages per minute, and recognise text in 150+ languages if necessary.

The native agreements and extracted text could then be ingested into our SaaS eDiscovery Platform (powered by Reveal) where the client could quickly identify responsive clauses in the agreements. Each agreement would also be overlaid with metadata such as owner, location and agreement type, enabling the client to prioritise specific categories of agreements for review.
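
To illustrate the idea of overlaying metadata and prioritising by category, here is a minimal Python sketch. It is not how the Reveal platform works internally and does not use its API; the `Agreement` record, the `PRIORITY` mapping, the example agreement types and both helper functions are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    """One agreement with its OCR text and overlaid metadata (hypothetical schema)."""
    doc_id: str
    text: str            # searchable text produced by the OCR step
    owner: str
    location: str
    agreement_type: str


# Assumed priority order for illustration: a lower number is reviewed sooner.
PRIORITY = {"supply": 0, "lease": 1}


def review_queue(agreements, priority=PRIORITY):
    """Order agreements so that high-priority categories come first.

    Agreement types missing from the mapping are placed last.
    The sort is stable, so ties keep their original order.
    """
    default = len(priority)
    return sorted(agreements, key=lambda a: priority.get(a.agreement_type, default))


def find_clause(agreements, phrase):
    """Return the IDs of agreements whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [a.doc_id for a in agreements if needle in a.text.lower()]
```

A simple substring match stands in here for clause identification; in practice the search and classification would be done by the review platform rather than by code like this.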

We also proposed a custom Power BI dashboard that would allow the client to track the progress of their review by parameters like “reviewer” and “agreement category”.