The global shift towards remote and hybrid workplaces has triggered rapid growth in the use of email and other electronic messaging. For investigators, this presents both opportunity and challenge: potentially rich investigative insights are “locked” within a data type whose sheer volume alone makes it difficult to interrogate. How do you find the hidden gems?
Structured, semi-structured and unstructured data
When it comes to investigative simplicity, structured data is the golden child. Residing in organised databases, complete with clearly defined field names and relationships, structured data is relatively easy to analyse to reveal the insights investigators need.
Structured data has also been a source of discoverable electronically stored information (ESI) for years, and most investigators are intimately familiar with the tools and processes required to analyse it.
Unfortunately, only around 20% of organisational data is structured, according to commonly cited estimates. The rest – including all text-based, audiovisual, and most user-generated data – is unstructured, presenting a far more challenging investigative picture.
If structured data is a library, unstructured data is a haystack. It has little predefined format or organisation and does not conform to any single standard. Unstructured data is also qualitative rather than quantitative, making it far more difficult to collect, process and analyse using conventional investigative tools or approaches.
It’s not all “Wild West” when it comes to unstructured data, however. The metadata associated with unstructured data can provide important additional context that is a little easier to parse and analyse. This metadata is considered semi-structured, and includes details such as creation date/time, email sender/recipient, modification dates, author, etc.
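To make the idea of semi-structured metadata concrete, here is a minimal sketch that reads header fields from a raw email using Python's standard library. The message, addresses and the `extract_metadata` helper are hypothetical, invented purely for illustration; real review platforms handle far more formats and edge cases.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical message, invented for illustration only.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Subject: Q3 figures
Date: Mon, 04 Mar 2024 09:15:00 +0000

Can we talk before the numbers go out?
"""

def extract_metadata(raw: str) -> dict:
    """Pull the semi-structured header fields out of a raw email."""
    msg = message_from_string(raw)
    return {
        # getaddresses returns (display name, address) pairs.
        "sender": getaddresses(msg.get_all("From", []))[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "subject": msg["Subject"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
    }

metadata = extract_metadata(RAW_EMAIL)
print(metadata)
```

The free-text body stays unstructured, but the fields above can be filtered and sorted much like database columns, for example by sender or by date range.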
Creating order from chaos
As the volume of unstructured data has skyrocketed, so too has the speed of development of technology capable of processing it. Natural Language Processing (NLP) is a prime example, giving rise to tools like Alexa, Siri and customer support chatbots while laying the foundation for next-generation AI technologies.
These technologies are now routinely applied in eDiscovery to unlock the investigative potential of unstructured data. They use NLP to introduce a degree of order to the “chaos”: enriching unstructured data through entity extraction, then detecting and grouping content by concept to build relationships that link seemingly disparate facts and circumstances into more cohesive fact patterns.
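As a rough illustration of how extracted entities can link otherwise unrelated documents, the sketch below pulls email addresses and monetary amounts out of free text and records which documents mention each one. It assumes simple regular expressions as a crude stand-in for a trained NLP model, and the documents, patterns and function names are all hypothetical; it does not represent how any particular eDiscovery product works.

```python
import re
from collections import defaultdict

# Hypothetical documents, invented for illustration only.
DOCUMENTS = {
    "email-001": "Please wire $25,000 to j.smith@example.com before Friday.",
    "chat-014": "Did j.smith@example.com confirm the invoice yet?",
    "memo-007": "The $25,000 payment was approved without a purchase order.",
}

# Regular expressions are a crude stand-in for a trained entity recogniser.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
}

def extract_entities(text):
    """Return the set of (label, value) pairs found in the text."""
    return {
        (label, match.group())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    }

def build_entity_index(documents):
    """Map each entity to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index

index = build_entity_index(DOCUMENTS)
# Entities that appear in more than one document link those documents.
shared = {e: sorted(ids) for e, ids in index.items() if len(ids) > 1}
for entity, doc_ids in sorted(shared.items()):
    print(entity, "->", doc_ids)
```

Here an email, a chat message and a memo that share no common thread on the surface are connected through a shared address and a shared amount, which is the kind of relationship a reviewer might then follow up.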
Impressively, NLP-enabled AI technology is also capable of identifying and analysing the sentiments behind the opinions, actions and reactions it extracts from unstructured data. This is particularly valuable for sentiment-rich data sources like communications, which can hold critical insights into the attitudes fuelling specific behaviour.
Not only can this shape the direction of an active investigation, it can also help pre-empt undesirable behaviour when used as part of a proactive eDiscovery strategy.
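The simplest form of sentiment scoring can be sketched with a word list. The example below assumes a tiny, hand-picked lexicon and hypothetical messages; production tools typically rely on trained language models rather than word counts, so treat this as an illustration of the idea only.

```python
import re

# Tiny, hand-picked word lists, invented for illustration only.
POSITIVE = {"great", "happy", "pleased", "thanks", "good"}
NEGATIVE = {"angry", "unfair", "furious", "worried", "bad"}

def sentiment_score(text: str) -> float:
    """Score text from -1.0 (negative) to 1.0 (positive); 0.0 if no cue words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# Hypothetical messages, invented for illustration only.
MESSAGES = [
    "Thanks, I am really pleased with the outcome.",
    "This is unfair and I am furious about it.",
    "Please send the report by Monday.",
]

scores = [sentiment_score(m) for m in MESSAGES]
# Surface the most negative messages first for reviewer attention.
for score, message in sorted(zip(scores, MESSAGES)):
    print(f"{score:+.2f}  {message}")
```

Ranking messages by score gives reviewers a starting point for triage; the scores are a prompt for human review, not a conclusion in themselves.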
Navigating known sentiment analysis challenges
NLP-driven sentiment analysis is good, but it’s far from perfect. After all, even humans struggle to accurately identify sentiment at times (particularly when voice and body language cues are removed from the picture).
While technology is advancing rapidly, there remain areas in which sentiment analysis is known to struggle, including missing context, irony and sarcasm. And whilst emojis might be considered to provide semi-structured context, the human traits of irony and sarcasm extend into their use too.
Overcoming these issues is often a matter of knowing exactly how to deploy your tools for optimal results. It’s one of the many reasons an expert hand at the tiller tends to deliver more accurate and reliable results.
Unlocking the full fact picture
Of course, there’s not much point in analysing communication patterns, sentiments and anomalies if you can’t organise the (likely vast) volume of results into a cohesive fact picture that reveals any unusual relationships and/or issues.
In this, we cannot overstate the advantage of data visualisation tools like Brainspace.
Brainspace takes advanced visual analytics in Reveal to the next level, compiling investigative insights into a consolidated fact picture while providing reviewers with an intuitive platform from which to digest, navigate and explore all potential leads and otherwise relevant insights.
This ability to follow the right threads down the right rabbit holes – all without losing sight of the big picture – makes every difference when it comes to extracting value from a rising flood of unstructured data.