Getting the prompt right is critical: the art of prompt engineering
When Reveal Ask is enabled on a matter, a more complex semantic index is built which leverages natural language processing (NLP) techniques to help you find data based on the interpreted intent or contextual meaning of the query, rather than on purely lexical (search-term) matching.
One clear observation that we and many of our clients have drawn is that getting the prompt right is critical. To do that well, we need to look a little further beneath the covers and understand the function of, and the relationship between, the two boxes in the Ask interface:

- Neither box is the actual prompt that grounds the LLM itself.
- The Question box in Ask is just that: a question box. The Additional Instructions box provides instructions to the LLM to support its synthesis of the output.
- The actual prompt is a long, proprietary, internal construct that draws on various inputs, not least of which are the question and additional instructions the user provides.

Having a better understanding of the effect of these two boxes is critical to constructing the best queries and hence to getting optimal results from Ask.
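To make this concrete, here is a minimal, purely illustrative sketch of how a retrieval-augmented prompt of this general kind is typically assembled. It is not Reveal's actual prompt; the template, function name and variable names are our own assumptions.

```python
# Illustrative only: a generic retrieval-augmented prompt template, not Reveal's internal prompt.
def build_prompt(question: str, additional_instructions: str, snippets: list[str]) -> str:
    """Assemble a hypothetical grounding prompt from the user's two inputs plus retrieved snippets."""
    context = "\n\n".join(f"[Snippet {i + 1}]\n{text}" for i, text in enumerate(snippets))
    return (
        "You are assisting with a document review.\n"
        "Using ONLY the snippets below, answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        f"Response guidance: {additional_instructions}\n"
    )

# The question drives which snippets are retrieved (see the next section);
# the additional instructions only shape how the answer is written.
```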
The Question box – Search
As mentioned, when Ask is enabled on a matter, the system builds a more comprehensive semantic index. It is the key terms and phrases used in the Question box that the engine uses to filter result snippets from that index, so getting the question right is critical to ensuring the most relevant snippets of content are selected. When you submit a question to Reveal Ask, the query run against the semantic index uses the terms and phrases in the question quite literally to identify the content most likely to be relevant and pass it to the response synthesis stage.

As an example, imagine you want to detect exchanges of pricing information via email or messages relating to a particular product. A generic question such as “Identify examples of information exchange” will prioritise the selection of content responsive to those broad terms, which is unlikely to yield a good response. By comparison, a more specific question such as “Identify emails and messages where pricing for product X is discussed” will use its terms to surface more relevant snippets to pass to synthesis, and hence a superior response.
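The underlying mechanism is, in broad terms, an embedding-based similarity search. The sketch below uses a generic open-source sentence-embedding model (not anything specific to Reveal) to show why the literal wording of the question determines which snippets are retrieved; the snippets, model choice and function name are purely illustrative.

```python
# Illustrative only: generic semantic retrieval over a pre-built snippet index,
# not Reveal's proprietary engine. Assumes the sentence-transformers package.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

snippets = [
    "Email from A to X proposing a 5% increase in the list price of product X.",
    "Message thread discussing holiday cover for the sales team.",
    "Spreadsheet notes comparing product X pricing across regions.",
]
snippet_embeddings = model.encode(snippets, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2):
    """Rank snippets by cosine similarity to the question's embedding."""
    question_embedding = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(question_embedding, snippet_embeddings, top_k=top_k)[0]
    return [(snippets[hit["corpus_id"]], hit["score"]) for hit in hits]

# A vague question and a specific one pull back different snippets:
print(retrieve("Identify examples of information exchange"))
print(retrieve("Identify emails and messages where pricing for product X is discussed"))
```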
The Instructions box – LLM Reasoning
The Instructions box provides the LLM with guidance on how it should construct its synthesised response. It has no influence on the content selected from the semantic index from which the response is built, as it is applied after those snippets have been returned. Accordingly, getting the question ‘right’ can have significant benefits. As a general rule of thumb, focus on query design in the first box and on the prioritisation and structure of the response in the second. Within these general guidelines, our advice is to experiment with variations of the inputs, but some areas to consider include:
- Explicitly including the key topics or themes of the query in the prompt rather than being generic. For example:
- Rather than “Identify any examples of pricing collusion in this data set”
- Try “Detail any instances where discussion on pricing has taken place between Party A and any of Parties X, Y and Z”
- Consider the complexity of the query. Rather than trying to build too many aspects into the desired response, break the query up into smaller tasks. This appears to reduce the likelihood of hallucination or confusion in the response, as the semantic search task that is run is more focused. For example:
- “Detail any instances where discussion on pricing has taken place between Party A and any of Parties X, Y and Z, and in each case identify the persons involved and the specifics of the pricing discussion being had” may result in confusion between the discussions identified, with some conflation of facts between exchanges. Splitting this into one question that identifies the pricing discussions, followed by narrower questions on the participants and specifics of each, tends to produce a cleaner result.
- Be judicious with terms like ‘top’, ‘key’ and ‘most’. As described above, such terms are interpreted quite literally in the search phase of the process and so are not necessarily valuable. The engine is not designed to provide exhaustive lists of facts, and an understanding of its internal process helps here: it will always try to return the ‘top’ results irrespective of the instruction, returning the (currently) 100 most semantically relevant hits before invoking a further proprietary relevance-ranking algorithm to sequence what is sent to the LLM for synthesis. Even when asking for a summary of events, be aware that if the richness of the content set yields more than 100 snippets, the response may be truncated while further content goes unconsidered. [See comments on granularity later.] A sketch of this retrieve-and-rerank behaviour follows this list.
- Use Additional Instructions wisely. As mentioned above, these provide guidance to the engine on the prioritisation and structure of the synthesised response. In our experience, always asking for detailed responses as part of the additional instructions generally yields a fuller and better-structured response, especially when combined with requesting results in a timeline. [Note: Try also requesting results in tabular format if you haven’t already.]
- On the topic of timelines, note that these are derived purely from the body text of the content, not from any associated metadata. If you want to restrict searches and questions to a specific date range based on metadata, use the filtering in the dashboard view first to pre-select content, and then ask your question of that corpus (the sketch below also illustrates this kind of pre-filtering).
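Pulling the last two points together, here is a minimal sketch of a pre-filter, retrieve and rerank pipeline of the kind described above. The 100-snippet cut-off reflects the limit mentioned earlier; everything else (the function, its parameters and the stand-in scoring step) is our own assumption, not Reveal's implementation.

```python
# Illustrative only: a generic pre-filter / retrieve / rerank pipeline,
# not Reveal's proprietary engine.
from dataclasses import dataclass
from datetime import date
from typing import Callable

@dataclass
class Snippet:
    text: str
    doc_date: date  # metadata date of the parent document

def select_snippets(
    question: str,
    snippets: list[Snippet],
    score: Callable[[str, str], float],  # stand-in for the semantic index's similarity scoring
    date_from: date | None = None,
    date_to: date | None = None,
    top_k: int = 100,  # the engine currently passes at most ~100 snippets to synthesis
) -> list[Snippet]:
    # 1. Metadata pre-filter: the equivalent of dashboard filtering before you ask.
    if date_from and date_to:
        snippets = [s for s in snippets if date_from <= s.doc_date <= date_to]

    # 2. Semantic retrieval: the 'top' hits are always taken,
    #    whether or not the question says 'top', 'key' or 'most'.
    candidates = sorted(snippets, key=lambda s: score(question, s.text), reverse=True)[:top_k]

    # 3. A further proprietary relevance ranking then sequences what is sent to the LLM;
    #    here the semantic ordering simply stands in for that step.
    return candidates
```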
GenAI for eDiscovery: Success stories using Reveal Ask
We’ve been testing out Reveal Ask with our clients to evaluate the potential for GenAI in their eDiscovery projects. Find out about our successes, what we’ve learnt along the way and our views on where the technology will take us next.
1. Introduction, a reminder of how GenAI works in eDiscovery and a practical example of using Reveal Ask.
2. The art of prompt engineering and what we’ve learnt about the importance of getting the prompt ‘right’.
3. How to use Ask to accelerate an investigation: our learnings and strategies for success.
4. Getting the most from the integrated tool set – the power of Ask in combination with the wider Reveal toolset.
5. What works well, which use cases are best suited to the Ask capability, and some observations about future enhancements.