Continuous active learning in antitrust preparation


An Epiq corporate client in the technology sector faced a massive antitrust litigation. Instead of sitting idle in anticipation, the client adopted a more proactive strategy. Ahead of a discovery request, the client instructed its outside counsel to begin reviewing documents, building the legal case, and assessing potential risk. Starting with a handful of key custodians, the client collected documents and emails potentially related to the matter and provided them to Epiq. This targeted collection contained 240,000 documents. Despite the targeted approach, fewer than 6 percent of the collected documents were relevant to the antitrust matter. This is not unusual in technology matters, where search terms are often over-inclusive. However, with the potential matter looming, the client needed to complete this review quickly and inexpensively.


Epiq recommended the technology platform NexLP to streamline first-level review. NexLP’s predictive coding tool, Cosmic, provides a fast and efficient way to focus the review team on the documents most likely to matter. In this case Epiq recommended using existing seed documents to kick-start the process, followed by a predictive coding process called Infinite Learning (continuous active learning) to keep the review team focused on the most relevant content. NexLP Cosmic classifies documents based on a combination of example documents provided up front and ongoing review decisions. Predictive coding uses these examples to score every document in the collection on a range of 0 to 100. A score of 0 indicates a document that is almost certainly not relevant; a score of 100 indicates one that is almost certainly relevant.


Outside counsel provided a small set of 60 seed documents to kick-start the process. The examples were a mix of relevant and not relevant documents and emails. Although the sample was small, the NexLP Cosmic predictive coding algorithm was able to learn from it and identify more documents that were likely to be relevant. The platform did so by evaluating not just the text of the example documents but other aspects as well, such as who sent the email, who it was sent to, the email domains in the distribution, and the kinds of companies or associations referenced in the examples, among many other elements. NexLP includes many metadata features as variables in the predictive coding model. The NexLP Cosmic classifier ingested this information and then provided initial document scores. The highest-scoring documents were immediately batched out to a team of contract attorney reviewers, since these had a much higher prevalence of relevant content than the overall population.
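To make the idea of combining text with metadata concrete, here is a minimal Python sketch of how a single email might be turned into features for a classifier. It is an illustration only and not NexLP's actual feature set or API: the function name `email_features` and the dictionary keys `sender`, `recipients`, and `body` are assumptions made for this example.

```python
def email_features(email):
    """Turn one email into a set of feature strings.

    `email` is a dict with hypothetical keys: "sender", "recipients", "body".
    Text tokens and metadata share one feature space, so a classifier can
    weigh who was on the email alongside what it says.
    """
    # One feature per distinct word in the body.
    features = {f"text:{tok}" for tok in email["body"].lower().split()}

    # Who sent the email.
    features.add(f"from:{email['sender'].lower()}")

    # Who received it, plus the recipient's email domain.
    for rcpt in email["recipients"]:
        rcpt = rcpt.lower()
        features.add(f"to:{rcpt}")
        features.add(f"domain:{rcpt.rsplit('@', 1)[-1]}")

    return features


example = {
    "sender": "Alice@corp.example",
    "recipients": ["bob@rival.example"],
    "body": "Pricing call",
}
example_features = email_features(example)
```

Prefixing each feature with its source (`text:`, `from:`, `domain:`) keeps a word in the body distinct from the same string appearing in an address.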

Once the review team began coding documents, their decisions were fed back into NexLP Cosmic to update the predictive coding model. The review batches consisted largely of high-scoring documents, with a small mix of documents selected by the system to improve the model.
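The feedback loop described above can be sketched in a few lines of Python. This is a simplified illustration of continuous active learning in general, not the Cosmic algorithm: the token log-odds model, the 0 to 100 logistic scaling, the batch size, and the stopping rule (stop when a batch contains no relevant documents) are all assumptions chosen for brevity. It also omits the small share of system-selected documents mentioned above.

```python
import math
from collections import Counter


def train(labeled):
    """Build smoothed per-token log-odds weights from (text, is_relevant) pairs."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else non).update(set(text.lower().split()))
    n_rel = sum(1 for _, is_relevant in labeled if is_relevant)
    n_non = len(labeled) - n_rel
    return {
        tok: math.log((rel[tok] + 1) / (n_rel + 2))
        - math.log((non[tok] + 1) / (n_non + 2))
        for tok in set(rel) | set(non)
    }


def score(weights, text):
    """Map summed log-odds onto a 0-100 scale with a logistic function."""
    z = sum(weights.get(tok, 0.0) for tok in set(text.lower().split()))
    return 100.0 / (1.0 + math.exp(-z))


def continuous_active_learning(docs, seeds, review, batch_size=2, max_rounds=10):
    """Retrain, rank unreviewed documents, review the top batch, and repeat."""
    labeled = dict(seeds)  # doc_id -> reviewer decision (True = relevant)
    for _ in range(max_rounds):
        weights = train([(docs[d], label) for d, label in labeled.items()])
        unreviewed = [d for d in docs if d not in labeled]
        if not unreviewed:
            break
        ranked = sorted(unreviewed, key=lambda d: score(weights, docs[d]), reverse=True)
        decisions = {d: review(d) for d in ranked[:batch_size]}
        labeled.update(decisions)
        if not any(decisions.values()):
            break  # hypothetical stopping rule: the batch held nothing relevant
    return labeled


# Toy corpus; `truth` stands in for the human reviewers' decisions.
docs = {
    "a": "pricing agreement with competitor",
    "b": "lunch menu friday",
    "c": "competitor pricing strategy call",
    "d": "office party lunch",
    "e": "market share pricing competitor memo",
    "f": "parking garage notice",
    "g": "holiday schedule update",
    "h": "lunch order form",
}
truth = {"a": True, "b": False, "c": True, "d": False,
         "e": True, "f": False, "g": False, "h": False}

result = continuous_active_learning(docs, {"a": True, "b": False}, truth.__getitem__)
```

In this toy run the loop finds every relevant document while leaving the two lowest-scoring documents unreviewed, which mirrors on a small scale how prioritized review can stop well short of the full collection. A real project would rely on a statistically grounded stopping criterion rather than the simple rule used here.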


Although 240,000 documents were initially provided for review, only 25,000, including attachments and parent emails, were reviewed by the contract review team. The remaining 215,000 documents were found to be clearly not relevant through both statistical and selective sampling. This enabled outside counsel to complete the assessment of risk and the validity of legal arguments much faster than initially planned. The project was also completed significantly under budget. As a result, additional key custodians were added to the process, enabling a more thorough preparation for the upcoming matter.
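The figures above, and the kind of statistical sampling used to validate an unreviewed set, can be illustrated with a short Python sketch. The document counts come from the case study; everything else is an assumption. In particular, the case study does not say which sampling method or sample size was used, so the function `elusion_upper_bound` (a Wilson score upper bound on the share of relevant documents left in the unreviewed set) is one common approach shown for illustration only.

```python
import math

TOTAL_DOCS = 240_000     # from the case study
REVIEWED_DOCS = 25_000   # from the case study

unreviewed_docs = TOTAL_DOCS - REVIEWED_DOCS   # 215,000 documents set aside
review_fraction = REVIEWED_DOCS / TOTAL_DOCS   # roughly 10.4 percent reviewed


def elusion_upper_bound(sample_size, relevant_found, z=1.96):
    """Wilson score upper bound on the relevant rate in the unreviewed set.

    sample_size:    documents drawn at random from the unreviewed set
    relevant_found: how many of those a reviewer judged relevant
    z:              normal quantile (1.96 for a two-sided 95 percent interval)
    """
    p = relevant_found / sample_size
    denom = 1.0 + z * z / sample_size
    center = (p + z * z / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return center + half_width
```

As a purely hypothetical usage example (these numbers are not from the case study), `elusion_upper_bound(1000, 0)` returns a bound of about 0.4 percent, meaning a random sample of 1,000 unreviewed documents with no relevant hits would support the claim that very little relevant material remains.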





© 2021 Epiq. All rights reserved.
