Yes, eDiscovery predictive coding is super cool. But can we also agree that email threading just works?
For the last decade, predictive coding has dominated e-Discovery conferences and preoccupied industry thought leadership. In practice, despite the buzz – and several generations of technology development – predictive coding is still used in fewer than 1 percent of cases. That’s a trivial adoption rate for any technology, but especially for one that gets so much attention.
Are we missing the e-Discovery forest for the predictive coding trees? Let’s take a step back from whatever mysterious allure has gripped our collective imagination and ask: What’s the goal? Simply put, it is to reduce the volume of data that needs human review.
Most companies have too much electronic data to manage without a volume reduction strategy. Predictive coding is often fantastic when used, but why not get some buzz going for reducing volumes in the other 99 percent of cases? There are other technologies – including email threading, near-duplicate identification, and clustering – that apply to nearly every eDiscovery case, yet are also woefully underused.
Email threading vs. predictive coding
Let’s take a closer look at email threading. This is an already “solved” technology that is proven to achieve, right now, huge reductions in data sizes and documents to be reviewed.
Email threading works on a straightforward premise: reduce volume by removing repetitive content. A typical review has lots of duplicate emails (e.g., replies, forwards, and copies in every recipient’s inbox). But all that’s really needed is a single instance of each unique piece. Threading accomplishes this by reconstructing the entire conversation to find the most “inclusive email” (usually the latest one in the thread), which shows the full text of every message transmitted. Less complete earlier versions are excluded. If an earlier email had an attachment that was later dropped, that earlier email is kept so the attachment is picked up. If the conversation branches out in different directions, the most inclusive email in each of those branches is also included, so no content is eliminated unless it is truly redundant. The idea is to ensure that a human reviewer sees every bit of unique content, however small.
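To make the premise concrete, here is a minimal Python sketch of how inclusive emails could be picked out of a thread. It is an illustration under stated assumptions, not how any particular threading product works: the `Email` class, its `parent_id` field, and the `inclusive_emails` function are hypothetical names, and the sketch assumes every reply quotes the full text of the message it answers, which real mail does not always do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    """Hypothetical, simplified message record (an assumption for this sketch)."""
    msg_id: str
    parent_id: Optional[str] = None       # message this one replies to, if any
    attachments: frozenset = frozenset()  # attachment names on this message


def inclusive_emails(emails):
    """Return the ids of the messages a reviewer still needs to read.

    Assumes each reply quotes the full text of its parent, so the last
    message in each branch contains all earlier text in that branch.
    """
    by_id = {e.msg_id: e for e in emails}
    children = {e.msg_id: [] for e in emails}
    for e in emails:
        if e.parent_id in children:
            children[e.parent_id].append(e.msg_id)

    def attachments_below(msg_id):
        # Collect attachments found on any later message in this branch.
        found = set()
        stack = list(children[msg_id])
        while stack:
            current = stack.pop()
            found |= by_id[current].attachments
            stack.extend(children[current])
        return found

    keep = []
    for e in emails:
        is_last_in_branch = not children[e.msg_id]
        dropped_later = e.attachments - attachments_below(e.msg_id)
        # Keep the end of every branch, plus any earlier message whose
        # attachment does not appear on any later message.
        if is_last_in_branch or dropped_later:
            keep.append(e.msg_id)
    return keep


thread = [
    Email("m1", attachments=frozenset({"contract.pdf"})),
    Email("m2", parent_id="m1"),  # reply; attachment dropped
    Email("m3", parent_id="m2"),  # end of the first branch
    Email("m4", parent_id="m1"),  # second branch off the original
]
print(inclusive_emails(thread))
```

In this made-up thread, four emails shrink to three: m2 is set aside because m3 repeats its text, while m1 stays because its attachment never reappears later in the conversation.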
Everything about email threading is uncontroversial and proven. There is no debate about its effectiveness. It cuts costs. Clients usually pay an upfront fee, but it’s a minuscule expense compared to the big savings downstream, since attorneys don’t have to review lots of redundant content. Email constitutes roughly 80 percent of all e-Discovery data. Threading cuts that volume by one-third to one-half. That’s significant savings at any scale. Even implementation is easy. The software brings together the email from wherever it lives, which might be across multiple platforms, such as Outlook, Gmail, and IBM Notes. And there are no defensibility issues, since it doesn’t do probabilistic reductions.
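A quick back-of-the-envelope calculation shows what those percentages mean for a review. The Python sketch below uses the article's figures (email at roughly 80 percent of the data, threading removing one-third to one-half of it); the 100,000-document collection and the function name `docs_after_threading` are hypothetical, chosen only for illustration.

```python
def docs_after_threading(total_docs, email_share=0.80, threading_cut=1 / 3):
    """Estimate documents left to review once threading removes redundant email.

    email_share and threading_cut default to the rough figures quoted in
    the article; real collections will vary.
    """
    emails = total_docs * email_share
    removed = emails * threading_cut
    return round(total_docs - removed)


total = 100_000  # hypothetical collection size
low_savings = docs_after_threading(total, threading_cut=1 / 3)
high_savings = docs_after_threading(total, threading_cut=1 / 2)
print(low_savings, high_savings)  # 73333 60000
```

Under those assumptions, a 100,000-document review drops to somewhere between roughly 60,000 and 73,000 documents, which is about 27 to 40 percent less for attorneys to read.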
Electronic discovery email threading: Standard procedure
Despite all of its advantages, email threading suffers from a low adoption rate. At 35 percent usage, it is far more prevalent than predictive coding at 1 percent, but that’s still too low. Every review involving email should use threading. For some projects, predictive coding may be an appropriate additional option, but running the threading software should be standard procedure. Every service provider should offer it, or explain why they don’t. Clients should expect it by default, and ask (or direct their attorney to ask) about it when hiring a provider.
So why the low adoption? Workflow strategies generally don’t happen in a vacuum. Decisions are made in the context of live cases, with pressing deadlines and specific requests. Legal teams have little opportunity to mull over the implications of a new process when they “already know what works.”
I’m sure there will be another round of sessions on predictive coding at the next big eDiscovery conference. And that’s fantastic. But can we get at least one session for email threading? Maybe we can get adoption over 50 percent.