Efficient and Effective eDiscovery Uses Continuous Active Learning (CAL)

Most document review today uses some form of technology-assisted review (TAR). TAR uses computer software to categorize documents as responsive or nonresponsive based on human review of a subset of the collection. TAR tools can also prioritize documents from most to least likely to be responsive, so human reviewers see the likeliest documents first and have to manually review far fewer documents overall.

While lawyers may worry that computer-assisted review of electronically stored information (ESI) will fail to retrieve relevant documents, studies comparing manual review with TAR have found TAR to be superior, provided the underlying algorithm has been properly implemented and trained.

Continuous active learning’s list looks like a search engine’s

Most TAR tools use supervised machine learning. In supervised machine learning, an algorithm ranks a document collection based on features it learns from training documents. The learning algorithms used for TAR should not be confused with the unsupervised machine-learning algorithms used for clustering, near-duplicate detection, and latent semantic indexing, which receive no input from the user and do not rank or classify documents by relevance.

In supervised machine learning, the “learner” (the computer algorithm) infers how to distinguish relevant from non-relevant documents by examining training examples: documents a human teacher has already coded as relevant or non-relevant.
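To make the idea of learning from coded examples concrete, here is a minimal sketch. It assumes a toy multinomial Naive Bayes classifier over word tokens; the source does not say which algorithm any TAR tool uses, and the class and method names (`NaiveBayesLearner`, `train`, `score`, `rank`) are hypothetical. The learner sees only documents a human has coded, then ranks uncoded documents by their estimated log-odds of relevance.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a document and split it into word tokens (the 'features')."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLearner:
    """Toy stand-in for a TAR classifier (hypothetical, for illustration only).

    Trained only on human-coded examples: True = relevant, False = non-relevant.
    """

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # word counts per class
        self.docs = {True: 0, False: 0}                    # documents per class

    def train(self, coded: list[tuple[str, bool]]) -> None:
        for text, relevant in coded:
            self.counts[relevant].update(tokenize(text))
            self.docs[relevant] += 1

    def score(self, text: str) -> float:
        """Estimated log-odds that the document is relevant."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        # Prior log-odds, smoothed so an empty class does not divide by zero.
        log_odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for tok in tokenize(text):
            # Add-one (Laplace) smoothing, with one extra slot for unseen words.
            p_rel = (self.counts[True][tok] + 1) / (totals[True] + len(vocab) + 1)
            p_non = (self.counts[False][tok] + 1) / (totals[False] + len(vocab) + 1)
            log_odds += math.log(p_rel / p_non)
        return log_odds

    def rank(self, docs: list[str]) -> list[str]:
        """Order documents from most to least likely to be relevant."""
        return sorted(docs, key=self.score, reverse=True)


if __name__ == "__main__":
    learner = NaiveBayesLearner()
    learner.train([
        ("pricing agreement with competitor", True),
        ("competitor price fixing email", True),
        ("office holiday party menu", False),
        ("parking garage closure notice", False),
    ])
    print(learner.rank(["lunch menu for friday", "email about competitor pricing"]))
    # ['email about competitor pricing', 'lunch menu for friday']
```

Real TAR tools use richer features and other learning algorithms; the point here is only that the ranking comes entirely from documents a human has already coded.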

Three types of technology-assisted review in eDiscovery

TAR usually employs one of three protocols: SAL (simple active learning), SPL (simple passive learning), or CAL (continuous active learning). In SPL, training documents are typically selected at random; in SAL, the algorithm selects the documents it is least certain about; in both, training stops at some point and the resulting ranking is used for review. CAL is simpler than SAL and SPL because it does not require careful creation of seed sets, a decision about when to stop training, or the selection and review of large random control sets, training sets, or validation sets.

After it is trained on the initial training set, a computer using CAL repeatedly selects the documents most likely to be relevant for review and coding, adds the reviewers' coding decisions to its training, and continues until it can find no more relevant documents. CAL resembles an internet search engine because it presents documents ranked from most to least likely to be relevant. As review proceeds, CAL uses each reviewer's feedback to refine its judgment about which of the remaining documents are most likely to be relevant.
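The CAL loop described above can be simulated in a few lines. This is a minimal sketch under several assumptions: a toy word-weighting model stands in for the real learner, a `review` callback stands in for the human reviewer, documents are reviewed in small batches, and review stops at the first batch that contains no relevant documents. That stopping rule is a hypothetical simplification of "until it can find no more relevant documents", and all function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Distinct lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def train_scorer(docs: list[str], coded: dict[int, bool]) -> Callable[[int], float]:
    """Fit a toy word-weight model on every document coded so far.

    Each word gets weight log((relevant docs containing it + 1) /
    (non-relevant docs containing it + 1)); a document's score is the sum.
    """
    rel, non = Counter(), Counter()
    for doc_id, relevant in coded.items():
        (rel if relevant else non).update(tokenize(docs[doc_id]))

    def score(doc_id: int) -> float:
        return sum(math.log((rel[t] + 1) / (non[t] + 1)) for t in tokenize(docs[doc_id]))

    return score


def continuous_active_learning(
    docs: list[str],
    seed: dict[int, bool],
    review: Callable[[int], bool],
    batch_size: int = 2,
) -> list[int]:
    """Simulate a CAL review and return document ids in the order they were coded."""
    coded = dict(seed)  # initial training set: {doc_id: human coding}
    order = list(seed)
    while len(coded) < len(docs):
        score = train_scorer(docs, coded)  # retrain on all coding decisions so far
        uncoded = [i for i in range(len(docs)) if i not in coded]
        batch = sorted(uncoded, key=score, reverse=True)[:batch_size]
        found = 0
        for doc_id in batch:  # reviewer codes the top-ranked documents
            coded[doc_id] = review(doc_id)
            order.append(doc_id)
            found += coded[doc_id]
        if found == 0:  # assumed stopping rule: a batch with no relevant hits
            break
    return order


if __name__ == "__main__":
    docs = [
        "competitor pricing agreement", "holiday party menu",
        "email to competitor about pricing", "parking garage notice",
        "pricing call with competitor", "party menu update",
        "garage notice update", "menu for holiday lunch",
    ]
    labels = [True, False, True, False, True, False, False, False]
    print(continuous_active_learning(docs, {0: True, 1: False}, lambda i: labels[i]))
    # [0, 1, 2, 4, 3, 6]
```

In this toy run, the relevant documents 2 and 4 surface in the first batch, the next batch turns up nothing relevant, and documents 5 and 7 are never reviewed.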

CAL saves time and money

Research comparing CAL with other TAR protocols shows that CAL achieves better efficiency and effectiveness. These comparisons show that a review team using protocols other than CAL would have to review substantially more documents to find the same relevant material. In one example, 50,000 more documents would need to be manually reviewed. At an assumed review cost of $1 per document, CAL would save $50,000.
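The savings arithmetic depends on how many documents each protocol requires a team to review before it has found the same number of relevant documents. A minimal sketch, assuming each protocol's review order is recorded as a list of the reviewer's relevant/non-relevant decisions; the function names are hypothetical, and the $1-per-document rate is the example's own assumption.

```python
def docs_to_find(order: list[bool], target: int) -> int:
    """Number of documents reviewed, in ranked order, before `target` relevant ones are found."""
    found = 0
    for reviewed, relevant in enumerate(order, start=1):
        found += relevant
        if found >= target:
            return reviewed
    raise ValueError("this review order never reaches the target")


def review_savings(cal_order: list[bool], other_order: list[bool], target: int,
                   cost_per_doc: float = 1.00) -> tuple[int, float]:
    """Extra documents the other protocol needs, and what they cost to review."""
    extra = docs_to_find(other_order, target) - docs_to_find(cal_order, target)
    return extra, extra * cost_per_doc
```

Applied to the figures in the example, 50,000 extra documents at $1.00 per document comes to $50,000 in review costs that CAL avoids.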

Provided the TAR tool uses appropriate underlying algorithms, CAL will achieve superior results with less review effort than the other protocols.
