Practical Considerations In Using Predictive Coding

BY GARETH EVANS AND JENNIFER REARDEN

Predictive coding has tremendous appeal, at least in theory. As a practical matter, however, many have been deterred from using it because various hurdles can arise. Nevertheless, with some forethought and preparation, and by involving those with the right expertise, many of the hurdles can be overcome, or at least minimized, and parties may more often realize the potential benefits of predictive coding.

What Is Predictive Coding?

Predictive coding—often referred to as “technology assisted review” or “TAR”—uses mathematical and statistical algorithms to determine whether documents are likely to be relevant. To do so, it utilizes machine learning, in which reviewers code sample documents drawn from the overall document population. Essentially, the predictive coding tool identifies other documents in the population that share similar features with the sample documents coded as “positive” (i.e., relevant or responsive) or “negative” (i.e., irrelevant or non-responsive).

How Does It Work?

To understand how to make predictive coding practical, you first need to have a general understanding of how it works.

The traditional workflow for predictive coding has involved commencing machine learning with a “seed set” of pre-coded documents. The seed set can consist of a sample selected at random, through the use of initial search terms, documents already determined to be relevant, or through other means.

After processing the seed set, machine learning is then refined through iterative review of “training sets.” These are batches of documents that the tool selects for reviewers to code until the predictive coding model is “stabilized,” i.e., when additional training does not result in any meaningful improvement in results.

Some predictive coding tools select training documents strategically instead of just randomly, e.g., documents that appear to be close to the boundary between “positive” and “negative,” or samples from clusters of similar documents. Using these techniques, the model may achieve stabilization more quickly.

The tool then applies the learning from the seed and training sets to the entire document population. It identifies the likelihood that the remaining documents are either “positive” or “negative,” often with relevance scores. A higher score does not necessarily mean that a document is more relevant, but rather that the tool has determined that it has a greater likelihood of being relevant.
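To make the iterative workflow concrete, the following is a minimal sketch of a training loop of the kind described above. It is illustrative only: it assumes scikit-learn's TfidfVectorizer and LogisticRegression as a stand-in classifier, a hypothetical get_reviewer_coding function representing human review of a batch, and an arbitrary stability threshold. Actual predictive coding tools use their own proprietary models, selection strategies and stabilization measures.

# Illustrative sketch only: a simplified predictive coding training loop.
# Assumes scikit-learn; real e-discovery tools use proprietary models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_predictive_model(documents, seed_labels, get_reviewer_coding,
                           batch_size=200, stability_threshold=0.01):
    """documents: list of document texts.
    seed_labels: dict mapping document index -> 1 ("positive") or 0 ("negative")
                 for the pre-coded seed set.
    get_reviewer_coding: hypothetical callable that asks human reviewers to
                         code a batch of document indices and returns {index: 0 or 1}.
    """
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(documents)
    labels = dict(seed_labels)          # start from the seed set
    previous_scores = None

    while True:
        model = LogisticRegression(max_iter=1000)
        model.fit(features[list(labels)], [labels[i] for i in labels])

        # Score every document: probability of being "positive" (likely relevant).
        scores = model.predict_proba(features)[:, 1]

        # "Stabilization": stop when further training barely changes the scores.
        if previous_scores is not None:
            if abs(scores - previous_scores).mean() < stability_threshold:
                break
        previous_scores = scores

        # Select the next "training set": uncoded documents closest to the
        # positive/negative boundary (score near 0.5), i.e., strategic selection.
        uncoded = [i for i in range(len(documents)) if i not in labels]
        uncoded.sort(key=lambda i: abs(scores[i] - 0.5))
        labels.update(get_reviewer_coding(uncoded[:batch_size]))

    return model, scores

The boundary-based selection step mirrors the strategic sampling described above; a tool that selects training documents purely at random would simply draw each batch randomly instead.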
Predictive coding can also be effective on foreign language documents, including Asian languages.
Quality Control

An additional step that is frequently taken, although not always deemed necessary, is to “validate” the effectiveness of predictive coding through a quality control check. Reviewers code a random sample drawn from the overall document population, excluding documents from the seed and training sets. This sample is known as the “control sample” or “validation sample.”

The coding of the control sample is then compared to the tool’s decisions on the same documents. If the number of “false positives” and “false negatives” in the predictive coding results—as compared to the control sample—is acceptable, the training is complete. If not, you may seek to improve the results with further training.
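As a rough illustration, this validation step reduces to counting agreements and disagreements between the reviewers' coding of the control sample and the tool's decisions on the same documents. The sketch below is hypothetical; what counts as "acceptable," and the statistics a given tool actually reports, vary by tool and by matter.

# Illustrative sketch: comparing reviewer coding of a control sample
# against the predictive coding tool's decisions on the same documents.
def validate(control_sample_coding, tool_decisions):
    """Both arguments map document ID -> 1 ("positive") or 0 ("negative").
    control_sample_coding covers only the random control sample, which
    excludes seed and training set documents."""
    false_positives = false_negatives = agreements = 0
    for doc_id, human_call in control_sample_coding.items():
        tool_call = tool_decisions[doc_id]
        if tool_call == human_call:
            agreements += 1
        elif tool_call == 1:      # tool said relevant, reviewer said not
            false_positives += 1
        else:                     # tool said not relevant, reviewer said relevant
            false_negatives += 1

    total = len(control_sample_coding)
    return {
        "agreement_rate": agreements / total,
        "false_positive_rate": false_positives / total,
        "false_negative_rate": false_negatives / total,
    }

If the false positive and false negative rates are acceptable, training is complete; otherwise, further training rounds may improve the results.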
Review Before Production

A few years ago, when predictive coding first gained some notoriety as a technology for document review, some envisioned documents being blindly produced after only the “computer” reviewed them.

The typical workflow that has emerged in practice, by contrast, is to review, prior to any production, documents that the predictive coding tool has identified as likely relevant. This allows for false positives—i.e., irrelevant documents—and privileged documents to be removed before production.
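In workflow terms, this pre-production review is a human pass over the tool's likely-relevant set, with documents coded irrelevant or privileged being withheld. The sketch below is hypothetical; the cutoff score, the review categories and the handling of withheld material vary by matter and by tool.

# Illustrative sketch: human review of the tool's likely-relevant documents
# before production, removing false positives and privileged material.
def build_production_set(scores, cutoff, review_document):
    """scores: dict of document ID -> relevance score from the tool.
    cutoff: score above which a document is treated as likely relevant.
    review_document: hypothetical callable returning "relevant", "irrelevant",
                     or "privileged" after human review of a document ID."""
    production_set, privilege_log = [], []
    for doc_id, score in scores.items():
        if score < cutoff:
            continue                      # not identified as likely relevant
        decision = review_document(doc_id)
        if decision == "privileged":
            privilege_log.append(doc_id)  # withheld and logged
        elif decision == "relevant":
            production_set.append(doc_id)
        # "irrelevant" = a false positive; simply not produced
    return production_set, privilege_log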
Continuous Training

Predictive coding technology has been evolving. One noteworthy development has been the appearance of tools utilizing a training methodology known as “continuous active learning” or “CAL.” CAL, in effect, combines the training and final review phases described above.
After initially training the predictive model with a seed set, a CAL tool will present reviewers with documents that it has identified as likely relevant and others it has strategically selected for training. The review continues—and the model is continuously trained—until all the relevant documents have been found at the desired rate of recall.

Vendors of CAL tools claim that they train the predictive model faster and that reviewers end up reviewing fewer irrelevant documents than with other tools.
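A minimal sketch of such a continuous active learning loop follows. It is hypothetical: it reuses the illustrative classifier approach above, assumes an estimate of the total number of relevant documents for the recall calculation, and relies on the same stand-in get_reviewer_coding function. Real CAL tools also mix in strategically selected documents and apply their own stopping criteria.

# Illustrative sketch: a simplified continuous active learning (CAL) loop,
# combining training and final review. Reviewers keep coding the documents
# the model currently ranks as most likely relevant until a target recall
# is reached.
from sklearn.linear_model import LogisticRegression

def continuous_active_learning(features, seed_labels, get_reviewer_coding,
                               estimated_relevant_total, target_recall=0.80,
                               batch_size=100):
    """features: document-feature matrix (e.g., TF-IDF), one row per document.
    estimated_relevant_total: an estimate (e.g., from a random sample) of how
    many relevant documents exist in the whole population."""
    labels = dict(seed_labels)
    found_relevant = sum(labels.values())

    while found_relevant < target_recall * estimated_relevant_total:
        model = LogisticRegression(max_iter=1000)
        model.fit(features[list(labels)], [labels[i] for i in labels])
        scores = model.predict_proba(features)[:, 1]

        # Present reviewers with the highest-scoring uncoded documents, i.e.,
        # the ones the model has identified as most likely relevant.
        uncoded = [i for i in range(features.shape[0]) if i not in labels]
        if not uncoded:
            break  # every document has been reviewed
        uncoded.sort(key=lambda i: scores[i], reverse=True)
        batch = get_reviewer_coding(uncoded[:batch_size])

        labels.update(batch)
        found_relevant += sum(batch.values())

    return labels  # the reviewers' coding doubles as the final review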
What’s in It for the Producing Party?

For the producing party, significantly increased speed, substantial cost savings and improved accuracy are among the potential benefits.

GARETH EVANS and JENNIFER REARDEN are partners at Gibson, Dunn & Crutcher’s Orange County and New York offices, respectively.