SCOTT ROBBIN: How have you seen the discovery process evolve throughout your career?
NATHALIE HOFMAN: The discovery process has evolved tremendously. The first cases I worked on didn’t use any culling mechanisms at all, so, naturally, the responsiveness rates were abysmal. Search terms then came into wider use, and their syntax and application quickly became more sophisticated. The emergence of analytics in the e-discovery space, including substantive search term analysis and early case assessment work, had the biggest impact on discovery and has paved the way for the adoption of machine learning tools and technology-assisted review.
Why has the e-discovery process become such a headache for law departments today?
There is a significant amount of pressure on law departments as business units. Budgets are shrinking, but the demands are increasing because of the volume and complexity of the data. In addition, the landscape is becoming more technical and requires a substantial amount of knowledge. Outsourcing was once optional, but it is now almost a requirement, as in-house departments often do not have the bandwidth, expertise, or architecture to support large e-discovery matters. Having a trusted consultative advisor who can help law departments wade through this landscape, both case by case and in designing an overall process and approach, can be tremendously beneficial.
Can you tell us about Huron Legal’s Integrated Analytics offering? How does Integrated Analytics eliminate up to 70 percent of irrelevant data?
Huron Legal believes the use of Integrated Analytics can provide significant cost savings to our clients. Our innovative Integrated Analytics offering is a hybrid of machine learning and predictive coding and is aimed at reducing the number of non-responsive documents in the review population. The Analytics team, composed of data miners, statisticians, and legally trained technologists, uses state-of-the-art tools and expert-vetted processes to drive significant reductions in the size of the population to be reviewed. Importantly, the integrity of the process is validated through a rigorous, independent statistical standard, Accept on Zero (AOZ), which is also used by the Department of Defense and in industries such as pharmaceuticals and manufacturing. The process is conducted in three phases: the assessment phase, the training phase, and the elimination and review phase.
In the assessment phase, a statistically valid random sample of approximately 2,600 documents is reviewed using a double-blind methodology, in which each document is coded independently by two reviewers. This review may be conducted by the client’s legal team or by highly vetted analytics reviewers in Huron Legal’s state-of-the-art analytics facility. The double-blind approach allows ambiguities in the protocol to be quickly isolated and resolved. The legal team is an integral part of this process, as it provides guidance on the protocol and serves as the arbiter of all conflicts found in the double-blind review. The assessment phase identifies the initial conceptual universe of documents and provides insight into the likely responsiveness rate of the population.
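The assessment-phase arithmetic and the double-blind reconciliation step can be sketched in Python. This is an illustrative sketch, not Huron Legal’s tooling: the function names and sample data are hypothetical, and the confidence level and margin of error are assumptions, since they are not stated here. For example, 99 percent confidence (z ≈ 2.576) with a ±2.5 percent margin of error gives a sample of 2,655 documents under the standard formula, the same order as the roughly 2,600 documents mentioned above.

```python
import math

def sample_size(z: float, margin: float, p: float = 0.5) -> int:
    """Simple random sample size for estimating a proportion.

    Uses n = z^2 * p * (1 - p) / margin^2 with no finite-population correction;
    p = 0.5 is the most conservative assumption about the responsiveness rate.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def double_blind_conflicts(coder_a: dict, coder_b: dict) -> list:
    """IDs of sampled documents on which the two independent reviewers disagree."""
    return sorted(doc for doc in coder_a if coder_a[doc] != coder_b[doc])

def resolve(coder_a: dict, coder_b: dict, arbiter: dict) -> dict:
    """Agreed calls stand; each conflict takes the legal team's ruling from `arbiter`."""
    return {doc: call if call == coder_b[doc] else arbiter[doc]
            for doc, call in coder_a.items()}

def responsiveness_rate(final_calls: dict) -> float:
    """Share of the resolved sample coded responsive (True)."""
    return sum(final_calls.values()) / len(final_calls)

# Hypothetical four-document sample: True = responsive.
a = {"D1": True, "D2": False, "D3": False, "D4": True}
b = {"D1": True, "D2": True, "D3": False, "D4": False}
conflicts = double_blind_conflicts(a, b)  # ["D2", "D4"] go to the legal team
final = resolve(a, b, arbiter={"D2": True, "D4": False})
print(sample_size(2.576, 0.025), conflicts, responsiveness_rate(final))
```

Only the conflicting calls reach the arbiter, which mirrors the role of the legal team described above: agreement between reviewers is accepted, disagreement exposes an ambiguity in the protocol.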
During the training phase, several rounds of samples are reviewed to further “train” the system. Both random samples and targeted samples are created to isolate as many of the concepts present in the collection as possible. The training process is verified through several key statistical and software metrics. The training phase ultimately identifies the documents that are likely non-responsive to the issues in the legal matter.
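The training rounds described above can be illustrated with a few hypothetical helpers. How the targeted samples are actually chosen is not specified; here we assume uncertainty sampling (picking unreviewed documents whose model score is closest to 0.5), a common active-learning technique. Stabilization is likewise approximated by an assumed metric, the share of documents whose predicted label flips between consecutive rounds, compared against an arbitrary threshold.

```python
import random

def next_training_round(scores: dict, reviewed: set, n_random: int,
                        n_targeted: int, seed=None) -> list:
    """Pick the next batch of documents for human review (hypothetical helper).

    Targeted picks: unreviewed documents whose score is closest to 0.5, where the
    model is least certain (uncertainty sampling, an assumed strategy).
    Random picks: a simple random sample of the remaining unreviewed documents.
    """
    pool = [d for d in scores if d not in reviewed]
    targeted = sorted(pool, key=lambda d: abs(scores[d] - 0.5))[:n_targeted]
    chosen = set(targeted)
    remaining = [d for d in pool if d not in chosen]
    rng = random.Random(seed)
    randoms = rng.sample(remaining, min(n_random, len(remaining)))
    return targeted + randoms

def flip_rate(prev: dict, curr: dict) -> float:
    """Fraction of shared documents whose predicted label changed between two rounds."""
    shared = prev.keys() & curr.keys()
    if not shared:
        raise ValueError("rounds share no documents")
    return sum(prev[d] != curr[d] for d in shared) / len(shared)

def is_stable(rounds: list, threshold: float = 0.02, window: int = 2) -> bool:
    """True when each of the last `window` round-to-round flip rates is below threshold."""
    if len(rounds) < window + 1:
        return False
    recent = rounds[-(window + 1):]
    return all(flip_rate(a, b) < threshold for a, b in zip(recent, recent[1:]))

# Hypothetical model scores from the current round.
scores = {"A": 0.10, "B": 0.48, "C": 0.90, "D": 0.55, "E": 0.30}
batch = next_training_round(scores, reviewed={"A"}, n_random=1, n_targeted=2, seed=7)
print(batch)  # starts with "B" and "D", the two least-certain documents
```

Mixing random and targeted picks keeps the training set representative of the whole collection while still spending reviewer time where the model is most uncertain.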
Once the training process is complete, the elimination review phase of the project begins. The documents identified as likely non-responsive during the training phase are isolated, and a sample of those documents is reviewed using the AOZ statistical methodology. Under AOZ, if even a single responsive document is found in the sample, the set is rejected, a triage-and-correction process is triggered, and a new assessment is conducted within the population to ensure conceptually similar documents are isolated and reviewed. A new AOZ sample is then drawn and reviewed to confirm that no additional outliers remain.
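The Accept on Zero check can be sketched as a zero-acceptance-number sampling plan. The sample-size formula below is the standard one for such a plan, not necessarily the parameters Huron Legal uses: if the eliminated set’s true responsive rate were p, the chance that n randomly drawn documents are all non-responsive is (1 - p)^n, so requiring that chance to be at most 1 - C gives n ≥ ln(1 - C) / ln(1 - p). This treats draws as independent, a reasonable approximation when the eliminated set is much larger than the sample. The 1 percent rate and 95 percent confidence defaults are illustrative assumptions, and `is_responsive` is a hypothetical stand-in for human review.

```python
import math
import random

def aoz_sample_size(max_rate: float, confidence: float) -> int:
    """Smallest n such that finding zero responsive documents in n random draws
    supports, at the given confidence, a responsive rate below max_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))

def aoz_check(eliminated_ids: list, is_responsive, max_rate: float = 0.01,
              confidence: float = 0.95, seed=None) -> tuple:
    """Draw an AOZ sample from the eliminated set and apply the accept-on-zero rule.

    `is_responsive` stands in for the human review of each sampled document.
    Returns ("accept" or "reject", list of responsive documents found).
    """
    n = min(aoz_sample_size(max_rate, confidence), len(eliminated_ids))
    sample = random.Random(seed).sample(eliminated_ids, n)
    hits = [doc for doc in sample if is_responsive(doc)]
    # A single responsive document rejects the whole set and triggers triage and correction.
    return ("reject" if hits else "accept"), hits

print(aoz_sample_size(0.01, 0.95))  # 299
```

With a 1 percent rate and 95 percent confidence the plan calls for 299 documents; tightening either parameter increases the sample, which is the trade-off a legal team would weigh when agreeing on the AOZ standard for a matter.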
In parallel with the elimination review, a responsiveness review is conducted on the documents identified as likely responsive. The documents are reviewed in optimized document sets, ensuring excellent quality and efficiency. In addition, the machine learning scores are factored into the quality control and assurance procedures, providing the benefit of technology-assisted quality assurance. Because the Huron Legal process is scalable, the project will be staffed appropriately to ensure the production schedule is met.
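One way machine learning scores can feed quality control, offered here as an assumption since the actual procedure is not described, is to flag documents where a reviewer’s call conflicts with a confident model score and route those for a second look. The function name and the 0.8 and 0.2 thresholds are hypothetical.

```python
def qc_flags(reviewer_calls: dict, scores: dict,
             high: float = 0.8, low: float = 0.2) -> list:
    """Return documents whose human call conflicts with a confident model score,
    most severe disagreement first (hypothetical QC rule)."""
    flagged = []
    for doc, responsive in reviewer_calls.items():
        score = scores[doc]
        if (not responsive and score >= high) or (responsive and score <= low):
            # Distance between the model score and the reviewer's call (1.0 or 0.0).
            flagged.append((abs(score - float(responsive)), doc))
    return [doc for _, doc in sorted(flagged, reverse=True)]

calls = {"A": False, "B": True, "C": True, "D": False}   # True = coded responsive
model = {"A": 0.85, "B": 0.05, "C": 0.70, "D": 0.50}
print(qc_flags(calls, model))  # ['B', 'A']
```

Documents in the uncertain middle band (such as "D") are not flagged, so QC effort concentrates on calls where human and machine disagree most sharply.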
Huron Legal keeps the legal team apprised of progress throughout the process with graphical reports and update meetings. At the completion of the review, an audit trail package is provided to the legal team. Additionally, as part of this offering, Huron Legal retains a team of key resources to assist in defending the entire process, including machine-learning and statistical experts who are willing to testify and/or submit affidavits to support the high standards and reasonableness of the process. These resources are provided at no additional charge to the client.
Can you share any success stories of clients that have adopted the Integrated Analytics solution?
We just completed a case that had a very large data set, a tight deadline, and a difficult opposing party – the e-discovery trifecta! We used the Integrated Analytics process and were able to make significant cuts to the population, thereby allowing us to meet the deadline and reduce the cost of the review.
Before we began the Integrated Analytics process, we completed an Immediate Case Assessment, which not only provided the client with a solid understanding of the document population and a significant number of the key documents in the case, but also allowed us to hone the search terms that were being applied in the case. Once the term list was finalized, we started the Integrated Analytics process and began training the algorithm. Because the time frame was tight, any documents that were not viable candidates for machine learning were routed directly to review. This bifurcated workflow allowed us to be nimble and efficient. Additionally, as a standard part of the process, all documents used as samples for the algorithm are fully reviewed, including confidentiality and privilege decisions, so that documents never have to be re-reviewed and a bank of producible documents already exists by the time the model reaches stabilization.
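The bifurcated workflow can be sketched as a simple routing step. Which documents were treated as poor candidates for machine learning is not specified; the rules below (image and audio/video file types, or too little extracted text) and the 200-character cutoff are hypothetical examples.

```python
from dataclasses import dataclass

# Hypothetical viability rules; real criteria depend on the tool and the matter.
MIN_TEXT_CHARS = 200
NON_TEXTUAL_TYPES = {"jpg", "png", "tif", "wav", "mp4"}

@dataclass
class Doc:
    doc_id: str
    file_type: str
    extracted_text: str

def route(docs: list) -> tuple:
    """Split documents into a machine-learning queue and a direct-review queue."""
    ml_queue, direct_review = [], []
    for doc in docs:
        too_little_text = len(doc.extracted_text.strip()) < MIN_TEXT_CHARS
        if doc.file_type.lower() in NON_TEXTUAL_TYPES or too_little_text:
            direct_review.append(doc.doc_id)   # humans review these immediately
        else:
            ml_queue.append(doc.doc_id)        # candidates for training and scoring
    return ml_queue, direct_review

docs = [Doc("E1", "msg", "x" * 500), Doc("E2", "JPG", ""), Doc("E3", "docx", "short")]
print(route(docs))  # (['E1'], ['E2', 'E3'])
```

Routing these documents up front lets human review start on day one while the algorithm trains on the text-rich remainder.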
We met the deadline without sacrificing a strong audit trail and defensibility and had a very happy client.
How would you respond to companies that are concerned about finding a balance between technologies like predictive coding and more traditional discovery methods?
First and foremost, learning to identify which cases are and are not good candidates for predictive coding is vital. Not every case is a predictive coding case. Many variables can affect a matter’s fit, such as the clarity of the scope, timelines, the ability of counsel to integrate, the trainability of the issues for the algorithm, and volumes. When a matter arises, a strategic discussion with outside counsel, the service provider, and members of the internal legal team will allow for an informed decision that works for all of the players.
What are the major defensibility issues surrounding predictive coding? Have there been any recent court decisions of note?
The major defensibility issues around predictive coding can be divided into two general categories: process and technology. On the process side, a party using predictive coding will need to be prepared to articulate the process used to “train” the predictive coding tool, including how seed sets were identified and how decisions were made on the relevance of documents. Transparency is likely to be key here, with judges and opposing counsel wanting a detailed understanding of the process employed. With respect to the technology, judges and lawyers tend to be skeptical of any “black box” technology, so describing how the software works (in combination with a well-defined process) may be required to obtain a positive ruling on defensibility.
There are several recent court decisions touching on technology-assisted review. The Da Silva Moore case was the first case in which a court approved the use of technology-assisted review. In that case, the protocol provided for transparency, allowed the plaintiffs to participate in the defendants’ predictive coding process, and gave the plaintiffs an opportunity to contest the results. Similarly, in Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct., Loudoun Cty., Order, Apr. 23, 2012), a state court in Virginia held that the defendants could move forward with predictive coding for document production, though it was “without prejudice to the receiving party raising with the court an issue as to completeness or the content of the production or the ongoing use of predictive coding.”
The court in the Kleen Products case was confronted with a slightly different issue: a request by the plaintiffs to require the defendants to use predictive coding for production after the defendants had made a number of large (and expensive) productions using search terms. Judge Nolan held two days of hearings, after which she determined that using additional search terms, rather than restarting the productions with technology-assisted review, was appropriate.
What should clients look for as they evaluate implementing solutions like Integrated Analytics?
There are two primary considerations. First, do you understand the basic process being used by the algorithm to make determinations about documents? If it is unclear how the algorithm’s decisions are being derived, that should raise a flag. You can, and should, expect a clear audit trail to accompany any results, identifying why each document was classified the way it was. Second, is there a well-considered, thorough, documented process being used in conjunction with the technology? Machine learning tools are not “plug and play.” There are many decisions that need to be made along the way, various statistical and data measures to be taken, and checks and balances that should be performed at predetermined intervals. Without a strong process, including a tested and vetted road map for what to do when something goes wrong, you risk sacrificing quality and defensibility.
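As a rough illustration of the audit trail described above, each classification can be logged with enough context to explain it later. The record fields, the 0.5 cutoff, and the JSON Lines output format are assumptions, not a description of any particular product.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

@dataclass(frozen=True)
class AuditEntry:
    doc_id: str
    training_round: int
    score: float
    threshold: float
    classification: str
    basis: str                      # e.g. "model score" or "human review"
    reviewer: Optional[str] = None  # set when a person made or confirmed the call

def classify_with_audit(doc_id: str, score: float, threshold: float,
                        training_round: int, log: List[AuditEntry]) -> str:
    """Classify by score cutoff and record why the document was classified that way."""
    label = "likely responsive" if score >= threshold else "likely non-responsive"
    log.append(AuditEntry(doc_id, training_round, score, threshold, label, "model score"))
    return label

def to_jsonl(log: List[AuditEntry]) -> str:
    """Serialize the log as JSON Lines, one decision per line."""
    return "\n".join(json.dumps(asdict(entry), sort_keys=True) for entry in log)

log: List[AuditEntry] = []
classify_with_audit("D1", 0.73, 0.5, training_round=4, log=log)
classify_with_audit("D2", 0.20, 0.5, training_round=4, log=log)
print(to_jsonl(log))
```

Recording the score, cutoff, and training round alongside each label is what lets a party later answer the question “why was this document classified this way?” without rerunning the model.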
Are there certain clients or industries that are embracing Integrated Analytics more than others? What do companies that are on the fence about implementing need to know?
There are clients at every level and in every sector that are embracing Integrated Analytics. That said, legal teams with strong technical understanding seem to make the leap more quickly than those that are less technology-savvy. If a company is on the fence about using Integrated Analytics, or any other predictive technology for that matter, it should identify the source of its hesitation: Is it defensibility? Lack of trust in the technology? Concern over disrupting a workflow that is comfortable and familiar? All of these issues can seem very real and daunting. However, a frank and thorough discussion of the methodology and process behind the Integrated Analytics offering will generally help alleviate concerns.
As technology is increasingly used to make the e-discovery process more efficient, where do you see the future going?
In the next five years, I expect that as an industry we will be much more nimble and able to react quickly to discovery demands. The legal community will continue to evolve. I expect we will reach a consensus in the next year or so as to the appropriate standards that should be implemented for predictive coding cases. Once that happens, acceptance and adoption will be widespread.
From a technology perspective, we will see tools that derive results more completely and quickly, and I anticipate that many of these processes will also begin benefiting clients in new ways, such as identifying better document retention policies, isolating areas of legal risk, and offering other opportunities for improvement. I also expect that, with the changes we are seeing in storage architecture, many analytics-based processes, including the use of predictive technologies, will soon be done upstream, without the need for widespread collection beforehand.