Howard Sklar, Senior Counsel, Recommind, discussed how predictive coding saves time and money while improving accuracy in document discovery and review.

LAUREN EVERHART: Can you start us off by giving us a little bit of your background and your role at Recommind?

HOWARD SKLAR: My background is mainly in regulatory compliance. I started as a prosecutor in Bronx County, New York, where I developed a specialty in computer crime and investigations. Then I went to the Securities and Exchange Commission, where I spent six years as a senior enforcement counsel in the Branch of Internet Enforcement. We initially dealt solely with Internet fraud, but we eventually branched out and became a generalist organization. I took on virtually every type of case that the SEC handles in those six years. After the SEC, I went in-house at American Express as their head of compliance for three operating divisions, specifically for the compliance of the travel-related business lines: both consumer and business-to-business. Those businesses comprise about 25,000 employees operating in 100 countries. In addition to that, I was tasked with starting, building and running the global anti-corruption compliance program for all of American Express. I did that for several years, and then I went to Hewlett-Packard, where I ran their global anti-corruption program and also worked with the global trade division on compliance with U.S. sanctions programs.

From there, I came to Recommind. At Recommind, I help customers and potential customers understand the value proposition of our offerings, including predictive coding. I do a lot of writing and speaking on that matter. I also handle some of our internal compliance efforts.

What is predictive coding, and how is it different from ediscovery and other forms of discovery?

Predictive coding is a patented process that incorporates some of the technology that Recommind developed. Essentially, it is a way of reviewing documents that adds incredible efficiencies to the document discovery review process to save companies and firms both time and money.

Discovery is the most expensive part of litigation. Each side in a litigation matter has the responsibility to conduct a reasonable search for documents requested by the other side. Previously, law firm associates did this by collecting hard copies of all potentially relevant documents from warehouses and then having a room full of people read through all of those documents one by one. Then we started getting into technology-assisted reviews, in which documents were stored on a computer as files. It still required a human being to look at every single document, and there might be as many as five million documents to sift through.

The volumes of documents started getting such that it was no longer cost effective to go through them that way, and companies began trying to create efficiencies in labor costs. So instead of hiring associates to do that work, they turned to paralegals. Then, instead of paralegals, they looked to contract lawyers; then they went to contract lawyers in India; and then everyone was using contract lawyers in places like South Dakota. But labor costs can only go down so far.

Eventually, technologies began to be developed to help lawyers prioritize the reviews, so instead of having to look through all five million documents, you only had to review a subset of that to search for relevant documents and then pull only those relevant documents. That technology has since gone through several evolutions and now revolutions. Certain mathematicians came up with a way for the computer to search not just for what words are in a document, which is what we call a keyword search, but also the concepts in a document, irrespective of the words used. So, for example, if I say in one document that I love cheese and went to a great restaurant last night, and then I say in another document that I tried some really good fettuccini Alfredo at dinner last night, you could do a food concept search through the rest of the documents and the computer would provide all documents that mention food. There’s no way you would find all of that information if you were just doing a keyword search for that same amount of information. You would have to do individual searches for fettuccini, restaurants, cheese, etc., which wouldn’t be nearly as thorough or easy.

That technological ability is the heart of the predictive coding process. In predictive coding, you put together a seed set of documents that represents what you’re looking for (it can be a small set). That seed set goes into a computer program that has already read and analyzed the five million documents I mentioned earlier and understands what concepts they contain, whether they’re about food, a mergers-and-acquisitions transaction, stock fraud or something else. So you input the concept you’re looking for, and the computer searches for similar information and produces documents it considers responsive. Then a human being reviews those documents and judges whether they are truly responsive, and you feed those judgments back into the program. You keep going through these iterations (inputting a seed set, reviewing the results and putting the right information back into the system) to find more documents and refine the process. The program gets better at finding what you’re looking for as you go along, and at some point it will say that it can no longer find any documents with the concepts you’re searching for. That happens after you’ve reviewed between 1 and 25 percent of the total set. So instead of looking at five million documents, you now only have to look at 50,000 to 1.25 million. You save 75 to 99 percent of your review time and expense through the use of this process.
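The iterative loop described above can be illustrated with a toy sketch. This is not Recommind's patented process or its concept-search technology: it swaps in a simple word-weight score, and every name in it (`predictive_review`, `score`, the sample documents, the simulated reviewer) is hypothetical. It only shows the shape of the cycle: start from a seed set, let the program rank unreviewed documents, have a human label the top candidates, feed the labels back, and stop when nothing else looks responsive.

```python
from collections import Counter


def score(text, weights):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights[w] for w in set(text.lower().split()))


def predictive_review(docs, seed_ids, is_responsive, batch_size=2):
    """Toy review loop (illustrative only, not a real product's algorithm).

    docs: dict mapping document id -> text
    seed_ids: ids a human has already identified as responsive
    is_responsive: callable standing in for the human reviewer
    Returns a dict of reviewed id -> human label.
    """
    weights = Counter()
    reviewed = {}

    def learn(doc_id, label):
        # Feed the human judgment back into the model.
        reviewed[doc_id] = label
        for w in set(docs[doc_id].lower().split()):
            weights[w] += 1 if label else -1

    for doc_id in seed_ids:
        learn(doc_id, True)

    while True:
        # Rank every unreviewed document, most promising first.
        ranked = sorted(
            ((score(text, weights), doc_id)
             for doc_id, text in docs.items() if doc_id not in reviewed),
            reverse=True,
        )
        batch = [doc_id for s, doc_id in ranked if s > 0][:batch_size]
        if not batch:
            break  # nothing left that looks responsive
        for doc_id in batch:
            learn(doc_id, is_responsive(doc_id))
    return reviewed


# Hypothetical six-document collection; 1, 2 and 5 are truly responsive.
docs = {
    1: "merger agreement draft",
    2: "merger closing schedule",
    3: "vacation schedule request",
    4: "fantasy football picks",
    5: "closing schedule update",
    6: "lunch menu",
}
truth = {1, 2, 5}
reviewed = predictive_review(docs, seed_ids=[1], is_responsive=lambda i: i in truth)
never_reviewed = set(docs) - set(reviewed)
```

In this toy run, a human looks at four of the six documents and finds all three responsive ones; documents 4 and 6 are never reviewed and end up in the pile that would later be checked by sampling.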

Are there other elements of predictive coding that make it extremely cost effective?

Yes, another cost-effective element is the way in which the system works, which is to produce the most relevant documents first. This is particularly important in a regulatory investigation, where the regulators request documents in an abbreviated time frame. In that situation, you need to start a rolling production of documents to your regulator as soon as possible.

Before now, finding a highly relevant document was frankly hit-or-miss; it was just a question of where that particular document was located in the set. If you had a set of 2,453,000 documents, you might be halfway through your review before you gave that document to the regulator. With this new technology, the cream rises to the top, and you are able to give the regulators the best documents faster. That gives you credibility. It also potentially allows you to shorten the scope of an investigation, and if you can control the scope of an investigation, you’ll have significantly lower costs. Anything that helps you to do that is a massively cost-effective tool.

In talking about predictive coding, we also often refer to automated coding. How are the two different?

They are actually very different. Unlike automated coding, the predictive coding system does not produce documents to the other side that a human being hasn’t reviewed. It is a review-prioritization process that emphasizes senior involvement in the case by putting the documents most likely to be relevant in front of people at a much faster rate than ever before. The nice thing about the predictive coding system, and one of the reasons that it’s defensible, is that it tells you why it thinks a document is relevant by highlighting the words and phrases in the document that it found relevant. So you can defend that selection in court, if necessary.

Obviously, that affects the amount of risk associated with this form of discovery. Can you describe some of the main cases that have addressed the risk aspect and what has led to predictive coding successes?

I don’t see a future in which predictive coding is not the standard of review for companies and law firms. There wasn’t a lot of data about this before, but people have been doing research on the subject and found that our traditional processes are incredibly risky. Predictive coding presents less risk because it is more efficient and more accurate, and it’s been totally accepted by the courts for that reason. It not only gives you higher confidence that your overall review was accurate, but it also makes the human reviewers more efficient and accurate because they can avoid the mind-numbing doldrums that come with looking at irrelevant document after irrelevant document. After a slog like that, it’s much tougher for reviewers to stay focused when they finally come across highly relevant documents.

The courts have always been behind the curve in terms of addressing new technologies and new ways of going about things. Before they get involved, something has to be adopted, and then it has to have widespread use, and then it has to be challenged. We are at that point now with predictive coding, where it is gaining widespread acceptance and has been used in hundreds of cases. A couple of those cases have been challenged, and in those cases, the courts have found that predictive coding is an acceptable method of review. In fact, there was one case in which a party went even further and declared not just that predictive coding should be allowed but that it should be mandated by the court. Each side has backed off on the issue in that particular case, but that was the first time that argument was raised. It won’t be the last. Predictive coding will continue to be a judicially accepted method of conducting reviews.

Switching gears slightly, what’s the significance of confidence levels? What is the difference between a 95 percent confidence level and a 99 percent confidence level? Or is it not significant enough to be concerned about?

Confidence levels are defined at the end of a review. Any time you conduct a review, whether you’ve done it with a keyword search or with another process, when you’re at the end of a review, you look at the documents you either found to be irrelevant or that a human being has not looked at. In predictive coding or a keyword search, you will always have a batch of documents at which nobody has looked because they didn’t contain any key words or concepts you requested the computer to search for. So you’re left with a batch of documents about vacation requests, Fantasy Football and other things that are not relevant to the litigation. You then have to sample those documents in a process called statistical sampling. Statistical sampling has been around for quite a while, around a hundred years or more. At its most basic, it involves getting statistics from a sample batch of documents to show how accurate you were in not leaving anything important in the junk pile. You want a certain level of confidence about that, and you gain that level of confidence through the number of documents you review from the junk pile, which can vary a lot based on how confident you need to be.

The requirement is that the search be reasonable, and because we use computers that are capable of extreme accuracy, we overcompensate a lot of times and say that we want a 99 percent confidence level plus an interval level, which is the margin of error. People talk about intervals in polls during presidential elections, where one candidate may have 95 percent of the votes with a +/-2 percent interval of error. It’s the same concept in statistical sampling during discovery review.

If you want a 95 percent confidence level, +/-2 percent, you will have to review a certain number of documents in order to meet that confidence level. So you may have to review something like 3,945 documents. It’s very specific because it’s all mathematically derived; the statistical sampling tables are all set and have been for a long time. So you review those, and in that pile, you’re allowed a certain number of errors within that 95 percent—an error is, of course, a relevant document in the junk pile—and if you’re better than that number, you’ve passed your test. You can say that you have a 95 percent level of confidence, +/-2 percent.
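How a sample size follows from a confidence level and a margin of error can be sketched with the common textbook formula for estimating a proportion, n = z² · p(1 − p) / e². This is a minimal illustration under stated assumptions, not the sampling tables the interview refers to: it assumes a very large document population, the worst-case proportion p = 0.5, and a normal approximation. The function name `sample_size` is hypothetical. Under these assumptions the formula yields different counts than the rough figure quoted above, which was offered only as "something like" an example.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, p=0.5):
    """Documents to sample for a given confidence level and margin of error.

    Assumptions (not from the interview): large population, normal
    approximation, and worst-case proportion p = 0.5 by default.
    """
    # Two-sided critical value, e.g. about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_95 = sample_size(0.95, 0.02)  # 2401
n_99 = sample_size(0.99, 0.02)  # 4147
```

Moving from 95 to 99 percent confidence at the same +/-2 percent margin raises the sample from about 2,400 to about 4,150 documents, a small cost next to a multimillion-document collection.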

I have a hard time seeing the difference in reasonability when the standard confidence level is between 95 and 99 percent and you’re +/-2 percent below that. Frankly, it’s asking for extreme levels of sensitivity in an area where all you have to be is reasonable. So I think either one will pass. If we were to get down to 70 percent confidence levels, then there’s a discussion to be had there, but the difference between 95 and 99 percent is very case-dependent at this point.

What would your rebuttal be to somebody who is concerned about losing the savings while dealing with coding objections?

I don’t think that’s a realistic worry; it’s a worry promoted by the marketing departments of companies that don’t have predictive coding. In certain cases, objections to keyword search string costs are something to consider, but predictive coding will not be challenged to any extent greater or less than any other search methodology someone uses, and you’re not going to get any more or fewer objections to predictive coding than you would with anything else. It’s not something with which you lose incremental cost savings. And as the industry adjusts to the new reality, we’ll have even more savings than we do now.

Is there anything else you’d like to add?

There is one more thing I would like to point out. We’ve talked about predictive coding in a one-use sense, which is to produce documents to the other side in litigation. There are two other uses that are just as important and have the same kind of efficiencies.

One of these other uses is in incoming document productions, where you receive requested documents from the other side and have to go through them to identify important documents. You don’t want to spend a lot of time and money doing that, so predictive coding is very valuable in those situations as well.

We sometimes hear that predictive coding is great for big cases but not as valuable for small cases. I think it’s exactly the opposite. When you’re dealing with a small case, you will often have very limited funds with which to do a review. If you have a case that’s worth a million dollars, you’re not going to spend $750,000 on a document review. You’re extremely limited in terms of budget, and taking advantage of predictive coding’s efficiencies will probably be even more important to you than in larger cases.


Howard Sklar
Prior to joining Recommind, Mr. Sklar was a global trade and anti-corruption strategist with Hewlett-Packard Co. At HP, he was in charge of the company's global anti-corruption compliance program. He was also counsel to the global trade division, giving relevant business units advice on compliance with U.S. sanctions. Before HP, Howard was the Global Anti-Corruption Leader for American Express Co.

Before going in-house, Mr. Sklar served 12 years as a prosecutor and regulator, first as an assistant district attorney in Bronx County, NY, where he developed a specialty in computer crime investigation, and then as a senior enforcement attorney in the Branch of Internet Enforcement at the Securities and Exchange Commission. Mr. Sklar has lectured on the FCPA both nationally and internationally.

Lauren Everhart
Lauren Everhart is a director at Argyle Executive Forum. In this role, Lauren manages and leads client experience and client service delivery for Argyle’s content and event partners. She also manages the content development, editorial speaker recruitment, and execution of a number of Argyle’s annual business events. Lauren has been with Argyle Executive Forum since 2008. She holds a Bachelor of Arts degree from the State University of New York at Albany and a J.D. from New York Law School.