Not All Machine Learning Is Created Equal

There are more and more companies entering the Contract Discovery and Analytics market, and that is a good thing. It confirms what we’ve known all along: contracts hold critical information that helps organizations better manage their M&A activity, regulatory compliance initiatives, and procurement and sales functions, make better decisions, and create competitive advantage in their markets.

These new entrants talk about using machine learning (ML), a branch of AI, to extract intelligence from contracts. What they don’t talk about is that successfully extracting valuable insight takes not just an ML engine but all the training and work that goes into it. It takes years to build an effective data extraction and normalization engine for contracts. We know, because we’ve been doing it for over seven years and have learned many valuable lessons along the way.

To shed some light on this, the customers we work with typically have two requirements. First, they have lots of contracts to be processed, typically tens of thousands to hundreds of thousands, from which they need to get information. Second, they have a specific set of business objectives in mind. These objectives include “getting their house in order”, migrating contracts to a business system, running broad-scale analytics, kicking off a regulatory compliance initiative, performing due diligence, or any number of other projects where contract data is needed. These are rarely one-time projects, as the need to understand contractual data is driven by ever-changing regulations, business events, or the search for cost savings and revenue-generating opportunities.

Some projects require lower levels of accuracy to achieve their informational objectives, while others require much higher accuracy. These objectives translate into target scores for precision and recall, which are reached through the way the system is tuned and trained. Precision is the percentage of retrieved instances that are relevant (how useful the results are), and recall is the percentage of relevant instances in the population that are retrieved in an extraction (how complete the results are).
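To make the two definitions concrete, here is a minimal Python sketch that scores a set of extractions against a reviewer-confirmed answer key. The function name `precision_recall` and the contract IDs are hypothetical. A real evaluation would compare extracted clause values, not just which contracts were flagged.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a set of retrieved items vs. the relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved items that are relevant (usefulness)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant items that were retrieved (completeness)
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: contracts the system flagged as containing an
# assignment clause vs. contracts a reviewer confirmed contain one.
flagged = {"C-001", "C-002", "C-003", "C-004"}
confirmed = {"C-002", "C-003", "C-005", "C-006", "C-007", "C-008"}
p, r = precision_recall(flagged, confirmed)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

Here half of the flagged contracts are correct, so the results are fairly useful. However, only two of the six relevant contracts were found, so the results are far from complete.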

It is the combination of massive numbers of contracts and specific precision and recall requirements for a project that drives the need for greater scalability and accuracy from an ML system. On top of these is the need for usability. Usability means abstracting away the complexity of the system so that business users, and not only data scientists, can use it effectively and get the most value from it.

So, the lessons learned and the work done by Seal over those seven years have resulted in a highly scalable, usable, and accurate system. Our competitors have a long way to go.

It is important to mention another factor in training that is almost always disregarded by companies and competing vendors, namely standard deviation. In ML terms, the standard deviation of a model’s scores measures how much its precision and recall swing from one training or sample to the next. That makes it a measure of the “trustability” of any model or method used to extract information. When we talk about trusting a model, we expect it to have a low standard deviation.

Why is this important? Consider a customer that trains a solution with 20 examples. The system reports very high recall and precision values, and the model being used appears to be the most effective. So, the customer deploys this model to a data set of 200,000 contracts. On reviewing the results, however, they find that the extractions are wrong, with many errors. So, they are tasked with reviewing ten more contracts, since adding more data will surely make the model better, right?

But once the model is trained again, the recall and precision values decrease significantly. The lawyers and users are at a loss as to why, as the vendor told them that only a few examples are needed and that adding more data will always make the system better.
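Part of the problem in this example is that a score measured on only a handful of examples is itself noisy. The following Python simulation is a simplified sketch, not Seal’s method. It assumes that each extraction is independently correct with a fixed probability, here a hypothetical true precision of 0.7. It then shows how much the measured precision varies depending on how many examples it is measured on.

```python
import random
import statistics


def measured_precision(true_precision, n_examples, rng):
    """Precision observed on n sampled extractions, assuming each one is
    independently correct with probability true_precision."""
    correct = sum(rng.random() < true_precision for _ in range(n_examples))
    return correct / n_examples


def spread_of_estimates(true_precision, n_examples, trials=500, seed=0):
    """Mean and standard deviation of the measured precision over many trials."""
    rng = random.Random(seed)
    estimates = [measured_precision(true_precision, n_examples, rng)
                 for _ in range(trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)


for n in (20, 200, 2000):
    mean, sd = spread_of_estimates(0.7, n)
    print(f"n={n:5d}  mean={mean:.3f}  sd={sd:.3f}")
```

With 20 examples, the theoretical standard deviation is about 0.1 (the square root of 0.7 × 0.3 / 20). A model whose true precision is 0.7 can therefore easily appear to score 0.8 or more. With 2,000 examples, the spread shrinks to about 0.01.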

As they are finding, this is simply not true. A good model, like any statistical function, needs data, and it needs an appropriate amount of data before the swings in its learning are smoothed out. The way a model’s performance changes as examples are added is called the learning curve, and along it the standard deviation typically decreases gradually. To put this into context, suppose there is a cohort of 20 people, all American and all from East Coast states. A model that learns from only those people is very good on that specific dataset. However, if five people from Europe and five people from the West Coast are added to the dataset, the model now performs badly. This is because it had too few examples in the first place, so adding new, different data changed it significantly. This is high standard deviation. It is the reason Seal uses a traffic light system and different algorithms depending on the amount of data and the requirements, all abstracted from the user in a simple, automated way.
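The learning-curve effect can be reproduced on a toy problem. The sketch below is an illustrative assumption, not Seal’s model. It uses a one-dimensional classifier that separates two synthetic classes, normally distributed around 0 and 2. “Training” places a threshold halfway between the class means seen in the training data. Each training-set size is trained many times on fresh random samples and scored on a fixed test set. The standard deviation therefore reflects only how much the learned model swings with its training data.

```python
import random
import statistics


def sample_class(rng, mean, n):
    """n synthetic 1-D feature values for one class (unit variance)."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def train_threshold(neg, pos):
    """'Training': put the decision threshold halfway between the class means."""
    return (statistics.mean(neg) + statistics.mean(pos)) / 2


def accuracy(threshold, test_neg, test_pos):
    hits = sum(x < threshold for x in test_neg) + sum(x >= threshold for x in test_pos)
    return hits / (len(test_neg) + len(test_pos))


def learning_curve(sizes, repeats=200, seed=0):
    """Mean and standard deviation of test accuracy for each training-set size."""
    rng = random.Random(seed)
    # Fixed held-out test set, so only the training data varies between runs
    test_neg, test_pos = sample_class(rng, 0.0, 1000), sample_class(rng, 2.0, 1000)
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            neg, pos = sample_class(rng, 0.0, n), sample_class(rng, 2.0, n)
            scores.append(accuracy(train_threshold(neg, pos), test_neg, test_pos))
        curve[n] = (statistics.mean(scores), statistics.stdev(scores))
    return curve


for n, (mean, sd) in learning_curve([1, 5, 25, 100]).items():
    print(f"{n:4d} examples per class: accuracy {mean:.3f} ± {sd:.3f}")
```

At one example per class, accuracy varies widely from run to run. At 100 examples per class, it barely moves. That shrinking spread is the reduction in standard deviation along the learning curve.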

An example of a learning curve is below. It uses the assignment clause, which is often found in contracts and is one of the harder clauses to detect. The clause can express many different states, and its wording is often counterintuitive and dependent on other clauses within the contract.

Shown above is the initial, zoomed-in learning curve for the first 600 examples of the Assignable model. The full learning curve (not shown) extends past 10,000 labeled examples. The curve shows a high standard deviation at first, which then levels off as more examples are added. For example, if a user selected the model at 50 examples, recall (the “R” line in the graph) would be close to 1, but precision (P) would be suboptimal. With 50 more examples, recall decreases, but the system has learned and become significantly more accurate. Our final model, after tuning and testing, is over 90% accurate. This was achieved by manually reviewing, checking, and labeling the clauses in over 10,000 contracts from many different verticals, multiple times. It also draws on many millions of unlabeled examples used for testing, review, and further training of our neural network.

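For clarity, the standard deviation is not shown directly in the above graph. It can, however, be calculated from a set of $n$ scores $x_1, \dots, x_n$ (for example, precision or recall values measured over repeated trainings) using the sample standard deviation formula:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$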

However, the standard deviation is visually apparent in the fluctuation of the P and R lines from 0 to roughly 200-300 examples, before the lines smooth out. Also for clarity, the graph shows an “F” line, which represents the F score: the harmonic mean of the P and R scores.
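Both the standard deviation and the F score mentioned here are simple to compute. This Python sketch uses the sample standard deviation (dividing by n - 1) and the balanced F1 score, the harmonic mean of precision and recall. Whether the graph uses exactly these variants is an assumption, and the score values below are hypothetical.

```python
import math
import statistics


def sample_std(scores):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical recall values observed at successive points on a learning curve
recalls = [0.80, 0.90, 1.00]
print(round(sample_std(recalls), 3))                                  # 0.1
print(math.isclose(sample_std(recalls), statistics.stdev(recalls)))  # True
print(round(f1_score(0.5, 1.0), 3))                                   # 0.667
```

The harmonic mean is pulled toward the lower of the two scores. This is why near-perfect recall paired with mediocre precision, as at the 50-example point, still yields a modest F score.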

Along with the intense focus on accuracy described above, Seal learned early on that system scalability was also critical. Large customers with hundreds of thousands of contracts pushed us to process them in tight timeframes. One notable example was Thomson Reuters, which in 2011 asked us to extract content from 1.2 million documents located in EMC Documentum, SharePoint, Oracle, OpenText, and home-grown repositories, and push it into Salesforce. That stretched us at the time, but it was only the beginning. We have since worked with numerous customers with 500,000 to over 2 million documents, each pushing us and teaching us along the way. After just a few years, Seal’s dynamic configuration and extraction allowed it to process upwards of 800,000 documents in a single day.
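For a sense of scale, the figures in this paragraph can be converted into a sustained processing rate. This back-of-the-envelope Python sketch assumes the 800,000-documents-per-day figure is spread evenly over 24 hours, which real workloads rarely are.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def docs_per_second(docs_per_day):
    """Average rate needed to sustain a daily document volume."""
    return docs_per_day / SECONDS_PER_DAY


def days_to_process(total_docs, docs_per_day):
    """Days needed to process a corpus at a constant daily throughput."""
    return total_docs / docs_per_day


print(round(docs_per_second(800_000), 2))    # 9.26
print(days_to_process(1_200_000, 800_000))   # 1.5
print(days_to_process(2_000_000, 800_000))   # 2.5
```

At that rate, a 1.2 million document project would take about a day and a half, and a 2 million document corpus about two and a half days.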

Usability has also been a key tenet for Seal since the beginning. We’ve always wanted the system to be used by the people who need the data. If the system were appropriate only for legal professionals or data scientists, we know the gap between extractions and business value could be too large to overcome consistently. The trick for our competitors is resolving the often-opposing forces of sophistication and power on the one hand, and usability for business users on the other. We know we’ve solved it at Seal.

Finally, it is the ML engine together with the components that make up the broader platform that meets the precision and recall objectives for a particular clause. An ML engine cannot, on its own, provide all the capabilities needed to deliver the results that Seal does. It takes several technologies and techniques working together. These include:

  • Natural Language Processing (NLP) to optimize the capabilities for the system to understand written language and process it within the ML engine
  • Latent Semantic Indexing (LSI) for identifying and extracting information that is not expressed in standard terms or language but exists through associations of words or phrases, or in different locations in a document
  • The use of Deep Learning methods to increase performance of the ML engine
  • The inclusion of UDML (in Version 5) to simplify training and automatically select the best model and hyperparameters for any given data, with users only required to select the text to train on
  • Including document review capabilities within the system for efficient side-by-side review and comparison across clauses and language
  • Extensive reporting and data visualization to be able to easily draw actionable insight from the data
  • Automatic discovery and linkage of related documents, such as amendments to master agreements
  • Simplicity within the UI for information layering and normalization, to allow the ML framework to effectively use all available information and to allow users and engineers to quickly find and prepare it for use
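To illustrate the idea behind LSI, here is a small pure-Python sketch. It is a generic illustration, not Seal’s implementation. It builds a document-term count matrix and computes a rank-k truncated singular value decomposition via power iteration on the document Gram matrix. It then compares documents in the resulting latent space. The six toy “documents” are hypothetical. The one-word documents “assign” and “transfer” share no words, but because the two terms co-occur in a third document, LSI places those two documents together.

```python
import math


def term_doc_rows(docs):
    """Document-term count matrix: one row per document, one column per term."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    return vocab, [[toks.count(t) for t in vocab] for toks in tokenized]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def top_eigenpairs(matrix, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        pairs.append((lam, v))
        # Deflate: subtract the found component so the next one can emerge
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi_doc_vectors(rows, k):
    """Rank-k LSI coordinates of each document.

    Eigenvectors of the document Gram matrix A·Aᵀ are the left singular
    vectors of A, and its eigenvalues are the squared singular values, so
    document j maps to (sigma_1 * u_1[j], ..., sigma_k * u_k[j]).
    """
    gram = [[dot(a, b) for b in rows] for a in rows]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * u[j] for lam, u in pairs]
            for j in range(len(rows))]


docs = ["assign", "transfer", "assign transfer",
        "invoice payment", "invoice fee payment", "fee payment"]
vocab, rows = term_doc_rows(docs)
latent = lsi_doc_vectors(rows, k=2)
print(round(cosine(rows[0], rows[1]), 3))      # 0.0: no shared words
print(round(cosine(latent[0], latent[1]), 3))  # 1.0: related via co-occurrence
```

Real LSI runs the same decomposition over large vocabularies with optimized linear-algebra libraries. Power iteration is used here only to keep the sketch dependency-free.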

The result of this combination of technologies is a platform with unique capabilities for users. These include the flexibility to provide just one example to the system and meet their objectives, or to efficiently provide from 50 to 300 examples for different outcomes, depending on their needs. Add to this extremely granular extraction levels, normalization of numeric values, our extensive APIs, and our patented non-standard clause detection, and our customers receive extremely high value from the Seal platform.

So, to our new (and existing) competitors: welcome to the neighborhood. But realize that Seal has a significant lead in this market, and it will take a lot to catch up.