Machine Learning: To Supervise or Not to Supervise, is that the Question?

As many readers will know, I like to use analogies in my posts to make what can be complex concepts in AI and Machine Learning easier to understand. This post will be no exception.

Over the last few years the legal tech space has blossomed, and with that many new companies have started: some copying what Seal is doing with contractual documents and content, others putting a new slant on the same concept. What is undeniable is that this area of technology is booming. As a result, much is said publicly about which technology is best and, specifically within Machine Learning, which methods are the right ones to use. Each vendor will undoubtedly bring its own slant to the table, and of course there is room for many differing approaches. However, what cannot be disputed is which methods are used and what they are used for.

Understanding Supervised and Unsupervised Learning

Without resorting to a string of links to Wikipedia or other sources, I thought I would describe, in simple terms, some of the methods used within the Seal platform and why they are used. These have been in use, in some form or other, ever since Ulf and I founded the company back in 2010. I will also try to explain why confusion can occur and why it is rife in the current climate. I have read too many “we are the only company to use XX method to analyze contracts” claims not to try to help separate hype and misinformation from reality.

So let’s start with the title of this post: “To supervise or not to supervise, is that the question?”. Why such a title? As we all know, ML comes in many flavors, but the two most used within legal AI are supervised and unsupervised learning.

Before we go into details, let’s step back and consider this through the analogy of children in the classroom and in the playground. During lessons, the children are taught by a teacher and supervised to help them learn. They start out knowing little about a subject, yet over time they pick up everything they need to know by following a defined plan. What they learn also lets them apply their knowledge to problems they were never explicitly taught. This is essentially supervised learning, and it is the same within ML: the model is the child in the classroom.

So what, then, is unsupervised learning in this context? It is break time, when the children go out to play. In the classroom they are normally grouped for a reason, such as age, learning rate, ability or skill set, but in the playground, where little or no supervision is given, they cluster together based on very different qualities. For example, if they play a sport, age often becomes irrelevant, while gender may play more of a role.

So does this mean that unsupervised learning is worse than supervised, and that children do not actually learn anything? Clearly not. They learn different things, and so do the teachers on patrol. The teachers learn which children share interests, and they notice the outliers, the children who may have fewer friends. Much else can be understood from this automatic grouping.

Within the classroom, the teacher has a very strong influence, but in the playground, where grouping is unsupervised and undirected, the supervisor has very little say in which groups form or how. Is one better for learning than the other? In the case of children, no; they are simply different. Children learn many things while playing that cannot be taught in the classroom, and both are critical to raising a well-balanced individual. As we’ll see, the same applies to software.

Supervised Learning in Legal Tech

So now let’s turn this back to legal tech and the platforms that use learning methods. We’ll apply the same logic.

Let’s start with supervised learning.

Supervised learning in the context of machine learning spans many different platforms and algorithms, such as SVM (Support Vector Machines), Maximum Entropy (with and without Conditional Random Fields) and Deep Learning with backpropagation, to name just a few. All of these are in use within Seal.

Each of those methods could be thought of as a classroom, where a model is created for each task at hand, like children learning Mathematics, English, French, Science and so on. Each model then has a teacher. You would not expect an expert in French to also be an expert in, say, Design Tech. So each model is taught by the teacher providing examples for it to learn from or, in the case of deep learning, by providing the examples “after” the model has first learned in an unsupervised way.
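To make the classroom idea concrete, here is a minimal Python sketch of supervised learning for one hypothetical task: flagging termination clauses. The training sentences, labels and function names are invented for illustration, and the model is a simple perceptron (a linear classifier related to, but much simpler than, the SVMs mentioned above), not what Seal uses. The labeled examples play the teacher: the weights change only when the model gets an example wrong.

```python
from collections import Counter


def features(text: str) -> Counter:
    """Bag-of-words features: lower-cased word counts."""
    return Counter(text.lower().split())


def score(weights: dict, bias: float, text: str) -> float:
    """Weighted sum of word counts plus bias; positive means 'termination clause'."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in features(text).items())


def train_perceptron(examples, epochs: int = 10):
    """Learn one weight per word from (text, label) pairs, with label in {+1, -1}."""
    weights: dict = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            if label * score(weights, bias, text) <= 0:
                # Wrong (or undecided) answer: the teacher's label corrects the weights
                for w, c in features(text).items():
                    weights[w] = weights.get(w, 0.0) + label * c
                bias += label
            # Correct answer: no adjustment is made
    return weights, bias


def predict(weights: dict, bias: float, text: str) -> int:
    return 1 if score(weights, bias, text) > 0 else -1


# Hypothetical training set: +1 = termination clause, -1 = something else
EXAMPLES = [
    ("either party may terminate this agreement with thirty days notice", 1),
    ("this agreement may be terminated for material breach", 1),
    ("the supplier shall invoice the customer monthly", -1),
    ("payment is due within thirty days of invoice", -1),
]

weights, bias = train_perceptron(EXAMPLES)
print(predict(weights, bias, "either party may terminate for breach"))  # → 1
print(predict(weights, bias, "invoice due monthly"))  # → -1
```

With only four training sentences the learned weights are fragile, which is why a real supervised model needs many well-chosen examples per task, just as each subject needs its own teacher and lessons.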

This last point is an important one; it is the first introduction to the combination of methods that I will come on to later. Put simply, the unsupervised stage lets the system work out links and/or inferences within the data without any preconceived ideas. It is learning information about the data from the data alone.

Unsupervised Learning Methods Used within Legal Tech and the Seal Platform

Examples of common algorithms and methods include Latent Semantic Indexing and Analysis, “nearest neighbor” methods such as KNN, SVD (singular value decomposition), Word2Vec for word and phrase detection and reduction, and Naïve Bayes. This is by no means an exhaustive list, but what they have in common is that they can all be applied without supervision. (Strictly speaking, KNN and Naïve Bayes are most often trained as supervised classifiers on labeled examples, although nearest-neighbor search, and Naïve Bayes mixture models fitted with EM, work on unlabeled data.) And like the children in the playground, you get clusters of documents or words based on similarities or other automatically detected features. One clear example in software is the detection of near duplicates, where the system groups items based on the similarity of the documents’ words, phrases and sections. A further example is clustering information based only on the similarity of each given section. Both are simple in nature and require no supervision.
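Near-duplicate grouping can be sketched without any labels at all. The minimal Python sketch below assumes a common textbook approach: each text is split into overlapping three-word “shingles”, pairs are compared with Jaccard similarity, and any pair above an arbitrary 0.5 threshold is merged into one group. The document names, texts and threshold are hypothetical, and this is not necessarily how Seal implements it.

```python
def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word sequences ("shingles") of a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the overlap divided by size of the union (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs: dict, threshold: float = 0.5) -> list:
    """Merge any two documents whose shingle sets are at least `threshold` similar."""
    parent = {name: name for name in docs}  # union-find forest, one tree per group

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    sigs = {name: shingles(text) for name, text in docs.items()}
    names = list(docs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sigs[a], sigs[b]) >= threshold:
                parent[find(a)] = find(b)  # no labels needed: similarity alone decides

    groups: dict = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical documents
DOCS = {
    "nda_v1": "the receiving party shall keep all confidential information strictly confidential",
    "nda_v2": "the receiving party shall keep all confidential information strictly confidential and secure",
    "lease": "the tenant shall pay rent on the first day of each month",
}

groups = group_near_duplicates(DOCS)
print(sorted(sorted(g) for g in groups))  # → [['lease'], ['nda_v1', 'nda_v2']]
```

Because groups are merged transitively, this behaves like single-link clustering: two documents can end up together if each is close to a third. Comparing every pair does not scale to large collections; techniques such as MinHash are commonly used to approximate the same Jaccard comparison.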

Let’s go back to the deep learning example to focus on the combination of unsupervised and supervised learning. To get the best balance and overall skill set, combining the two methods first allows a system to see and learn things that may not be visible at the outset, with no supervision forcing a path. Then, given known data examples, the system can fit what it has learned on its own to the data it is told belongs to a given type. This is similar to homework for a child: they apply their thinking to a problem and are then shown the correct answers. If they got them right, no adjustment is required; if their answers were slightly off, they adjust their understanding. This is what backpropagation does. It propagates the error between the network’s output and the output it is told is correct back through the layers, adjusting the weights of each layer to reduce that error.
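The homework analogy maps onto backpropagation fairly directly. The minimal Python sketch below assumes a tiny network with one hidden layer of sigmoid units, trained with squared error on the logical OR function as a stand-in for real labeled data. It shows only the supervised, weight-adjusting step, not the unsupervised stage described above, and every name and setting (hidden size, learning rate, epochs, seed) is an illustrative assumption.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the network output for input x."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(len(W1))]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def train(data, hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # W1[j][i]: input i -> hidden unit j; W2[j]: hidden unit j -> output
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            h, y = forward((W1, b1, W2, b2), x)  # "do the homework"
            total += 0.5 * (y - target) ** 2      # how far off the answer was
            # Backward pass: error signal at the output, then pushed back to the hidden layer
            delta_out = (y - target) * y * (1 - y)
            delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent updates: nudge each weight against its error gradient
            for j in range(hidden):
                W2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hidden[j] * x[i]
                b1[j] -= lr * delta_hidden[j]
            b2 -= lr * delta_out
        losses.append(total)  # training loss accumulated over this epoch
    return (W1, b1, W2, b2), losses


OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params, losses = train(OR_DATA)
print(f"loss: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
for x, target in OR_DATA:
    print(x, target, round(forward(params, x)[1], 3))
```

The error signal `delta_out` is zero when the output exactly matches the target, so nothing changes, while larger errors produce larger corrections, mirroring the child who only revises their understanding when the answers were off.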

From the preceding, it should be clear that a good system should contain both supervised and unsupervised methods. Using only one method will give acceptable results in many cases but will not achieve the best overall results. Ask yourself this: how would you like your child to grow up? Only ever sitting in a classroom to learn, never having breaks and unsupervised time? Or only ever having breaks and never following a structured learning path? Put like this, I am sure all of you will agree that a balanced combination of supervised and unsupervised learning is best.

This is, and always has been, our view at Seal: balanced and combined. Learn in the best overall way, so that the system can be applied to new challenges it has never seen before, in new domains, while still using the same underlying functions and methods it learned when we first taught it. Ask your vendor how they would prefer their children to learn, and then ask why the software they are selling is not the same…
