Deep Learning – War Stories

Authored by Svetoslav Marinov, PhD, Aron Lagerberg, PhD, and Jacob Sznajdman, PhD

With the recent upsurge in success stories about Deep Learning (DL) and the availability of both cheap hardware and open source software libraries, the general impression is that it is easy to apply deep learning techniques to just about any problem and succeed by using generic, off-the-shelf methods. This is not usually the case.

The last blog post from the ML team here ended with a famous quote from Sun Tzu:

“… if you know yourself and know your enemy, you will gain victory one
hundred out of a hundred. … if you do not know your enemy, you will meet one defeat for
every victory”.

Knowledge of the data and domain expertise are what make the difference between winning and failing. Yet the perception in the media is that the advances in Deep Learning render this claim void: you no longer need to know your data, you can just apply a little DL magic and your problem will be solved. This, however, is far from the reality we face every day as data scientists within the legal domain.

“When planning a victory according to my counsel, act according to the situation
and make use of external factors. To act according to the situation is to seize
the advantage by adapting one’s plan”

Sun Tzu, ‘Art of War’

To be clear, Deep Learning networks have enormous capacity and promise. But this capacity implies that you need to design your networks carefully to be able to find the structure in your data, which in turn requires domain knowledge. You cannot use an out-of-the-box Convolutional Neural Network (CNN) model and expect state-of-the-art results. You cannot use standard sentiment classification networks and just create a winning movie review app. In order to succeed when applying DL methods to legal domain data, you truly need to understand that data. This will give you the knowledge to properly adapt your plan and tools to your task.

Arguably, one of the first triumphs in DL was the pioneering work of LeCun et al., who built a Deep Learning model that could understand raw images. This was back in the 1990s. Twenty years later, Jeff Dean, head scientist at Google Brain, states in this article:

“There are fundamental changes that will happen now that computer vision really works”.

Without question, the foundation for this victory is all the hard work, the many lost battles and the vast amounts of knowledge accumulated by numerous teams and experts.

“A general who recklessly underestimates the enemy is sure to be captured.”
Sun Tzu, ‘Art of War’

The state of DL methods for Natural Language Processing (NLP) is very different from that of image recognition. The problem in itself seems to be harder, since the structure of a language is more complicated than that of a picture.

For instance, in an image, the meaning of the data tends to be localized: to tell whether a given pixel belongs to a nose or an ear of a human face, you need only look at the pixels nearby. This is not the case with text, however. Words related to each other may end up far away from one another, yet this does not hinder our brains from understanding the text. Languages are neither uniform nor standardized. They have their quirks, where semantics is hidden in metaphors, idioms and colloquial expressions, not to mention the explicit obfuscation of meaning by complex embedded structures. Therefore, one should not underestimate the enemy and its traits.

Furthermore, in the visual domain, data is everywhere! Facebook, Flickr, Instagram, etc. are bombarded every day with millions of photos by users with the most diverse motives. In the legal domain, these kinds of volumes are much harder to come by, and the price tag is much higher. This scarcity of data means that it is all the more important to understand it and handle it correctly. In the realm of Deep Learning, this also means that we have to construct networks carefully to be able to capture the meaning and structure of legal terms, clauses and documents.

“What tends not to work very well today, and is a very active area of research, is areas
where we don’t have very much example data of what we want the system to do, where you have very few labeled examples.”

Jeff Dean on Machine Learning Problems Yet to be Solved

The legal field is a perfect example in support of Dean’s statement. On the one hand, the amount of data for which we have good, labeled examples of certain legal provisions can be far too small for training DL networks. On the other, the nature of contracts has another quirk as well: a provision consisting of only one sentence may or may not exist in a thousand-page contract. This makes our task of finding the needle in the haystack with the highest precision a difficult one.
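To make the needle-in-the-haystack problem concrete, here is a minimal sketch in Python of how precision behaves when a single relevant sentence hides among thousands. The function name, the sentence count, the recall and the false positive rate are all hypothetical values chosen for illustration; they are not measurements from our models.

```python
def expected_precision(n_sentences: int, n_relevant: int,
                       recall: float, false_positive_rate: float) -> float:
    """Expected precision of a sentence-level classifier on one document.

    All inputs are illustrative assumptions, not measured figures.
    """
    # Relevant sentences the classifier is expected to find.
    true_positives = recall * n_relevant
    # Irrelevant sentences the classifier is expected to flag by mistake.
    false_positives = false_positive_rate * (n_sentences - n_relevant)
    flagged = true_positives + false_positives
    # If nothing is flagged, report 0.0 rather than dividing by zero.
    return true_positives / flagged if flagged else 0.0


# Hypothetical contract: 20,000 sentences, exactly one relevant provision,
# a classifier with 90% recall that misfires on 0.1% of irrelevant sentences.
print(f"{expected_precision(20_000, 1, 0.9, 0.001):.1%}")
```

Under these assumed numbers, only about 4 percent of the flagged sentences would be the provision we are looking for, even though the classifier is wrong on just one irrelevant sentence in a thousand. This is why rare provisions in long contracts demand far lower false positive rates than the headline accuracy of a model might suggest.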

In order to address these problems, we have built a proprietary framework and environment that lets us easily conduct vast numbers of experiments. In our Machine Learning lab in Sweden we devise ways to work with, and find successful solutions to, many tricky and hard questions. Our machines work around the clock, constantly seeking to improve on the latest results, using both state-of-the-art Deep Learning approaches and classic Machine Learning techniques. In the end, the framework automatically picks the best models for a given task. But even with today’s advancements in computational power, these networks and models may take a long time to train, often several days, and it is not unusual for us to devise experiments that take weeks to finish.
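The idea of automatically picking the best model per task can be sketched in a few lines of Python. This is not our proprietary framework, only a simplified illustration of the selection step; the class, the function, the task and model names, and the scores are all hypothetical placeholders, and the choice of a validation F1 score as the selection metric is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ExperimentResult:
    """One finished experiment (hypothetical record layout)."""
    task: str             # e.g. the provision type being detected
    model_name: str       # a deep network or a classic ML model
    validation_f1: float  # score on held-out data (assumed metric)


def best_model_per_task(
    results: Iterable[ExperimentResult],
) -> Dict[str, ExperimentResult]:
    """Keep, for every task, the experiment with the highest validation score."""
    best: Dict[str, ExperimentResult] = {}
    for result in results:
        current = best.get(result.task)
        if current is None or result.validation_f1 > current.validation_f1:
            best[result.task] = result
    return best


# Placeholder results, invented purely to show the mechanics.
results = [
    ExperimentResult("provision_a", "cnn", 0.81),
    ExperimentResult("provision_a", "logistic_regression", 0.84),
    ExperimentResult("provision_b", "cnn", 0.77),
]
for task, winner in best_model_per_task(results).items():
    print(task, "->", winner.model_name)
```

Note that in this sketch a classic model can win over a deep network for a given task, which reflects the point above: both families compete, and only the score on held-out data decides.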

Many results will be disappointing, many parameters will be useless, and many models will be unusable. But sweet are the victories when everything finally falls into place and we achieve the desired result. The surviving models on this battlefield of Machine Learning stand victorious and find their place in the latest release of our product, ready to be applied to customers’ data.