Author’s Note: This blog post was co-authored by Emanuella Wallin, Qing Zheng, and Alexandra Kukresh, data scientists on Seal’s Machine Learning (ML) team.
Pictured from left to right: Emanuella Wallin, Qing Zheng, and Alexandra Kukresh
2018 has brought a growing number of new trends and concerns around AI and machine learning. Research communities, policymakers, the media, and other stakeholders are debating risk, bias, and discrimination in the AI field.
At last week’s Nordic Women and Data Summit in Stockholm, we were pleasantly surprised that nearly every speaker highlighted the importance of diversity in data science, as well as of data quality and data management. The forum inspired us to look at how we at Seal are proactively developing new processes and tools to build high-quality, trustworthy AI software.
Diversity in data science is key to overcoming bias. Because data scientists shape how data is used and how models are designed, the team of researchers involved in building AI has to be diverse. Here at Seal, the Machine Learning (ML) team consists of 20 skilled data scientists of different ages and countries of origin. With backgrounds ranging from mathematics and machine learning to linguistics and law, we bring unique perspectives and experiences to the design of Seal’s AI products. In our experience, working in such a diverse environment makes collaboration more effective and the team more productive.
Data privacy, quality, and diversity are key factors needed to build trust in AI. In our most recent initiative, we implemented a set of best practices to manage data and model quality within the Seal platform.
The changing regulatory landscape (namely the introduction of GDPR) has driven us to improve how we secure, store, and process data. To promote best practices for handling personal data and confidential information, we have put an enormous amount of effort into building a data lake and a data management infrastructure that includes data security labs, guidelines, and a data factory.
The data lake we have created within Seal is a centralized repository for raw data, with strong security and well-defined backup policies. The data factory governs the conversion of raw data into clean, indexed files that are ready for analysis. Together, these systems form streamlined processes that prioritize data privacy and security.
The golden rule of machine learning is that a model should be trained on high-quality data. This means we must understand the data we work with very well. For this reason, we involve legal experts and linguists at every stage of data preparation and review. Technical expertise and user-friendly tools are an integral part of the process: the ML team has built and implemented a number of pipelines, annotation tools, and guidelines that make working with data easy, timely, and scalable.
We believe that building trust in AI products is a continuous process. A few months ago, we established a model and data management framework aimed at maintaining and improving both data and model quality. The framework requires collaboration among feature engineers, QA engineers, and our in-house legal team. Under this framework, Seal’s AI-powered products are rigorously tested and quality-assured before release, and continuously monitored thereafter.
Diversity within the research group creating an AI product is essential, but it does not by itself ensure data diversity. A good machine learning system has to be trained on large, diverse datasets. This approach enables Seal to maintain a range of algorithms and datasets customized to solve specific customer problems with greater accuracy. Achieving diversity in models is not easy: it takes hundreds of experiments, as well as time and resources. The Seal team has built lab pipelines that eliminate several manual steps and let us scale and track the experimentation process. Every member of the team runs dozens of automated tests daily, and their results inform decisions about future models and about which models are included in the platform.
By being innovative and visionary, the Seal team not only delivers a valuable product to customers but also helps build trust in our AI solutions. To achieve this, we continuously build new tools and frameworks that improve data diversity, quality, and privacy.
To learn more about AI technology, read our AI for Everyone white paper.