Following Episode 1
In this relay series, Samsung Newsroom is introducing tech experts from Samsung’s R&D centers around the globe to hear more about the work they do and the ways in which it is directly improving the lives of consumers.
The second expert in the series is Lukasz Slabinski, Head of the Artificial Intelligence Team at Samsung R&D Institute Poland (SRPOL). Slabinski joined SRPOL as a Senior Engineer in 2013 and, after eight years of dedicated work, now leads its AI Team. Read on to learn more about the exciting innovations Slabinski and his team are working on at SRPOL.
Q: Designing solutions for the speech recognition field is known to be highly intricate. When working on language-related technologies, what challenges have you encountered and how have you been overcoming them?
In my opinion, language-related technologies are far more complex than any other technologies. Humankind communicates in nearly 7,000 constantly evolving languages, subdivided into endless accents and dialects. Moreover, human language is far less objective than, for example, a picture, which can be described with mathematical formulas. People encode their thoughts into a message made of sounds or characters, which others then need to decode and interpret. Because each phase of this process is personal, creative and non-deterministic, human communication through language is very complex and ambiguous. Thus, on the one hand, we can enjoy beautiful poetry and funny jokes, and on the other, we occasionally suffer from misunderstandings.
Researchers who work on natural language processing (NLP) often run up against their own, innately human, limitations. We, too, sometimes struggle to communicate clearly with colleagues at work or family at home. So how, for example, can an engineer who speaks two languages design and code a machine translation system for 40 languages? We solve this paradox using machine learning technologies.
During the process known as ‘training’, we automatically extract general patterns from examples in our datasets and memorize them in the form of a model. To build a machine translation system, we train a neural network to map a sentence in one language to its equivalent in another, based on millions of examples, all carefully collected and cleaned beforehand. It sounds easy, but here we face three fundamental challenges.
The first challenge is designing an appropriate machine learning model architecture, one capable of memorizing and generalizing enough language patterns for a given problem, such as machine translation, sentiment analysis or text summarization.
The second challenge is preparing a sufficient amount of training data, as machine learning systems can recognize and memorize only the patterns present in the training dataset.
The final challenge is deploying the trained machine learning model onto a dedicated cloud or on-device platform.
We address these challenges by harnessing the vast expertise of our engineers, sophisticated approaches to data collection, and endless experimentation with state-of-the-art machine learning architectures.
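The training idea described above, extracting patterns from example sentence pairs and storing them as a model, can be illustrated with a deliberately tiny sketch. It is not SRPOL's method and uses no neural network: as an assumption for illustration, it swaps in a simple count-based word aligner that scores word pairs with the Dice co-occurrence coefficient. The Polish-English sentence pairs and the function names `train_lexicon` and `translate` are hypothetical.

```python
from collections import Counter
from itertools import product


def train_lexicon(pairs):
    """'Train' a word-to-word translation table from aligned sentence pairs.

    Toy, count-based stand-in for neural training (hypothetical example):
    each source word is mapped to the target word it co-occurs with most
    consistently, scored with the Dice coefficient.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count every (source word, target word) co-occurrence in this pair.
        pair_count.update(product(src_words, tgt_words))

    def dice(s, t):
        return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

    # The "model" is the memorized pattern: the best-scoring target word for
    # each source word. Ties are broken by comparing the words themselves,
    # so the result does not depend on set iteration order.
    return {s: max(tgt_count, key=lambda t: (dice(s, t), t)) for s in src_count}


def translate(model, sentence):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(model.get(w, w) for w in sentence.lower().split())


if __name__ == "__main__":
    # Hypothetical Polish -> English training examples.
    training_pairs = [
        ("koty śpią", "cats sleep"),
        ("psy śpią", "dogs sleep"),
        ("koty jedzą", "cats eat"),
    ]
    model = train_lexicon(training_pairs)
    print(translate(model, "psy jedzą"))  # unseen sentence -> "dogs eat"
```

Because *psy* and *jedzą* co-occur most consistently with *dogs* and *eat*, the toy model translates "psy jedzą" correctly even though that exact sentence never appears in its training data, a miniature version of the generalization described above. A real system replaces this counting step with a neural network trained on millions of cleaned sentence pairs, which is where the three challenges come in.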
Q: Can you please briefly introduce your AI Team, the Samsung R&D Institute Poland (SRPOL) and the kind of work that goes on there?
SRPOL is one of the largest international software R&D centers in Poland. It has sites in two cities: Warsaw, the capital of Poland, and Cracow, a major technology hub in the region. We collaborate closely with local start-ups, universities and research institutions.
The mission of the AI Team at SRPOL is to create AI-based features, tools and services that facilitate and enrich people's lives. We focus mainly on NLP and Audio Intelligence, but we also have expertise in many other specialties, including recommendation systems, indoor positioning, visual analytics and AR.
Q: As head of the Polish Institute’s AI Team since 2018, you have overseen a myriad of projects, both within and beyond NLP. What are you and your team working on now?
In NLP, we are continuing a journey that began over 10 years ago with the development of systems such as Machine Translation, Dialogue Systems (including Question Answering) and Text Analytics. We work both on scalable, powerful cloud-based services and on fast on-device applications that work offline.
Audio Intelligence is a newer area for us. We began focusing our research on it several years ago, as the area was gaining importance. Currently, we work on sound recognition, separation, enhancement and analysis. In our work, we consider every level of audio processing, from acoustic scene understanding to fine-tuning embedded audio algorithms on devices with very limited hardware resources, such as wireless earbuds.
Q: Your technological focuses include NLP, text & data mining, audio intelligence and more. Has your research directly affected the development of any specific Samsung product or service, and what benefit has your team’s contribution offered to users?
SRPOL has a long track record of commercializing AI technologies, but we did not do it alone. We are proud to be part of a bigger picture, in which SRPOL works closely with other Samsung R&D centers and contributes to commercialization.
For example, we contributed to the development of several intelligent text entry features for Samsung’s mobile devices, including the on-screen keyboard, the hashtag feature, Samsung Notes title recommendation and smart text replies on smartwatches.
We also contributed to the Galaxy Store’s Recommendation System, which suggests the most interesting games to a user based on their preferences.
Q: As an advocate for emerging AI fields such as audio intelligence, what do you see as the main trends in your industry right now? How will this technology affect people’s daily lives?
I do believe that audio intelligence will be the next game-changer for all consumer electronics devices. Working on audio analytics is extremely important, as it is the missing piece in advanced, truly human-centered AI systems.
Powerful NLP systems analyze the user’s intent as expressed in text and speech. Computer vision algorithms are behind almost every camera and much of the visual content we see. For most of us, it is hard to imagine driving a car without navigation, typing a message without a spelling corrector, or searching for information without the Internet. Yet, apart from a few professional applications, we still very rarely use intelligent audio technology to enhance our hearing. In my opinion, this is set to change soon.
Imagine a widely available technology that lets people select what they hear and how they hear it. For example, during lunch with a friend in a park in a busy city center, someone could choose to hear only the sounds of nature and the person they are speaking with. Or imagine an advanced VR or AR system, recently referred to as the Metaverse, that creates an immersive 3D audio experience directly in people’s heads. These two concepts alone generate hundreds of possible new use cases, but let’s go further. How about hearing things that are currently inaudible to people? Today, humans can hear only a narrow slice of the sound spectrum. Our world is full of meaningful sounds that current AI technologies, for the most part, do not yet address. As audio intelligence technologies develop, I believe all of this will have a huge effect on people’s lives.
▲ Researchers at Samsung R&D Institute Poland work on Active Noise Cancellation (ANC) technology development with a Head & Torso Simulator (HATS) in an anechoic room.
Q: How have you been incorporating the current trends into the research you do at Samsung R&D Institute Poland?
Aside from NLP and Audio, we are also working to find the most effective ways to build truly multimodal systems. To do that, we conduct research and analyze use cases from different perspectives. This analysis is possible thanks to our diverse, interdisciplinary team of engineers, linguists, data scientists and more.
Q: What has been your most important achievement at SRPOL so far?
That would be our Machine Translation solution. It has won at international competitions for five years straight: the International Workshop on Spoken Language Translation (IWSLT) from 2017 to 2020, the Workshop on Machine Translation (WMT) in 2020, and the Workshop on Asian Translation (WAT) in 2021. These are among the most prestigious international competitions in our field.
Winning recognition at WAT this year was a particularly satisfying milestone, because developing our solution for Asian languages was initially a difficult feat for us as Polish engineers. This achievement has proven that the power of our technology goes well beyond a mere demo showcase.
Another achievement I am very proud of is how quickly our audio intelligence team and its technology have grown. In just a few years, starting pretty much from scratch, we reached the podium at the Detection and Classification of Acoustic Scenes and Events (DCASE) workshop two years in a row, in 2019 and 2020. We have also published several scientific papers and patents in this area. I am sure this is just the beginning of our work in this field.
An interview with Bin Dai, a machine learning expert from Samsung Research Institute China-Beijing, can be found in the next episode.