Three LINE Research Papers Selected for Largest International Conference on Acoustics, Speech, and Signal Processing, ICASSP

2022.02.17 Technology

● 10 papers from LINE and NAVER selected in total, including seven co-researched pieces

● Quality improvement and practical application of speech recognition and audio source separation technologies were highly recognized

 

TOKYO – February 17, 2022 – LINE Corporation ("LINE") is pleased to announce that three of its research papers have been selected for presentation at the renowned International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022.

 

Held by the IEEE Signal Processing Society, ICASSP is the world's largest international conference in the field of acoustics, speech, and signal processing. The influential conference has a long history and will be held for the 47th time this year. For LINE, the selection of three papers, in addition to seven from joint research with NAVER, builds on the acceptance of seven papers in 2021.*1

 

The authors of the selected papers will present their work at ICASSP 2022 to be held in May this year.

 

*1 Seven LINE Research Papers Selected for ICASSP 2021, the Largest International Conference on Acoustics, Speech, and Signal Processing https://linecorp.com/en/pr/news/en/2021/3640

 

LINE's fundamental research

At LINE, we position AI as a strategic business. In addition to basic research into the underlying technologies, many of our projects to build AI-based services and features are carried out jointly with NAVER engineers. Through these efforts, we aim to accelerate both R&D into AI technologies and the growth of our AI-driven businesses.

Our basic research on AI is centered on machine learning and covers many topics, including speech, language, and image processing. In the field of speech, acoustics, and signal processing, we have been researching a variety of technologies: Parallel WaveGAN, known for its fast, high-quality text-to-speech synthesis; audio source separation, which separates different sounds from one another; end-to-end speech recognition, which converts speech directly to text; and environmental sound recognition, for which our team won first place in Task 4 of the international DCASE2020 Challenge.

 

LINE's recognitions at ICASSP 2022

This year, LINE's research papers were selected in three categories: speech recognition, audio source separation, and self-supervised learning. The paper on speech recognition describes how to achieve non-autoregressive ASR*2 with fewer parameters using self-conditioned CTC*3. The study focuses on the redundancy across the neural network layers of self-conditioned CTC and builds a compact ASR model by recursively reusing a small network that performs similar operations. The resulting model achieves performance comparable to standard self-conditioned CTC while using only 38% as many parameters.
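
As a rough illustration of this "folded" approach, the sketch below applies a single small encoder block repeatedly and feeds intermediate CTC predictions back into the features (self-conditioning). All class and parameter names are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class FoldedSelfConditionedEncoder(nn.Module):
    """Sketch: one shared encoder block reused several times with self-conditioning."""
    def __init__(self, dim=256, vocab_size=100, num_folds=4):
        super().__init__()
        # A single small block whose weights are shared across all folds.
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.ctc_head = nn.Linear(dim, vocab_size)    # intermediate/final CTC logits
        self.reproject = nn.Linear(vocab_size, dim)   # feeds predictions back into features
        self.num_folds = num_folds

    def forward(self, x):                             # x: (batch, time, dim)
        intermediate_logits = []
        for _ in range(self.num_folds):
            x = self.block(x)                         # same parameters at every fold
            logits = self.ctc_head(x)
            intermediate_logits.append(logits)
            x = x + self.reproject(logits.softmax(dim=-1))  # self-conditioning
        return logits, intermediate_logits            # all outputs trained with CTC loss
```

Because the block is shared rather than stacked as independent layers, the parameter count stays roughly constant as the effective depth grows, which is how a much smaller model can match the original architecture.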

The paper selected in the field of audio source separation studies the signal-to-distortion ratio (SDR), a metric used to evaluate separation outputs during algorithm development. The work achieves SDR computation that is 10 to 100 times faster than conventional implementations.*4
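
For context, the sketch below computes a simplified, scale-invariant form of SDR in NumPy. The full bss_eval-style SDR that the paper accelerates additionally allows a short distortion filter between the reference and the estimate, which is the computationally expensive part; this sketch only illustrates the metric, not the paper's fast algorithm.

```python
import numpy as np

def si_sdr(reference, estimate):
    """Scale-invariant SDR (in dB) between a reference signal and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Project the estimate onto the reference to obtain the "target" component.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    error = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(error ** 2))

# Example: an estimate that is the reference plus 10% noise scores roughly 20 dB.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
print(f"SI-SDR: {si_sdr(ref, est):.1f} dB")
```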

The paper on self-supervised learning proposes a general-purpose model pre-trained on audio data. While conventional self-supervised learning uses a loss function defined from a single perspective, the paper proposes new loss functions defined from multiple perspectives. The new definitions not only make it easier to manage training data for self-supervised learning but also allow the pre-trained model to be applied to a wider range of tasks.
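
The sketch below illustrates the general idea of combining losses defined from several sampling "perspectives"; the specific strategies and the contrastive loss shown are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, temperature=0.1):
    """Simple InfoNCE-style loss: each anchor should match its own positive."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.T / temperature
    labels = torch.arange(anchors.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)

def multi_perspective_loss(encoder, batch, sampling_strategies, weights):
    """Sum of losses, each built from a different way of drawing positive pairs
    (e.g. two crops of the same clip, or clips sharing the same label)."""
    total = 0.0
    for strategy, weight in zip(sampling_strategies, weights):
        anchor_audio, positive_audio = strategy(batch)
        total = total + weight * contrastive_loss(encoder(anchor_audio),
                                                  encoder(positive_audio))
    return total
```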

 

*2 Non-autoregressive ASR is a speech recognition method that predicts words without referencing previously recognized text. The method recognizes speech faster by predicting words in parallel.

*3 Self-conditioned CTC is a state-of-the-art non-autoregressive ASR technology. It performs recognition in the intermediate layers of a neural network and adds those intermediate predictions back into the features passed to subsequent layers, refining the final prediction. [Nozaki and Komatsu, Interspeech 2021]

*4 The code for this research is published on GitHub ( https://github.com/fakufaku/fast_bss_eval ).

 

Accepted papers

● NON-AUTOREGRESSIVE ASR WITH SELF-CONDITIONED FOLDED ENCODERS

 Tatsuya Komatsu

● SDR -- MEDIUM RARE WITH FAST COMPUTATIONS

 Robin Scheibler

● SELF-SUPERVISED LEARNING METHOD USING MULTIPLE SAMPLING STRATEGIES FOR GENERAL-PURPOSE AUDIO REPRESENTATION

 Ibuki Kuroyanagi, Tatsuya Komatsu

 

Future goals

LINE's AI tech brand, LINE CLOVA, aims to help create a more convenient and enriching world by resolving the hidden complications in daily life and business and by elevating the quality of social functions and living through diverse AI technologies and services. Currently, LINE CLOVA offers CLOVA Speech (speech recognition), CLOVA Voice (speech synthesis), and solutions that combine these speech technologies. One example is LINE AiCall, which combines CLOVA Speech and CLOVA Voice with a dialogue control system; governments and restaurants have increasingly adopted the solution, using AI to respond naturally to user requests and guide users to their goal. Another is CLOVA Note, an application announced last year that transcribes meeting conversations with high accuracy and records and manages this information as minutes, thanks to a speech recognition model capable of analyzing many hours of recorded audio. At LINE, we will continue promoting basic research into AI technologies to enhance the quality of our existing services and to build new features and services.

 

LINE will continue to actively work on developing businesses and boosting service value to further expand its growth and vast potential as a communication platform.