Seven LINE Research Papers Selected for ICASSP 2021, the Largest International Conference on Acoustics, Speech, and Signal Processing

2021.02.26 Technology

● 14 papers selected to represent best of Japanese research — 5 from LINE, 7 from NAVER, and 2 co-researched

● Research on improving speech synthesis quality and voice recognition rates was well received

 

TOKYO – February 26, 2021 – LINE Corporation ("LINE") announced today that seven of its research papers have been selected for presentation at the renowned International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

 

Selected papers for ICASSP 2021

● PARALLEL WAVEFORM SYNTHESIS BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH VOICING-AWARE CONDITIONAL DISCRIMINATORS
R. Yamamoto, E. Song, M. Hwang, and J. Kim

 

● TTS-BY-TTS: TTS-DRIVEN DATA AUGMENTATION FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS
M. Hwang, R. Yamamoto, E. Song, and J. Kim

 

● END TO END LEARNING FOR CONVOLUTIVE MULTI-CHANNEL WIENER FILTERING
M. Togami

 

● DISENTANGLED SPEAKER AND LANGUAGE REPRESENTATIONS USING MUTUAL INFORMATION MINIMIZATION AND DOMAIN ADAPTATION FOR CROSS-LINGUAL TTS
D. Xin, T. Komatsu, S. Takamichi, H. Saruwatari

 

● SURROGATE SOURCE MODEL LEARNING FOR DETERMINED SOURCE SEPARATION
R. Scheibler, M. Togami

 

● REFINEMENT OF DIRECTION OF ARRIVAL ESTIMATORS BY MAJORIZATION-MINIMIZATION OPTIMIZATION ON THE ARRAY MANIFOLD
R. Scheibler, M. Togami

 

● JOINT DEREVERBERATION AND SEPARATION WITH ITERATIVE SOURCE STEERING
T. Nakashima, R. Scheibler, M. Togami, N. Ono

 

Research on improving speech synthesis quality and voice recognition rates

The research focused on Parallel WaveGAN, a non-autoregressive speech generation model*1 based on generative adversarial networks (GAN),*2 and on the use of voiced and unvoiced information to improve the performance of its discriminators. Conventional Parallel WaveGAN systems, which use a single discriminator, have suffered from poor quality when handling multi-speaker corpora because of limits on the discriminator's expressiveness and difficulties in training. The method proposed in the paper focuses on the differences between voiced and unvoiced speech, and significantly improves the quality of speech synthesis by designing a separate discriminator for each of the two types of speech. Though the research was a continuation of earlier work, ICASSP reviewers commended and selected the paper for the large-scale subjective evaluation experiment (using two female and two male speakers) conducted to verify the effectiveness of the proposed method.
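As a rough illustration of the idea of treating voiced and unvoiced speech separately, the Python sketch below masks a waveform with per-sample voicing flags and passes each masked copy to its own scoring function. It is only a toy under stated assumptions: the function names (split_by_voicing, mean_energy, score_with_two_discriminators) are hypothetical, the "discriminators" are simple stand-ins rather than neural networks, and nothing here is taken from the paper's actual implementation.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_voicing(
    samples: Sequence[float], voiced_flags: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Return (voiced, unvoiced) copies of the waveform.

    Each copy keeps the original length; samples that belong to the
    other class are replaced by 0.0 (a simple mask).
    """
    if len(samples) != len(voiced_flags):
        raise ValueError("samples and voiced_flags must have the same length")
    voiced = [s if v else 0.0 for s, v in zip(samples, voiced_flags)]
    unvoiced = [0.0 if v else s for s, v in zip(samples, voiced_flags)]
    return voiced, unvoiced


def mean_energy(segment: Sequence[float]) -> float:
    """Toy stand-in for a discriminator: average squared amplitude."""
    if not segment:
        return 0.0
    return sum(x * x for x in segment) / len(segment)


def score_with_two_discriminators(
    samples: Sequence[float],
    voiced_flags: Sequence[bool],
    voiced_disc: Callable[[Sequence[float]], float],
    unvoiced_disc: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Score the voiced and unvoiced parts with separate functions.

    In a real system each function would be a trained network
    (an assumption for illustration; not shown here).
    """
    voiced, unvoiced = split_by_voicing(samples, voiced_flags)
    return voiced_disc(voiced), unvoiced_disc(unvoiced)
```

For example, calling score_with_two_discriminators(wave, flags, mean_energy, mean_energy) returns one score per speech type, so each scorer only ever sees the kind of signal it is responsible for.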

In the field of audio source separation, LINE submitted a paper proposing a new method that combined iterative source steering (ISS)—an audio source separation method that does not utilize deep learning—with a deep learning-based estimation method for sound source models. Reviewers selected the paper after highly rating the framework's improved speech recognition rates over conventional ISS and the ability to apply it even with a non-fixed number of audio sources.

 

*1 A model that produces speech at each point in time, independent of previously generated speech. This model is computationally efficient because it can process in parallel.

*2 A type of generative machine learning model. Two neural networks, a generator and a discriminator, are trained against each other to generate new examples that are similar to the original dataset.
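The difference between autoregressive and non-autoregressive generation mentioned in the footnotes above can be sketched in a few lines of Python. This is a minimal toy, not LINE's model: the functions generate_autoregressive and generate_non_autoregressive and the arithmetic inside them are hypothetical stand-ins chosen only to show the data dependencies.

```python
from typing import Callable, List, Sequence


def generate_autoregressive(
    conditions: Sequence[float], step: Callable[[float, float], float]
) -> List[float]:
    """Each output depends on the previous output, so the loop
    has to run one time step after another."""
    outputs: List[float] = []
    previous = 0.0
    for cond in conditions:
        previous = step(cond, previous)
        outputs.append(previous)
    return outputs


def generate_non_autoregressive(
    conditions: Sequence[float], fn: Callable[[float], float]
) -> List[float]:
    """Each output depends only on its own input, so every time
    step could be computed at once (e.g. in parallel on a GPU)."""
    return [fn(cond) for cond in conditions]
```

Because no output in the second function refers to an earlier output, the time steps can be evaluated in any order or all at once, which is the source of the computational efficiency the footnote describes.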

 

LINE's basic research into speech, acoustics and signal processing focused on speech synthesis, audio source separation, and environmental sound recognition technologies

At LINE, AI is positioned as one of the company's strategic businesses. While collaborating with NAVER to create new AI services/features and conduct basic research into the underlying technologies, LINE endeavors to accelerate both R&D into AI tech and growth of its AI-driven businesses. Aiming to shorten the time between research, development, and production, teams in charge of data platform development, data analysis, machine learning, AI technology development, and basic research have also gone beyond their own businesses and domains to work together.

When it comes to basic research, LINE has placed machine learning at the center while focusing on research areas such as speech, language, and image processing. In the field of speech, acoustics, and signal processing, the company has researched a fast and high-quality GPU speech synthesis technology called Parallel WaveGAN, audio source separation technology that aims to separate various sounds from one another and improve sound quality and recognition rates, and environmental sound recognition technology that uses a machine to automatically detect and identify diverse sounds in the surrounding environment.

 

Aiming to proactively continue basic research into AI tech and enhance value of current services

LINE's AI tech brand, LINE CLOVA, aims to help create a more convenient and enriching world by resolving the hidden complications in daily life and business, and elevating the quality of social functions and living by utilizing diverse AI technologies and services. Currently, LINE CLOVA offers CLOVA Speech (speech recognition), CLOVA Voice (speech synthesis), and solutions that combine these speech technologies.

LINE AiCall is one example. The solution combines CLOVA Speech and CLOVA Voice with a dialogue control system, and governments and restaurants have increasingly adopted it, employing AI to give natural responses to user requests and guide users to their goal. Another is CLOVA Note, an application announced last year. It can detect conversations in meetings with high accuracy and record and manage this information as minutes. This high accuracy is due to the application's speech recognition model, which specializes in analyzing many hours of recorded sound data.

LINE CLOVA will continue striving to both enhance the quality of its existing offerings and create new features/services by proactively advancing basic research on AI tech.

 

Going forward, LINE will continue to forge ahead in developing businesses and boosting service value to further expand its growth and vast potential as a communication infrastructure.

 

■ About ICASSP

Held by the IEEE Signal Processing Society, ICASSP is the world's largest international conference in the field of speech, acoustics and signal processing. Boasting a long history, the influential conference will be held for the 46th time in 2021. Around 1,700 papers have been selected from among 3,600 submissions for ICASSP 2021, with authors of the selected papers to present their work at the virtual event in June.

 

■ About LINE Corporation

Based in Japan, LINE Corporation is dedicated to the mission of "Closing the Distance," bringing together information, services and people. The LINE messaging app launched in June 2011 and since then has grown into a diverse, global ecosystem that includes AI technology, fintech and more.