Research on privacy-preserving data synthesis recognized for a framework that outperforms existing methods
TOKYO – February 10, 2022 – LINE Corporation ("LINE") announced today that its research paper has been selected for presentation at The Tenth International Conference on Learning Representations (ICLR 2022).
ICLR is the world's most renowned international conference in the field of deep learning. The conference is being held for the tenth time in 2022, and the authors will present their work at the virtual event, which runs from April 25 to 29.
Research on privacy-preserving data synthesis well received and accepted
The research proposes a new framework called PEARL (Private Embeddings and Adversarial Reconstruction Learning), which trains generative models*1 while guaranteeing differential privacy (DP)*2, a privacy standard that has recently garnered attention in the fields of data analysis and machine learning. Existing deep learning approaches that ensure DP (e.g. DP-SGD)*3 cannot train generative models adequately because they must access the data again at every training iteration, and the number of such accesses is limited by the DP guarantee. PEARL instead creates embedding vectors*4 that extract the characteristics of sensitive data in a differentially private manner and uses them to train deep generative models, rather than reusing the original data. As a result, unlike existing methods, PEARL places no limit on the number of training iterations it can perform. The framework is realized by building the embedding vectors from the characteristic function*5 and the objective functions from the adversarial re-weighting objective*6. Experiments on multiple datasets show that PEARL outperforms other methods at reasonable levels of privacy (i.e. with the privacy parameter ε at one or less).
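The "sanitize once, then train freely" idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: PEARL trains deep generative models with an adversarial re-weighting objective, whereas the sketch swaps in a toy one-parameter Gaussian model fitted by grid search. All function names, the frequency grid, and the sensitivity bound (which assumes replace-one neighbouring datasets) are assumptions made for illustration.

```python
import cmath
import math
import random


def empirical_cf(data, freqs):
    """Empirical characteristic function: mean of exp(i*t*x) at each frequency t."""
    n = len(data)
    return [sum(cmath.exp(1j * t * x) for x in data) / n for t in freqs]


def sanitize_once(cf_values, n, eps, delta, rng):
    """Add Gaussian noise to the embedding a single time.

    Assumption: under replace-one neighbours each CF value moves by at most
    2/n, so the L2 sensitivity over k frequencies is 2*sqrt(k)/n. The noise
    scale is the classic Gaussian-mechanism calibration, valid for 0 < eps < 1.
    """
    k = len(cf_values)
    sensitivity = 2.0 * math.sqrt(k) / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return [
        complex(v.real + rng.gauss(0.0, sigma), v.imag + rng.gauss(0.0, sigma))
        for v in cf_values
    ]


def gaussian_cf(mu, freqs):
    """Analytic characteristic function of the toy model N(mu, 1)."""
    return [cmath.exp(1j * t * mu - 0.5 * t * t) for t in freqs]


def fit_mean(noisy_cf, freqs, candidates):
    """Pick the candidate mean whose CF is closest to the noisy embedding.

    Only the sanitized embedding is read here, never the sensitive data.
    """
    def loss(mu):
        model_cf = gaussian_cf(mu, freqs)
        return sum(abs(a - b) ** 2 for a, b in zip(model_cf, noisy_cf))
    return min(candidates, key=loss)


rng = random.Random(0)
sensitive = [rng.gauss(2.0, 1.0) for _ in range(5000)]  # stand-in for private data
freqs = [0.25, 0.5, 0.75, 1.0, 1.25]                    # hypothetical frequency grid

# The sensitive data is accessed exactly once, to build the noisy embedding.
noisy_cf = sanitize_once(
    empirical_cf(sensitive, freqs), len(sensitive), eps=0.5, delta=1e-5, rng=rng
)

candidates = [i * 0.05 for i in range(81)]  # means from 0.0 to 4.0
mu_hat = fit_mean(noisy_cf, freqs, candidates)
print(round(mu_hat, 2))
```

Because `fit_mean` reads only `noisy_cf`, the search could run for any number of steps without spending more privacy budget; this follows from the post-processing property of differential privacy.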
This marks LINE's second research paper on privacy-preserving data synthesis that has been accepted at an international conference, with the first one being accepted at ICDE2021, one of the world's leading forums on data engineering. Going forward, LINE will continue to conduct R&D on data science, privacy protection using AI, and guaranteed reliability.
*1 Generative model: A machine learning model that generates data similar to the training data.
*2 Differential privacy: A statistical privacy standard. Theoretically, the smaller the privacy parameter (ε), the stronger the privacy guarantee for the data.
*3 DP-SGD: A technique that ensures DP by adding noise to gradients during training. Because the value of ε increases with the number of training iterations, the total number of iterations that can be performed is limited.
*4 Embedding vector: A numerical representation of the elements and characteristics of data in a relatively low-dimensional space.
*5 Characteristic function: A function that accurately characterizes the overall distribution of data even in low dimensions.
*6 Adversarial re-weighting objective: An objective that promotes training by giving greater weight to the areas in which training is insufficient.
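The trade-off described in notes *2 and *3 can be made concrete with a rough sketch. It uses basic sequential composition, under which the privacy costs of repeated data accesses simply add up; practical DP-SGD implementations use tighter accounting, so the numbers are illustrative only. The function names and the per-step cost are hypothetical.

```python
def total_epsilon(eps_per_step, steps):
    """Basic sequential composition: each data access adds its own epsilon."""
    return eps_per_step * steps


def max_steps(eps_budget, eps_per_step):
    """Largest number of data-touching iterations that fits within the budget."""
    # The small tolerance guards against floating-point rounding in the division.
    return int(eps_budget / eps_per_step + 1e-9)


# With a budget of epsilon = 1 and a hypothetical cost of 0.01 per iteration,
# training must stop after a fixed number of data-touching steps.
print(max_steps(1.0, 0.01))

# A one-time release pays for a single access, however long training runs afterwards.
print(total_epsilon(1.0, 1))
```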
"PEARL: Data Synthesis via Private Embeddings and Adversarial Reconstruction Learning"
Seng Pei Liew, Tsubasa Takahashi, and Michihiko Ueno (LINE AI Research)
● Evaluation of PEARL by comparing the proposed and existing methods of generating images at practical privacy levels
The results show that the proposed method generates higher-quality images than the existing method.
● Evaluation of PEARL by comparing the proposed and existing methods of generating data at practical privacy levels
The graph shows that the proposed method (green) is able to model the trend of real data (blue) better than the existing method (orange).
Fundamental AI research
At LINE, AI is positioned as one of the company's strategic businesses. While collaborating with NAVER to create new AI services/features and conduct basic research into the underlying technologies, LINE endeavors to accelerate both R&D into AI tech and growth of its AI-driven businesses. Aiming to shorten the time between research, development, and production, teams in charge of data platform development, data analysis, machine learning, AI technology development, and basic research have also gone beyond their own businesses and domains to work together.
In basic research, LINE has placed machine learning at the center while focusing on research areas such as speech, language, and image processing. Recognition of LINE's research in 2021 includes the following:
- ICASSP, the international conference in the field of speech, acoustics, and signal processing, accepted seven of LINE's research papers, putting LINE among the top contributors in Japan*7
- ICCV 2021, the international conference on computer vision, accepted two of LINE's papers*8
- INTERSPEECH 2021, the international conference on speech processing, accepted six of LINE's papers*9
*7 Press release published on February 26, 2021: https://linecorp.com/en/pr/news/en/2021/3640
*8 Press release published on July 28, 2021: https://linecorp.com/ja/pr/news/ja/2021/3843 (Japanese only)
*9 Press release published on August 30, 2021: https://linecorp.com/en/pr/news/en/2021/3919
Going forward, LINE will continue to actively work on developing businesses and boosting service quality to further expand its growth and vast potential as a communication platform.