Loie Sun

Loie Sun (孙络祎)

I am a PhD student at the College of Computer Science and Technology, Zhejiang University, and an intern researcher at Shanghai AI Laboratory, supervised by Prof. Weidi Xie. I obtained my MSc from the Shanghai Film Academy, Shanghai University, and my bachelor's degree from the School of Software, Yunnan University.

My research focuses on multi-modal representation learning, omni-modal understanding, and audio spatio-temporal grounding.

CV / Email: loie@zju.edu.cn / GitHub / Google Scholar

Research

	SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
	Luoyi Sun, Xiao Zhou, Zeqian Li, Ya Zhang, Yanfeng Wang, Weidi Xie
	ACM MM, 2026
	ArXiv / Webpage / Benchmark / Code / Model
	Uses large audio-language models for audio temporal grounding and constructs a challenging needle-in-a-haystack benchmark.

	From Sensing to Reasoning: Empowering Long-form Omni-modal Understanding with Robust Audio Perception
	Kaiying Yan, Luoyi Sun, Xiao Zhou, Weidi Xie
	Submitting to ECCV, 2026
	Introduces the AVDC dataset and AVDC-QA-CoT, with two-stage training, to advance audio-visual omni-modal understanding.

	Knowledge-enhanced Pretraining for Vision-language Pathology Foundation Model on Cancer Diagnosis
	Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ge Wang, Ruifen Wang, Lifeng Wang, Xiaojun Yuan, Xin Sun, Ya Zhang, Kun Sun, Yanfeng Wang, Weidi Xie
	Cancer Cell, 2026
	ArXiv / Webpage / Dataset / Code / Model
	Integrates disease knowledge into pathology vision-language pretraining for cancer diagnosis.

	Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
	Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
	ACM MM, 2024
	ArXiv / Webpage / Dataset / Code / Model
	Uses computer vision tools to generate a large-scale, high-quality audio-language dataset.

	Sound Generation Method with Timing-aligned Visual Feature Mapping
	Zhifeng Xie, Luoyi Sun, Yuzhou Sun, Chunpeng Yu, Lizhang Ma
	CADCG, 2022
	PDF
	A new framework for high-quality sound generation that matches silent videos in both content and timing.

	Multi-Scale Graph Convolutional Interaction Network For Salient Object Detection
	Wenqi Che, Luoyi Sun, Zhifeng Xie, Youdong Ding, Kanli Han
	ICIP, 2021
	PDF
	Proposes the multi-scale graph convolutional interaction network (MGCINet), which achieves state-of-the-art results on five benchmark datasets.

Patents

Audio Perception-Driven Approach and System for Omni-modal Understanding of Long-form Videos, 2026
Weidi Xie, Luoyi Sun, Kaiying Yan, Xiao Zhou
A Speech-Driven Editable Face Reenactment Method, 2023
Jiaheng Zheng, Shiyu Xia, Luoyi Sun, Zhifeng Xie
A Video Scene Segmentation Method Based on Multimodal Semantic Interaction, 2023
Yihui Liao, Zhiwen Jiang, Luoyi Sun, Zhifeng Xie

Honors

Outstanding Graduate, Shanghai Municipal Education Commission, 2023
National Scholarship, Ministry of Education, 2022
First Prize Scholarship, Shanghai University (Top 5%), 2020, 2021, 2022
Second Class Prize, National Post-Graduate Mathematical Contest in Modeling, 2021
Second Prize Scholarship, Yunnan University (Top 10%), 2017, 2018
Excellent Student Cadre, Yunnan University (Top 5%), 2017, 2018

Hobbies

Swimming, Cycling, Rock Climbing, Singing, Photography, Watching Movies, Piano

Special Thanks~