Loie Sun (孙络祎)

I am a PhD student at the College of Computer Science and Technology, Zhejiang University, and a research intern at Shanghai AI Laboratory, supervised by Prof. Weidi Xie. I obtained my MSc from the Shanghai Film Academy, Shanghai University, and my bachelor's degree from the School of Software, Yunnan University.

My research focuses on multi-modal representation learning, omni-modal understanding, and audio spatio-temporal grounding.

CV  /  Email: loie@zju.edu.cn  /  GitHub

Research
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
Luoyi Sun, Xiao Zhou, Zeqian Li, Ya Zhang, Yanfeng Wang, Weidi Xie
2026

Uses large audio-language models for fine-grained audio temporal grounding and introduces a challenging needle-in-a-haystack benchmark.

From Sensing to Reasoning: Empowering Long-form Omni-modal Understanding with Robust Audio Perception
Kaiying Yan, Luoyi Sun, Xiao Zhou, Weidi Xie
2026

Introduces the AVDC dataset and AVDC-QA-CoT, together with a two-stage training scheme, to advance audio-visual omni-modal understanding.

Knowledge-enhanced Pretraining for Vision-language Pathology Foundation Model on Cancer Diagnosis
Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ge Wang, Ruifen Wang, Lifeng Wang, Xiaojun Yuan, Xin Sun, Ya Zhang, Kun Sun, Yanfeng Wang, Weidi Xie
Cancer Cell, 2026
arXiv / Webpage / Dataset / Code / Model

Integrates disease knowledge into pathology vision-language pretraining for cancer diagnosis.

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
ACM MM, 2024
arXiv / Webpage / Dataset / Code / Model

Uses computer vision tools to generate a large-scale, high-quality audio-language dataset.

Sound Generation Method with Timing-aligned Visual Feature Mapping
Zhifeng Xie, Luoyi Sun, Yuzhou Sun, Chunpeng Yu, Lizhang Ma
CADCG, 2022
pdf

Proposes a framework that generates high-quality sound for silent videos, aligned with the video in both content and timing.

Multi-Scale Graph Convolutional Interaction Network For Salient Object Detection
Wenqi Che, Luoyi Sun, Zhifeng Xie, Youdong Ding, Kanli Han
ICIP, 2021
pdf

Proposes the multi-scale graph convolutional interaction network (MGCINet), which achieves state-of-the-art results on five benchmark datasets.

Patents

  • Audio Perception-Driven Approach and System for Omni-modal Understanding of Long-form Videos, 2026
    Weidi Xie, Luoyi Sun, Kaiying Yan, Xiao Zhou
  • A Speech-Driven Editable Face Reenactment Method, 2023
    Jiaheng Zheng, Shiyu Xia, Luoyi Sun, Zhifeng Xie
  • A Video Scene Segmentation Method Based on Multimodal Semantic Interaction, 2023
    Yihui Liao, Zhiwen Jiang, Luoyi Sun, Zhifeng Xie

Honors

  • Outstanding Graduate, Shanghai Municipal Education Commission, 2023
  • National Scholarship, Ministry of Education, 2022
  • First Prize Scholarship, Shanghai University (Top 5%), 2020, 2021, 2022
  • Second Class Prize, National Post-Graduate Mathematical Contest in Modeling, 2021
  • Second Prize Scholarship, Yunnan University (Top 10%), 2017, 2018
  • Excellent Student Cadre, Yunnan University (Top 5%), 2017, 2018

Hobbies

  • Swimming, Cycling, Rock Climbing, Singing, Photography, Watching Movies, Piano

