SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
Luoyi Sun, Xiao Zhou, Zeqian Li, Ya Zhang, Yanfeng Wang, Weidi Xie
2026
Utilizes large audio-language models for audio temporal grounding and constructs a challenging needle-in-a-haystack benchmark.
|
|
From Sensing to Reasoning: Empowering Long-form Omni-modal Understanding with Robust Audio Perception
Kaiying Yan, Luoyi Sun, Xiao Zhou, Weidi Xie
2026
Introduces the AVDC dataset and AVDC-QA-CoT, with two-stage training, to advance audio-visual omni-modal understanding.
|
|
Knowledge-enhanced Pretraining for Vision-language Pathology Foundation Model on Cancer Diagnosis
Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ge Wang, Ruifen Wang, Lifeng Wang, Xiaojun Yuan, Xin Sun, Ya Zhang, Kun Sun, Yanfeng Wang, Weidi Xie
Cancer Cell, 2026
ArXiv / Webpage / Dataset / Code / Model
Integrates disease knowledge into pathology vision-language pretraining for cancer diagnosis.
|
|
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
ACM MM, 2024
ArXiv / Webpage / Dataset / Code / Model
Utilizes computer vision tools to generate a large-scale, high-quality audio-language dataset.
|
|
Sound Generation Method with Timing-aligned Visual Feature Mapping
Zhifeng Xie, Luoyi Sun, Yuzhou Sun, Chunpeng Yu, Lizhang Ma
CADCG, 2022
pdf
A new framework for high-quality sound generation that matches silent videos in both content and timing.
|
|
Multi-Scale Graph Convolutional Interaction Network For Salient Object Detection
Wenqi Che, Luoyi Sun, Zhifeng Xie, Youdong Ding, Kanli Han
ICIP, 2021
pdf
Proposes the multi-scale graph convolutional interaction network (MGCINet), which achieves state-of-the-art results on five benchmark datasets.
|
|
Patents
- Audio Perception-Driven Approach and System for Omni-modal Understanding of Long-form Videos, 2026
Weidi Xie, Luoyi Sun, Kaiying Yan, Xiao Zhou
- A Speech-Driven Editable Face Reenactment Method, 2023
Jiaheng Zheng, Shiyu Xia, Luoyi Sun, Zhifeng Xie
- A Video Scene Segmentation Method Based on Multimodal Semantic Interaction, 2023
Yihui Liao, Zhiwen Jiang, Luoyi Sun, Zhifeng Xie
|
|
Honors
- Outstanding Graduate, Shanghai Municipal Education Commission, 2023
- National Scholarship, Ministry of Education, 2022
- The First Prize Scholarship, Shanghai University (Top 5%), 2020, 2021, 2022
- Second Class Prize, National Post-Graduate Mathematical Contest in Modeling, 2021
- The Second Prize Scholarship, Yunnan University (Top 10%), 2017, 2018
- Excellent Student Cadre, Yunnan University (Top 5%), 2017, 2018
|
|
Hobbies
- Swimming, Cycling, Rock Climbing, Singing, Photography, Watching Movies, Piano