Siru Zhong
PhD @ HKUST Guangzhou, On Data-Centric Multimodal Learning

I am an M.Phil. student and an incoming Ph.D. student in Data Science and Analytics at the Hong Kong University of Science and Technology (Guangzhou), under the supervision of Prof. Yuxuan Liang and Prof. Yangqiu Song. I earned my B.E. from the School of Computer and Information at Hefei University of Technology.

My research interests include Multimodal Machine Learning and Data Mining, with a focus on Spatio-Temporal and Urban scenarios. I aim to derive insights from large-scale, heterogeneous, cross-domain data.

Previously, I was a Research Intern at XPENG, working on visual multimodal research and contributing to the XNGP System. I also worked as a Software Engineer at Tencent, where I enhanced QQ’s performance and developed cloud-native tools like CodeSpaces and the Workflow Engine.


Education
  • Hong Kong University of Science and Technology (Guangzhou)
    Ph.D. in Data Science and Analytics
    Feb. 2025 - Feb. 2028
  • Hong Kong University of Science and Technology (Guangzhou)
    M.Phil. in Data Science and Analytics
    Aug. 2023 - Jan. 2025
  • Hefei University of Technology
    B.S., School of Computer and Information
    Sep. 2018 - Jul. 2022
Experience
  • XPENG, Guangzhou
    Research Intern, Autonomous Driving Center
    May 2024 - Jul. 2024
  • HKUST, Guangzhou
    Research Assistant, CityMind Lab
    May 2023 - Aug. 2023
  • Tencent, Shenzhen
    Software Engineer, CloudIDE Team
    Jul. 2022 - May 2023
  • Tencent, Beijing
    Software Intern, Fiber Team
    Jun. 2021 - Sep. 2023
Honors & Awards
  • Best Project Award (1/15) in Data Science Computing (DSAA 5021) at HKUST(GZ)
    2024
  • Outstanding Student Award (Top 10%) of the Red Bird Challenge Camp at HKUST(GZ)
    2024
  • iCode Certification at Tencent
    2022
  • Silver Award (2/12) of Code World Program at Tencent
    2022
  • Outstanding Student Award (8/100) of Graduates Training at Tencent
    2022
  • Outstanding Graduation Thesis Award (Top 2%) at HFUT
    2022
  • Third Prize in National College English Competition
    2021
  • First Prize in CSDN Technology Blogger Competition (Top 10)
    2020
  • Second Prize in Robotics Competition (2D Project Team) at HFUT
    2020
News
2024
Served as a reviewer for ICLR'25
Aug 23
One paper on Urban Multimodal Image-Text Retrieval was accepted by MM'24
Jul 18
One paper on Spatio-Temporal Prediction and Carpark Dataset was accepted by IJCAI'24
May 14
One paper on Spatio-Temporal Field Neural Network was accepted by IJCAI'24
May 14
One paper on Urban LLM was accepted by WWW'24
Jan 23
Selected Publications
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation

Siru Zhong, Xixuan Hao, Yibo Yan, Ying Zhang, Yangqiu Song, Yuxuan Liang

ACM International Conference on Multimedia (ACM MM) 2024 Poster

First-ever cross-domain framework that integrates the power of LMM and SAM into satellite image-text retrieval.

Spatio-Temporal Field Neural Networks for Air Quality Inference

Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Kun Wang, Shifen Cheng, Yuxuan Liang

The International Joint Conference on Artificial Intelligence (IJCAI) 2024

A Spatio-Temporal Field Neural Network model that integrates two distinct perspectives on space and time to perform air quality inference.

Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

The International Joint Conference on Artificial Intelligence (IJCAI) 2024

A deep-learning model for predicting real-time parking availability in Singapore that accounts for external factors, introduced together with a new dataset.

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

The International World Wide Web Conference (WWW) 2024 Oral

A framework that applies contrastive language-image pretraining on web data to bring textual information into urban region profiling.

Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting

Jianxiang Zhou, Erdong Liu, Wei Chen, Siru Zhong, Yuxuan Liang

Under review. 2024

Introduces the Spatio-Temporal Graph Transformer (STGormer), a model that integrates attribute and structure information in traffic data to learn spatio-temporal correlations and uses a mixture-of-experts module to capture heterogeneity, achieving state-of-the-art performance in traffic forecasting.

UrbanVLP: A Multi-Granularity Vision-Language Pre-Trained Model for Urban Indicator Prediction

Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang

Under review. 2024

First urban region representation learning framework that explores multi-granularity cross-modal alignment.
