Siru Zhong

I am a Ph.D. student in Data Science and Analytics at HKUST Guangzhou, supervised by Prof. Yuxuan Liang.

My research focuses on spatio-temporal representation learning with cross-domain, multimodal data. I am also interested in efficient AI, aiming to create effective, balanced datasets and develop lightweight, transferable models.

Previously, I was a Research Intern at XPENG, working on visual multimodal research for the XNGP System, and a Software Engineer at Tencent, enhancing QQ’s performance and developing cloud-native tools.

Education
  • Hong Kong University of Science and Technology (Guangzhou)
    Ph.D. in Data Science and Analytics, Information Hub
    Feb. 2025 - Jan. 2028
  • Hong Kong University of Science and Technology (Guangzhou)
    M.Phil. in Data Science and Analytics, Information Hub
    Aug. 2023 - Jan. 2025
  • Hefei University of Technology
    B.S. in Internet of Things Engineering, School of Computer Science and Information Engineering
    Sep. 2018 - Jul. 2022
Experience
  • XPENG, Guangzhou
    Research Intern, Autonomous Driving Center, Visual Imaging
    May 2024 - Jul. 2024
  • HKUST, Guangzhou
    Research Assistant, CityMind Lab, Urban Multimodal
    May 2023 - Aug. 2023
  • Tencent, Shenzhen
    Software Engineer, CloudIDE Center, Container Orchestration
    Jul. 2022 - May 2023
  • Tencent, Beijing
    Software Intern, QQ Efficiency Center, DevOps
    Jun. 2021 - Sep. 2023
Honors & Awards
  • Best Project Award (1st/15) in Data Science Computing, HKUST(GZ)
    2023
  • Outstanding Student Award (Top 10%) in Red Bird Challenge Camp, HKUST(GZ)
    2023
  • iCode Certification — R&D Engineering Competency Evaluation, Tencent
    2022
  • Silver Award (2nd/12) in Code World Program, Tencent
    2022
  • Outstanding Student Award (8th/100) in New Employee Training, Tencent
    2022
  • Outstanding Graduation Thesis Award (Top 2%), HFUT
    2022
  • Third Prize in National College English Competition
    2021
  • First Prize (Top 10) in CSDN Technology Blogger Competition
    2020
  • Second Prize in 2D Robotics Competition, HFUT
    2020
  • School Scholarship, HFUT
    2018-2021
Extracurricular Activities
  • Greater Bay Area Science Forum attendee, Guangzhou, China
    2024
  • ACM Multimedia Conference attendee, Melbourne, Australia
    2024
  • China National Computer Conference attendee, Yiwu, China
    2024
  • HKUST-GZ System Hub Welcome Party performer, Guangzhou, China
    2023
  • Tencent New Year Gala performer, Shenzhen, China
    2023
  • HFUT Chorus member, Hefei, China
    2018-2022
  • HFUT External Relations Department member, Hefei, China
    2018-2019
News
2024
I was invited to serve as a reviewer for ICLR'25
Aug 23
One paper on Urban Multimodal Image-Text Retrieval was accepted by MM'24
Jul 18
One paper on Spatio-Temporal Prediction and Carpark Dataset was accepted by IJCAI'24
May 14
One paper on Spatio-Temporal Field Neural Network was accepted by IJCAI'24
May 14
One paper on Urban LLM was accepted by WWW'24
Jan 23
Selected Publications
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation

Siru Zhong, Xixuan Hao, Yibo Yan, Ying Zhang, Yangqiu Song, Yuxuan Liang

ACM International Conference on Multimedia (ACM MM) 2024 Poster

Introduce UrbanCross, a cross-domain satellite image-text retrieval framework that leverages multimodal enhancements and adaptive domain adaptation techniques to bridge diverse urban landscapes, achieving up to a 15% improvement in retrieval performance.

Spatio-Temporal Field Neural Networks for Air Quality Inference

Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Kun Wang, Shifen Cheng, Yuxuan Liang

The International Joint Conference on Artificial Intelligence (IJCAI) 2024

Present the Spatio-Temporal Field Neural Network and Pyramidal Inference framework, which integrate field and graph perspectives to achieve state-of-the-art nationwide air quality inference in Mainland China.

Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

The International Joint Conference on Artificial Intelligence (IJCAI) 2024

Introduce DeepPA, a deep-learning framework and the SINPA dataset for accurately predicting real-time parking availability across Singapore, outperforming existing models and supporting urban planning through a deployed web platform.

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

The International World Wide Web Conference (WWW) 2024 Oral

Introduce UrbanCLIP, the first large language model–enhanced framework that integrates textual descriptions with satellite imagery through contrastive language-image pretraining, significantly improving urban region profiling performance across major cities.
