About Me
I am a tenure-track Assistant Professor and PI at Westlake University, where I lead the AGI Lab. Before joining Westlake University, I worked as a scientist at Tencent.
I obtained my Ph.D. from the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore, under the supervision of Prof. Guosheng Lin. I also collaborate closely with Prof. Chunhua Shen and Prof. Rui Yao. I was recognized among the World's Top 2% Scientists by Stanford University in 2023 and 2024.
To learn more about our lab, feel free to contact me or any member of our team 😊
Research Interests
My current research focuses on Generative AI, including theoretical foundations of generative models, multimodal generative modeling, and multimodal intelligent agents. In the past, my work has also spanned broader areas in machine learning and computer vision.
News
Academic Service
- Associate Editor for TCSVT since 2024
- Area Chair for ICML 2026, ICLR 2026, CVPR 2026, ACL ARR 2025, ACM Multimedia 2025, IJCNN 2025
- Reviewer for T-PAMI, ICLR, CVPR, NeurIPS, etc.
Hobbies
I like singing and playing football. I am a loyal fan of FC Barcelona, PSG, and Inter Miami.
My favorite singers are Jacky Cheung and Freddie Mercury.
Selected Projects
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
ICLR 2025
MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
ICCV 2025
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
CVPR 2024
Parallel Diffusion Solver via Residual Dirichlet Policy Optimization
IEEE TPAMI 2026
SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls
CVPR 2026
CRAFT-LoRA: Content-Style Personalization via Rank-Constrained Adaptation and Training-Free Fusion
CVPR 2026
Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations
ICME 2026
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
CVPR 2026 Findings
Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
CVPR 2026
FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing
CVPR 2026
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE Transactions on Multimedia 2025
DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes
ACM MM 2025
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
CVPR 2025
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
ACL 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
ECCV 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
CVPR 2024
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
arXiv 2024
ChartLlama: A Multimodal LLM for Chart Understanding and Generation
arXiv 2023