Publications
[Google Scholar] [Semantic Scholar]
(*: equal contribution)
Preprints
Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
Chengzu Li*, Zanyi Wang*, Jiaang Li*, Yi Xu, Han Zhou, Huanyu Zhang, Ruichuan An, Dengyang Jiang, Zhaochong An, Ivan Vulić, Serge Belongie, Anna Korhonen
arXiv.
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing
Huanyu Zhang*, Xuehai Bai*, Chengzu Li*, Chen Liang, Haochen Tian, Haodong Li, Ruichuan An, Yifan Zhang, Anna Korhonen, Zhang Zhang, Liang Wang, Tieniu Tan
arXiv.
Latent Sketchpad: Autoregressive Visual Latent Generation for Interpretable Visual Thoughts in MLLMs
Huanyu Zhang*, Wenshan Wu*, Chengzu Li, Ning Shang, Yan Xia, Yangyu Huang, Yifan Zhang, Li Dong, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei
arXiv.
2026
2025
2024
2023
Binding Language Models in Symbolic Languages
Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu.
ICLR 2023 (spotlight). [code]
2022
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu.
EMNLP 2022, main (oral). [code]