Zhuowan Li (李卓婉)


I am a software engineer at Google DeepMind. I am currently focusing on personalization of large language models.

I completed my Ph.D. at Johns Hopkins University in 2024, co-advised by Prof. Alan Yuille and Prof. Benjamin Van Durme. I am a member of the CCVL lab. I received my B.E. degree from Tsinghua University in 2018, where I double-majored in Electronic Engineering and Journalism and Communication. I have also interned at Amazon AWS, Meta AI, Adobe Research and SenseTime.

In my spare time, I am a big fan of outdoor sports, including rock climbing, snowboarding, skiing, hiking, and mountaineering. I have recently started learning tennis.

CV  /  Google Scholar  /  Twitter  /  GitHub


Email: lizhuowan14 at gmail dot com


News
  • [Nov 2024] I will attend EMNLP 2024 in person in Miami. Happy to connect!
  • [June 2024] I will attend CVPR 2024 in person in Seattle. Happy to chat!
  • [Feb 2024] I graduated from JHU and joined Google as a software engineer!
  • [June 2023] I will attend CVPR 2023 in person in Vancouver. Let me know if you want to talk with me!
  • [May 2023] Started as an applied scientist intern at Amazon AWS.
  • [May 2023] Invited talk at the Computational Cognitive Science Lab at MIT.
  • [February 2023] Super-CLEVR was accepted to CVPR 2023 as a Highlight.
  • Last updated: 2024/10/22.


    Publications
    Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
    Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky
    EMNLP Industry Track, 2024
    arXiv / poster

    ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
    Yuxuan Wang, Alan Yuille, Zhuowan Li*, Zilong Zheng*
    COLM, 2024
    arXiv / code

    Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
    Zhuowan Li*, Bhavan Jasani*, Peng Tang, Shabnam Ghadar
    CVPR, 2024
    arXiv

    Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
    Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang
    CVPR (Highlight, top 2.8%), 2024
    arXiv / code

    On the Diagnosis and Generalization of Compositional Visual Reasoning
    Zhuowan Li
    Ph.D. thesis, 2024
    pdf

    Localization vs. Semantics: How Can Language Benefit Visual Representation Learning?
    Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
    EACL, 2024
    arXiv / code (to be released)

    3D-Aware Visual Question Answering about Parts, Poses and Occlusions
    Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille
    NeurIPS, 2023
    arXiv / code and dataset

    Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
    Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
    CVPR (Highlight, top 2.5%), 2023
    project page / arXiv / code and dataset

    Visual Commonsense in Pretrained Unimodal and Multimodal Models
    Chenyu Zhang, Benjamin Van Durme, Zhuowan Li*, Elias Stengel-Eskin*
    NAACL (Oral), 2022
    arXiv / code and dataset

    SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
    Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille
    CVPR, 2022
    arXiv / code

    Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
    Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
    ICCV, 2021
    arXiv / code

    Context-Aware Group Captioning via Self-Attention and Contrastive Features
    Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille
    CVPR, 2020
    arXiv / project page

    FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
    Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
    NeurIPS, 2018
    arXiv / project page / code


    Website theme stolen from Jon Barron.