Publications | Kaijie Zhu

2026

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

Kaijie Zhu, Yuzhou Nie, Yijiang Li, and 10 more authors

arXiv, 2026

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

Xian Wu, Kaijie Zhu, Wenbo Guo, and 1 more author

Submitted to ICML, 2026

DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle

Yuheng Tang, Kaijie Zhu, and others

ICLR, 2026


2025

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Zekun Li, Shinda Huang, Jiangtian Wang, and 7 more authors

ACL, 2025

MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison

Kaijie Zhu, Xianjun Yang, Jindong Wang, and 2 more authors

ICML, 2025

2024

PromptBench: A Unified Library for Evaluation of Large Language Models

Kaijie Zhu, Qinlin Zhao, Hao Chen, and 2 more authors

JMLR MLOSS, 2024

DyVal: Graph-informed Dynamic Evaluation of Large Language Models

Kaijie Zhu, Jiaao Chen, Jindong Wang, and 3 more authors

ICLR (Spotlight), 2024

EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus

Cheng Li, Jindong Wang, Kaijie Zhu, and 4 more authors

ICML, 2024

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents

Qinlin Zhao, Jindong Wang, Yixuan Zhang, and 4 more authors

ICML (Oral), 2024

DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents

Kaijie Zhu, Jindong Wang, Qinlin Zhao, and 2 more authors

ICML, 2024

AgentReview: Exploring Peer Review Dynamics with LLM Agents

Yiqiao Jin, Qinlin Zhao, Yiyang Wang, and 4 more authors

EMNLP, 2024

2023

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Kaijie Zhu, Jindong Wang, Jiaheng Zhou, and 8 more authors

CCS LAMPS Workshop, 2023

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Kaijie Zhu, Xixu Hu, Jindong Wang, and 2 more authors

ICCV, 2023

A Survey on Evaluation of Large Language Models

Yupeng Chang, Xu Wang, Jindong Wang, and 8 more authors

ACM TIST, 2023