Jiachen Zhao

I am a PhD student at Northeastern University, advised by Prof. Weiyan Shi. I was a scholar in the MATS program, working on reasoning in Prof. Dawn Song’s lab. I obtained my Master’s degree in Computer Science from UMass Amherst, where I was lucky to be advised by Prof. Andrew McCallum and Prof. Hong Yu. I completed my undergraduate studies in Computer Science at HKUST.

I am generally interested in understanding the mechanisms of AI models in order to improve and control them. I am currently working on post-training, especially on understanding and mitigating emergent behaviors. I am also broadly interested in continual learning, reasoning, and interpretability. Feel free to email me if you would like to collaborate.

Publications

2026

The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment
Jiachen Zhao, Zhengxuan Wu, Aryaman Arora, Yiyou Sun, David Bau, Weiyan Shi
Preprint.
[pdf]
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought
Jiachen Zhao*, Yiyou Sun*, Weiyan Shi, Dawn Song
Preprint.
[pdf]

2025

LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, Jing Huang, Zhengxuan Wu, David Bau, Weiyan Shi
NeurIPS 2025.
[pdf]

2024

Large Language Models are In-context Teachers for Knowledge Reasoning
Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu
EMNLP 24 Findings.
[pdf]
[TL;DR] We propose an encoding specificity hypothesis, inspired by human memory retrieval, to understand LLM prompting. We argue that effective prompts should match LLMs’ own training distribution.
Learning and Forgetting Unsafe Examples in Large Language Models
Jiachen Zhao, Zhun Deng, David Madras, James Zou, Mengye Ren
ICML 24.
[pdf]
Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
Jiachen Zhao, Wenlong Zhao*, Andrew Drozdov*, Benjamin Rozonoyer, Md Arafat Sultan, Jay-Yoon Lee, Mohit Iyyer, Andrew McCallum
ACL 24.
[pdf]
Adaptive Fusion of Deep Learning with Statistical Anatomical Knowledge for Robust Patella Segmentation from CT Images
Jiachen Zhao, Tianshu Jiang, Yi Lin, Justin Chan, Ping-Keung Lewis Chan, Chunyi Wen, Hao Chen
Journal of Biomedical & Health Informatics.
[pdf]

2023

SELF-EXPLAIN: Teaching Large Language Models to Reason Complex Questions by Themselves
Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu
Workshop on robustness of zero/few-shot learning in foundation models @ NeurIPS 23.
[pdf]
Student as an Inherent Denoiser of Noisy Teacher
Jiachen Zhao
3rd Workshop on Efficient Natural Language and Speech Processing @ NeurIPS 23.
[pdf]
[TL;DR] We find that the student model converges to clean labels faster during knowledge distillation. We thus leverage an early checkpoint to denoise teacher labels.
In-Context Exemplars as Clues to Retrieving from Large Associative Memory
Jiachen Zhao
Neural Conversational AI @ ICML 23. Associative Memory & Hopfield Networks @ NeurIPS 23.
[pdf]

2022

Trigger-free Event Detection via Derangement Reading Comprehension
Jiachen Zhao, Haiqin Yang
arXiv.
[pdf]

Academic service

Reviewer for ICML, NeurIPS, CoNLL, AAAI, ICLR

Jiachen (Nick) Zhao