Publications

A. Condensation phenomenon in deep learning

Condensation phenomenon: neurons in the same layer tend to align with one another during training.
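
Below is a minimal numpy sketch of one way to observe this (the network size, target function, and hyperparameters are illustrative assumptions, not taken from the papers that follow): a small two-layer ReLU network is trained from small initialization, and the mean pairwise cosine similarity of the first-layer neuron directions is tracked; values approaching 1 indicate condensation.

```python
# Minimal sketch of how condensation can be measured (illustrative
# assumptions throughout; this is not code from the papers below).
# A two-layer ReLU network is trained by gradient descent from a small
# initialization, and we track the mean |cosine similarity| between the
# first-layer neuron directions (w_j, b_j): values near 1 mean the
# neurons have condensed onto a few shared directions.
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 50                          # training points, hidden neurons
x = np.linspace(-1, 1, n).reshape(-1, 1)
y = np.tanh(3 * x[:, 0])               # illustrative 1-D regression target

scale, lr = 0.1, 0.1                   # small init favors condensation
W = rng.normal(0, scale, (m, 1))       # input weights
b = rng.normal(0, scale, m)            # biases
a = rng.normal(0, scale, m)            # output weights

def mean_abs_cos(W, b):
    """Mean |cosine similarity| over distinct pairs of neuron directions."""
    V = np.hstack([W, b[:, None]])
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    C = np.abs(V @ V.T)
    k = len(V)
    return (C.sum() - k) / (k * (k - 1))

for step in range(10001):
    h = x @ W.T + b                    # pre-activations, shape (n, m)
    r = np.maximum(h, 0)               # ReLU features
    err = r @ a - y                    # residual, shape (n,)
    grad_a = r.T @ err / n             # backprop by hand for this tiny model
    gh = np.outer(err, a) * (h > 0)
    W -= lr * (gh.T @ x / n)
    b -= lr * gh.mean(axis=0)
    a -= lr * grad_a
    if step % 2000 == 0:
        print(f"step {step:5d}  loss {np.mean(err**2):.4f}"
              f"  mean |cos| {mean_abs_cos(W, b):.3f}")
```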

A1. Condensation phenomenon and its dynamical regime

  1. Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang, “Phase Diagram for Two-layer ReLU Neural Networks at Infinite-Width Limit,” Journal of Machine Learning Research (JMLR) 22(71):1-47, 2021.
  2. Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu, “Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width,” NeurIPS 2022.
  3. Zhi-Qin John Xu, Yaoyu Zhang, Zhangchen Zhou, “An overview of condensation phenomenon in deep learning,” arXiv:2504.09484.

A2. Loss landscape structure—embedding principle series

  1. Yaoyu Zhang, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu, “Embedding Principle of Loss Landscape of Deep Neural Networks,” NeurIPS 2021 spotlight.
  2. Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu, “Embedding Principle: a hierarchical structure of loss landscape of deep neural networks,” Journal of Machine Learning, 1(1), pp. 60-113, 2022.
  3. Hanxu Zhou, Qixuan Zhou, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu, “Towards Understanding the Condensation of Neural Networks at Initial Training,” NeurIPS 2022.
  4. Zhiwei Bai, Tao Luo, Zhi-Qin John Xu, Yaoyu Zhang, “Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks,” CSIAM Trans. Appl. Math., 5 (2024), pp. 350-389.
  5. Leyang Zhang, Yaoyu Zhang, Tao Luo, “Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks,” arXiv:2405.17501 (2024).

A3. Generalization advantage—optimistic estimate series

  1. Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu, “Linear Stability Hypothesis and Rank Stratification for Nonlinear Models,” arXiv:2211.11623 (2022).
  2. Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu, “Optimistic Estimate Uncovers the Potential of Nonlinear Models,” arXiv:2307.08921 (2023).
  3. Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai, “Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization,” Journal of Machine Learning Research (JMLR) 26(69):1-30, 2025.
  4. Tao Luo, Leyang Zhang, Yaoyu Zhang, “Structure and Gradient Dynamics Near Global Minima of Two-layer Neural Networks,” arXiv:2309.00508 (2023).

A4. Global dynamics and implicit bias

  1. Leyang Zhang, Zhi-Qin John Xu, Tao Luo, Yaoyu Zhang, “Limitation of Characterizing Implicit Regularization by Data-independent Functions,” Transactions on Machine Learning Research (2023).
  2. Zhiwei Bai, Jiajie Zhao, Yaoyu Zhang, “Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion,” NeurIPS 2024.
  3. Jiajie Zhao, Zhiwei Bai, Yaoyu Zhang, “Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target,” arXiv:2405.13787 (2024).

A5. Condensation in language models

  1. Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu, “Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing,” NeurIPS 2024.
  2. Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu, “The Buffer Mechanism for Multi-Step Information Reasoning in Language Models,” arXiv:2405.15302 (2024).
  3. Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu, “Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers,” arXiv:2501.08537 (2025).

B. Frequency Principle in deep learning

Frequency Principle: neural networks tend to learn from low to high frequencies during training.
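
Below is a minimal numpy sketch of one way to observe this (the hyperparameters and target are illustrative assumptions, not taken from the papers that follow): a two-layer tanh network fits a two-frequency target, and the relative error of the low- and high-frequency Fourier modes of its prediction is printed during training; the low-frequency mode typically converges first.

```python
# Minimal sketch of the Frequency Principle (illustrative assumptions
# throughout; this is not code from the papers below). A two-layer tanh
# network fits y = sin(2*pi*x) + 0.5*sin(10*pi*x), and we print the
# relative error of the k=1 (low) and k=5 (high) Fourier modes of the
# prediction: the low-frequency mode is typically captured first.
import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 128, 200, 0.01
x = np.linspace(0, 1, n, endpoint=False).reshape(-1, 1)
y = np.sin(2 * np.pi * x[:, 0]) + 0.5 * np.sin(10 * np.pi * x[:, 0])

W = rng.normal(0, 1, (m, 1))           # input weights
b = rng.normal(0, 1, m)                # biases
a = rng.normal(0, 0.1, m)              # output weights

def mode_errors(pred):
    """Relative error of the k=1 and k=5 Fourier modes of the prediction."""
    F_pred, F_y = np.fft.rfft(pred), np.fft.rfft(y)
    return {k: abs(F_pred[k] - F_y[k]) / abs(F_y[k]) for k in (1, 5)}

for step in range(5001):
    h = np.tanh(x @ W.T + b)           # hidden layer, shape (n, m)
    err = h @ a - y                    # residual, shape (n,)
    grad_a = h.T @ err / n
    gh = np.outer(err, a) * (1 - h**2) # backprop through tanh
    W -= lr * (gh.T @ x / n)
    b -= lr * gh.mean(axis=0)
    a -= lr * grad_a
    if step % 1000 == 0:
        e = mode_errors(h @ a)
        print(f"step {step:5d}  rel. err  k=1: {e[1]:.3f}  k=5: {e[5]:.3f}")
```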

  1. First paper: Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao, “Training Behavior of Deep Neural Network in Frequency Domain,” ICONIP 2019, pp. 264-274. (arXiv:1807.01251, Jul 2018)
  2. 2021 World Artificial Intelligence Conference Youth Outstanding Paper Nomination Award: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma, “Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks,” Communications in Computational Physics (CiCP) 28(5), 1746-1767, 2020.
  3. Initialization effect: Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma, “A Type of Generalization Error Induced by Initialization in Deep Neural Networks,” MSML 2020.
  4. Linear Frequency Principle: Yaoyu Zhang, Tao Luo, Zheng Ma, Zhi-Qin John Xu, “Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks,” Chinese Physics Letters (CPL) 38(3), 038701, 2021.
  5. Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang, “Theory of the Frequency Principle for General Deep Neural Networks,” CSIAM Trans. Appl. Math. 2 (2021), pp. 484-507.
  6. Linear Frequency Principle: Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang, “On the exact computation of linear frequency principle dynamics and its generalization,” SIAM Journal on Mathematics of Data Science 4(4), 1272-1292, 2022.
  7. Minimal decay in frequency domain: Tao Luo, Zheng Ma, Zhiwei Wang, Zhi-Qin John Xu, Yaoyu Zhang, “An Upper Limit of Decaying Rate with Respect to Frequency in Deep Neural Network,” MSML 2022.
  8. Overview: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, “Overview Frequency Principle/Spectral Bias in Deep Learning,” Communications on Applied Mathematics and Computation (2024): 1-38.
  9. Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu, “A rationale from frequency perspective for grokking in training neural network,” arXiv:2405.17479 (2024).

C. Deep Learning for Science

D. Computational Neuroscience