
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Muxi Diao1,2*, Lele Yang1*, Wuxuan Gong1*, Yutong Zhang1, Zhonghao Yan1,
Yufei Han1, Kongming Liang1, Weiran Xu1, Zhanyu Ma1†
1Beijing University of Posts and Telecommunications, 2Zhongguancun Academy
* Equal contribution · † Corresponding author

Overview

(a) Conceptual illustration. When SFT forces the model to override its strong priors, it creates a Confident Conflict. Fitting these conflicts distorts the model's existing representations, leading to catastrophic forgetting. (b) Token-level entropy-probability landscape. Compared to on-policy rollouts (right), the SFT data (left) exhibits a prominent cluster of Low Entropy, Low Probability tokens.

Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "Confident Conflicts": tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates.

We present Entropy-Adaptive Fine-Tuning (EAFT), a novel approach that utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. Unlike methods relying solely on prediction probability, EAFT allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Experiments on the Qwen and GLM series (4B-32B parameters) across mathematical, medical, and agentic domains support our hypothesis: EAFT consistently matches the downstream performance of standard SFT while mitigating the degradation of general capabilities.

Our work provides several key insights into the mechanisms of catastrophic forgetting and EAFT's effectiveness:

  • 1. Confident Conflicts are the primary driver of forgetting. We identify a fundamental distributional gap: SFT data contains a prominent cluster of tokens with Low Entropy, Low Probability, where the model is highly confident in its own prediction but is forced to fit a divergent ground truth. These "Confident Conflicts" trigger destructive gradient updates that distort the model's existing representations.
  • 2. A pilot experiment validates our hypothesis. By masking out "Confident Conflict" tokens during training (the bottom 15% in both entropy and probability), we observed that catastrophic forgetting was mitigated compared to standard SFT; a sketch of this masking rule follows this list. This suggests that enforcing updates on these conflicting samples is the primary driver of capability degradation.
  • 3. EAFT uses entropy as a soft gating mechanism. Unlike probability-based methods that risk amplifying destructive gradients, EAFT leverages entropy to distinguish rigidity from uncertainty. By down-weighting low-entropy tokens to suppress conflicting gradients while concentrating supervision on high-entropy ones to facilitate adaptation, EAFT balances domain proficiency with the preservation of general capabilities.
  • 4. EAFT achieves a Pareto improvement across diverse domains. Experiments on the Qwen and GLM series (4B-32B parameters) across mathematical, medical, and agentic domains show that EAFT consistently matches the downstream performance of standard SFT while mitigating the degradation of general capabilities.
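As a minimal sketch of the pilot's masking rule (assuming a quantile-based threshold; the function and variable names here are illustrative, not from the paper):

```python
import torch

def confident_conflict_mask(entropy: torch.Tensor,
                            target_logprob: torch.Tensor,
                            q: float = 0.15) -> torch.Tensor:
    """Pilot-style hard mask: drop "Confident Conflict" tokens.

    entropy:        (num_tokens,) predictive entropy of the model's
                    next-token distribution at each position
    target_logprob: (num_tokens,) log-probability the model assigns
                    to the ground-truth token
    Returns a bool mask that is False exactly where a token falls in
    the bottom-q quantile of BOTH entropy and target probability.
    """
    low_entropy = entropy <= torch.quantile(entropy, q)
    low_prob = target_logprob <= torch.quantile(target_logprob, q)
    return ~(low_entropy & low_prob)
```

Multiplying the per-token cross-entropy by this mask reproduces the masked-SFT variant of the pilot experiment.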

EAFT provides a novel approach for mitigating catastrophic forgetting through entropy-based gating, supporting LLM fine-tuning research and advancing robust domain adaptation.



EAFT Method


Overview of the EAFT framework. The left diagram shows the EAFT loss function with its entropy-based gating mechanism, and the right diagram illustrates how EAFT distinguishes between uncertain samples (high entropy) and conflicting data (low entropy).

EAFT leverages token-level entropy as a soft gating mechanism to distinguish between epistemic uncertainty and knowledge conflict, enabling effective domain adaptation while preserving general capabilities.

(1) EAFT Loss Function:
EAFT introduces an entropy-adaptive loss function that modulates the standard cross-entropy loss based on token-level entropy:

$$\mathcal{L}_{\text{EAFT}} = -\frac{1}{|y|} \sum_{t=1}^{|y|} \tilde{H}_t \, \log p_\theta\!\left(y_t \mid y_{<t}, x\right), \qquad \tilde{H}_t = \frac{H_t}{\log |V|}$$

where $\tilde{H}_t$ is the predictive entropy of the model's next-token distribution at position $t$, normalized by $\log |V|$ so that $\tilde{H}_t \in [0, 1]$. High-entropy (uncertain) tokens receive full supervision weight, while low-entropy (confident) tokens receive reduced weight, suppressing conflicting gradients.
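A minimal PyTorch sketch of this objective, assuming the per-token weight is simply the detached normalized entropy $\tilde{H}_t$ (the paper's exact gating function may differ):

```python
import math

import torch
import torch.nn.functional as F

def eaft_loss(logits: torch.Tensor, targets: torch.Tensor,
              ignore_index: int = -100) -> torch.Tensor:
    """Entropy-gated cross-entropy (sketch of the EAFT objective).

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len)        ground-truth token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Token-level predictive entropy H_t, normalized to [0, 1] by log|V|.
    # Detached so the gate only re-weights gradients; it is not optimized.
    entropy = -(probs * log_probs).sum(dim=-1)
    gate = (entropy / math.log(logits.size(-1))).detach()

    # Standard per-token cross-entropy.
    ce = F.cross_entropy(
        logits.flatten(0, 1), targets.flatten(),
        ignore_index=ignore_index, reduction="none",
    ).view_as(gate)

    mask = (targets != ignore_index).float()
    # High-entropy (uncertain) tokens keep full weight; low-entropy
    # (confident) tokens are down-weighted, suppressing conflicts.
    return (gate * ce * mask).sum() / mask.sum().clamp(min=1.0)
```

Detaching the gate is a deliberate choice in this sketch: it prevents the model from trivially lowering the loss by manipulating its own entropy rather than fitting the data.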

(2) Entropy-Based Gating Mechanism:
Unlike probability-based methods that may amplify destructive gradients on low-probability tokens, EAFT uses entropy to distinguish between two fundamentally different scenarios:

  • High Entropy, Low Probability: The model is uncertain about its prediction, indicating epistemic uncertainty. These tokens receive full supervision to facilitate learning.
  • Low Entropy, Low Probability: The model is confident in its prediction but disagrees with the ground truth, indicating a "Confident Conflict". These tokens receive reduced supervision to prevent destructive gradient updates. The toy example below makes the distinction concrete.
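A toy numeric example of the two cases (the distributions below are invented for illustration):

```python
import torch

# Two next-token distributions over a toy 5-token vocabulary.
uncertain = torch.tensor([0.30, 0.25, 0.20, 0.15, 0.10])  # spread out
confident = torch.tensor([0.96, 0.01, 0.01, 0.01, 0.01])  # sharply peaked

def entropy(p: torch.Tensor) -> torch.Tensor:
    return -(p * p.log()).sum()

# Suppose the ground-truth token is index 3 in both cases:
#   uncertain: p(target) = 0.15, entropy ~1.54 nats -> epistemic
#              uncertainty, keep full supervision
#   confident: p(target) = 0.01, entropy ~0.22 nats -> Confident
#              Conflict, gate the gradient down
print(entropy(uncertain))  # tensor(1.5445)
print(entropy(confident))  # tensor(0.2234)
```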
The following figures illustrate the entropy-probability landscape and how EAFT's gating mechanism operates.

📊 Entropy-Probability Landscape: Visual representation of token distribution based on entropy and probability, showing how EAFT identifies Confident Conflicts (low entropy, low probability tokens).

📈 EAFT vs Standard SFT Gradients: Comparison of gradient magnitudes between EAFT and traditional supervised fine-tuning, demonstrating EAFT's suppression of destructive gradients.

📊 EAFT Loss Dynamics During Training: Shows how the entropy-based gating evolves during training, balancing adaptation and preservation.

(3) Training Dynamics and Adaptation:
EAFT naturally adapts to the training process. In early training stages, when the model is uncertain about most tokens (high entropy), EAFT behaves similarly to standard SFT, allowing rapid domain adaptation. As training progresses and the model becomes more confident (lower entropy), EAFT automatically reduces supervision on confident predictions, preventing overfitting to conflicting samples. This self-adaptive mechanism is what allows EAFT to achieve a Pareto improvement: matching downstream performance while preserving general capabilities.
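This self-annealing behavior is easy to see numerically, assuming the normalized-entropy gate sketched above (the logits are made up for illustration):

```python
import math

import torch

# As a fixed preference pattern sharpens (a proxy for training progress),
# the entropy gate on that token decays automatically.
base_logits = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5])

for sharpness in [0.5, 1.0, 2.0, 4.0]:
    probs = torch.softmax(sharpness * base_logits, dim=-1)
    entropy = -(probs * probs.log()).sum()
    gate = entropy / math.log(len(base_logits))
    print(f"sharpness={sharpness:.2f}  gate={gate.item():.2f}")
# Prints gates of roughly 0.94, 0.77, 0.39, 0.07: near-SFT behavior
# early on, heavy gradient suppression once the model grows confident.
```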
The figures below illustrate this dynamic balance between adaptation and preservation.

📈 Pareto Improvement: EAFT achieves a better trade-off between domain performance and general capability preservation across mathematical, medical, and agent tool-use domains.

📊 Training Dynamics: Shows how EAFT's entropy-based gating evolves during training, balancing adaptation and preservation.

Experimental Results

Main results on the target domain (Math) and general-domain benchmarks. We evaluate AIME24, AIME25, and GSM8K as the training target, alongside MMLU, IFEval, and CLUEWSC for general capabilities. All results are averaged over three independent runs. "Avg." is the average performance over the datasets in the corresponding domain.

Method | AIME24 | AIME25 | GSM8K | Math Avg. | MMLU | IFEval | CLUEWSC | General Avg.
Qwen3-4B-Instruct | 63.3 | 47.4 | 94.3 | 68.3 | 77.1 | 81.0 | 85.2 | 81.1
  + SFT | 63.3 | 50.0 | 94.8 | 69.4 | 76.5 | 79.5 | 74.5 | 76.5 (-4.6)
  + SFT-KL | 63.3 | 50.0 | 93.6 | 69.0 | 74.5 | 74.9 | 89.4 | 79.6 (-1.5)
  + FLOW | 66.7 | 46.7 | 94.3 | 69.2 | 76.2 | 78.3 | 82.8 | 79.1 (-2.0)
  + DFT | 56.7 | 40.0 | 93.9 | 63.5 | 75.9 | 77.0 | 81.4 | 78.1 (-3.0)
  + TALR | 50.0 | 50.0 | 93.3 | 64.4 | 76.2 | 78.1 | 74.5 | 76.2 (-4.9)
  + EAFT | 60.0 | 53.3 | 94.5 | 69.3 | 76.6 | 80.1 | 83.7 | 80.1 (-1.0)
Qwen2.5-32B-Instruct | 22.2 | 13.3 | 96.0 | 43.8 | 84.1 | 78.3 | 91.9 | 84.8
  + SFT | 53.3 | 50.0 | 96.3 | 66.5 | 76.9 | 74.2 | 93.8 | 81.6 (-3.2)
  + SFT-KL | 33.3 | 33.3 | 94.1 | 53.6 | 81.4 | 68.1 | 93.2 | 80.9 (-3.9)
  + FLOW | 50.0 | 50.0 | 96.3 | 65.4 | 78.6 | 75.1 | 93.6 | 82.4 (-2.4)
  + DFT | 33.3 | 36.7 | 95.9 | 55.3 | 77.8 | 70.0 | 94.4 | 80.7 (-4.1)
  + TALR | 40.0 | 43.3 | 95.3 | 59.5 | 73.1 | 72.5 | 94.1 | 79.9 (-4.9)
  + EAFT | 53.3 | 46.7 | 96.5 | 65.5 | 79.0 | 78.4 | 93.9 | 83.7 (-1.1)
GLM4-9B-0414 | 6.7 | 6.7 | 90.1 | 34.5 | 70.2 | 74.4 | 85.1 | 76.6
  + SFT | 20.0 | 10.0 | 90.3 | 40.1 | 57.3 | 69.8 | 84.8 | 70.6 (-6.0)
  + SFT-KL | 13.3 | 6.7 | 90.1 | 36.7 | 60.0 | 66.4 | 85.3 | 70.5 (-6.1)
  + FLOW | 16.7 | 13.3 | 91.1 | 40.4 | 57.5 | 71.5 | 85.2 | 71.4 (-5.2)
  + DFT | 13.3 | 6.7 | 89.0 | 36.4 | 48.9 | 69.7 | 86.0 | 68.2 (-8.4)
  + TALR | 15.6 | 13.3 | 91.2 | 40.0 | 57.4 | 71.3 | 84.5 | 71.5 (-5.1)
  + EAFT | 13.3 | 13.3 | 91.5 | 39.4 | 60.8 | 72.0 | 85.3 | 72.7 (-3.9)

Medical Domain Results

Results on the target (Medical) and general-domain benchmarks. We evaluate MedMCQA, MedQA, and PubMedQA for the medical domain, alongside MMLU, IFEval, and CLUEWSC for general capabilities. All results are averaged over three independent runs.

Method | MedMCQA | MedQA | PubMedQA | Medical Avg. | MMLU | IFEval | CLUEWSC | General Avg.
Qwen3-4B-Thinking | 63.5 | 78.2 | 76.0 | 72.6 | 79.3 | 85.0 | 94.1 | 86.1
  + SFT | 63.3 | 79.5 | 78.0 | 73.6 | 78.3 | 75.3 | 90.4 | 81.3 (-4.8)
  + EAFT | 63.9 | 80.0 | 77.2 | 73.7 | 80.1 | 81.7 | 91.8 | 84.5 (-1.6)

Analysis and Findings

Entropy Distribution Analysis: Visualization across different domains showing distinct entropy patterns. Mathematical reasoning tasks show higher entropy (more uncertainty), while medical tasks show lower entropy (more confident predictions). This validates EAFT's adaptive gating mechanism.

Gradient Magnitude Comparison: EAFT successfully identifies and suppresses gradients from confident conflicts (low entropy, low probability tokens), while maintaining full supervision on uncertain tokens (high entropy). This results in more stable training and better preservation of general capabilities.
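This suppression can be reproduced on a single synthetic "Confident Conflict" token, again assuming the normalized-entropy gate from the method sketch (numbers are illustrative):

```python
import math

import torch
import torch.nn.functional as F

# One confidently wrong prediction over a toy 5-token vocabulary.
logits = torch.tensor([[6.0, 0.0, 0.0, 0.0, 0.0]], requires_grad=True)
target = torch.tensor([3])  # ground truth disagrees with the model's mode

probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.log()).sum(dim=-1)
gate = (entropy / math.log(logits.size(-1))).detach()  # ~0.04 here

ce = F.cross_entropy(logits, target)
grad_sft = torch.autograd.grad(ce, logits, retain_graph=True)[0]
grad_eaft = torch.autograd.grad((gate * ce).sum(), logits)[0]

print(grad_sft.norm().item())   # ~1.41: large, destructive update
print(grad_eaft.norm().item())  # ~0.06: suppressed by the entropy gate
```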

Model | SFT Downstream | SFT General | EAFT Downstream | EAFT General | General Gain
Qwen2.5-32B | 66.5 | 81.6 | 65.5 | 83.7 | +2.6%
GLM-4-9B | 40.1 | 70.6 | 39.4 | 72.7 | +1.6%
Qwen3-4B | 69.4 | 76.5 | 69.3 | 80.1 | +4.7%

Performance Comparison: Across model sizes, EAFT markedly improves the preservation of general capabilities without sacrificing downstream performance.

BibTeX Citation

@article{diao2026entropy,
  title={Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting},
  author={Diao, Muxi and Yang, Lele and Gong, Wuxuan and Zhang, Yutong and Yan, Zhonghao and Han, Yufei and Liang, Kongming and Xu, Weiran and Ma, Zhanyu},
  journal={arXiv preprint arXiv:2601.02151},
  year={2026}
}