Adversarial Robustness vs Continual Learning: A Gradient Conflict
Modern machine learning systems face two important and challenging requirements:
Continual Learning (CL):
The ability to learn tasks sequentially without catastrophic forgetting, the phenomenon in which learning a new task causes a model to forget previously learned ones.
Adversarial Robustness (AR):
The ability to resist adversarial attacks, i.e., small, carefully crafted perturbations to input data that fool the model into making incorrect predictions.
Both are essential for real-world AI agents.
Imagine a robot assistant:
- It must continuously learn new user commands while retaining old ones.
- It must stay robust to noisy, adversarial environments where malicious actors may perturb its sensors.
Unfortunately, these two goals clash at the optimization level.
This blog post explores the mathematical conflict between continual learning and adversarial robustness by deriving the combined gradient, showing why it is fundamentally difficult to satisfy both objectives simultaneously.
1. Setting Up the Problem
We have a neural network with parameters $\theta$. At each timestep, it is trained on:
- A dataset of examples $(x, y)$ for the new task.
- Potential adversarial perturbations applied to inputs.
Key components:
Task loss (Continual Learning):
To prevent catastrophic forgetting, we use Elastic Weight Consolidation (EWC): $$ \mathcal{L}_{task} = \mathcal{L}_{current} + \lambda \sum_{i} \frac{(\theta_i - \theta_i^*)^2}{2\sigma_i^2} $$ Where:
- $\mathcal{L}_{current}$: Standard loss (e.g., cross-entropy) for the current task.
- $\lambda$: Hyperparameter controlling the importance of stability.
- $\theta_i^*$: Value of parameter $i$ at the end of training on the old tasks.
- $1/\sigma_i^2$: Fisher Information, representing the importance of parameter $i$ for old tasks.
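As a concrete reference, here is a minimal sketch of this penalty, assuming PyTorch; `theta_star` (the parameters saved after the previous task) and `fisher` (diagonal estimates of $1/\sigma_i^2$) are illustrative placeholders that a real continual-learning pipeline would estimate.

```python
import torch

def ewc_penalty(model, theta_star, fisher, lam):
    """lambda * sum_i F_i * (theta_i - theta_i^*)^2 / 2, with F_i = 1 / sigma_i^2."""
    penalty = torch.zeros(())
    for p, p_star, f in zip(model.parameters(), theta_star, fisher):
        penalty = penalty + (f * (p - p_star) ** 2).sum() / 2
    return lam * penalty

# Usage sketch: L_task = L_current + ewc_penalty(model, theta_star, fisher, lam)
```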
Adversarial loss:
Adversarial training uses a worst-case perturbation within an $\epsilon$-ball: $$ \mathcal{L}_{adv} = \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$ where:
- $\mathcal{B}_\epsilon(x)$: The perturbation space, an $\epsilon$-ball around the input $x$.
- $f(x + \delta; \theta)$: Model prediction for the perturbed input.
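A minimal sketch of this inner maximization, assuming PyTorch: a single FGSM-style step stands in for the full maximization over the $L_\infty$ ball (a PGD attack would iterate this step with projection). The function name, the cross-entropy loss, and the step size are illustrative choices.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(model, x, y, eps):
    """Approximate max over delta in B_eps(x) of L(f(x + delta), y) with one FGSM step."""
    x_adv = x.detach().clone().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x_adv), y)
    grad_x = torch.autograd.grad(clean_loss, x_adv)[0]
    delta_star = eps * grad_x.sign()           # approximate worst-case perturbation
    # Backpropagating through this loss (with delta_star held fixed) yields the
    # envelope-theorem gradient of L_adv with respect to theta used in Section 3.
    return F.cross_entropy(model(x + delta_star), y)
```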
2. Combined Loss Function
We combine both objectives into one total loss: $$ \mathcal{L}_{total} = \mathcal{L}_{task} + \alpha \, \mathcal{L}_{adv} $$
Substituting the expressions: $$ \mathcal{L}_{total} = \mathcal{L}_{current} + \lambda \sum_{i} \frac{(\theta_i - \theta_i^*)^2}{2\sigma_i^2} + \alpha \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$
Here, $\alpha$ controls the relative weight given to adversarial robustness.
3. Gradient of the Combined Loss
To optimize the network, we compute the gradient with respect to the parameters $\theta$: $$ \nabla_{\theta} \mathcal{L}_{total} = \nabla_{\theta} \mathcal{L}_{task} + \alpha \, \nabla_{\theta} \mathcal{L}_{adv} $$
3.1 Gradient of Task Loss
The task loss has two parts: the current-task loss and the EWC penalty.
Taking the gradient of each:
Current task gradient: $$ \nabla_{\theta} \mathcal{L}_{current} = \frac{\partial \mathcal{L}_{current}}{\partial \theta} $$
EWC regularization gradient: $$ \nabla_{\theta_i} \Big(\lambda \sum_j \frac{(\theta_j - \theta_j^*)^2}{2\sigma_j^2} \Big) = \lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2} $$
Combined task gradient: $$ \big[\nabla_{\theta} \mathcal{L}_{task}\big]_i = \frac{\partial \mathcal{L}_{current}}{\partial \theta_i} + \lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2} $$
3.2 Gradient of Adversarial Loss
The adversarial loss is a max over perturbations: $$ \mathcal{L}_{adv} = \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$
Let $\delta^*$ be the optimal perturbation: $$ \delta^* = \arg\max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$
Using the envelope theorem (ignoring the second-order effect arising from the dependence of $\delta^*$ on $\theta$): $$ \nabla_{\theta} \mathcal{L}_{adv} = \nabla_{\theta} \mathcal{L}(f(x + \delta^*; \theta), y) $$
3.3 Total Gradient Expression
Substituting into the combined gradient: $$ \nabla_{\theta} \mathcal{L}_{total} = \nabla_{\theta} \mathcal{L}_{current} + \nabla_{\theta} \Big( \lambda \sum_{i} \frac{(\theta_i - \theta_i^*)^2}{2\sigma_i^2} \Big) + \alpha \, \nabla_{\theta} \mathcal{L}(f(x + \delta^*; \theta), y) $$
Component-wise for parameter $\theta_i$: $$ \frac{\partial \mathcal{L}_{total}}{\partial \theta_i} = \frac{\partial \mathcal{L}_{current}}{\partial \theta_i} + \lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2} + \alpha \, \frac{\partial \mathcal{L}(f(x + \delta^*; \theta), y)}{\partial \theta_i} $$
4. The Gradient Conflict
The key insight:
Continual learning and adversarial robustness push parameters in opposing directions.
EWC Term ($\lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2}$):
- Encourages stability, pulling parameters toward old values to retain past knowledge.
- Acts like a conservative "spring" preventing drastic updates.
Adversarial Gradient ($\alpha \, \nabla_{\theta} \mathcal{L}_{adv}$):
- Encourages robustness, often requiring large, aggressive updates to increase decision margins.
If these gradients point in opposite directions, they cancel out, slowing or even reversing progress on one objective.
Measuring Conflict: Cosine Similarity
We can quantify the conflict between gradients using their cosine similarity: $$ \cos(\phi) = \frac{g_{EWC} \cdot g_{adv}}{\lVert g_{EWC} \rVert \, \lVert g_{adv} \rVert} $$
Where:
- $g_{EWC}$, with components $\lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2}$ (EWC gradient),
- $g_{adv} = \nabla_{\theta} \mathcal{L}_{adv}$ (adversarial gradient).
If $\cos(\phi) > 0$: The gradients are aligned (cooperative).
If $\cos(\phi) < 0$: The gradients are conflicting (antagonistic).
In practice, the continual-learning and adversarial gradients often have negative cosine similarity, highlighting their opposing nature.
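To make the measurement concrete, here is a minimal, self-contained sketch assuming PyTorch, on a toy linear classifier. The drifted `theta_star`, the all-ones Fisher estimates, the single FGSM step approximating $\delta^*$, and the hyperparameters are illustrative placeholders, not a faithful training setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

model = torch.nn.Linear(10, 2)   # toy classifier standing in for the network
# Parameters saved after the previous task (here: current params plus some drift).
theta_star = [p.detach() + 0.1 * torch.randn_like(p) for p in model.parameters()]
fisher = [torch.ones_like(p) for p in model.parameters()]   # placeholder 1/sigma_i^2
lam, eps = 10.0, 0.1

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

# g_EWC: closed-form gradient of the EWC penalty, lambda * (theta_i - theta_i^*) / sigma_i^2.
g_ewc = torch.cat([
    (lam * f * (p.detach() - p_star)).reshape(-1)
    for p, p_star, f in zip(model.parameters(), theta_star, fisher)
])

# g_adv: gradient of the loss evaluated at an FGSM-style approximation of delta*.
x_adv = x.clone().requires_grad_(True)
grad_x = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
delta_star = eps * grad_x.sign()
adv_grads = torch.autograd.grad(
    F.cross_entropy(model(x + delta_star), y), list(model.parameters())
)
g_adv = torch.cat([g.reshape(-1) for g in adv_grads])

cos = F.cosine_similarity(g_ewc.unsqueeze(0), g_adv.unsqueeze(0)).item()
print(f"cos(phi) between EWC and adversarial gradients: {cos:+.3f}")
```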
5. Why This Conflict Matters
This conflict makes it fundamentally hard to jointly optimize continual learning and adversarial robustness:
- Too much weight on EWC → Model remembers old tasks but becomes brittle to attacks.
- Too much weight on adversarial training → Model is robust but forgets past knowledge quickly.
6. Potential Solutions
Several strategies can mitigate this gradient conflict:
Two-phase training:
Alternate between stability-focused updates and robustness-focused updates.
Gradient projection:
Modify $g_{adv}$ to be orthogonal to $g_{EWC}$, reducing destructive interference (see the sketch after this list).
Meta-learning:
Learn dynamic scaling factors ($\lambda$, $\alpha$) to balance the conflict adaptively.
Memory-augmented methods:
Store exemplars from old tasks to stabilize learning without overly rigid regularization.
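A minimal sketch of the gradient-projection idea, assuming PyTorch and in the spirit of PCGrad-style gradient surgery: when the adversarial gradient conflicts with the EWC gradient, its component along the EWC direction is removed before the update. The flattened gradient vectors are assumed to be computed as in the earlier sketch.

```python
import torch

def project_conflicting(g_adv: torch.Tensor, g_ewc: torch.Tensor) -> torch.Tensor:
    """Remove the component of g_adv along g_ewc whenever the two gradients conflict."""
    dot = torch.dot(g_adv, g_ewc)
    if dot < 0:   # negative cosine similarity => destructive interference
        g_adv = g_adv - (dot / (g_ewc.norm() ** 2 + 1e-12)) * g_ewc
    return g_adv

# Usage sketch: combine the projected adversarial gradient with the task gradient
# before the optimizer step, e.g. g_total = g_task + alpha * project_conflicting(g_adv, g_ewc).
```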
7. Summary
| Objective | Gradient Behavior | Effect on Model |
|---|---|---|
| Continual Learning | Conservative, stabilizing ($g_{EWC}$) | Prevents forgetting but resists change |
| Adversarial Robustness | Aggressive, disruptive ($g_{adv}$) | Improves robustness but risks forgetting |
The core tension arises because these objectives require fundamentally different gradient directions.
Understanding and measuring this conflict is the first step toward building systems that can both adapt continually and stay robust.
8. Final Equation Recap
The full gradient expression we derived: $$ \frac{\partial \mathcal{L}_{total}}{\partial \theta_i} = \frac{\partial \mathcal{L}_{current}}{\partial \theta_i} + \lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2} + \alpha \, \frac{\partial \mathcal{L}(f(x + \delta^*; \theta), y)}{\partial \theta_i} $$
Where:
- $\delta^*$ is the worst-case perturbation within the $\epsilon$-ball $\mathcal{B}_\epsilon(x)$.
This simple equation captures a deep conflict:
- The EWC term $\lambda \, \frac{\theta_i - \theta_i^*}{\sigma_i^2}$ pushes for remembering,
- The adversarial term $\alpha \, \nabla_{\theta} \mathcal{L}_{adv}$ pushes for robustness,
- And the balance between them determines whether the model thrives or collapses.
9. Final Words
Balancing continual learning and adversarial robustness is like walking a tightrope:
- Lean too far one way, and you forget old knowledge.
- Lean too far the other way, and you become vulnerable to attacks.
The gradient-level view we explored provides a mathematical foundation for this trade-off and opens up a new research direction:
How can we design optimization algorithms that align these conflicting gradients?
In future posts, we'll explore gradient projection techniques and meta-learning approaches to resolve this tension, moving closer to building truly adaptive and robust AI systems.