Vishal Pandey | ML Research Engineer; Neuroscience

Adversarial Robustness vs Continual Learning: A Gradient Conflict

Modern machine learning systems face two requirements that are both important and hard to satisfy:

  1. Continual Learning (CL):
    The ability to learn a sequence of tasks without catastrophic forgetting, the phenomenon in which learning a new task causes a model to forget previously learned ones.

  2. Adversarial Robustness (AR):
    The ability to resist adversarial attacks, i.e., small, carefully crafted perturbations to input data that fool the model into making incorrect predictions.

Both are essential for real-world AI agents.
Imagine a robot assistant: it must keep learning new skills over its lifetime (continual learning) while staying reliable when its inputs are noisy or deliberately manipulated (adversarial robustness).

Unfortunately, these two goals clash at the optimization level.
This blog post explores the mathematical conflict between continual learning and adversarial robustness by deriving the combined gradient, showing why it is fundamentally difficult to satisfy both objectives simultaneously.


1. Setting Up the Problem

We have a neural network with parameters θ. At each timestep, it is trained on data from the current task, while being asked both to retain performance on earlier tasks and to withstand adversarial perturbations of its inputs.

Key components (a brief code sketch of both terms follows this list):

  1. Task loss (Continual Learning):
    To prevent catastrophic forgetting, we use Elastic Weight Consolidation (EWC): $$ \mathcal{L}_{task} = \mathcal{L}_{current} + \lambda \sum_{i} \frac{\theta_i^2}{2\sigma_i^2} $$ Where:

    • $\mathcal{L}_{current}$: the standard loss (e.g., cross-entropy) for the current task.
    • λ: a hyperparameter controlling the importance of stability.
    • σ_i²: the per-parameter variance from old tasks; its inverse 1/σ_i² is the (diagonal) Fisher information, measuring how important parameter i is for the old tasks.
    • θ_i: measured here relative to the old-task optimum, so this is the usual EWC quadratic penalty written in its simplest form.
  2. Adversarial loss:
    Adversarial training uses a worst-case perturbation within an ϵ-ball: $$ \mathcal{L}_{adv} = \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$ where:

    • $\mathcal{B}_\epsilon(x) = \{\delta : \|\delta\|_p \le \epsilon\}$ is the perturbation space.
    • $f(x + \delta; \theta)$: the model prediction for the perturbed input.
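
To make these two ingredients concrete, here is a minimal PyTorch-style sketch. All names here (`model`, `old_params`, `fisher_diag`) are hypothetical placeholders, and the PGD loop is just one common way to approximate the inner max:

```python
import torch
import torch.nn.functional as F

def ewc_penalty(model, old_params, fisher_diag, lam):
    """lam * sum_i F_i * (theta_i - theta_i*)^2 / 2.
    With theta measured relative to the old-task optimum (as in the post),
    this is lam * sum_i theta_i^2 / (2 * sigma_i^2), where F_i = 1 / sigma_i^2."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher_diag[name] * (p - old_params[name]) ** 2).sum() / 2
    return lam * penalty

def pgd_perturbation(model, x, y, eps, step_size=None, steps=10):
    """Approximate the inner max over the l_inf eps-ball with projected gradient ascent.
    (Input-range clamping is omitted for brevity.)"""
    step_size = step_size if step_size is not None else 2.5 * eps / steps
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()
```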

2. Combined Loss Function

We combine both objectives into one total loss:

$$ \mathcal{L}_{combined} = \mathcal{L}_{task} + \alpha\, \mathcal{L}_{adv} $$

Substituting the expressions:

$$ \mathcal{L}_{combined} = \mathcal{L}_{current} + \lambda \sum_i \frac{\theta_i^2}{2\sigma_i^2} + \alpha \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$

Here, α controls the relative weight given to adversarial robustness.
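
Assuming the helper functions sketched in Section 1, the combined objective is then just this weighted sum. A sketch, not a tuned training loop:

```python
def combined_loss(model, x, y, old_params, fisher_diag, lam, alpha, eps):
    """L_combined = L_current + EWC penalty + alpha * L_adv."""
    l_current = F.cross_entropy(model(x), y)
    l_ewc = ewc_penalty(model, old_params, fisher_diag, lam)   # already includes lam
    delta_star = pgd_perturbation(model, x, y, eps)            # approximate inner max
    l_adv = F.cross_entropy(model(x + delta_star), y)          # loss at the worst-case point
    return l_current + l_ewc + alpha * l_adv
```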


3. Gradient of the Combined Loss

To optimize the network, we compute the gradient with respect to the parameters θ:

$$ \nabla_\theta \mathcal{L}_{combined} = \nabla_\theta \mathcal{L}_{task} + \alpha\, \nabla_\theta \mathcal{L}_{adv} $$

3.1 Gradient of Task Loss

The task loss has two parts:

$$ \mathcal{L}_{task} = \mathcal{L}_{current} + \lambda \sum_i \frac{\theta_i^2}{2\sigma_i^2} $$

Taking the gradient:

  1. Current task gradient: $$ \nabla_{\theta} \mathcal{L}_{current} = \frac{\partial \mathcal{L}_{current}}{\partial \theta} $$

  2. EWC regularization gradient: $$ \nabla_{\theta} \Big(\lambda \sum_i \frac{\theta_i^2}{2\sigma_i^2} \Big) = \lambda\, \Sigma^{-1} \theta, \qquad \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2), $$ i.e., the component for parameter i is λθ_i/σ_i².

Combined task gradient:

$$ \nabla_\theta \mathcal{L}_{task} = \nabla_\theta \mathcal{L}_{current} + \lambda\, \Sigma^{-1} \theta $$
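
A quick numerical sanity check of this derivative, using toy values for θ and σ_i² (nothing here comes from any experiment in the post):

```python
import torch

theta = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
sigma2 = torch.tensor([0.1, 0.5, 1.0])   # toy per-parameter variances
lam = 0.4

penalty = lam * (theta ** 2 / (2 * sigma2)).sum()
penalty.backward()

print(theta.grad)                      # gradient from autograd
print(lam * theta.detach() / sigma2)   # analytic lam * theta_i / sigma_i^2 -- matches
```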

3.2 Gradient of Adversarial Loss

The adversarial loss is a max over perturbations:

$$ \mathcal{L}_{adv} = \max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$

Let δ* be the optimal perturbation:

$$ \delta^* = \arg\max_{\delta \in \mathcal{B}_\epsilon(x)} \mathcal{L}(f(x + \delta; \theta), y) $$

Using the envelope theorem (treating the maximizer δ* as fixed, i.e., ignoring how δ* itself changes with θ):

$$ \nabla_\theta \mathcal{L}_{adv} = \nabla_\theta\, \mathcal{L}(f(x + \delta^*; \theta), y) $$
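
In code, this approximation amounts to computing δ* first, detaching it so that no gradient flows through the attack itself, and then backpropagating the loss at x + δ* into the parameters. A sketch, reusing the hypothetical pgd_perturbation helper and imports from the earlier sketch:

```python
def adversarial_gradient(model, x, y, eps):
    """grad_theta L(f(x + delta*; theta), y), holding delta* fixed (envelope theorem)."""
    delta_star = pgd_perturbation(model, x, y, eps)   # returned detached: treated as a constant
    loss_adv = F.cross_entropy(model(x + delta_star), y)
    return torch.autograd.grad(loss_adv, list(model.parameters()))
```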

3.3 Total Gradient Expression

Substituting into the combined gradient:

$$ \nabla_\theta \mathcal{L}_{combined} = \underbrace{\nabla_\theta \mathcal{L}_{current}}_{\text{task gradient}} + \underbrace{\lambda\, \Sigma^{-1} \theta}_{\text{EWC regularizer}} + \underbrace{\alpha\, \nabla_\theta \mathcal{L}(f(x + \delta^*; \theta), y)}_{\text{adversarial gradient}} $$

Component-wise for parameter j:

$$ \frac{\partial \mathcal{L}_{combined}}{\partial \theta_j} = \frac{\partial \mathcal{L}_{current}}{\partial \theta_j} + \lambda \frac{\theta_j}{\sigma_j^2} + \alpha \frac{\partial\, \mathcal{L}(f(x + \delta^*; \theta), y)}{\partial \theta_j} $$
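
Putting the three terms together per parameter, here is a sketch of how the full gradient could be assembled, again using the hypothetical helpers and placeholder tensors from the earlier sketches:

```python
def combined_gradient(model, x, y, old_params, fisher_diag, lam, alpha, eps):
    """grad L_combined = grad L_current + lam * Sigma^{-1} theta + alpha * grad L_adv."""
    params = list(model.parameters())
    g_current = torch.autograd.grad(F.cross_entropy(model(x), y), params)
    g_adv = adversarial_gradient(model, x, y, eps)
    grads = []
    for (name, p), gc, ga in zip(model.named_parameters(), g_current, g_adv):
        g_ewc = lam * fisher_diag[name] * (p.detach() - old_params[name])  # lam * theta_j / sigma_j^2
        grads.append(gc + g_ewc + alpha * ga)
    return grads
```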

4. The Gradient Conflict

The key insight:
Continual learning and adversarial robustness push the parameters in opposing directions. The EWC term pulls each parameter back toward the values that old tasks rely on (movement away from them is penalized in proportion to 1/σ_i²), while the adversarial term demands the substantial parameter changes needed to lower the loss at worst-case inputs x + δ*.

If these gradients point in opposite directions, they partially cancel, slowing or even reversing progress on one of the objectives.


Measuring Conflict: Cosine Similarity

We can quantify the conflict between gradients using their cosine similarity:

$$ \cos(\phi) = \frac{\langle g_{EWC},\, g_{Adv} \rangle}{\|g_{EWC}\|\, \|g_{Adv}\|} $$

Where:

  • $g_{EWC} = \lambda \Sigma^{-1} \theta$ is the stability gradient contributed by the EWC regularizer.
  • $g_{Adv} = \nabla_\theta \mathcal{L}(f(x + \delta^*; \theta), y)$ is the gradient contributed by adversarial training.
In practice, continual learning vs adversarial robustness often yields negative cosine similarity, highlighting their opposing nature.
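
A small sketch of how one might measure this, assuming g_ewc and g_adv are lists of per-parameter gradient tensors (e.g., as returned by torch.autograd.grad):

```python
import torch

def gradient_cosine(g_ewc, g_adv):
    """Cosine similarity between two gradients given as lists of per-parameter tensors."""
    v1 = torch.cat([g.flatten() for g in g_ewc])
    v2 = torch.cat([g.flatten() for g in g_adv])
    return torch.dot(v1, v2) / (v1.norm() * v2.norm() + 1e-12)

# cos < 0  ->  the stability and robustness gradients are pulling in opposing directions
```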


5. Why This Conflict Matters

This conflict makes it fundamentally hard to jointly optimize continual learning and adversarial robustness: the EWC term resists exactly the parameter changes adversarial training asks for, aggressive adversarial updates move the parameters EWC marks as important, and tuning α and λ becomes a fragile balancing act rather than a free design choice.


6. Potential Solutions

Several strategies can mitigate this gradient conflict:

  1. Two-phase training:
    Alternate between stability-focused updates and robustness-focused updates.

  2. Gradient projection:
    Modify g_Adv to be orthogonal to g_EWC when they conflict, reducing destructive interference (a sketch follows this list).

  3. Meta-learning:
    Learn dynamic scaling factors α,λ to balance conflict adaptively.

  4. Memory-augmented methods:
    Store exemplars from old tasks to stabilize learning without overly rigid regularization.
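
As an illustration of the gradient projection idea (option 2), here is a minimal sketch in the spirit of PCGrad-style methods, not a reproduction of any specific published implementation: whenever g_Adv conflicts with g_EWC, its component along g_EWC is removed.

```python
import torch

def project_conflicting(g_adv, g_ewc, eps=1e-12):
    """If <g_adv, g_ewc> < 0, project g_adv onto the subspace orthogonal to g_ewc."""
    v_adv = torch.cat([g.flatten() for g in g_adv])
    v_ewc = torch.cat([g.flatten() for g in g_ewc])
    dot = torch.dot(v_adv, v_ewc)
    if dot < 0:
        v_adv = v_adv - (dot / (v_ewc.norm() ** 2 + eps)) * v_ewc
    # restore the per-parameter shapes
    out, idx = [], 0
    for g in g_adv:
        out.append(v_adv[idx: idx + g.numel()].view_as(g))
        idx += g.numel()
    return out
```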


7. Summary

| Objective | Gradient Behavior | Effect on Model |
|---|---|---|
| Continual Learning | Conservative, stabilizing (g_EWC) | Prevents forgetting but resists change |
| Adversarial Robustness | Aggressive, disruptive (g_Adv) | Improves robustness but risks forgetting |

The core tension arises because these objectives require fundamentally different gradient directions.
Understanding and measuring this conflict is the first step toward building systems that can both adapt continually and stay robust.


8. Final Equation Recap

The full gradient expression we derived:

$$ \nabla_\theta \mathcal{L}_{combined} = \nabla_\theta \mathcal{L}_{current} + \lambda\, \Sigma^{-1} \theta + \alpha\, \nabla_\theta \mathcal{L}(f(x + \delta^*; \theta), y) $$

Where:

  • $\nabla_\theta \mathcal{L}_{current}$: the gradient of the current-task loss.
  • $\lambda \Sigma^{-1} \theta$: the EWC regularizer, with $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$.
  • $\alpha \nabla_\theta \mathcal{L}(f(x + \delta^*; \theta), y)$: the adversarial gradient, weighted by α.

This simple equation captures a deep conflict: the EWC term pulls the parameters back toward what old tasks need, the adversarial term pushes them toward what worst-case inputs demand, and the current-task gradient has to make progress in between.


9. Final Words

Balancing continual learning and adversarial robustness is like walking a tightrope: lean too far toward stability and the model stays vulnerable to attacks; lean too far toward robustness and it forgets what it has already learned.

The gradient-level view we explored provides a mathematical foundation for this trade-off and opens up a new research direction:
How can we design optimization algorithms that align these conflicting gradients?

In future posts, we'll explore gradient projection techniques and meta-learning approaches to resolve this tension, moving closer to building truly adaptive and robust AI systems.