Cross-Lingual Activation Steering for Multilingual Language Models
Summary
This paper introduces Cross-Lingual Activation Steering (CLAS), a training-free inference-time method to improve multilingual performance in large language models. CLAS addresses the performance gap between dominant (high-resource) and non-dominant (low-resource) languages by modulating neuron activations during inference. It selectively boosts shared neurons and suppresses language-specific ones, rebalancing representations to enhance cross-lingual transfer. The method uses parallel inputs to analyze neuron behavior, categorizes neurons into shared and language-specific types, and applies a lightweight steering rule in selected "bridge layers" near the model's output. Experiments on XNLI and XQuAD benchmarks show CLAS improves average accuracy by 1.93% and F1 score by 0.94% for Llama, and 0.45% and 1.10% for Qwen, respectively, while maintaining anchor language (English) performance. The results indicate that effective transfer arises from functional divergence rather than strict alignment to the anchor language. CLAS outperforms a baseline intervention method (Intμ) and demonstrates that targeted activation steering can unlock latent multilingual capacity without modifying model weights.
Cross-Lingual Activation Steering for Multilingual Language Models
Rhitabrat Pokharel1, Ameeta Agrawal1, Tanay Nagar2
1Department of Computer Science, Portland State University, USA
{pokharel,ameeta}@pdx.edu
2Independent Researcher
tanaynagar7@gmail.com
Abstract
Large language models exhibit strong multilingual capabilities, yet significant performance gaps persist between dominant and non-dominant languages. Prior work attributes this gap to imbalances between shared and language-specific neurons in multilingual representations. We propose Cross-Lingual Activation Steering (CLAS), a training-free inference-time intervention that selectively modulates neuron activations. We evaluate CLAS on classification and generation benchmarks, achieving average improvements of 2.3% (Acc.) and 3.4% (F1), respectively, while maintaining high-resource language performance. We discover that effective transfer operates through functional divergence rather than strict alignment; performance gains correlate with increased language cluster separation. Our results demonstrate that targeted activation steering can unlock latent multilingual capacity in existing models without modification to model weights.¹

¹We will release the code to support future work.
1 Introduction
Large Language Models (LLMs) perform well on many multilingual tasks, but a substantial performance gap remains between dominant (high-resource) and non-dominant (low-resource) languages. This gap is largely attributed to the heavy skew of pre-training corpora toward dominant languages, which enables models to develop richer representations in those languages. While expanding multilingual training data is an obvious solution, it is often infeasible due to cost and limited data availability.

Prior work has explored data- and training-based solutions such as multilingual instruction tuning (Chen et al., 2024b; Shaham et al., 2024), supervised fine-tuning (Chen et al., 2024a), and model alignment using smaller multilingual datasets (She et al., 2024; Gao et al., 2024; Pokharel et al., 2025). Although effective, these methods still depend on annotated datasets or additional training.

A complementary line of work studies multilingual behavior at the neuron level as a lightweight alternative. Researchers have identified shared, partially shared, and language-specific neurons (Wang et al., 2024; Tang et al., 2024), and shown that shared neurons and overlapping subspaces in middle and upper layers play a central role in cross-lingual transfer (Tezuka and Inoue, 2025; Xu et al., 2025). At the same time, direct manipulation of language-specific neurons has produced mixed results (Mondal et al., 2025). Other work suggests that multilingual models often reason through an English-like latent space before generating outputs in the target language (Etxaniz et al., 2024; Zhao et al., 2024a).

Building on these insights, we introduce Cross-Lingual Activation Steering (CLAS), a test-time neuron-level method that rebalances shared and language-specific representations during inference. CLAS gently boosts neurons that encode cross-lingual structure, suppresses those that over-specialize to a single language, and blends the result with the model's original activations. This steers the model toward representations that better support cross-lingual transfer, improving performance on non-dominant languages without requiring additional data or parameter updates.
While Mondal et al. (2025) also explore test-time neuron interventions, their approach differs substantially from ours. Their method overwrites selected neuron activations with corpus-level statistical constants (e.g., mean or percentile values), producing the same activation regardless of the input and effectively erasing and re-imprinting those neurons. In contrast, our method preserves proportionality to the model's actual activations: the final representation remains a blend with the original signal rather than a hard replacement.
Our main contributions are as follows:
• We propose CLAS, a training-free activation steering mechanism that selectively modulates neurons to enhance cross-lingual transfer during inference.
• We conduct a comprehensive analysis of cross-lingual representations and find that effective transfer is driven by functional divergence rather than proximity to the anchor language.
• We demonstrate the effectiveness of CLAS across diverse tasks and models, achieving improvements on both classification and generation benchmarks while maintaining anchor-language stability.
2 Cross-Lingual Activation Steering (CLAS)
We introduce Cross-Lingual Activation Steering (CLAS), a training-free, test-time intervention for multilingual models. CLAS has three stages: (i) construct parallel inputs across languages, (ii) summarize neuron behavior using simple activation statistics and group neurons into coarse categories, and (iii) apply a lightweight steering rule that modulates activations at inference time. This section describes each component.
2.1 Preliminaries
We define a configuration C = {L, ℓ_anchor, B, T_act, β, γ, α}, where L is the set of languages used for analysis, ℓ_anchor is an anchor language (typically the model's strongest language), B is the set of "bridge layers" selected for intervention, and T_act is the activation threshold for determining neuron activity. Finally, (β, γ, α) control the strength of boosting selected activations, suppressing selected activations, and blending between streams during steering.
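For concreteness, a minimal sketch of how this configuration could be represented in code. This is an illustration under our own naming; the field names and default values (the bridge layers shown are Qwen's, and β, γ match the values selected later in §4.4) are assumptions, not the authors' released implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CLASConfig:
    """Illustrative container for the CLAS configuration C (names are assumptions)."""
    languages: list = field(default_factory=lambda: ["en", "de", "hi", "zh"])  # L
    anchor: str = "en"                        # l_anchor: language left untouched
    bridge_layers: tuple = (24, 25)           # B: layers where steering is applied
    act_threshold: float = 0.0                # T_act: activation above this counts as "active"
    beta: float = 0.4                         # boost applied to partial-shared neurons
    gamma: float = 0.2                        # suppression applied to language-specific neurons
    alpha: float = 1.0                        # blend coefficient (sign sets steering direction)
```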
2.2 Parallel Input Construction
To analyze neuron behavior across languages, we require aligned multilingual inputs that express the same underlying content. For each sample index i, we construct a set of parallel inputs

x^(i) = { x^(i)_ℓ : ℓ ∈ L },

where each x^(i)_ℓ is the same text expressed in language ℓ. These parallel inputs are used only for measuring neuron activations and do not update the model in any way.
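A small sketch of this data structure, using hypothetical in-memory translations rather than the actual XQuAD loader; the helper name and example sentences are placeholders.

```python
# Hypothetical parallel rows: one dict per sample index i, keyed by language code.
parallel_rows = [
    {"en": "The committee approved the budget.",
     "de": "Der Ausschuss genehmigte das Budget.",
     "es": "El comité aprobó el presupuesto."},
    # ... more samples (the paper uses 100 parallel XQuAD samples)
]

def parallel_inputs(rows, languages):
    """x^(i) = {x^(i)_l : l in L}: keep only samples available in every analysis language."""
    return [{lang: row[lang] for lang in languages}
            for row in rows if all(lang in row for lang in languages)]

x_sets = parallel_inputs(parallel_rows, ["en", "de", "es"])
```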
Figure 1: Distribution of neuron types per layer across models. Llama has a total of 32 layers and Qwen has 28 layers. The bridge layers (purple shading) are selected near the final layers, where the proportion of partial-shared activations is highest.
2.3 Neuron Statistics and Categorization
To understand the model's multilingual structure and identify where CLAS should intervene, we first analyze neuron activations across languages and layers. Following Wang et al. (2024), we group neurons into four mutually exclusive categories: dead (never active), language-specific (active for one language), partial-shared (active for some languages), and all-shared (active for all languages).

However, we differ from prior work in how these categories are computed. Wang et al. (2024) assign categories at the instance level: a neuron is considered all-shared if it activates for all languages on a single parallel example. As a result, neuron categories can vary across inputs and tasks. In contrast, we adopt a dataset-level view. We compute the mean activation of each neuron using 100 parallel samples from the XQuAD dataset across 12 languages and two models (Llama 3.1 8B and Qwen 2.5 7B), and use this aggregate statistic to assign each neuron to a single category. This makes each neuron's category a stable property of the model and language set rather than something that changes across examples.
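A minimal sketch of this dataset-level categorization, assuming per-language mean activations have already been collected for one MLP layer (e.g., via forward hooks); the array names and integer labels are illustrative, not the authors' code.

```python
import numpy as np

def categorize_neurons(mean_acts, t_act=0.0):
    """mean_acts maps language -> (num_neurons,) mean activation for one layer.
    Returns one integer label per neuron: 0=dead, 1=language-specific,
    2=partial-shared, 3=all-shared (dataset-level, not per-instance)."""
    langs = list(mean_acts)
    active = np.stack([mean_acts[l] > t_act for l in langs])  # (num_langs, num_neurons)
    n_active = active.sum(axis=0)

    labels = np.full(active.shape[1], 2, dtype=int)           # default: partial-shared
    labels[n_active == 0] = 0                                  # dead: never active
    labels[n_active == 1] = 1                                  # active for exactly one language
    labels[n_active == len(langs)] = 3                         # all-shared: active for every language
    return labels
```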
Layer-wise Activation Patterns. For each layer, we compute the percentage of neurons in each category. Figure 1 summarizes the resulting distributions. We observe several consistent patterns:

Early layers. Early layers contain a high proportion of partial-shared neurons, suggesting that these layers are involved in mapping language-specific or language-related features into shared representations (Tang et al., 2024; Zhang et al., 2025).

Middle layers. In the middle layers, we observe an increase in all-shared neurons alongside a rise in dead neurons and a reduction in partial-shared neurons. This is consistent with prior findings that mid-level representations become more language-agnostic and semantically focused (Zhang et al., 2025; Xu et al., 2025).

Final layers. In the upper layers, partial-shared neurons become more prevalent again, and dead neurons decrease. This aligns with evidence that language-specific processing re-emerges near the output as the model prepares to generate text in the target language (Tang et al., 2024; Zhang et al., 2025).

These observations are descriptive and do not imply that these categories are sharply separated or functionally pure; rather, they provide a coarse summary of how multilingual structure is distributed across layers.

Selecting Bridge Layers. Based on these statistics, we select a small set of bridge layers for intervention. In both Llama and Qwen, layers 24-29 and 24-25, respectively, exhibit relatively high proportions of partial-shared neurons together with relatively low proportions of dead and language-specific neurons. We therefore hypothesize that intervening in these layers provides a useful balance: representations are shared enough to support cross-lingual steering, while not yet so specialized that small perturbations directly disrupt surface generation. We exclude the final two layers from intervention, as these layers are known to be strongly specialized for producing language-specific outputs and are particularly sensitive to perturbation (Zhao et al., 2024b). All layer selections are fixed prior to evaluation and are not tuned on test data.
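A sketch of one way to derive this selection from the per-layer statistics, assuming categorize_neurons (above) has been run for every layer. The ranking heuristic (highest partial-shared share, excluding the final two layers) is our paraphrase of the criterion described in the text, not the authors' exact procedure.

```python
def layer_category_percentages(layer_labels):
    """layer_labels: list over layers of per-neuron label arrays from categorize_neurons."""
    return [{
        "dead": 100 * (labels == 0).mean(),
        "lang_specific": 100 * (labels == 1).mean(),
        "partial_shared": 100 * (labels == 2).mean(),
        "all_shared": 100 * (labels == 3).mean(),
    } for labels in layer_labels]

def pick_bridge_layers(stats, num_layers=2, exclude_last=2):
    """Pick the layers with the highest partial-shared share, excluding the
    final `exclude_last` layers, which are sensitive to perturbation."""
    candidates = range(len(stats) - exclude_last)
    ranked = sorted(candidates, key=lambda i: stats[i]["partial_shared"], reverse=True)
    return sorted(ranked[:num_layers])
```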
2.4 Activation Steering via CLAS
Given an input x_ℓ in language ℓ, the model computes an intermediate MLP activation using the standard SwiGLU transformation

h = σ(W_g x) ⊙ (W_u x),

where W_g and W_u are the gating and up-projection matrices, and σ is the nonlinear activation function. For non-anchor languages ℓ ≠ ℓ_anchor, CLAS applies a lightweight, deterministic modification to this activation using masks derived from neuron categories, with the goal of adjusting the relative contribution of shared and language-specific neurons.

Partial-shared neuron adjustment. Let M_shared denote the mask over partial-shared neurons. Empirically, these neurons often account for a large fraction of the active dimensions, which can reduce the relative influence of language-specific features. We therefore apply a controlled rescaling

h_1 = h ⊙ (1 + β·M_shared),

where β controls the magnitude of the adjustment applied to partial-shared neurons. This operation does not introduce new information but changes the relative weighting of existing components.

Language-specific neuron adjustment. Similarly, let M_spec denote the mask over language-specific neurons. These neurons are typically under-represented. We apply a complementary adjustment

h_2 = h_1 ⊙ (1 − γ·M_spec),

where γ controls the strength of the adjustment applied to language-specific neurons.

Blend adjustment. We then blend the modified activation with the original:

h_final = (1 − α)·h + α·h_2,

where α controls the overall strength and direction of the intervention. Positive α increases the influence of the adjusted representation, emphasizing the relative contribution of shared neurons. Negative α reduces the influence of the adjusted representation and increases the relative contribution of language-specific components. In practice, we treat α as a steering coefficient whose effect is model- and task-dependent; both positive and negative values can be beneficial in different regimes. The adjusted activation is then passed through the down-projection: y = W_d·h_final.
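The three adjustments compose into a single elementwise rule. Below is a minimal PyTorch sketch of this steering step as we reconstruct it from the equations (not the authors' released implementation); the masks are assumed to be 0/1 vectors over the MLP hidden dimension, derived from the neuron categories of the current non-anchor language.

```python
import torch

def clas_steer(h, shared_mask, spec_mask, beta, gamma, alpha):
    """Apply CLAS to an intermediate SwiGLU activation h = sigma(W_g x) * (W_u x).

    shared_mask: 1.0 for partial-shared neurons, else 0.0
    spec_mask:   1.0 for language-specific neurons, else 0.0
    """
    h1 = h * (1.0 + beta * shared_mask)      # boost partial-shared neurons
    h2 = h1 * (1.0 - gamma * spec_mask)      # suppress language-specific neurons
    return (1.0 - alpha) * h + alpha * h2    # blend with the original activation
```

In a SwiGLU MLP this would be applied to the gated activation just before the down-projection (y = W_d·h_final), and only in the bridge layers for non-anchor inputs.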
2.5 Anchor Language Handling
For the anchor language ℓ_anchor (i.e., English), no modification is applied: h_final = h. We keep the anchor activations untouched because the anchor language serves as a stable reference point for cross-lingual alignment. Modifying it would risk introducing unnecessary distortion into a representation that is already well supported by the model. All adjustments are therefore applied only to non-anchor languages, allowing them to shift relative to a fixed reference.

3 Experimental Setting
We describe the models, datasets, and evaluation setup used to assess CLAS in a controlled multilingual setting, focusing on cross-lingual transfer to non-English languages.

3.1 Models
We use Llama 3.1 8B Instruct (Grattafiori et al., 2024) and Qwen 2.5 7B Instruct (Team, 2024). All models are kept frozen; CLAS is applied only at inference time and does not modify model parameters. We intervene only in the selected bridge layers defined by the configuration.

3.2 Datasets and Languages
We evaluate on two benchmarks.

XNLI (Conneau et al., 2018) is a natural language inference dataset with parallel data in 15 languages. For evaluation, we adopt a constrained generation approach. We first prompt the model with the premise and hypothesis and instruct it to classify the relationship by predicting a single integer token: "0" (Entailment), "1" (Neutral), or "2" (Contradiction). We then extract the logits for these specific target tokens at the final position and select the class with the highest probability. Results are reported in terms of Accuracy and F1 scores.
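A sketch of this constrained scoring step, assuming a Hugging Face causal LM and tokenizer; the prompt wording and function name are placeholders, not the exact prompt used in the paper.

```python
import torch

def classify_nli(model, tokenizer, premise, hypothesis):
    """Score the single-token labels "0"/"1"/"2" at the final position and
    return the argmax class (0=entailment, 1=neutral, 2=contradiction)."""
    prompt = (f"Premise: {premise}\nHypothesis: {hypothesis}\n"
              "Answer with 0 (entailment), 1 (neutral), or 2 (contradiction): ")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]           # next-token logits at the final position
    label_ids = [tokenizer.encode(t, add_special_tokens=False)[0] for t in ["0", "1", "2"]]
    return int(torch.argmax(logits[label_ids]).item())
```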
XQuAD (Artetxe et al., 2019) is a multilingual question answering dataset in 12 languages. For evaluation, we consider a generative reading comprehension task in which the model is provided with the context paragraph and question, then prompted to generate the answer span directly. We employ greedy decoding with a strict maximum new-token limit of 32. Results are reported using the token-level F1 score.
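Token-level F1 can be computed with the standard SQuAD-style overlap; a simple whitespace-token sketch follows (the official evaluation additionally normalizes punctuation and articles, which is omitted here).

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-level F1 between a generated answer and the gold span."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```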
3.3 Evaluation and Implementation
Neuron statistics are computed using the parallel subset (100 samples) where the same example is available in all analysis languages; downstream evaluation uses the full dataset for each language separately. We evaluate a mix of high- and low-resource languages and use English as the anchor language, since prior work shows that multilingual models often rely on English-like internal representations. All other languages are treated as non-anchor and used to measure cross-lingual effects. All experiments are run on a single NVIDIA A40 GPU with a maximum sequence length of 512. Samples are processed individually (no batching) to ensure accurate activation capture. Steering parameters (β, γ, α) are selected via grid search (more details in §4.4).

4 Results and Discussion
In this section, we discuss the findings of the experiments with CLAS and how it improves multilingual performance.

4.1 Downstream Performance
Tables 1 and 2 summarize the main results, comparing CLAS against the base models and the Intμ baseline from Mondal et al. (2025), in which the mean value is used during intervention.

XNLI classification. On XNLI, CLAS improves average accuracy over the base models, although the magnitude and consistency of the gains differ. For Llama, CLAS yields an average improvement of +1.93 accuracy points over the baseline, which is statistically significant (p < 0.05). These gains are driven by large improvements in several languages, including Urdu (+7.00), Chinese (+5.89), Hindi (+5.80), and Greek (+5.39), although some languages show regressions (e.g., Bulgarian, Swahili, and Vietnamese). This indicates that CLAS can substantially help underperforming languages but may also introduce instability in some cases. For Qwen, the average improvement is smaller (+0.45) but more consistent across languages, and also statistically significant (p < 0.001). Most gains fall between +0.2 and +0.8, and none are strongly negative. This suggests that CLAS is more conservative on Qwen: it yields smaller but more stable improvements, reflecting Qwen's stronger baseline multilingual representations. On both models, performance on the anchor language (English) remains unchanged, indicating that CLAS does not degrade anchor-language behavior while modifying non-anchor representations. Compared to Intμ, CLAS consistently performs better: Intμ is slightly harmful on Llama and neutral on Qwen, whereas CLAS yields positive mean gains on both.

l     | Llama: base   Intμ    CLAS   | Qwen: base   Intμ    CLAS
ar    | 38.88        -0.76   +4.71   | 60.45       +0.09   +0.45
bg    | 42.28        -1.60   -2.14   | 60.75       -0.01   +0.71
de    | 43.65        +2.12   +1.62   | 66.22       +0.11   +0.77
el    | 40.58        +0.72   +5.39   | 58.02       +0.04   +0.48
en    | 52.63         0.00    0.00   | 71.73       +0.05   +0.05
es    | 46.95        -0.36   -0.72   | 65.56       -0.05   +0.73
fr    | 44.53        -1.42   +2.34   | 64.47       -0.14   +0.66
hi    | 41.78        -0.68   +5.80   | 55.56       -0.05   +0.51
ru    | 43.75        +0.04   +1.14   | 61.87       -0.17   +0.25
sw    | 41.10        +0.32   -3.06   | 40.31       -0.35   +0.21
th    | 40.68        +3.71   +4.09   | 57.80       -0.02   +0.38
tr    | 44.17        -0.36   +0.66   | 59.18       -0.08   +0.72
ur    | 36.09        +0.28   +7.00   | 51.07       +0.09   +0.43
vi    | 44.81        -1.96   -3.73   | 62.11       +0.07   +0.31
zh    | 44.63        -3.25   +5.89   | 63.81        0.00   +0.08
Avg.  | 43.10        -0.21   +1.93∗  | 59.93       -0.03   +0.45⋆
σ     |  2.82         2.77    3.19   |  6.78        6.84    6.87

Table 1: Accuracy and improvements on XNLI using Llama and Qwen. en is removed during statistical analysis. Asterisks denote statistical significance of the improvement over the baseline (paired t-test): ∗p < 0.05, ⋆p < 0.001.
l     | Llama: base   Intμ    CLAS   | Qwen: base   Intμ    CLAS
ar    | 23.20        -1.95   +1.01   | 28.73       +0.26   +5.06
de    | 38.54        -5.46   +6.25   | 34.77       -2.76   -0.96
el    | 25.18        +1.50   +0.23   | 37.06       -0.06   -1.51
en    | 23.47         0.00    0.00   | 48.88       +0.11    0.00
es    | 34.23        +1.08   +1.84   | 38.53       -2.53   +2.63
hi    | 29.47        -0.09   -0.10   | 33.12       -2.12   -5.28
ro    | 30.01        +6.97   -0.03   | 34.48       -0.48   +0.48
ru    | 27.32        -0.08   -0.06   | 27.58       -0.58   +8.94
th    | 19.23        -3.40   -0.46   | 28.51       +7.48   +0.98
tr    | 23.74        -2.11   +1.14   | 29.19       -2.19   -2.31
vi    | 34.91        +5.08   +1.67   | 31.67       -5.67   +1.36
zh    | 12.84        +0.05   -0.25   | 20.24       -1.24   +3.76
Avg.  | 26.85        +0.13   +0.94   | 32.73       -0.82   +1.10
σ     |  7.43         8.75    8.83   |  5.16        5.45    4.95

Table 2: F1 scores and improvements on XQuAD for Llama and Qwen models. en is removed during statistical analysis. Statistical analysis (paired t-test) indicates that the improvements for both Llama (p = 0.10) and Qwen (p = 0.33) are not statistically significant (p > 0.05).

XQuAD generative question answering. On XQuAD, CLAS also improves average performance, but the effects are more variable and not statistically significant. On Llama, CLAS improves average token-level F1 by +0.94, with notable gains for German (+6.25) and Spanish (+1.84), but small regressions for others. Intμ has a negligible effect. On Qwen, CLAS improves average F1 by +1.10, with large gains for Russian (+8.94) and Arabic (+5.06), but also substantial regressions for Hindi (-5.28) and Turkish (-2.31). This indicates that CLAS can unlock large improvements for some languages but can also strongly harm others in the generative setting. Compared to XNLI, XQuAD exhibits much higher variance, with both large positive and negative swings. This is expected given that span extraction is sensitive to token boundaries, local lexical cues, and translation artifacts, making it inherently less stable than classification.
Figure 2: Cosine similarity with English across languages on each task using the Llama model. Similar results (Appendix A) were obtained with the Qwen model.

Statistical perspective. Our statistical analysis shows that CLAS significantly improves cross-lingual performance on discriminative tasks without increasing variance across languages. On XNLI, both Llama and Qwen show significant gains after excluding English (p < 0.05 and p < 0.001), while cross-language variance remains stable (p > 0.05), indicating a uniform improvement rather than trade-offs across languages. On XQuAD, average F1 also increases, but the gains are not statistically significant, likely due to higher variability in generative evaluation.
Figure 3: Relationship between alignment change to English (y-axis) and CLAS performance improvement (x-axis) for each language across different tasks and models.

4.2 Cross-Lingual Alignment Analysis
In this subsection, we analyze the impact of our activation steering mechanism on representation alignment and examine how these geometric shifts relate to cross-lingual performance gains on XNLI and XQuAD.

Decoupling geometric alignment from functional performance. Figure 2 shows cosine similarity between each target language and English in the bridge layers before and after CLAS. Across tasks and models, CLAS generally reduces similarity to English rather than increasing it. This effect is strongest for Llama, especially on XQuAD, where several languages move noticeably farther from English, while Qwen shows smaller and more uniform changes.
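A sketch of this alignment measurement, assuming mean-pooled bridge-layer hidden states per language; the pooling choice and function name are assumptions on our part.

```python
import numpy as np

def cosine_to_anchor(lang_reps, anchor="en"):
    """lang_reps maps language -> mean bridge-layer representation (hidden_dim,).
    Returns the cosine similarity of each non-anchor language to the anchor."""
    a = lang_reps[anchor]
    a = a / np.linalg.norm(a)
    return {lang: float(np.dot(v / np.linalg.norm(v), a))
            for lang, v in lang_reps.items() if lang != anchor}
```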
This pattern matches our neuron analysis (Section 2.3). CLAS rebalances partial-shared and language-specific neurons, reducing over-reliance on English-centric features and allowing more target-language structure to influence predictions. As a result, CLAS improves performance not by pulling languages closer to English, but by relaxing excessive alignment. The effect is larger on XQuAD than on XNLI, likely because span extraction is more sensitive to surface form. Reducing English alignment can help match target-language expressions, though it can also increase variance and occasional regressions.

Performance gains are driven by functional divergence. We fit a linear regression to quantify the correlation between these metrics. Figure 3 shows the scatter plots with fitted trend lines, analyzing the trade-off between task performance improvement and the shift in alignment relative to English. Contrary to the design of the CLAS mechanism, which explicitly aims to amplify shared neurons (h ⊙ (1 + β·M_shared)) and attenuate language-specific ones (h ⊙ (1 − γ·M_spec)), we observe an inverse relationship on the XNLI task. This anomaly is likely attributable to a negative value of α, which effectively reverses the polarity of the modulation. For both Llama and Qwen, larger performance gains are often accompanied by a reduction in cosine similarity to the English anchor (negative slope), suggesting that functional optimization for reasoning tasks may require diverging from the strict geometric space of the anchor language. In contrast, the XQuAD benchmarks display a mixed relationship, indicating that for extraction-based tasks, the correlation between shared-neuron activation and geometric alignment is less predictable.
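The trend lines in Figure 3 correspond to a simple per-language regression of performance change on alignment change; a sketch using scipy, with purely hypothetical input values for illustration.

```python
from scipy import stats

# One point per language: change in cosine similarity to English, and CLAS gain (hypothetical values).
alignment_change = [-0.12, -0.08, -0.03, 0.01, -0.15]
clas_improvement = [5.8, 4.1, 0.7, -2.1, 7.0]

slope, intercept, r, p, stderr = stats.linregress(clas_improvement, alignment_change)
print(f"slope={slope:.3f}, r={r:.2f}, p={p:.3f}")  # negative slope: gains coincide with divergence
```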
4.3 Representation Space Visualization
We apply t-SNE dimensionality reduction to visualize the aggregated hidden states of the English anchor and all target languages in a shared 2D space, which provides useful intuition about representational structure. Representations are extracted from the model's penultimate layer for all languages. Figure 4 shows separate visualizations for the before-CLAS and after-CLAS conditions. Language centroids are computed as the mean position of each language's embeddings. Visual analysis reveals three distinct geometric phenomena.

Figure 4: t-SNE visualization of cross-lingual representations before (left) and after (right) CLAS intervention. Each point represents a sentence embedding, with colors denoting languages. English, marked with stars, is the anchor language.

CLAS reshapes representations without collapsing them into English. The "After" visualizations confirm that the mechanism drives functional divergence rather than assimilation. Language clusters remain distinct and in some cases become more clearly separated, as seen in Figure 4. This suggests that CLAS improves performance by reorganizing language-specific representation spaces, rather than by enforcing assimilation into a single shared geometry.

Moderate reorganization is associated with larger gains. On XNLI, this reorganization appears broadly beneficial, as many languages improve when their representations become more structured and distinct.
On XQuAD, the relationship is more selective: languages that start in a mixed or ambiguous region and become more clearly separated after CLAS tend to show larger gains (e.g., de, ar, es on Llama; ru, ar, es on Qwen), while languages that are already highly separated show smaller changes. This suggests that CLAS is most helpful when it resolves representational overlap, rather than when representations are already well-formed.

Distance to English does not explain improvements. Neither initial nor final proximity to the English cluster reliably predicts performance changes. Some languages far from English (e.g., ur, hi on XNLI-Llama) improve substantially, while some closer languages improve little or regress. After CLAS, high-performing languages are distributed across the space rather than concentrated near English. This indicates that CLAS effectiveness depends on how representations are reorganized internally, not simply on how close they are to the anchor language.
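A sketch of the visualization setup, assuming sentence-level hidden states have been mean-pooled from the penultimate layer; the scikit-learn TSNE settings shown are defaults, not tuned values from the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_with_centroids(reps, seed=0):
    """reps maps language -> (num_sentences, hidden_dim) penultimate-layer embeddings.
    Returns 2D points per language and each language's centroid."""
    langs = list(reps)
    counts = [reps[l].shape[0] for l in langs]
    stacked = np.concatenate([reps[l] for l in langs])
    points = TSNE(n_components=2, random_state=seed).fit_transform(stacked)

    out, start = {}, 0
    for lang, n in zip(langs, counts):
        out[lang] = points[start:start + n]
        start += n
    centroids = {lang: pts.mean(axis=0) for lang, pts in out.items()}
    return out, centroids
```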
4.4 Optimal Steering Intensity and Direction
We tune three parameters: β (boosting shared neurons), γ (suppressing language-specific neurons), and α (overall steering strength and direction). A grid search over β ∈ {0.2, 0.4, 0.6} and γ ∈ {0.1, 0.2, 0.4} shows that moderate values work best. We select β = 0.4 and γ = 0.2, which emphasize shared structure without introducing excessive noise. We tune α separately.
Figure 5: The optimal value of α depends on the model and the task.
Figure 6: Impact of β and γ on XNLI accuracy. The blue line shows performance when varying β (with γ=0 fixed), and the red line shows performance when varying γ (with β=0 fixed). In all experiments, α=1.
A grid search (Figure 5) on 200 samples per language per task shows that the optimal magnitude and direction of steering vary: some settings benefit from reinforcing shared representations (positive α), while others benefit from amplifying language-specific signals (negative α). This confirms that the balance between shared semantics and language-specific precision is task- and model-dependent, consistent with prior work (Tang et al., 2024; Wang et al., 2024). Figure 6 further shows that relying exclusively on either shared or language-specific neurons is suboptimal; the best performance comes from a calibrated balance between the two.
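A sketch of this two-stage sweep, assuming an evaluate(beta, gamma, alpha) function that scores a 200-sample-per-language development subset. The β and γ grids match the values reported above; the α grid is illustrative (cf. the ranges in Figure 5), not the exact one used by the authors.

```python
from itertools import product

def tune_clas(evaluate):
    """evaluate(beta, gamma, alpha) -> mean dev score over non-anchor languages.
    Stage 1 fixes alpha=1 and sweeps (beta, gamma); stage 2 sweeps alpha separately."""
    best_bg = max(product([0.2, 0.4, 0.6], [0.1, 0.2, 0.4]),
                  key=lambda bg: evaluate(beta=bg[0], gamma=bg[1], alpha=1.0))
    beta, gamma = best_bg
    alphas = [-8, -6, -4, -2, -1, 1, 2]      # illustrative range; sign sets steering direction
    alpha = max(alphas, key=lambda a: evaluate(beta=beta, gamma=gamma, alpha=a))
    return beta, gamma, alpha
```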
Table 3: Comparison of outputs for German XQuAD samples. English translations provided via Google Translate.

Question: Was benannte die Generalkonferenz für Maß und Gewicht 1960 nach Tesla? (What did the General Conference on Weights and Measures name after Tesla in 1960?)
Llama: SI-Maßeinheit der magnetischen Flussdichte. stop. Die SI-Maßeinheit der magnetischen Flussdichte (SI unit of magnetic flux density. stop. The SI unit of magnetic flux density)
CLAS: SI-Maßeinheit der magnetischen Flussdichte. (SI unit of magnetic flux density.)

Question: Die Lutherbibel beeinflusste wessen englische Übersetzung der Bibel? (The Luther Bible influenced whose English translation of the Bible?)
Llama: William Tyndale Die Lutherbibel beeinflusste die englische Bibel von William Tyndale. Tyndale war ein eng (William Tyndale's English Bible was influenced by the Luther Bible. Tyndale was a close)
CLAS: William Tyndale. (William Tyndale.)
4.5 Qualitative Analysis
Table 3 shows the qualitative impact of CLAS on German generation. In these examples, the baseline model suffers from severe repetition loops and verbosity leakage. In contrast, CLAS successfully suppresses these behaviors, demonstrating that CLAS generates concise responses while maintaining accuracy.

5 Related Work
Cross-lingual Transfer. Cross-lingual transfer in multilingual LLMs has been widely studied, with prior work showing that transfer quality varies across tasks, languages, and models. For example, Hu et al. (2025) analyze factors that influence cross-lingual performance on reasoning tasks.

Many approaches improve transfer through additional training. These include adding multilingual data during instruction tuning (Shaham et al., 2024), combining supervised fine-tuning with preference alignment (Lai et al., 2024), and constructing new multilingual pre-training datasets (He et al., 2025).
Other work uses translation-based fine-tuning (Lee et al., 2025), layer-wise fine-tuning (Bandarkar et al., 2025), or language-specific adapters (Zhao et al., 2025). Continued pre-training has also been shown to improve transfer for some language pairs (Wu et al., 2025). An alternative line of work uses prompts rather than parameter updates. For example, Tanwar et al. (2023) use multilingual in-context examples, and Yoo et al. (2025) study in-context learning in code-switching settings.

Neuron Behavior and Cross-lingual Transfer. Several studies examine cross-lingual transfer at the neuron level. Huang et al. (2025) show that activation similarity across languages is associated with better transfer. Other work identifies language-specific and shared neurons (Tang et al., 2024; Zhang et al., 2025; Tezuka and Inoue, 2025), and shows that shared neurons often concentrate in middle and upper layers (Xu et al., 2025).

Results on directly intervening on neurons are mixed. Mondal et al. (2025) find that manipulating language-specific neurons yields limited gains, while Wang et al. (2024) show that neuron roles vary by task and model. Our work differs in that we apply test-time steering that blends rather than overwrites activations, preserving representational structure. We show that the direction and magnitude of steering matter, and that carefully controlled neuron-level interventions can improve cross-lingual transfer.
6 Conclusion
We presented CLAS, a training-free activation steering method that improves cross-lingual transfer by rebalancing shared and language-specific neurons at inference time. CLAS improves performance across both classification and generation tasks, and our analysis shows that gains come from functional divergence rather than forcing representations to align closely with the anchor language. Languages benefit most by shifting from ambiguous regions into distinct clusters, whereas initial anchor proximity does not reliably predict success. These patterns are task-dependent, highlighting the need for task-aware cross-lingual methods. Future work could explore adaptive steering strategies that adjust intervention strength based on representational structure, better understand saturation effects in already well-separated clusters, and extend CLAS to other modalities and training settings.

Limitations
We evaluate CLAS on two multilingual instruction-tuned LLMs (Qwen and Llama) and across multiple multilingual benchmarks spanning NLI (XNLI) and QA (XQuAD). However, our analysis and suggested intervention are anchored to English, i.e., we quantify alignment shifts via cosine similarity to an English anchor. This may not reflect behavior under alternative references/anchors or truly language-pair-specific settings. Additionally, it is important to note that CLAS is not uniformly beneficial: while average gains across languages are positive, we also observe language-specific regressions (e.g., Hindi, Greek). This indicates that test-time steering can be brittle for certain model/language combinations. Finally, our mechanistic analysis does not isolate the role of attention heads or other circuit components, limiting the granularity of causal attribution.
Ethical Considerations
CLAS is a test-time activation intervention and can change model behavior in ways that are not always predictable across languages, including occasional performance degradations. As with other steering approaches, such interventions can be re-purposed in undesirable ways (e.g., modulating output without transparency). Thus, we recommend caution and task-specific testing and validation before real-world deployment. Our experiments use publicly available benchmarks and do not involve human subjects or personal data.

References
Mikel Artetxe, Sebastian Ruder, and Dani Yogatama. 2019. On the cross-lingual transferability of monolingual representations. CoRR, abs/1910.11856.
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, and Bing Liu. 2025. Layer swapping for zero-shot cross-lingual transfer in large language models. In The Thirteenth International Conference on Learning Representations.
Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Dongmei Zhang, and Jia Li. 2024a. Breaking language barriers in multilingual mathematical reasoning: Insights and observations. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7001–7016, Miami, Florida, USA. Association for Computational Linguistics.
Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, and Kenneth Heafield. 2024b. Monolingual or multilingual instruction tuning: Which makes a better alpaca. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1347–1356, St. Julian's, Malta. Association for Computational Linguistics.
Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel R. Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, and Mikel Artetxe. 2024. Do multilingual language models think better in English? In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.
Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, and Shujian Huang. 2024. Multilingual pretraining and instruction tuning improve cross-lingual knowledge alignment, but only shallowly. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6101–6117, Mexico City, Mexico. Association for Computational Linguistics.
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, and 1 others. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.
Kaiyu He, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, and Jun Zhao. 2025. Semantic pivots enable cross-lingual transfer in large language models. arXiv preprint arXiv:2505.16385.
Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, and Shujian Huang. 2025. Large language models are cross-lingual knowledge-free reasoners. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1525–1542, Albuquerque, New Mexico. Association for Computational Linguistics.
Chongxuan Huang, Yongshi Ye, Biao Fu, Qifeng Su, and Xiaodong Shi. 2025. From neurons to semantics: Evaluating cross-linguistic alignment capabilities of large language models via neurons alignment. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28956–28974, Vienna, Austria. Association for Computational Linguistics.
Wen Lai, Mohsen Mesgar, and Alexander Fraser. 2024. LLMs beyond English: Scaling the multilingual capability of LLMs with cross-lingual feedback. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8186–8213, Bangkok, Thailand. Association for Computational Linguistics.
Jungseob Lee, Seongtae Hong, Hyeonseok Moon, and Heuiseok Lim. 2025. Cross-lingual optimization for language transfer in large language models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15100–15119, Vienna, Austria. Association for Computational Linguistics.
Soumen Kumar Mondal, Sayambhu Sen, Abhishek Singhania, and Preethi Jyothi. 2025. Language-specific neurons do not facilitate cross-lingual transfer. In The Sixth Workshop on Insights from Negative Results in NLP, pages 46–62, Albuquerque, New Mexico. Association for Computational Linguistics.
Rhitabrat Pokharel, Yufei Tao, and Ameeta Agrawal. 2025. Capo: Confidence aware preference optimization learning for multilingual preferences. arXiv preprint arXiv:2511.07691.
Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, and Matan Eyal. 2024. Multilingual instruction tuning with just a pinch of multilinguality. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2304–2317, Bangkok, Thailand. Association for Computational Linguistics.
Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, and Jiajun Chen. 2024. MAPO: Advancing multilingual reasoning through multilingual-alignment-as-preference optimization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10015–10027, Bangkok, Thailand. Association for Computational Linguistics.
Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, and Ji-Rong Wen. 2024. Language-specific neurons: The key to multilingual capabilities in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5701–5715, Bangkok, Thailand. Association for Computational Linguistics.
Eshaan Tanwar, Subhabrata Dutta, Manish Borthakur, and Tanmoy Chakraborty. 2023. Multilingual LLMs are better cross-lingual in-context learners with alignment. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6292–6307, Toronto, Canada. Association for Computational Linguistics.
Qwen Team. 2024. Qwen2.5: A party of foundation models.
Hinata Tezuka and Naoya Inoue. 2025. The transfer neurons hypothesis: An underlying mechanism for language latent space transitions in multilingual LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31730–31780, Suzhou, China. Association for Computational Linguistics.
Weixuan Wang, Barry Haddow, Minghao Wu, Wei Peng, and Alexandra Birch. 2024. Sharing matters: Analysing neurons across languages and tasks in LLMs. arXiv preprint arXiv:2406.09265.
Linjuan Wu, Hao-Ran Wei, Huan Lin, Tianhao Li, Baosong Yang, Fei Huang, and Weiming Lu. 2025. Enhancing LLM language adaption through cross-lingual in-context pre-training. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 27140–27154, Suzhou, China. Association for Computational Linguistics.
Yuemei Xu, Kexin Xu, Jian Zhou, Ling Hu, and Lin Gui. 2025. Linguistic neuron overlap patterns to facilitate cross-lingual transfer on low-resource languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 27646–27661, Suzhou, China. Association for Computational Linguistics.
Haneul Yoo, Jiho Jin, Kyunghyun Cho, and Alice Oh. 2025. Code-switching in-context learning for cross-lingual transfer of large language models. arXiv preprint arXiv:2510.05678.
Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, and Jiajun Chen. 2025. How does alignment enhance LLMs' multilingual capabilities? A language neurons perspective. arXiv preprint arXiv:2505.21505.
Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, and Lidong Bing. 2024a. How do large language models handle multilingualism? In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, and Lidong Bing. 2024b. How do large language models handle multilingualism? Advances in Neural Information Processing Systems, 37:15296–15319.
Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, and Lidong Bing. 2025. AdaMergeX: Cross-lingual transfer with large language models via adaptive adapter merging. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 9785–9800, Albuquerque, New Mexico. Association for Computational Linguistics.

A Alignment vs. CLAS Performance
To further understand the relationship between alignment and CLAS performance, we generate heatmaps in Figure 7. Across all four settings, most languages exhibit a negative alignment change, indicating that CLAS generally reduces the similarity of non-English representations to English. At the same time, many of these same languages show positive performance improvements, as indicated by darker shading.

Figure 7: Heatmap showing alignment change vs. CLAS improvement for each language across tasks and models.
Figure 8 presents the plots for cosine similarity with English across languages on each task using the Qwen model.

Figure 8: Cosine similarity with English across languages on each model and task.