The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It

Summary

This paper analyzes the linguistic diversity of Large Language Model (LLM) safety research, revealing a significant and widening English-centric bias. Through a systematic review of nearly 300 publications from 2020 to 2024 across major *ACL venues, the authors find that English-only research dominates, with even high-resource non-English languages like Mandarin receiving minimal attention. Non-English languages are rarely studied in isolation; they usually appear only in broad multilingual evaluations that lack cultural depth. Furthermore, about half of English-only safety studies never explicitly state the language they study, implying a false universality. The authors argue that safety mechanisms do not necessarily generalize across languages because of cultural nuances, such as differing definitions of toxicity or taboo. Evaluation metrics that rely on averages can obscure critical safety failures in specific languages, risking the worldwide deployment of unsafe models. To address these gaps, the paper proposes three future directions: developing culturally grounded evaluation benchmarks that account for code-switching and local linguistic patterns; generating diverse, culturally contextualized synthetic training data using frameworks like Constitutional AI; and investigating crosslingual safety generalization through mechanistic interpretability and influence analysis. The study concludes that bridging this linguistic divide is essential for creating robust, inclusive AI safety practices for diverse global populations.

arXiv:2505.24119v1 [cs.CL] 30 May 2025
The State of Multilingual LLM Safety Research:
From Measuring the Language Gap to Mitigating It
Zheng-Xin Yong¹, Beyza Ermis², Marzieh Fadaee², Stephen H. Bach¹, and Julia Kreutzer²
¹Brown University, ²Cohere Labs
Corresponding authors: Zheng-Xin Yong (contact.yong@brown.edu), Julia Kreutzer (juliakreutzer@cohere.com)
Abstract
This paper presents a comprehensive analysis of the linguistic diversity of LLM safety research,
highlighting the English-centric nature of the field. Through a systematic review of nearly 300
publications from 2020–2024 across major NLP conferences and workshops at ∗ACL, we identify a
significant and growing language gap in LLM safety research, with even high-resource non-English
languages receiving minimal attention. We further observe that non-English languages are rarely
studied as a standalone language and that English safety research exhibits poor language documen-
tation practice. To motivate future research into multilingual safety, we make several recommenda-
tions based on our survey, and we then pose three concrete future directions on safety evaluation,
training data generation, and crosslingual safety generalization. Based on our survey and proposed
directions, the field can develop more robust, inclusive AI safety practices for diverse global popu-
lations.
Content Warning: This paper contains examples of harmful language.
1 Introduction
The rapid advancement of large language models (LLMs) has transformed the artificial intelli-
gence landscape, enabling increasingly sophisticated capabilities across domains including health-
care [Singhal et al., 2023; Nazi & Peng, 2024; Singhal et al., 2025], education [Neumann et al., 2024;
Zhang et al., 2024b; Wen et al., 2024], and media content generation [Wang et al., 2023; Zhang
et al., 2024a; Barman et al., 2024]. As these powerful systems are deployed globally and used across
different linguistic communities [Tamkin et al., 2024], ensuring their safe and secure operation across
diverse linguistic and cultural contexts has emerged as a critical research imperative. While sig-
nificant progress has been made in developing safety mechanisms for high-resource languages [Shi
et al., 2024a; Dong et al., 2024], particularly English, the multilingual dimensions of LLM safety
remain considerably underexplored. For example, all the public safety evaluation datasets reviewed
by Dong et al. [2024] include English content, with only two datasets being bilingual (English and
Chinese). This gap creates potentially dangerous blind spots in our safety frameworks and raises
fundamental questions about the equitable distribution of AI benefits and risks [Yong et al., 2023a;
Ermis et al., 2024; Aakanksha et al., 2024; Kanepajs et al., 2024; Bengio et al., 2025; Peppin et al.,
2025].
Multilingual LLM safety encompasses challenges that extend well beyond the simple translation of
existing safety techniques. Languages differ not only in their vocabulary and grammatical structures but also in their cultural connotations [Hoijer, 1954; Jiang, 2000; Everett, 2012; Kramsch, 2014; Mazari & Derraz, 2015], metaphorical expressions [Saygin, 2001; Khoshtab et al., 2025], taboos [Dewaele, 2004], and social norms [Sridhar, 1996; Baquedano-López & Kattan, 2007; Fasya & Sari, 2021]. Therefore, content that is harmless in one cultural context may be deeply offensive or harmful in another [Keipi et al., 2016; Ermis et al., 2024; Aakanksha et al., 2024; Korre et al., 2025], or vice versa. For instance, in Southeast Asia, the term "banana" (which connotes "yellow on the outside, white on the inside") is used to disparage people of Asian descent who are perceived as having forgone their cultural identity and adopted Western cultural values and behaviors [Khoo, 2003; Trieu, 2019]. On the other hand, the Chinese word 屌, which literally translates as "dick", can be used in both offensive (i.e., as a swear word) and non-offensive (i.e., as an adjective praising someone with remarkable talent) settings [Carson & Jiang, 2021].

Category | Definition | Examples
Jailbreaking attacks | Work on designing adversarial prompts to bypass refusal safety guardrails or detecting jailbreaking attacks | Zeng et al. [2024], Wang et al. [2024c]
Toxicity and bias | Work on toxic content and stereotypical bias in training data and output generations | Zhu et al. [2024], Kim et al. [2024]
Factuality and hallucination | Work on nonsensical, unfaithful, and factually incorrect content generated by LLMs | Pal & Sankarasubbu [2024]
AI privacy | Work on memorization, private data leakage, and unlearning | Dou et al. [2024], Shi et al. [2024b]
Policy | Work on governance frameworks, regulatory approaches, and ethical guidelines for responsible AI deployment | Goanta et al. [2023]
LLM alignment | Work that spans multiple subtopics above or is related to other LLM safety subtopics such as RLHF alignment algorithms | Wang et al. [2024d], Yang et al. [2024b]
Not related to safety | Work that does not belong to any of the topics above | Manino et al. [2022]
Table 1: Taxonomy for our LLM safety survey study.

Released as a preprint on June 10, 2025.
The wide disparity in language resources [Joshi et al., 2020; Nigatu et al., 2024]––from high-resource
languages like English, Mandarin, and Spanish to thousands of low-resource languages––creates
uneven safety landscapes with potentially severe consequences for marginalized linguistic commu-
nities. Several commercial LLMs have demonstrated significantly weaker safety performance when
prompted in non-English languages, producing harmful content and undesirable outputs that would
be filtered in English contexts [Yong et al., 2023a; Deng et al., 2024; Wang et al., 2024a; Al Ghanim
et al., 2024; Yoo et al., 2024; He et al., 2024; Shen et al., 2024; Nigatu & Raji, 2024; Poppi et al.,
2025; Aakanksha et al., 2024; Jain et al., 2024; Chan et al., 2025]. These disparities in safety
protections, combined with increasingly capable LLMs, risk magnifying societal harms within multilingual communities. While the companies behind frontier LLMs have made concerted efforts to perform multilingual safety alignment training and red-teaming [Grattafiori et al., 2024; Cohere et al., 2025; OpenAI, 2025], these initiatives remain limited in scope. For instance, among the top-ranking LLMs on Chatbot Arena (a widely used leaderboard platform that evaluates LLMs through user-submitted preferences), 20 of the 24 models that provide a system report have wide multilingual support, but only 5 report multilingual safety alignment training and red-teaming efforts. This
gap between multilingual deployment capabilities and safety alignment calls for further participation from both private enterprises and academia on multilingual safety alignment.

[Figure 1 (bar chart): number of LLM safety publications per year, 2020 to 2024. English only: 6, 8, 11, 26, 118. Monolingual non-English + multilingual: 1, 3, 2, 8, 35.]
Figure 1: Trends of English-only and multilingual LLM safety publications in ∗ACL conferences and workshops over the past five years: the language gap in LLM safety research widens.
We perform a systematic review of nearly 300 LLM safety publications from the past five years in ∗ACL proceedings (Section 2), and we uncover a concerning trend: the vast majority of safety research is centered on English, while comparatively little work addresses safety in non-English or multilingual contexts. This imbalance has become more pronounced over time. Even Mandarin Chinese, the second most studied language, receives roughly one-tenth as much research as English. This disparity persists across multiple subdomains of safety research. Furthermore, non-English languages are rarely studied on their own but rather as part of broader multilingual evaluations, which often lack the nuance and depth necessary to address language-specific safety challenges and cultural contexts. Lastly, we find that only about half of English safety research publications explicitly document the language they study.
These findings highlight critical gaps in the current landscape of LLM safety research and motivate
the need for more targeted efforts to address multilingual safety concerns. To help close this gap,
we outline three tractable directions for future multilingual safety work: (1) developing culturally
grounded evaluation benchmarks, (2) curating diverse multilingual safety training data, and (3)
deepening our understanding of alignment challenges across languages.
2 The Language Gap in LLM Safety Research
To understand the language gap in LLM safety research, we systematically survey relevant papers
and analyze how safety research is distributed across languages and subtopics, as well as how non-
English language research is conducted and reported.
2.1 Methodology
We collect work related to LLM safety and manually annotate the languages studied in each paper,
along with their safety subtopic. To reduce human annotation effort while ensuring that our findings reflect the overall trends in the field, we adopt the following strategies:
1. Venue selection: We focus on all ∗ACL venues, such as ACL and EMNLP, including both conferences and workshops, as we believe they publish the most linguistically diverse NLP work compared to other venues such as ICLR, NeurIPS, and ICML.
2. Keyword filter: We filter safety-related publications by matching the words "safe" and "safety" in paper abstracts. These two terms give a good proxy for the distribution of diverse LLM safety literature.
3. Manual categorization: We adopt a simplified taxonomy following Cui et al. [2024], which is representative of the type of safety work published at ∗ACL, and we manually categorize publications into seven subtopics, as shown in Table 1.
4. Language documentation: We annotate the languages that each work addresses,¹ and we indicate whether the language(s) studied are mentioned in the work. We group works into three categories: monolingual English, monolingual non-English, and multilingual (covering two or more languages).
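The keyword filter in step 2 is simple enough to sketch. The survey does not say whether "safe" and "safety" were matched as whole words or as substrings, so the sketch below assumes case-insensitive whole-word matching. A substring match would also pick up words like "unsafe" or "safeguard". The paper dictionaries and their `abstract` field are hypothetical.

```python
import re

# Assumption: case-insensitive, whole-word matching of "safe" or "safety".
# The survey does not specify whole-word vs. substring matching; a substring
# match would additionally catch words such as "unsafe" or "safeguard".
SAFETY_PATTERN = re.compile(r"\b(?:safe|safety)\b", re.IGNORECASE)


def is_safety_candidate(abstract: str) -> bool:
    """True if the abstract contains one of the safety keywords."""
    return SAFETY_PATTERN.search(abstract) is not None


def filter_candidates(papers: list[dict]) -> list[dict]:
    """Keep papers whose 'abstract' field matches (hypothetical dict layout)."""
    return [paper for paper in papers if is_safety_candidate(paper["abstract"])]


if __name__ == "__main__":
    papers = [
        {"id": "p1", "abstract": "We study LLM safety for Thai speakers."},
        {"id": "p2", "abstract": "A benchmark of unsafe prompts."},
        {"id": "p3", "abstract": "Machine translation for Swahili."},
    ]
    print([p["id"] for p in filter_candidates(papers)])  # ['p1']
```

Keyword hits are only candidates. As described below, the survey screens them manually and discards the 28% that turn out to be unrelated to LLM safety.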
Annotation task | Type | Avg | Std
Safety topic | Categorical | 0.83 | 0.19
Has non-English? | Binary | 0.81 | 0.15
Specifies languages? | Binary | 0.80 | 0.04
Covered languages | List | 0.96 | 0.05
Table 2: Average and standard deviation of agreement between four pairs of annotators. Agreement on ‘language coverage’ is measured with Jaccard similarity, and all other categories are measured with Cohen’s κ.
Annotations were performed manually by the authors. In total, we annotated nearly 300 publications from 2020 to 2024. Of these, 28% were false positives from our keyword matching process (i.e., unrelated to LLM safety) and were filtered out before further analysis.²
Table 2 reports the mean and standard deviation of pairwise inter-annotator agreement scores, each computed on a subset of 20 publications annotated by both members of a pair. We run this pairwise study for four annotator pairs on distinct subsets (4 × 20 publications) to maximize the representativeness of our survey corpus and ensure a robust assessment of annotation consistency.
We find that inter-annotator agreement is consistently high, between 0.80 and 0.96 on average
per category, but we note that the annotations may still contain imperfections.
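For readers who want to reproduce scores like those in Table 2, here is a minimal sketch of the two agreement measures named in its caption. It uses Cohen's κ for the categorical and binary fields and Jaccard similarity for the language lists, then aggregates the per-pair scores into a mean and standard deviation. Whether the survey used the sample or the population standard deviation is not stated, so the sample version here is an assumption. The function names are ours.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


def jaccard(langs_a: list[str], langs_b: list[str]) -> float:
    """Jaccard similarity between two annotated language sets."""
    a, b = set(langs_a), set(langs_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def summarize(per_pair_scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across annotator pairs (assumed)."""
    return mean(per_pair_scores), stdev(per_pair_scores)


if __name__ == "__main__":
    a = ["toxicity", "jailbreak", "toxicity", "policy"]
    b = ["toxicity", "jailbreak", "privacy", "policy"]
    print(round(cohens_kappa(a, b), 3))  # 0.667
    print(round(jaccard(["eng", "zho"], ["eng", "zho", "ara"]), 3))  # 0.667
```

Each pair's language-coverage score would be the mean Jaccard similarity over its 20 papers. `summarize` then turns the four per-pair scores of a row into its Avg and Std.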
2.2 Findings
English-centricity of LLM safety research. Figure 1 highlights a stark language imbalance in
LLM safety research published at ∗ACL conferences and workshops over the past five years. The
data reveals a clear English-centric pattern that has persisted throughout this period. English-only
research dominates across all years, with a particularly dramatic increase in recent publications.
¹ If the languages studied were not explicitly mentioned, we followed up on their training and evaluation datasets to identify the language coverage of the work.
² We release our annotations at https://huggingface.co/CohereLabsCommunity/multilingual_safety_survey2025.

[Figure 2 (scatter plot): each language, labeled by its three-letter code, is placed by "Paper's Average Multilinguality" (x-axis, 0 to 30) and "Frequency" (y-axis, 0 to 20, with an axis break up to 185 to 190 for eng). Plotted languages: eng, zho, tha, ara, guj, kan, vie, swe, ita, fra, por, spa, rus, hin, kor, tur, ukr, zul, gla, heb, ben, deu, jpn, ind, mar, nld, swa, tel, afr, ast, cym, ell, fas, hrv, lit, oci, tgl, fin, pol, cat, bul, nep, egy.]
Figure 2: Measure of how often a language is studied ("Frequency") and the average number of languages covered by all papers in which the language appears ("Paper's Average Multilinguality").
The trend shows consistent underrepresentation of multilingual and non-English research, with the gap widening significantly over time. While both categories have grown as LLM safety has gained prominence, the proportional imbalance remains: English-only publications have consistently outnumbered multilingual and non-English work, and the absolute gap has widened from 5 publications in 2020 to 83 in 2024. In other words, the growth is disproportionately concentrated in English-only research.
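The distinction between the absolute gap and the proportional imbalance can be checked directly against the yearly counts in Figure 1. The sketch below uses the counts as read off the figure. The constant and function names are ours.

```python
# Yearly publication counts read off Figure 1.
YEARS = [2020, 2021, 2022, 2023, 2024]
ENGLISH_ONLY = [6, 8, 11, 26, 118]
NON_ENGLISH_OR_MULTILINGUAL = [1, 3, 2, 8, 35]


def gap_and_share(english: list[int], other: list[int]) -> list[tuple[int, float]]:
    """Per year: absolute gap (English minus other) and English-only share of all papers."""
    return [(e - o, e / (e + o)) for e, o in zip(english, other)]


if __name__ == "__main__":
    rows = gap_and_share(ENGLISH_ONLY, NON_ENGLISH_OR_MULTILINGUAL)
    for year, (gap, share) in zip(YEARS, rows):
        print(f"{year}: gap = {gap:3d}, English-only share = {share:.1%}")
```

The absolute gap grows from 5 to 83. The English-only share, however, stays roughly between 73% and 86% every year. This is what "the proportional imbalance remains" amounts to.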
Non-English languages are studied in herds. Another aspect of the marginalization of non-English languages is that they are often addressed as part of large multilingual evaluations rather than studied in depth on their own. In many cases, breadth is prioritized over depth, and multilingual studies are preferred over focused monolingual analyses.³ This is shown in Figure 2, which breaks down how frequently a language is studied (y-axis) and the average number of languages covered by the papers that study it (x-axis). English (eng) exhibits overwhelming dominance
with a frequency nearly ten times higher than Chinese (zho)––the second most studied language.
However, English is primarily studied in isolation, resulting in a low average multilinguality score.
In contrast, languages with moderate representation like Chinese (zho), Arabic (ara) and Span-
ish (spa) appear primarily in multilingual studies, suggesting that deeper, language-specific safety
analyses remain limited even for widely spoken languages. This trend is even more noticeable for
under-resourced languages such as Swahili (swa) and Telugu (tel), and especially for languages
at the extreme end of the multilingualism spectrum, such as Afrikaans (afr), which appears in only a single paper that covers approximately 30 languages [Guerreiro et al., 2023]. Such inclusion severely limits the possibility of language-specific safety analysis and of gaining meaningful insights.
We commend focused analyses of individual lower-resource languages, such as those by Nakov et al. [2021] and Niraula et al. [2021], who specifically study disinformation and offensive language detection in Bulgarian and Nepali social media, respectively.

³ Since our study only captures published papers, we might be missing rejected work. There may be a reviewer preference for multilingual over monolingual non-English papers.

[Figure 3 (a) Topic distribution: publication counts (y-axis, 0 to 60) per safety subtopic (LLM alignment, jailbreaking attacks, toxicity and bias, hallucination and factuality, privacy, policy) for monolingual (English), monolingual (non-English), and multilingual work. (b) Venue distribution: share of publications in conferences vs. workshops. Monolingual (English): 83.0% vs. 17.0%; monolingual (non-English): 75.0% vs. 25.0%; multilingual: 80.8% vs. 19.2%.]
Figure 3: Distribution of LLM safety publications by (a) safety subtopics and (b) publication venues.
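The two quantities plotted in Figure 2 can be computed from per-paper language annotations, following their definitions in the caption. Frequency is the number of papers that study a language. Paper's average multilinguality is the mean number of languages covered by those papers. The sketch below assumes a hypothetical input format of one list of language codes per paper, and the toy data is made up.

```python
from collections import defaultdict


def language_stats(papers: list[list[str]]) -> dict[str, tuple[int, float]]:
    """Map each language to (frequency, paper's average multilinguality).

    papers: one list of language codes per paper (hypothetical input format).
    frequency: number of papers that study the language.
    average multilinguality: mean number of languages covered by those papers.
    """
    sizes: dict[str, list[int]] = defaultdict(list)
    for langs in papers:
        covered = set(langs)
        for lang in covered:
            sizes[lang].append(len(covered))
    return {lang: (len(s), sum(s) / len(s)) for lang, s in sizes.items()}


if __name__ == "__main__":
    toy = [["eng"], ["eng"], ["eng", "zho"], ["eng", "zho", "ara", "spa"], ["bul"]]
    for lang, (freq, avg) in sorted(language_stats(toy).items()):
        print(lang, freq, avg)
```

Even in this toy example the pattern from Figure 2 appears. English has the highest frequency but a low average because it is often studied alone. Arabic and Spanish appear once, and only inside a four-language paper.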
Disparities in subtopics of safety. Breaking down LLM safety publications by specific safety
subtopics in Figure 3(a), we find that English-centricity persists across all domains, with English-
only publications substantially outnumbering multilingual work in every category. LLM alignment
and jailbreaking attacks demonstrate the most pronounced disparities, suggesting that these critical
safety areas receive particularly limited cross-linguistic attention. In particular, LLM alignment
work involving evaluation [Yuan et al., 2024; Hua et al., 2024; Hammoud et al., 2024; Gabriel
et al., 2024] and algorithmic improvement [Zhou et al., 2024; Hassan et al., 2024] would benefit
from further research with expanded language coverage. Toxicity and bias research shows a similar
pattern despite being a domain where cultural and linguistic variations are especially relevant [Costa-
jussà et al., 2023b; Tao et al., 2024; Devinney et al., 2024; Bhutani et al., 2024]. The near absence of
multilingual work in privacy and policy domains indicates these emerging safety concerns are being
conceptualized almost exclusively through an English-language framework, potentially overlooking
important cultural and legal variations that exist across different linguistic contexts [Larsen &
Dignum, 2024].
Valuable role of workshops. Figure 3 (b) reveals an interesting pattern in the distribution of LLM
safety publications across venue types. While conferences dominate across all language categories,
monolingual non-English safety papers are 46% relatively more likely to appear in workshops than
English-only papers, highlighting the valuable accessibility that workshops offer for this line of
work. This suggests that non-English safety research faces a higher barrier to entry at prestigious
conferences, whereas workshops, such as Workshop on Gender Bias in Natural Language Processing
(GeBNLP) and Workshop on Safety for Conversational AI (Safety4ConvAI), serve as more accessible
venues for disseminating non-English safety research. The pattern indicates that, beyond the overall
English-centricity of safety research documented in previous figures, additional structural factors
may be affecting how non-English safety work is evaluated and disseminated within the community.
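"46% relatively more likely" is a ratio of workshop shares, not a difference in percentage points. A minimal sketch using the rounded shares shown in Figure 3(b) (the names are ours) makes this concrete:

```python
# Workshop shares (%) read off Figure 3(b); the plotted values are rounded to one decimal.
WORKSHOP_SHARE = {
    "Monolingual (English)": 17.0,
    "Monolingual (Non-English)": 25.0,
    "Multilingual": 19.2,
}


def relative_increase(share: float, baseline: float) -> float:
    """Relative (not percentage-point) increase of share over baseline."""
    return share / baseline - 1


if __name__ == "__main__":
    rel = relative_increase(WORKSHOP_SHARE["Monolingual (Non-English)"],
                            WORKSHOP_SHARE["Monolingual (English)"])
    print(f"{rel:.0%}")  # 47% with the rounded inputs
```

With the rounded inputs, the ratio 25.0 / 17.0 gives about 47%, while the percentage-point difference is only 8.0. The 46% reported in the text presumably comes from the unrounded shares.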
Language documentation practice differs for English-only research. We argue that it is
important for LLM safety research to explicitly document the languages studied (also known as
Bender’s rule [Bender, 2011; 2019]) for two key reasons. (1) Safety alignment does not necessarily
generalize across languages [Yong et al., 2023a; Wang et al., 2024b; Yoo et al., 2024; Al Ghanim
et al., 2024]. Clearly stating which languages were included enables future researchers to under-
stand the specific linguistic contexts in which safety findings have been validated. (2) By explicitly
acknowledging language limitations, the field can more accurately measure progress in expanding
6

-- 6 of 26 --

Chunk 11 · 1,996 chars

tely measure progress in expanding
6

-- 6 of 26 --

safety coverage across languages, thus encouraging a more equitable distribution of safety research
to serve a broader range of global populations.
Does the paper mention the languages studied?
Category | No (↓) | Yes (↑)
Mono. English | 50.6% | 49.4%
Mono. Non-English | 0.0% | 100.0%
Multilingual | 0.0% | 100.0%
Table 3: Proportion of language documentation practice among LLM safety publications.
Based on the data presented in Table 3, we observe substantially different patterns in language documentation practices across LLM safety publications. English-only publications show a concerning trend, with 50.6% failing to explicitly name the language studied; in other words, "English" is never mentioned anywhere in the paper. In contrast, both non-English monolingual and multilingual publications demonstrate full compliance, with 100% explicitly documenting the languages studied. This disparity
highlights a systematic bias in reporting practices, where English-centered research often proceeds
under an implicit assumption of universality, whereas non-English research demonstrates greater
methodological transparency.
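The survey annotated documentation practice manually. As a rough first-pass aid for auditing Bender's rule at scale, one could flag papers whose text names none of a list of languages. The sketch below is a hypothetical heuristic, not the authors' procedure. A hit only shows that a language name occurs somewhere (for example in related work), so flagged and unflagged papers alike would still need a manual check.

```python
import re


def named_languages(text: str, language_names: list[str]) -> set[str]:
    """Return the language names that appear as whole words in a paper's text."""
    return {
        name for name in language_names
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
    }


if __name__ == "__main__":
    # Hypothetical, deliberately short list of language names.
    names = ["English", "Chinese", "Bengali", "Swahili"]
    print(named_languages("We measure refusal rates on a jailbreak benchmark.", names))  # set()
    print(sorted(named_languages("We measure refusal rates in English and Bengali.", names)))
```

An empty result corresponds to the "No" column of Table 3: the paper never names the language it studies.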
2.3 Moving Forward for ∗ACL Venues
Our survey reveals that English safety research remains overwhelmingly dominant in nearly every
dimension—publication volume, topical coverage, methodological reporting, and conference visibil-
ity. Nonetheless, Figure 1 shows an encouraging trend of growing multilingual safety research over
time. One concrete and low-effort step toward improving documentation is integrating language
coverage reporting into ∗ACL proceedings. OpenReview submissions already include a metadata
field where authors can indicate the languages studied, but this information is currently private.
Making this metadata public would allow for more transparent tracking of linguistic representation
and support future meta-analyses of multilingual research, particularly in the context of LLM safety.
Addressing the deeper structural imbalance in language and topic representation will require long-
term efforts. We believe that conference and workshop organizers can provide incentive structures to
address this systemic imbalance, such as special conference theme tracks dedicated to multilingual
safety subtopics and/or creating shared workshop tasks on multilingual safety benchmarks. These
initiatives could meaningfully expand the scope and visibility of research beyond English, helping
the community better serve diverse user populations.
3 Future Research Directions for Multilingual LLM Safety
In addition to providing recommendations to ∗ACL organizers, we propose several key research
priorities for researchers and model developers to advance multilingual LLM safety alignment.
3.1 Safety Evaluation for Multilingual Models
Moving beyond average safety criterion. Traditional evaluation metrics focus on average performance across languages: the model that maximizes the uniformly weighted average across tasks and languages is considered best. However, this criterion is susceptible to outliers (e.g., due to unsupported languages) and is not suitable for comparing models with different language and task support [Kreutzer et al., 2025]. In the context of multilingual safety, where reporting average scores is the norm [Guan et al., 2024], this matters even more, since averaging can obscure critical safety failures.

Models | en | zh | fr | ru | de | ar | hi | es | ja | bn | Average ↑ | Worst Case∗ ↑
ChatGPT [OpenAI, 2022] | 99.0 | 91.9 | 86.3 | 87.5 | 85.3 | 90.8 | 81.7 | 91.5 | 79.0 | 62.6 | 85.56 | 62.6
PaLM-2 [Anil et al., 2023] | 89.7 | 78.4 | 84.6 | 85.9 | 83.6 | 82.6 | 83.0 | 85.7 | 70.1 | 78.1 | 82.17 | 70.1
Llama-2 [Touvron et al., 2023] | 85.4 | 73.5 | 83.2 | 82.3 | 82.0 | - | 63.5 | 79.3 | 71.0 | - | 77.53 | 63.5
Vicuna [Chiang et al., 2023] | 94.0 | 89.4 | 90.6 | 83.3 | 88.3 | 43.4 | 36.8 | 88.8 | 60.2 | 18.4 | 69.32 | 18.4 (!)
Table 4: Harmlessness scores of different models across 10 languages, based on the results from [Wang et al., 2024a]. We augment the original table with a new "Worst Case" column for the lowest harmlessness score. In the original typeset table, bold text marks cases where the average score is not necessarily aligned with the worst-case score, and red text with an exclamation mark (shown here as "(!)") indicates how not reporting the worst-case score can create a false sense of safety.
To illustrate this blind spot, we add a worst-case harmlessness score to the results of Wang et al. [2024a], an ACL 2024 paper, and report them in Table 4. The table reveals two findings. First, if the winner were chosen solely on the highest average harmlessness score, it would be ChatGPT [OpenAI, 2022], with a score of 85.56. However, its worst-case score (i.e., its lowest harmlessness score across languages) is only 62.6, notably lower than the worst-case score of PaLM-2 [Anil et al., 2023] (70.1), even though PaLM-2 has a lower average score (82.17). This discrepancy shows that strong average performance does not necessarily reflect robustness in worst-case scenarios. Second, and more importantly, although Vicuna's [Chiang et al., 2023] average harmlessness score is 69.32, its worst-case score is just 18.4 due to unsafe behaviour in Bengali (bn). This suggests that relying on average metrics alone may create a false sense of safety, potentially leading to the deployment of models like Vicuna in languages where they produce
harmful content. In future work, we believe that, in addition to reporting worst-case performance
to ensure that models meet fundamental safety thresholds across all languages, researchers should
explore designing adaptive thresholding mechanisms that establish language-specific safety baselines
according to their unique cultural contexts and user groups.
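To make the contrast in Table 4 concrete, the following minimal Python sketch recomputes the average and worst-case columns from the per-language scores and ranks the models both ways. The `failing_languages` helper is our own illustration of a per-language threshold check. Its thresholds, including the 70.0 default, are hypothetical placeholders. It is not the adaptive mechanism the paper calls for, which would derive baselines from each language's cultural context and user groups.

```python
from statistics import mean
from typing import Optional

# Harmlessness scores from Table 4 (Wang et al., 2024a); None marks a missing entry.
LANGS = ["en", "zh", "fr", "ru", "de", "ar", "hi", "es", "ja", "bn"]
SCORES: dict[str, list[Optional[float]]] = {
    "ChatGPT": [99.0, 91.9, 86.3, 87.5, 85.3, 90.8, 81.7, 91.5, 79.0, 62.6],
    "PaLM-2": [89.7, 78.4, 84.6, 85.9, 83.6, 82.6, 83.0, 85.7, 70.1, 78.1],
    "Llama-2": [85.4, 73.5, 83.2, 82.3, 82.0, None, 63.5, 79.3, 71.0, None],
    "Vicuna": [94.0, 89.4, 90.6, 83.3, 88.3, 43.4, 36.8, 88.8, 60.2, 18.4],
}


def average_and_worst(scores):
    """Average and worst case over the languages that have a score.

    Missing languages are skipped, which is exactly why averages over
    different language sets (e.g., Llama-2's 8 vs. 10) are hard to compare.
    """
    present = [s for s in scores if s is not None]
    return mean(present), min(present)


def failing_languages(scores, thresholds, default=70.0):
    """Languages below a per-language safety threshold.

    Thresholds and the 70.0 default are hypothetical placeholders, not values
    from the paper. A missing score counts as failing: safety is unverified there.
    """
    return [
        lang for lang, s in zip(LANGS, scores)
        if s is None or s < thresholds.get(lang, default)
    ]


if __name__ == "__main__":
    best_by_avg = max(SCORES, key=lambda m: average_and_worst(SCORES[m])[0])
    best_by_worst = max(SCORES, key=lambda m: average_and_worst(SCORES[m])[1])
    print(best_by_avg, best_by_worst)  # ChatGPT PaLM-2
    print(failing_languages(SCORES["Vicuna"], thresholds={}))  # ['ar', 'hi', 'ja', 'bn']
```

Ranking by average picks ChatGPT, while ranking by worst case picks PaLM-2. Under the placeholder threshold, Vicuna fails in four languages, not only Bengali.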
Wider language coverage in evaluation. We observe that current multilingual red-teaming
practice mostly focuses on languages that models are finetuned on during post-pretraining processes,
such as instruction-following and alignment finetuning [Üstün et al., 2024; Grattafiori et al., 2024].
Because language contamination in pretraining can facilitate crosslingual transfer [Blevins & Zettlemoyer, 2022], models may acquire capabilities in languages they were never explicitly finetuned on, which calls into question whether exempting such languages from the safety evaluation of multilingual LLMs is justified. Language exemptions risk creating blind spots
in safety assessments precisely where they might be most needed, as models can bypass safety
guardrails when prompted in languages underrepresented in pretraining [Shen et al., 2024]. For
instance, the Llama-3 model report presents red-teaming results for only eight languages (six of
which are high-resource) [Grattafiori et al., 2024]. Yet the strong multilingual model has been
adapted for languages not covered in its safety evaluation, such as Indonesian [Huang et al., 2024b].
We urge researchers to develop more sophisticated evaluation protocols that can detect and account
for potential contamination and to issue disclaimers when safety alignment has not been conducted
in certain languages. This would help ensure that speakers of those languages are aware of poten-
tial risks. Such transparency would allow communities to make informed decisions about model
deployment while encouraging greater accountability from developers to expand alignment efforts
to underserved languages.
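A coverage disclaimer of the kind proposed above could be generated mechanically from two language lists: the languages a model supports (including those it has later been adapted to) and the languages covered by its safety evaluation. The sketch below is a hypothetical helper; its name and the disclaimer wording are only examples.

```python
def coverage_disclaimers(supported, safety_evaluated):
    """Return a disclaimer for each supported language that lacks safety evaluation.

    Both arguments are collections of language codes (e.g., ISO 639-1 strings).
    """
    missing = sorted(set(supported) - set(safety_evaluated))
    return {
        lang: (f"Safety alignment has not been evaluated for '{lang}'. "
               "Safety behavior in this language is unverified; use with caution.")
        for lang in missing
    }


for lang, note in coverage_disclaimers({"en", "fr", "id"}, {"en", "fr"}).items():
    print(lang, "->", note)
```

Deriving disclaimers from the same language lists used for evaluation keeps the documentation in sync when a model is adapted to new languages.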
Incorporate diverse and natural linguistic patterns. We believe evaluating multilingual safety
requires a fundamental shift away from treating evaluation as merely adding more languages to
existing benchmarks; evaluations should instead incorporate the linguistic patterns of real-life
speakers. One case study is code-switching, the communication pattern of alternating between
languages within a single utterance [Nilep, 2006; Gardner-Chloros, 2009; Winata et al., 2023],
which has been shown to jailbreak multilingual safety guardrails [Yoo et al., 2024; Yang et al.,
2024a; Song et al., 2025]. Another example is the finding by Al Ghanim et al. [2024] that while
LLMs remain safe on standard Arabic script, they are jailbroken when Arabic inputs are written
in Arabizi, a system of writing Arabic with Latin characters that is commonly used by native
speakers communicating digitally [Yaghan, 2008]. These examples show that current safety
evaluation frameworks, which predominantly evaluate languages in a monolingual setting, fail to
capture the complex reality of multilingual communication. Future work on multilingual red-teaming
should develop a methodology that systematically accounts for diverse multilingual multi-turn
interactions among users [Li
et al., 2025] to ensure that models remain safe across the full spectrum of real-world usage patterns
rather than just in artificial monolingual test scenarios.
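To make the point about natural linguistic patterns concrete, the sketch below derives code-switched variants of an existing monolingual test prompt by swapping a fraction of its words for their translations. Word-level dictionary substitution is only a crude stand-in for how bilingual speakers actually code-switch, and the tiny English-Indonesian lexicon is a hypothetical example; realistic variants (and Arabizi rewrites, which are not shown) would need native-speaker input.

```python
import random


def code_switch(sentence, lexicon, ratio, seed=0):
    """Replace about `ratio` of the lexicon-covered words with their translations.

    Tokenization is naive whitespace splitting, so punctuation attached to a word
    prevents its replacement. `lexicon` maps lowercase source words to target words.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same variants can be regenerated
    tokens = sentence.split()
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in lexicon]
    for i in rng.sample(candidates, round(ratio * len(candidates))):
        tokens[i] = lexicon[tokens[i].lower()]
    return " ".join(tokens)


# Hypothetical English-to-Indonesian word list, for illustration only.
lexicon = {"cat": "kucing", "eats": "makan", "rice": "nasi"}
for r in (0.0, 0.5, 1.0):
    print(r, code_switch("the cat eats rice", lexicon, ratio=r))
```

Sweeping `ratio` yields a graded family of prompts, from fully monolingual to maximally switched, which lets an evaluation check whether refusal behavior degrades as mixing increases.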
3.2 Culturally-Contextualized Synthetic Training Data
Collecting labeled training data for LLM safety alignment can be resource-intensive, and much
English-centric research has turned to synthetic data generation [Bai et al., 2022; Kruschwitz
& Schmidhuber, 2024; Samvelyan et al., 2024]. However, multilingual synthetic safety data
remains relatively underexplored. Here, we propose two viable future research directions based
on the constitutional AI framework [Bai et al., 2022; Kundu et al., 2023] for cultural
contextualization [Qiu et al., 2025; Guo et al., 2025; Qiu et al., 2024; Yin et al., 2024].
LLM Generation. Under the constitutional AI framework, LLMs are first prompted to generate
harmful (or harmless) texts. They are then presented with a set of human-written principles that
capture culture-specific harms, so that they can engage in a multi-turn process of critiquing and
revising the original harmful generations into harmless ones (or vice versa), creating culture-
specific preference pairs for alignment training. Enabling constitutional AI for multilingual and
multicultural alignment data generation requires close collaboration among linguists, cultural
anthropologists, and AI researchers to co-create three key components: (1) culturally informed
constitutional principles that reflect diverse value systems and ethical frameworks across different societies
[Kirk et al., 2024; Pistilli et al., 2025]; (2) sufficiently capable multilingual LLMs that can both un-
derstand these principles and generate high-quality content in target languages [Qin et al., 2024;
Huang et al., 2024a]; and (3) evaluation protocols involving native speakers and cultural experts
to validate both the constitutional principles and the resulting synthetic data [Kyrychenko et al.,
2025]. This direction offers a pathway toward scalable, culturally grounded alignment practices that
make LLM safety more inclusive and globally relevant.
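The critique-and-revision process described above can be sketched as a loop over culture-specific principles. This is a simplified illustration, not the procedure of Bai et al. [2022]: the model is any callable from prompt to text, the prompt templates and principles are hypothetical, and only the harmful-to-harmless direction is shown. Each call yields one preference pair whose preferred response is the final revision.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Model = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # final revision, intended to satisfy the principles
    rejected: str  # original generation


def critique_and_revise(model: Model, prompt: str, principles: Sequence[str],
                        rounds: int = 1) -> PreferencePair:
    """Generate once, then critique and revise once per principle per round."""
    original = model(prompt)
    response = original
    for _ in range(rounds):
        for principle in principles:
            context = f"Principle: {principle}\nPrompt: {prompt}\nResponse: {response}\n"
            critique = model(context + "Critique: explain how the response conflicts "
                             "with the principle.")
            response = model(context + f"Critique: {critique}\n"
                             "Revision: rewrite the response so it follows the principle.")
    return PreferencePair(prompt=prompt, chosen=response, rejected=original)
```

Writing the principles in the target language, and having native speakers and cultural experts vet both the principles and the resulting pairs, corresponds to components (1) and (3) above; the loop itself only supplies the mechanics.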
Machine Translation. Machine translation (MT) often fails to capture or preserve culture-specific
harms and may introduce undesirable societal biases such as gender stereotyping [Savoldi et al.,
2021; Ahn et al., 2022; Wang et al., 2022; Costa-jussà et al., 2023a;c]. When existing safety data
are machine-translated into target languages, the iterative refinement process from the
constitutional AI framework can detect and mitigate translation artifacts that
might inadvertently encode harmful content or lose important cultural nuances. Unlike direct LLM
generation, this approach can build on decades of translation research, especially cross-cultural
adaptation studies [Maxwell et al., 1996; de Lima Barroso et al., 2018; Gorecki et al., 2014;
Mbada et al., 2015; Pilz et al., 2014]. Future work should focus on developing automated
methods to identify culture-specific safety issues that might be lost in translation, especially for
languages with limited digital presence and linguistic resources.
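A translate-then-refine pipeline of this kind might look like the sketch below: each machine-translated example is repeatedly critiqued and revised, and anything still flagged after a fixed number of rounds is routed to native speakers instead of being silently accepted. The `translate`, `critic`, and `reviser` callables are hypothetical hooks (for example, an MT system and an LLM prompted with culture-specific principles), and `max_rounds=3` is an arbitrary default.

```python
from typing import Callable, Optional, Tuple

Translate = Callable[[str, str], str]          # (text, target_lang) -> translation
Critic = Callable[[str, str], Optional[str]]   # (source, translation) -> issue, or None
Reviser = Callable[[str, str, str], str]       # (source, translation, issue) -> revision


def translate_and_refine(text: str, target_lang: str, translate: Translate,
                         critic: Critic, reviser: Reviser,
                         max_rounds: int = 3) -> Tuple[str, bool]:
    """Return (translation, needs_human_review)."""
    translation = translate(text, target_lang)
    for _ in range(max_rounds):
        issue = critic(text, translation)
        if issue is None:
            return translation, False  # no remaining artifact detected
        translation = reviser(text, translation, issue)
    # Re-check the last revision; unresolved issues go to native-speaker review.
    return translation, critic(text, translation) is not None
```

Returning an explicit review flag, rather than discarding unresolved items, keeps hard cases visible, which matters most for languages where automated critics are least reliable.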
3.3 Towards Understanding Crosslingual Safety Generalization
Most existing safety alignment data are centered on English or Chinese [Röttger et al., 2025; Costa-
jussà et al., 2024; Plaza-del Arco et al., 2024]. It is important to understand how safety alignment
generalizes across languages so that model developers can anticipate potential failure modes when
alignment training data lack language coverage.
Mechanistic interpretability. This approach, which reverse-engineers neural networks to understand
precisely how they process information at the circuit and component levels, allows researchers to
characterize the mechanisms that enable or prevent the transfer of safety alignment knowledge.
We believe this research direction is particularly helpful in explaining several phenomena, such as
why detoxification and debiasing transfer effectively across languages [Li et al., 2024; Reusens
et al., 2023] but refusal training does not [Shen et al., 2024; Aakanksha et al., 2024; Wang et al., 2025],
or to what extent safety alignment is preserved after language adaptation to underrepresented
languages [Yong et al., 2023b; Lin et al., 2024; Ji et al., 2024]. Insights from this research direction
can inspire novel training techniques that facilitate zero-shot crosslingual generalization of alignment
training and maintain safety consistency as language coverage expands.
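As one concrete, hypothetical starting point, the sketch below uses a difference-of-means probe, a technique from the broader interpretability literature rather than one proposed in this paper. It estimates a refusal-related direction for each language as the normalized difference between mean hidden activations on harmful and harmless prompts, then compares each language's direction with a reference language via cosine similarity. It assumes activations from one layer have already been extracted into NumPy arrays; a low similarity would only be a prompt for further investigation, not evidence of a mechanism by itself.

```python
import numpy as np


def mean_difference_direction(harmful_acts: np.ndarray,
                              harmless_acts: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean harmless activation to the mean harmful one.

    Both inputs have shape (num_prompts, hidden_dim) and come from the same layer.
    """
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("mean activations are identical; direction is undefined")
    return direction / norm


def similarity_to_reference(directions: dict, reference: str = "en") -> dict:
    """Cosine similarity of each language's unit direction with the reference's."""
    ref = directions[reference]
    # Directions are unit-normalized, so the dot product equals cosine similarity.
    return {lang: float(ref @ d) for lang, d in directions.items()}
```

Comparing directions before and after language adaptation of the same model is one way such a probe could speak to whether safety-relevant structure survives adaptation.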
Training Data Influence Analysis. We also recommend exploring the use of influence functions
[Grosse et al., 2023; Ruis et al., 2025] to study crosslingual alignment. This technique enables re-
searchers to trace how specific training examples causally affect model behavior during generation.
Training data influence analysis offers a valuable complement to mechanistic approaches for inves-
tigating two key open questions. For crosslingual generalization, it can help quantify how safety-
relevant examples—especially those from high-resource versus low-resource languages—contribute
to harmful or aligned outputs. For language adaptation, influence functions can identify prob-
lematic documents within the continued pretraining corpus, enabling more targeted curation of
safer language-specific data. To our knowledge, there is currently very limited work on analyzing
training-example-to-output relationships for multilingual safety-relevant behaviors. This presents a
promising and underexplored direction for improving alignment practices across languages.
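The basic influence-function calculation can be illustrated on a model small enough to form the Hessian exactly. The sketch below uses logistic regression with a small damping term and the standard first-order estimate of how upweighting each training example changes one test example's loss, then sums the scores by the language tag of each training example. It is a toy under stated assumptions: LLM-scale work such as Grosse et al. [2023] relies on Hessian approximations instead, and the damping value and language tags are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def influence_scores(X, y, w, x_test, y_test, damping=1e-3):
    """Estimated change in test log loss from upweighting each training example.

    Logistic regression with parameters `w` (assumed near the training optimum).
    score_i = -grad_test^T H^{-1} grad_i: negative scores mean upweighting
    example i lowers the test loss; positive scores mean it raises it.
    """
    p = _sigmoid(X @ w)                                 # predicted probabilities, (n,)
    grads = (p - y)[:, None] * X                        # per-example loss gradients, (n, d)
    hessian = (X.T * (p * (1.0 - p))) @ X / len(y)      # Hessian of the mean loss, (d, d)
    hessian += damping * np.eye(X.shape[1])             # damping keeps H positive definite
    g_test = (_sigmoid(x_test @ w) - y_test) * x_test   # test-loss gradient, (d,)
    return -grads @ np.linalg.solve(hessian, g_test)    # (n,)


def influence_by_language(scores, languages):
    """Sum influence scores per training-example language tag."""
    totals = {}
    for score, lang in zip(scores, languages):
        totals[lang] = totals.get(lang, 0.0) + float(score)
    return totals
```

Aggregating by language is what turns per-example scores into the crosslingual question posed above: whether aligned or harmful behavior in one language is driven mostly by examples from other languages.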
4 Related Work and Discussion
Our work contrasts with prior survey literature on multilingual NLP [Joshi et al., 2020; Pamungkas
et al., 2023; Yadav & Sitaram, 2022; Winata et al., 2023; Huang et al., 2024a; Qin et al., 2024;
Wu et al., 2025] by focusing on LLM safety. The limitations we identify align with concerns raised by
Blasi et al. [2022] regarding systematic inequalities in language technology, which privileges certain
sociolinguistic groups through choices in data collection, annotation protocols, and evaluation. Our
findings suggest these inequalities may be even more pronounced in safety research, where cultural
and linguistic nuances significantly shape both what counts as harm and which mitigation strategies work.
Recent efforts to catalog LLM safety research challenges [Barez et al., 2025; Debar et al., 2024;
Anwar et al., 2024] have primarily centered on threats identified in English-language models,
often overlooking multilingual aspects. This gap, along with our survey findings, echoes the
“square-one bias” phenomenon [Ruder et al., 2022]: when NLP researchers move beyond optimizing for
usefulness (e.g., accuracy), their studies often explore only a single additional dimension, whether safety,
interpretability, or multilinguality. This siloed approach means that progress in one dimension
rarely informs the others, resulting in a fragmented research landscape where multilingual LLM
safety research remains underdeveloped.
5 Conclusion
Our analysis of nearly 300 publications (2020-2024) reveals a significant language gap in LLM safety
research, with even high-resource non-English languages receiving minimal attention and typically
appearing only in multi-language studies that lack the depth of English-focused work. This linguistic
imbalance potentially leaves language-specific risks undetected as LLMs are deployed globally. To address
these disparities, we make recommendations to future conferences and highlight several critical
future research directions.
Limitations
Coverage of venues. Due to our focus on ∗ACL venues, we might have missed relevant multilingual
safety work that is either not (yet) peer reviewed or published in other venues, such as ML
conferences and workshops. Since this is a very fast-moving field, the state of the field described
in this paper represents a snapshot in time. We hope that if we ran a similar analysis in a year’s
time, the data would paint a more optimistic picture.
Annotation accuracy. Inaccuracies in our annotations might have introduced imprecision in our
measurements of the language gap. From our analysis of inter-annotator agreement, we suspect
that this would primarily affect the categorization of safety research topics, as the labels for these
categories carry the most ambiguity. When papers do not state language coverage very prominently,
such as in the abstract, introduction, or experimental setup, it might lead to oversights in the
annotation (reducing annotation recall), depending on how deeply an annotator reads the paper.
However, we observe that works investigating multilinguality in LLM safety as a primary angle
do state it explicitly, so we are confident we did not miss these.
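The paper does not state which agreement statistic underlies this analysis. Assuming two annotators label the same set of papers, Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), is one common chance-corrected choice; the sketch below computes it from two label lists, and the topic labels used with it are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


a = ["toxicity", "toxicity", "jailbreak", "jailbreak"]
b = ["toxicity", "jailbreak", "jailbreak", "jailbreak"]
print(cohens_kappa(a, b))
```

Computing kappa separately per label family (e.g., language coverage versus safety topic) would show where disagreement concentrates, in line with the expectation above that topic labels are the most ambiguous.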
Research directions. We highlight three prominent future directions for multilingual safety
research in our work, but we believe there are many other directions that are equally important for
advancing safety and security of LLMs in global deployment. These include work on AI governance
[Reuel et al., 2024], AI auditing [Birhane et al., 2024; Ojewale et al., 2025], hate speech detection
[Nozza, 2021], multimodal AI safety [Dash et al., 2025; Ji et al., 2025], algorithmic designs [Zhao
et al., 2025], reasoning models [Yong et al., 2025; Guan et al., 2024], etc. Fundamentally, our work
illuminates the substantial language disparity within current LLM safety research. Therefore, as re-
searchers pursue diverse research directions on LLM safety, efforts on bridging this linguistic divide
must remain central to ensuring equitable safeguards across the world’s languages.
Acknowledgements and Disclosure
We are grateful for feedback from Hellina Hailu Nigatu, M Saiful Bari, Cristina Menghini, Alham
Fikri Aji, Pedro Ortiz Suarez, Victor Ojewale, Simran Khanuja, Arianna Muti, Debora Nozza,
Srishti Yadav, Jonas Kgomo, and Catherine Arnett on the early draft of our work. Disclosure:
Stephen Bach is an advisor to Snorkel AI, a company that provides software and services for data-
centric artificial intelligence.
References
Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh
Fadaee, and Sara Hooker. The multilingual alignment prism: Aligning global and local preferences
to reduce harm. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Proceedings of
the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 12027–12049,
Miami, Florida, USA, November 2024. Association for Computational Linguistics.
doi: 10.18653/v1/2024.emnlp-main.671. URL https://aclanthology.org/2024.emnlp-main.671/.
Jaimeen Ahn, Hwaran Lee, Jinhwa Kim, and Alice Oh. Why knowledge distillation amplifies gender
bias and how to mitigate from the perspective of DistilBERT. In Christian Hardmeier, Christine
Basta, Marta R. Costa-jussà, Gabriel Stanovsky, and Hila Gonen (eds.), Proceedings of the 4th
Workshop on Gender Bias in Natural Language Processing (GeBNLP), pp. 266–272, Seattle,
Washington, July 2022. Association for Computational Linguistics.
doi: 10.18653/v1/2022.gebnlp-1.27. URL https://aclanthology.org/2022.gebnlp-1.27/.
Mansour Al Ghanim, Saleh Almohaimeed, Mengxin Zheng, Yan Solihin, and Qian Lou. Jailbreaking
LLMs with Arabic transliteration and Arabizi. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung
Chen (eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language
Processing, pp. 18584–18600, Miami, Florida, USA, November 2024. Association for Computational
Linguistics. doi: 10.18653/v1/2024.emnlp-main.1034. URL https://aclanthology.org/2024.emnlp-main.1034/.
Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos,
Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. PaLM 2 technical report.
arXiv preprint arXiv:2305.10403, 2023.
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase,
Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, et al. Foundational chal-
lenges in assuring alignment and safety of large language models. arXiv preprint arXiv:2404.09932,
2024.
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones,
Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI:
Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022.
Patricia Baquedano-López and Shlomy Kattan. Growing up in a multilingual community: Insights
from language socialization. Handbook of multilingualism and multilingual communication, 5:
69–99, 2007.
Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan
O’Gara, Robert Kirk, Ben Bucknall, Tim Fist, et al. Open problems in machine unlearning for
AI safety. arXiv preprint arXiv:2501.04952, 2025.
Dipto Barman, Ziyi Guo, and Owen Conlan. The dark side of language models: Exploring the
potential of LLMs in multimedia disinformation generation and dissemination. Machine Learning
with Applications, pp. 100545, 2024.
Emily Bender. The #benderrule: On naming the languages we study and why it matters. The
Gradient, 2019.
Emily M Bender. On achieving and evaluating language-independence in NLP. Linguistic Issues in
Language Technology, 6, 2011.
Yoshua Bengio, Sören Mindermann, and Daniel Privitera. International AI safety report 2025. 2025.
Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, and Sunipa Dev. SeeGULL
multilingual: a dataset of geo-culturally situated stereotypes. In Lun-Wei Ku, Andre Martins,
and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for
Computational Linguistics (Volume 2: Short Papers), pp. 842–854, Bangkok, Thailand, August
2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-short.75. URL
https://aclanthology.org/2024.acl-short.75/.
Abeba Birhane, Ryan Steed, Victor Ojewale, Briana Vecchione, and Inioluwa Deborah Raji. AI
auditing: The broken bus on the road to AI accountability. In 2024 IEEE Conference on Secure
and Trustworthy Machine Learning (SaTML), pp. 612–643. IEEE, 2024.
Damian Blasi, Antonios Anastasopoulos, and Graham Neubig. Systematic inequalities in language
technology performance across the world’s languages. In Smaranda Muresan, Preslav Nakov,
and Aline Villavicencio (eds.), Proceedings of the 60th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), pp. 5486–5505, Dublin, Ireland, May 2022.
Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.376.
URL https://aclanthology.org/2022.acl-long.376/.
Terra Blevins and Luke Zettlemoyer. Language contamination helps explains the cross-lingual
capabilities of English pretrained models. In Yoav Goldberg, Zornitsa Kozareva, and Yue
Zhang (eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language
Processing, pp. 3563–3574, Abu Dhabi, United Arab Emirates, December 2022. Association
for Computational Linguistics. doi: 10.18653/v1/2022.emnlp-main.233.
URL https://aclanthology.org/2022.emnlp-main.233/.
Lorna Carson and Ning Jiang. Offensive words in Chinese dialects. In An Anatomy of Chinese
Offensive Words: A Lexical and Semantic Analysis, pp. 99–143. Springer, 2021.
Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, and Marzyeh Ghassemi. Speak easy: Eliciting harmful
jailbreaks from LLMs with simple interactions. arXiv preprint arXiv:2502.04322, 2025.
Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng,
Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. Vicuna:
An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, March 2023.
URL https://lmsys.org/blog/2023-03-30-vicuna/.
Team Cohere, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Yazeed Alnumay, Sophia Al-
thammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, et al.
Command A: An enterprise-ready large language model. arXiv preprint arXiv:2504.00698, 2025.
Marta Costa-jussà, Pierre Andrews, Eric Smith, Prangthip Hansanti, Christophe Ropers, Elahe
Kalbassi, Cynthia Gao, Daniel Licht, and Carleigh Wood. Multilingual holistic bias: Extending
descriptors and patterns to unveil demographic biases in languages at scale. In Houda Bouamor,
Juan Pino, and Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods
in Natural Language Processing, pp. 14141–14156, Singapore, December 2023a. Association for
Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.874.
URL https://aclanthology.org/2023.emnlp-main.874/.
Marta Costa-jussà, Pierre Andrews, Eric Smith, Prangthip Hansanti, Christophe Ropers, Elahe
Kalbassi, Cynthia Gao, Daniel Licht, and Carleigh Wood. Multilingual holistic bias: Extending
descriptors and patterns to unveil demographic biases in languages at scale. In Houda Bouamor,
Juan Pino, and Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods
in Natural Language Processing, pp. 14141–14156, Singapore, December 2023b. Association for
Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.874.
URL https://aclanthology.org/2023.emnlp-main.874/.
Marta Costa-jussà, Eric Smith, Christophe Ropers, Daniel Licht, Jean Maillard, Javier Ferrando,
and Carlos Escolano. Toxicity in multilingual machine translation at scale. In Houda Bouamor,
Juan Pino, and Kalika Bali (eds.), Findings of the Association for Computational Linguistics:
EMNLP 2023, pp. 9570–9586, Singapore, December 2023c. Association for Computational
Linguistics. doi: 10.18653/v1/2023.findings-emnlp.642.
URL https://aclanthology.org/2023.findings-emnlp.642/.
Marta Costa-jussà, Pierre Andrews, Christine Basta, Juan Ciro, Agnieszka Falenska, Seraphina
Goldfarb-Tarrant, Rafael Mosquera, Debora Nozza, and Eduardo Sánchez. Overview of the
shared task on machine translation gender bias evaluation with multilingual holistic bias. In
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, and Deb-
ora Nozza (eds.), Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing
(GeBNLP), pp. 399–404, Bangkok, Thailand, August 2024. Association for Computational
Linguistics. doi: 10.18653/v1/2024.gebnlp-1.26. URL https://aclanthology.org/2024.gebnlp-1.26.
Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin
Zhang, Ziyi Qiu, Peiyang Li, et al. Risk taxonomy, mitigation, and assessment benchmarks of
large language model systems. arXiv preprint arXiv:2401.05778, 2024.
Saurabh Dash, Yiyang Nan, John Dang, Arash Ahmadian, Shivalika Singh, Madeline Smith, Bharat
Venkitesh, Vlad Shmyhlo, Viraat Aryabumi, Walter Beller-Morales, et al. Aya vision: Advancing
the frontier of multilingual multimodality. arXiv preprint arXiv:2505.08751, 2025.
Bárbara Iansã de Lima Barroso, Cláudia Regina Cabral Galvão, Luiz Bueno da Silva, and Selma
Lancman. A systematic review of translation and cross-cultural adaptation of instruments for the
selection of assistive technologies. Occupational Therapy International, 2018(1):4984170, 2018.
Herve Debar, Sven Dietrich, Pavel Laskov, Emil C Lupu, and Eirini Ntoutsi. Emerging security
challenges of large language models. arXiv preprint arXiv:2412.17614, 2024.
Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, and Lidong Bing. Multilingual jailbreak challenges
in large language models. In The Twelfth International Conference on Learning Representations,
2024. URL https://openreview.net/forum?id=vESNKdEMGp.
Hannah Devinney, Jenny Björklund, and Henrik Björklund. We don’t talk about that: Case studies
on intersectional analysis of social bias in large language models. In Agnieszka Faleńska, Christine
Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, and Debora Nozza (eds.), Proceedings
of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pp. 33–44,
Bangkok, Thailand, August 2024. Association for Computational Linguistics.
doi: 10.18653/v1/2024.gebnlp-1.3. URL https://aclanthology.org/2024.gebnlp-1.3/.
Jean-Marc Dewaele. The emotional force of swearwords and taboo words in the speech of multilin-
guals. Journal of multilingual and multicultural development, 25(2-3):204–222, 2004.
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, and Yu Qiao. Attacks, defenses and evalua-
tions for LLM conversation safety: A survey. In Kevin Duh, Helena Gomez, and Steven Bethard
(eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp.
6734–6747, Mexico City, Mexico, June 2024. Association for Computational Linguistics. doi:
10.18653/v1/2024.naacl-long.375. URL https://aclanthology.org/2024.naacl-long.375/.
Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, and Wei Xu.
Reducing privacy risks in online self-disclosures with language models. In Lun-Wei Ku, Andre
Martins, and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), pp. 13732–13754, Bangkok, Thailand,
August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.741.
URL https://aclanthology.org/2024.acl-long.741/.
Beyza Ermis, Luiza Pozzobon, Sara Hooker, and Patrick Lewis. From one to many: Expand-
ing the scope of toxicity mitigation in language models. In Lun-Wei Ku, Andre Martins, and
Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024,
pp. 15041–15058, Bangkok, Thailand, August 2024. Association for Computational Linguistics.
doi: 10.18653/v1/2024.findings-acl.893. URL https://aclanthology.org/2024.findings-acl.893/.
Daniel L Everett. Language: The cultural tool. Vintage, 2012.
Mahmud Fasya and Dini Gilang Sari. Sociocultural factors that determine language choice in a
multilingual society. In Fifth International Conference on Language, Literature, Culture, and
Education (ICOLLITE 2021), pp. 412–418. Atlantis Press, 2021.
Saadia Gabriel, Isha Puri, Xuhai Xu, Matteo Malgaroli, and Marzyeh Ghassemi. Can AI re-
late: Testing large language model response for mental health support. In Yaser Al-Onaizan,
Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational
Linguistics: EMNLP 2024, pp. 2206–2221, Miami, Florida, USA, November 2024. Association
for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.120.
URL https://aclanthology.org/2024.findings-emnlp.120/.
Penelope Gardner-Chloros. Code-switching. Cambridge university press, 2009.
Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordás, and Gerasimos Spanakis.
Regulation and NLP (RegNLP): Taming large language models. In Houda Bouamor, Juan Pino,
and Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, pp. 8712–8724, Singapore, December 2023. Association for Computational
Linguistics. doi: 10.18653/v1/2023.emnlp-main.539.
URL https://aclanthology.org/2023.emnlp-main.539/.
Claudia Gorecki, Julia M Brown, Michelle Briggs, Suzanne Coleman, Carol Dealey, Elizabeth
McGinnis, E Andrea Nelson, Nikki Stubbs, Lyn Wilson, and Jane Nixon. Language translation &
cross-cultural adaptation guideline. Recommendations for language translation and cross-cultural
adaption of the PU-QOL questionnaire, 2014.
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad
Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd
of models. arXiv preprint arXiv:2407.21783, 2024.
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit
Steiner, Dustin Li, Esin Durmus, Ethan Perez, et al. Studying large language model generalization
with influence functions. arXiv preprint arXiv:2308.03296, 2023.
Melody Y Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias,
Andrea Vallone, Hongyu Ren, Jason Wei, et al. Deliberative alignment: Reasoning enables safer
language models. arXiv preprint arXiv:2412.16339, 2024.
Nuno M. Guerreiro, Duarte M. Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre
Colombo, and André F. T. Martins. Hallucinations in large multilingual translation models.
Transactions of the Association for Computational Linguistics, 11:1500–1517, 2023.
doi: 10.1162/tacl_a_00615. URL https://aclanthology.org/2023.tacl-1.85/.
Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, and
Wei Xu. Care: Aligning language models for regional cultural awareness. arXiv preprint
arXiv:2504.05154, 2025.
Hasan Abed Al Kader Hammoud, Umberto Michieli, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard
Ghanem, and Mete Ozay. Model merging and safety alignment: One bad model spoils the bunch.
In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for
Computational Linguistics: EMNLP 2024, pp. 13033–13046, Miami, Florida, USA, November
2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.762.
URL https://aclanthology.org/2024.findings-emnlp.762/.
Sabit Hassan, Anthony Sicilia, and Malihe Alikhani. Active learning for robust and representative
LLM generation in safety-critical scenarios. In Sachin Kumar, Vidhisha Balachandran,
Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh
Hajishirzi, Dongyeop Kang, and David Jurgens (eds.), Proceedings of the 1st Workshop
on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application,
Group, or Individual (CustomNLP4U), pp. 113–123, Miami, Florida, USA, November 2024.
Association for Computational Linguistics. doi: 10.18653/v1/2024.customnlp4u-1.10. URL
https://aclanthology.org/2024.customnlp4u-1.10/.
Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin IP Rubinstein,
and Trevor Cohn. Tuba: Cross-lingual transferability of backdoor attacks in llms with instruction
tuning. arXiv preprint arXiv:2404.19597, 2024.
Harry Hoijer (ed.). Language in culture; conference on the interrelations of language and other aspects
of culture. 1954.
Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, and Yongfeng
Zhang. TrustAgent: Towards safe and trustworthy LLM-based agents. In Yaser Al-Onaizan,
Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational
Linguistics: EMNLP 2024, pp. 10000–10016, Miami, Florida, USA, November 2024. Association
for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.585. URL
https://aclanthology.org/2024.findings-emnlp.585/.
Kaiyu Huang, Fengran Mo, Xinyu Zhang, Hongliang Li, You Li, Yuanchi Zhang, Weijian Yi, Yulong
Mao, Jinchen Liu, Yuzhuang Xu, et al. A survey on large language models with multilingualism:
Recent advances and new frontiers. arXiv preprint arXiv:2405.10936, 2024a.
Xin Huang, Tarun Kumar Vangani, Minh Duc Pham, Xunlong Zou, Bin Wang, Zhengyuan Liu, and
Ai Ti Aw. Meralion-textllm: Cross-lingual understanding of large language models in chinese,
indonesian, malay, and singlish. arXiv preprint arXiv:2501.08335, 2024b.
Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, and Maarten
Sap. PolygloToxicityPrompts: Multilingual evaluation of neural toxic degeneration in large language
models. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=ootI3ZO6TJ.
Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan
Chen, Jiayi Zhou, Kaile Wang, et al. Safe rlhf-v: Safe reinforcement learning from human feedback
in multimodal large language models. arXiv preprint arXiv:2503.17682, 2025.
Shaoxiong Ji, Zihao Li, Indraneil Paul, Jaakko Paavola, Peiqin Lin, Pinzhen Chen, Dayyán O’Brien,
Hengyu Luo, Hinrich Schütze, Jörg Tiedemann, et al. EMMA-500: Enhancing massively multilingual
adaptation of large language models. arXiv preprint arXiv:2409.17892, 2024.
Wenying Jiang. The relationship between culture and language. ELT journal, 54(4):328–334, 2000.
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. The state and
fate of linguistic diversity and inclusion in the NLP world. In Dan Jurafsky, Joyce Chai, Natalie
Schluter, and Joel Tetreault (eds.), Proceedings of the 58th Annual Meeting of the Association
for Computational Linguistics, pp. 6282–6293, Online, July 2020. Association for Computational
Linguistics. doi: 10.18653/v1/2020.acl-main.560. URL
https://aclanthology.org/2020.acl-main.560/.
Arturs Kanepajs, Vladimir Ivanov, and Richard Moulange. Towards safe multilingual frontier AI.
In Workshop on Socially Responsible Language Modelling Research, 2024. URL
https://openreview.net/forum?id=iFHsnIkj4q.
Teo Keipi, Matti Näsi, Atte Oksanen, and Pekka Räsänen. Online hate and harmful content: Cross-
national perspectives. Taylor & Francis, 2016.
Tseen-Ling Khoo. Banana Bending: Asian-Australian and Asian-Canadian Literatures. Hong Kong
University Press, 2003.
Paria Khoshtab, Danial Namazifard, Mostafa Masoudi, Ali Akhgary, Samin Mahdizadeh Sani, and
Yadollah Yaghoobzadeh. Comparative study of multilingual idioms and similes in large language
models. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di
Eugenio, and Steven Schockaert (eds.), Proceedings of the 31st International Conference on Computational
Linguistics, pp. 8680–8698, Abu Dhabi, UAE, January 2025. Association for Computational
Linguistics. URL https://aclanthology.org/2025.coling-main.580/.
Minbeom Kim, Jahyun Koo, Hwanhee Lee, Joonsuk Park, Hwaran Lee, and Kyomin Jung. LifeTox:
Unveiling implicit toxicity in life advice. In Kevin Duh, Helena Gomez, and Steven Bethard
(eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pp. 688–698,
Mexico City, Mexico, June 2024. Association for Computational Linguistics. doi:
10.18653/v1/2024.naacl-short.60. URL https://aclanthology.org/2024.naacl-short.60/.
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew M Bean, Katerina Margatina,
Rafael Mosquera-Gomez, Juan Ciro, Max Bartolo, Adina Williams, He He, et al. The PRISM
alignment dataset: What participatory, representative and individualised human feedback reveals
about the subjective and multicultural alignment of large language models. Advances in Neural
Information Processing Systems, 37:105236–105344, 2024.
Katerina Korre, Arianna Muti, Federico Ruggeri, and Alberto Barrón-Cedeño. Untangling hate
speech definitions: A semantic componential analysis across cultures and domains. In Luis
Chiruzzo, Alan Ritter, and Lu Wang (eds.), Findings of the Association for Computational Linguistics:
NAACL 2025, pp. 3184–3198, Albuquerque, New Mexico, April 2025. Association for
Computational Linguistics. ISBN 979-8-89176-195-7. URL
https://aclanthology.org/2025.findings-naacl.175/.
Claire Kramsch. Language and culture. AILA review, 27(1):30–55, 2014.
Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal, Marzieh Fadaee, and Tom Kocmi. Déjà vu:
Multilingual llm evaluation through the lens of machine translation evaluation. arXiv preprint
arXiv:2504.11829, 2025.
Udo Kruschwitz and Maximilian Schmidhuber. LLM-based synthetic datasets: Applications
and limitations in toxicity detection. In Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi,
Bharathi Raja Chakravarthi, Bornini Lahiri, Siddharth Singh, and Shyam Ratan (eds.), Proceedings
of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024,
pp. 37–51, Torino, Italia, May 2024. ELRA and ICCL. URL https://aclanthology.org/2024.trac-1.6/.
Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen,
Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, et al. Specific versus general
principles for constitutional ai. arXiv preprint arXiv:2310.13798, 2023.
Yara Kyrychenko, Ke Zhou, Edyta Bogucka, and Daniele Quercia. C3ai: Crafting and evaluating
constitutions for constitutional ai. In Proceedings of the ACM on Web Conference 2025, WWW
’25, pp. 3204–3218, New York, NY, USA, 2025. Association for Computing Machinery. ISBN
9798400712746. doi: 10.1145/3696410.3714705. URL
https://doi.org/10.1145/3696410.3714705.
Benjamin Larsen and Virginia Dignum. Ai value alignment: How we can align artificial intelligence
with human values. World Economic Forum, October 2024. URL
https://weforum.org/stories/2024/10/ai-value-alignment-how-we-can-align-artificial-intelligence-with-human-values/.
Xiaochen Li, Zheng Xin Yong, and Stephen Bach. Preference tuning for toxicity mitigation generalizes
across languages. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of
the Association for Computational Linguistics: EMNLP 2024, pp. 13422–13440, Miami, Florida,
USA, November 2024. Association for Computational Linguistics. doi:
10.18653/v1/2024.findings-emnlp.784. URL https://aclanthology.org/2024.findings-emnlp.784/.
Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, and Rema
Padman. Beyond single-turn: A survey on multi-turn interactions with large language models.
arXiv preprint arXiv:2504.04717, 2025.
Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André FT Martins, and Hinrich Schütze. Mala-500:
Massive language adaptation of large language models. arXiv preprint arXiv:2401.13303, 2024.
Edoardo Manino, Julia Rozanova, Danilo Carvalho, Andre Freitas, and Lucas Cordeiro. Systematicity,
compositionality and transitivity of deep NLP models: a metamorphic testing perspective.
In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (eds.), Findings of the Association
for Computational Linguistics: ACL 2022, pp. 2355–2366, Dublin, Ireland, May 2022.
Association for Computational Linguistics. doi: 10.18653/v1/2022.findings-acl.185. URL
https://aclanthology.org/2022.findings-acl.185/.
Beverley Maxwell, MO Martin, and DL Kelly. Translation and cultural adaptation of the survey
instruments. Third international mathematics and science study (TIMSS) technical report, 1:
159–169, 1996.
Abdelfattah Mazari and Naoual Derraz. Language and culture. International Journal of Humanities
and Cultural Studies, 2(2):350–359, 2015.
Chidozie Emmanuel Mbada, Gafar Atanda Adeogun, Michael Opeoluwa Ogunlana, Rufus Adesoji
Adedoyin, Adesanmi Akinsulore, Taofeek Oluwole Awotidebe, Opeyemi Ayodiipo Idowu, and
Olumide Ayoola Olaoye. Translation, cross-cultural adaptation and psychometric evaluation of
yoruba version of the short-form 36 health survey. Health and quality of life outcomes, 13:1–12,
2015.
Preslav Nakov, Firoj Alam, Shaden Shaar, Giovanni Da San Martino, and Yifan Zhang. COVID-19
in Bulgarian social media: Factuality, harmfulness, propaganda, and framing. In Ruslan Mitkov
and Galia Angelova (eds.), Proceedings of the International Conference on Recent Advances in
Natural Language Processing (RANLP 2021), pp. 997–1009, Held Online, September 2021. INCOMA
Ltd. URL https://aclanthology.org/2021.ranlp-1.113/.
Zabir Al Nazi and Wei Peng. Large language models in healthcare and medical domain: A review.
In Informatics, volume 11, pp. 57. MDPI, 2024.
Alexander Tobias Neumann, Yue Yin, Sulayman Sowe, Stefan Decker, and Matthias Jarke. An llm-
driven chatbot in higher education for databases and information systems. IEEE Transactions
on Education, 2024.
Hellina Hailu Nigatu and Inioluwa Deborah Raji. “I searched for a religious song in Amharic and
got sexual content instead”: Investigating online harm in low-resourced languages on youtube.
In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pp.
141–160, 2024.
Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Benjamin Rosman, Thamar Solorio, and Monojit
Choudhury. The Zeno’s paradox of ‘low-resource’ languages. In Yaser Al-Onaizan, Mohit Bansal,
and Yun-Nung Chen (eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural
Language Processing, pp. 17753–17774, Miami, Florida, USA, November 2024. Association for
Computational Linguistics. doi: 10.18653/v1/2024.emnlp-main.983. URL
https://aclanthology.org/2024.emnlp-main.983/.
Chad Nilep. “code switching” in sociocultural linguistics. Colorado research in linguistics, 2006.
Nobal B. Niraula, Saurab Dulal, and Diwa Koirala. Offensive language detection in Nepali social
media. In Aida Mostafazadeh Davani, Douwe Kiela, Mathias Lambert, Bertie Vidgen, Vinodkumar
Prabhakaran, and Zeerak Waseem (eds.), Proceedings of the 5th Workshop on Online Abuse
and Harms (WOAH 2021), Online, August 2021. Association for Computational Linguistics. doi:
10.18653/v1/2021.woah-1.7. URL https://aclanthology.org/2021.woah-1.7/.
Debora Nozza. Exposing the limits of zero-shot cross-lingual hate speech detection. In Chengqing
Zong, Fei Xia, Wenjie Li, and Roberto Navigli (eds.), Proceedings of the 59th Annual Meeting
of the Association for Computational Linguistics and the 11th International Joint Conference
on Natural Language Processing (Volume 2: Short Papers), pp. 907–914, Online, August 2021.
Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-short.114. URL
https://aclanthology.org/2021.acl-short.114/.
Victor Ojewale, Ryan Steed, Briana Vecchione, Abeba Birhane, and Inioluwa Deborah Raji. Towards
ai accountability infrastructure: Gaps and opportunities in ai audit tooling. In Proceedings of the
2025 CHI Conference on Human Factors in Computing Systems, pp. 1–29, 2025.
OpenAI. Introducing ChatGPT. https://openai.com/blog/chatgpt/, November 2022.
OpenAI. OpenAI GPT-4.5 system card. Technical report, OpenAI, February 2025.
Ankit Pal and Malaikannan Sankarasubbu. Gemini goes to Med school: Exploring the capabilities
of multimodal large language models on medical challenge problems & hallucinations. In Tristan
Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, and Danielle Bitterman (eds.),
Proceedings of the 6th Clinical Natural Language Processing Workshop, pp. 21–46, Mexico City,
Mexico, June 2024. Association for Computational Linguistics. doi:
10.18653/v1/2024.clinicalnlp-1.3. URL https://aclanthology.org/2024.clinicalnlp-1.3/.
Endang Wahyu Pamungkas, Valerio Basile, and Viviana Patti. Towards multidomain and multilingual
abusive language detection: a survey. Personal and Ubiquitous Computing, 27(1):17–43,
2023.
Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang,
Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha,
Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, and Sara Hooker. The multilingual
divide and its impact on global ai safety. arXiv preprint arXiv:2505.21344, 2025.
Bruna Pilz, Rodrigo A Vasconcelos, Freddy B Marcondes, Samuel S Lodovichi, Wilson Mello, and
Débora B Grossi. The brazilian version of start back screening tool-translation, cross-cultural
adaptation and reliability. Brazilian journal of physical therapy, 18:453–461, 2014.
Giada Pistilli, Alina Leidinger, Yacine Jernite, Atoosa Kasirzadeh, Alexandra Sasha Luccioni, and
Margaret Mitchell. Civics: Building a dataset for examining culturally-informed values in large
language models. In Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, and Society,
AIES ’24, pp. 1132–1144. AAAI Press, 2025.
Flor Miriam Plaza-del Arco, Debora Nozza, Marco Guerini, Jeffrey Sorensen, and Marcos Zampieri.
Countering hateful and offensive speech online - open challenges. In Jessy Li and Fei Liu
(eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing:
Tutorial Abstracts, pp. 11–16, Miami, Florida, USA, November 2024. Association
for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-tutorials.2. URL
https://aclanthology.org/2024.emnlp-tutorials.2/.
Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, and Jianfeng Chi.
Towards understanding the fragility of multilingual llms against fine-tuning attacks. Proceedings
of the 2025 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, 2025.
Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, and
Philip S Yu. Multilingual large language model: A survey of resources, taxonomy and frontiers.
arXiv preprint arXiv:2404.04925, 2024.
Haoyi Qiu, Alexander R Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng,
and Chien-Sheng Wu. Evaluating cultural and social awareness of llm web agents. arXiv preprint
arXiv:2410.23252, 2024.
Haoyi Qiu, Kung-Hsiang Huang, Ruichen Zheng, Jiao Sun, and Nanyun Peng. Multimodal cultural
safety: Evaluation frameworks and alignment strategies. arXiv preprint arXiv:2505.14972, 2025.
Anka Reuel, Ben Bucknall, Stephen Casper, Tim Fist, Lisa Soder, Onni Aarne, Lewis Hammond,
Lujain Ibrahim, Alan Chan, Peter Wills, et al. Open problems in technical ai governance. arXiv
preprint arXiv:2407.14981, 2024.
Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, and Bart Baesens. Investigating
bias in multilingual language models: Cross-lingual transfer of debiasing techniques.
In Houda Bouamor, Juan Pino, and Kalika Bali (eds.), Proceedings of the 2023 Conference on
Empirical Methods in Natural Language Processing, pp. 2887–2896, Singapore, December 2023.
Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.175. URL
https://aclanthology.org/2023.emnlp-main.175/.
Paul Röttger, Fabio Pernisi, Bertie Vidgen, and Dirk Hovy. Safetyprompts: a systematic review of
open datasets for evaluating and improving large language model safety. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 39, pp. 27617–27627, 2025.
Sebastian Ruder, Ivan Vulić, and Anders Søgaard. Square one bias in NLP: Towards a multi-
dimensional exploration of the research manifold. In Smaranda Muresan, Preslav Nakov, and
Aline Villavicencio (eds.), Findings of the Association for Computational Linguistics: ACL 2022,
pp. 2340–2354, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi:
10.18653/v1/2022.findings-acl.184. URL https://aclanthology.org/2022.findings-acl.184.
Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwaraknath Gnaneshwar,
Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, and Max Bartolo. Procedural
knowledge in pretraining drives reasoning in large language models. In The Thirteenth
International Conference on Learning Representations, 2025. URL
https://openreview.net/forum?id=1hQKHHUsMx.
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram Markosyan,
Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, et al. Rainbow
teaming: Open-ended generation of diverse adversarial prompts. Advances in Neural Information
Processing Systems, 37:69747–69786, 2024.
Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, and Marco Turchi. Gender bias in
machine translation. Transactions of the Association for Computational Linguistics, 9:845–874,
2021.
Ayse Pinar Saygin. Processing figurative language in a multi-lingual task: Translation, transfer
and metaphor. In Proceedings of the Workshop on Corpus-based and Processing Approaches to
Figurative Language. Citeseer, 2001.
Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng,
Philipp Koehn, and Daniel Khashabi. The language barrier: Dissecting safety challenges of LLMs
in multilingual contexts. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings
of the Association for Computational Linguistics: ACL 2024, pp. 2668–2680, Bangkok, Thailand,
August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl.156.
URL https://aclanthology.org/2024.findings-acl.156/.
Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei
Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, and Deyi Xiong. Large language model
safety: A holistic survey, 2024a. URL https://arxiv.org/abs/2412.17686.
Shaojie Shi, Xiaoyu Tan, Xihe Qiu, Chao Qu, Kexin Nie, Yuan Cheng, Wei Chu, Xu Yinghui,
and Yuan Qi. ULMR: Unlearning large language models via negative response and model
parameter average. In Franck Dernoncourt, Daniel Preoţiuc-Pietro, and Anastasia Shimorina
(eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing:
Industry Track, pp. 755–762, Miami, Florida, US, November 2024b. Association for
Computational Linguistics. doi: 10.18653/v1/2024.emnlp-industry.57. URL
https://aclanthology.org/2024.emnlp-industry.57/.
Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan
Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode
clinical knowledge. Nature, 620(7972):172–180, 2023.
Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Mohamed Amin, Le Hou,
Kevin Clark, Stephen R Pfohl, Heather Cole-Lewis, et al. Toward expert-level medical question
answering with large language models. Nature Medicine, pp. 1–8, 2025.
Jiayang Song, Yuheng Huang, Zhehua Zhou, and Lei Ma. Multilingual blending: Large language
model safety alignment evaluation with language mixture. In Luis Chiruzzo, Alan Ritter, and
Lu Wang (eds.), Findings of the Association for Computational Linguistics: NAACL 2025, pp.
3433–3449, Albuquerque, New Mexico, April 2025. Association for Computational Linguistics.
ISBN 979-8-89176-195-7. URL https://aclanthology.org/2025.findings-naacl.191/.
Kamal K Sridhar. Societal multilingualism. Sociolinguistics and language teaching, 47:70, 1996.
Alex Tamkin, Miles McCain, Kunal Handa, Esin Durmus, Liane Lovitt, Ankur Rathi, Saffron
Huang, Alfred Mountfield, Jerry Hong, Stuart Ritchie, et al. Clio: Privacy-preserving insights
into real-world ai use. arXiv preprint arXiv:2412.13678, 2024.
Yan Tao, Olga Viberg, Ryan S Baker, and René F Kizilcec. Cultural bias and cultural alignment
of large language models. PNAS nexus, 3(9):pgae346, 2024.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay
Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation
and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
Monica M Trieu. Understanding the use of “twinkie,” “banana,” and “fob”: Identifying the origin,
role, and consequences of internalized racism within asian america. Sociology Compass, 13(5):
e12679, 2019.
Ahmet Üstün, Viraat Aryabumi, Zheng Yong, Wei-Yin Ko, Daniel D’souza, Gbemileke Onilude,
Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne
Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, and Sara Hooker. Aya model: An
instruction finetuned open-access multilingual language model. In Lun-Wei Ku, Andre Martins,
and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pp. 15894–15939, Bangkok, Thailand, August
2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.845. URL
https://aclanthology.org/2024.acl-long.845/.
Jun Wang, Benjamin Rubinstein, and Trevor Cohn. Measuring and mitigating name biases in
neural machine translation. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (eds.),
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume
1: Long Papers), pp. 2576–2590, Dublin, Ireland, May 2022. Association for Computational
Linguistics. doi: 10.18653/v1/2022.acl-long.184. URL
https://aclanthology.org/2022.acl-long.184/.
Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, and
Michael Lyu. All languages matter: On the multilingual safety of LLMs. In Lun-Wei Ku, Andre
Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics:
ACL 2024, pp. 5865–5877, Bangkok, Thailand, August 2024a. Association for Computational
Linguistics. doi: 10.18653/v1/2024.findings-acl.349. URL
https://aclanthology.org/2024.findings-acl.349/.
Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, and
Michael Lyu. All languages matter: On the multilingual safety of LLMs. In Lun-Wei Ku, Andre
Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics:
ACL 2024, pp. 5865–5877, Bangkok, Thailand, August 2024b. Association for Computational
Linguistics. doi: 10.18653/v1/2024.findings-acl.349. URL
https://aclanthology.org/2024.findings-acl.349/.
Xiangwen Wang, Jie Peng, Kaidi Xu, Huaxiu Yao, and Tianlong Chen. Reinforcement learning-
driven LLM agent for automated attacks on LLMs. In Ivan Habernal, Sepideh Ghanavati, Abhilasha
Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah,
and Oluwaseyi Feyisetan (eds.), Proceedings of the Fifth Workshop on Privacy in Natural Language
Processing, pp. 170–177, Bangkok, Thailand, August 2024c. Association for Computational
Linguistics. URL https://aclanthology.org/2024.privatenlp-1.17/.
Xinpeng Wang, Mingyang Wang, Yihong Liu, Hinrich Schütze, and Barbara Plank. Refusal direction
is universal across safety-aligned languages. arXiv preprint arXiv:2505.17306, 2025.
Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun
Ma, Yu-Gang Jiang, Yu Qiao, and Yingchun Wang. Fake alignment: Are LLMs really aligned
well? In Kevin Duh, Helena Gomez, and Steven Bethard (eds.), Proceedings of the 2024 Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies (Volume 1: Long Papers), pp. 4696–4712, Mexico City, Mexico, June
2024d. Association for Computational Linguistics. doi: 10.18653/v1/2024.naacl-long.263. URL
https://aclanthology.org/2024.naacl-long.263/.
Zhonghao Wang, Zijia Lu, Bo Jin, and Haiying Deng. Mediagpt: A large language model for chinese
media. arXiv preprint arXiv:2307.10930, 2023.
Qingsong Wen, Jing Liang, Carles Sierra, Rose Luckin, Richard Tong, Zitao Liu, Peng Cui, and
Jiliang Tang. Ai for education (ai4edu): Advancing personalized education with llm and adaptive
learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data
Mining, pp. 6743–6744, 2024.
Genta Winata, Alham Fikri Aji, Zheng Xin Yong, and Thamar Solorio. The decades progress
on code-switching research in NLP: A systematic survey on trends and challenges. In Anna
Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Com-
putational Linguistics: ACL 2023, pp. 2936–2978, Toronto, Canada, July 2023. Association
for Computational Linguistics. doi: 10.18653/v1/2023.findings-acl.185. URL
https://aclanthology.org/2023.findings-acl.185/.
Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu,
Longyue Wang, Weihua Luo, and Kaifu Zhang. The bitter lesson learned from 2,000+ multilingual
benchmarks. arXiv preprint arXiv:2504.15521, 2025.
Hemant Yadav and Sunayana Sitaram. A survey of multilingual models for automatic speech
recognition. arXiv preprint arXiv:2202.12576, 2022.
Mohammad Ali Yaghan. “Arabizi”: A contemporary style of Arabic slang. Design Issues, 24(2):
39–52, 2008.
Yahan Yang, Soham Dan, Dan Roth, and Insup Lee. Benchmarking LLM guardrails in handling multilingual toxicity. arXiv preprint arXiv:2410.22153, 2024a.
Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, and Qian Liu. Self-distillation bridges distribution gap in language model fine-tuning. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1028–1043, Bangkok, Thailand, August 2024b. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.58. URL https://aclanthology.org/2024.acl-long.58/.
Da Yin, Haoyi Qiu, Kung-Hsiang Huang, Kai-Wei Chang, and Nanyun Peng. SafeWorld: Geo-diverse safety alignment. Advances in Neural Information Processing Systems, 37:128734–128768, 2024.
Zheng Xin Yong, Cristina Menghini, and Stephen Bach. Low-resource languages jailbreak GPT-4. In Socially Responsible Language Modelling Research, 2023a. URL https://openreview.net/forum?id=pn83r8V2sv.
Zheng Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Winata, Stella Biderman, Edward Raff, Dragomir Radev, and Vassilina Nikoulina. BLOOM+1: Adding language support to BLOOM for zero-shot prompting. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 11682–11703, Toronto, Canada, July 2023b. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.653. URL https://aclanthology.org/2023.acl-long.653/.
Zheng-Xin Yong, M Farid Adilazuarda, Jonibek Mansurov, Ruochen Zhang, Niklas Muennighoff, Carsten Eickhoff, Genta Indra Winata, Julia Kreutzer, Stephen H Bach, and Alham Fikri Aji. Crosslingual reasoning through test-time scaling. arXiv preprint arXiv:2505.05408, 2025.
Haneul Yoo, Yongjin Yang, and Hwaran Lee. Code-switching red-teaming: LLM evaluation for safety and multilingual understanding. arXiv preprint arXiv:2406.15481, 2024.
Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, and Gongshen Liu. R-Judge: Benchmarking safety risk awareness for LLM agents. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 1467–1490, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.79. URL https://aclanthology.org/2024.findings-emnlp.79/.
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi. How Johnny can persuade LLMs to jailbreak them: Rethinking persuasion to challenge AI safety by humanizing LLMs. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 14322–14350, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.773. URL https://aclanthology.org/2024.acl-long.773/.
Yizhou Zhang, Karishma Sharma, Lun Du, and Yan Liu. Toward mitigating misinformation and social media manipulation in LLM era. In Companion Proceedings of the ACM Web Conference 2024, WWW '24, pp. 1302–1305, New York, NY, USA, 2024a. Association for Computing Machinery. ISBN 9798400701726. doi: 10.1145/3589335.3641256. URL https://doi.org/10.1145/3589335.3641256.
Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin Liu, Zhiyuan Liu, et al. Simulating classroom education with LLM-empowered agents. arXiv preprint arXiv:2406.19226, 2024b.
Weixiang Zhao, Yulin Hu, Yang Deng, Tongtong Wu, Wenxuan Zhang, Jiahe Guo, An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, et al. MPO: Multilingual safety alignment via reward gap optimization. arXiv preprint arXiv:2505.16869, 2025.
Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, and Yu Qiao. Beyond one-preference-fits-all alignment: Multi-objective direct preference optimization. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp. 10586–10613, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl.630. URL https://aclanthology.org/2024.findings-acl.630/.
Shucheng Zhu, Bingjie Du, Jishun Zhao, Ying Liu, and Pengyuan Liu. Do PLMs and annotators share the same gender bias? Definition, dataset, and framework of contextualized gender bias. In Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, and Debora Nozza (eds.), Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pp. 20–32, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.gebnlp-1.2. URL https://aclanthology.org/2024.gebnlp-1.2/.