RESEARCHER Alpha
Documents
58 documents
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
arXiv:2406.10118 · 15-Apr-2026
Which Humans?
arXiv:2506.14680 · 15-Apr-2026
SEA-BED: How Do Embedding Models Represent Southeast Asian Languages?
arXiv:2508.12243 · 15-Apr-2026
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
arXiv:2412.03304 · 15-Apr-2026
Is Small Language Model the Silver Bullet to Low-Resource Languages Machine Translation?
arXiv:2503.24102 · 15-Apr-2026
SEA-LION: Southeast Asian Languages in One Network
arXiv:2504.05747 · 15-Apr-2026
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Languages
arXiv:2411.05049 · 15-Apr-2026
SEA-HELM: Southeast Asian Holistic Evaluation of Language Models
arXiv:2502.14301 · 15-Apr-2026
Bhaasha, Bhasa, Zaban: A Survey for Low-Resourced Languages in South Asia -- Current Stage and Challenges
arXiv:2509.11570 · 16-Apr-2026
FILBENCH: Can LLMs Understand and Generate Filipino?
arXiv:2508.03523 · 16-Apr-2026
English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
arXiv:2604.13286 · 16-Apr-2026
Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion
arXiv:2603.06264 · 16-Apr-2026
Toward Robust Multilingual Adaptation of LLMs for Low-Resource Languages
arXiv:2510.14466 · 16-Apr-2026
BURMESE-SAN: Burmese NLP Benchmark for Evaluating Large Language Models
arXiv:2602.18788 · 16-Apr-2026
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
arXiv:2412.01186 · 16-Apr-2026
MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation
arXiv:2604.04839 · 16-Apr-2026
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
arXiv:2511.01670 · 16-Apr-2026
Multilingual Text Representation
arXiv:2309.00949 · 17-Apr-2026
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
arXiv:2509.14804 · 17-Apr-2026
Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance
arXiv:2604.10590 · 17-Apr-2026
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
arXiv:2309.06085 · 17-Apr-2026
LaoBench: A Large-Scale Multidimensional Lao Benchmark
arXiv:2511.11334 · 17-Apr-2026
Learning Speech Representations with Variational Predictive Coding
arXiv:2601.00100 · 17-Apr-2026
The Geometry of Multilingual Language Models: An Equality Lens
arXiv:2305.07839 · 17-Apr-2026
Enhancing Multilingual RAG Systems with Debiased Language
arXiv:2601.02956 · 17-Apr-2026
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context
arXiv:2504.02708 · 17-Apr-2026
OpenSeal: Good, Fast, and Cheap Construction of an Open-Source Southeast Asian LLM via Parallel Data
arXiv:2602.02266 · 17-Apr-2026
Tokenization Disparities as Infrastructure Bias: How Subword Systems Create Inequities in LLM Access and Efficiency
arXiv:2510.12389 · 17-Apr-2026
Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe
arXiv:2604.12477 · 17-Apr-2026
Large Multimodal Models for Low-Resource Languages: A Survey
arXiv:2502.05568 · 17-Apr-2026
Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
arXiv:2412.04497 · 17-Apr-2026
Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages
arXiv:2511.06497 · 17-Apr-2026
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
arXiv:2506.08400 · 17-Apr-2026
SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures
arXiv:2512.05501 · 17-Apr-2026
Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering
arXiv:2511.23231 · 17-Apr-2026
Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration
arXiv:2509.21946 · 17-Apr-2026
ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework
arXiv:2410.19453 · 17-Apr-2026
BBPE16: UTF-16-based byte-level byte-pair encoding for improved multilingual speech recognition
arXiv:2602.01717 · 17-Apr-2026
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
arXiv:2410.12462 · 17-Apr-2026
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
arXiv:2604.12710 · 17-Apr-2026
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
arXiv:2502.06298 · 17-Apr-2026
When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities
arXiv:2604.10787 · 17-Apr-2026
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
arXiv:2407.19672 · 17-Apr-2026
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
arXiv:2507.08898 · 17-Apr-2026
Cross-Lingual Activation Steering for Multilingual Language Models
arXiv:2601.16390 · 17-Apr-2026
Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks
arXiv:2509.20045 · 17-Apr-2026
Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality
arXiv:2603.17512 · 17-Apr-2026
The Serendipity of Claude AI: Case of the 13 Low-Resource National Languages of Mali
arXiv:2503.03380 · 17-Apr-2026
Mangosteen: An Open Thai Corpus for Language Model Pretraining
arXiv:2507.14664 · 17-Apr-2026
Typologically-Informed Candidate Reranking for LLM-based Translation into Low-Resource Languages
arXiv:2602.01162 · 17-Apr-2026
Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries
arXiv:2512.01419 · 17-Apr-2026
Benchmarking Concept-Spilling Across Languages in LLMs
arXiv:2601.12549 · 17-Apr-2026
VietJobs: A Vietnamese Job Advertisement Dataset
arXiv:2603.05262 · 17-Apr-2026
The Token Tax: Systematic Bias in Multilingual Tokenization
arXiv:2509.05486 · 19-Apr-2026
Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali
arXiv:2604.14171 · 19-Apr-2026
XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics
arXiv:2604.14934 · 19-Apr-2026
Multilingual Large Language Models do not comprehend all natural languages to equal degrees
arXiv:2602.20065 · 20-Apr-2026
SeaLLMs - Large Language Models for Southeast Asia
arXiv:2312.00738 · 21-Apr-2026