Why do LLaVA Vision-Language Models Reply to Images in English?

Published in the proceedings of Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Access paper here

Recommended citation: Musashi Hinck, Carolin Holtermann, Matthew Olson, Florian Schneider, Sungduk Yu, Anahita Bhiwandiwalla, Anne Lauscher, Shao-Yen Tseng, Vasudev Lal, "Why do LLaVA Vision-Language Models Reply to Images in English?" In the proceedings of Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.