Why do LLaVA Vision-Language Models Reply to Images in English?
Published in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Recommended citation: Musashi Hinck, Carolin Holtermann, Matthew Olson, Florian Schneider, Sungduk Yu, Anahita Bhiwandiwalla, Anne Lauscher, Shao-Yen Tseng, Vasudev Lal, "Why do LLaVA Vision-Language Models Reply to Images in English?" In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.