Comparative Evaluation of GPT-4o, Gemini, Llama, and Grok on Remote Sensing Imagery

Author(s)	Mr. Abbdulmumini Imam Ibrahim, Mr. Abdullahi Muhammad Auwal, Mr. Jidda Harun Abba
Country	India
Abstract	This study presents an in-depth comparative evaluation of four Multimodal Large Language Models (MLLMs), namely GPT-4o, Gemini 2.5 Pro, Llama 4, and Grok 3, on satellite image captioning and classification using the Remote Sensing Image Captioning Dataset (RSICD). Using structured prompts and expert human judgment, we assessed each model on four qualitative metrics: accuracy, relevance, depth of understanding, and classification precision. Our findings show that MLLMs, while not replacements for specialized remote sensing tools, offer substantial support as analytical partners, producing context-aware interpretations and reliable classifications. Distinct performance profiles emerged across the models, and we outline critical directions for future research in quantitative benchmarking, advanced prompt engineering, and hybrid model architectures.
Keywords	Multimodal Large Language Models, Satellite Imagery, Remote Sensing, Image Captioning, Image Classification
Field	Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In	Volume 7, Issue 6, November-December 2025
Published On	2025-12-31
DOI	https://doi.org/10.36948/ijfmr.2025.v07i06.65292

International Journal For Multidisciplinary Research