Weekly Research Digest: Top arXiv Papers (#24)

Exclusive Report on AI and LLMs

John Accel
January 06, 2025

Welcome to the 24th edition of "Arxiv Weekly Insights," where we cover the latest research and developments from the arXiv repository.

We’re excited to share something we think you’ll love – our latest report:

✨ Top 100 Most Influential AI & LLM Papers of 2024, featuring the most exciting and impactful research posted to arXiv.org during 2024.

It’s completely free, and it’s packed with insights into the breakthroughs shaping AI right now. Whether you're deep in the AI world or just curious about what’s next, we’re sure you’ll find it valuable.

Artificial Intelligence
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu

OS-Genesis is a pipeline that automates GUI agent trajectory construction by reversing the conventional task synthesis process. Rather than starting from predefined tasks, agents first perceive the environment and interact with it, then retrospectively derive high-quality tasks from those interactions. This improves the quality and diversity of the data used to train GUI agents.
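
To make the "interact first, derive the task afterwards" idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy screen graph `TOY_GUI`, the `Step` record, and the `explore` and `derive_task` helpers are all hypothetical names invented for illustration, and the string template stands in for whatever model a real system would use to write the task.

```python
from dataclasses import dataclass

# Hypothetical toy GUI: maps (screen, action) to the next screen.
TOY_GUI = {
    ("home", "click:settings"): "settings",
    ("settings", "click:wifi"): "wifi",
    ("home", "click:search"): "search",
}

@dataclass
class Step:
    before: str  # screen observed before acting
    action: str  # interaction performed
    after: str   # screen observed after acting

def explore(gui, start, actions):
    """Interaction-first phase: act with no predefined task and log each transition."""
    steps, screen = [], start
    for action in actions:
        nxt = gui.get((screen, action))
        if nxt is None:  # action is not available on this screen, so skip it
            continue
        steps.append(Step(screen, action, nxt))
        screen = nxt
    return steps

def derive_task(steps):
    """Retrospective phase: describe a task that the logged transitions accomplish.
    Assumption: a plain string template replaces any learned task generator."""
    if not steps:
        return None
    return f"Starting from '{steps[0].before}', reach the '{steps[-1].after}' screen"

trajectory = explore(TOY_GUI, "home", ["click:settings", "click:wifi"])
task = derive_task(trajectory)
```

The resulting pair of `task` and `trajectory` is the kind of training example the summary describes: the task text is written after the fact, so it is guaranteed to match what the agent actually did.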

Computer Vision and Pattern Recognition
MVTamperBench: Evaluating Robustness of Vision-Language Models
Amit Agarwal, Srikant Panda, Angeline Charles, Bhargava Kumar, Hitesh Patel, Priyanranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae

This paper introduces MVTamperBench, a benchmark for evaluating the robustness of Vision-Language Models (VLMs) to video tampering effects. The benchmark assesses models like InternVL2-8B and Llama-VILA1.5-8B, revealing significant variability in their resilience. MVTamperBench is integrated into VLMEvalKit to facilitate reproducibility and advancements in model robustness.
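
As a rough illustration of what scoring robustness per tampering effect can look like, the Python sketch below tampers with toy clips and measures how often a detector flags them. Everything in it is an assumption made for illustration: the three effects, the integer "frames", and `toy_detector` are stand-ins, not the benchmark's actual tampering definitions, models, or the VLMEvalKit API.

```python
# Frames are stand-in integers; a real clip would hold image arrays.

def drop_frames(frames, start, length):
    """Remove a contiguous segment of frames."""
    return frames[:start] + frames[start + length:]

def repeat_frames(frames, start, length):
    """Duplicate a contiguous segment immediately after itself."""
    segment = frames[start:start + length]
    return frames[:start + length] + segment + frames[start + length:]

def mask_frames(frames, start, length, blank=-1):
    """Overwrite a contiguous segment with a blank marker."""
    return frames[:start] + [blank] * length + frames[start + length:]

# Illustrative effects only; the benchmark's own set may differ.
TAMPERS = {"drop": drop_frames, "repeat": repeat_frames, "mask": mask_frames}

def toy_detector(frames):
    """Hypothetical stand-in for a model: flags blank or repeated frames.
    By construction it cannot notice dropped frames."""
    return -1 in frames or len(set(frames)) != len(frames)

def accuracy_by_effect(clips, detector, start=2, length=2):
    """Fraction of tampered clips the detector flags, reported per effect."""
    scores = {}
    for name, tamper in TAMPERS.items():
        hits = sum(detector(tamper(clip, start, length)) for clip in clips)
        scores[name] = hits / len(clips)
    return scores

clips = [list(range(8)), list(range(10, 20))]
scores = accuracy_by_effect(clips, toy_detector)
```

Reporting one score per effect, rather than a single aggregate, is what exposes the kind of uneven resilience the summary mentions: this toy detector catches every masked or repeated clip but none of the dropped-frame clips.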

Thank you for joining us this week. Until the next edition, happy researching!