#10 Arxiv Weekly Insights
Papers on Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, and Robotics
Welcome to the 10th edition of "Arxiv Weekly Insights," where we delve into the latest groundbreaking research and developments from the arXiv repository.
This newsletter is brought to you by SmartXiv, the AI-powered personalized arXiv digest designed to enhance your research experience. With more than 1,000 papers uploaded to arXiv every day, it's easy to miss important updates. Let SmartXiv deliver personalized recommendations so you never miss what truly matters to you.
The best part? You can now try SmartXiv free for 14 days and cancel anytime.
Computation and Language
Gender Representation and Bias in Indian Civil Service Mock Interviews
Somonnoy Banerjee, Sujan Dutta, Soumyajit Datta, Ashiqur R. KhudaBukhsh
This paper makes three key contributions. First, using a corpus of 51,278 interview questions drawn from 888 YouTube videos of mock interviews with Indian civil service candidates, the authors demonstrate stark gender bias in the broad nature of the questions asked to male and female candidates. Second, their experiments with large language models (LLMs) reveal strong gender bias in the explanations the LLMs provide on a gender inference task. Finally, they release this collection of 51,278 interview questions as a novel dataset that can inform future social science studies.
Computer Vision and Pattern Recognition
Do Pre-trained Vision-Language Models Encode Object States?
Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren, Chen Sun
This paper investigates whether vision-language models pre-trained on web-scale data learn to encode object states. The authors curate an object state recognition dataset and evaluate nine open-source vision-language models. They observe that while these models reliably recognize objects, they consistently fail to distinguish the objects' physical states. The paper identifies three areas of improvement that would help vision-language models better encode object states: the quality of object localization, the architecture used to bind concepts to objects, and the training objective for learning visual and language encoders that discriminate between object states.
Computer Vision and Pattern Recognition
Deep-Wide Learning Assistance for Insect Pest Classification
Toan Nguyen, Huy Nguyen, Huy Ung, Hieu Ung, Binh Nguyen
This paper presents DeWi, a novel learning-assistance approach for insect pest classification. Using a one-stage, alternating training strategy, DeWi simultaneously improves several Convolutional Neural Networks from two perspectives: discrimination and generalization. DeWi learns discriminative, in-depth features of insect pests while still generalizing well across a large number of insect categories. Experimental results show that DeWi achieves the best performance on two insect pest classification benchmarks.
Artificial Intelligence
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation
Qiliang Chen, Alireza Ilami, Nunzio Lore, Babak Heydari
This paper introduces a novel framework that combines LLM agents, acting as proxies for human strategic behavior, with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. The framework extends traditional agent-based simulations in two ways: it uses strategic LLM agents, and it introduces dynamic, adaptive governance through a pro-social RL agent that modulates information access across agents in a network to optimize social welfare and promote pro-social behavior.
Computation and Language
E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on High-impact Legal Cases
Candida M. Greco, Lorenzo Zangari, Davide Picca, Andrea Tagarelli
The way media report on legal cases can significantly shape public opinion, often embedding subtle biases that influence societal views on justice and morality. Analyzing these biases requires a holistic approach that captures the emotional tone, moral framing, and specific events within the narratives. In this work, the authors introduce E2MoCase, a novel dataset designed to facilitate the integrated analysis of emotions, moral values, and events in legal narratives and media coverage. By leveraging models for emotion detection, moral value identification, and event extraction, E2MoCase offers a multi-dimensional perspective on how legal cases are portrayed in news articles.
Robotics
Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation
Yifan Xu, Ziming Luo, Qianwei Wang, Vineet Kamat, Carol Menassa
This paper proposes Point2Graph, a novel end-to-end, point cloud-based framework for generating 3D open-vocabulary scene graphs that eliminates the need for posed RGB-D image sequences. The framework covers room and object detection/segmentation as well as open-vocabulary classification. For the room layer, the authors combine a geometry-based border detection algorithm with learning-based region detection to segment rooms, and they create a 'Snap-Lookup' framework for open-vocabulary room classification.
Thank you for joining us this week. Stay tuned for more insights in our next edition. Until then, happy researching, and see you next week!