#6 Arxiv Weekly Insights

Welcome to the 6th edition of "Arxiv Weekly Insights," where we delve into the latest groundbreaking research and developments from the arXiv repository.

This newsletter is brought to you by SmartXiv, the AI-powered personalized arXiv digest designed to enhance your research experience. With over 1000 research papers uploaded daily on arXiv, it's easy to miss important updates. Let SmartXiv deliver personalized recommendations so you never miss what truly matters to you.
Get started today and save 30% with your annual subscription.

Machine Learning
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf

This paper assesses the capabilities of large language models (LLMs) on a new task: understanding symbolic graphics programs. The authors characterize an LLM's understanding of a symbolic program in terms of its ability to answer questions about the graphics content. The task is challenging because the questions are difficult to answer from the symbolic programs alone. The authors introduce Symbolic Instruction Tuning (SIT) to improve this ability.
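
To make the difficulty concrete, here is a minimal sketch of what such a task item might look like. The SVG program and the question are illustrative assumptions, not examples taken from the paper; the point is only that the program lists shapes and coordinates and never names what they depict.

```python
import xml.etree.ElementTree as ET

# Hypothetical symbolic graphics program (an SVG drawing), for illustration only.
SVG_PROGRAM = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="yellow"/>
  <circle cx="35" cy="40" r="5" fill="black"/>
  <circle cx="65" cy="40" r="5" fill="black"/>
  <path d="M 30 60 Q 50 80 70 60" stroke="black" fill="none"/>
</svg>"""

# Hypothetical semantic question about the rendered content.
QUESTION = "What object does this program draw?"

# Parsing the program recovers only low-level primitives.
root = ET.fromstring(SVG_PROGRAM)
shape_tags = [child.tag.split("}")[1] for child in root]  # strip the XML namespace

print(shape_tags)
print(QUESTION)
```

Answering the question requires mentally rendering three circles and a curve and recognizing the result, since no word in the program states the answer.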

Artificial Intelligence
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors
Usman Syed, Ethan Light, Xingang Guo, Huan Zhang, Lianhui Qin, Yanfeng Ouyang, Bin Hu

This paper explores the capabilities of state-of-the-art large language models (LLMs) in solving selected undergraduate-level transportation engineering problems. The authors introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems.

Software Engineering
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong

This paper proposes DEI (Diversity Empowered Intelligence), a framework that leverages the unique expertise of various large language model (LLM) agents to solve real-world software engineering problems. DEI functions as a meta-module atop existing agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents can surpass the best individual agent's performance by a large margin.
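
The claim that a committee can beat its best member is easy to see with a toy calculation. The sketch below assumes a simple per-issue selection scheme with a scoring function; the agent names, the resolved-issue sets, and the perfect scorer are all made up for illustration and are not the paper's data or its actual mechanism.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical results: which issue IDs each agent resolves on its own.
AGENT_RESOLVED: Dict[str, Set[int]] = {
    "agent_a": {1, 2, 3, 4},
    "agent_b": {3, 4, 5, 6},
    "agent_c": {1, 6, 7},
}
ALL_ISSUES = range(1, 11)  # ten hypothetical issues


def best_single_agent(resolved: Dict[str, Set[int]]) -> int:
    """Number of issues resolved by the strongest individual agent."""
    return max(len(issues) for issues in resolved.values())


def oracle_committee(resolved: Dict[str, Set[int]]) -> int:
    """Upper bound: issues resolved by at least one agent."""
    return len(set().union(*resolved.values()))


def committee_with_selector(
    resolved: Dict[str, Set[int]],
    issues: Iterable[int],
    score: Callable[[str, int], float],
) -> int:
    """For each issue, keep the candidate from the highest-scoring agent."""
    solved = 0
    for issue in issues:
        chosen = max(resolved, key=lambda agent: score(agent, issue))
        if issue in resolved[chosen]:
            solved += 1
    return solved


def perfect_score(agent: str, issue: int) -> float:
    """Idealized scorer (an assumption): always recognizes a correct candidate."""
    return 1.0 if issue in AGENT_RESOLVED[agent] else 0.0


print(best_single_agent(AGENT_RESOLVED))
print(oracle_committee(AGENT_RESOLVED))
print(committee_with_selector(AGENT_RESOLVED, ALL_ISSUES, perfect_score))
```

Because the agents succeed on different issues, the union of their successes is larger than any single agent's set. How much of that gap a real committee captures depends on how well its selector picks the right candidate.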

Computers and Society
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
Łukasz Borchmann, Michał Pietruszka, Wojciech Jaśkowski, Dawid Jurkiewicz, Piotr Halama, Paweł Józiak, Łukasz Garncarek, Paweł Liskowski, Karolina Szyndler, Andrzej Gretkowski, Julita Ołtusek, Gabriela Nowakowska, Artur Zawłocki, Łukasz Duhr, Paweł Dyda, Michał Turski

This paper introduces Arctic-TILT, a model that achieves accuracy on par with models 1000x its size on workloads that involve answering questions grounded in PDF or scanned content. It can be fine-tuned and deployed on a single 24GB GPU, and it establishes state-of-the-art results on seven diverse Document Understanding benchmarks.


Computation and Language
Covert Bias: The Severity of Social Views' Unalignment Towards Implicit and Explicit Opinion
Abeer Aldayel, Areej Alokaili, Rehab Alahmadi

While various approaches have recently been studied for bias identification, little is known about how implicit language that does not explicitly convey a viewpoint affects bias amplification in large language models. To examine the severity of bias toward a view, we evaluated the performance of two downstream tasks where the implicit and explicit knowledge of social groups were used. First, we present a stress test evaluation by using a biased model in edge cases of excessive bias scenarios. Then, we evaluate how LLMs calibrate linguistically in response to both implicit and explicit opinions when they are aligned with conflicting viewpoints. Our findings reveal a discrepancy in LLM performance in identifying implicit and explicit opinions, with a general tendency of bias toward explicit opinions of opposing stances. Moreover, the bias-aligned models generate more cautious responses using uncertainty phrases compared to the unaligned (zero-shot) base models. The direct, incautious responses of the unaligned models suggest a need for further refinement of decisiveness by incorporating uncertainty markers to enhance their reliability, especially on socially nuanced topics with high subjectivity.

Machine Learning
Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices
Tess Watt, Christos Chrysoulas, Peter J Barclay

Image classification usually requires connectivity and access to the cloud, which is often limited in many parts of the world, including hard-to-reach rural areas. TinyML aims to solve this problem by hosting AI assistants on constrained devices, eliminating connectivity issues by processing data within the device itself, without internet or cloud access. This pilot study explores the use of tinyML to provide healthcare support with low-spec devices in low-connectivity environments, focusing on the diagnosis of skin diseases and the ethical use of AI assistants in a healthcare setting. To investigate this, 10,000 images of skin lesions were used to train a model for classifying visually detectable diseases (VDDs). The model weights were then offloaded to a Raspberry Pi with a webcam attached, to be used for the classification of skin lesions without internet access. The developed prototype achieved a test accuracy of 78% and a test loss of 1.08.


Thank you for joining us this week. Stay tuned for more insights in our next edition. Until then, happy researching!