#11 Arxiv Weekly Insights

Papers on Robotics, Computer Vision and Pattern Recognition, Data Structures and Algorithms, Machine Learning, Computers and Society

John Accel
September 29, 2024

Welcome to the 11th edition of "Arxiv Weekly Insights," where we highlight notable recent research from the arXiv repository.

This newsletter is brought to you by SmartXiv, the AI-powered personalized arXiv digest designed to enhance your research experience. With over 1,000 research papers uploaded to arXiv daily, it's easy to miss important updates. Let SmartXiv deliver personalized recommendations so you never miss what matters to you.

The best part? SmartXiv is now completely free for 14 days, and you can cancel anytime.

Robotics
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto

This paper presents DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, DynaMo jointly learns a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space without augmentations, contrastive sampling, or access to ground truth actions.
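
To make the description above concrete, here is a minimal sketch of the forward pass only: an encoder embeds each frame, an inverse dynamics model infers a latent action from two consecutive embeddings, and a forward dynamics model uses that latent action to predict the next embedding. Everything here is an assumption for illustration: the random linear maps standing in for networks, the dimensions, the squared-error loss, and all function names are hypothetical and are not taken from the paper.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for a learned network (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def dynamics_loss(frames, W_enc, W_inv, W_fwd):
    """Mean squared error between predicted and actual next-frame embeddings."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    z = [matvec(W_enc, f) for f in frames]        # embed every frame
    total, count = 0.0, 0
    for z_t, z_next in zip(z, z[1:]):
        a_hat = matvec(W_inv, z_t + z_next)       # latent action from (z_t, z_next)
        z_pred = matvec(W_fwd, z_t + a_hat)       # predict z_next from (z_t, a_hat)
        total += sum((p - q) ** 2 for p, q in zip(z_pred, z_next))
        count += len(z_next)
    return total / count

rng = random.Random(0)
OBS, LATENT, ACT = 8, 4, 2                        # illustrative sizes only
frames = [[rng.random() for _ in range(OBS)] for _ in range(5)]
W_enc = rand_matrix(LATENT, OBS, rng)
W_inv = rand_matrix(ACT, 2 * LATENT, rng)
W_fwd = rand_matrix(LATENT, LATENT + ACT, rng)
loss = dynamics_loss(frames, W_enc, W_inv, W_fwd)
print(f"latent prediction loss: {loss:.6f}")
```

Note that no ground-truth actions appear anywhere in the computation; the only supervision signal is the next frame's own embedding.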

Computer Vision and Pattern Recognition
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin

This paper presents the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens.
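
As a rough illustration of how resolution could translate into a varying number of visual tokens, the sketch below counts tokens for a patch-based encoder that merges neighboring patches. The patch size of 14, the 2x2 merge factor, the round-up behavior, and the function name are all assumptions made for this example; the summary above does not specify them, and any special boundary tokens are ignored.

```python
import math

def visual_token_count(height, width, patch_size=14, merge=2):
    """Estimate visual tokens for an image under assumed patch and merge sizes."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    unit = patch_size * merge           # pixels covered by one merged token per side
    rows = math.ceil(height / unit)     # assumption: round up to whole merged groups
    cols = math.ceil(width / unit)
    return rows * cols

for h, w in [(224, 224), (448, 448), (448, 896)]:
    print(f"{h}x{w}: {visual_token_count(h, w)} tokens")
```

Under these assumptions a 224x224 image maps to 64 tokens and a 448x896 image to 512, so larger inputs receive proportionally more tokens instead of being resized to one fixed resolution.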

Computer Vision and Pattern Recognition
Precise Forecasting of Sky Images Using Spatial Warping
Leron Julian, Aswin C. Sankaranarayanan

This paper presents a deep learning method to predict a future sky image frame at higher resolution than previous methods. The main contribution is an optimal warping method that counters the adverse effects of clouds at the horizon, together with a learned framework for future sky image prediction that better captures cloud evolution over longer time horizons.

Data Structures and Algorithms
Generalized compression and compressive search of large datasets
Morgan E. Prior, Thomas Howard III, Emily Light, Najib Ishaq, Noah M. Daniels

This paper presents panCAKES, a novel approach to compressive search, i.e., a way to perform k-NN and ρ-NN search on compressed data while only decompressing a small, relevant portion of the data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently.
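
For readers unfamiliar with the two query types, the sketch below shows what k-NN and ρ-NN search return, using a plain brute-force scan over uncompressed points. This is not panCAKES and involves no compression; the function names and the sample data are hypothetical and serve only to define the queries that a compressive search would need to answer.

```python
import math

def knn_search(points, query, k):
    """Return the k points closest to query (brute force)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def rnn_search(points, query, radius):
    """Return every point within `radius` of query (brute force)."""
    return [p for p in points if math.dist(p, query) <= radius]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
query = (0.0, 0.0)

print(knn_search(points, query, 2))    # the two nearest points
print(rnn_search(points, query, 2.0))  # all points at distance <= 2.0
```

k-NN fixes the number of results, while ρ-NN fixes the distance threshold and returns however many points fall inside it.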

Machine Learning
Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility
Nathan Wolfrath, Joel Wolfrath, Hengrui Hu, Anjishnu Banerjee, Anai N. Kothari

This paper shows empirically that including stronger baseline models in healthcare ML evaluations has important downstream effects that aid practitioners in addressing challenges such as lack of model transparency, large training data requirements, and complicated metrics for measuring model utility.

Computers and Society
Reporting Non-Consensual Intimate Media: An Audit Study of Deepfakes
Rohan Sinha, Amine Elhafsi, Christopher Agia, Matthew Foutter, Edward Schmerling, Marco Pavone

This paper presents a study of the decision processes humans use when searching briefly presented displays having well-separated potential target-object locations. The study compares human performance with the Bayesian-optimal decision process under the assumption that the information from the different potential target locations is statistically independent.

Thank you for joining us this week. Stay tuned for more insights in our next edition. Until then, happy researching!