Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models

By: Arjun Krishna, Aaditya Rastogi, Erick Galinkin

The introduction of advanced reasoning capabilities has improved the problem-solving performance of large language models, particularly on math and coding benchmarks. However, it remains unclear whether these reasoning models are more or less vulnerable to adversarial prompt attacks than their non-reasoning counterparts. In this work, we present a systematic evaluation of weaknesses in advanced reasoning models compared to similar non-reasoning models across a diverse set of prompt-based attack categories. Using experimental data, we find that on average the reasoning-augmented models are slightly more robust than non-reasoning models (42.51% vs 45.53% attack success rate, lower is better). However, this overall trend masks significant category-specific differences: for certain attack types the reasoning models are substantially more vulnerable (e.g., up to 32 percentage points worse on a tree-of-attacks prompt), while for others they are markedly more robust (e.g., 29.8 points better on cross-site scripting injection). Our findings highlight the nuanced security implications of advanced reasoning in language models and emphasize the importance of stress-testing safety across diverse adversarial techniques.
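
The headline metric here, attack success rate (ASR), is simple to tabulate per attack category. Below is a minimal sketch of that bookkeeping in Python; the record format, category names, and data are illustrative inventions, not the paper's dataset.

    from collections import defaultdict

    # Hypothetical attack-attempt records: (attack_category, model_type, succeeded).
    attempts = [
        ("tree_of_attacks", "reasoning", True),
        ("tree_of_attacks", "non_reasoning", False),
        ("xss_injection", "reasoning", False),
        ("xss_injection", "non_reasoning", True),
    ]

    def attack_success_rates(records):
        """ASR per (category, model_type): successes / attempts. Lower is better."""
        totals, hits = defaultdict(int), defaultdict(int)
        for category, model_type, succeeded in records:
            totals[(category, model_type)] += 1
            hits[(category, model_type)] += succeeded
        return {key: hits[key] / totals[key] for key in totals}

    asr = attack_success_rates(attempts)
    for category in {c for c, _ in asr}:
        # Positive delta: the reasoning model is more vulnerable in this category.
        delta = asr[(category, "reasoning")] - asr[(category, "non_reasoning")]
        print(f"{category}: reasoning minus non-reasoning ASR = {delta:+.1%}")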
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

By: Fei Lin, Ziyang Gong, Cong Wang, Yonglin Tian, Tengchao Zhang, Xue Yang, Gen Luo, Fei-Yue Wang

Toxicity remains a leading cause of early-stage drug development failure. Despite advances in molecular design and property prediction, the task of molecular toxicity repair - generating structurally valid molecular alternatives with reduced toxicity - has not yet been systematically defined or benchmarked. To fill this gap, we introduce ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair. We construct a standardized dataset covering 11 primary tasks and 560 representative toxic molecules spanning diverse mechanisms and granularities. We design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose an automated evaluation framework, ToxiEval, which integrates toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a high-throughput evaluation chain for repair success. We systematically assess nearly 30 mainstream general-purpose MLLMs and design multiple ablation studies to analyze key factors such as evaluation criteria, candidate diversity, and failure attribution. Experimental results show that although current MLLMs still face significant challenges on this task, they begin to demonstrate promising capabilities in toxicity understanding, semantic constraint adherence, and structure-aware molecule editing.
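
The ToxiEval chain combines several cheminformatics checks. Its exact pipeline is not given in the abstract; the sketch below implements a subset of such a chain with RDKit, assuming QED for drug-likeness and Tanimoto similarity over Morgan fingerprints, with illustrative thresholds. Toxicity-endpoint prediction and synthetic accessibility would require external models and are omitted.

    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem, QED

    def repair_check(original_smiles, candidate_smiles, qed_min=0.5, sim_min=0.4):
        """Partial ToxiEval-style chain: validity, drug-likeness (QED),
        and structural similarity of the repaired molecule to the original.
        The thresholds are illustrative, not the benchmark's settings."""
        cand = Chem.MolFromSmiles(candidate_smiles)
        if cand is None:  # structural-validity gate
            return False
        orig = Chem.MolFromSmiles(original_smiles)
        fp_orig = AllChem.GetMorganFingerprintAsBitVect(orig, 2, nBits=2048)
        fp_cand = AllChem.GetMorganFingerprintAsBitVect(cand, 2, nBits=2048)
        similarity = DataStructs.TanimotoSimilarity(fp_orig, fp_cand)
        return QED.qed(cand) >= qed_min and similarity >= sim_min

    # Example: check a hydroxylated variant of benzene against the partial chain.
    print(repair_check("c1ccccc1", "Oc1ccccc1"))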
Spurious Rewards: Rethinking Training Signals in RLVR

By: Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer

We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain models even with spurious rewards that have little, no, or even negative correlation with the correct answer. For example, RLVR improves MATH-500 performance for Qwen2.5-Math-7B in absolute points by 21.4% (random reward), 13.8% (format reward), 24.1% (incorrect label), 26.0% (1-shot RL), and 27.1% (majority voting) -- nearly matching the 29.1% gained with ground truth rewards. However, the spurious rewards that work for Qwen often fail to yield gains with other model families like Llama3 or OLMo2. In particular, we find code reasoning -- thinking in code without actual code execution -- to be a distinctive Qwen2.5-Math behavior that becomes significantly more frequent after RLVR, from 65% to over 90%, even with spurious rewards. Overall, we hypothesize that, given the lack of useful reward signal, RLVR must somehow be surfacing useful reasoning representations learned during pretraining, although the exact mechanism remains a topic for future work. We suggest that future RLVR research should possibly be validated on diverse models rather than a single de facto choice, as we show that it is easy to get significant performance gains on Qwen models even with completely spurious reward signals.
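
The spurious signals named above (random, format, incorrect label) are easy to state as reward functions. The paper's exact implementations are not reproduced here; the sketch below assumes a \boxed{...} final-answer convention, which is common in math-RL setups but is an assumption on our part.

    import random
    import re

    def random_reward(response: str) -> float:
        # Reward uncorrelated with correctness: a coin flip per rollout.
        return float(random.random() < 0.5)

    def format_reward(response: str) -> float:
        # Reward the mere presence of a boxed final answer, right or wrong.
        return float(re.search(r"\\boxed\{.+?\}", response) is not None)

    def incorrect_label_reward(response: str, wrong_answer: str) -> float:
        # Reward agreement with a deliberately wrong label.
        match = re.search(r"\\boxed\{(.+?)\}", response)
        return float(match is not None and match.group(1).strip() == wrong_answer)

Each of these plugs in where a ground-truth verifier would normally sit in the RLVR loop.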
How Do People Revise Inconsistent Beliefs? Examining Belief Revision in Humans with User Studies

By: Stylianos Loukas Vasileiou, Antonio Rago, Maria Vanina Martinez, William Yeoh

Understanding how humans revise their beliefs in light of new information is crucial for developing AI systems which can effectively model, and thus align with, human reasoning. While theoretical belief revision frameworks rely on a set of principles that establish how these operations are performed, empirical evidence from cognitive psychology suggests that people may follow different patterns when presented with conflicting information. In this paper, we present three comprehensive user studies showing that people consistently prefer explanation-based revisions, i.e., revisions guided by explanations, which result in changes to their belief systems that are not necessarily captured by classical belief change theory. Our experiments systematically investigate how people revise their beliefs with explanations for inconsistencies, whether they are provided with them or left to formulate them themselves, demonstrating a robust preference for what may seem non-minimal revisions across different types of scenarios. These findings have implications for AI systems designed to model human reasoning or interact with humans, suggesting that such systems should accommodate explanation-based, potentially non-minimal belief revision operators to better align with human cognitive processes.
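
The contrast with classical theory can be made concrete with a toy example. Everything below (the belief base, the conflicting observation, and the explanation) is invented for illustration: a minimal revision retracts only the directly conflicting belief, while an explanation-based revision also retracts the rule the explanation implicates, a strictly larger change.

    # Toy propositional belief base; sentences are just strings here.
    beliefs = {"bird(tweety)", "bird(x) -> flies(x)", "flies(tweety)"}
    new_info = "not flies(tweety)"  # conflicts with the base

    # Classical minimal revision: give up as little as possible by
    # dropping only the directly conflicting belief.
    minimal = (beliefs - {"flies(tweety)"}) | {new_info}

    # Explanation-based revision: the explanation "tweety is a penguin"
    # implicates the defeasible rule too, so it is also retracted and the
    # explanatory belief is added, a larger, non-minimal change.
    explanation_based = (
        beliefs - {"flies(tweety)", "bird(x) -> flies(x)"}
    ) | {new_info, "penguin(tweety)"}

    # Size of change relative to the original base: 2 vs. 4 edits.
    print(len(minimal ^ beliefs), len(explanation_based ^ beliefs))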
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

By: Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier, Yann LeCun, Michael Rabbat, Nicolas Ballas

A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.
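
The abstract does not detail V-JEPA 2-AC's planner, but planning toward image goals with a latent world model is often done by sampling action sequences and minimizing latent distance to the goal, e.g., with the cross-entropy method (CEM). The sketch below is generic: encode and predict stand in for an encoder and an action-conditioned predictor, and every signature and dimension is an assumption of ours, not the released API.

    import numpy as np

    def cem_plan(encode, predict, current_obs, goal_image,
                 horizon=5, action_dim=7, samples=256, elites=32, iters=4):
        """Cross-entropy-method planning against a latent world model:
        sample action sequences, roll them out entirely in latent space,
        and refit the sampling distribution to the sequences whose final
        predicted latent lies closest to the goal's latent."""
        z0, z_goal = encode(current_obs), encode(goal_image)
        mu = np.zeros((horizon, action_dim))
        sigma = np.ones((horizon, action_dim))
        for _ in range(iters):
            actions = mu + sigma * np.random.randn(samples, horizon, action_dim)
            costs = []
            for seq in actions:
                z = z0
                for a in seq:  # latent rollout: no pixels are generated
                    z = predict(z, a)
                costs.append(np.linalg.norm(z - z_goal))
            elite = actions[np.argsort(costs)[:elites]]
            mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        return mu[0]  # execute the first planned action, then replan

    # Dummy stand-ins so the sketch runs end to end (7-dim "images"/latents).
    def encode(obs):
        return np.asarray(obs, dtype=float)

    def predict(z, a):
        return z + 0.1 * a

    first_action = cem_plan(encode, predict, np.zeros(7), np.ones(7))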
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

By: Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin

Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, only a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baseline methods across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing vision-driven, multi-agent cooperation in embodied AI systems.
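
The abstract names multi-level reward signals without defining them. One plausible minimal form is a weighted sum of per-level verifier outcomes; the checks, weights, and the LevelChecks container below are placeholders of ours, not VIKI-R's definition.

    from dataclasses import dataclass

    @dataclass
    class LevelChecks:
        """Outcomes of per-level verifiers; how each is computed (parsing the
        model output, comparing to references) is left abstract here."""
        valid_format: bool      # well-formed response, e.g., CoT plus answer tags
        agents_activated: bool  # level 1: the right subset of robots was chosen
        plan_correct: bool      # level 2: the action plan matches the reference
        trajectory_ok: bool     # level 3: perceived trajectory within tolerance

    def multi_level_reward(c: LevelChecks) -> float:
        # Placeholder weights; booleans are promoted to 0/1 in the sum.
        return (0.1 * c.valid_format + 0.3 * c.agents_activated
                + 0.3 * c.plan_correct + 0.3 * c.trajectory_ok)

    print(multi_level_reward(LevelChecks(True, True, False, False)))  # 0.4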
Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

By: Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi

Data science aims to extract insights from data to support decision-making processes. Large Language Models (LLMs) are increasingly being used as assistants for data science, suggesting ideas, techniques and small code snippets, or interpreting and reporting results. Proper automation of some data-science activities is now promised by the rise of LLM agents, i.e., AI systems powered by an LLM equipped with additional affordances--such as code execution and knowledge bases--that can perform self-directed actions and interact with digital environments. In this paper, we survey the evaluation of LLM assistants and agents for data science. We find (1) a dominant focus on a small subset of goal-oriented activities, largely ignoring data management and exploratory activities; (2) a concentration on pure assistance or fully autonomous agents, without considering intermediate levels of human-AI collaboration; and (3) an emphasis on human substitution, therefore neglecting the possibility of higher levels of automation thanks to task transformation.