LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

By: Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning, even when the additional training data appears benign. In this paper, ...

Machine Learning · June 20, 2025, 1:42pm


RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation

By: Xinnuo Xu, Rachel Lawrence, Kshitij Dubey, Atharva Pandey, Risa Ueno, Fabian Falck, Aditya V. Nori, Rahul Sharma, Amit Sharma, Javier Gonzalez

Recent Large Language Models (LLMs) have reported high accuracy on reasoning benchmarks. However, it is still unclear whether the observed results arise from true reasoning or from statistical recall of the training set. Inspired by the ladder of causation (Pearl, 2009) and its three levels (associations, interventions and counterfactuals), this paper introduces RE-IMAGINE, a framework to characterize a hierarchy of reasoning ability in LLMs, ...

Computation and Language · June 19, 2025, 8:26am


CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization

By: Ranting Hu

Offline reinforcement learning (offline RL) algorithms often require additional constraints or penalty terms to address distribution shift issues, such as adding implicit or explicit policy constraints during policy optimization to reduce the estimation bias of functions. This paper focuses on a limitation of the Advantage-Weighted Regression family (AWRs), i.e., the potential for learning over-conservative policies due to data corruption, sp...

Machine Learning · June 19, 2025, 3:16am
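For readers unfamiliar with the AWR family referred to above, here is a minimal sketch of the standard advantage-weighted regression policy loss. It is not the CAWR method. It assumes advantages have already been estimated, and the names `awr_weights` and `awr_loss` and the defaults `beta=1.0` and `max_weight=20.0` are illustrative choices. Because every weight exp(A/beta) is positive, each dataset action is still imitated to some degree, which is one way to see why the learned policy stays tied to the behavior data, corrupted samples included.

```python
import math
from typing import List, Sequence


def awr_weights(advantages: Sequence[float], beta: float = 1.0,
                max_weight: float = 20.0) -> List[float]:
    """Exponentiated-advantage weights exp(A / beta), clipped at max_weight."""
    log_cap = math.log(max_weight)
    # Clip in log space before exponentiating so large advantages cannot overflow.
    return [math.exp(min(a / beta, log_cap)) for a in advantages]


def awr_loss(log_probs: Sequence[float], advantages: Sequence[float],
             beta: float = 1.0, max_weight: float = 20.0) -> float:
    """Negative advantage-weighted log-likelihood of dataset actions (lower is better)."""
    if len(log_probs) != len(advantages) or not log_probs:
        raise ValueError("need equally sized, non-empty inputs")
    weights = awr_weights(advantages, beta, max_weight)
    # Every weight is > 0, so every dataset action contributes to the regression target.
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Usage: a high-advantage action (A = 2.0) is weighted far above a low-advantage one (A = -2.0).
print(awr_weights([2.0, -2.0]))
print(awr_loss([-0.5, -1.5], [2.0, -2.0]))
```

Clipping the exponent rather than the exponentiated value gives the same cap while avoiding floating-point overflow for very large advantages.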


Over-squashing in Spatiotemporal Graph Neural Networks

By: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein

Graph Neural Networks (GNNs) have achieved remarkable success across various domains. However, recent theoretical advances have identified fundamental limitations in their information propagation capabilities, such as over-squashing, where distant nodes fail to effectively exchange information. While extensively studied in static contexts, this issue remains unexplored in Spatiotemporal GNNs (STGNNs), which process sequences associated with g...

Machine Learning · June 19, 2025, 3:01am


AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

By: Tevin Wang, Chenyan Xiong

Rule-based rewards offer a promising strategy for improving reinforcement learning from human feedback (RLHF), but current approaches often rely on manual rule engineering. We present AutoRule, a fully automated method for extracting rules from preference feedback and formulating them into rule-based rewards. AutoRule extraction operates in three stages: it leverages a reasoning model to interpret user preferences, identifies candidate rules ...

Machine Learning · June 19, 2025, 2:43am


RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation

By: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia

Retrieval-augmented generation (RAG) has become a common strategy for updating large language model (LLM) responses with current, external information. However, models may still rely on memorized training data, bypass the retrieved evidence, and produce contaminated outputs. We introduce Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects such behavior without requiring model access or retraining. RePCS compares two...

Machine Learning · June 19, 2025, 2:37am


SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

By: Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems...

Artificial Intelligence · June 19, 2025, 2:35am


Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

By: Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang

AI agents today are mostly siloed: they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning and action, but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-...

Artificial Intelligence · June 19, 2025, 2:34am


Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

By: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son

Since the advent of large language models, prompt engineering has enabled the rapid, low-effort creation of diverse autonomous agents that are already in widespread use. Yet this convenience raises urgent concerns about the safety, robustness, and behavioral consistency of the underlying prompts, along with the pressing challenge of preventing those prompts from being exposed through user attempts. In this paper, we propose the "Doppelgänger ...

Artificial Intelligence · June 19, 2025, 1:41am


Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning

By: Prabhav Sanga, Jaskaran Singh, Arun K. Dubey

As machine learning systems increasingly rely on data subject to privacy regulation, selectively unlearning specific information from trained models has become essential. In image classification, this involves removing the influence of particular training samples, semantic classes, or visual styles without full retraining. We introduce Forget-Aligned Model Reconstruction (FAMR), a theoretically grounded and computationally efficient ...

Machine Learning · June 18, 2025, 8:56am


GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

By: Jingqi Yang, Zhilong Song, Jiawei Chen, Mingli Song, Sheng Zhou, Linjun Sun, Xiaogang Ouyang, Chun Chen, Can Wang

The development of high-quality datasets is crucial for benchmarking and advancing research in Graphical User Interface (GUI) agents. Despite their importance, existing datasets are often constructed under idealized conditions, overlooking the diverse anomalies frequently encountered in real-world deployments. To address this limitation, we introduce GUI-Robust, a novel dataset designed for comprehensive GUI agent evaluation, explicitly incor...

Artificial Intelligence · June 18, 2025, 4:57am


Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization

By: Ziyu Gong, Jim Lim, David I. Inouye

Distribution matching (DM) is a versatile domain-invariant representation learning technique that has been applied to tasks such as fair classification, domain adaptation, and domain translation. Non-parametric DM methods struggle with scalability and adversarial DM approaches suffer from instability and mode collapse. While likelihood-based methods are a promising alternative, they often impose unnecessary biases through fixed priors or requ...

Machine Learning · June 18, 2025, 2:57am


TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

By: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

Recent advancements in reinforcement learning from human feedback have shown that utilizing fine-grained token-level reward models can substantially enhance the performance of Proximal Policy Optimization (PPO) in aligning large language models. However, it is challenging to leverage such token-level reward as guidance for Direct Preference Optimization (DPO), since DPO is formulated as a sequence-level bandit problem. To address this challen...

Machine Learning · June 18, 2025, 2:53am
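The abstract's point that DPO is formulated at the sequence level can be seen in the standard DPO loss. The sketch below shows that standard objective, not TGDPO. It assumes per-token log-probabilities from the policy and a frozen reference model are already available; the names `dpo_loss` and `_softplus` and the default `beta=0.1` are illustrative. Token log-probabilities are summed into one score per response before they enter the loss, so no per-token reward signal survives, which is the gap token-level guidance aims to fill.

```python
import math
from typing import Sequence


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen: Sequence[float], policy_rejected: Sequence[float],
             ref_chosen: Sequence[float], ref_rejected: Sequence[float],
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, from per-token log-probabilities."""
    # Each response collapses to a single sequence-level log-ratio: token detail is summed away.
    chosen_ratio = sum(policy_chosen) - sum(ref_chosen)
    rejected_ratio = sum(policy_rejected) - sum(ref_rejected)
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)) == softplus(-logits)
    return _softplus(-logits)


# Usage: identical policy and reference give logits = 0, i.e. a loss of log(2).
same = [-1.0, -2.0]
print(dpo_loss(same, same, same, same))
# Reordering tokens leaves the loss unchanged, since only the sums matter.
print(dpo_loss([-0.5, -1.5], [-3.0], [-1.0, -1.0], [-2.0]))
print(dpo_loss([-1.5, -0.5], [-3.0], [-1.0, -1.0], [-2.0]))
```

The last two calls differ only in token order and return the same value, which makes the sequence-level nature of the objective explicit.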


Towards Desiderata-Driven Design of Visual Counterfactual Explainers

By: Sidney Bender, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon

Visual counterfactual explainers (VCEs) are a straightforward and promising approach to enhancing the transparency of image classifiers. VCEs complement other types of explanations, such as feature attribution, by revealing the specific data transformations to which a machine learning model responds most strongly. In this paper, we argue that existing VCEs focus too narrowly on optimizing sample quality or change minimality; they fail to cons...

Machine Learning · June 18, 2025, 2:53am


On the Hardness of Bandit Learning

By: Nataly Brukhim, Aldo Pacchiano, Miroslav Dudik, Robert Schapire

We study the task of bandit learning, also known as best-arm identification, under the assumption that the true reward function f belongs to a known, but arbitrary, function class F. We seek a general theory of bandit learnability, akin to the PAC framework for classification. Our investigation is guided by the following two questions: (1) which classes F are learnable, and (2) how they are learnable. For example, in the case of binary PAC cl...

Machine Learning · June 18, 2025, 2:52am


From Points to Places: Towards Human Mobility-Driven Spatiotemporal Foundation Models via Understanding Places

By: Mohammad Hashemi, Andreas Zufle

Capturing human mobility is essential for modeling how people interact with and move through physical spaces, reflecting social behavior, access to resources, and dynamic spatial patterns. To support scalable and transferable analysis across diverse geographies and contexts, there is a need for a generalizable foundation model for spatiotemporal data. While foundation models have transformed language and vision, they remain limited in handlin...

Artificial Intelligence · June 18, 2025, 2:52am


AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

By: Jiahao Qiu, Xinzhe Juan, Yimin Wang, Ling Yang, Xuan Qi, Tongcheng Zhang, Jiacheng Guo, Yifu Lu, Zixin Yao, Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, Mengdi Wang

While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teacher tool usage, but they often struggle to train...

Artificial Intelligence · June 18, 2025, 2:51am