LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

By: Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning, even when the additional training data appears benign. In this paper,... more

Machine Learning | June 20, 2025 1:42pm

CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization

By: Ranting Hu

Offline reinforcement learning (offline RL) algorithms often require additional constraints or penalty terms to address distribution shift issues, such as adding implicit or explicit policy constraints during policy optimization to reduce the estimation bias of functions. This paper focuses on a limitation of the Advantage-Weighted Regression family (AWRs), i.e., the potential for learning over-conservative policies due to data corruption, sp... more

Machine Learning | June 19, 2025 3:16am
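
For context on the CAWR entry above: methods in the Advantage-Weighted Regression family fit a policy by weighted maximum likelihood, where each dataset action is weighted by the exponential of its advantage divided by a temperature. The sketch below shows only that standard weighting, not CAWR's corruption-averse variant, which the truncated abstract does not describe. The function names and the `beta` and `max_weight` parameters are illustrative assumptions.

```python
import math


def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Standard AWR weights exp(A / beta), capped at max_weight.

    The exponent is clipped before exponentiation so that very large
    advantages cannot overflow. `beta` and `max_weight` are assumed
    hyperparameter names, not taken from the CAWR paper.
    """
    log_cap = math.log(max_weight)
    return [math.exp(min(a / beta, log_cap)) for a in advantages]


def awr_loss(log_probs, advantages, beta=1.0, max_weight=20.0):
    """Weighted negative log-likelihood of the dataset actions."""
    weights = awr_weights(advantages, beta, max_weight)
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)
```

Because the weight grows exponentially with the advantage, a handful of corrupted samples with extreme advantage estimates could dominate the loss, which is one reason to cap the weights in a sketch like this.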

Over-squashing in Spatiotemporal Graph Neural Networks

By: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein

Graph Neural Networks (GNNs) have achieved remarkable success across various domains. However, recent theoretical advances have identified fundamental limitations in their information propagation capabilities, such as over-squashing, where distant nodes fail to effectively exchange information. While extensively studied in static contexts, this issue remains unexplored in Spatiotemporal GNNs (STGNNs), which process sequences associated with g... more

Machine Learning | June 19, 2025 3:01am

AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

By: Tevin Wang, Chenyan Xiong

Rule-based rewards offer a promising strategy for improving reinforcement learning from human feedback (RLHF), but current approaches often rely on manual rule engineering. We present AutoRule, a fully automated method for extracting rules from preference feedback and formulating them into rule-based rewards. AutoRule extraction operates in three stages: it leverages a reasoning model to interpret user preferences, identifies candidate rules ... more

Machine Learning | June 19, 2025 2:43am

RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation

By: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia

Retrieval-augmented generation (RAG) has become a common strategy for updating large language model (LLM) responses with current, external information. However, models may still rely on memorized training data, bypass the retrieved evidence, and produce contaminated outputs. We introduce Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects such behavior without requiring model access or retraining. RePCS compares two... more

Machine Learning | June 19, 2025 2:37am

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

By: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

Recent advancements in reinforcement learning from human feedback have shown that utilizing fine-grained token-level reward models can substantially enhance the performance of Proximal Policy Optimization (PPO) in aligning large language models. However, it is challenging to leverage such token-level reward as guidance for Direct Preference Optimization (DPO), since DPO is formulated as a sequence-level bandit problem. To address this challen... more

Machine Learning | June 18, 2025 2:53am
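
To illustrate why the TGDPO entry above calls DPO a sequence-level problem: in the standard DPO loss, per-token log-probabilities are summed into one scalar per response before they enter the objective, so no individual token receives its own reward signal. The following is a minimal sketch of that standard loss only, not of TGDPO's token-level guidance, which the truncated abstract does not spell out. The function and argument names are hypothetical.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a list of per-token log-probabilities. They are summed
    to a single sequence-level log-probability, which is why token-level
    rewards have no direct place in this formulation.
    """
    chosen_log_ratio = sum(policy_chosen) - sum(ref_chosen)
    rejected_log_ratio = sum(policy_rejected) - sum(ref_rejected)
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin)
    return _softplus(-margin)
```

When the policy equals the reference model, the margin is zero and the loss is log 2; the loss drops below that as the policy raises the chosen response's likelihood relative to the rejected one.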

Towards Desiderata-Driven Design of Visual Counterfactual Explainers

By: Sidney Bender, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon

Visual counterfactual explainers (VCEs) are a straightforward and promising approach to enhancing the transparency of image classifiers. VCEs complement other types of explanations, such as feature attribution, by revealing the specific data transformations to which a machine learning model responds most strongly. In this paper, we argue that existing VCEs focus too narrowly on optimizing sample quality or change minimality; they fail to cons... more

Machine Learning | June 18, 2025 2:53am

TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning

By: Junru Zhang, Lang Feng, Xu Guo, Yuhan Wu, Yabo Dong, Duanqing Xu

Time-series reasoning remains a significant challenge in multimodal large language models (MLLMs) due to the dynamic temporal patterns, ambiguous semantics, and lack of temporal priors. In this work, we introduce TimeMaster, a reinforcement learning (RL)-based method that enables time-series MLLMs to perform structured, interpretable reasoning directly over visualized time-series inputs and task prompts. TimeMaster adopts a three-part structu... more

Machine Learning | June 17, 2025 2:39am

Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs

By: Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

Large Language Models (LLMs) are central to many contemporary AI applications, yet their extensive parameter counts pose significant challenges for deployment in memory- and compute-constrained environments. Recent works in eXplainable AI (XAI), particularly on attribution methods, suggest that interpretability can also enable model compression by identifying and removing components irrelevant to inference. In this paper, we leverage Layer-wi... more

Machine Learning | June 17, 2025 2:38am

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

By: Zhaiming Shen, Alexander Hsu, Rongjie Lai, Wenjing Liao

While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. By establishing a novel connection between the attention mechanism and classical kernel methods, we derive generalization error b... more

Machine Learning | June 13, 2025 4:11am

Self-Adapting Language Models

By: Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, Pulkit Agrawal

Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. Given a new input, the model produces a self-edit, a generation that may restructure the information in different ways, specify optimization hyperparame... more

Machine Learning | June 13, 2025 2:52am

Farseer: A Refined Scaling Law in Large Language Models

By: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achiev... more

Machine Learning | June 13, 2025 2:51am

Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning

By: Julius Berner, Miguel Liu-Schiaffini, Jean Kossaifi, Valentin Duruisseaux, Boris Bonev, Kamyar Azizzadenesheli, Anima Anandkumar

A wide range of scientific problems, such as those described by continuous-time dynamical systems and partial differential equations (PDEs), are naturally formulated on function spaces. While function spaces are typically infinite-dimensional, deep learning has predominantly advanced through applications in computer vision and natural language processing that focus on mappings between finite-dimensional spaces. Such fundamental disparities in... more

Machine Learning | June 13, 2025 2:51am

Canonical Latent Representations in Conditional Diffusion Models

By: Yitao Xu, Tong Zhang, Ehsan Pajouheshgar, Sabine Süsstrunk

Conditional diffusion models (CDMs) have shown impressive performance across a range of generative tasks. Their ability to model the full data distribution has opened new avenues for analysis-by-synthesis in downstream discriminative learning. However, this same modeling capacity causes CDMs to entangle the class-defining features with irrelevant context, posing challenges to extracting robust and interpretable representations. To this end, w... more

Machine Learning | June 12, 2025 2:34am

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

By: Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen

Autoregressive Large Language Models (AR-LLMs) frequently exhibit implicit parallelism in sequential generation. Inspired by this, we introduce Multiverse, a new generative model that enables natively parallel generation. Multiverse internalizes a MapReduce paradigm, generating automatically through three stages: (i) a Map stage for adaptive task decomposition, (ii) a Process stage for parallel subtask execution, and (iii) a Reduce stage for ... more

Machine Learning | June 12, 2025 2:33am
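
The three-stage Map, Process, Reduce structure named in the Multiverse entry above follows the general MapReduce pattern, which can be sketched in ordinary code. This is only the generic pattern, not the Multiverse model: the fixed-size chunking stands in for the adaptive decomposition the abstract mentions, and because the abstract is cut off before it describes the Reduce stage, the summing merge here is purely an assumption. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def map_stage(task, chunk_size):
    """Split a task (here, a list) into independent subtasks.

    Fixed-size chunking is a stand-in for adaptive decomposition.
    """
    return [task[i:i + chunk_size] for i in range(0, len(task), chunk_size)]


def process_stage(subtasks, worker):
    """Run the worker on every subtask in parallel, preserving order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(worker, subtasks))


def reduce_stage(partials):
    """Merge partial results; summation is an assumed merge rule."""
    return sum(partials)


def sum_of_squares(task, chunk_size=2):
    """Toy end-to-end run: sum of squares computed chunk by chunk."""
    subtasks = map_stage(task, chunk_size)
    partials = process_stage(subtasks, lambda chunk: sum(x * x for x in chunk))
    return reduce_stage(partials)
```

For example, `sum_of_squares([1, 2, 3, 4, 5])` splits the list into three chunks, squares and sums each chunk in a worker thread, then adds the three partial sums.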

Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

By: Tim Z. Xiao, Johannes Zenn, Zhen Liu, Weiyang Liu, Robert Bamler, Bernhard Schölkopf

Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Ve... more

Machine Learning | June 12, 2025 2:32am
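
As background for the entry above, whose title refers to rejection sampling: classical rejection sampling can turn draws from a biased coin into faithful draws from a target Bernoulli distribution. The sketch below is the classical algorithm in plain code, not the paper's verbalized variant, which the truncated abstract does not describe. It assumes the proposal coin's bias `q_proposal` is known and lies strictly between 0 and 1; all names are illustrative.

```python
import random


def _pmf(p, x):
    """Bernoulli probability mass: P(X = x) with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


def rejection_sample_bernoulli(p_target, q_proposal, rng):
    """Draw one sample from Bernoulli(p_target) using a Bernoulli(q_proposal) coin.

    Assumes 0 < q_proposal < 1. The envelope constant m bounds the ratio
    target / proposal over both outcomes, so the acceptance probability
    never exceeds 1.
    """
    m = max(p_target / q_proposal, (1.0 - p_target) / (1.0 - q_proposal))
    while True:
        x = 1 if rng.random() < q_proposal else 0
        accept_prob = _pmf(p_target, x) / (m * _pmf(q_proposal, x))
        if rng.random() < accept_prob:
            return x


def empirical_rate(p_target, q_proposal, n, seed=0):
    """Fraction of ones among n accepted samples."""
    rng = random.Random(seed)
    draws = [rejection_sample_bernoulli(p_target, q_proposal, rng) for _ in range(n)]
    return sum(draws) / n
```

Each accepted outcome appears with probability proportional to its target mass, so a heavily biased proposal (say 0.8) still yields a fair coin when the target is 0.5, at the cost of discarding some draws.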