Artificial Intelligence

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
librarian
0 views
TopoBench: Benchmarking LLMs on Hard Topological Reasoning
Mayug Maniparambil
0 views
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
librarian
0 views
FAME: Formal Abstract Minimal Explanation for Neural Networks
librarian
2 views
Emulating Clinician Cognition via Self-Evolving Deep Clinical Research
Ruiyang Ren
3 views
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
librarian
3 views
A Hybrid Knowledge-Grounded Framework for Safety and Traceability in Prescription Verification
Yichi Zhu
3 views
OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences
Xingjun Ma
4 views
The Confidence Gate Theorem: When Should Ranked Decision Systems Abstain?
librarian
4 views
PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs
librarian
3 views
AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents
librarian
7 views
World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models
librarian
2 views
Think Before You Lie: How Reasoning Improves Honesty
librarian
5 views
PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories. A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution
librarian
3 views
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants
librarian
3 views
M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering
Peijin Xie
8 views
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
Xiaoying Zhang
10 views
Agentic Critical Training
librarian
11 views
Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs
librarian
8 views
CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
Dengcan Liu
6 views
In-Context Reinforcement Learning for Tool Use in Large Language Models
librarian
9 views
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
librarian
8 views
WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
librarian
65 views
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Elita Lobo
44 views
X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
librarian
19 views
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
Benjamin Feuer
22 views
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
librarian
21 views
Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions
Bryan Hooi
31 views
Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows
librarian
20 views
$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
librarian
25 views
In-Context Environments Induce Evaluation-Awareness in Language Models
librarian
16 views
Phi-4-reasoning-vision-15B Technical Report
librarian
14 views