RESEARCH NOTES & CODES

RESEARCH ENTRIES

Language Models That Think, Chat Better

10.20.2025

This research introduces Reinforcement Learning with Model-rewarded Thinking (RLMT), a technique that trains language models to generate detailed Chain-of-Thought reasoning before answering, which improves their performance on open-ended chat and creative tasks.

Reinforcement Learning, LLM Alignment, Chain-of-Thought, RLHF, Generalization, GRPO

ARK-V1: An LLM-Agent for Knowledge Graph Question Answering Requiring Commonsense Reasoning

10.20.2025

This research introduces ARK-V1, an LLM agent that iteratively explores Knowledge Graphs to answer complex natural language queries, particularly those requiring a blend of factual and commonsense knowledge.

LLM Agent, Knowledge Graph, Question Answering, Commonsense Reasoning, Multi-Hop Reasoning

Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning

10.13.2025

This research introduces PDDL-INSTRUCT, an instruction tuning framework that improves the ability of Large Language Models (LLMs) to perform structured symbolic planning by integrating explicit logical Chain-of-Thought (CoT) reasoning and external verification feedback.

Symbolic Planning, Large Language Models, Instruction Tuning, Chain-of-Thought, PDDL