Updated on 2026/04/02 10:25:36
Trading
π 2026/03/30
- Model Predictive Control For Trade Execution
-
Thomas P. McAuliffe et al. 2603.28898v1 -
Abstract
We address the problem of executing large client orders in continuous double-auction markets under time and liquidity constraints. We propose a model predictive control (MPC) framework that balances three competing objectives: order completion, market impact, and opportunity cost. Our algorithm is guided by a trading schedule (such as time-weighted average price or volume-weighted average price) but allows for deviations to reduce the expected execution cost, with due regard to risk. Our MPC algorithm executes the order progressively, and at each decision step it solves a fast quadratic program that trades off expected transaction cost against schedule deviation, while incorporating a residual cost term derived from a simple base policy. Approximate schedule adherence is maintained through explicit bounds, while variance constraints on deviation provide direct risk control. The resulting system is modular, data-driven, and suitable for deployment in production trading infrastructure. Using six months of NASDAQ 'level 3' data and simulated orders, we show that our MPC approach reduces schedule shortfall by approximately 40-50% relative to spread-crossing benchmarks and achieves significant reductions in slippage. Moreover, augmenting the base policy with predictive price information further enhances performance, highlighting the framework's flexibility for integration with forecasting components.
-
π 2026/03/26
- Shifting Correlations: How Trade Policy Uncertainty Alters stock-T bill Relationships
-
Demetrio Lacava 2603.25285v1 -
Abstract
This paper examines how trade policy uncertainty influences the correlation between U.S. stock indices and short-term government bonds. The objective is to assess whether policy-related shocks, especially those linked to trade tensions, alter the traditional stock-T bill relationship and its implications for investors. We extend the Dynamic Conditional Correlation (DCC) framework by incorporating exogenous variables to account for external shocks. Three specifications are analyzed: one using the Trade Policy Uncertainty (TPU) index, one including a dummy variable reflecting presidential-cycle effects, and one combining both through an interaction term. The analysis is based on daily data for major U.S. stock indices and the 3-month Treasury bill. Results indicate that trade policy uncertainty exerts a significant effect on stock-T bill correlations. Moreover, its influence becomes stronger under specific political conditions, suggesting that political agendas can amplify the impact of trade-related shocks on financial markets. Crucially, augmenting the DCC framework with trade-policy-related variables improves also the economic relevance of correlation forecasts. Therefore, this study contributes to the literature by explicitly integrating policy-related uncertainty into correlation modeling through an augmented DCC framework. The findings provide new insights for portfolio allocation and risk management in environments characterized by heightened trade tensions.
-
π 2026/03/22
- FinRL-X: An AI-Native Modular Infrastructure for Quantitative Trading
-
Hongyang Yang et al. 2603.21330v1 -
Abstract
We present FinRL-X, a modular and deployment-consistent trading architecture that unifies data processing, strategy construction, backtesting, and broker execution under a weight-centric interface. While existing open-source platforms are often backtesting- or model-centric, they rarely provide system-level consistency between research evaluation and live deployment. FinRL-X addresses this gap through a composable strategy pipeline that integrates stock selection, portfolio allocation, timing, and portfolio-level risk overlays within a unified protocol. The framework supports both rule-based and AI-driven components, including reinforcement learning allocators and LLM-based sentiment signals, without altering downstream execution semantics. FinRL-X provides an extensible foundation for reproducible, end-to-end quantitative trading research and deployment. The official FinRL-X implementation is available at https://github.com/AI4Finance-Foundation/FinRL-Trading.
-
π 2026/03/20
- Decomposable Reward Modeling and Realistic Environment Design for Reinforcement Learning-Based Forex Trading
-
Nabeel Ahmad Saidd 2604.00031v1 -
Abstract
Applying reinforcement learning (RL) to foreign exchange (Forex) trading remains challenging because realistic environments, well-defined reward functions, and expressive action spaces must be satisfied simultaneously, yet many prior studies rely on simplified simulators, single scalar rewards, and restricted action representations, limiting both interpretability and practical relevance. This paper presents a modular RL framework designed to address these limitations through three tightly integrated components: a friction-aware execution engine that enforces strict anti-lookahead semantics, with observations at time t, execution at time t+1, and mark-to-market at time t+1, while incorporating realistic costs such as spread, commission, slippage, rollover financing, and margin-triggered liquidation; a decomposable 11-component reward architecture with fixed weights and per-step diagnostic logging to enable systematic ablation and component-level attribution; and a 10-action discrete interface with legal-action masking that encodes explicit trading primitives while enforcing margin-aware feasibility constraints. Empirical evaluation on EURUSD focuses on learning dynamics rather than generalization and reveals strongly non-monotonic reward interactions, where additional penalties do not reliably improve outcomes; the full reward configuration achieves the highest training Sharpe (0.765) and cumulative return (57.09 percent). The expanded action space increases return but also turnover and reduces Sharpe relative to a conservative 3-action baseline, indicating a return-activity trade-off under a fixed training budget, while scaling-enabled variants consistently reduce drawdown, with the combined configuration achieving the strongest endpoint performance.
-
π 2026/03/18
- Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization
-
Joohyoung Jeon et al. 2603.17692v1 -
Abstract
For LLM trading agents to be genuinely trustworthy, they must demonstrate understanding of market dynamics rather than exploitation of memorized ticker associations. Building responsible multi-agent systems demands rigorous signal validation: proving that predictions reflect legitimate patterns, not pre-trained recall. We address two sources of spurious performance: memorization bias from ticker-specific pre-training, and survivorship bias from flawed backtesting. Our approach is to blindfold the agents--anonymizing all identifiers--and verify whether meaningful signals persist. BlindTrade anonymizes tickers and company names, and four LLM agents output scores along with reasoning. We construct a GNN graph from reasoning embeddings and trade using PPO-DSR policy. On 2025 YTD (through 2025-08-01), we achieved Sharpe 1.40 +/- 0.22 across 20 seeds and validated signal legitimacy through negative control experiments. To assess robustness beyond a single OOS window, we additionally evaluate an extended period (2024--2025), revealing market-regime dependency: the policy excels in volatile conditions but shows reduced alpha in trending bull markets.
-
π 2026/03/04
- Quantum-Assisted Optimal Rebalancing with Uncorrelated Asset Selection for Algorithmic Trading Walk-Forward QUBO Scheduling via QAOA
-
Abraham Itzhak Weinberg 2603.16904v1 -
Abstract
We present a hybrid classical-quantum framework for portfolio construction and rebalancing. Asset selection is performed using Ledoit-Wolf shrinkage covariance estimation combined with hierarchical correlation clustering to extract n = 10 decorrelated stocks from the S&P 500 universe without survivorship bias. Portfolio weights are optimised via an entropy-regularised Genetic Algorithm (GA) accelerated on GPU, alongside closed-form minimum-variance and equal-weight benchmarks. Our primary contribution is the formulation of the portfolio rebalancing schedule as a Quadratic Unconstrained Binary Optimisation (QUBO) problem. The resulting combinatorial optimisation task is solved using the Quantum Approximate Optimisation Algorithm (QAOA) within a walk-forward framework designed to eliminate lookahead bias. This approach recasts dynamic rebalancing as a structured binary scheduling problem amenable to variational quantum methods. Backtests on S&P 500 data (training: 2010-2024; out-of-sample test: 2025, n = 249 trading days) show that the GA + QAOA strategy attains a Sharpe ratio of 0.588 and total return of 10.1%, modestly outperforming the strongest classical baseline (GA with 10-day periodic rebalancing, Sharpe 0.575) while executing 8 rebalances versus 24, corresponding to a 44.5% reduction in transaction costs. Multi-restart QAOA (4096 measurement shots per run) exhibits concentrated probability mass on high-quality schedules, indicating stable convergence of the variational procedure. These findings suggest that hybrid classical-quantum architectures can reduce turnover in portfolio rebalancing while preserving competitive risk-adjusted performance, providing a structured testbed for near-term quantum optimisation in financial applications.
-
π 2026/02/27
- TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure
-
Maxime Kawawa-Beaudan et al. 2602.23784v1 -
Abstract
Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions of trade events across >9K equities. To enable cross-asset generalization, we develop scale-invariant features and a universal tokenization scheme that map the heterogeneous, multi-modal event stream of order flow into a unified discrete sequence -- eliminating asset-specific calibration. Integrated with a deterministic market simulator, TradeFM-generated rollouts reproduce key stylized facts of financial returns, including heavy tails, volatility clustering, and absence of return autocorrelation. Quantitatively, TradeFM achieves 2-3x lower distributional error than Compound Hawkes baselines and generalizes zero-shot to geographically out-of-distribution APAC markets with moderate perplexity degradation. Together, these results suggest that scale-invariant trade representations capture transferable structure in market microstructure, opening a path toward synthetic data generation, stress testing, and learning-based trading agents.
-
π 2026/02/26
- Toward Expert Investment Teams:A Multi-Agent LLM System with Fine-Grained Trading Tasks
-
Kunihiro Miyazaki et al. 2602.23330v1 -
Abstract
The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and less transparent decision-making. Therefore, we propose a multi-agent LLM trading framework that explicitly decomposes investment analysis into fine-grained tasks, rather than providing coarse-grained instructions. We evaluate the proposed framework using Japanese stock data, including prices, financial statements, news, and macro information, under a leakage-controlled backtesting setting. Experimental results show that fine-grained task decomposition significantly improves risk-adjusted returns compared to conventional coarse-grained designs. Crucially, further analysis of intermediate agent outputs suggests that alignment between analytical outputs and downstream decision preferences is a critical driver of system performance. Moreover, we conduct standard portfolio optimization, exploiting low correlation with the stock index and the variance of each system's output. This approach achieves superior performance. These findings contribute to the design of agent structure and task configuration when applying LLM agents to trading systems in practical settings.
-
π 2026/02/24
- An Infinite-Dimensional Insider Trading Game
-
Christian Keller et al. 2602.21125v3 -
Abstract
We generalize the seminal framework of Kyle (1985) to a many-asset setting, bridging the gap between informed-trading theory and modern trading practices. Specifically, we formulate an infinite-dimensional Bayesian trading game in which the informed trader's private information may concern arbitrary aspects of the cross-sectional payoff structure across a continuum of traded assets. In this general setting, we obtain a parsimonious equilibrium characterized by a single scalar fixed point, which yields closed-form characterizations of equilibrium trading strategy, price impact within and across markets, and the information efficiency of equilibrium prices.
-
π 2026/02/23
- Detecting and Explaining Unlawful Insider Trading: A Shapley Value and Causal Forest Approach to Identifying Key Drivers and Causal Relationships
-
Krishna Neupane et al. 2602.19841v1 -
Abstract
Corporate insiders trade for diverse reasons, often possessing Material Non-Public Information (MNPI). Determining whether specific trades leverage MNPI is a significant challenge due to inherent complexity. This study focuses on two critical objectives: accurately detecting Unlawful Insider Trading (UIT) and identifying key features explaining classification. The analysis demonstrates how combining Shapley Values (SHAP) and Causal Forest (CF) reveals these explanatory drivers. The findings underscore the necessity of causality in identifying and interpreting UIT, requiring the consideration of alternative scenarios and potential outcomes. Within a high-dimensional feature space, the proposed architecture integrates state-of-the-art techniques to achieve high classification accuracy. The framework provides robust feature rankings via SHAP and causal significance assessments through CF, facilitating the discovery of unique causal relationships. Statistically significant relationships are documented between the outcome and several key features, including director status, price-to-book ratio, return, and market beta. These features significantly influence the likelihood of UIT, suggesting potential links between insider behavior and factors such as information asymmetry, valuation risk, market volatility, and stock performance. The analysis draws attention to the complexities of financial causality, noting that while initial descriptors offer intuitive insights, deeper examination is required to understand nuanced impacts. These findings reaffirm the architectural flexibility of decision tree models. By incorporating heterogeneity during tree construction, these models effectively uncover latent structures within trade, finance, and governance data, characterizing fraudulent behavior while maintaining reliable results.
-
π 2026/02/21
- Overreaction as an indicator for momentum in algorithmic trading: A Case of AAPL stocks
-
Szymon Lis et al. 2602.18912v1 -
Abstract
This paper investigates whether short-term market overreactions can be systematically predicted and monetized as momentum signals using high-frequency emotional information and modern machine learning methods. Focusing on Apple Inc. (AAPL), we construct a comprehensive intraday dataset that combines volatility normalized returns with transformer-based emotion features extracted from Twitter messages. Overreactions are defined as extreme return realizations relative to contemporaneous volatility and transaction costs and are modeled as a three-class prediction problem. We evaluate the performance of several nonlinear classifiers, including XGBoost, Random Forests, Deep Neural Networks, and Bidirectional LSTMs, across multiple intraday frequencies (1, 5, 10, and 15 minute data). Model outputs are translated into trading strategies and assessed using risk-adjusted performance measures and formal statistical tests. The results show that machine learning models significantly outperform benchmark overreaction rules at ultra short horizons, while classical behavioral momentum effects dominate at intermediate frequencies, particularly around 10 minutes. Explainability analysis based on SHAP reveals that volatility and negative emotions, especially fear and sadness, play a central role in driving predicted overreactions. Overall, the findings demonstrate that emotion-driven overreactions contain a predictable structure that can be exploited by machine learning models, offering new insights into the behavioral origins of intraday momentum and the interaction between sentiment, volatility, and algorithmic trading.
-
π 2026/02/11
- Trading in CEXs and DEXs with Priority Fees and Stochastic Delays
-
Philippe Bergault et al. 2602.10798v2 -
Abstract
We develop a mixed control framework that combines absolutely continuous controls with impulse interventions subject to stochastic execution delays. The model extends current impulse control formulations by allowing (i) the controller to choose the mean of the stochastic delay of their impulses, and allowing (ii) for multiple pending orders, so that several impulses can be submitted and executed asynchronously at random times. The framework is motivated by an optimal trading problem between centralized (CEX) and decentralized (DEX) exchanges. In DEXs, traders control the distribution of the execution delay through the priority fee paid, introducing a fundamental trade-off between delays, uncertainty, and costs. We study the optimal trading problem of an agent exploiting trading signals in CEXs and DEXs. From a mathematical perspective, we derive the associated dynamic programming principle of this new class of impulse control problems, and establish the viscosity properties of the corresponding quasi-variational inequalities. From a financial perspective, our model provides insights on how to carry out execution across CEXs and DEXs, highlighting how traders manage latency risk optimally through priority fee selection. We show that employing the optimal priority fee has a significant outperformance over non-strategic fee selection.
-
- A novel approach to trading strategy parameter optimization using double out-of-sample data and walk-forward techniques
-
Tomasz Mroziewicz et al. 2602.10785v1 -
Abstract
This study introduces a novel approach to walk-forward optimization by parameterizing the lengths of training and testing windows. We demonstrate that the performance of a trading strategy using the Exponential Moving Average (EMA) evaluated within a walk-forward procedure based on the Robust Sharpe Ratio is highly dependent on the chosen window size. We investigated the strategy on intraday Bitcoin data at six frequencies (1 minute to 60 minutes) using 81 combinations of walk-forward window lengths (1 day to 28 days) over a 19-month training period. The two best-performing parameter sets from the training data were applied to a 21-month out-of-sample testing period to ensure data independence. The strategy was only executed once during the testing period. To further validate the framework, strategy parameters estimated on Bitcoin were applied to Binance Coin and Ethereum. Our results suggest the robustness of our custom approach. In the training period for Bitcoin, all combinations of walk-forward windows outperformed a Buy-and-Hold strategy. During the testing period, the strategy performed similarly to Buy-and-Hold but with lower drawdown and a higher Information Ratio. Similar results were observed for Binance Coin and Ethereum. The real strength was demonstrated when a portfolio combining Buy-and-Hold with our strategies outperformed all individual strategies and Buy-and-Hold alone, achieving the highest overall performance and a 50 percent reduction in drawdown. A conservative fee of 0.1 percent per transaction was included in all calculations. A cost sensitivity analysis was performed as a sanity check, revealing that the strategy's break-even point was around 0.4 percent per transaction. This research highlights the importance of optimizing walk-forward window lengths and emphasizing the value of single-time out-of-sample testing for reliable strategy evaluation.
-
π 2026/02/10
- AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
-
Wentao Zhang et al. 2602.18481v1 -
Abstract
The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge tests to interactive trading simulations. However, current evaluations of real-time trading performance overlook a critical failure mode: severe behavioral instability in sequential decision-making under uncertainty. We empirically show that LLM-based trading agents exhibit extreme run-to-run variance, inconsistent action sequences even under deterministic decoding, and irrational action flipping across adjacent time steps. These issues stem from stateless autoregressive architectures lacking persistent action memory, as well as sensitivity to continuous-to-discrete action mappings in portfolio allocation. As a result, many existing financial trading benchmarks produce unreliable, non-reproducible, and uninformative evaluations. To address these limitations, we propose AlphaForgeBench, a principled framework that reframes LLMs as quantitative researchers rather than execution agents. Instead of emitting trading actions, LLMs generate executable alpha factors and factor-based strategies grounded in financial reasoning. This design decouples reasoning from execution, enabling fully deterministic and reproducible evaluation while aligning with real-world quantitative research workflows. Experiments across multiple state-of-the-art LLMs show that AlphaForgeBench eliminates execution-induced instability and provides a rigorous benchmark for assessing financial reasoning, strategy formulation, and alpha discovery.
-
π 2026/02/04
- LLM as a Risk Manager: LLM Semantic Filtering for Lead-Lag Trading in Prediction Markets
-
Sumin Kim et al. 2602.07048v2 -
Abstract
Prediction markets provide a unique setting where event-level time series are directly tied to natural-language descriptions, yet discovering robust lead-lag relationships remains challenging due to spurious statistical correlations. We propose a hybrid two-stage causal screener to address this challenge: (i) a statistical stage that uses Granger causality to identify candidate leader-follower pairs from market-implied probability time series, and (ii) an LLM-based semantic stage that re-ranks these candidates by assessing whether the proposed direction admits a plausible economic transmission mechanism based on event descriptions. Because causal ground truth is unobserved, we evaluate the ranked pairs using a fixed, signal-triggered trading protocol that maps relationship quality into realized profit and loss (PnL). On Kalshi Economics markets, our hybrid approach consistently outperforms the statistical baseline. Across rolling evaluations, the win rate increases from 51.4% to 54.5%. Crucially, the average magnitude of losing trades decreases substantially from 649 USD to 347 USD. This reduction is driven by the LLM's ability to filter out statistically fragile links that are prone to large losses, rather than relying on rare gains. These improvements remain stable across different trading configurations, indicating that the gains are not driven by specific parameter choices. Overall, the results suggest that LLMs function as semantic risk managers on top of statistical discovery, prioritizing lead-lag relationships that generalize under changing market conditions.
-
π 2026/02/02
- Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation
-
Zeping Li et al. 2602.07023v2 -
Abstract
Recent works have increasingly applied Large Language Models (LLMs) as agents in financial stock market simulations to test if micro-level behaviors aggregate into macro-level phenomena. However, a crucial question arises: Do LLM agents' behaviors align with real market participants? This alignment is key to the validity of simulation results. To explore this, we select a financial stock market scenario to test behavioral consistency. Investors are typically classified as fundamental or technical traders, but most simulations fix strategies at initialization, failing to reflect real-world trading dynamics. In this work, we assess whether agents' strategy switching aligns with financial theory, providing a framework for this evaluation. We operationalize four behavioral-finance drivers-loss aversion, herding, wealth differentiation, and price misalignment-as personality traits set via prompting and stored long-term. In year-long simulations, agents process daily price-volume data, trade under a designated style, and reassess their strategy every 10 trading days. We introduce four alignment metrics and use Mann-Whitney U tests to compare agents' style-switching behavior with financial theory. Our results show that recent LLMs' switching behavior is only partially consistent with behavioral-finance theories, highlighting the need for further refinement in aligning agent behavior with financial theory.
-
π 2026/01/29
- Trade uncertainty impact on stock-bond correlations: Insights from conditional correlation models
-
Demetrio Lacava et al. 2601.21447v1 -
Abstract
This paper investigates the impact of Trade Policy Uncertainty (TPU) on stock-bond correlation dynamics in the United States. Using daily data on major U.S. stock indices and the 10-year Treasury bond from 2015 to 2025, we estimate correlation within a two-step GARCH-based framework, relying on multivariate specifications, including Constant Conditional Correlation (CCC), Smooth Transition Conditional Correlation (STCC), and Dynamic Conditional Correlation (DCC) models. We extend these frameworks by incorporating TPU index and a presidential dummy to capture effects of trade uncertainty and government cycles. The findings show that constant correlation models are strongly rejected in favor of time-varying specifications. Both STCC and DCC models confirm TPU's central role in driving correlation dynamics, with significant differences across political regimes. DCC models augmented with TPU and political effects deliver the best in-sample fit and strongest forecasting performance, as measured by statistical and economic loss functions.
-
π 2026/01/28
- PredictionMarketBench: A SWE-bench-Style Framework for Backtesting Trading Agents on Prediction Markets
-
Avi Arora et al. 2602.00133v1 -
Abstract
Prediction markets offer a natural testbed for trading agents: contracts have binary payoffs, prices can be interpreted as probabilities, and realized performance depends critically on market microstructure, fees, and settlement risk. We introduce PredictionMarketBench, a SWE-bench-style benchmark for evaluating algorithmic and LLM-based trading agents on prediction markets via deterministic, event-driven replay of historical limit-order-book and trade data. PredictionMarketBench standardizes (i) episode construction from raw exchange streams (orderbooks, trades, lifecycle, settlement), (ii) an execution-realistic simulator with maker/taker semantics and fee modeling, and (iii) a tool-based agent interface that supports both classical strategies and tool-calling LLM agents with reproducible trajectories. We release four Kalshi-based episodes spanning cryptocurrency, weather, and sports. Baseline results show that naive trading agents can underperform due to transaction costs and settlement losses, while fee-aware algorithmic strategies remain competitive in volatile episodes.
-
π 2026/01/27
- Generating Alpha: A Hybrid AI-Driven Trading System Integrating Technical Analysis, Machine Learning and Financial Sentiment for Regime-Adaptive Equity Strategies
-
Varun Narayan Kannan Pillai et al. 2601.19504v1 -
Abstract
The intricate behavior patterns of financial markets are influenced by fundamental, technical, and psychological factors. During times of high volatility and regime shifts causes many traditional strategies like trend-following or mean-reversion to fail. This paper proposes a hybrid AI-based trading strategy that combines (1) trend-following and directional momentum capture via EMA and MACD, (2) detection of price normalization through mean-reversion using RSI and Bollinger Bands, (3) market psychological interpretation through sentiment analysis using FinBERT, (4) signal generation through machine learning using XGBoost and (5)dynamically adjusting exposure with market regime filtering based on volatility and return environments. The system achieved a final portfolio value of $235,492.83, yielding a return of 135.49% on initial investment over a period of 24 months. The hybrid model outperformed major benchmark indexes like S&P 500 and NASDAQ-100 over the same period showing strong flexibility and lower downside risk with superior profits validating the use of multi-modal AI in algorithmic trading.
-
π 2026/01/22
- The GT-Score: A Robust Objective Function for Reducing Overfitting in Data-Driven Trading Strategies
-
Alexander Sheppert 2602.00080v1 -
Abstract
Overfitting remains a critical challenge in data-driven financial modeling, where machine learning (ML) systems learn spurious patterns in historical prices and fail out of sample and in deployment. This paper introduces the GT-Score, a composite objective function that integrates performance, statistical significance, consistency, and downside risk to guide optimization toward more robust trading strategies. This approach directly addresses critical pitfalls in quantitative strategy development, specifically data snooping during optimization and the unreliability of statistical inference under non-normal return distributions. Using historical stock data for 50 S&P 500 companies spanning 2010-2024, we conduct an empirical evaluation that includes walk-forward validation with nine sequential time splits and a Monte Carlo study with 15 random seeds across three trading strategies. In walk-forward validation, GT-Score improves the generalization ratio (validation return divided by training return) by 98% relative to baseline objective functions. Paired statistical tests on Monte Carlo out-of-sample returns indicate statistically detectable differences between objective functions (p < 0.01 for comparisons with Sortino and Simple), with small effect sizes. These results suggest that embedding an anti-overfitting structure into the objective can improve the reliability of backtests in quantitative research. Reproducible code and processed result files are provided as supplementary materials.
-
π 2026/01/19
- A Learnable Wavelet Transformer for Long-Short Equity Trading and Risk-Adjusted Return Optimization
-
Shuozhe Li et al. 2601.13435v4 -
Abstract
Learning profitable intraday trading policies from financial time series is challenging due to heavy noise, non-stationarity, and strong cross-sectional dependence among related assets. We propose \emph{WaveLSFormer}, a learnable wavelet-based long-short Transformer that jointly performs multi-scale decomposition and return-oriented decision learning. Unlike standard time-series forecasting that optimizes prediction error and typically requires a separate position-sizing or portfolio-construction step, our model directly outputs a market-neutral long/short portfolio and is trained end-to-end on a trading objective with risk-aware regularization. Specifically, a learnable wavelet front-end generates low-/high-frequency components via an end-to-end trained filter bank, guided by spectral regularizers that encourage stable and well-separated frequency bands. To fuse multi-scale information, we introduce a low-guided high-frequency injection (LGHI) module that refines low-frequency representations with high-frequency cues while controlling training stability. The model outputs a portfolio of long/short positions that is rescaled to satisfy a fixed risk budget and is optimized directly with a trading objective and risk-aware regularization. Extensive experiments on five years of hourly data across six industry groups, evaluated over ten random seeds, demonstrate that WaveLSFormer consistently outperforms MLP, LSTM and Transformer backbones, with and without fixed discrete wavelet front-ends. On average in all industries, WaveLSFormer achieves a cumulative overall strategy return of $0.607 \pm 0.045$ and a Sharpe ratio of $2.157 \pm 0.166$, substantially improving both profitability and risk-adjusted returns over the strongest baselines.
-
π 2026/01/14
- Bayesian Robust Financial Trading with Adversarial Synthetic Market Data
-
Haochong Xia et al. 2601.17008v1 -
Abstract
Algorithmic trading relies on machine learning models to make trading decisions. Despite strong in-sample performance, these models often degrade when confronted with evolving real-world market regimes, which can shift dramatically due to macroeconomic changes-e.g., monetary policy updates or unanticipated fluctuations in participant behavior. We identify two challenges that perpetuate this mismatch: (1) insufficient robustness in existing policy against uncertainties in high-level market fluctuations, and (2) the absence of a realistic and diverse simulation environment for training, leading to policy overfitting. To address these issues, we propose a Bayesian Robust Framework that systematically integrates a macro-conditioned generative model with robust policy learning. On the data side, to generate realistic and diverse data, we propose a macro-conditioned GAN-based generator that leverages macroeconomic indicators as primary control variables, synthesizing data with faithful temporal, cross-instrument, and macro correlations. On the policy side, to learn robust policy against market fluctuations, we cast the trading process as a two-player zero-sum Bayesian Markov game, wherein an adversarial agent simulates shifting regimes by perturbing macroeconomic indicators in the macro-conditioned generator, while the trading agent-guided by a quantile belief network-maintains and updates its belief over hidden market states. The trading agent seeks a Robust Perfect Bayesian Equilibrium via Bayesian neural fictitious self-play, stabilizing learning under adversarial market perturbations. Extensive experiments on 9 financial instruments demonstrate that our framework outperforms 9 state-of-the-art baselines. In extreme events like the COVID, our method shows improved profitability and risk management, offering a reliable solution for trading under uncertain and shifting market dynamics.
-
π 2026/01/13
- Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning
-
Yichen Luo et al. 2601.08641v3 -
Abstract
Copy trading has become the dominant entry strategy in meme coin markets. However, due to the market's extremely illiquid and volatile nature, the strategy exposes an exploitable attack surface: adversaries deploy manipulative bots to front-run trades, conceal positions, and fabricate sentiment, systematically extracting value from naΓ―ve copiers at scale. Despite its prevalence, bot-driven manipulation remains largely unexplored, and no robust defensive framework exists. We propose a manipulation-resistant copy-trading system based on a multi-agent architecture powered by a multi-modal large language model (LLM) and chain-of-thought (CoT) reasoning. Our approach outperforms zero-shot and most statistic-driven baselines in prediction accuracy as well as all baselines in economic performance, achieving an average copier return of 3% per meme coin investment under realistic market frictions. Overall, our results demonstrate the effectiveness of agent-based defenses and predictability of trader profitability in adversarial meme coin markets, providing a practical foundation for robust copy trading.
-
π 2026/01/10
- Cross-Market Alpha: Testing Short-Term Trading Factors in the U.S. Market via Double-Selection LASSO
-
Jin Du et al. 2601.06499v1 -
Abstract
Current asset pricing research exhibits a significant gap: a lack of sufficient cross-market validation regarding short-term trading-based factors. Against this backdrop, the development of the Chinese A-share market which is characterized by its retail-investor dominance, policy sensitivity, and high-frequency active trading -- has given rise to specific short-term trading-based factors. This study systematically examines the universality of factors from the Alpha191 library in the U.S. market, addressing the challenge of high-dimensional factor screening through the double-selection LASSO algorithm an established method for cross-market, high-dimensional research. After controlling for 151 fundamental factors from the U.S. equity factor zoo, 17 Alpha191 factors selected by this procedure exhibit significant incremental explanatory power for the cross-section of U.S. stock returns at the 5% level. Together these findings demonstrate that short-term trading-based factors, originating from the unique structure of the Chinese A-share market, provide incremental information not captured by existing mainstream pricing models, thereby enhancing the explanation of cross-sectional return differences.
-
π 2026/01/09
- Utility-Weighted Forecasting and Calibration for Risk-Adjusted Decisions under Trading Frictions
-
Craig S Wright 2601.07852v1 -
Abstract
Forecasting accuracy is routinely optimised in financial prediction tasks even though investment and risk-management decisions are executed under transaction costs, market impact, capacity limits, and binding risk constraints. This paper treats forecasting as an econometric input to a constrained decision problem. A predictive distribution induces a decision rule through a utility objective combined with an explicit friction operator consisting of both a cost functional and a feasible-set constraint system. The econometric target becomes minimisation of expected decision loss net of costs rather than minimisation of prediction error. The paper develops a utility-weighted calibration criterion aligned to the decision loss and establishes sufficient conditions under which calibrated predictive distributions weakly dominate uncalibrated alternatives. An empirical study using a pre-committed nested walk-forward protocol on liquid equity index futures confirms the theory: the proposed utility-weighted calibration reduces realised decision loss by over 30\% relative to an uncalibrated baseline ($t$-stat -30.31) for loss differential and improves the Sharpe ratio from -3.62 to -2.29 during a drawdown regime. The mechanism is identified as a structural reduction in the frequency of binding constraints (from 16.0\% to 5.1\%), preventing the "corner solution" failures that characterize overconfident forecasts in high-friction environments.
-
π 2026/01/08
- Trading Electrons: Predicting DART Spread Spikes in ISO Electricity Markets
-
Emma Hubert et al. 2601.05085v2 -
Abstract
We study the problem of forecasting and optimally trading day-ahead versus real-time (DART) price spreads in U.S. wholesale electricity markets. Building on the framework of Galarneau-Vincent et al., we extend spike prediction from a single zone to a multi-zone setting and treat both positive and negative DART spikes within a unified statistical model. To translate directional signals into economically meaningful positions, we develop a structural and market-consistent price impact model based on day-ahead bid stacks. This yields closed-form expressions for the optimal vector of zonal INC/DEC quantities, capturing asymmetric buy/sell impacts and cross-zone congestion effects. When applied to NYISO, the resulting impact-aware strategy significantly improves the risk-return profile relative to unit-size trading and highlights substantial heterogeneity across markets and seasons.
-
π 2026/01/07
- Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification
-
Rui Sun et al. 2601.03948v2 -
Abstract
Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to achieve remarkable reasoning in domains like mathematics and coding, where verifiable rewards provide clear signals. However, extending this paradigm to financial decision is challenged by the market's stochastic nature: rewards are verifiable but inherently noisy, causing standard RL to degenerate into reward hacking. To address this, we propose Trade-R1, a model training framework that bridges verifiable rewards to stochastic environments via process-level reasoning verification. Our key innovation is a verification method that transforms the problem of evaluating reasoning over lengthy financial documents into a structured Retrieval-Augmented Generation (RAG) task. We construct a triangular consistency metric, assessing pairwise alignment between retrieved evidence, reasoning chains, and decisions to serve as a validity filter for noisy market returns. We explore two reward integration strategies: Fixed-effect Semantic Reward (FSR) for stable alignment signals, and Dynamic-effect Semantic Reward (DSR) for coupled magnitude optimization. Experiments on different country asset selection demonstrate that our paradigm reduces reward hacking, with DSR achieving superior cross-market generalization while maintaining the highest reasoning consistency.
-
π 2026/01/06
- Trading with market resistance and concave price impact
-
Nathan De Carvalho et al. 2601.03215v2 -
Abstract
We consider an optimal trading problem under a market impact model with endogenous market resistance generated by a sophisticated trader who (partially) detects metaorders and trades against them to exploit price overreactions induced by the order flow. The model features a concave transient impact driven by a power-law propagator with a resistance term responding to the trader's rate via a fixed-point equation involving a general resistance function. We derive a (non)linear stochastic Fredholm equation as the first-order optimality condition satisfied by optimal trading strategies. Existence and uniqueness of the optimal control are established when the resistance function is linear, and an existence result is obtained when it is strictly convex using coercivity and weak lower semicontinuity of the associated profit-and-loss functional. We also propose an iterative scheme to solve the nonlinear stochastic Fredholm equation and prove an exponential convergence rate. Numerical experiments confirm this behavior and illustrate optimal round-trip strategies under "buy" signals with various decay profiles and different market resistance specifications.
-
π 2025/12/21
- Needles in a haystack: using forensic network science to uncover insider trading
-
Gian Jaeger et al. 2512.18918v1 -
Abstract
Although the automation and digitisation of anti-financial crime investigation has made significant progress in recent years, detecting insider trading remains a unique challenge, partly due to the limited availability of labelled data. To address this challenge, we propose using a data-driven networks approach that flags groups of corporate insiders who report coordinated transactions that are indicative of insider trading. Specifically, we leverage data on 2.9 million trades reported to the U.S. Securities and Exchange Commission (SEC) by company insiders (C-suite executives, board members and major shareholders) between 2014 and 2024. Our proposed algorithm constructs weighted edges between insiders based on the temporal similarity of their trades over the 10-year timeframe. Within this network we then uncover trends that indicate insider trading by focusing on central nodes and anomalous subgraphs. To highlight the validity of our approach we evaluate our findings with reference to two null models, generated by running our algorithm on synthetic empirically calibrated and shuffled datasets. The results indicate that our approach can be used to detect pairs or clusters of insiders whose behaviour suggests insider trading and/or market manipulation.
-
π 2025/12/16
- Sources and Nonlinearity of High Volume Return Premium: An Empirical Study on the Differential Effects of Investor Identity versus Trading Intensity (2020-2024)
-
Sungwoo Kang 2512.14134v2 -
Abstract
Chae and Kang (2019, \textit{Pacific-Basin Finance Journal}) documented a puzzling Low Volume Return Premium (LVRP) in Korea -- contradicting global High Volume Return Premium (HVRP) evidence. We resolve this puzzle. Using Korean market data (2020-2024), we demonstrate that HVRP exists in Korea but is masked by (1) pooling heterogeneous investor types and (2) using inappropriate intensity normalization. When institutional buying intensity is normalized by market capitalization rather than trading value, a perfect monotonic relationship emerges: highest-conviction institutional buying (Q4) generates +\institutionLedQFourDayPlusFiftyCAR\ cumulative abnormal returns over 50 days, while lowest-intensity trades (Q1) yield modest returns (+\institutionLedQOneDayPlusFiftyCAR). Retail investors exhibit a flat pattern -- their trading generates near-zero returns regardless of conviction level -- confirming the pure noise trader hypothesis. During the Donghak Ant Movement (2020-2021), however, coordinated retail investors temporarily transformed from noise traders to liquidity providers, generating returns comparable to institutional trading. Our findings reconcile conflicting international evidence and demonstrate that detecting informed trading signals requires investor-type decomposition, nonlinear quartile analysis, and conviction-based (market cap) rather than participation-based (trading value) measurement.
-
π 2025/12/12
- High-Frequency Analysis of a Trading Game with Transient Price Impact
-
Marcel Nutz et al. 2512.11765v1 -
Abstract
We study the high-frequency limit of an $n$-trader optimal execution game in discrete time. Traders face transient price impact of Obizhaeva--Wang type in addition to quadratic instantaneous trading costs $ΞΈ(ΞX_t)^2$ on each transaction $ΞX_t$. There is a unique Nash equilibrium in which traders choose liquidation strategies minimizing expected execution costs. In the high-frequency limit where the grid of trading dates converges to the continuous interval $[0,T]$, the discrete equilibrium inventories converge at rate $1/N$ to the continuous-time equilibrium of an Obizhaeva--Wang model with additional quadratic costs $\vartheta_0(ΞX_0)^2$ and $\vartheta_T(ΞX_T)^2$ on initial and terminal block trades, where $\vartheta_0=(n-1)/2$ and $\vartheta_T=1/2$. The latter model was introduced by Campbell and Nutz as the limit of continuous-time equilibria with vanishing instantaneous costs. Our results extend and refine previous results of Schied, Strehle, and Zhang for the particular case $n=2$ where $\vartheta_0=\vartheta_T=1/2$. In particular, we show how the coefficients $\vartheta_0=(n-1)/2$ and $\vartheta_T=1/2$ arise endogenously in the high-frequency limit: the initial and terminal block costs of the continuous-time model are identified as the limits of the cumulative discrete instantaneous costs incurred over small neighborhoods of $0$ and $T$, respectively, and these limits are independent of $ΞΈ>0$. By contrast, when $ΞΈ=0$ the discrete-time equilibrium strategies and costs exhibit persistent oscillations and admit no high-frequency limit, mirroring the non-existence of continuous-time equilibria without boundary block costs. Our results show that two different types of trading frictions -- a fine time discretization and small instantaneous costs in continuous time -- have similar regularizing effects and select a canonical model in the limit.
-
π 2025/12/11
- Not All Factors Crowd Equally: Modeling, Measuring, and Trading on Alpha Decay
-
Chorok Lee 2512.11913v2 -
Abstract
We derive a specific functional form for factor alpha decay -- hyperbolic decay alpha(t) = K/(1+lambda*t) -- from a game-theoretic equilibrium model, and test it against linear and exponential alternatives. Using eight Fama-French factors (1963--2024), we find: (1) Hyperbolic decay fits mechanical factors. Momentum exhibits clear hyperbolic decay (R^2 = 0.65), outperforming linear (0.51) and exponential (0.61) baselines -- validating the equilibrium foundation. (2) Not all factors crowd equally. Mechanical factors (momentum, reversal) fit the model; judgment-based factors (value, quality) do not -- consistent with a signal-ambiguity taxonomy paralleling Hua and Sun's "barriers to entry." (3) Crowding accelerated post-2015. Out-of-sample, the model over-predicts remaining alpha (0.30 vs. 0.15), correlating with factor ETF growth (rho = -0.63). (4) Average returns are efficiently priced. Crowding-based factor selection fails to generate alpha (Sharpe: 0.22 vs. 0.39 factor momentum benchmark). (5) Crowding predicts tail risk. Out-of-sample (2001--2024), crowded reversal factors show 1.7--1.8x higher crash probability (bottom decile returns), while crowded momentum shows lower crash risk (0.38x, p = 0.006). Our findings extend equilibrium crowding models (DeMiguel et al.) to temporal dynamics and show that crowding predicts crashes, not means -- useful for risk management, not alpha generation.
-
π 2025/12/06
- Wealth or Stealth? The Camouflage Effect in Insider Trading
-
Jin Ma et al. 2512.06309v1 -
Abstract
We consider a Kyle-type model where insider trading takes place among a potentially large population of liquidity traders and is subject to legal penalties. Insiders exploit the liquidity provided by the trading masses to "camouflage" their actions and balance expected wealth with the necessary stealth to avoid detection. Under a diverse spectrum of prosecution schemes, we establish the existence of equilibria for arbitrary population sizes and a unique limiting equilibrium. A convergence analysis determines the scale of insider trading by a stealth index $Ξ³$, revealing that the equilibrium can be closely approximated by a simple limit due to diminished price informativeness. Empirical aspects are derived from two calibration experiments using non-overlapping data sets spanning from 1980 to 2018, which underline the indispensable role of a large population in insider trading models with legal risk, along with important implications for the incidence of stealth trading and the deterrent effect of legal enforcement.
-
π 2025/12/05
- The Red Queenβs Trap: Limits of Deep Evolution in High-Frequency Trading
-
Yijia Chen 2512.15732v1 -
Abstract
The integration of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) is frequently hypothesized to be the "Holy Grail" of algorithmic trading, promising systems that adapt autonomously to non-stationary market regimes. This paper presents a rigorous post-mortem analysis of "Galaxy Empire," a hybrid framework coupling LSTM/Transformer-based perception with a genetic "Time-is-Life" survival mechanism. Deploying a population of 500 autonomous agents in a high-frequency cryptocurrency environment, we observed a catastrophic divergence between training metrics (Validation APY $>300\%$) and live performance (Capital Decay $>70\%$). We deconstruct this failure through a multi-disciplinary lens, identifying three critical failure modes: the overfitting of \textit{Aleatoric Uncertainty} in low-entropy time-series, the \textit{Survivor Bias} inherent in evolutionary selection under high variance, and the mathematical impossibility of overcoming microstructure friction without order-flow data. Our findings provide empirical evidence that increasing model complexity in the absence of information asymmetry exacerbates systemic fragility.
-
π 2025/12/02
- Hidden Order in Trades Predicts the Size of Price Moves
-
Mainak Singha 2512.15720v1 -
Abstract
Financial markets exhibit an apparent paradox: while directional price movements remain largely unpredictable--consistent with weak-form efficiency--the magnitude of price changes displays systematic structure. Here we demonstrate that real-time order-flow entropy, computed from a 15-state Markov transition matrix at second resolution, predicts the magnitude of intraday returns without providing directional information. Analysis of 38.5 million SPY trades over 36 trading days reveals that conditioning on entropy below the 5th percentile increases subsequent 5-minute absolute returns by a factor of 2.89 (t = 12.41, p < 0.0001), while directional accuracy remains at 45.0%--statistically indistinguishable from chance (p = 0.12). This decoupling arises from a fundamental symmetry: entropy is invariant under sign permutation, detecting the presence of informed trading without revealing its direction. Walk-forward validation across five non-overlapping test periods confirms out-of-sample predictability, and label-permutation placebo tests yield z = 14.4 against the null. These findings suggest that information-theoretic measures may serve as volatility state variables in market microstructure, though the limited sample (36 days, single instrument) requires extended validation.
-
π 2025/11/30
- A Hybrid Architecture for Options Wheel Strategy Decisions: LLM-Generated Bayesian Networks for Transparent Trading
-
Xiaoting Kuang et al. 2512.01123v1 -
Abstract
Large Language Models (LLMs) excel at understanding context and qualitative nuances but struggle with the rigorous and transparent reasoning required in high-stakes quantitative domains such as financial trading. We propose a model-first hybrid architecture for the options "wheel" strategy that combines the strengths of LLMs with the robustness of a Bayesian Network. Rather than using the LLM as a black-box decision-maker, we employ it as an intelligent model builder. For each trade decision, the LLM constructs a context-specific Bayesian network by interpreting current market conditions, including prices, volatility, trends, and news, and hypothesizing relationships among key variables. The LLM also selects relevant historical data from an 18.75-year, 8,919-trade dataset to populate the network's conditional probability tables. This selection focuses on scenarios analogous to the present context. The instantiated Bayesian network then performs transparent probabilistic inference, producing explicit probability distributions and risk metrics to support decision-making. A feedback loop enables the LLM to analyze trade outcomes and iteratively refine subsequent network structures and data selection, learning from both successes and failures. Empirically, our hybrid system demonstrates effective performance on the wheel strategy. Over nearly 19 years of out-of-sample testing, it achieves a 15.3% annualized return with significantly superior risk-adjusted performance (Sharpe ratio 1.08 versus 0.62 for market benchmarks) and dramatically lower drawdown (-8.2% versus -60%) while maintaining a 0% assignment rate through strategic option rolling. Crucially, each trade decision is fully explainable, involving on average 27 recorded decision factors (e.g., volatility level, option premium, risk indicators, market context).
-
π 2025/11/20
- Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions
-
Juan C. King et al. 2512.02036v1 -
Abstract
The aim of this paper is the analysis and selection of stock trading systems that combine different models with data of different nature, such as financial and microeconomic information. Specifically, based on previous work by the authors and applying advanced techniques of Machine Learning and Deep Learning, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates Long Short-Term Memory (LSTM) networks with algorithms based on decision trees, such as Random Forest and Gradient Boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. In doing so, Random Forest turned out to be the best performer among the decision trees. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables.
-
π 2025/11/15
- Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy
-
Hongyang Yang et al. 2511.12120v1 -
Abstract
Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. In order to avoid the large memory consumption in training networks with continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks that have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and two baselines in terms of the risk-adjusted return measured by the Sharpe ratio. This work is fully open-sourced at \href{https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020}{GitHub}.
-
π 2025/11/11
- An extreme Gradient Boosting (XGBoost) Trees approach to Detect and Identify Unlawful Insider Trading (UIT) Transactions
-
Krishna Neupane et al. 2511.08306v1 -
Abstract
Corporate insiders have control of material non-public preferential information (MNPI). Occasionally, the insiders strategically bypass legal and regulatory safeguards to exploit MNPI in their execution of securities trading. Due to a large volume of transactions a detection of unlawful insider trading becomes an arduous task for humans to examine and identify underlying patterns from the insider's behavior. On the other hand, innovative machine learning architectures have shown promising results for analyzing large-scale and complex data with hidden patterns. One such popular technique is eXtreme Gradient Boosting (XGBoost), the state-of-the-arts supervised classifier. We, hence, resort to and apply XGBoost to alleviate challenges of identification and detection of unlawful activities. The results demonstrate that XGBoost can identify unlawful transactions with a high accuracy of 97 percent and can provide ranking of the features that play the most important role in detecting fraudulent activities.
-
π 2025/11/03
- JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading
-
Valentin Mohl et al. 2511.02136v1 -
Abstract
Agent-based modelling (ABM) approaches for high-frequency financial markets are difficult to calibrate and validate, partly due to the large parameter space created by defining fixed agent policies. Multi-agent reinforcement learning (MARL) enables more realistic agent behaviour and reduces the number of free parameters, but the heavy computational cost has so far limited research efforts. To address this, we introduce JaxMARL-HFT (JAX-based Multi-Agent Reinforcement Learning for High-Frequency Trading), the first GPU-accelerated open-source multi-agent reinforcement learning environment for high-frequency trading (HFT) on market-by-order (MBO) data. Extending the JaxMARL framework and building on the JAX-LOB implementation, JaxMARL-HFT is designed to handle a heterogeneous set of agents, enabling diverse observation/action spaces and reward functions. It is designed flexibly, so it can also be used for single-agent RL, or extended to act as an ABM with fixed-policy agents. Leveraging JAX enables up to a 240x reduction in end-to-end training time, compared with state-of-the-art reference implementations on the same hardware. This significant speed-up makes it feasible to exploit the large, granular datasets available in high-frequency trading, and to perform the extensive hyperparameter sweeps required for robust and efficient MARL research in trading. We demonstrate the use of JaxMARL-HFT with independent Proximal Policy Optimization (IPPO) for a two-player environment, with an order execution and a market making agent, using one year of LOB data (400 million orders), and show that these agents learn to outperform standard benchmarks. The code for the JaxMARL-HFT framework is available on GitHub.
-
- Trade Execution Flow as the Underlying Source of Market Dynamics
-
Mikhail Gennadievich Belov et al. 2511.01471v1 -
Abstract
In this work, we demonstrate experimentally that the execution flow, $I = dV/dt$, is the fundamental driving force of market dynamics. We develop a numerical framework to calculate execution flow from sampled moments using the Radon-Nikodym derivative. A notable feature of this approach is its ability to automatically determine thresholds that can serve as actionable triggers. The technique also determines the characteristic time scale directly from the corresponding eigenproblem. The methodology has been validated on actual market data to support these findings. Additionally, we introduce a framework based on the Christoffel function spectrum, which is invariant under arbitrary non-degenerate linear transformations of input attributes and offers an alternative to traditional principal component analysis (PCA), which is limited to unitary invariance.
-
π 2025/10/31
- Deep reinforcement learning for optimal trading with partial information
-
Andrea Macrì et al. 2511.00190v1 -
Abstract
Reinforcement Learning (RL) applied to financial problems has been the subject of a lively area of research. The use of RL for optimal trading strategies that exploit latent information in the market is, to the best of our knowledge, not widely tackled. In this paper we study an optimal trading problem, where a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We employ a blend of RL and Recurrent Neural Networks (RNN) in order to make the most at extracting underlying information from the trading signal with latent parameters. The latent parameters driving mean reversion, speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one -step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both profitability and robustness of reinforcement learning-based trading strategies.
-
- When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making
-
Ali Raza Jafree et al. 2510.27334v1 -
Abstract
We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limit Order Book (LOB) model in order to replicate the behaviours of high-frequency market makers. In contrast to the classical models with exogenous price impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a). Given the real-world impracticalities of the market maker updating strategies for every event in the LOB, we formulate the high-frequency market making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL used in the simulation utilises Proximal Policy Optimisation (PPO) and self-imitation learning. To replicate the adverse selection phenomenon, we test the RL agent trading against a medium frequency trader (MFT) executing a meta-order and demonstrate that, with training against the MFT meta-order execution agent, the RL market making agent learns to capitalise on the price drift induced by the meta-order. Recent empirical studies have shown that medium-frequency traders are increasingly subject to adverse selection by high-frequency trading agents. As high-frequency trading continues to proliferate across financial markets, the slippage costs incurred by medium-frequency traders are likely to increase over time. However, we do not observe that increased profits for the market making RL agent necessarily cause significantly increased slippages for the MFT agent.
-
π 2025/10/25
- Understanding Carbon Trade Dynamics: A European Union Emissions Trading System Perspective
-
Avirup Chakraborty 2510.22341v1 -
Abstract
The European Union Emissions Trading System (EU ETS), the worlds largest cap-and-trade carbon market, is central to EU climate policy. This study analyzes its efficiency, price behavior, and market structure from 2010 to 2020. Using an AR-GARCH framework, we find pronounced price clustering and short-term return predictability, with 60.05 percent directional accuracy and a 70.78 percent hit rate within forecast intervals. Network analysis of inter-country transactions shows a concentrated structure dominated by a few registries that control most high-value flows. Country-specific log-log regressions of price on traded quantity reveal heterogeneous and sometimes positive elasticities exceeding unity, implying that trading volumes often rise with prices. These results point to persistent inefficiencies in the EU ETS, including partial predictability, asymmetric market power, and unconventional price-volume relationships, suggesting that while the system contributes to decarbonization, its trading dynamics and price formation remain imperfect.
-
π 2025/10/22
- News-Aware Direct Reinforcement Trading for Financial Markets
-
Qing-Yu Lan et al. 2510.19173v1 -
Abstract
The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process.
-
π 2025/10/20
- Trading with the Devil: Risk and Return in Foundation Model Strategies
-
Jinrui Zhang 2510.17165v1 -
Abstract
Foundation models - already transformative in domains such as natural language processing - are now starting to emerge for time-series tasks in finance. While these pretrained architectures promise versatile predictive signals, little is known about how they shape the risk profiles of the trading strategies built atop them, leaving practitioners reluctant to commit serious capital. In this paper, we propose an extension to the Capital Asset Pricing Model (CAPM) that disentangles the systematic risk introduced by a shared foundation model - potentially capable of generating alpha if the underlying model is genuinely predictive - from the idiosyncratic risk attributable to custom fine-tuning, which typically accrues no systematic premium. To enable a practical estimation of these separate risks, we align this decomposition with the concepts of uncertainty disentanglement, casting systematic risk as epistemic uncertainty (rooted in the pretrained model) and idiosyncratic risk as aleatory uncertainty (introduced during custom adaptations). Under the Aleatory Collapse Assumption, we illustrate how Monte Carlo dropout - among other methods in the uncertainty-quantization toolkit - can directly measure the epistemic risk, thereby mapping trading strategies to a more transparent risk-return plane. Our experiments show that isolating these distinct risk factors yields deeper insights into the performance limits of foundation-model-based strategies, their model degradation over time, and potential avenues for targeted refinements. Taken together, our results highlight both the promise and the pitfalls of deploying large pretrained models in competitive financial markets.
-
π 2025/10/14
- (Non-Parametric) Bootstrap Robust Optimization for Portfolios and Trading Strategies
-
Daniel Cunha Oliveira et al. 2510.12725v1 -
Abstract
Robust optimization provides a principled framework for decision-making under uncertainty, with broad applications in finance, engineering, and operations research. In portfolio optimization, uncertainty in expected returns and covariances demands methods that mitigate estimation error, parameter instability, and model misspecification. Traditional approaches, including parametric, bootstrap-based, and Bayesian methods, enhance stability by relying on confidence intervals or probabilistic priors but often impose restrictive assumptions. This study introduces a non-parametric bootstrap framework for robust optimization in financial decision-making. By resampling empirical data, the framework constructs flexible, data-driven confidence intervals without assuming specific distributional forms, thus capturing uncertainty in statistical estimates, model parameters, and utility functions. Treating utility as a random variable enables percentile-based optimization, naturally suited for risk-sensitive and worst-case decision-making. The approach aligns with recent advances in robust optimization, reinforcement learning, and risk-aware control, offering a unified perspective on robustness and generalization. Empirically, the framework mitigates overfitting and selection bias in trading strategy optimization and improves generalization in portfolio allocation. Results across portfolio and time-series momentum experiments demonstrate that the proposed method delivers smoother, more stable out-of-sample performance, offering a practical, distribution-free alternative to traditional robust optimization methods.
-
π 2025/10/13
- On Bellman equation in the limit order optimization problem for high-frequency trading
-
M. I. Balakaeva et al. 2510.15988v1 -
Abstract
An approximation method for construction of optimal strategies in the bid \& ask limit order book in the high-frequency trading (HFT) is studied. The basis is the article by M. Avellaneda \& S. Stoikov 2008, in which certain seemingly serious gaps have been found; in the present paper they are carefully corrected. However, a bit surprisingly, our corrections do not change the main answer in the cited paper, so that, in fact, the gaps turn out to be unimportant. An explanation of this effect is offered.
-
π 2025/10/12
- Integrating Large Language Models and Reinforcement Learning for Sentiment-Driven Quantitative Trading
-
Wo Long et al. 2510.10526v1 -
Abstract
This research develops a sentiment-driven quantitative trading system that leverages a large language model, FinGPT, for sentiment analysis, and explores a novel method for signal integration using a reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). We compare the performance of strategies that integrate sentiment and technical signals using both a conventional rule-based approach and a reinforcement learning framework. The results suggest that sentiment signals generated by FinGPT offer value when combined with traditional technical indicators, and that reinforcement learning algorithm presents a promising approach for effectively integrating heterogeneous signals in dynamic trading environments.
-
π 2025/10/10
- ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
-
Charidimos Papadakis et al. 2510.15949v2 -
Abstract
Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
-
π 2025/10/09
- An Adaptive Multi Agent Bitcoin Trading System
-
Aadi Singhi 2510.08068v2 -
Abstract
This paper presents a Multi Agent Bitcoin Trading system that utilizes Large Language Models (LLMs) for alpha generation and portfolio management in the cryptocurrencies market. Unlike equities, cryptocurrencies exhibit extreme volatility and are heavily influenced by rapidly shifting market sentiments and regulatory announcements, making them difficult to model using static regression models or neural networks trained solely on historical data. The proposed framework overcomes this by structuring LLMs into specialised agents for technical analysis, sentiment evaluation, decision-making, and performance reflection. The agents improve over time via a novel verbal feedback mechanism where a Reflect agent provides daily and weekly natural-language critiques of trading decisions. These textual evaluations are then injected into future prompts of the agents, allowing them to adjust allocation logic without weight updates or finetuning. Back-testing on Bitcoin price data from July 2024 to April 2025 shows consistent outperformance across market regimes: the Quantitative agent delivered over 30\% higher returns in bullish phases and 15\% overall gains versus buy-and-hold, while the sentiment-driven agent turned sideways markets from a small loss into a gain of over 100\%. Adding weekly feedback further improved total performance by 31\% and reduced bearish losses by 10\%. The results demonstrate that verbal feedback represents a new, scalable, and low-cost approach of tuning LLMs for financial goals.
-
π 2025/10/07
- The New Quant: A Survey of Large Language Models in Financial Prediction and Trading
-
Weilong Fu 2510.05533v1 -
Abstract
Large language models are reshaping quantitative investing by turning unstructured financial information into evidence-grounded signals and executable decisions. This survey synthesizes research with a focus on equity return prediction and trading, consolidating insights from domain surveys and more than fifty primary studies. We propose a task-centered taxonomy that spans sentiment and event extraction, numerical and economic reasoning, multimodal understanding, retrieval-augmented generation, time series prompting, and agentic systems that coordinate tools for research, backtesting, and execution. We review empirical evidence for predictability, highlight design patterns that improve faithfulness such as retrieval first prompting and tool-verified numerics, and explain how signals feed portfolio construction under exposure, turnover, and capacity controls. We assess benchmarks and datasets for prediction and trading and outline desiderata-for time safe and economically meaningful evaluation that reports costs, latency, and capacity. We analyze challenges that matter in production, including temporal leakage, hallucination, data coverage and structure, deployment economics, interpretability, governance, and safety. The survey closes with recommendations for standardizing evaluation, building auditable pipelines, and advancing multilingual and cross-market research so that language-driven systems deliver robust and risk-controlled performance in practice.
-
π 2025/09/29
- STRAPSim: A Portfolio Similarity Metric for ETF Alignment and Portfolio Trades
-
Mingshu Li et al. 2509.24151v1 -
Abstract
Accurately measuring portfolio similarity is critical for a wide range of financial applications, including Exchange-traded Fund (ETF) recommendation, portfolio trading, and risk alignment. Existing similarity measures often rely on exact asset overlap or static distance metrics, which fail to capture similarities among the constituents (e.g., securities within the portfolio) as well as nuanced relationships between partially overlapping portfolios with heterogeneous weights. We introduce STRAPSim (Semantic, Two-level, Residual-Aware Portfolio Similarity), a novel method that computes portfolio similarity by matching constituents based on semantic similarity, weighting them according to their portfolio share, and aggregating results via residual-aware greedy alignment. We benchmark our approach against Jaccard, weighted Jaccard, as well as BERTScore-inspired variants across public classification, regression, and recommendation tasks, as well as on corporate bond ETF datasets. Empirical results show that our method consistently outperforms baselines in predictive accuracy and ranking alignment, achieving the highest Spearman correlation with return-based similarity. By leveraging constituent-aware matching and dynamic reweighting, portfolio similarity offers a scalable, interpretable framework for comparing structured asset baskets, demonstrating its utility in ETF benchmarking, portfolio construction, and systematic execution.
-
π 2025/09/22
- Enhanced fill probability estimates in institutional algorithmic bond trading using statistical learning algorithms with quantum computers
-
Axel Ciceri et al. 2509.17715v1 -
Abstract
The estimation of fill probabilities for trade orders represents a key ingredient in the optimization of algorithmic trading strategies. It is bound by the complex dynamics of financial markets with inherent uncertainties, and the limitations of models aiming to learn from multivariate financial time series that often exhibit stochastic properties with hidden temporal patterns. In this paper, we focus on algorithmic responses to trade inquiries in the corporate bond market and investigate fill probability estimation errors of common machine learning models when given real production-scale intraday trade event data, transformed by a quantum algorithm running on IBM Heron processors, as well as on noiseless quantum simulators for comparison. We introduce a framework to embed these quantum-generated data transforms as a decoupled offline component that can be selectively queried by models in low-latency institutional trade optimization settings. A trade execution backtesting method is employed to evaluate the fill prediction performance of these models in relation to their input data. We observe a relative gain of up to ~ 34% in out-of-sample test scores for those models with access to quantum hardware-transformed data over those using the original trading data or transforms by noiseless quantum simulation. These empirical results suggest that the inherent noise in current quantum hardware contributes to this effect and motivates further studies. Our work demonstrates the emerging potential of quantum computing as a complementary explorative tool in quantitative finance and encourages applied industry research towards practical applications in trading.
-
π 2025/09/20
- Increase Alpha: Performance and Risk of an AI-Driven Trading Framework
-
Sid Ghatak et al. 2509.16707v2 -
Abstract
There are inefficiencies in financial markets, with unexploited patterns in price, volume, and cross-sectional relationships. While many approaches use large-scale transformers, we take a domain-focused path: feed-forward and recurrent networks with curated features to capture subtle regularities in noisy financial data. This smaller-footprint design is computationally lean and reliable under low signal-to-noise, crucial for daily production at scale. At Increase Alpha, we built a deep-learning framework that maps over 800 U.S. equities into daily directional signals with minimal computational overhead. The purpose of this paper is twofold. First, we outline the general overview of the predictive model without disclosing its core underlying concepts. Second, we evaluate its real-time performance through transparent, industry standard metrics. Forecast accuracy is benchmarked against both naive baselines and macro indicators. The performance outcomes are summarized via cumulative returns, annualized Sharpe ratio, and maximum drawdown. The best portfolio combination using our signals provides a low-risk, continuous stream of returns with a Sharpe ratio of more than 2.5, maximum drawdown of around 3%, and a near-zero correlation with the S&P 500 market benchmark. We also compare the model's performance through different market regimes, such as the recent volatile movements of the US equity market in the beginning of 2025. Our analysis showcases the robustness of the model and significantly stable performance during these volatile periods. Collectively, these findings show that market inefficiencies can be systematically harvested with modest computational overhead if the right variables are considered. This report will emphasize the potential of traditional deep learning frameworks for generating an AI-driven edge in the financial market.
-
π 2025/09/14
- Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning
-
Yijia Xiao et al. 2509.11420v1 -
Abstract
Developing professional, structured reasoning on par with human financial analysts and traders remains a central challenge in AI for finance, where markets demand interpretability and trust. Traditional time-series models lack explainability, while LLMs face challenges in turning natural-language analysis into disciplined, executable trades. Although reasoning LLMs have advanced in step-by-step planning and verification, their application to risk-sensitive financial decisions is underexplored. We present Trading-R1, a financially-aware model that incorporates strategic thinking and planning for comprehensive thesis composition, facts-grounded analysis, and volatility-adjusted decision making. Trading-R1 aligns reasoning with trading principles through supervised fine-tuning and reinforcement learning with a three-stage easy-to-hard curriculum. Training uses Tauric-TR1-DB, a 100k-sample corpus spanning 18 months, 14 equities, and five heterogeneous financial data sources. Evaluated on six major equities and ETFs, Trading-R1 demonstrates improved risk-adjusted returns and lower drawdowns compared to both open-source and proprietary instruction-following models as well as reasoning models. The system generates structured, evidence-based investment theses that support disciplined and interpretable trading decisions. Trading-R1 Terminal will be released at https://github.com/TauricResearch/Trading-R1.
-
π 2025/09/05
- MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading
-
Yang Chen et al. 2509.05080v2 -
Abstract
The inherent non-stationarity of financial markets and the complexity of multi-modal information pose significant challenges to existing quantitative trading models. Traditional methods relying on fixed structures and unimodal data struggle to adapt to market regime shifts, while large language model (LLM)-driven solutions - despite their multi-modal comprehension - suffer from static strategies and homogeneous expert designs, lacking dynamic adjustment and fine-grained decision mechanisms. To address these limitations, we propose MM-DREX: a Multimodal-driven, Dynamically-Routed EXpert framework based on large language models. MM-DREX explicitly decouples market state perception from strategy execution to enable adaptive sequential decision-making in non-stationary environments. Specifically, it (1) introduces a vision-language model (VLM)-powered dynamic router that jointly analyzes candlestick chart patterns and long-term temporal features to allocate real-time expert weights; (2) designs four heterogeneous trading experts (trend, reversal, breakout, positioning) generating specialized fine-grained sub-strategies; and (3) proposes an SFT-RL hybrid training paradigm to synergistically optimize the router's market classification capability and experts' risk-adjusted decision-making. Extensive experiments on multi-modal datasets spanning stocks, futures, and cryptocurrencies demonstrate that MM-DREX significantly outperforms 15 baselines (including state-of-the-art financial LLMs and deep reinforcement learning models) across key metrics: total return, Sharpe ratio, and maximum drawdown, validating its robustness and generalization. Additionally, an interpretability module traces routing logic and expert behavior in real time, providing an audit trail for strategy transparency.
-
π 2025/09/04
- Finance-Grounded Optimization For Algorithmic Trading
-
Kasymkhan Khubiev et al. 2509.04541v2 -
Abstract
Deep Learning is evolving fast and integrates into various domains. Finance is a challenging field for deep learning, especially in the case of interpretable artificial intelligence (AI). Although classical approaches perform very well with natural language processing, computer vision, and forecasting, they are not perfect for the financial world, in which specialists use different metrics to evaluate model performance. We first introduce financially grounded loss functions derived from key quantitative finance metrics, including the Sharpe ratio, Profit-and-Loss (PnL), and Maximum Draw down. Additionally, we propose turnover regularization, a method that inherently constrains the turnover of generated positions within predefined limits. Our findings demonstrate that the proposed loss functions, in conjunction with turnover regularization, outperform the traditional mean squared error loss for return prediction tasks when evaluated using algorithmic trading metrics. The study shows that financially grounded metrics enhance predictive performance in trading strategies and portfolio optimization.
-
π 2025/09/01
- Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading
-
Qizhao Chen et al. 2509.01393v2 -
Abstract
This paper introduces a reinforcement learning framework that employs Proximal Policy Optimization (PPO) to dynamically optimize the weights of multiple large language model (LLM)-generated formulaic alphas for stock trading strategies. Formulaic alphas are mathematically defined trading signals derived from price, volume, sentiment, and other data. Although recent studies have shown that LLMs can generate diverse and effective alphas, a critical challenge lies in how to adaptively integrate them under varying market conditions. To address this gap, we leverage a DeepSeek model to generate fifty alphas for ten stocks, and then use PPO to adjust their weights in real time. Experimental results indicate that the PPO-optimized strategy does not consistently deliver the highest cumulative returns across all stocks, but it achieves comparatively higher Sharpe ratios and smaller maximum drawdowns in most cases. When compared with baseline strategies, including equal-weighted, buy-and-hold, random entry/exit, and momentum approaches, PPO demonstrates more stable risk-adjusted performance. The findings highlight the importance of reinforcement learning in the allocation of alpha weights and show the potential of combining LLM-generated signals with adaptive optimization for robust financial forecasting and trading.
-
π 2025/08/31
- Prospects of Imitating Trading Agents in the Stock Market
-
Mateusz Wilinski et al. 2509.00982v1 -
Abstract
In this work we show how generative tools, which were successfully applied to limit order book data, can be utilized for the task of imitating trading agents. To this end, we propose a modified generative architecture based on the state-space model, and apply it to limit order book data with identified investors. The model is trained on synthetic data, generated from a heterogeneous agent-based model. Finally, we compare model's predicted distribution over different aspects of investors' actions, with the ground truths known from the agent-based model.
-
π 2025/08/28
- Agent-based model of information diffusion in the limit order book trading
-
Mateusz Wilinski et al. 2508.20672v1 -
Abstract
There are multiple explanations for stylized facts in high-frequency trading, including adaptive and informed agents, many of which have been studied through agent-based models. This paper investigates an alternative explanation by examining whether, and under what circumstances, interactions between traders placing limit order book messages can reproduce stylized facts, and what forms of interaction are required. While the agent-based modeling literature has introduced interconnected agents on networks, little attention has been paid to whether specific trading network topologies can generate stylized facts in limit order book markets. In our model, agents are strictly zero-intelligence, with no fundamental knowledge or chartist-like strategies, so that the role of network topology can be isolated. We find that scale-free connectivity between agents reproduces stylized facts observed in markets, whereas no-interaction does not. Our experiments show that regular lattices and Erdos-Renyi networks are not significantly different from the no-interaction baseline. Thus, we provide a completely new, potentially complementary, explanation for the emergence of stylized facts.
-
- QTMRL: An Agent for Quantitative Trading Decision-Making Based on Multi-Indicator Guided Reinforcement Learning
-
Xiangdong Liu et al. 2508.20467v1 -
Abstract
In the highly volatile and uncertain global financial markets, traditional quantitative trading models relying on statistical modeling or empirical rules often fail to adapt to dynamic market changes and black swan events due to rigid assumptions and limited generalization. To address these issues, this paper proposes QTMRL (Quantitative Trading Multi-Indicator Reinforcement Learning), an intelligent trading agent combining multi-dimensional technical indicators with reinforcement learning (RL) for adaptive and stable portfolio management. We first construct a comprehensive multi-indicator dataset using 23 years of S&P 500 daily OHLCV data (2000-2022) for 16 representative stocks across 5 sectors, enriching raw data with trend, volatility, and momentum indicators to capture holistic market dynamics. Then we design a lightweight RL framework based on the Advantage Actor-Critic (A2C) algorithm, including data processing, A2C algorithm, and trading agent modules to support policy learning and actionable trading decisions. Extensive experiments compare QTMRL with 9 baselines (e.g., ARIMA, LSTM, moving average strategies) across diverse market regimes, verifying its superiority in profitability, risk adjustment, and downside risk control. The code of QTMRL is publicly available at https://github.com/ChenJiahaoJNU/QTMRL.git
-
π 2025/08/10
- Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading
-
Yueyi Wang et al. 2508.07408v1 -
Abstract
In this study, we wish to showcase the unique utility of large language models (LLMs) in financial semantic annotation and alpha signal discovery. Leveraging a corpus of company-related tweets, we use an LLM to automatically assign multi-label event categories to high-sentiment-intensity tweets. We align these labeled sentiment signals with forward returns over 1-to-7-day horizons to evaluate their statistical efficacy and market tradability. Our experiments reveal that certain event labels consistently yield negative alpha, with Sharpe ratios as low as -0.38 and information coefficients exceeding 0.05, all statistically significant at the 95\% confidence level. This study establishes the feasibility of transforming unstructured social media text into structured, multi-label event variables. A key contribution of this work is its commitment to transparency and reproducibility; all code and methodologies are made publicly available. Our results provide compelling evidence that social media sentiment is a valuable, albeit noisy, signal in financial forecasting and underscore the potential of open-source frameworks to democratize algorithmic trading research.
-
π 2025/08/09
- Empirical Analysis of the Model-Free Valuation Approach: Hedging Gaps, Conservatism, and Trading Opportunities
-
Zixing Chen et al. 2508.16595v2 -
Abstract
In this paper we study the quality of model-free valuation approaches for financial derivatives by systematically evaluating the difference between model-free super-hedging strategies and the realized payoff of financial derivatives using historical option prices from several constituents of the S&P 500 between 2018 and 2022. Our study allows in particular to describe the realized gap between payoff and model-free hedging strategy empirically so that we can quantify to which degree model-free approaches are overly conservative. Our results imply that the model-free hedging approach is only marginally more conservative than industry-standard models such as the Heston-model while being model-free at the same time. This finding, its statistical description and the model-independence of the hedging approach enable us to construct an explicit trading strategy which, as we demonstrate, can be profitably applied in financial markets, and additionally possesses the desirable feature with an explicit control of its downside risk due to its model-free construction preventing losses pathwise.
-
π 2025/08/04
- Language Model Guided Reinforcement Learning in Quantitative Trading
-
Adam Darmanin et al. 2508.02366v3 -
Abstract
Algorithmic trading requires short-term tactical decisions consistent with long-term financial objectives. Reinforcement Learning (RL) has been applied to such problems, but adoption is limited by myopic behaviour and opaque policies. Large Language Models (LLMs) offer complementary strategic reasoning and multi-modal signal interpretation when guided by well-structured prompts. This paper proposes a hybrid framework in which LLMs generate high-level trading strategies to guide RL agents. We evaluate (i) the economic rationale of LLM-generated strategies through expert review, and (ii) the performance of LLM-guided agents against unguided RL baselines using Sharpe Ratio (SR) and Maximum Drawdown (MDD). Empirical results indicate that LLM guidance improves both return and risk metrics relative to standard RL.
-
- Neural Network-Based Algorithmic Trading Systems: Multi-Timeframe Analysis and High-Frequency Execution in Cryptocurrency Markets
-
WΔi ZhΔng 2508.02356v1 -
Abstract
This paper explores neural network-based approaches for algorithmic trading in cryptocurrency markets. Our approach combines multi-timeframe trend analysis with high-frequency direction prediction networks, achieving positive risk-adjusted returns through statistical modeling and systematic market exploitation. The system integrates diverse data sources including market data, on-chain metrics, and orderbook dynamics, translating these into unified buy/sell pressure signals. We demonstrate how machine learning models can effectively capture cross-timeframe relationships, enabling sub-second trading decisions with statistical confidence.
-
π 2025/08/01
- Automated Trading System for Straddle-Option Based on Deep Q-Learning
-
Yiran Wan et al. 2509.07987v1 -
Abstract
Straddle Option is a financial trading tool that explores volatility premiums in high-volatility markets without predicting price direction. Although deep reinforcement learning has emerged as a powerful approach to trading automation in financial markets, existing work mostly focused on predicting price trends and making trading decisions by combining multi-dimensional datasets like blogs and videos, which led to high computational costs and unstable performance in high-volatility markets. To tackle this challenge, we develop automated straddle option trading based on reinforcement learning and attention mechanisms to handle unpredictability in high-volatility markets. Firstly, we leverage the attention mechanisms in Transformer-DDQN through both self-attention with time series data and channel attention with multi-cycle information. Secondly, a novel reward function considering excess earnings is designed to focus on long-term profits and neglect short-term losses over a stop line. Thirdly, we identify the resistance levels to provide reference information when great uncertainty in price movements occurs with intensified battle between the buyers and sellers. Through extensive experiments on the Chinese stock, Brent crude oil, and Bitcoin markets, our attention-based Transformer-DDQN model exhibits the lowest maximum drawdown across all markets, and outperforms other models by 92.5\% in terms of the average return excluding the crude oil market due to relatively low fluctuation.
-
- ContestTrade: A Multi-Agent Trading System Based on Internal Contest Mechanism
-
Li Zhao et al. 2508.00554v3 -
Abstract
In financial trading, large language model (LLM)-based agents demonstrate significant potential. However, the high sensitivity to market noise undermines the performance of LLM-based trading systems. To address this limitation, we propose a novel multi-agent system featuring an internal competitive mechanism inspired by modern corporate management structures. The system consists of two specialized teams: (1) Data Team - responsible for processing and condensing massive market data into diversified text factors, ensuring they fit the model's constrained context. (2) Research Team - tasked with making parallelized multipath trading decisions based on deep research methods. The core innovation lies in implementing a real-time evaluation and ranking mechanism within each team, driven by authentic market feedback. Each agent's performance undergoes continuous scoring and ranking, with only outputs from top-performing agents being adopted. The design enables the system to adaptively adjust to dynamic environment, enhances robustness against market noise and ultimately delivers superior trading performance. Experimental results demonstrate that our proposed system significantly outperforms prevailing multi-agent systems and traditional quantitative investment methods across diverse evaluation metrics. ContestTrade is open-sourced on GitHub at https://github.com/FinStep-AI/ContestTrade.
-
π 2025/07/28
- Deep Reputation Scoring in DeFi: zScore-Based Wallet Ranking from Liquidity and Trading Signals
-
Dhanashekar Kandaswamy et al. 2507.20494v1 -
Abstract
As decentralized finance (DeFi) evolves, distinguishing between user behaviors - liquidity provision versus active trading - has become vital for risk modeling and on-chain reputation. We propose a behavioral scoring framework for Uniswap that assigns two complementary scores: a Liquidity Provision Score that assesses strategic liquidity contributions, and a Swap Behavior Score that reflects trading intent, volatility exposure, and discipline. The scores are constructed using rule-based blueprints that decompose behavior into volume, frequency, holding time, and withdrawal patterns. To handle edge cases and learn feature interactions, we introduce a deep residual neural network with densely connected skip blocks inspired by the U-Net architecture. We also incorporate pool-level context such as total value locked (TVL), fee tiers, and pool size, allowing the system to differentiate similar user behaviors across pools with varying characteristics. Our framework enables context-aware and scalable DeFi user scoring, supporting improved risk assessment and incentive design. Experiments on Uniswap v3 data show its usefulness for user segmentation and protocol-aligned reputation systems. Although we refer to our metric as zScore, it is independently developed and methodologically different from the cross-protocol system proposed by Udupi et al. Our focus is on role-specific behavioral modeling within Uniswap using blueprint logic and supervised learning.
-
π 2025/07/27
- Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classic al Technical Analysis for Adaptive Algorithmic Trading
-
Longfei Lu 2507.20202v2 -
Abstract
Deep neural networks (DNNs) have transformed fields such as computer vision and natural language processing by employing architectures aligned with domain-specific structural patterns. In algorithmic trading, however, there remains a lack of architectures that directly incorporate the logic of traditional technical indicators. This study introduces Technical Indicator Networks (TINs), a structured neural design that reformulates rule-based financial heuristics into trainable and interpretable modules. The architecture preserves the core mathematical definitions of conventional indicators while extending them to multidimensional data and supporting optimization through diverse learning paradigms, including reinforcement learning. Analytical transformations such as averaging, clipping, and ratio computation are expressed as vectorized layer operators, enabling transparent network construction and principled initialization. This formulation retains the clarity and interpretability of classical strategies while allowing adaptive adjustment and data-driven refinement. As a proof of concept, the framework is validated on the Dow Jones Industrial Average constituents using a Moving Average Convergence Divergence (MACD) TIN. Empirical results demonstrate improved risk-adjusted performance relative to traditional indicator-based strategies. Overall, the findings suggest that TINs provide a generalizable foundation for interpretable, adaptive, and extensible learning architectures in structured decision-making domains and indicate substantial commercial potential for upgrading trading platforms with cross-market visibility and enhanced decision-support capabilities.
-
π 2025/07/24
- FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs
-
Giorgos Iacovides et al. 2507.18417v1 -
Abstract
Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
-
π 2025/07/23
- Optimal Trading under Instantaneous and Persistent Price Impact, Predictable Returns and Multiscale Stochastic Volatility
-
Patrick Chan et al. 2507.17162v1 -
Abstract
We consider a dynamic portfolio optimization problem that incorporates predictable returns, instantaneous transaction costs, price impact, and stochastic volatility, extending the classical results of Garleanu and Pedersen (2013), which assume constant volatility. Constructing the optimal portfolio strategy in this general setting is challenging due to the nonlinear nature of the resulting Hamilton-Jacobi-Bellman (HJB) equations. To address this, we propose a multi-scale volatility expansion that captures stochastic volatility dynamics across different time scales. Specifically, the analysis involves a singular perturbation for the fast mean-reverting volatility factor and a regular perturbation for the slow-moving factor. We also introduce an approximation for small price impact and demonstrate its numerical accuracy. We formally derive asymptotic approximations up to second order and use Monte Carlo simulations to show how incorporating these corrections improves the Profit and Loss (PnL) of the resulting portfolio strategy.
-
π 2025/07/14
- Kernel Learning for Mean-Variance Trading Strategies
-
Owen Futter et al. 2507.10701v1 -
Abstract
In this article, we develop a kernel-based framework for constructing dynamic, pathdependent trading strategies under a mean-variance optimisation criterion. Building on the theoretical results of (Muca Cirone and Salvi, 2025), we parameterise trading strategies as functions in a reproducing kernel Hilbert space (RKHS), enabling a flexible and non-Markovian approach to optimal portfolio problems. We compare this with the signature-based framework of (Futter, Horvath, Wiese, 2023) and demonstrate that both significantly outperform classical Markovian methods when the asset dynamics or predictive signals exhibit temporal dependencies for both synthetic and market-data examples. Using kernels in this context provides significant modelling flexibility, as the choice of feature embedding can range from randomised signatures to the final layers of neural network architectures. Crucially, our framework retains closed-form solutions and provides an alternative to gradient-based optimisation.
-
- A Coincidence of Wants Mechanism for Swap Trade Execution in Decentralized Exchanges
-
Abhimanyu Nag et al. 2507.10149v1 -
Abstract
We propose a mathematically rigorous framework for identifying and completing Coincidence of Wants (CoW) cycles in decentralized exchange (DEX) aggregators. Unlike existing auction based systems such as CoWSwap, our approach introduces an asset matrix formulation that not only verifies feasibility using oracle prices and formal conservation laws but also completes partial CoW cycles of swap orders that are discovered using graph traversal and are settled using imbalance correction. We define bridging orders and show that the resulting execution is slippage free and capital preserving for LPs. Applied to real world Arbitrum swap data, our algorithm demonstrates efficient discovery of CoW cycles and supports the insertion of synthetic orders for atomic cycle closure. This work can be thought of as the detailing of a potential delta-neutral strategy by liquidity providing market makers: a structured CoW cycle execution.
-
π 2025/07/13
- Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500
-
Haojie Liu et al. 2507.09739v1 -
Abstract
This study integrates real-time sentiment analysis from financial news, GPT-2 and FinBERT, with technical indicators and time-series models like ARIMA and ETS to optimize S&P 500 trading strategies. By merging sentiment data with momentum and trend-based metrics, including a benchmark buy-and-hold and sentiment-based approach, is evaluated through assets values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments.
-
- MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading
-
Siyi Wu et al. 2507.20474v3 -
Abstract
Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present \textbf{MountainLion}, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence.
-
π 2025/07/12
- A Framework for Predictive Directional Trading Based on Volatility and Causal Inference
-
Ivan Letteri 2507.09347v1 -
Abstract
Purpose: This study introduces a novel framework for identifying and exploiting predictive lead-lag relationships in financial markets. We propose an integrated approach that combines advanced statistical methodologies with machine learning models to enhance the identification and exploitation of predictive relationships between equities. Methods: We employed a Gaussian Mixture Model (GMM) to cluster nine prominent stocks based on their mid-range historical volatility profiles over a three-year period. From the resulting clusters, we constructed a multi-stage causal inference pipeline, incorporating the Granger Causality Test (GCT), a customised Peter-Clark Momentary Conditional Independence (PCMCI) test, and Effective Transfer Entropy (ETE) to identify robust, predictive linkages. Subsequently, Dynamic Time Warping (DTW) and a K-Nearest Neighbours (KNN) classifier were utilised to determine the optimal time lag for trade execution. The resulting strategy was rigorously backtested. Results: The proposed volatility-based trading strategy, tested from 8 June 2023 to 12 August 2023, demonstrated substantial efficacy. The portfolio yielded a total return of 15.38%, significantly outperforming the 10.39% return of a comparative Buy-and-Hold strategy. Key performance metrics, including a Sharpe Ratio up to 2.17 and a win rate up to 100% for certain pairs, confirmed the strategy's viability. Conclusion: This research contributes a systematic and robust methodology for identifying profitable trading opportunities derived from volatility-based causal relationships. The findings have significant implications for both academic research in financial modelling and the practical application of algorithmic trading, offering a structured approach to developing resilient, data-driven strategies.
-
π 2025/07/11
- To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions
-
Dimitrios Emmanoulopoulos et al. 2507.08584v1 -
Abstract
Large language models (LLMs) are increasingly deployed in agentic frameworks, in which prompts trigger complex tool-based analysis in pursuit of a goal. While these frameworks have shown promise across multiple domains including in finance, they typically lack a principled model-building step, relying instead on sentiment- or trend-based analysis. We address this gap by developing an agentic system that uses LLMs to iteratively discover stochastic differential equations for financial time series. These models generate risk metrics which inform daily trading decisions. We evaluate our system in both traditional backtests and using a market simulator, which introduces synthetic but causally plausible price paths and news events. We find that model-informed trading strategies outperform standard LLM-based agents, improving Sharpe ratios across multiple equities. Our results show that combining LLMs with agentic model discovery enhances market risk estimation and enables more profitable trading decisions.
-
π 2025/07/08
- Reinforcement Learning for Trade Execution with Market and Limit Orders
-
Patrick Cheridito et al. 2507.06345v2 -
Abstract
In this paper, we introduce a novel reinforcement learning framework for optimal trade execution in a limit order book. We formulate the trade execution problem as a dynamic allocation task whose objective is the optimal placement of market and limit orders to maximize expected revenue. By modeling market and limit order allocations with multivariate logistic-normal distributions, the framework enables efficient training of the reinforcement learning algorithm. Numerical experiments show that the proposed method outperforms traditional benchmark strategies in simulated limit order book environments featuring noise traders submitting random orders, tactical traders responding to order book imbalances, and a strategic trader seeking to acquire or liquidate an asset position.
-
π 2025/06/26
- Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market
-
Chi-Sheng Chen et al. 2506.20930v2 -
Abstract
We propose a hybrid quantum-classical reinforcement learning framework for sector rotation in the Taiwan stock market. Our system employs Proximal Policy Optimization (PPO) as the backbone algorithm and integrates both classical architectures (LSTM, Transformer) and quantum-enhanced models (QNN, QRWKV, QASA) as policy and value networks. An automated feature engineering pipeline extracts financial indicators from capital share data to ensure consistent model input across all configurations. Empirical backtesting reveals a key finding: although quantum-enhanced models consistently achieve higher training rewards, they underperform classical models in real-world investment metrics such as cumulative return and Sharpe ratio. This discrepancy highlights a core challenge in applying reinforcement learning to financial domains -- namely, the mismatch between proxy reward signals and true investment objectives. Our analysis suggests that current reward designs may incentivize overfitting to short-term volatility rather than optimizing risk-adjusted returns. This issue is compounded by the inherent expressiveness and optimization instability of quantum circuits under Noisy Intermediate-Scale Quantum (NISQ) constraints. We discuss the implications of this reward-performance gap and propose directions for future improvement, including reward shaping, model regularization, and validation-based early stopping. Our work offers a reproducible benchmark and critical insights into the practical challenges of deploying quantum reinforcement learning in real-world finance.
-
π 2025/06/23
- Making Leveraged Exchange-Traded Funds Work for your Portfolio
-
Peter Forsyth et al. 2506.19200v1 -
Abstract
We examine strategically incorporating broad stock market leveraged exchange-traded funds (LETFs) into investment portfolios. We demonstrate that easily understandable and implementable strategies can enhance the risk-return profile of a portfolio containing LETFs. Our analysis shows that seemingly reasonable investment strategies may result in undesirable Omega ratios, with these effects compounding across rebalancing periods. By contrast, relatively simple dynamic strategies that systematically de-risk the portfolio once gains are observed can exploit this compounding effect, taking advantage of favorable Omega ratio dynamics. Our findings suggest that LETFs represent a valuable tool for investors employing dynamic strategies, while confirming their well-documented unsuitability for passive or static approaches.
-
π 2025/06/13
- Dynamic Grid Trading Strategy: From Zero Expectation to Market Outperformance
-
Kai-Yuan Chen et al. 2506.11921v1 -
Abstract
We propose a profitable trading strategy for the cryptocurrency market based on grid trading. Starting with an analysis of the expected value of the traditional grid strategy, we show that under simple assumptions, its expected return is essentially zero. We then introduce a novel Dynamic Grid-based Trading (DGT) strategy that adapts to market conditions by dynamically resetting grid positions. Our backtesting results using minute-level data from Bitcoin and Ethereum between January 2021 and July 2024 demonstrate that the DGT strategy significantly outperforms both the traditional grid and buy-and-hold strategies in terms of internal rate of return and risk control.
-
π 2025/06/05
- Can Artificial Intelligence Trade the Stock Market?
-
JΔdrzej Maskiewicz et al. 2506.04658v1 -
Abstract
The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms: Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO) and compares them with Buy and Hold benchmark. It evaluates these algorithms across three currency pairs, the S&P 500 index and Bitcoin, on the daily data in the period of 2019-2023. The results demonstrate DRL's effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge over classical approaches, based on supervised learning in terms of risk-adjusted returns.
-
π 2025/06/02
- Machine Learning Enhanced Multi-Factor Quantitative Trading: A Cross-Sectional Portfolio Optimization Approach with Bias Correction
-
Yimin Du 2507.07107v1 -
Abstract
This paper presents a comprehensive machine learning framework for quantitative trading that achieves superior risk-adjusted returns through systematic factor engineering, real-time computation optimization, and cross-sectional portfolio construction. Our approach integrates multi-factor alpha discovery with bias correction techniques, leveraging PyTorch-accelerated factor computation and advanced portfolio optimization. The system processes 500-1000 factors derived from open-source alpha101 extensions and proprietary market microstructure signals. Key innovations include tensor-based factor computation acceleration, geometric Brownian motion data augmentation, and cross-sectional neutralization strategies. Empirical validation on Chinese A-share markets (2010-2024) demonstrates annualized returns of $20\%$ with Sharpe ratios exceeding 2.0, significantly outperforming traditional approaches. Our analysis reveals the critical importance of bias correction in factor construction and the substantial impact of cross-sectional portfolio optimization on strategy performance. Code and experimental implementations are available at: https://github.com/initial-d/ml-quant-trading
-
π 2025/05/27
- Classifying and Clustering Trading Agents
-
Mateusz Wilinski et al. 2505.21662v1 -
Abstract
The rapid development of sophisticated machine learning methods, together with the increased availability of financial data, has the potential to transform financial research, but also poses a challenge in terms of validation and interpretation. A good case study is the task of classifying financial investors based on their behavioral patterns. Not only do we have access to both classification and clustering tools for high-dimensional data, but also data identifying individual investors is finally available. The problem, however, is that we do not have access to ground truth when working with real-world data. This, together with often limited interpretability of modern machine learning methods, makes it difficult to fully utilize the available research potential. In order to deal with this challenge we propose to use a realistic agent-based model as a way to generate synthetic data. This way one has access to ground truth, large replicable data, and limitless research scenarios. Using this approach we show how, even when classifying trading agents in a supervised manner is relatively easy, a more realistic task of unsupervised clustering may give incorrect or even misleading results. We complete the results with investigating the details of how supervised techniques were able to successfully distinguish between different trading behaviors.
-
- Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market
-
Penggan Xu 2505.20608v1 -
Abstract
This study replicates the findings of Wang et al. (2017) on reference-dependent preferences and their impact on the risk-return trade-off in the Chinese stock market, a unique context characterized by high retail investor participation, speculative trading behavior, and regulatory complexities. Capital Gains Overhang (CGO), a proxy for unrealized gains or losses, is employed to explore how behavioral biases shape cross-sectional stock returns in an emerging market setting. Utilizing data from 1995 to 2024 and econometric techniques such as Dependent Double Sorting and Fama-MacBeth regressions, this research investigates the interaction between CGO and five risk proxies: Beta, Return Volatility (RETVOL), Idiosyncratic Volatility (IVOL), Firm Age (AGE), and Cash Flow Volatility (CFVOL). Key findings reveal a weaker or absent positive risk-return relationship among high-CGO firms and stronger positive relationships among low-CGO firms, diverging from U.S. market results, and the interaction effects between CGO and risk proxies, significant and positive in the U.S., are predominantly negative in the Chinese market, reflecting structural and behavioral differences, such as speculative trading and diminished reliance on reference points. The results suggest that reference-dependent preferences play a less pronounced role in the Chinese market, emphasizing the need for tailored investment strategies in emerging economies.
-
π 2025/05/09
- FlowHFT: Imitation Learning via Flow Matching Policy for Optimal High-Frequency Trading under Diverse Market Conditions
-
Yang Li et al. 2505.05784v3 -
Abstract
High-frequency trading (HFT) is an investing strategy that continuously monitors market states and places bid and ask orders at millisecond speeds. Traditional HFT approaches fit models with historical data and assume that future market states follow similar patterns. This limits the effectiveness of any single model to the specific conditions it was trained for. Additionally, these models achieve optimal solutions only under specific market conditions, such as assumptions about stock price's stochastic process, stable order flow, and the absence of sudden volatility. Real-world markets, however, are dynamic, diverse, and frequently volatile. To address these challenges, we propose the FlowHFT, a novel imitation learning framework based on flow matching policy. FlowHFT simultaneously learns strategies from numerous expert models, each proficient in particular market scenarios. As a result, our framework can adaptively adjust investment decisions according to the prevailing market state. Furthermore, FlowHFT incorporates a grid-search fine-tuning mechanism. This allows it to refine strategies and achieve superior performance even in complex or extreme market scenarios where expert strategies may be suboptimal. We test FlowHFT in multiple market environments. We first show that flow matching policy is applicable in stochastic market environments, thus enabling FlowHFT to learn trading strategies under different market conditions. Notably, our single framework consistently achieves performance superior to the best expert for each market condition.
-
π 2025/05/08
- Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer
-
Wenhao Guo et al. 2505.05595v1 -
Abstract
In the complex landscape of traditional futures trading, where vast data and variables like real-time Limit Order Books (LOB) complicate price predictions, we introduce the FutureQuant Transformer model, leveraging attention mechanisms to navigate these challenges. Unlike conventional models focused on point predictions, the FutureQuant model excels in forecasting the range and volatility of future prices, thus offering richer insights for trading strategies. Its ability to parse and learn from intricate market patterns allows for enhanced decision-making, significantly improving risk management and achieving a notable average gain of 0.1193% per 30-minute trade over state-of-the-art models with a simple algorithm using factors such as RSI, ATR, and Bollinger Bands. This innovation marks a substantial leap forward in predictive analytics within the volatile domain of futures trading.
-
π 2025/04/29
- ClusterLOB: Enhancing Trading Strategies by Clustering Orders in Limit Order Books
-
Yichi Zhang et al. 2504.20349v3 -
Abstract
In the rapidly evolving world of financial markets, understanding the dynamics of limit order book (LOB) is crucial for unraveling market microstructure and participant behavior. We introduce ClusterLOB as a method to cluster individual market events in a stream of market-by-order (MBO) data into different groups. To do so, each market event is augmented with six time-dependent features. By applying the K-means++ clustering algorithm to the resulting order features, we are then able to assign each new order to one of three distinct clusters, which we identify as directional, opportunistic, and market-making participants, each capturing unique trading behaviors. Our experimental results are performed on one year of MBO data containing small-tick, medium-tick, and large-tick stocks from NASDAQ. To validate the usefulness of our clustering, we compute order flow imbalances across each cluster within 30-minute buckets during the trading day. We treat each cluster's imbalance as a signal that provides insights into trading strategies and participants' responses to varying market conditions. To assess the effectiveness of these signals, we identify the trading strategy with the highest Sharpe ratio in the training dataset, and demonstrate that its performance in the test dataset is superior to benchmark trading strategies that do not incorporate clustering. We also evaluate trading strategies based on order flow imbalance decompositions across different market event types, including add, cancel, and trade events, to assess their robustness in various market conditions. This work establishes a robust framework for clustering market participant behavior, which helps us to better understand market microstructure, and inform the development of more effective predictive trading signals with practical applications in algorithmic trading and quantitative finance.
-
π 2025/04/15
- Can Large Language Models Trade? Testing Financial Theories with LLM Agents in Market Simulations
-
Alejandro Lopez-Lira 2504.10789v1 -
Abstract
This paper presents a realistic simulated stock market where large language models (LLMs) act as heterogeneous competing trading agents. The open-source framework incorporates a persistent order book with market and limit orders, partial fills, dividends, and equilibrium clearing alongside agents with varied strategies, information sets, and endowments. Agents submit standardized decisions using structured outputs and function calls while expressing their reasoning in natural language. Three findings emerge: First, LLMs demonstrate consistent strategy adherence and can function as value investors, momentum traders, or market makers per their instructions. Second, market dynamics exhibit features of real financial markets, including price discovery, bubbles, underreaction, and strategic liquidity provision. Third, the framework enables analysis of LLMs' responses to varying market conditions, similar to partial dependence plots in machine-learning interpretability. The framework allows simulating financial theories without closed-form solutions, creating experimental designs that would be costly with human participants, and establishing how prompts can generate correlated behaviors affecting market stability.
-
π 2025/04/10
- Trading Graph Neural Network
-
Xian Wu 2504.07923v1 -
Abstract
This paper proposes a new algorithm -- Trading Graph Neural Network (TGNN) that can structurally estimate the impact of asset features, dealer features and relationship features on asset prices in trading networks. It combines the strength of the traditional simulated method of moments (SMM) and recent machine learning techniques -- Graph Neural Network (GNN). It outperforms existing reduced-form methods with network centrality measures in prediction accuracy. The method can be used on networks with any structure, allowing for heterogeneity among both traders and assets.
-
π 2025/04/09
- Maximizing Battery Storage Profits via High-Frequency Intraday Trading
-
David Schaurecker et al. 2504.06932v3 -
Abstract
Maximizing revenue for grid-scale battery energy storage systems in continuous intraday electricity markets requires strategies that are able to seize trading opportunities as soon as new information arrives. This paper introduces and evaluates an automated high-frequency trading strategy for battery energy storage systems trading on the intraday market for power while explicitly considering the dynamics of the limit order book, market rules, and technical parameters. The standard rolling intrinsic strategy is adapted for continuous intraday electricity markets and solved using a dynamic programming approximation that is two to three orders of magnitude faster than an exact mixed-integer linear programming solution. A detailed backtest over a full year of German order book data demonstrates that the proposed dynamic programming formulation does not reduce trading profits and enables the policy to react to every relevant order book update, enabling realistic rapid backtesting. Our results show the significant revenue potential of high-frequency trading: our policy earns 58% more than when re-optimizing only once every hour and 14% more than when re-optimizing once per minute, highlighting that profits critically depend on trading speed. Furthermore, we leverage the speed of our algorithm to train a parametric extension of the rolling intrinsic, increasing yearly revenue by 8.4% out of sample.
-
π 2025/03/21
- China and G7 in the Current Context of the World Trading
-
N. S. Gonchar et al. 2503.17225v1 -
Abstract
The paper analyses trade between the most developed economies of the world. The analysis is based on the previously proposed model of international trade. This model of international trade is based on the theory of general economic equilibrium. The demand for goods in this model is built on the import of goods by each of the countries participating in the trade. The structure of supply of goods in this model is determined by the structure of exports of each country. It is proved that in such a model, given a certain structure of supply and demand, there exists a so-called ideal equilibrium state in which the trade balance of each country is zero. Under certain conditions on the structure of supply and demand, there is an equilibrium state in which each country have a strictly positive trade balance. Among the equilibrium states under a certain structure of supply and demand, there are some that differ from the ones described above. Such states are characterized by the fact that there is an inequitable distribution of income between the participants in the trade. Such states are called degenerate. In this paper, based on the previously proposed model of international trade, an analysis of the dynamics of international trade of 8 of the world's most developed economies is made. It is shown that trade between these countries was not in a state of economic equilibrium. The found relative equilibrium price vector turned out to be very degenerate, which indicates the unequal exchange of goods on the market of the 8 studied countries. An analysis of the dynamics of supply to the market of the world's most developed economies showed an increase in China's share. The same applies to the share of demand.
-
π 2025/03/20
- Practical Portfolio Optimization with Metaheuristics:Pre-assignment Constraint and Margin Trading
-
Hang Kin Poon 2503.15965v1 -
Abstract
Portfolio optimization is a critical area in finance, aiming to maximize returns while minimizing risk. Metaheuristic algorithms were shown to solve complex optimization problems efficiently, with Genetic Algorithms and Particle Swarm Optimization being among the most popular methods. This paper introduces an innovative approach to portfolio optimization that incorporates pre-assignment to limit the search space for investor preferences and better results. Additionally, taking margin trading strategies in account and using a rare performance ratio to evaluate portfolio efficiency. Through an illustrative example, this paper demonstrates that the metaheuristic-based methodology yields superior risk-adjusted returns compared to traditional benchmarks. The results highlight the potential of metaheuristics with help of assets filtering in enhancing portfolio performance in terms of risk adjusted return.
-
π 2025/03/13
- Label Unbalance in High-frequency Trading
-
Zijian Zhao et al. 2503.09988v3 -
Abstract
In financial trading, return prediction is one of the foundation for a successful trading system. By the fast development of the deep learning in various areas such as graphical processing, natural language, it has also demonstrate significant edge in handling with financial data. While the success of the deep learning relies on huge amount of labeled sample, labeling each time/event as profitable or unprofitable, under the transaction cost, especially in the high-frequency trading world, suffers from serious label imbalance issue.In this paper, we adopts rigurious end-to-end deep learning framework with comprehensive label imbalance adjustment methods and succeed in predicting in high-frequency return in the Chinese future market. The code for our method is publicly available at https://github.com/RS2002/Label-Unbalance-in-High-Frequency-Trading .
-
π 2025/03/12
- A Deep Reinforcement Learning Approach to Automated Stock Trading, using xLSTM Networks
-
Faezeh Sarlakifar et al. 2503.09655v1 -
Abstract
Traditional Long Short-Term Memory (LSTM) networks are effective for handling sequential data but have limitations such as gradient vanishing and difficulty in capturing long-term dependencies, which can impact their performance in dynamic and risky environments like stock trading. To address these limitations, this study explores the usage of the newly introduced Extended Long Short Term Memory (xLSTM) network in combination with a deep reinforcement learning (DRL) approach for automated stock trading. Our proposed method utilizes xLSTM networks in both actor and critic components, enabling effective handling of time series data and dynamic market environments. Proximal Policy Optimization (PPO), with its ability to balance exploration and exploitation, is employed to optimize the trading strategy. Experiments were conducted using financial data from major tech companies over a comprehensive timeline, demonstrating that the xLSTM-based model outperforms LSTM-based methods in key trading evaluation metrics, including cumulative return, average profitability per trade, maximum earning rate, maximum pullback, and Sharpe ratio. These findings mark the potential of xLSTM for enhancing DRL-based stock trading systems.
-
- Leveraging LLMS for Top-Down Sector Allocation In Automated Trading
-
Ryan Quek Wei Heng et al. 2503.09647v5 -
Abstract
This paper introduces a methodology leveraging Large Language Models (LLMs) for sector-level portfolio allocation through systematic analysis of macroeconomic conditions and market sentiment. Our framework emphasizes top-down sector allocation by processing multiple data streams simultaneously, including policy documents, economic indicators, and sentiment patterns. Empirical results demonstrate superior risk-adjusted returns compared to traditional cross momentum strategies, achieving a Sharpe ratio of 2.51 and portfolio return of 8.79% versus -0.61 and -1.39% respectively. These results suggest that LLM-based systematic macro analysis presents a viable approach for enhancing automated portfolio allocation decisions at the sector level.
-
π 2025/03/07
- Multi-asset optimal trade execution with stochastic cross-effects: An Obizhaeva-Wang-type framework
-
Julia Ackermann et al. 2503.05594v2 -
Abstract
We analyze a continuous-time optimal trade execution problem in multiple assets where the price impact and the resilience can be matrix-valued stochastic processes that incorporate cross-impact effects. In addition, we allow for stochastic terminal and running targets. Initially, we formulate the optimal trade execution task as a stochastic control problem with a finite-variation control process that acts as an integrator both in the state dynamics and in the cost functional. We then extend this problem continuously to a stochastic control problem with progressively measurable controls. By identifying this extended problem as equivalent to a certain linear-quadratic stochastic control problem, we can use established results in linear-quadratic stochastic control to solve the extended problem. This work generalizes [Ackermann, Kruse, Urusov; FinancStoch'24] from the single-asset setting to the multi-asset case. In particular, we reveal cross-hedging effects, showing that it can be optimal to trade in an asset despite having no initial position. Moreover, as a subsetting we discuss a multi-asset variant of the model in [Obizhaeva, Wang; JFinancMark'13].
-
π 2025/03/06
- Risk-aware Trading Portfolio Optimization
-
Marco Bianchetti et al. 2503.04662v1 -
Abstract
We investigate portfolio optimization in financial markets from a trading and risk management perspective. We term this task Risk-Aware Trading Portfolio Optimization (RATPO), formulate the corresponding optimization problem, and propose an efficient Risk-Aware Trading Swarm (RATS) algorithm to solve it. The key elements of RATPO are a generic initial portfolio P, a specific set of Unique Eligible Instruments (UEIs), their combination into an Eligible Optimization Strategy (EOS), an objective function, and a set of constraints. RATS searches for an optimal EOS that, added to P, improves the objective function repecting the constraints. RATS is a specialized Particle Swarm Optimization method that leverages the parameterization of P in terms of UEIs, enables parallel computation with a large number of particles, and is fully general with respect to specific choices of the key elements, which can be customized to encode financial knowledge and needs of traders and risk managers. We showcase two RATPO applications involving a real trading portfolio made of hundreds of different financial instruments, an objective function combining both market risk (VaR) and profit&loss measures, constrains on market sensitivities and UEIs trading costs. In the case of small-sized EOS, RATS successfully identifies the optimal solution and demonstrates robustness with respect to hyper-parameters tuning. In the case of large-sized EOS, RATS markedly improves the portfolio objective value, optimizing risk and capital charge while respecting risk limits and preserving expected profits. Our work bridges the gap between the implementation of effective trading strategies and compliance with stringent regulatory and economic capital requirements, allowing a better alignment of business and risk management objectives.
-
π 2025/03/04
- To Hedge or Not to Hedge: Optimal Strategies for Stochastic Trade Flow Management
-
Philippe Bergault et al. 2503.02496v1 -
Abstract
This paper addresses the trade-off between internalisation and externalisation in the management of stochastic trade flows. We consider agents who must absorb flows and manage risk by deciding whether to warehouse it or hedge in the market, thereby incurring transaction costs and market impact. Unlike market makers, these agents cannot skew their quotes to attract offsetting flows and deter risk-increasing ones, leading to a fundamentally different problem. Within the Almgren-Chriss framework, we derive almost-closed-form solutions in the case of quadratic execution costs, while more general cases require numerical methods. In particular, we discuss the challenges posed by artificial boundary conditions when using classical grid-based numerical PDE techniques and propose reinforcement learning methods as an alternative.
-