---
library_name: stable-baselines3
tags:
- reinforcement-learning
- finance
- stock-trading
- deep-reinforcement-learning
- dqn
- ppo
- a2c
model-index:
- name: RL-Trading-Agents
  results:
  - task:
      type: reinforcement-learning
      name: Stock Trading
    metrics:
    - type: sharpe_ratio
      value: Variable
    - type: total_return
      value: Variable
---

# Multi-Agent Reinforcement Learning Trading System

This repository contains trained deep reinforcement learning agents for automated stock trading. The agents were trained with `stable-baselines3` on a custom Gymnasium (OpenAI Gym) environment simulating the US stock market (AAPL, MSFT, GOOGL).

## Models

The following algorithms were used:

1. **DQN (Deep Q-Network)**: Off-policy, value-based algorithm suited to discrete action spaces.
2. **PPO (Proximal Policy Optimization)**: On-policy policy-gradient method known for its stability.
3. **A2C (Advantage Actor-Critic)**: Synchronous actor-critic policy-gradient method (the synchronous variant of A3C).
4. **Ensemble**: A meta-voter that takes the majority decision of the three agents above.
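The card does not spell out the meta-voter's tie-breaking rule; below is a minimal sketch of majority voting over the three agents' discrete actions, assuming a HOLD fallback when all three disagree:

```python
from collections import Counter

def ensemble_action(actions):
    """Majority vote over per-agent actions (0 = HOLD, 1 = BUY, 2 = SELL).

    Falls back to HOLD when all three agents disagree (assumed tie-break).
    """
    action, votes = Counter(actions).most_common(1)[0]
    return action if votes >= 2 else 0

# e.g. DQN and PPO vote BUY, A2C votes SELL -> the ensemble buys
ensemble_action([1, 1, 2])  # -> 1
```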

## Training Data

The models were trained on technical indicators derived from historical daily price data (2018-2024):

* **Returns**: Daily percentage change.
* **RSI (14)**: Relative Strength Index.
* **MACD**: Moving Average Convergence Divergence.
* **Bollinger Bands**: Volatility measure.
* **Volume Ratio**: Relative volume intensity.
* **Market Regime**: Bull/bear trend classification.
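The exact feature pipeline lives in the dataset/GitHub repositories; a sketch of how the price-derived indicators above are commonly computed with pandas, where the window lengths and column names (`Close`, `Volume`) are illustrative assumptions:

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the listed features from daily 'Close' and 'Volume' columns."""
    out = df.copy()
    out["returns"] = out["Close"].pct_change()

    # RSI(14): smoothed gains vs. smoothed losses, scaled to 0-100
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # MACD: 12-period EMA minus 26-period EMA of the close
    ema12 = out["Close"].ewm(span=12, adjust=False).mean()
    ema26 = out["Close"].ewm(span=26, adjust=False).mean()
    out["macd"] = ema12 - ema26

    # Bollinger Bands: 20-day SMA +/- 2 rolling standard deviations
    sma20 = out["Close"].rolling(20).mean()
    std20 = out["Close"].rolling(20).std()
    out["bb_upper"] = sma20 + 2 * std20
    out["bb_lower"] = sma20 - 2 * std20

    # Volume ratio: today's volume relative to its 20-day average
    out["volume_ratio"] = out["Volume"] / out["Volume"].rolling(20).mean()
    return out
```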

## Related Data

* **Dataset Repository**: [AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data](https://huggingface.co/AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data)
* **GitHub Repository**: [ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data](https://github.com/ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data)

## Environment (`TradingEnv`)

* **Action Space**: `Discrete(3)` - `0: HOLD`, `1: BUY`, `2: SELL`.
* **Observation Space**: `Box(10,)` - normalized technical features plus portfolio state.
* **Reward**: Profit and loss (PnL) minus transaction costs and drawdown penalties.
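The environment implementation itself is not included on this card; the following plain-Python sketch mirrors the interface above (a real implementation would subclass `gymnasium.Env` and declare `spaces.Discrete(3)` / `spaces.Box(shape=(10,))`). The feature split (8 market features + position + PnL), transaction cost, and drawdown weight are all assumptions:

```python
import numpy as np

class TradingEnvSketch:
    """Minimal single-asset environment with the spec above (illustrative)."""

    HOLD, BUY, SELL = 0, 1, 2

    def __init__(self, features, prices, cost=0.001, dd_penalty=0.1):
        self.features = np.asarray(features, dtype=np.float32)  # (T, 8) market features
        self.prices = np.asarray(prices, dtype=np.float32)      # (T,) close prices
        self.cost = cost                # proportional transaction cost (assumed)
        self.dd_penalty = dd_penalty    # drawdown penalty weight (assumed)

    def _obs(self):
        # 8 normalized market features + 2 portfolio features -> Box(10,)
        return np.concatenate(
            [self.features[self.t], [self.position, self.pnl]]
        ).astype(np.float32)

    def reset(self):
        self.t, self.position, self.pnl, self.peak = 0, 0.0, 0.0, 0.0
        return self._obs(), {}

    def step(self, action):
        prev_price = self.prices[self.t]
        fee = 0.0
        if action == self.BUY and self.position == 0:
            self.position, fee = 1.0, self.cost
        elif action == self.SELL and self.position == 1:
            self.position, fee = 0.0, self.cost
        self.t += 1
        step_pnl = self.position * (self.prices[self.t] - prev_price) / prev_price
        self.pnl += step_pnl - fee
        self.peak = max(self.peak, self.pnl)
        # Reward: PnL minus transaction costs and a drawdown penalty
        reward = step_pnl - fee - self.dd_penalty * (self.peak - self.pnl)
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}
```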

## Usage

```python
from stable_baselines3 import PPO

# Build the custom environment (the TradingEnv wrapper from this
# repository is required; `df` holds the ticker's price/indicator data)
env = TradingEnv(df)

# Load the trained model
model = PPO.load("ppo_AAPL.zip")

# Get the initial observation and predict an action
obs, _ = env.reset()
action, _states = model.predict(obs, deterministic=True)
```

## Performance

Performance varies by ticker and market conditions. See the generated `results/` CSVs for per-agent Sharpe ratio and maximum drawdown statistics.
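For reference, both headline metrics can be recomputed from a daily-return series; a minimal sketch, assuming 252-day annualization and a zero risk-free rate by default (the `results/` CSVs remain the authoritative numbers):

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio of a daily return series."""
    excess = np.asarray(daily_returns) - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity = np.cumprod(1 + np.asarray(daily_returns))
    peaks = np.maximum.accumulate(equity)
    return ((equity - peaks) / peaks).min()
```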

## Credits

Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project.
|