RC Car Autonomous Navigation β€” SAC (Camera + LiDAR)

A Soft Actor-Critic (SAC) agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both camera images and LiDAR sensor data as observations. The agent learns to reach target positions while avoiding obstacles.


Model Description

This model uses a MultiInputPolicy with a hybrid perception backbone:

  • Visual stream β€” RGB camera frames processed by a CNN (NatureCNN)
  • Sensor stream β€” LiDAR point cloud + navigation state processed by an MLP

Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning.

Property Value
Algorithm Soft Actor-Critic (SAC)
Policy MultiInputPolicy
Observation Dict β€” image (64Γ—64Γ—3) + sensor vector (184,)
Action Space Box([-1, -1], [1, 1]) β€” speed & steering
Simulator Gazebo (Ignition/Harmonic) via ROS 2
Framework Stable-Baselines3

Environments

Two training environments are available:

RcCarTargetEnv

The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles.

RcCarComplexEnv

Same goal-reaching task but with 6 randomly placed box obstacles that are reshuffled every episode, requiring active collision avoidance.


Observation Space

spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
    "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32)
})

The sensor vector contains:

  • [0:180] β€” Normalised LiDAR ranges (180 beams, max range 10 m)
  • [180] β€” Normalised linear speed
  • [181] β€” Normalised steering angle
  • [182] β€” Normalised distance to target (clipped at 10 m)
  • [183] β€” Normalised relative angle to target

Action Space

spaces.Box(low=[-1.0, -1.0], high=[1.0, 1.0], dtype=np.float32)
Index Meaning Scale
action[0] Linear speed Γ— 1.0 m/s
action[1] Steering angle Γ— 0.6 rad/s

Steering is smoothed with a low-pass filter: steer = 0.6 Γ— prev + 0.4 Γ— target.


Reward Function

RcCarTargetEnv

Event Reward
Progress toward target Ξ”distance Γ— 40.0
Reached target (< 0.6 m) +100.0
Collision (LiDAR < 0.22 m) βˆ’50.0
Per-step penalty βˆ’0.05

RcCarComplexEnv

Event Reward
Progress toward target Ξ”distance Γ— 40.0
Forward speed bonus (on progress) +speed Γ— 0.5
Proximity warning (LiDAR < 0.5 m) βˆ’0.5
Collision βˆ’50.0
Reached target +100.0
Per-step penalty βˆ’0.1

Training Setup

model = SAC(
    "MultiInputPolicy",
    env,
    learning_rate=3e-4,
    buffer_size=50000,
    policy_kwargs=dict(
        net_arch=dict(pi=[256, 256], qf=[256, 256])
    ),
    device="auto"
)
  • Action repeat: 4 steps per agent decision
  • Frame stacking: configurable via Hydra config (n_stack)
  • Vectorised env: DummyVecEnv + VecFrameStack (channels_order="last")
  • Experiment tracking: Weights & Biases (W&B) with SB3 callback

Hardware & Software Requirements

Component Requirement
ROS 2 Humble or newer
Gazebo Ignition Fortress / Harmonic
Python 3.10+
PyTorch 2.0+
stable-baselines3 β‰₯ 2.0
gymnasium β‰₯ 0.29
opencv-python any recent
cv_bridge ROS 2 package

How to Use

1. Install dependencies

pip install stable-baselines3 wandb hydra-core gymnasium opencv-python

2. Launch the simulator

ros2 launch my_bot_pkg sim.launch.py

3. Run training

python train.py experiment.mode=target experiment.total_timesteps=500000

4. Load and run inference

from stable_baselines3 import SAC
from rc_car_envs_camera import RcCarTargetEnv

env = RcCarTargetEnv()
model = SAC.load("sac_target_camera_final", env=env)

obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

Project Structure

β”œβ”€β”€ rc_car_envs_camera.py   # Gym environments (Base, Target, Complex)
β”œβ”€β”€ train.py                # Hydra-based training entry point
β”œβ”€β”€ configs/
β”‚   └── config.yaml         # Hydra config (mode, timesteps, wandb, etc.)
└── models/                 # Saved checkpoints (W&B)

Limitations & Known Issues

  • Training requires a live ROS 2 + Gazebo session; no offline/headless mode currently.
  • DummyVecEnv runs a single environment β€” parallelisation would require SubprocVecEnv with careful ROS node naming.
  • Camera latency under heavy load may cause the scan_received / cam_received wait loop to time out, potentially delivering stale observations.
  • The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries.

Citation

If you use this environment or training code in your research, please cite:

@misc{rccar_sac_nav,
  title  = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)},
  year   = {2025},
  url    = {https://huggingface.co/Hajorda/SAC_Complex_Camera}
}

License

MIT License

Downloads last month
32
Video Preview
loading