RC Car Autonomous Navigation — SAC (Camera + LiDAR)

A Soft Actor-Critic (SAC) agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both camera images and LiDAR sensor data as observations. The agent learns to reach target positions while avoiding obstacles.

Model Description

This model uses a MultiInputPolicy with a hybrid perception backbone:

Visual stream — RGB camera frames processed by a CNN (NatureCNN)
Sensor stream — LiDAR point cloud + navigation state processed by an MLP

Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning.

Property	Value
Algorithm	Soft Actor-Critic (SAC)
Policy	`MultiInputPolicy`
Observation	`Dict` — image `(64×64×3)` + sensor vector `(184,)`
Action Space	`Box([-1, -1], [1, 1])` — speed & steering
Simulator	Gazebo (Ignition/Harmonic) via ROS 2
Framework	Stable-Baselines3

Environments

Two training environments are available:

`RcCarTargetEnv`

The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles.

`RcCarComplexEnv`

Same goal-reaching task but with 6 randomly placed box obstacles that are reshuffled every episode, requiring active collision avoidance.

Observation Space

spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
    "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32)
})

The sensor vector contains:

[0:180] — Normalised LiDAR ranges (180 beams, max range 10 m)
[180] — Normalised linear speed
[181] — Normalised steering angle
[182] — Normalised distance to target (clipped at 10 m)
[183] — Normalised relative angle to target

Action Space

spaces.Box(low=[-1.0, -1.0], high=[1.0, 1.0], dtype=np.float32)

Index	Meaning	Scale
`action[0]`	Linear speed	× 1.0 m/s
`action[1]`	Steering angle	× 0.6 rad/s

Steering is smoothed with a low-pass filter: steer = 0.6 × prev + 0.4 × target.

Reward Function

`RcCarTargetEnv`

Event	Reward
Progress toward target	`Δdistance × 40.0`
Reached target (< 0.6 m)	`+100.0`
Collision (LiDAR < 0.22 m)	`−50.0`
Per-step penalty	`−0.05`

`RcCarComplexEnv`

Event	Reward
Progress toward target	`Δdistance × 40.0`
Forward speed bonus (on progress)	`+speed × 0.5`
Proximity warning (LiDAR < 0.5 m)	`−0.5`
Collision	`−50.0`
Reached target	`+100.0`
Per-step penalty	`−0.1`

Training Setup

model = SAC(
    "MultiInputPolicy",
    env,
    learning_rate=3e-4,
    buffer_size=50000,
    policy_kwargs=dict(
        net_arch=dict(pi=[256, 256], qf=[256, 256])
    ),
    device="auto"
)

Action repeat: 4 steps per agent decision
Frame stacking: configurable via Hydra config (n_stack)
Vectorised env: DummyVecEnv + VecFrameStack (channels_order="last")
Experiment tracking: Weights & Biases (W&B) with SB3 callback

Hardware & Software Requirements

Component	Requirement
ROS 2	Humble or newer
Gazebo	Ignition Fortress / Harmonic
Python	3.10+
PyTorch	2.0+
stable-baselines3	≥ 2.0
gymnasium	≥ 0.29
opencv-python	any recent
cv_bridge	ROS 2 package

How to Use

1. Install dependencies

pip install stable-baselines3 wandb hydra-core gymnasium opencv-python

2. Launch the simulator

ros2 launch my_bot_pkg sim.launch.py

3. Run training

python train.py experiment.mode=target experiment.total_timesteps=500000

4. Load and run inference

from stable_baselines3 import SAC
from rc_car_envs_camera import RcCarTargetEnv

env = RcCarTargetEnv()
model = SAC.load("sac_target_camera_final", env=env)

obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

Project Structure

├── rc_car_envs_camera.py   # Gym environments (Base, Target, Complex)
├── train.py                # Hydra-based training entry point
├── configs/
│   └── config.yaml         # Hydra config (mode, timesteps, wandb, etc.)
└── models/                 # Saved checkpoints (W&B)

Limitations & Known Issues

Training requires a live ROS 2 + Gazebo session; no offline/headless mode currently.
DummyVecEnv runs a single environment — parallelisation would require SubprocVecEnv with careful ROS node naming.
Camera latency under heavy load may cause the scan_received / cam_received wait loop to time out, potentially delivering stale observations.
The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries.

Citation

If you use this environment or training code in your research, please cite:

@misc{rccar_sac_nav,
  title  = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)},
  year   = {2025},
  url    = {https://huggingface.co/Hajorda/SAC_Complex_Camera}
}

License

MIT License

Downloads last month: 32

Video Preview

Reinforcement Learning