Armandpl / skyline

code I wrote to win the 2023 Renault Digital 1/10th roborace

Home Page: https://twitter.com/armand_dpl/status/1670922434445291521

train nn for trajectory planning

Armandpl opened this issue

  • setup new model
    • speed should be an input to the last layer
    • predicted traj should include x,y coords as well as speed.
      • think about how to normalize. the rl script should output real values I think, e.g. deg for steering and m/s, based on the model
    • add tanh or sigmoid or smth to help (see the sketch right after this list)
  • write code to load the trajectories
  • write code to project trajectories on the images
  • add augmentations, start with basic ones
  • investigate other models. e.g efficientnet
    • maybe there are lighter and faster models
    • probably stick with torchvision, no need for smth too complicated
  • after setting up the new cam position and choosing a crop, record video on the local track
    • use nvargus, probably easier and cleaner than python script
    • this will be our test set to viz the model prediction
  • at the end (or even during?) of training, plot model predictions on this video
    • choose where to store camera params, maybe this should be the output of the blender script
      • maybe the blender script should output those where it outputs the images
  • fine tune stable diffusion to make sim images look like real images using control net
    • add augs in blender
    • upload dataset on wandb for reproducibility
    • add classical augs
  • train one nn with SD augs, one without, and maybe one with only augmented images
  • for each trajectory, add small rotation + offset to the camera
  • check if there are other NNs that could work on the jetson
  • check if we could add a GRU and still get decent speed
  • make the max speed of the car in sim a speed we can actually safely use for testing in real life
  • add tanh to the output of the neural net
  • re-project real images to be as if the camera is in the new position, to get a test set that matches sim
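
To make the normalization item above concrete, here is a minimal sketch of a scaled output head, assuming the net ends with a tanh; the steering/speed limits below are placeholders, the real ones depend on the car:

import torch
import torch.nn as nn

# hypothetical physical limits; the real ones depend on the car
MAX_STEERING_DEG = 25.0
MAX_SPEED_MS = 3.0

class ScaledHead(nn.Module):
    """Map raw features to physical units (deg, m/s) through a tanh."""

    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        steering, speed = torch.tanh(self.fc(x)).unbind(dim=-1)
        steering_deg = steering * MAX_STEERING_DEG        # [-25, 25] deg
        speed_ms = (speed + 1) / 2 * MAX_SPEED_MS         # [0, 3] m/s
        return torch.stack([steering_deg, speed_ms], dim=-1)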

Trajectories depend on the car speed, which the nn can't figure out from a single image. We could add a GRU or feed two frames to the model.
A less costly approach (in terms of compute) would be to feed the speed at t-1 in sim, and the measured speed in real life, to the last FC layer of the network.

We could then viz predicted trajectories for a range of speeds and compare losses with and without the speed info
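
A minimal sketch of what feeding the speed to the last FC layer could look like, assuming a torchvision resnet18 backbone; the head size and number of waypoints are illustrative:

import torch
import torch.nn as nn
from torchvision.models import resnet18

class TrajectoryNet(nn.Module):
    """ResNet18 features + current speed concatenated before the final FC layer."""

    def __init__(self, n_waypoints: int = 10):
        super().__init__()
        backbone = resnet18()
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc
        self.head = nn.Linear(512 + 1, n_waypoints * 3)  # (x, y, speed) per step

    def forward(self, image: torch.Tensor, speed: torch.Tensor) -> torch.Tensor:
        feats = self.features(image).flatten(1)            # (B, 512)
        feats = torch.cat([feats, speed.unsqueeze(1)], 1)  # append the speed scalar
        return self.head(feats).view(image.shape[0], -1, 3)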

put model back on cpu after training, see if it fixes tensor rt not knowing about mps

Act as an excellent engineer, the type that can write haskell and cuda kernels but also Python, the type that manages to write clear, readable code and communicate about it. Also never forget I believe in you <3.

I need you to write me a torch.utils.data.Dataset class to load my custom dataset.

  • The dataset is a set of images stored in root_dir/images, numbered from 0000.png to 9999.png when len(dataset) == 10_000.
  • These images are accompanied by labels stored in root_dir/rl_trajectories.txt which is a numpy array of shape (10000, 7). Each row is (car.pos_x, car.pos_y, car.yaw, steering_command, speed_command, end_of_sequence). The car position is stored in a global frame
  • For each image I'd like to know the future trajectory (x, y positions) for the N next steps, in the car frame
    • this means you'll need to split the trajectory list using the end_of_sequence boolean flag, to make sure you don't return the trajectory coordinates of the next sequence
    • you will also need to rotate the trajectory coordinates using the car yaw at the current step

Please ask any clarifying questions before generating the code
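
For reference, a sketch of what such a Dataset could look like under the assumptions above; the column indices, the use of np.loadtxt and the behavior at sequence boundaries are guesses, not the final implementation:

import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset
from torchvision.io import read_image

class TrajectoryDataset(Dataset):
    def __init__(self, root_dir: str, horizon: int = 10):
        self.root_dir = Path(root_dir)
        self.horizon = horizon
        # (len(dataset), 7) array; last column is the end_of_sequence flag
        self.labels = np.loadtxt(self.root_dir / "rl_trajectories.txt")

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = read_image(str(self.root_dir / "images" / f"{idx:04d}.png"))

        # only look ahead until the current sequence ends
        end = idx
        while end < len(self.labels) - 1 and not self.labels[end, -1]:
            end += 1
        future = self.labels[idx + 1 : min(idx + 1 + self.horizon, end + 1), :2]

        # express the future positions in the car frame at the current step
        x, y, yaw = self.labels[idx, :3]
        rot = np.array([[np.cos(-yaw), -np.sin(-yaw)],
                        [np.sin(-yaw),  np.cos(-yaw)]])
        traj = (future - np.array([x, y])) @ rot.T

        return {"image": image, "trajectory": torch.from_numpy(traj).float()}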

I need you to write code to project a trajectory in 3d space (a set of points) relative to a camera onto the image. Use the following function to project the points:

def project_points(point_3d: torch.Tensor, camera_matrix: torch.Tensor) -> torch.Tensor:
    r"""Project a 3d point onto the 2d camera plane.

    Args:
        point3d: tensor containing the 3d points to be projected
            to the camera plane. The shape of the tensor can be :math:`(*, 3)`.
        camera_matrix: tensor containing the intrinsics camera
            matrix. The tensor shape must be :math:`(*, 3, 3)`.

    Returns:
        tensor of (u, v) cam coordinates with shape :math:`(*, 2)`.

    Example:
        >>> _ = torch.manual_seed(0)
        >>> X = torch.rand(1, 3)
        >>> K = torch.eye(3)[None]
        >>> project_points(X, K)
        tensor([[5.6088, 8.6827]])
    """
    # projection eq. [u, v, w]' = K * [x y z 1]'
    # u = fx * X / Z + cx
    # v = fy * Y / Z + cy
    # project back using depth dividing in a safe way
    xy_coords: torch.Tensor = convert_points_from_homogeneous(point_3d)
    return denormalize_points_with_intrinsics(xy_coords, camera_matrix)

Please ask any clarifying questions, then finish writing the following code:

# plot trajectory on the images
cam_offset = (0.105, 0, 0.170) # x, y, z in meter from center of mass. x is forward
cam_rotation = (0, 12, 0) # roll, pitch, yaw in deg
cam_focal = 0.87 # mm

sample = ds[0]
img = ds[0]["image"] # torch tensor
traj = ds[0]["trajectory"] # (n, 2) x, y coordinates relative to car center of mass

# plot the trajectory on the image

# 1. transform the points from the car frame to the camera frame using cam_offset and cam_rotation
# WRITE CODE here

# 2. project the 3d points onto the image and plot them
# WRITE CODE here
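
One possible way to fill in the two steps, sketched under several assumptions: the image size, pixel pitch (and therefore the intrinsics matrix), the car/camera axis conventions and the sign of the pitch rotation are all guesses that would need checking against the sim:

import math
import torch
import matplotlib.pyplot as plt

# hypothetical image size and sensor pixel pitch, used to build the intrinsics
IMG_W, IMG_H = 640, 480
PIXEL_SIZE_MM = 0.003
f_px = cam_focal / PIXEL_SIZE_MM
K = torch.tensor([[f_px, 0.0, IMG_W / 2],
                  [0.0, f_px, IMG_H / 2],
                  [0.0, 0.0, 1.0]])

# 1. car frame (x forward, y left, z up) -> camera frame (x right, y down, z forward)
points = torch.cat([traj.float(), torch.zeros(traj.shape[0], 1)], dim=1)  # z=0: ground plane
points = points - torch.tensor(cam_offset)  # translate to the camera origin

pitch = math.radians(cam_rotation[1])  # rotate by the camera pitch around the y axis
R_pitch = torch.tensor([[ math.cos(pitch), 0.0, math.sin(pitch)],
                        [ 0.0,             1.0, 0.0            ],
                        [-math.sin(pitch), 0.0, math.cos(pitch)]])
points = points @ R_pitch.T

# axis swap to the usual camera convention: cam x = -car y, cam y = -car z, cam z = car x
points_cam = torch.stack([-points[:, 1], -points[:, 2], points[:, 0]], dim=1)

# 2. project the 3d points onto the image and plot them
uv = project_points(points_cam, K[None])
plt.imshow(img.permute(1, 2, 0))
plt.scatter(uv[:, 0], uv[:, 1], s=4, c="red")
plt.show()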

would be nice to do a sweep to benchmark inference speed of different models, to see what we could use beyond resnet18
[W&B chart: inference speed benchmark across models, 5/25/2023]
looks like resnet18 is still a good choice, though:

  • i should try the other networks available in torchvision 0.11
  • maybe i don't need them to be implemented in the torchvision version I use? just export them to onnx then convert to trt?
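
A rough sketch of the onnx route mentioned above, assuming a stock torchvision model; the trtexec flags are the standard ones shipped with TensorRT on the jetson:

import torch
from torchvision.models import mobilenet_v3_small

model = mobilenet_v3_small().eval()
dummy = torch.randn(1, 3, 224, 224)  # match the input size actually used on the car
torch.onnx.export(model, dummy, "model.onnx", opset_version=11,
                  input_names=["image"], output_names=["out"])

# then on the jetson, roughly:
#   trtexec --onnx=model.onnx --saveEngine=model.trt --fp16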

Ok so traj prediction in sim seems alright but bad on real images. I think one issue is I didn't sample enough "recovery trajectories": trajectories that go from a bad state back to the optimal trajectory. One reason for that is that I terminate the episode if the car has even one wheel outside the track, making it impossible to recover from harder cases, since that would require lightly crossing the lines. However, I can't just allow wheels outside the track as is, because sometimes there are obstacles outside the track.

  • configure env with hydra
    • framerate, track files
  • save model to wandb
  • add a train.yaml w/ few params
  • allow n wheels outside the track
    • still enforce center of mass inside
    • add optional obstacle file, render the obstacles
    • add obstacle lidar
    • add max distance to lidar, maybe normalize??
  • we need the obstacles to be visible in blender:
    • add cones from arc centers when parsing the dxf. did it manually for now, could parse it from the obstacles file mayyybe?
    • add augmentation: hide cones sometimes?
  • allow fixed speed
  • modify gen_traj to load model from artifact, instantiate from logged config and log traj to wandb
    • also modify run_model to do the same. override render_mode

Act as an excellent engineer and never forget I believe in you.

I am writing a Gym wrapper for my Reinforcement Learning env. Here is a draft of the code:

class RescaleWrapper(gym.Wrapper):
    """Rescale observation and action space between -1 and 1"""

    def __init__(self, env: gym.Env):
        super().__init__(env)

        self.observation_space = Box(
            low=np.zeros_like(self.env.observation_space.low) - 1,
            high=np.zeros_like(self.env.observation_space.high) + 1,
        )
        self.action_space = Box(
            low=np.zeros_like(self.env.action_space.low) - 1,
            high=np.zeros_like(self.env.action_space.high) + 1,
        )

    def step(self, action):
        # Clip and rescale action, using self.env.action_space.low/high
        # CODE HERE
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Clip and rescale observation, using self.env.observation_space.low/high
        # CODE HERE
        return obs, reward, terminated, truncated, info

I want it to clip and rescale the observations and actions.
e.g if obs = [1, 15] and min obs = [0, 0] max obs = [1, 10], obs should be rescaled to [1, 1]. Please ask any clarifying questions and finish writing the code (replace # CODE HERE)
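
A possible completion of the two # CODE HERE blocks, assuming the gymnasium-style 5-tuple step API used in the draft; a sketch, not necessarily the final version:

import gymnasium as gym  # or plain `gym`, whichever version the project uses
import numpy as np
from gymnasium.spaces import Box

class RescaleWrapper(gym.Wrapper):
    """Rescale observation and action space between -1 and 1."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self.observation_space = Box(low=-np.ones_like(env.observation_space.low),
                                     high=np.ones_like(env.observation_space.high))
        self.action_space = Box(low=-np.ones_like(env.action_space.low),
                                high=np.ones_like(env.action_space.high))

    def step(self, action):
        # the agent outputs actions in [-1, 1]; map them back to the env's real range
        low, high = self.env.action_space.low, self.env.action_space.high
        action = low + (np.clip(action, -1.0, 1.0) + 1.0) / 2.0 * (high - low)
        obs, reward, terminated, truncated, info = self.env.step(action)
        # clip the raw observation to its bounds, then squash it to [-1, 1]
        low, high = self.env.observation_space.low, self.env.observation_space.high
        obs = 2.0 * (np.clip(obs, low, high) - low) / (high - low) - 1.0
        return obs, reward, terminated, truncated, info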

  • warping the traj in the viz to account for the fisheye shouldn't be too hard and would possibly make the viz better, do it
  • don't mirror the traj for viz, do the correct reference frame change for the projection
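
For the fisheye warp, one option is OpenCV's fisheye model; the intrinsics and distortion coefficients below are placeholders and would come from calibrating the real camera:

import cv2
import numpy as np

# placeholder intrinsics and fisheye coefficients (k1..k4)
K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
D = np.array([0.1, -0.05, 0.0, 0.0])

# trajectory points already expressed in the camera frame, shape (n, 3)
points_cam = np.array([[0.0, 0.2, 1.0],
                       [0.1, 0.2, 2.0],
                       [0.2, 0.2, 3.0]])

rvec = tvec = np.zeros(3)  # identity pose: points are already in the camera frame
uv, _ = cv2.fisheye.projectPoints(points_cam.reshape(1, -1, 3), rvec, tvec, K, D)
uv = uv.reshape(-1, 2)  # distorted pixel coordinates to draw on the image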