KTH-RPL / DeFlow

[ICRA'24] DeFlow: Decoder of Scene Flow Network in Autonomous Driving

Inference on nuScenes dataset

Iriswwhy opened this issue · comments

Hi,
I want to use the DeFlow module to compute the flow between two point cloud frames of the nuScenes dataset (2 Hz, since the annotated keyframes are at 2 Hz). I have obtained the computed flow, but the result is not correct. Is it because 2 Hz is too slow?
The parameters used are:
voxel_size = [0.2, 0.2, 8] (I also tried voxel_size 0.1 and 0.4, but the result is still not correct)
grid_feature_size = [512, 512]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
num_iters = 2 / 4

Thank you, looking forward to your reply.

The pre-trained model is trained on 10 Hz Argoverse 2. If you run inference on nuScenes at 10 Hz (which I tried), it's still good enough without any training.

Thank you very much! I will try 10 Hz nuScenes data.

image
I don't know what is going wrong... The red one is pc0, the green one is pc1, and the blue one is pc0 + flows. It seems the flow is also added to places that should be considered static. In the image above, where there are dynamic points, the flow moves them toward the right place, but they are not yet completely aligned with pc1.

I think I need your code for processing the nuScenes data to tell the reason.

Thank you, the main code is shown below.

In class Nuscenes_pt(data.Dataset):
def __getitem__(self, index):
    info = self.nusc_infos[index]  # for pc1
    lidar_sd_token = self.nusc.get('sample', info['token'])['data']['LIDAR_TOP']  # for pc1

    # Get the lidar_sd_token of the sweep two frames back (10 Hz interval)
    prev_token = self.nusc.get('sample_data', lidar_sd_token)['prev']
    prev_prev_token = self.nusc.get('sample_data', prev_token)['prev']  # for pc0

    # Get the lidar_channel (sample_data) records for pc1 and pc0
    lidar_token = lidar_sd_token
    lidar_token_prev = prev_prev_token
    lidar_channel = self.nusc.get("sample_data", lidar_token)
    lidar_channel_prev = self.nusc.get("sample_data", lidar_token_prev)

    if self.version == "v1.0-trainval":
        lidar_path = info['lidar_path'][16:]
        lidar_path_prev = lidar_channel_prev['filename']
    elif self.version == "v1.0-mini":
        lidar_path = info['lidar_path'][44:]
        lidar_path_prev = lidar_channel_prev['filename']
        # lidar_path_prev = info_prev['lidar_path'][44:]
    elif self.version == "v1.0-test":
        lidar_path = info['lidar_path'][16:]
        lidar_path_prev = lidar_channel_prev['filename']
        
    points = np.fromfile(os.path.join(self.data_path, lidar_path), dtype=np.float32, count=-1).reshape([-1, 5])
    points_prev = np.fromfile(os.path.join(self.data_path, lidar_path_prev), dtype=np.float32, count=-1).reshape([-1, 5]) # N*5
     
    # Transform the point cloud at time t-1 into the time-t frame (for the flow part)
    pcd_trans_flow_prev = PCDTransformTool(points_prev[:, :3])
    # First step: transform the t-1 pointcloud to the ego vehicle frame; t-1 lidar→ t-1 ego
    cs_record_prev = self.nusc.get('calibrated_sensor', lidar_channel_prev['calibrated_sensor_token'])
    pcd_trans_flow_prev.rotate(Quaternion(cs_record_prev['rotation']).rotation_matrix)
    pcd_trans_flow_prev.translate(np.array(cs_record_prev['translation']))

    # Second step: transform from the ego frame to the global frame; t-1 ego → global
    poserecord_prev = self.nusc.get('ego_pose', lidar_channel_prev['ego_pose_token'])
    pcd_trans_flow_prev.rotate(Quaternion(poserecord_prev['rotation']).rotation_matrix)
    pcd_trans_flow_prev.translate(np.array(poserecord_prev['translation']))

    # Third step: transform from the global frame to the ego frame at time t; global → t ego
    poserecord = self.nusc.get('ego_pose', lidar_channel['ego_pose_token'])
    pcd_trans_flow_prev.translate(-np.array(poserecord['translation']))
    pcd_trans_flow_prev.rotate(Quaternion(poserecord['rotation']).rotation_matrix.T)
    
    # Fourth step: transform from ego into the lidar. t ego→t lidar
    cs_record = self.nusc.get('calibrated_sensor', lidar_channel['calibrated_sensor_token'])
    pcd_trans_flow_prev.translate(-np.array(cs_record['translation']))
    pcd_trans_flow_prev.rotate(Quaternion(cs_record['rotation']).rotation_matrix.T)
    points_prev_transtot = pcd_trans_flow_prev.pcd.T 
    
    # Record the lidar poses used to compute Fego
    pose_1 = get_lidar_pose(self.nusc, lidar_channel)
    pose_2 = get_lidar_pose(self.nusc, lidar_channel_prev)
    # Compute the ego-motion matrix Fego
    Fego = np.dot(np.linalg.inv(pose_2), pose_1)

The code above is the data reading and preprocessing part, which converts the pc0 point cloud into the pc1 coordinate system. After this part, 'points_prev_transtot' is fed to the DeFlow module as pc0, and 'points' as pc1.
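
For reference, the four rotate/translate steps above can be collapsed into a single 4x4 relative pose. Below is a minimal sketch of that equivalence, reusing the names from the snippet (self.nusc, lidar_channel, lidar_channel_prev, points_prev); the se3 and lidar_to_global helpers are hypothetical and not part of DeFlow:

    import numpy as np
    from pyquaternion import Quaternion

    def se3(rotation, translation):
        # Build a 4x4 homogeneous transform from a nuScenes rotation (quaternion) / translation pair.
        T = np.eye(4)
        T[:3, :3] = Quaternion(rotation).rotation_matrix
        T[:3, 3] = np.array(translation)
        return T

    def lidar_to_global(nusc, sample_data):
        # lidar -> ego (calibrated_sensor) composed with ego -> global (ego_pose).
        cs = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
        ego = nusc.get('ego_pose', sample_data['ego_pose_token'])
        return se3(ego['rotation'], ego['translation']) @ se3(cs['rotation'], cs['translation'])

    # The variables below come from the __getitem__ snippet above.
    pose0 = lidar_to_global(self.nusc, lidar_channel_prev)  # pc0 (t-1) lidar -> global
    pose1 = lidar_to_global(self.nusc, lidar_channel)       # pc1 (t)   lidar -> global
    pose_0to1 = np.linalg.inv(pose1) @ pose0                # maps the pc0 lidar frame into the pc1 lidar frame

    # Equivalent to the four-step transform above:
    points_prev_transtot = points_prev[:, :3] @ pose_0to1[:3, :3].T + pose_0to1[:3, 3]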

Then, I use the deflow module to align pc0 and pc1.

   #################################################### 
   # DeFlow point cloud alignment (DeFlow module)

    pc0 = fusion_dict['return_xyz_prev'] # torch.Size([34752, 3]) fusion_dict['return_xyz_prev'] is 'points_prev_transtot'
    pc0_before_pseudoimages, pc0_voxel_infos_lst = self.embedder(pc0) # pc0_voxel_infos_lst:torch.Size([32201, 3])
    pc1 = fusion_dict['xyz_cart'] # torch.Size([34784, 3]) fusion_dict['xyz_cart'] is 'points'
    pc1_before_pseudoimages, pc1_voxel_infos_lst = self.embedder(pc1) # pc1_voxel_infos_lst[0]['points'].shape: torch.Size([32186, 3])
    grid_flow_pseudoimage = self.backbone(pc0_before_pseudoimages, pc1_before_pseudoimages) # ([1, 64, 512, 512])
    flows = self.head(
        torch.cat((pc0_before_pseudoimages, pc1_before_pseudoimages), dim=1), 
        grid_flow_pseudoimage, pc0_voxel_infos_lst)

and the DeFlow parameters are set in the __init__ function:

    num_iters = 2
    voxel_size = [0.2, 0.2, 8] # 0.2 0.2
    grid_feature_size = [512, 512] # 512 512
    point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0] 
    self.embedder = DynamicEmbedder(voxel_size=voxel_size, pseudo_image_dims=grid_feature_size, point_cloud_range=point_cloud_range, feat_channels=32)
    self.backbone = FastFlow3DUNet()
    self.head = ConvGRUDecoder(num_iters = num_iters)

The DynamicEmbedder, FastFlow3DUNet, and ConvGRUDecoder classes are all taken from the deflow/scripts/network folder.

Besides, the visualization code is as follows:

    # Visualize pc0 + flows and pc1 to check the corresponding point coordinates
    allpoint_pc0 = o3d.geometry.PointCloud()
    allpoint_pc0.points = o3d.utility.Vector3dVector(pc0[0].cpu().numpy())
    allpoint_pc0.paint_uniform_color([1, 0, 0])  # red
    allpoint_pc1 = o3d.geometry.PointCloud()
    allpoint_pc1.points = o3d.utility.Vector3dVector(pc1[0].cpu().numpy())
    allpoint_pc1.paint_uniform_color([0, 1, 0])  # green
    o3d.io.write_point_cloud("pcd_pc0_all.ply", allpoint_pc0)
    o3d.io.write_point_cloud("pcd_pc1_all.ply", allpoint_pc1)
    
    pc0_flow_points = pc0_voxel_infos_lst[0]['points']
    pc0_newpoints = pc0_flow_points + flows[0]
    pc0_newpoints_np = pc0_newpoints.detach().cpu().numpy()
    
    pcd_pc0 = o3d.geometry.PointCloud()
    pcd_pc0.points = o3d.utility.Vector3dVector(pc0_newpoints_np)
    pcd_pc0.paint_uniform_color([0, 0, 1])  # blue
    # Save the point clouds and center points to file
    o3d.io.write_point_cloud("pcd_pc0_flow.ply", pcd_pc0)

Is it because only two frames are used so the flow is not accurate?

Is there a minimal test script that I can run quickly (one script that exports an h5py file so I can run the model myself)? Otherwise it's hard to reproduce your problem here. You can reference the h5py format we need here: https://github.com/KTH-RPL/DeFlow/blob/main/dataprocess/README.md#process
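
In case it helps while the official nuScenes script is not yet available, here is a minimal sketch of writing such a per-scene h5 file with h5py: one {scene_id}.h5 per scene and one group per lidar timestamp (matching how flow is read/written later in this thread). The dataset names used here ('lidar', 'pose', 'ground_mask') are illustrative assumptions only; the README linked above is the authoritative reference for the exact keys DeFlow expects:

    import h5py
    import numpy as np

    def write_frame(h5_path, timestamp, points, pose, ground_mask):
        # Assumed layout: one group per lidar timestamp inside {scene_id}.h5.
        with h5py.File(h5_path, 'a') as f:
            g = f.require_group(str(timestamp))
            g.create_dataset('lidar', data=points.astype(np.float32))       # N x 3 point cloud
            g.create_dataset('pose', data=pose.astype(np.float64))          # 4 x 4 lidar -> global pose
            g.create_dataset('ground_mask', data=ground_mask.astype(bool))  # N, True = ground point

    # Hypothetical usage:
    # write_frame("scene-0103.h5", 1533151622448916, pc0_xyz, pose0, gm0)
    # write_frame("scene-0103.h5", 1533151622551488, pc1_xyz, pose1, gm1)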

Here is one result from the deflow_best model on 10 Hz nuScenes. (I only downloaded the mini set; the figure I show here is from scene_id: scene-0103, timestamp: 1533151622551488.)

image

I uploaded the check_flow vis func in the test/scene_flow.py script for you to run:

    def check_flow(
        data_dir: str = "/home/kin/data/av2/preprocess/sensor/mini",
        flow_mode: str = "flow",  # "flow", "flow_est"
        start_id: int = -1,
        point_size: float = 3.0,
    ):
        dataset = HDF5Data(data_dir, vis_name=flow_mode, flow_view=True)
        o3d_vis = MyVisualizer(view_file=VIEW_FILE, window_title=f"view {'ground truth flow' if flow_mode == 'flow' else f'{flow_mode} flow'}, `SPACE` start/stop")
        opt = o3d_vis.vis.get_render_option()
        opt.background_color = np.asarray([80/255, 90/255, 110/255])
        opt.point_size = point_size
        for data_id in (pbar := tqdm(range(0, len(dataset)))):
            # for easy stop and jump to any id, and save same id always from 0.
            if data_id < start_id and start_id != -1:
                continue
            data = dataset[data_id]
            now_scene_id = data['scene_id']
            pbar.set_description(f"id: {data_id}, scene_id: {now_scene_id}, timestamp: {data['timestamp']}")
            gm0 = data['gm0']
            pc0 = data['pc0'][~gm0]
            pcd = o3d.geometry.PointCloud()
            pcd.points = o3d.utility.Vector3dVector(pc0[:, :3])
            pcd.paint_uniform_color([1.0, 0.0, 0.0])  # red: pc0
            pc1 = data['pc1']
            pcd1 = o3d.geometry.PointCloud()
            pcd1.points = o3d.utility.Vector3dVector(pc1[:, :3][~data['gm1']])
            pcd1.paint_uniform_color([0.0, 1.0, 0.0])  # green: pc1
            pcd2 = o3d.geometry.PointCloud()
            # pcd2.points = o3d.utility.Vector3dVector(pc0[:, :3] + pose_flow)  # if you want to check pose_flow
            pcd2.points = o3d.utility.Vector3dVector(pc0[:, :3] + data[flow_mode][~gm0])
            pcd2.paint_uniform_color([0.0, 0.0, 1.0])  # blue: pc0 + flow
            o3d_vis.update([pcd, pcd1, pcd2, o3d.geometry.TriangleMesh.create_coordinate_frame(size=2)])
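
For example (hypothetical path), once the preprocessed h5 files exist the function can be called directly:

    # Hypothetical invocation; point data_dir at the folder containing the preprocessed h5 files.
    check_flow(data_dir="/path/to/preprocess/sensor/mini", flow_mode="flow_est", point_size=3.0)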

Thanks a lot! TAT I really appreciate your time and work. I'll try to get the h5py format script and use the check_flow vis func.

I noticed that there is a ground-point filtering step in the data preprocessing part (extract_av2.py), but the nuScenes data doesn't have a real-valued ground height like AV2 to filter out the ground points directly. How did you filter ground points when testing on nuScenes data? Did you use a Z threshold directly (for example, treating Z <= -1.5 as ground points)? Besides, I have some trouble writing extract_nus.py along the lines of extract_av2.py. Can I reference your code for testing on nuScenes data?
And I may have found the reason why the code I used didn't work before. I didn't see the preprocessing code (extract_av2.py) at first; I only combined the paper with the core scripts/network code. I directly transformed the pc0 point cloud into the lidar coordinate system of pc1 and used the converted pc0 and the original pc1 directly as input to the DeFlow module, without going through a data preprocessing step like extract_av2.py (such as computing the pose_flow caused by ego motion and filtering ground points).

For nuScenes, I directly use my own Python package linefit to do ground segmentation: https://github.com/Kin-Zhang/linefit. And I believe the DeFlow paper clearly says in Section IV-A that we use only non-ground points as input here: "Leveraging HD maps provided by most autonomous driving datasets [32], [11], or by deploying ground segmentation techniques like [33], we can easily exclude ground points from both P_t and P_t+1."

Of course, you can directly try z<=-1.5 to filter ground for a quick test.
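
A minimal sketch of that quick test; remove_ground_z is a hypothetical helper, and -1.5 m assumes the approximate sensor height in the lidar frame, so the threshold may need tuning:

    import numpy as np

    def remove_ground_z(points, z_thresh=-1.5):
        # Keep only points whose z coordinate is above the threshold (points: N x >=3).
        return points[points[:, 2] > z_thresh]

    # Apply to both frames (variables from the earlier snippet) before feeding them to the network.
    pc0_no_ground = remove_ground_z(points_prev_transtot)
    pc1_no_ground = remove_ground_z(points[:, :3])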

And for the script, as said in the README (https://github.com/KTH-RPL/DeFlow/blob/main/dataprocess/README.md#dataset), it will take some time to publish the Waymo and nuScenes data scripts while the related works are under review.
Since in DeFlow's paper we only focus on AV2 data, the scripts we are talking about now belong to other papers that are not yet public.

Let me know after you try inputting the point clouds without ground points into the DeFlow network.

image
The above is the result after removing the ground points. The result for these two frames looks better than before, but when zoomed in, the flow of the ego vehicle and the dynamic vehicle still does not seem right. I wonder if it is still a problem with the nuScenes data preprocessing.
I want to confirm the input point cloud convention: is pc0 converted into the pc1 lidar coordinate system and then input together with pc1 itself, and is the flow generated by the network finally added to the converted pc0 point cloud as the visualization result?
Besides, I also tried to debug the original code to compare the original preprocessing with the code I used and see where it differs, but I found that there was a problem with the path.
In extract_av2.py:

def main(
    argo_dir: str = "../data/av2",
    output_dir: str ="../data/av2/preprocess",
    av2_type: str = "sensor",
    data_mode: str = "test",
    mask_dir: str = "../data/av2/3d_scene_flow",
    nproc: int = (multiprocessing.cpu_count() - 1)
):
    data_root_ = Path(argo_dir) / av2_type/ data_mode
    output_dir_ = Path(output_dir) / av2_type / data_mode
    output_dir_.mkdir(exist_ok=True, parents=True)
    process_logs(data_root_, output_dir_, nproc)
    create_reading_index(output_dir_)

I set the argo_dir, output_dir as above, but there is an error: [Errno 2] No such file or directory: '../data/av2/sensor/train/00a6ffc1-6ce9-3bc3-a060-6006e9893a1a/map/../data/av2/sensor/train/00a6ffc1-6ce9-3bc3-a060-6006e9893a1a/map/log_map_archive_00a6ffc1-6ce9-3bc3-a060-6006e9893a1a____PIT_city_31785.json'
It can be seen that the first half of the path appears twice, but I couldn't find where the path was joined during debugging. Do you know what is wrong?
Thanks a lot, looking forward to your reply~

I want to confirm the input point cloud convention: is pc0 converted into the pc1 lidar coordinate system and then input together with pc1 itself, and is the flow generated by the network finally added to the converted pc0 point cloud as the visualization result?

Yes, you can clearly see the code here:

    pose_flows = []
    transform_pc0s = []
    for batch_id in range(batch_sizes):
        selected_pc0 = batch["pc0"][batch_id]
        self.timer[0][0].start("pose")
        with torch.no_grad():
            if 'ego_motion' in batch:
                pose_0to1 = batch['ego_motion'][batch_id]
            else:
                pose_0to1 = cal_pose0to1(batch["pose0"][batch_id], batch["pose1"][batch_id])
        self.timer[0][0].stop()
        self.timer[0][1].start("transform")
        # transform selected_pc0 to pc1
        transform_pc0 = selected_pc0 @ pose_0to1[:3, :3].T + pose_0to1[:3, 3]
        self.timer[0][1].stop()
        pose_flows.append(transform_pc0 - selected_pc0)
        transform_pc0s.append(transform_pc0)
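
For completeness, cal_pose0to1 presumably just composes the two lidar-to-global poses into the relative transform used in the line above; the following is a sketch of that assumption rather than the repo's exact implementation:

    import torch

    def cal_pose0to1_sketch(pose0: torch.Tensor, pose1: torch.Tensor) -> torch.Tensor:
        # pose0, pose1: 4x4 lidar -> global poses of the two frames.
        # Returns the transform that maps points from the pc0 lidar frame into the pc1 lidar frame.
        return torch.linalg.inv(pose1) @ pose0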

I set the argo_dir, output_dir as above, but there is an error: [Errno 2] No such file or directory

  • argo_dir is the official AV2 data folder
  • output_dir is where we will output the preprocessed data (h5 files)

I'm not sure why you don't have the file, but I checked that the file exists in the data I downloaded from the official AV2 website (could you check whether you downloaded the data correctly?):

image

Oh, I found that in class ArgoverseStaticMap:
vector_data_json_path = log_map_dirpath / vector_data_fname
the log_map_dirpath and vector_data_fname both contain "/data/av2/sensor/train/00a6ffc1-6ce9-3bc3-a060-6006e9893a1a/map/", which is really weird...
Thanks a lot!

Could you try an absolute (full) path? I ran the code again just now and it runs successfully; I cannot reproduce your error.

python dataprocess/extract_av2.py --av2_type sensor --data_mode train --argo_dir /home/kin/data/av2 --output_dir /home/kin/data/av2/preprocess

Thanks a lot! Using the absolute (full) path works!

I have tried the AV2 dataset and the generated flow is right (I ran extract_av2.py, 3_vis.py, and scene_flow.py). But when I tried nuScenes data, the flow is not correct. I think it is probably a problem with the input point cloud. I found the corresponding two point cloud frames near scene_id: scene-0103, timestamp: 1533151622551488, namely timestamp: 1533151622448916 as pc0 and timestamp: 1533151622551488 as pc1.
Since I cannot write extract_nus.py in a short time (still a rookie), for a quick test I directly fed the two point cloud frames into deflow.py and got the flow. The test code added in deflow.py is:

        # test nus
        ply_file_path = "/opt/data/private/DeFlow/tests/pcd_pc0_all.ply"
        pc0 = o3d.io.read_point_cloud(ply_file_path)
        points = np.asarray(pc0.points)
        point_pc0s = torch.tensor(points, dtype=torch.float32)
        point_pc0s = point_pc0s.unsqueeze(0)
        pc0_before_pseudoimages, pc0_voxel_infos_lst = self.embedder(point_pc0s)

        ply_file_path = "/opt/data/private/DeFlow/tests/pcd_pc1_all.ply"
        pc1 = o3d.io.read_point_cloud(ply_file_path)
        points = np.asarray(pc1.points)
        point_pc1s = torch.tensor(points, dtype=torch.float32)
        point_pc1s  = point_pc1s.unsqueeze(0)
        pc1_before_pseudoimages, pc1_voxel_infos_lst = self.embedder(point_pc1s)
        
        grid_flow_pseudoimage = self.backbone(pc0_before_pseudoimages,
                                            pc1_before_pseudoimages)

        device = grid_flow_pseudoimage.device  # added: move the tensors below to the same device
        pc0_before_pseudoimages = pc0_before_pseudoimages.to(device)
        pc1_before_pseudoimages = pc1_before_pseudoimages.to(device)
        
        flows = self.head(
            torch.cat((pc0_before_pseudoimages, pc1_before_pseudoimages),
                    dim=1), grid_flow_pseudoimage, pc0_voxel_infos_lst)
        
        pc0_valid_point_idxes = [e["point_idxes"] for e in pc0_voxel_infos_lst]
        pcd3 = o3d.geometry.PointCloud()
        pcd3.points = o3d.utility.Vector3dVector((point_pc0s[0][pc0_valid_point_idxes[0]] + flows[0]).detach().cpu().numpy())
        # pcd2.points = o3d.utility.Vector3dVector(pc0[:, :3] + data[flow_mode][~gm0])
        pcd3.paint_uniform_color([0.0, 0.0, 1.0]) # blue: pc0 + flow
        o3d.io.write_point_cloud("nus_result.ply", pcd3)

In this way, the result after adding the flows is almost the same as pc0, but the result you got is correct. Do you know what causes the predicted flow to be almost zero? Attached are the pc0 and pc1 files I got after processing and fed into the DeFlow module. Are they the same as the ones you input into the network?
pc0 and pc1.zip

The result I got for timestamp: 1533151622551488 using the approach mentioned above is shown below:
image
It can be seen that the flow for the car in the yellow box looks correct, but the flow in other places is very small, which makes the result in blue basically overlap with pc0 in red.

Aha, thanks for providing the details. I know the reason now:

It's because the flow we predict is deltaF (check Eq. 2 in our paper). I also attach the code here so you are aware that we apply a transform based on the pose0 and pose1 you provide.

image

    pose_flows = []
    transform_pc0s = []
    for batch_id in range(batch_sizes):
        selected_pc0 = batch["pc0"][batch_id]
        self.timer[0][0].start("pose")
        with torch.no_grad():
            if 'ego_motion' in batch:
                pose_0to1 = batch['ego_motion'][batch_id]
            else:
                pose_0to1 = cal_pose0to1(batch["pose0"][batch_id], batch["pose1"][batch_id])
        self.timer[0][0].stop()
        self.timer[0][1].start("transform")
        # transform selected_pc0 to pc1
        transform_pc0 = selected_pc0 @ pose_0to1[:3, :3].T + pose_0to1[:3, 3]
        self.timer[0][1].stop()
        pose_flows.append(transform_pc0 - selected_pc0)
        transform_pc0s.append(transform_pc0)
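
To make the composition concrete, here is a minimal numeric sketch with made-up values, assuming (per Eq. 2 and the snippet above) that the total flow relative to the original pc0 is pose_flow + deltaF:

    import numpy as np

    # Made-up values for a single pc0 point; not taken from the dataset.
    p0 = np.array([10.0, 5.0, 0.5])               # point in the original pc0 lidar frame
    R, t = np.eye(3), np.array([0.0, 0.85, 0.0])  # rotation / translation of pose_0to1
    delta_f = np.array([0.05, 0.30, 0.0])         # network output: residual (non-rigid) flow

    transform_p0 = R @ p0 + t                     # ego-motion-compensated point (the network's input)
    pose_flow = transform_p0 - p0                 # flow caused purely by ego motion
    final_flow = pose_flow + delta_f              # total flow with respect to the original pc0

    # Visualized position: original point + total flow == compensated point + deltaF.
    assert np.allclose(p0 + final_flow, transform_p0 + delta_f)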

By the way, I updated the linefit (ground segmentation) config for nuScenes if you want to use it:

Let me know if you have more questions.

Thanks for providing the linefit config for nuScenes~
I added the pose_flows to the result, that is, transform_pc0 + flows (from the network) + pose_flows is the final visualization result. (Does this match your idea?) But I found the final result is still different from yours.
image
Can I have your pose0to1 matrix between the two frames mentioned previously (timestamp: 1533151622448916 as pc0 and timestamp: 1533151622551488 as pc1)? My pose_0to1 is:
image
The Y translation in my pose_0to1 is 0.85 m, but the (transform_pc0 + flows from the network) and the transform_pc0 that I got show more than 1 m of translation in Y. So I wonder if there is something wrong with my calculation of pose_0to1.

Hi, I checked my ego_pose; here are my pose0 and pose1 for you to compare:

image

And transform_pc0 means pc0 transformed into the pc1 frame; the network estimates only \delta F, as Eq. 2 says. If you save the flow with the 3_vis.py script, the final flow in the h5 file is saved as output + pose_flow = final_flow under the key; code attached here:

DeFlow/scripts/pl_model.py, lines 249 to 266 at commit a71c9ef:

    if 'pc0_valid_point_idxes' in res_dict:
        valid_from_pc2res = res_dict['pc0_valid_point_idxes']
        # flow in the original pc0 coordinate
        pred_flow = pose_flow[~batch['gm0']].clone()
        pred_flow[valid_from_pc2res] = pose_flow[~batch['gm0']][valid_from_pc2res] + res_dict['flow']
        final_flow[~batch['gm0']] = pred_flow
    else:
        final_flow[~batch['gm0']] = res_dict['flow'] + pose_flow[~batch['gm0']]
    # write final_flow into the dataset.
    key = str(batch['timestamp'])
    scene_id = batch['scene_id']
    with h5py.File(os.path.join(self.dataset_path, f'{scene_id}.h5'), 'r+') as f:
        if self.vis_name in f[key]:
            del f[key][self.vis_name]
        f[key].create_dataset(self.vis_name, data=final_flow.cpu().detach().numpy().astype(np.float32))

If you directly take the output from the network as follows, then transform_pc0 + flows (from the network) + pose_flows is the final visualization, yes.

    model_res = {
        "flow": flows,
        'pose_flow': pose_flows,