Generating HDF5 detections from custom dataset or bottom-up-attention TSV

Question

SandroJijavadze opened this issue 3 years ago · comments

I have a custom dataset and have generated the detections TSV using https://github.com/airsplay/py-bottom-up-attention, but the model requires HDF5.

The TSV has these fields for each example:

{
   'image_id': image_id,
   'image_h': np.size(im, 0),
   'image_w': np.size(im, 1),
   'num_boxes' : len(keep_boxes),
   'boxes': base64.b64encode(cls_boxes[keep_boxes]),
   'features': base64.b64encode(pool5[keep_boxes])
}

When I examine the COCO dataset detections, I see entries like the following:

>>> dts["35368_boxes"]
<HDF5 dataset "35368_boxes": shape (37, 4), type "<f4">
>>> dts["35368_features"]
<HDF5 dataset "35368_features": shape (37, 2048), type "<f4">
>>> dts["35368_cls_prob"]
<HDF5 dataset "35368_cls_prob": shape (37, 1601), type "<f4">

>>> dts["35368_boxes"][36]
array([349.57147, 154.07967, 420.0327 , 408.64462], dtype=float32)

I'll try to work out from the code how to convert my TSV to the required HDF5 format myself, but a guide would be appreciated.

Thank you.

hcwei · Answer 1 · Mon May 10 2021 10:15:48 GMT+0800 (China Standard Time)

Did you solve this problem?

buda · Answer 2 · Mon May 10 2021 17:31:14 GMT+0800 (China Standard Time)

@whongchen
No, unfortunately.
I am going to try to figure out the process myself this week and will post an update if I do.
Please comment if you find anything useful.

Eugenio Tonanzi · Answer 3 · Thu May 13 2021 21:38:29 GMT+0800 (China Standard Time)

I'm working on this too. I haven't done it myself yet, but I think you just need to convert the TSV into an HDF5 file; it has nothing to do with the M2T or py-bottom-up-attention code.
You read your TSV using csv or pandas, and then you can use a library like h5py to store your data in HDF5 format under dataset names ending in "_boxes", "_features" and "_cls_prob" (prefixed with the image ID, as in "35368_boxes"). These hold the bounding box corners, feature vectors and class probabilities respectively, as specified in the M2T repo README.
I believe it would be straightforward, though I don't know how much time it would take.
Let me know if you manage to do it.
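
A minimal sketch of the conversion described above, not taken from either repository. It assumes the TSV has no header row, that its columns follow the order of the fields listed in the question, that image IDs are numeric, and that the base64 payloads are raw float32 buffers; the names decode_row and tsv_to_hdf5 are hypothetical. The TSV in the question carries no class probabilities, so only the "_boxes" and "_features" datasets are written here.

```python
import base64
import csv
import sys

import numpy as np

# Assumed column order, based on the fields listed in the question.
FIELDNAMES = ["image_id", "image_h", "image_w", "num_boxes", "boxes", "features"]


def decode_row(row):
    """Turn one TSV row (a dict of strings) into {dataset_name: float32 array}."""
    num_boxes = int(row["num_boxes"])
    image_id = int(row["image_id"])  # assumes numeric ids, as in the COCO file
    decoded = {}
    for field in ("boxes", "features"):
        raw = base64.b64decode(row[field])
        # Assumes the extractor serialized float32 values; adjust dtype if not.
        array = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, -1)
        decoded["%d_%s" % (image_id, field)] = array
    return decoded


def tsv_to_hdf5(tsv_path, h5_path):
    """Write every row of the TSV into one HDF5 file, one dataset per array."""
    import h5py  # imported here so decode_row can be used without h5py installed

    csv.field_size_limit(sys.maxsize)  # the feature column is a very long string
    with open(tsv_path, newline="") as tsv_file, h5py.File(h5_path, "w") as h5_file:
        reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            for name, array in decode_row(row).items():
                h5_file.create_dataset(name, data=array)
```

Calling tsv_to_hdf5("detections.tsv", "detections.hdf5") (both paths are placeholders) would then produce keys such as "35368_boxes" and "35368_features".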

Matteo Stefanini · Answer 4 · Thu May 13 2021 22:14:22 GMT+0800 (China Standard Time)

Hi everyone,
thank you @eugeniotonanzi for your answer; that should solve the problem.
Once you have an HDF5 file for your custom dataset in the same format, the model should work as expected.
Let us know if you have any other issues.
Best,
Matteo
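
One way to check that a converted file has "the same format" is to compare dataset names and shapes against the COCO example in the question. The sketch below is an illustration, not part of M2T: check_detections is a hypothetical helper, the name pattern (digits, underscore, kind) and the trailing widths 4, 2048 and 1601 are taken from that example, and whether all three kinds are actually required by the model is an assumption you can relax through the required argument.

```python
import re

# Trailing dimensions seen in the COCO example; the number of boxes N varies.
EXPECTED_WIDTH = {"boxes": 4, "features": 2048, "cls_prob": 1601}
KEY_PATTERN = re.compile(r"^(\d+)_(boxes|features|cls_prob)$")


def check_detections(store, required=("boxes", "features", "cls_prob")):
    """Return a list of problems found in a mapping of dataset name -> array-like.

    `store` can be an open h5py.File or a plain dict of arrays; only the
    `.shape` attribute of each value is read.
    """
    problems = []
    rows_per_image = {}  # image id -> {kind: number of rows}
    for name in store:
        match = KEY_PATTERN.match(name)
        if match is None:
            problems.append("unexpected dataset name: %s" % name)
            continue
        image_id, kind = match.groups()
        shape = tuple(store[name].shape)
        if len(shape) != 2 or shape[1] != EXPECTED_WIDTH[kind]:
            problems.append(
                "%s has shape %s, expected (N, %d)" % (name, shape, EXPECTED_WIDTH[kind])
            )
            continue
        rows_per_image.setdefault(image_id, {})[kind] = shape[0]
    for image_id, rows in sorted(rows_per_image.items()):
        missing = sorted(set(required) - set(rows))
        if missing:
            problems.append("image %s is missing: %s" % (image_id, ", ".join(missing)))
        elif len(set(rows.values())) != 1:
            # Every dataset of one image should describe the same N boxes.
            problems.append("image %s has inconsistent box counts: %s" % (image_id, rows))
    return problems
```

With h5py installed, this could be run as check_detections(h5py.File("detections.hdf5", "r")), where the file name is a placeholder; an empty list means no problems were found.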

buda · Answer 5 · Wed May 26 2021 18:13:22 GMT+0800 (China Standard Time)

That solved it, closing this issue.
Thank you.

hwbhwbgao · Answer 6 · Sat Aug 14 2021 19:54:53 GMT+0800 (China Standard Time)

Have you solved this problem? If so, would it be possible to release the relevant code? Thank you!

ksz-creat · Answer 7 · Fri Sep 17 2021 11:29:30 GMT+0800 (China Standard Time)

Hi, have you solved this problem? If so, would it be possible to release the relevant code? Thank you very much.

MikeMACintosh · Answer 8 · Tue Jan 25 2022 22:39:50 GMT+0800 (China Standard Time)

@eugeniotonanzi thanks for your advice. I'm working on it right now, but maybe you've already implemented it?

buda · Answer 9 · Tue Jan 25 2022 23:15:48 GMT+0800 (China Standard Time)

@hwbhwbgao @ksz-creat @MikeMACintosh
Sorry, I didn't see your replies.
Unfortunately I can't share the whole code, but I will share the relevant bits.
I modified 2 methods in https://github.com/peteanderson80/bottom-up-attention

def get_detections_from_im(net, im_file, image_id, conf_thresh=0.2):
    im = cv2.imread(im_file)
    scores, boxes, attr_scores, rel_scores = im_detect(net, im)

    # Keep the original boxes, don't worry about the regression bbox outputs
    rois = net.blobs['rois'].data.copy()
    # unscale back to raw image space
    blobs, im_scales = _get_blobs(im, None)

    cls_boxes = rois[:, 1:5] / im_scales[0]
    cls_prob = net.blobs['cls_prob'].data
    pool5 = net.blobs['pool5_flat'].data

    # Keep only the best detections: per-class NMS, then the best score per box
    max_conf = np.zeros((rois.shape[0]))
    for cls_ind in range(1,cls_prob.shape[1]):
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
        keep = np.array(nms(dets, cfg.TEST.NMS))
        max_conf[keep] = np.where(cls_scores[keep] > max_conf[keep], cls_scores[keep], max_conf[keep])

    # Clamp the number of kept boxes to [MIN_BOXES, MAX_BOXES]
    keep_boxes = np.where(max_conf >= conf_thresh)[0]
    if len(keep_boxes) < MIN_BOXES:
        keep_boxes = np.argsort(max_conf)[::-1][:MIN_BOXES]
    elif len(keep_boxes) > MAX_BOXES:
        keep_boxes = np.argsort(max_conf)[::-1][:MAX_BOXES]
    # Build the HDF5 key prefix; expects image_id to be a string of digits
    # and strips leading zeros (e.g. "00035368" -> "35368")
    featureid = "".join([s.lstrip("0") for s in image_id.split() if s.isdigit()])
    num_boxes = len(keep_boxes)
    cls_boxes = cls_boxes[keep_boxes].reshape((num_boxes, 4))
    cls_features = pool5[keep_boxes].reshape(num_boxes, 2048)
    cls_prob = cls_prob[keep_boxes].reshape(num_boxes, 1601)

    # Return (dataset name, array) pairs ready to be written to the HDF5 file
    return (featureid + "_boxes", cls_boxes), (featureid + "_features", cls_features), (featureid + "_cls_prob", cls_prob)

https://github.com/peteanderson80/bottom-up-attention/blob/master/tools/generate_tsv.py#L140

def generate_hdf5(gpu_id, prototxt, weights, image_ids, outfile):
    # Note: h5py must be imported alongside the script's existing imports.
    wanted_ids = set([int(image_id[1]) for image_id in image_ids])
    # Nothing is read back from an existing output file, so every id counts as missing.
    found_ids = set()

    missing = wanted_ids - found_ids
    if len(missing) == 0:
        print('GPU {:d}: already completed {:d}'.format(gpu_id, len(image_ids)))
    else:
        print('GPU {:d}: missing {:d}/{:d}'.format(gpu_id, len(missing), len(image_ids)))
    if len(missing) > 0:
        caffe.set_mode_gpu()
        caffe.set_device(gpu_id)
        net = caffe.Net(prototxt, caffe.TEST, weights=weights)
        # Each image contributes three datasets to the output file.
        with h5py.File(outfile, 'w') as h5pyfile:
            _t = {'misc' : Timer()}
            count = 0
            for im_file,image_id in image_ids:
                if int(image_id) in missing:
                    _t['misc'].tic()
                    boxes, features, probabilities = get_detections_from_im(net, im_file, image_id)
                    h5pyfile.create_dataset(boxes[0], data=boxes[1])
                    h5pyfile.create_dataset(features[0], data=features[1])
                    h5pyfile.create_dataset(probabilities[0], data=probabilities[1])
                    _t['misc'].toc()  # stop the timer so average_time is updated
                    if (count % 100) == 0:
                        print('GPU {:d}: {:d}/{:d} {:.3f}s (projected finish: {:.2f} hours)'
                              .format(gpu_id, count+1, len(missing), _t['misc'].average_time,
                              _t['misc'].average_time*(len(missing)-count)/3600))
                    count += 1

Also, depending on how you have arranged your data, you will need to modify the "load_image_ids" method.
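
As an illustration only, here is a sketch of what such a replacement might look like if the images sit in a single directory and the ID can be read from the file name. The function name load_image_ids_from_dir, the accepted extensions and the "last run of digits" rule are all assumptions, not code from the repository. It returns (image path, ID string) pairs with IDs as digit strings, which is the form the two methods above expect (image_id.split() and int(image_id)).

```python
import os
import re

# Assumed set of image extensions; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def load_image_ids_from_dir(image_dir):
    """Return (image_path, image_id) pairs for images whose name contains digits."""
    pairs = []
    for file_name in sorted(os.listdir(image_dir)):
        stem, extension = os.path.splitext(file_name)
        if extension.lower() not in IMAGE_EXTENSIONS:
            continue
        digit_runs = re.findall(r"\d+", stem)
        if not digit_runs:
            continue  # no numeric id can be derived from this file name
        # Use the last run of digits, e.g. "COCO_val2014_000000035368" -> "35368".
        image_id = str(int(digit_runs[-1]))
        pairs.append((os.path.join(image_dir, file_name), image_id))
    return pairs
```

This does not detect two files that map to the same ID; with a layout where that can happen, the IDs would need to come from somewhere else, such as an annotation file.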

You can use this docker image for environment:
https://hub.docker.com/r/airsplay/bottom-up-attention

hwbhwbgao · Answer 10 · Thu Mar 03 2022 09:26:29 GMT+0800 (China Standard Time)

Thank you very much!

Dufresue · Answer 11 · Tue Nov 21 2023 10:40:29 GMT+0800 (China Standard Time)

Thank you very much for the work you did. By the way, I am not familiar with Docker. Could you please tell me how to use the Docker image you provided, and what I should modify? Looking forward to your reply!