Image Search on a Large Driving Perception Dataset Using User Input Text
This repository consists of the following two Jupyter notebooks:

- `Create_Embeddings_fromImageDataset.ipynb`
  - Creates vector embeddings from the images of the large autonomous-driving perception dataset ONE MILLION SCENES (the ONCE dataset).
  - The embeddings are computed with BLIP and stored for Step 2; a minimal sketch of this step is given after the list.
- `Retrieve_Image_from_UserText.ipynb`
  - Searches the stored vector embeddings for images semantically similar to the input user text.
  - The search (image-text matching) is also done with BLIP; a retrieval sketch is given after the list.
  - The top 3 images matching the example user text "a car driving on an intersection" are given in the notebook and shown below.
  - Another example, with the user text "a car driving on a highway", is also shown below; it retrieves images semantically similar to a highway scene.

You can try more!
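The sketch below shows how a per-image BLIP embedding might be computed (the single-machine version; the distributed PySpark variant is outlined further down). It assumes the LAVIS library's `blip_feature_extractor` wrapper and a placeholder image path; the notebook may load BLIP differently.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# BLIP feature extractor from the LAVIS library (an assumption; any BLIP
# checkpoint that exposes the projected image/text embeddings works similarly).
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

raw_image = Image.open("sample_once_image.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

with torch.no_grad():
    features = model.extract_features({"image": image}, mode="image")

# Projected CLS-token embedding in the shared image-text space (shape: 1 x 256).
# Vectors like this are what gets stored for Step 2.
image_embedding = features.image_embeds_proj[:, 0, :].cpu().numpy()
```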
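And a minimal sketch of the retrieval step, assuming the Step-1 embeddings were saved as NumPy arrays (`image_embeddings.npy` and `image_paths.npy` are hypothetical file names). It ranks images by cosine similarity between the BLIP text embedding and the stored image embeddings; the notebook may additionally re-rank the candidates with BLIP's image-text matching (ITM) head.

```python
import numpy as np
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, _, txt_processors = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

# Hypothetical file names for the outputs stored by Step 1.
image_embeddings = np.load("image_embeddings.npy")            # (num_images, 256)
image_paths = np.load("image_paths.npy", allow_pickle=True)   # (num_images,)

query = txt_processors["eval"]("a car driving on an intersection")
with torch.no_grad():
    text_feat = model.extract_features({"text_input": [query]}, mode="text")
text_emb = text_feat.text_embeds_proj[:, 0, :].cpu().numpy()  # (1, 256)

# Cosine similarity; normalise explicitly rather than relying on the
# projection heads producing unit-norm vectors.
img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
txt = text_emb.ravel() / np.linalg.norm(text_emb)
scores = img @ txt

# Top 3 matches, highest similarity first.
for i in np.argsort(scores)[::-1][:3]:
    print(f"{scores[i]:.3f}  {image_paths[i]}")
```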
PySpark is used to read the large image dataset into a Spark DataFrame and then perform distributed inference, i.e., computing the vector embeddings and image-text matching scores with the pretrained BLIP model. A rough sketch of this pattern follows.
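In this sketch, Spark's built-in `binaryFile` reader loads the images into a DataFrame and a pandas UDF computes the BLIP embeddings on the executors. The dataset path, the output path, and the LAVIS wrapper are assumptions; the notebook's actual pipeline may differ.

```python
import io
import pandas as pd
import torch
from PIL import Image
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType

spark = SparkSession.builder.appName("blip-embeddings").getOrCreate()

# Spark's binary file reader; "once_images/" is a placeholder dataset path.
df = spark.read.format("binaryFile").load("once_images/*.jpg")

@pandas_udf(ArrayType(FloatType()))
def blip_embed(content: pd.Series) -> pd.Series:
    # The model is loaded inside the UDF so it is instantiated on the
    # executors; caching it per worker would avoid reloading per batch.
    from lavis.models import load_model_and_preprocess
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_feature_extractor", model_type="base", is_eval=True, device=device
    )
    embeddings = []
    with torch.no_grad():
        for raw in content:
            image = Image.open(io.BytesIO(raw)).convert("RGB")
            tensor = vis_processors["eval"](image).unsqueeze(0).to(device)
            feats = model.extract_features({"image": tensor}, mode="image")
            embeddings.append(
                feats.image_embeds_proj[:, 0, :].squeeze(0).cpu().numpy().tolist()
            )
    return pd.Series(embeddings)

# One embedding per image, written alongside the image path for Step 2.
result = df.withColumn("embedding", blip_embed("content"))
result.select("path", "embedding").write.mode("overwrite").parquet("once_embeddings")
```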
The following datasets and models were used in this work:
```bibtex
@article{mao2021one,
  title   = {One Million Scenes for Autonomous Driving: ONCE Dataset},
  author  = {Mao, Jiageng and Niu, Minzhe and Jiang, Chenhan and Liang, Hanxue and Liang, Xiaodan and Li, Yamin and Ye, Chaoqiang and Zhang, Wei and Li, Zhenguo and Yu, Jie and others},
  journal = {NeurIPS},
  year    = {2021}
}

@inproceedings{li2022blip,
  title     = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  author    = {Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
  booktitle = {ICML},
  year      = {2022}
}
```