Image Search on a Large Driving Perception Dataset Using User Input Text
This repository consists of the following two Jupyter notebooks:

- `Create_Embeddings_fromImageDataset.ipynb`
  - Creates vector embeddings from the images of the large autonomous-driving perception dataset ONE MILLION SCENES (the ONCE dataset).
  - The embeddings are computed with BLIP and stored for Step 2; a minimal sketch of this step is given after the list.
- `Retrieve_Image_from_UserText.ipynb`
  - Searches the stored vector embeddings for images semantically similar to the input user text.
  - The search (image-text matching) is also done with BLIP; a retrieval sketch is given after the list.
  - The top 3 images matching the example user text "a car driving on an intersection" are given in the notebook and shown below.
  - Another example, with the user text "a car driving on a highway", is also shown below; it retrieves images semantically similar to a highway scene.

You can try more!
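The sketch below shows how a per-image BLIP embedding might be computed (the single-machine version; the distributed PySpark variant is outlined further down). It assumes the LAVIS library's `blip_feature_extractor` wrapper and a placeholder image path; the notebook may load BLIP differently.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# BLIP feature extractor from the LAVIS library (an assumption; any BLIP
# checkpoint that exposes the projected image/text embeddings works similarly).
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

raw_image = Image.open("sample_once_image.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

with torch.no_grad():
    features = model.extract_features({"image": image}, mode="image")

# Projected CLS-token embedding in the shared image-text space (shape: 1 x 256).
# Vectors like this are what gets stored for Step 2.
image_embedding = features.image_embeds_proj[:, 0, :].cpu().numpy()
```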
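And a minimal sketch of the retrieval step, assuming the Step-1 embeddings were saved as NumPy arrays (`image_embeddings.npy` and `image_paths.npy` are hypothetical file names). It ranks images by cosine similarity between the BLIP text embedding and the stored image embeddings; the notebook may additionally re-rank the candidates with BLIP's image-text matching (ITM) head.

```python
import numpy as np
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, _, txt_processors = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

# Hypothetical file names for the outputs stored by Step 1.
image_embeddings = np.load("image_embeddings.npy")            # (num_images, 256)
image_paths = np.load("image_paths.npy", allow_pickle=True)   # (num_images,)

query = txt_processors["eval"]("a car driving on an intersection")
with torch.no_grad():
    text_feat = model.extract_features({"text_input": [query]}, mode="text")
text_emb = text_feat.text_embeds_proj[:, 0, :].cpu().numpy()  # (1, 256)

# Cosine similarity; normalise explicitly rather than relying on the
# projection heads producing unit-norm vectors.
img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
txt = text_emb.ravel() / np.linalg.norm(text_emb)
scores = img @ txt

# Top 3 matches, highest similarity first.
for i in np.argsort(scores)[::-1][:3]:
    print(f"{scores[i]:.3f}  {image_paths[i]}")
```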
PySpark is used to read the large image dataset into a Spark DataFrame and then perform distributed inference, i.e., computing the vector embeddings and image-text matching scores with the pretrained BLIP model. A rough sketch of this pattern follows.
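In this sketch, Spark's built-in `binaryFile` reader loads the images into a DataFrame and a pandas UDF computes the BLIP embeddings on the executors. The dataset path, the output path, and the LAVIS wrapper are assumptions; the notebook's actual pipeline may differ.

```python
import io
import pandas as pd
import torch
from PIL import Image
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType

spark = SparkSession.builder.appName("blip-embeddings").getOrCreate()

# Spark's binary file reader; "once_images/" is a placeholder dataset path.
df = spark.read.format("binaryFile").load("once_images/*.jpg")

@pandas_udf(ArrayType(FloatType()))
def blip_embed(content: pd.Series) -> pd.Series:
    # The model is loaded inside the UDF so it is instantiated on the
    # executors; caching it per worker would avoid reloading per batch.
    from lavis.models import load_model_and_preprocess
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_feature_extractor", model_type="base", is_eval=True, device=device
    )
    embeddings = []
    with torch.no_grad():
        for raw in content:
            image = Image.open(io.BytesIO(raw)).convert("RGB")
            tensor = vis_processors["eval"](image).unsqueeze(0).to(device)
            feats = model.extract_features({"image": tensor}, mode="image")
            embeddings.append(
                feats.image_embeds_proj[:, 0, :].squeeze(0).cpu().numpy().tolist()
            )
    return pd.Series(embeddings)

# One embedding per image, written alongside the image path for Step 2.
result = df.withColumn("embedding", blip_embed("content"))
result.select("path", "embedding").write.mode("overwrite").parquet("once_embeddings")
```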
The following datasets and models were used in this work:
```bibtex
@article{mao2021one,
  title   = {One Million Scenes for Autonomous Driving: ONCE Dataset},
  author  = {Mao, Jiageng and Niu, Minzhe and Jiang, Chenhan and Liang, Hanxue and Liang, Xiaodan and Li, Yamin and Ye, Chaoqiang and Zhang, Wei and Li, Zhenguo and Yu, Jie and others},
  journal = {NeurIPS},
  year    = {2021}
}

@inproceedings{li2022blip,
  title     = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  author    = {Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
  booktitle = {ICML},
  year      = {2022}
}
```