This project detects similar and duplicate images using deep learning and computer vision. It is built to handle large datasets efficiently, which suits applications such as content moderation, copyright protection, image clustering, and visual search. It identifies visually similar images, including those with minor alterations, by combining deep feature extraction with fast similarity search.
- Deep Feature Extraction: Leverages pre-trained models (like ResNet) to generate powerful feature vectors that capture the essence of images, resilient to changes in lighting or orientation.
- Efficient Dimensionality Reduction: Employs techniques like PCA (Principal Component Analysis) to reduce the complexity of high-dimensional features, speeding up computations while retaining key information.
- Blazing-Fast Similarity Search: Integrates FAISS (Facebook AI Similarity Search) for highly optimized indexing and querying, enabling rapid identification of similar images even in massive collections.
- Insightful Visualization: Utilizes tools like t-SNE to map the high-dimensional feature space into 2D, providing visual insights into how images cluster based on similarity.
- Feature Extraction: Using PyTorch with pre-trained CNNs (e.g., ResNet) to create dense vector representations of images.
- Dimensionality Reduction: Applying scikit-learn's PCA to streamline feature vectors.
- Indexing & Search: Building efficient search indices with FAISS.
- Visualization: Employing t-SNE (via scikit-learn) and Matplotlib for exploring feature distributions.
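The reduction and search steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebooks' actual code: the function names `build_index` and `search`, the component count of 32, and the choice of an exact `IndexFlatL2` index are all hypothetical. FAISS is used when it is installed; otherwise the sketch falls back to a brute-force NumPy search that computes the same squared L2 distances.

```python
import numpy as np
from sklearn.decomposition import PCA

try:
    import faiss  # optional here; the project itself lists FAISS as a dependency
except ImportError:
    faiss = None


def build_index(features, n_components=32):
    """Fit PCA on the feature matrix and index the reduced vectors.

    `n_components=32` is an illustrative value, not one taken from the project.
    """
    pca = PCA(n_components=n_components, random_state=0)
    # FAISS expects C-contiguous float32 arrays.
    reduced = np.ascontiguousarray(pca.fit_transform(features), dtype=np.float32)
    index = None
    if faiss is not None:
        index = faiss.IndexFlatL2(reduced.shape[1])  # exact (non-approximate) L2 index
        index.add(reduced)
    return pca, reduced, index


def search(pca, reduced, index, query, k=5):
    """Return (squared L2 distances, row indices) of the k nearest stored vectors."""
    q = np.ascontiguousarray(pca.transform(query.reshape(1, -1)), dtype=np.float32)
    if index is not None:
        distances, indices = index.search(q, k)
        return distances[0], indices[0]
    # Fallback when FAISS is unavailable: brute-force squared L2 distances.
    d2 = ((reduced - q) ** 2).sum(axis=1)
    order = np.argsort(d2)[:k]
    return d2[order], order
```

Row indices returned by `search` refer to rows of the feature matrix, so they can be mapped back to image paths through a filename list stored in the same order.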
- Python 3.6+
- PyTorch
- FAISS
- scikit-learn
- Matplotlib
- Pillow
- CUDA (optional, for GPU acceleration)
project-root/
├── data/                        # Directory for image datasets
├── extract_features.ipynb       # Notebook for feature extraction
├── image_similarity.ipynb       # Notebook for similarity search experiments
├── visualize_similarity.ipynb   # Notebook for visual analysis and clustering
├── pickle/                      # Directory for serialized features and metadata
│   ├── filenames-*.pickle
│   └── features-*.pickle
├── models/                      # Pre-trained deep learning models (e.g., ResNet)
├── utils/                       # Utility scripts and helper functions
├── requirements.txt             # List of dependencies
└── README.md                    # Project documentation
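As a rough sketch of how the serialized files in pickle/ might be read back, the helper below pairs each features-*.pickle with the filenames-*.pickle that shares its suffix. The function name `load_pickled_features` is hypothetical, and the sketch assumes each pickle holds one sequence with entries aligned by position; the actual contents depend on how extract_features.ipynb writes them.

```python
import pickle
from pathlib import Path


def load_pickled_features(pickle_dir="pickle"):
    """Map each file suffix to a (filenames, features) pair.

    Assumes features-<suffix>.pickle and filenames-<suffix>.pickle hold
    sequences of equal length, aligned by position.
    Only unpickle files you created yourself: pickle can execute code on load.
    """
    pickle_dir = Path(pickle_dir)
    loaded = {}
    for feat_path in sorted(pickle_dir.glob("features-*.pickle")):
        suffix = feat_path.stem[len("features-"):]  # the part matched by '*'
        name_path = pickle_dir / f"filenames-{suffix}.pickle"
        if not name_path.exists():
            continue  # skip feature files without a matching filename list
        with open(feat_path, "rb") as f:
            features = pickle.load(f)
        with open(name_path, "rb") as f:
            filenames = pickle.load(f)
        if len(features) != len(filenames):
            raise ValueError(f"length mismatch for suffix {suffix!r}")
        loaded[suffix] = (filenames, features)
    return loaded
```

Keeping filenames and features in separate but aligned files means a row index returned by a similarity search can be turned back into an image path with a plain list lookup.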
- Python 3.6 or higher
- PyTorch
- FAISS (CPU or GPU version)
- scikit-learn
- Matplotlib
- Pillow
- CPU: Runs on standard CPUs (slower).
- GPU: CUDA-compatible GPU highly recommended for significant speedup.
# Clone the repository (if applicable)
# git clone git@github.com:mdhasnainali/Image-Similarity-Detection.git
# cd Image-Similarity-Detection
# Install dependencies
pip install -r requirements.txt
- Organize your images in the data/ directory.
- Run extract_features.ipynb to process images and save features/filenames to the pickle/ directory.
- Use image_similarity.ipynb to input a query image and find its most similar matches.
- Explore feature clusters using visualize_similarity.ipynb.
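The final exploration step could look roughly like the sketch below, which projects feature vectors to 2D with scikit-learn's t-SNE and saves a Matplotlib scatter plot. The function name `plot_tsne`, the perplexity value, the output path, and the use of integer class labels for coloring are illustrative assumptions rather than details from visualize_similarity.ipynb.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE


def plot_tsne(features, labels, out_path="tsne.png", perplexity=15, seed=0):
    """Embed features in 2D with t-SNE, save a scatter plot, return the embedding.

    `perplexity` must be smaller than the number of samples.
    """
    features = np.asarray(features, dtype=np.float32)
    embedding = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",          # PCA initialization tends to give more stable layouts
        random_state=seed,
    ).fit_transform(features)

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=12)
    ax.set_title("t-SNE projection of image features")
    ax.set_xticks([])
    ax.set_yticks([])
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return embedding
```

Distances in a t-SNE plot are only meaningful locally, so the picture is best read as "which images end up near each other" rather than as a faithful map of the full feature space.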
- Fast query response times.
- High precision in identifying similar and duplicate images.
- Effective visualization of image clusters in the feature space.
- Integration with real-time image ingestion pipelines.
- Support for alternative feature extraction models (e.g., VGG, EfficientNet).
- Enhanced interactive visualization tools.
- Potential mobile application integration.
Contributions make the open-source community amazing! Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See the LICENSE file for more information.
- FAISS developers for their efficient similarity search library.
- The PyTorch team for the flexible deep learning framework.
- Caltech101 dataset (used during development/testing).