Visual Query Tool (Image Recommendation Engine):

Objective:

One of the main challenges in training or fine-tuning deep learning models is curating data from large datasets containing millions of images. Traditional statistical metrics such as the Structural Similarity Index (SSIM) can replace manual curation, but they may not capture implicit image features. In this project, I developed a similar-image search tool by training a Vision Transformer for representation learning.

Contributions:

  • Trained a Vision Transformer on ~1M images in a self-supervised setting using view-invariance and masked-image-modelling objectives (a training-objective sketch follows this list).
  • Set up distributed training to enable multi-GPU training across a DGX cluster (see the DDP sketch below).
  • Developed a feature extraction pipeline using the trained Vision Transformer and deployed it on AWS as a Lambda function (sketched below).
  • Automated feature extraction for ~60M images using the deployed pipeline, storing the representations in Parquet files and DynamoDB.
  • Developed a REST-API-based neighbor search tool that allows users to query a database for images similar to a given query image (see the FAISS sketch below).
  • Designed a novel methodology for similar-image search based on multiple images and multiple objects within those images, using the transformer model's attention maps. This lets a user search for specific attributes of objects drawn from one or more images; for example, searching specifically for a person in the shadow of a tractor (an illustrative sketch follows this list).
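
The two self-supervised objectives can be illustrated with a minimal sketch. The code below assumes a SimSiam-style view-invariance loss and an MAE-style masked-patch reconstruction loss; the exact formulations and the loss weighting are assumptions for illustration, not the project's actual recipe.

    import torch
    import torch.nn.functional as F

    def view_invariance_loss(z1, z2):
        # Negative cosine similarity between embeddings of two augmented
        # views of the same image (SimSiam-style stop-gradient on one view).
        z1 = F.normalize(z1, dim=-1)
        z2 = F.normalize(z2.detach(), dim=-1)
        return -(z1 * z2).sum(dim=-1).mean()

    def mim_loss(pred, target, mask):
        # Mean squared error computed only on the masked patches (MAE-style);
        # pred/target are (B, N, D) patch tensors, mask is (B, N) with 1s on
        # masked patches.
        per_patch = ((pred - target) ** 2).mean(dim=-1)
        return (per_patch * mask).sum() / mask.sum()

    # Hypothetical combined objective for one training step:
    #   loss = view_invariance_loss(z1, z2) + 0.5 * mim_loss(pred, target, mask)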
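
The distributed setup follows the standard PyTorch DistributedDataParallel pattern; a minimal sketch assuming a torchrun launch is shown below (the script name and batch size are placeholders).

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler

    def setup_ddp(model, dataset, batch_size=64):
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        model = DDP(model.cuda(local_rank), device_ids=[local_rank])
        # DistributedSampler shards the dataset so each GPU sees a unique slice.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
        return model, loader

    # Launched on each node with, e.g.:
    #   torchrun --nproc_per_node=8 train.py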
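
A minimal sketch of the Lambda-based extraction step is below. The DynamoDB table name, the TorchScript model artifact, and the preprocessing transform are illustrative assumptions.

    import io
    import json

    import boto3
    import torch
    from PIL import Image
    from torchvision import transforms

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("image-embeddings")  # hypothetical table

    # Loaded once per Lambda container and reused across invocations.
    model = torch.jit.load("vit_encoder.pt")  # hypothetical TorchScript export
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    def handler(event, context):
        # The event carries the S3 location of one image to embed.
        obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
        image = Image.open(io.BytesIO(obj["Body"].read())).convert("RGB")
        with torch.no_grad():
            embedding = model(preprocess(image).unsqueeze(0)).squeeze(0)
        # Embeddings are stored as a JSON string to sidestep DynamoDB's
        # Decimal-only handling of numeric types.
        table.put_item(Item={"image_key": event["key"],
                             "embedding": json.dumps(embedding.tolist())})
        return {"statusCode": 200, "image_key": event["key"]}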
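
The neighbor search maps naturally onto FAISS. Below is a minimal sketch using an exact inner-product index over L2-normalized embeddings (equivalent to cosine similarity); the index type and k are illustrative choices, and in the deployed tool this logic sits behind the REST API.

    import faiss
    import numpy as np

    def build_index(embeddings: np.ndarray) -> faiss.Index:
        # Inner product over L2-normalized vectors equals cosine similarity.
        embeddings = np.ascontiguousarray(embeddings, dtype="float32")
        faiss.normalize_L2(embeddings)
        index = faiss.IndexFlatIP(embeddings.shape[1])
        index.add(embeddings)
        return index

    def search(index: faiss.Index, query: np.ndarray, k: int = 10):
        # Returns ids and similarity scores of the k nearest images.
        query = np.ascontiguousarray(query.reshape(1, -1), dtype="float32")
        faiss.normalize_L2(query)
        scores, ids = index.search(query, k)
        return ids[0], scores[0]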
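
The patented methodology itself is not reproduced here, but the general idea of attention-guided, region-specific queries can be sketched: weight the ViT's patch tokens by an attention map restricted to a user-selected region, then pool the per-object embeddings across images into one query vector. Every name and the pooling scheme below are illustrative assumptions.

    import torch

    def attention_weighted_embedding(patch_tokens, attn_map, region_mask):
        # patch_tokens: (N, D) ViT patch embeddings for one image.
        # attn_map:     (N,)  CLS-to-patch attention from the final block.
        # region_mask:  (N,)  binary mask over the patches covering the
        #                     object of interest (e.g. a user-drawn box).
        weights = attn_map * region_mask
        weights = weights / weights.sum().clamp(min=1e-8)
        return (weights.unsqueeze(-1) * patch_tokens).sum(dim=0)

    def multi_image_query(object_embeddings):
        # Pool per-object embeddings from one or more images into a single
        # query, enabling searches like "person in the shadow of a tractor".
        return torch.stack(object_embeddings).mean(dim=0)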

Result:

  • This replaced the SSIM-based approach to similar-image search, resulting in over a 90% improvement in retrieval accuracy and a 100x increase in retrieval speed.
  • The novel method enabled fine-grained similar-image search, which helped curate data for rare failure cases.
  • Filed a patent application for the novel fine-grained similar-image search methodology.

Technology used:

Python, PyTorch, AWS Lambda, AWS S3, AWS DynamoDB, Docker, Facebook AI Similarity Search (FAISS), MLflow, Git.