
Vector Database Image Searches in AI: How I Find Joy Daily

May 11, 2023

Introduction

With the widespread adoption of artificial intelligence, organizations have access to unprecedented amounts of data, along with insights and hidden patterns that are extremely valuable to anyone who can leverage them. However, exploring that data quickly and efficiently becomes tricky once you are dealing with tens or hundreds of gigabytes. Traditional key-lookup databases struggle with vector queries and often miss insights and patterns that a vector database helps the user find. As AI-driven data grows in popularity, it’s important to use a database that gives you the best ability to explore it.

What is a Vector?

In simple terms, a vector is a series of numbers, derived from the features of the input data, that a machine can interpret. For an image, think of the vector as an abstract numerical representation of visual features such as color, texture, and shape.
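
To make this concrete, here is a toy sketch in Python. It is not how CLIP or any real model works: it hand-builds a three-number vector from an image’s average red, green, and blue values, whereas real models learn hundreds of features. The function name `mean_color_vector` and the pixel data are made up for illustration.

```python
def mean_color_vector(pixels):
    """Reduce a list of (R, G, B) pixels to a 3-number vector of channel averages."""
    count = len(pixels)
    return [sum(pixel[channel] for pixel in pixels) / count for channel in range(3)]

# A hypothetical four-pixel "image": two pure-red pixels and two pure-blue pixels.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]

vector = mean_color_vector(pixels)
print(vector)  # [127.5, 0.0, 127.5]
```

Two images with similar overall color would end up with similar vectors, which is the property the rest of this post relies on.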

Searching with Vectors

Using models such as OpenAI’s CLIP, you can encode both images and text as vectors. For more information on this process, I encourage you to read more about CLIP here. Once all our vectors are created, we can place them in an n-dimensional space. For a great example of what such a vector space looks like when visualized, we’ll use the Embedding Projector.

The vectors in the Embedding Projector are projected into a three-dimensional space, which has several hundred fewer dimensions than the actual vectors, but we’re only human, so we’ll have to settle for this simplified visualization.

In the projection, similar data points are clumped together. The best way to see this pattern is to select the “Mnist with images” data set and then “Color by label”. Notice how the most distinct handwritten digits are clumped most tightly together, while digits that tend to share a shape (like 4s, 7s, and 9s) tend to blend into another digit’s space. From here, we can start to see how vector searches might take form.

Finding the distance between two points in a two-dimensional space is relatively simple with the following formula:
D=√((X2 – X1)² + (Y2 – Y1)²)

Adding another dimension brings us to a three-dimensional space, where the distance between two points can be computed as:
D=√((X2 – X1)² + (Y2 – Y1)² + (Z2 – Z1)²)
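
In code, the straight-line (Euclidean) distance between two points takes one squared difference per coordinate, whatever the number of dimensions. A minimal sketch in Python; the function name `euclidean_distance` is chosen here for illustration:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # One squared difference per coordinate, summed, then square-rooted.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0
```

The same function works unchanged for two, three, or 784 coordinates. Python’s built-in `math.dist` computes the same value.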


Adding the next 781 dimensions to reach the 784 dimensions of the “Mnist with images” data set only adds more terms to the formula, but computing that distance between a query and every vector in a large collection quickly becomes computationally expensive. This exhaustive comparison gives us the exact set of nearest neighbors at the cost of speed and computation power.
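
An exact search of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production implementation: the function name `exact_nearest_neighbors` is made up, and the data is random numbers standing in for real 784-dimensional image vectors.

```python
import heapq
import math
import random

def exact_nearest_neighbors(query, vectors, k=3):
    """Return the indexes of the k vectors closest to query, nearest first.

    Every stored vector is compared with the query, so the work grows with
    (number of vectors) x (number of dimensions).
    """
    distances = ((math.dist(query, vector), index) for index, vector in enumerate(vectors))
    return [index for _, index in heapq.nsmallest(k, distances)]

# Stand-in data: 1,000 random 784-dimensional vectors (not real MNIST images).
rng = random.Random(0)
vectors = [[rng.random() for _ in range(784)] for _ in range(1000)]

# Searching for a vector that is already stored returns that vector first.
print(exact_nearest_neighbors(vectors[42], vectors, k=3)[0])  # 42
```

Each query here costs 1,000 distance computations of 784 terms each. That is fine for a demo, but the cost scales linearly with the size of the collection.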

Alternatively, we can trade a sliver of accuracy for speed and efficiency using Approximate Nearest Neighbor (ANN) algorithms. While you may lose the ability to know the true nearest neighbor, you get an excellent approximate nearest neighbor.
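
One simple way to get a feel for this trade-off is random hyperplane hashing, sketched below in Python. This is only one of many ANN techniques and is not presented as the algorithm any particular vector database uses. The class name `RandomHyperplaneIndex` and its parameters are made up for illustration, and the data is random.

```python
import math
import random
from collections import defaultdict

class RandomHyperplaneIndex:
    """Toy approximate nearest neighbor index using random hyperplane hashing."""

    def __init__(self, dimensions, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit to a vector's bucket key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dimensions)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)
        self.vectors = []

    def _bucket_key(self, vector):
        # Record which side of each hyperplane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in self.planes)

    def add(self, vector):
        self.vectors.append(vector)
        self.buckets[self._bucket_key(vector)].append(len(self.vectors) - 1)

    def search(self, query, k=3):
        # Only vectors in the query's bucket are compared, so a true nearest
        # neighbor that landed in another bucket will be missed.
        candidates = self.buckets.get(self._bucket_key(query), [])
        return sorted(candidates, key=lambda i: math.dist(query, self.vectors[i]))[:k]

# Stand-in data: 2,000 random 32-dimensional vectors centered on zero.
rng = random.Random(1)
index = RandomHyperplaneIndex(dimensions=32)
for _ in range(2000):
    index.add([rng.gauss(0, 1) for _ in range(32)])

query = index.vectors[7]
print(index.search(query, k=3)[0])  # 7
```

With eight hyperplanes there are up to 256 buckets, so each query is compared with only a small fraction of the stored vectors. The price is that a close neighbor sitting just across a hyperplane is never examined.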

Applied to image search, vector search gives us a powerful new way to traverse our data. Rather than relying on traditional image search methods such as keyword or metadata-based search, a vector search can quickly identify images that are similar to a provided vector, which allows for a more flexible approach that is not influenced by subjective labeling. Vector database providers also generally offer a selection of algorithms, along with tooling, so you can tune your search to best suit your data.

Image Search Example

For the example images below, the demo author used CLIP to encode the images; the resulting vectors were stored in a file and read into memory at run time. CLIP converts the prompt from a string to a vector, which is then used to find similar images in the dataset. The images in the search never have to be labeled, examined, or classified by a human, yet they can still be searched with free text based on their contents.


The search demo is from Vivien Tran-Thien’s GitHub repository: https://github.com/vivien000/clip-demo
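
The search step in a demo like this can be sketched in Python as a ranking of precomputed image vectors by their similarity to the prompt’s vector. This is not the demo’s actual code. The three-dimensional vectors and file names are made up, and the use of cosine similarity, a common choice for comparing CLIP embeddings, is an assumption here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search_images(query_vector, image_vectors, top_k=2):
    """Rank image names by similarity to the query vector, best match first."""
    ranked = sorted(
        image_vectors,
        key=lambda name: cosine_similarity(query_vector, image_vectors[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Made-up 3-dimensional stand-ins for precomputed image embeddings.
image_vectors = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Made-up stand-in for the vector a text encoder would produce for a prompt.
query_vector = [0.8, 0.3, 0.1]

print(search_images(query_vector, image_vectors))  # ['beach.jpg', 'forest.jpg']
```

No labels appear anywhere in the lookup: the ranking depends only on how closely each image vector points in the same direction as the prompt vector.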

 

Additional Applications of Vector Databases

Vector databases’ ability to analyze large amounts of data quickly and efficiently isn’t limited to images. You’ve probably used applications that leverage vector search, like Spotify’s recommendation system, which is powered by their open-source Annoy library. E-commerce companies use these insights to provide personalized recommendations: customer behavior and purchase history are used to suggest more relevant products. Anomalies can be detected in usage patterns, alerting organizations to potential security threats and flagging fraud. Paired with ChatGPT, a vector database can serve as long-term memory, enabling custom chatbots to manage your support flows or FAQs. Vector databases have even been used in healthcare, where, in conjunction with a neural network, they can analyze vast amounts of patient data and provide immediate insights to doctors.

Scaling Vector Search

Now that we know the strengths of vector search, we need a way to scale and serve this data in the real world. In the example above, the search ran against a precomputed collection of vectors. To add images to the collection, we would first compute their vectors, then add those vectors to the existing file, and finally reload the data into memory. Maintaining this file would be cumbersome at best, requiring a new file for each index and a file edit for each update. A vector database lets us index and add data in real time, so our queries are always up to date. This approach gives us the flexibility, accuracy, speed, and insights of vector search along with the stability and real-time data of a traditional database.
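
The difference in workflow can be sketched with a small in-memory stand-in, written in Python. `TinyVectorStore` and its `upsert`, `delete`, and `query` methods are hypothetical names invented for this illustration, not the API of any real vector database, and the two-dimensional vectors are made up.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database collection."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, item_id, vector):
        # Insert or overwrite a single vector; no file rewrite or reload needed.
        self._vectors[item_id] = list(vector)

    def delete(self, item_id):
        self._vectors.pop(item_id, None)

    def query(self, vector, top_k=1):
        # Exhaustive scan for clarity; a real database would typically use an ANN index.
        ranked = sorted(self._vectors, key=lambda item_id: math.dist(vector, self._vectors[item_id]))
        return ranked[:top_k]

store = TinyVectorStore()
store.upsert("cat.jpg", [0.9, 0.1])
store.upsert("dog.jpg", [0.7, 0.4])
print(store.query([0.0, 1.0]))  # ['dog.jpg']

# A newly added image is searchable on the very next query.
store.upsert("bird.jpg", [0.1, 0.9])
print(store.query([0.0, 1.0]))  # ['bird.jpg']
```

Each update touches a single entry rather than a whole file, which is the property that keeps queries current as the collection grows.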

 

Conclusion

If you’re looking for a powerful and efficient way to store, query, and analyze complex data structures, vector databases are definitely worth a try. Their ability to handle high-dimensional data, perform lightning-fast similarity searches, and support real-time analytics makes them a valuable tool for many applications. Whether you’re a data scientist, a software engineer, or a business analyst, vector databases can help you unlock new insights and drive innovation in your field.

So why not take the plunge and explore the world of vector databases today? You might be surprised at how much they can do for you!