Why It’s Important
With the growth of generative AI and the vast amounts of data fed to its algorithms, vector databases convert unstructured data such as text, images and audio into high-dimensional vectors, making similarity searches faster and more accurate.
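To make the similarity-search idea concrete, here is a minimal sketch using toy four-dimensional vectors in place of real embeddings (production models produce hundreds or thousands of dimensions, and a vector database would handle indexing at scale). The document names and values are illustrative only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings"; a real embedding model produces far higher dimensions.
docs = {
    "cat": np.array([0.9, 0.1, 0.0, 0.2]),
    "kitten": np.array([0.85, 0.15, 0.05, 0.25]),
    "invoice": np.array([0.0, 0.9, 0.8, 0.1]),
}
query = np.array([0.88, 0.12, 0.02, 0.22])

# Rank stored documents by similarity to the query vector, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```

A vector database performs the same nearest-neighbour ranking, but over millions of vectors with approximate indexes instead of a brute-force sort.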
IBRS has explored different approaches to deploying AI. Moving to serverless API components that are priced on consumption is much cheaper than running these components on enterprise servers, Cloud servers, or containers.
In one experiment, IBRS created a RAG (retrieval-augmented generation) AI drawing on 4,000 documents, each around 7,000 KB in size. Building and running the model on the ‘single platforms’ of all three major hyperscale Clouds was multiple times more costly than using OpenAI and Pinecone directly, with service orchestration handled by a serverless API capability (from Google Cloud). Pre-packaged AI solutions built from these same types of AI components (Copilot, Google Vertex AI Search, etc.) can be more expensive, but they can be delivered more quickly.
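The RAG pattern described above can be sketched as a two-step flow: retrieve the most relevant document chunks, then hand them to a generator as context. The embedder and generator below are deliberate stubs (a bag-of-words counter and a prompt string); in the IBRS experiment those roles were filled by Pinecone and OpenAI. The corpus text and function names are illustrative assumptions, not part of the experiment.

```python
from collections import Counter

def embed(text):
    """Stub embedder: bag-of-words counts. Real systems call an embedding model."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Word-overlap score between two bag-of-words vectors."""
    return sum((a & b).values())

corpus = [
    "Serverless vector databases bill per query and per stored vector.",
    "Pod-based indexes reserve fixed compute regardless of load.",
    "RAG grounds a language model's answers in retrieved documents.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=2):
    """Return the k corpus documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: similarity(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query):
    # A real system would send this prompt to an LLM; here we just build it.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(answer("How do serverless vector databases charge"))
```

Swapping the stubs for an embedding API and a vector database index is what moves this sketch from a toy to the consumption-priced architecture the experiment measured.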
Who’s Impacted
- CIO
- Developers
- Data scientists
What’s Next?
- Assess your company’s vector database needs by evaluating current generative AI projects. Identify areas where vector search and retrieval can improve performance and accuracy. Consider the volume and type of data involved, then assess the update frequency and required query speed.
- Investigate alternative service providers offering serverless vector databases. Understand their cost implications and potential savings compared with pod-based or on-premises solutions. Then, evaluate multi-tenant compute models for infrastructure management benefits.
- Before using serverless indexes in production, check the provider’s current limitations and test thoroughly.