To answer a query, encode it into a vector with the same embedding model that produced the stored embeddings, then use FAISS to retrieve the most relevant vectors from the index. Because only the nearest neighbors are fetched, the search stays fast even over large collections, and the matches give the answer grounded context.
The top matches can then be passed as context to an LLM to generate a precise answer; a sketch of that step follows the search example below.
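In practice the query vector comes from the same embedding model used to build the index. Here is a minimal sketch, assuming a sentence-transformers model; the model name is a placeholder, and its output dimension must match the dimension the index was built with:

from sentence_transformers import SentenceTransformer

# Placeholder model name -- use the exact model that produced the indexed embeddings.
model = SentenceTransformer("my-embedding-model")

query_text = "How do I reset my password?"
# encode() on a one-item list returns a float32 NumPy array of shape (1, dim).
query_vector = model.encode([query_text]).astype("float32")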
import faiss
import numpy as np
# Sample query vector (random placeholder; in practice it comes from your embedding
# model, and its dimension must match the dimension the index was built with)
query_vector = np.random.rand(1, 512).astype('float32')
# Load the FAISS index
index = faiss.read_index("my_index.faiss")
# Search for the top 5 nearest neighbors: D holds the distances, I the row ids of the matches
D, I = index.search(query_vector, k=5)
print("Top 5 match indices:", I)