Vector Search
MongoDB Vector Search in-action!
Introduction
This page describes how to setup and showcase semantic search capability of MongoDB using the vector search feature.
Demo Setup
Clone the MAAP Framework github repository : MAAP Framework Github
Pre-requisites
Before proceeding, ensure that your environment meets the following requirements:
- Node Version: v20.0+
- MongoDB Version (Atlas): v7.0 (M10 Cluster Tier)
You are also required to generate a FIREWORKS.AI
API key in order to get access to the model. Visit this quick-start guide to generate a key.
Once generated, store it in the .env
file, located at builder/partnerproduct/.env
as;
FIREWORKS_API_KEY=xxxxx
Components selection
The first step in the setup process is configuring the config.yaml
file. You can adjust the necessary settings from the list of available partners to make it best work for your needs. For this demo, we are utilizing the Nomic
embedding class and the Mixtral
model for LLMs.
You are required to update the fields as required with your personal generated values below.
ingest:
- source: 'pdf'
source_path: '<pdf_file_path>'
chunk_size: 2000
chunk_overlap: 200
embedding:
class_name: Nomic-v1.5
vector_store:
connectionString: '<you_mdb_connection_string>'
dbName: '<db_name>'
collectionName: 'embedded_content'
embeddingKey: 'embedding'
textKey: 'text'
numCandidates: 150
minScore: 0.1
vectorSearchIndexName: 'vector_index'
llms:
class_name: Fireworks
model_name: 'accounts/fireworks/models/mixtral-8x22b-instruct'
temperature: ''
top_p: ''
top_k: ''
Data ingestion
The data can be loaded from different data sources of your choice, we are using pdf
in this case.
Link to the pdf file used in the demo. Download this pdf file to your machine.
In order to start ingesting the data run the below command.
npm run ingest <path_to_your_config.yaml>
This command takes into considerations the ingest
pipeline mentioned in the config.yaml
file and starts ingesting data from the listed sources. After the data is loaded successfully, the required vector index is also created automatically.
The data is loaded in embedded_content
collection, and must have created vector search index named vector_index
. Verify this before proceeding the to next step.
Running the application
In order to start the application, the server and front-end should be running in two separate terminals.
-
Run the server
Navigate to the src folder, and run the server using below command.
npm run start-semantic-search <path_to_your_config.yaml>
-
Start your application UI
You can start your UI client by running the following command.
cd builder/partnerproduct/ui
npm install
npm run startThe
npm install
will help you in installing the required libraries.Your application will be running at http://localhost:3000.
Making queries
The queries can be asked based on the data ingested and relevant data can be retrieved based on the score. The below fields will be returned for each relevant document.
Score
,Content
andMetadata
.