
Vector Search

MongoDB Vector Search in action!

Introduction

This page describes how to set up and demonstrate MongoDB's semantic search capability using the Atlas Vector Search feature.

Demo Setup

Clone the MAAP Framework GitHub repository: MAAP Framework GitHub

Pre-requisites

Before proceeding, ensure that your environment meets the following requirements:

  • Node Version: v20.0+
  • MongoDB Version (Atlas): v7.0 (M10 Cluster Tier)

You also need a Fireworks AI API key to access the model. Follow this quick-start guide to generate one.

Once generated, store it in the .env file located at builder/partnerproduct/.env:

FIREWORKS_API_KEY=xxxxx
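Before running anything, it can help to confirm that the key has actually replaced the xxxxx placeholder. The sketch below parses .env-style text and checks FIREWORKS_API_KEY. It is a simplified illustration: parseEnv and requireKey are hypothetical helpers, not part of the MAAP Framework, which may load the file differently.

```typescript
// Hypothetical helpers for illustration; not part of the MAAP Framework.

/** Parse simple KEY=value lines, ignoring blank lines and # comments. */
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines that have no key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

/** Throw if the key is missing, empty, or still set to the placeholder. */
function requireKey(env: Record<string, string>, key: string, placeholder = "xxxxx"): string {
  const value = env[key];
  if (!value || value === placeholder) {
    throw new Error(`${key} is not set in .env`);
  }
  return value;
}

// Example: a file that still contains the placeholder fails the check.
const sample = "# Fireworks credentials\nFIREWORKS_API_KEY=xxxxx\n";
try {
  requireKey(parseEnv(sample), "FIREWORKS_API_KEY");
} catch (err) {
  console.log((err as Error).message); // FIREWORKS_API_KEY is not set in .env
}
```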

Component selection

The first step is to configure the config.yaml file. Select the components you need from the list of available partners and adjust their settings to suit your use case. This demo uses the Nomic embedding class (Nomic-v1.5) and the Mixtral 8x22B Instruct model, served through Fireworks, as the LLM.

Update the fields below with your own values, replacing each placeholder in angle brackets (for example, <db_name>).

ingest:
  - source: 'pdf'
    source_path: '<pdf_file_path>'
    chunk_size: 2000
    chunk_overlap: 200
embedding:
  class_name: Nomic-v1.5
vector_store:
  connectionString: '<your_mdb_connection_string>'
  dbName: '<db_name>'
  collectionName: 'embedded_content'
  embeddingKey: 'embedding'
  textKey: 'text'
  numCandidates: 150
  minScore: 0.1
  vectorSearchIndexName: 'vector_index'
llms:
  class_name: Fireworks
  model_name: 'accounts/fireworks/models/mixtral-8x22b-instruct'
  temperature: ''
  top_p: ''
  top_k: ''
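Since the ingest and server commands read config.yaml directly, a leftover placeholder is an easy mistake to miss. A minimal sketch for catching one, assuming placeholders always take the '<name>' form used in the sample configuration; findPlaceholders is a hypothetical helper, and the connection string in the example is an invented, already-filled-in value.

```typescript
// Hypothetical helper: list any '<name>' placeholders still present in config text.
function findPlaceholders(configText: string): string[] {
  const matches = configText.match(/<[A-Za-z0-9_]+>/g) ?? [];
  return matches.map((m) => m.slice(1, -1)); // drop the angle brackets
}

const configText = [
  "source_path: '<pdf_file_path>'",
  "connectionString: 'mongodb+srv://cluster0.example.mongodb.net'", // hypothetical, already filled in
  "dbName: '<db_name>'",
].join("\n");

console.log(findPlaceholders(configText)); // [ 'pdf_file_path', 'db_name' ]
```

An empty result means every placeholder has been replaced; anything else names the fields still to update.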

Data ingestion

Data can be loaded from a variety of sources; this demo uses a PDF file.

Download the PDF file used in the demo to your machine, then set source_path in config.yaml to its location.

To start ingesting the data, run the following command:

npm run ingest <path_to_your_config.yaml>

This command reads the ingest pipeline defined in config.yaml and ingests data from the listed sources. Once the data has loaded successfully, the required vector search index is created automatically.
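The chunk_size and chunk_overlap settings control how the extracted text is split before embedding. The sketch below illustrates the idea with a plain fixed-size character window: each chunk holds at most chunkSize characters and consecutive chunks share chunkOverlap characters. This is a simplified assumption, not the framework's splitter, which may also break on separators such as paragraphs or sentences, so real chunk boundaries can differ.

```typescript
// Simplified fixed-window chunker illustrating chunk_size / chunk_overlap.
// Not the MAAP Framework's splitter; real splitters may break on separators.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// With the demo settings (2000 / 200), a 5,000-character text yields windows
// starting at 0, 1800 and 3600, with lengths 2000, 2000 and 1400.
const doc = "x".repeat(5000);
console.log(chunkText(doc, 2000, 200).map((c) => c.length)); // [ 2000, 2000, 1400 ]
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a sentence split by the boundary can still be matched as a whole.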

The data is loaded into the embedded_content collection, and a vector search index named vector_index should have been created on it. Verify both before proceeding to the next step.
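For reference, an Atlas Vector Search index over this collection is defined with a vector field whose path matches embeddingKey. The sketch below writes such a definition as a TypeScript object and checks it against the configured field name and embedding size. The 768 dimensions (the default output size of Nomic's nomic-embed-text-v1.5) and cosine similarity are assumptions, not values stated on this page; compare them with the index the ingest step actually created, for example in the Atlas UI. indexMatches is a hypothetical helper.

```typescript
// Shape of an Atlas Vector Search index definition (vector fields only).
interface VectorField {
  type: "vector";
  path: string;
  numDimensions: number;
  similarity: "euclidean" | "cosine" | "dotProduct";
}
interface VectorIndexDefinition {
  fields: VectorField[];
}

// Assumed definition: 768 dimensions and cosine similarity are illustrative.
const expectedIndex: VectorIndexDefinition = {
  fields: [{ type: "vector", path: "embedding", numDimensions: 768, similarity: "cosine" }],
};

// True if some vector field indexes the configured embeddingKey with the
// dimension count produced by the embedding model.
function indexMatches(def: VectorIndexDefinition, embeddingKey: string, dims: number): boolean {
  return def.fields.some(
    (f) => f.type === "vector" && f.path === embeddingKey && f.numDimensions === dims,
  );
}

console.log(indexMatches(expectedIndex, "embedding", 768)); // true
```

A mismatch in either the path or the dimension count means queries against vector_index will not find the stored embeddings.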

Running the application

To start the application, run the server and the front end in two separate terminals.

  • Run the server

    Navigate to the src folder and start the server with the following command:

    npm run start-semantic-search <path_to_your_config.yaml>
  • Start your application UI

    Start the UI client by running the following commands:

    cd builder/partnerproduct/ui
    npm install
    npm run start

    npm install installs the libraries the UI requires.

    Your application will be running at http://localhost:3000.

Making queries

You can ask questions about the ingested data; relevant documents are retrieved and ranked by their similarity score. The following fields are returned for each relevant document:

Score, Content and Metadata.
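To illustrate how the vector_store settings could map onto a query, the sketch below builds a MongoDB aggregation pipeline that uses the $vectorSearch stage, projects the similarity score with $meta: "vectorSearchScore", and drops results below minScore. It is an assumption-based sketch, not the framework's code: buildSearchPipeline, the default limit of 5, and the metadata field name are hypothetical, and a real query vector would come from embedding the question with the same Nomic model used during ingestion. The resulting array could be passed to collection.aggregate() in the MongoDB Node.js driver.

```typescript
// Settings mirroring the vector_store section of config.yaml.
interface VectorStoreSettings {
  vectorSearchIndexName: string;
  embeddingKey: string;
  textKey: string;
  numCandidates: number;
  minScore: number;
}

type Stage = Record<string, unknown>;

// Hypothetical helper: build an aggregation pipeline for a semantic query.
function buildSearchPipeline(queryVector: number[], s: VectorStoreSettings, limit = 5): Stage[] {
  return [
    {
      // Approximate nearest-neighbour search over the configured index.
      $vectorSearch: {
        index: s.vectorSearchIndexName,
        path: s.embeddingKey,
        queryVector,
        numCandidates: s.numCandidates, // candidates examined before keeping `limit`
        limit,
      },
    },
    {
      // Return the score, the chunk text and any stored metadata.
      $project: {
        _id: 0,
        score: { $meta: "vectorSearchScore" },
        content: `$${s.textKey}`,
        metadata: 1, // assumed field name for document metadata
      },
    },
    // Discard weak matches, mirroring the minScore setting.
    { $match: { score: { $gte: s.minScore } } },
  ];
}

const settings: VectorStoreSettings = {
  vectorSearchIndexName: "vector_index",
  embeddingKey: "embedding",
  textKey: "text",
  numCandidates: 150,
  minScore: 0.1,
};

// A real query vector has as many entries as the index's numDimensions.
const pipeline = buildSearchPipeline([0.12, -0.03, 0.4], settings);
console.log(JSON.stringify(pipeline, null, 2));
```

Raising numCandidates relative to limit generally improves recall at the cost of query latency.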