eugenetan01 / MongoDBEcommExample


MongoDB Ecommerce Example

1. The ability to do product recommendations
2. The ability to search for related products (e.g. searching for “short grain” rice and getting “japanico” rice)
3. The ability to encrypt sensitive PII data (e.g. credit card numbers / cvc / mobile phone) when stored in the cloud

SA Maintainer: Eugene Tan
Time to setup: 30 mins
Time to execute: 15 mins


Description

This proof shows how to use vector search in MongoDB Atlas to find products with semantically similar descriptions, enabling product recommendations. It also shows how to use Atlas Search (built on Apache Lucene) to surface related products through shared categories and similar keywords. Finally, it demonstrates Queryable Encryption: storing PII in encrypted form in MongoDB while still allowing it to be queried, with decryption happening only in the application layer.


Setup

1. Configure Laptop

  • Ensure MongoDB 7.0+ and the MongoDB Database Tools (which include mongorestore) are installed on your laptop, so that the command line tools are available (a MongoDB Atlas cluster will be used to actually host the data)
  • Download and install Compass on your laptop
  • Ensure C# (.NET 7+) and npm are installed on your laptop
  • Ensure Visual Studio or Visual Studio Code with C#/.NET support is installed

2. Configure Atlas Environment

  • Log-on to your Atlas account (using the MongoDB SA preallocated Atlas credits system) and navigate to your SA project
  • In the project's Security tab, choose to add a new user, e.g. main_user, and for User Privileges specify Read and write to any database (make a note of the password you specify)
  • In the Security tab, add an IP Access List entry for your laptop's current IP address
  • Create an M10-based, 3-node replica set in a single cloud provider region of your choice, with default settings
  • In the Atlas console, for the database cluster you deployed, click the Connect button, select Connect Your Application, and for the latest C# driver version copy the connection string only; make a note of this MongoDB URI, which is used in the next step

3. Load Data Into A Collection In The Atlas Cluster

  • Load the data from the 0. Setup folder by running mongorestore:

    mongorestore --uri mongodb+srv://<username>:<password>@<cluster-host>/

Paste the copied URI into the --uri parameter, replacing the username and password fields with those you created earlier.

Note: This process loads over 1GB of data, so the restore can take significantly longer if you're using a slow network connection.
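If the password you chose contains characters such as @, : or /, they must be percent-encoded before they are placed in the URI, otherwise mongorestore and the drivers cannot parse it. The following Python sketch is a hypothetical helper (not part of this repo) that builds the URI with the standard library's urllib.parse.quote_plus; the credentials and host name are placeholders.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username: str, password: str, host: str) -> str:
    """Build a mongodb+srv URI with percent-encoded credentials.

    quote_plus encodes reserved characters such as '@', ':' and '/',
    which would otherwise break URI parsing.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


if __name__ == "__main__":
    # Hypothetical user, password and cluster host, for illustration only.
    print(build_atlas_uri("main_user", "p@ss:word", "cluster0.example.mongodb.net"))
```

The printed value can then be passed to mongorestore --uri in place of the copied string.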

4. Check That The Collection's Data Can Be Accessed Using Compass

  • From the Atlas console click the Connect button, select Connect With MongoDB Compass and click the Copy button to copy the connection string
  • Launch Compass and when prompted select to use the MongoDB Connection String detected from the clipboard, fill in the Username and Password fields as required and then click the Connect button

5. Set up a MongoDB Playground in VS Code (using the MongoDB for VS Code extension) and connect it to your cluster

6. Create the following indexes on the product_catalogue collection

  • For scenario 1. PR-VectorSearch
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "itemDesc_embeddings": {
        "dimensions": 384,
        "similarity": "dotProduct",
        "type": "knnVector"
      }
    }
  }
}
  • For scenario 3. Synonyms
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "itemDescription": {
        "type": "string"
      }
    }
  },
  "synonyms": [
    {
      "analyzer": "lucene.standard",
      "name": "synonym_mapping",
      "source": {
        "collection": "product_synonyms"
      }
    }
  ]
}
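The vector index above uses dotProduct similarity, which Atlas expects to be computed on vectors normalized to unit length, both when documents are indexed and at query time. Assuming the embeddings are produced or checked in Python (the repo's embedding pipeline is not described here), a minimal sketch of the dimension check and normalization could look like this:

```python
import math
from typing import List, Sequence

# Must match "dimensions" in the itemDesc_embeddings index definition.
EMBEDDING_DIMENSIONS = 384


def normalize_embedding(vector: Sequence[float], dimensions: int = EMBEDDING_DIMENSIONS) -> List[float]:
    """Validate the vector length and scale the vector to unit length.

    With unit-length vectors, the dot product equals cosine similarity.
    """
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]
```

For example, normalize_embedding([3.0, 4.0], dimensions=2) returns [0.6, 0.8]. Many sentence-embedding models can already emit normalized vectors, in which case this step only guards against mistakes.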

7. Create the following index on the product_catalog_categories collection for scenario 2. MoreLikeThis

{
  "mappings": {
    "dynamic": true
  }
}

8. Download and install the Automatic Encryption Shared Library from here

9. Edit appsettings.json to point to your local mongo_crypt shared library folder, where lib/mongo_crypt_v1.dylib is stored
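Before editing appsettings.json, it can help to confirm that the library file is where the folder path says it is. The Python sketch below is a hypothetical check, not part of this repo: it assumes the macOS (.dylib) and Linux (.so) downloads place the library under lib/ as described above, does not cover Windows, and uses a placeholder install folder.

```python
import sys
from pathlib import Path

# Shared library file name per platform. Assumption: the macOS and Linux
# downloads place it under lib/; Windows is not covered by this sketch.
CRYPT_SHARED_NAMES = {
    "darwin": "mongo_crypt_v1.dylib",
    "linux": "mongo_crypt_v1.so",
}


def expected_crypt_shared_path(install_dir: str, platform: str = sys.platform) -> Path:
    """Return the path where the mongo_crypt shared library should be."""
    try:
        file_name = CRYPT_SHARED_NAMES[platform]
    except KeyError:
        raise ValueError(f"platform not covered by this sketch: {platform}") from None
    return Path(install_dir) / "lib" / file_name


if __name__ == "__main__":
    # Hypothetical install folder; replace it with the folder you extracted.
    candidate = expected_crypt_shared_path("/opt/mongo_crypt_shared")
    print(candidate, "exists" if candidate.is_file() else "is missing")
```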


Execution

1. Navigate to the 1.PR-VectorSearch folder

  • Open the code in Visual Studio
  • In Compass, search with the filter {itemDescription: "orange"}
  • Show that no product has the description "orange"
  • Show on line 62 of program.cs that the search term is "orange", and on lines 81 - 92 that the vector embeddings are queried using vector search
  • Run program.cs and show that products similar to orange are returned by the vector search
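The index in setup step 6 declares itemDesc_embeddings as a knnVector field in an Atlas Search index; fields of that type are queried with the $search stage's knnBeta operator. The Python sketch below builds such a pipeline as plain dictionaries. It is an illustration under assumptions, not the C# code on lines 81 - 92: the index name vector_index is hypothetical, and the query vector must be the 384-dimension embedding of "orange" produced by the same model that generated the stored embeddings. Note that Atlas has since deprecated knnBeta in favor of the $vectorSearch stage with a vectorSearch index.

```python
from typing import Any, Dict, List, Sequence

EMBEDDING_FIELD = "itemDesc_embeddings"  # field name from the index definition


def build_knn_pipeline(
    query_vector: Sequence[float],
    index_name: str = "vector_index",  # hypothetical search index name
    k: int = 5,
) -> List[Dict[str, Any]]:
    """Build a $search pipeline returning the k products nearest to query_vector."""
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {
                    "vector": list(query_vector),
                    "path": EMBEDDING_FIELD,
                    "k": k,
                },
            }
        },
        # Keep only the description and the similarity score for display.
        {"$project": {"_id": 0, "itemDescription": 1, "score": {"$meta": "searchScore"}}},
    ]
```

With PyMongo, the pipeline would be passed to collection.aggregate(...) on the product_catalogue collection.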

2. Navigate to the 2. MoreLikeThis folder

  • In Compass, show that the "product_catalog_categories" collection contains several different "rice" products
  • Show that short grain rice and japanico rice are in the "SGrice" category
  • Show that multi grain rice is in the "rice" category
  • Show that the search query looks for items matching the "short grain" keyword in the "SGrice" category
  • Show that japanico rice and short grain rice come back as the top 2 results with the highest relevance, even though "short grain" does not appear in "japanico rice"
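A query of this kind can be expressed with the Atlas Search moreLikeThis operator, which takes one or more example documents in "like" and scores documents by the terms they share with it. The Python sketch below is an assumption about the shape of such a query, not the repo's exact command: the field names itemDescription and category and the index name default are hypothetical. A document such as japanico rice scores well because it shares the "SGrice" category term and the "rice" term with the example document.

```python
from typing import Any, Dict, List


def build_more_like_this_pipeline(
    description: str,
    category: str,
    index_name: str = "default",  # hypothetical index name
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Find products whose indexed string fields resemble an example document."""
    # Hypothetical field names; they must be string fields covered by the
    # dynamic index on product_catalog_categories.
    like_doc = {"itemDescription": description, "category": category}
    return [
        {"$search": {"index": index_name, "moreLikeThis": {"like": like_doc}}},
        {"$limit": limit},
        {"$project": {"_id": 0, "itemDescription": 1, "category": 1,
                      "score": {"$meta": "searchScore"}}},
    ]
```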

3. Navigate to the 3. Synonyms folder

  • Navigate to Atlas -> Data explorer and show the product_synonyms collection
  • Show that there is an equivalent synonym where turkey and ham mean the same thing
  • Navigate to 3. Synonyms folder and copy the command in commands.txt
  • Click "Edit search query"
  • Navigate to the search query tester, paste the command into the editor and run the search
  • Show that both turkey and ham are returned as results, even though the query only searched for turkey
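The synonym mapping named synonym_mapping in setup step 6 reads its entries from the product_synonyms collection, and a text query opts in to it through the "synonyms" option. The Python sketch below shows the shape of an equivalent mapping document and of a matching query; the index name synonyms_index is hypothetical, and the actual command in commands.txt may differ.

```python
from typing import Any, Dict, List

# Shape of an "equivalent" entry in product_synonyms: each term matches the others.
TURKEY_HAM_MAPPING: Dict[str, Any] = {"mappingType": "equivalent", "synonyms": ["turkey", "ham"]}


def build_synonym_pipeline(term: str, index_name: str = "synonyms_index") -> List[Dict[str, Any]]:
    """Search itemDescription for term, expanding it via synonym_mapping."""
    return [
        {
            "$search": {
                "index": index_name,  # hypothetical index name
                "text": {
                    "query": term,
                    "path": "itemDescription",
                    "synonyms": "synonym_mapping",
                },
            }
        },
        {"$project": {"_id": 0, "itemDescription": 1}},
    ]
```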

4. Navigate to the 4. Queryable Encryption folder

  • Show on lines 44 and 72 that the collections are dropped if they already exist
  • Show the code in QueryableEncryptionTutorial.cs, where lines 47 - 69 define the schema for the fields to be encrypted (the PII fields, namely SSN and credit card number)
  • Show on line 79 that the key is retrieved in order to generate a new data encryption key
  • Show on lines 83 - 94 that the CMK is used to create a new collection with the new DEK (data encryption key), according to the specified schema
  • Show on lines 104 - 120 that a new record with a new credit card number is inserted
  • Show on line 129 that the same credit card record is retrieved
  • Run program.cs
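The schema on lines 47 - 69 is written in C#. To illustrate its general shape, the Python sketch below builds an encryptedFields document of the form Queryable Encryption uses; it is not a translation of the repo's code. The field paths ssn and creditCardNumber are hypothetical stand-ins, and enabling equality queries on both fields is an assumption. In recent PyMongo versions, ClientEncryption.create_encrypted_collection replaces each keyId of None with a newly created DEK.

```python
from typing import Any, Dict, Iterable


def build_encrypted_fields(field_paths: Iterable[str], queryable: bool = True) -> Dict[str, Any]:
    """Build an encryptedFields map that encrypts each path as a string."""
    fields = []
    for path in field_paths:
        # keyId None: a placeholder for a DEK to be created with the collection.
        spec: Dict[str, Any] = {"keyId": None, "path": path, "bsonType": "string"}
        if queryable:
            # Equality queries let the server match encrypted values without decrypting them.
            spec["queries"] = [{"queryType": "equality"}]
        fields.append(spec)
    return {"fields": fields}
```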

5. Navigate to the 5. Auto HA folder

  • From a separate terminal/shell, execute the Python scripts to continuously insert records into and read records from the Atlas collection AUTO_HA.records, passing the MongoDB URI as a parameter (include retryWrites and retryReads equal to false in the URI you provide here), e.g.:
    ./continuous-insert.py 'mongodb+srv://<username>:<password>@testcluster-abcd.mongodb.net?retryWrites=false&retryReads=false'
  • View the terminal/shell output of the Python scripts to check it has successfully connected to the Atlas database and is reporting that records are being inserted and read
  • From the Atlas console, select the .../Test Failover option to force a failure of the replica-set primary server; the Atlas console will then show a dialog similar to the following:

[Screenshot: Atlas Test Failover dialog]

  • If a DB connection error is detected, lines 56-63 of the script write to the "tracker.txt" file the exact time the error occurred and the time the driver reconnected to the DB
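The detect-and-log behaviour can be sketched as follows. This is a hypothetical illustration, not the script's lines 56-63: the database call is injected as operation so the sketch runs without a cluster. With PyMongo, operation would wrap an insert on AUTO_HA.records and connection_errors would be (pymongo.errors.ConnectionFailure,), which also covers AutoReconnect. Timestamps follow the tracker.txt format.

```python
from datetime import datetime
from typing import Callable, List, Tuple, Type


def timestamp(now: datetime) -> str:
    """Format a time like the tracker.txt entries, always with microseconds."""
    return now.isoformat(sep=" ", timespec="microseconds")


def run_with_tracking(
    operation: Callable[[int], object],
    iterations: int,
    connection_errors: Tuple[Type[BaseException], ...],
    log: List[str],
    clock: Callable[[], datetime] = datetime.now,
) -> None:
    """Call operation(i) repeatedly, logging the first failure and the recovery."""
    disconnected = False
    for i in range(iterations):
        try:
            operation(i)
        except connection_errors as exc:
            if not disconnected:
                log.append(f"{timestamp(clock())} - DB-CONNECTION-PROBLEM: {exc}")
                disconnected = True
            continue  # this sketch skips the failed iteration instead of retrying it
        if disconnected:
            log.append(f"{timestamp(clock())} - RECONNECTED-TO-DB")
            disconnected = False
```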

Measurement for scenario 4

1. Go to Compass and show that the sensitive fields are all stored encrypted and cannot be read

2. View the terminal output after program.cs runs to see the retrieved document - show that the values are decrypted and visible only to the application


Measurement for scenario 5

After using the Atlas console's Test Failover feature with retryable writes disabled (retryWrites=false), look for connection error data similar to the following in "tracker.txt":

2019-01-01 18:27:27.148666 - INSERT 4110
2019-01-01 18:27:27.828104 - INSERT 4140
2019-01-01 18:27:28.497667 - INSERT 4170
2019-01-01 18:27:29.093474 - DB-CONNECTION-PROBLEM: connection closed
2019-01-01 18:27:31.534841 - RECONNECTED-TO-DB
2019-01-01 18:27:31.599981 - INSERT 4200
2019-01-01 18:27:32.272013 - INSERT 4230
2019-01-01 18:27:32.909000 - INSERT 4260

If you re-run the scripts with retryWrites=true, there should be no errors logged: the driver retries an interrupted write once against the new primary, so the application experiences no exceptions or downtime even when a server goes down and a failover happens on the MongoDB Atlas cluster.
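To turn tracker.txt into a number, the outage can be measured as the time between each DB-CONNECTION-PROBLEM entry and the following RECONNECTED-TO-DB entry. The Python sketch below is a hypothetical helper that assumes the line format shown above; for the sample log it reports one outage of about 2.44 seconds, and for a run with retryWrites=true it should return an empty list.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def outage_durations(lines: Iterable[str]) -> List[float]:
    """Return seconds between each connection problem and the next reconnect."""
    durations: List[float] = []
    problem_at: Optional[datetime] = None
    for line in lines:
        stamp, sep, event = line.partition(" - ")
        if not sep:
            continue  # skip blank or malformed lines
        when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S.%f")
        if event.startswith("DB-CONNECTION-PROBLEM") and problem_at is None:
            problem_at = when
        elif event.startswith("RECONNECTED-TO-DB") and problem_at is not None:
            durations.append((when - problem_at).total_seconds())
            problem_at = None
    return durations


if __name__ == "__main__":
    with open("tracker.txt") as tracker:
        print(outage_durations(tracker))
```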
