CyborgDB uses IVF* index types, which leverage clustering algorithms to segment the index into smaller sections for efficient querying. These clustering algorithms must be trained on the specific data being indexed in order to adequately represent that data.
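To make "training" concrete, here is a minimal pure-Python sketch of a generic IVF index. It uses k-means (Lloyd's algorithm) to learn `n_lists` centroids, files each vector into the inverted list of its nearest centroid, and answers a query by scanning only one list. This is a textbook illustration, not CyborgDB's implementation, which operates on encrypted vectors. The names `train_ivf` and `search_nearest_list` are hypothetical, and treating `tolerance` as a bound on squared centroid movement is an assumption.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], v))

def assign(vectors, centroids):
    """Build inverted lists: cluster id -> list of vector ids."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists

def train_ivf(vectors, n_lists, max_iters=100, tolerance=1e-6, seed=0):
    """Hypothetical IVF training via Lloyd's k-means; returns (centroids, lists)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    dim = len(vectors[0])
    for _ in range(max_iters):
        lists = assign(vectors, centroids)
        shift = 0.0  # largest squared centroid movement in this iteration
        for c, members in enumerate(lists):
            if not members:
                continue  # leave an empty cluster's centroid where it is
            mean = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
            shift = max(shift, squared_distance(centroids[c], mean))
            centroids[c] = mean
        if shift <= tolerance:
            break  # converged
    return centroids, assign(vectors, centroids)

def search_nearest_list(centroids, lists, vectors, query):
    """Scan only the inverted list whose centroid is closest to the query."""
    candidates = lists[nearest(centroids, query)]
    if not candidates:
        return None
    return min(candidates, key=lambda i: squared_distance(vectors[i], query))
```

With two well-separated groups of points and `n_lists=2`, training recovers the two groups, and a query only has to compare against the vectors in one of them.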
In CyborgDB Service, training is typically handled automatically by the service. However, you can explicitly trigger training once enough vectors have been added.
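```python
# Assumes `index` is an existing encrypted index that already holds vectors added via upsert

# Train the encrypted index
index.train()

# Or train with a specific number of clusters
index.train(n_lists=128)
```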
You must have at least 2 * n_lists vectors in the index (ingested via upsert) before you can call train.
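The 2 * n_lists requirement is simple arithmetic, so it can be checked before calling `train`. The helpers below are hypothetical (not part of the CyborgDB SDK) and assume you already know how many vectors you have upserted.

```python
def min_vectors_for_training(n_lists: int) -> int:
    """Minimum number of upserted vectors required before train(n_lists=...)."""
    return 2 * n_lists

def can_train(num_vectors: int, n_lists: int) -> bool:
    """True if the index holds enough vectors to train with this n_lists."""
    return num_vectors >= min_vectors_for_training(n_lists)

def max_n_lists(num_vectors: int) -> int:
    """Largest n_lists that still satisfies the 2 * n_lists rule."""
    return num_vectors // 2
```

For example, `index.train(n_lists=128)` needs at least 256 vectors in the index.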
Parameters are available to customize the training process:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_lists` | `int` | `None` (auto) | (Optional) Number of inverted index lists to create in the index. When `None` or omitted, it is auto-determined based on the number of vectors in the index. |
| `batch_size` | `int` | `None` | (Optional) Number of vectors to process per training batch. When `None`, the server uses 2048. |
| `max_iters` | `int` | `None` | (Optional) Maximum number of training iterations. When `None`, the server uses 100. |
| `tolerance` | `float` | `None` | (Optional) Convergence tolerance for training completion. When `None`, the server uses 1e-6. |
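The `None` defaults mean "let the server decide". The sketch below shows how the effective values resolve, using the fallbacks listed in the parameter table. `TrainParams` and `effective()` are hypothetical client-side helpers for illustration, not CyborgDB types.

```python
from dataclasses import dataclass
from typing import Optional

# Server-side fallbacks taken from the parameter table.
SERVER_DEFAULTS = {"batch_size": 2048, "max_iters": 100, "tolerance": 1e-6}

@dataclass
class TrainParams:
    n_lists: Optional[int] = None      # None -> auto-determined from the vector count
    batch_size: Optional[int] = None
    max_iters: Optional[int] = None
    tolerance: Optional[float] = None

    def effective(self) -> dict:
        """Replace unset fields with the documented server defaults.

        n_lists is left as None because its value depends on the index contents.
        """
        resolved = {"n_lists": self.n_lists}
        for name, default in SERVER_DEFAULTS.items():
            value = getattr(self, name)
            resolved[name] = default if value is None else value
        return resolved
```

For instance, `TrainParams(max_iters=50).effective()` keeps `max_iters` at 50 and fills in 2048 and 1e-6 for `batch_size` and `tolerance`.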
n_lists is the number of clusters into which the vectors in the index are partitioned. Typically, the higher the value, the higher the recall (but also the slower the indexing process). As a rule of thumb, n_lists should be:
- A power of two (e.g., 2,048 or 4,096). This is not a requirement, but it enables performance optimizations.
- Roughly 1/10,000 to 1/100 of the total number of vectors to be indexed, so that each cluster holds about 100 to 10,000 vectors.
If not specified, CyborgDB will auto-determine the best n_lists value based on the number of vectors in the index.
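If you prefer to choose n_lists yourself, the rules of thumb above can be turned into a small helper. This is a hypothetical sketch: `suggest_n_lists` and its target of 1,000 vectors per list are assumptions, and it is not CyborgDB's auto-determination logic.

```python
import math

def suggest_n_lists(total_vectors: int, target_per_list: int = 1000) -> int:
    """Hypothetical heuristic: a power of two near total_vectors / target_per_list."""
    if total_vectors < 2:
        raise ValueError("need at least 2 vectors to train")
    ideal = max(1.0, total_vectors / target_per_list)
    # Round to the nearest power of two in log space.
    n_lists = 2 ** round(math.log2(ideal))
    # Respect the training requirement of at least 2 * n_lists vectors.
    while n_lists > 1 and total_vectors < 2 * n_lists:
        n_lists //= 2
    return n_lists
```

With the default target, rounding in log space keeps the average list size within a factor of about 1.41 of 1,000 (roughly 700 to 1,400 vectors per list) whenever there are at least 1,000 vectors, comfortably inside the 100 to 10,000 range. For example, 1,000,000 vectors yields 1,024 lists.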
Training is technically optional (you can use CyborgDB without ever calling train), but it is recommended once the index holds a large number of vectors (e.g., more than 50,000). If you call query on an untrained index of that size, you will see the following warning in the console:
```text
Warning: querying untrained index with more than 50000 indexed vectors.
```
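The recommendation above reduces to a simple check. The sketch below assumes your application tracks whether the index has been trained and how many vectors it holds. `should_train`, `is_trained`, and `num_vectors` are hypothetical, not CyborgDB API. The check only mirrors the 50,000-vector threshold from the warning and the 2 * n_lists requirement.

```python
from typing import Optional

UNTRAINED_WARNING_THRESHOLD = 50_000  # threshold quoted in the warning message

def should_train(is_trained: bool, num_vectors: int, n_lists: Optional[int] = None) -> bool:
    """Hypothetical helper: decide whether it is time to call index.train()."""
    if is_trained:
        return False  # already trained
    if num_vectors <= UNTRAINED_WARNING_THRESHOLD:
        return False  # training is optional at this size
    if n_lists is not None and num_vectors < 2 * n_lists:
        return False  # not enough vectors for the requested n_lists
    return True
```

When it returns `True`, you would call `index.train()` (optionally with `n_lists=...`) so that later queries no longer trigger the warning.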