Quick Start Guide

This guide will get you up and running with YARP in just a few minutes.

Basic Usage

Here’s the most basic way to use YARP:

from yarp.vector_index import LocalMemoryIndex

# Your documents to search
documents = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a powerful programming language",
    "Machine learning helps solve complex problems",
    "Natural language processing is a subset of AI"
]

# Create the index
index = LocalMemoryIndex(documents)

# Process documents (build the search index)
index.process()

# Search for similar content
results = index.query("programming languages")

# Display results
for result in results:
    print(f"Score: {result.matching_score:.1f} - {result.document}")

Understanding Search Scores

YARP uses a hybrid scoring system that combines:

  1. Semantic similarity: Based on the meaning of text (using embeddings)

  2. String similarity: Based on character-level similarity (using Levenshtein distance)
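The string-similarity half of the score can be sketched in plain Python. This is a minimal illustration, not YARP's code. The function names are hypothetical, and dividing by the longer string's length to get a similarity between 0 and 1 is an assumed normalization that YARP may not use.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # Classic dynamic programming, keeping only the previous row of the table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1], where 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))              # 3
print(round(levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
```

Because this measure only looks at characters, "Python" and "python" score highly while "programming" and "coding" do not, which is why it is paired with a semantic measure.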

You can control the balance between these two approaches:

# Prioritize semantic meaning (good for conceptual searches)
results = index.query("programming", weight_semantic=0.8, weight_levenshtein=0.2)

# Prioritize exact text matching (good for finding specific phrases)
results = index.query("Python", weight_semantic=0.2, weight_levenshtein=0.8)
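To see how the weights shift a result's score, here is a hypothetical sketch of a weighted sum of the two similarities. It assumes both inputs are already in the range 0 to 1 and only mirrors the documented rule that the weights must sum to 1.0; YARP's actual scoring formula may differ.

```python
import math


def hybrid_score(semantic: float, string_sim: float,
                 weight_semantic: float, weight_levenshtein: float) -> float:
    """Weighted sum of two similarity scores (hypothetical sketch, not YARP's code)."""
    # Mirror the documented constraint that the two weights sum to 1.0.
    if not math.isclose(weight_semantic + weight_levenshtein, 1.0):
        raise ValueError("weight_semantic + weight_levenshtein must equal 1.0")
    return weight_semantic * semantic + weight_levenshtein * string_sim


# Toy scores for one candidate: close in meaning, but spelled quite differently.
semantic, string_sim = 0.9, 0.2
print(hybrid_score(semantic, string_sim, 0.8, 0.2))
print(hybrid_score(semantic, string_sim, 0.2, 0.8))
```

The first call yields about 0.76 and the second about 0.34: the same candidate ranks much lower once exact text matching dominates.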

Adding and Removing Documents

You can modify your index after creation:

# Add new documents
index.add("New document about artificial intelligence")
index.add(["Multiple", "documents", "at once"])

# Remove a document (must match exactly)
index.delete("Python is a powerful programming language")
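The exact-match rule for deletion is easy to trip over. The toy class below is a hypothetical stand-in, not YARP's implementation, that models the behavior described above: `add` accepts a single string or a list, and `delete` only removes a document whose text is identical. Returning `True`/`False` from `delete` is an assumption made for illustration.

```python
from typing import Iterable, List, Union


class ToyDocumentStore:
    """Hypothetical stand-in illustrating add/delete semantics; not YARP's implementation."""

    def __init__(self, documents: Iterable[str] = ()):
        self.documents: List[str] = list(documents)

    def add(self, docs: Union[str, Iterable[str]]) -> None:
        # Accept a single string or any iterable of strings.
        if isinstance(docs, str):
            docs = [docs]
        self.documents.extend(docs)

    def delete(self, doc: str) -> bool:
        # Exact string equality: case, punctuation, and whitespace all have to match.
        # list.remove drops only the first matching entry.
        try:
            self.documents.remove(doc)
            return True
        except ValueError:
            return False


store = ToyDocumentStore(["Python is a powerful programming language"])
store.add(["Multiple", "documents", "at once"])
print(store.delete("python is a powerful programming language"))  # False: case differs
print(store.delete("Python is a powerful programming language"))  # True
```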

Saving and Loading Indexes

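Processing computes an embedding for every document, which can be slow for large collections. Save the processed index so you can reuse it without rebuilding: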

# Save the index
index.backup("/path/to/save/index")

# Later, load it back
loaded_index = LocalMemoryIndex.load("/path/to/save/index")

# Ready to search immediately (no need to call process())
results = loaded_index.query("search text")
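Loading skips `process()` because the expensive work, computing embeddings, is stored on disk. The sketch below shows that idea with the standard library only. The `index.json` file name and its layout are hypothetical, and YARP's real backup format (which likely also stores the search tree) is not described here.

```python
import json
import os
import tempfile
from typing import List, Tuple


def backup_index(path: str, documents: List[str], embeddings: List[List[float]]) -> None:
    """Persist documents alongside their precomputed vectors (hypothetical file layout)."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "index.json"), "w", encoding="utf-8") as f:
        json.dump({"documents": documents, "embeddings": embeddings}, f)


def load_index(path: str) -> Tuple[List[str], List[List[float]]]:
    """Read everything back; no embedding model is needed because the vectors are stored."""
    with open(os.path.join(path, "index.json"), encoding="utf-8") as f:
        data = json.load(f)
    return data["documents"], data["embeddings"]


with tempfile.TemporaryDirectory() as tmp:
    backup_index(tmp, ["hello world"], [[0.1, 0.2, 0.3]])
    docs, vectors = load_index(tmp)
    print(docs, vectors)
```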

Performance Tuning

Two parameters let you trade speed for accuracy:

# More trees = better accuracy, slower build time
index.process(num_trees=256)  # Default is 128

# More search candidates = better results, slower search
results = index.query("text", search_k=100)  # Default is 50
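The names `num_trees` and `search_k` resemble the parameters of tree-based approximate nearest-neighbor libraries such as Annoy, so results may occasionally miss a true neighbor. One way to choose values is to measure recall against brute-force search on a sample of queries. The sketch below is a standalone illustration with hypothetical helper names; the "approximate" result is simulated rather than produced by YARP.

```python
import math
import random
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float], vectors: List[List[float]], k: int) -> List[int]:
    """Brute-force ground truth: indices of the k most cosine-similar vectors."""
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the true top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(8)]
truth = exact_top_k(query, vectors, k=10)
# Simulated approximate result that missed two of the true neighbors.
approx = truth[:8] + [i for i in range(len(vectors)) if i not in truth][:2]
print(recall_at_k(approx, truth))  # 0.8
```

If recall is too low, raise `search_k` first (it affects only query time); raise `num_trees` if that is not enough and a longer build is acceptable.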

Choosing the Right Model

YARP uses sentence transformer models for embeddings. You can choose different models based on your needs:

# Default: Good balance of speed and quality
index = LocalMemoryIndex(documents, model_name="all-MiniLM-L6-v2")

# Better quality, slower
index = LocalMemoryIndex(documents, model_name="all-mpnet-base-v2")

# 12-layer MiniLM variant: slower to encode than the 6-layer default
index = LocalMemoryIndex(documents, model_name="all-MiniLM-L12-v1")

Error Handling

YARP performs preflight checks for required packages at import time. If a required package is missing, you will see a clear error message.

YARP provides specific exceptions to help you handle errors gracefully:

from yarp.vector_index import LocalMemoryIndex
from yarp.exceptions import (
    LocalMemoryTreeNotBuildException,
    LocalMemoryBadRequestException
)
from yarp.exceptions.runtime import EmbeddingProviderNotFoundException

try:
    # This will fail if the index hasn't been built with process()
    results = index.query("test")
except LocalMemoryTreeNotBuildException:
    print("You need to call index.process() first!")
    index.process()
    results = index.query("test")

try:
    # This will fail because the weights don't sum to 1.0
    results = index.query("test", weight_semantic=0.3, weight_levenshtein=0.4)
except LocalMemoryBadRequestException as e:
    print(f"Invalid parameters: {e}")

try:
    # This will fail if the embedding provider is missing
    index = LocalMemoryIndex(["Hello world"])
    index.process()
except EmbeddingProviderNotFoundException as e:
    print(f"Missing dependency: {e}")

Next Steps

Now that you understand the basics, check out: