This version is still in development and is not yet considered stable. For the latest snapshot version, use Spring AI 1.0.0-SNAPSHOT!

Azure Cosmos DB

This section walks you through setting up CosmosDBVectorStore to store document embeddings and perform similarity searches.

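What is Azure Cosmos DB?

Azure Cosmos DB is Microsoft's globally distributed, cloud-native database service designed for mission-critical applications. It provides high availability, low latency, and the ability to scale horizontally to meet modern application demands. It was built from the ground up with global distribution, fine-grained multi-tenancy, and horizontal scalability at its core. It is a foundational service in Azure, used by most of Microsoft's mission-critical applications at global scale, including Teams, Skype, Xbox Live, Office 365, Bing, Azure Active Directory, Azure Portal, Microsoft Store, and many others. It is also used by thousands of external customers, including OpenAI for ChatGPT and other mission-critical AI applications that require elastic scale, turnkey global distribution, and low latency and high availability worldwide.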

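What is DiskANN?

DiskANN (Disk-based Approximate Nearest Neighbor Search) is a technology used in Azure Cosmos DB to improve the performance of vector searches. It enables efficient and scalable similarity searches across high-dimensional data by indexing the embeddings stored in Cosmos DB.

DiskANN provides the following benefits:

Efficiency: By using disk-based structures, DiskANN significantly reduces the time required to find nearest neighbors compared to traditional methods.
Scalability: It can handle large datasets that exceed memory capacity, making it suitable for a range of applications, including machine learning and AI-driven solutions.
Low latency: DiskANN minimizes latency during search operations, so applications can retrieve results quickly even with large data volumes.

In the context of Spring AI for Azure Cosmos DB, vector searches create and use DiskANN indexes to ensure optimal performance for similarity queries.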
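
To make the idea of a similarity query concrete, the following is a minimal plain-Java sketch of an exact, in-memory nearest-neighbor search using cosine similarity. It is not how DiskANN or Spring AI works internally: DiskANN answers the same kind of query approximately with an on-disk index instead of scanning every vector. The class name, the Entry record, and the sample vectors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: an exact, brute-force nearest-neighbor search.
// DiskANN answers the same kind of query approximately, using an on-disk index.
class BruteForceSimilarity {

    // Hypothetical holder for an id and its embedding vector.
    record Entry(String id, double[] vector) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensions");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every entry against the query and returns the ids of the k best matches.
    static List<String> topK(List<Entry> entries, double[] query, int k) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> cosine(e.vector(), query)).reversed());
        return sorted.stream().limit(k).map(Entry::id).toList();
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", new double[] {1.0, 0.0}),
                new Entry("b", new double[] {0.0, 1.0}),
                new Entry("c", new double[] {0.9, 0.1}));
        // Prints [a, c]: "a" matches the query exactly and "c" points in nearly the same direction.
        System.out.println(topK(entries, new double[] {1.0, 0.0}, 2));
    }
}
```

The scan above costs time proportional to the number of stored vectors for every query, which is the cost an approximate index such as DiskANN is designed to avoid.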

Setting up Azure Cosmos DB Vector Store with Auto Configuration

The following code demonstrates how to set up the CosmosDBVectorStore with auto-configuration:

package com.example.demo;

import io.micrometer.observation.ObservationRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Lazy;

import java.util.List;
import java.util.Map;
import java.util.UUID;

// @SpringBootApplication already enables auto-configuration
@SpringBootApplication
public class DemoApplication implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(DemoApplication.class);

    @Lazy
    @Autowired
    private VectorStore vectorStore;

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        Document document1 = new Document(UUID.randomUUID().toString(), "Sample content1", Map.of("key1", "value1"));
        Document document2 = new Document(UUID.randomUUID().toString(), "Sample content2", Map.of("key2", "value2"));
        this.vectorStore.add(List.of(document1, document2));
        List<Document> results = this.vectorStore.similaritySearch(SearchRequest.builder().query("Sample content").topK(1).build());

        log.info("Search results: {}", results);

        // Remove the documents from the vector store
        this.vectorStore.delete(List.of(document1.getId(), document2.getId()));
    }

    @Bean
    public ObservationRegistry observationRegistry() {
        return ObservationRegistry.create();
    }
}



Auto Configuration


Add the following dependency to your Maven project:



<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-cosmos-db-store-spring-boot-starter</artifactId>
</dependency>





Configuration Properties


The following configuration properties are available for the Cosmos DB vector store:

Property                                              Description
spring.ai.vectorstore.cosmosdb.databaseName           The name of the Cosmos DB database to use.
spring.ai.vectorstore.cosmosdb.containerName          The name of the Cosmos DB container to use.
spring.ai.vectorstore.cosmosdb.partitionKeyPath       The path for the partition key.
spring.ai.vectorstore.cosmosdb.metadataFields         Comma-separated list of metadata fields.
spring.ai.vectorstore.cosmosdb.vectorStoreThroughput  The throughput for the vector store.
spring.ai.vectorstore.cosmosdb.vectorDimensions       The number of dimensions for the vectors.
spring.ai.vectorstore.cosmosdb.endpoint               The endpoint for the Cosmos DB.
spring.ai.vectorstore.cosmosdb.key                    The key for the Cosmos DB.





Complex Searches with Filters


You can perform more complex searches using filters in the Cosmos DB vector store.
Below is a sample demonstrating how to use filters in your search queries.



// Metadata attached to each document; filters are evaluated against these fields
Map<String, Object> metadata1 = new HashMap<>();
metadata1.put("country", "UK");
metadata1.put("year", 2021);
metadata1.put("city", "London");

Map<String, Object> metadata2 = new HashMap<>();
metadata2.put("country", "NL");
metadata2.put("year", 2022);
metadata2.put("city", "Amsterdam");

Document document1 = new Document("1", "A document about the UK", metadata1);
Document document2 = new Document("2", "A document about the Netherlands", metadata2);

vectorStore.add(List.of(document1, document2));

// Restrict the similarity search to documents whose country is UK or NL
FilterExpressionBuilder builder = new FilterExpressionBuilder();
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(10)
    .filterExpression(builder.in("country", "UK", "NL").build())
    .build());
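
To illustrate what a membership filter on the country field selects, here is a small plain-Java sketch that applies the same kind of predicate to in-memory metadata maps. It does not use Spring AI or Cosmos DB and says nothing about how the store translates the filter into a query; the class, the Doc record, and the third sample document are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: mimics the effect of an "in" filter on in-memory metadata.
class MetadataInFilter {

    // Hypothetical stand-in for a stored document: an id plus its metadata map.
    record Doc(String id, Map<String, Object> metadata) {}

    // Keeps the ids of documents whose metadata value for `key` is one of `allowed`.
    static List<String> filterIn(List<Doc> docs, String key, Set<Object> allowed) {
        return docs.stream()
                .filter(d -> {
                    Object value = d.metadata().get(key);
                    // Documents without the field never match.
                    return value != null && allowed.contains(value);
                })
                .map(Doc::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", Map.of("country", "UK", "year", 2021, "city", "London")),
                new Doc("2", Map.of("country", "NL", "year", 2022, "city", "Amsterdam")),
                new Doc("3", Map.of("country", "BG", "year", 2020, "city", "Sofia")));
        // Prints [1, 2]: only the UK and NL documents pass the filter.
        System.out.println(filterIn(docs, "country", Set.of("UK", "NL")));
    }
}
```

In the vector store, a filter like this narrows the candidate set, and the similarity ranking then applies only to the documents that pass it.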





Setting up Azure Cosmos DB Vector Store without Auto Configuration


The following code demonstrates how to set up the CosmosDBVectorStore without relying on auto-configuration:



@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel, ObservationRegistry observationRegistry) {
    // Create the Cosmos DB client
    CosmosAsyncClient cosmosClient = new CosmosClientBuilder()
            .endpoint(System.getenv("COSMOSDB_AI_ENDPOINT"))
            .key(System.getenv("COSMOSDB_AI_KEY"))
            .userAgentSuffix("SpringAI-CDBNoSQL-VectorStore")
            .gatewayMode()
            .buildAsyncClient();

    // Create and configure the vector store
    return CosmosDBVectorStore.builder(cosmosClient, embeddingModel)
            .databaseName("test-database")
            .containerName("test-container")
            // Configure metadata fields for filtering
            .metadataFields(List.of("country", "year", "city"))
            // Set the partition key path (optional)
            .partitionKeyPath("/id")
            // Configure performance settings
            .vectorStoreThroughput(1000)
            .vectorDimensions(1536)  // Match your embedding model's dimensions
            // Add custom batching strategy (optional)
            .batchingStrategy(new TokenCountBatchingStrategy())
            // Add observation registry for metrics
            .observationRegistry(observationRegistry)
            .build();
}

@Bean
public EmbeddingModel embeddingModel() {
    return new TransformersEmbeddingModel();
}



This configuration shows all the available builder options:

databaseName: The name of your Cosmos DB database
containerName: The name of your container within the database
partitionKeyPath: The path for the partition key (e.g., "/id")
metadataFields: List of metadata fields that will be used for filtering
vectorStoreThroughput: The throughput (RU/s) for the vector store container
vectorDimensions: The number of dimensions for your vectors (should match your embedding model)
batchingStrategy: Strategy for batching document operations (optional)
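
To give a feel for what a batching strategy does, the sketch below groups texts into batches that stay under a token budget. This is an assumption-laden illustration, not Spring AI's TokenCountBatchingStrategy: token counts are approximated by whitespace-separated words, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: groups texts into batches that stay under a token budget.
// Token counts are approximated by whitespace-separated words; a real strategy
// would use the embedding model's tokenizer.
class SimpleBatching {

    // Rough token estimate: the number of whitespace-separated words.
    static int approxTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Fills each batch in input order and starts a new one when the budget would be exceeded.
    static List<List<String>> batch(List<String> texts, int maxTokensPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentTokens = 0;
        for (String text : texts) {
            int tokens = approxTokens(text);
            if (tokens > maxTokensPerBatch) {
                throw new IllegalArgumentException("A single text exceeds the batch budget");
            }
            if (currentTokens + tokens > maxTokensPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                currentTokens = 0;
            }
            current.add(text);
            currentTokens += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Prints [[a b c, d e], [f g h i]] for a budget of 5 approximate tokens.
        System.out.println(batch(List.of("a b c", "d e", "f g h i"), 5));
    }
}
```

Batching of this kind keeps each embedding request within the model's input limit while still sending several documents per call.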






Manual Dependency Setup


Add the following dependency to your Maven project:



<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-cosmos-db-store</artifactId>
</dependency>





Accessing the Native Client


The Azure Cosmos DB Vector Store implementation provides access to the underlying native Azure Cosmos DB client (CosmosClient) through the getNativeClient() method:



CosmosDBVectorStore vectorStore = context.getBean(CosmosDBVectorStore.class);
Optional<CosmosClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    CosmosClient client = nativeClient.get();
    // Use the native client for Azure Cosmos DB-specific operations
}



The native client gives you access to Azure Cosmos DB-specific features and operations that might not be exposed through the VectorStore interface.