JavaScript Chromadb Quick Start

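Chroma is an open-source vector database. It uses vector similarity search to store and retrieve large-scale, high-dimensional vector data quickly and efficiently.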

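Chroma is an application-embedded database: it is added to your code as a package. Its main advantage is simplicity. If you are building an LLM application and need a vector database to give the LLM a memory, with support for semantic text similarity search, but do not want to install a standalone vector database, Chroma is a good choice. This tutorial is based on JavaScript.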

1. Install the package

npm install --save chromadb # yarn add chromadb

2. Initialize the Chroma client

const {ChromaClient} = require('chromadb');
const client = new ChromaClient();

3. Create a collection

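A collection plays a role in Chroma similar to a table in MySQL: it is where vector data (including documents and other metadata) is stored. The snippets from here on use await, so run them inside an async function. The following creates a collection: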

const {OpenAIEmbeddingFunction} = require('chromadb');
const embedder = new OpenAIEmbeddingFunction({openai_api_key: "your_api_key"})
const collection = await client.createCollection({name: "my_collection", embeddingFunction: embedder})

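This example uses an OpenAI text embedding model to compute the text vectors, so you need to supply your OpenAI API key. You can also omit the embeddingFunction parameter and use Chroma's built-in model to compute the vectors, or substitute another open-source text embedding model.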
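To make the idea of swapping in another embedding model concrete, here is a minimal sketch of a custom embedding function. It assumes the client only needs an object with an async generate(texts) method that returns one vector per input text; check this against the chromadb version you have installed. The class name ToyHashEmbeddingFunction and its hashing scheme are hypothetical and carry no real semantic meaning.

```javascript
// Toy embedding function: hashes each word into one of `dimensions` buckets
// and L2-normalizes the bucket counts. Illustration only, not a real model.
class ToyHashEmbeddingFunction {
  constructor(dimensions = 8) {
    this.dimensions = dimensions;
  }

  // Assumed interface: array of texts in, one vector per text out.
  async generate(texts) {
    return texts.map((text) => this.embedOne(text));
  }

  embedOne(text) {
    const vector = new Array(this.dimensions).fill(0);
    const words = text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const word of words) {
      // Simple rolling hash of the word, reduced to a bucket index.
      let hash = 0;
      for (const ch of word) {
        hash = (hash * 31 + ch.codePointAt(0)) % this.dimensions;
      }
      vector[hash] += 1;
    }
    // Normalize to unit length; leave an all-zero vector untouched.
    const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
    return norm === 0 ? vector : vector.map((x) => x / norm);
  }
}

const toyEmbedder = new ToyHashEmbeddingFunction(8);
```

Under that assumption, toyEmbedder could be passed as the embeddingFunction argument of createCollection in place of the OpenAI embedder.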

4. Add data

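With the collection defined, we can now add data to it. Chroma stores the data and builds a dedicated vector index from the text embeddings to speed up later queries.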

// Automatically compute the vectors for the documents using the embedding model configured on the collection
await collection.add({
    ids: ["id1", "id2"],
    metadatas: [{"source": "my_source"}, {"source": "my_source"}],
    documents: ["This is a document", "This is another document"],
})

Alternatively, supply text vectors that we computed in advance, instead of having Chroma's embedding function compute them:

// The embeddings parameter holds our precomputed vectors, in one-to-one correspondence with the documents array
await collection.add({
    ids: ["id1", "id2"],
    embeddings: [[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    metadatas: [{"source": "my_source"}, {"source": "my_source"}],
    documents: ["This is a document", "This is another document"]
})
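Because ids, embeddings, metadatas, and documents are parallel arrays, a mismatch in length is an easy mistake to make. The following is a minimal sketch of a pre-flight check you could run before calling collection.add. The helper name validateAddPayload is hypothetical and is not part of the chromadb package; the rules it enforces (unique ids, equal lengths, one embedding dimension) are assumptions about what a well-formed payload looks like.

```javascript
// Hypothetical helper: verifies that the parallel arrays line up.
function validateAddPayload({ ids, embeddings, metadatas, documents }) {
  if (!Array.isArray(ids) || ids.length === 0) {
    throw new Error("ids must be a non-empty array");
  }
  if (new Set(ids).size !== ids.length) {
    throw new Error("ids must be unique");
  }
  // Every optional array that is present must have one entry per id.
  const optional = { embeddings, metadatas, documents };
  for (const [name, arr] of Object.entries(optional)) {
    if (arr !== undefined && arr.length !== ids.length) {
      throw new Error(`${name} has ${arr.length} entries, expected ${ids.length}`);
    }
  }
  // All precomputed vectors should share the same dimension.
  if (embeddings !== undefined) {
    const dim = embeddings[0].length;
    if (embeddings.some((v) => v.length !== dim)) {
      throw new Error("all embeddings must have the same dimension");
    }
  }
  return true;
}

const payload = {
  ids: ["id1", "id2"],
  embeddings: [[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
  metadatas: [{ source: "my_source" }, { source: "my_source" }],
  documents: ["This is a document", "This is another document"],
};
const payloadOk = validateAddPayload(payload);
```

If the check passes, the same payload object can be handed to collection.add unchanged.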

5. Query the collection

Chroma returns the nResults entries most similar to the query texts given in queryTexts.

const results = await collection.query({
    nResults: 2, 
    queryTexts: ["This is a query document"]
})
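Conceptually, a similarity query embeds the query text and returns the stored entries whose vectors are closest to it. The sketch below shows that idea as a brute-force search in plain JavaScript. It is not how Chroma is implemented: a real vector database uses an index rather than scanning every record, and the squared Euclidean distance used here is an assumption about the metric. The function names, the record layout, and the query vector are hypothetical.

```javascript
// Squared Euclidean distance between two vectors of equal length.
function squaredL2(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Brute-force nearest-neighbor search: score every record, sort, keep the top n.
function bruteForceQuery(records, queryEmbedding, nResults) {
  return records
    .map((r) => ({
      id: r.id,
      document: r.document,
      distance: squaredL2(r.embedding, queryEmbedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, nResults);
}

// Illustrative records reusing the vectors from the add example.
const records = [
  { id: "id1", embedding: [1.2, 2.3, 4.5], document: "This is a document" },
  { id: "id2", embedding: [6.7, 8.2, 9.2], document: "This is another document" },
];

// Hypothetical query vector; in practice it comes from the embedding function.
const nearest = bruteForceQuery(records, [1.0, 2.0, 4.0], 1);
console.log(nearest[0].id); // prints "id1"
```

A smaller distance means a closer match, so the first element of the returned array is the best hit.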
