_vectors.embedder now means “no embedding” rather than “1 embedding of dimension 0”
Implementation status: ✅ PR landed
Rationale: the previous behavior was surprising and not useful.
_vectors field in documents
Implementation status: ✅ First version PR landed, second PR landed
The _vectors field in documents plays a special role: Meilisearch uses it to extract the embeddings (vectors of numbers) for the document, one entry per embedder declared in the settings.
In v1.8, the _vectors field must have the following shape for the embeddings to be used by the embedders:
{
  "id": 42, // primary key for this document
  // assuming at least 3 configured embedders "default", "text" and "image"
  "_vectors": {
    // specifies a single embedding for the embedder called "default"
    // if "default" is a "userProvided" embedder, then this field
    // is mandatory in _vectors (but it can be null)
    // if "default" is **not** a "userProvided" embedder, then the embedder
    // will not generate an embedding for this document, and will use
    // the provided embedding(s) instead.
    "default": [0.1, 0.2],
    // specifies two embeddings for the embedder called "text"
    "text": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    // specifies zero embeddings for the embedder called "image"
    // this document will have no embedding for this embedder.
    "image": null
  }
}
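To make the three accepted shapes concrete, here is a minimal Python sketch that normalizes one v1.8-style `_vectors` entry into a list of embeddings. The function name `normalize_v18` is hypothetical and this is not Meilisearch's implementation. Treating an empty array as "no embedding" is this sketch's reading of the first item above, not something the example itself shows.

```python
from typing import Any, List

Embedding = List[float]


def normalize_v18(value: Any) -> List[Embedding]:
    """Return the embeddings described by one v1.8-style `_vectors` entry.

    Hypothetical helper for illustration only.
    """
    if value is None:
        # null: zero embeddings for this embedder
        return []
    if not isinstance(value, (list, tuple)):
        raise ValueError("expected null, an embedding, or an array of embeddings")
    if len(value) == 0:
        # Assumption: an empty array means "no embedding",
        # not "one embedding of dimension 0".
        return []
    if all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value):
        # Flat array of numbers: a single embedding
        return [[float(x) for x in value]]
    if all(isinstance(e, (list, tuple)) for e in value):
        # Array of arrays: several embeddings
        return [[float(x) for x in e] for e in value]
    raise ValueError("an entry cannot mix numbers and arrays")
```

Under these assumptions, `normalize_v18([0.1, 0.2])` yields one embedding, the two-row "text" value yields two, and `null` (`None`) yields none.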
In v1.9, this syntax is extended: in addition to a single embedding or an array of embeddings, the value for an embedder can now be an object.
{
  "id": 42,
  "_vectors": {
    // the existing syntax is still allowed
    "default": [0.1, 0.2],
    // new syntax, equivalent to the previous example for the "text" embedder:
    // the provided embedding(s) will override any
    // existing embeddings or embeddings to be generated by the embedder
    // regenerate: false means future updates to the document won't regenerate
    // the embeddings.
    "text": {
      "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
      "regenerate": false
    },
    // new syntax and semantics: the provided embedding(s) will override any
    // existing embeddings or embeddings to be generated by the embedder,
    // however, future updates to the document will regenerate the embeddings
    // from the documentTemplate.
    // Setting `regenerate` to true for a `userProvided` embedder is always
    // an error.
    "translation": {
      "embeddings": [0.1, 0.2, 0.3, 0.4],
      "regenerate": true
    },
    // new syntax and semantics: setting regenerate to false without providing
    // any embedding will keep whatever embeddings are currently in the database,
    // and future updates to the document will not regenerate the embeddings
    // from the documentTemplate.
    "image": {
      "regenerate": false
    }
  }
}
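The rules in the comments above can be summarized in a short Python sketch that parses one v1.9-style entry into an `(embeddings, regenerate)` pair. `VectorsEntry` and `parse_entry` are hypothetical names, not part of Meilisearch. Two points are assumptions of this sketch rather than statements from the text: the object form is rejected when `regenerate` is missing, and a `null` value in the existing syntax is handled like any other existing-syntax value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class VectorsEntry:
    # None means "keep whatever embeddings are currently in the database"
    embeddings: Optional[List[List[float]]]
    regenerate: bool


def _as_embeddings(raw: Any) -> List[List[float]]:
    """Accept null, a single embedding, or an array of embeddings."""
    if raw is None or len(raw) == 0:
        return []
    if isinstance(raw[0], (int, float)):
        return [[float(x) for x in raw]]
    return [[float(x) for x in e] for e in raw]


def parse_entry(value: Any, user_provided: bool) -> VectorsEntry:
    """Parse one `_vectors.<embedder>` value (hypothetical helper)."""
    if isinstance(value, dict):
        if "regenerate" not in value:
            # Assumption of this sketch: the object form must state `regenerate`.
            raise ValueError("object form requires a `regenerate` field")
        regenerate = value["regenerate"]
        if not isinstance(regenerate, bool):
            raise ValueError("`regenerate` must be a boolean")
        if regenerate and user_provided:
            # Stated in the example: always an error for userProvided embedders.
            raise ValueError("regenerate: true is invalid for a userProvided embedder")
        embeddings = _as_embeddings(value["embeddings"]) if "embeddings" in value else None
        return VectorsEntry(embeddings, regenerate)
    # Existing syntax: described above as equivalent to regenerate: false.
    return VectorsEntry(_as_embeddings(value), False)
```

For example, the "text" object above and the bare two-embedding array from v1.8 parse to the same `VectorsEntry`, while `{"regenerate": false}` yields `embeddings=None`.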
This change allows importing embeddings into autoembedders as a one-shot process, by setting regenerate: true on them.
The dump system makes use of this, so with this change embeddings won’t be regenerated when importing a dump created with Meilisearch v1.9.
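As an illustration of the one-shot import, the following Python sketch builds a document payload that supplies precomputed embeddings and sets `regenerate` to true. The helper name `one_shot_import_document` is hypothetical, and the `id` primary key and `translation` embedder are borrowed from the examples above. Sending the payload to a Meilisearch instance is deliberately left out.

```python
import json
from typing import Any, Dict, List, Optional


def one_shot_import_document(
    doc_id: int,
    embedder: str,
    embeddings: List[Any],
    fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a document that imports `embeddings` once for `embedder`.

    Hypothetical helper: `regenerate: true` is what lets later updates
    recompute the embeddings from the documentTemplate.
    """
    document: Dict[str, Any] = dict(fields or {})
    document["id"] = doc_id  # primary key name taken from the examples above
    document["_vectors"] = {
        embedder: {"embeddings": embeddings, "regenerate": True}
    }
    return document


doc = one_shot_import_document(42, "translation", [0.1, 0.2, 0.3, 0.4])
print(json.dumps(doc, indent=2))
```

Keeping `regenerate` as an explicit field per embedder means the same payload shape covers both cases: a one-shot import here, or a frozen embedding if the flag were false instead.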
_vectors no longer returned in documents (by default)
retrieveVectors in the search/get docs param