API/behavior changes

⚠️ Breaking changes

Empty array in _vectors.embedder now means “no embedding” rather than “1 embedding of dimension 0”

Implementation status: ✅ PR landed

Rationale: the previous behavior was surprising and not useful.
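A minimal Python sketch of the new rule, using a hypothetical helper `normalize_embeddings` (an illustration, not Meilisearch's actual implementation, which is written in Rust). Every accepted value maps to a list of embeddings, and an empty array now maps to an empty list instead of a single zero-length embedding.

```python
from typing import Any, List

Embedding = List[float]


def normalize_embeddings(value: Any) -> List[Embedding]:
    """Hypothetical helper: map a `_vectors.<embedder>` value to a list of embeddings.

    - None            -> no embedding
    - []              -> no embedding (previously: one embedding of dimension 0)
    - [0.1, 0.2]      -> one embedding
    - [[...], [...]]  -> several embeddings
    """
    if value is None or value == []:
        return []
    if not isinstance(value, list):
        raise TypeError("expected null or an array")
    # A flat array of numbers is a single embedding.
    if all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value):
        return [[float(x) for x in value]]
    # An array of arrays is a list of embeddings.
    if all(isinstance(x, list) for x in value):
        return [[float(x) for x in e] for e in value]
    raise TypeError("an array mixing numbers and arrays is not a valid value")
```

With this rule, `normalize_embeddings([])` and `normalize_embeddings(None)` both yield `[]`, so the two spellings of "no embedding" behave the same.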

Extensions to the _vectors field in documents

Implementation status: 🚧 First version PR landed, second PR in review

The _vectors field in documents plays a special role: Meilisearch uses it to extract, for each embedder declared in the settings, the embeddings (vectors of numbers) associated with the document.

In v1.8, the _vectors field must have the following shape for its embeddings to be used by the embedders:

 {
   "id": 42, // primary key for this document
   // assuming at least 3 configured embedders "default", "text" and "image"
   "_vectors": {
     // specifies a single embedding for the embedder called "default".
     // If "default" is a "userProvided" embedder, then this field
     // is mandatory in _vectors (but it can be null).
     // If "default" is **not** a "userProvided" embedder, then the embedder
     // will not generate an embedding for this document, and will use
     // the provided embedding(s) instead.
     "default": [0.1, 0.2],
     // specifies two embeddings for the embedder called "text"
     "text": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
     // specifies zero embeddings for the embedder called "image":
     // this document will have no embedding for this embedder.
     "image": null
   }
 }
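The per-embedder rules in the comments above can be sketched as a small decision function. This is a hedged illustration: `plan_v18` and its input format (a dict from embedder name to source string) are hypothetical, and `openAi` / `huggingFace` simply stand in for any source other than `userProvided`.

```python
from typing import Any, Dict


def plan_v18(embedder_sources: Dict[str, str], vectors: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical sketch of the v1.8 rules for one document.

    Returns, per embedder, either:
      - "use_provided": use the embeddings from `_vectors` (possibly none, e.g. null)
      - "generate": let the embedder generate the embedding itself
    Raises ValueError when a userProvided embedder has no key in `_vectors`.
    """
    plan: Dict[str, str] = {}
    for name, source in embedder_sources.items():
        if name in vectors:
            # An explicit value (even null) is used instead of generating one.
            plan[name] = "use_provided"
        elif source == "userProvided":
            # The key is mandatory for userProvided embedders (null is allowed).
            raise ValueError(f"`_vectors.{name}` is mandatory for a userProvided embedder")
        else:
            plan[name] = "generate"
    return plan
```

Note that in this sketch the presence of the key, not its value, is what matters: `"image": null` still counts as provided and results in zero embeddings rather than a generated one.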

In v1.9, this syntax is extended: in addition to a single embedding or an array of embeddings, the value can now also be an object.

 {
   "id": 42,
   // assuming the configured embedders include "default", "text", "translation" and "image"
   "_vectors": {
     // the existing syntax is still allowed
     "default": [0.1, 0.2],
     // new syntax, equivalent to the previous example for the "text" embedder:
     // the provided embedding(s) will override any existing embeddings
     // or embeddings to be generated by the embedder.
     // `regenerate: false` means future updates to the document won't
     // regenerate the embeddings.
     "text": {
       "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
       "regenerate": false
     },
     // new syntax and semantics: the provided embedding(s) will override any
     // existing embeddings or embeddings to be generated by the embedder;
     // however, future updates to the document will regenerate the embeddings
     // from the documentTemplate.
     // Setting `regenerate` to true for a `userProvided` embedder is always
     // an error.
     "translation": {
       "embeddings": [0.1, 0.2, 0.3, 0.4],
       "regenerate": true
     },
     // new syntax and semantics: setting `regenerate` to false without providing
     // any embedding keeps whatever embeddings are currently in the database,
     // and future updates to the document will not regenerate the embeddings
     // from the documentTemplate.
     "image": {
       "regenerate": false
     }
   }
 }
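The four cases above can be condensed into one parsing function. This is a sketch under assumptions: `parse_v19_entry` and `as_list` are hypothetical names, and the sketch treats `regenerate` as required in the object form because every object in the example sets it. It returns `(embeddings, regenerate)`, where `embeddings` is `None` when the stored embeddings must be kept and `[]` when the document has no embedding.

```python
from typing import Any, List, Optional, Tuple


def as_list(value: Any) -> List[List[float]]:
    """Hypothetical helper: null or [] -> no embedding, flat array -> one, nested -> many."""
    if value is None or value == []:
        return []
    if all(isinstance(x, list) for x in value):
        return value
    return [value]


def parse_v19_entry(value: Any, user_provided: bool) -> Tuple[Optional[List[List[float]]], bool]:
    """Hypothetical sketch: interpret one `_vectors.<embedder>` value under v1.9 rules."""
    if isinstance(value, dict):
        # Assumption: `regenerate` must be present in the object form.
        if "regenerate" not in value:
            raise ValueError("`regenerate` is required in the object form")
        regenerate = value["regenerate"]
        if not isinstance(regenerate, bool):
            raise TypeError("`regenerate` must be a boolean")
        if regenerate and user_provided:
            raise ValueError("`regenerate: true` is an error for a userProvided embedder")
        if "embeddings" not in value:
            # Keep whatever embeddings are currently stored in the database.
            return None, regenerate
        return as_list(value["embeddings"]), regenerate
    # Legacy syntax: the value itself is the embedding(s), never regenerated.
    return as_list(value), False
```

Keeping `None` (keep existing) distinct from `[]` (no embedding) matters here, because the legacy `null` and the object without `embeddings` have different effects.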

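This change makes it possible to import embeddings into autoembedders as a one-shot process, by providing them with regenerate: true: the imported embeddings are used as-is, and later updates to the document regenerate them from the documentTemplate.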

Dumps rely on this mechanism, so with this change embeddings are no longer regenerated when importing a dump created with Meilisearch v1.9.
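As an illustration of the one-shot import, the hypothetical helper `build_import_document` below rewrites a document's stored embeddings in the v1.9 object form: `regenerate: true` for autoembedders, and `regenerate: false` for `userProvided` embedders (for which `true` is an error). This is a sketch of the idea under those assumptions, not the actual dump format.

```python
import json
from typing import Dict, List, Set


def build_import_document(
    doc: dict,
    stored: Dict[str, List[List[float]]],
    user_provided: Set[str],
) -> dict:
    """Hypothetical sketch: prepare a document whose embeddings are imported once."""
    # Copy every field except any existing `_vectors`.
    out = {key: value for key, value in doc.items() if key != "_vectors"}
    out["_vectors"] = {
        name: {
            "embeddings": embeddings,
            # Autoembedders reuse these embeddings now and regenerate on later updates.
            "regenerate": name not in user_provided,
        }
        for name, embeddings in stored.items()
    }
    return out


if __name__ == "__main__":
    document = build_import_document(
        {"id": 42, "title": "example"},
        {"default": [[0.1, 0.2]], "text": [[0.1, 0.2, 0.3]]},
        user_provided={"default"},
    )
    print(json.dumps(document, indent=2))
```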

_vectors no longer returned in documents (by default)

Implemented solution: a dedicated boolean parameter, retrieveVectors, for the search and get documents routes.
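A hedged client-side sketch of opting back in, using only Python's standard library to build (not send) the requests. It assumes a Meilisearch instance at `http://localhost:7700`, an index named `movies`, the `POST /indexes/{uid}/search` and `GET /indexes/{uid}/documents` routes, and that `retrieveVectors` goes in the JSON body for search and in the query string for documents; authentication headers are omitted, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def search_request(base_url: str, index_uid: str, query: str,
                   retrieve_vectors: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build a search request, optionally asking for `_vectors`."""
    body = {"q": query, "retrieveVectors": retrieve_vectors}
    return urllib.request.Request(
        f"{base_url}/indexes/{urllib.parse.quote(index_uid, safe='')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_documents_url(base_url: str, index_uid: str, retrieve_vectors: bool = False) -> str:
    """Hypothetical helper: build a get-documents URL with `retrieveVectors` in the query."""
    query = urllib.parse.urlencode({"retrieveVectors": "true" if retrieve_vectors else "false"})
    return f"{base_url}/indexes/{urllib.parse.quote(index_uid, safe='')}/documents?{query}"
```

Passing `retrieve_vectors=True` is only needed when the caller actually uses the embeddings; leaving it at the default keeps responses smaller.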