pooling parameter for huggingFace embedders

The payload accepted by the PATCH /indexes/{:indexUid}/settings and PATCH /indexes/{:indexUid}/settings/embedders routes is modified as follows:

- pooling is added to the embedder object and allows overriding the pooling method of huggingFace embedders.

A huggingFace embedder transforms text into an embedding in three steps: it splits the text into tokens, computes an embedding for each of these tokens, and finally computes a single sentence embedding from the token embeddings by using a pooling method.

- pooling is a string with values "useModel", "forceMean" or "forceCls":
  - "useModel": fetch the pooling method from the model configuration
  - "forceMean": always use mean pooling
  - "forceCls": always use CLS pooling
- pooling is optional and defaults to "useModel".
- huggingFace embedders that were created in a previous version of Meilisearch and imported using a dump or the dumpless upgrade feature will have pooling set to "forceMean", as this was the behavior of these embedders in previous versions of Meilisearch.
- pooling is only available for embedders with source huggingFace.
- Changing pooling always triggers a full reindexing.

compositeEmbedders is added to the /experimental-features route.

The payload accepted by the PATCH /indexes/{:indexUid}/settings and PATCH /indexes/{:indexUid}/settings/embedders routes is modified as follows:
- A new value for the source parameter is allowed: "composite". This value is selectable when the compositeEmbedders feature is set to true.
- When source is "composite", the following parameters are available:
  - searchEmbedder: an object whose keys are the same as an embedder object. The embedder it describes will be used at search time.
  - indexingEmbedder: an object whose keys are the same as an embedder object. The embedder it describes will be used at indexing time.
- These parameters require the compositeEmbedders feature to be set to true.
- Meilisearch checks that the embeddings produced by the searchEmbedder and the indexingEmbedder are “similar enough”: it computes the angular distance between both embeddings in each test case, and checks that this distance is < 0.01.

This feature allows using different embedders at search and indexing time, which can be used to tailor the embedder to each use case:
Using a local embedder at search time (source huggingFace) with a Hugging Face inference endpoint at indexing time:

{
"embedders": {
"text": {
"source": "composite",
"searchEmbedder": {
"source": "huggingFace", // locally computed embeddings using a model from the Hugging Face Hub
"model": "baai/bge-base-en-v1.5",
"revision": "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"
},
"indexingEmbedder": {
"source": "rest", // remotely computed embeddings using Hugging Face inference endpoints
"url": "https://URL.endpoints.huggingface.cloud",
"apiKey": "hf_XXXXXXX",
"documentTemplate": "Your {{doc.template}}",
"request": {
"inputs": [
"{{text}}",
"{{..}}"
]
},
"response": [
"{{embedding}}",
"{{..}}"
]
}
}
}
}
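The similarity check between searchEmbedder and indexingEmbedder described above can be illustrated with a minimal sketch. The source only states that an angular distance below 0.01 is required; the exact formula is an assumption here (the angle between the two vectors, normalized to the range 0 to 1), and the function names angular_distance and similar_enough are hypothetical, not part of Meilisearch.

```python
import math
from typing import Sequence


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two vectors, normalized to [0, 1].

    Assumption: the text does not define the formula Meilisearch uses;
    arccos(cosine similarity) / pi is one common definition.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine) / math.pi


def similar_enough(a: Sequence[float], b: Sequence[float],
                   threshold: float = 0.01) -> bool:
    """True when the two embeddings are closer than the threshold."""
    return angular_distance(a, b) < threshold


# Hypothetical embeddings of the same text from the two embedders.
search_embedding = [0.12, 0.85, 0.51]
indexing_embedding = [0.12, 0.85, 0.51]
compatible = similar_enough(search_embedding, indexing_embedding)
```

In practice, such a check would be run on the embeddings that each embedder produces for the same input text before trusting a composite configuration.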
Using a local embedder at search time (huggingFace source) with a Cloudflare AI worker at indexing time:

{
"embedders": {
"text": {
"source": "composite",
"searchEmbedder": {
"source": "huggingFace",
"model": "baai/bge-base-en-v1.5",
"revision": "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a",
"pooling": "forceMean"
},
"indexingEmbedder": {
"source": "rest",
"url": "https://api.cloudflare.com/client/v4/accounts/ACCOUNT_NUMBER/ai/run/@cf/baai/bge-base-en-v1.5",
"apiKey": "API_KEY",
"documentTemplate": "Your {{doc.template}}",
"request": {
"text": [
"{{text}}",
"{{..}}"
]
},
"response": {
"result": {
"data": [
"{{embedding}}",
"{{..}}"
]
}
}
}
}
}
}
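The example above forces mean pooling on the search embedder. To make the difference between "forceMean" and "forceCls" concrete, here is a simplified sketch of the two pooling methods over a list of token embeddings. It is an illustration only, not Meilisearch's implementation: the function names are hypothetical, the token vectors are made up, and real mean pooling usually also accounts for padding through an attention mask.

```python
from typing import List

Vector = List[float]


def mean_pooling(token_embeddings: List[Vector]) -> Vector:
    """Average the token embeddings component by component."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / count for i in range(dim)]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Use the embedding of the first token, conventionally the [CLS] token."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


# Hypothetical 2-dimensional embeddings for three tokens; the first is [CLS].
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mean_vec = mean_pooling(tokens)  # [1.0, 1.0]
cls_vec = cls_pooling(tokens)    # [1.0, 0.0]
```

The two methods generally produce different sentence embeddings from the same tokens, which is consistent with a change of pooling requiring a full reindexing.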
embedders.sources can now contain the value composite.

| Name | Description | Example |
|---|---|---|
| e.g. infos.log_level | e.g. “value of --log-level” | e.g. “debug” |
| infos.experimental_composite_embedders | true if the compositeEmbedders feature is set to true for this instance, otherwise false | false |
| composite_embedders sent with Experimental features Updated event | true if the compositeEmbedders feature is set to true after that call to /experimental-features | true |
The "composite" source cannot be nested inside a composite embedder: trying to set searchEmbedder.source or indexingEmbedder.source to "composite" will return a 400 invalid_settings_embedder error with the following message:

.embedders.test.searchEmbedder.source: Source composite is not available in a nested embedder
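As a rough illustration of the nesting rule, the following sketch checks an embedders settings object on the client side before sending it. The helper nested_composite_errors is hypothetical and not part of any Meilisearch SDK; it only mirrors the message format quoted above, and the server remains the authority on what is valid.

```python
from typing import Any, Dict, List


def nested_composite_errors(embedders: Dict[str, Any]) -> List[str]:
    """Return one message per sub-embedder that illegally uses "composite"."""
    errors: List[str] = []
    for name, embedder in embedders.items():
        if embedder.get("source") != "composite":
            continue  # only composite embedders have sub-embedders
        for key in ("searchEmbedder", "indexingEmbedder"):
            sub_embedder = embedder.get(key) or {}
            if sub_embedder.get("source") == "composite":
                errors.append(
                    f".embedders.{name}.{key}.source: "
                    "Source composite is not available in a nested embedder"
                )
    return errors


# Hypothetical invalid settings: the search embedder is itself composite.
settings = {
    "test": {
        "source": "composite",
        "searchEmbedder": {"source": "composite"},
        "indexingEmbedder": {"source": "rest"},
    }
}
problems = nested_composite_errors(settings)
```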