# Creation and Usage of Embedding Models

## Creating Embedding Model Classes

Similar to chat model classes, you can use `create_openai_compatible_embedding` to create an integrated embedding model class. This function accepts the following parameters:
| Parameter | Description |
|---|---|
| `embedding_provider` | Embedding model provider name, e.g., `vllm`. Must start with a letter or number, may contain only letters, numbers, and underscores, and has a maximum length of 20 characters. Type: `str`. Required: Yes |
| `base_url` | Default API endpoint for the model provider. Type: `str`. Required: No |
| `embedding_model_cls_name` | Embedding model class name (must comply with Python class naming conventions). Defaults to `{Provider}Embeddings`, where `{Provider}` is the capitalized provider name. Type: `str`. Required: No |
Similarly, we use `create_openai_compatible_embedding` to integrate vLLM's embedding model:

```python
from langchain_dev_utils.embeddings.adapters import create_openai_compatible_embedding

VLLMEmbeddings = create_openai_compatible_embedding(
    embedding_provider="vllm",
    base_url="http://localhost:8000/v1",
    embedding_model_cls_name="VLLMEmbeddings",
)

embedding = VLLMEmbeddings(model="qwen3-embedding-4b")
print(embedding.embed_query("Hello"))
```
`base_url` can also be omitted; if it is not provided, the library reads the environment variable `VLLM_API_BASE` by default. The code then becomes:
```python
from langchain_dev_utils.embeddings.adapters import create_openai_compatible_embedding

VLLMEmbeddings = create_openai_compatible_embedding(
    embedding_provider="vllm",
    embedding_model_cls_name="VLLMEmbeddings",
)

embedding = VLLMEmbeddings(model="qwen3-embedding-4b")
print(embedding.embed_query("Hello"))
```
Note: The above code runs successfully only if the environment variable `VLLM_API_KEY` is set. Although vLLM itself does not require an API key, the embedding model class requires one at initialization, so set this variable first, for example:
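In a POSIX shell this looks like the following (the key value is a placeholder; vLLM does not validate it):

```shell
# Set a placeholder API key so the embedding class can initialize.
# vLLM ignores the value, but the variable must be present.
export VLLM_API_KEY="your-api-key"
```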
## Using the Embedding Model Class

Here, we use the previously created `VLLMEmbeddings` class to initialize an embedding model instance.

### Vectorizing Queries

The synchronous `embed_query` call was shown above; asynchronous invocation is also supported:
```python
embedding = VLLMEmbeddings(model="qwen3-embedding-4b")
res = await embedding.aembed_query("Hello")
print(res)
```
### Vectorizing a List of Strings

```python
documents = ["Hello", "Hello, I am Zhang San"]
embedding = VLLMEmbeddings(model="qwen3-embedding-4b")
print(embedding.embed_documents(documents))
```

Asynchronous invocation is likewise available via `aembed_documents`:

```python
documents = ["Hello", "Hello, I am Zhang San"]
embedding = VLLMEmbeddings(model="qwen3-embedding-4b")
res = await embedding.aembed_documents(documents)
print(res)
```
## Embedding Model Compatibility Notes

OpenAI-compatible embedding APIs generally exhibit good compatibility, but the following differences should be noted:
- `check_embedding_ctx_length`: Set to `True` only when using the official OpenAI embedding service; for all other embedding models, set it to `False`.
- `dimensions`: If the model supports custom vector dimensions (e.g., 1024, 4096), you can pass this parameter directly.
- `chunk_size`: The maximum number of texts that can be processed in a single API call. For example, a `chunk_size` of 10 means a single request can vectorize up to 10 texts.
- Single-text token limit: This cannot be controlled via parameters; it must be ensured during the preprocessing and chunking stages.
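To make the `chunk_size` behavior concrete, here is a minimal sketch of the batching it implies; `batch_texts` is a hypothetical helper for illustration, not part of the library:

```python
def batch_texts(texts, chunk_size):
    """Yield successive batches of at most chunk_size texts,
    mirroring how an embedding client splits one large request
    into multiple API calls."""
    for i in range(0, len(texts), chunk_size):
        yield texts[i:i + chunk_size]

documents = [f"document {n}" for n in range(25)]
batches = list(batch_texts(documents, chunk_size=10))
print([len(batch) for batch in batches])  # [10, 10, 5]
```

With a `chunk_size` of 10, 25 texts are vectorized in three requests rather than one.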
Note: Like its chat-model counterpart, this function uses pydantic's `create_model` under the hood to create the embedding model class, which incurs some performance overhead. It is recommended to create the integration class during the project startup phase and avoid creating it dynamically later on.
## Best Practice

When connecting to an OpenAI-compatible embedding provider, you can also use langchain-openai's `OpenAIEmbeddings` directly, pointing `base_url` and `api_key` at your provider's service. Embedding APIs tend to be more uniformly compatible than chat APIs: in most cases, `OpenAIEmbeddings` works as-is once you set `check_embedding_ctx_length=False`.
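As a sketch of this approach, assuming `langchain-openai` is installed and the same local vLLM endpoint as above is running:

```python
from langchain_openai import OpenAIEmbeddings

# Point the official OpenAI embedding class at a compatible provider.
# check_embedding_ctx_length=False skips OpenAI-specific token counting,
# which other providers' tokenizers do not support.
embedding = OpenAIEmbeddings(
    model="qwen3-embedding-4b",
    base_url="http://localhost:8000/v1",
    api_key="your-api-key",  # placeholder; vLLM does not validate it
    check_embedding_ctx_length=False,
)
print(embedding.embed_query("Hello"))
```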