Instructor is an instruction-finetuned text embedding model that generates embeddings tailored to a given task (e.g. classification, retrieval, clustering, text evaluation) and domain (e.g. science, finance) from a task instruction alone, with no further finetuning. To compute customized embeddings for specific sentences, write the instruction using the following unified template:
Represent the domain text_type for task_objective:
  • domain is optional, and it specifies the domain of the text, e.g. science, finance, medicine, etc.
  • text_type is required, and it specifies the encoding unit, e.g. sentence, document, paragraph, etc.
  • task_objective is optional, and it specifies the objective of embedding, e.g. retrieve a document, classify the sentence, etc.
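The template can also be filled in programmatically. The following is a minimal sketch: `build_instruction` is a hypothetical helper, not part of the model or any library, and it only assembles the string from the three fields described above.

```python
from typing import Optional


def build_instruction(
    text_type: str,
    domain: Optional[str] = None,
    task_objective: Optional[str] = None,
) -> str:
    """Assemble 'Represent the [domain] text_type [for task_objective]:'.

    Hypothetical helper for illustration; only text_type is required.
    """
    if not text_type or not text_type.strip():
        raise ValueError("text_type is required")

    parts = ["Represent the"]
    if domain:
        parts.append(domain.strip())
    parts.append(text_type.strip())
    if task_objective:
        parts.append("for " + task_objective.strip())
    return " ".join(parts) + ":"


# Only the required field:
minimal = build_instruction("sentence")
# All three fields:
full = build_instruction("document", domain="science", task_objective="retrieval")
```

Here `minimal` is `"Represent the sentence:"` and `full` is `"Represent the science document for retrieval:"`.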
More information about the model can be found at the source URL.
| Argument | Type | Default | Description |
|---|---|---|---|
| name | str | "hkunlp/instructor-base" | The name of the model to use |
| batch_size | int | 32 | The batch size to use when generating embeddings |
| device | str | "cpu" | The device to use when generating embeddings |
| show_progress_bar | bool | True | Whether to show a progress bar when generating embeddings |
| normalize_embeddings | bool | True | Whether to normalize the embeddings |
| quantize | bool | False | Whether to quantize the model |
| source_instruction | str | "represent the document for retrieval" | The instruction for the source column |
| query_instruction | str | "represent the document for retrieving the most similar documents" | The instruction for the query |
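As a rough illustration of how the arguments listed above fit together, the sketch below collects the documented defaults in a plain dataclass and overrides a few of them. `InstructorConfig` is a hypothetical name used only for this example and is not part of any library; the resulting dictionary would be passed as keyword arguments to whatever constructor your embedding integration exposes.

```python
from dataclasses import asdict, dataclass


@dataclass
class InstructorConfig:
    """Hypothetical container mirroring the documented arguments and defaults."""

    name: str = "hkunlp/instructor-base"
    batch_size: int = 32
    device: str = "cpu"
    show_progress_bar: bool = True
    normalize_embeddings: bool = True
    quantize: bool = False
    source_instruction: str = "represent the document for retrieval"
    query_instruction: str = (
        "represent the document for retrieving the most similar documents"
    )

    def __post_init__(self) -> None:
        # Basic sanity check; an assumption of this sketch, not a documented rule.
        if self.batch_size < 1:
            raise ValueError("batch_size must be a positive integer")


# Override only what differs from the defaults.
config = InstructorConfig(
    batch_size=16,
    source_instruction="Represent the science document for retrieval:",
    query_instruction="Represent the science question for retrieving supporting documents:",
)
kwargs = asdict(config)  # plain dict of argument names to values
```

Using different instructions for the source column and the query follows the asymmetry in the defaults: stored documents and incoming queries are embedded with separate prompts.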