Represent the `domain` `text_type` for `task_objective`:

- `domain` is optional and specifies the domain of the text, e.g. science, finance, medicine.
- `text_type` is required and specifies the encoding unit, e.g. sentence, document, paragraph.
- `task_objective` is optional and specifies the objective of the embedding, e.g. retrieve a document, classify the sentence.

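
The template above can be assembled programmatically. The following is a minimal sketch, not part of any library: `build_instruction` is a hypothetical helper name, and it assumes the optional parts are simply omitted when they are not supplied.

```python
from typing import Optional


def build_instruction(
    text_type: str,
    domain: Optional[str] = None,
    task_objective: Optional[str] = None,
) -> str:
    """Fill in the 'Represent the domain text_type for task_objective:' template.

    Hypothetical helper for illustration; only text_type is required.
    """
    if not text_type or not text_type.strip():
        raise ValueError("text_type is required")

    parts = ["Represent the"]
    if domain:
        # Optional: the domain of the text, e.g. "science"
        parts.append(domain.strip())
    # Required: the encoding unit, e.g. "sentence" or "document"
    parts.append(text_type.strip())
    if task_objective:
        # Optional: the objective of the embedding, e.g. "retrieval"
        parts.append("for " + task_objective.strip())
    return " ".join(parts) + ":"
```

For example, `build_instruction("document", domain="science", task_objective="retrieval")` fills every slot of the template, while `build_instruction("sentence")` uses only the required part.
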
| Argument | Type | Default | Description |
|---|---|---|---|
| name | str | "hkunlp/instructor-base" | The name of the model to use |
| batch_size | int | 32 | The batch size to use when generating embeddings |
| device | str | "cpu" | The device to use when generating embeddings |
| show_progress_bar | bool | True | Whether to show a progress bar when generating embeddings |
| normalize_embeddings | bool | True | Whether to normalize the embeddings |
| quantize | bool | False | Whether to quantize the model |
| source_instruction | str | "represent the document for retrieval" | The instruction for the source column |
| query_instruction | str | "represent the document for retrieving the most similar documents" | The instruction for the query |
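
As a rough illustration of how these arguments fit together, the sketch below collects the defaults from the table and merges user overrides into them. The names `INSTRUCTOR_DEFAULTS`, `instructor_kwargs`, and `create_instructor` are hypothetical. The `create_instructor` function additionally assumes that this table describes the `instructor` entry of LanceDB's embedding registry and that `lancedb` and the Instructor model dependencies are installed; treat that part as an assumption rather than a confirmed API walkthrough.

```python
# Defaults copied from the argument table (hypothetical constant name).
INSTRUCTOR_DEFAULTS = {
    "name": "hkunlp/instructor-base",
    "batch_size": 32,
    "device": "cpu",
    "show_progress_bar": True,
    "normalize_embeddings": True,
    "quantize": False,
    "source_instruction": "represent the document for retrieval",
    "query_instruction": (
        "represent the document for retrieving the most similar documents"
    ),
}


def instructor_kwargs(**overrides):
    """Return the table defaults with any overrides applied.

    Rejects argument names that do not appear in the table.
    """
    unknown = set(overrides) - set(INSTRUCTOR_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown argument(s): {sorted(unknown)}")
    return {**INSTRUCTOR_DEFAULTS, **overrides}


def create_instructor(**overrides):
    """Build the embedding function (assumes LanceDB's registry API)."""
    # Imported lazily so the helpers above work without lancedb installed.
    from lancedb.embeddings import get_registry

    return get_registry().get("instructor").create(**instructor_kwargs(**overrides))
```

A call such as `instructor_kwargs(device="cuda", batch_size=64)` changes only the two named settings and keeps the remaining defaults, which makes it easy to see exactly which values differ from the table.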