# Multimodal Data (Blobs)

> Learn how to store and query multimodal data (images, audio, video) directly in LanceDB using binary columns.

export const RsBlobApiIngest = "let blob_rows = vec![\n    (1_i64, b\"fake_video_bytes_1\".to_vec()),\n    (2_i64, b\"fake_video_bytes_2\".to_vec()),\n];\n\nlet blob_schema = Arc::new(blob_schema);\nlet blob_batch = RecordBatch::try_new(\n    blob_schema.clone(),\n    vec![\n        Arc::new(Int64Array::from_iter_values(blob_rows.iter().map(|row| row.0))),\n        Arc::new(LargeBinaryArray::from_iter_values(\n            blob_rows.iter().map(|row| row.1.as_slice()),\n        )),\n    ],\n)\n.unwrap();\nlet blob_reader = RecordBatchIterator::new(vec![Ok(blob_batch)].into_iter(), blob_schema);\nlet blob_table = db\n    .create_table(\"videos\", blob_reader)\n    .mode(CreateTableMode::Overwrite)\n    .execute()\n    .await\n    .unwrap();\n";

export const TsBlobApiIngest = "const blobData = lancedb.makeArrowTable(\n  [\n    { id: 1, video: Buffer.from(\"fake_video_bytes_1\") },\n    { id: 2, video: Buffer.from(\"fake_video_bytes_2\") },\n  ],\n  { schema: blobSchema },\n);\nconst blobTable = await db.createTable(\"videos\", blobData, {\n  mode: \"overwrite\",\n});\n";

export const BlobApiIngest = "import lancedb\n\n# Connect to (or create) a local database directory\ndb = lancedb.connect(\"./data/blob_db\")\n\n# Create sample data\ndata = [\n    {\"id\": 1, \"video\": b\"fake_video_bytes_1\"},\n    {\"id\": 2, \"video\": b\"fake_video_bytes_2\"}\n]\n\n# Create the table with the blob schema defined in the previous step\ntbl = db.create_table(\"videos\", data=data, schema=schema, mode=\"overwrite\")\n";

export const RsBlobApiSchema = "let blob_metadata = HashMap::from([(\n    \"lance-encoding:blob\".to_string(),\n    \"true\".to_string(),\n)]);\nlet blob_schema = Schema::new(vec![\n    Field::new(\"id\", DataType::Int64, false),\n    Field::new(\"video\", DataType::LargeBinary, true).with_metadata(blob_metadata),\n]);\n";

export const TsBlobApiSchema = "const blobSchema = new arrow.Schema([\n  new arrow.Field(\"id\", new arrow.Int64()),\n  new arrow.Field(\n    \"video\",\n    new arrow.LargeBinary(),\n    true,\n    new Map([[\"lance-encoding:blob\", \"true\"]]),\n  ),\n]);\n";

export const BlobApiSchema = "import pyarrow as pa\n\n# Define schema with Blob API metadata for lazy loading\nschema = pa.schema([\n    pa.field(\"id\", pa.int64()),\n    pa.field(\n        \"video\",\n        pa.large_binary(),\n        metadata={\"lance-encoding:blob\": \"true\"},  # Enable Blob API\n    ),\n])\n";

export const RsProcessResults = "for batch in &results {\n    let filenames = batch\n        .column_by_name(\"filename\")\n        .unwrap()\n        .as_any()\n        .downcast_ref::<StringArray>()\n        .unwrap();\n    let images = batch\n        .column_by_name(\"image_blob\")\n        .unwrap()\n        .as_any()\n        .downcast_ref::<BinaryArray>()\n        .unwrap();\n\n    for row in 0..batch.num_rows() {\n        let image_bytes = images.value(row);\n        println!(\n            \"Retrieved image: {}, Byte length: {}\",\n            filenames.value(row),\n            image_bytes.len()\n        );\n    }\n}\n";

export const TsProcessResults = "for (const row of results) {\n  const imageBytes = row.image_blob as Uint8Array;\n  console.log(\n    `Retrieved image: ${row.filename}, Byte length: ${imageBytes.length}`,\n  );\n}\n";

export const ProcessResults = "# Convert back to PIL Image\nfor _, row in results.iterrows():\n    image_bytes = row['image_blob']\n    image = Image.open(io.BytesIO(image_bytes))\n    print(f\"Retrieved image: {row['filename']}, Size: {image.size}\")\n    # You can now use 'image' with other libraries or display it\n";

export const RsSearchData = "let query_vector = vec![0.1_f32; 128];\nlet results = table\n    .query()\n    .nearest_to(query_vector)\n    .unwrap()\n    .limit(1)\n    .execute()\n    .await\n    .unwrap()\n    .try_collect::<Vec<_>>()\n    .await\n    .unwrap();\n";

export const TsSearchData = "const queryVector = Array.from({ length: 128 }, (_, i) => (i % 16) / 16);\nconst results = await tbl.search(queryVector).limit(1).toArray();\n";

export const SearchData = "# Search for similar images\nquery_vector = np.random.rand(128).astype(np.float32)\nresults = tbl.search(query_vector).limit(1).to_pandas()\n";

export const RsIngestData = "// Connect to (or create) a local database directory\nlet db = connect(\"data/multimodal_db\").execute().await.unwrap();\n\nlet schema = Arc::new(schema);\nlet image_batch = RecordBatch::try_new(\n    schema.clone(),\n    vec![\n        Arc::new(Int32Array::from_iter_values(data.iter().map(|row| row.0))),\n        Arc::new(StringArray::from_iter_values(data.iter().map(|row| row.1))),\n        Arc::new(\n            FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(\n                data.iter()\n                    .map(|row| Some(row.2.iter().copied().map(Some).collect::<Vec<_>>())),\n                128,\n            ),\n        ),\n        Arc::new(BinaryArray::from_iter_values(\n            data.iter().map(|row| row.3.as_slice()),\n        )),\n        Arc::new(StringArray::from_iter_values(data.iter().map(|row| row.4))),\n    ],\n)\n.unwrap();\nlet image_reader = RecordBatchIterator::new(vec![Ok(image_batch)].into_iter(), schema.clone());\nlet table = db\n    .create_table(\"images\", image_reader)\n    .mode(CreateTableMode::Overwrite)\n    .execute()\n    .await\n    .unwrap();\n";

export const TsIngestData = "const db = await lancedb.connect(\"data/multimodal_db\");\nconst multimodalData = lancedb.makeArrowTable(data, { schema });\nconst tbl = await db.createTable(\"images\", multimodalData, {\n  mode: \"overwrite\",\n});\n";

export const IngestData = "db = lancedb.connect(\"./data/multimodal_db\")  # local database directory\ntbl = db.create_table(\"images\", data=data, schema=schema, mode=\"overwrite\")\n";

export const RsDefineSchema = "let schema = Schema::new(vec![\n    Field::new(\"id\", DataType::Int32, false),\n    Field::new(\"filename\", DataType::Utf8, false),\n    Field::new(\n        \"vector\",\n        DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 128),\n        false,\n    ),\n    Field::new(\"image_blob\", DataType::Binary, false),\n    Field::new(\"label\", DataType::Utf8, false),\n]);\n";

export const TsDefineSchema = "const schema = new arrow.Schema([\n  new arrow.Field(\"id\", new arrow.Int32()),\n  new arrow.Field(\"filename\", new arrow.Utf8()),\n  new arrow.Field(\n    \"vector\",\n    new arrow.FixedSizeList(\n      128,\n      new arrow.Field(\"item\", new arrow.Float32(), true),\n    ),\n  ),\n  new arrow.Field(\"image_blob\", new arrow.Binary()),\n  new arrow.Field(\"label\", new arrow.Utf8()),\n]);\n";

export const DefineSchema = "# Define schema explicitly to ensure image_blob is treated as binary\nschema = pa.schema([\n    pa.field(\"id\", pa.int32()),\n    pa.field(\"filename\", pa.string()),\n    pa.field(\"vector\", pa.list_(pa.float32(), 128)),\n    pa.field(\"image_blob\", pa.binary()), # Important: Use pa.binary() for blobs\n    pa.field(\"label\", pa.string())\n])\n";

export const RsCreateDummyData = "let create_dummy_image = |color: u8| -> Vec<u8> {\n    let mut png_like = vec![137, 80, 78, 71, 13, 10, 26, 10];\n    png_like.push(color);\n    png_like\n};\n\nlet data = vec![\n    (\n        1_i32,\n        \"red_square.png\",\n        vec![0.1_f32; 128],\n        create_dummy_image(1),\n        \"red\",\n    ),\n    (\n        2_i32,\n        \"blue_square.png\",\n        vec![0.2_f32; 128],\n        create_dummy_image(2),\n        \"blue\",\n    ),\n];\n";

export const TsCreateDummyData = "const createDummyImage = (color: string): Uint8Array => {\n  const pngHeader = Uint8Array.from([137, 80, 78, 71, 13, 10, 26, 10]);\n  return Buffer.concat([Buffer.from(pngHeader), Buffer.from(color, \"utf8\")]);\n};\n\nconst data = [\n  {\n    id: 1,\n    filename: \"red_square.png\",\n    vector: Array.from({ length: 128 }, (_, i) => (i % 16) / 16),\n    image_blob: createDummyImage(\"red\"),\n    label: \"red\",\n  },\n  {\n    id: 2,\n    filename: \"blue_square.png\",\n    vector: Array.from({ length: 128 }, (_, i) => ((i + 8) % 16) / 16),\n    image_blob: createDummyImage(\"blue\"),\n    label: \"blue\",\n  },\n];\n";

export const CreateDummyData = "# Create some dummy images\ndef create_dummy_image(color):\n    img = Image.new('RGB', (100, 100), color=color)\n    buf = io.BytesIO()\n    img.save(buf, format='PNG')\n    return buf.getvalue()\n\n# Create dataset with metadata, vectors, and image blobs\ndata = [\n    {\n        \"id\": 1,\n        \"filename\": \"red_square.png\",\n        \"vector\": np.random.rand(128).astype(np.float32),\n        \"image_blob\": create_dummy_image('red'),\n        \"label\": \"red\"\n    },\n    {\n        \"id\": 2,\n        \"filename\": \"blue_square.png\",\n        \"vector\": np.random.rand(128).astype(np.float32),\n        \"image_blob\": create_dummy_image('blue'),\n        \"label\": \"blue\"\n    }\n]\n";

export const RsMultimodalImports = "use std::collections::HashMap;\nuse std::sync::Arc;\n\nuse arrow_array::types::Float32Type;\nuse arrow_array::{\n    BinaryArray, FixedSizeListArray, Int32Array, Int64Array, LargeBinaryArray, RecordBatch,\n    RecordBatchIterator, StringArray,\n};\nuse arrow_schema::{DataType, Field, Schema};\nuse futures_util::TryStreamExt;\nuse lancedb::connect;\nuse lancedb::database::CreateTableMode;\nuse lancedb::query::{ExecutableQuery, QueryBase};\n";

export const TsMultimodalImports = "import * as arrow from \"apache-arrow\";\nimport { Buffer } from \"node:buffer\";\nimport * as lancedb from \"@lancedb/lancedb\";\n";

export const MultimodalImports = "import lancedb\nimport pyarrow as pa\nimport pandas as pd\nimport numpy as np\nimport io\nfrom PIL import Image\n";

LanceDB handles multimodal data, such as images, audio, video, and PDF files, natively by storing the raw bytes in a binary column alongside your vectors and metadata. Keeping the raw assets and their embeddings in the same table simplifies your data infrastructure and, for many use cases, removes the need for a separate object store.

This guide demonstrates how to ingest, store, and retrieve image data using standard binary columns, and also introduces the **Lance Blob API** for optimized handling of larger multimodal files.

## Store binary data

To store binary data, define a binary Arrow field in your schema (`pa.binary()` in Python, `Binary` in TypeScript, and `DataType::Binary` in Rust).

### 1. Setup and imports

First, import LanceDB, Apache Arrow, and the helper libraries this example uses in your SDK of choice.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {MultimodalImports}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsMultimodalImports}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsMultimodalImports}
  </CodeBlock>
</CodeGroup>

### 2. Prepare data

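For this example, we'll create some dummy in-memory images. In a real application, you would read them from files or an API. The key step is converting each asset (image, audio clip, and so on) into raw bytes: `bytes` in Python, a `Buffer` or `Uint8Array` in TypeScript, and a `Vec<u8>` in Rust. The Python example encodes real PNG images with Pillow, while the TypeScript and Rust examples use short placeholder byte arrays that start with the PNG file signature.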

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {CreateDummyData}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsCreateDummyData}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsCreateDummyData}
  </CodeBlock>
</CodeGroup>
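In a real application, the bytes usually come from files on disk. The following is a minimal Python sketch that uses only the standard library; `load_image_rows` is a hypothetical helper, the column names mirror this guide's schema, and the zero vector is a placeholder for an embedding from your own model.

```python
from pathlib import Path


def load_image_rows(image_dir, pattern="*.png"):
    """Read every file matching `pattern` in `image_dir` into a row dict.

    Hypothetical helper for illustration: the keys mirror the example schema,
    and the zero vector stands in for a real embedding.
    """
    rows = []
    for i, path in enumerate(sorted(Path(image_dir).glob(pattern)), start=1):
        rows.append(
            {
                "id": i,
                "filename": path.name,
                "vector": [0.0] * 128,  # placeholder; replace with your model's embedding
                "image_blob": path.read_bytes(),  # raw file content, not decoded
                "label": path.stem,
            }
        )
    return rows
```

The file content is stored exactly as read, so any format your application can decode later (PNG, JPEG, WAV, PDF) works the same way.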

### 3. Define the schema

When creating the table, we **highly recommend** defining the schema explicitly. This ensures that Arrow and LanceDB store your raw bytes as a `binary` column rather than as a string or list type, and that the `vector` column is a fixed-size list with the expected dimension (128 here).

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {DefineSchema}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsDefineSchema}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsDefineSchema}
  </CodeBlock>
</CodeGroup>

### 4. Ingest data

Now, create the table from the data and the schema you defined. The overwrite mode replaces any existing table named `images`, so you can rerun the example.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {IngestData}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsIngestData}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsIngestData}
  </CodeBlock>
</CodeGroup>

## Retrieve and use blobs

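When you search your LanceDB table, the binary column comes back just like any other column: by default, a vector search returns every column, including the raw bytes, along with a `_distance` column.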

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {SearchData}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsSearchData}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsSearchData}
  </CodeBlock>
</CodeGroup>

### Convert bytes back to objects

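Once you have the bytes back from the search result, you can decode them into the original format (for example, an image object or audio buffer). The Python example decodes each blob into a PIL image; the TypeScript and Rust examples print each blob's byte length.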

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ProcessResults}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsProcessResults}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsProcessResults}
  </CodeBlock>
</CodeGroup>
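The same pattern works for audio. A sketch using Python's standard-library `wave` module, assuming the column stores uncompressed WAV files (compressed formats such as MP3 need a third-party decoder); `describe_wav` is a hypothetical helper.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Decode WAV bytes retrieved from a binary column and report basic properties."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
            "pcm": wav.readframes(frames),  # raw PCM samples as bytes
        }
```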

## Large blobs (Blob API)

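For larger files such as high-resolution images or videos, Lance provides a specialized **Blob API**. Declaring the column with a large-binary Arrow type (`pa.large_binary()` in Python, `LargeBinary` in TypeScript, and `DataType::LargeBinary` in Rust) and the `lance-encoding:blob` field metadata enables **lazy loading** and optimized encoding. The large-binary type uses 64-bit offsets, so a single array is not capped at the 2 GiB of total data that the 32-bit offsets of a regular binary column allow, and lazy loading lets you work with massive datasets without reading all of the binary data into memory upfront.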

### 1. Define a blob schema

To use the Blob API, you must mark the column with `{"lance-encoding:blob": "true"}` metadata.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {BlobApiSchema}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsBlobApiSchema}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsBlobApiSchema}
  </CodeBlock>
</CodeGroup>

### 2. Ingest large blobs

Ingestion works the same way as for a regular binary column: the write code does not change, and Lance applies the blob encoding based on the column's metadata.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {BlobApiIngest}
  </CodeBlock>

  <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
    {TsBlobApiIngest}
  </CodeBlock>

  <CodeBlock filename="Rust" language="Rust" icon="rust">
    {RsBlobApiIngest}
  </CodeBlock>
</CodeGroup>

<Card>
  For more advanced usage, including random access and file-like reading of blobs, see the
  Lance format's [blob API documentation](https://lance.org/guide/blob/).
</Card>

## Other modalities

The binary and large-binary Arrow types are format-agnostic: they store any byte sequence, so you can use the same pattern for other kinds of multimodal data:

* **Audio:** Read `.wav` or `.mp3` files as bytes.
* **Video:** Store short clips or full-length videos using the Blob API.
* **PDFs/Documents:** Store the raw file content for document search.
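Because a binary column accepts any byte sequence, one table can hold several modalities, and it can then help to record each blob's format in its own column. A sketch of a hypothetical `sniff_media_type` helper that guesses the format from well-known file signatures; it is a heuristic, not a validator, and the signature list is deliberately short.

```python
# Leading "magic" bytes for a few common formats
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"ID3", "audio/mpeg"),  # MP3 with an ID3v2 tag
]


def sniff_media_type(data: bytes) -> str:
    """Best-effort guess of a blob's format from its first bytes."""
    # WAV: "RIFF" container whose form type (bytes 8-11) is "WAVE"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    # ISO base media files (MP4, and relatives such as MOV) start with an "ftyp" box
    if data[4:8] == b"ftyp":
        return "video/mp4"
    for prefix, media_type in SIGNATURES:
        if data.startswith(prefix):
            return media_type
    return "application/octet-stream"
```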
