🧩 Blog 3: Storing and Indexing Vectors with Oracle’s Native Vector Type


In the previous blogs, we explored what Vector Search is and how to generate image embeddings using ONNX and text embeddings using Cohere. Now, it’s time to understand how to store, index, and efficiently search these vectors using Oracle Database’s native VECTOR data type, introduced in Oracle Database 23ai

πŸ”Έ What is Oracle’s Native VECTOR Type?

Oracle 23ai introduces a new VECTOR data type, specifically designed for high-dimensional embeddings (such as from AI models). This enables developers and data scientists to store and query AI-generated vectors efficiently without relying on external vector databases.



✅ Key Features:

  • Fixed-length vector storage (e.g., VECTOR(768))

  • Optimized vector distance functions

  • Indexing support for fast nearest neighbor search

  • Seamless SQL integration

🧠 Why Store Vectors in Oracle?

Traditionally, storing high-dimensional vectors required external vector databases like FAISS, Pinecone, or Milvus. However, with Oracle 23ai, you can now store and process vectors natively within the same database where your enterprise data already resides. This reduces latency, eliminates complex ETL pipelines, and enhances security, scalability, and integration with existing apps, especially Oracle APEX, REST APIs, and PL/SQL.

πŸ”’ Understanding Vector Dimensions and Size

Each AI model generates vectors of a specific fixed size:

  • BERT / Cohere: 768-dimensions

  • CLIP (OpenAI): 512-dimensions

  • EfficientNet B0: 128-dimensions (for images)

It's critical that your Oracle VECTOR(n) column matches the exact length of these outputs. Misalignment can cause insert failures or incorrect distance calculations. You can always check embedding dimensions with:

len(embedding_vector)


πŸ—ƒ️ Creating a Table with VECTOR Type

You can now create a table in Oracle with a VECTOR column as easily as any other datatype.

CREATE TABLE image_vectors (
    id          NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    image_name  VARCHAR2(255),
    embedding   VECTOR(128)  -- Example: EfficientNetB0 outputs 128-dim vector
);

Similarly, for text vectors:

CREATE TABLE text_vectors (
    id          NUMBER GENERATED BY DEFAULT AS IDENTITY,
    text_input  CLOB,
    embedding   VECTOR(768)  -- Cohere embed model
);

πŸš€ Inserting Vectors into the Table

Once you've generated embeddings using ONNX (for images) or Cohere (for text), you can insert them like any other data type. From Python:

sql = "INSERT INTO image_vectors (image_name, embedding) VALUES (:1, :2)"
cursor.execute(sql, ["image1.jpg", embedding_vector])

Ensure the embedding matches the declared VECTOR(n) length in the table.

πŸ” Querying with VECTOR_DISTANCE
Oracle provides built-in SQL functions for comparing vectors. The most commonly used one is:
SELECT image_name
FROM image_vectors
ORDER BY VECTOR_DISTANCE(embedding, VECTOR_INPUT(:query_vector))
FETCH FIRST 5 ROWS ONLY;

  • VECTOR_INPUT() accepts a vector input for comparison.

  • VECTOR_DISTANCE() computes Euclidean distance or other supported distance metrics.

You can also use VECTOR_COSINE_DISTANCE() for semantic comparisons.

🧱 Internal Storage Mechanics (Brief)

Oracle stores vectors using an optimized internal format that enables fast memory access and indexing. Unlike using raw BLOBs or VARCHAR2 arrays, the VECTOR type is designed specifically for numeric precision and vector algebra. This format ensures low-latency computation, even with thousands or millions of vectors, by leveraging SIMD (Single Instruction, Multiple Data) under the hood.

πŸ§ͺ Euclidean vs. Cosine Distance – When to Use What?

Oracle supports different types of vector similarity calculations:

  • VECTOR_DISTANCE(): Measures Euclidean (L2) distance – good for visual similarity (images).

  • VECTOR_COSINE_DISTANCE(): Measures cosine similarity – ideal for semantic similarity (text).

Choose the one that aligns with your data type:

  • Use Euclidean for images, object detection vectors.

  • Use Cosine for text embeddings, sentence similarity, or semantic search.

⚙️ Hybrid Vector + Metadata Filtering

In real-world apps, you often want to filter vector results by attributes like category, region, or date. Oracle allows you to combine standard SQL filters with vector ranking in a single query.

SELECT name

FROM document_vectors

WHERE language = 'EN'

ORDER BY VECTOR_COSINE_DISTANCE(embedding, VECTOR_INPUT(:query_vec))

FETCH FIRST 5 ROWS ONLY;

This gives you semantic + logical filtering together.


🧰 Tools and Libraries to Interact with VECTOR Data

You can interact with vector data in Oracle using familiar tools:

  • Python: oracledb or cx_Oracle to insert/query vectors.

  • SQLcl / SQL Developer: For manual DDL/DML and debugging.

  • APEX: Build dashboards and search interfaces powered by vector logic.

  • ORDS or Oracle Functions: Expose REST endpoints that accept embedding inputs and return ranked results.


πŸ”„ Updating or Deleting Vector Data

Updating or deleting vectors is straightforward, but if indexing is enabled, Oracle will also update the vector index to maintain query accuracy.

  • Use UPDATE to modify embeddings (e.g., with refined AI models).

  • Use DELETE to remove stale or invalid records.

  • Consider batching updates for large-scale retraining scenarios

⚡ Creating a VECTOR Index for Fast Search

To improve search performance, you can create an approximate vector index using the new VECTOR INDEX feature:

CREATE INDEX image_vector_idx

ON image_vectors (embedding)

INDEXTYPE IS VECTOR_INDEX

PARAMETERS ('ALGORITHM HNSW');

This uses HNSW (Hierarchical Navigable Small World) graph-based indexing, enabling sub-second ANN (Approximate Nearest Neighbor) queries.


🧠 Real-Time Use Case: Semantic Image or Text Search

Once data is indexed, you can:

  • Search for images similar to an input image vector

  • Search text documents based on semantic similarity

  • Power hybrid searches using both text + image vectors

This is especially useful in:

  • E-commerce product recommendations

  • AI-driven knowledge bases

  • Enterprise document retrieval systems

πŸ“Š Monitoring Index Performance

Use DBA_INDEXES and DBA_IND_COLUMNS to monitor vector index status and health. You can also check:

  • Query execution plans (EXPLAIN PLAN) for performance tuning.

  • Index build stats for HNSW (e.g., levels, neighbors).

  • System metrics to see memory/disk usage when vector sizes grow.

Pro tip: Partition your data if you’re dealing with millions of vectors.


πŸ§ͺ Performance Tips

  • Use appropriate vector length (VECTOR(128), VECTOR(768), etc.)

  • Index only high-volume or search-heavy vector columns

  • Combine with traditional filters for hybrid search (e.g., by category/date)


πŸ”š Conclusion

Oracle Database 23ai’s native VECTOR support makes it easy to store, index, and query AI embeddings using nothing more than SQL. You no longer need to rely on third-party vector databases — you can build powerful AI-enabled search systems right within Oracle.


✅ What’s Next?

In the next blog, we’ll explore:
πŸ“¦ Vector Search in Action — Using Oracle SQL and APEX to Build Real-Time AI-Powered Apps!

Stay tuned!


Comments

Popular posts from this blog

Introduction to Oracle Vector Search – Concepts, Requirements & Use Cases

Setting Up Monitoring and Alerts in OCI for Your Resources