Generating Embeddings in Oracle – ONNX for Images & Cohere for Text



🚀 Overview

In our first blog, we introduced the basics of vector search and how Oracle 23ai enables AI-native querying.

Now, let’s explore how to generate embeddings, the core building blocks for vector search, from:

  • 🖼️ Images using ONNX models

  • 🧠 Text using Cohere embeddings API

Both image and text vectors can be stored in Oracle's VECTOR data type and indexed for fast similarity search.


🖼️ Part 1: Image Embeddings Using ONNX in Oracle

In modern applications, image embeddings are crucial for enabling tasks like image similarity, face recognition, object detection, and product search. Oracle Database 23ai empowers you to do this directly inside the database using OML4Py (Oracle Machine Learning for Python) and ONNX Runtime.

🧠 What Are Image Embeddings?

Image embeddings are fixed-size numerical representations of images generated by deep learning models. These embeddings capture the semantic and visual features of an image in a compact vector form.

⚙️ What Is ONNX and Why Use It?

ONNX (Open Neural Network Exchange) is an open format to represent machine learning models. It allows you to export models trained in frameworks like PyTorch or TensorFlow and run them anywhere — including inside Oracle — using ONNX Runtime.

ONNX models are:

  • Widely available pretrained (for example, in the ONNX Model Zoo)

  • Optimized for inference

  • Easy to integrate inside OML4Py

For example, once images are converted to embeddings:

  • Two visually similar product images will have closely located vectors in embedding space.

  • These vectors can then be stored in Oracle’s VECTOR column and compared using VECTOR_DISTANCE() for similarity search.
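To make "closely located" concrete: Oracle's COSINE metric for VECTOR_DISTANCE() is one minus the cosine similarity of two vectors. Here is a minimal pure-Python sketch of that calculation, using made-up 4-dimensional vectors (real image embeddings have hundreds of dimensions, and the product names are purely illustrative):

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings
red_sneaker = [0.9, 0.1, 0.8, 0.0]
red_running_shoe = [0.8, 0.2, 0.9, 0.1]
office_chair = [0.0, 0.9, 0.1, 0.8]

print(cosine_distance(red_sneaker, red_running_shoe))  # close to 0: similar items
print(cosine_distance(red_sneaker, office_chair))      # much larger: unrelated items
```

Because only the angle between vectors matters, this metric ignores differences in vector length, which is why it is a common default for embeddings.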

Let’s break it down step by step.

Oracle Database 23ai supports Python execution inside the database using OML4Py. This allows us to:

  • Load a pre-trained ONNX model like EfficientNet

  • Read image BLOBs from a table

  • Preprocess the image

  • Generate a feature vector

  • Store it in a VECTOR column



✅ Oracle Setup Requirements

To use ONNX inside Oracle:

  • Oracle Database 23ai with OML4Py enabled

  • Python environment inside the database (OML4Py container or ORDS backend)

  • onnxruntime (inference), Pillow and NumPy (image processing), and python-oracledb (database access)

  • ONNX model (e.g., efficientnet-lite4.onnx, arcface.onnx)

📦 Step-by-Step Process

Step 1: Prepare Your ONNX Model

Download a pretrained ONNX model, such as efficientnet-lite4.onnx for general image features or arcface.onnx for faces, and store it in a directory accessible from your Python environment.


Step 2: Read and Preprocess the Image

Images are often stored in Oracle as BLOBs. You’ll extract them using OML4Py or external Python, then preprocess as needed (resize, normalize, etc.).


from PIL import Image
import numpy as np

img = Image.open("sample.jpg").convert("RGB").resize((224, 224))  # force 3 channels, resize to model input
img_array = np.asarray(img).astype(np.float32) / 255.0  # HWC, scaled to [0, 1]; check the normalization your model expects
img_array = np.transpose(img_array, (2, 0, 1))  # Channels first (NCHW) if required
img_array = img_array[np.newaxis, ...]  # Add batch dimension
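Because the images usually live in a BLOB column rather than on disk, the same preprocessing can start from raw bytes. This is a minimal sketch assuming Pillow and NumPy; the function name preprocess_blob is hypothetical, and the channels_first flag exists because some ONNX models expect channels-last (NHWC) input, so check session.get_inputs()[0].shape for your model.

```python
import io

import numpy as np
from PIL import Image


def preprocess_blob(blob_bytes, size=(224, 224), channels_first=True):
    """Decode image bytes (e.g. read from a BLOB column) into a float32 batch tensor."""
    img = Image.open(io.BytesIO(blob_bytes)).convert("RGB")  # force 3 channels
    img = img.resize(size)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    if channels_first:
        arr = np.transpose(arr, (2, 0, 1))  # HWC -> CHW
    return arr[np.newaxis, ...]  # add batch dimension
```

With python-oracledb, a fetched BLOB arrives as a LOB object whose .read() returns the bytes; alternatively, setting oracledb.defaults.fetch_lobs = False before querying makes BLOB columns come back as bytes directly.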

Step 3: Run ONNX Inference

Use onnxruntime to load the model and run inference.


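import onnxruntime as ort

session = ort.InferenceSession("efficientnet-lite4.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # name of the model's first input tensor
output = session.run(None, {input_name: img_array})  # None = return all outputs
embedding = output[0].flatten()  # first output as a 1-D vector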

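This gives you a vector of float32 values whose size depends on the model and on which output you read. ArcFace, for example, returns a 512-dimensional face embedding, while a classifier such as EfficientNet-Lite4 returns class scores from its final layer; for similarity search you would typically export it without the classification head to get its 1280-dimensional pooled features.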


Step 4: Store in Oracle VECTOR Column

Create a table to store embeddings:


CREATE TABLE image_vectors (
  id          NUMBER GENERATED BY DEFAULT AS IDENTITY,
  image       BLOB,
  label       VARCHAR2(100),
  embedding   VECTOR(1280)  -- Match your model's output size
);
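Because the column is declared with a fixed dimension, an insert fails if the model's output has a different size. Here is a small guard, sketched under the assumption that you bind vectors with python-oracledb 2.2 or later (which maps a float32 array.array onto a VECTOR column); prepare_embedding is a hypothetical helper name.

```python
import array

import numpy as np


def prepare_embedding(raw_output, expected_dim):
    """Flatten a model output and check it fits a VECTOR(expected_dim) column."""
    vec = np.asarray(raw_output, dtype=np.float32).ravel()
    if vec.size != expected_dim:
        raise ValueError(f"model produced {vec.size} values; column expects {expected_dim}")
    # float32 array.array: a type python-oracledb can bind to VECTOR
    return array.array("f", vec.tolist())


# e.g. output[0] from session.run(...) with shape (1, 1280)
vector = prepare_embedding(np.zeros((1, 1280), dtype=np.float32), 1280)
```

Failing in Python gives a clearer message than a database error, and it catches the common mistake of storing class scores instead of features.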

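Use Python to insert the image and its vector. The snippet below uses python-oracledb, the successor to cx_Oracle; binding VECTOR values requires version 2.2 or later, which accepts a float32 array.array:

import array
import oracledb

conn = oracledb.connect(user="user", password="password", dsn="host:port/service")
cursor = conn.cursor()
with open("sample.jpg", "rb") as f:
    img_blob = f.read()  # raw image bytes for the BLOB column
sql = "INSERT INTO image_vectors (image, label, embedding) VALUES (:1, :2, :3)"
# Bind the embedding as a float32 array so it maps onto the VECTOR column
cursor.execute(sql, [img_blob, "shoe", array.array("f", embedding)])
conn.commit()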

Now, your Oracle DB has semantically meaningful image vectors ready for fast similarity search!


🔍 Use Case: Image-to-Image Similarity

With these embeddings, you can perform SQL-based vector search like:

SELECT label
FROM image_vectors
ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)  -- :query_vec holds the query image's embedding
FETCH FIRST 5 ROWS ONLY;

This fetches the 5 most visually similar images to the input.


🛠️ Bonus Tip: Batch Embedding from Oracle BLOBs

You can loop through rows in Python that fetch image BLOBs from Oracle, convert them to PIL format, generate embeddings, and store them back—all in a batched, automated pipeline.
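Here is a minimal sketch of that loop. It assumes rows arrive as (id, image_bytes) tuples and that you supply your own embed_fn (for example, the preprocessing and ONNX inference above, returning a float32 array.array). Both the function name and the batch size of 32 are illustrative. Batching lets each round trip update many rows with executemany() instead of issuing one UPDATE per image.

```python
from itertools import islice


def embed_in_batches(rows, embed_fn, batch_size=32):
    """Yield lists of {"id": ..., "vec": ...} dicts, embedding batch_size rows at a time.

    rows     -- iterable of (id, image_bytes) tuples, such as a database cursor
    embed_fn -- hypothetical callable: image bytes -> vector (preprocess + ONNX run)
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield [{"id": row_id, "vec": embed_fn(blob)} for row_id, blob in batch]


# Illustrative wiring with python-oracledb (not executed here); set
# oracledb.defaults.fetch_lobs = False first so BLOBs arrive as bytes:
# read_cur.execute("SELECT id, image FROM image_vectors WHERE embedding IS NULL")
# for batch in embed_in_batches(read_cur, embed_fn):
#     write_cur.executemany(
#         "UPDATE image_vectors SET embedding = :vec WHERE id = :id", batch)
#     conn.commit()
```

Using a generator keeps memory flat: only one batch of images is decoded and embedded at a time, however large the table is.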


🎯 Summary

✅ You’ve now learned how to:

  • Load and preprocess images

  • Run ONNX inference using onnxruntime

  • Generate embeddings for BLOBs in Oracle

  • Store and search these vectors using SQL

This forms the core vector pipeline for image search, directly integrated with Oracle 23ai.




🤖 Part 2: Text Embeddings Using Cohere (Third-Party API)

In today’s intelligent applications, semantic understanding of text is key to enhancing user experience — whether it's matching questions to answers, retrieving similar documents, or understanding intent.

Oracle 23ai’s native VECTOR support allows you to store and search text embeddings efficiently. Oracle can also generate embeddings inside the database from imported ONNX models, but you can equally use third-party APIs like Cohere to generate state-of-the-art text embeddings and then store them in your Oracle database.

🧠 What Are Text Embeddings?

Text embeddings are vector representations of words, sentences, or documents that capture semantic meaning rather than just lexical similarity.

For example, “How to create a table in Oracle” and “Steps to define a table in SQL” are different strings, but their embeddings are close in vector space, enabling semantic search.




✅ Why Cohere?

  • High-quality language embeddings

  • RESTful interface (easy to integrate with Oracle Functions or APEX)

  • Ideal for building hybrid vector search on text, documents, and FAQs

🔐 Requirements:

  • Cohere API key

  • Python requests package

  • Outbound internet access from the Oracle Function (or external middleware) that calls the API

📩 Example Python Code:

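import requests

headers = {
    "Authorization": "Bearer YOUR_COHERE_API_KEY",
    "Content-Type": "application/json",
}

data = {
    "texts": ["Find me similar blogs on Oracle AI."],
    "model": "embed-english-v3.0",
    # v3 embed models require input_type: "search_query" for queries,
    # "search_document" for the content you store and search over
    "input_type": "search_query",
}

response = requests.post("https://api.cohere.ai/v1/embed", headers=headers, json=data, timeout=30)
response.raise_for_status()  # fail loudly on auth or quota errors
embedding = response.json()["embeddings"][0]  # list of floats for the first text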

This embedding (a list of 1,024 floats for embed-english-v3.0) can be stored in a VECTOR column and used for semantic search in Oracle, just like the image vectors.
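When embedding many texts, two details matter. First, stored content and incoming queries should be embedded with different input_type values. Second, Cohere caps how many texts one call may carry (96 at the time of writing; check the current API reference). Here is a minimal sketch that only builds request bodies, with no network calls. The helper name is hypothetical, and the set lists the four text input types for v3 models.

```python
# The four text input types accepted by Cohere's v3 embed models
TEXT_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}


def build_embed_requests(texts, input_type, model="embed-english-v3.0", max_per_call=96):
    """Split texts into JSON bodies for Cohere's embed endpoint, max_per_call texts each."""
    if input_type not in TEXT_INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    texts = list(texts)
    return [
        {"texts": texts[i:i + max_per_call], "model": model, "input_type": input_type}
        for i in range(0, len(texts), max_per_call)
    ]


# Stored content is embedded as documents, user questions as queries
doc_bodies = build_embed_requests(["Blog 1 text", "Blog 2 text"], "search_document")
query_body = build_embed_requests(["Find me similar blogs on Oracle AI."], "search_query")[0]
```

Each body can be sent with the same requests.post call shown in the example code, concatenating the "embeddings" lists from the responses in order.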

🧠 Where to Store the Embeddings?

Create a table like this:

CREATE TABLE vector_store (
  id          NUMBER,
  type        VARCHAR2(10), -- 'image' or 'text'
  source      CLOB,         -- original content
  embedding   VECTOR        -- no fixed dimension: image and text models produce different sizes
);

Populate it using Python scripts or Oracle APEX APIs depending on your architecture.


🔍 Use Case: Hybrid Vector Search

Imagine a use case where:

  • A user uploads a product image → You search using image vector

  • A user types a product query → You search using Cohere text vector

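Both kinds of vectors can live in the same table and be ranked with Oracle’s VECTOR_DISTANCE() function, but each query should only be compared with rows of its own type. Image and text embeddings come from different models, occupy different vector spaces, and usually have different dimensions, so a distance between an image vector and a text vector is not meaningful.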
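One way to keep the two modalities apart is to route each query to its own embedder and filter on the type column of the vector_store table. Here is a minimal sketch: the SQL text and the helper name are hypothetical, and embed_image / embed_text stand in for the ONNX and Cohere code above.

```python
import array

# Hypothetical query: filter by type so a query vector is only compared
# with vectors produced by the same kind of model
SEARCH_SQL = """
SELECT id, source
FROM vector_store
WHERE type = :kind
ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
FETCH FIRST :k ROWS ONLY"""


def build_hybrid_search(query, kind, embed_image, embed_text, k=5):
    """Route the query to the matching embedder and return (sql, binds)."""
    if kind == "image":
        vec = embed_image(query)  # e.g. preprocessing + ONNX inference
    elif kind == "text":
        vec = embed_text(query)   # e.g. Cohere with input_type="search_query"
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    binds = {"kind": kind, "query_vec": array.array("f", vec), "k": k}
    return SEARCH_SQL, binds
```

The result plugs straight into cursor.execute(sql, binds) with python-oracledb.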

🔄 Use Case: Text-to-Text Matching

You can now enable:

  • FAQ match from user input

  • Intent classification

  • Semantic document search

  • Tag or label suggestion for large text data

This powers smarter enterprise search and NLP within Oracle.
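FAQ matching is a nearest-neighbour lookup with one extra rule: a similarity cut-off, so an off-topic question returns nothing instead of the least-bad answer. In SQL the lookup is an ORDER BY VECTOR_DISTANCE(...) FETCH FIRST 1 ROWS ONLY query. Here is a minimal in-memory sketch of the logic with hypothetical 3-dimensional vectors and a hypothetical threshold of 0.8:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def best_faq_match(query_vec, faq_vecs, min_similarity=0.8):
    """Return (question, similarity) for the closest FAQ, or None if none is close enough.

    faq_vecs maps each stored FAQ question to its embedding.
    min_similarity is a hypothetical cut-off; tune it on your own data.
    """
    if not faq_vecs:
        return None
    scored = [(q, cosine_similarity(query_vec, v)) for q, v in faq_vecs.items()]
    question, score = max(scored, key=lambda pair: pair[1])
    return (question, score) if score >= min_similarity else None


# Hypothetical embeddings for two FAQ entries
faqs = {
    "How do I create a table in Oracle?": [0.9, 0.1, 0.0],
    "How do I reset my password?": [0.0, 0.2, 0.9],
}
print(best_faq_match([0.8, 0.2, 0.1], faqs))  # matches the table question
print(best_faq_match([0.1, 0.9, 0.1], faqs))  # None: nothing is similar enough
```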

💡 Advanced Tip: Automating via Oracle Functions

You can wrap Cohere API calls inside an Oracle Cloud Function and call it via APEX or PL/SQL:

  1. Accept text as input

  2. Call Cohere API

  3. Return vector

  4. Store in VECTOR column
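The first three steps above can be kept in a plain function that is easy to test. In an OCI Function built with the Python FDK, the handler would pass the request body (data.getvalue()) to it and return the result through fdk's response.Response; check the FDK documentation for the exact entry-point signature. This is a minimal sketch: handle_embed_request is a hypothetical name, and embed_fn stands in for the Cohere call.

```python
import json


def handle_embed_request(body, embed_fn):
    """Hypothetical core of an embedding function: JSON text in, JSON vector out.

    body     -- raw request bytes such as b'{"text": "..."}' (step 1)
    embed_fn -- callable that calls the Cohere embed API, returning a list of floats (step 2)
    """
    payload = json.loads(body or b"{}")
    text = str(payload.get("text", "")).strip()
    if not text:
        return json.dumps({"error": "request must include a non-empty 'text' field"})
    vector = embed_fn(text)
    # Step 3: return the vector; the caller stores it (step 4)
    return json.dumps({"embedding": list(vector), "dimensions": len(vector)})
```

Step 4 happens on the caller's side, for example in PL/SQL after the function returns. The embedding is serialized as text like [0.1, 0.2, 0.3], which is the same textual form Oracle's TO_VECTOR() accepts.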


🔚 Conclusion

You’ve now seen how to generate image embeddings with ONNX and text embeddings with Cohere. In the next post, we’ll cover:

  • Storing vectors efficiently in Oracle

  • Creating VECTOR indexes

  • Performing fast ANN (approximate nearest neighbor) search in SQL

🔜 Up Next:

Blog 3: Storing and Indexing Vectors in Oracle Database


