Generating Embeddings in Oracle – ONNX for Images & Cohere for Text
Overview
In our first blog, we introduced the basics of vector search and how Oracle 23ai enables AI-native querying.
Now, let’s explore how to generate embeddings, the core building blocks for vector search, from:
- Images using ONNX models
- Text using the Cohere embeddings API
Both image and text vectors can be stored in Oracle's VECTOR data type and indexed for fast similarity search.
Part 1: Image Embeddings Using ONNX in Oracle
In modern applications, image embeddings are crucial for enabling tasks like image similarity, face recognition, object detection, and product search. Oracle Database 23ai empowers you to do this directly inside the database using OML4Py (Oracle Machine Learning for Python) and ONNX Runtime.
What Are Image Embeddings?
Image embeddings are fixed-size numerical representations of images generated by deep learning models. These embeddings capture the semantic and visual features of an image in a compact vector form.
⚙️ What Is ONNX and Why Use It?
ONNX (Open Neural Network Exchange) is an open format to represent machine learning models. It allows you to export models trained in frameworks like PyTorch or TensorFlow and run them anywhere — including inside Oracle — using ONNX Runtime.
ONNX models are:
- Pretrained
- Optimized for inference
- Easy to integrate inside OML4Py
For example:
- Two visually similar product images will have closely located vectors in embedding space.
- These vectors can then be stored in Oracle’s VECTOR column and compared using VECTOR_DISTANCE() for similarity search.
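To make "closely located" concrete, here is a minimal sketch of cosine distance, one of the metrics VECTOR_DISTANCE() supports. The three 4-dimensional vectors are made-up stand-ins for real embeddings, and the variable names are purely illustrative.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up vectors standing in for real image embeddings.
red_shoe_front = [0.9, 0.1, 0.3, 0.0]
red_shoe_side = [0.8, 0.2, 0.35, 0.05]
blue_kettle = [0.0, 0.9, 0.1, 0.7]

near = cosine_distance(red_shoe_front, red_shoe_side)
far = cosine_distance(red_shoe_front, blue_kettle)
print(f"similar pair: {near:.3f}, dissimilar pair: {far:.3f}")
```

A smaller distance means the two images are more alike, which is why a similarity search sorts by distance in ascending order.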
Let’s break it down step by step.
Oracle Database 23ai supports Python execution inside the database using OML4Py. This allows us to:
- Load a pre-trained ONNX model like EfficientNet
- Read image BLOBs from a table
- Preprocess the image
- Generate a feature vector
- Store it in a VECTOR column
✅ Oracle Setup Requirements
To use ONNX inside Oracle:
- Oracle Database 23ai with OML4Py enabled
- Python environment inside the database (OML4Py container or ORDS backend)
- onnxruntime and Pillow (for image processing)
- ONNX model (e.g., efficientnet-lite4.onnx, arcface.onnx)
Step-by-Step Process
Step 1: Prepare Your ONNX Model
Download a pretrained ONNX model, such as the EfficientNet-Lite4 or ArcFace models mentioned above, and store it in a directory accessible from your Python environment.
Step 2: Read and Preprocess the Image
Images are often stored in Oracle as BLOBs. You’ll extract them using OML4Py or external Python, then preprocess as needed (resize, normalize, etc.).
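As a rough sketch of the preprocessing step, the helper below decodes BLOB bytes with Pillow and returns a batch of one image. The function name, the 224x224 input size, the channels-last layout, and the (pixel - 127) / 128 scaling are assumptions that fit some EfficientNet-Lite exports; check your own model's documentation for the values it expects.

```python
import io

import numpy as np
from PIL import Image


def preprocess_image(blob_bytes, size=(224, 224)):
    """Decode image bytes and return a float32 array of shape (1, H, W, 3)."""
    image = Image.open(io.BytesIO(blob_bytes)).convert("RGB")
    image = image.resize(size)
    pixels = np.asarray(image, dtype=np.float32)
    # Scale 0..255 to roughly -1..1. This is an assumed normalization;
    # the correct one depends on how the model was trained.
    pixels = (pixels - 127.0) / 128.0
    # Add the leading batch dimension that most ONNX vision models expect.
    return np.expand_dims(pixels, axis=0)
```

Models exported from PyTorch often expect channels-first input instead, in which case a transpose to (1, 3, H, W) would be needed before inference.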
Step 3: Run ONNX Inference
Use onnxruntime to load the model and run inference.
This gives you a vector of float32 values, e.g., 512-dimensional or 1280-dimensional based on the model.
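A minimal sketch of this step with onnxruntime might look like the following. The helper names are hypothetical, and the sketch assumes the model's first output is the embedding. For a classification model that output may be class scores instead, so you may need a model exported without its classification head.

```python
import numpy as np


def load_session(model_path):
    """Create an ONNX Runtime session for the model file at model_path."""
    # Imported here so that embed() below can be used with any session-like object.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def embed(session, input_tensor):
    """Run one forward pass and return the first output as a flat float32 vector."""
    input_name = session.get_inputs()[0].name
    # Passing None as the output list asks the session for all model outputs.
    outputs = session.run(None, {input_name: input_tensor})
    return np.asarray(outputs[0], dtype=np.float32).reshape(-1)
```

The length of the returned vector is the dimension you will declare on the VECTOR column later.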
Step 4: Store in Oracle VECTOR Column
Create a table with a VECTOR column to store the embeddings, then use Python to insert each vector.
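One way this could look from Python is sketched below. The table name image_embeddings, its columns, and the 1280 dimension are hypothetical choices for illustration, and connection is assumed to be an open python-oracledb connection from a driver version that supports the VECTOR type.

```python
import array

# Hypothetical table: one row per image, with a fixed-dimension FLOAT32 vector.
CREATE_TABLE_SQL = """
CREATE TABLE image_embeddings (
    image_id   NUMBER PRIMARY KEY,
    embedding  VECTOR(1280, FLOAT32)
)"""

INSERT_SQL = "INSERT INTO image_embeddings (image_id, embedding) VALUES (:1, :2)"


def create_table(connection):
    """Create the embeddings table (run once)."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_TABLE_SQL)


def store_embedding(connection, image_id, vector):
    """Insert one embedding; vector is any iterable of numbers."""
    # python-oracledb maps array.array('f', ...) to a FLOAT32 VECTOR bind.
    values = array.array("f", (float(x) for x in vector))
    with connection.cursor() as cursor:
        cursor.execute(INSERT_SQL, [image_id, values])
    connection.commit()
```

The dimension in the column definition has to match the length of the vectors your model produces, otherwise the insert is rejected.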
Now, your Oracle DB has semantically meaningful image vectors ready for fast similarity search!
Use Case: Image-to-Image Similarity
With these embeddings, you can run a SQL-based vector search that orders rows by their VECTOR_DISTANCE() from an input vector and fetches the 5 most visually similar images.
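A query of that shape could be issued from Python roughly as follows. The table and column names are hypothetical, the COSINE metric is an assumed choice, and connection is assumed to be an open python-oracledb connection.

```python
import array

# Hypothetical table and columns; rows are ordered by distance to the bind vector.
TOP5_SQL = """
SELECT image_id,
       VECTOR_DISTANCE(embedding, :query_vec, COSINE) AS distance
FROM image_embeddings
ORDER BY distance
FETCH FIRST 5 ROWS ONLY"""


def find_similar_images(connection, query_vector):
    """Return up to five (image_id, distance) rows, nearest first."""
    query = array.array("f", (float(x) for x in query_vector))
    with connection.cursor() as cursor:
        cursor.execute(TOP5_SQL, {"query_vec": query})
        return cursor.fetchall()
```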
Bonus Tip: Batch Embedding from Oracle BLOBs
You can loop through rows in Python that fetch image BLOBs from Oracle, convert them to PIL format, generate embeddings, and store them back—all in a batched, automated pipeline.
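A batched loop along these lines is sketched below. The product_images table, its column names, and the embed_fn callback (bytes in, sequence of floats out) are all hypothetical. connection is assumed to be an open python-oracledb connection.

```python
import array


def embed_all_images(connection, embed_fn, batch_size=100):
    """Embed every row whose embedding is NULL; return the number of rows updated."""
    select_sql = "SELECT image_id, image_data FROM product_images WHERE embedding IS NULL"
    update_sql = "UPDATE product_images SET embedding = :1 WHERE image_id = :2"
    updated = 0
    with connection.cursor() as reader, connection.cursor() as writer:
        reader.execute(select_sql)
        while True:
            rows = reader.fetchmany(batch_size)
            if not rows:
                break
            for image_id, blob in rows:
                # BLOBs may arrive as LOB objects (with .read()) or as plain bytes.
                data = blob.read() if hasattr(blob, "read") else blob
                vector = array.array("f", (float(x) for x in embed_fn(data)))
                writer.execute(update_sql, [vector, image_id])
                updated += 1
            # Commit once per batch so a failure loses at most one batch of work.
            connection.commit()
    return updated
```

Here embed_fn would typically chain the preprocessing and ONNX inference steps described earlier.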
Summary
✅ You’ve now learned how to:
- Load and preprocess images
- Run ONNX inference using onnxruntime
- Generate embeddings for BLOBs in Oracle
- Store and search these vectors using SQL
This forms the core vector pipeline for image search, directly integrated with Oracle 23ai.
Part 2: Text Embeddings Using Cohere
In today’s intelligent applications, semantic understanding of text is key to enhancing user experience, whether it's matching questions to answers, retrieving similar documents, or understanding intent.
Oracle 23ai’s native VECTOR support allows you to store and search text embeddings efficiently. In this post, the embeddings are generated outside the database: you call a third-party API such as Cohere to produce the text embeddings, then store them in your Oracle database.
What Are Text Embeddings?
Text embeddings are vector representations of words, sentences, or documents that capture semantic meaning rather than just lexical similarity.
For example:
“How to create a table in Oracle” and “Steps to define a table in SQL” are different strings — but their embeddings are close in vector space, enabling semantic search.

✅ Why Cohere?
- High-quality language embeddings
- RESTful interface (easy to integrate with Oracle Functions or APEX)
- Ideal for building hybrid vector search on text, documents, and FAQs
Requirements:
- Cohere API key
- Python requests package
- Internet connectivity on Oracle Function (or external middleware)
The resulting embedding can be stored in a VECTOR column and used for semantic search in Oracle, just like image vectors. Populate the column using Python scripts or Oracle APEX APIs, depending on your architecture.
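A minimal sketch of the API call with the requests package is shown below. The endpoint URL, the model name, the input_type value, and the shape of the JSON response are assumptions based on Cohere's v1 embed API and should be checked against the current Cohere documentation. The post parameter is a hypothetical hook that lets you swap in another HTTP client or a test stub.

```python
# Assumed endpoint for Cohere's v1 embed API; verify against the current docs.
COHERE_EMBED_URL = "https://api.cohere.com/v1/embed"


def embed_texts(texts, api_key, model="embed-english-v3.0", post=None):
    """Return one embedding (a list of floats) per input string."""
    if post is None:
        # Only needed when no custom `post` callable is supplied.
        import requests

        post = requests.post
    response = post(
        COHERE_EMBED_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"texts": list(texts), "model": model, "input_type": "search_document"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embeddings"]
```

For the text a user types at search time, the same call would usually be made with an input_type meant for queries, so that stored documents and incoming queries are embedded consistently.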
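Imagine a use case where:
- A user uploads a product image → You search using image vector
- A user types a product query → You search using Cohere text vector
Both searches use Oracle’s VECTOR_DISTANCE() function. Keep in mind that vectors are only comparable when they come from the same model, so image embeddings and Cohere text embeddings should be stored and searched in separate VECTOR columns unless a single multimodal model produces both.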
Use Case: Text-to-Text Matching
You can now enable:
- FAQ match from user input
- Intent classification
- Semantic document search
- Tag or label suggestion for large text data
This powers smarter enterprise search and NLP within Oracle.
Advanced Tip: Automating via Oracle Functions
You can wrap Cohere API calls inside an Oracle Cloud Function and call it via APEX or PL/SQL:
- Accept text as input
- Call Cohere API
- Return vector
- Store in VECTOR column
Conclusion
You’ve now seen how to generate image embeddings with ONNX and text embeddings with Cohere. In the next post, we’ll cover:
- Storing vectors efficiently in Oracle
- Creating VECTOR indexes
- Performing fast ANN (approximate nearest neighbor) search in SQL
Up Next:
Blog 3: Storing and Indexing Vectors in Oracle Database



