Class ClipEmbeddingEngine

Namespace: VisioForge.Core.AI.Clip

Assembly: VisioForge.Core.AI.dll

A CLIP dual-tower embedding engine. It owns two ONNX sessions: a vision tower that turns an image into an embedding, and a text tower that turns text into an embedding in the same space, so an image and a natural-language query can be compared by cosine similarity. Both towers include the CLIP projection head, so their outputs share the embedding dimension exposed by the Dimension property. All outputs are L2-normalized.
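Because the outputs are L2-normalized, the cosine similarity of two embeddings reduces to their dot product. The helper below is a minimal sketch of that comparison; the class and method names (EmbeddingMath, Dot) are illustrative and not part of the VisioForge API.

```csharp
using System;

static class EmbeddingMath
{
    // Dot product of two equal-length vectors. For L2-normalized inputs
    // (unit length), this value equals their cosine similarity.
    public static float Dot(float[] a, float[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }

        return sum;
    }
}
```

A score near 1 means the image and the query point in nearly the same direction in the shared space; scores near 0 indicate little relation.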

public sealed class ClipEmbeddingEngine : IDisposable

Remarks

Input and output tensor names, and the embedding dimension, are read from the model metadata during Init(), so an fp16 or quantized re-export of the same model stays drop-in. Text is tokenized with the in-house ClipTokenizer (maximum length 77, wrapped in begin-of-text and end-of-text tokens). The engine is thread-safe for concurrent EncodeImage(SKBitmap) and EncodeText(string) calls: they use separate sessions, and ONNX Runtime's Run is itself thread-safe.
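The thread-safety note above suggests that an image and a text query can be encoded in parallel on one engine instance. The following is a sketch only, assuming an engine whose Init() has already returned true; the helper name EncodeBothAsync is hypothetical.

```csharp
using System.Threading.Tasks;
using SkiaSharp;
using VisioForge.Core.AI.Clip;

static class ClipConcurrencySketch
{
    // Encodes an image and a text query concurrently on the same engine.
    // Assumes the engine is already initialized. EncodeImage may return null,
    // and EncodeText may throw InvalidOperationException, as documented.
    public static async Task<(float[] Image, float[] Text)> EncodeBothAsync(
        ClipEmbeddingEngine engine, SKBitmap image, string query)
    {
        Task<float[]> imageTask = Task.Run(() => engine.EncodeImage(image));
        Task<float[]> textTask = Task.Run(() => engine.EncodeText(query));

        await Task.WhenAll(imageTask, textTask);
        return (imageTask.Result, textTask.Result);
    }
}
```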

Constructors

ClipEmbeddingEngine(VideoEmbeddingSettings)

Initializes a new instance of the ClipEmbeddingEngine class.

public ClipEmbeddingEngine(VideoEmbeddingSettings settings)

Parameters

settings VideoEmbeddingSettings: The video embedding settings carrying the CLIP model and tokenizer paths.

Exceptions

ArgumentNullException: Thrown when settings is null.

Properties

ActiveProvider

Gets the execution provider the vision session actually engaged. Valid after Init().

public OnnxExecutionProvider ActiveProvider { get; }

Property Value

OnnxExecutionProvider

Dimension

Gets the embedding dimension shared by the vision and text towers, read from the model output metadata during Init(). Zero before initialization.

public int Dimension { get; }

Property Value

int
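A caller that stores or indexes embeddings may want to confirm that a vector has the expected length and is unit-norm before using it. The check below is a minimal sketch under that assumption; the name LooksLikeUnitEmbedding and the default tolerance are illustrative, not part of the library.

```csharp
using System;

static class EmbeddingChecks
{
    // Returns true when the vector is non-null, has exactly `dimension`
    // elements, and its Euclidean (L2) norm is within `tolerance` of 1.
    public static bool LooksLikeUnitEmbedding(
        float[] embedding, int dimension, double tolerance = 1e-3)
    {
        if (embedding == null || dimension <= 0 || embedding.Length != dimension)
        {
            return false;
        }

        double sumSquares = 0.0;
        foreach (float v in embedding)
        {
            sumSquares += (double)v * v;
        }

        return Math.Abs(Math.Sqrt(sumSquares) - 1.0) <= tolerance;
    }
}
```

In use, the second argument would be the engine's Dimension value. Since Dimension is zero before initialization, the check rejects every vector until Init() has succeeded.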

Methods

Dispose()

Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

public void Dispose()

EncodeImage(VideoFrameX)

Encodes an RGBA video frame into an L2-normalized CLIP image embedding.

public float[] EncodeImage(VideoFrameX frame)

Parameters

frame VideoFrameX: The source RGBA frame.

Returns

float[]: The L2-normalized embedding, or null when the frame is empty or the engine failed to initialize.

EncodeImage(SKBitmap)

Encodes a bitmap into an L2-normalized CLIP image embedding.

public float[] EncodeImage(SKBitmap image)

Parameters

image SKBitmap: The source image (any color type/size).

Returns

float[]: The L2-normalized embedding, or null when the image is null or the engine failed to initialize.
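As a sketch of how this overload might be called, the helper below decodes an image file with SkiaSharp and passes the bitmap to the engine. The helper name EmbedFile is hypothetical, and the engine is assumed to be initialized already. The null return is propagated to the caller rather than treated as an error.

```csharp
using SkiaSharp;
using VisioForge.Core.AI.Clip;

static class ClipImageSketch
{
    // Decodes the file at `path` and returns its CLIP image embedding.
    // Returns null when decoding fails or when EncodeImage returns null.
    public static float[] EmbedFile(ClipEmbeddingEngine engine, string path)
    {
        // SKBitmap.Decode returns null when the file cannot be decoded.
        using SKBitmap bitmap = SKBitmap.Decode(path);
        if (bitmap == null)
        {
            return null;
        }

        return engine.EncodeImage(bitmap);
    }
}
```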

EncodeText(string)

Encodes a text query into an L2-normalized CLIP text embedding, in the same space as the image embeddings.

public float[] EncodeText(string text)

Parameters

text string: The query text.

Returns

float[]: The L2-normalized embedding.

Exceptions

InvalidOperationException: Thrown when the engine is not initialized, or the text model / tokenizer files were not provided.
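One way to put EncodeText to work is to rank previously computed image embeddings against a query. The sketch below assumes all embeddings came from the same initialized engine, so they share one dimension. It returns an empty result when text encoding is unavailable. The names ClipSearchSketch and RankByQuery are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using VisioForge.Core.AI.Clip;

static class ClipSearchSketch
{
    // Returns indices into imageEmbeddings ordered from best to worst match.
    // Null entries (for example, failed EncodeImage calls) are skipped.
    public static int[] RankByQuery(
        ClipEmbeddingEngine engine, string query, IReadOnlyList<float[]> imageEmbeddings)
    {
        float[] queryEmbedding;
        try
        {
            queryEmbedding = engine.EncodeText(query);
        }
        catch (InvalidOperationException)
        {
            // Engine not initialized, or text model / tokenizer files missing.
            return Array.Empty<int>();
        }

        return Enumerable.Range(0, imageEmbeddings.Count)
            .Where(i => imageEmbeddings[i] != null)
            .OrderByDescending(i => Dot(queryEmbedding, imageEmbeddings[i]))
            .ToArray();
    }

    // Dot product; equals cosine similarity for L2-normalized vectors.
    // Assumes both vectors have the same length.
    private static float Dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }

        return sum;
    }
}
```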

Init()

Loads the vision and text CLIP models, resolves their input/output names and the embedding dimension, and loads the CLIP tokenizer.

public bool Init()

Returns

bool: true if initialization succeeded; otherwise, false.
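A typical lifecycle, as far as this page describes it, is construct, Init(), use, Dispose(). The sketch below follows that order using only members documented here. How a VideoEmbeddingSettings instance is populated is not covered on this page, so the sketch takes one as a parameter.

```csharp
using System;
using VisioForge.Core.AI.Clip;
// Assumption: also add a using directive for the namespace that declares
// VideoEmbeddingSettings; this page does not state it.

static class ClipLifecycleSketch
{
    public static void Run(VideoEmbeddingSettings settings)
    {
        // The using declaration calls Dispose() when the method exits.
        using var engine = new ClipEmbeddingEngine(settings);

        if (!engine.Init())
        {
            Console.WriteLine("CLIP engine failed to initialize.");
            return;
        }

        // Both properties are documented as valid after Init().
        Console.WriteLine($"Provider: {engine.ActiveProvider}, dimension: {engine.Dimension}");
    }
}
```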

SetContext(BaseContext)

Sets the logging context.

public void SetContext(BaseContext context)

Parameters

context BaseContext: The context.
