Class VLMSettings

Namespace
VisioForge.Core.Types.X.AI
Assembly
VisioForge.Core.dll

Settings for the Florence-2 vision-language model (VLM) block. The block runs a four-session ONNX pipeline (vision encoder, token embedder, text encoder, and a merged decoder) to caption frames, detect objects, run OCR, or ground phrases. The Task property selects which of these the block performs.

public class VLMSettings

Remarks

The model weights are not shipped with the SDK. Use the VLMSettings(string) constructor to point at a folder that holds the conventionally named Florence-2 ONNX files and tokenizer assets, or set the individual path properties. Decoding is greedy (no beam search). The block runs inference on a background worker and throttles it with ProcessingInterval, so live video is never stalled.
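As a minimal sketch of the folder-based setup described above: the snippet below uses only members documented on this page. The folder path is a placeholder assumption, and the chosen interval and token limit are illustrative values, not recommendations.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Assumption: placeholder folder that holds the conventionally named
// Florence-2 ONNX files and tokenizer assets. Replace with your own path.
var settings = new VLMSettings(@"C:\models\florence2")
{
    // Caption is the documented default; set here only for clarity.
    Task = VLMTask.Caption,

    // Run the model on at most one frame every two seconds.
    ProcessingInterval = TimeSpan.FromSeconds(2),

    // Cap the number of generated tokens below the default of 256.
    MaxNewTokens = 128,
};
```

A longer ProcessingInterval lowers the inference load; frames between two inferences still redraw the cached result.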

Constructors

VLMSettings()

Initializes a new instance of the VisioForge.Core.Types.X.AI.VLMSettings class with no model paths set.

public VLMSettings()
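When the parameterless constructor is used, each path has to be assigned by hand. The sketch below shows one way to do that, assuming all files sit in a single placeholder folder and keep their conventional names; the property and constant names come from this page, while the folder path is hypothetical.

```csharp
using System.IO;
using VisioForge.Core.Types.X.AI;

// Assumption: placeholder folder; replace with your own model location.
string folder = @"C:\models\florence2";

var settings = new VLMSettings
{
    // The four ONNX sessions of the pipeline.
    VisionEncoderPath = Path.Combine(folder, VLMSettings.VisionEncoderFileName),
    EmbedTokensPath = Path.Combine(folder, VLMSettings.EmbedTokensFileName),
    EncoderModelPath = Path.Combine(folder, VLMSettings.EncoderFileName),
    DecoderModelPath = Path.Combine(folder, VLMSettings.DecoderFileName),

    // Tokenizer assets. The added-tokens map is only required for grounding tasks.
    VocabFilePath = Path.Combine(folder, VLMSettings.VocabFileName),
    MergesFilePath = Path.Combine(folder, VLMSettings.MergesFileName),
    AddedTokensFilePath = Path.Combine(folder, VLMSettings.AddedTokensFileName),
};
```

Setting the paths individually is mainly useful when the files are stored under different names or in different folders.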

VLMSettings(string)

Initializes a new instance of the VisioForge.Core.Types.X.AI.VLMSettings class, resolving all four ONNX models and the three tokenizer assets from a folder using the conventional Florence-2 file names.

public VLMSettings(string modelFolder)

Parameters

modelFolder string

The folder holding the Florence-2 ONNX models and tokenizer files.

Exceptions

ArgumentNullException

Thrown when modelFolder is null.
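This page documents only the null check on modelFolder, so it can help to verify the folder contents before constructing the settings. The helper below is a hypothetical sketch, not part of the SDK: FlorenceModelFolder and FindMissingFiles are illustrative names, and the file names are copied from the constants listed under Fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the SDK: reports which conventionally
// named Florence-2 files are absent from a model folder.
public static class FlorenceModelFolder
{
    // File names copied from the VLMSettings constants listed under Fields.
    private static readonly string[] ExpectedFiles =
    {
        "florence2-base-vision-encoder.onnx",
        "florence2-base-embed-tokens.onnx",
        "florence2-base-encoder.onnx",
        "florence2-base-decoder-merged.onnx",
        "florence2-vocab.json",
        "florence2-merges.txt",
        "florence2-added-tokens.json",
    };

    // Returns the expected file names that do not exist in modelFolder.
    public static IReadOnlyList<string> FindMissingFiles(string modelFolder)
    {
        if (modelFolder == null)
        {
            throw new ArgumentNullException(nameof(modelFolder));
        }

        var missing = new List<string>();
        foreach (var name in ExpectedFiles)
        {
            if (!File.Exists(Path.Combine(modelFolder, name)))
            {
                missing.Add(name);
            }
        }

        return missing;
    }
}
```

An empty result means all four models and three tokenizer assets are present under their conventional names.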

Fields

AddedTokensFileName

Conventional file name of the Florence-2 added-tokens map (task and region tokens).

public const string AddedTokensFileName = "florence2-added-tokens.json"

Field Value

string

DecoderFileName

Conventional file name of the Florence-2 merged-decoder ONNX model.

public const string DecoderFileName = "florence2-base-decoder-merged.onnx"

Field Value

string

EmbedTokensFileName

Conventional file name of the Florence-2 token-embedding ONNX model.

public const string EmbedTokensFileName = "florence2-base-embed-tokens.onnx"

Field Value

string

EncoderFileName

Conventional file name of the Florence-2 text-encoder ONNX model.

public const string EncoderFileName = "florence2-base-encoder.onnx"

Field Value

string

MergesFileName

Conventional file name of the Florence-2 BPE merges.

public const string MergesFileName = "florence2-merges.txt"

Field Value

string

VisionEncoderFileName

Conventional file name of the Florence-2 vision-encoder ONNX model.

public const string VisionEncoderFileName = "florence2-base-vision-encoder.onnx"

Field Value

string

VocabFileName

Conventional file name of the Florence-2 BPE vocabulary.

public const string VocabFileName = "florence2-vocab.json"

Field Value

string

Properties

AddedTokensFilePath

Gets or sets the path to the Florence-2 added_tokens.json file (task and <loc_N> region tokens). Required for the grounding tasks; the caption tasks work without it.

public string AddedTokensFilePath { get; set; }

Property Value

string

BoxColor

Gets or sets the color used for region boxes and labels when DrawResults is enabled. Defaults to lime green.

public SKColor BoxColor { get; set; }

Property Value

SKColor

BoxThickness

Gets or sets the stroke thickness, in pixels, of the region boxes. Defaults to 2.

public float BoxThickness { get; set; }

Property Value

float

DecoderModelPath

Gets or sets the path to the Florence-2 merged-decoder ONNX model. Required.

public string DecoderModelPath { get; set; }

Property Value

string

DrawResults

Gets or sets a value indicating whether grounded regions and the caption bar are drawn into the video frame. Defaults to true.

public bool DrawResults { get; set; }

Property Value

bool

EmbedTokensPath

Gets or sets the path to the Florence-2 token-embedding ONNX model. Required.

public string EmbedTokensPath { get; set; }

Property Value

string

EncoderModelPath

Gets or sets the path to the Florence-2 text-encoder ONNX model. Required.

public string EncoderModelPath { get; set; }

Property Value

string

LabelFontSize

Gets or sets the font size, in pixels, of labels and captions. A value of 0 auto-scales the font to the frame height. Defaults to 0.

public float LabelFontSize { get; set; }

Property Value

float

MaxNewTokens

Gets or sets the maximum number of new tokens the decoder generates per frame. Defaults to 256.

public int MaxNewTokens { get; set; }

Property Value

int

MergesFilePath

Gets or sets the path to the BART merges.txt file. Required.

public string MergesFilePath { get; set; }

Property Value

string

ProcessingInterval

Gets or sets the minimum interval between consecutive inferences on the live stream. The block runs the model on at most one frame per interval (gated by frame timestamp); all other frames only redraw the cached result. Defaults to one second.

public TimeSpan ProcessingInterval { get; set; }

Property Value

TimeSpan
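The timestamp gating described for ProcessingInterval can be illustrated with a small stand-alone class. This is a sketch of the general technique under stated assumptions, not the SDK's implementation; InferenceGate and ShouldProcess are hypothetical names.

```csharp
using System;

// Hypothetical stand-in for a timestamp-based throttle, not SDK code.
public sealed class InferenceGate
{
    private readonly TimeSpan _interval;
    private TimeSpan? _lastRun;

    public InferenceGate(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true when the frame should be sent to the model, and false
    // when the caller should only redraw the cached result.
    public bool ShouldProcess(TimeSpan frameTimestamp)
    {
        if (_lastRun.HasValue && frameTimestamp - _lastRun.Value < _interval)
        {
            return false;
        }

        // Remember the timestamp of the frame that was accepted.
        _lastRun = frameTimestamp;
        return true;
    }
}
```

With a one-second interval, frames stamped 0 s and 1 s would be accepted, while frames at 0.5 s and 1.2 s would be skipped. Gating on the frame timestamp rather than wall-clock time keeps the behavior independent of how fast frames are delivered.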

Task

Gets or sets the task the model performs on each processed frame. Defaults to VLMTask.Caption. This property can be changed at runtime; the new task takes effect on the next inference.

public VLMTask Task { get; set; }

Property Value

VLMTask

TextInput

Gets or sets the auxiliary text input. Only used by VLMTask.PhraseGrounding, where it is the caption whose phrases are grounded to image regions. This property can be changed at runtime.

public string TextInput { get; set; }

Property Value

string
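Since both Task and TextInput can be changed at runtime, switching a running block to phrase grounding might look like the sketch below. GroundingExample and SwitchToPhraseGrounding are hypothetical names; the caption is assigned before the task on the assumption that this avoids an inference picking up the new task with stale text, which this page does not confirm.

```csharp
using VisioForge.Core.Types.X.AI;

// Hypothetical helper, not part of the SDK.
public static class GroundingExample
{
    // Switches an existing settings instance to phrase grounding.
    // Grounding tasks also require AddedTokensFilePath to be set.
    public static void SwitchToPhraseGrounding(VLMSettings settings, string caption)
    {
        // Set the caption first, then the task that consumes it.
        settings.TextInput = caption;
        settings.Task = VLMTask.PhraseGrounding;
    }
}
```

For example, passing "a dog next to a red ball" as the caption would ask the model to locate the regions that match the phrases in that sentence on the next inference.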

VisionEncoderPath

Gets or sets the path to the Florence-2 vision-encoder ONNX model. Required.

public string VisionEncoderPath { get; set; }

Property Value

string

VocabFilePath

Gets or sets the path to the BART vocab.json file. Required.

public string VocabFilePath { get; set; }

Property Value

string