Class VLMSettings
- Namespace
- VisioForge.Core.Types.X.AI
- Assembly
- VisioForge.Core.dll
Settings for the Florence-2 vision-language model (VLM) block. The block runs a four-session ONNX pipeline (vision encoder, token embedder, text encoder, and merged decoder) to caption frames, detect objects, run OCR, or ground phrases. The VLMSettings.Task property selects the task.
public class VLMSettings
Remarks
The model weights are not shipped with the SDK. Use the VLMSettings(string) constructor to point at a folder that holds the conventionally named Florence-2 ONNX files and tokenizer assets, or set the individual path properties. Decoding is greedy (no beam search). The block runs inference on a background worker and throttles it with VLMSettings.ProcessingInterval, so inference does not stall live video.
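The two configuration routes described in these remarks might look like the following. This is a minimal sketch that uses only members documented on this page; the class name VlmSettingsExample, the method names, and the two-second interval are illustrative assumptions, and the code compiles only with the SDK referenced.

```csharp
using System;
using System.IO;
using VisioForge.Core.Types.X.AI;

public static class VlmSettingsExample
{
    // Route 1: let the constructor resolve every model and tokenizer
    // file from a single folder using the conventional file names.
    public static VLMSettings FromFolder(string modelFolder)
    {
        return new VLMSettings(modelFolder)
        {
            Task = VLMTask.Caption,
            // Illustrative value: run inference at most once every two seconds.
            ProcessingInterval = TimeSpan.FromSeconds(2),
        };
    }

    // Route 2: start with no paths set and assign each one explicitly.
    // Here the conventional names are reused, but any file names would do.
    public static VLMSettings FromExplicitPaths(string modelFolder)
    {
        return new VLMSettings
        {
            VisionEncoderPath = Path.Combine(modelFolder, VLMSettings.VisionEncoderFileName),
            EmbedTokensPath = Path.Combine(modelFolder, VLMSettings.EmbedTokensFileName),
            EncoderModelPath = Path.Combine(modelFolder, VLMSettings.EncoderFileName),
            DecoderModelPath = Path.Combine(modelFolder, VLMSettings.DecoderFileName),
            VocabFilePath = Path.Combine(modelFolder, VLMSettings.VocabFileName),
            MergesFilePath = Path.Combine(modelFolder, VLMSettings.MergesFileName),
            AddedTokensFilePath = Path.Combine(modelFolder, VLMSettings.AddedTokensFileName),
        };
    }
}
```

The second route is mainly useful when the files have been renamed or live in different folders.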
Constructors
VLMSettings()
Initializes a new instance of the VisioForge.Core.Types.X.AI.VLMSettings class with no model paths set.
public VLMSettings()
VLMSettings(string)
Initializes a new instance of the VisioForge.Core.Types.X.AI.VLMSettings class, resolving all four ONNX models and the three tokenizer assets from a folder using the conventional Florence-2 file names.
public VLMSettings(string modelFolder)
Parameters
- modelFolder (string)
- The folder holding the Florence-2 ONNX models and tokenizer files.
Exceptions
- ArgumentNullException
- Thrown when modelFolder is null.
Fields
AddedTokensFileName
Conventional file name of the Florence-2 added-tokens map (task and region tokens).
public const string AddedTokensFileName = "florence2-added-tokens.json"
DecoderFileName
Conventional file name of the Florence-2 merged-decoder ONNX model.
public const string DecoderFileName = "florence2-base-decoder-merged.onnx"
EmbedTokensFileName
Conventional file name of the Florence-2 token-embedding ONNX model.
public const string EmbedTokensFileName = "florence2-base-embed-tokens.onnx"
EncoderFileName
Conventional file name of the Florence-2 text-encoder ONNX model.
public const string EncoderFileName = "florence2-base-encoder.onnx"
MergesFileName
Conventional file name of the Florence-2 BPE merges.
public const string MergesFileName = "florence2-merges.txt"
VisionEncoderFileName
Conventional file name of the Florence-2 vision-encoder ONNX model.
public const string VisionEncoderFileName = "florence2-base-vision-encoder.onnx"
VocabFileName
Conventional file name of the Florence-2 BPE vocabulary.
public const string VocabFileName = "florence2-vocab.json"
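Because the weights are not shipped with the SDK, it can help to check a folder for the conventional file names before handing it to the constructor. The sketch below is self-contained and does not call the SDK; the class Florence2FolderCheck and its method are hypothetical helpers, and the file names are copied from the constants listed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Florence2FolderCheck
{
    // File names copied from the VLMSettings constants documented above.
    public static readonly string[] ExpectedFiles =
    {
        "florence2-base-vision-encoder.onnx",
        "florence2-base-embed-tokens.onnx",
        "florence2-base-encoder.onnx",
        "florence2-base-decoder-merged.onnx",
        "florence2-vocab.json",
        "florence2-merges.txt",
        "florence2-added-tokens.json",
    };

    // Returns the conventional file names that are absent from modelFolder.
    public static List<string> FindMissingFiles(string modelFolder)
    {
        if (modelFolder == null)
        {
            throw new ArgumentNullException(nameof(modelFolder));
        }

        return ExpectedFiles
            .Where(name => !File.Exists(Path.Combine(modelFolder, name)))
            .ToList();
    }
}
```

An empty result means all seven files are present. This check only looks at file names; it does not validate the contents of the files.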
Properties
AddedTokensFilePath
Gets or sets the path to the Florence-2 added_tokens.json file (task and <loc_N> region tokens). Required for the grounding tasks; the caption tasks work without it.
public string AddedTokensFilePath { get; set; }
BoxColor
Gets or sets the color used for region boxes and labels when VisioForge.Core.Types.X.AI.VLMSettings.DrawResults is enabled. Defaults to lime green.
public SKColor BoxColor { get; set; }
BoxThickness
Gets or sets the stroke thickness, in pixels, of the region boxes. Defaults to 2.
public float BoxThickness { get; set; }
DecoderModelPath
Gets or sets the path to the Florence-2 merged-decoder ONNX model. Required.
public string DecoderModelPath { get; set; }
DrawResults
Gets or sets a value indicating whether grounded regions and the caption bar are drawn into the video frame. Defaults to true.
public bool DrawResults { get; set; }
EmbedTokensPath
Gets or sets the path to the Florence-2 token-embedding ONNX model. Required.
public string EmbedTokensPath { get; set; }
EncoderModelPath
Gets or sets the path to the Florence-2 text-encoder ONNX model. Required.
public string EncoderModelPath { get; set; }
LabelFontSize
Gets or sets the label / caption font size, in pixels. A value of 0 auto-scales the font to the frame height. Defaults to 0.
public float LabelFontSize { get; set; }
MaxNewTokens
Gets or sets the maximum number of new tokens the decoder generates per frame. Defaults to 256.
public int MaxNewTokens { get; set; }
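To illustrate how a token cap such as MaxNewTokens bounds greedy decoding, here is a generic, self-contained sketch. It is not the block's actual implementation: the nextLogits delegate is a hypothetical stand-in for one decoder step, and the end-of-sequence handling is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class GreedyDecoding
{
    // nextLogits: hypothetical stand-in for one decoder step. It receives the
    // tokens generated so far and returns one score per vocabulary entry.
    public static List<int> Decode(
        Func<IReadOnlyList<int>, float[]> nextLogits,
        int eosTokenId,
        int maxNewTokens)
    {
        var tokens = new List<int>();

        for (int step = 0; step < maxNewTokens; step++)
        {
            float[] logits = nextLogits(tokens);

            // Greedy choice: keep only the single highest-scoring token.
            int best = 0;
            for (int i = 1; i < logits.Length; i++)
            {
                if (logits[i] > logits[best])
                {
                    best = i;
                }
            }

            // Stop early at end-of-sequence; otherwise the cap ends the loop.
            if (best == eosTokenId)
            {
                break;
            }

            tokens.Add(best);
        }

        return tokens;
    }
}
```

Each step commits to one token and never revisits it, which is what distinguishes greedy decoding from beam search. A lower cap shortens the worst-case time spent on a frame but can truncate long outputs.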
MergesFilePath
Gets or sets the path to the BART merges.txt file. Required.
public string MergesFilePath { get; set; }
ProcessingInterval
Gets or sets the minimum interval between two inferences on the live stream. The block runs the model on at most one frame per interval (gated by frame timestamp); other frames only redraw the cached result. Defaults to one second.
public TimeSpan ProcessingInterval { get; set; }
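The timestamp gating described for ProcessingInterval can be pictured with a small self-contained sketch. The IntervalGate class is hypothetical and only illustrates the idea; how the block treats the first frame or a frame that lands exactly on the interval boundary is an assumption here.

```csharp
using System;

public sealed class IntervalGate
{
    private readonly TimeSpan _interval;
    private TimeSpan? _lastProcessed;

    public IntervalGate(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true when the frame should be sent to inference, and false
    // when the caller should only redraw the cached result.
    public bool ShouldProcess(TimeSpan frameTimestamp)
    {
        if (_lastProcessed.HasValue &&
            frameTimestamp - _lastProcessed.Value < _interval)
        {
            return false;
        }

        _lastProcessed = frameTimestamp;
        return true;
    }
}
```

Gating on the frame timestamp rather than on wall-clock time keeps the inference rate tied to media time, so the same frames are selected regardless of how fast the pipeline delivers them.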
Task
Gets or sets the task the model performs on each processed frame. Defaults to VisioForge.Core.Types.X.AI.VLMTask.Caption. This property can be changed at runtime; the new task takes effect on the next inference.
public VLMTask Task { get; set; }
TextInput
Gets or sets the auxiliary text input. Only used by VisioForge.Core.Types.X.AI.VLMTask.PhraseGrounding, where it is the caption whose phrases are grounded to image regions. This property can be changed at runtime.
public string TextInput { get; set; }
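Since both Task and TextInput can be changed at runtime, switching a running block to phrase grounding might look like the sketch below. The helper class and method names are hypothetical, the settings argument is assumed to be the same instance the running block was created with, and the code compiles only with the SDK referenced.

```csharp
using VisioForge.Core.Types.X.AI;

public static class VlmTaskSwitching
{
    // Grounds the phrases of the given caption to image regions.
    // Grounding tasks also need AddedTokensFilePath to be set.
    public static void SwitchToPhraseGrounding(VLMSettings settings, string caption)
    {
        // Set the text first so it is in place when the new task is picked up.
        settings.TextInput = caption;
        settings.Task = VLMTask.PhraseGrounding;
    }

    // Returns to plain captioning; TextInput is not used by this task.
    public static void SwitchToCaption(VLMSettings settings)
    {
        settings.Task = VLMTask.Caption;
    }
}
```

The new task takes effect on the next inference, so up to one ProcessingInterval may pass before the overlay reflects the change.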
VisionEncoderPath
Gets or sets the path to the Florence-2 vision-encoder ONNX model. Required.
public string VisionEncoderPath { get; set; }
VocabFilePath
Gets or sets the path to the BART vocab.json file. Required.
public string VocabFilePath { get; set; }