Class VLMBlock

Namespace
VisioForge.Core.MediaBlocks.AI
Assembly
VisioForge.Core.AI.dll

A vision-language media block that runs a Florence-2 ONNX pipeline on video frames to caption them, detect objects, run OCR, or ground phrases. The task is selected by VisioForge.Core.Types.X.AI.VLMSettings.Task. Inherits from VisioForge.Core.MediaBlocks.MediaBlock and implements VisioForge.Core.MediaBlocks.IMediaBlockInternals.

public class VLMBlock : MediaBlock, IMediaBlockInternals, IVideoProcessingBlock, IMediaBlock, IDisposable

Inheritance

MediaBlock

Implements

IMediaBlockInternals
IVideoProcessingBlock
IMediaBlock
IDisposable

Inherited Members

MediaBlock._isBuilt
MediaBlock._pipeline
MediaBlock._pipelineCtx
MediaBlock.GetPipelineContext()
MediaBlock.SetPipelineContext(BlockPipelineContext)
MediaBlock.SetPipeline(MediaBlocksPipeline)
MediaBlock.Context
MediaBlock.Name
MediaBlock.IsBuilt
MediaBlock.Owner
MediaBlock.Type
MediaBlock.ID
MediaBlock.Input
MediaBlock.Inputs
MediaBlock.Output
MediaBlock.Outputs
MediaBlock.HasInputs
MediaBlock.HasOutputs
MediaBlock.Build()
MediaBlock.CreateElements()
MediaBlock.AddElementsToPipeline()
MediaBlock.RemoveElementsFromPipeline()
MediaBlock.DeepCopy(string)
MediaBlock.Reset()
MediaBlock.ToYAMLBlock()
MediaBlock.ClearPads()
MediaBlock.Dispose(bool)
MediaBlock.Dispose()

Remarks

Inference is expensive because each processed frame requires a full autoregressive generation, so it runs on a background worker and is throttled by VisioForge.Core.Types.X.AI.VLMSettings.ProcessingInterval. The throttle is gated on the frame timestamp: a generation is run for at most one frame per interval, and all other frames only redraw the most recent result. Decoding is greedy (argmax); beam search is not implemented. The VisioForge.Core.MediaBlocks.AI.VLMBlock.Task and VisioForge.Core.MediaBlocks.AI.VLMBlock.TextInput properties can be changed while the pipeline runs and take effect on the next inference.
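The timestamp gating described above can be illustrated with a small, self-contained sketch. TimestampGate is a hypothetical helper written for this example; it is not part of the VisioForge API and is not the block's actual implementation. It only shows the general idea of accepting at most one frame per interval based on frame timestamps.

```csharp
using System;

// Hypothetical helper (not a VisioForge type): decides whether a frame should
// start a new generation, based on the frame's presentation timestamp.
public sealed class TimestampGate
{
    private readonly TimeSpan _interval;
    private TimeSpan? _lastAccepted;

    public TimestampGate(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true when the frame should be sent to the model.
    // Returns false when the caller should only redraw the previous result.
    public bool ShouldProcess(TimeSpan frameTimestamp)
    {
        if (_lastAccepted is TimeSpan last
            && frameTimestamp >= last
            && frameTimestamp - last < _interval)
        {
            return false; // still inside the current interval
        }

        // First frame, interval elapsed, or timestamps jumped backwards
        // (for example after a seek): accept and restart the interval.
        _lastAccepted = frameTimestamp;
        return true;
    }
}
```

With a one-second interval and a 30 fps source, the first frame is accepted, the following frames are rejected, and the next accepted frame is the first one whose timestamp is at least one second later.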

Constructors

VLMBlock(VLMSettings)

Initializes a new instance of the VisioForge.Core.MediaBlocks.AI.VLMBlock class.

public VLMBlock(VLMSettings settings)

Parameters

settings VLMSettings

The VLM settings. Must specify the four Florence-2 model paths and tokenizer files.

Exceptions

ArgumentNullException

Thrown when settings is null.

Properties

ActiveProvider

Gets the execution provider that the model sessions are actually using. The value is valid only after the block has been built; before that, it reports VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CPU.

public OnnxExecutionProvider ActiveProvider { get; }

Property Value

OnnxExecutionProvider

DroppedFrameCount

Gets the number of frames the analysis had to discard. Stays 0 under the sample-latest design: the video always passes through untouched and the worker simply captions the freshest frame once the previous generation finishes, so nothing is dropped. Use VisioForge.Core.MediaBlocks.AI.VLMBlock.LastInferenceTimeMs to gauge model speed.

public long DroppedFrameCount { get; }

Property Value

long

Input

Gets the primary input pad.

public override MediaBlockPad Input { get; }

Property Value

MediaBlockPad

Inputs

Gets the array of all input pads.

public override MediaBlockPad[] Inputs { get; }

Property Value

MediaBlockPad[]

LastInferenceTimeMs

Gets the wall-clock time, in milliseconds, the most recent generation took.

public float LastInferenceTimeMs { get; }

Property Value

float

Output

Gets the primary output pad.

public override MediaBlockPad Output { get; }

Property Value

MediaBlockPad

Outputs

Gets the array of all output pads.

public override MediaBlockPad[] Outputs { get; }

Property Value

MediaBlockPad[]

Task

Gets or sets the task the model performs on each processed frame. Applied on the next inference.

public VLMTask Task { get; set; }

Property Value

VLMTask

TextInput

Gets or sets the auxiliary text input used by VisioForge.Core.Types.X.AI.VLMTask.PhraseGrounding. Applied on the next inference.

public string TextInput { get; set; }

Property Value

string
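As a minimal sketch of switching tasks at runtime, the helper below sets TextInput and then Task on an existing block. VlmTaskSwitching and GroundPhrase are hypothetical names introduced for illustration; only Task, TextInput, and VLMTask.PhraseGrounding come from this page. Setting the text before the task is a defensive ordering choice, on the assumption that the two properties are not guaranteed to be read atomically by the worker.

```csharp
using VisioForge.Core.MediaBlocks.AI;
using VisioForge.Core.Types.X.AI;

// Hypothetical helper class, not part of the SDK.
public static class VlmTaskSwitching
{
    // Switches a running block to phrase grounding for the given phrase.
    // Both changes take effect on the next inference.
    public static void GroundPhrase(VLMBlock vlm, string phrase)
    {
        // Assumption: assigning the text first avoids a grounding run
        // that still uses the previous phrase.
        vlm.TextInput = phrase;
        vlm.Task = VLMTask.PhraseGrounding;
    }
}
```

A call such as VlmTaskSwitching.GroundPhrase(vlm, "a red car") could then be made from UI code while the pipeline is running.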

Type

Gets the type of the media block.

public override MediaBlockType Type { get; }

Property Value

MediaBlockType

Methods

Build()

Constructs the internal GStreamer elements and pads for this block.

public override bool Build()

Returns

bool

true if successful, false otherwise.

CleanUp()

Cleans up internal resources, specifically the GStreamer element.

public void CleanUp()

Dispose(bool)

Releases unmanaged and, optionally, managed resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool

true to release both managed and unmanaged resources; false to release only unmanaged resources.

GetCore()

Gets the core GStreamer element wrapped by this block.

public BaseElement GetCore()

Returns

BaseElement

The VisioForge.Core.GStreamer.Base.BaseElement wrapper, or null if not built yet.

GetElement()

Gets the GStreamer element instance.

public Element GetElement()

Returns

Element

The Gst.Element, or null if not built yet.

IMediaBlockInternals.SetContext(MediaBlocksPipeline)

Sets the pipeline context for this block.

void IMediaBlockInternals.SetContext(MediaBlocksPipeline pipeline)

Parameters

pipeline MediaBlocksPipeline

The pipeline.

Events

OnResultGenerated

Event raised when the model finishes an inference on a frame.

public event EventHandler<VLMResultGeneratedEventArgs> OnResultGenerated

Event Type

EventHandler<VLMResultGeneratedEventArgs>
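A handler might be attached along the following lines. This is a sketch: VlmResultLogging and Attach are hypothetical names, and the handler deliberately does not read any members of VLMResultGeneratedEventArgs because they are not described on this page. It also assumes the event may be raised on the background worker thread, so UI code would need to marshal to its own thread.

```csharp
using System;
using VisioForge.Core.MediaBlocks.AI;

// Hypothetical helper class, not part of the SDK.
public static class VlmResultLogging
{
    // Logs the duration and execution provider of each finished inference.
    public static void Attach(VLMBlock vlm)
    {
        vlm.OnResultGenerated += (sender, e) =>
        {
            // Only properties documented on VLMBlock are used here.
            Console.WriteLine(
                $"Inference finished in {vlm.LastInferenceTimeMs:F0} ms on {vlm.ActiveProvider}.");
        };
    }
}
```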

See Also

MediaBlock
IMediaBlockInternals