Enum VLMTask

Namespace: VisioForge.Core.Types.X.AI

Assembly: VisioForge.Core.dll

The task a Florence-2 vision-language model performs on each processed frame. The task selects the natural-language prompt fed to the model and how its output is interpreted (free text vs. grounded regions).

public enum VLMTask

Fields

Caption = 0: Generate a short one-sentence caption describing the image (Florence-2 <CAPTION>).
DetailedCaption = 1: Generate a more detailed caption of the image (Florence-2 <DETAILED_CAPTION>).
MoreDetailedCaption = 2: Generate a paragraph-length, highly detailed caption (Florence-2 <MORE_DETAILED_CAPTION>).
ObjectDetection = 3: Detect objects and report a category label with a bounding box for each (Florence-2 <OD>).
DenseRegionCaption = 4: Detect regions and report a short description with a bounding box for each (Florence-2 <DENSE_REGION_CAPTION>).
Ocr = 5: Read all text in the image as a single string (Florence-2 <OCR>).
OcrWithRegion = 6: Read text in the image and report each text block with a quadrilateral region (Florence-2 <OCR_WITH_REGION>).
PhraseGrounding = 7: Ground the phrases of a caption supplied in VisioForge.Core.Types.X.AI.VLMSettings.TextInput to image regions (Florence-2 <CAPTION_TO_PHRASE_GROUNDING>).
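
Example

The field list above can be summarized in a small lookup helper. The following is a minimal sketch, not SDK code: the `VLMTask` enum is redeclared locally so the snippet compiles without VisioForge.Core.dll, and `VLMTaskInfo` with its three methods is a hypothetical helper introduced only for illustration. The prompt tokens and the split between free-text and region output are taken from the field descriptions above.

```csharp
using System;

// Local stand-in that mirrors the documented values so this sketch compiles
// without the SDK. In a real project, reference VisioForge.Core.Types.X.AI.VLMTask.
public enum VLMTask
{
    Caption = 0,
    DetailedCaption = 1,
    MoreDetailedCaption = 2,
    ObjectDetection = 3,
    DenseRegionCaption = 4,
    Ocr = 5,
    OcrWithRegion = 6,
    PhraseGrounding = 7
}

// Hypothetical helper (an assumption for illustration, not part of the SDK).
public static class VLMTaskInfo
{
    // Returns the Florence-2 task token listed for each field.
    public static string PromptToken(VLMTask task) => task switch
    {
        VLMTask.Caption => "<CAPTION>",
        VLMTask.DetailedCaption => "<DETAILED_CAPTION>",
        VLMTask.MoreDetailedCaption => "<MORE_DETAILED_CAPTION>",
        VLMTask.ObjectDetection => "<OD>",
        VLMTask.DenseRegionCaption => "<DENSE_REGION_CAPTION>",
        VLMTask.Ocr => "<OCR>",
        VLMTask.OcrWithRegion => "<OCR_WITH_REGION>",
        VLMTask.PhraseGrounding => "<CAPTION_TO_PHRASE_GROUNDING>",
        _ => throw new ArgumentOutOfRangeException(nameof(task), task, "Unknown VLMTask value.")
    };

    // True when the documented output includes regions (bounding boxes or
    // quadrilaterals) rather than free text only.
    public static bool ReturnsRegions(VLMTask task) =>
        task == VLMTask.ObjectDetection
        || task == VLMTask.DenseRegionCaption
        || task == VLMTask.OcrWithRegion
        || task == VLMTask.PhraseGrounding;

    // True when the task reads a caption from VLMSettings.TextInput.
    public static bool UsesTextInput(VLMTask task) => task == VLMTask.PhraseGrounding;
}

public static class Program
{
    public static void Main()
    {
        // Print one summary line per task.
        foreach (VLMTask task in Enum.GetValues(typeof(VLMTask)))
        {
            Console.WriteLine(
                $"{(int)task} {task}: token={VLMTaskInfo.PromptToken(task)}, " +
                $"regions={VLMTaskInfo.ReturnsRegions(task)}, " +
                $"textInput={VLMTaskInfo.UsesTextInput(task)}");
        }
    }
}
```

A helper like this can be useful for deciding, before a frame is processed, whether the caller should expect plain text or a list of regions, and whether a caption must be supplied first.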
