Enum VLMTask
- Namespace: VisioForge.Core.Types.X.AI
- Assembly: VisioForge.Core.dll
The task a Florence-2 vision-language model performs on each processed frame. The task selects the natural-language prompt fed to the model and how its output is interpreted (free text vs. grounded regions).
public enum VLMTask

Fields

Caption = 0
Generate a short one-sentence caption describing the image (Florence-2 <CAPTION>).

DetailedCaption = 1
Generate a more detailed caption of the image (Florence-2 <DETAILED_CAPTION>).

MoreDetailedCaption = 2
Generate a paragraph-length, highly detailed caption (Florence-2 <MORE_DETAILED_CAPTION>).

ObjectDetection = 3
Detect objects and report a category label with a bounding box for each (Florence-2 <OD>).

DenseRegionCaption = 4
Detect regions and report a short description with a bounding box for each (Florence-2 <DENSE_REGION_CAPTION>).

Ocr = 5
Read all text in the image as a single string (Florence-2 <OCR>).

OcrWithRegion = 6
Read text in the image and report each text block with a quadrilateral region (Florence-2 <OCR_WITH_REGION>).

PhraseGrounding = 7
Ground the phrases of a caption supplied in VisioForge.Core.Types.X.AI.VLMSettings.TextInput to image regions (Florence-2 <CAPTION_TO_PHRASE_GROUNDING>).
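
The field descriptions pair each task with a Florence-2 task token and with either a free-text or a region-based result. The sketch below restates that pairing as a lookup, which can be handy when logging or validating a chosen task. It is a standalone illustration, not part of the SDK: the enum is a local mirror of the values listed here so the code compiles on its own, and the `VLMTaskInfo` helper class with its `TokenFor`, `ReturnsRegions`, and `RequiresTextInput` methods is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Local mirror of the documented enum values (assumption: declared here only
// so the sketch is self-contained; in real code use the SDK's own VLMTask).
public enum VLMTask
{
    Caption = 0,
    DetailedCaption = 1,
    MoreDetailedCaption = 2,
    ObjectDetection = 3,
    DenseRegionCaption = 4,
    Ocr = 5,
    OcrWithRegion = 6,
    PhraseGrounding = 7
}

// Hypothetical helper, not an SDK type.
public static class VLMTaskInfo
{
    // Florence-2 task tokens exactly as quoted in the field descriptions.
    private static readonly Dictionary<VLMTask, string> Tokens =
        new Dictionary<VLMTask, string>
        {
            { VLMTask.Caption, "<CAPTION>" },
            { VLMTask.DetailedCaption, "<DETAILED_CAPTION>" },
            { VLMTask.MoreDetailedCaption, "<MORE_DETAILED_CAPTION>" },
            { VLMTask.ObjectDetection, "<OD>" },
            { VLMTask.DenseRegionCaption, "<DENSE_REGION_CAPTION>" },
            { VLMTask.Ocr, "<OCR>" },
            { VLMTask.OcrWithRegion, "<OCR_WITH_REGION>" },
            { VLMTask.PhraseGrounding, "<CAPTION_TO_PHRASE_GROUNDING>" }
        };

    // Returns the task token documented for the given task.
    public static string TokenFor(VLMTask task)
    {
        return Tokens[task];
    }

    // True for tasks whose description mentions bounding boxes or regions;
    // false for tasks described as returning plain text.
    public static bool ReturnsRegions(VLMTask task)
    {
        switch (task)
        {
            case VLMTask.ObjectDetection:
            case VLMTask.DenseRegionCaption:
            case VLMTask.OcrWithRegion:
            case VLMTask.PhraseGrounding:
                return true;
            default:
                return false;
        }
    }

    // Only PhraseGrounding is documented as reading a caption from
    // VLMSettings.TextInput.
    public static bool RequiresTextInput(VLMTask task)
    {
        return task == VLMTask.PhraseGrounding;
    }
}
```

For example, `VLMTaskInfo.TokenFor(VLMTask.OcrWithRegion)` yields `<OCR_WITH_REGION>`, and `VLMTaskInfo.ReturnsRegions(VLMTask.Ocr)` is `false` because plain OCR is described as returning a single string.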