Class SpeechToTextSettings

Namespace
VisioForge.Core.Types.X.AI
Assembly
VisioForge.Core.dll

Settings for the speech-to-text block (Whisper ASR + optional Silero VAD).

public class SpeechToTextSettings

Remarks

Unlike the vision AI settings, this type does NOT derive from VisioForge.Core.Types.X.AI.OnnxInferenceSettings: Whisper runs through Whisper.net (whisper.cpp / GGML), not ONNX Runtime, so the ONNX-specific input-size/normalization knobs do not apply. The Whisper GGML weights and the Silero VAD model are downloaded at runtime — neither is shipped in the SDK NuGet packages.

Constructors

SpeechToTextSettings()

Initializes a new instance of the VisioForge.Core.Types.X.AI.SpeechToTextSettings class.

public SpeechToTextSettings()

SpeechToTextSettings(string)

Initializes a new instance of the VisioForge.Core.Types.X.AI.SpeechToTextSettings class with a Whisper model path.

public SpeechToTextSettings(string whisperModelPath)

Parameters

whisperModelPath string

The absolute path to the Whisper GGML model file (ggml-*.bin).
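A minimal construction sketch, assuming the SDK assembly is referenced and a GGML model has already been downloaded. The model path is a hypothetical placeholder, not a location the SDK provides.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Hypothetical absolute path; replace it with the location of your downloaded ggml-*.bin file.
const string modelPath = "/models/ggml-base.bin";

// The string constructor sets the model path; the other properties keep their documented defaults.
var settings = new SpeechToTextSettings(modelPath)
{
    Language = "en",                      // ISO 639-1 code, or "auto" for detection
    Task = SpeechToTextTask.Transcribe,   // the documented default, set explicitly for clarity
    ModelSize = WhisperModelSize.Base,    // informational label only
};

Console.WriteLine($"Model: {settings.WhisperModelPath}, language: {settings.Language}");
```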

Properties

BackpressureWhenBusy

Gets or sets a value indicating whether the audio tap applies BACKPRESSURE to the upstream source when transcription cannot keep up, instead of dropping the oldest audio. Defaults to false.

When false (the default, intended for LIVE captioning) the internal audio ring overwrites the oldest samples on overflow, so a slow transcriber never stalls a live source — at the cost of dropping audio.

When true (intended for FILE transcription) there is no ring and no background worker: the tap segments and transcribes synchronously on the streaming thread, so the tap BLOCKS until Whisper has consumed the audio, pacing the source to exactly the transcription throughput. Nothing is dropped (lossless), the pipeline runs as fast as Whisper can transcribe — no faster, no slower — and the pipeline position tracks the transcription frontier (useful for a progress bar). Do NOT enable this for a live capture source: a live device cannot slow down to absorb the backpressure. Pair with a non-synced sink (for example NullRendererBlock { IsSync = false }) so no real-time clock caps the speed.

public bool BackpressureWhenBusy { get; set; }

Property Value

bool
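The two modes described above could be configured roughly as follows. This is a sketch only: the model path is a hypothetical placeholder, and wiring the settings into a pipeline is out of scope here.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Hypothetical absolute path to a downloaded GGML model.
const string modelPath = "/models/ggml-base.bin";

// Live captioning: keep the default so a slow transcriber drops the oldest audio
// instead of stalling the capture source.
var liveSettings = new SpeechToTextSettings(modelPath)
{
    BackpressureWhenBusy = false,
};

// File transcription: block the source until Whisper has consumed the audio,
// so nothing is dropped and the pipeline position follows the transcription.
var fileSettings = new SpeechToTextSettings(modelPath)
{
    BackpressureWhenBusy = true,
};

Console.WriteLine($"Live: {liveSettings.BackpressureWhenBusy}, file: {fileSettings.BackpressureWhenBusy}");
```

For the file case, the sink should not be clock-synced (for example NullRendererBlock { IsSync = false }, as noted above), otherwise the real-time clock limits the speed.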

DeviceId

Gets or sets the hardware device id used when a GPU provider is selected. Defaults to 0.

public int DeviceId { get; set; }

Property Value

int

EmitInterim

Reserved for a future interim-hypothesis capability and currently has no effect. The block emits only final segments regardless of this value — interim (non-final) results are not produced yet. Defaults to false.

public bool EmitInterim { get; set; }

Property Value

bool

EnableVad

Gets or sets a value indicating whether Silero VAD segments speech before transcription. Defaults to true. When disabled, audio is transcribed in fixed windows, which is cheaper to wire up but is prone to hallucinating text during silence.

public bool EnableVad { get; set; }

Property Value

bool

FixedWindowSeconds

Gets or sets the fixed transcription window length, in seconds, used when VisioForge.Core.Types.X.AI.SpeechToTextSettings.EnableVad is false. Ignored when VAD is enabled, because the VAD then supplies the segment boundaries. Defaults to 5 and is clamped to the range 1–30 s so that a single transcription (and the time a stop waits for it to finish) stays bounded.

public int FixedWindowSeconds { get; set; }

Property Value

int
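A sketch of fixed-window transcription with VAD turned off, assuming a hypothetical model path. The window value is chosen inside the documented 1–30 s range; this example makes no assumption about where the SDK applies the clamp.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Hypothetical absolute path to a downloaded GGML model.
var settings = new SpeechToTextSettings("/models/ggml-base.bin")
{
    EnableVad = false,        // no Silero VAD: audio is cut into fixed windows
    FixedWindowSeconds = 10,  // only used because EnableVad is false
};

Console.WriteLine($"VAD: {settings.EnableVad}, window: {settings.FixedWindowSeconds} s");
```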

Language

Gets or sets the spoken language as an ISO 639-1 code (for example, "en", "es", "fr"), or "auto" to let Whisper detect it. Defaults to "auto".

public string Language { get; set; }

Property Value

string

ModelSize

Gets or sets the Whisper model variant that VisioForge.Core.Types.X.AI.SpeechToTextSettings.WhisperModelPath corresponds to. Informational only: it lets an application label or choose a download, while the file actually loaded is always the one at VisioForge.Core.Types.X.AI.SpeechToTextSettings.WhisperModelPath. Defaults to VisioForge.Core.Types.X.AI.WhisperModelSize.Base.

public WhisperModelSize ModelSize { get; set; }

Property Value

WhisperModelSize

OutputSrtPath

Gets or sets an optional path to a side-car .srt subtitle file the block writes as final segments are recognized. null (the default) disables SRT output.

public string OutputSrtPath { get; set; }

Property Value

string

OutputVttPath

Gets or sets an optional path to a side-car .vtt (WebVTT) subtitle file the block writes as final segments are recognized. null (the default) disables VTT output.

public string OutputVttPath { get; set; }

Property Value

string
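Both side-car outputs can be enabled together. The sketch below assumes hypothetical model and output paths; leaving either property null keeps that format disabled.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// All paths here are hypothetical placeholders.
var settings = new SpeechToTextSettings("/models/ggml-base.bin")
{
    OutputSrtPath = "/output/captions.srt",  // SubRip side-car file
    OutputVttPath = "/output/captions.vtt",  // WebVTT side-car file
};

Console.WriteLine($"SRT: {settings.OutputSrtPath}, VTT: {settings.OutputVttPath}");
```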

Provider

Gets or sets the execution provider for Whisper. Only VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CPU and VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CUDA are meaningful for the GGML backend (it does not use DirectML); any other value, including VisioForge.Core.Types.X.AI.OnnxExecutionProvider.Auto, selects the best available runtime (CUDA when present, otherwise CPU). Defaults to VisioForge.Core.Types.X.AI.OnnxExecutionProvider.Auto.

public OnnxExecutionProvider Provider { get; set; }

Property Value

OnnxExecutionProvider
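A sketch of pinning Whisper to a specific CUDA device, assuming a hypothetical model path and that a CUDA-capable runtime is installed on the machine.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Hypothetical absolute path to a downloaded GGML model.
var settings = new SpeechToTextSettings("/models/ggml-base.bin")
{
    Provider = OnnxExecutionProvider.CUDA,  // request the CUDA runtime explicitly
    DeviceId = 0,                           // first GPU; the documented default
};

Console.WriteLine($"Provider: {settings.Provider}, device: {settings.DeviceId}");
```

Leaving Provider at OnnxExecutionProvider.Auto is the simpler choice when the same build has to run on machines with and without a GPU.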

Task

Gets or sets the task: transcribe in the source language or translate to English. Defaults to VisioForge.Core.Types.X.AI.SpeechToTextTask.Transcribe.

public SpeechToTextTask Task { get; set; }

Property Value

SpeechToTextTask

Threads

Gets or sets the number of CPU threads Whisper uses. 0 (the default) lets Whisper.net choose based on the available processor count.

public int Threads { get; set; }

Property Value

int

Vad

Gets or sets the Silero VAD settings used when VisioForge.Core.Types.X.AI.SpeechToTextSettings.EnableVad is true.

public SileroVadSettings Vad { get; set; }

Property Value

SileroVadSettings

WhisperModelPath

Gets or sets the absolute path to the Whisper GGML model file (ggml-*.bin). Required.

public string WhisperModelPath { get; set; }

Property Value

string