Class SpeechToTextSettings
Namespace: VisioForge.Core.Types.X.AI
Assembly: VisioForge.Core.dll
Settings for the speech-to-text block (Whisper ASR + optional Silero VAD).
public class SpeechToTextSettings
Remarks
Unlike the vision AI settings, this type does NOT derive from VisioForge.Core.Types.X.AI.OnnxInferenceSettings: Whisper runs through Whisper.net (whisper.cpp / GGML), not ONNX Runtime, so the ONNX-specific input-size/normalization knobs do not apply. The Whisper GGML weights and the Silero VAD model are downloaded at runtime — neither is shipped in the SDK NuGet packages.
Constructors
SpeechToTextSettings()
Initializes a new instance of the VisioForge.Core.Types.X.AI.SpeechToTextSettings class.
public SpeechToTextSettings()
SpeechToTextSettings(string)
Initializes a new instance of the VisioForge.Core.Types.X.AI.SpeechToTextSettings class with a Whisper model path.
public SpeechToTextSettings(string whisperModelPath)
Parameters
whisperModelPath string
The absolute path to the Whisper GGML model file (ggml-*.bin).
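A minimal sketch of the path-taking constructor. Because the SDK does not ship the model weights, the application has to put the file on disk first. The `models` folder and the `ggml-base.bin` file name below are assumptions for illustration, not SDK defaults.

```csharp
using System;
using System.IO;
using VisioForge.Core.Types.X.AI;

// Hypothetical location: the application is assumed to have downloaded
// the model here beforehand. The SDK does not create this file.
string modelPath = Path.Combine(AppContext.BaseDirectory, "models", "ggml-base.bin");

// AppContext.BaseDirectory is absolute, so the combined path is absolute too,
// which is what the constructor expects.
if (!File.Exists(modelPath))
{
    throw new FileNotFoundException("Whisper GGML model not found.", modelPath);
}

var settings = new SpeechToTextSettings(modelPath);
Console.WriteLine($"Using model: {settings.WhisperModelPath}");
```

Checking for the file up front is a design choice in this sketch; the source does not say when the block itself validates the path.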
Properties
BackpressureWhenBusy
Gets or sets a value indicating whether the audio tap applies BACKPRESSURE to the upstream
source when transcription cannot keep up, instead of dropping the oldest audio. Defaults to
false.
When false (the default, intended for LIVE captioning) the internal audio ring
overwrites the oldest samples on overflow, so a slow transcriber never stalls a live source —
at the cost of dropping audio.
When true (intended for FILE transcription) there is no ring and no background worker:
the tap segments and transcribes synchronously on the streaming thread, so the tap BLOCKS until
Whisper has consumed the audio, pacing the source to exactly the transcription throughput.
Nothing is dropped (lossless), the pipeline runs as fast as Whisper can transcribe — no faster,
no slower — and the pipeline position tracks the transcription frontier (useful for a progress
bar). Do NOT enable this for a live capture source: a live device cannot slow down to absorb the
backpressure. Pair with a non-synced sink (for example NullRendererBlock { IsSync = false })
so no real-time clock caps the speed.
public bool BackpressureWhenBusy { get; set; }
Property Value
bool
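The two modes described above can be sketched as two settings objects, one for offline file transcription and one for live captioning. The model path is a placeholder, and the variable names are illustrative only.

```csharp
using VisioForge.Core.Types.X.AI;

// Placeholder path: point this at a downloaded ggml-*.bin file.
string modelPath = @"C:\models\ggml-base.bin";

// File transcription: the tap blocks until Whisper has consumed the audio,
// so no audio is dropped and the source is paced by the transcriber.
var fileSettings = new SpeechToTextSettings(modelPath)
{
    BackpressureWhenBusy = true,
};

// Live captioning: keep the default (false) so a slow transcriber
// never stalls the capture device; the oldest audio is dropped on overflow.
var liveSettings = new SpeechToTextSettings(modelPath);
```

For the file case, the pipeline should also end in a non-synced sink (the source suggests NullRendererBlock { IsSync = false }); that block is not constructed here because its namespace is not given in this page.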
DeviceId
Gets or sets the hardware device id used when a GPU provider is selected. Defaults to 0.
public int DeviceId { get; set; }
Property Value
int
EmitInterim
Reserved for a future interim-hypothesis capability and currently has no effect. The block emits only
final segments regardless of this value — interim (non-final) results are not produced yet. Defaults to
false.
public bool EmitInterim { get; set; }
Property Value
bool
EnableVad
Gets or sets a value indicating whether Silero VAD segments speech before transcription. Defaults to
true. When disabled, audio is transcribed in fixed windows, which is cheaper to wire up but is
prone to hallucinating text during silence.
public bool EnableVad { get; set; }
Property Value
bool
FixedWindowSeconds
Gets or sets the fixed transcription window length, in seconds, used when VisioForge.Core.Types.X.AI.SpeechToTextSettings.EnableVad is
false. Ignored when VAD is enabled (segment boundaries come from the VAD then). Defaults to 5
and is clamped to the range 1–30 s so one transcription (and the time a stop waits for it to
finish) stays bounded.
public int FixedWindowSeconds { get; set; }
Property Value
int
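As a sketch of the fixed-window mode, the following turns VAD off and picks a 10-second window. The model path and the window length are illustrative choices, not recommendations from the source.

```csharp
using VisioForge.Core.Types.X.AI;

// Placeholder path: point this at a downloaded ggml-*.bin file.
var settings = new SpeechToTextSettings(@"C:\models\ggml-base.bin")
{
    // Skip Silero VAD; audio is cut into fixed-length windows instead.
    EnableVad = false,

    // Only read when EnableVad is false. Values outside 1..30 are clamped.
    FixedWindowSeconds = 10,
};
```

Longer windows give Whisper more context per call but also lengthen the time a stop may wait for the in-flight transcription, which is the reason the source gives for the 30-second cap.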
Language
Gets or sets the spoken language as an ISO 639-1 code (for example, "en", "es", "fr"), or "auto" to let Whisper detect it. Defaults to "auto".
public string Language { get; set; }
Property Value
string
ModelSize
Gets or sets the Whisper model variant this path corresponds to. Informational only — it lets an application label/choose a download; the file actually loaded is VisioForge.Core.Types.X.AI.SpeechToTextSettings.WhisperModelPath. Defaults to VisioForge.Core.Types.X.AI.WhisperModelSize.Base.
public WhisperModelSize ModelSize { get; set; }
Property Value
WhisperModelSize
OutputSrtPath
Gets or sets an optional path to a side-car .srt subtitle file the block writes as final
segments are recognized. null (the default) disables SRT output.
public string OutputSrtPath { get; set; }
Property Value
string
OutputVttPath
Gets or sets an optional path to a side-car .vtt (WebVTT) subtitle file the block writes as
final segments are recognized. null (the default) disables VTT output.
public string OutputVttPath { get; set; }
Property Value
string
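One way to use the side-car outputs is to derive both subtitle paths from the media file name. This sketch assumes OutputSrtPath and OutputVttPath can be set at the same time, which the source does not state explicitly; the media and model paths are placeholders.

```csharp
using System.IO;
using VisioForge.Core.Types.X.AI;

// Placeholder paths for illustration.
string modelPath = @"C:\models\ggml-base.bin";
string mediaPath = @"C:\media\interview.mp4";

var settings = new SpeechToTextSettings(modelPath)
{
    // Path.ChangeExtension swaps ".mp4" for the subtitle extension,
    // so the subtitles land next to the media file.
    OutputSrtPath = Path.ChangeExtension(mediaPath, ".srt"),
    OutputVttPath = Path.ChangeExtension(mediaPath, ".vtt"),
};
```

Leaving either property at null keeps that format disabled.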
Provider
Gets or sets the execution provider for Whisper. Only VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CPU and VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CUDA are meaningful for the GGML backend (it does not use DirectML); any other value, including VisioForge.Core.Types.X.AI.OnnxExecutionProvider.Auto, allows the best available runtime (CUDA when present, else CPU). Defaults to VisioForge.Core.Types.X.AI.OnnxExecutionProvider.Auto.
public OnnxExecutionProvider Provider { get; set; }
Property Value
OnnxExecutionProvider
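A short sketch of pinning Whisper to a specific GPU by combining Provider with DeviceId. The model path is a placeholder, and device 0 is simply the documented default written out explicitly.

```csharp
using VisioForge.Core.Types.X.AI;

// Placeholder path: point this at a downloaded ggml-*.bin file.
var settings = new SpeechToTextSettings(@"C:\models\ggml-base.bin")
{
    // Request the CUDA runtime explicitly instead of relying on Auto.
    Provider = OnnxExecutionProvider.CUDA,

    // Index of the GPU to use; only relevant when a GPU provider is selected.
    DeviceId = 0,
};
```

Leaving Provider at its default (Auto) is usually the simpler choice when the same build has to run on machines with and without CUDA.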
Task
Gets or sets the task: transcribe in the source language or translate to English. Defaults to VisioForge.Core.Types.X.AI.SpeechToTextTask.Transcribe.
public SpeechToTextTask Task { get; set; }
Property Value
SpeechToTextTask
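Language and Task are typically set together. The sketch below fixes the language to Spanish and keeps the output in that language. The model path is a placeholder; only the Transcribe member is named in this page, so the translate member is not shown.

```csharp
using VisioForge.Core.Types.X.AI;

// Placeholder path: point this at a downloaded ggml-*.bin file.
var settings = new SpeechToTextSettings(@"C:\models\ggml-base.bin")
{
    // ISO 639-1 code; use "auto" (the default) to let Whisper detect the language.
    Language = "es",

    // Keep the text in the spoken language rather than translating to English.
    Task = SpeechToTextTask.Transcribe,
};
```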
Threads
Gets or sets the number of CPU threads Whisper uses. 0 (the default) lets Whisper.net choose based on the available processor count.
public int Threads { get; set; }
Property Value
int
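If Whisper should leave CPU headroom for the rest of the pipeline, the thread count can be capped explicitly. The cap of 4 in this sketch is an arbitrary illustration, not a value suggested by the source, and the model path is a placeholder.

```csharp
using System;
using VisioForge.Core.Types.X.AI;

// Placeholder path: point this at a downloaded ggml-*.bin file.
var settings = new SpeechToTextSettings(@"C:\models\ggml-base.bin")
{
    // Run on the CPU and use at most 4 threads, or fewer on smaller machines.
    Provider = OnnxExecutionProvider.CPU,
    Threads = Math.Min(4, Environment.ProcessorCount),
};
```

Setting Threads back to 0 returns the choice to Whisper.net.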
Vad
Gets or sets the Silero VAD settings used when VisioForge.Core.Types.X.AI.SpeechToTextSettings.EnableVad is true.
public SileroVadSettings Vad { get; set; }
Property Value
SileroVadSettings
WhisperModelPath
Gets or sets the absolute path to the Whisper GGML model file (ggml-*.bin). Required.
public string WhisperModelPath { get; set; }