Class SileroVadSettings

Namespace
VisioForge.Core.Types.X.AI
Assembly
VisioForge.Core.dll

Settings for the Silero neural voice-activity detector that segments speech inside the speech-to-text block.

public class SileroVadSettings

Remarks

Silero VAD is a small (~2 MB), MIT-licensed ONNX model that classifies each short audio window as speech or non-speech. Used as a real-time pre-filter, it lets the block run the much heavier Whisper model only on actual speech, which both cuts inference cost and avoids Whisper's tendency to hallucinate text on silence. The model file (silero_vad.onnx) is downloaded at runtime; it is not shipped in the SDK NuGet packages.
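A minimal configuration sketch, using only the properties documented on this page. It assumes the class has a public parameterless constructor (no constructors are listed here), and the model path is a hypothetical location. How the settings object is then handed to the speech-to-text block is not covered on this page, so it is left out.

```csharp
using VisioForge.Core.Types.X.AI;

// Assumption: SileroVadSettings exposes a public parameterless constructor.
var vad = new SileroVadSettings
{
    // Hypothetical absolute path; use wherever silero_vad.onnx was downloaded.
    ModelPath = "/opt/models/silero_vad.onnx",
    Provider = OnnxExecutionProvider.CPU, // documented default
    SpeechThreshold = 0.5f,               // documented default
    MinSpeechMs = 250,                    // documented default
    MinSilenceMs = 100,                   // documented default
    SpeechPadMs = 30,                     // documented default
    MaxSpeechMs = 15000                   // documented default
};
```

Every value above repeats a documented default, so in practice only ModelPath would need to be set explicitly.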

Properties

DeviceId

Gets or sets the hardware device id for the VAD session when a GPU provider is selected. Defaults to 0.

public int DeviceId { get; set; }

Property Value

int

MaxSpeechMs

Gets or sets the maximum speech-segment length, in milliseconds, before the segmenter force-cuts an ongoing run, so that a long monologue is transcribed incrementally rather than only after the speaker finally pauses. Defaults to 15000 ms (15 s). A non-positive value falls back to the default cap; the cut cannot be disabled. This keeps each transcription, and the time a stop request waits for it to finish, bounded even on a continuous-speech stream with no silence gaps.

public int MaxSpeechMs { get; set; }

Property Value

int
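The fallback rule and the effect of the force-cut can be sketched as follows. This is an illustration of the documented behavior, not the SDK's implementation: MaxSpeechSketch and its members are hypothetical names, and the piece count ignores MinSpeechMs and padding.

```csharp
using System;

public static class MaxSpeechSketch
{
    // Documented default cap, in milliseconds.
    public const int DefaultMaxSpeechMs = 15000;

    // A non-positive setting falls back to the default cap.
    public static int EffectiveMaxSpeechMs(int configured) =>
        configured > 0 ? configured : DefaultMaxSpeechMs;

    // Number of pieces an uninterrupted speech run of runMs is cut into
    // when every piece is limited to the effective cap.
    public static int PieceCount(int runMs, int configuredMaxSpeechMs)
    {
        if (runMs <= 0) return 0;
        long cap = EffectiveMaxSpeechMs(configuredMaxSpeechMs);
        return (int)((runMs + cap - 1) / cap); // ceiling division
    }
}
```

Under these assumptions, MaxSpeechSketch.PieceCount(40000, 0) evaluates to 3: a 40 s monologue with the cap left at its fallback value is transcribed as three pieces instead of one.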

MinSilenceMs

Gets or sets the minimum trailing silence, in milliseconds, that ends a speech segment. Defaults to 100 ms. Larger values merge short pauses into one segment; smaller values split more eagerly.

public int MinSilenceMs { get; set; }

Property Value

int

MinSpeechMs

Gets or sets the minimum duration, in milliseconds, that a detected speech run must reach to be emitted. Defaults to 250 ms, which discards spurious blips shorter than a quarter of a second.

public int MinSpeechMs { get; set; }

Property Value

int
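To show how MinSilenceMs and MinSpeechMs interact, here is a simplified segmenter over per-window speech decisions. It is a sketch under stated assumptions, not the SDK's algorithm: SegmenterSketch is a hypothetical name, the window duration is a caller-supplied parameter, and the sketch ignores SpeechPadMs, the retained silence tail, and MaxSpeechMs.

```csharp
using System;
using System.Collections.Generic;

public static class SegmenterSketch
{
    // Returns (StartMs, EndMs) extents of speech runs found in a sequence of
    // per-window speech/non-speech decisions.
    public static List<(int StartMs, int EndMs)> Segment(
        IReadOnlyList<bool> isSpeech, int windowMs, int minSilenceMs, int minSpeechMs)
    {
        var segments = new List<(int StartMs, int EndMs)>();
        int start = -1;        // start of the open run in ms, or -1 if none is open
        int lastSpeechEnd = 0; // end of the most recent speech window in ms

        for (int i = 0; i < isSpeech.Count; i++)
        {
            int windowStart = i * windowMs;
            int windowEnd = windowStart + windowMs;

            if (isSpeech[i])
            {
                if (start < 0) start = windowStart; // open a new run
                lastSpeechEnd = windowEnd;          // short pauses are merged
            }
            else if (start >= 0 && windowEnd - lastSpeechEnd >= minSilenceMs)
            {
                // Enough trailing silence: close the run, keep it only if long enough.
                if (lastSpeechEnd - start >= minSpeechMs)
                    segments.Add((start, lastSpeechEnd));
                start = -1;
            }
        }

        // Close a run that is still open when the input ends.
        if (start >= 0 && lastSpeechEnd - start >= minSpeechMs)
            segments.Add((start, lastSpeechEnd));

        return segments;
    }
}
```

With an assumed 32 ms window and the documented defaults (100 ms, 250 ms), a 64 ms pause inside a run is bridged, while an isolated 96 ms burst is dropped because it never reaches MinSpeechMs.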

ModelPath

Gets or sets the absolute path to the Silero VAD ONNX model (silero_vad.onnx).

public string ModelPath { get; set; }

Property Value

string

Provider

Gets or sets the execution provider for the VAD ONNX session. Defaults to OnnxExecutionProvider.CPU. The model is tiny (about 1 ms per window on CPU), so a GPU provider adds latency without benefit.

public OnnxExecutionProvider Provider { get; set; }

Property Value

OnnxExecutionProvider

SpeechPadMs

Gets or sets the onset padding, in milliseconds, prepended to each detected speech segment so the start of speech is not clipped. Defaults to 30 ms. Trailing context comes from the MinSilenceMs tail that is retained before a segment is closed.

public int SpeechPadMs { get; set; }

Property Value

int
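The onset padding amounts to moving a segment's start earlier, without running past the beginning of the stream. The helper below is a hypothetical illustration (SpeechPadSketch is not an SDK type); the sample rate passed to MsToSamples is the caller's assumption, since this page does not state one.

```csharp
using System;

public static class SpeechPadSketch
{
    // Start of the emitted segment after prepending the onset padding,
    // clamped so it never precedes the start of the stream.
    public static int PaddedStartMs(int detectedStartMs, int speechPadMs) =>
        Math.Max(0, detectedStartMs - speechPadMs);

    // Converts a duration in milliseconds to a sample count at the given rate.
    public static int MsToSamples(int ms, int sampleRateHz) =>
        (int)((long)ms * sampleRateHz / 1000);
}
```

For example, with the default 30 ms padding a run detected at 1000 ms is emitted from 970 ms, and a run detected at 10 ms is emitted from 0 ms.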

SpeechThreshold

Gets or sets the speech-probability threshold (0..1) above which a window is considered speech. Defaults to 0.5. Raise it in noisy environments to reduce false triggers.

public float SpeechThreshold { get; set; }

Property Value

float
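A small sketch of how the threshold turns per-window probabilities into speech decisions. ThresholdSketch is a hypothetical helper, and the strict "greater than" comparison is an assumption taken from the wording "above which"; the SDK may treat the boundary value differently.

```csharp
using System;
using System.Collections.Generic;

public static class ThresholdSketch
{
    // Marks a window as speech when its probability is strictly above the threshold.
    public static bool[] Classify(IReadOnlyList<float> probabilities, float threshold)
    {
        var result = new bool[probabilities.Count];
        for (int i = 0; i < probabilities.Count; i++)
        {
            result[i] = probabilities[i] > threshold;
        }
        return result;
    }
}
```

Raising the threshold from 0.5 to 0.7 in this sketch turns a window scored 0.6 from speech into non-speech, which is the mechanism behind the advice for noisy environments.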