Class SileroVadSettings
- Namespace: VisioForge.Core.Types.X.AI
- Assembly: VisioForge.Core.dll
Settings for the Silero neural voice-activity detector that segments speech inside the speech-to-text block.
public class SileroVadSettings
Inheritance
Inherited Members
Remarks
Silero VAD is a tiny (~2 MB, MIT) ONNX model that classifies each short audio window as speech or
non-speech. Used as a real-time pre-filter, it lets the block run the (much heavier) Whisper model only
on actual speech, which both cuts inference cost and removes Whisper's tendency to hallucinate text on
silence. The model file (silero_vad.onnx) is downloaded at runtime; it is not shipped in the SDK
NuGet packages.
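A minimal configuration sketch. It assumes the class exposes a public parameterless constructor (not stated on this page) and that the model file has already been downloaded to a hypothetical local path. Only properties documented on this page are set. How the settings object is handed to the speech-to-text block is not covered here and is not shown.

```csharp
using VisioForge.Core.Types.X.AI;

// Assumption: SileroVadSettings has a public parameterless constructor.
var vad = new SileroVadSettings
{
    // Hypothetical absolute path; point it at wherever your app stored the download.
    ModelPath = @"C:\models\silero_vad.onnx",
    Provider = OnnxExecutionProvider.CPU, // documented default
    SpeechThreshold = 0.6f,               // stricter than the 0.5 default, e.g. for a noisy room
    MinSilenceMs = 300,                   // merge short pauses into one segment
    MinSpeechMs = 250,                    // documented default
    MaxSpeechMs = 15000,                  // documented default
    SpeechPadMs = 30                      // documented default
};
```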
Properties
DeviceId
Gets or sets the hardware device id for the VAD session when a GPU provider is selected. Defaults to 0.
public int DeviceId { get; set; }
Property Value
MaxSpeechMs
Gets or sets the maximum speech-segment length, in milliseconds, before the segmenter force-cuts an ongoing run, so that a long monologue is transcribed incrementally rather than only after the speaker finally pauses. Defaults to 15000 ms (15 s). A non-positive value falls back to the default cap; the cut cannot be disabled. This keeps each transcription, and the time a stop waits for it to finish, bounded even on a continuous-speech stream with no silence gaps.
public int MaxSpeechMs { get; set; }
Property Value
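The effect of the cap on an uninterrupted run can be sketched with simple arithmetic. The helper below is illustrative only and is not the SDK's segmenter. The names `MaxSpeechSketch`, `EffectiveCap`, and `SplitRun` are hypothetical. The sketch assumes cuts fall exactly at the cap and ignores padding and the minimum-speech filter.

```csharp
using System;
using System.Collections.Generic;

public static class MaxSpeechSketch
{
    public const int DefaultMaxSpeechMs = 15000; // documented default

    // A non-positive value falls back to the default cap.
    public static int EffectiveCap(int maxSpeechMs) =>
        maxSpeechMs > 0 ? maxSpeechMs : DefaultMaxSpeechMs;

    // Splits one uninterrupted speech run into chunk lengths no longer than the cap.
    public static List<int> SplitRun(int runMs, int maxSpeechMs)
    {
        int cap = EffectiveCap(maxSpeechMs);
        var chunks = new List<int>();
        for (int left = runMs; left > 0; left -= cap)
            chunks.Add(Math.Min(left, cap));
        return chunks;
    }
}
```

Under these assumptions, `MaxSpeechSketch.SplitRun(40000, 0)` yields chunks of 15000, 15000, and 10000 ms: a 40-second monologue is transcribed in three pieces instead of one.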
MinSilenceMs
Gets or sets the minimum trailing silence, in milliseconds, that ends a speech segment. Defaults to 100 ms. Larger values merge short pauses into one segment; smaller values split more eagerly.
public int MinSilenceMs { get; set; }
Property Value
MinSpeechMs
Gets or sets the minimum duration, in milliseconds, that a detected speech run must reach to be emitted. Defaults to 250 ms, which discards spurious sub-quarter-second blips.
public int MinSpeechMs { get; set; }
Property Value
ModelPath
Gets or sets the absolute path to the Silero VAD ONNX model (silero_vad.onnx).
public string ModelPath { get; set; }
Property Value
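Because the path must be absolute and the model is fetched at runtime, it can help to validate the path before assigning it. The helper below is a hypothetical sketch using only `System.IO`. It is not part of the SDK, and the download step itself is left to the application.

```csharp
using System;
using System.IO;

public static class ModelPathSketch
{
    // Returns a fully qualified path, or throws if the model file is not there yet.
    public static string ResolveModelPath(string path)
    {
        // Makes a relative path absolute against the current working directory.
        string full = Path.GetFullPath(path);
        if (!File.Exists(full))
            throw new FileNotFoundException(
                "Silero VAD model not found; download it first.", full);
        return full;
    }
}
```

The returned string is what would then be assigned to `ModelPath`.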
Provider
Gets or sets the execution provider for the VAD ONNX session. Defaults to VisioForge.Core.Types.X.AI.OnnxExecutionProvider.CPU. The model is tiny (about 1 ms per window on CPU), so a GPU provider adds latency without benefit.
public OnnxExecutionProvider Provider { get; set; }
Property Value
SpeechPadMs
Gets or sets the onset padding, in milliseconds, prepended to each detected speech segment so the start of speech is not clipped. Defaults to 30 ms. (Trailing context comes from the VisioForge.Core.Types.X.AI.SileroVadSettings.MinSilenceMs tail that is retained before a segment is closed.)
public int SpeechPadMs { get; set; }
Property Value
SpeechThreshold
Gets or sets the speech-probability threshold (0..1) above which a window is considered speech. Defaults to 0.5. Raise it in noisy environments to reduce false triggers.
public float SpeechThreshold { get; set; }
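To see how the threshold and the four timing properties interact, the following is a simplified segmenter over per-window speech probabilities. It is one reading of the property descriptions above, not the SDK's actual implementation. The class and method names, the `windowMs` parameter, and the exact order of the checks are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class VadSegmenterSketch
{
    // probs[i] is the speech probability of window i; windowMs is an assumed window length.
    public static List<(int StartMs, int EndMs)> Segment(
        float[] probs, int windowMs, float threshold,
        int minSilenceMs, int minSpeechMs, int maxSpeechMs, int padMs)
    {
        int cap = maxSpeechMs > 0 ? maxSpeechMs : 15000; // non-positive falls back to the default
        var segments = new List<(int StartMs, int EndMs)>();
        bool inSpeech = false;
        int startMs = 0, onsetMs = 0, lastSpeechEndMs = 0, silenceMs = 0;

        // Emits the current segment only if its speech part is long enough.
        void Emit(int endMs)
        {
            if (lastSpeechEndMs - onsetMs >= minSpeechMs)
                segments.Add((startMs, endMs));
        }

        for (int i = 0; i < probs.Length; i++)
        {
            int t = i * windowMs, end = t + windowMs;
            bool speech = probs[i] > threshold;

            if (!inSpeech)
            {
                if (!speech) continue;
                inSpeech = true;
                startMs = Math.Max(0, t - padMs); // onset padding
                onsetMs = t;
                silenceMs = 0;
            }

            if (speech) { lastSpeechEndMs = end; silenceMs = 0; }
            else silenceMs += windowMs;

            if (silenceMs >= minSilenceMs)
            {
                Emit(end);          // trailing silence is kept as the tail
                inSpeech = false;
            }
            else if (end - startMs >= cap)
            {
                Emit(end);          // force-cut a long run and keep going
                startMs = onsetMs = lastSpeechEndMs = end;
            }
        }

        if (inSpeech) Emit(probs.Length * windowMs); // flush at end of stream
        return segments;
    }
}
```

With 32 ms windows and the documented defaults, five silent windows followed by ten speech windows and five more silent ones produce a single segment from 130 ms to 608 ms in this sketch: the start is pulled back by the 30 ms pad, and the segment closes once accumulated silence reaches 100 ms. A three-window (96 ms) burst is dropped because it is shorter than `MinSpeechMs`.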