vad

Copyright 2014-2015 Anthony Larcher and Sylvain Meignier

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.vad.label_fusion(label, win=3)

Apply morphological filtering to the labels to remove isolated labels. If the input is a two-channel label (a 2D ndarray of booleans whose two rows have the same length), the labels of the two channels are fused to remove overlapping segments of speech.

Parameters:
  • label – input labels given as a 1D or 2D ndarray
  • win – parameter of the morphological filters
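The fusion and filtering steps can be pictured with the following NumPy-only sketch. It is not SIDEKIT's implementation: the choice of a closing followed by an opening, the edge padding at the borders, the requirement that `win` be odd, and the helper name `fuse_and_smooth_labels` are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _sliding(x, win, reduce):
    # Apply np.min or np.max over a centred window of odd width `win`,
    # repeating edge values so the borders are not eroded artificially.
    half = win // 2
    padded = np.pad(x, half, mode="edge")
    return reduce(sliding_window_view(padded, win), axis=1)


def fuse_and_smooth_labels(label, win=3):
    """Hypothetical sketch of two-channel fusion plus morphological smoothing."""
    if win < 1 or win % 2 == 0:
        raise ValueError("win must be a positive odd integer")
    label = np.asarray(label, dtype=bool)
    two_channel = label.ndim == 2
    if two_channel:
        if label.shape[0] != 2:
            raise ValueError("2D input must have exactly two rows")
        # Frames marked as speech on both channels are treated as overlap and dropped
        overlap = label[0] & label[1]
        label = label & ~overlap
    else:
        label = label[np.newaxis, :]

    smoothed = []
    for lbl in label:
        # Closing (dilate, then erode) fills short gaps inside speech
        closed = _sliding(_sliding(lbl, win, np.max), win, np.min)
        # Opening (erode, then dilate) removes short isolated speech bursts
        opened = _sliding(_sliding(closed, win, np.min), win, np.max)
        smoothed.append(opened)
    smoothed = np.array(smoothed)
    return smoothed if two_channel else smoothed[0]
```

With `win=3`, a one-frame gap inside a speech run is filled and a one-frame isolated burst is removed; larger odd windows remove longer gaps and bursts.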
frontend.vad.pre_emphasis(input, pre)

Pre-emphasis of an audio signal.

Parameters:
  • input – the input audio signal
  • pre – value that defines the pre-emphasis filter
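A first-order pre-emphasis filter is usually written y[n] = x[n] - pre * x[n-1]; it boosts high frequencies relative to low ones. The sketch below applies that formula with NumPy. Passing the first sample through unchanged is an assumption, and `pre_emphasize` is a hypothetical name, not the library function.

```python
import numpy as np


def pre_emphasize(signal, pre):
    """Hypothetical sketch: y[n] = x[n] - pre * x[n-1]."""
    x = np.asarray(signal, dtype=float)
    y = x.copy()
    # The first sample has no predecessor and is kept as-is (an assumption)
    y[1:] = x[1:] - pre * x[:-1]
    return y
```

Values of `pre` around 0.95 to 0.97 are common in speech front ends.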
frontend.vad.segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0)

Generate a new array that chops the given array along the given axis into overlapping frames.

This method was implemented by Anne Archibald as part of the talkbox toolkit. Example:

    >>> segment_axis(arange(10), 4, 2)
    array([[0, 1, 2, 3],
           [2, 3, 4, 5],
           [4, 5, 6, 7],
           [6, 7, 8, 9]])
Parameters:
  • a – the array to segment
  • length – the length of each frame
  • overlap – the number of array elements by which the frames should overlap
  • axis – the axis to operate on; if None, act on the flattened array
  • end – what to do with the last frame if the array is not evenly divisible into pieces. Options are:
      - ‘cut’: simply discard the extra values
      - ‘wrap’: copy values from the beginning of the array
      - ‘pad’: pad with a constant value
  • endvalue – the value to use for end=‘pad’
Returns:

an ndarray

The array is not copied unless necessary (either because it is unevenly strided and being flattened or because end is set to ‘pad’ or ‘wrap’).
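For the 1D (flattened) case, the framing and the three `end` modes can be reproduced with NumPy's `sliding_window_view` (available since NumPy 1.20). This is an illustrative sketch, not Anne Archibald's implementation: it ignores the `axis` argument, uses `np.resize` for ‘wrap’ (which repeats the array cyclically), and the name `segment_1d` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_1d(a, length, overlap=0, end="cut", endvalue=0):
    """Hypothetical 1D sketch of segment_axis semantics (not the original code)."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    a = np.asarray(a).ravel()
    step = length - overlap
    n = a.shape[0]
    # rounddown: samples covered by whole frames; roundup: samples needed for one more frame
    if n < length:
        rounddown, roundup = 0, length
    else:
        rounddown = length + ((n - length) // step) * step
        roundup = rounddown if rounddown == n else rounddown + step

    if end == "cut":
        a = a[:rounddown]                      # discard trailing samples (a view, no copy)
    elif end == "pad":
        tail = np.full(roundup - n, endvalue, dtype=a.dtype)
        a = np.concatenate([a, tail])          # pad with a constant value (copies)
    elif end == "wrap":
        a = np.resize(a, roundup)              # reuse values from the start (copies)
    else:
        raise ValueError("end must be 'cut', 'pad' or 'wrap'")

    if a.shape[0] < length:
        return np.empty((0, length), dtype=a.dtype)
    # One row per frame; consecutive frames start `step` samples apart
    return sliding_window_view(a, length)[::step]
```

As in the documented function, ‘cut’ returns a view of the input, while ‘pad’ and ‘wrap’ must build a new array.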

frontend.vad.speech_enhancement(X, Gain, Noise_floor, Fs, Ascale, NN)

This function only processes a single file whose segments are separated by silence sections. If a silence section is detected, a counter of the number of buffers is set and pre-processing is required.

Usage: SpeechEnhance(wavefilename, Gain, Noise_floor)

Parameters:
  • X – input audio signal
  • Gain – default value is 0.9; suggested range is 0.6 to 1.4. Higher values mean more subtraction, i.e. stronger noise reduction
  • Noise_floor – default value is 0.02; suggested range is 0.001 to 0.2
  • Fs – sampling frequency of the input signal
  • Ascale – 1 to add noise, 0 not to add noise
  • NN
Returns:

a 1-dimensional array of booleans that is True for high-energy frames.

Copyright 2014 Sun Han Wu and Anthony Larcher
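The parameter descriptions above (a gain that controls how much is subtracted, and a noise floor) suggest a magnitude spectral-subtraction scheme. The sketch below shows that generic technique; it is not the algorithm implemented in frontend.vad.speech_enhancement. The frame size, the 50% overlap, the estimation of the noise spectrum from the first frames, and the names `spectral_subtraction`, `frame` and `noise_frames` are all assumptions.

```python
import numpy as np


def spectral_subtraction(x, gain=0.9, noise_floor=0.02, frame=256, noise_frames=10):
    """Hypothetical sketch of magnitude spectral subtraction with a spectral floor.

    Assumes the first `noise_frames` frames of `x` contain noise only.
    """
    if frame < 2 or frame % 2:
        raise ValueError("frame must be an even integer >= 2")
    x = np.asarray(x, dtype=float)
    hop = frame // 2
    if len(x) < frame:
        return x.copy()  # too short to frame
    n_frames = 1 + (len(x) - frame) // hop

    # Periodic Hann window: copies shifted by half a frame sum to 1
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)
    starts = hop * np.arange(n_frames)
    frames = x[starts[:, None] + np.arange(frame)] * window
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Average noise magnitude spectrum, estimated from the leading frames
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract gain * noise, but never go below noise_floor * noise
    clean_mag = np.maximum(mag - gain * noise_mag, noise_floor * noise_mag)

    # Resynthesise with the noisy phase and overlap-add the frames
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame, axis=1)
    y = np.zeros_like(x)
    for start, frame_sig in zip(starts, clean):
        y[start:start + frame] += frame_sig
    return y
```

A larger gain removes more noise but tends to produce "musical noise" artefacts; the floor limits how far each frequency bin can be pulled down.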

frontend.vad.vad_snr(sig, snr, fs=16000, shift=0.01, nwin=256)

Select high-energy frames based on the signal-to-noise ratio (SNR) of the signal.

Parameters:
  • sig – the input audio signal
  • fs – sampling frequency of the input signal in Hz. Default is 16000.
  • shift – shift between two frames in seconds. Default is 0.01.
  • nwin – number of samples of the sliding window. Default is 256.
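A minimal energy-based sketch of SNR-driven frame selection follows. Because the snr parameter is not described above, the sketch assumes it is a margin in dB: a frame is kept when its level lies within snr dB of the loudest frame. Measuring frame level as the standard deviation of the samples and the name `energy_vad` are also assumptions; SIDEKIT's own function may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def energy_vad(sig, snr, fs=16000, shift=0.01, nwin=256):
    """Hypothetical sketch: True for frames within `snr` dB of the loudest frame."""
    sig = np.asarray(sig, dtype=float)
    hop = int(round(shift * fs))            # frame shift in samples (160 at 16 kHz)
    if len(sig) < nwin:
        return np.zeros(0, dtype=bool)      # not enough samples for a single frame
    frames = sliding_window_view(sig, nwin)[::hop]
    level_db = 20 * np.log10(frames.std(axis=1) + 1e-12)  # epsilon avoids log(0)
    return level_db > level_db.max() - snr
```

With the defaults, frames are 256 samples long and start every 160 samples, so one second of 16 kHz audio yields 99 labels.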
