FeaturesServer

class features_server.FeaturesServer(input_dir='./', input_file_extension='.sph', label_dir='./', label_file_extension='.lbl', from_file='audio', config='sid_8k', single_channel_extension=[''], double_channel_extension=['_a', '_b'], sampling_frequency=None, lower_frequency=None, higher_frequency=None, linear_filters=None, log_filters=None, window_size=None, shift=None, ceps_number=None, snr=None, vad=None, feat_norm=None, log_e=None, delta=None, double_delta=None, rasta=None, keep_all_features=None)

A class for acoustic feature management. FeaturesServer should be used to extract acoustic features (MFCC or LFCC) from audio files in SPHERE, WAV or RAW PCM format. It can also be used to read and write acoustic features from and to disk in SPRO4 or HTK format.

Attr input_dir: directory from which audio or feature files are loaded
Attr input_file_extension: extension of the input files
Attr label_dir: directory where label files are read and written
Attr label_file_extension: extension of the label files to read and write
Attr from_file: format of the input files to read, one of 'audio', 'spro4' or 'htk'; for audio files, the format is given by the file extension
Attr config: pre-defined configuration for speaker diarization or speaker recognition at 8 or 16 kHz. Default is 'sid_8k' (speaker recognition at 8 kHz)
Attr single_channel_extension: list with a single extension to append to the audio filename when processing a single-channel file. Default is [''], meaning the feature file has the same name as the audio file
Attr double_channel_extension: list of two extensions to append to the audio filename when processing two-channel files. Default is ['_a', '_b']
Attr sampling_frequency: sampling frequency in Hz. Default is None, in which case it is determined when reading the audio file
Attr lower_frequency: lower frequency limit of the filter bank
Attr higher_frequency: higher frequency limit of the filter bank
Attr linear_filters: number of linear filters to use for LFCC extraction
Attr log_filters: number of log-spaced filters to use for MFCC extraction
Attr window_size: size of the sliding window in seconds
Attr shift: time shift between two consecutive feature vectors
Attr ceps_number: number of cepstral coefficients to extract
Attr snr: SNR level to consider for SNR-based voice activity detection
Attr vad: type of voice activity detection to use, one of 'snr', 'energy' (using a three-Gaussian detector) or 'label' (reading the information from pre-computed label files)
Attr feat_norm: normalization of the acoustic features, one of 'cms' for cepstral mean subtraction, 'mvn' for mean and variance normalization or 'stg' for short-term Gaussianization
Attr log_e: boolean, keep the log energy
Attr delta: boolean, add the first derivative of the cepstral coefficients
Attr double_delta: boolean, add the second derivative of the cepstral coefficients
Attr rasta: boolean, perform RASTA filtering
Attr keep_all_features: boolean; if False, only features labeled as "speech" by the VAD are saved; if True, all features are saved and a label file is produced
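
The 'cms' and 'mvn' values of feat_norm can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the function name normalize and the list-of-lists frame layout are assumptions made for illustration, statistics are computed over the whole input rather than over selected speech frames, and 'stg' is deliberately left out.

```python
from statistics import mean, pstdev


def normalize(frames, feat_norm):
    """Normalize feature vectors coefficient by coefficient (illustrative only).

    'cms' subtracts the per-coefficient mean; 'mvn' additionally divides by
    the per-coefficient standard deviation.
    """
    columns = list(zip(*frames))  # one tuple per cepstral coefficient
    means = [mean(col) for col in columns]
    if feat_norm == 'cms':
        scales = [1.0] * len(columns)
    elif feat_norm == 'mvn':
        # Fall back to 1.0 for constant coefficients to avoid dividing by zero.
        scales = [pstdev(col) or 1.0 for col in columns]
    else:
        raise ValueError('unsupported feat_norm in this sketch: %r' % feat_norm)
    return [[(x - m) / s for x, m, s in zip(frame, means, scales)]
            for frame in frames]


# Two frames with two coefficients each (made-up values).
frames = [[1.0, 10.0], [3.0, 14.0]]
cms = normalize(frames, 'cms')
mvn = normalize(frames, 'mvn')
```

After 'cms' each coefficient has zero mean across frames; after 'mvn' it also has unit variance.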
load(show)

Load a cep array from an audio or MFCC file. This method loads all channels available in the file.

Parameters: show – the name of the show to load
Returns: the cep array and the label array
load_and_stack(fileList, numThread=1)

Load a list of feature files and stack them into a single ndarray. The list of files to load is split into sublists that are processed in parallel.

Parameters:
  • fileList – a list of files to load
  • numThread – number of threads (optional, default is 1)
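
The split-then-stack pattern described for load_and_stack can be sketched as follows. This is a hedged illustration using only the standard library, not the actual implementation: fake_load is a hypothetical stand-in for reading one feature file, and plain nested lists stand in for the ndarray.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_load(name):
    """Hypothetical loader: pretend each file holds two 3-dimensional frames."""
    return [[len(name), 0, 0], [len(name), 1, 1]]


def load_sublist(names):
    """Load every file of one sublist and concatenate the frames."""
    rows = []
    for name in names:
        rows.extend(fake_load(name))
    return rows


def load_and_stack_sketch(file_list, num_thread=1):
    # Split the list into contiguous sublists, at most one per thread.
    chunk = max(1, -(-len(file_list) // num_thread))  # ceiling division
    sublists = [file_list[i:i + chunk] for i in range(0, len(file_list), chunk)]
    # pool.map returns results in submission order, so frame order is kept.
    with ThreadPoolExecutor(max_workers=num_thread) as pool:
        parts = list(pool.map(load_sublist, sublists))
    # Stack all frames into a single two-dimensional structure.
    return [row for part in parts for row in part]


stacked = load_and_stack_sketch(['a', 'bb', 'ccc'], num_thread=2)
```

Contiguous sublists are used here so that the stacked result does not depend on the number of threads; whether the library guarantees the same ordering is not stated in this page.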
save(show, filename, mfcc_format, and_label=True)

Save the cep array to a file.

Parameters:
  • show – the name of the show to save (loaded if needed)
  • filename – the name of the MFCC file, or a list of two filenames for double-channel files
  • mfcc_format – format of the MFCC file, one of 'pickle', 'spro4' or 'htk'
Raises: Exception if the feature format is unknown
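
The format check described for save can be sketched as below. This is an assumption-laden illustration, not the library code: save_sketch and SUPPORTED_FORMATS are hypothetical names, only the 'pickle' branch is written out, and the SPRO4 and HTK writers are left unimplemented.

```python
import os
import pickle
import tempfile

# Format names taken from the parameter description above.
SUPPORTED_FORMATS = ('pickle', 'spro4', 'htk')


def save_sketch(cep, filename, mfcc_format):
    """Write cep to filename, rejecting unknown formats (illustrative only)."""
    if mfcc_format not in SUPPORTED_FORMATS:
        raise Exception('unknown feature format: %r' % mfcc_format)
    if mfcc_format != 'pickle':
        # The binary SPRO4 and HTK layouts are outside the scope of this sketch.
        raise NotImplementedError('%s writer not sketched here' % mfcc_format)
    with open(filename, 'wb') as handle:
        pickle.dump(cep, handle)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'show.p')
    save_sketch([[0.1, 0.2]], path, 'pickle')
    with open(path, 'rb') as handle:
        restored = pickle.load(handle)

# An unsupported format name is rejected before anything is written.
try:
    save_sketch([], 'unused', 'wav')
    raised = False
except Exception:
    raised = True
```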

save_list(audio_file_list, feature_file_list, mfcc_format, feature_dir, feature_file_extension, and_label=False)

Take a list of audio files and extract their features.

Parameters: audio_file_list – an array of strings containing the names of the feature files to load
save_parallel(input_audio_list, output_feature_list, mfcc_format, feature_dir, feature_file_extension, and_label=False, numThread=1)

Extract features from audio files using parallel computation.

Parameters:
  • input_audio_list – an array of strings containing the names of the audio files to process
  • output_feature_list – an array of strings containing the names of the feature files to save
  • numThread – number of parallel processes to run
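
One way to picture how the two lists relate is the sketch below, which pairs each input with its output path and deals the pairs out to workers. The helper build_jobs is hypothetical, and joining feature_dir, the feature name and feature_file_extension with os.path.join is an assumption; this page does not say how the library composes the output path.

```python
import os


def build_jobs(input_audio_list, output_feature_list,
               feature_dir, feature_file_extension):
    """Pair each audio name with the path of its feature file (assumed layout)."""
    if len(input_audio_list) != len(output_feature_list):
        raise ValueError('both lists must have the same length')
    return [(audio, os.path.join(feature_dir, feature + feature_file_extension))
            for audio, feature in zip(input_audio_list, output_feature_list)]


jobs = build_jobs(['rec1', 'rec2', 'rec3'], ['rec1', 'rec2', 'rec3'],
                  'feat', '.mfcc')

# Deal the jobs out round-robin, one sublist per parallel process.
num_thread = 2
per_worker = [jobs[i::num_thread] for i in range(num_thread)]
```

Each sublist in per_worker could then be handed to one process that extracts and saves the features for its share of the files.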
