Speech Features

Goal

Add custom speech feature extraction ops, and compare the extracted features with Kaldi's.

Procedure

  1. Create the custom C++ op, 'xxx.h' and 'xxx.cc'

    Store the files in delta/layers/ops/kernels/. For details, refer to existing files, e.g., pitch.cc / pitch.h

  2. Implement the kernel for the op, 'xxx_op.cc'

    Store the files in delta/layers/ops/kernels/. For details, see the TensorFlow Guide: Adding a New Op

  3. Define the op's interface, 'x_ops.cc'

    Store the files in delta/layers/ops/kernels/. For details, see the TensorFlow guide referenced in step 2

  4. Compile using 'delta/layers/ops/Makefile'

  5. Register the op in 'delta/layers/ops/py_x_ops.py'

  6. Write a unit test, 'xxx_op_test.py'
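
The shape of a unit test for step 6 can be sketched as follows. This is a minimal illustration only: zero_cross_rate below is a pure-Python stand-in written for this example (a hypothetical name, not the compiled op or its real signature), and the plain unittest module is assumed here instead of whatever test base class the existing '*_op_test.py' files use. The point is the pattern: feed a signal whose feature value is known by construction and assert on the output.

```python
import unittest


def zero_cross_rate(samples, frame_len, hop):
    """Stand-in for the op under test (hypothetical, pure Python).

    Returns, for each full frame, the fraction of adjacent sample
    pairs whose signs differ. A real test would call the compiled op.
    """
    rates = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        rates.append(crossings / (frame_len - 1))
    return rates


class ZeroCrossRateOpTest(unittest.TestCase):

    def test_alternating_signal(self):
        # +1, -1, +1, ... changes sign between every pair of samples.
        samples = [1.0 if n % 2 == 0 else -1.0 for n in range(16)]
        out = zero_cross_rate(samples, frame_len=8, hop=4)
        self.assertEqual(out, [1.0, 1.0, 1.0])

    def test_constant_signal(self):
        # A constant signal never changes sign.
        out = zero_cross_rate([0.5] * 16, frame_len=8, hop=4)
        self.assertEqual(out, [0.0, 0.0, 0.0])
```

Saved as a file, a test like this can be run with `python -m unittest <file>`.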

Code Style

C++ code: use clang-format for formatting and cpplint for checking

Python code: use yapf for formatting and pylint for checking

Please follow the Contributing Guide

Existing Ops

  • Pitch
  • Frame power
  • Zero-cross rate
  • Power spectrum (PS) / log PS
  • Cepstrum / MFCC
  • Perceptual Linear Prediction (PLP)
  • Analysis filter bank (AFB): currently supports window_length = 30ms and frame_length = 10ms for perfect reconstruction.
  • Synthesis filter bank (SFB)
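
To make the frame-level features in the list above concrete, here is a minimal pure-Python sketch of framing and frame power. It is not the C++ kernel: the function names, the window_ms and shift_ms parameters, and the choice to drop the incomplete last frame are all assumptions made for this illustration, and the 30 ms / 10 ms values are only example durations.

```python
import math


def frame_signal(samples, sample_rate, window_ms=30.0, shift_ms=10.0):
    """Split samples into overlapping frames (illustrative helper).

    Frames that would run past the end of the signal are dropped.
    """
    window = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, shift)]


def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


sample_rate = 16000
# 100 ms of a full-scale 400 Hz sine (40 samples per period at 16 kHz).
signal = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(1600)]

frames = frame_signal(signal, sample_rate)
powers = [frame_power(f) for f in frames]
print(len(frames), "frames of", len(frames[0]), "samples")
print("frame powers:", [round(p, 6) for p in powers])
```

Because every frame here spans a whole number of sine periods, each frame power should come out very close to 0.5, which makes this a convenient sanity check for a new implementation.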

The specific interfaces of feature functions are shown below:

[Figure: Speech Features]

Comparison with KALDI

The extracted features are compared with the corresponding KALDI features.

  1. Pitch

    [Figure: Pitch]

  2. Log power spectrum

    [Figure: Log power spectrum]

  3. Cepstrum / MFCC

    [Figure: Cepstrum / MFCC]

  4. PLP

    [Figure: PLP]
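
The comparisons above are presented as plots. If a numeric summary is also wanted, one possible approach is sketched below. The choice of metrics (maximum and mean absolute difference, plus Pearson correlation over all elements) and the compare_features name are assumptions made for this example, not something the original comparison specifies. Both inputs are assumed to be already loaded as frames-by-dimensions lists of floats with identical shapes.

```python
import math


def compare_features(ours, reference):
    """Summarize the difference between two feature matrices.

    Both arguments are lists of frames, each frame a list of floats.
    Hypothetical helper: the metrics are one reasonable choice only.
    """
    if not ours or len(ours) != len(reference) or any(
            len(a) != len(b) for a, b in zip(ours, reference)):
        raise ValueError("feature matrices must be non-empty and same shape")

    # Flatten both matrices so every element is compared pairwise.
    x = [v for row in ours for v in row]
    y = [v for row in reference for v in row]
    if not x:
        raise ValueError("feature matrices must contain values")

    diffs = [abs(a - b) for a, b in zip(x, y)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Correlation is undefined when either input is constant.
    if var_x > 0 and var_y > 0:
        corr = cov / math.sqrt(var_x * var_y)
    else:
        corr = float("nan")

    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "correlation": corr,
    }


# Toy 2-frame, 2-dimension example; real inputs would be extracted features.
stats = compare_features([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.5], [3.0, 4.0]])
print(stats)
```

A small maximum difference together with a correlation near 1 suggests the two extractors agree; a high correlation with a large constant offset usually points to a scaling or normalization mismatch rather than a different algorithm.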