A neural network-based approach for predicting German phonemes in audio files, using both spectrogram data and Mel-frequency cepstral coefficients (MFCCs).
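To make the two feature types concrete, here is a minimal, dependency-free sketch of how a spectrogram and MFCCs can be computed from raw samples. It is not this project's code: the function names, frame length, hop size, filter count and the synthetic 1 kHz test tone are all assumptions chosen for illustration, and the naive DFT is only practical for short frames. A real pipeline would normally rely on an audio library instead.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of equal length."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # naive DFT, O(n^2), fine for short frames
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, expressed as (fractional) FFT bin positions
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for b in range(n_fft // 2 + 1):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Power spectrum -> mel filterbank -> log -> DCT-II (unnormalized)."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Hypothetical input: a synthetic 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]

chunks = split_frames(signal, frame_len=256, hop=128)
spectrogram = [power_spectrum(f) for f in chunks]   # one row per frame
features = [mfcc(f, sample_rate) for f in chunks]   # 13 coefficients per frame
```

The spectrogram keeps the full frequency resolution of each frame, while the MFCCs compress the same frame into a handful of coefficients on a perceptually motivated scale. Either matrix (frames by features) could serve as input to a network.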
Sources:
- https://www.tensorflow.org/versions/master/tutorials/audio_recognition
- https://github.com/llSourcell/tensorflow_speech_recognition_demo
- https://www.isca-speech.org/archive/interspeech_2015/papers/i15_1478.pdf
Fun reading: