Logmel Spectrogram- feature extraction #4

Open
shivashankarivr opened this issue Nov 12, 2021 · 1 comment

Comments

@shivashankarivr

Hi, I am doing speech recognition for a microcontroller. I am new to this and am trying to modify code that was written for Acoustic Scene Classification, which used a dataset of 30-second WAV audio clips.

Now I need to use a dataset of 1-second clips for speech recognition, but I am not getting proper values after feature extraction.

Below is the code I am using for the log-mel spectrogram. Could you please help me?

"""LogMel Feature Extraction example."""

import numpy as np
import sys
import librosa
import librosa.display
import scipy.fftpack as fft

SR = 16000
N_FFT = 1024
N_MELS = 30

def create_col(y):
    """Compute one mel power column from a single 1024-sample frame."""
    assert y.shape == (1024,)

    # Create the Hann window
    fft_window = librosa.filters.get_window('hann', N_FFT, fftbins=True)
    assert fft_window.shape == (1024,), fft_window.shape

    # Apply the window to the frame
    y_windowed = fft_window * y
    assert y_windowed.shape == (1024,), y_windowed.shape

    # FFT; keep the 513 non-negative frequency bins
    fft_out = fft.fft(y_windowed, axis=0)[:513]
    assert fft_out.shape == (513,), fft_out.shape

    # Power spectrum
    S_pwr = np.abs(fft_out)**2
    assert S_pwr.shape == (513,)

    # Generation of Mel Filter Banks
    # (sr is passed by keyword; recent librosa versions reject it as a positional argument)
    mel_basis = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=N_MELS, htk=False)
    assert mel_basis.shape == (30, 513)

    # Apply Mel Filter Banks
    S_mel = np.dot(mel_basis, S_pwr)
    # astype returns a new array, so the result must be assigned
    S_mel = S_mel.astype(np.float32)
    assert S_mel.shape == (30,)

    return S_mel

def feature_extraction(y):
    assert y.shape == (32, 1024)

    S_mel = np.empty((30, 32), dtype=np.float32, order='C')
    for col_index in range(0, 32):
        S_mel[:, col_index] = create_col(y[col_index])

    # Scale according to reference power
    S_mel = S_mel / S_mel.max()
    # Convert to dB
    S_log_mel = librosa.power_to_db(S_mel, top_db=80.0)
    assert S_log_mel.shape == (30, 32)

    return S_log_mel
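
For a microcontroller port it may help to see what the last two steps of `feature_extraction` compute. The sketch below reproduces the peak scaling and dB conversion with NumPy only, assuming librosa's documented defaults for `power_to_db` (`ref=1.0`, `amin=1e-10`). The function name `log_mel_from_power` and its handling of silent input are hypothetical and not part of the code above.

```python
import numpy as np

def log_mel_from_power(S_mel, top_db=80.0, amin=1e-10):
    """Scale a mel power spectrogram to its peak and convert it to dB.

    Hypothetical helper: mirrors `S_mel / S_mel.max()` followed by
    `librosa.power_to_db(..., top_db=80.0)`, assuming ref=1.0 and amin=1e-10.
    """
    S_mel = np.asarray(S_mel, dtype=np.float64)
    peak = S_mel.max()
    if peak <= 0.0:
        # Silent input: return the floor value instead of dividing by zero
        # (a choice made for this sketch, not taken from the original code).
        return np.full(S_mel.shape, -top_db, dtype=np.float32)
    S_norm = S_mel / peak
    # 10 * log10(power), with a floor on the power to avoid log10(0)
    S_db = 10.0 * np.log10(np.maximum(S_norm, amin))
    # Clip everything more than `top_db` dB below the maximum
    S_db = np.maximum(S_db, S_db.max() - top_db)
    return S_db.astype(np.float32)
```

Because the input is divided by its own peak, the output always lies between -80 dB and 0 dB, whatever the length of the clip.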
@shivashankarivr

What changes do I need to make for 1-second audio?
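
One possible starting point, sketched under assumptions the issue does not confirm: `feature_extraction` expects 32 frames of 1024 samples, while a 1-second clip at 16 kHz has only 16000 samples. If the frames overlap by half (a hop of 512 samples, which is an assumption here), 32 frames need 1024 + 31 × 512 = 16896 samples, so the clip can be zero-padded by 896 samples and then sliced. The helper name `frame_one_second` and the constants `HOP` and `N_FRAMES` are hypothetical.

```python
import numpy as np

SR = 16000
N_FFT = 1024
HOP = 512        # assumed hop length (50% overlap); not stated in the issue
N_FRAMES = 32    # number of frames that feature_extraction asserts

def frame_one_second(y, n_fft=N_FFT, hop=HOP, n_frames=N_FRAMES):
    """Zero-pad or trim a 1-D signal and slice it into (n_frames, n_fft) frames."""
    y = np.asarray(y, dtype=np.float32)
    if y.ndim != 1:
        raise ValueError("expected a mono 1-D signal")
    needed = n_fft + (n_frames - 1) * hop   # 16896 samples with the defaults
    if y.shape[0] < needed:
        y = np.pad(y, (0, needed - y.shape[0]))   # pad the end with zeros
    else:
        y = y[:needed]                            # drop any extra samples
    # Frame i starts at sample i * hop and holds n_fft samples.
    frames = np.stack([y[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return frames

# Example with a synthetic 1-second, 440 Hz tone
t = np.arange(SR) / SR
clip = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
frames = frame_one_second(clip)
print(frames.shape)  # (32, 1024)
```

The resulting array has the shape that `feature_extraction` asserts. If the original project used a different hop length or frame count, these constants and the `(30, 32)` shape assertions would need to change together.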
