speechDescriptorsUtils
- pyampact.speechDescriptorsUtils.compute_avqi(shimmer_local, hnr_mean_db, cpps_vowel, slope_vowel, cpps_speech, slope_speech)
Compute the Acoustic Voice Quality Index (AVQI).
- pyampact.speechDescriptorsUtils.compute_bark_band_energies(S, sr, n_bands=24, mean_energies=True)
Compute Bark band energies using the band edges from the bark_scale_freqs function.
- Parameters:
S (ndarray) – magnitude spectrum
sr (float) – sample rate of audio file
n_bands (int) – number of bark bands; default = 24
mean_energies (bool) – default (True) returns mean of band energies, False returns array of band energies
- Returns:
band_energies – mean of band energies as a float (mean_energies=True), or array of per-band energies (mean_energies=False)
- Return type:
float or ndarray
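To make the band-summing step concrete, here is a minimal sketch in plain Python. It assumes that "energy" means the sum of squared magnitudes within each band and that the band edges are already known; the edges, bin values, and the helper name `band_energies` are hypothetical and are not taken from pyampact.

```python
def band_energies(mags, freqs, edges, mean_energies=True):
    """Sum squared magnitudes of the bins that fall inside each band."""
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are assigned to the half-open band [lo, hi).
        e = sum(m * m for m, f in zip(mags, freqs) if lo <= f < hi)
        energies.append(e)
    if mean_energies:
        return sum(energies) / len(energies)
    return energies


# Hypothetical example: four bins and two hand-picked band edges (Hz).
mags = [1.0, 2.0, 3.0, 4.0]
freqs = [50.0, 150.0, 250.0, 350.0]
edges = [0.0, 200.0, 400.0]

per_band = band_energies(mags, freqs, edges, mean_energies=False)
mean_energy = band_energies(mags, freqs, edges)
```

The `mean_energies` flag mirrors the documented switch between a single float and the per-band array.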
- pyampact.speechDescriptorsUtils.compute_correlation_dimension(y, m=3, tau=2, r_frac=0.1)
Compute Correlation Dimension (CD)
- Parameters:
y (array) – time series values
m (int) – embedding dimension for phase-space reconstruction; default = 3
tau (int) – time delay (in samples); default = 2
r_frac (float) – fraction of the distance standard deviation used to set the radius threshold; default = 0.1
- Returns:
cd – estimated correlation dimension, or np.nan if insufficient data
- Return type:
float
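The roles of `m` and `tau` can be illustrated with a small delay-embedding sketch followed by a Grassberger-Procaccia style correlation sum. This is an illustration under assumptions, not pyampact's code: the helper names are hypothetical, the radii are passed in directly, and how the library derives its radius from `r_frac` or turns the correlation sum into a dimension estimate is not shown in this reference.

```python
import math


def delay_embed(y, m=3, tau=2):
    """Return the list of m-dimensional delay vectors of y."""
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors <= 0:
        return []
    return [tuple(y[i + k * tau] for k in range(m)) for i in range(n_vectors)]


def correlation_sum(vectors, r):
    """Fraction of distinct vector pairs closer than r (Euclidean distance)."""
    n = len(vectors)
    if n < 2:
        return float("nan")
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) < r:
                close += 1
    return close / (n * (n - 1) / 2)


# Hypothetical test signal: a sampled sinusoid.
y = [math.sin(0.3 * i) for i in range(200)]
vectors = delay_embed(y, m=3, tau=2)
c_small = correlation_sum(vectors, 0.1)
c_large = correlation_sum(vectors, 0.5)
```

A correlation dimension is typically read off as the slope of log C(r) against log r over a range of radii.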
- pyampact.speechDescriptorsUtils.compute_dfa(y, min_window=4, max_window=None, n_windows=20)
Compute the DFA (Detrended Fluctuation Analysis) scaling exponent (alpha) for a 1D signal y
- Parameters:
y (array) – speech segment waveform
min_window (int) – smallest window size (samples); default = 4
max_window (int) – largest window size (samples); default = None, which uses len(y)//4
n_windows (int) – number of window sizes to test; default = 20
- Returns:
alpha – DFA scaling exponent
- Return type:
float
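For orientation, a first-order DFA can be sketched in plain Python as below. The parameter names follow the signature above, but the details are assumptions: linear detrending, non-overlapping windows, and log-spaced window sizes are common choices and may differ from what pyampact does.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha(y, min_window=4, max_window=None, n_windows=20):
    n = len(y)
    if max_window is None:
        max_window = n // 4
    mean = sum(y) / n
    # Integrated, mean-removed profile.
    profile, total = [], 0.0
    for v in y:
        total += v - mean
        profile.append(total)
    # Log-spaced window sizes, with duplicates removed.
    ratio = max_window / min_window
    sizes = sorted({
        int(round(min_window * ratio ** (k / (n_windows - 1))))
        for k in range(n_windows)
    })
    log_n, log_f = [], []
    for w in sizes:
        xs = list(range(w))
        sq_sum, count = 0.0, 0
        # Non-overlapping windows; a trailing partial window is ignored.
        for start in range(0, n - w + 1, w):
            seg = profile[start:start + w]
            slope, icpt = linear_fit(xs, seg)
            sq_sum += sum((s - (slope * x + icpt)) ** 2 for x, s in zip(xs, seg))
            count += w
        fluct = math.sqrt(sq_sum / count)
        if fluct > 0:
            log_n.append(math.log(w))
            log_f.append(math.log(fluct))
    # alpha is the slope of log F(w) against log w.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
alpha = dfa_alpha(noise)
```

For uncorrelated noise the exponent is expected to be near 0.5; values toward 1.5 indicate random-walk-like behavior.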
- pyampact.speechDescriptorsUtils.compute_dpi(snd, pitch, intensity, intensity_thresh, time_step=0.01, min_pause_dur=0.15)
Compute Duration of Pause Intervals (DPI).
- Parameters:
snd (Praat Sound Object) – Parselmouth/Praat Sound Object
pitch (Praat Pitch Object) – Parselmouth/Praat Pitch Object
intensity (Praat Intensity Object) – Parselmouth/Praat Intensity Object
intensity_thresh (int) – dB SPL threshold for silence
time_step (float) – analysis step in seconds; default = 0.01
min_pause_dur (float) – minimum pause duration to count (sec); default = 0.15
- Returns:
pause_durations (array) – list of pause durations (s)
mean_dpi (float) – average pause duration
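The interaction of `intensity_thresh`, `time_step`, and `min_pause_dur` can be sketched on a plain list of intensity values. This is a simplified illustration: it uses intensity only (how the library combines the pitch object with intensity is not described here), and the contour values and the helper name `pause_durations` are hypothetical.

```python
def pause_durations(intensity_db, intensity_thresh, time_step=0.01, min_pause_dur=0.15):
    """Durations (s) of runs of frames below the threshold that last long enough."""
    durations = []
    run = 0
    # A sentinel loud frame at the end closes any trailing silent run.
    for level in list(intensity_db) + [float("inf")]:
        if level < intensity_thresh:
            run += 1
            continue
        if run * time_step >= min_pause_dur:
            durations.append(run * time_step)
        run = 0
    return durations


# Hypothetical contour: speech, a 0.30 s pause, speech, a 0.05 s dip, speech.
contour = [60.0] * 20 + [30.0] * 30 + [62.0] * 20 + [30.0] * 5 + [61.0] * 20
pauses = pause_durations(contour, intensity_thresh=45)
mean_dpi = sum(pauses) / len(pauses) if pauses else float("nan")
```

The 0.05 s dip is shorter than `min_pause_dur`, so only the longer silence is counted.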
- pyampact.speechDescriptorsUtils.compute_emd_features(y, max_imfs=6, max_sift=10)
Empirical Mode Decomposition (EMD) features. Returns scalar complexity measures, not raw IMFs.
- Parameters:
y (1D numpy array) – input signal (e.g. gneVals or waveform)
max_imfs (int) – maximum IMFs to extract; default = 6
max_sift (int) – max sifting iterations per IMF; default = 10
- Returns:
emd_n_imfs (int) – number of IMFs extracted
emd_energy_ratio (float) – high-frequency IMF energy / total energy
emd_entropy (float) – Shannon entropy of IMF energy distribution
- pyampact.speechDescriptorsUtils.compute_mfcc(seg, sr, n_mfcc=13, n_fft=2048, hop_length=512)
Compute Mel Frequency Cepstral Coefficients (MFCCs)
- Parameters:
seg (Praat Sound Object) – segment of Sound cut around onset and offset
sr (float) – sample rate of audio
n_mfcc (int) – number of coefficients to return; default = 13
n_fft (int) – FFT window size; default = 2048
hop_length (int) – number of samples the analysis advances between consecutive frames; default = 512
- Returns:
mfcc – list of MFCCs
- Return type:
JSON-stringified list
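Because the return value is a JSON string rather than an array, it needs decoding before numerical use. A minimal sketch with the standard `json` module follows; the coefficient values are made up, and whether the real list is flat or nested per frame is not stated here, so the flat list is an assumption.

```python
import json

# Hypothetical stand-in for the string returned for one segment.
mfcc_json = json.dumps([-312.4, 101.7, -8.9, 22.3])

# Decode back into a plain Python list of floats before further analysis.
mfcc = json.loads(mfcc_json)
first_coefficient = mfcc[0]
```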
- pyampact.speechDescriptorsUtils.compute_ppe(pitch_values_hz)
Compute Pitch Period Entropy (PPE) from a 1D array of pitch values in Hz.
- Parameters:
pitch_values_hz (ndarray) – array of pitch values
- Returns:
entropy – Pitch Period Entropy (PPE)
- Return type:
float
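As a rough illustration of the idea behind PPE, the sketch below converts pitch to semitones relative to the median and takes the normalized entropy of a histogram of those values. It is deliberately simplified: the published PPE measure also includes a whitening step, which is omitted, and the bin count and helper name `pitch_entropy` are hypothetical rather than pyampact's.

```python
import math


def pitch_entropy(pitch_values_hz, n_bins=30):
    """Normalized entropy of semitone deviations from the (upper) median pitch."""
    # Keep voiced frames only; NaN fails the comparison and is dropped too.
    voiced = sorted(p for p in pitch_values_hz if p > 0)
    if len(voiced) < 2:
        return float("nan")
    median = voiced[len(voiced) // 2]
    semitones = [12.0 * math.log2(p / median) for p in voiced]
    lo, hi = min(semitones), max(semitones)
    if hi - lo < 1e-12:
        return 0.0  # constant pitch: all mass falls in one bin
    counts = [0] * n_bins
    for s in semitones:
        idx = min(int((s - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(semitones)
    entropy = -sum(c / total * math.log(c / total) for c in counts if c)
    return entropy / math.log(n_bins)


steady = pitch_entropy([120.0] * 50)
wobbly = pitch_entropy([100.0 + 2.0 * (i % 25) for i in range(100)])
```

A perfectly steady pitch gives zero entropy, and a pitch that wanders across many semitone bins gives a value closer to 1.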
- pyampact.speechDescriptorsUtils.compute_recurrence_period_and_rpde(pp, onset, offset, n_bins=50)
Compute recurrence period (RP) and recurrence period density entropy (RPDE).
RP is the mean inter-pulse interval within a segment. RPDE is the normalized entropy of the distribution of recurrence periods.
Low RPDE = highly periodic/stable signal; high RPDE = irregular, noisy, chaotic signal.
- Parameters:
pp (Praat PointProcess Object) – PointProcess (periodic, cc) of the segment
onset (float) – segment start in seconds
offset (float) – segment end in seconds
n_bins (int) – bin count for histogram calculation; default = 50
- Returns:
rp_ms (float) – recurrence period in ms
rpde (float) – normalized entropy of recurrence-period distribution
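The two outputs can be sketched from a plain list of pulse times, following the description above: RP as the mean inter-pulse interval and RPDE as the normalized entropy of a histogram of those intervals. The helper name `recurrence_stats` and the pulse train are hypothetical, and the binning details are assumptions rather than pyampact's exact procedure.

```python
import math


def recurrence_stats(pulse_times, onset, offset, n_bins=50):
    """Mean inter-pulse interval (ms) and normalized histogram entropy."""
    times = [t for t in pulse_times if onset <= t <= offset]
    periods = [b - a for a, b in zip(times, times[1:])]
    if not periods:
        return float("nan"), float("nan")
    rp_ms = 1000.0 * sum(periods) / len(periods)
    lo, hi = min(periods), max(periods)
    if hi - lo < 1e-12:
        return rp_ms, 0.0  # identical periods: zero entropy
    counts = [0] * n_bins
    for p in periods:
        idx = min(int((p - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(periods)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c)
    return rp_ms, entropy / math.log(n_bins)


# Hypothetical pulse train with an exact 7.8125 ms period (128 Hz).
pulses = [i * 0.0078125 for i in range(40)]
rp_ms, rpde = recurrence_stats(pulses, onset=0.0, offset=1.0)
```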
- pyampact.speechDescriptorsUtils.compute_spectral_spread(S, freqs)
Compute spectral spread
- Parameters:
S (ndarray) – magnitude spectrum
freqs (array) – frequency axis in Hz (freq_bins,)
- Returns:
spectral spread of S around its spectral centroid, in Hz
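Spectral spread is commonly defined as the standard deviation of frequency around the spectral centroid. A minimal sketch for a single spectrum follows; weighting by magnitude (rather than power) is an assumption, and the helper is illustrative rather than pyampact's implementation.

```python
import math


def spectral_spread(mags, freqs):
    """Magnitude-weighted standard deviation of frequency about the centroid."""
    total = sum(mags)
    if total == 0:
        return float("nan")
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    return math.sqrt(variance)


# Two equal peaks at 100 Hz and 300 Hz: centroid 200 Hz, spread 100 Hz.
spread = spectral_spread([1.0, 0.0, 1.0], [100.0, 200.0, 300.0])
single_peak = spectral_spread([0.0, 1.0, 0.0], [100.0, 200.0, 300.0])
```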
- pyampact.speechDescriptorsUtils.compute_tqwt_features(y, sr, Q=1.0, r=3.0, J=8, n_keep=5)
Compute a fixed-width feature vector from the Tunable Q-factor Wavelet Transform (TQWT). Features are derived from subband energies of the TQWT decomposition of a 1-D signal.
- Parameters:
y (1D numpy array) – input signal
sr (float) – sample rate of the signal in Hz. (Currently unused; included for API consistency.)
Q (float) – Q-factor controlling oscillatory behavior of the wavelets; default = 1.0
r (float) – redundancy factor of the TQWT; default = 3.0
J (int) – number of TQWT decomposition levels; default = 8
n_keep (int) – number of lowest-index subband log-energies to return; default = 5
- Returns:
features – Feature vector containing, in order:
- tqwt_e1 .. tqwt_e{n_keep} (float) – log10 subband energies of the first n_keep TQWT bands
- tqwt_entropy (float) – normalized Shannon entropy of the subband energy distribution
- tqwt_centroid (float) – energy-weighted centroid of subband indices
- tqwt_low_high_ratio (float) – ratio of summed low-band energy to high-band energy
If insufficient valid data are available, all features are returned as NaN.
- Return type:
list of float
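The layout of the feature vector can be sketched starting from a list of subband energies, leaving the TQWT decomposition itself aside. Several details here are assumptions made only for illustration: 1-based subband indices for the centroid, a midpoint split between "low" and "high" bands (taken by index), and the helper name `subband_features`.

```python
import math


def subband_features(energies, n_keep=5):
    """Summary features from a list of non-negative subband energies."""
    total = sum(energies)
    n = len(energies)
    if n < 2 or total <= 0:
        return [float("nan")] * (n_keep + 3)
    logs = [math.log10(e) if e > 0 else float("nan") for e in energies[:n_keep]]
    logs += [float("nan")] * (n_keep - len(logs))  # pad when fewer bands exist
    probs = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
    centroid = sum((i + 1) * p for i, p in enumerate(probs))  # 1-based index
    half = n // 2  # assumed split between "low" and "high" bands
    high = sum(energies[half:])
    ratio = sum(energies[:half]) / high if high > 0 else float("nan")
    return logs + [entropy, centroid, ratio]


# Four equal-energy bands: maximal entropy, centroid mid-way, ratio of one.
features = subband_features([10.0, 10.0, 10.0, 10.0], n_keep=2)
```

The result always has `n_keep + 3` entries, which is what makes the vector fixed-width.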
- pyampact.speechDescriptorsUtils.compute_vot(seg, pitch, sr)
Compute Voice Onset Time (VOT)
- Parameters:
seg (Praat Sound Object) – segment/slice of Snd object between onset and offset
pitch (Pitch (CC) object) – pitch object from Praat
sr (float) – sample rate of audio
- Returns:
vot – voice onset time (in ms)
- Return type:
float
- pyampact.speechDescriptorsUtils.compute_wt_features(y, wavelet='db4', level=5)
Compute wavelet-based features from a 1-D signal using discrete wavelet transform (DWT).
Features are derived from the normalized energy distribution of wavelet subbands.
- Parameters:
y (1D numpy array) – input signal
wavelet (str) – Wavelet family used for decomposition; default = “db4”
level (int) – Decomposition level; default = 5
- Returns:
wt_entropy (float) – Shannon entropy (base 2) of the normalized wavelet subband energies.
wt_low_high_ratio (float) – Ratio of low-frequency subband energy to high-frequency subband energy.
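To show what "Shannon entropy (base 2) of the normalized wavelet subband energies" means in practice, the sketch below uses a hand-written Haar transform so that it runs without dependencies. Haar is swapped in for the default 'db4' wavelet purely for brevity, and both helper names are hypothetical.

```python
import math


def haar_subband_energies(y, level=5):
    """Energies of [approximation, detail_level, ..., detail_1] from a Haar DWT."""
    approx = list(y)
    detail_energies = []
    for _ in range(level):
        if len(approx) < 2:
            break
        pairs = list(zip(approx[0::2], approx[1::2]))  # drops a trailing odd sample
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        detail_energies.append(sum(d * d for d in detail))
    return [sum(a * a for a in approx)] + detail_energies[::-1]


def energy_entropy_bits(energies):
    """Shannon entropy (base 2) of the normalized energy distribution."""
    total = sum(energies)
    if total <= 0:
        return float("nan")
    return -sum(e / total * math.log2(e / total) for e in energies if e > 0)


# A constant signal puts all energy in the approximation band.
flat = energy_entropy_bits(haar_subband_energies([1.0] * 64, level=3))
# Four equally energetic subbands give log2(4) = 2 bits.
equal_split = energy_entropy_bits([1.0, 1.0, 1.0, 1.0])
```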
- pyampact.speechDescriptorsUtils.feature_spectral_flux(X, f_s)
Calculates spectral flux
- Parameters:
X (ndarray) – spectrogram
f_s (float) – sample rate of audio data
- Returns:
vsf – spectral flux
- Return type:
float
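Spectral flux measures how much the magnitude spectrum changes from one frame to the next. A minimal sketch follows, assuming the spectrogram is given as a list of frames and that each difference norm is divided by the number of bins; that normalization, and how the per-transition values are reduced to a single number, are assumptions rather than confirmed library behavior.

```python
import math


def spectral_flux(frames):
    """Flux per frame transition: root of summed squared bin differences, per bin."""
    flux = []
    for prev, cur in zip(frames, frames[1:]):
        diff_sq = sum((c - p) ** 2 for p, c in zip(prev, cur))
        flux.append(math.sqrt(diff_sq) / len(cur))
    return flux


# Hypothetical spectrogram as a list of frames, each a list of bin magnitudes.
frames = [[3.0, 4.0], [3.0, 4.0], [0.0, 0.0]]
flux = spectral_flux(frames)
```

Identical consecutive frames give zero flux, and the abrupt drop to silence gives the largest value.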