speechDescriptorsUtils

pyampact.speechDescriptorsUtils.compute_avqi(shimmer_local, hnr_mean_db, cpps_vowel, slope_vowel, cpps_speech, slope_speech)[source]

Compute the Acoustic Voice Quality Index (AVQI).

pyampact.speechDescriptorsUtils.compute_bark_band_energies(S, sr, n_bands=24, mean_energies=True)[source]

Compute Bark band energies, with band edges taken from the bark_scale_freqs function

Parameters:

S (ndarray) – magnitude spectrum
sr (float) – sample rate of audio file
n_bands (int) – number of bark bands; default = 24
mean_energies (bool) – default (True) returns mean of band energies, False returns array of band energies

Returns:

band_energies – mean of band energies as float, or array of band energies

Return type:

float (when mean_energies is True) or ndarray (when mean_energies is False)
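
As a rough illustration of what a Bark-band energy computation involves, the sketch below groups the bins of a magnitude spectrum into equal-width Bark bands and sums the power in each. It is not pyampact's implementation: the Zwicker-style Hz-to-Bark formula, the linearly spaced bin frequencies, and the names hz_to_bark and bark_band_energies_sketch are assumptions made for this example.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Zwicker-style approximation of the Bark scale (assumed here for illustration).
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_band_energies_sketch(S, sr, n_bands=24, mean_energies=True):
    # Hypothetical helper, not the pyampact function.
    S = np.asarray(S, dtype=float)
    if S.ndim == 1:
        S = S[:, None]                       # treat a single spectrum as one frame
    n_bins = S.shape[0]
    freqs = np.linspace(0.0, sr / 2.0, n_bins)   # assumed bin centre frequencies
    bark = hz_to_bark(freqs)
    edges = np.linspace(0.0, bark[-1], n_bands + 1)
    # Map every frequency bin to a band index in [0, n_bands - 1].
    band_idx = np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)
    power = (S ** 2).mean(axis=1)            # mean power per bin across frames
    energies = np.bincount(band_idx, weights=power, minlength=n_bands)
    return float(energies.mean()) if mean_energies else energies

rng = np.random.default_rng(0)
S_demo = np.abs(rng.standard_normal((1025, 10)))
bark_bands = bark_band_energies_sketch(S_demo, 22050, mean_energies=False)
bark_mean = bark_band_energies_sketch(S_demo, 22050)
```

Because every bin is assigned to exactly one band, the band energies sum to the total mean power of the spectrum.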

pyampact.speechDescriptorsUtils.compute_correlation_dimension(y, m=3, tau=2, r_frac=0.1)[source]

Compute Correlation Dimension (CD)

Parameters:

y (array) – time series values
m (int) – embedding dimension for phase-space reconstruction; default = 3
tau (int) – time delay (in samples); default = 2
r_frac (float) – fraction of the distance standard deviation used to set the radius threshold; default = 0.1

Returns:

cd – estimated correlation dimension, or np.nan if insufficient data

Return type:

float
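
For readers unfamiliar with the quantity, the sketch below shows one common way to estimate a correlation dimension: delay-embed the series, compute the fraction of point pairs closer than a radius r, and fit the slope of log C(r) against log r (the Grassberger-Procaccia approach). This is an assumption-laden illustration, not the library's code: the function name, the n_radii parameter, and the choice of radii between 0.1 and 1.0 times the standard deviation of pairwise distances are all hypothetical.

```python
import numpy as np

def correlation_dimension_sketch(y, m=3, tau=2, n_radii=8):
    # Hypothetical helper, not the pyampact function.
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (m - 1) * tau
    if n_vec < 10:
        return np.nan                        # too few embedded points
    # Delay embedding: row i is (y[i], y[i + tau], ..., y[i + (m - 1) * tau]).
    emb = np.column_stack([y[i * tau : i * tau + n_vec] for i in range(m)])
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = dist[np.triu_indices(n_vec, k=1)]   # each pair counted once
    scale = pair_d.std()
    if scale == 0:
        return np.nan
    radii = scale * np.logspace(-1, 0, n_radii)
    # Correlation sum: fraction of pairs closer than each radius.
    c = np.array([(pair_d < r).mean() for r in radii])
    ok = c > 0
    if ok.sum() < 2:
        return np.nan
    slope, _ = np.polyfit(np.log(radii[ok]), np.log(c[ok]), 1)
    return float(slope)

cd_sine = correlation_dimension_sketch(np.sin(2 * np.pi * np.arange(400) / 50))
cd_short = correlation_dimension_sketch(np.zeros(5))
```

The pairwise distance matrix grows quadratically with the signal length, so a sketch like this is only practical for short segments.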

pyampact.speechDescriptorsUtils.compute_dfa(y, min_window=4, max_window=None, n_windows=20)[source]

Compute the DFA (Detrended Fluctuation Analysis) scaling exponent (alpha) for a 1D signal y

Parameters:

y (array) – speech segment waveform
min_window (int) – smallest window size (samples); default = 4
max_window (int) – largest window size (samples); default = None, which uses len(y)//4
n_windows (int) – number of window sizes to test; default = 20

Returns:

alpha – DFA scaling exponent

Return type:

float
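
The textbook DFA procedure can be sketched as follows: integrate the mean-removed signal, split it into windows of each size, remove a linear trend per window, and fit the slope of log fluctuation against log window size. This is a minimal sketch under stated assumptions (first-order detrending, non-overlapping windows, log-spaced window sizes); the function name dfa_alpha_sketch is hypothetical and the library's own implementation may differ.

```python
import numpy as np

def dfa_alpha_sketch(y, min_window=4, max_window=None, n_windows=20):
    # Hypothetical helper, not the pyampact function.
    y = np.asarray(y, dtype=float)
    if max_window is None:
        max_window = len(y) // 4
    if max_window <= min_window:
        return np.nan
    profile = np.cumsum(y - y.mean())        # integrated, mean-removed signal
    sizes = np.unique(
        np.round(np.logspace(np.log10(min_window), np.log10(max_window), n_windows)).astype(int)
    )
    flucts = []
    for n in sizes:
        n_seg = len(profile) // n
        segs = profile[: n_seg * n].reshape(n_seg, n).T   # shape (n, n_seg)
        t = np.arange(n)
        coeffs = np.polyfit(t, segs, 1)                   # one linear fit per window
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        flucts.append(np.sqrt(np.mean((segs - trend) ** 2)))
    flucts = np.array(flucts)
    if np.any(flucts <= 0):
        return np.nan                        # e.g. a constant signal
    alpha, _ = np.polyfit(np.log(sizes), np.log(flucts), 1)
    return float(alpha)

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
alpha_noise = dfa_alpha_sketch(noise)             # white noise: roughly 0.5
alpha_walk = dfa_alpha_sketch(np.cumsum(noise))   # random walk: roughly 1.5
```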

pyampact.speechDescriptorsUtils.compute_dpi(snd, pitch, intensity, intensity_thresh, time_step=0.01, min_pause_dur=0.15)[source]

Compute Duration of Pause Intervals (DPI).

Parameters:

snd (Praat Sound Object) – Parselmouth/Praat Sound Object
pitch (Praat Pitch Object) – Parselmouth/Praat Pitch Object
intensity (Praat Intensity Object) – Parselmouth/Praat Intensity Object
intensity_thresh (int) – dB SPL threshold for silence
time_step (float) – analysis step in seconds; default = 0.01
min_pause_dur (float) – minimum pause duration to count (sec); default = 0.15

Returns:

pause_durations (array) – list of pause durations (s)
mean_dpi (float) – average pause duration
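
The pause-detection idea behind the time_step, intensity_thresh and min_pause_dur parameters can be illustrated with a plain NumPy array standing in for the intensity contour. This sketch is an assumption: it ignores the Sound and Pitch inputs entirely, uses a hypothetical function name, and returns 0.0 as the mean when no pause qualifies, none of which is confirmed by the documentation above.

```python
import numpy as np

def pause_durations_sketch(intensity_db, intensity_thresh, time_step=0.01, min_pause_dur=0.15):
    # Hypothetical helper working on an array of dB values sampled every time_step seconds.
    silent = np.asarray(intensity_db, dtype=float) < intensity_thresh
    # Pad with False so every silent run has a detectable start and end.
    padded = np.concatenate(([False], silent, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    durations = (ends - starts) * time_step
    pauses = durations[durations >= min_pause_dur]
    mean_dpi = float(pauses.mean()) if pauses.size else 0.0
    return pauses, mean_dpi

# 0.5 s speech, 0.3 s pause, 0.5 s speech, 0.05 s dip (too short to count), 0.2 s speech
contour = np.array([70] * 50 + [30] * 30 + [70] * 50 + [30] * 5 + [70] * 20)
pauses_demo, mean_dpi_demo = pause_durations_sketch(contour, intensity_thresh=50)
```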

pyampact.speechDescriptorsUtils.compute_emd_features(y, max_imfs=6, max_sift=10)[source]

Empirical Mode Decomposition (EMD) features. Returns scalar complexity measures, not raw IMFs.

Parameters:

y (1D numpy array) – input signal (e.g. gneVals or waveform)
max_imfs (int) – maximum IMFs to extract; default = 6
max_sift (int) – max sifting iterations per IMF; default = 10

Returns:

emd_n_imfs (int) – number of IMFs extracted
emd_energy_ratio (float) – high-frequency IMF energy / total energy
emd_entropy (float) – Shannon entropy of IMF energy distribution

pyampact.speechDescriptorsUtils.compute_mfcc(seg, sr, n_mfcc=13, n_fft=2048, hop_length=512)[source]

Compute Mel Frequency Cepstral Coefficients (MFCCs)

Parameters:

seg (Praat Sound Object) – segment of Sound cut around onset and offset
sr (float) – sample rate of audio
n_mfcc (int) – number of coefficients to return; default = 13
n_fft (int) – FFT window size; default = 2048
hop_length (int) – number of samples the analysis advances between consecutive frames; default = 512

Returns:

mfcc – list of MFCCs

Return type:

JSON-stringified list

pyampact.speechDescriptorsUtils.compute_ppe(pitch_values_hz)[source]

Compute Pitch Period Entropy (PPE) from a 1D array of pitch values in Hz.

Parameters:

pitch_values_hz (ndarray) – array of pitch values

Returns:

entropy – Pitch Period Entropy (PPE)

Return type:

float
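
A simplified view of a pitch-entropy measure: convert the pitch track to semitones relative to its median, histogram the values, and take the normalized Shannon entropy. The sketch below is an assumption for illustration only. Published PPE definitions include additional steps (such as whitening the semitone series) that are omitted here, and the function name and n_bins parameter are hypothetical.

```python
import numpy as np

def ppe_sketch(pitch_values_hz, n_bins=30):
    # Hypothetical helper, not the pyampact function.
    f0 = np.asarray(pitch_values_hz, dtype=float)
    f0 = f0[np.isfinite(f0) & (f0 > 0)]      # drop unvoiced or invalid frames
    if f0.size < 2:
        return np.nan
    semitones = 12.0 * np.log2(f0 / np.median(f0))
    counts, _ = np.histogram(semitones, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    # Normalize by log(n_bins) so the result lies in [0, 1].
    return float(-(p * np.log(p)).sum() / np.log(n_bins))

rng = np.random.default_rng(0)
ppe_constant = ppe_sketch(np.full(200, 120.0))          # perfectly stable pitch
ppe_varied = ppe_sketch(rng.uniform(100.0, 200.0, 500)) # widely varying pitch
```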

pyampact.speechDescriptorsUtils.compute_recurrence_period_and_rpde(pp, onset, offset, n_bins=50)[source]

Compute recurrence period (RP) and recurrence period density entropy (RPDE)

RP is the mean inter-pulse interval within a segment. RPDE is the normalized entropy of the distribution of recurrence periods.

Low RPDE = highly periodic, stable signal. High RPDE = irregular, noisy, chaotic signal.

Parameters:

pp (Praat PointProcess Object) – PointProcess (periodic, cc) of the segment
onset (float) – segment start time in seconds
offset (float) – segment end time in seconds
n_bins (int) – bin count for histogram calculation; default = 50

Returns:

rp_ms (float) – recurrence period in ms
rpde (float) – normalized entropy of recurrence-period distribution
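
The two quantities described above can be sketched from a plain array of glottal pulse times, which stands in here for the Praat object. This is an illustration under assumptions, not the library's code: the function name, the max_period_s parameter, and the fixed histogram range of 0 to max_period_s are hypothetical choices.

```python
import numpy as np

def recurrence_period_and_rpde_sketch(pulse_times_s, onset, offset, n_bins=50, max_period_s=0.02):
    # Hypothetical helper taking pulse times in seconds instead of a Praat object.
    t = np.asarray(pulse_times_s, dtype=float)
    t = t[(t >= onset) & (t <= offset)]
    periods = np.diff(t)                     # inter-pulse intervals in seconds
    if periods.size < 2:
        return np.nan, np.nan
    rp_ms = float(periods.mean() * 1000.0)
    counts, _ = np.histogram(periods, bins=n_bins, range=(0.0, max_period_s))
    if counts.sum() == 0:
        return rp_ms, np.nan
    p = counts[counts > 0] / counts.sum()
    rpde = float(-(p * np.log(p)).sum() / np.log(n_bins))   # normalized to [0, 1]
    return rp_ms, rpde

regular_pulses = np.arange(100) * 0.007      # one pulse every 7 ms
rp_regular, rpde_regular = recurrence_period_and_rpde_sketch(regular_pulses, 0.0, 1.0)

rng = np.random.default_rng(0)
jittered_pulses = np.cumsum(rng.uniform(0.004, 0.010, 200))
rp_jitter, rpde_jitter = recurrence_period_and_rpde_sketch(jittered_pulses, 0.0, 10.0)
```

A perfectly regular pulse train puts every interval in one histogram bin, giving an entropy of zero; irregular intervals spread across bins and raise it.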

pyampact.speechDescriptorsUtils.compute_spectral_spread(S, freqs)[source]

Compute spectral spread

Parameters:

S (ndarray) – magnitude spectrum
freqs (array) – frequency axis in Hz (freq_bins,)

Returns:

spread – spectral spread: the standard deviation of the spectrum around its spectral centroid, in Hz
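
Spectral spread is conventionally defined as the magnitude-weighted standard deviation of frequency around the spectral centroid. The sketch below follows that conventional definition and returns one value per frame; whether the pyampact function weights by magnitude or power, or averages across frames, is not stated above, so treat the details and the function name as assumptions.

```python
import numpy as np

def spectral_spread_sketch(S, freqs):
    # Hypothetical helper, not the pyampact function.
    S = np.asarray(S, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if S.ndim == 1:
        S = S[:, None]                       # single spectrum -> one frame
    total = S.sum(axis=0)
    total = np.where(total > 0, total, np.nan)   # silent frames yield NaN
    centroid = (freqs[:, None] * S).sum(axis=0) / total
    variance = (((freqs[:, None] - centroid) ** 2) * S).sum(axis=0) / total
    return np.sqrt(variance)

freq_axis = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
# Two equal peaks at 100 Hz and 300 Hz: centroid 200 Hz, spread 100 Hz.
spread_two_peaks = spectral_spread_sketch([0, 1, 0, 1, 0], freq_axis)
# A single peak has no spread.
spread_one_peak = spectral_spread_sketch([0, 0, 1, 0, 0], freq_axis)
```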

pyampact.speechDescriptorsUtils.compute_tqwt_features(y, sr, Q=1.0, r=3.0, J=8, n_keep=5)[source]

Compute a fixed-width feature vector from the Tunable Q-factor Wavelet Transform (TQWT). Features are derived from subband energies of the TQWT decomposition of a 1-D signal.

Parameters:

y (1D numpy array) – input signal
sr (float) – sample rate of the signal in Hz. (Currently unused; included for API consistency.)
Q (float) – Q-factor controlling oscillatory behavior of the wavelets; default = 1.0
r (float) – redundancy factor of the TQWT; default = 3.0
J (int) – number of TQWT decomposition levels; default = 8
n_keep (int) – number of lowest-index subband log-energies to return; default = 5

Returns:

features – Feature vector containing, in order:

tqwt_e1 .. tqwt_e{n_keep} (float) – log10 subband energies of the first n_keep TQWT bands
tqwt_entropy (float) – normalized Shannon entropy of the subband energy distribution
tqwt_centroid (float) – energy-weighted centroid of subband indices
tqwt_low_high_ratio (float) – ratio of summed low-band energy to high-band energy

If insufficient valid data are available, all features are returned as NaN.

Return type:

list of float

pyampact.speechDescriptorsUtils.compute_vot(seg, pitch, sr)[source]

Compute Voice Onset Time (VOT)

Parameters:

seg (Praat Sound Object) – segment/slice of Snd object between onset and offset
pitch (Pitch (CC) object) – pitch object from Praat
sr (float) – sample rate of audio

Returns:

vot – voice onset time (in ms)

Return type:

float

pyampact.speechDescriptorsUtils.compute_wt_features(y, wavelet='db4', level=5)[source]

Compute wavelet-based features from a 1-D signal using discrete wavelet transform (DWT).

Features are derived from the normalized energy distribution of wavelet subbands.

Parameters:

y (1D numpy array) – input signal
wavelet (str) – Wavelet family used for decomposition; default = “db4”
level (int) – Decomposition level; default = 5

Returns:

wt_entropy (float) – Shannon entropy (base 2) of the normalized wavelet subband energies.
wt_low_high_ratio (float) – Ratio of low-frequency subband energy to high-frequency subband energy.
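
To show what "entropy of the normalized wavelet subband energies" means in practice, the sketch below performs a multi-level DWT by hand and computes the base-2 Shannon entropy of the resulting energy distribution. Note the swap: it uses the Haar wavelet rather than db4 so that it needs only NumPy, and the function names are hypothetical. It is not the library's implementation.

```python
import numpy as np

def haar_subband_energies(y, level=5):
    # Hand-rolled orthonormal Haar DWT (assumed stand-in for a db4 decomposition).
    approx = np.asarray(y, dtype=float)
    energies = []
    for _ in range(level):
        if approx.size < 2:
            break
        if approx.size % 2:
            approx = approx[:-1]             # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    # Order: [approximation, coarsest detail, ..., finest detail]
    return np.array(energies[::-1])

def wavelet_entropy_sketch(y, level=5):
    e = haar_subband_energies(y, level)
    if e.sum() <= 0:
        return np.nan
    p = e / e.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
y_demo = rng.standard_normal(64)
e_demo = haar_subband_energies(y_demo, level=5)
wt_entropy_noise = wavelet_entropy_sketch(y_demo, level=5)
wt_entropy_const = wavelet_entropy_sketch(np.ones(64), level=3)
```

A constant signal puts all of its energy in the approximation band, so its entropy is zero, while broadband noise spreads energy across the bands.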

pyampact.speechDescriptorsUtils.feature_spectral_flux(X, f_s)[source]

Calculates spectral flux

Parameters:

X (ndarray) – spectrogram
f_s (float) – sample rate of audio data

Returns:

vsf – spectral flux

Return type:

float
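
One widely used formulation of spectral flux is the Euclidean norm of the frame-to-frame difference of the magnitude spectrogram, divided by the number of frequency bins. The sketch below implements that formulation and returns one value per frame. It is an assumption that this matches the function documented above; the name spectral_flux_sketch is hypothetical, and the sample rate is omitted because this formulation does not use it.

```python
import numpy as np

def spectral_flux_sketch(X):
    # X has shape (freq_bins, frames). Hypothetical helper, not the pyampact function.
    X = np.asarray(X, dtype=float)
    # Previous frame for each column; the first frame is compared with itself.
    prev = np.concatenate([X[:, :1], X[:, :-1]], axis=1)
    return np.sqrt(((X - prev) ** 2).sum(axis=0)) / X.shape[0]

flux_demo = spectral_flux_sketch([[0.0, 3.0],
                                  [0.0, 4.0]])
flux_steady = spectral_flux_sketch(np.ones((8, 5)))
```

In the two-frame demo the second frame differs from the first by (3, 4), so its flux is sqrt(9 + 16) / 2 = 2.5; a spectrogram that does not change over time has zero flux everywhere.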