speechDescriptorsUtils

compute_avqi(shimmer_local, hnr_mean_db, ...)

Compute the Acoustic Voice Quality Index (AVQI).

pyampact.speechDescriptorsUtils.compute_avqi(shimmer_local, hnr_mean_db, cpps_vowel, slope_vowel, cpps_speech, slope_speech)[source]

Compute the Acoustic Voice Quality Index (AVQI).

pyampact.speechDescriptorsUtils.compute_bark_band_energies(S, sr, n_bands=24, mean_energies=True)[source]

Compute Bark band energies, using the band edges returned by the bark_scale_freqs function.

Parameters:
  • S (ndarray) – magnitude spectrum

  • sr (float) – sample rate of audio file

  • n_bands (int) – number of bark bands; default = 24

  • mean_energies (bool) – default (True) returns mean of band energies, False returns array of band energies

Returns:

band_energies – mean of band energies as float, or array of band energies

Return type:

float or ndarray
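The band-energy computation described above can be sketched with NumPy alone. This is an illustration rather than the pyampact implementation: the helper names (hz_to_bark, bark_band_energies_sketch), the Zwicker and Terhardt Bark approximation, the equal-width band edges on the Bark axis, and the assumption that the rows of S are linearly spaced bins from 0 Hz to sr/2 are all choices made for this example.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark scale (assumed here)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_band_energies_sketch(S, sr, n_bands=24, mean_energies=True):
    """Sum time-averaged power into n_bands equal-width Bark bands."""
    S = np.asarray(S, dtype=float)
    if S.ndim == 1:
        S = S[:, None]                      # treat a single spectrum as one frame
    # Assumption: rows of S are linearly spaced bins from 0 Hz to Nyquist.
    freqs = np.linspace(0.0, sr / 2.0, S.shape[0])
    bark = hz_to_bark(freqs)
    edges = np.linspace(0.0, bark[-1], n_bands + 1)
    # Map every frequency bin to a band index in [0, n_bands - 1].
    band_idx = np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)
    power = (S ** 2).mean(axis=1)           # average power per bin over frames
    energies = np.bincount(band_idx, weights=power, minlength=n_bands)
    return float(energies.mean()) if mean_energies else energies

S = np.ones((1025, 10))                     # flat stand-in spectrogram
per_band = bark_band_energies_sketch(S, sr=22050, mean_energies=False)
mean_energy = bark_band_energies_sketch(S, sr=22050)
```

With mean_energies=False the sketch returns one value per band, which is why the return value is either a scalar or an array.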

pyampact.speechDescriptorsUtils.compute_correlation_dimension(y, m=3, tau=2, r_frac=0.1)[source]

Compute Correlation Dimension (CD)

Parameters:
  • y (array) – time series values

  • m (int) – embedding dimension for phase-space reconstruction; default = 3

  • tau (int) – time delay (in samples); default = 2

  • r_frac (float) – fraction of the distance standard deviation used to set the radius threshold; default = 0.1

Returns:

cd – estimated correlation dimension, or np.nan if insufficient data

Return type:

float
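The role of m, tau, and r_frac can be illustrated with a small Grassberger-Procaccia style sketch. It is not the pyampact code: the function names, the minimum of 10 embedded vectors, and the three-step radius ladder used to fit a slope are assumptions made for this example. The documented function may use r_frac differently.

```python
import numpy as np

def delay_embed(y, m=3, tau=2):
    """Stack m delayed copies of y as columns (one row per phase-space point)."""
    y = np.asarray(y, dtype=float)
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors < 1:
        return np.empty((0, m))
    return np.column_stack([y[i * tau : i * tau + n_vectors] for i in range(m)])

def correlation_dimension_sketch(y, m=3, tau=2, r_frac=0.1):
    """Slope of log C(r) against log r over a short ladder of radii."""
    X = delay_embed(y, m, tau)
    if len(X) < 10:                          # hypothetical minimum sample count
        return float("nan")
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))[np.triu_indices(len(X), k=1)]
    r0 = r_frac * dists.std()                # base radius from the distance spread
    radii = r0 * np.array([1.0, 2.0, 4.0])   # hypothetical radius ladder
    # Correlation sum: fraction of point pairs closer than each radius.
    c = np.array([np.mean(dists < r) for r in radii])
    if r0 <= 0 or np.any(c == 0):
        return float("nan")
    return float(np.polyfit(np.log(radii), np.log(c), 1)[0])

rng = np.random.default_rng(0)
sine = np.sin(2 * np.pi * np.arange(600) / 97.3)   # near one-dimensional orbit
noise = rng.standard_normal(600)                   # fills the embedding space
cd_sine = correlation_dimension_sketch(sine)
cd_noise = correlation_dimension_sketch(noise)
```

A periodic signal traces a closed curve in the embedding space and should score lower than noise, which spreads over all m dimensions.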

pyampact.speechDescriptorsUtils.compute_dfa(y, min_window=4, max_window=None, n_windows=20)[source]

Compute the DFA (Detrended Fluctuation Analysis) scaling exponent (alpha) for a 1D signal y

Parameters:
  • y (array) – speech segment waveform

  • min_window (int) – smallest window size (samples); default = 4

  • max_window (int) – largest window size (samples); default = len(y)//4

  • n_windows (int) – number of window sizes to test; default = 20

Returns:

alpha – DFA scaling exponent

Return type:

float
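A minimal DFA sketch shows how the window parameters shape the estimate. It assumes first-order (linear) detrending, non-overlapping windows, and log-spaced window sizes. None of these details are confirmed by the documentation above, and dfa_alpha_sketch is a hypothetical name.

```python
import numpy as np

def dfa_alpha_sketch(y, min_window=4, max_window=None, n_windows=20):
    """Estimate the DFA exponent as the log-log slope of F(w) against w."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if max_window is None:
        max_window = n // 4
    profile = np.cumsum(y - y.mean())        # integrated, mean-removed signal
    sizes = np.unique(np.round(
        np.logspace(np.log10(min_window), np.log10(max_window), n_windows)
    ).astype(int))
    flucts = []
    for w in sizes:
        n_seg = n // w
        segs = profile[: n_seg * w].reshape(n_seg, w).T   # shape (w, n_seg)
        t = np.arange(w)
        slope, intercept = np.polyfit(t, segs, 1)         # one line per segment
        resid = segs - (np.outer(t, slope) + intercept)
        flucts.append(np.sqrt(np.mean(resid ** 2)))       # RMS fluctuation F(w)
    return float(np.polyfit(np.log(sizes), np.log(flucts), 1)[0])

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
alpha_noise = dfa_alpha_sketch(noise)             # uncorrelated noise: near 0.5
alpha_walk = dfa_alpha_sketch(np.cumsum(noise))   # random walk: near 1.5
```

Values of alpha near 0.5 indicate uncorrelated noise, and larger values indicate stronger long-range correlation.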

pyampact.speechDescriptorsUtils.compute_dpi(snd, pitch, intensity, intensity_thresh, time_step=0.01, min_pause_dur=0.15)[source]

Compute Duration of Pause Intervals (DPI).

Parameters:
  • snd (Praat Sound Object) – Parselmouth/Praat Sound Object

  • pitch (Praat Pitch Object) – Parselmouth/Praat Pitch Object

  • intensity (Praat Intensity Object) – Parselmouth/Praat Intensity Object

  • intensity_thresh (int) – dB SPL threshold for silence

  • time_step (float) – analysis step in seconds; default = 0.01

  • min_pause_dur (float) – minimum pause duration to count (sec); default = 0.15

Returns:

  • pause_durations (array) – list of pause durations (s)

  • mean_dpi (float) – average pause duration
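The pause-detection logic can be sketched on a plain array of intensity values, without Parselmouth. This example assumes that a pause is a run of consecutive frames below intensity_thresh lasting at least min_pause_dur. It ignores the pitch contour, which the documented function may also consult, and it returns NaN for the mean when no pause is found. The function name is hypothetical.

```python
import numpy as np

def pause_durations_sketch(intensity_db, intensity_thresh, time_step=0.01,
                           min_pause_dur=0.15):
    """Return (pause durations in seconds, mean pause duration)."""
    silent = np.asarray(intensity_db, dtype=float) < intensity_thresh
    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], silent, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)      # first frame of each silent run
    ends = np.flatnonzero(edges == -1)       # one past the last silent frame
    durations = (ends - starts) * time_step
    durations = durations[durations >= min_pause_dur]
    mean_dpi = float(durations.mean()) if durations.size else float("nan")
    return durations, mean_dpi

# 0.10 s speech, 0.20 s silence, 0.10 s speech, 0.05 s silence, 0.05 s speech
contour = np.array([60.0] * 10 + [30.0] * 20 + [60.0] * 10 + [30.0] * 5 + [60.0] * 5)
durations, mean_dpi = pause_durations_sketch(contour, intensity_thresh=45)
```

Only the 0.20 s gap survives here, because the 0.05 s gap is shorter than min_pause_dur.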

pyampact.speechDescriptorsUtils.compute_emd_features(y, max_imfs=6, max_sift=10)[source]

Empirical Mode Decomposition (EMD) features. Returns scalar complexity measures, not raw IMFs.

Parameters:
  • y (1D numpy array) – input signal (e.g. gneVals or waveform)

  • max_imfs (int) – maximum IMFs to extract; default = 6

  • max_sift (int) – max sifting iterations per IMF; default = 10

Returns:

  • emd_n_imfs (int) – number of IMFs extracted

  • emd_energy_ratio (float) – high-frequency IMF energy / total energy

  • emd_entropy (float) – Shannon entropy of IMF energy distribution

pyampact.speechDescriptorsUtils.compute_mfcc(seg, sr, n_mfcc=13, n_fft=2048, hop_length=512)[source]

Compute Mel Frequency Cepstral Coefficients (MFCCs)

Parameters:
  • seg (Praat Sound Object) – segment of Sound cut around onset and offset

  • sr (float) – sample rate of audio

  • n_mfcc (int) – number of coefficients to return; default = 13

  • n_fft (int) – FFT window size; default = 2048

  • hop_length (int) – number of samples the analysis advances between consecutive frames; default = 512

Returns:

mfcc – list of MFCCs

Return type:

JSON Stringified list
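Because the result is a JSON string and not an array, callers will usually decode it before further processing. The round trip below is a sketch that assumes the string is equivalent to json.dumps applied to a nested list of shape (n_mfcc, n_frames). The stand-in matrix replaces a real function call, and the shape is an assumption.

```python
import json
import numpy as np

# Stand-in for an MFCC matrix of shape (n_mfcc, n_frames).
mfcc_matrix = np.arange(26, dtype=float).reshape(13, 2)

# Assumed encoding: a nested list serialized to JSON text.
mfcc_json = json.dumps(mfcc_matrix.tolist())

# Decode the string back into a NumPy array for numerical work.
restored = np.array(json.loads(mfcc_json))
```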

pyampact.speechDescriptorsUtils.compute_ppe(pitch_values_hz)[source]

Compute Pitch Period Entropy (PPE) from a 1D array of pitch values in Hz.

Parameters:

pitch_values_hz (ndarray) – array of pitch values

Returns:

entropy – Pitch Period Entropy (PPE)

Return type:

float
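A simplified entropy-of-pitch sketch illustrates the idea. It converts voiced pitch values to semitones relative to their median, builds a histogram, and normalizes the Shannon entropy by log(n_bins). The published PPE procedure also applies a whitening filter, which is omitted here. The bin count, the median reference, and the function name are assumptions.

```python
import numpy as np

def pitch_entropy_sketch(pitch_values_hz, n_bins=30):
    """Normalized entropy (0..1) of the semitone-scaled pitch distribution."""
    f0 = np.asarray(pitch_values_hz, dtype=float)
    f0 = f0[np.isfinite(f0) & (f0 > 0)]      # drop unvoiced (0 or NaN) frames
    if f0.size < 2:
        return float("nan")
    semitones = 12.0 * np.log2(f0 / np.median(f0))
    counts, _ = np.histogram(semitones, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(n_bins))

rng = np.random.default_rng(0)
steady = pitch_entropy_sketch(np.full(50, 120.0))              # no variation
varied = pitch_entropy_sketch(rng.uniform(100.0, 200.0, 1000)) # wide variation
```

A perfectly steady pitch falls into a single histogram bin and yields zero entropy, while widely varying pitch approaches 1.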

pyampact.speechDescriptorsUtils.compute_recurrence_period_and_rpde(pp, onset, offset, n_bins=50)[source]

Compute recurrence period (RP) and recurrence period density entropy (RPDE)

RP is the mean inter-pulse interval within a segment. RPDE is the normalized entropy of the distribution of recurrence periods.

A low RPDE indicates a highly periodic, stable signal; a high RPDE indicates an irregular, noisy, or chaotic signal.

Parameters:
  • pp (Praat PointProcess Object) – PointProcess (periodic, cc) of the segment

  • onset (float) – segment start in seconds

  • offset (float) – segment end in seconds

  • n_bins (int) – bin count for histogram calculation; default = 50

Returns:

  • rp_ms (float) – recurrence period in ms

  • rpde (float) – normalized entropy of recurrence-period distribution
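The two quantities defined above can be sketched from an array of glottal pulse times in seconds. This example substitutes a NumPy array for the Praat object, and the function name and the NaN fallback for short segments are assumptions.

```python
import numpy as np

def recurrence_sketch(pulse_times, onset, offset, n_bins=50):
    """Return (mean inter-pulse interval in ms, normalized interval entropy)."""
    t = np.asarray(pulse_times, dtype=float)
    t = t[(t >= onset) & (t <= offset)]      # keep pulses inside the segment
    periods = np.diff(t)
    if periods.size < 2:
        return float("nan"), float("nan")
    rp_ms = 1000.0 * float(periods.mean())
    counts, _ = np.histogram(periods, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    rpde = float(-np.sum(p * np.log(p)) / np.log(n_bins))
    return rp_ms, rpde

# Perfectly regular pulses, 2**-7 s apart (exactly representable in binary).
regular = np.arange(64) * 2.0 ** -7
rp_regular, rpde_regular = recurrence_sketch(regular, 0.0, 1.0)

# Irregular pulses with intervals drawn between 5 ms and 15 ms.
rng = np.random.default_rng(0)
jittered = np.cumsum(rng.uniform(0.005, 0.015, 200))
rp_jitter, rpde_jitter = recurrence_sketch(jittered, 0.0, 10.0)
```

The regular train puts every interval in one histogram bin, so its entropy is zero. The jittered train spreads across many bins and scores much higher.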

pyampact.speechDescriptorsUtils.compute_spectral_spread(S, freqs)[source]

Compute spectral spread

Parameters:
  • S (ndarray) – magnitude spectrum

  • freqs (array) – frequency axis in Hz (freq_bins,)

Returns:

spectral spread of the magnitude spectrum around its spectral centroid, in the units of freqs (Hz)
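Spectral spread is commonly defined as the magnitude-weighted standard deviation of frequency around the spectral centroid. The sketch below follows that common definition, which may differ in detail from the pyampact implementation (for example in weighting by power rather than magnitude). The function name and the NaN result for silent frames are assumptions.

```python
import numpy as np

def spectral_spread_sketch(S, freqs):
    """Per-frame spread (Hz) of a magnitude spectrogram S of shape (bins, frames)."""
    S = np.asarray(S, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if S.ndim == 1:
        S = S[:, None]                        # treat a single spectrum as one frame
    total = S.sum(axis=0)
    total = np.where(total > 0, total, np.nan)   # silent frames become NaN
    centroid = (freqs[:, None] * S).sum(axis=0) / total
    variance = ((freqs[:, None] - centroid) ** 2 * S).sum(axis=0) / total
    return np.sqrt(variance)

freqs = np.array([100.0, 200.0, 300.0])
# Frame 0: equal energy at 100 Hz and 300 Hz; frame 1: a single 200 Hz component.
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
spread = spectral_spread_sketch(S, freqs)
```

The first frame has a centroid of 200 Hz with energy 100 Hz to either side, giving a spread of 100 Hz. The single-component frame has zero spread.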

pyampact.speechDescriptorsUtils.compute_tqwt_features(y, sr, Q=1.0, r=3.0, J=8, n_keep=5)[source]

Compute a fixed-width feature vector from the Tunable Q-factor Wavelet Transform (TQWT). Features are derived from subband energies of the TQWT decomposition of a 1-D signal.

Parameters:
  • y (1D numpy array) – input signal

  • sr (float) – sample rate of the signal in Hz. (Currently unused; included for API consistency.)

  • Q (float) – Q-factor controlling oscillatory behavior of the wavelets; default = 1.0

  • r (float) – redundancy factor of the TQWT; default = 3.0

  • J (int) – number of TQWT decomposition levels; default = 8

  • n_keep (int) – number of lowest-index subband log-energies to return; default = 5

Returns:

features – Feature vector containing, in order:

  • tqwt_e1 .. tqwt_e{n_keep} (float) – Log10 subband energies of the first n_keep TQWT bands.

  • tqwt_entropy (float) – Normalized Shannon entropy of the subband energy distribution.

  • tqwt_centroid (float) – Energy-weighted centroid of subband indices.

  • tqwt_low_high_ratio (float) – Ratio of summed low-band energy to high-band energy.

If insufficient valid data are available, all features are returned as NaN.

Return type:

list of float
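The layout of the returned vector can be sketched from a precomputed array of subband energies, leaving the TQWT decomposition itself aside. The function name, the small epsilon guarding the logarithm and the division, the 1-based subband indexing, and the midpoint split between low and high bands are all assumptions made for this example.

```python
import numpy as np

def subband_feature_vector_sketch(energies, n_keep=5):
    """Build [log-energies..., entropy, centroid, low/high ratio] from energies."""
    e = np.asarray(energies, dtype=float)
    n_features = n_keep + 3
    if e.size < 2 or not np.all(np.isfinite(e)) or e.sum() <= 0:
        return [float("nan")] * n_features    # fixed width even without data
    eps = 1e-12                               # hypothetical numerical guard
    log_e = np.log10(e[:n_keep] + eps)
    # Pad with NaN if fewer than n_keep subbands exist, keeping the width fixed.
    log_e = np.pad(log_e, (0, n_keep - log_e.size), constant_values=np.nan)
    p = e / e.sum()
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz)) / np.log(e.size)
    centroid = np.sum(np.arange(1, e.size + 1) * p)   # 1-based subband index
    half = e.size // 2                                # assumed low/high split
    ratio = e[:half].sum() / (e[half:].sum() + eps)
    return [*log_e.tolist(), float(entropy), float(centroid), float(ratio)]

# Nine equal-energy subbands as stand-in input.
features = subband_feature_vector_sketch(np.ones(9))
```

Keeping the width at n_keep + 3 in every case, including the NaN fallback, lets the vectors from many segments be stacked into one feature matrix.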

pyampact.speechDescriptorsUtils.compute_vot(seg, pitch, sr)[source]

Compute Voice Onset Time (VOT)

Parameters:
  • seg (Praat Sound Object) – segment/slice of Snd object between onset and offset

  • pitch (Pitch (CC) object) – pitch object from Praat

  • sr (float) – sample rate of audio

Returns:

vot – voice onset time (in ms)

Return type:

float

pyampact.speechDescriptorsUtils.compute_wt_features(y, wavelet='db4', level=5)[source]

Compute wavelet-based features from a 1-D signal using discrete wavelet transform (DWT).

Features are derived from the normalized energy distribution of wavelet subbands.

Parameters:
  • y (1D numpy array) – input signal

  • wavelet (str) – Wavelet family used for decomposition; default = “db4”

  • level (int) – Decomposition level; default = 5

Returns:

  • wt_entropy (float) – Shannon entropy (base 2) of the normalized wavelet subband energies.

  • wt_low_high_ratio (float) – Ratio of low-frequency subband energy to high-frequency subband energy.

pyampact.speechDescriptorsUtils.feature_spectral_flux(X, f_s)[source]

Calculates spectral flux

Parameters:
  • X (ndarray) – spectrogram

  • f_s (float) – sample rate of audio data

Returns:

vsf – spectral flux

Return type:

float
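Spectral flux measures how much the magnitude spectrum changes from one frame to the next. The sketch below uses a common formulation: the Euclidean norm of the frame-to-frame difference, divided by the number of frequency bins. Whether pyampact uses exactly this normalization is an assumption. The sketch returns one value per frame, and the function name is hypothetical.

```python
import numpy as np

def spectral_flux_sketch(X):
    """Per-frame flux of a magnitude spectrogram X of shape (bins, frames)."""
    X = np.asarray(X, dtype=float)
    # Repeat the first frame so the output has one value per input frame.
    padded = np.concatenate([X[:, :1], X], axis=1)
    delta = np.diff(padded, axis=1)
    return np.sqrt((delta ** 2).sum(axis=0)) / X.shape[0]

# Two bins, three frames: only the last frame differs from its predecessor.
X = np.array([[1.0, 1.0, 3.0],
              [1.0, 1.0, 1.0]])
flux = spectral_flux_sketch(X)
```

The first two frames are identical and produce zero flux. The jump of 2 in one of the two bins at the last frame gives a flux of 1.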