alignmentUtils

dp(M)

Port of Dan Ellis's dp.m dynamic programming routine; MATLAB signature: [p, q, D] = dp(M)

gh(v1, i1, v2, i2, domain[, frac])

Get an element that is frac fraction of the way between v1[i1] and v2[i2], but check bounds on both vectors.

g(vec, idx, domain)

Get an element from vec, checking bounds.

orio_simmx(M, D)

Calculate an Orio & Schwartz-style (Peak Structure Distance) similarity matrix.

maptimes(t, intime, outtime)

Map onset/offset times using a monotone linear interpolation from intime to outtime.

f0_est_weighted_sum(x, f, f0i[, fMax, fThresh])

Calculate F0, power, and spectrum for an input spectral representation.

f0_est_weighted_sum_spec(noteStart_s, ...[, ...])

Calculate F0, power, and spectrum for a single note.

trim_silences(nmat_dict, y, sr[, ...])

Remove or clamp note events in a note matrix that fall outside the active audio region, as determined by an RMS energy threshold.

merge_grace_notes(nmat[, offset])

Merge grace-note sub-parts back into their parent voice and resolve any resulting onset overlaps.

pyampact.alignmentUtils.dp(M)[source]

Port of Dan Ellis's dp.m dynamic programming routine; MATLAB signature: [p, q, D] = dp(M)
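
Dan Ellis's dp.m finds a lowest-cost path through a local-cost matrix by dynamic programming. A minimal NumPy sketch of that recurrence (the name dtw_path and the exact step set are illustrative assumptions, not the pyampact source):

```python
import numpy as np

def dtw_path(M):
    """Minimal DTW sketch over a local-cost matrix M (rows x cols).

    Returns (p, q, D): row indices p and column indices q of the
    lowest-cost path from (0, 0) to (rows-1, cols-1), plus the
    cumulative-cost matrix D. Allowed steps: diagonal, down, right.
    """
    rows, cols = M.shape
    D = np.full((rows + 1, cols + 1), np.inf)
    D[0, 0] = 0.0
    # traceback codes: 0 = diagonal, 1 = advance row, 2 = advance column
    tb = np.zeros((rows, cols), dtype=int)
    for i in range(rows):
        for j in range(cols):
            choices = (D[i, j], D[i, j + 1], D[i + 1, j])
            tb[i, j] = int(np.argmin(choices))
            D[i + 1, j + 1] = M[i, j] + min(choices)
    # trace back from the bottom-right corner
    i, j = rows - 1, cols - 1
    p, q = [i], [j]
    while i > 0 or j > 0:
        step = tb[i, j]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
        p.append(i)
        q.append(j)
    return np.array(p[::-1]), np.array(q[::-1]), D[1:, 1:]
```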

pyampact.alignmentUtils.f0_est_weighted_sum(x, f, f0i, fMax=20000, fThresh=None)[source]

Calculate F0, power, and spectrum for an input spectral representation.

Parameters:
  • x (np.ndarray, shape (F, T)) – Matrix of complex spectrogram values, where F is the number of frequency bins and T is the number of time frames.

  • f (np.ndarray, shape (F, T)) – Matrix of frequencies corresponding to each of the spectrogram values in x.

  • f0i (np.ndarray, shape (1, T)) – Initial F0 estimates, one per time frame.

  • fMax (float, optional) – Maximum frequency to consider in the weighted sum. Defaults to 20000 Hz.

  • fThresh (float, optional) – Maximum distance in Hz from each harmonic to consider. If not specified, no threshold will be applied.

Returns:

  • f0 (np.ndarray) – Vector of estimated F0s from the beginning to the end of the input time series.

  • p (np.ndarray) – Vector of corresponding “powers” derived from the weighted sum of the estimated F0.

  • strips (np.ndarray) – Estimated spectrum for each partial frequency based on the weighted contributions.
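
The weighted-sum idea can be illustrated on a single frame (a toy sketch, not the pyampact implementation: bins near each harmonic of the initial estimate vote for a refined F0, weighted by their energy; the helper name weighted_sum_f0 is hypothetical):

```python
import numpy as np

def weighted_sum_f0(x_mag, f, f0_init, n_harm=5, f_thresh=50.0):
    """Toy single-frame weighted-sum F0 estimate.

    Bins within f_thresh Hz of a harmonic of f0_init contribute,
    weighted by energy; each harmonic is folded back to the fundamental.
    Returns the refined F0 and the total contributing power.
    """
    num = 0.0
    den = 0.0
    power = 0.0
    for h in range(1, n_harm + 1):
        near = np.abs(f - h * f0_init) < f_thresh
        w = x_mag[near] ** 2              # energy weights
        num += np.sum(w * f[near] / h)    # fold harmonic h back to F0
        den += np.sum(w)
        power += np.sum(w)
    return (num / den if den > 0 else 0.0), power
```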

pyampact.alignmentUtils.f0_est_weighted_sum_spec(noteStart_s, noteEnd_s, midiNote, freqs, D, sr, useIf=True)[source]

Calculate F0, power, and spectrum for a single note.

Parameters:
  • noteStart_s (float) – Start position (in seconds) of the note to analyze.

  • noteEnd_s (float) – End position (in seconds) of the note to analyze.

  • midiNote (int) – MIDI note number of the note to analyze.

  • freqs (np.ndarray) – Frequencies corresponding to the spectrogram bins in D.

  • D (np.ndarray) – Spectrogram of the audio signal to analyze.

  • sr (int) – Sample rate of the audio signal.

  • useIf (bool, optional) – If True, use instantaneous frequency; otherwise, use spectrogram bin frequencies. Defaults to True.

Returns:

  • f0 (np.ndarray) – Vector of estimated F0s from noteStart_s to noteEnd_s.

  • p (np.ndarray) – Vector of corresponding “powers” derived from the weighted sum of the estimated F0.

  • M (np.ndarray) – Estimated spectrum for the analyzed note.

pyampact.alignmentUtils.g(vec, idx, domain)[source]

Get an element from vec, checking bounds. Domain is the set of points that vec is a subset of.

Parameters:
  • vec (np.ndarray) – A 1D numpy array representing the input vector.

  • idx (int) – The index of the desired element in vec.

  • domain (np.ndarray) – A 1D numpy array representing the set of valid points, of which vec is a subset.

Returns:

The element from vec at index idx if it is within bounds; otherwise, the first element of domain if idx is less than 0, or the last element of domain if idx exceeds the bounds of vec.

Return type:

float

pyampact.alignmentUtils.gh(v1, i1, v2, i2, domain, frac=0.5)[source]

Get an element that is frac fraction of the way between v1[i1] and v2[i2], but check bounds on both vectors. frac of 0 returns v1[i1], frac of 1 returns v2[i2], frac of 0.5 (the default) returns halfway between them.

Parameters:
  • v1 (np.ndarray) – A 1D numpy array representing the first vector.

  • i1 (int) – The index in v1 from which to retrieve the value.

  • v2 (np.ndarray) – A 1D numpy array representing the second vector.

  • i2 (int) – The index in v2 from which to retrieve the value.

  • domain (tuple) – A tuple representing the valid bounds for both vectors. This should define the minimum and maximum allowable indices.

  • frac (float, optional) – A fraction indicating how far between the two specified elements to interpolate. Default is 0.5.

Returns:

The element that is frac fraction of the way between v1[i1] and v2[i2], clipped to the specified domain bounds.

Return type:

float
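
Together, g and gh behave roughly like the following sketch (not the pyampact source; the clamp-to-domain-endpoints behavior follows the docstrings above):

```python
import numpy as np

def g(vec, idx, domain):
    """Bounds-checked access: an in-range idx returns vec[idx]; idx < 0
    returns domain's first element; idx past the end returns its last."""
    if idx < 0:
        return float(domain[0])
    if idx >= len(vec):
        return float(domain[-1])
    return float(vec[idx])

def gh(v1, i1, v2, i2, domain, frac=0.5):
    """frac=0 gives v1[i1], frac=1 gives v2[i2], frac=0.5 the midpoint."""
    a = g(v1, i1, domain)
    b = g(v2, i2, domain)
    return a + frac * (b - a)
```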

pyampact.alignmentUtils.maptimes(t, intime, outtime)[source]

Map onset/offset times using a monotone linear interpolation from intime to outtime. Handles duplicate intime entries (DTW path repeats) by averaging their outtime.
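
The duplicate-averaging plus interpolation described above can be sketched as follows (the name maptimes_sketch is hypothetical; this is not the pyampact source):

```python
import numpy as np

def maptimes_sketch(t, intime, outtime):
    """Average outtime over duplicate intime entries (DTW path repeats),
    then map t by monotone linear interpolation."""
    intime = np.asarray(intime, dtype=float)
    outtime = np.asarray(outtime, dtype=float)
    # collapse duplicate intime entries by averaging their outtime
    uniq, inverse = np.unique(intime, return_inverse=True)
    sums = np.zeros_like(uniq)
    counts = np.zeros_like(uniq)
    np.add.at(sums, inverse, outtime)
    np.add.at(counts, inverse, 1.0)
    return np.interp(np.asarray(t, dtype=float), uniq, sums / counts)
```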

pyampact.alignmentUtils.merge_grace_notes(nmat, offset=0.025)[source]

Merge grace-note sub-parts back into their parent voice and resolve any resulting onset overlaps.

Parameters:
  • nmat (dict of str → pd.DataFrame) – Note matrix dictionary keyed by part name.

  • offset (float, optional) – Time in seconds added to the ONSET_SEC and OFFSET_SEC of every grace-note sub-part before merging. Default is 0.025.

Returns:

nmat – The updated note matrix with all grace-note sub-parts folded into their respective base parts and removed as separate keys.

Return type:

dict of str → pd.DataFrame
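
The merging step can be sketched as below. The "-grace" key suffix and the presence of the base part's key are assumptions for illustration, and overlap resolution is simplified to a sort by onset; this is not the pyampact source:

```python
import pandas as pd

def merge_grace_notes_sketch(nmat, offset=0.025, grace_suffix="-grace"):
    """Fold each part keyed '<base><grace_suffix>' into its base part,
    shifting grace onsets/offsets by `offset` seconds, then re-sort."""
    merged = dict(nmat)
    for key in [k for k in merged if k.endswith(grace_suffix)]:
        base = key[: -len(grace_suffix)]   # assumes the base part exists
        sub = merged.pop(key).copy()
        sub["ONSET_SEC"] += offset
        sub["OFFSET_SEC"] += offset
        merged[base] = (
            pd.concat([merged[base], sub])
            .sort_values("ONSET_SEC")
            .reset_index(drop=True)
        )
    return merged
```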

pyampact.alignmentUtils.orio_simmx(M, D)[source]

Calculate an Orio & Schwartz-style (Peak Structure Distance) similarity matrix.

Parameters:
  • M (np.ndarray) – A binary mask where each column corresponds to a row in the output similarity matrix S. The mask indicates the presence or absence of MIDI notes or relevant features.

  • D (np.ndarray) – The regular spectrogram where the columns of the similarity matrix S correspond to the columns of D. This spectrogram represents the audio signal over time and frequency.

Returns:

The similarity matrix S, calculated based on the Peak Structure Distance between the binary mask M and the spectrogram D.

Return type:

np.ndarray
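
One plausible formulation of the Peak Structure Distance (a hedged sketch, not necessarily the exact pyampact computation): score each frame of D against each mask column of M by the fraction of the frame's spectral energy falling under the mask:

```python
import numpy as np

def orio_simmx_sketch(M, D):
    """S[i, j] = sqrt of the fraction of frame j's energy in |D|^2
    captured by binary mask column i of M."""
    D2 = np.abs(D) ** 2
    frame_energy = D2.sum(axis=0)
    frame_energy[frame_energy == 0] = 1e-12   # avoid division by zero
    S = (M.T @ D2) / frame_energy             # rows: mask columns of M
    return np.sqrt(S)
```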

pyampact.alignmentUtils.trim_silences(nmat_dict, y, sr, rms_thresh_db=-40.0, pad=0.25)[source]

Remove or clamp note events in a note matrix that fall outside the active audio region, as determined by an RMS energy threshold.

Parameters:
  • nmat_dict (dict of str → pd.DataFrame) – Note matrix dictionary keyed by part name.

  • y (np.ndarray) – Audio time series at sample rate sr.

  • sr (int) – Sample rate of y in Hz.

  • rms_thresh_db (float, optional) – RMS energy threshold in dBFS below which frames are considered silent. Default is -40.0.

  • pad (float, optional) – Reserved for future use. Currently unused. Default is 0.25.

Returns:

nmat_dict – The input dictionary with each part’s DataFrame trimmed in-place to the active audio window.

Return type:

dict of str → pd.DataFrame
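
The silence detection underlying this can be sketched as a frame-wise RMS scan (the helper name active_region and the frame/hop sizes are illustrative assumptions, not the pyampact source):

```python
import numpy as np

def active_region(y, sr, rms_thresh_db=-40.0, frame=2048, hop=512):
    """Return (start_s, end_s) of the first and last frames whose RMS
    exceeds rms_thresh_db (dB relative to full scale)."""
    thresh = 10.0 ** (rms_thresh_db / 20.0)
    times = []
    for start in range(0, max(len(y) - frame, 1), hop):
        rms = np.sqrt(np.mean(y[start:start + frame] ** 2))
        if rms > thresh:
            times.append(start / sr)
    if not times:
        return 0.0, 0.0
    return times[0], times[-1] + frame / sr
```

Note events whose onsets fall outside this window would then be clamped or dropped.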