alignmentUtils
dp(M) | Port of Dan Ellis dp.m [p,q,D] = dp(M)
gh(v1, i1, v2, i2, domain, frac=0.5) | Get an element that is frac fraction of the way between v1[i1] and v2[i2], but check bounds on both vectors.
g(vec, idx, domain) | Get an element from vec, checking bounds.
orio_simmx(M, D) | Calculate an Orio&Schwartz-style (Peak Structure Distance) similarity matrix.
maptimes(t, intime, outtime) | Map onset/offset times using a monotone linear interpolation from intime to outtime.
f0_est_weighted_sum(x, f, f0i, fMax=20000, fThresh=None) | Calculate F0, power, and spectrum for an inputted spectral representation.
f0_est_weighted_sum_spec(noteStart_s, noteEnd_s, midiNote, freqs, D, sr, useIf=True) | Calculate F0, power, and spectrum for a single note.
trim_silences(nmat_dict, y, sr, rms_thresh_db=-40.0, pad=0.25) | Remove or clamp note events in a note matrix that fall outside the active audio region, as determined by an RMS energy threshold.
merge_grace_notes(nmat, offset=0.025) | Merge grace-note sub-parts back into their parent voice and resolve any resulting onset overlaps.
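- pyampact.alignmentUtils.dp(M)
Port of Dan Ellis's dp.m, [p, q, D] = dp(M). Uses dynamic programming to find a minimum-cost path through the cost matrix M and returns the path's row indices p, its column indices q, and the cumulative cost matrix D.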
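The idea behind a minimum-cost path search can be sketched in plain Python. This is only an illustration, not pyampact's implementation: the name dp_sketch is hypothetical, M is taken as a list of lists, and the allowed steps (diagonal, down, right, all with unit weight) are an assumption.

```python
def dp_sketch(M):
    """Min-cost path through a 2D cost matrix M (list of lists).

    Hypothetical helper, not pyampact's dp(). Assumes the path may step
    diagonally, down, or right, each with unit weight.
    Returns (p, q, D): row indices, column indices, cumulative costs.
    """
    rows, cols = len(M), len(M[0])
    inf = float("inf")
    # Extra border row/column so the recurrence needs no special cases.
    D = [[inf] * (cols + 1) for _ in range(rows + 1)]
    D[0][0] = 0.0
    # step[i][j] records the move that reached (i, j): 0=diag, 1=down, 2=right.
    step = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            candidates = (D[i][j], D[i][j + 1], D[i + 1][j])
            best = min(range(3), key=lambda k: candidates[k])
            step[i][j] = best
            D[i + 1][j + 1] = M[i][j] + candidates[best]
    # Trace back from the bottom-right corner to the top-left corner.
    i, j = rows - 1, cols - 1
    p, q = [i], [j]
    while i > 0 or j > 0:
        move = step[i][j]
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
        p.insert(0, i)
        q.insert(0, j)
    # Drop the border so D has the same shape as M.
    D = [row[1:] for row in D[1:]]
    return p, q, D


# Usage: a cost matrix whose cheapest cells lie on the diagonal.
p, q, D = dp_sketch([[1, 9, 9], [9, 1, 9], [9, 9, 1]])
```

In an alignment setting, M would typically be a dissimilarity matrix between score frames and audio frames, and (p, q) pairs up the frames that the path matches.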
- pyampact.alignmentUtils.f0_est_weighted_sum(x, f, f0i, fMax=20000, fThresh=None)
Calculate F0, power, and spectrum for an inputted spectral representation.
- Parameters:
x (np.ndarray, shape (F, T)) – Matrix of complex spectrogram values, where F is the number of frequency bins and T is the number of time frames.
f (np.ndarray, shape (F, T)) – Matrix of frequencies corresponding to each of the spectrogram values in x.
f0i (np.ndarray, shape (1, T)) – Initial estimates of F0 for each time frame. This should be a 1D array containing the F0 estimates for each time point.
fMax (float, optional) – Maximum frequency to consider in the weighted sum. Defaults to 20000 Hz.
fThresh (float, optional) – Maximum distance in Hz from each harmonic to consider. If not specified, no threshold will be applied.
- Returns:
f0 (np.ndarray) – Vector of estimated F0s from the beginning to the end of the input time series.
p (np.ndarray) – Vector of corresponding “powers” derived from the weighted sum of the estimated F0.
strips (np.ndarray) – Estimated spectrum for each partial frequency based on the weighted contributions.
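A weighted-sum F0 estimate can be illustrated for a single time frame. The sketch below is not pyampact's code: the function name, the single-frame restriction, and the use of squared magnitude as the weight are assumptions. Each bin that lies near the h-th harmonic of the initial estimate votes for the frequency freq / h, weighted by its power.

```python
def f0_weighted_sum_frame(mags, freqs, f0_init, f_max=20000.0, f_thresh=None):
    """Refine one frame's F0 estimate from bins near its harmonics.

    Hypothetical single-frame helper, not pyampact's f0_est_weighted_sum().
    mags, freqs: per-bin (possibly complex) values and frequencies in Hz.
    Returns (f0, power); both are 0.0 when nothing contributes.
    """
    if f0_init <= 0:
        return 0.0, 0.0
    num = 0.0
    den = 0.0
    n_harmonics = int(f_max // f0_init)
    for h in range(1, n_harmonics + 1):
        target = h * f0_init
        for mag, freq in zip(mags, freqs):
            if freq >= f_max:
                continue  # ignore bins above the frequency ceiling
            if f_thresh is not None and abs(freq - target) >= f_thresh:
                continue  # bin is too far from this harmonic
            power = abs(mag) ** 2
            num += power * freq / h  # this bin's vote for the fundamental
            den += power
    if den == 0:
        return 0.0, 0.0
    return num / den, den


# Usage: three partials close to multiples of 220 Hz plus one unrelated bin.
f0, power = f0_weighted_sum_frame(
    mags=[1.0, 1.0, 1.0, 5.0],
    freqs=[221.0, 440.0, 663.0, 1000.0],
    f0_init=220.0,
    f_max=5000.0,
    f_thresh=10.0,
)
```

Passing a threshold matters in this sketch: with f_thresh=None every bin contributes to every harmonic, so the result is no longer tied to the initial estimate.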
- pyampact.alignmentUtils.f0_est_weighted_sum_spec(noteStart_s, noteEnd_s, midiNote, freqs, D, sr, useIf=True)
Calculate F0, power, and spectrum for a single note.
- Parameters:
noteStart_s (float) – Start position (in seconds) of the note to analyze.
noteEnd_s (float) – End position (in seconds) of the note to analyze.
midiNote (int) – MIDI note number of the note to analyze.
freqs (np.ndarray) – Frequencies corresponding to the values in D.
D (np.ndarray) – Spectral representation of the audio to analyze.
sr (int) – Sample rate of the audio signal.
useIf (bool, optional) – If true, use instantaneous frequency; otherwise, use spectrogram frequencies. Defaults to True.
- Returns:
f0 (np.ndarray) – Vector of estimated F0s from noteStart_s to noteEnd_s.
p (np.ndarray) – Vector of corresponding “powers” derived from the weighted sum of the estimated F0.
M (np.ndarray) – Estimated spectrum for the analyzed note.
- pyampact.alignmentUtils.g(vec, idx, domain)
Get an element from vec, checking bounds. Domain is the set of points that vec is a subset of.
- Parameters:
vec (np.ndarray) – A 1D numpy array representing the input vector.
idx (int) – The index of the desired element in vec.
domain (np.ndarray) – A 1D numpy array representing the set of valid points, of which vec is a subset.
- Returns:
The element from vec at index idx if it is within bounds; otherwise, the first element of domain if idx is less than 0, or the last element of domain if idx exceeds the bounds of vec.
- Return type:
float
- pyampact.alignmentUtils.gh(v1, i1, v2, i2, domain, frac=0.5)
Get an element that is frac fraction of the way between v1[i1] and v2[i2], but check bounds on both vectors. frac of 0 returns v1[i1], frac of 1 returns v2[i2], frac of 0.5 (the default) returns halfway between them.
- Parameters:
v1 (np.ndarray) – A 1D numpy array representing the first vector.
i1 (int) – The index in v1 from which to retrieve the value.
v2 (np.ndarray) – A 1D numpy array representing the second vector.
i2 (int) – The index in v2 from which to retrieve the value.
domain (tuple) – A tuple representing the valid bounds for both vectors. This should define the minimum and maximum allowable indices.
frac (float, optional) – A fraction indicating how far between the two specified elements to interpolate. Default is 0.5.
- Returns:
The element that is frac fraction of the way between v1[i1] and v2[i2], clipped to the specified domain bounds.
- Return type:
float
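The behaviour described for g and gh can be sketched as follows. The names g_sketch and gh_sketch are hypothetical, plain lists stand in for NumPy arrays, and the sketch assumes that the clipping comes only from substituting the first or last element of domain for an out-of-range index.

```python
def g_sketch(vec, idx, domain):
    """Bounds-checked lookup (hypothetical stand-in for g).

    Returns vec[idx] when idx is valid, domain[0] when idx is negative,
    and domain[-1] when idx runs past the end of vec.
    """
    if idx < 0:
        return domain[0]
    if idx >= len(vec):
        return domain[-1]
    return vec[idx]


def gh_sketch(v1, i1, v2, i2, domain, frac=0.5):
    """Point that lies frac of the way from v1[i1] to v2[i2] (stand-in for gh)."""
    x1 = g_sketch(v1, i1, domain)
    x2 = g_sketch(v2, i2, domain)
    return (1 - frac) * x1 + frac * x2


# Usage: midpoint between one note's offset and the next note's onset.
onsets = [1.0, 2.0]
offsets = [1.5, 2.5]
domain = [0.0, 3.0]
boundary = gh_sketch(offsets, 0, onsets, 1, domain)
# Index -1 is out of range, so the start of the domain is used instead.
leading_edge = gh_sketch(offsets, -1, onsets, 0, domain)
```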
- pyampact.alignmentUtils.maptimes(t, intime, outtime)
Map onset/offset times using a monotone linear interpolation from intime to outtime. Handles duplicate intime entries (DTW path repeats) by averaging their outtime.
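A minimal sketch of this kind of mapping, written with the standard library only. The name maptimes_sketch is hypothetical, and clamping values outside the range of intime to the nearest endpoint is an assumption rather than documented behaviour.

```python
from bisect import bisect_right
from collections import defaultdict


def maptimes_sketch(t, intime, outtime):
    """Map each value in t from the intime axis to the outtime axis.

    Hypothetical stand-in for maptimes. Repeated intime values are
    collapsed to the mean of their outtime values before interpolating.
    """
    groups = defaultdict(list)
    for x, y in zip(intime, outtime):
        groups[x].append(y)
    xs = sorted(groups)
    ys = [sum(groups[x]) / len(groups[x]) for x in xs]

    def interp(value):
        # Outside the known range, clamp to the nearest endpoint (assumption).
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        k = bisect_right(xs, value)
        x0, x1 = xs[k - 1], xs[k]
        y0, y1 = ys[k - 1], ys[k]
        return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

    return [interp(v) for v in t]


# Usage: intime 1.0 appears twice, as it would on a DTW path that repeats a frame.
mapped = maptimes_sketch(
    t=[0.5, 1.0, 1.5, 3.0],
    intime=[0.0, 1.0, 1.0, 2.0],
    outtime=[0.0, 1.0, 2.0, 4.0],
)
```

Averaging the duplicates keeps the mapping single-valued, which linear interpolation requires.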
- pyampact.alignmentUtils.merge_grace_notes(nmat, offset=0.025)
Merge grace-note sub-parts back into their parent voice and resolve any resulting onset overlaps.
- Parameters:
nmat (dict of str → pd.DataFrame) – Note matrix dictionary keyed by part name.
offset (float, optional) – Time in seconds added to the ONSET_SEC and OFFSET_SEC of every grace-note sub-part before merging. Default is 0.025.
- Returns:
nmat – The updated note matrix with all grace-note sub-parts folded into their respective base parts and removed as separate keys.
- Return type:
dict of str → pd.DataFrame
- pyampact.alignmentUtils.orio_simmx(M, D)
Calculate an Orio&Schwartz-style (Peak Structure Distance) similarity matrix.
- Parameters:
M (np.ndarray) – A binary mask where each column corresponds to a row in the output similarity matrix S. The mask indicates the presence or absence of MIDI notes or relevant features.
D (np.ndarray) – The regular spectrogram where the columns of the similarity matrix S correspond to the columns of D. This spectrogram represents the audio signal over time and frequency.
- Returns:
The similarity matrix S, calculated based on the Peak Structure Distance between the binary mask M and the spectrogram D.
- Return type:
np.ndarray
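One way to read the shapes involved is the sketch below. It is not pyampact's code: the name is hypothetical, nested lists stand in for arrays, and the scoring rule is an assumption (the square root of the share of a frame's energy that falls in the bins selected by a mask column).

```python
import math


def orio_simmx_sketch(M, D):
    """Similarity between mask columns and spectrogram frames.

    Hypothetical stand-in for orio_simmx.
    M: F x N mask (list of F rows), D: F x T magnitudes (list of F rows).
    Returns S as an N x T list of lists.
    """
    n_bins = len(M)
    n_masks = len(M[0])
    n_frames = len(D[0])
    S = [[0.0] * n_frames for _ in range(n_masks)]
    for c in range(n_frames):
        frame_power = [D[f][c] ** 2 for f in range(n_bins)]
        total = sum(frame_power)
        if total == 0:
            continue  # silent frame: leave the whole column at zero
        for r in range(n_masks):
            masked = sum((M[f][r] ** 2) * frame_power[f] for f in range(n_bins))
            S[r][c] = math.sqrt(masked / total)
    return S


# Usage: three frequency bins, two mask columns, two spectrogram frames.
M = [[1, 0], [0, 1], [0, 0]]
D = [[3.0, 0.0], [4.0, 0.0], [0.0, 0.0]]
S = orio_simmx_sketch(M, D)
```

Row r of S corresponds to column r of M, and column c of S corresponds to column c of D, as the parameter descriptions state.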
- pyampact.alignmentUtils.trim_silences(nmat_dict, y, sr, rms_thresh_db=-40.0, pad=0.25)
Remove or clamp note events in a note matrix that fall outside the active audio region, as determined by an RMS energy threshold.
- Parameters:
nmat_dict (dict of str → pd.DataFrame) – Note matrix dictionary keyed by part name.
y (np.ndarray) – Audio time series at sample rate sr.
sr (int) – Sample rate of y in Hz.
rms_thresh_db (float, optional) – RMS energy threshold in dBFS below which frames are considered silent. Default is -40.0.
pad (float, optional) – Reserved for future use. Currently unused. Default is 0.25.
- Returns:
nmat_dict – The input dictionary with each part's DataFrame trimmed in place to the active audio window.
- Return type:
dict of str → pd.DataFrame
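The thresholding step can be sketched without any audio library. Everything here is illustrative: the function names, the frame_len and hop values, and the convention that 0 dBFS corresponds to an amplitude of 1.0 are assumptions, and a plain list of samples stands in for y.

```python
import math


def active_region(y, sr, rms_thresh_db=-40.0, frame_len=2048, hop=512):
    """Return (start_s, end_s) spanning the frames above the RMS threshold.

    Hypothetical helper, not part of pyampact. Returns None when every
    frame is at or below the threshold.
    """
    active = []
    for start in range(0, max(len(y) - frame_len, 0) + 1, hop):
        frame = y[start:start + frame_len]
        if not frame:
            continue
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > rms_thresh_db:
            active.append(start)
    if not active:
        return None
    return active[0] / sr, min(active[-1] + frame_len, len(y)) / sr


def clamp_note(onset, offset, region):
    """Clamp a note to the active region, or return None if it lies outside."""
    start, end = region
    if offset <= start or onset >= end:
        return None
    return max(onset, start), min(offset, end)


# Usage: half a second of silence, one second of signal, half a second of silence.
sr = 1000
y = [0.0] * 500 + [0.5] * 1000 + [0.0] * 500
region = active_region(y, sr, frame_len=100, hop=50)
kept = clamp_note(0.2, 0.6, region)
dropped = clamp_note(0.0, 0.3, region)
```

A note that straddles the boundary is shortened to the active region, while a note that lies entirely in silence is dropped, matching the "remove or clamp" wording above.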