alignment

pyampact.alignment.align_midi_wav(piece, WF, sr, TH, ST, width, tsr, nhar, wms, hop, showSpec)

Align a MIDI file to a WAV file using the “peak structure distance” of Orio et al., which uses the MIDI notes to build a mask that is compared against the harmonics in the audio.

Parameters:
  • piece (Score) – A Score instance containing the symbolic MIDI data.

  • WF (ndarray) – Audio time series of the WAV file.

  • sr (int) – Sampling rate of the audio file.

  • TH (float) – Time step resolution, typically in seconds (default is 0.050).

  • ST (int) – Similarity type; 0 (default) uses the triangle inequality.

  • width (float) – Width of the mask for the analysis.

  • tsr (int) – Target sample rate for resampling the audio (if needed).

  • nhar (int) – Number of harmonics to include in the mask.

  • wms (float) – Window size in milliseconds.

  • hop (int) – Hop size for the analysis window.

  • showSpec (bool) – If True, displays the spectrogram.

Returns:

  • m (ndarray) – The map such that M[:,m] corresponds to the alignment.

  • path (tuple of ndarrays) – [p, q], the path from dynamic programming (DP) that aligns the MIDI and audio.

  • S (ndarray) – The similarity matrix used for alignment.

  • D (ndarray) – The spectrogram of the audio.

  • M (ndarray) – The MIDI-note-derived mask, including harmonic information if available.
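To make the mask idea concrete, here is a minimal NumPy sketch of how a note list could be turned into a harmonic mask and compared against a spectrogram column by column. It is not pyampact's implementation: the helper names (`midi_to_hz`, `harmonic_mask`, `cosine_similarity`), the note-tuple layout, the interpretation of `width` as a half-width in frequency bins, and the use of cosine similarity in place of the library's similarity types are all assumptions made for illustration.

```python
import numpy as np

def midi_to_hz(note):
    """Convert a MIDI note number to Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def harmonic_mask(notes, n_frames, n_bins, sr, tres, nhar=3, width=3):
    """Hypothetical helper: build an (n_bins, n_frames) binary mask.

    notes  : iterable of (onset_s, offset_s, midi_note) tuples (assumed layout)
    tres   : duration of one mask frame in seconds
    width  : half-width of each harmonic band in bins (an assumption)
    """
    mask = np.zeros((n_bins, n_frames))
    bin_hz = (sr / 2.0) / (n_bins - 1)  # bins linearly spaced from 0 Hz to Nyquist
    for onset, offset, midi in notes:
        start = int(round(onset / tres))
        stop = min(int(round(offset / tres)), n_frames)
        f0 = midi_to_hz(midi)
        for h in range(1, nhar + 1):
            centre = int(round(h * f0 / bin_hz))
            if centre >= n_bins:
                break  # harmonic lies above Nyquist
            lo = max(centre - width, 0)
            hi = min(centre + width + 1, n_bins)
            mask[lo:hi, start:stop] = 1.0
    return mask

def cosine_similarity(M, D):
    """Cosine similarity between every mask column and every spectrogram column."""
    Mn = M / np.maximum(np.linalg.norm(M, axis=0, keepdims=True), 1e-12)
    Dn = D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return Mn.T @ Dn

# Two half-second notes (A4 then C5) on a 50 ms grid.
notes = [(0.0, 0.5, 69), (0.5, 1.0, 72)]
M = harmonic_mask(notes, n_frames=20, n_bins=201, sr=4000, tres=0.05)
# Compare the mask with itself as a stand-in for a real spectrogram D.
S = cosine_similarity(M, M)
```

With a real spectrogram in place of the second argument, `S` plays the role of the similarity matrix through which the alignment path is later traced.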

pyampact.alignment.alignment_visualiser(audio_spec, times=None, freqs=None, fig=1, showSpec=True)

Visualizes the dynamic time warping (DTW) alignment.

Parameters:
  • audio_spec (ndarray) – Spectrogram of the audio file to be visualized.

  • times (ndarray, optional) – Array of segment times corresponding to the audio spectrogram. If not provided, defaults to None.

  • freqs (ndarray, optional) – Array of sample frequencies corresponding to the audio spectrogram. If not provided, defaults to None.

  • fig (int, optional) – Figure number for the plot. Default is 1.

  • showSpec (bool, optional) – If True, displays the spectrogram overlaid with the alignment information. Default is True.

Returns:

The visualized spectrogram plot with DTW alignment overlays.

Return type:

matplotlib.figure.Figure

pyampact.alignment.get_ons_offs(onsoffs)

Extract onset and offset times from a 3*N alignment matrix generated by AMPACT’s HMM-based alignment algorithm.

Parameters:

onsoffs (ndarray) – A 3*N alignment matrix where:
  - The first row contains N states.
  - The second row contains the corresponding ending times for each state.
  - The third row contains the state indices.

Returns:

res – A dictionary containing:
  - ‘ons’: List of onset times.
  - ‘offs’: List of offset times.

Return type:

dict
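The sketch below illustrates one way onsets and offsets could be read from such a matrix. It assumes that state index 3 marks the steady-state portion of a note, so that the ending time of the preceding state is the onset and the ending time of the steady state is the offset. That convention is not stated on this page, and `extract_ons_offs_sketch` is a hypothetical name, not the library function.

```python
import numpy as np

def extract_ons_offs_sketch(onsoffs, steady_state=3):
    """Hypothetical re-implementation for illustration only.

    Row 1 of `onsoffs` holds state ending times, row 2 holds state indices.
    Assumes `steady_state` labels the sustained part of each note.
    """
    onsoffs = np.asarray(onsoffs, dtype=float)
    stopping = np.where(onsoffs[2] == steady_state)[0]
    stopping = stopping[stopping > 0]   # a steady state needs a predecessor
    starting = stopping - 1             # state just before the steady state
    return {
        "ons": onsoffs[1, starting].tolist(),   # where the steady state begins
        "offs": onsoffs[1, stopping].tolist(),  # where the steady state ends
    }

# Two notes, each passing through states 1 -> 2 -> 3.
example = np.array([
    [1, 2, 3, 4, 5, 6],                      # state counter
    [0.10, 0.15, 0.60, 0.70, 0.74, 1.20],    # ending time of each state (s)
    [1, 2, 3, 1, 2, 3],                      # state indices
])
res = extract_ons_offs_sketch(example)
```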

pyampact.alignment.ifgram(audiofile, tsr, win_ms, showSpec=False)

Compute the instantaneous frequency (IF) spectrogram of an audio file using the reassigned spectrogram and Short-Time Fourier Transform (STFT).

Parameters:
  • audiofile (str) – Path to the audio file to be analyzed.

  • tsr (int) – Target sample rate of the audio signal.

  • win_ms (float) – Window size in milliseconds for spectral analysis.

  • showSpec (bool, optional) – If True, displays the reassigned spectrogram. Default is False.

Returns:

  • freqs (ndarray) – Reassigned frequency bins of the spectrogram.

  • times (ndarray) – Time frames corresponding to the spectrogram.

  • mags (ndarray) – Magnitudes of the reassigned spectrogram.

  • f0_values (ndarray) – Fundamental frequency estimates for each time frame.

  • mags_mat (ndarray) – Magnitude matrix from the Short-Time Fourier Transform (STFT).
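As a rough illustration of how `tsr` and `win_ms` interact, the following sketch converts a window length in milliseconds to samples and computes a plain STFT magnitude matrix with NumPy. It does not perform frequency reassignment and is not the library's code; `stft_magnitude`, the Hann window, the hop of 32 samples, and the peak-picking f0 estimate are assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(y, sr, win_ms, hop):
    """Hypothetical helper: plain (non-reassigned) STFT magnitudes."""
    n_win = int(round(sr * win_ms / 1000.0))   # window length in samples
    window = np.hanning(n_win)
    n_frames = 1 + (len(y) - n_win) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_win] * window for i in range(n_frames)],
        axis=1,
    )
    mags = np.abs(np.fft.rfft(frames, axis=0))         # (n_win // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(n_win, d=1.0 / sr)         # bin centre frequencies (Hz)
    times = (np.arange(n_frames) * hop + n_win / 2.0) / sr  # frame centres (s)
    return freqs, times, mags

# One second of a 440 Hz sine at a 4000 Hz sample rate, 100 ms windows.
sr = 4000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
freqs, times, mags = stft_magnitude(y, sr=sr, win_ms=100, hop=32)

# Crude per-frame f0 estimate: the frequency of the strongest bin.
f0_est = freqs[np.argmax(mags, axis=0)]
```

A 100 ms window at 4000 Hz spans 400 samples, giving 10 Hz bin spacing. A reassigned spectrogram refines these bin frequencies per frame, which this sketch leaves out.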

pyampact.alignment.run_DTW_alignment(y, original_sr, piece, tres, width, target_sr, nharm, win_ms, hop, nmat, showSpec)

Perform a dynamic time warping (DTW) alignment between an audio file and its corresponding MIDI file.

This function returns the aligned onset and offset times with corresponding MIDI note numbers, as well as the spectrogram of the audio and other DTW-related data.

Parameters:
  • y (ndarray) – Audio time series of the file.

  • original_sr (int) – Original sample rate of the audio file.

  • piece (Score) – A Score instance containing the symbolic (MIDI) data.

  • tres (float) – Time resolution for MIDI-to-spectrum information conversion.

  • width (float) – Width parameter for the DTW alignment.

  • target_sr (int) – Target sample rate for resampling the audio (if needed).

  • nharm (int) – Number of harmonics to include in the analysis.

  • win_ms (float) – Window size in milliseconds for analysis.

  • hop (int) – Number of samples between successive frames for analysis.

  • nmat (DataFrame) – DataFrame containing note matrix (nmat) data before alignment.

  • showSpec (bool) – If True, displays the spectrogram of the audio file.

Returns:

  • align (dict) – MIDI-audio alignment structure from DTW containing:
      - ‘on’: Onset times of the notes.
      - ‘off’: Offset times of the notes.
      - ‘midiNote’: MIDI note numbers corresponding to the aligned notes.

  • spec (ndarray) – Spectrogram of the audio file.

  • dtw (dict) – A dictionary of DTW returns, including:
      - ‘M’: The map such that M[:,m] corresponds to the alignment.
      - ‘MA/RA’: Path from dynamic programming (DP) for MIDI-audio alignment.
      - ‘S’: Similarity matrix used in the alignment process.
      - ‘D’: Spectrogram of the audio.
      - ‘notemask’: The MIDI-note-derived mask used in the alignment.
      - ‘pianoroll’: MIDI-note-derived piano roll.

  • nmat (DataFrame) – Updated DataFrame containing the note matrix (nmat) data after alignment.
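For readers unfamiliar with DTW, the following is a minimal textbook sketch of the dynamic-programming step: it accumulates cost over a similarity matrix and backtracks to recover a path [p, q]. It is a generic illustration, not the routine pyampact uses; the function name `dtw_path`, the cost definition `1 - S`, and the three-step pattern (diagonal, vertical, horizontal) are assumptions.

```python
import numpy as np

def dtw_path(S):
    """Hypothetical helper: lowest-cost path through the cost matrix 1 - S.

    Rows index score (MIDI) frames, columns index audio frames.
    Returns index arrays (p, q) of equal length.
    """
    C = 1.0 - np.asarray(S, dtype=float)
    n, m = C.shape
    acc = np.full((n + 1, m + 1), np.inf)   # padded accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = C[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to the first.
    i, j = n, m
    p, q = [i - 1], [j - 1]
    while i > 1 or j > 1:
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),  # diagonal
            (acc[i - 1, j], i - 1, j),          # advance score only
            (acc[i, j - 1], i, j - 1),          # advance audio only
        ]
        _, i, j = min(steps, key=lambda s: s[0])
        p.append(i - 1)
        q.append(j - 1)
    return np.array(p[::-1]), np.array(q[::-1])

# Three score frames against five audio frames; the first and last
# score frames each last two audio frames.
S = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
p, q = dtw_path(S)
```

In this toy case the recovered path pairs score frame 0 with audio frames 0 and 1, score frame 1 with audio frame 2, and score frame 2 with audio frames 3 and 4.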

pyampact.alignment.run_alignment(y, original_sr, piece, nmat, width=3, target_sr=4000, nharm=3, win_ms=100, hop=32, showSpec=False)

Parameters:
  • y (ndarray) – Audio time series.

  • original_sr (int) – Original sample rate of the audio file.

  • piece (Score) – A Score instance containing the symbolic MIDI data.

  • nmat (DataFrame) – DataFrame containing note matrix (nmat) data before alignment.

  • means (ndarray) – Mean values for each state in the alignment process.

  • covars (ndarray) – Covariance values for each state in the alignment process.

  • width (float) – Width parameter for the DTW alignment. Default is 3.

  • target_sr (int) – Target sample rate for resampling the audio (if needed). Default is 4000.

  • nharm (int) – Number of harmonics to include in the analysis. Default is 3.

  • win_ms (float) – Window size in milliseconds for the analysis. Default is 100.

  • hop (int) – Number of samples between successive frames. Default is 32.

  • showSpec (bool) – If True, displays the spectrogram of the audio. Default is False.

Returns:

  • align (dict) – MIDI-audio alignment structure from DTW containing:
      - ‘on’: Onset times of the notes.
      - ‘off’: Offset times of the notes.
      - ‘midiNote’: MIDI note numbers corresponding to the aligned notes.

  • dtw (dict) – A dictionary of DTW returns, including:
      - ‘M’: The map such that M[:,m] corresponds to the alignment.
      - ‘MA’: Path from dynamic programming (DP) for MIDI-audio alignment.
      - ‘RA’: Path from DP for real audio alignment.
      - ‘S’: Similarity matrix used in the alignment process.
      - ‘D’: Spectrogram of the audio.
      - ‘notemask’: The MIDI-note-derived mask used in the alignment.

  • spec (ndarray) – Spectrogram of the audio file.

  • newNmat (DataFrame) – Updated DataFrame containing the note matrix (nmat) data after alignment.

Notes

This function uses DTW to align MIDI note information with the audio time series. It computes onset and offset times and updates the alignment using a similarity matrix. Optionally, it can display the audio spectrogram for visual analysis.
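One way to picture how onset and offset times follow from a DTW path is to interpolate score times through the path, as in the sketch below. This is only an illustration under stated assumptions: `warp_times` is a hypothetical helper, the score grid of 0.05 s per frame is assumed, the audio frame duration is taken as `hop / target_sr` using the default values from the signature above, and the library's own conversion may differ.

```python
import numpy as np

def warp_times(score_times, p, q, score_frame_s, audio_frame_s):
    """Hypothetical helper: map score times (s) to audio times (s) via a DTW path.

    p, q are score-frame and audio-frame indices along the path. Where one
    score frame maps to several audio frames, the first audio frame is used.
    """
    p = np.asarray(p)
    q = np.asarray(q)
    uniq_p, first = np.unique(p, return_index=True)  # first hit per score frame
    path_score_s = uniq_p * score_frame_s
    path_audio_s = q[first] * audio_frame_s
    return np.interp(score_times, path_score_s, path_audio_s)

# A short path: score frame 0 spans audio frames 0-1, frame 2 spans 3-4.
p = [0, 0, 1, 2, 2]
q = [0, 1, 2, 3, 4]
hop, target_sr = 32, 4000            # defaults from the signature
audio_frame_s = hop / target_sr      # 0.008 s per audio frame
nominal_onsets = np.array([0.0, 0.05, 0.10])
warped = warp_times(nominal_onsets, p, q, score_frame_s=0.05,
                    audio_frame_s=audio_frame_s)
```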