alignment

run_alignment(audio_file, score_file[, ...])

run_DTW_alignment(y, original_sr, piece, ...)

Perform a dynamic time warping (DTW) alignment between an audio file and its corresponding MIDI file.

align_midi_wav(nmat, piece, WF, sr, TH, ...)

Align a MIDI file to a WAV file using the "peak structure distance" of Orio et al., which uses the MIDI notes to build a mask that is compared against harmonics in the audio.

pyampact.alignment.align_midi_wav(nmat, piece, WF, sr, TH, width, tsr, nhar, wms, hop)[source]

Align a MIDI file to a WAV file using the “peak structure distance” of Orio et al., which uses the MIDI notes to build a mask that is compared against harmonics in the audio.

Parameters:
  • nmat (DataFrame) – DataFrame containing note matrix (nmat) data before alignment.

  • piece (Music21 Object) – Object from load_score with all data and labels.

  • WF (ndarray) – Audio time series of the WAV file.

  • sr (int) – Sampling rate of the audio file.

  • TH (float) – Time step resolution, typically in seconds (default is 0.025).

  • ST (int) – Similarity type; 0 (default) uses the triangle inequality.

  • width (float) – Width of the mask for the analysis.

  • tsr (int) – Target sample rate for resampling the audio (if needed).

  • nhar (int) – Number of harmonics to include in the mask.

  • wms (float) – Window size in milliseconds.

  • hop (int) – Hop size for the analysis window.

Returns:

  • m (ndarray) – The map such that M[:,m] corresponds to the alignment.

  • path (tuple of ndarrays) – [p, q], the path from dynamic programming (DP) that aligns the MIDI and audio.

  • S (ndarray) – The similarity matrix used for alignment.

  • D (ndarray) – The spectrogram of the audio.

  • M (ndarray) – The MIDI-note-derived mask, including harmonic information if available.
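
As a rough illustration of what a "MIDI-note-derived mask, including harmonic information" can look like, the sketch below marks, for each active note, the frequency bins nearest its first nhar harmonics. It is not pyampact's implementation: the helper names (midi_to_hz, harmonic_mask), the n_fft parameter, and the (note, start_frame, end_frame) input format are all assumptions made for this example, and the width parameter is ignored.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harmonic_mask(notes, n_frames, sr, n_fft, nhar):
    """Build a binary mask with n_fft // 2 + 1 rows (bins) and n_frames columns.

    notes: list of (midi_note, start_frame, end_frame) tuples, end exclusive.
    Hypothetical helper for illustration only.
    """
    n_bins = n_fft // 2 + 1
    mask = [[0] * n_frames for _ in range(n_bins)]
    for note, start, end in notes:
        f0 = midi_to_hz(note)
        for h in range(1, nhar + 1):
            # Index of the frequency bin closest to the h-th harmonic
            k = round(h * f0 * n_fft / sr)
            if k >= n_bins:
                break  # harmonic lies above the Nyquist frequency
            for t in range(max(start, 0), min(end, n_frames)):
                mask[k][t] = 1
    return mask


# Two notes (A4, then C5), two frames each, with 10 Hz bin spacing
mask = harmonic_mask([(69, 0, 2), (72, 2, 4)],
                     n_frames=4, sr=4000, n_fft=400, nhar=3)
```

With these values, A4 occupies bins 44, 88, and 132 (440, 880, and 1320 Hz) during frames 0 and 1. A mask of this kind is what gets compared against the audio spectrogram D to produce the similarity matrix S.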

pyampact.alignment.run_DTW_alignment(y, original_sr, piece, tres, width, target_sr, nharm, win_ms, hop, nmat)[source]

Perform a dynamic time warping (DTW) alignment between an audio file and its corresponding MIDI file.

This function returns the aligned onset and offset times with corresponding MIDI note numbers, as well as the spectrogram of the audio and other DTW-related data.

Parameters:
  • y (ndarray) – Audio time series of the file.

  • original_sr (int) – Original sample rate of the audio file.

  • piece (Score) – A Score instance containing the symbolic (MIDI) data.

  • tres (float) – Time resolution for MIDI-to-spectrum information conversion.

  • width (float) – Width parameter for the DTW alignment.

  • target_sr (int) – Target sample rate for resampling the audio (if needed).

  • nharm (int) – Number of harmonics to include in the analysis.

  • win_ms (float) – Window size in milliseconds for analysis.

  • hop (int) – Number of samples between successive frames for analysis.

  • nmat (DataFrame) – DataFrame containing note matrix (nmat) data before alignment.

Returns:

  • align (dict) – MIDI-audio alignment structure from DTW, containing:

      - ‘on’: Onset times of the notes.
      - ‘off’: Offset times of the notes.
      - ‘midiNote’: MIDI note numbers corresponding to the aligned notes.

  • spec (ndarray) – Spectrogram of the audio file.

  • dtw (dict) – A dictionary of DTW returns, including:

      - ‘M’: The map such that M[:,m] corresponds to the alignment.
      - ‘MA/RA’: Path from dynamic programming (DP) for MIDI-audio alignment.
      - ‘S’: Similarity matrix used in the alignment process.
      - ‘D’: Spectrogram of the audio.
      - ‘notemask’: The MIDI-note-derived mask used in the alignment.
      - ‘pianoroll’: MIDI-note-derived piano roll.

  • nmat (DataFrame) – Updated DataFrame containing the note matrix (nmat) data after alignment.
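
The dynamic-programming path mentioned above can be pictured with a generic, textbook DTW. The sketch below is not the code pyampact runs; it assumes a small cost matrix (for example, a similarity matrix turned into a cost) whose rows are score steps and whose columns are audio frames, and it returns the two index sequences p and q of the cheapest monotonic path. The function name dtw_path is hypothetical.

```python
def dtw_path(cost):
    """Return (p, q, total) for the cheapest monotonic path through cost.

    cost[i][j] is the local cost of matching row i with column j.
    Allowed steps are diagonal, down, and right.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    # Forward pass: accumulate the cheapest cost of reaching each cell
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best
    # Backward pass: walk from the last cell to the first along cheapest predecessors
    i, j = n - 1, m - 1
    p, q = [i], [j]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)  # ties favour the diagonal step
        p.append(i)
        q.append(j)
    p.reverse()
    q.reverse()
    return p, q, acc[n - 1][m - 1]


cost = [
    [0, 1, 2, 3],
    [1, 0, 0, 2],
    [2, 1, 1, 0],
]
p, q, total = dtw_path(cost)
```

Here the second score step is matched to two consecutive audio frames (p repeats the index 1 while q advances), which is how a path of this kind encodes a note lasting longer in the recording than in the score.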

pyampact.alignment.run_alignment(audio_file, score_file, width=3, target_sr=4000, nharm=3, win_ms=100, hop=32)[source]

Parameters:
  • audio_file (string) – Path to the audio file.

  • score_file (string) – Path to the score/symbolic file.

  • width (float) – Width parameter for the DTW alignment.

  • target_sr (int) – Target sample rate for resampling the audio (if needed).

  • nharm (int) – Number of harmonics to include in the analysis.

  • win_ms (float) – Window size in milliseconds for the analysis.

  • hop (int) – Number of samples between successive frames.

Returns:

  • spec (ndarray) – Spectrogram of the audio file.

  • piece (Music21 Object) – The score with all labels, in Music21 format.

  • newNmat (DataFrame) – Updated DataFrame containing the note matrix (nmat) data after alignment.

  • y (ndarray) – Audio time series loaded from audio_file.

  • original_sr (int) – Original sample rate of audio_file.

Notes

This function uses DTW to align MIDI note information with the audio time series. It computes onset and offset times and updates the alignment using a similarity matrix. Optionally, it can display the audio spectrogram for visual analysis.
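
To make the default analysis parameters of run_alignment concrete, the sketch below converts win_ms and hop into samples and seconds at target_sr. It is plain arithmetic under the assumption that the window length is target_sr * win_ms / 1000; the library may round the window to an FFT-friendly length, and the helper names here are hypothetical.

```python
def frame_params(target_sr=4000, win_ms=100, hop=32):
    """Return (window length in samples, hop duration in seconds)."""
    win_samples = int(round(target_sr * win_ms / 1000))
    hop_seconds = hop / target_sr
    return win_samples, hop_seconds


def frames_to_seconds(frames, hop=32, target_sr=4000):
    """Convert analysis frame indices to times in seconds."""
    return [f * hop / target_sr for f in frames]


win_samples, hop_seconds = frame_params()
times = frames_to_seconds([0, 125, 250])
```

With the defaults, a 100 ms window spans 400 samples at 4000 Hz, and a hop of 32 samples advances 8 ms per frame, so 125 frames cover one second of audio.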