Multi-track Polyphonic Music Generation from Voice Melody Transcription with Neural Networks
by Carlos Toxtli
Hum2Song! is an AI-powered web application that is able to compose the musical accompaniment of a melody produced by the human voice. Demo: https://www.carlostoxtli.com/hum2song/

1. Multi-track Polyphonic Music Generation from Voice Melody Transcription with Neural Networks, by Carlos Toxtli
2. Content
● Summary
  ○ Brief explanation of the results
● Demo
  ○ Show how Hum2Song works
● Detailed explanation
  ○ Explain my journey building it.
3. Summary - System description
Hum2Song! is an AI-powered web application that is able to compose the musical accompaniment of a melody produced by a human voice.
4. Diagram
5. Problems predicting genre from the melody
● Genre is an ambiguous concept; e.g. pop music means "popular" regardless of the genre.
● Many songs combine different genres.
● Multitrack analysis is needed for genre prediction.
● The same melody can be used in different genres.
6. Results in literature for genre prediction from MIDI
Cory McKay, Automatic Genre Classification of MIDI Recordings
7. Proposed method - 55.8%
After running 1,300 experiments (over 4 conditions), our best single-track 1-D feature model reached 55.8% validation accuracy, outperforming previous work.
8. Best case
Layers: [128, 64, 32, 3]
Input: 1-D vector of 128 features from drums
Output: 3 classes
Activation functions: ReLU & Softmax
Optimizer: RMSprop
Loss function: categorical cross-entropy
Val_acc: 55.8%
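The slides do not include code, but here is a minimal Keras sketch of the classifier described above, assuming a plain fully-connected stack; the helper name build_genre_mlp and the surrounding details are illustrative, not the author's code. The layer sizes, activations, optimizer, and loss follow the slide.

```python
# Minimal sketch of the "best case" genre classifier: a Dense/ReLU stack with a
# softmax output, trained with RMSprop and categorical cross-entropy.
import tensorflow as tf

def build_genre_mlp(layer_sizes, input_dim):
    """The last entry of layer_sizes is the number of output classes."""
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(input_dim,))])
    for units in layer_sizes[:-1]:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(layer_sizes[-1], activation="softmax"))
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Best case from this slide: 128 drum features in, layers [128, 64, 32, 3].
best_model = build_genre_mlp([128, 64, 32, 3], input_dim=128)
best_model.summary()
```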
9. Case implemented in the demo
Layers: [64, 128, 16, 64, 256, 32, 3]
Input: 1-D vector of 64 features from melody
Output: 3 classes
Activation functions: ReLU & Softmax
Optimizer: RMSprop
Loss function: categorical cross-entropy
Val_acc: 48.6%
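Under the same assumptions, the configuration shipped in the demo only changes the layer list and the input width; reusing the hypothetical build_genre_mlp helper from the previous sketch:

```python
# Demo case from this slide: 64 melody features in, layers [64, 128, 16, 64, 256, 32, 3].
demo_model = build_genre_mlp([64, 128, 16, 64, 256, 32, 3], input_dim=64)
demo_model.summary()
```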
10. RMSprop
RMSprop was devised by Geoffrey Hinton, who suggested it informally during a Coursera class. It divides the learning rate for each weight by a running average of the magnitudes of recent gradients for that weight. (Slide figure: gradient descent vs. RMSprop.)
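As a minimal NumPy sketch of that update rule (not the author's code; the learning rate, decay factor rho, and epsilon are the commonly used defaults, assumed here):

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq_grad, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running average of recent gradient magnitudes."""
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return w, avg_sq_grad

# Toy usage: minimize f(w) = w^2 starting from w = 5.
w, avg = 5.0, 0.0
for _ in range(1000):
    w, avg = rmsprop_step(w, 2.0 * w, avg)
print(round(w, 3))  # settles close to 0
```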
11. Softmax and Cross-Entropy
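The slide shows only the formulas; as a quick NumPy illustration of the two pieces (the output activation and the loss used by the classifiers above):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between one-hot labels and predicted class probabilities."""
    return -np.sum(y_true * np.log(y_pred + eps), axis=-1)

probs = softmax(np.array([2.0, 1.0, 0.1]))                    # scores for 3 genres
loss = categorical_cross_entropy(np.array([1.0, 0.0, 0.0]), probs)
print(probs.round(3), round(float(loss), 3))                  # [0.659 0.242 0.099] 0.417
```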
12. Examples
MuseGAN sample:
Hum2Song sample:
13. My journey - Starting point
● I decided to do it from scratch without consulting previous work.
● I had no domain knowledge (music theory).
● My main area of research is Human-Computer Interaction.
● I had no experience building Web-AI apps.
● I only had ~1 month.
● My main goal was to learn by trying and to have something to show in my portfolio.
14. My journey - Steps to follow
● Implement an HTTPS site that allows voice recording
● Implement my model and the Google Magenta models
● Clean the noisy transcribed data
● Get the genre, a drum, a bass, a tonal scale, and a chord progression from the melody
● Create a song from the progressions
● Adapt a web music editor
● Publish the website
● Promote the online demo
● Learn how MIDI files are structured
● Scrape http://www.midiworld.com (16k files)
● Decide the features to use
● Data preprocessing
● Stratified sampling
● Evaluate several NN architecture combinations (325 per condition)
● Fine-tune the best options
● Convert the best model to tensorflow.js (see the sketch after this list)
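For the last step, a Keras model is typically exported to tensorflow.js with the tensorflowjs converter; this is a generic sketch of that workflow (the file and directory names are placeholders, not the project's actual paths):

```python
# Assumes `pip install tensorflowjs`; the checkpoint and output paths are placeholders.
import tensorflow as tf
import tensorflowjs as tfjs

model = tf.keras.models.load_model("best_genre_model.h5")  # hypothetical saved checkpoint
tfjs.converters.save_keras_model(model, "web/model")        # writes model.json plus weight shards
# Equivalent command line:
#   tensorflowjs_converter --input_format keras best_genre_model.h5 web/model
```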
15. Features
● The MIDI file format consists of time series; each note contains a pitch, a start time, and an end time.
● In order to convert the notes to a feature vector, a sample rate has to be defined. I defined 64 samples (4 seconds) and 128 samples (8 seconds); a sketch of this step follows the list.
● In order to get a pattern that represents the main melody, 2 string algorithms were applied (learned from the String Algorithms class):
  ○ Longest Common Subsequence (LCS)
  ○ Longest Repeated Subsequence (LRS)
● Our 4 conditions were Melody 64 features, Melody 128 features, Drums 64 features, and Drums 128 features.
● For the melody conditions we adapted the pitches to the human voice range.
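A minimal sketch of the note-to-vector step, assuming the pretty_midi library (the slides do not name the MIDI parser used); it samples the sounding pitch of one track at fixed time steps into a 64-element vector, matching the Melody 64 condition. The silence encoding and sampling scheme are assumptions.

```python
# Hypothetical feature extraction: sample the sounding pitch of a MIDI track at a
# fixed rate over the first 4 seconds (64 samples). Assumes `pip install pretty_midi`.
import numpy as np
import pretty_midi

def midi_track_to_vector(path, track_index=0, n_samples=64, duration=4.0):
    midi = pretty_midi.PrettyMIDI(path)
    notes = midi.instruments[track_index].notes          # each note has .pitch, .start, .end
    times = np.linspace(0.0, duration, n_samples, endpoint=False)
    vector = np.zeros(n_samples)
    for i, t in enumerate(times):
        sounding = [n.pitch for n in notes if n.start <= t < n.end]
        vector[i] = sounding[0] if sounding else 0        # 0 stands for silence
    return vector

features = midi_track_to_vector("example.mid")            # placeholder file name
print(features.shape)                                      # (64,)
```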
16. Choosing Neural Network Architecture
● In order to decide which architecture to use, all the possible combinations of [16, 32, 64, 128, 256] were tested (a sketch of the search loop follows the list).
● 100 epochs were trained for each combination.
● The accuracy and confusion matrices were used to pick the best one.
● 4 NVIDIA Tesla K80 GPUs were used from Google Colaboratory.
● Keras checkpoints were used to preserve the best models.
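A hedged sketch of that search, reusing the hypothetical build_genre_mlp helper from the slide-8 example; enumerating permutations of every length from 1 to 5 of [16, 32, 64, 128, 256] yields exactly the 325 combinations per condition mentioned on slide 14, but the data split and training details here are assumptions.

```python
# Enumerate candidate hidden-layer stacks, train each for 100 epochs, and keep only
# the best checkpoint (by validation accuracy) per architecture.
from itertools import permutations
import os
import numpy as np
import tensorflow as tf

# Placeholder data standing in for the preprocessed 64-feature melody vectors and
# one-hot genre labels; replace with the real, stratified dataset.
X_train = np.random.rand(512, 64)
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 3, 512), 3)
X_val = np.random.rand(128, 64)
y_val = tf.keras.utils.to_categorical(np.random.randint(0, 3, 128), 3)

os.makedirs("checkpoints", exist_ok=True)
candidate_sizes = [16, 32, 64, 128, 256]
for depth in range(1, 6):                                  # lengths 1..5 -> 325 permutations
    for hidden in permutations(candidate_sizes, depth):
        layers = list(hidden) + [3]                        # 3 genre classes on the output
        model = build_genre_mlp(layers, input_dim=X_train.shape[1])
        checkpoint = tf.keras.callbacks.ModelCheckpoint(
            f"checkpoints/{'-'.join(map(str, layers))}.h5",
            monitor="val_accuracy", save_best_only=True)
        model.fit(X_train, y_train,
                  validation_data=(X_val, y_val),
                  epochs=100, callbacks=[checkpoint], verbose=0)
```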
17. Best results
Melody 64 - 48.6% [64, 128, 256, 64, 16, 3]
Melody 128 - 46.7% [128, 256, 128, 64, 16, 3]
Drums 64 - 51.3% [64, 128, 16, 64, 256, 32, 3]
Drums 128 - 55.8% [128, 64, 32, 3]
Multitrack features (Sander Shi, CMU, original): [4, 5, 3] - 46.4%; [4, 64, 32, 16, 128, 256, 3] - 51.6%
18. Google Magenta models
● Piano transcription
● Multitrack progression
● Trio generator
19. Outcomes
● GitHub repository
● Interactive tutorial
● Medium blog post
● ProductHunt release
● Portfolio demo
● Magenta demos
● Conference poster/demo
20. Thanks
Contact: @ctoxtli
Demo: https://www.carlostoxtli.com/hum2song