Multi-track Polyphonic Music
Generation from Voice Melody
Transcription with Neural
Networks
by Carlos Toxtli
Content
● Summary
○ Brief explanation of the results
● Demo
○ Show how Hum2Song works
● Detailed explanation
○ Explain my journey building it.
Summary - System description
Hum2Song! is an AI-powered web
application that composes the
musical accompaniment for a melody
produced by a human voice.
Diagram
Problems predicting genre from the melody
● Genre is an ambiguous concept
● e.g. Pop music means "popular" regardless of the genre
● Many songs combine different genres.
● Genre prediction needs multitrack analysis
● The same melody can be used in different genres
Results in literature for genre prediction from MIDI
Cory McKay, Automatic Genre Classification of MIDI Recordings
Proposed method - 55.8%
After running 1,300 experiments (over 4 conditions), our
best model, trained on single-track 1-D features, reached
55.8% validation accuracy (val_acc), outperforming previous work.
Best case
Layers: [128, 64, 32, 3]
Input: 1-D vector, 128 features from drums
Output: 3 classes
Activation functions: ReLU & Softmax
Optimizer: RMSprop
Loss function: Categorical Cross-Entropy
Val_acc: 55.8%
Case implemented in the demo
Layers: [64, 128, 16, 64, 256, 32, 3]
Input: 1-D vector, 64 features from melody
Output: 3 classes
Activation functions: ReLU & Softmax
Optimizer: RMSprop
Loss function: Categorical Cross-Entropy
Val_acc: 48.6%
RMSprop
It was devised by the legendary Geoffrey Hinton, who suggested it as an
idea during a Coursera class. It divides the learning rate for a weight
by a running average of the magnitudes of recent gradients
for that weight.
Figure panels: Gradient Descent, RMSprop
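The update described above can be sketched in a few lines. This is a minimal illustration of the standard RMSprop rule, not the Keras implementation used in the project; the function names, the hyperparameter values (lr, rho, eps) and the toy quadratic are assumptions made for the example.

```python
import math


def rmsprop_step(weights, grads, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    """Apply one RMSprop update and return (new_weights, new_avg_sq)."""
    new_weights, new_avg_sq = [], []
    for w, g, s in zip(weights, grads, avg_sq):
        # Running average of the squared gradient for this weight.
        s = rho * s + (1.0 - rho) * g * g
        # Divide the learning rate by the root of that running average.
        w = w - lr * g / (math.sqrt(s) + eps)
        new_weights.append(w)
        new_avg_sq.append(s)
    return new_weights, new_avg_sq


def minimize_quadratic(steps=500):
    """Toy problem (hypothetical): f(w) = w0^2 + 100 * w1^2."""
    w = [1.0, 1.0]
    s = [0.0, 0.0]
    for _ in range(steps):
        grads = [2.0 * w[0], 200.0 * w[1]]
        w, s = rmsprop_step(w, grads, s, lr=0.01)
    return w
```

Because each weight is scaled by its own gradient history, the two coordinates of the toy problem move at a similar pace even though their gradients differ by a factor of 100.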
Softmax and Cross-Entropy
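Only the title of this slide survives in the transcript. As a reminder of what the two terms mean, here is a minimal sketch of the standard definitions for a 3-class output like the one used in the models above. The function names and the eps guard are assumptions for illustration, not code from the project.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one example: -sum(target * log(predicted))."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))
```

With three equal scores, softmax returns 1/3 per class and the loss equals ln(3), which is the value a 3-class model starts near before it has learned anything.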
Examples
MuseGAN sample:
Hum2Song sample:
My journey - Starting point
● I decided to do it from scratch without consulting previous work.
● I had no domain knowledge (music theory)
● My main area of research is Human Computer Interaction.
● I had no experience building Web-AI apps.
● I only had ~1 month
● My main goal was to learn by trying and to have something to show in
my portfolio.
My journey - Steps to follow
● Implement an HTTPS site that
allows voice recording
● Implement my model and
Google Magenta models
● Clean the noisy transcribed data
● Get the genre, a drum, a bass, a
tonal scale, and a chord
progression from the melody.
● Create a song from progressions
● Adapt a web music editor
● Publish the website
● Promote online demo
● Learn how MIDI files are
structured
● Scrape http://www.midiworld.com
(16k files)
● Decide the features to use
● Data preprocessing
● Stratified sampling
● Evaluate several NN architecture
combinations (325 per condition).
● Fine-tune the best options
● Convert the best model to
TensorFlow.js
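The stratified sampling step in the list above can be sketched as follows. This is a generic illustration, not the project's code: the function name, the 20% validation fraction and the genre labels in the usage note are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split example indices so each class keeps the same train/val ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

For example, with 50, 30 and 20 files in three made-up genres, a 20% split puts 10, 6 and 4 files of each genre in the validation set, so no genre is over-represented there.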
Features
● A MIDI file consists of time series of notes; each note has a pitch,
a start time and an end time.
● Converting the notes to a feature vector requires a
sample rate. I used 64 samples (4 seconds) and 128 samples (8 seconds).
● To get a pattern that represents the main melody, 2 string
algorithms were applied (learned from a String Algorithms class):
○ Longest Common Subsequence (LCS)
○ Longest Repeated Subsequence (LRS)
● Our 4 conditions were Melody 64 features, Melody 128 features, Drums
64 features, and Drums 128 features.
● For the melody conditions we adapted the pitches to the human voice
range.
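A rough sketch of two of the ideas above: sampling notes into a fixed-length vector, and the textbook LCS algorithm. The rate of 16 samples per second follows from 64 samples covering 4 seconds; everything else (function names, 0 for silence, later notes overwriting earlier ones) is an assumption for illustration. The slides do not say how LCS and LRS were combined, so only plain LCS is shown.

```python
def notes_to_features(notes, n_samples=64, samples_per_second=16):
    """notes: list of (pitch, start_sec, end_sec) tuples.

    Returns n_samples pitch values; 0 marks silence (assumed convention).
    """
    features = [0] * n_samples
    for pitch, start, end in notes:
        first = max(0, round(start * samples_per_second))
        last = min(n_samples, round(end * samples_per_second))
        for i in range(first, last):
            features[i] = pitch  # a later note overwrites an earlier one
    return features


def longest_common_subsequence(a, b):
    """Classic dynamic-programming LCS; returns one longest subsequence."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the subsequence itself.
    result = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]
```

Two half-second notes, for instance, fill the first 16 of the 64 samples and leave the rest as silence.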
Choosing Neural Network Architecture
● To decide which architecture to use, all possible
combinations of the layer sizes [16, 32, 64, 128, 256] were tested
● Each combination was trained for 100 epochs
● Accuracy and confusion matrices were used to pick the best.
● 4 NVIDIA Tesla K80 GPUs from Google Colaboratory were used.
● Keras checkpoints were used to preserve the best models.
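The slides do not spell out how the combinations were enumerated. One reading that matches the "325 per condition" figure given earlier is every ordered arrangement of 1 to 5 distinct sizes: 5 + 20 + 60 + 120 + 120 = 325. The sketch below follows that assumption; the function names are hypothetical.

```python
from itertools import permutations

LAYER_SIZES = [16, 32, 64, 128, 256]


def candidate_architectures(sizes=LAYER_SIZES):
    """All ordered arrangements of 1..len(sizes) distinct hidden-layer sizes."""
    archs = []
    for depth in range(1, len(sizes) + 1):
        for hidden in permutations(sizes, depth):
            archs.append(list(hidden))
    return archs


def full_layer_list(n_features, hidden, n_classes=3):
    """Input width, hidden layers, then the 3-class output layer."""
    return [n_features, *hidden, n_classes]
```

Under this reading, the best Drums 128 model corresponds to the hidden layers [64, 32], written in the slides as [128, 64, 32, 3].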
Best results
● Melody 64 - 48.6%
[64, 128, 256, 64, 16, 3]
● Melody 128 - 46.7%
[128, 256, 128, 64, 16, 3]
● Drums 64 - 51.3%
[64, 128, 16, 64, 256, 32, 3]
● Drums 128 - 55.8%
[128, 64, 32, 3]
Multitrack features (Sander Shi, CMU)
● (original) [4, 5, 3] - 46.4%
● [4, 64, 32, 16, 128, 256, 3] - 51.6%
Google Magenta models
Piano transcription
Multitrack progression
Trio generator
Outcomes
● GitHub repository
● Interactive tutorial
● Medium blog post
● ProductHunt release
● Portfolio demo
● Magenta demos
● Conference poster/demo
Thanks
Contact:
@ctoxtli
Demo:
https://www.carlostoxtli.com/hum2song
