Keynote Speaker
I finished my PhD in 2000; a lot has happened over the ensuing ~25 years in the field of music and computation. It seems like an appropriate moment to look back at where we were, how far we’ve come, and where we’re going next. I will discuss early experiments in RNN-generated music, the open-source Magenta project, the rise of LLM and diffusion models for music generation, and more recent work we’ve done at Google DeepMind in text, image, video and music generation. I’ll also address the question of how AI might help us better understand music and maybe even give rise to new forms of musical expression.
Doug is a Senior Research Director at Google and leads research efforts at Google DeepMind in Generative Media, including image, video, 3D, music, and audio generation. His own research lies at the intersection of machine learning and human-computer interaction (HCI). In 2015, Doug created Magenta, an ongoing research project exploring the role of AI in art and music creation. Before joining Google in 2010, Doug did research in music perception, aspects of music performance, machine learning for large audio datasets, and music recommendation. He completed his PhD in Computer Science and Cognitive Science at Indiana University in 2000 and went on to a postdoctoral fellowship with Juergen Schmidhuber at IDSIA in Lugano, Switzerland. From 2003 to 2010, Doug was a faculty member in Computer Science in the University of Montreal machine learning group (now the Mila machine learning lab), where he became Associate Professor.
Poster Sessions
Poster Session - 5
Rachel Bittner
In-person presentations:
- ST-ITO: Controlling audio effects for style transfer with inference-time optimization - Christian J. Steinmetz (Queen Mary University of London)*, Shubhr Singh (Queen Mary University of London), Marco Comunita (Queen Mary University of London), Ilias Ibnyahya (Queen Mary University of London), Shanxin Yuan (Queen Mary University of London), Emmanouil Benetos (Queen Mary University of London), Joshua D. Reiss (Queen Mary University of London)
- ComposerX: Multi-Agent Music Generation with LLMs - Qixin Deng (University of Rochester), Qikai Yang (University of Illinois at Urbana-Champaign), Ruibin Yuan (CMU)*, Yipeng Huang (Multimodal Art Projection Research Community), Yi Wang (CMU), Xubo Liu (University of Surrey), Zeyue Tian (Hong Kong University of Science and Technology), Jiahao Pan (The Hong Kong University of Science and Technology), Ge Zhang (University of Michigan), Hanfeng Lin (Multimodal Art Projection Research Community), Yizhi Li (The University of Sheffield), Yinghao Ma (Queen Mary University of London), Jie Fu (HKUST), Chenghua Lin (University of Manchester), Emmanouil Benetos (Queen Mary University of London), Wenwu Wang (University of Surrey), Guangyu Xia (NYU Shanghai), Wei Xue (The Hong Kong University of Science and Technology), Yike Guo (Hong Kong University of Science and Technology)
- Do Music Generation Models Encode Music Theory? - Megan Wei (Brown University)*, Michael Freeman (Brown University), Chris Donahue (Carnegie Mellon University), Chen Sun (Brown University)
- PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese - Silas Antonisen (University of Granada)*, Iván López-Espejo (University of Granada)
- Sanidha: A Studio Quality Multi-Modal Dataset for Carnatic Music - Venkatakrishnan Vaidyanathapuram Krishnan (Georgia Institute of Technology)*, Noel Alben (Georgia Institute of Technology), Anish Nair (Georgia Institute of Technology), Nathaniel Condit-Schultz (Georgia Institute of Technology)
- Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music - Pedro Pereira Sarmento (Centre for Digital Music), Jackson J Loth (Queen Mary University of London)*, Mathieu Barthet (Queen Mary University of London)
- Combining audio control and style transfer using latent diffusion - Nils Demerlé (IRCAM)*, Philippe Esling (IRCAM), Guillaume Doras (IRCAM), David Genova (IRCAM)
- Computational Analysis of Yaredawi YeZema Silt in Ethiopian Orthodox Tewahedo Church Chants - Mequanent Argaw Muluneh (Academia Sinica, National Chengchi University, Debre Markos University)*, Yan-Tsung Peng (National Chengchi University), Li Su (Academia Sinica)
- Wagner Ring Dataset: A Complex Opera Scenario for Music Processing and Computational Musicology - Christof Weiß, Vlora Arifi-Müller*, Michael Krause, Frank Zalkow, Stephanie Klauk, Rainer Kleinertz, and Meinard Müller
- Lyrics Transcription for Humans: A Readability-Aware Benchmark - Ondřej Cífka (AudioShake)*, Hendrik Schreiber (AudioShake), Luke Miner (AudioShake), Fabian-Robert Stöter (AudioShake)
- Content-based Controls for Music Large-scale Language Modeling - Liwei Lin (New York University Shanghai)*, Gus Xia (New York University Shanghai), Junyan Jiang (New York University Shanghai), Yixiao Zhang (Queen Mary University of London)
- Exploring the inner mechanisms of large generative music models - Marcel A Vélez Vásquez (University of Amsterdam)*, Charlotte Pouw (University of Amsterdam), John Ashley Burgoyne (University of Amsterdam), Willem Zuidema (ILLC, UvA)
- Quantitative Analysis of Melodic Similarity in Music Copyright Infringement Cases - Saebyul Park (KAIST)*, Halla Kim (KAIST), Jiye Jung (Heinrich Heine University Düsseldorf), Juyong Park (KAIST), Jeounghoon Kim (KAIST), Juhan Nam (KAIST)
- Robust lossy audio compression identification - Hendrik Vincent Koops (Universal Music Group)*, Gianluca Micchi (Universal Music Group), Elio Quinton (Universal Music Group)
- RNBert: Fine-Tuning a Masked Language Model for Roman Numeral Analysis - Malcolm Sailor (Yale University)*
- Automatic Note-Level Score-to-Performance Alignments in the ASAP Dataset - Silvan David Peter*, Carlos Eduardo Cancino-Chacón, Francesco Foscarin, Andrew Philip McLeod, Florian Henkel, Emmanouil Karystinaios, Gerhard Widmer
Remote presentations:
- On the validity of employing ChatGPT for distant reading of music similarity - Arthur Flexer (Johannes Kepler University Linz)*
- A Critical Survey of Research in Music Genre Recognition - Owen Green (Max Planck Institute for Empirical Aesthetics)*, Bob L. T. Sturm (KTH Royal Institute of Technology), Georgina Born (University College London), Melanie Wald-Fuhrmann (Max Planck Institute for Empirical Aesthetics)
Poster Session - 6
Magdalena Fuentes
In-person presentations:
- MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models - Benno Weck (Music Technology Group, Universitat Pompeu Fabra (UPF))*, Ilaria Manco (Queen Mary University of London), Emmanouil Benetos (Queen Mary University of London), Elio Quinton (Universal Music Group), George Fazekas (QMUL), Dmitry Bogdanov (Universitat Pompeu Fabra)
- Human Pose Estimation for Expressive Movement Descriptors in Vocal Musical Performance - Sujoy Roychowdhury (Indian Institute of Technology Bombay)*, Preeti Rao (Indian Institute of Technology Bombay), Sharat Chandran (IIT Bombay)
- Enhancing predictive models of music familiarity with EEG: Insights from fans and non-fans of K-pop group NCT127 - Seokbeom Park (KAIST), Hyunjae Kim (KAIST), Kyung Myun Lee (KAIST)*
- Mosaikbox: Improving Fully Automatic DJ Mixing Through Rule-based Stem Modification And Precise Beat-Grid Estimation - Robert Sowula (TU Wien)*, Peter Knees (TU Wien)
- MidiCaps: A Large-scale MIDI Dataset with Text Captions - Jan Melechovsky (Singapore University of Technology and Design), Abhinaba Roy (SUTD)*, Dorien Herremans (Singapore University of Technology and Design)
- A New Dataset, Notation Software, and Representation for Computational Schenkerian Analysis - Stephen Hahn (Duke)*, Weihan Xu (Duke), Zirui Yin (Duke University), Rico Zhu (Duke University), Simon Mak (Duke University), Yue Jiang (Duke University), Cynthia Rudin (Duke)
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation - Zachary Novack (UC San Diego)*, Julian McAuley (UCSD), Taylor Berg-Kirkpatrick (UCSD), Nicholas J. Bryan (Adobe Research)
- The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing - Christopher J. Tralie (Ursinus College)*, Ben Cantil (DataMind Audio)
- Deep Recombinant Transformer: Enhancing Loop Compatibility in Digital Music Production - Muhammad Taimoor Haseeb (Mohamed bin Zayed University of Artificial Intelligence)*, Ahmad Hammoudeh (Mohamed bin Zayed University of Artificial Intelligence), Gus Xia (Mohamed bin Zayed University of Artificial Intelligence)
- Repertoire-Specific Vocal Pitch Data Generation for Improved Melodic Analysis of Carnatic Music - Genís Plaja-Roglans*, Thomas Nuttall, Lara Pearson, Xavier Serra, Marius Miron
- I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition - Yannis Vasilakis (Queen Mary University of London)*, Rachel Bittner (Spotify), Johan Pauwels (Queen Mary University of London)
- Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection - Weixing Wei (Kyoto University)*, Jiahao Zhao (Kyoto University), Yulun Wu (Fudan University), Kazuyoshi Yoshii (Kyoto University)
- Towards Universal Optical Music Recognition: A Case Study on Notation Types - Juan Carlos Martinez-Sevilla (University of Alicante)*, David Rizo (University of Alicante. Instituto Superior de Enseñanzas Artísticas de la Comunidad Valenciana), Jorge Calvo-Zaragoza (University of Alicante)
- Controlling Surprisal in Music Generation via Information Content Curve Matching - Mathias Rose Bjare (Johannes Kepler University Linz)*, Stefan Lattner (Sony Computer Science Laboratories, Paris), Gerhard Widmer (Johannes Kepler University)
- Toward a More Complete OMR Solution - Guang Yang (University of Washington)*, Muru Zhang (University of Washington), Lin Qiu (University of Washington), Yanming Wan (University of Washington), Noah A Smith (University of Washington and Allen Institute for AI)
- Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning - Ilaria Manco (Queen Mary University of London)*, Justin Salamon (Adobe), Oriol Nieto (Adobe)
- Music Discovery Dialogue Generation Using Human Intent Analysis and Large Language Model - Seungheon Doh (KAIST)*, Keunwoo Choi (Genentech), Daeyong Kwon (KAIST), Taesoo Kim (KAIST), Juhan Nam (KAIST)
- STONE: Self-supervised tonality estimator - Yuexuan Kong (Deezer)*, Vincent Lostanlen (LS2N, CNRS), Gabriel Meseguer Brocal (Deezer), Stella Wong (Columbia University), Mathieu Lagrange (LS2N), Romain Hennequin (Deezer Research)
- Beat this! Accurate beat tracking without DBN postprocessing - Francesco Foscarin (Johannes Kepler University Linz)*, Jan Schlüter (JKU Linz), Gerhard Widmer (Johannes Kepler University)
- The Sound Demixing Challenge 2023 – Music Demixing Track - Giorgio Fabbro*, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji
Industry Session
Industry Session II
Brandi Frisbie, Minz Won
This session will feature presentations from our sponsors:
- Adobe
- Splice
- ByteDance
- Deezer
- UMG
ISMIR 2024 Meetup with Industry Panel: Bridging Technology and Musical Creativity
Brandi Frisbie, Minz Won
Join us for a Meetup with Industry Panel for the 25th International Society for Music Information Retrieval (ISMIR) Conference!
The panel aims to explore the intersection of technology and musical creativity, discussing current trends, challenges, and future possibilities in this rapidly evolving field. This year, we are excited to offer a limited number of tickets to the public!
We’re excited to welcome our very special guests:
Bios
Jessica Powell, CEO and Co-founder of AudioShake
Jessica Powell is the CEO and co-founder of AudioShake, a sound-splitting AI technology that makes audio more interactive, accessible, and useful. Named one of TIME’s Best Inventions, AudioShake is used widely across the entertainment industry to help give content owners greater opportunities and control over their audio.
Powell spent over a decade at Google, where she sat on the company's management team, reporting to the CEO. She began her career at CISAC, the International Confederation of Societies of Authors and Composers, in Paris. She is also an award-winning novelist and former New York Times contributing opinion writer, and her short stories and essays have been published in the New York Times, TIME, WIRED, and elsewhere.
Stephen White, Chief Product Officer at EMPIRE
Stephen is the Chief Product Officer at EMPIRE. In this role, White leads all technology and product development for the company.
White has held numerous roles across the music industry for over 25 years. Prior to joining EMPIRE, White was the Chief Executive Officer at StageIt Corp. He joined StageIt in May 2020 and led the company to its successful sale to VNUE.
Prior to joining StageIt, White was the Chief Executive and Chairman of the Board of Dubset Media Holdings. He joined Dubset as CEO in February 2015 and led the company through a repositioning and rebranding, ultimately selling the business to PEX in February 2020.
White served as Chief Executive and President at Gracenote from 2012 to 2014. He held numerous roles across 14 years at the company and played a critical role in growing it from a small start-up focused on music technologies and information into a digital entertainment leader that touches millions of music and movie fans around the globe. As President, White oversaw all company strategy and operations and was responsible for growing Gracenote’s core business.
Before joining Gracenote, White was Vice President of Product for streaming-music start-up Echo.com, one of the first companies to combine group content streaming and community features. Prior to Echo.com, he was a Senior Director and Executive Producer for CKS, a media consultancy based in Silicon Valley, where he led teams in the creation of web properties such as the Apple online store and GM.com. He began his career as a reporter and writer.
Douglas McCausland, TAC Studio Manager and Faculty Lecturer at the San Francisco Conservatory of Music and the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University
Douglas McCausland is a composer/performer, sound designer, and digital artist whose visceral and often chaotic works explore the extremes of sound, technology, and the digital medium.
As an artist, he researches and leverages the intersections of numerous technologies and creative practices, including real-time electronic music performance with purpose-built interfaces, spatial audio, interactive systems, intermedia art, musical applications of machine learning, experimental sound design, and hardware hacking.
Described as “Tremendously powerful, dark, and sometimes terrifying...” (SEAMUS) and “Ruthlessly visceral...” (The Wire), his works have been performed internationally at numerous festivals, including Sonorities, SEAMUS, the San Francisco Tape Music Festival, the MISE-EN Music Festival, Klingt Gut!, Sounds Like THIS!, NYCEMF, Sonicscape, and Ars Electronica. Recent honors include an Award of Distinction in the 2021 Prix Ars Electronica for his piece “Convergence”, first prize in the 2021 ASCAP/SEAMUS commission competition, and the gold prize for “contemporary computer music” in the Verband Deutscher Tonmeister Student 3D Audio Production Competition.
Douglas is currently the Technology and Applied Composition (TAC) Studio Manager at the San Francisco Conservatory of Music. He holds a DMA in music composition from Stanford University, where he studied with Chris Chafe, Patricia Alessandrini, Jaroslaw Kapuscinski, Fernando Lopez-Lezcano, and Mark Applebaum.
Tony Brooke, Independent Consultant for Music Data Companies (https://tonybrooke.com/)
Tony Brooke is an independent consultant, helping various music companies deliver data-driven products. He was most recently Senior Director of Product Systems at Warner Music Group, leading the team that develops the label copy systems for the major label. Before that he was Senior Product Manager at Pandora, responsible for the systems that ingest and store all the metadata, audio, and images from labels, distributors, and third-party sources. He led Pandora's project to add full credits in 2019.
Tony also fixes data problems throughout the digital music value chain, and helps creators improve their data. He has been on the Board of Directors at DDEX, Co-Chair of the DDEX ERN Working Group, on the Board of the San Francisco chapter of the Recording Academy, and Chair of the chapter's Producers and Engineers Wing. He presents often at industry events and completed significant research into audiovisual data and media asset management as part of his Master's degree in Library and Information Science.
He has also been an audio engineer specializing in remote multitrack recording since 1992, with over 100 releases in his discography (including two GRAMMY-nominated albums) and over 500 clients. SilentWay.com is his audio info hub with thousands of articles, links, tips, and equipment guides. Tony has worked in many stages of music creation and broadcasting as a producer, engineer, singer, FM program director, and DJ. He has lived in the San Francisco Bay Area since 1991.
Heidi Trefethen, Adjunct Professor, SF Conservatory TAC (Technology and Applied Composition) Program; FOH/Monitor Engineer, SFJAZZ and Freight & Salvage; Co-Chair, Producers & Engineers Wing, SF Chapter; and P&E Advisory Group Committee Member
Heidi Trefethen is a distinguished live and recording engineer, producer, composer, and adjunct professor at the San Francisco Conservatory of Music. As a professional French hornist, she performs across classical, jazz, and cross-over genres. Renowned for her technical and creative expertise, she brings a unique, multifaceted perspective to music production, performance, and education. As chair and co-chair of the P&E Wing of the Recording Academy's San Francisco chapter and the Women in the Mix P&E group for several years, she has championed the rights of engineers on the P&E Advisory Committee, advocating for fair treatment and equitable opportunities in the absence of a formal union. You can hear her work several times a week mixing live performances at SFJAZZ and Freight and Salvage, two of the SF Bay Area's premier music venues.
Her dual career as an educator and music creator deepens her understanding of the evolving role of AI in music, especially in how it can support and elevate the work of engineers, educators, and musicians. Her contributions have a significant impact both at the local level and throughout the Recording Academy at large, shaping policy and creating pathways for the next generation of audio professionals.
Anna Huang, Assistant Professor of Music and Assistant Professor of Electrical Engineering and Computer Science at MIT
In Fall 2024, Anna started a faculty position at the Massachusetts Institute of Technology (MIT), shared between Electrical Engineering and Computer Science (EECS) and Music and Theater Arts (MTA). For the previous eight years, she was a researcher on Magenta at Google Brain and then Google DeepMind, working on generative models and interfaces to support human-AI partnerships in music making.
Anna is the creator of the ML model Coconet, which powered Google’s first AI Doodle, the Bach Doodle: in two days, Coconet harmonized 55 million melodies from users around the world. In 2018, she created Music Transformer, a breakthrough in generating music with long-term structure and the first successful adaptation of the transformer architecture to music. The accompanying ICLR paper is currently the most cited paper in music generation.
Anna was a Canada CIFAR AI Chair at Mila and continues to hold an adjunct professorship at the University of Montreal. She was a judge and later an organizer for the AI Song Contest from 2020 to 2022. Anna completed her PhD at Harvard University, her master’s at the MIT Media Lab, and a dual bachelor’s in music composition and computer science at the University of Southern California.
ISMIR 2024 Meetup with Industry Panel: Bridging Technology and Musical Creativity
Brandi Frisbie, Minz Won
The Q&A and networking portion of the Meetup with Industry.
Events
Online Q&A w/ volunteers
Having issues with Zoom or Slack? Need help navigating the conference program or materials? Virtual volunteers will be available to meet with you and answer any questions you may have!
Mindfulness session
soundBrilliance
soundBrilliance is an innovative digital health company using enhanced music, psychology, and measurement techniques to create tools and exercises that empower people to better self-manage fundamental health – emotional balance, fitness, quality sleep, and pain control. The experiences presented in the ISMIR 2024 Mindfulness sessions are designed to help guide you into a deeper sense of calm. All visuals are naturally produced and captured, with no AI intervention.
Creative Practice
Creative Practice Session II
Cynthia Liem, Tomàs Peire
Music information technology has the potential to transform creative and artistic practice. Many technologists working in music information retrieval are at least music lovers (if not skilled players) themselves, and as such are strongly committed to making their tools and technologies useful in practice. At the same time, are these technologists truly aligned with musical and creative practice? Are the needs and interests of relevant real-life music stakeholders (players, composers, producers, and other practitioners) who have never heard of ‘music information retrieval’ sufficiently identified and recognized in technological research and development?
As Creative Practice chairs, and given ISMIR 2024’s special focus on ‘Bridging Technology and Musical Creativity’, we want to stimulate more awareness of, and joint learning on, these questions. To do this, we wish to facilitate dialogues and collaborations on this topic between technologists and creatives. Several community members have contributed collaboration ideas to which you can respond (https://bit.ly/ismir24-creative-practice-ideas), and at ISMIR 2024 we will also host two panels featuring invited guests who are all active on the bridges between technology and creative practice.
In today’s panel, we will host a conversation with:
- Eyal Amir - musician, software developer, and musical instrument creator. Co-founder and CTO of the audio plugin company Modalics.
- Ben Cantil aka Encanti - electronic music producer, software designer, educator, and scholar
- Seth Forsgren - amateur musician, CEO at Riffusion
- Spencer Salazar - Principal Engineer (formerly CTO) at Output