Gracenote’s engineers are teaching the company’s AI systems that Lady Gaga’s song “Lovegame” is a “sexy stomper.”
Some of these moods are obvious categories like “sultry” and “sassy,” but there are also extremely specific descriptors like “dreamy sensual,” “gentle bittersweet,” and “desperate rabid energy.” New categories are constantly being added, while others are fine-tuned based on how well the system performs. “It’s sort of an iterative process,” explained Peter DiMaria, Gracenote’s head of content architecture and discovery. “The taxonomy morphs and evolves.”
In addition to this list of moods, Gracenote also uses a so-called training set for its machine learning efforts. The company’s music experts have picked and classified some 40,000 songs as examples for these categories. Compiling that training set is an art in its own right. “We need to make sure that we give it examples of music that people are listening to,” said DiMaria. At the same time, each song has to be the best possible example of a given emotion. “Some tracks are a little ambiguous,” he said.
The current training set includes Lady Gaga’s “Lovegame” as an example of a “sexy stomper,” Radiohead’s “Pyramid Song” as “plaintive,” and Beyoncé’s “Me Myself & I” as an example of “soft sensual & intimate.”
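As an illustration only (this is not Gracenote’s actual data format), the labeled examples above amount to a simple song-to-mood mapping, the kind of structure a training set builds on:

```python
# A toy, hand-labeled training set in the spirit of the examples above.
# Gracenote's real set holds some 40,000 expert-classified songs.
TRAINING_SET = {
    ("Lady Gaga", "Lovegame"): "sexy stomper",
    ("Radiohead", "Pyramid Song"): "plaintive",
    ("Beyonce", "Me Myself & I"): "soft sensual & intimate",
}

def examples_for(mood):
    """Return all labeled (artist, title) examples for a given mood category."""
    return [song for song, label in TRAINING_SET.items() if label == mood]
```

Keeping such a set fresh, as DiMaria describes, means continually adding newly released songs as examples and retiring ambiguous ones.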
Just like the list of emotions itself, that training set needs to be kept fresh constantly. “Artists are creating new types of musical expressions all the time,” said DiMaria. “We need to make sure the system has heard those.” Fast-evolving genres like electronica and hip-hop require especially frequent updates.
To a computer, compression can sound like a musical style
Once the system has been trained on these songs, it is let loose on millions of tracks. But computers don’t simply listen to long playlists of songs, one by one. Instead, Gracenote’s system cuts each track into 700-millisecond slices, and then extracts some 170 different acoustic values, such as timbre, from each slice.
In addition, it sometimes analyzes larger chunks of a song to capture rhythm and similar features. Those values are then compared against existing data to classify each song. The result isn’t just a single mood, but a mood profile.
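That pipeline — slice the track, extract acoustic values per slice, classify each slice, and aggregate into a profile — can be sketched in miniature. Everything below is made up for illustration: two toy features (loudness and zero-crossing rate) stand in for Gracenote’s roughly 170 acoustic values, and hand-picked “mood centroids” stand in for the trained model.

```python
import math

SAMPLE_RATE = 16_000                    # assumed rate for this toy example
SLICE_SAMPLES = int(0.7 * SAMPLE_RATE)  # Gracenote analyzes ~700 ms slices

def features(window):
    """Two toy stand-ins for Gracenote's ~170 acoustic values (timbre etc.)."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))    # loudness
    zcr = sum(1 for a, b in zip(window, window[1:]) if a * b < 0) / len(window)
    return (rms, zcr)

# Hypothetical centroids a real system would learn from its training set.
CENTROIDS = {
    "sexy stomper": (0.8, 0.05),
    "plaintive": (0.2, 0.02),
    "gentle bittersweet": (0.1, 0.01),
}

def mood_profile(samples):
    """Classify each 700 ms slice, then aggregate into a mood profile."""
    counts = {mood: 0 for mood in CENTROIDS}
    for start in range(0, len(samples) - SLICE_SAMPLES + 1, SLICE_SAMPLES):
        f = features(samples[start:start + SLICE_SAMPLES])
        nearest = min(CENTROIDS, key=lambda m: math.dist(CENTROIDS[m], f))
        counts[nearest] += 1
    total = sum(counts.values()) or 1
    return {mood: n / total for mood, n in counts.items()}
```

The output is a distribution over moods rather than one label, matching the “mood profile” idea. A real system would compute timbre, rhythm, and other descriptors from actual audio and use a trained classifier; the nearest-centroid step here is just a stand-in.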
All the while, Gracenote’s team has to periodically make sure that things don’t go wrong. “A musical mix is a pretty complex thing,” explained Cremer. With instruments, vocals, and effects layered on top of each other, and the result optimized for car stereos or internet streaming, there is a lot for a computer to listen to, including things that aren’t actually part of the music.
“It can capture a lot of different things,” said Cremer. Left unsupervised, Gracenote’s system could, for example, decide to pay attention to compression artifacts and match them to moods, with Cremer joking that the system might conclude: “It’s all 96 kbps, so this makes me sad.”
The world’s music, categorized by the world’s emotions
Once Gracenote has classified music by moods, it delivers that data to customers, which use it in a number of different ways. Smaller media services often license Gracenote’s music data as their end-to-end solution for organizing and recommending music. Media center app maker Plex, for example, uses the company’s music recommendation technology to offer its customers personalized playlists and something the company calls “mood radio.” Plex users can pick a mood like “gentle bittersweet,” press play, and then wait for Mazzy Star to do its thing.
Gracenote also delivers its data to some of the industry’s biggest music service operators, including Apple and Spotify. These big players typically don’t like to talk about how they use Gracenote’s data in their products. Bigger streaming services generally operate their own music recommendation algorithms, but they often still use Gracenote’s mood data to train and improve those algorithms, or to help human curators pre-select songs that are then turned into playlists.
This means that some music fans may be acutely aware of Gracenote’s mood classification work, while others may have no idea that the company’s AI technology has helped improve their music listening experience.
Either way, Gracenote has to make sure that its data translates internationally, especially as it licenses the data into new markets. On Tuesday, the company announced that it will begin to sell its music data product, which among other things includes mood classification as well as descriptive, cleaned-up metadata for cataloging music, in Europe and Latin America. To make sure that nothing is lost in translation, the company employs international editors who don’t just translate a word like “sentimental,” but actually listen to example songs to figure out which expression works best in their cultural context.
And the international focus goes both ways. Gracenote is also constantly scouring the globe to feed its training set with new, international sounds. “Our data can work with every last recording on the planet,” said Cremer.
In the end, classifying all of the world’s music is only possible because companies like Gracenote rely not just on humans, but also on artificial intelligence and technologies like machine listening. And in many ways, teaching computers to detect sad songs can help humans have a better, more fulfilling music experience, if only because relying on humans alone would have left many millions of songs unclassified, and thus out of reach for the personalized playlists of their favorite music services.
Using data and technology to unlock these songs from all over the world has been one of the most exciting parts of his job, said Cremer: “The reason I’m here is to make sure that everyone has access to all of that music.”