Alumni Profile: Paris Smaragdis Teaches Machines How to Hear

At MIT's Media Lab, former music synthesis major Paris Smaragdis '95 researches the science of sound.
September 17, 2008

Most Berklee alumni put sounds together. Adobe audio research engineer Paris Smaragdis '95 takes them apart.

Named one of the top young technology innovators of 2006 by Technology Review, the music synthesis grad creates "machines that can listen," pioneering new devices that promise to improve music recordings, safety, and even sports on TV.

Growing up in Greece, Smaragdis always felt drawn to electronic instruments. "At some point my dad gave me a Casio VO-1. I still have it. The fact that blew my mind was that you could make your own sounds on it," he says. At Berklee, especially in classes with Richard Boulanger, he dove full-bore into electronic music. "I was glad I wasn't being made to conform," he says.

Boulanger says Smaragdis stood out in a big way. "He was writing software—cutting-edge audio software—even as an undergraduate. It was amazing." Specifically, his changes to the sound design programming language Csound "made it commercially viable," Boulanger says. In the mid-90s, "he was quite famous in the world of computer music."

Smaragdis helped shape the Music Synthesis Department overall by being one of the first students to express interest in audio programming. Boulanger thinks he was the first Berklee graduate to get into the MIT Media Lab—a cross-discipline group that expands ways for people to interact with technology.

Over time, Smaragdis branched into a particular area of research: subtracting sounds instead of adding or synthesizing them. You could add a flute track to an orchestra . . . but could you separate out the flute from a concert recording?

"It felt much more challenging. It seemed like a neat problem that no one was working on," he says. As a graduate student at the MIT Media Lab, he learned how to deconstruct a wall of sound.

The work centers on the science of timbre. Different sounds have distinctive patterns. We recognize those patterns, and machines can learn to recognize them as well.

"A sound like a bass drum will have a lot of low frequencies and sound muted. In contrast, a cymbal will sound bright and rich," Smaragdis explains.
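
One rough way to put a number on that contrast is the spectral centroid, the average frequency of a sound weighted by how strong each frequency is. The Python sketch below is an illustration only, not code from Smaragdis's research. The function name and the two synthetic test sounds (a decaying 60 Hz tone standing in for a bass drum, a decaying burst of noise standing in for a cymbal) are assumptions made for the example.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Return the magnitude-weighted mean frequency (in Hz) of a mono signal."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float((freqs * magnitudes).sum() / magnitudes.sum())

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of time stamps
rng = np.random.default_rng(0)

# Hypothetical stand-ins for real recordings.
bass_drum = np.sin(2 * np.pi * 60 * t) * np.exp(-8 * t)     # low, muted thump
cymbal = rng.standard_normal(sample_rate) * np.exp(-8 * t)  # bright, noisy burst

bass_centroid = spectral_centroid(bass_drum, sample_rate)
cymbal_centroid = spectral_centroid(cymbal, sample_rate)
print(f"bass drum: {bass_centroid:.0f} Hz, cymbal: {cymbal_centroid:.0f} Hz")
```

The thump's centroid lands far below the noisy burst's, which is one simple cue a program could use to tell the two apart.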

Smaragdis found he could teach a computer the difference between these sounds by feeding it some examples. To extract the singer from a vocal/piano recital recording, he played some piano sounds and taught a computer how to recognize them. The computer could then excise the piano patterns from the recording, leaving the singer a cappella.
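
A heavily simplified version of that idea can be sketched in a few lines of Python. This is not the method used in the research, which the article does not detail. It is a toy stand-in that averages example spectra into one pattern and then, frame by frame, subtracts as much of that pattern as it can without pushing any frequency bin below zero. The function names, the six-bin spectra, and the loudness values are all hypothetical.

```python
import numpy as np

def learn_pattern(example_spectra):
    """Average example magnitude spectra (rows = frames) into one pattern that sums to 1."""
    mean = np.asarray(example_spectra, dtype=float).mean(axis=0)
    return mean / mean.sum()

def remove_pattern(mixture, pattern):
    """Per frame, subtract the largest multiple of `pattern` that keeps every bin >= 0."""
    mixture = np.asarray(mixture, dtype=float)
    active = pattern > 0                       # bins where the learned sound has energy
    gains = (mixture[:, active] / pattern[active]).min(axis=1)
    return mixture - gains[:, None] * pattern

# Hypothetical six-bin magnitude spectra, ordered from low to high frequency.
piano_shape = np.array([0.0, 0.0, 1.0, 3.0, 2.0, 0.0])
voice_shape = np.array([2.0, 3.0, 1.0, 0.0, 0.0, 0.0])

# "Feeding it some examples": the piano alone at three loudness levels.
piano_examples = np.outer([1.0, 0.5, 2.0], piano_shape)
piano_pattern = learn_pattern(piano_examples)

# A four-frame "recital" with piano and voice at varying loudness.
piano_loudness = np.array([1.0, 0.0, 2.0, 0.5])
voice_loudness = np.array([0.5, 1.0, 0.0, 2.0])
voice_only = np.outer(voice_loudness, voice_shape)
mixture = np.outer(piano_loudness, piano_shape) + voice_only

recovered = remove_pattern(mixture, piano_pattern)
print(np.round(recovered, 3))
```

The toy recovers the voice exactly only because the piano pattern here has some frequency bins the voice never touches. Real recordings overlap far more, which is part of what makes the problem hard.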

The technique also works with non-musical sounds and on small, portable devices. Smaragdis squeezed a pig toy next to a PDA, which recorded the sound's properties. When he repeated the squeeze, the PDA identified the sound and flashed the text "squeaky pig."

Essentially, Smaragdis put machines through a course in ear training. This meant his Berklee education gave him a big advantage over the typical engineer. In general, "people don't really know how we listen," he says. "How do you tell the meowing of a cat from a screeching tire?"

Thanks to Berklee, "I have this intuition about it . . . [and] I can really map my experience into mathematics."

Once a computer can "hear," all sorts of frontiers open.

Think golf's too slow? In Japan, Mitsubishi is using Smaragdis's method to speed up watching sports on TV. After the TV records a game on a digital video recorder, the software finds the moments where the crowd roars. Instead of spending hours in front of the tube, fans can cut right to the big, exciting moments. (The advertising targets disgruntled "sports widows.")
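
The article does not say how the software picks out the roar. As a much cruder stand-in, the Python sketch below simply flags the stretches of a recording that are far louder than usual. The function name, the one-second window, the threshold factor, and the synthetic "game audio" are all assumptions made for illustration. A real system would presumably recognize the sound of a crowd rather than just its volume.

```python
import numpy as np

def loud_moments(audio, sample_rate, window_seconds=1.0, factor=3.0):
    """Return start times (s) of windows whose RMS level exceeds factor x the median RMS."""
    audio = np.asarray(audio, dtype=float)
    window = int(window_seconds * sample_rate)
    n_windows = len(audio) // window
    frames = audio[: n_windows * window].reshape(n_windows, window)
    rms = np.sqrt((frames ** 2).mean(axis=1))   # loudness of each window
    threshold = factor * np.median(rms)         # "far louder than usual"
    return [float(i * window_seconds) for i in np.flatnonzero(rms > threshold)]

# Hypothetical ten-second recording: quiet background noise with two loud bursts.
sample_rate = 1000
rng = np.random.default_rng(0)
game_audio = 0.1 * rng.standard_normal(10 * sample_rate)
for second in (4, 7):
    start = second * sample_rate
    game_audio[start : start + sample_rate] *= 10.0

highlights = loud_moments(game_audio, sample_rate)
print(highlights)
```

A recorder could then jump straight to the returned time stamps instead of playing the whole game.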

The technology has serious applications as well. Take security surveillance. Instead of fast-forwarding through hours of video to find the moment a crime occurred, a police officer could skip to the minute when somebody screamed.

Smaragdis analyzed an intersection in Louisville, Kentucky, that had more than its fair share of traffic accidents. Video analysis had a hard time identifying crashes on the tape because "the picture looks completely different" depending on the color of the car, the weather, and so on, he points out. But the sounds of a car accident are fairly consistent: the screech of tires, the smash.

Certainly the scientist has no shortage of ideas. He's "constantly refining" the techniques. "It's kind of like my homework at Berklee—you never know where to stop."

He also still (literally) plays with technology, creating "little pieces" of music on the synthesizer or computer.

Smaragdis envisions putting sensors along factory lines to hear problems as they occur; elevators that raise an alarm when the machinery starts to squeak (already happening in Japan); smart slideshow programs that would automatically match a soundtrack to your pictures; and who knows what else.

"I have no idea what's coming up next," Smaragdis says. "If I knew what was next, it wouldn't be research. I think the goal is to have fun."

Or as Boulanger says, "He's brilliant, he's fast, he's hard-working, he's humble, and he's changing the world."