Miquel Espi

Email: mespimarques@apple.com or miquel@longnap.com

Audio Engineer at Apple Inc. in Cupertino, California

More stuff on: LinkedIn | Google Scholar | Facebook | longnap

About me

I am an audio engineer working on signal processing and machine learning at Apple in Cupertino, CA. I received a B.Eng. from Universidad Politecnica de Valencia, Spain, in 2006, an M.Eng. from Kagoshima University, Japan, in 2010, and a Ph.D. from the University of Tokyo, Japan, in 2013.

I have contributed to acoustic signal processing and machine learning throughout my Ph.D., as a research associate at NTT CS Laboratories, at Starkey Hearing Technologies, and now as a research engineer at Apple.

For further details, please see the sections on research topics, publications, and background, or even code (not related to my research).


Github (miquelespi)

Research topics

Deep learning

"I think the success of deep learning gives a lot of credibility to the idea that we learn multiple layers of distributed representations using stochastic gradient descent. However, I think we are probably a long way from understanding how the brain does this [...] The brain does complex tasks like object recognition and sentence understanding with surprisingly little serial depth to the computation. So artificial neural nets should do the same."

- Hinton's AMA (Ask Me Anything), G. Hinton, Reddit, 2014

Conversation scene analysis

"Conversation scene analysis aims to provide the automatic description of conversation scenes from the multimodal nonverbal behaviors of participants, which are captured with cameras and microphones"

- K. Otsuka, Signal Processing Magazine, IEEE, 2011

Acoustic event detection (characterisation, detection, and classification)

"Acoustic Event Detection/Classification (AED/C) is a recent sub-area of computational auditory scene analysis that deals with processing acoustic signals and converting them into symbolic descriptions corresponding to a listener's perception of the different sound events that are present in the signals and their sources. While acoustic event classification deals with events that have already been isolated from its temporal context, acoustic event detection refers to both identification and localization in time of events in continuous audio streams."

- A. Temko, IV Jornadas en Tecnología del Habla, 2006

Acoustic scene classification

"Acoustic scene classification aims to characterize the environment of an audio stream by providing a semantic label. It can be conceived of as a standard classification task in machine learning: given a relatively short clip of audio, the task is to select the most appropriate of a set of scene labels."

- D. Giannoulis, D-CASE Challenge, 2013

Publications and Awards