An Industrial-Strength Audio Search Algorithm
Avery Li-Chun Wang
We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.
- The main strength of this paper is just how well the algorithm works. It is very accurate while still being fast and requiring a relatively small sample to work with. Algorithms with such properties in any field are great discoveries.
- The ability of the algorithm to work in the presence of noise was particularly interesting. As long as some part of the original audio survives in the spectrogram, extra information does not interfere with the matching. Most audio recorded on a mobile device will contain some noise or compression artifacts, so this feature is vital.
- We discussed how the techniques could be applied to other domains such as image or video identification. Those domains have many more interacting dimensions than audio, but finding a video from a single image or .gif could be possible.
- Some of the terms and definitions were not explained thoroughly. The paper also had a noticeably different structure and feel from more academic papers.
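
To make the noise-robustness point above concrete, here is a minimal sketch of constellation hashing and offset voting. It is an illustration under stated assumptions, not the paper's implementation: the peak lists are synthetic (time frame, frequency bin) pairs rather than peaks picked from a real spectrogram, the names `hashes`, `build_index`, `identify`, and `MAX_DT` are hypothetical, and every anchor is paired with all later peaks inside a fixed time window instead of a limited fan-out.

```python
from collections import Counter, defaultdict

MAX_DT = 10  # assumed width of the pairing window, in time frames


def hashes(peaks):
    """Yield ((f1, f2, dt), t1) for each pair of peaks at most MAX_DT apart."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT:
                break  # peaks are time-sorted, so later ones are even further away
            if dt > 0:
                yield (f1, f2, dt), t1


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Return (track_id, offset, score) for the best-aligned track, or None."""
    votes = Counter()
    for h, t_query in hashes(query_peaks):
        for track_id, t_db in index.get(h, ()):
            # A true match produces many hashes that agree on one time offset.
            votes[(track_id, t_db - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), score = votes.most_common(1)[0]
    return track_id, offset, score


# Synthetic constellations: one peak every two frames (illustrative data only).
tracks = {
    "song_a": [(t, (7 * t) % 50) for t in range(0, 60, 2)],
    "song_b": [(t, (11 * t + 3) % 50) for t in range(0, 60, 2)],
}
index = build_index(tracks)

# Query: frames 20..38 of song_a, re-timed to start at frame 0.
excerpt = [(t - 20, f) for t, f in tracks["song_a"] if 20 <= t < 40]

# Degrade it: drop every other peak and add unrelated "noise" peaks.
noisy = excerpt[::2] + [(1, 49), (5, 13), (9, 27), (13, 41)]

result = identify(index, noisy)
print(result)
```

In this toy setup the degraded query still resolves to `song_a` at an offset of 20 frames: the surviving peak pairs all vote for the same offset, while pairs involving the added peaks find no counterpart in the index. Pairing with every peak in the window keeps the demonstration simple; limiting the number of pairs per anchor would shrink the index, at the cost of losing some matches when extra peaks crowd into the window.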