
On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks

Authors

  • Robert Mertens

    (International Computer Science Institute, University of California, Berkeley, USA)

  • Po-Sen Huang

    (Beckman Institute, University of Illinois at Urbana-Champaign, USA)

  • Luke Gottlieb

    (International Computer Science Institute, University of California, Berkeley, USA)

  • Gerald Friedland

    (International Computer Science Institute, University of California, Berkeley, USA)

  • Ajay Divakaran

    (SRI International Sarnoff, USA)

  • Mark Hasegawa-Johnson

    (Beckman Institute, University of Illinois at Urbana-Champaign, USA)

Abstract

A video’s soundtrack is usually highly correlated with its content. Hence, audio-based techniques have recently emerged as a means for video concept detection complementary to visual analysis. Most state-of-the-art approaches rely on manual definition of predefined sound concepts such as “engine sounds” or “outdoor/indoor sounds.” These approaches come with three major drawbacks: manual definitions do not scale because they are highly domain-dependent; manual definitions are highly subjective with respect to annotators; and a large part of the audio content is omitted, since the predefined concepts are usually found in only a fraction of the soundtrack. This paper explores how unsupervised audio segmentation systems like speaker diarization can be adapted to automatically identify low-level sound concepts similar to annotator-defined concepts and how these concepts can be used for audio indexing. Speaker diarization systems are designed to answer the question “who spoke when?” by finding segments in an audio stream that exhibit similar properties in feature space, i.e., that sound similar. Using a diarization system, all the content of an audio file is analyzed and similar sounds are clustered. This article provides an in-depth analysis of the statistical properties of similar acoustic segments identified by the diarization system in a predefined document set and the theoretical fitness of this approach to discern one document class from another. It also discusses how diarization can be tuned to better reflect the acoustic properties of general sounds as opposed to speech and introduces a proof-of-concept system for multimedia event classification working with diarization-based indexing.
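
The indexing idea in the abstract can be illustrated with a minimal sketch. It assumes that a diarization system has already produced (start, end, cluster label) segments for each soundtrack and that the cluster labels are shared across documents. The segment values, the function names `cluster_histogram` and `cosine_similarity`, and the choice of duration-weighted histograms compared by cosine similarity are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from math import sqrt


def cluster_histogram(segments):
    """Return the fraction of total segment time assigned to each cluster label."""
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += end - start
    total_time = sum(totals.values())
    if total_time == 0:
        return {}
    return {label: t / total_time for label, t in totals.items()}


def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse histograms stored as dicts."""
    labels = set(h1) | set(h2)
    dot = sum(h1.get(l, 0.0) * h2.get(l, 0.0) for l in labels)
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical diarization output: (start_sec, end_sec, cluster_label).
doc_a = [(0.0, 4.0, "c1"), (4.0, 6.0, "c2"), (6.0, 10.0, "c1")]
doc_b = [(0.0, 8.0, "c1"), (8.0, 10.0, "c2")]
doc_c = [(0.0, 10.0, "c3")]

hist_a = cluster_histogram(doc_a)
hist_b = cluster_histogram(doc_b)
hist_c = cluster_histogram(doc_c)

sim_ab = cosine_similarity(hist_a, hist_b)  # same cluster proportions
sim_ac = cosine_similarity(hist_a, hist_c)  # no clusters in common
```

In this toy setup, documents whose soundtracks spend similar proportions of time in the same clusters receive a high similarity score, which is one plausible basis for telling one document class from another.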

Suggested Citation

  • Robert Mertens & Po-Sen Huang & Luke Gottlieb & Gerald Friedland & Ajay Divakaran & Mark Hasegawa-Johnson, 2012. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks," International Journal of Multimedia Data Engineering and Management (IJMDEM), IGI Global, vol. 3(3), pages 1-19, July.
  • Handle: RePEc:igg:jmdem0:v:3:y:2012:i:3:p:1-19

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/jmdem.2012070101
    Download Restriction: no