Speech Detection using Gammatone Features and One-Class Support Vector Machine
- Date Issued: 2013
- Abstract/Description: A network gateway is a mechanism that provides protocol translation and/or validation of network traffic using the metadata contained in network packets. For media applications such as Voice-over-IP, the portion of each packet containing speech data cannot be verified and can provide a means of maliciously transporting code or sensitive data undetected. One solution to this problem is Voice Activity Detection (VAD). Many VADs rely on time-domain features and simple thresholds for efficient speech detection; however, these reveal little about the signal being passed. More sophisticated methods employ machine learning algorithms but train on specific noises intended for a target environment. It must be possible to validate speech under a variety of unknown conditions and to differentiate between speech and non-speech data embedded within the packets. A real-time speech detection method is proposed that relies only on a clean-speech model for detection. Using Gammatone filter bank processing, the cepstrum and several frequency-domain features are used to train a One-Class Support Vector Machine, which provides a clean-speech model irrespective of environmental noise. A Wiener filter is used to improve operation in harsh noise environments. Greater than 90% detection accuracy is achieved for clean speech, with approximately 70% accuracy for SNR as low as 5 dB.
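The abstract names Gammatone filter bank processing as the front end but this record gives no implementation details. The following is a minimal sketch of that one stage only, assuming the widely used Glasberg and Moore ERB formulas, a 4th-order gammatone impulse response, and a bandwidth factor of 1.019. The function names, channel count, frequency range, and sample rate are all hypothetical choices for illustration, not values taken from the thesis, and the cepstrum, One-Class SVM, and Wiener filter stages are not shown.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc_hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def center_frequencies(f_low, f_high, n_channels):
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]

def gammatone_ir(fc_hz, sample_rate, duration_s=0.025, order=4):
    """Peak-normalised gammatone impulse response (order and 1.019 are assumed)."""
    b = 1.019 * erb_bandwidth(fc_hz)
    n_samples = int(duration_s * sample_rate)
    ir = []
    for k in range(n_samples):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def filter_bank_energies(frame, sample_rate, fcs):
    """Energy of each channel's output for one frame (direct convolution)."""
    energies = []
    for fc in fcs:
        ir = gammatone_ir(fc, sample_rate)
        energy = 0.0
        for n in range(len(frame)):
            acc = 0.0
            for k in range(min(n + 1, len(ir))):
                acc += ir[k] * frame[n - k]
            energy += acc * acc
        energies.append(energy)
    return energies

if __name__ == "__main__":
    fs = 8000  # assumed telephone-band sample rate
    fcs = center_frequencies(100.0, 3500.0, 16)
    # Test tone placed at the centre of channel 5.
    tone = [math.sin(2.0 * math.pi * fcs[5] * n / fs) for n in range(200)]
    energies = filter_bank_energies(tone, fs, fcs)
    best = max(range(len(fcs)), key=lambda i: energies[i])
    print("strongest channel:", best, "at", round(fcs[best], 1), "Hz")
```

In a fuller pipeline, the per-channel energies (or their logarithm) would be the starting point for cepstral and other frequency-domain features. The direct convolution here is written for clarity rather than speed.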
Title: | Speech Detection using Gammatone Features and One-Class Support Vector Machine. | |
---|---|---|
Name(s): | Cooper, Douglas, Author; Mikhael, Wasfy, Committee Chair; Wahid, Parveen, Committee Member; Behal, Aman, Committee Member; Richie, Samuel, Committee Member; University of Central Florida, Degree Grantor | |
Type of Resource: | text | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Identifier: | CFE0005091 (IID), ucf:50731 (fedora) | |
Note(s): | 2013-05-01; M.S.E.E.; Engineering and Computer Science, Electrical Engr and Computing; Masters; This record was generated from author submitted information. | |
Subject(s): | gammatone -- one-class -- SVM -- support vector machine -- speech detection -- voice activity detection | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0005091 | |
Restrictions on Access: | public 2013-11-15 | |
Host Institution: | UCF |