TAURUS System Requirement Specifications-Computer Sciences Applications-Project Report, Study Guides, Projects, Research of Applications of Computer Sciences

This report is for a final-year project completing a degree in Computer Science. It emphasizes Applications of Computer Sciences. It was supervised by Dr. Abhisri Yashwant at Bengal Engineering and Science University. Its main points are: System, Requirements, Specification, Development, Telephonic, Speaker, Recognition, Database.

Typology: Study Guides, Projects, Research

2011/2012

Uploaded on 07/18/2012

padmini

Preface
The System Requirements Specification is one of the critical project documents because it
identifies the aims and goals of the project. It clearly outlines what the development team
must achieve for the project to be considered complete.
The project’s aim is to develop a telephonic speaker recognition system. The first step is
to create a speaker database containing digitized speech recordings of the people, of both
sexes, who are to be identified. These recordings should contain both telephonic and
non-telephonic data. Speaker verification will use a text-independent method, in which
speaker models capture characteristics of a person’s speech that show up irrespective of
what is being said, and a text-dependent method, in which verification is based on what
the speaker is saying. Both methods will be applied to telephonic and non-telephonic
speech data.
Later, the speaker recognition results obtained from telephonic data, which are affected by
handset variability and channel distortion, will be compared with those obtained from
non-telephonic speech. Speaker recognition via lip motion may be carried out at a later
stage.
Speaker recognition makes it possible to use speech to verify identity and control access
to services such as computer-human interaction, voice dialing, banking by telephone,
voice-activated data entry in medical or dark rooms, telephone shopping, voice mail, and
security control for confidential information areas.

Section 1

Introduction

This document introduces the reader to the software requirements specification of the project to be developed, “Taurus: Real Time Telephonic Speaker Recognition System”.

Human speech conveys several types of information. The primary type is the words that the speaker is trying to pass to the listener. Speech also carries information about the language being spoken and about the speaker’s emotions, gender and identity. The goal of automatic speaker recognition is to extract, characterize and recognize the information about speaker identity.

The objective of this project is to recognize a speaker from a voice that has been distorted by the telephone channel. For uses ranging from criminal detection to access restriction, telephone networks are now being considered a basic medium for person identification.

Telephonic access is used in banking by telephone, voice-activated data entry in medical or dark rooms, telephone shopping, voice mail, security control, criminal detection and confidential information areas.

1.1 Purpose

This document serves as a control tool for tracking the progress of the project and as a basis for verifying and testing the completed system against the original requirements. It states both the functional and the non-functional requirements of the system to be developed. The objective of this project is to develop an offline automatic speaker recognition system for person identification. Speaker recognition can be classified into two tasks:

a. Speaker identification

Speaker identification is the process of determining which speaker, if any, in a group of known speakers most closely matches an unknown speaker. The identification may be closed set, where the unknown speaker is assumed to be in the set of known speakers, or open set, where the unknown speaker may or may not be in that set.

b. Speaker verification

Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker.
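
The difference between closed-set identification, open-set identification and verification can be illustrated with a minimal Python sketch. It assumes that a match score per enrolled speaker (higher meaning more similar) has already been computed. The function names, speaker labels, scores and threshold are hypothetical and are not part of the Taurus specification.

```python
from typing import Dict, Optional


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed set: the unknown speaker is assumed to be enrolled,
    so the best-scoring enrolled speaker is always returned."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open set: the unknown speaker may not be enrolled, so the best
    match is returned only if its score reaches the (assumed) threshold."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


def verify(scores: Dict[str, float], claimed_id: str, threshold: float) -> bool:
    """Verification: accept or reject one identity claim.
    A claim for a speaker who is not enrolled is always rejected."""
    return scores.get(claimed_id, float("-inf")) >= threshold
```

Identification compares the unknown voice against every enrolled speaker, whereas verification looks only at the claimed identity, which is why this project, being specific to verification, needs only one comparison per request.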

The system will create a speaker database containing digitized speech recordings of all the people, of both sexes, who are to be identified. The database will include both telephonic and non-telephonic speech samples. The project will include a speaker verification phase, which will use both text-dependent and text-independent methods. In the text-dependent method, the unknown speaker must speak the same prescribed text that was used for training. In the text-independent method, speaker models capture characteristics of a person’s speech that show up irrespective of what is being said, so the user may read any text during both training and testing.

Text-dependent and text-independent recognition will be carried out on the following sample data:

  • The original voices recorded through a computer
  • The voices transmitted through a telephonic channel

This project is specific to verification only.

1.2 Project Generic Information

1.2.1 Project Team

Student Name

Ms. Sidra Malik

Project Supervisors

Mr. Fayyaz ul Amir Afsar Minhas

Dr. Muhammad Arif

a. Access Restriction

Access restriction is the area in which speaker recognition technology has had the

greatest impact. While access to secured areas can be restricted with the use of

keys, magnetic cards, and lock combinations, all three can be lost or stolen.

Telephonic Speaker recognition can provide an alternative or supplemental means

of entry.

b. Forensic

The use of telephonic speaker recognition in law enforcement is becoming commonplace where the evidence is in the form of voice recordings of the suspects. Such cases might include bomb threats, ransom negotiations, undercover tape recordings, wire taps, etc.

c. Computer-human Interaction

There is an increasing need for computer-human interaction in the world of information, in applications ranging from telephones to mobile devices and robotics. Some new cellular phones include command-and-control (C&C) speech recognition that allows utterances such as “Call Home”.

d. Industry

In industries where the need for security is real, real-time telephonic speaker recognition can provide an effective means of access control.

e. Telephony

Some PBX/Voice mail systems allow callers to speak commands instead of

pressing buttons to send specific tones.

1.4 Definitions, Acronyms and Abbreviations

This section presents basic terms and definitions that may help the reader better understand the rest of the document.

1.4.1 Taurus

Name of the project to be developed

1.4.2 ASR

Automatic Speaker Recognition

1.4.3 SRS

System Requirements Specification

1.4.4 RTSI

Real Time Speaker Identification

1.4.5 PBX

A Private Branch Exchange is a telephone exchange that serves a particular business or office, as opposed to one that a common carrier or telephone company operates for many businesses or for the general public.

1.4.6 Biometrics

Human characteristics that all people have but in slightly different forms, e.g. fingerprints, retina and voice.

1.4.7 Pattern Recognition

Pattern recognition is the classification of objects of interest into one of a number of categories or classes. The objects of interest are generically called patterns; in this case they are sequences of acoustic vectors extracted from input speech. In the recognition phase, features are extracted from the unknown speaker’s voice sample. Pattern matching refers to an algorithm, or several algorithms, that compute a match score between the unknown speaker’s feature vectors and the models stored in the database.
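
As an illustration of a match score, the following Python sketch scores an unknown speaker's feature vectors against each stored model using a nearest-neighbour distance. This is only one simple choice, assumed here for illustration; the specification does not state which matching algorithm Taurus uses. Each model is assumed to be a plain list of reference vectors, and all names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_distance(vector: Vector, model: List[Vector]) -> float:
    """Euclidean distance from one feature vector to its closest model vector."""
    return min(math.dist(vector, reference) for reference in model)


def match_score(features: List[Vector], model: List[Vector]) -> float:
    """Negated mean nearest-neighbour distance, so a higher score is a better match."""
    total = sum(nearest_distance(vector, model) for vector in features)
    return -total / len(features)


def score_against_database(
    features: List[Vector], database: Dict[str, List[Vector]]
) -> Dict[str, float]:
    """Compute one match score per enrolled speaker model."""
    return {speaker: match_score(features, model) for speaker, model in database.items()}
```

The resulting scores can then be compared with a threshold for verification, or ranked for identification.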

1.4.8 Feature Extraction

Feature extraction is the process that extracts a small amount of data from the voice

signal that can later be used to represent each speaker. Feature matching involves the

1.4.18 ANN

Artificial Neural Networks

1.4.19 NN

Nearest Neighbor

1.4.20 PNN

Probabilistic Neural Networks


Section 2

Overall Description

2.1 Product Perspective

The software to be developed is a full-featured standalone application that is not to be integrated into any other system or software. This document therefore does not serve as a continuation of any system-level requirements.

The GUI of the product will be Windows-based. The detailed hardware and software requirements are given in a later section. The general structure of the system in coordination with external entities is shown in Figure 1.

2.2 Description

Identity verification based on a person’s voice is essential in many fields. In the past few

years, speaker verification for portal access control in benign environments (high quality

microphones, low background noise and clean speech) has become practical, cost-

effective and reliable.

Figure 1 General overview of speaker recognition (diagram labels: Identification Request, TAURUS (verification system), Database)

2.2.1 Real Time Speaker Verification

By real-time speaker identification (RTSI) we mean a process that runs while the unknown person is speaking. More precisely, an RTSI system is a soft real-time system whose response time is set to the length of the input speech sample.
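
The soft real-time requirement can be read as "processing should take no longer than the speech sample itself lasts". A minimal Python sketch of such a check follows. The function names are hypothetical, and the `process` callable stands in for whatever recognition routine is actually used.

```python
import time
from typing import Callable, Sequence, Tuple


def speech_duration_s(num_samples: int, sample_rate_hz: int) -> float:
    """Length of the input speech sample in seconds."""
    return num_samples / sample_rate_hz


def meets_soft_deadline(processing_time_s: float, duration_s: float) -> bool:
    """Soft real-time check: the response time must not exceed the speech length."""
    return processing_time_s <= duration_s


def timed_run(
    process: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate_hz: int,
) -> Tuple[object, float, bool]:
    """Run `process` on the samples and report its result, the elapsed
    wall-clock time, and whether the soft deadline was met."""
    start = time.perf_counter()
    result = process(samples)
    elapsed = time.perf_counter() - start
    deadline = speech_duration_s(len(samples), sample_rate_hz)
    return result, elapsed, meets_soft_deadline(elapsed, deadline)
```

Because the deadline is soft, a missed deadline would be logged or reported rather than treated as a system failure.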

2.2.2 Types of Distortions in Speaker Recognition

Disguise can be classified along two independent dimensions:

  • Deliberate versus non deliberate
  • Electronic versus non electronic

Deliberate-electronic:

It is the use of electronic scrambling devices to alter the voice. This is often done by

radio stations to conceal the identity of a person being interviewed.

Non deliberate-electronic:

This includes, for example, all of the distortions and alterations introduced by voice

channel properties such as the bandwidth limitations of telephones, telephone systems,

and recording devices.

Deliberate-non electronic:

It includes use of falsetto, teeth clenching, etc.

Non deliberate-non electronic:

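Alterations that result from some involuntary state of the individual such as illness, use of alcohol or drugs (the effects are involuntary), or emotional feelings.

                  Deliberate                        Non deliberate
Electronic        Electronic scrambling, etc.       Channel distortions, etc.
Non electronic    Falsetto, teeth clenching, etc.   Illness, use of alcohol or drugs, emotional state, etc.

Table 1 Types of disguises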

However, speaker verification over the telephone network presents the following challenges:

  • Variations in handset microphones, which result in severe mismatches between speech data gathered from these microphones.
  • Signal distortions due to the telephone channel.
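
One of these channel effects, the limited bandwidth of a telephone line, can be simulated with a short Python sketch. It assumes a nominal narrowband telephone passband of 300 to 3400 Hz and an 8 kHz sampling rate, and uses an idealized brick-wall filter built on a naive discrete Fourier transform. Real channels also add handset response, noise and coding effects, which are not modelled here. All function names are hypothetical.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (adequate for short test signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_limit(signal: List[float], sample_rate_hz: float,
               low_hz: float = 300.0, high_hz: float = 3400.0) -> List[float]:
    """Zero every frequency bin outside [low_hz, high_hz] (assumed passband)."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins k and n - k carry the same physical frequency for a real signal.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]


def tone(freq_hz: float, sample_rate_hz: float, n: int) -> List[float]:
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]


def energy(x: List[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)
```

Passing a 100 Hz tone through `band_limit` removes it almost entirely, while a 1000 Hz tone passes unchanged, which illustrates why features that rely on low-frequency content can mismatch between telephonic and non-telephonic recordings.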

2.3.1 Main Interface

Figure 2 Main interface of the system

The user will be able to perform the enrollment of a speaker as well as the verification of that particular speaker.

2.3.2 Enrollment

Figure 3 Enrollment of a speaker

2.3.3 Verification

Figure 4 Verification of a speaker

2.3.4 Hardware Interfaces

The hardware interfacing required by the software is for reading and writing voice samples to and from different storage media. Non-telephonic speech samples can be obtained directly from a microphone attached to the computer, whereas samples taken from a handset will be recorded on the handset and later transferred to the computer via Bluetooth.
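
Reading and writing voice samples can be sketched with Python's standard `wave` module. This is an illustration only: the specification names MATLAB as the software interface and does not prescribe a file format, so the mono 16-bit PCM WAV layout and the helper names below are assumptions.

```python
import io
import struct
import wave
from typing import List, Tuple


def write_wav(samples: List[int], sample_rate_hz: int, target) -> None:
    """Write 16-bit mono PCM samples to a file path or binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)      # mono (assumed)
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(source) -> Tuple[List[int], int]:
    """Read 16-bit mono PCM samples; returns (samples, sample_rate_hz)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        count = wav.getnframes()
        frames = wav.readframes(count)
        return list(struct.unpack("<%dh" % count, frames)), wav.getframerate()
```

For example, samples can be written to an in-memory buffer created with `io.BytesIO()` and read back after rewinding it with `seek(0)`; a file path on any storage medium works the same way.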

2.3.5 Software Interfaces

The software needs to interface to MATLAB 7 or higher.

2.4 Use Case Diagrams and their Description

2.4.1 Description of Actors

System Operator:

This is the person who shall operate our software. She is assumed to be an employee of

the organization in which Taurus is to be installed. There are no authentication

requirements for this person. Therefore the authentication of this user is not considered as

a part of the problem domain. Taurus Operator can carry out the following functionalities