SOUND SPECTROGRAPH AND VOICE PRINTS – A NEW PARADIGM IN THE CRIMINAL JUDICIAL ADMINISTRATION
The idea that someone could be identified by the sound of his voice had its origins in the work of Alexander Melville Bell (father to Alexander Graham Bell). Over one hundred years ago, he developed a visual representation of what the spoken word would look like. It was based on pronunciation and he showed that there were subtle differences among different people who said the same things. Then in 1941, the laboratories of Bell Telephone in
The forensic science of voice identification has come a long way since it was first introduced in the American courts in the mid-1960s. In the early days of this identification technique, there was little research to support the theory that human voices are unique and could be used as a means of identification. There was also no standardization of how an identification was reached, or even of the training or qualifications necessary to perform the analysis. Today voice identification analysis has matured into a sophisticated identification technique, using the latest technology science has to offer. The research, which is still continuing today, demonstrates the validity and reliability of the process when performed by a trained and certified examiner using established, standardized procedures. Voice identification experts are found all over the world. No longer limited to the visual comparison of a few words, the comparison of human voices now focuses on every aspect of the words spoken: the words themselves, the way the words flow together, and the pauses between them.
The sound spectrograph, an automatic sound wave analyzer, is a basic research instrument used in many laboratories for research studies of sound, music and speech. It has been widely used for the analysis and classification of human speech sounds and in the analysis and treatment of speech and hearing disorders.
The instrument produces a visual representation of a given set of sounds in the parameters of time, frequency and amplitude. The analog spectrograph is composed of four basic parts: (1) a magnetic tape recorder/playback unit, (2) a tape scanning device with a drum which carries the paper to be marked, (3) an electronic variable filter, and (4) an electronic stylus which transfers the analyzed information to the paper. The analog sound spectrograph samples energy levels in a small frequency range from a magnetic tape recording and marks those energy levels on electrically sensitive paper. The instrument then analyses the next small frequency range and samples and marks the energy levels at that point. This process is repeated until the entire desired frequency range is analyzed for that portion of the recording. The finished product is called a spectrogram and is a graphic depiction of the patterns, in the form of bars or formants, of the acoustical events during the time frame analyzed. Recent developments in sound spectrography have produced computerized digital sound spectrographs ranging from dedicated digital signal analysis workstations to PC-based systems for acquisition, analysis, editing, and playback. These sophisticated computer-based systems provide high fidelity signal acquisition, high-speed digital processing circuitry for quick and flexible analysis, and CD-quality playback. The accuracy and reliability of the sound spectrograph, either analog or digital, have never been in question in any of the courts and have never been considered an issue in the admissibility of voice identification evidence.
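The band-by-band sweep described above can be illustrated with a minimal sketch. It is not the instrument's actual circuitry: the function names, the test tone, the frame length and the list of analysis frequencies are all assumptions chosen for illustration, and a single-frequency DFT probe stands in for the variable filter.

```python
import cmath
import math


def band_energy(frame, sample_rate, freq_hz):
    """Energy of one time frame at a single analysis frequency (one DFT probe)."""
    acc = 0j
    for i, x in enumerate(frame):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
    return abs(acc) ** 2 / len(frame)


def sweep_spectrogram(samples, sample_rate, freqs_hz, frame_len):
    """Analyse one narrow band at a time, then move on to the next band.

    Returns rows[band][frame]: one row per analysis frequency,
    one column per non-overlapping time frame.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    rows = []
    for freq in freqs_hz:  # one full pass over the recording per band
        rows.append([band_energy(frame, sample_rate, freq) for frame in frames])
    return rows


# Hypothetical test signal: a 500 Hz tone sampled at 8 kHz for 0.1 second.
RATE = 8000
tone = [math.sin(2 * math.pi * 500 * t / RATE) for t in range(800)]
bands = [250, 500, 1000, 2000]
spec = sweep_spectrogram(tone, RATE, bands, frame_len=160)

# The band holding the most energy corresponds to the darkest bar on paper.
strongest = max(range(len(bands)), key=lambda b: sum(spec[b]))
print(bands[strongest])
```

Each row of `spec` plays the role of one pass of the stylus across the paper; a digital spectrograph would normally compute all bands at once with a fast Fourier transform rather than sweeping them one by one.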
THE METHOD OF VOICE IDENTIFICATION
The method by which a voice is identified is a multifaceted process requiring the use of both aural and visual senses. In the typical voice identification case the examiner is given several recordings: one or more recordings of the voice to be identified and one or more recorded voice samples of one or more suspects. From these recordings the examiner must determine the identity of the unknown voice.
The first step is to evaluate the recording of the unknown voice, checking that the recording has a sufficient amount of speech with which to work and that the quality of the recording is of sufficient clarity in the frequency range required for analysis. The volume of the recorded voice signal must be significantly higher than that of the environmental noise. The greater the number of obscuring events, such as noise, music, and other speakers, the longer the sample of speech must be. Some examiners report that they reject as many as sixty percent of the cases submitted to them, with one of the main reasons for rejection being the poor quality of the recording of the unknown voice. Once the unknown voice sample has been determined to be suitable for analysis, the examiner then turns his attention to the voice samples of the suspects. Here also, the recordings must be of sufficient clarity to allow comparison, although at this stage the recording process is usually so closely controlled that the quality of recording is not a problem.
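The requirement that the voice be significantly louder than the background noise can be expressed as a signal-to-noise ratio. The sketch below is only an illustration: the article states no numeric threshold, so the 10 dB cut-off (`min_snr_db`) and the function names are assumptions, not an examiner's standard.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels between a speech and a noise excerpt."""
    speech_level = rms(speech)
    noise_level = rms(noise)
    if noise_level == 0:
        return math.inf   # no measurable noise at all
    if speech_level == 0:
        return -math.inf  # no measurable speech at all
    return 20 * math.log10(speech_level / noise_level)


def suitable_for_analysis(speech, noise, min_snr_db=10.0):
    """Hypothetical screening rule; min_snr_db is an assumed cut-off."""
    return snr_db(speech, noise) >= min_snr_db


# Hypothetical excerpts: speech ten times the amplitude of the noise floor.
speech_excerpt = [1.0, -1.0] * 100
noise_excerpt = [0.1, -0.1] * 100
print(round(snr_db(speech_excerpt, noise_excerpt), 1))
print(suitable_for_analysis(speech_excerpt, noise_excerpt))
```

In practice the noise excerpt would be taken from a pause in the recording where nobody is speaking, and the speech excerpt from a passage the examiner intends to compare.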
The examiner can only work with speech samples which are the same as the text of the unknown recording. Under the best of circumstances the suspects will repeat, several times, the text of the recording of the unknown speaker and these words will be recorded in a similar manner to the recording of the unknown speaker. For example, if the recording of the unknown speaker was a bomb threat made to a recorded telephone line then each of the suspects would repeat the threat, word for word, to a recorded telephone line. This will provide the examiner with not only the same speech sounds for comparison but also with valuable information about the way each speech sound completes the transition to the next sound.
As in any other form of identification analysis, the lower the quality of the evidence with which the examiner has to work, the greater the amount of evidence and time necessary to complete the analysis, and the smaller the chance of a positive conclusion.
Once the evidence has been determined to be sufficient to perform the analysis, the examiner then begins the two-step process of voice sample comparison: one aural (listening) and the other spectrographic (visual).
In the case of aural comparison of the voice samples the examiner compares both single speech sounds and series of speech sounds of the known and unknown samples. Once the examiner has located those portions to be used for the analysis, a more detailed aural comparison is undertaken. This comparison can be accomplished in many different ways. One of the most commonly used methods of aural comparison is re-recording a speech sound sample of the unknown followed immediately by a re-recording of the same speech sounds of the suspect. This is repeated several times so that the final product is a recording of specific speech sounds, in alternating order, by the unknown speaker followed by the suspect. Such comparisons have been greatly facilitated by the use of audio digital recording equipment which allows for the digital recording, storage, and repeated playback of only the desired speech sounds to be examined. During the aural comparison the examiner studies the psycholinguistic features of the speaker's voice. A large number of qualities and traits are examined, from such general traits as accent and dialect to inflection, syllable grouping and breath patterns. The examiner also scrutinizes the samples for signs of speech pathologies and peculiar speech habits.
The second step in the voice identification process is the spectrographic analysis of the recorded samples. The sound spectrograph is an automatic sound wave analyzer with a high quality, fully functional tape recorder. The speech samples to be analyzed are recorded on the sound spectrograph. The recording is then analyzed in two-and-one-half-second segments. The product is a spectrogram, a graphic display of the recorded signal on the basis of time and frequency with a general indication of amplitude.
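The segmentation step can be sketched as follows, assuming the recording is already available as a list of digital samples. The function name and the sample rate are hypothetical; only the two-and-one-half-second segment length comes from the description above.

```python
def split_into_segments(samples, sample_rate, segment_seconds=2.5):
    """Cut a recording into consecutive segments of segment_seconds each.

    The final segment may be shorter if the recording does not divide evenly.
    """
    seg_len = int(round(sample_rate * segment_seconds))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


# Hypothetical six-second recording at 8000 samples per second.
recording = [0.0] * (8000 * 6)
segments = split_into_segments(recording, 8000)
print([len(s) for s in segments])
```

Each segment would then be turned into its own spectrogram for side-by-side comparison.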
The spectrograms of the unknown speaker are then visually compared to the spectrograms of the suspects. Only those speech sounds which are the same are compared. The examiner looks not only for similarities but also for differences. The differences are closely examined to determine if they are due to pronunciation differences or if they are indicative of different speakers.
When the analysis is complete the examiner integrates his findings from both the aural and spectrographic analyses into one of five standard conclusions:
1. a positive identification
2. a probable identification
3. a positive elimination
4. a probable elimination
5. no decision
In order to arrive at a positive identification the examiner must find a minimum of twenty speech sounds which possess sufficient aural and spectrographic similarities. There can be no differences, either aural or spectrographic, that cannot be accounted for. The probable identification conclusion is reached when there are fewer than twenty similarities and no unexplained differences. The result of positive elimination is rendered when twenty differences between the samples are found that cannot be attributed to anything other than different voices having produced the samples. A probable elimination decision is usually reached when working with limited text or a recording of lower quality. The no decision conclusion is used when the quality of the recording is so poor that there is insufficient information with which to work or when there are too few common speech sounds suitable for comparison.
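The thresholds in this paragraph can be set out as a simple decision rule. This is a sketch of the reasoning, not a substitute for an examiner's judgment: the function and parameter names are hypothetical, and the rule for probable elimination (some, but fewer than twenty, unexplained differences) is an assumption, since the text says only that this conclusion usually arises with limited text or a lower-quality recording.

```python
from enum import Enum


class Conclusion(Enum):
    POSITIVE_IDENTIFICATION = "positive identification"
    PROBABLE_IDENTIFICATION = "probable identification"
    POSITIVE_ELIMINATION = "positive elimination"
    PROBABLE_ELIMINATION = "probable elimination"
    NO_DECISION = "no decision"


THRESHOLD = 20  # minimum count of speech sounds stated in the text


def conclude(similarities, unexplained_differences, sufficient_material=True):
    """Map the examiner's counts onto one of the five standard conclusions."""
    if not sufficient_material:
        # Recording too poor, or too few common speech sounds to compare.
        return Conclusion.NO_DECISION
    if unexplained_differences == 0:
        if similarities >= THRESHOLD:
            return Conclusion.POSITIVE_IDENTIFICATION
        if similarities > 0:
            return Conclusion.PROBABLE_IDENTIFICATION
        return Conclusion.NO_DECISION
    if unexplained_differences >= THRESHOLD:
        return Conclusion.POSITIVE_ELIMINATION
    # Assumption: fewer than twenty unexplained differences.
    return Conclusion.PROBABLE_ELIMINATION


print(conclude(24, 0).value)
print(conclude(12, 0).value)
print(conclude(3, 22).value)
```

Differences that the examiner can explain, for example by a change in pronunciation, are deliberately left out of `unexplained_differences`, because the text treats only unaccounted-for differences as a bar to identification.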
The contribution of
Voiceprint technology began to gain notice in criminal investigations in the early 1960s when the New York City Police Department received numerous bomb threats by phone against major airlines. The FBI asked Bell Labs to help. Lawrence G. Kersta, one of their senior engineers, was assigned the task of figuring out a method of identification that would stop the calls and bring the perpetrators to justice. He was a physicist who had worked with the sound spectrograph in its early days. It took him more than two years and the analysis of over 50,000 voices, but he managed to offer a technique that he claimed tested at 99.65% accuracy.
Lawrence Kersta noted that each person's voice has a unique quality that can be mapped on a graph. One person's vocal cords, no matter how similar they might look, process sounds differently than someone else's. The size and shape of someone's vocal cavity, tongue, and nasal cavities contribute to this, as well as how that person coordinates lips, jaw, tongue, and soft palate to make speech. No combination of these things is like any other. That means that our voices are sufficiently unique to make personal identification based on voice sounds possible. Then in 1966, the Michigan State Police formed a Voice Identification Unit and hired Lawrence Kersta to train these officers.
California came to a similar holding when the issue first reached the appellate level in People v. King. The State brought in Lawrence Kersta as the voice identification expert to testify as to the reliability of the technique. The defense brought in seven speech scientists and engineers to rebut Kersta's claims. The court held that "Kersta's claims for the accuracy of the 'voiceprint' process are founded on theories and conclusions which are not yet substantiated by accepted methods of scientific verification".
Admissibility of Evidence
Voiceprint technology came into the American courts in the 1960s, and judges were divided on whether or not to admit it as scientific evidence. The first published opinion to rule on the admissibility of voice identification analysis came in United States v. Wright, 17 USCMA 183, 37 CMR 447 (1967). This was a court martial proceeding in which the appellate court affirmed the admission of spectrographic voice identification evidence by the board of review. The New Jersey Supreme Court was the first non-military court to make an appellate review, in State v. Cary. Courts in
In 1976 the New York Supreme Court pointed out, in the case of People v. Rogers, that fifty different trial courts had admitted spectrographic voice identification evidence, as had fourteen out of fifteen U.S. District Court judges, and only two out of thirty-seven states considering the issue had rejected admission. The
The Supreme Court of Pennsylvania rejected admission in Commonwealth v. Topa holding that the technician's opinion alone will not suffice to permit the introduction of scientific evidence into a court of law. This was the same situation, in fact the same single expert, which confronted the Kelly court.
In February of 1989, the United States Court of Appeals for the Seventh Circuit affirmed the decision of the United States District Court for the Northern District of Illinois admitting spectrographic voice identification evidence in the criminal case of
Clifford Irving case
Lawrence Kersta believed that an individual's voice does not change over his or her lifetime; other experts have disputed him on this point. If the body changes, so does the voice. Even where a person lives can affect the voice, as can illness, stress, aging, and other factors. Nevertheless, Kersta maintained that the essential qualities of the voice remain constant. He felt that he finally proved this in one of the most famous cases involving the spectrograph: that of the reclusive Howard Hughes.
In 1971, a man named Clifford Irving came to
A group of reporters familiar with him from his early days was assembled by NBC in
Arsonist jailed for murder yesterday has become the first person to be convicted on the evidence of a “voice identification parade”. Assad Khan’s voice was picked out by the murder victim’s tenant, who overheard him plotting to use petrol to set fire to the house. Although Raymond Sarong did not see the conversation taking place, he recognised Khan’s voice as he had seen him around the Hounslow area of
The voice identification team uses the lab for quantification of the voice signal in terms of pitch, loudness, and quality, as well as for measurement of the acoustic parameters and breathing dynamics.
(1) Sound Spectrograph: The sound spectrograph produces a "print" that shows fundamental frequency, harmonics, and intensity of the voice. This is used to identify subtle but crucial changes in voice quality that can then be treated. The spectrogram identifies the presence of noise in the voice, such as tremor, pitch breaks, and abrupt starts.
(2) Aerodynamic Analysis: This technology measures laryngeal airflow, air pressure, and intensity during voice production. This allows for the analysis of breath use, laryngeal control, and vocal efficiency during phonation. Aerodynamic assessment reflects the physiology of vocal fold opening and closing.
(3)
INDIAN SITUATION
The Indian Evidence Act, prior to its amendment by the Information Technology Act, 2000, dealt mainly with evidence in oral or documentary form. It said nothing about the admissibility, nature and evidentiary value of a conversation or statement recorded on an electro-magnetic device. Being confronted with questions of this nature and called upon to decide the same, the law courts in
In
In
The All India Institute of Speech and Hearing,
Case Law - Whether a tape-recorded conversation, a form of voice identification, is admissible in evidence?
In Hopes v. H.M. Advocate, 1960 Scots Law Times 264, the court while dealing with the question of admissibility of tape recorded conversation observed as under:
New techniques and new devices are the order of the day. I can't conceive, for example, of the evidence of a ship's captain as to what he observed being turned down as inadmissible because he had used a telescope, any more than the evidence of what an ordinary person sees with his eyes becomes incompetent because he was wearing spectacles. Of course, comments and criticism can be made, and no doubt will be made, on the audibility or the intelligibility, or perhaps the interpretation, of the results of the use of a scientific method; but that is another matter, and that is a matter of value, not of competency.
In Rex v. Maqsud, 1965(2) All ER 461, the Court of Criminal Appeal observed that the time had come when the court should state its views on a matter of law which is likely to be increasingly raised as time passes. For many years now photographs have been admissible in evidence on proof that they are relevant to the issues involved in the case and that the print as seen represents situations that have been reproduced by means of mechanical and chemical devices. Evidence of things seen through telescopes or binoculars which otherwise could not be picked up by the naked eye has been admitted, and now there are devices for picking up, transmitting and recording conversations. In principle no difference can be made between a tape recording and a photograph. The court was of the view that it would be wrong to deny to the law of evidence the advantages to be gained by new techniques and devices.
In
In S. Pratap Singh v. State of Punjab, AIR 1964 SC 72, a five-judge bench of the Apex Court considered the issue and clearly propounded that tape-recorded talks are admissible in evidence, and that the simple fact that such evidence can be easily tampered with is certainly not a ground to reject it as inadmissible or to refuse to consider it, because there are few documents, and possibly no piece of evidence, which could not be tampered with.
The
a) The contemporaneous dialogue, which was tape recorded, formed part of res gestae and is relevant and admissible under section 8 of the Indian Evidence Act.
b) The contemporaneous tape record of a relevant conversation is a relevant fact and is admissible under section 7 of the Indian Evidence Act.
c) Such a statement was not in fact a statement made to police during investigation and, therefore, cannot be held to be inadmissible under section 162 of the Criminal Procedure Code.
d) Such a recorded conversation, though procured without the knowledge of the accused, was not elicited by duress, coercion or compulsion, nor extracted in an oppressive manner, by force or against the wishes of the accused. Therefore the protection of article 20(3) was not available.
e) One of the features of magnetic tape recording is the ability to erase and re-use the recording medium. Therefore, the evidence must be received with caution. The court must be satisfied beyond reasonable doubt that the record has not been tampered with.
In Rakesh Bisht v. CBI, Justice Badar Durrez Ahmed of the High Court of Delhi at
The practice of tendering tape-recorded conversations before law courts as evidence has now become almost common, particularly in cases arising under the Prevention of Corruption Act, where such a conversation is recorded by sending the complainant with a recording device to the person demanding or offering a bribe. In such cases the court has to face various questions regarding the admissibility, nature and evidentiary value of the tape-recorded conversation.
If in a particular case there is a well-grounded suspicion, not to say proof, that the tape recording has been tampered with, that would be a good ground for the court to discount wholly its evidentiary value, as in Pratap Singh v. State of Punjab, AIR 1964 SC 72. In the case of Ram Singh v. Col. Ram Singh, AIR 1986 SC 3, the following conditions were pointed out by the Apex Court for the admissibility of a tape-recorded conversation:
1. The voice of the speaker must be duly identified by the maker of the record or by others who recognize his voice. Where the maker has denied the voice it will require very strict proof to determine whether or not it was really the voice of the speaker.
2. The accuracy of the tape recorded statement has to be proved by the maker of the record by satisfactory evidence direct or circumstantial.
3. Every possibility of tampering with or erasure of a part of a tape recorded statement must be ruled out, otherwise it may render the said statement out of context and, therefore, inadmissible.
4. The statement must be relevant according to the rules of the Evidence Act.
5. The recorded cassette must be carefully sealed and kept in safe or official custody.
6. The voice of the speaker should be clearly audible and not lost or distorted by other sounds or disturbance.
In Ziyauddin Burhanuddin Bukhari v. Brijmohan Ramdas Mehta, AIR 1975 SC 1788, the Apex Court considered the value and use of transcripts and expressed the view that transcript could be used to show what the transcriber has found recorded there at the time of transcription and the evidence of the makers of the transcripts is certainly corroborative because it goes to confirm what the tape record contained. The
Apex Court in Ziyauddin Burhanuddin Bukhari case clearly laid down that the tape recorded speeches were "documents as defined by section 3 of the Evidence Act", which stood on no different footing than photographs.
The concept of evidence stands totally reformed after the coming into force of the Information Technology Act, 2000 on 17.10.2000. Section 2(r) of this Act is relevant in this respect; it defines information in electronic form as information generated, sent, received or stored in media, magnetic, optical, computer memory, micro film, computer generated micro fiche or similar device. Under section 2(t), ‘electronic record’ means data, record or data generated, image or sound stored, received or sent in an electronic form or micro film or computer generated micro fiche. Section 92 of this Act read with Schedule (2) amends the definition of ‘evidence’ as contained in section 3 of the Indian Evidence Act. The amended definition runs as under:
“Evidence: ‘Evidence’ means and includes
(1) all statements which the court permits or requires to be made before it by witnesses, in relation to matters of fact under inquiry; such statements are called oral evidence;
(2) all documents including electronic records produced for the inspection of the Court; such documents are called documentary evidence.”
The present legal position recognizes information stored on a magnetic or electronic device and treats it as documentary evidence within the meaning of section 3 of the Indian Evidence Act.
At this juncture three legal points need to be answered for a better understanding.
1. Whether information or evidence stored on a magnetic or electronic device is primary or secondary?
2. Whether such evidence is direct or hearsay?
3. Whether such evidence is corroborative or substantive?
The point whether such evidence is primary and direct was dealt with by the
In N. Sri Rama Reddy (Supra) the
The technique of voice identification by means of aural and spectrographic comparison is still an unsettled topic in law. Although the spectrographic voice identification method has progressed greatly since it was first introduced to a court of law in the mid-1960s, it still faces stiff resistance on the issue of admissibility in the courts today.
Adv. K.C. Suresh, B.A., LL.M (Crimes), PGDHR (Human Rights)
Legal Adviser (Rtd) Vigilance & Anti-Corruption bureau, State of