Voice identification: Australia
31 There is no case directly on point on the question of whether a jury can be permitted to engage in an exercise of comparing an accused's voice, speaking in English, with voices speaking in a foreign language to determine whether the accused is one of the speakers. There are somewhat analogous cases from which relevant principles can be drawn, most of which, the transcript reveals, were debated before the trial judge.
32 The exercise in which the jury was permitted to engage is variously described in Australia as "voice identification" or "voice comparison". In truth it is a hybrid exercise: the jury is being asked to undertake the comparative exercise to determine whether the accused's voice can be identified in some potentially incriminating evidence. What Australian courts have described as "voice identification", being evidence that a witness familiar with a speaker's voice and/or relying on its distinctive qualities as heard out of court identified the accused as the speaker, is referred to in other jurisdictions as "voice recognition", "voice identification" being used to describe identification by persons previously unfamiliar with the accused's voice: Ormerod, "Sounds Familiar? - Voice Identification Evidence", [2001] Criminal Law Review 595 at 596.
33 Bulejcik v R [1995] HCA 54; (1996) 185 CLR 375 dealt with voice identification. In that case the Crown tendered a tape recording made through a transmitting device fitted to a police officer who had participated in a conversation allegedly with the accused and a police informer. The police officer gave evidence that the accused had taken part in the conversation and that his voice was one of those on a tape recording of the conversation. The recording was tendered and an edited copy made available to the jury (exhibit D). The accused, who was described to the High Court as "a new Australian Yugoslav" (see 380), gave an unsworn statement which took about forty minutes to deliver. The extent of the accused's unfamiliarity with the English language does not clearly appear from the judgment although Brennan CJ described (at 383) his voice as "accented". The jury asked for the accused's unsworn statement to be replayed to them at the conclusion of the summing up.
34 On appeal the appellant argued (inter alia) the jury should not have been permitted to use his voice while making his unsworn statement to determine whether he was one of the speakers on Ex D, and that the trial judge's directions as to the dangers of this exercise were inadequate.
35 Toohey and Gaudron JJ (at 393-394) discussed R v Smith [1984] 1 NSWLR 462, which they described as the leading New South Wales authority on voice identification evidence, and the approach taken to the issue by other State courts. They then said (at 394-395):
"Voice comparison
The significance of these decisions lies in the use the jury were permitted to make of the accused's voice as heard by them in order to accept or reject the evidence of a witness that a voice heard at the scene of an offence was that of the accused. The present case is of course different because the jury had access to the voice itself, by means of the tape recording Ex D. The question is whether the jury might make a comparison of that voice with the voice of the accused as heard making an unsworn statement, together with the tape recording of that statement, in order to determine whether it was the appellant's voice on Ex D.
Where a witness identifies a voice on the basis of having heard it before, the witness needs to have heard a sufficient amount of the accused's speech to be familiar with it because, in saying that the voice at the crime scene is that of the accused, the witness is relying on his or her memory of the accused's voice. Where a witness identifies a voice on the basis of having heard it subsequently, there should be something about the voice at the crime scene to sufficiently embed it in the witness's memory so as to enable him or her to say that it is the same as a voice which he or she heard subsequently. The greater the distance in time between when the two voices compared were heard, the greater the desirable degree of familiarity or distinctiveness.
Where two voices are being heard side-by-side, as occurred in the present case, the concern is not with familiarity or distinctiveness but with whether the quality and quantity of the material is sufficient to enable a useful comparison to be made. By way of analogy, asking a jury to compare a photograph of an accused with a security camera picture of the perpetrator of a robbery involves quite different considerations from asking a witness whether the accused is the person they remember seeing at the robbery. It is in this sense that counsel for the respondent stressed that, notwithstanding that the aim is still to identify the voice on the tape, the exercise is one of voice comparison rather than identification from memory.
As to the quality and quantity of the material being compared, clearly the greater the amount of material, the greater the similarity in the circumstances in which the voices were spoken or recorded and the greater the number of similar words used, the more useful the comparison. A jury would also benefit from hearing the material more than once so as to enable them to concentrate on both similarities and dissimilarities. Counsel for each side should have the opportunity to point out or emphasise particular similarities or dissimilarities to the jury. The defence may wish to call expert evidence where the jury may have difficulty in drawing a distinction between two voices of a particular nationality or dialect." (emphasis added)
36 Their Honours said (at 397) that the High Court "would be slow to depart from a trial judge's assessment that material was of sufficient quality and quantity for the jury to be permitted to make the necessary comparison". After observing that the Court would not shirk the responsibility of determining whether the jury was given sufficient warning of the difficulties involved in the voice comparison exercise, they said (at 397-398, footnotes omitted):
"Domican v The Queen was concerned with visual identification. Nevertheless, the following passage from the judgment of Mason CJ, Deane, Dawson, Toohey, Gaudron and McHugh JJ is particularly apposite:
'Whatever the defence and however the case is conducted, where evidence as to identification represents any significant part of the proof of guilt of an offence, the judge must warn the jury as to the dangers of convicting on such evidence where its reliability is disputed. The terms of the warning need not follow any particular formula. But it must be cogent and effective. It must be appropriate to the circumstances of the case. Consequently, the jury must be instructed 'as to the factors which may affect the consideration of [the identification] evidence in the circumstances of the particular case'. A warning in general terms is insufficient. The attention of the jury 'should be drawn to any weaknesses in the identification evidence'. Reference to counsel's arguments is insufficient. The jury must have the benefit of a direction which has the authority of the judge's office behind it. It follows that the trial judge should isolate and identify for the benefit of the jury any matter of significance which may reasonably be regarded as undermining the reliability of the identification evidence.'
Where the jury is itself asked to make a comparison of voices in a situation such as this one, very careful directions are called for. It is not irrelevant that in the case of handwriting comparisons, it has been said to be unsafe to leave the matter to the jury without the guidance of an expert. It is unnecessary to go that far in the case of a voice comparison but, in our view, it is unsafe to leave that matter to the jury without very careful directions as to those considerations which would make a comparison difficult and without a strong warning as to the dangers involved in making a comparison. This was not done in the present case." (emphasis added)
37 Their Honours concluded (at 401) that there had been a miscarriage of justice and a new trial should be ordered.
38 McHugh and Gummow JJ also considered the admissibility of voice identification evidence (at 405 - 407). In their opinion it was "arguable that Smith was wrongly decided in so far as it [held] that evidence of voice identification is only admissible when the witness is very familiar with the accused's voice or when the voice of the accused is very distinctive", observing that "[v]isual identification evidence does not have to meet a similar threshold standard". While they suggested that "[f]amiliarity and distinctiveness appear to be matters of weight rather than conditions of admissibility", they said "the correctness of Smith and the cases that follow it should await a case where a decision on the point is essential".
39 However in McHugh and Gummow JJ's view (at 407) voice identification principles were not relevant to Bulejcik. Rather, the issue was whether "a recording of material that was before the Court was put to a proper use". In their Honours' view the trial had miscarried because the trial judge had permitted the jury to listen to a recording of the accused's unsworn statement, but had not required the Crown to tender it. In their view he ought to have required the Crown to reopen its case to tender the recording and give the accused an opportunity to deal with that evidence (at 408-409). Accordingly on the majority view, there having been a miscarriage of justice, a new trial was ordered.
40 Brennan CJ would have dismissed the appeal. He was of the view both that the trial judge's directions on the voice comparison exercise were adequate and that the playing back of the accused's unsworn statement was a matter of practice to be determined by the trial judge in the exercise of his or her discretion. In the course of his judgment, however, he made pertinent observations about the central issue, which he described as one of voice identification, rather than voice comparison. While expressed in dissent his views command great weight and have been relied upon in subsequent authorities. He said (at 381-383):
"…[T]here is no general rule that precludes a jury from taking account of an accused's voice heard at the trial when it tends to prove a fact to be found. … Recognition of a speaker by the sound of the speaker's voice is a commonplace of human experience. To recognise the voice of a particular speaker some familiarity with that speaker's voice is ordinarily needed. A person who is not familiar with the voice of a putative speaker may be able nevertheless to recognise the speaker's voice by comparison with an established example of that voice if the speaker's voice exhibits sufficiently distinct features to permit an ordinary person to identify the speaker or if the person possesses an appropriate expertise. …
In the present case, no question of admissibility arises but a similar issue does arise. If it would be wrong to admit evidence of identification of the voice recorded on Ex D by a witness who has compared the voice with the appellant's voice in making his unsworn statement, it would be wrong to allow the jury to make an identification of the voice on Ex D based on that comparison.
In some cases, judges have treated prior familiarity or distinctiveness as conditions of admissibility of voice identification by non-experts in the absence of other means of identification; in other cases, familiarity and distinctiveness have been treated as factors relevant to the weight of the witness' evidence but not its admissibility. Evidence of identification by voice recognition is not a distinct category of evidence, though its probative value may oftentimes be dubious and will vary according to the circumstances of each case. The test of its admissibility must be, in my opinion, one of degree. The prescription of particular conditions of admissibility is not supported by any principle of the law of evidence. Provided a reasonable jury could find, or be assisted in finding, a relevant fact upon consideration of evidence of voice identification that is admissible under the ordinary rules of evidence, there is no reason why the tender should be rejected. The evaluation of evidence on which a reasonable jury could act is a matter for the jury. It exceeds a judge's function to withhold evidence from a jury merely because, on that evidence, the judge would not reach and thinks a jury should not reach a conclusion adverse to the accused beyond reasonable doubt. However, the ordinary rules of evidence confer on a judge a discretion to exclude evidence that is unduly prejudicial, albeit the evidence is otherwise admissible. The exercise of that discretion is designed to avoid a significant risk that the evidence will be misused by the jury in a way that cannot be guarded against by an appropriate warning. As the discretion is designed to avoid the risk of a miscarriage of justice, the exercise of that discretion in practice is apt to lift the level of familiarity or distinctiveness or expertise expected of admissible evidence. Again, that is a matter of degree to be assessed in the circumstances of each case.
In the present case, the voice recorded on Ex D was first identified by Detective Sergeant Wilding. A comparison with the appellant's voice at the trial was permitted to confirm or to cast doubt on Detective Sergeant Wilding's evidence and to rebut or to support the appellant's suggestion that Ex H had been fabricated. The jury heard the appellant's accented voice for forty minutes during the making of his unsworn statement and were in a position to assess for themselves whether that auditory experience equipped them to make a comparison with the voice on Ex D.
To deny the jury the right to take into account probative material that they had heard with their own ears would have been to impose - or to attempt to impose - an artificial restraint on the jury's employment of their common sense. It would have been as erroneous as it would have been futile to direct the jury to ignore the voice they had heard when the accused made his unsworn statement. There was no reason why, subject to a satisfactory warning, the jury should not have had regard to the sound of the appellant's voice in determining whether the appellant's voice had been recorded on Ex D." (emphasis added)
41 In R v Leung & Anor [1999] NSWCCA 287; (1999) 47 NSWLR 405 the accused were indicted on a charge of having been knowingly concerned in the importation into Australia of not less than a commercial quantity of heroin contrary to s 233B of the Customs Act. Prior to their arrest the police obtained audio recordings of conversations between persons within the premises in which the accused were located when arrested (the "DAT tapes"). According to an interpreter called at the trial the conversations were in Cantonese and Mandarin. After the accused were arrested, one, Wong, participated in a conversation with Federal Police Officers in English. Another accused, Leung, also participated in a short conversation with a Federal Police Officer in English and subsequently in longer conversations with a Federal agent acting as an interpreter and, for Leung's part, presumably in Chinese, whether Mandarin or Cantonese does not clearly appear from the judgment. The conversations between Wong and Leung and the Federal agents were tape recorded (the "police tapes").
42 Mr Fung, an interpreter, translated the DAT tapes into English. He gave evidence that there were three different voices on them: "M1", "M2" and "M3". He also purported to attribute the voices of M1 and M3 to the accused by comparing the DAT tapes with the police tapes. Leung did not give evidence. Wong did, although it does not appear from the judgment whether he did so in English or in Chinese, through an interpreter. Leung and Wong were convicted.
43 Both Leung and Wong appealed. One of their grounds of appeal was that the trial judge erred by ruling that the interpreter's evidence of voice identification and voice comparison was admissible.
44 Simpson J (with whom Spigelman CJ and Sperling J relevantly agreed) held that the interpreter's evidence was admissible pursuant to s 79 of the Evidence Act as he fell into the category of being an "ad hoc expert". Her Honour referred (at [42]) to the criticism, which she said had some merit, that, in relation to Wong, Mr Fung was comparing a voice speaking Cantonese on the DAT tapes with a voice speaking English on the police tapes; however, she did not further consider this issue. In rejecting this ground of appeal, her Honour said:
"44 Voice comparison is not necessarily a question for expert evidence, although it may be. If the two sets of tape recordings in the present case had been in English, it would have been open to the Crown to have left it to the jury to make their own comparison and assessment of whether the voices on the DAT tapes (or any of them) corresponded to either of the voices on the police tapes. That course theoretically remained open but would have left the jury with a task immeasurably more difficult, given the reasonable assumption that no member of the jury understood either of the Chinese languages involved. The jury would, truly, have been comparing voices only, without the intrusion of language and speech patterns that are part of voice identification." (emphasis added)
45 In Nguyen v R the Western Australian Court of Criminal Appeal (Malcolm CJ, Anderson and Steytler JJ) considered the issue of voice identification and voice comparison in the context of a complaint by the accused that the trial judge had erred in failing to give the jury appropriate directions with respect to the identification of voices on recordings of intercepted telephone calls.
46 The facts, as taken from the headnote, were that:
"The appellant (a businessman with no prior convictions) was convicted of two offences in relation to the importation by mail from Hong Kong of 134.8 grams of pure heroin which arrived in a package at the Perth Mail Exchange. Later there was a controlled delivery of a substituted substance which the appellant accepted. Part of the case against him was that there had been a voice comparison by an interpreter of many calls made to and from the appellant's and another's mobile phones. Some of the calls were incriminating. The interpreter who had not spoken to the appellant identified one of the parties as 'H'. The appellant admitted that he was H in three calls only. The jury had recordings of nine calls including the three in which the appellant admitted he was H ... It was contended on appeal that the judge had failed to give the jury a special warning about voice comparison …"
47 In addition to the interpreter's evidence identifying the appellant as H, it appears (see 368) that the jury was "impliedly, if not expressly, invited to compare for themselves the admitted voice with the putative voice and to come to their own conclusion as to whether the putative voice was the same as the voice which the appellant admitted was his."
48 Anderson J (with whom Steytler J agreed) rejected the appellant's complaint that the trial judge should have warned the jury that it should not embark upon a voice comparison exercise itself or, if it did, should only do so "with extreme care". His Honour said:
"[138] I cannot accept the submission that the jury should have been warned not to embark upon a process of comparison themselves. I see no reason why the jury are not entitled to compare voice recordings in order to come to their own conclusions. Voice recognition is not, of itself, an expert process. As Brennan CJ said in Bulejcik (at 381; 470): 'Recognition of a speaker by the sound of the speaker's voice is a commonplace of human experience.'
[139] It is clear that it is permissible for the jury to make their own comparison: Smith (1990) 50 A Crim R 434 at 453-454; see also Barker [2002] WASCA 127 as to how the jury may use exhibits.
[140] As to whether the jury should have been given an instruction that they should be careful before concluding that the voices were the same, I do think it would have been better if some such direction had been given. The jury were listening to speakers in a foreign language and they might have been told to bear that in mind in attempting a comparison. On the other hand, it was not the Crown case that the only voice captured on the discs was that of the appellant. In such a case, the danger is that the jury will be overly influenced by the fact that both the putative voice and the accused's voice are of similar accent, when the explanation for that might simply be that it is the Vietnamese way of speaking. However, this is not such a case. In each of the six incriminating intercepts there were two voices in the Vietnamese language, so that not only did the jury have the opportunity to compare the putative voice of the appellant on the discs with his admitted voice, but also to compare and to distinguish (if they could) the admitted and putative voice of the appellant with the voice of 'Tran'. This would bring home to the jury that merely because the voices had a particular cadence, intonation or accent did not mean that the voices were those of the same speaker. …" (emphasis added)