Tuite v The Queen [2015] VSCA 148

[68]

STRmix has not been shown to be a reliable tool for the statistical evaluation of DNA profiles. The defence argues that STRmix has not been properly validated for the use to which it has been put by VPFSS and it is not widely accepted by the forensic science community.

The defence witness, Ms Jane Taupin, gave evidence that there is no consensus in the literature that STRmix works or any description of how it works. While probabilistic statistical methodologies have been introduced in other jurisdictions, these other systems have not been evaluated. Neither the United States 'Scientific Working Group on DNA Analysis Methods' ('SWGDAM') nor the International Society for Forensic Genetics DNA Commission ('ISFG') has published guidelines for the use of a fully continuous probabilistic methodology. In particular, there are no recommendations as to how to calculate the probability of allele drop-out or more generally regarding the use of models based on peak height variation, like STRmix. Furthermore, the way in which STRmix determines the probability of drop-out is embedded in an internal system which is not open to evaluation. In Ms Taupin's opinion, the inherent unreliability of peak heights in low level DNA means that STRmix should only be used where the DNA is not low-template DNA and there is less variance in peak heights. STRmix might be suitable for good quality DNA profiles, but it should not be used on poor quality profiles, particularly without recommendations for its use from either the ISFG or SWGDAM.

For the purpose of s 79 admissibility, the defence contends that the prosecution has not established that STRmix is a reliable body of knowledge in respect of which evidence based on 'specialised knowledge' can be given. This is because:

(a) The particular fully-continuous probabilistic methodology used or applied by STRmix is a discrete and new development in the area of DNA science; it involves a new or novel 'area' which does not constitute (at least not yet) a reliable body of knowledge; and

(b) Even if the Court came to the view that the fully continuous probabilistic methodology used and applied by STRmix was appropriate for use with optimum amounts of DNA, there is no reliable body of knowledge for its application in relation to complex mixtures and/or sub-optimal amounts of DNA.

...

The defence argument ... is based on what counsel described as the 'probabilistic methodology used by STRmix' being a new and discrete field of knowledge that remains untested and lacks acceptance in the forensic science community. According to the defence, the methodology used by STRmix is not 'a body of knowledge or experience which is sufficiently organised or recognised to be accepted as a reliable body of knowledge or experience'.[23]

[75]

Before admitting the opinion of a witness into evidence as expert testimony, the judge must consider and decide two questions. The first is whether the subject matter of the opinion falls within the class of subjects upon which expert testimony is permissible. This first question may be divided into two parts:

(a) whether the subject matter of the opinion is such that a person without instruction or experience in the area of knowledge or human experience would be able to form a sound judgment on the matter without the assistance of witnesses possessing special knowledge or experience in the area[;] and

(b) whether the subject matter of the opinion forms part of a body of knowledge or experience which is sufficiently organized or recognized to be accepted as a reliable body of knowledge or experience, a special acquaintance with which by the witness would render his opinion of assistance to the court.

The second question is whether the witness has acquired by study or experience sufficient knowledge of the subject to render his opinion of value in resolving the issues before the court.

An investigation of the methods used by the witness in arriving at his opinion may be pertinent, in certain circumstances, to the answers to both the above questions. If the witness has made use of new or unfamiliar techniques or technology, the court may require to be satisfied that such techniques or technology have a sufficient scientific basis to render results arrived at by that means part of a field of knowledge which is a proper subject of expert evidence. Examples of cases in which that question arose are The Queen v Gilmore, The Queen v McHardie and Danielson and United States v Williams. An investigation of the methods adopted by a witness may be relevant to an assessment of his qualifications as a witness if such an investigation might reveal that the witness has 'posing as an expert made assertions that are contrary to proved scientific facts or to the known phenomena of nature, thus exposing his ignorance of the learning he professed' ... or that the witness has adopted methods which are so unscientific as to expose that ignorance.

...

Generally speaking, once the qualifications are established, the methodology will be relevant to the weight of the evidence and not to the competence of the witness to express an opinion. The suitability and adequacy of the methods used may well be themselves a matter of expert opinion.[29]

[90]

With respect to the first limb of s 79, I have set out above the evidence of the nature of the specialised knowledge which Dr Sutisno said she had brought to bear in the formulation of the three opinions she expressed. There does appear to be a body of expertise based on facial identification. The detailed knowledge of anatomy which Dr Sutisno unquestionably had, together with her training, research and experience in the course of facial reconstruction supports her evidence of facial characteristics.

Nothing was presented to the Court which indicates, in any way, that Dr Sutisno's extension from facial to body mapping, with respect to matters of posture, has anything like that level of background and support. Specialist knowledge of posture can of course exist ... But the foundation for admissibility must be laid. It was not laid in the present case. The so-called 'unique identifier' of posture was an essential element of Dr Sutisno's evidence of identity in the present case.

The focus of attention must be on the words 'specialised knowledge', not on the introduction of an extraneous idea such as 'reliability'.

...

In the immediate context of 'specialised knowledge', picked up by the words 'that knowledge' in the second limb of s 79, the word 'knowledge' has a different connotation to that which it might have in a different context, for example, 'common knowledge'. The meaning of 'knowledge' in s 79 is, in my opinion, the same as that identified in the reasons of the majority judgment in Daubert v Merrell Dow Pharmaceuticals Inc [1993] USSC 99; 509 US 579 (1993) at 590: '[T]he word "knowledge" connotes more than subjective belief or unsupported speculation. The term "applies to any body of known facts or to any body of ideas inferred from such facts or accepted as truths on good grounds"'. The quoted definition is from an American dictionary.

I do not mean to suggest that Daubert and its progeny in the United States has anything useful to say about s 79 of the Evidence Act. Rule 702 of the Federal Rules of Evidence (2004), which fell to be interpreted in Daubert, is in quite different terms to s 79. The definition of the word 'knowledge' in this cognate context is, however, instructive.

In the case of the appellant the relevant evidence about posture was expressed in terms of 'upright posture of the upper torso' or similar words. The only links to any form of 'training, study or experience' were the witness's study of anatomy and some experience, entirely unspecified in terms of quality or extent, in comparing photographs for the purpose of comparing 'posture'. The evidence in this trial did not disclose, and did not permit a finding, that Dr Sutisno's evidence was based on a study of anatomy. That evidence barely, if at all, rose above a subjective belief and it did not, in my opinion, manifest anything of a 'specialised' character. It was not, in my opinion, shown to be 'specialised knowledge' within the meaning of s 79.[40]

[207]

I do not accept that 'the probabilistic methodology used by STRmix' is either a 'discrete' or a 'new' field of knowledge as asserted by the defence. Based on the evidence at the preliminary hearing, I have concluded that STRmix and the methodology that it uses, is a development within an established and sophisticated field of knowledge concerned with the evaluation of DNA profiles. It is not a new or discrete body of knowledge. Rather, it is a new development in an existing body of knowledge.

Although the STRmix program is 'new' in that it has only recently been developed and put to forensic use, and only a small number of fully-continuous probabilistic systems like STRmix are in use internationally, the development and use of semi-continuous and fully-continuous probabilistic systems has been the subject of discussion and promotion in the forensic scientific community at the highest levels, particularly for the analysis of low-template and complex DNA profiles. Two examples were drawn to the Court's attention during preliminary argument.

In 2012, the ISFG DNA Commission[132] published a paper on the evaluation of DNA results that may include drop-out and/or drop-in using probabilistic methods. The authors opine that 'classical binary models' are largely inferior to the probabilistic approach but observe that the adoption of probabilistic models has been inhibited by the complexity of concepts that are largely outside the experience of case-working forensic scientists, coupled with lack of suitable training opportunities. They confirm that the ISFG DNA Commission strongly supports new initiatives to remedy the problem and continue:

Some laboratories will wish to quickly adopt probabilistic methods ahead of the main-stream forensic community. This ISFG DNA Commission strongly supports this approach, since it will encourage others to follow. In this context, it should be noted that the approach described here still requires rigid assessment of the overall quality of a given DNA profile and its suitability for further analysis based on criteria described in the laboratory's quality management guidelines.

The 'main-stream forensic community' is therefore expected to follow the lead of laboratories using probabilistic methods for typing results that include drop-in and drop-out, that is, results that include stochastic effects and are associated with the analysis of low-template or compromised DNA.

Christopher Steele and David Balding from the Genetics Institute at University College, London, have recently published a paper in which they review the main models and software for the statistical evaluation of DNA profile evidence, including STRmix, having outlined the general principles for the statistical evaluation of such evidence and the difficulties associated with the analysis of low-template DNA. They state:

We hope that this review will help disseminate current best practices in statistical evaluation of [low template DNA] evidence, spur further developments and advertise to the forensic and wider community that robust methods for [low template DNA] evidence evaluation are now available. We have not verified the software described here, but, where available, we cite validation studies conducted by the authors of each program or package. In our view, courts are now able to avail themselves of the powerful new [low template DNA] profiling technologies, provided that as much care is taken with the statistical analysis as is necessary for the collection, handling and analysis of samples.

Although neither article endorses STRmix specifically, or the fully-continuous methodology that it uses more generally, the articles place the fully-continuous probabilistic statistical methodology within the evolving field of knowledge concerned with the statistical evaluation of DNA evidence. This is not to say, however, that there may not be problems associated with fully-continuous systems or that particular DNA scientists might well prefer other models or methodologies.

In the STRmix Research Paper,[133] Dr Taylor and his co-authors also describe the 'forward movement' in the late 2000s to develop and use continuous and semi-continuous statistical models, but acknowledge that progress is still partial, with only a few laboratories worldwide implementing or investigating fully continuous methods. Barriers to the adoption of the methodology include the initial lack of validated software, the fear of complexity, the implications of using 'black box' technology and the perceived costs. Nonetheless, validated and commercially available software, known as 'TrueAllele', has proceeded through the court processes in the United Kingdom and the United States of America.

In response to the proposition that open source software is highly desirable in a court environment, Dr Taylor said that open source software is desirable but not essential, as long as other measures are taken, such as reporting the models or the mathematics underlying the methodology and validation information is provided. He agreed that few systematic comparisons of the performance of the different programs had been published and said that such comparisons were a high priority now that the field was beginning to mature. He was aware from the results available that the different programs often generated different results when comparing the same hypotheses and agreed that the most important difference was typically between the continuous and discrete algorithms, as the continuous algorithms exploit peak height information which gives different results.

For her part, Ms Federle described STRmix as an 'incremental change' to VPFSS's method for calculating likelihood ratios. She said STRmix modified but was based on the same principles as SPURS. In response to the criticism that STRmix listed countless numbers of genotypes as possible contributors, Ms Federle said that it had ever been thus: there had always been a list of possibilities that could explain the evidence, particularly where there are a number of contributors. STRmix is different in that it weights the possibilities, whereas in the past, the statistical methodology had not been able to weight possibilities differently: if something registered as a possibility, it had to be considered to be just as likely a possibility as something that could potentially be a really good explanation for the profile.
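The contrast Ms Federle draws between treating every possible genotype as equally likely and weighting the possibilities can be illustrated with a toy calculation. The sketch below is not STRmix's algorithm: it assumes a single contributor at a single marker, and the genotype labels, population frequencies and weights are all hypothetical values chosen for illustration. The function name `likelihood_ratio` is likewise an assumption.

```python
def likelihood_ratio(weights, frequencies, suspect_genotype):
    """Toy single-contributor likelihood ratio at one marker.

    weights: how well each candidate genotype explains the profile
             (1.0 for every 'possible' genotype in a binary approach,
             graded values in a weighted approach).
    frequencies: hypothetical population frequency of each genotype.
    """
    # Proposition 1: the suspect is the contributor.
    numerator = weights.get(suspect_genotype, 0.0)
    # Proposition 2: an unknown person from the population is the contributor.
    denominator = sum(w * frequencies[g] for g, w in weights.items())
    return numerator / denominator


# Hypothetical numbers, for illustration only.
frequencies = {"A,B": 0.10, "A,C": 0.05, "B,C": 0.02}

# Binary view: every genotype that could explain the profile counts equally.
binary_weights = {"A,B": 1.0, "A,C": 1.0, "B,C": 1.0}

# Weighted view: some genotypes explain the observed peaks far better.
continuous_weights = {"A,B": 0.90, "A,C": 0.08, "B,C": 0.02}

lr_binary = likelihood_ratio(binary_weights, frequencies, "A,B")
lr_continuous = likelihood_ratio(continuous_weights, frequencies, "A,B")
print(f"binary LR: {lr_binary:.2f}, weighted LR: {lr_continuous:.2f}")
```

With these made-up inputs the weighted calculation gives a larger ratio for a suspect whose genotype is the best explanation of the profile. A suspect with a poorly fitting genotype would receive a smaller one, which is the sense in which the possibilities are no longer treated as equally likely.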

It was put to Ms Federle that using weightings and abandoning thresholds had not been robustly debated in the broader scientific forensic DNA community. She responded that there had been a lot of literature about the different methods and, in the past, there had been recommendations to move to this sort of method, but there had not been the statistical programs to enable a method to be implemented. The continuous method was advocated for dealing with partial profiles and low level profiles, and the debate had proceeded on the basis that this was the way forward.

In my view, this evidence places STRmix and its fully-continuous probabilistic methodology squarely within the field of statistical DNA evaluation, albeit possibly at its leading edge. It has a credible scientific and mathematical basis. However, that is not to say that the results that it produces are inherently reliable. It may not produce results that are as reliable for the purposes of forensic case-work as binary or semi-continuous methods. Its method for determining the possibility or probability of drop-out based on peak height variability may be open to criticism. Its use on compromised profiles may be questionable. But those are matters for competing expert opinion, providing, of course, that STRmix is amenable to scrutiny and independent testing.[134]

[209]

I observe in this context that while STRmix produces quite extensive reports (referred to as 'output data') that permit scrutiny of many of its operations, Dr Taylor gave evidence that this output data is incomplete for the reason that there are 'hundreds of thousands of different data points and calculations' and it is impossible to include them all in the output data due to their sheer number.

The STRmix Research Paper acknowledges that without an understanding of the underlying mathematics, there is a risk that systems become 'black boxes' the workings of which are not understood by the users and that presentation of any statistical analysis in court becomes problematic.

To this end, the STRmix Research Paper describes the mathematics underpinning STRmix along with the practical implementation of the mathematics and the means of calculating the likelihood ratio. It is an attempt by the developers of STRmix to expose the internal workings of STRmix by setting out the mathematical and statistical models on which it is based.

The STRmix Research Paper also deals with the issues of reproducibility and describes the validation experiments carried out by the authors. The results of the validation experiments are set out in the appendices. According to the authors, inspection of the results of these experiments suggests that the likelihood ratio assigned by the method is fair and reasonable.

In his evidence, Dr Taylor agreed that it is not feasible to validate a process for producing a likelihood ratio in the way that one might validate a procedure for measuring a physical quantity, because a likelihood ratio has no true value: it expresses uncertainty about an unknown event and depends on modelling assumptions that cannot be expressly verified. However, he said that progress can be made in evaluating the validity and performance of software, and that the courts need these kinds of evaluations to have confidence in the results generated by software-based forensic analysis.

Dr Taylor gave evidence that validation can be done at several levels. The STRmix Research Paper addresses the question of conceptual validation. Development validation of the software involves ascertaining whether the software does what the Research Paper says it does. Then there is laboratory-based validation, which is verification that the software is fit for purpose in the hands of scientists. That involves examination of interpretations of mixtures of known contributors and comparison against other methods and/or human judgment. The STRmix Research Paper details validation of both kinds, including validation studies using known contributors and comparison against other methods and human judgment.

In addition, the STRmix User's Manual details the verification 'by hand' of a number of STRmix functions, including expected allele and stutter heights and expected peak heights of drop or 'Q' alleles.

VPFSS has carried out validation studies using known contributors and P+ and PP21 mixtures. The validation studies were tendered in court and Ms Federle and Ms Scott gave evidence about them at the preliminary hearing.

The PP21 study is dated April 2013 and tests with two person and three person mixtures. Six known individuals were used in different pairings and at different ratios. A total of 16 two person mixtures were amplified, producing 161 deconvolutions and 307 likelihood ratios. Ten three person mixtures were amplified, producing 124 deconvolutions and 371 likelihood ratios. In addition, three mocked partial profiles were analysed a number of times using STRmix.

The P+ study is dated May 2013. Again, tests were conducted with two person and three person mixtures and six known individuals were used in different pairings and at different ratios. A total of 12 two person mixtures were amplified, 60 deconvolutions performed and 120 likelihood ratios calculated. For the three person mixture study, 11 mixtures were amplified, 60 deconvolutions performed and 180 likelihood ratios calculated.
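The figures reported for the two VPFSS studies can be tallied as follows. This is only an arithmetic summary of the numbers quoted above. The dictionary keys and variable names are illustrative, and the three mocked partial profiles are left out because no counts are given for their repeated analyses.

```python
# Counts as reported for each study and mixture type:
# (mixtures amplified, deconvolutions, likelihood ratios)
studies = {
    ("PP21", "two person"): (16, 161, 307),
    ("PP21", "three person"): (10, 124, 371),
    ("P+", "two person"): (12, 60, 120),
    ("P+", "three person"): (11, 60, 180),
}

# Sum each column across all four rows.
total_mixtures = sum(row[0] for row in studies.values())
total_deconvolutions = sum(row[1] for row in studies.values())
total_likelihood_ratios = sum(row[2] for row in studies.values())

print(total_mixtures, total_deconvolutions, total_likelihood_ratios)
# prints: 49 405 978
```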

Both studies contained a variety of other testing and calibration, including the pilot studies for peak height variance using Model Maker required for STRmix.

All of these studies are open to scrutiny.

It is the defence position that the validation studies are inadequate because they did not test a sufficient number of samples and did not use the ratios that are in issue in this case.

In her evidence about these studies, Ms Federle described them as 'large' in the sense that each of the different mixture ratios was repeated over and over again to create a lot of data. The ratios used were intended to cover a broad range of different scenarios. According to Ms Federle, there was nothing in the studies to suggest that other mixture ratios would behave any differently from the study ratios. She said that none threw up any issues, so it was possible to extrapolate from them in respect of all sorts of mixtures.

For his part, Dr Taylor commented on the VPFSS testing with PP21 mixtures, observing that there was a reasonably substantial number of calculations and that contributors were combined in different proportions and in different amounts. When it was put to Dr Taylor that no validation study had been done for the particular ratios in the present case, Dr Taylor said that it was possible to extrapolate based on the kinds of ratios that were tested. The aim is to vary the total amount of DNA that goes into the mixtures and the proportions for each contributor to show mixtures in a range of configurations - a major-minor, two equal contributors, two equal contributors of very low amounts of DNA and so forth - to permit extrapolation from these results and to see that STRmix is performing as expected or as intended, based on these mixtures. He gave evidence that he himself had recently published a study in which he looked at two, three and four person mixtures at high and low levels and in different concentrations, and he concluded that STRmix was behaving as would be expected and could therefore be used on evidence profiles.

In cross-examination, Ms Federle was taken to the STRmix validation study that tested with P+ mixtures. She agreed that in the two person mixtures, when both of the contributors contributed low levels of DNA, the likelihood ratio was determined to be less than one at four of the markers, giving more support for the actual contributor not contributing to the DNA profile at four of the nine markers. It was put that the study did not validate the methodology. Ms Federle responded that this did not prove that the contributor was a non-contributor, but highlighted the issues with low-template DNA. According to Ms Federle, that sort of result was to be expected, given the input amount of the contributors to the profile: when there are really low contributions, STRmix will not calculate a likelihood ratio indicating a contribution that is contrary to the evidence. It will therefore give a likelihood ratio that favours the alternative view. That, says Ms Federle, is a totally reasonable explanation for the evidence.

Ms Federle denied that this showed that the validation group was too small and said it was just one of the effects of low-template DNA. It showed that STRmix was doing the correct thing and not overstating the evidence.
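For readers unfamiliar with how marker-level results relate to an overall figure, the sketch below shows the usual arithmetic: on the standard assumption that markers are treated as independent, the per-marker likelihood ratios are multiplied together. The nine values are hypothetical and are not taken from the VPFSS study. They are chosen only so that four of them fall below one, and the function name `combine_marker_lrs` is an assumption.

```python
import math


def combine_marker_lrs(marker_lrs):
    """Multiply per-marker likelihood ratios, assuming independent markers.

    Working in log10 avoids underflow or overflow when many markers
    with very small or very large ratios are combined.
    """
    log10_total = sum(math.log10(lr) for lr in marker_lrs)
    return 10 ** log10_total, log10_total


# Hypothetical per-marker values for a nine-marker profile.
hypothetical_lrs = [0.6, 0.8, 0.5, 0.9, 3.0, 4.0, 2.5, 5.0, 2.0]

overall_lr, log10_overall = combine_marker_lrs(hypothetical_lrs)
markers_below_one = sum(1 for lr in hypothetical_lrs if lr < 1)

print(f"{markers_below_one} of {len(hypothetical_lrs)} markers below one")
print(f"overall LR: {overall_lr:.1f}")
```

Whether marker-level values below one for a known contributor indicate appropriate caution or unreliability is the point on which the witnesses differ. The arithmetic here takes no position on that question.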

By contrast, Ms Taupin viewed the production of 'false negatives' of this kind by STRmix as an indication that it was unreliable.

In my view, this will be an issue for the experts at trial. I do not see it as showing that STRmix lacks an objective foundation or consider that it detracts from the cogency or reliability of the validation studies.[135]

[216]

The prosecution witnesses acknowledge that there are limitations in the STRmix methodology, a number of which are detailed in the STRmix Research Paper. The STRmix Research Paper describes limitations arising in some cases from conscious choice and in others from the current state of the model development. They include the danger that a large artefact is allowed through the manual review of the electropherogram and what is described as a 'sub-optimal' stutter model. It is also recognised that it is difficult to test continuous models for the accuracy of the likelihood ratio produced because the correct answer is unknown and in many cases unknowable.

Importantly, so far as I can tell, none of the prosecution witnesses contend that the forensic scientist can simply ignore the quality of the evidentiary profile and feed any kind of profile into STRmix to produce a reliable likelihood ratio. Although STRmix purports to be specifically suited for the analysis of low-template and partial DNA profiles, there is a need to carefully consider the profile and whether it is suitable for statistical evaluation for use in legal proceedings.

I have found the STRmix methodology to be a development in a recognised field of knowledge concerned with the statistical evaluation of DNA profiles. It is not subjective or speculative or otherwise to be dismissed as lacking an objective basis. In my view, based on the opinions of Dr Taylor, Ms Federle and Ms Scott the limitations identified do not substantially erode the probative worth of the DNA evidence.

However, it is clear that there is scope for competing expert evidence about the reliability of the STRmix methodology. What a lack of international take-up or independent review and assessment of the STRmix methodology means for the weight that should be given to the DNA evidence will be a matter for the jury, based on differing expert evidence on this issue. Likewise, the suitability of STRmix (or any system based on peak height modelling) for the analysis of low-template DNA will be the subject of disagreement between experts upon which the jury will be called to adjudicate. As to the specific problems identified by Ms Taupin in the VPFSS analysis of the Items, notably whether the profile for Item 1-3 shows three or four contributors and whether the accused is excluded as a contributor to Item 4-1 because the PP21 profile does not show a Y allele at the amelogenin marker, these again are matters that can and should be resolved by a jury hearing the expert evidence and deciding which evidence is to be preferred.
