I observe in this context that while STRmix produces quite extensive reports (referred to as 'output data') that permit scrutiny of many of its operations, Dr Taylor gave evidence that this output data is incomplete for the reason that there are 'hundreds of thousands of different data points and calculations' and it is impossible to include them all in the output data due to their sheer number.
The STRmix Research Paper acknowledges that, without an understanding of the underlying mathematics, there is a risk that systems become 'black boxes', the workings of which are not understood by the users, and that presentation of any statistical analysis in court becomes problematic.
To this end, the STRmix Research Paper describes the mathematics underpinning STRmix along with the practical implementation of the mathematics and the means of calculating the likelihood ratio. It is an attempt by the developers of STRmix to expose the internal workings of STRmix by setting out the mathematical and statistical models on which it is based.
The STRmix Research Paper also deals with the issues of reproducibility and describes the validation experiments carried out by the authors. The results of the validation experiments are set out in the appendices. According to the authors, inspection of the results of these experiments suggests that the likelihood ratio assigned by the method is fair and reasonable.
In his evidence, Dr Taylor agreed that it is not feasible to validate a process for producing a likelihood ratio in the way that one might validate a procedure for measuring a physical quantity, because a likelihood ratio has no true value: it expresses uncertainty about an unknown event and depends on modelling assumptions that cannot be expressly verified. However, he said that progress can be made in evaluating the validity and performance of software, and that the courts need these kinds of evaluations to have confidence in the results generated by software-based forensic analysis.
Dr Taylor gave evidence that validation can be done at several levels. The STRmix Research Paper addresses the question of conceptual validation. Developmental validation of the software involves ascertaining whether the software does what the Research Paper says it does. Then there is laboratory-based validation, which is verification that the software is fit for purpose in the hands of scientists. That involves examination of interpretations of mixtures of known contributors and comparison against other methods and/or human judgment. The STRmix Research Paper details validation of both kinds, including validation studies using known contributors and comparison against other methods and human judgment.
In addition, the STRmix User's Manual details the verification 'by hand' of a number of STRmix functions, including expected allele and stutter heights and expected peak heights of drop or 'Q' alleles.
VPFSS has carried out validation studies using known contributors and P+ and PP21 mixtures. The validation studies were tendered in court and Ms Federle and Ms Scott gave evidence about them at the preliminary hearing.
The PP21 study is dated April 2013 and tests with two person and three person mixtures. Six known individuals were used in different pairings and at different ratios. A total of 16 two person mixtures were amplified, producing 161 deconvolutions and 307 likelihood ratios. Ten three person mixtures were amplified, producing 124 deconvolutions and 371 likelihood ratios. In addition, three mocked partial profiles were analysed a number of times using STRmix.
The P+ study is dated May 2013. Again, tests were conducted with two person and three person mixtures and six known individuals were used in different pairings and at different ratios. A total of 12 two person mixtures were amplified, 60 deconvolutions performed and 120 likelihood ratios calculated. For the three person mixture study, 11 mixtures were amplified, 60 deconvolutions performed and 180 likelihood ratios calculated.
Both studies contained a variety of other testing and calibration, including the pilot studies for peak height variance using Model Maker required for STRmix.
All of these studies are open to scrutiny.
It is the defence position that the validation studies are inadequate because they did not test a sufficient number of samples and did not use the ratios that are in issue in this case.
In her evidence about these studies, Ms Federle described them as 'large' in the sense that each of the different mixture ratios was repeated over and over again to create a lot of data. The ratios used were intended to cover a broad range of different scenarios. According to Ms Federle, there was nothing in the studies to suggest that other mixture ratios would behave any differently from the study ratios. She said that none threw up any issues, so it was possible to extrapolate from them in respect of all sorts of mixtures.
For his part, Dr Taylor commented on the VPFSS testing with PP21 mixtures, observing that there was a reasonably substantial number of calculations and that contributors were combined in different proportions and in different amounts. When it was put to Dr Taylor that no validation study had been done for the particular ratios in the present case, Dr Taylor said that it was possible to extrapolate based on the kinds of ratios that were tested. The aim is to vary the total amount of DNA that goes into the mixtures and the proportions for each contributor to show mixtures in a range of configurations - a major-minor, two equal contributors, two equal contributors of very low amounts of DNA and so forth - to permit extrapolation from these results and to see that STRmix is performing as expected or as intended, based on these mixtures. He gave evidence that he himself had recently published a study in which he looked at two, three and four person mixtures at high and low levels and in different concentrations, and he concluded that STRmix was behaving as would be expected and could therefore be used on evidence profiles.
In cross-examination, Ms Federle was taken to the STRmix validation study that tested with P+ mixtures. She agreed that, in the two person mixtures where both contributors contributed low levels of DNA, the likelihood ratio was determined to be less than one at four of the nine markers, giving more support at those markers for the proposition that the actual contributor had not contributed to the DNA profile. It was put that the study did not validate the methodology. Ms Federle responded that this did not prove that the contributor was a non-contributor, but highlighted the issues with low-template DNA. According to Ms Federle, that sort of result was to be expected, given the input amount of the contributors to the profile: when there are really low contributions, STRmix will not calculate a likelihood ratio indicating a contribution that is contrary to the evidence. It will therefore give a likelihood ratio that favours the alternative view. That, said Ms Federle, is a totally reasonable explanation for the evidence.
Ms Federle denied that this showed that the validation group was too small and said it was just one of the effects of low-template DNA. It showed that STRmix was doing the correct thing and not overstating the evidence.
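By way of illustration only, the significance of a likelihood ratio of less than one at an individual marker can be set out in general terms. The notation below is not drawn from the evidence or from the STRmix Research Paper: the symbols E, H_1 and H_2 are assumed labels, and the product over the nine markers reflects the common simplifying assumption that markers are treated as independent, which may not describe exactly how STRmix combines them.

```latex
% E   : the observed DNA profile (assumed label)
% H_1 : the person of interest is a contributor (assumed label)
% H_2 : the person of interest is not a contributor (assumed label)
\[
  \mathrm{LR}_m = \frac{\Pr(E_m \mid H_1)}{\Pr(E_m \mid H_2)},
  \qquad
  \mathrm{LR}_{\text{overall}} = \prod_{m=1}^{9} \mathrm{LR}_m
\]
% LR_m < 1 : the data at marker m are more probable under H_2 than under H_1
% LR_m > 1 : the data at marker m are more probable under H_1 than under H_2
```

On this sketch, a value of less than one at a marker means only that the data observed at that marker are more probable if the person is not a contributor than if he or she is. It is a statement about relative support, not a finding that the person did not contribute, and the overall figure depends on the values at all of the markers taken together.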
By contrast, Ms Taupin viewed the production of 'false negatives' of this kind by STRmix as an indication that it was unreliable.
In my view, this will be an issue for the experts at trial. I do not see it as showing that STRmix lacks an objective foundation or consider that it detracts from the cogency or reliability of the validation studies.[135]