Part I: Ames Study
Finding the Article
I was able to find a copy of a study titled “A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons” by David P. Baldwin, Stanley J. Bajic, Max Morris, and Daniel Zamzow. This is Part I of a two-part study conducted by the Ames Laboratory. Part I can still be found in obscure places, but Part II has been wiped from most sources. Defense attorneys and academic critics lean heavily on these studies, but they are often cited sloppily and cherry-picked. I hope to share the main findings so that anyone in the field who encounters people citing these studies will be prepared. This post focuses on Part I of the study; at a later date, I will post a discussion of Part II.
Introduction/Experiment
The authors designed the study to better understand the error rates associated with the comparison of fired cartridge cases. They stated that the problem with previous studies was that they did not include independent sample sets that would allow an unbiased determination of the false-positive and/or false-negative rates. This study set out to resolve that issue.
Two hundred and eighty-four (284) participants were each given fifteen (15) test sets to examine. Twenty-five (25) Ruger SR9s were used to create the samples for the test sets, and each firearm fired 200 cartridges to break it in before sample collection. Each handgun then fired a total of 800 cartridges for the test sets. No source firearm was repeated within a single test packet, except when a test set was meant to be a same-source comparison. Each set included three (3) knowns to compare to a single questioned casing. For all participants, five (5) of the test sets were from known same-source firearms, and ten (10) were from known different-source firearms. In addition to their conclusions, the participants had to record the quality of the known samples, which allowed the authors to calculate a poor mark production rate. This rate was tracked to counter the usual criticism that test sets are made too easy by cherry-picking well-marked samples. The authors also asked the participants not to use their laboratory peer review process, so that the error rates would reflect the individual examiner.
Results
False Negative
Of the two hundred and eighty-four (284) participants, only two hundred and eighteen (218) returned completed responses; about 3% of those came from self-employed examiners. In total, one thousand and ninety (1090) true same-source comparisons were made, of which only four (4) were labeled eliminations and eleven (11) inconclusive. The false elimination rate was calculated to be 0.3670%, with a Clopper-Pearson exact 95% confidence interval of 0.1001%-0.9369%. Two (2) of the four (4) false eliminations were made by the same examiner, so 215 of the 218 examiners made no false eliminations. When inconclusives are counted alongside false eliminations, the error rate rises to 1.376%, with a corresponding 95% confidence interval of 0.7722%-2.260%.
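For readers who want to reproduce the arithmetic, the Clopper-Pearson "exact" interval can be computed directly from the binomial tail probabilities. The sketch below is my own stdlib-only implementation (the study does not publish code); it finds the interval endpoints by bisection rather than using a statistics package.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion,
    located by bisection on the binomial tail probabilities."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(100):          # bisection; converges well past float precision
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound: p where P(X >= k | p) equals alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # upper bound: p where P(X <= k | p) equals alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

# 4 false eliminations out of 1090 true same-source comparisons
lo, hi = clopper_pearson(4, 1090)
print(f"rate = {4/1090:.4%}, 95% CI = {lo:.4%} - {hi:.4%}")
```

Running this with k = 4 and n = 1090 reproduces the paper's reported 0.3670% rate and its 0.1001%-0.9369% interval.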
A number to take into consideration is the poor mark production rate discussed above. Two hundred and twenty-five (225) of the nine thousand seven hundred and two (9702) known samples were considered poor quality and inappropriate for inclusion in the comparisons, or 2.319% of the samples, with a corresponding 95% confidence interval of 2.174%-2.827%. This percentage is greater than the false elimination rate, so there is a high probability that some of the false eliminations can be attributed to the poor quality of the knowns used for comparison. Also, all four (4) of the false eliminations were made by examiners who did not use inconclusive for any response, which could reflect their agencies' reporting requirements.
False Positive
Of the two thousand one hundred and eighty (2180) true different-source comparisons, twenty-two (22) were labeled identifications and seven hundred and thirty-five (735) inconclusive. The false identification error rate was calculated to be 1.010%. (Note: two (2) responses were left blank and were subtracted from the total number of responses.) All but two of the false identifications were made by five (5) of the two hundred and eighteen (218) examiners. Because a small number of examiners made most of the errors, the error probability is evidently not consistent across examiners. For that reason, the authors used a beta-binomial model, which does not assume a uniform error probability, to estimate the false identification rate. The resulting estimate was 0.939%, with a likelihood-based 95% confidence interval of 0.360%-2.261%.
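To make the beta-binomial idea concrete: each examiner's error probability is treated as a draw from a Beta(a, b) distribution, and their error count out of ten different-source comparisons then follows a beta-binomial. The sketch below fits the model by a crude grid-search maximum likelihood on hypothetical per-examiner counts; the counts are NOT the study's raw data (which is not published in full), only a pattern loosely resembling it, so the printed estimate is illustrative rather than a reproduction of the paper's 0.939%.

```python
from math import lgamma

def betabinom_logpmf(k, n, a, b):
    """log P(K = k) for a beta-binomial with shape parameters a, b."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
            + lgamma(a + b) - lgamma(a) - lgamma(b))

# Hypothetical per-examiner false-positive counts (illustrative only):
# 213 examiners with 0 errors and 5 examiners with 4 errors each,
# out of 10 different-source comparisons apiece.
counts = [0] * 213 + [4] * 5
n = 10

# Grid-search MLE over the mean mu and concentration t,
# with a = mu * t and b = (1 - mu) * t.
best = None
for mu in (m / 1000 for m in range(1, 100)):
    for t in (0.5, 1, 2, 5, 10, 50, 100):
        a, b = mu * t, (1 - mu) * t
        ll = sum(betabinom_logpmf(k, n, a, b) for k in counts)
        if best is None or ll > best[0]:
            best = (ll, mu, t)

print(f"estimated mean false-positive probability ≈ {best[1]:.3%}")
```

The fitted mean lands near the pooled error rate, but the small concentration parameter the fit prefers is what captures the heterogeneity: a few examiners carry most of the error probability, exactly the pattern the study observed.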
The inconclusive responses also proved to be heterogeneous. Of the two hundred and eighteen (218) examiners, ninety-six (96) labeled none of their comparisons inconclusive, forty-five (45) labeled all ten (10) of their comparisons inconclusive, and the remaining seventy-seven (77) fell between the extremes.
My Discussion
The authors state that the false elimination error rate is in doubt because the poor-quality rate is higher than the false elimination rate, even with the inconclusive results factored in. I agree that the rate should be questioned, since poor-quality samples can lead an examiner not to conclude a positive comparison. But another factor is in play as well. Some laboratories do not allow their examiners to report inconclusive results and require that every conclusion be either an identification or an elimination, a policy some in the statistics community have pushed for. This factor is hard to evaluate, however, because the authors did not require participants to disclose their laboratory practices. It is plausible that this was a factor, since all the false eliminations were made by examiners who did not report inconclusive for any of their comparisons.
The false positive rate is a percentage that should be applied not to the science but to an examiner. The 1% error rate is representative of the examiners participating in this specific study: most of the false identifications were produced by five (5) of the two hundred and eighteen (218) participants. The study design also deliberately excluded the laboratory review process so that individual examiners could be evaluated. I believe that if the review process had been allowed in this experiment, the error rate would have been smaller, likely close to 0%. The error rate can therefore be used to advocate for well-trained examiners and a well-established QA system.
The study also addresses the high number of inconclusive responses among the different-source comparisons. The seven hundred and thirty-five (735) inconclusive results, set against the one thousand four hundred and twenty-one (1421) reported eliminations, are too many to be attributed to the poor-quality percentage. Like the false elimination results, the inconclusives can be attributed to laboratory policy: a laboratory may require an examiner to report an inconclusive result whenever the class characteristics agree between the known and unknown samples. Since the same model of firearm was used to create the known and unknown samples in this study, all the samples would share the same class characteristics. Had the authors included a section where participants could disclose their laboratory policy, we would be better able to understand the number of inconclusive results seen in the study.
Hopefully, my post will help bring to light the first part of the Ames study and provide more transparency to the error rates published in the paper. Please use this post as a reference or a quick summary, but seek out a copy of the original paper for a more in-depth look at the study design. The authors were very detailed in their paper, and it would be well worth reading for yourself. They go into greater depth on the design of the study and the creation of the samples than I have here, and their large discussion section dives deeper into the statistics they applied and why those methods were selected to properly represent the data. In a future post, I will summarize and discuss the second part of the Ames study so that more examiners will have access to what some critics of the science use as a reference.