Daugman, NIST and the saga of the brown paper bag
There was clearly something afoot when one of the world's most esteemed biometric scientists produced a paper bag and threatened to put it on his head in front of hundreds of government and industry participants at a recent show in the USA.
John Daugman, professor at Cambridge University and the inventor of iris recognition – as well as chief scientist for iris recognition at L-1 Identity Solutions – resorted to such graphic measures when expressing his frustration at what he describes as “mischievous” reporting by researchers in the highly-respected US government organisation, NIST.
Daugman's frustration is aimed, in particular, at two recent reports published by NIST: “FRVT 2006 and ICE 2006 Large Scale Results” and “Meta-Analysis of Third Party Evaluations of Iris Recognition”.
In the first of these reports, published in March 2007, NIST presents the results of its latest face and iris trials. Amid many interesting conclusions, it reports that the performance of iris recognition, still face recognition and 3D face recognition is comparable, given the various constraints of the tests.
In the second paper, published in August 2007, the author, Elaine Newton, questions the reputation of iris recognition as being “highly accurate”, citing the lack of independent testing. She analyses the results of three major tests of iris recognition: NIST's ICE 2006, the Independent Testing of Iris Recognition Technology (ITIRT) study conducted by the International Biometric Group (IBG), and the Iris Recognition Study 2006 (IRIS06) conducted by Authenti-Corp.
As in the first NIST paper, Newton compared the three studies at a False Match Rate (FMR) of 0.001. She reported that the False Non Match Rate (FNMR) for the best performers in each test ranged from 0.0122 to 0.0175.
Daugman's arguments
Daugman's conference presentation was given at the recent Biometric Consortium event in Baltimore. In his presentation he quoted one ‘unnamed’ US Government sponsor of the NIST 2006 FRVT and ICE tests as having said: “The deck was stacked against iris.”
What Daugman objects to most is that the NIST report focuses on a part of the ROC curve that does not, in his view, display the true power of the technology, namely its ability to operate on very large databases in identification mode.
According to Daugman: “Recent NIST reports dispute the iris reputation for accuracy by selecting a point on the ROC curve where FMR = 0.001. Since hi-resolution face recognition based on skin texture can also reach this, NIST concludes they are ‘comparable’.”
Daugman continues: “The slope of log ROC curves for these algorithms is about 2:10 000 (the FnMR only doubles while FMR is reduced by a factor of 10 000 when the HD threshold changes from about 0.37 to about 0.32). So, while citing a FnMR of about 1%−2%, testers could equally well cite a FMR of: 1 in a million, or 1 in 100 000, or 1 in 10 000, or 1 in 1000.”
In other words, according to Daugman: “FnMR is hardly a function of FMR over this range. FnMR is mainly determined by the number of rubbish images included; not the threshold used.”
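Daugman's slope argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the ROC curve is a straight line in log-log space over the range he describes, takes his quoted figures (FnMR doubling for a 10 000-fold reduction in FMR) at face value, and uses a hypothetical reference point of 1.5% FnMR at an FMR of 1 in 1000. The function names and the reference value are assumptions made for this example and do not come from any of the reports.

```python
import math

def loglog_slope(fnmr_ratio, fmr_ratio):
    """Slope of a straight line on a log-log ROC plot, given how much
    FnMR grows (fnmr_ratio) when FMR shrinks by fmr_ratio."""
    return math.log10(fnmr_ratio) / math.log10(fmr_ratio)

def fnmr_at(fmr, ref_fmr, ref_fnmr, slope):
    """FnMR predicted at a target FMR by extrapolating from a reference
    operating point along a log-log straight line."""
    return ref_fnmr * (ref_fmr / fmr) ** slope

# Daugman's quoted figures: FnMR doubles while FMR falls by a factor of 10 000.
SLOPE = loglog_slope(2.0, 10_000.0)

# Hypothetical reference point, chosen only for illustration.
REF_FMR = 1e-3
REF_FNMR = 0.015

if __name__ == "__main__":
    print(f"log-log slope: {SLOPE:.4f}")
    for fmr in (1e-3, 1e-4, 1e-5, 1e-6):
        predicted = fnmr_at(fmr, REF_FMR, REF_FNMR, SLOPE)
        print(f"FMR = 1 in {round(1 / fmr):>9,}: predicted FnMR = {predicted:.4f}")
```

Under these assumptions each tenfold tightening of the FMR raises the predicted FnMR by only about a fifth, which is the sense in which Daugman says FnMR is “hardly a function of FMR over this range”. Whether real iris ROC curves are this flat is exactly what is in dispute.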
In his presentation Daugman showed a selection of the worst quality images used in NIST's tests (although not the exact images, as these have been kept confidential by NIST). Although these poor quality images are not representative of the entire ICE dataset, some were impossible to match, such as those where printed patterned contact lenses were used.
According to Daugman, comparing facial and iris recognition fairly would mean that the testers should have used some facial images where the candidate had placed a paper bag over their head.
In direct response to Daugman, Dr Jonathon Phillips, the ICE programme manager at NIST, said that iris image quality is a crucial part of testing. He points out that bad biometric samples happen and that technology evaluations must not ignore bad samples, although scenario evaluations may filter for quality. Interestingly, Phillips also challenged the notion that iris ROC curves are indeed “virtually flat”.
In her paper, which did not compare different biometric modalities but only the results of three iris trials, Newton defends her decision to use an FMR of 1 in 1000. This is because of the low number of samples in each of the three studies (ranging from 240 subjects in ICE to 1224 subjects in ITIRT). Going much beyond 1 in 1000 could be statistically irresponsible, Newton told Btt. That said, Newton does provide a table in her paper (Table V) that compares the results of the various trials at an FMR of 1 in 10 000. At this point the False Non Match Rate (FNMR) for the best performers in each test rises from a range of 0.0122 to 0.0175 to a range of 0.0141 to 0.02.