Assessing the reproducibility in sets of Talairach coordinates
Finn Årup Nielsen , Lars Kai Hansen
Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark

Modeling & Analysis

Abstract
Introduction
Statements like "this study demonstrates highly consistent findings" or "our results reveal a striking degree of overlap" appear commonly in the literature. Such statements are typically based on informal comparison between activation maps. Computerized methods exist for comparing activation maps in the form of images, e.g., see [1]. Here we propose methods for comparing activation maps when they exist in the form of sets of stereotaxic coordinates. We extend a method developed in connection with information retrieval, where a metric was provided for assessing the similarity between coordinate sets [2]. Our aim is to develop quantitative support for phrases like "striking degree of overlap" and "highly consistent", i.e., a statistical test for replication, reproducibility or consistency.

Methods
We describe two methods. The first uses a database of "experiments" (sets of Talairach coordinates) to generate a null distribution for a similarity measure: the similarity is computed between all pairs of experiments in the database. When two new experiments are to be assessed for reproducibility, their similarity is compared against this distribution, and a P-value is generated from the rank of the similarity. We use the Brede database [3] and a similarity based on voxelization and the cross-correlation coefficient [2]. The second method tests whether two coordinates from two different experiments are statistically the same. Its statistic is the minimum Euclidean distance over all pairs of coordinates $(x_n, x_m)$, with $x_n$ taken from one experiment and $x_m$ from the other: $d = \min_{n,m} \sqrt{(x_n - x_m)^\top (x_n - x_m)}$. To form a P-value, a new distance is compared against the distribution of $d$ found from all pairs of experiments in the database.
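The minimum-distance statistic and the rank-based P-value can be sketched as follows. This is a minimal illustration, not the implementation used with the Brede database: the function names, the example coordinates and the `null_distances` list are hypothetical, and the abstract does not specify how ties or the rank convention are handled, so the "fraction of null values at least as extreme" rule used here is an assumption.

```python
import math
from bisect import bisect_left, bisect_right


def min_distance(exp_a, exp_b):
    """Minimum Euclidean distance over all coordinate pairs (x_n, x_m),
    with x_n from exp_a and x_m from exp_b."""
    return min(math.dist(a, b) for a in exp_a for b in exp_b)


def p_value_small(value, null_values):
    """Fraction of null values <= value.
    Suited to the distance statistic, where small values are the extreme ones."""
    ordered = sorted(null_values)
    return bisect_right(ordered, value) / len(ordered)


def p_value_large(value, null_values):
    """Fraction of null values >= value.
    Suited to a similarity statistic, where large values are the extreme ones."""
    ordered = sorted(null_values)
    return (len(ordered) - bisect_left(ordered, value)) / len(ordered)


# Hypothetical example: two small experiments given as (x, y, z) in mm.
exp_a = [(-42.0, 20.0, 8.0), (30.0, -60.0, 40.0)]
exp_b = [(-40.0, 22.0, 8.0), (10.0, 10.0, 10.0)]
d = min_distance(exp_a, exp_b)

# Hypothetical null distribution; in the abstract it comes from all
# pairs of database experiments that are not from the same paper.
null_distances = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
p = p_value_small(d, null_distances)
```

The same `p_value_large` helper would apply to the first method once a similarity value and the database similarities are available; the voxelization and cross-correlation steps are not reproduced here.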

Results
The Brede database presently contains 368 experiments from 118 different papers. Figure 1 shows the sorted similarities from all pairs in the database, excluding pairs that come from the same paper. The threshold for P = 0.05 appears at a similarity of 0.35. A histogram of the minimum distance d is shown in Figure 2; the d-value associated with P = 0.05 is d = 6.9 mm.

Discussion
The distribution of the minimum distance tells us that two coordinates from two different experiments should be closer than approximately 7 millimeters before we can say that they are the same, and the similarity distribution indicates that the similarity should be larger than 0.35 before we can accept that an experiment is reproduced. The statistics model neither the number of coordinates in each experiment nor their distribution in the brain. One would expect the minimum distance to be smaller if the experiments have many coordinates. Furthermore, the distribution of the similarity measure changes with the type of voxelization and the type of similarity measure. Nevertheless, our method provides a first step toward a quantitative reproducibility measure for sets of coordinates.
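The expected dependence of the minimum distance on the number of coordinates can be illustrated with a small simulation. This is only a sketch on synthetic data: coordinates are drawn uniformly from a cube, which is an assumption that ignores the real spatial distribution of activations, and the function names and the `half_width` parameter are hypothetical. It makes no use of the Brede database.

```python
import math
import random


def min_distance(exp_a, exp_b):
    """Minimum Euclidean distance over all coordinate pairs of two experiments."""
    return min(math.dist(a, b) for a in exp_a for b in exp_b)


def random_experiment(rng, n_coords, half_width=70.0):
    """Synthetic experiment: n_coords points uniform in a cube (mm).
    The cube is an arbitrary stand-in for brain space."""
    return [
        tuple(rng.uniform(-half_width, half_width) for _ in range(3))
        for _ in range(n_coords)
    ]


def mean_min_distance(n_coords, n_pairs=200, seed=0):
    """Average minimum distance over n_pairs random pairs of experiments,
    each experiment holding n_coords coordinates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        exp_a = random_experiment(rng, n_coords)
        exp_b = random_experiment(rng, n_coords)
        total += min_distance(exp_a, exp_b)
    return total / n_pairs


few = mean_min_distance(2)    # experiments with 2 coordinates each
many = mean_min_distance(20)  # experiments with 20 coordinates each
```

Under these assumptions `many` comes out smaller than `few`, which suggests that a single distance threshold would be too lenient for experiments reporting many coordinates.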

References
1. Lange, N., et al., NeuroImage, September 1999, 10(3):282-303.
2. Nielsen, F. Å., Hansen, L. K., Artificial Intelligence in Medicine, 2003, In print.
3. Nielsen, F. Å., NeuroImage, 2003, 19(2), Presented at the 9th International Conference on Functional Mapping of the Human Brain.




Figure 1. Plot of P-values as a function of similarity.

Figure 2. Histogram of the minimum distance statistics.