History

The original NPAIRS software was developed by Stephen Strother in IDL during 1996 and included modules from the VAST software library at the VA Medical Center written by Jon Anderson, Kirt Schaper and Kelly Rehm. Preliminary results obtained with it were presented at BrainMap'96 and published in Strother et al. (1997). The software offered for download from this site was completely rewritten and updated by Jon Anderson during 1999 and 2000, and many of its features are described and used in Strother et al. (2001).



What is NPAIRS?

NPAIRS (Nonparametric, Prediction, Activation, Influence, Reproducibility, re-Sampling) is a software package for the analysis of neuroimaging data. NPAIRS is based on split-half resampling, which takes the data - represented as a 2D data matrix with rows as observations (scans) and columns as brain regions (voxels) - and randomly divides the matrix into 2 disjoint halves. Each data half is analyzed separately using a chosen analysis technique such as the General Linear Model (GLM) or Canonical Variate Analysis (CVA). The results of the 2 analyses are then compared.

One measurement taken is the reproducibility of the spatial patterns generated by each disjoint data set. If the spatial patterns look alike, as measured by the correlation coefficient computed across voxels, then the patterns are said to reproduce.

Prediction error is another metric of interest. The results from the 1st split-half (training set) are used to "predict" the group membership of the scans in the 2nd split-half (test set). The training/test roles of the 1st and 2nd split-half are then reversed, and a second set of prediction measurements is computed.

This process - splitting the data into disjoint halves, analyzing each half, and comparing results - is repeated until all possible disjoint pairs have been exhausted, or until a user-specified limit has been reached.

In summary, NPAIRS takes a data set and repeatedly divides it into 2 independent parts. For each partition, NPAIRS determines whether there is "agreement" between the results generated by feeding each half through some analysis technique. The "agreement" metrics include spatial pattern reproducibility as measured by the correlation coefficient, prediction error using training/test techniques, and SPMs indicating the likelihood of "activation" at each voxel.

The pseudocode for an NPAIRS analysis is:
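
The split-half procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the NPAIRS implementation: the "analysis" is a stand-in (a per-voxel difference of two class means with a nearest-class-mean predictor) rather than GLM or CVA, splits are drawn at random by scan session, and every function name is hypothetical.

```python
import random
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def analyze(rows, labels):
    """Stand-in analysis (assumption): per-voxel difference of two class means."""
    n_vox = len(rows[0])
    def class_mean(c):
        members = [r for r, l in zip(rows, labels) if l == c]
        return [mean(r[v] for r in members) for v in range(n_vox)]
    m0, m1 = class_mean(0), class_mean(1)
    return {"pattern": [b - a for a, b in zip(m0, m1)], "means": (m0, m1)}

def prediction_error(model, rows, labels):
    """Fraction of test scans assigned to the wrong class (nearest class mean)."""
    def dist(r, m):
        return sum((x - y) ** 2 for x, y in zip(r, m))
    m0, m1 = model["means"]
    wrong = sum((0 if dist(r, m0) <= dist(r, m1) else 1) != l
                for r, l in zip(rows, labels))
    return wrong / len(rows)

def npairs(data, labels, sessions, n_splits, seed=0):
    """Repeat: split sessions into two halves, analyze each, compare results."""
    rng = random.Random(seed)
    ids = sorted(set(sessions))
    results = []
    for _ in range(n_splits):
        half1 = set(rng.sample(ids, len(ids) // 2))
        idx1 = [i for i, s in enumerate(sessions) if s in half1]
        idx2 = [i for i, s in enumerate(sessions) if s not in half1]
        rows1, labs1 = [data[i] for i in idx1], [labels[i] for i in idx1]
        rows2, labs2 = [data[i] for i in idx2], [labels[i] for i in idx2]
        fit1, fit2 = analyze(rows1, labs1), analyze(rows2, labs2)
        # Reproducibility: correlation of the two spatial patterns across voxels.
        r = pearson_r(fit1["pattern"], fit2["pattern"])
        # Prediction: train on one half, test on the other, then swap roles.
        e12 = prediction_error(fit1, rows2, labs2)
        e21 = prediction_error(fit2, rows1, labs1)
        results.append((r, e12, e21))
    return results
```

Each entry of the returned list holds one reproducibility value and the two prediction errors for a split. The sketch assumes every session contains scans from both classes, so that each half can be analyzed on its own.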

NPAIRS provides a number of tools for displaying the results of an analysis. This includes displaying of rSPM{Z} spatial patterns and plotting of correlation coefficients (reproducibility), prediction errors, and subject influence metrics.



References

Technical Publications
Strother SC, Rehm K, Lange N, Anderson JR, Schaper KA, Hansen LK, Rottenberg DA. Measuring activation
pattern reproducibility using resampling techniques. In: Quantitative functional brain imaging with
Positron Emission Tomography. (Carson RE, Daube-Witherspoon ME, Herscovitch P, eds.), Academic Press,
San Diego, pp. 241-246, 1998.

Strother SC, Anderson J, Hansen LK, Kjems U, Kustra R, Sidtis J, Frutiger S, Muley S, LaConte S,
Rottenberg D. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS data
analysis framework. Neuroimage 15:747-771, 2002.

Kjems U, Hansen LK, Anderson J, Frutiger SA, Sidtis JJ, Rottenberg D, Strother SC. The quantitative
evaluation of functional neuroimaging experiments: Mutual information learning curves. Neuroimage
15:772-786, 2002. 

Application Publications

Strother SC, Lange N, Anderson JR, Schaper KA, Rehm K, Hansen LK, Rottenberg DA. Activation pattern
reproducibility: Measuring the effects of group size and data analysis models. Hum Brain Mapp,
5:312-316, 1997.

Tegeler C, Strother SC, Anderson JR, Kim S-G. Reproducibility of BOLD-based functional MRI obtained at
4T. Hum Brain Mapp, 7:267-283, 1999.

Frutiger S, Strother SC, Anderson JR, Sidtis JJ, Arnold JB, Rottenberg DA. Multivariate predictive
relationship between kinematic and functional activation patterns in a PET study of visuomotor
learning. Neuroimage 12:515-527, 2000.

Muley SA, Strother SC, Ashe J, Frutiger SA, Anderson JR, Sidtis JJ, Rottenberg DA. Effects of changes
in experimental design on PET studies of isometric force. Neuroimage 13:185-195, 2001.

Shaw M, Strother SC, McFarlane AC, Morris P, Anderson J, Clark CR, Egan GF. Abnormal functional
connectivity in post-traumatic stress disorder. Neuroimage 15:661-674, 2002.



Definitions

AIR
c*beta Volume
Canonical Eigenimage Weights
CVA
Data Matrix
Data Volume
Dimension
FWHM
Generalization Error
GLM
IDL
Mask Volume
Mean Z-score Volume
MSR
Noise Axis Pattern
NPAIRS
NPAIRS ID
Number of Splits
Pattern Type
PC
PCA
Prediction Error
Read_Matrix Format
Reproducibility
rSPM{Z}
Signal Axis Pattern
Spatial Pattern
Split
Split-Half
Split-Group
Split-Object
SPM
SSM
Subject Influence
SVD
T-statistic Volume
Volume List File
VLF
VMN
Z-score Volume



AIR

AIR, Automated Image Registration, is a popular registration software package developed by Roger Woods at UCLA. It includes both intra-subject and inter-subject registration techniques.




c*beta Volume

This is a spatial pattern generated by a GLM analysis. The volume is computed by c*beta, where c is a contrast vector, and beta is the matrix of estimated regression coefficients.




Canonical Eigenimage Weights

A canonical eigenimage weight is a value that determines how the canonical eigenimage patterns are computed. True canonical eigenimage patterns are created with a weight of 0 and are saved in a volume with a '.nwcgis' suffix. A weight of 1 results in a "weighted" canonical eigenimage, with a file suffix of '.cgis'. Canonical eigenimage weights (and PC subset selection) are means of regularization.

Mathematically, a canonical eigenimage is created by C = V*(E^w)*L, where C is a matrix having one CVA eigenimage vector (pattern) per column, V is a matrix having one PCA eigenimage vector per column, E is a diagonal matrix of PCA eigenvalues, w is the canonical eigenimage weight, and L is the matrix of CVA eigenvectors of W'*B. The standard (true) canonical eigenimage has w = 0, which reduces the equation to C = V*L, so that the canonical eigenimages are just linear combinations of the PCA eigenimages.
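
As a small numerical illustration of C = V*(E^w)*L, the sketch below evaluates the product for w = 0 and w = 1. The matrices are made-up toy values, and the function names are hypothetical and not part of NPAIRS.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def canonical_eigenimages(V, eigenvalues, L, w):
    """Return C = V * (E^w) * L, with E the diagonal matrix of PCA eigenvalues."""
    n = len(eigenvalues)
    Ew = [[eigenvalues[i] ** w if i == j else 0.0 for j in range(n)]
          for i in range(n)]
    return matmul(matmul(V, Ew), L)

# Toy example: 3 voxels, 2 PCA eigenimages (columns of V), 1 CVA dimension.
V = [[1, 0], [0, 1], [1, 1]]
L = [[0.5], [2.0]]
print(canonical_eigenimages(V, [4.0, 1.0], L, 0))  # [[0.5], [2.0], [2.5]]
print(canonical_eigenimages(V, [4.0, 1.0], L, 1))  # [[2.0], [2.0], [4.0]]
```

With w = 0 the result equals V*L. With w = 1 each PCA eigenimage is scaled by its eigenvalue before being combined.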




CVA

Canonical Variate Analysis.




Data Matrix

A two-dimensional array that holds the functional data to be analyzed. The columns index voxel locations (variables) and the rows index volumes (observations). Because there are typically many more voxels than volumes, analyses of the data matrix are usually highly ill-posed.




Data Volume

A data volume is a 3D array (i.e., a volume) holding imaging data (e.g., PET, fMRI) collected at a single time point. A data volume is usually saved on disk in either an Analyze file (plus header) or a VAPET file.




Dimension

A "dimension" is defined somewhat loosely in the context of NPAIRS. For an NPAIRS analysis that includes CVA, a dimension simply refers to one of the CVA dimensions. If there are N classes in a CVA, then there are N-1 CVA dimensions. In GLM, each contrast vector corresponds to a dimension. If only one contrast vector is defined, which generates one spatial pattern, then there is one "dimension". Unlike CVA dimensions, which are ordered according to between-class variance, the GLM "dimensions" are unordered. In NPAIRS, all dimensions within a pattern type are saved to the same multi-volume VAPET file.




FWHM

FWHM, Full-Width Half-Maximum, measures the width of a profile (e.g., a probability density function) as the full width of the profile at half its maximum value.
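
For a Gaussian profile with standard deviation sigma, the FWHM is 2*sqrt(2*ln 2)*sigma, or roughly 2.355*sigma. The following sketch measures the FWHM of a sampled profile numerically. It assumes a single-peaked profile that falls below half maximum on both sides of the peak; the function name is hypothetical.

```python
import math

def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked sampled profile.

    The two half-maximum crossings are located by linear interpolation.
    """
    half = max(ys) / 2.0
    peak = ys.index(max(ys))
    # Walk left from the peak to the first sample below half maximum.
    i = peak
    while i > 0 and ys[i] >= half:
        i -= 1
    left = xs[i] + (half - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
    # Walk right from the peak to the first sample below half maximum.
    j = peak
    while j < len(ys) - 1 and ys[j] >= half:
        j += 1
    right = xs[j - 1] + (half - ys[j - 1]) * (xs[j] - xs[j - 1]) / (ys[j] - ys[j - 1])
    return right - left

sigma = 2.0
xs = [i * 0.01 - 10.0 for i in range(2001)]
ys = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
print(fwhm(xs, ys), 2 * math.sqrt(2 * math.log(2)) * sigma)
```

The two printed values should agree closely, since the sampling step is small relative to the width of the profile.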




Generalization Error

Generalization error measures the ability of a model, whose parameters are estimated from a training data set, to predict the scan labels of an independent test data set.




GLM

General Linear Model. This is the statistical technique implemented in SPM.




IDL

IDL is a programming language developed by Research Systems Inc that specializes in the analysis and visualization of data.




Mask Volume

A mask volume is a 3D array (i.e., a volume) containing only 0s and 1s. Mask volumes are usually used to indicate whether each voxel in a data volume is brain or non-brain. Brain voxels are assigned 1 and non-brain voxels are assigned 0.




Mean Z-score Volume

The mean Z-score volume is the average of the individual Z-score Volumes across the NPAIRS splits. If the number of splits is N, then the value at a particular voxel in the mean Z-score volume is the average voxel value across the N individual Z-score volumes, at that voxel location.




MSR

Mean Scan Removal (MSR) is a method for removing subject block effects from the data. The mean volume (scan) is computed across all volumes (scans) within a subject's scan session. This mean volume is then subtracted, voxel by voxel, from each of the individual volumes within the scan session. This essentially removes the subject's mean spatial pattern from each scan. Performing a PCA on MSR-normalized data will result in the elimination of subject-specific Principal Components.
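
A minimal sketch of MSR on a small data matrix, assuming one row per scan and a parallel list giving the scan session of each row. The function name and the data layout are illustrative assumptions, not the NPAIRS code.

```python
def mean_scan_removal(data, sessions):
    """Subtract each session's mean scan, voxel by voxel, from its scans."""
    out = []
    for row, s in zip(data, sessions):
        # All scans that belong to the same session as this row.
        members = [r for r, t in zip(data, sessions) if t == s]
        mean_scan = [sum(col) / len(members) for col in zip(*members)]
        out.append([x - m for x, m in zip(row, mean_scan)])
    return out

data = [[1.0, 2.0], [3.0, 6.0], [10.0, 10.0], [14.0, 12.0]]
sessions = ["a", "a", "b", "b"]
print(mean_scan_removal(data, sessions))
# [[-1.0, -2.0], [1.0, 2.0], [-2.0, -1.0], [2.0, 1.0]]
```

After MSR, the scans within each session sum to zero at every voxel.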




Noise Axis Pattern

The principal axes are computed between the 2 spatial patterns created from an NPAIRS split. The (x,y) points in the 2D scatter plot - where x is a voxel value from the first spatial pattern, and y is the voxel value from the second spatial pattern - are projected onto the minor axis. The result of these projections is called the noise axis pattern.




NPAIRS

Nonparametric, Prediction, Activation, Influence, Reproducibility, re-Sampling




NPAIRS ID

The NPAIRS ID is a string that identifies a particular run of the NPAIRS program. Each time NPAIRS is run, an ID is specified in the parameter file that will be used to identify the output of the analysis.




Pattern Type

An analysis method (e.g., CVA or GLM) can sometimes produce multiple spatial patterns. For example, in CVA, a canonical eigenimage is created for each separate canonical eigenimage weight. In GLM, T-statistic volumes and c*beta volumes can be generated. Each different spatial pattern is referred to as a pattern type.

Note, the spatial patterns generated for each CVA dimension, and the spatial patterns generated by multiple contrast vectors in GLM, are not considered different pattern types. They are considered dimensions within the pattern type.




Number of Splits

The number of times to split the data into disjoint parts in an NPAIRS analysis.




PC

Principal Component.




PCA

Principal Component Analysis.




Prediction Error

Prediction error measures the ability of a model, whose parameters are estimated from a training data set, to predict the group membership (scan labels) of an independent test data set.




Read_Matrix Format

A file is in "read_matrix" format if it can be read by the read_matrix IDL program. A file in this format is an ASCII file holding the values of an array. Blank lines and lines starting with '#' are ignored. The array can be of any size and dimension. The size of each dimension is specified in the first (non-blank, non-#) line of the file as a space-delimited list of integers. This list of dimension sizes must all be on one line. The number of integers in the list determines the number of dimensions. For example, if the first line is "4 7 9" then we are dealing with a 3 dimensional array with dimension sizes of 4, 7 and 9, respectively. The whitespace-delimited values are then specified for each element in the array, with 1st dimension values listed first and last dimension values listed last. [In IDL, the 1st dimension indices change the fastest].
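
The format description above can be turned into a short parser. The sketch below is a Python illustration, not the read_matrix IDL program itself. It assumes numeric values, treats a line as a comment if '#' is its first non-whitespace character, and uses hypothetical function names.

```python
def parse_read_matrix(text):
    """Parse text in "read_matrix" format into (dims, flat list of values)."""
    dims = None
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if dims is None:
            # First data line: one integer per dimension, all on one line.
            dims = [int(t) for t in stripped.split()]
        else:
            values.extend(float(t) for t in stripped.split())
    if dims is None:
        raise ValueError("no dimension line found")
    expected = 1
    for d in dims:
        expected *= d
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, found {len(values)}")
    return dims, values

def element(dims, values, index):
    """Look up one element; the 1st dimension index changes the fastest."""
    offset, stride = 0, 1
    for i, d in zip(index, dims):
        offset += i * stride
        stride *= d
    return values[offset]

text = "# demo array\n\n2 3\n1 2\n3 4\n5 6\n"
dims, values = parse_read_matrix(text)
print(dims)                            # [2, 3]
print(element(dims, values, (1, 0)))   # 2.0
print(element(dims, values, (0, 2)))   # 5.0
```

Because the first dimension varies fastest, the values "1 2" fill indices (0,0) and (1,0), "3 4" fill (0,1) and (1,1), and so on.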




Reproducibility

Reproducibility measures the similarity of the spatial patterns generated from the analyses of two independent data sets. The usual metric for reproducibility is the Pearson correlation coefficient, r.




rSPM{Z}

See Z-score Volume.




Signal Axis Pattern

The principal axes are computed between the 2 spatial patterns created from an NPAIRS split. The (x,y) points in the 2D scatter plot - where x is a voxel value from the first spatial pattern, and y is the voxel value from the second spatial pattern - are projected onto the major axis. The result of these projections is called the signal axis pattern.
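
The signal and noise axis patterns can be illustrated with a short sketch. It assumes the scatter is centered on its mean before projection and finds the major axis from the 2 x 2 covariance matrix of the two patterns. Both choices, and the function name, are assumptions for illustration and may differ from what NPAIRS does internally.

```python
import math

def principal_axis_patterns(x, y):
    """Project the (x, y) scatter onto its major and minor principal axes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx) / n
    syy = sum(b * b for b in dy) / n
    sxy = sum(a * b for a, b in zip(dx, dy)) / n
    # Orientation of the major axis of the 2 x 2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    signal = [c * a + s * b for a, b in zip(dx, dy)]  # major-axis projections
    noise = [-s * a + c * b for a, b in zip(dx, dy)]  # minor-axis projections
    return signal, noise

pattern1 = [0.0, 1.0, 2.0, 3.0]
pattern2 = [0.2, 0.9, 2.1, 2.8]
signal, noise = principal_axis_patterns(pattern1, pattern2)
print(signal)
print(noise)
```

When the two patterns agree closely, the noise values are small relative to the signal values. If the two patterns are identical, the noise axis pattern is zero at every voxel.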




Spatial Pattern

A spatial pattern is a 3D image that is generated by some data analysis technique. The voxel values in a spatial pattern usually hold some estimate of a model parameter (e.g., a regression slope), or some statistic computed from an estimated model parameter (e.g., a Student t value), or a P-value indicating the likelihood of some statistic. The GLM creates spatial patterns that are SPMs, e.g., SPM{Z}, SPM{t}, SPM{F}. The spatial patterns generated by Canonical Variate Analysis (CVA) are called canonical eigenimages. NPAIRS itself creates rSPM{Z} spatial patterns, which are generated by normalizing the signal axis pattern.




Split

Refers to a division of a data set into 2 disjoint parts. How the split is performed depends on the Split-Object, and, occasionally, the Split-Group.




Split-Half

This is a split of the data into 2 equal halves, with the additional constraint that all of the data are represented in the split, i.e., no data are left out. This is the typical splitting strategy for an NPAIRS analysis.




Split-Group

Refers to a group structure that is imposed on the NPAIRS Split-Object. Each split-object belongs to one split-group. When a split of the data is performed, the proportion of the number of objects in a group to the total number of objects in the split is equal to the group proportion of the data set as a whole. For example, assume the data set consists of 28 scan sessions comprised of 12 normal and 16 disease subjects, and the Split-Group is defined by patient population. Then a split-half would have 14 subjects in each half, each consisting of 6 normals (6/14 = 12/28) and 8 disease (8/14 = 16/28).
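
The 28-session example above can be reproduced with a small stratified splitting sketch. It splits each split-group in half separately, which preserves the group proportions. The function name, the session IDs, and the handling of odd-sized groups (the extra object goes to the second half) are assumptions.

```python
import random

def stratified_split_half(groups, seed=0):
    """Split objects into two halves, preserving each group's proportion.

    `groups` maps each split-object ID to its split-group label.
    """
    rng = random.Random(seed)
    half1, half2 = [], []
    for label in sorted(set(groups.values())):
        members = sorted(obj for obj, g in groups.items() if g == label)
        rng.shuffle(members)
        cut = len(members) // 2
        half1.extend(members[:cut])
        half2.extend(members[cut:])
    return half1, half2

# 12 normal and 16 disease scan sessions, as in the example above.
groups = {f"n{i:02d}": "normal" for i in range(12)}
groups.update({f"d{i:02d}": "disease" for i in range(16)})
half1, half2 = stratified_split_half(groups)
print(len(half1), sum(groups[o] == "normal" for o in half1))  # 14 6
```

Each half ends up with 14 sessions, 6 normal and 8 disease, matching the proportions of the full data set.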




Split-Object

Split-objects are the things that are to be split in an NPAIRS analysis. Typically, the Split-Object is the scan session. For example, if the data set consists of 6 scan sessions labeled 1,2,3,4,5,6, then one split of the data could be 1,2,3 and 4,5,6, and the data matrix for the 1st half of the split would consist of all rows in the entire data matrix associated with scan sessions 1,2,3. And likewise, the data matrix for the 2nd half of the split would be all rows in the entire data matrix associated with scan sessions 4,5,6.
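
For a small number of split-objects, every possible split-half can be enumerated. The sketch below does this for the 6 scan sessions in the example above. The function name is hypothetical, and it assumes an even number of objects and no split-group constraint.

```python
from itertools import combinations

def all_split_halves(objects):
    """Yield every distinct split of an even-sized set into two equal halves."""
    objects = sorted(objects)
    first = objects[0]
    # Keeping the first object in half 1 avoids counting each split twice.
    for rest in combinations(objects[1:], len(objects) // 2 - 1):
        half1 = (first,) + rest
        half2 = tuple(o for o in objects if o not in half1)
        yield half1, half2

splits = list(all_split_halves([1, 2, 3, 4, 5, 6]))
print(len(splits))   # 10
print(splits[0])     # ((1, 2, 3), (4, 5, 6))
```

Six sessions give 10 distinct split-halves, so the number of splits requested for such a data set cannot usefully exceed 10.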




SPM

A Statistical Parametric Map (SPM) is a 2D or 3D image whose voxel values are statistics from a known distribution, e.g., Student's t (SPM{t}), standard normal (SPM{Z}), or the F distribution (SPM{F}).

SPM also refers to an analysis software package developed at the Wellcome Department of Cognitive Neurology which implements, among other things, the General Linear Model.




SSM

Subprofile Scaling Model.




Subject Influence

Subject influence measures how much each subject adds to (or detracts from) the reproducibility of a data set. The first step in computing this metric is to generate a reference pattern. Usually, this is done by averaging together the 2 spatial patterns that give the highest correlation coefficient. [Another way to create a reference is to compute the average pattern over all the patterns generated by the NPAIRS splits (2*#splits)]. Next, for each split, the correlation coefficient (r) is computed between the reference pattern and each of the 2 patterns generated by the split-halves. The two r values are compared. The subjects belonging to the split-half with the higher r value are identified, and counters (1 per subject) for these subjects are incremented by 1. For an N split NPAIRS analysis, the highest value of a counter, for any one subject, is N. The lowest is 0. Subjects with higher counts tend to add to the reproducibility of the data set, and subjects with lower counts tend to lessen the reproducibility.
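
The counting step described above can be sketched as follows. The reference pattern is taken as given, ties are awarded to the first split-half, and all names and the input layout are assumptions made for this illustration.

```python
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def subject_influence(splits, reference):
    """Count, per subject, how often its split-half matched the reference best.

    `splits` is a list of (subjects1, pattern1, subjects2, pattern2) tuples.
    """
    counts = {}
    for subj1, pat1, subj2, pat2 in splits:
        for s in list(subj1) + list(subj2):
            counts.setdefault(s, 0)
        r1 = pearson_r(reference, pat1)
        r2 = pearson_r(reference, pat2)
        # Ties go to the first half (an arbitrary choice for this sketch).
        for s in (subj1 if r1 >= r2 else subj2):
            counts[s] += 1
    return counts

reference = [1.0, 2.0, 3.0, 4.0]
splits = [
    (("A", "B"), [1.0, 2.0, 3.0, 4.5], ("C", "D"), [4.0, 1.0, 3.0, 2.0]),
    (("A", "C"), [1.0, 3.0, 2.0, 4.0], ("B", "D"), [1.0, 2.0, 3.0, 4.0]),
]
print(subject_influence(splits, reference))
# {'A': 1, 'B': 2, 'C': 0, 'D': 1}
```

In this toy example subject B is in the better-matching half in both splits, while subject C never is.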




SVD

The Singular Value Decomposition (SVD) of a p x n matrix, X, of rank r is given by:

X = U * D * V', where
U is a p x r column orthogonal matrix (U' * U = I, with I of size r x r),
V is a n x r column orthogonal matrix (V' * V = I, with I of size r x r),
D is a r x r diagonal matrix of singular values

For imaging data, p is the number of volumes (scans) and n is the number of voxels. Usually, n >> p and r = p. In this case:

U is p x p and becomes an orthonormal matrix (U' * U = U * U' = I),
V is a n x p column orthogonal matrix,
D is a p x p diagonal matrix of singular values

Also note, after some linear algebra, that:
(X*X')*U = U*D^2, and
(X'*X)*V = V*D^2,

Therefore, the columns of U are eigenvectors of X*X', and the columns of V are eigenvectors of X'*X, with both having the same eigenvalues, D^2.
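
The linear algebra behind the two eigenvector relations can be written out explicitly. The derivation below uses only the definitions given above (X = U * D * V', U' * U = I, V' * V = I, and D diagonal, so D' = D):

```latex
\begin{aligned}
(X X')\,U &= (U D V')(U D V')'\,U = U D\,(V' V)\,D\,(U' U) = U D^{2},\\
(X' X)\,V &= (U D V')'(U D V')\,V = V D\,(U' U)\,D\,(V' V) = V D^{2}.
\end{aligned}
```

Reading each equation column by column gives the eigenvector statement: the i-th column of U (or V) is mapped to itself times the i-th diagonal entry of D^2.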




T-statistic Volume

This is a spatial pattern generated by a GLM analysis. The volume is computed by c*beta/se, where c is a contrast vector, beta is the matrix of estimated regression coefficients, and se is the vector of standard errors at each voxel.




Volume List File

A volume list file (VLF) is an ASCII file containing information on a set of data volumes. The file has one line for each volume. Each line consists of a number of space-separated fields that hold different pieces of information about a volume. Comment lines in a VLF start with the '#' character.




VLF

VLF is short for Volume List File.




VMN

Volume Mean Normalization (VMN) is a normalization method used to account for global effects across volumes. For each volume, the mean voxel value is computed across all brain voxels. Each voxel value is then divided by this mean. As a result, every volume has a mean brain voxel value of 1.
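
A minimal sketch of VMN for one volume, stored here as a flat list of voxel values with a matching 0/1 mask. The function name, the flat layout, and the choice to divide non-brain voxels by the same mean are assumptions for illustration.

```python
def volume_mean_normalize(volume, mask):
    """Divide every voxel by the mean over brain voxels (mask == 1)."""
    brain = [v for v, m in zip(volume, mask) if m == 1]
    brain_mean = sum(brain) / len(brain)
    return [v / brain_mean for v in volume]

volume = [2.0, 4.0, 6.0, 100.0]
mask = [1, 1, 1, 0]
print(volume_mean_normalize(volume, mask))  # [0.5, 1.0, 1.5, 25.0]
```

The three brain voxels have a mean of 4 before normalization and a mean of 1 afterwards.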


Z-score Volume

A Z-score volume, aka rSPM{Z}, is generated for each NPAIRS split by dividing the signal axis pattern (for the split) by the variance of the noise axis pattern (for the split). That is, the Z-score is the signal axis pattern normalized by a scalar whose value is the variance (across all voxels) of the noise axis pattern.
