Brain Extraction Evaluation Service

What Your Results Mean

The Brain Extraction Evaluation (BEE) Service reports how closely the masks generated by your stripping algorithm match expertly stripped manual masks. Results are delivered as one PostScript file per volume. Please read on for a detailed explanation of each item in these output files.

EXAMPLE OUTPUT


---- test mask subject00_kab.img
The filename of a volume that you uploaded to our FTP site.

---- target mask subject00_hand.kr.img
The filename for one of our manual masks to which you are comparing your strip mask.

TARGET MASK volume = 1454cc.
This number describes the total brain volume within the target mask.

boundaries "close enough" = 2.00 pixels
When you submitted your masks for comparison, you selected a Leniency of Fit level from 1 to 3. In this example, "2.00 pixels" corresponds to a Leniency of Fit of 2. This setting determines how closely the test mask boundary must follow the target mask boundary to be considered acceptable: at each pixel location along the boundary, if your mask is within two pixels of the target mask, it is counted as correct and contributes to the total amount of correct boundary for the volume. Had you selected 3 for Leniency of Fit, your mask could lie as far as three pixels from the target mask at any given pixel location and still be counted as correct.

Targetmask boundary on axial slices -- test mask captured %72.02 of the "correct" boundary.
By definition, the boundary of the target mask is deemed "correct". This Correct Boundary metric describes the percentage of target-mask boundary voxels that correspond to voxels in the test-mask boundary. In other words, if you were to take the boundary of the test mask and lay it on top of the target mask, 72.02% of the target boundary would be "covered up".
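The idea behind the Correct Boundary and the Leniency of Fit can be sketched in a few lines of Python. This is only an illustration, not the code the service runs: it assumes a single 2-D slice, masks stored as sets of (row, column) pixels, a boundary defined by 4-connected neighbours, and a Euclidean distance test. All function names are hypothetical.

```python
from math import hypot

def boundary(mask):
    """Mask pixels with at least one 4-connected neighbour outside the mask (assumed definition)."""
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c) for (r, c) in mask
            if any((r + dr, c + dc) not in mask for dr, dc in offsets)}

def percent_boundary_captured(test_mask, target_mask, tolerance=2.0):
    """Percentage of target-boundary pixels that have a test-boundary pixel within `tolerance` pixels."""
    test_edge = boundary(test_mask)
    target_edge = boundary(target_mask)
    if not target_edge:
        return 0.0
    captured = sum(
        1 for (r, c) in target_edge
        if any(hypot(r - tr, c - tc) <= tolerance for (tr, tc) in test_edge)
    )
    return 100.0 * captured / len(target_edge)

def square(top, left, size):
    """Toy mask: a filled square of pixels."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}

target = square(0, 0, 10)
shifted = square(0, 3, 10)  # same shape, moved 3 pixels to the right

exact = percent_boundary_captured(target, target)                   # identical masks
strict = percent_boundary_captured(shifted, target, tolerance=2.0)  # Leniency of Fit = 2
loose = percent_boundary_captured(shifted, target, tolerance=3.0)   # Leniency of Fit = 3
print(exact, strict, loose)
```

In this toy case the shifted mask captures only part of the target boundary at a tolerance of 2 pixels but all of it at 3 pixels, which is the effect the Leniency of Fit setting is meant to control.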

If enclosed CSF is not treated as an error -- it captured %73.64 of the "correct" boundary.
This number counts the boundary voxels included in the Correct Boundary and, in addition, boundary voxels where the mismatch involves only voxels below the grey matter-CSF threshold. The rationale is that boundary error involving suprathreshold voxels is more pertinent than error involving low-intensity voxels (for example, CSF), so this Pertinent Boundary metric disregards any error that occurs below the grey matter-CSF threshold.

Targetmask volume captured = %96.20
This number describes how much of the target mask was included in the test mask.

Average distance between mask and target boundaries (where captured) = 0.9 pixels
Because the average is taken only over boundary locations that were captured, this number will vary with the Leniency of Fit.
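A small sketch may help show why this average depends on the Leniency of Fit. It assumes that the distance at each target-boundary pixel is the Euclidean distance to the nearest test-boundary pixel and that only pixels within the tolerance enter the average. The service's exact definition is not documented here, so treat the function below as hypothetical.

```python
from math import hypot

def mean_captured_distance(test_edge, target_edge, tolerance):
    """Mean nearest-neighbour distance over target-boundary pixels captured within `tolerance`.

    Both arguments are non-empty collections of (row, column) boundary pixels.
    Returns None when no target-boundary pixel is captured.
    """
    distances = []
    for (r, c) in target_edge:
        nearest = min(hypot(r - tr, c - tc) for (tr, tc) in test_edge)
        if nearest <= tolerance:
            distances.append(nearest)
    return sum(distances) / len(distances) if distances else None

# Toy boundaries: the test boundary drifts away from the target boundary.
target_edge = [(0, c) for c in range(4)]
test_edge = [(0, 0), (1, 1), (2, 2), (3, 3)]

by_leniency = {tol: mean_captured_distance(test_edge, target_edge, tol)
               for tol in (1.0, 2.0, 3.0)}
print(by_leniency)
```

A larger tolerance admits boundary pixels that lie farther from the target, so the average distance over captured pixels rises with it.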

VOLUME err Any intensity : wrongly_included = %5.5 wrongly_excluded = %3.8
These percentages describe the brain volume (counting voxels at all intensities) incorrectly included in or excluded from your test mask, relative to the volume of the target mask. In other words, if you were to lay the test mask on top of the target mask, any test-mask voxels lying outside the target mask would be considered "wrongly included", and any target-mask voxels missing from the test mask would be considered "wrongly excluded".
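As a rough sketch of these volume measures, the following Python treats each mask as a set of voxel coordinates and normalises every count by the target-mask volume, as described above. The function name and the toy masks are assumptions made for illustration only.

```python
def volume_errors(test_mask, target_mask):
    """Percent of target volume captured, wrongly included, and wrongly excluded."""
    target_count = len(target_mask)
    return {
        "captured": 100.0 * len(test_mask & target_mask) / target_count,
        "wrongly_included": 100.0 * len(test_mask - target_mask) / target_count,
        "wrongly_excluded": 100.0 * len(target_mask - test_mask) / target_count,
    }

# Toy volumes: a 10 x 10 x 10 target and a test mask that is shifted and one voxel thicker in x.
target = {(x, y, z) for x in range(10) for y in range(10) for z in range(10)}
test = {(x, y, z) for x in range(1, 12) for y in range(10) for z in range(10)}

result = volume_errors(test, target)
print(result)
```

Under this normalisation the captured and wrongly excluded percentages always sum to 100, which agrees with the example output (96.20% captured, 3.8% wrongly excluded).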

VOLUME err High intensity : wrongly_included = %4.1 wrongly_excluded = %0.8
These percentages describe the brain volume (counting only voxels with intensities greater than the grey matter-CSF threshold) incorrectly included in or excluded from your test mask, relative to the volume of the target mask. In other words, if you were to lay the test mask on top of the target mask, any suprathreshold test-mask voxel lying outside the target mask would be considered "wrongly included", and any suprathreshold target-mask voxel missing from the test mask would be considered "wrongly excluded". Our Misclassified Tissue metric is the sum of these two numbers.
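In the example output, Misclassified Tissue is therefore 4.1% + 0.8% = 4.9%. The sketch below shows one way such a figure could be computed. It assumes that voxels are identified by integer indices, that intensities are available in a dictionary, and that the percentages are normalised by the full target-mask volume. The threshold value and all names are hypothetical.

```python
def misclassified_tissue(test_mask, target_mask, intensity, threshold):
    """Return (wrongly included %, wrongly excluded %, their sum) for suprathreshold voxels."""
    high = {voxel for voxel, value in intensity.items() if value > threshold}
    included = 100.0 * len((test_mask - target_mask) & high) / len(target_mask)
    excluded = 100.0 * len((target_mask - test_mask) & high) / len(target_mask)
    return included, excluded, included + excluded

# Toy data: 100 target voxels; the test mask misses voxels 0-4 and adds voxels 100-109.
target = set(range(100))
test = set(range(5, 110))

# Voxels 0-2 and 104-109 are dim (CSF-like); everything else is bright tissue.
intensity = {v: (10 if v < 3 or v >= 104 else 80) for v in range(110)}

included, excluded, total = misclassified_tissue(test, target, intensity, threshold=40)
print(included, excluded, total)
```

Only the bright voxels among the mismatches count here, so the dim voxels at either end of the toy volume do not contribute to the total.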

Similarity index for testmask and targetmask (all values, high intensity values) = 0.954, 0.973
This metric is another way to look at the volume mismatch. It is calculated using the Dice Similarity Metric, where M1 is the test mask and M2 is the target mask: [2 x count(intersection of M1 and M2)] / [count(M1) + count(M2)]. In other words, when overlaying the test and target masks, these indices describe how much of the two masks overlaps relative to their combined size; a value of 1 means the masks are identical. The first value is computed over voxels of all intensities, the second over high-intensity voxels only.
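The Dice formula is simple enough to check by hand. The sketch below implements it for masks stored as sets of voxels and also recomputes the all-intensity index from the example volume errors (5.5% wrongly included, 3.8% wrongly excluded), assuming both percentages are expressed relative to the target-mask volume. The helper names are hypothetical.

```python
def dice(test_mask, target_mask):
    """Dice similarity: 2 * |M1 and M2| / (|M1| + |M2|)."""
    total = len(test_mask) + len(target_mask)
    # Two empty masks are treated as identical by convention.
    return 2.0 * len(test_mask & target_mask) / total if total else 1.0

def dice_from_volume_errors(included_pct, excluded_pct):
    """Dice index implied by volume errors given as percentages of the target volume."""
    overlap = 100.0 - excluded_pct       # target volume that was captured
    test_volume = overlap + included_pct  # captured volume plus wrongly included volume
    return 2.0 * overlap / (test_volume + 100.0)

toy = dice(set(range(5, 110)), set(range(100)))
example = dice_from_volume_errors(5.5, 3.8)
print(round(example, 3))  # 0.954
```

The recomputed value matches the first similarity index in the example output, which suggests the index and the volume errors describe the same overlap from two angles.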

Images
[Example slice image, labeled "% of total error on slice 34 = 0.6"]
[Graph, titled "subject00_kab.img high-intensity tissue errors"]


The PostScript file also contains images of two slices taken from the raw, test-mask, and target-mask volumes. These slices represent the regions where the largest misclassification error occurred relative to the total error for the volume. The graph at the end of the PostScript file illustrates the distribution of percent error over the entire volume.


PERFORMANCE COMPARISON
Below you will find a summary of how BET, BSE, McStrip, and SPM performed on our test dataset. The parameters used for each algorithm are listed where available, and means are reported for 15 subjects (00-14) with the selected Leniency of Fit = 2.00.

Performance Metrics Reviewed:
Correct Boundary: the percentage of the target mask boundary that is matched by the test mask boundary within the selected Leniency of Fit.
Pertinent Boundary: the Correct Boundary plus those segments where differences between the test mask and target mask boundaries occur in low-intensity voxels (below the grey matter-CSF threshold).
Misclassified Tissue: total volume error involving voxels with high intensity values; calculated as the sum of % wrongly included and % wrongly excluded.
Similarity Index: the Dice overlap between the test mask and the target mask; a value of 1 means the masks are identical.



McStrip
Parameters:
Warp Mask: Third-order polynomial
Dilation Kernel: 7 x 7 x 7 voxels
Grey Threshold: 15-35%
Smoothing Kernel: FWHM=3mm

BSE Parameters within McStrip:
Anisotropic Smoothing Kernel: 5, 10, 15
Iterations: 3
Edge Detection Sigma: 0.60, 0.64, 0.70, 0.80, 0.90

Table 1. McStrip: means for 15 subjects (00-14), Leniency of Fit = 2.00.

Performance Metrics                         McStrip vs. UMN Mask    McStrip vs. UCLA Mask
Correct Boundary                            88.9%                   71.5%
Pertinent Boundary                          89.3%                   74.5%
Misclassified Tissue                        1.8%                    2.1%
Similarity Index (High Intensity Voxels)    0.991                   0.988
Similarity Index (All Intensities)          0.980                   0.966


BSE
Parameters for all subjects:
Anisotropic Smoothing Kernel: 0
Iterations: 0
Edge Detection Sigma: 0.90

Table 2. BSE: means for 15 subjects (00-14), Leniency of Fit = 2.00.

Performance Metrics                         BSE vs. UMN Mask    BSE vs. UCLA Mask
Correct Boundary                            48.5%               24.5%
Pertinent Boundary                          51.4%               28.6%
Misclassified Tissue                        26.0%               25.3%
Similarity Index (High Intensity Voxels)    0.857               0.854
Similarity Index (All Intensities)          0.829               0.814


BET
Parameters for all subjects:
Fractional Intensity Threshold (FIT): 0.45
Threshold Gradient (TG): 0.0

Table 3. BET: means for 15 subjects (00-14), Leniency of Fit = 2.00.

Performance Metrics                         BET vs. UMN Mask    BET vs. UCLA Mask
Correct Boundary                            72.6%               58.5%
Pertinent Boundary                          73.6%               61.8%
Misclassified Tissue                        16.0%               14.9%
Similarity Index (High Intensity Voxels)    0.924               0.926
Similarity Index (All Intensities)          0.906               0.902


SPM

Table 4. SPM: means for 15 subjects (00-14), Leniency of Fit = 2.00.

Performance Metrics                         SPM vs. UMN Mask    SPM vs. UCLA Mask
Correct Boundary                            75.0%               54.6%
Pertinent Boundary                          76.5%               57.6%
Misclassified Tissue                        3.8%                4.1%
Similarity Index (High Intensity Voxels)    0.979               0.977
Similarity Index (All Intensities)          0.954               0.934



© 2003 NeuroVia Lab
webmaster@neurovia.umn.edu
Last modified: 8/20/2003