This paper presents a comparative evaluation of methods for automated voxel-based spatial mapping in diffusion tensor imaging studies. Such methods are an essential step in computational pipelines and provide anatomically comparable measurements across a population in atlas-based studies. To better understand their strengths and weaknesses, we tested eight methods for voxel-based spatial mapping in two types of diffusion tensor templates. The methods were evaluated with respect to scan-rescan reliability and an application to normal aging. The methods included voxel-based analysis with and without smoothing, two types of region-based analysis, and combinations thereof with skeletonization. The templates included a study-specific template created with DTI-TK and the IIT template serving as a standard template. To control for other factors in the pipeline, the experiments used a common dataset, acquired at 1.5T with a single-shell high angular resolution diffusion MR imaging protocol, and tensor-based spatial normalization with DTI-TK. Scan-rescan reliability was assessed using the coefficient of variation (CV) and intraclass correlation coefficient (ICC) in eight subjects with three scans each. Sensitivity to normal aging was assessed in a population of 80 subjects aged 25 to 65 years, and methods were compared with respect to the anatomical agreement of significant findings and the R² of the associated models of fractional anisotropy. The results show that reliability depended greatly on the method used for spatial mapping. The largest differences in reliability were found when adding smoothing and when comparing voxel-based and region-based analyses. Skeletonization and template type were found to have a small or negligible effect on reliability. The aging results showed agreement among the methods in nine brain areas, with some methods showing more sensitivity than others.
Skeletonization and smoothing were not major factors affecting sensitivity to aging, but the standard template showed higher R² in several conditions. A structural comparison of the templates showed that large deformations between them may be related to observed differences in patterns of significant voxels. Most areas showed significantly higher R² with voxel-based analysis, particularly when clusters were smaller than the available regions of interest. These results may help in interpreting findings from existing white matter imaging studies and provide a resource for planning future studies that maximize reliability and sensitivity with regard to the scientific goals at hand.
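
As an illustration of the two reliability measures named above, the following is a minimal sketch of how CV and ICC could be computed for a single measurement (for example, mean fractional anisotropy in one region) from a subjects-by-scans array. It rests on several assumptions not stated in the text: the CV is taken as the within-subject standard deviation divided by the within-subject mean, averaged over subjects; the ICC is the one-way random-effects form, ICC(1,1); and the function names and synthetic values are hypothetical and do not reproduce the study data or pipeline.

```python
import numpy as np


def within_subject_cv(x):
    """Mean over subjects of (SD across scans / mean across scans).

    x: array of shape (n_subjects, n_scans). This definition of CV is an
    assumption made for illustration.
    """
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=1, ddof=1)   # sample SD across repeated scans
    mean = x.mean(axis=1)        # mean across repeated scans
    return float(np.mean(sd / mean))


def icc_oneway(x):
    """One-way random-effects ICC(1,1) from a (n_subjects, n_scans) array.

    The choice of ICC(1,1) is an assumption made for illustration.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))


# Synthetic example only: 8 subjects with 3 scans each, matching the
# scan-rescan design; the values themselves are made up.
rng = np.random.default_rng(0)
true_fa = rng.uniform(0.35, 0.55, size=8)
fa = true_fa[:, None] + rng.normal(0.0, 0.01, size=(8, 3))

print("CV :", within_subject_cv(fa))
print("ICC:", icc_oneway(fa))
```

Under these definitions, a lower CV and an ICC closer to 1 both indicate better scan-rescan reliability. In a voxel-based or region-based comparison, the same two functions would be applied once per voxel or region.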