Computational Statistics

A Method for Estimating Occupational Radiation Dose to Individuals, Using Weekly Dosimetry Data

Mitchell, T. J., Ostrouchov, G., Frome, E. L., and Kerr, G. D., Radiation Research, 147:195-207 (1997). PostScript file (456 KB) or PDF file (254 KB).

Statistical analyses of data from epidemiologic studies of workers exposed to radiation have been based on recorded annual radiation doses. It is usually assumed that the annual dose values are known exactly, although it is generally recognized that the data contain uncertainty due to measurement error and bias. We propose the use of a probability distribution to describe an individual's dose during a specific period of time, and we develop statistical methods for estimating this dose distribution. The methods take into account the "measurement error" produced by the dosimetry system and the bias introduced by policies that led to right censoring of small doses as zero. The method is applied to a sample of dose histories over the period 1945 to 1955, obtained from hard-copy dosimetry records at Oak Ridge National Laboratory (ORNL). The results of this evaluation raise serious questions about the validity of the historical personnel dosimetry data currently being used in low-dose studies of nuclear industry workers. In particular, it appears that there was a systematic underestimation of doses for ORNL workers, which could result in biased estimates of dose-response coefficients and their standard errors.
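The censoring idea can be illustrated with a minimal sketch, which is not the paper's estimator: fit a lognormal weekly-dose distribution by maximum likelihood when true doses below a detection limit were recorded as zero. The detection limit `DL`, the lognormal dose model, the grid-search optimizer, and all function names are illustrative assumptions; a real analysis would also model the dosimeter's measurement error, as the paper does.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def neg_log_lik(readings, dl, mu, sigma):
    """Negative log-likelihood for lognormal doses when readings below
    the detection limit dl were recorded as zero (censored)."""
    nll = 0.0
    for y in readings:
        if y <= 0.0:  # censored: the true dose fell below dl
            p = norm_cdf((math.log(dl) - mu) / sigma)
            nll -= math.log(max(p, 1e-300))
        else:         # observed: lognormal density at y
            z = (math.log(y) - mu) / sigma
            nll += math.log(y * sigma * math.sqrt(2.0 * math.pi)) + 0.5 * z * z
    return nll

def fit_censored_lognormal(readings, dl):
    """Crude grid search over (mu, sigma); a proper optimizer would be
    used in practice -- this only sketches the likelihood."""
    best = (float("inf"), 0.0, 1.0)
    for i in range(-75, 25):      # mu from -3.00 to 0.96, step 0.04
        for j in range(5, 60):    # sigma from 0.20 to 2.36, step 0.04
            mu, sigma = i / 25.0, j / 25.0
            v = neg_log_lik(readings, dl, mu, sigma)
            if v < best[0]:
                best = (v, mu, sigma)
    return best[1], best[2]

# Simulated weekly readings: true doses are lognormal(mu=-1, sigma=0.5),
# and anything under the detection limit is recorded as zero.
random.seed(1)
DL = 0.3
readings = []
for _ in range(300):
    dose = math.exp(random.gauss(-1.0, 0.5))
    readings.append(dose if dose >= DL else 0.0)

mu_hat, sigma_hat = fit_censored_lognormal(readings, DL)
```

Ignoring the zeros, or treating them as exact doses of zero, biases the fitted distribution downward; the censored-likelihood term uses only the fact that the true dose was below the detection limit.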


George Ostrouchov and Edward L. Frome, Computational Statistics & Data Analysis (1993). PDF file (221 KB).

Large data sets cross-classified according to multiple factors are available in epidemiology and other disciplines. Their analysis often calls for finding a small set of best hierarchical models to serve as a basis for further analysis. This selection can be based on some well-defined model optimality criterion. Fitting all possible models to find a best set is usually not feasible for as few as five factors (7581 possible models). We note that the set of hierarchical models and their relationships can be represented by a graph, and we develop an algorithm to generate it efficiently. We further develop a graph traversal algorithm that finds exactly a best subset of the models while fitting only a fraction of them, classifying as many models as possible on the basis of each fit. A data structure implementing the graph of model nodes keeps track of the information required by the model search algorithm.
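The size of the model space quoted above can be checked with a small sketch, under the standard identification (an assumption on our part, not taken from the abstract) of a hierarchical model with its generating class: the set of maximal interaction terms, which forms an antichain among the subsets of the factors. Counting antichains of subsets of an n-element set gives the Dedekind numbers 2, 3, 6, 20, 168, 7581 for n = 0..5, matching the 7581 models for five factors. The brute force below is only feasible up to four factors (for n = 5 there are 2^32 candidate families).

```python
from itertools import combinations

def count_antichains(n):
    """Count antichains among the subsets of an n-element factor set.

    Each subset is encoded as an n-bit mask; a family of subsets is an
    antichain when no member is contained in another.  Each antichain is
    the generating class of one hierarchical model.
    """
    subsets = list(range(2 ** n))
    count = 0
    for family_mask in range(2 ** len(subsets)):
        family = [s for i, s in enumerate(subsets) if (family_mask >> i) & 1]
        # (a & b) == a means a is a subset of b, and symmetrically for b
        if all((a & b) != a and (a & b) != b
               for a, b in combinations(family, 2)):
            count += 1
    return count

counts = [count_antichains(n) for n in range(5)]  # [2, 3, 6, 20, 168]
```

The rapid growth of these counts is exactly why the abstract's traversal algorithm, which fits only a fraction of the models, matters: even at five factors, exhaustive fitting of all 7581 models is burdensome, and the count roughly squares with each added factor.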

The computing facilities available for our work in computational statistics include those of the Mathematical Sciences Section (MSR), which houses approximately 50 networked high-performance workstations (Sun SPARCstation 5/10/20 and IBM RS/6000) as well as parallel computers (Intel, Sequent). Within MSR there is an Advanced Visualization Research Center with a number of high-performance visualization workstations and other high-resolution graphics equipment. Network access is also available to the supercomputers of the Center for Computational Sciences (CCS), housed in a nearby building. The primary CCS computers are two Intel Paragon machines, one with 512 processors and the other with 66 processors, and a Kendall Square KSR-2 with 64 processors. A high-speed link connects MSR with the University of Tennessee (UT) Computer Science Department and the Joint Institute for Computational Science to facilitate the use of computers at both sites. UT computers include a CM-5 with 32 processors and a MasPar MP-2 with 4,096 processors. A number of other smaller MPP systems are available, plus an extensive infrastructure of Sun and IBM workstations on a local area network with various compute, network, file, and print servers.