Instructions for Using R-project Software for Analyzing Industrial Hygiene Data

(Including Data with Non-Detects and Non-parametric Data)

1. Introduction: These are instructions for generating industrial hygiene metrics using R routines and MS Excel for Windows. MS Excel can be used to clean up data, group the data, create text files for analysis, and create tables and charts for final reports. Any word processing, spreadsheet, or database software could be used to create the text file. The Statistical Analysis of Non-Detects (sand) package contains R routines that read text files and generate an output file containing metrics. The metrics generated are those recommended by the AIHA[1] and are interpreted and used as described in that book.

2. You should already have gone to http://www.csm.ornl.gov/esh/statoed/ and followed the instructions for installing R and the “sand” package. Now you can use R to analyze data in a text file. The text file must have at least two columns: one contains a value, and the second contains a flag, 0 for a non-detect or 1 for a detect. Each column must have a one-word heading. The following describes one way to create the text file using MS Excel.
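As an alternative to a spreadsheet, the file layout described above can be sketched directly in R. This is a minimal illustration using base R only; the data values and the file name "example.txt" are hypothetical, and the file is written to a temporary folder rather than the working folder.

```r
# Hypothetical example values; 1 = detected, 0 = non-detect
dat <- data.frame(
  Value    = c(0.5, 1.2, 0.5, 3.4),
  Detected = c(0, 1, 0, 1)
)

# Write a tab-delimited text file with one-word column headings
path <- file.path(tempdir(), "example.txt")   # hypothetical file name
write.table(dat, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to confirm the layout
chk <- read.delim(path)
print(chk)
```

The read-back step is only a check that the headings and the 0/1 flags survived the export.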

3. This example uses data from Table IV.3 of the AIHA book, which is also the example data in the file aihand.txt included in the download from the “statoed” web site. The first column, “Monitoring Data (mg/m^{3})”, contains a mix of values and the text symbol “<”. The “Value” and “Detected 0=No 1=Yes” columns were created using a variety of Excel text editing and logic functions. For example, the Excel formula =IF(LEFT(A2,1)="<",VALUE(REPLACE(A2,1,1,0)),A2) removes the “<” symbol when it appears. While not important for a small file like this, these editing functions are very helpful when cleaning up large data sets.
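The same clean-up can be sketched in R for readers who prefer to skip the spreadsheet formulas. The entries below are hypothetical, not the Table IV.3 data, and the column names simply mirror the “Value” and “Detected” headings used in these instructions.

```r
# Hypothetical raw entries as they might appear in the monitoring column
raw <- c("<0.5", "1.2", "<0.5", "3.4")

is_nd    <- grepl("^\\s*<", raw)                     # TRUE where the entry starts with "<"
Value    <- as.numeric(sub("^\\s*<\\s*", "", raw))   # strip the "<" and convert to a number
Detected <- ifelse(is_nd, 0L, 1L)                    # 0 = non-detect, 1 = detect

cleaned <- data.frame(Value, Detected)
print(cleaned)
```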

4. Columns F and G contain the values and flags that you need to convert to a text file. Copy the two columns, open a new file, and select “paste special” and then “values”.

5. In this case Column B has a three-word heading, “Detected 0=No 1=Yes”, which has to be changed to a one-word heading such as “Detected” or “Flag”. The next step is to save the file in the “rmain” folder as a tab-delimited text file and then close it. Click through the screens warning of the loss of formatting, etc.
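A quick check of the layout before analysis can catch a multi-word heading or a stray flag value. The helper below, check_input, is a hypothetical convenience function written for this illustration and is not part of the “sand” package; it assumes the flag is in the second column.

```r
# Hypothetical helper: returns a character vector of problems (empty if none found)
check_input <- function(df) {
  problems <- character(0)
  if (ncol(df) < 2)
    problems <- c(problems, "need at least two columns")
  if (any(grepl("\\s", names(df))))
    problems <- c(problems, "headings must be one word")
  if (ncol(df) >= 2 && !all(df[[2]] %in% c(0, 1)))
    problems <- c(problems, "second column must contain only 0 or 1")
  problems
}

df <- data.frame(c(0.5, 1.2), c(0, 1))
names(df) <- c("Value", "Detected 0=No 1=Yes")   # heading as it appears in the spreadsheet
before <- check_input(df)                        # flags the multi-word heading

names(df)[2] <- "Detected"                       # rename to a one-word heading
after <- check_input(df)                         # no problems remain
print(before)
print(after)
```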

6. Open the R console by double-clicking on the icon in the “rmain” folder you created. Type in the command “aihand<-readss("aihand",L=5)”. The L=5 is the value of the OEL being used to interpret the data. Hit “Enter” and the file is read and analyzed. Once you have typed in commands, the up and down arrows cycle through the commands you have used. If you “Save Workspace Image” at the end of the R session, the commands will be saved. With a large data set, one will often group the data into subsets based on some variable (location, time, individual, etc.) and create several text files for analysis. Use the up arrow and edit the file name in the command line (e.g., Bldg1<-readss("Bldg1",L=5), Bldg2<-readss("Bldg2",L=5), etc.). When the analyses are complete, the prompt returns.
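Splitting a large data set into one text file per group can also be done in R rather than by hand. The sketch below uses base R only; the data, the “Location” column, and the output folder are hypothetical.

```r
# Hypothetical cleaned data with a grouping column
dat <- data.frame(
  Value    = c(0.5, 1.2, 0.8, 3.4, 2.1, 0.5),
  Detected = c(0, 1, 1, 1, 1, 0),
  Location = c("Bldg1", "Bldg1", "Bldg1", "Bldg2", "Bldg2", "Bldg2")
)

out_dir <- tempdir()   # in practice, the folder R is run from

# One tab-delimited file per location, keeping only the value and flag columns
groups <- split(dat[, c("Value", "Detected")], dat$Location)
for (g in names(groups)) {
  write.table(groups[[g]], file.path(out_dir, paste0(g, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

files <- file.path(out_dir, paste0(names(groups), ".txt"))
print(basename(files))
```

Each resulting file can then be analyzed with a separate “readss” command, as described in step 6.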

7. R creates a new comma-delimited text file, “aihandout.csv”, that contains the metrics. This can be opened with MS Excel, and the two columns can be copied and pasted into the spreadsheet you will be using for further analysis and report writing.

8. The “readss” command generates the following metrics. The industrial hygienist chooses those that help interpret the data. Means and confidence intervals are useful for decisions on exposure groups and for constructing job and exposure matrices. Upper tolerance limits and percent exceedance are useful for determining compliance and for other day-to-day risk management decisions. Parametric and non-parametric versions of each are included.

| Label | Metric | Glossary |
|---|---|---|
| mu | 0.925 | Maximum likelihood estimate (MLE) of the mean of the log-transformed data (log of GM) |
| se.mu | 0.099 | Estimate of the standard error of mu |
| sigma | 0.37 | MLE of the standard deviation of the log-transformed data (log of GSD) |
| se.sigma | 0.079 | Estimate of the standard error of sigma |
| GM | 2.522 | MLE of the geometric mean |
| GSD | 1.447 | MLE of the geometric standard deviation |
| EX | 2.7 | MLE of EX, the (arithmetic) mean |
| LCLa_95 | 2.26 | 95% Lower Confidence Limit (LCL) for EX |
| UCLa_95 | 3.226 | 95% Upper Confidence Limit (UCL) for EX |
| KMmean | 2.773 | Kaplan-Meier (KM) estimate of EX |
| KM.LCL | 2.29 | 95% LCL for KM EX |
| KM.UCL | 3.257 | 95% UCL for KM EX |
| KM.se | 0.269 | Standard error of KMmean |
| Xp.obs | 4.75 | Observed 95^{th} percentile of the data |
| Xp | 4.633 | MLE of the 95^{th} percentile |
| Xp.LCL | 3.521 | MLE of the 95% LCL for Xp |
| Xp.UCL | 6.096 | MLE of the 95% Upper Tolerance Limit (UTL) of Xp |
| NpUTL | NA | Nonparametric estimate of the 95% UTL of Xp |
| Maximum | 5.5 | Largest value in the data set |
| NonDet% | 20 | The percent of Xs that are left censored |
| n | 15 | The number of observations in the data set |
| Rsq | 0.969 | Square of the correlation between the data and the standard lognormal |
| m | 12 | The number of detected Xs |
| f | 3.208 | MLE of the percent exceeding the specified limit L |
| f.LCL | 0.396 | MLE of the 95% LCL for f |
| f.UCL | 14.767 | MLE of the 95% UCL for f |
| fnp | 6.667 | Nonparametric estimate of f for limit L |
| fnp.LCL | 0.341 | Nonparametric estimate of the 95% LCL for f |
| FnUCL_95 | 27.94 | Nonparametric estimate of the 95% UCL for f |
| m2logL | 41.3044 | -2 times the log-likelihood function |
| L | 5 | The specified limit for the percent exceeding; e.g., the OEL |
| p | 0.95 | Percentile for the UTL p-gamma |
| gam | 0.95 | One-sided confidence level gamma. Default is 0.95 |
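Several of the parametric point estimates in the table are linked by the standard lognormal identities, which can be checked in a few lines of R. This sketch starts from the rounded mu and sigma shown in the table, so the results agree with the tabulated values only to about two decimal places. It does not reproduce the confidence limits, and it is not a description of how the “sand” routines compute their output.

```r
mu    <- 0.925   # from the table (rounded)
sigma <- 0.37    # from the table (rounded)
L     <- 5       # specified limit (OEL)
p     <- 0.95    # percentile

GM  <- exp(mu)                                    # geometric mean
GSD <- exp(sigma)                                 # geometric standard deviation
EX  <- exp(mu + sigma^2 / 2)                      # arithmetic mean of a lognormal
Xp  <- exp(mu + qnorm(p) * sigma)                 # 95th percentile
f   <- 100 * (1 - pnorm((log(L) - mu) / sigma))   # percent exceeding L

n <- 15; m <- 12                                  # observations and detects from the table
NonDet <- 100 * (n - m) / n                       # percent non-detects

print(round(c(GM = GM, GSD = GSD, EX = EX, Xp = Xp, f = f, NonDet = NonDet), 3))
```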

9. R will generate a log probability plot (also called a Q-Q plot) that provides a visual check of whether the data fit the lognormal model (see the AIHA book). Creating a log probability plot requires two commands. The first command, pnd<-plend(aihand), creates a data frame. The second, qq.lnorm(pnd), generates the probability plot. This plot displays only detected values and displays replicates as a single data point, which aids the visual check when the data set is large. Clicking the camera button copies the image so that it can be pasted into Excel or another document.
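The idea behind the plot can be sketched in base R for a data set with no non-detects. This is a simplified illustration with hypothetical values; it makes no adjustment for censoring, so it is not a substitute for the plend and qq.lnorm commands above.

```r
# Hypothetical detected values only (no non-detects), for illustration
x <- c(1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 4.0, 5.5)

n     <- length(x)
probs <- ppoints(n)     # plotting positions
z     <- qnorm(probs)   # standard normal quantiles
y     <- sort(log(x))   # ordered log-transformed data

rsq <- cor(z, y)^2      # values near 1 suggest the lognormal model fits well
print(round(rsq, 3))

# Draw the plot only in an interactive session
if (interactive()) {
  plot(z, y, xlab = "Standard normal quantile", ylab = "log(concentration)")
  abline(lm(y ~ z))     # least-squares reference line
}
```

Points that fall close to the reference line support the lognormal assumption; systematic curvature suggests it may not hold.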

10. The metrics calculated by the “readss” command can also be calculated separately. The commands for these are described in the help menu, which is shown when you type “help(sand)”. The “readss” routine requires at least 3 detected results to run. One function that can be used when all results are non-detects is “nptl(n, p = 0.95, gam = 0.95)”, which gives the order (rank) of the value in a data set of n values that corresponds to the non-parametric upper tolerance limit for the specified percentile p and confidence level gam.
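The order-statistic idea behind a non-parametric upper tolerance limit can be illustrated with the binomial distribution. The function below, np_utl_order, is a hypothetical sketch and not the package's nptl routine. It returns the smallest rank k for which the k-th smallest of n values is at or above the p-th percentile with confidence at least gam, and returning NA when no rank qualifies is a choice made for this sketch.

```r
# Hypothetical helper (not the package's nptl)
np_utl_order <- function(n, p = 0.95, gam = 0.95) {
  k    <- seq_len(n)
  conf <- pbinom(k - 1, size = n, prob = p)   # P(k-th order statistic >= p-th percentile)
  ok   <- which(conf >= gam)
  if (length(ok) == 0) NA_integer_ else ok[1]  # NA when n is too small
}

print(np_utl_order(15))   # too few observations for a 95%/95% limit
print(np_utl_order(59))   # the largest value qualifies
```

For n = 15 the sketch returns NA, which is consistent with the NA shown for NpUTL in the table of metrics.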

[1] Ignacio, J.S., and W.H. Bullock: A Strategy for Assessing and Managing Occupational Exposures, 3^{rd} ed.