SPREG is a computer program for performing regression analysis of secondary phenotype data in case-control association studies. Secondary phenotypes are quantitative or qualitative traits other than the case-control status. Because the case-control sample is not a random sample of the general population, standard statistical analysis of secondary phenotype data can yield very misleading results. SPREG implements valid and efficient statistical methods, as described in Lin DY, Zeng D. 2009. Proper analysis of secondary phenotype data in case-control association studies. Genetic Epidemiology, 33:256-265.
The software is written in C and built under 64-bit x86-based Linux. The current release performs linear regression analysis of quantitative traits and logistic regression of binary traits under the additive mode of inheritance with or without environmental factors.
spreg infile outfile rate nenv
All four program parameters are required and must be entered in the same order as specified above.
Parameter  Description 

infile  Name of the input file 
outfile  Name of the output file 
rate  Disease rate 
nenv  Number of environmental variables; 0 or a positive integer 
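The calling convention above can be wrapped in a small script. The following Python sketch is illustrative only: the helper names `build_spreg_command` and `run_spreg` are hypothetical, the range check on the disease rate (strictly between 0 and 1) is an assumption rather than documented behavior, and the `spreg` executable is assumed to be on the PATH.

```python
import subprocess


def build_spreg_command(infile, outfile, rate, nenv):
    """Return the argument list for one spreg run (hypothetical helper)."""
    # Assumption: a disease rate is a proportion strictly between 0 and 1.
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be between 0 and 1")
    # nenv is documented as 0 or a positive integer.
    if not isinstance(nenv, int) or nenv < 0:
        raise ValueError("nenv must be 0 or a positive integer")
    # All four parameters are required, in this fixed order.
    return ["spreg", infile, outfile, str(rate), str(nenv)]


def run_spreg(infile, outfile, rate, nenv):
    """Run spreg; subprocess raises CalledProcessError on a nonzero exit status."""
    subprocess.run(build_spreg_command(infile, outfile, rate, nenv), check=True)
```

Checking the arguments before launching the program catches a swapped or missing parameter early, which matters because the parameters are purely positional.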
The program requires one input file. The input file contains text data in a matrix format. Suppose there are n study subjects, k genes, and d environmental variables. The input file is a (k + d + 2) by n matrix, with columns representing study subjects, and rows conforming to this format:
If the disease is rare, enter any number less than 0.01 for the disease rate.
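The matrix dimensions described above can be checked before running the program. This is a minimal sketch, assuming the values on each row are separated by whitespace (the delimiter is not stated here); the function name `check_input_shape` is hypothetical, and the sketch verifies only the shape, not the meaning or order of the rows.

```python
def check_input_shape(lines, n_genes, n_env):
    """Return the number of subjects if there are k + d + 2 equal-length rows."""
    # Assumption: values on a row are separated by whitespace.
    rows = [line.split() for line in lines if line.strip()]
    expected_rows = n_genes + n_env + 2
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {len(rows)}")
    # Columns represent study subjects, so every row must have n entries.
    n_subjects = len(rows[0])
    if any(len(row) != n_subjects for row in rows):
        raise ValueError("rows differ in length")
    return n_subjects
```

An open file object can be passed as `lines`, for example `check_input_shape(open("demo.dat"), 10, 2)` for a file with 10 genes and 2 environmental variables.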
Computational results are written to the output file specified by the user. For each gene, the output shows the maximum likelihood estimate of the genetic effect (i.e., slope parameter in the linear model or log odds ratio in the logistic model), its standard error, the standard-normal test statistic and the (two-sided) p-value.
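The relationship between these output quantities can be sketched in a few lines of Python. The sketch assumes the test statistic is the estimate divided by its standard error (the usual Wald form; the exact formula used by SPREG is not spelled out here), and the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, p) for a two-sided standard-normal test."""
    z = estimate / std_error
    # Two-sided tail area of the standard normal:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For instance, `wald_test(0.02384, 0.006004)` gives a statistic of about 3.97.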
The example files (which can be downloaded in the DOWNLOAD section below) include an input file demo.dat and an output file demo.out. The input file demo.dat contains the case-control status of 3000 individuals, a continuous secondary phenotype, two environmental variables, and genotypes of 10 genes. The disease rate is 0.08.
Enter the command
spreg demo.dat demo.out 0.08 2
to obtain the output file as given in demo.out. Its contents are
Gene_number Estimate Std_Error Z_stat p_value
1 5.203e-03 8.509e-03 6.114e-01 5.409e-01
2 1.275e-03 5.514e-03 2.312e-01 8.172e-01
3 3.067e-03 5.713e-03 5.368e-01 5.914e-01
4 8.081e-04 5.651e-03 1.430e-01 8.863e-01
5 1.198e-04 8.013e-03 1.495e-02 9.881e-01
6 2.148e-03 6.575e-03 3.267e-01 7.439e-01
7 2.517e-03 7.245e-03 3.474e-01 7.283e-01
8 2.308e-03 5.975e-03 3.864e-01 6.992e-01
9 2.384e-02 6.004e-03 3.970e+00 7.178e-05
10 1.665e-02 6.257e-03 2.661e+00 7.781e-03
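An output file in this layout can be read back for further processing. The Python sketch below assumes a whitespace-separated table whose first row holds the column names and whose first column is the gene number; the function names and the 0.05 threshold are illustrative choices, not part of SPREG.

```python
def parse_spreg_output(lines):
    """Parse output rows into dictionaries keyed by the header names."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    results = []
    for fields in body:
        record = dict(zip(header, fields))
        parsed = {"Gene_number": int(record["Gene_number"])}
        # The remaining columns are floating-point numbers in e-notation.
        for name in header[1:]:
            parsed[name] = float(record[name])
        results.append(parsed)
    return results


def significant_genes(results, alpha=0.05):
    """Return gene numbers whose p-value is below alpha (illustrative cutoff)."""
    return [r["Gene_number"] for r in results if r["p_value"] < alpha]
```

When many genes are tested, a stricter threshold that accounts for multiple testing would normally replace the illustrative default.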
Lin DY, Zeng D. 2009. Proper analysis of secondary phenotype data in case-control association studies. Genetic Epidemiology, 33:256-265.
Version  Date  Description 

1.0  April 22, 2008  First version released 
2.0  May 18, 2011 
