=================================================================
#  HAPLO STATS News File
#
#  This file documents software changes up to version 1.3.8
#
#  format is as follows
#  ---------------------------
#  [change/add]: function.name title for issue
#  explanation of issue, status, and recommendations

===================================================
### changes made between releases 1.3.8 and 1.4.1
===================================================

--------------------
change: seqhap --use precision threshold for permutation p-values

Adapt the permutation rules used in haplo.score's sim.control parameter
to ensure accuracy and precision thresholds for permutation p-values.
The permutations are carried out in seqhap.c, so the parameters
p.threshold, min.sim, and max.sim are passed to the C code, which
permutes the response until the precision criteria are met.  The n.sim
parameter is no longer used; sim.control=score.sim.control() now
controls the permutations.

-------------------
update: user manual

The user manual has been updated from version 1.3.1 to reflect all the
updates since then, and will be placed on Dan Schaid's software page,
in addition to its current location within the package.

-------------------
update: help files for example datasets

The help files contained an \item keyword with no text, which didn't
pass R CMD check on R 2.8.1.  Now they pass.

===================================================
### changes made between releases 1.3.6 and 1.3.8
===================================================

---------------------------------
change: plot.seqhap --handle small p-values

Handle very small p-values better by having a minimum allowable
asymptotic p-value of .Machine$double.eps, and a minimum permutation
p-value of 1/(n.sim+1).  It will also handle a ylim value if passed.
Add more useful warning messages for when p-values are fixed for
plotting.
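The p-value floors described for plot.seqhap can be sketched as below.
This is a minimal illustration in Python, not the package's R code: the
function name floor_pvalues and its arguments are hypothetical, and the
use of a -log10 scale for plotting is an assumption that the entry does
not state.

```python
import math
import sys

def floor_pvalues(pvals, n_sim=None):
    """Return (floored p-values, number of values that were raised).

    pvals : p-values to plot
    n_sim : number of permutations; None means the p-values are asymptotic
    (hypothetical helper, not part of haplo.stats)
    """
    # Asymptotic floor: machine epsilon (R's .Machine$double.eps).
    # Permutation floor: 1/(n.sim + 1), as stated in the entry above.
    floor = sys.float_info.epsilon if n_sim is None else 1.0 / (n_sim + 1)
    fixed = [max(p, floor) for p in pvals]
    n_raised = sum(1 for p in pvals if p < floor)
    return fixed, n_raised

# Hypothetical usage: a p-value of zero would be infinite on a -log10 scale,
# so it is raised to the floor first.
fixed, n_raised = floor_pvalues([0.0, 0.2], n_sim=999)
heights = [-math.log10(p) for p in fixed]
```

A caller could use the count of raised values to decide whether to warn
that some p-values were fixed for plotting.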
----------------------------------
change: haplo.score -add eps.svd

In some association tests from haplo.score, we have observed extremely
significant values for the global association test statistic.  The
degrees of freedom for the global test are the rank of the score
vector's variance matrix.  We found the source of the problem was
having too low a cutoff (epsilon) for svd values for determining the
rank of the variance matrix.  We increased the default for the epsilon
from 1e-6 to 1e-5 and allow it to be changed by the user as the eps.svd
parameter in any function that uses haplo.score (haplo.score.slide,
haplo.cc).

---------------------------------------
change: haplo.cc parameters

We remove haplo.min.count as a top-level parameter; it can only be used
in the control() function, just as in haplo.glm.  Note that
haplo.freq.min can also be used.  The eps.svd parameter is also added,
as noted for haplo.score.

===================================================
### changes made between releases 1.3.0 and 1.3.6
===================================================

----------------------------------
add: haplo.power.qt and haplo.power.cc:

Power and sample size calculations for haplotype association studies.
Calculations are performed given a set of haplotypes, their freqs, and
their beta coefficients, which can be converted by log(OR) for
case-control (cc) or calculated for quantitative trait (qt) by R2
variance explained by gene association.  For qt, use
find.haplo.beta.qt to get these beta coefficients.

-------------------------
added: dataset hapPower.demo

An example data set hapPower.demo is included in the package for
demonstrating the haplo.power.qt/cc functions in example() and in the
manual.

------------------------------
change: haplo.em

In past versions, a change was made to pre-calculate how much memory
would be needed for all haplotype pairs, and to issue a warning if that
memory could not be allocated.
It stopped calculations that could have been completed by the
progressive insertion and trimming steps, because rare haplotypes are
trimmed off and memory use rarely reaches the maximum.  So the warning
has been removed.

------------------------------
change: haplo.em.control: min.posterior

The old default for min.posterior was set at 1e-7.  In rare cases of
some datasets that had low LD and 10 or more markers, the trimming
steps actually trimmed away all haplotypes for a given person and the
person was removed.  We have changed min.posterior to 1e-9 and added
warnings and a check for this occurring.  Note, we have only observed
this in simulated data on very rare occasions.

-------------------------------
change: haplo.glm remove allele.lev and miss.val parameters

We used to require the use of allele.lev as a parameter for haplo.glm,
and allow miss.val to specify codes for missing alleles in the genotype
matrix.  However, we require using setupGeno to prepare the genotype
matrix to be used in haplo.glm, after it is added to the data.frame to
be passed to haplo.glm.  miss.val is completely taken care of there,
and allele.lev is assigned as an attribute of geno.  We have re-worked
the formula and na.geno.keep to recognize these values when it finds
geno in the formula; therefore, these parameters are not required in
haplo.glm.

----------------------------
change: na.geno.keep

We used to keep all subjects who were missing any number of alleles.
However, subjects who are missing all alleles both slow the
calculations down and add no information to the analysis.  This
function still removes subjects missing y or covariate values, and now
removes subjects missing all their alleles.  After the removal, the
attributes of the genotype matrix are re-calculated and retained for
its use in haplo.model.frame.

-------------------------
changes: haplo.model.frame

Get allele.lev from geno in m[[]], not as a passed parameter from
haplo.glm.
---------------------------
change: haplo.glm.control

Enforce the default setting for haplo.min.count and haplo.freq.min in
the function declaration.  In the declaration they were NA, but a
default min.count of 5 was enforced.  We have changed the default of
haplo.freq.min, .01, to be enforced, and the declaration now reflects
the enforced default.

-------------------------
changes: Ginv.q and Ginv.R

Nothing has changed for R.  Splus version 8.0.1 has a problem in its
use of the svd fortran function, as called by svd.Matrix.  We contacted
Insightful and they fixed it for version 8.0.4.  We include the
svd.Matrix function from version 7 and 8.0.4 in the Ginv.q file, but
only load it if the Splus version matches 8.0.1.

-------------------------------
change: louis.info.c

Prior efforts to convert all long integer values to int were not
completed for this function.  The result was that the package didn't
work on linux 64bit machines.  Now it doesn't use long, and it should
work on most platforms.

--------------------------
change: louis.info.q

When the variance of a quantitative trait is so high that the
information matrix becomes ill-conditioned, the Ginv determines the
information matrix to be singular, and the standard errors are
incorrect.  Change the epsilon parameter for the generalized inverse to
about 1e-8, versus the old default in Ginv of 1e-6.

=========================================================
#### changes made between release 1.2.5 and 1.3.0   #####
=========================================================

--------------------------
seqhap: sequential haplotype selection in a set of loci

For choosing loci for haplotype associations, as described in Yu and
Schaid, 2007.  The method performs three tests for association of a
binary trait over a set of bi-allelic loci.  When evaluating each
locus, loci close to it are added in a sequential manner based on the
Mantel-Haenszel test.
--------------------------
geno1to2: convert geno from 1- to 2-column

Convert a 1-column minor-allele-count matrix to two-column allele
codes.

---------------------
plot.haplo.score.slide: handle near-zero p-values

For asymptotic p-values near zero, set to epsilon.  For simulated
p-values, set to 0.5 divided by the number of simulations performed.

-----------------------
haplo.design: create design matrix for haplotypes

In response to many requests made for getting columns for haplotype
effects to use in glm, survival, or other regression models, we created
a function to set up this kind of design matrix.  There are issues
surrounding the use of these effect columns, as outlined in the user
manual.

----------------------
Ginv: svd problems continue

The Matrix library svd function has changed for Splus 8.0.1.
Therefore, revert back to the default svd function in getting the
generalized inverse.

=========================================================
#### changes made between release 1.2.0 and 1.2.5   #####
=========================================================

-----------------------------------
haplo.glm: Iterative steps efficiency

In consecutive IRWLS steps in haplo.glm, the starting values for
re-fitting the glm model were not updated to be the most recently
updated values.  Updating them now saves about 20% of run time in
haplo.glm.

-----------------------------------
haplo.score: haplo.effect allow additive, dominant, recessive

A new option to make haplo.score more flexible.  Previously the scores
for haplotypes were computed assuming an additive effect for all
haplotypes.  A new parameter, haplo.effect, is in place to allow either
additive, dominant, or recessive effects.

-----------------------------------
haplo.score: min.count parameter

The cut-off for selecting haplotypes to score is either by a minimum
frequency, skip.haplo, or a new option, min.count.
The min.count is based on the same idea as that used in haplo.glm,
where the minimum expected count of haplotypes in the population is
enough such that accurate estimates of parameters and standard errors
are computed.  The min.count became needed when haplo.effect was added
because under the dominant or recessive models, the number of persons
actually having a haplotype effect could be fewer than the expected
count over the population (i.e., haplotype pair h1/h2 is coded as 0 for
both under recessive model, and h1/h1 is coded as 1 under dominant).

---------------------------------------
haplo.em: improved reliability of C routines

Previously problems had been observed with running haplo.em and
haplo.glm on linux 64-bit machines, because of issues with the storage
of integers in R.  In R, all integers are stored as int, which are
stored differently on 64-bit and 32-bit machines.  We get around this
problem by using all int types for integers, which are only used for
indices of other data structures.  We find out the max value for
integers on the system, and if the indices are going to exceed the max,
issue a warning from C.

--------------------------------------
haplo.glm and Ginv: improvement of standard error calculations

Under some extreme circumstances, such as haplo.glm modeling haplotypes
with rare frequencies, or a high amount of variance in the response,
the standard error estimates were unreliable.  The issue came out in
the Ginv function in haplo.stats, which needed a smaller epsilon to
decide on the rank of the information matrix.

=========================================================
#### changes made between release 1.1.1 and 1.2.0   #####
=========================================================

---------------------
haplo.em: fixed memory leak

Versions up to 1.1.1 had either one or two memory leaks in haplo.em.
They are fixed.
---------------------
All .C functions: Long Integers warning for 64-bit machine

Due to problems with long integers between 32-bit and 64-bit machines
using R, all integers used in C functions will use unsigned integers.

---------------------
haplo.glm: haplo.effect="recessive"

The estimation stops if no columns are left in the model.matrix for
homozygotes with the haplotype.  For haplotypes that do not have any
subjects with a posterior probability of being homozygous for the
haplotype, those subjects are grouped into the baseline effect.
Guidelines for rare haplotypes are explained further in the manual.

---------------------
haplo.glm: na.action

When not specified, na.action got set to something besides the intended
'na.geno.keep'.  Now the default setting works.

---------------------
haplo.cc: New Function for Case-Control Analysis

New function added to combine methods of haplo.score, haplo.group and
haplo.glm into one set of output for Case-Control data.  Choose
haplotypes for analysis by haplo.min.count only, not a frequency
cut-off.

------------------
haplo.score: skip.haplo new default

Default for skip.haplo is now 5/(nrow(geno)*2)

------------------
haplo.glm: haplo.freq.min and haplo.min.count control parameters

Haplotypes used in the glm are still chosen by haplo.freq.min, but the
default is based on a minimum expected count of 5 in the sample.  The
better choice for selecting haplotypes is haplo.min.count.  The issue
is documented in the manual and help files.

-----------------
haplo.score: max-stat simulated p-value

A better description of this is included in the manual and help file.

------------------
haplo.em.control and haplo.em: defaults for control parameters changed

The defaults for these control parameters changed:
  max.iter = 5000, changed from 500
  insert.batch.size = 6, changed from 4

------------------
locus

The genetics package for R has a function named locus which does not
agree with locus from haplo.stats.
We do not plan to change it, so be aware of the possible clash if you
use these two packages.

------------------
haplo.scan: new function

For analyzing a genome region with case-control data.  Search for a
trait-locus by sliding a fixed-width window over each marker locus and
scanning all possible haplotype lengths within the window.

=================================================================
### changes made prior to release 1.1.1                     #####
=================================================================

----------------------
haplo.glm: Warnings for non-integer weights

glm.fit for R does not allow non-integer weights for subjects, whereas
S-PLUS does.  Use a glm.fit.nowarn function for R to ignore warnings.

---------------------
haplo.glm: Character Alleles

Local settings for strings as factors cause confusion for keeping
original character allele values.  To ensure consistency of allele
codes, use setupGeno() and then in the haplo.glm call, use allele.lev
as documented in the manual and help files.

---------------------
haplo.score.slide: add to package

Run haplo.score on all contiguous subsets of size n.slide from the loci
in a genotype matrix (geno).

---------------------
haplo.score: simulations controlled for precision

Employ simulation precision criteria for p-values, adopted from Besag
and Clifford [1991].  Control simulations with score.sim.control.
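
The precision-controlled simulations named in the entry above can be
sketched roughly as below.  This is an illustration in Python, not the
package's R or C code.  The function name sequential_perm_pvalue is
hypothetical, and the stopping rule is an assumption: after min_sim
simulations, stop once the standard error of the estimated p-value
falls below p_threshold times the p-value, or when max_sim is reached.
Consult the score.sim.control help file for the rule actually used.

```python
import math
import random

def sequential_perm_pvalue(observed, simulate_stat, p_threshold=0.25,
                           min_sim=1000, max_sim=20000):
    """Estimate a simulation p-value, stopping when it is precise enough.

    observed      : observed test statistic
    simulate_stat : zero-argument callable returning one simulated statistic
    Returns (p-value estimate, number of simulations performed).
    """
    exceed = 0  # simulated statistics at least as large as the observed one
    n = 0
    while n < max_sim:
        n += 1
        if simulate_stat() >= observed:
            exceed += 1
        if n >= min_sim and exceed > 0:
            p = exceed / n
            se = math.sqrt(p * (1.0 - p) / n)  # binomial standard error
            if se < p_threshold * p:           # assumed precision criterion
                break
    return exceed / n, n

# Hypothetical usage with uniform "statistics" standing in for permuted scores.
rng = random.Random(1)
p_est, n_used = sequential_perm_pvalue(0.5, rng.random,
                                       min_sim=100, max_sim=500)
```

Under this rule, large p-values stop near min_sim while small p-values
keep simulating, so effort goes where precision is needed.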