Package: randomSurvivalForest
Version: 3.2.3
BUILD: release_3_2_1_branch_bld20080408

---------------------------------------------------------------------------------
CHANGES TO RELEASE 3.2.3

RELEASE 3.2.3 represents a critical update of the product resolving
unexpected program termination under certain conditions. Key changes are
as follows:

o In the case of missing data, it is possible for imputation to fail
  catastrophically in PREDICT or INTERACTION mode. This can occur when the
  GROW data set has no missing data, but the non-GROW data set does have
  missingness. Thanks to Andy J. Minn for finding this feature.

o In the case of a single predictor, PREDICT mode can fail to dimension the
  data properly causing unpredictable results. Thanks to Eric A. Macklin
  for finding this feature.

---------------------------------------------------------------------------------
Release 3.2.2 represents a critical update of the product resolving
unexpected program termination under certain conditions. Key changes are
as follows:

o In the case of missing data, it is possible for naive imputation to fail
  catastrophically in PREDICT or INTERACTION mode. This can occur when the
  dimension of the mode specific data set is less than the GROW data set.
  Thanks to Andy J. Minn for finding this feature.

---------------------------------------------------------------------------------
Release 3.2.1 represents a minor upgrade of the product, and will not
affect most users of the prior version of the product. Key changes are as
follows:

o An additional option in INTERACTION mode can now be specified. The option
  'rough' can significantly decrease the processing time necessary to
  determine variable importance for some large data sets. This is
  accomplished by replacing the cumulative hazard function that is based on
  the Nelson-Aalen estimate with the mean death time in a node as the
  measure of mortality. See the documentation for more details.

o A minor but longstanding (since Release 1.0.0) issue in booking potential
  split points has been discovered. The problem manifests itself by
  slightly favoring splits on continuous covariates under certain
  conditions such as low values of 'mtry'. However, it is not expected that
  users will notice significant changes. The issue has been resolved.

---------------------------------------------------------------------------------
Release 3.2.0 represents a significant upgrade in the functionality of the
product. Key changes are as follows:

o A second method of perturbing the data set in order to calculate variable
  importance (VIMP) has been implemented. In addition to permuting the
  values for a single variable, a random split approach has been taken in
  which a data point is randomly assigned to the left or right daughter
  node when a split occurs on the specified variable.

o The joint VIMP among multiple variables of a (potentially proper) subset
  of the GROW data can now be calculated using the new function
  interaction.rsf(). This represents a third mode of operation
  (INTERACTION) for the application, and follows rsf.default (GROW) and
  predict.rsf (PREDICT). See the documentation for details.

o An additional option in GROW mode can now be specified. The option
  'varUsed' allows users to quantify which variables have been split upon
  within a single tree or over the entire forest. See the documentation for
  more details.

o The ability to multiply impute data has been implemented. This involves
  imputing data while growing a forest and using the results to grow a new
  forest in order to better impute the data.

o In GROW mode, the application now outputs both the in-bag and OOB summary
  imputed values.

o An additional split rule 'randomsplit' has been implemented. See the
  documentation for more details.

o The split rule 'logrankscore' is now calculated correctly.

o The split rule 'logrankapprox' has been removed and replaced by the new
  split rule 'logrankrandom'.
  See the documentation for more details.

---------------------------------------------------------------------------------
Release 3.0.1 represents a minor upgrade of the product, and will not
affect most users of the prior version of the product. Key changes are as
follows:

o A slight adjustment has been made to the variance used in the
  "logrankapprox" splitting rule. This fixes the issue where a sizable
  fraction of trees in the forest were being stumped in some examples.

o Fixes for illegal syntax in the C code that manifested itself on some
  compilers. Thanks to Brian Ripley for pointing this out.

---------------------------------------------------------------------------------
Release 3.0.0 represents a major upgrade in the functionality of the
product. Key changes are as follows:

o Missing data can be imputed in both GROW and PREDICT mode. This applies
  to variables as well as time and censoring outcome values. Values are
  imputed dynamically as the tree is grown using a new tree imputation
  methodology. This produces an imputed forest which can be used for
  prediction purposes on test data sets with missing data.

o Importance values for variables are returned in PREDICT mode when test
  data contains outcomes as well as variables.

o Fixed some bugs in plot.variable(). Thanks to Andy J. Minn for pointing
  this out.

o Minor modification of PMML representation of RSF forest output to
  accommodate imputation. The method of random seed chain recovery has been
  altered. Note that forests produced with prior releases will have to be
  regenerated using this release. We apologize for the inconvenience.

---------------------------------------------------------------------------------
Release 2.1.0 represents a minor upgrade of the product, and will not
affect most users of the prior version of the product. Key changes are as
follows:

o Modifications necessitated by R 2.5.0 compliance issues.

o Modification of PMML representation of RSF forest output.
  The RSF custom extension has been moved from the DataDictionary node to a
  new MiningBuildTask node. Note that forests produced with Release 2.0.0
  will have to be regenerated using Release 2.1.0. We apologize for the
  inconvenience.

o Fast processing of data involving large numbers of predictors (as in many
  genomic examples) by using the option big.data=TRUE. This option bypasses
  the huge overhead needed by R in creating design matrices and parsing the
  formula. However, users should be aware of some side effects. See the RSF
  help file for more details. Thanks to Steven (Xi) Chen for pointing out
  the problem.

o Only the top 100 predictors are now printed to the terminal when calling
  plot.error(). This deals with settings such as the above, where one might
  have thousands of predictors.

o Introduced a new wrapper "find.interaction()" for testing of pairwise
  interactions between predictors.

---------------------------------------------------------------------------------
Release 2.0.0 represents a major upgrade in the functionality and stability
of the original 1.0.0 release. Key changes are as follows:

o Two new splitting rules, 'logrankscore' and 'logrankapprox', added.

o Expanded output from 'rsf()'. Now out-of-bag objects 'oob.ensemble' and
  'oob.mortality' are included in addition to the full ensemble objects
  'ensemble' and 'mortality'.

o Importance values for predictors can now be calculated (set
  'importance = TRUE' in the initial 'rsf()' call). Extended 'plot.error()'
  to print, as well as plot, such values.

o Prediction on test data can now be implemented using 'rsf.predict()' (set
  'forest = TRUE' in the initial 'rsf()' call).

o Included option 'predictorWt' used for weighted sampling of predictors
  when growing a tree.

o Formula no longer restricted to main effects. Formula for 'rsf'
  interpreted as in typical R applications. However, users should be aware
  that including interactions or higher order terms in a formula may not be
  an optimal way to grow a forest.

o Three types of objects are generated in an RSF analysis: '(rsf, grow)',
  '(rsf, predict)' and '(rsf, forest)'. Wrappers handle each type of object
  in different ways.

o Improved error checking in all wrappers.

o Extended 'plot.variable()' wrapper to generate partial plots for
  predictors.

o Improved control over trace output. See the 'do.trace' option in 'rsf()'.

o Implements the Predictive Model Markup Language specification for an
  '(rsf, forest)' forest object. PMML is an XML-based language which
  provides a way for applications to define statistical and data mining
  models and to share models between PMML-compliant applications. More
  information about PMML and the Data Mining Group can be found at
  http://www.dmg.org. Our implementation gives the user the ability to save
  the geometry of a forest as a PMML XML document for export or later
  retrieval.

---------------------------------------------------------------------------------
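Illustrative note on 'predictorWt' (Release 2.0.0). The option is described
above only as weighted sampling of predictors when growing a tree. The
Python sketch below shows one plausible reading of that idea: at each node,
up to 'mtry' distinct candidate predictors are drawn, each draw proportional
to its weight. It is not the package's C implementation, and the function
name sample_predictors and its arguments are hypothetical, chosen for
illustration only.

```python
import random


def sample_predictors(weights, mtry, rng=None):
    """Draw up to `mtry` distinct predictor indices, weight-proportionally.

    Hypothetical helper: a conceptual sketch of weighted predictor
    sampling, not code taken from the randomSurvivalForest package.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    rng = rng or random.Random()
    # Predictors with zero weight are never eligible for splitting.
    candidates = [i for i, w in enumerate(weights) if w > 0]
    chosen = []
    while candidates and len(chosen) < mtry:
        # One weighted draw, then remove the winner so that the
        # sampling is without replacement.
        current = [weights[i] for i in candidates]
        pick = rng.choices(candidates, weights=current, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen


# Example: predictor 2 is five times as likely to be drawn first as
# predictors 0 or 1, and predictor 3 is excluded entirely.
picked = sample_predictors([1.0, 1.0, 5.0, 0.0], mtry=2, rng=random.Random(0))
```

Under this reading, equal weights reduce to the usual uniform choice of
'mtry' candidates, and a weight of zero removes a predictor from
consideration.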