Writing R Extensions




This is a guide to extending R, describing the process of creating R add-on packages, writing R documentation, R's system and foreign language interfaces, and the R API.

The current version of this document is 2.7.0 (2008-04-22).

ISBN 3-900051-11-9



Acknowledgements

The contributions of Saikat DebRoy (who wrote the first draft of a guide to using .Call and .External) and of Adrian Trapletti (who provided information on the C++ interface) are gratefully acknowledged.



1 Creating R packages

Packages provide a mechanism for loading optional code and attached documentation as needed. The R distribution provides several packages.

In the following, we assume that you know the ‘library()’ command, including its ‘lib.loc’ argument, and we also assume basic knowledge of the INSTALL utility. Otherwise, please look at R's help pages

     ?library
     ?INSTALL

before reading on.

A computing environment including a number of tools is assumed; the “R Installation and Administration” manual describes what is needed. Under a Unix-alike most of the tools are likely to be present by default, but Microsoft Windows and MacOS X will require careful setup.

Once a source package is created, it must be installed by the command R CMD INSTALL. See Add-on-packages, for further details.

Other types of extensions are supported: see Package types.



1.1 Package structure

A package consists of a subdirectory containing a file DESCRIPTION and the subdirectories R, data, demo, exec, inst, man, po, src, and tests (some of which can be missing). The package subdirectory may also contain files INDEX, NAMESPACE, configure, cleanup, LICENSE, LICENCE, COPYING and NEWS. Other files such as README or ChangeLog will be ignored by R, but may be useful to end-users.

The DESCRIPTION and INDEX files are described in the sections below. The NAMESPACE file is described in Package name spaces.

The optional files configure and cleanup are (Bourne shell) script files which are executed before and (provided that option --clean was given) after installation on Unix-alikes, see Configure and cleanup.

The optional file LICENSE/LICENCE or COPYING (where the former names are preferred) contains a copy of the license of the package, e.g. a copy of the GNU General Public License. Whereas you should feel free to include a license file in your source distribution, please do not arrange to install yet another copy of the GNU COPYING or COPYING.LIB files but refer to the copies on http://www.r-project.org/Licenses/ and included in the R distribution (in directory share/licenses).

For the conventions for files NEWS and ChangeLog in the GNU project see http://www.gnu.org/prep/standards/standards.html#Documentation.

The package subdirectory should be given the same name as the package. Because some file systems (e.g., those on Windows) are not case-sensitive, to maintain portability it is strongly recommended that case distinctions not be used to distinguish different packages. For example, if you have a package named foo, do not also create a package named Foo.

To ensure that file names are valid across file systems and supported operating system platforms, the ASCII control characters as well as the characters ‘"’, ‘*’, ‘:’, ‘/’, ‘<’, ‘>’, ‘?’, ‘\’, and ‘|’ are not allowed in file names. In addition, files with names ‘con’, ‘prn’, ‘aux’, ‘clock$’, ‘nul’, ‘com1’ to ‘com9’, and ‘lpt1’ to ‘lpt9’ after conversion to lower case and stripping possible “extensions” (e.g., ‘lpt5.foo.bar’), are disallowed. Also, file names in the same directory must not differ only by case (see the previous paragraph). In addition, the names of ‘.Rd’ files will be used in URLs and so must be ASCII and not contain ‘%’. For maximal portability, file names should contain only ASCII characters not excluded already (that is A-Za-z0-9._!#$%&+,;=@^(){}'[]; space is excluded because many utilities do not accept spaces in file paths): non-English alphabetic characters cannot be guaranteed to be supported in all locales. It would be good practice to avoid the shell metacharacters (){}'[]$.
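The file name rules above can be checked mechanically. The following is a minimal sketch in Bourne shell, not a tool shipped with R: the function names is_portable_name and case_collisions are hypothetical, and only the forbidden characters, the reserved device names and the case-collision rule are covered.

```shell
# Return 0 if the file name $1 (without directory part) avoids the
# forbidden characters and reserved device names, 1 otherwise.
is_portable_name() {
    name=$1
    [ -n "$name" ] || return 1
    # Delete every forbidden character; if anything was deleted, reject.
    cleaned=$(printf '%s' "$name" | LC_ALL=C tr -d '"*:/<>?\\|[:cntrl:]')
    [ "$cleaned" = "$name" ] || return 1
    # Lower-case the name and strip everything from the first dot onwards.
    base=$(printf '%s' "$name" | LC_ALL=C tr 'A-Z' 'a-z')
    base=${base%%.*}
    case $base in
        con|prn|aux|'clock$'|nul|com[1-9]|lpt[1-9]) return 1 ;;
    esac
    return 0
}

# Read file names (one per line) on standard input and print, in lower
# case, each name that occurs more than once when case is ignored.
case_collisions() {
    LC_ALL=C tr 'A-Z' 'a-z' | sort | uniq -d
}
```

For example, ‘is_portable_name lpt5.foo.bar’ fails, and piping the names of a directory into case_collisions lists the names that differ only by case.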

The R function package.skeleton can help to create the structure for a new package: see its help page for details.



1.1.1 The DESCRIPTION file

The DESCRIPTION file contains basic information about the package in the following format:

          Package: pkgname
          Version: 0.5-1
          Date: 2004-01-01
          Title: My First Collection of Functions
          Author: Joe Developer <Joe.Developer@some.domain.net>, with
            contributions from A. User <A.User@whereever.net>.
          Maintainer: Joe Developer <Joe.Developer@some.domain.net>
          Depends: R (>= 1.8.0), nlme
          Suggests: MASS
          Description: A short (one paragraph) description of what
            the package does and why it may be useful.
          License: GPL (>= 2)
          URL: http://www.r-project.org, http://www.another.url

Continuation lines (for example, for descriptions longer than one line) start with a space or tab. The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory; the remaining fields (‘Date’, ‘Depends’, ‘URL’, ...) are optional.

The DESCRIPTION file should be written entirely in ASCII for maximal portability.

The ‘Package’ and ‘Version’ fields give the name and the version of the package, respectively. The name should consist of letters, numbers, and the dot character and start with a letter. The version is a sequence of at least two (and usually three) non-negative integers separated by single ‘.’ or ‘-’ characters. The canonical form is as shown in the example, and a version such as ‘0.01’ or ‘0.01.0’ will be handled as if it were ‘0.1-0’. (Translation packages are allowed names of the form ‘Translation-ll’.)
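The requirements on mandatory fields and on the version string lend themselves to a quick check before building. The sketch below assumes an ordinary package (not a bundle) and uses two hypothetical helper names, check_description and is_valid_version; it is an illustration only and much less thorough than R CMD check.

```shell
# Report every mandatory field that is missing from the DESCRIPTION
# file $1; return 1 if any is missing.
check_description() {
    file=$1
    status=0
    for field in Package Version License Description Title Author Maintainer; do
        # A field name starts at the beginning of a line; continuation
        # lines start with a space or tab and so are never matched.
        if ! grep -q "^${field}:" "$file"; then
            echo "missing mandatory field: ${field}"
            status=1
        fi
    done
    return $status
}

# Return 0 if $1 is a sequence of at least two non-negative integers
# separated by single '.' or '-' characters.
is_valid_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+([.-][0-9]+)+$'
}
```

With these definitions ‘is_valid_version 0.5-1’ succeeds, whereas ‘is_valid_version 1’ and ‘is_valid_version 1..2’ fail.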

The ‘License’ field should specify the license of the package in the following standardized form. Alternatives are indicated via vertical bars. Individual specifications must be one of

Examples of standardized specifications include

     License: GPL-2
     License: GPL (>= 2) | BSD
     License: LGPL (>= 2.0, < 3) | Mozilla Public License
     License: GPL-2 | file LICENCE

Please note in particular that “Public domain” is not a valid license. It is very important that you include this information! Otherwise, it may not even be legally correct for others to distribute copies of the package.

The ‘Description’ field should give a comprehensive description of what the package does. One can use several (complete) sentences, but only one paragraph.

The ‘Title’ field should give a short description of the package. Some package listings may truncate the title to 65 characters in order to keep the overall size of the listing limited. It should be capitalized, not use any markup, not have any continuation lines, and not end in a period. Older versions of R used a separate file TITLE for giving this information; this is now defunct, and the ‘Title’ field in DESCRIPTION is required.

The ‘Author’ field describes who wrote the package. It is a plain text field intended for human readers, but not for automatic processing (such as extracting the email addresses of all listed contributors).

The ‘Maintainer’ field should give a single name with a valid (RFC 2822) email address in angle brackets (for sending bug reports etc.). It should not end in a period or comma.

The optional ‘Date’ field gives the release date of the current version of the package. It is strongly recommended to use the yyyy-mm-dd format conforming to the ISO standard.

The optional ‘Depends’ field gives a comma-separated list of package names which this package depends on. The package name may be optionally followed by a comment in parentheses. The comment should contain a comparison operator (only ‘>=’ and ‘<=’ were supported prior to R 2.7.0), whitespace and a valid version number. (List package names even if they are part of a bundle.) You can also use the special package name ‘R’ if your package depends on a certain version of R. E.g., if the package works only with R version 2.4.0 or newer, include ‘R (>= 2.4.0)’ in the ‘Depends’ field. Both library and the R package checking facilities use this field, hence it is an error to use improper syntax or misuse the ‘Depends’ field for comments on other software that might be needed. Other dependencies (external to the R system) should be listed in the ‘SystemRequirements’ field or a separate README file. The R INSTALL facilities check if the version of R used is recent enough for the package being installed, and the list of packages which is specified will be attached (after checking version dependencies) before the current package, both when library is called and when saving an image of the package's code or preparing for lazy-loading.

As from R 2.7.0 a package (or ‘R’) can appear more than once in the ‘Depends’ field, but only the first occurrence will be used in earlier versions of R. (Unfortunately all occurrences will be checked, so only ‘>=’ and ‘<=’ can be used.)
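To make the syntax of the ‘Depends’ field concrete, the following sketch splits a field value into one entry per line. The function name parse_depends is hypothetical, and the sed expression is only an approximation of the parsing done by R itself.

```shell
# Split the value of a 'Depends' field ($1) at commas and print each
# entry as "name" or "name operator version", one entry per line.
parse_depends() {
    printf '%s\n' "$1" | tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/^\([^ (]*\)[[:space:]]*(\([<>=]*\)[[:space:]]*\([^ )]*\)[[:space:]]*)$/\1 \2 \3/' |
    grep -v '^$'
}
```

Applied to the value from the example DESCRIPTION file, ‘R (>= 1.8.0), nlme’, this prints the two lines ‘R >= 1.8.0’ and ‘nlme’.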

The optional ‘Imports’ field lists packages whose name spaces are imported from but which do not need to be attached. Name spaces accessed by the ‘::’ and ‘:::’ operators must be listed here, or in ‘Suggests’ or ‘Enhances’ (see below). Ideally this field will include all the standard packages, and it is important to include S4-using packages (as their class definitions can change and the DESCRIPTION file is used to decide which packages to re-install when this happens).

The optional ‘Suggests’ field uses the same syntax as ‘Depends’ and lists packages that are not necessarily needed. This includes packages used only in examples or vignettes (see Writing package vignettes), and packages loaded in the body of functions. E.g., suppose an example from package foo uses a dataset from package bar. Then it is not necessary to have bar for routine use of foo, unless one wants to execute the examples: it is nice to have bar, but not necessary.

Finally, the optional ‘Enhances’ field lists packages “enhanced” by the package at hand, e.g., by providing methods for classes from these packages.

The general rules are

In particular, large packages providing “only” data for examples or vignettes should be listed in ‘Suggests’ rather than ‘Depends’ in order to make lean installations possible.

The optional ‘URL’ field may give a list of URLs separated by commas or whitespace, for example the homepage of the author or a page where additional material describing the software can be found. These URLs are converted to active hyperlinks on CRAN.

Base and recommended packages (i.e., packages contained in the R source distribution or available from CRAN and recommended to be included in every binary distribution of R) have a ‘Priority’ field with value ‘base’ or ‘recommended’, respectively. These priorities must not be used by “other” packages.

An optional ‘Collate’ field (or OS-specific variants ‘Collate.OStype’, e.g. ‘Collate.windows’) can be used for controlling the collation order for the R code files in a package when these are concatenated into a single file upon installation from source. The default is to try collating according to the ‘C’ locale. If present, the collate specification must list all R code files in the package (taking possible OS-specific subdirectories into account, see Package subdirectories) as a whitespace separated list of file paths relative to the R subdirectory. Paths containing white space or quotes need to be quoted. An applicable OS-specific collation field (‘Collate.unix’ or ‘Collate.windows’) will be used instead of ‘Collate’.

The optional ‘LazyLoad’ and ‘LazyData’ fields control whether the R objects and the datasets (respectively) use lazy-loading: set the field's value to ‘yes’ or ‘true’ for lazy-loading and ‘no’ or ‘false’ for no lazy-loading. (Capitalized values are also accepted.)

If the package you are writing uses the methods package, specify ‘LazyLoad: yes’.

The optional ‘ZipData’ field controls whether the automatic Windows build will zip up the data directory or not: set this to ‘no’ if your package will not work with a zipped data directory.

If the DESCRIPTION file is not entirely in ASCII it should contain an ‘Encoding’ field specifying an encoding. This is currently used as the encoding of the DESCRIPTION file itself and of the R and NAMESPACE files, and is taken as the default encoding of .Rd files as from R 2.6.0. As from R 2.7.0, the examples are assumed to be in this encoding when running R CMD check. Only encoding names latin1, latin2 and UTF-8 are known to be portable. (Do not specify an encoding unless one is actually needed: doing so makes the package less portable.)

The optional ‘Type’ field specifies the type of the package: see Package types.

Note: There should be no ‘Built’ or ‘Packaged’ fields, as these are added by the package management tools.



1.1.2 The INDEX file

The optional file INDEX contains a line for each sufficiently interesting object in the package, giving its name and a description (functions such as print methods not usually called explicitly might not be included). Normally this file is missing, and the corresponding information is automatically generated from the documentation sources (using Rdindex() from package tools) when installing from source and when using the package builder (see Checking and building packages).

Rather than editing this file, it is preferable to put customized information about the package into an overview man page (see Documenting packages) and/or a vignette (see Writing package vignettes).



1.1.3 Package subdirectories

The R subdirectory contains only R code files. The code files to be installed must start with an ASCII (lower or upper case) letter or digit and have one of the extensions .R, .S, .q, .r, or .s. We recommend using .R, as this extension seems to be not used by any other software. It should be possible to read in the files using source(), so R objects must be created by assignments. Note that there need be no connection between the name of the file and the R objects created by it. Ideally, the R code files should only directly assign R objects and definitely should not call functions with side effects such as require and options. If computations are required to create objects these can use code ‘earlier’ in the package (see the ‘Collate’ field) plus, only if lazy-loading is used, functions in the ‘Depends’ packages provided that the objects created do not depend on those packages except via name space imports. (Packages without name spaces will work under somewhat less restrictive assumptions.)

Two exceptions are allowed: if the R subdirectory contains a file sysdata.rda (a saved image of R objects) this will be lazy-loaded into the name space/package environment – this is intended for system datasets that are not intended to be user-accessible via data. Also, files ending in ‘.in’ will be allowed in the R directory to allow a configure script to generate suitable files.
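The naming rule for installable code files can be expressed as a single shell pattern. This is a sketch with a hypothetical function name, is_r_code_file; it does not cover the two exceptions just mentioned (sysdata.rda and files ending in ‘.in’), and it assumes a locale in which the ranges A-Z, a-z and 0-9 contain ASCII characters only (for example the ‘C’ locale).

```shell
# Return 0 if the file name $1 starts with a letter or digit and has
# one of the extensions .R, .S, .q, .r or .s; return 1 otherwise.
is_r_code_file() {
    case $1 in
        [A-Za-z0-9]*.[RSqrs]) return 0 ;;
        *) return 1 ;;
    esac
}
```

So zzz.R is accepted, while foo.Rd, foo.R.in and _hidden.R are not.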

Only ASCII characters (and the control characters tab, formfeed, LF and CR) should be used in code files. Other characters are accepted in comments, but then the comments may not be readable in e.g. a UTF-8 locale. Non-ASCII characters in object names will normally fail when the package is installed. Any byte will be allowed in a quoted character string (but \uxxxx escapes should not be used), but non-ASCII character strings may not be usable in some locales and may display incorrectly in others.

Various R functions in a package can be used to initialize and clean up. For packages without a name space, these are .First.lib and .Last.lib. (See Load hooks, for packages with a name space.) It is conventional to define these functions in a file called zzz.R. If .First.lib is defined in a package, it is called with arguments libname and pkgname after the package is loaded and attached. (If a package is installed with version information, the package name includes the version information, e.g. ‘ash_1.0.9’.) A common use is to call library.dynam inside .First.lib to load compiled code: another use is to call those functions with side effects. If .Last.lib exists in a package it is called (with argument the full path to the installed package) just before the package is detached. It is uncommon to detach packages and rare to have a .Last.lib function: one use is to call library.dynam.unload to unload compiled code.

The man subdirectory should contain (only) documentation files for the objects in the package in R documentation (Rd) format. The documentation filenames must start with an ASCII (lower or upper case) letter or digit and have the extension .Rd (the default) or .rd. Further, the names must be valid in ‘file://’ URLs, which means they must be entirely ASCII and not contain ‘%’. See Writing R documentation files, for more information. Note that all user-level objects in a package should be documented; if a package pkg contains user-level objects which are for “internal” use only, it should provide a file pkg-internal.Rd which documents all such objects, and clearly states that these are not meant to be called by the user. See e.g. the sources for package grid in the R distribution for an example. Note that packages which use internal objects extensively should hide those objects in a name space, in which case they do not need to be documented (see Package name spaces).

The R and man subdirectories may contain OS-specific subdirectories named unix or windows.

The sources and headers for the compiled code are in src, plus optionally file Makevars or Makefile. When a package is installed using R CMD INSTALL, Make is used to control compilation and linking into a shared object for loading into R. There are default variables and rules for this (determined when R is configured and recorded in R_HOME/etcR_ARCH/Makeconf), providing support for C, C++, FORTRAN 77, Fortran 9x, Objective C and Objective C++ with associated extensions .c, .cc or .cpp or .C, .f, .f90 or .f95, .m, and .mm or .M, respectively. We recommend using .h for headers, also for C++ or Fortran 9x include files. The default rules can be tweaked by setting macros in a file src/Makevars (see Using Makevars). Note that this mechanism should be general enough to eliminate the need for a package-specific src/Makefile. If such a file is to be distributed, considerable care is needed to make it general enough to work on all R platforms. It should have an appropriate first target (conventionally called ‘all’) and a (possibly empty) target ‘clean’ which removes all files generated by Make (to be used by ‘R CMD INSTALL --clean’ and ‘R CMD INSTALL --preclean’). There are platform-specific file names on Windows: src/Makevars.win takes precedence over src/Makevars and src/Makefile.win must be used.

The data subdirectory is for additional data files the package makes available for loading using data(). Currently, data files can have one of three types as indicated by their extension: plain R code (.R or .r), tables (.tab, .txt, or .csv, see ?data for the file formats), or save() images (.RData or .rda). (All ports of R use the same binary (XDR) format and can read compressed images. Use images saved with save(, compress = TRUE), the default, to save space.) Note that R code should be “self-sufficient” and not make use of extra functionality provided by the package, so that the data file can also be used without having to load the package. It is no longer necessary to provide a 00Index file in the data directory—the corresponding information is generated automatically from the documentation sources when installing from source, or when using the package builder (see Checking and building packages). If your data files are enormous you can speed up installation by providing a file datalist in the data subdirectory. This should have one line per topic that data() will find, in the format ‘foo’ if data(foo) provides ‘foo’, or ‘foo: bar bah’ if data(foo) provides ‘bar’ and ‘bah’.
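The format of the datalist file is simple enough to be read with a few lines of awk. The sketch below is an illustration of the format only; the function name datalist_objects and the topic names used in the usage note are hypothetical.

```shell
# Print the names of the objects that topic $2 provides according to
# the datalist file $1, one name per line.
datalist_objects() {
    awk -v topic="$2" '
        { name = $1; sub(/:$/, "", name) }   # first word without a trailing colon
        name == topic {
            if (NF == 1)
                print name                   # "foo": the topic provides itself
            else
                for (i = 2; i <= NF; i++)    # "foo: bar bah": list the objects
                    print $i
        }' "$1"
}
```

Given a datalist containing the lines ‘cars’ and ‘survey: males females’, asking for topic survey prints males and females, and asking for topic cars prints cars.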

The demo subdirectory is for R scripts (for running via demo()) that demonstrate some of the functionality of the package. Demos may be interactive and are not checked automatically, so if testing is desired use code in the tests directory. The script files must start with a (lower or upper case) letter and have one of the extensions .R or .r. If present, the demo subdirectory should also have a 00Index file with one line for each demo, giving its name and a description separated by white space. (Note that it is not possible to generate this index file automatically.)
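Since the 00Index file of the demo subdirectory has to be maintained by hand, it is easy to forget an entry. A small check along the following lines may help; check_demo_index is a hypothetical name, and the sketch assumes that each index line starts with the demo name (the script name without its extension) followed by white space.

```shell
# Report every demo script in directory $1 that has no line in
# $1/00Index; return 1 if any entry is missing.
check_demo_index() {
    dir=$1
    status=0
    for f in "$dir"/*.R "$dir"/*.r; do
        [ -e "$f" ] || continue        # skip glob patterns that matched nothing
        name=$(basename "$f")
        name=${name%.*}                # demo name: file name without extension
        if ! grep -q "^${name}[[:space:]]" "$dir/00Index" 2>/dev/null; then
            echo "demo '${name}' is not listed in 00Index"
            status=1
        fi
    done
    return $status
}
```

The demo name is used as a regular expression here, so a ‘.’ in a name matches any character; this is good enough for a sketch but not an exact comparison.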

The contents of the inst subdirectory will be copied recursively to the installation directory. Subdirectories of inst should not interfere with those used by R (currently, R, data, demo, exec, libs, man, help, html, latex, R-ex, chtml, and Meta). The copying of inst happens after src is built so its Makefile can create files to be installed. Note that with the exceptions of INDEX, LICENSE/LICENCE, COPYING and NEWS (from R 2.7.0), information files at the top level of the package will not be installed and so not be known to users of Windows and MacOS X compiled packages (and not seen by those who use R CMD INSTALL or install.packages on the tarball). So any information files you wish an end user to see should be included in inst. One thing you might like to add to inst is a CITATION file for use by the citation function. Note that if the named exceptions also occur in inst, the version in inst will be that seen in the installed package. If you want NEWS to be installed by your package in earlier versions of R, you need to include it in inst.

Subdirectory tests is for additional package-specific test code, similar to the specific tests that come with the R distribution. Test code can either be provided directly in a .R file, or via a .Rin file containing code which in turn creates the corresponding .R file (e.g., by collecting all function objects in the package and then calling them with the strangest arguments). The results of running a .R file are written to a .Rout file. If there is a corresponding .Rout.save file, these two are compared, with differences being reported but not causing an error. The directory tests is copied to the check area, and the tests are run with the copy as the working directory and with R_LIBS set to ensure that the copy of the package installed during testing will be found by library(pkg_name).
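The comparison step can be pictured as follows. This is a deliberately simplified sketch using plain diff under a hypothetical function name, compare_rout; the comparison performed by R itself is assumed to be more forgiving (for instance about start-up messages), so the sketch only illustrates that differences are reported without being treated as an error.

```shell
# Compare the output file $1 (e.g. foo.Rout) with $1.save if that file
# exists. Differences are printed, but the return status is always 0.
compare_rout() {
    out=$1
    save=$1.save
    [ -f "$save" ] || return 0         # no saved output: nothing to compare
    if delta=$(diff "$save" "$out"); then
        return 0                       # identical files: stay silent
    fi
    echo "differences between $save and $out:"
    printf '%s\n' "$delta"
    return 0
}
```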

Subdirectory exec could contain additional executables the package needs, typically scripts for interpreters such as the shell, Perl, or Tcl. This mechanism is currently used only by a very few packages, and is still experimental.

Subdirectory po is used for files related to localization: see Internationalization.



1.1.4 Package bundles

Sometimes it is convenient to distribute several packages as a bundle. (An example is VR which contains four packages.) The installation procedures on both Unix-alikes and Windows can handle package bundles.

The DESCRIPTION file of a bundle has a ‘Bundle’ field and no ‘Package’ field, as in

          Bundle: VR
          Priority: recommended
          Contains: MASS class nnet spatial
          Version: 7.2-36
          Date: 2007-08-29
          Depends: R (>= 2.4.0), grDevices, graphics, stats, utils
          Suggests: lattice, nlme, survival
          Author: S original by Venables & Ripley.
            R port by Brian Ripley <ripley@stats.ox.ac.uk>, following earlier
            work by Kurt Hornik and Albrecht Gebhardt.
          Maintainer: Brian Ripley <ripley@stats.ox.ac.uk>
          BundleDescription: Functions and datasets to support Venables and
            Ripley, 'Modern Applied Statistics with S' (4th edition).
          License: GPL-2 | GPL-3
          URL: http://www.stats.ox.ac.uk/pub/MASS4/

The ‘Contains’ field lists the packages (space separated), which should be contained in separate subdirectories with the names given. During building and installation, packages will be installed in the order specified. Be sure to order this list so that dependencies are met appropriately.

The packages contained in a bundle are standard packages in all respects except that the DESCRIPTION file is replaced by a DESCRIPTION.in file which just contains fields additional to the DESCRIPTION file of the bundle, for example

          Package: spatial
          Description: Functions for kriging and point pattern analysis.
          Title: Functions for Kriging and Point Pattern Analysis

Any files in the package bundle except the DESCRIPTION file and the named packages will be ignored.

The ‘Depends’ field in the bundle's DESCRIPTION file should list the dependencies of all the constituent packages (and similarly for ‘Imports’ and ‘Suggests’), and then DESCRIPTION.in files should not contain these fields.
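The layout of a bundle described above can be verified with a short script before building. The sketch below uses a hypothetical function name, check_bundle, and assumes that the ‘Contains’ field fits on a single line, as it does in the VR example.

```shell
# Check that every package named in the 'Contains' field of the bundle
# in directory $1 has a subdirectory holding a DESCRIPTION.in file.
check_bundle() {
    bundle=$1
    status=0
    # Value of the Contains field (continuation lines are not handled).
    contains=$(sed -n 's/^Contains:[[:space:]]*//p' "$bundle/DESCRIPTION")
    for pkg in $contains; do
        if [ ! -f "$bundle/$pkg/DESCRIPTION.in" ]; then
            echo "package '$pkg' has no DESCRIPTION.in"
            status=1
        fi
    done
    return $status
}
```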



1.2 Configure and cleanup

Note that most of this section is Unix-specific: see the comments later on about the Windows port of R.

If your package needs some system-dependent configuration before installation you can include a (Bourne shell) script configure in your package which (if present) is executed by R CMD INSTALL before any other action is performed. This can be a script created by the Autoconf mechanism, but may also be a script written by yourself. Use this to detect if any nonstandard libraries are present such that corresponding code in the package can be disabled at install time rather than giving error messages when the package is compiled or used. To summarize, the full power of Autoconf is available for your extension package (including variable substitution, searching for libraries, etc.).

The (Bourne shell) script cleanup is executed as the last thing by R CMD INSTALL if present and option --clean was given, and by R CMD build when preparing the package for building from its source. It can be used to clean up the package source tree. In particular, it should remove all files created by configure.
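As an illustration, the body of a cleanup script might look like the sketch below. It is wrapped in a hypothetical function, cleanup_package, so that the package directory can be passed explicitly; a real cleanup script would contain only the rm commands and is assumed here to be run from the top-level directory of the package. The file R/foo.R stands for whatever your own configure script generates from a ‘.in’ template.

```shell
# Remove the files that a configure run leaves in the source tree of
# the package in directory $1.
cleanup_package() {
    (
        cd "$1" || exit 1
        # Files written by an Autoconf-generated configure script.
        rm -f config.log config.status config.cache
        rm -rf autom4te.cache
        # Files generated from '.in' templates (assumed name).
        rm -f R/foo.R
    )
}
```

The template R/foo.R.in is left untouched, so that configure can regenerate R/foo.R on the next installation.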

As an example, suppose we want to use functionality provided by a (C or FORTRAN) library foo. Using Autoconf, we can create a configure script which checks for the library, sets variable HAVE_FOO to TRUE if it was found and to FALSE otherwise, and then substitutes this value into output files (by replacing instances of ‘@HAVE_FOO@’ in input files with the value of HAVE_FOO). For example, if a function named bar is to be made available by linking against library foo (i.e., using -lfoo), one could use

     AC_CHECK_LIB(foo, bar, [HAVE_FOO=TRUE], [HAVE_FOO=FALSE])
     AC_SUBST(HAVE_FOO)
     ......
     AC_CONFIG_FILES([foo.R])
     AC_OUTPUT

in configure.ac (assuming Autoconf 2.50 or later).

The definition of the respective R function in foo.R.in could be

     foo <- function(x) {
         if(!@HAVE_FOO@)
           stop("Sorry, library 'foo' is not available")
         ...

From this file configure creates the actual R source file foo.R looking like

     foo <- function(x) {
         if(!FALSE)
           stop("Sorry, library 'foo' is not available")
         ...

if library foo was not found (with the desired functionality). In this case, the above R code effectively disables the function.

One could also use different file fragments for available and missing functionality, respectively.
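A configure script written by hand can achieve the same substitution without Autoconf. The following is a rough sketch under stated assumptions: the function names detect_foo and substitute_have_foo are hypothetical, and the link test only checks that a trivial program links with -lfoo, which is weaker than the test for a particular function that AC_CHECK_LIB performs.

```shell
# Print TRUE if a trivial C program links against -lfoo, else FALSE.
detect_foo() {
    tmpdir=$(mktemp -d) || { echo FALSE; return; }
    echo 'int main(void) { return 0; }' > "$tmpdir/conftest.c"
    if ${CC:-cc} "$tmpdir/conftest.c" -lfoo -o "$tmpdir/conftest" >/dev/null 2>&1; then
        echo TRUE
    else
        echo FALSE
    fi
    rm -rf "$tmpdir"
}

# Replace every @HAVE_FOO@ in template $2 by the value $1 (TRUE or
# FALSE) and write the result to $3.
substitute_have_foo() {
    sed "s/@HAVE_FOO@/$1/g" "$2" > "$3"
}
```

A hand-written configure script could then end with a line such as ‘substitute_have_foo "$(detect_foo)" foo.R.in foo.R’.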

You will very likely need to ensure that the same C compiler and compiler flags are used in the configure tests as when compiling R or your package. Under Unix, you can achieve this by including the following fragment early in configure.ac

     : ${R_HOME=`R RHOME`}
     if test -z "${R_HOME}"; then
       echo "could not determine R_HOME"
       exit 1
     fi
     CC=`"${R_HOME}/bin/R" CMD config CC`
     CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
     CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`

(using ‘${R_HOME}/bin/R’ rather than just ‘R’ is necessary in order to use the ‘right’ version of R when running the script as part of R CMD INSTALL.)

Note that earlier versions of this document recommended obtaining the configure information by direct extraction (using grep and sed) from R_HOME/etcR_ARCH/Makeconf, which only works for variables recorded there as literals. You can use R CMD config for getting the value of the basic configuration variables, or the header and library flags necessary for linking against R, see R CMD config --help for details. (This works on Windows as from R 2.6.0.)

To check for an external BLAS library using the ACX_BLAS macro from the official Autoconf Macro Archive, one can simply do

     F77=`"${R_HOME}/bin/R" CMD config F77`
     AC_PROG_F77
     FLIBS=`"${R_HOME}/bin/R" CMD config FLIBS`
     ACX_BLAS([], AC_MSG_ERROR([could not find your BLAS library], 1))

Note that FLIBS as determined by R must be used to ensure that FORTRAN 77 code works on all R platforms. Calls to the Autoconf macro AC_F77_LIBRARY_LDFLAGS, which would overwrite FLIBS, must not be used (and hence e.g. removed from ACX_BLAS). (Recent versions of Autoconf in fact allow an already set FLIBS to override the test for the FORTRAN linker flags. Also, recent versions of R can detect external BLAS and LAPACK libraries.)

You should bear in mind that the configure script may well not work on Windows systems (this seems normally to be the case for those generated by Autoconf, although simple shell scripts do work). If your package is to be made publicly available, please give enough information for a user on a non-Unix platform to configure it manually, or provide a configure.win script to be used on that platform. (Optionally, there can be a cleanup.win script as well. Both should be shell scripts to be executed by ash, which is a minimal version of Bourne-style sh.)

In some rare circumstances, the configuration and cleanup scripts need to know the location into which the package is being installed. An example of this is a package that uses C code and creates two shared objects/DLLs. Usually, the object that is dynamically loaded by R is linked against the second, dependent, object. On some systems, we can add the location of this dependent object to the object that is dynamically loaded by R. This means that each user does not have to set the value of the LD_LIBRARY_PATH (or equivalent) environment variable, but that the secondary object is automatically resolved. Another example is when a package installs support files that are required at run time, and their location is substituted into an R data structure at installation time. (This happens with the Java Archive files in the SJava package.) The names of the top-level library directory (i.e., specifiable via the ‘-l’ argument) and the directory of the package itself are made available to the installation scripts via the two shell/environment variables R_LIBRARY_DIR and R_PACKAGE_DIR. Additionally, the name of the package (e.g., ‘survival’ or ‘MASS’) being installed is available from the shell variable R_PACKAGE_NAME.
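A configure script could use these variables roughly as follows. The sketch is hypothetical throughout: the function name write_install_paths and the placeholders @PACKAGE_DIR@ and @PACKAGE_NAME@ are invented for illustration, and the sed commands assume that the directory name contains neither ‘|’ nor ‘&’.

```shell
# Substitute the installation directory and the package name into the
# template $1 and write the result to $2. R_PACKAGE_DIR and
# R_PACKAGE_NAME are expected to be set by R CMD INSTALL.
write_install_paths() {
    if [ -z "${R_PACKAGE_DIR:-}" ] || [ -z "${R_PACKAGE_NAME:-}" ]; then
        echo "R_PACKAGE_DIR and R_PACKAGE_NAME must be set" >&2
        return 1
    fi
    sed -e "s|@PACKAGE_DIR@|${R_PACKAGE_DIR}|g" \
        -e "s|@PACKAGE_NAME@|${R_PACKAGE_NAME}|g" "$1" > "$2"
}
```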



1.2.1 Using Makevars

Sometimes writing your own configure script can be avoided by supplying a file Makevars: also one of the most common uses of a configure script is to make Makevars from Makevars.in.

The most common use of a Makevars file is to set additional preprocessor flags (for example, include paths) via PKG_CPPFLAGS, and additional compiler flags by setting PKG_CFLAGS, PKG_CXXFLAGS and PKG_FFLAGS, for C, C++, or FORTRAN respectively (see Creating shared objects).

Also, Makevars can be used to set flags for the linker, for example ‘-L’ and ‘-l’ options.

When writing a Makevars file for a package you intend to distribute, take care to ensure that it is not specific to your compiler: flags such as -O2 -Wall -pedantic are all specific to GCC.

There are some macros which are built whilst configuring the building of R itself, are stored on Unix-alikes in R_HOME/etcR_ARCH/Makeconf and can be used in Makevars. These include

FLIBS
A macro containing the set of libraries needed to link FORTRAN code. This may need to be included in PKG_LIBS.
BLAS_LIBS
A macro containing the BLAS libraries used when building R. This may need to be included in PKG_LIBS. Beware that if it is empty then the R executable will contain all the double-precision and double-complex BLAS routines, but no single-precision or complex routines. If BLAS_LIBS is included, then FLIBS also needs to be, as most BLAS libraries are written in FORTRAN.
LAPACK_LIBS
A macro containing the LAPACK libraries (and paths where appropriate) used when building R. This may need to be included in PKG_LIBS. This may point to a dynamic library libRlapack which contains all the double-precision LAPACK routines as well as those double-complex LAPACK and BLAS routines needed to build R, or it may point to an external LAPACK library, or may be empty if an external BLAS library also contains LAPACK.

[There is no guarantee that the LAPACK library will provide more than all the double-precision and double-complex routines, and some do not provide all the auxiliary routines.]

The macros BLAS_LIBS and FLIBS should always be included after LAPACK_LIBS.

SAFE_FFLAGS
A macro containing flags which are needed to circumvent over-optimization of FORTRAN code: it is typically ‘-g -O2 -ffloat-store’ on ‘ix86’ platforms using g77 or gfortran. Note that this is not an additional flag to be used as part of PKG_FFLAGS, but a replacement for FFLAGS, and that it is intended for the FORTRAN-77 compiler ‘F77’ and not necessarily for the Fortran 90/95 compiler ‘FC’. See the example later in this section.

Setting certain macros in Makevars will prevent R CMD SHLIB setting them: in particular if Makevars sets ‘OBJECTS’ it will not be set on the make command line. This can be useful in conjunction with implicit rules to allow other types of source code to be compiled and included in the shared object.

Note that Makevars should not normally contain targets, as it is (except on Windows) included before the default makefile and make is called without an explicit target. To circumvent that, use a suitable phony target before any actual targets: for example fastICA has

     SLAMC_FFLAGS=$(R_XTRA_FFLAGS) $(FPICFLAGS) $(SHLIB_FFLAGS) $(SAFE_FFLAGS)
     
     all: $(SHLIB)
     
     slamc.o: slamc.f
             $(F77) $(SLAMC_FFLAGS) -c -o slamc.o slamc.f

to ensure that the LAPACK routines find some constants without infinite looping. The Windows equivalent is

     slamc.o: slamc.f
             $(F77) $(SAFE_FFLAGS) -c -o slamc.o slamc.f

More generally, on a Unix-alike one could have something like

     .PHONY: all
     
     all: before $(SHLIB) after
     
     before:
             Things that need to be done first like creating libraries
     
     after:
             Cleanup needed after 'before'

On Windows, one can add dependencies to the ‘all’ target (which is what will get called), e.g. (based on package rcom)

     all: ../inst/tst/bin/rcom_test.exe extraclean
     
     ../inst/tst/bin/rcom_test.exe: rcom_test.exe
             $(MKDIR) -p ../inst/tst/bin
             $(CP) $? $@
     
     rcom_test.exe: rcom_test.o
     rcom_test-LIBS = -L. -lsupc++ -luuid -lole32 -loleaut32
     
     extraclean:
             $(RM) rcom_test.exe

The added dependencies will be built after the DLL: it is also possible (but not advisable) to have a target ‘all’ with commands (rather than dependencies).

There are two other targets, ‘before’ and ‘after’, which by default have neither dependencies nor commands so can be overridden in a Makevars.win.



1.2.2 Configure example

It may be helpful to give an extended example of using a configure script to create a src/Makevars file: this is based on that in the RODBC package.

The configure.ac file follows: configure is created from this by running autoconf in the top-level package directory (containing configure.ac).

          AC_INIT([RODBC], 1.1.8) dnl package name, version
          
          dnl A user-specifiable option
          odbc_mgr=""
          AC_ARG_WITH([odbc-manager],
                      AC_HELP_STRING([--with-odbc-manager=MGR],
                                     [specify the ODBC manager, e.g. odbc or iodbc]),
                      [odbc_mgr=$withval])
          
          if test "$odbc_mgr" = "odbc" ; then
            AC_PATH_PROGS(ODBC_CONFIG, odbc_config)
          fi
          
          dnl Select an optional include path, from a configure option
          dnl or from an environment variable.
          AC_ARG_WITH([odbc-include],
                      AC_HELP_STRING([--with-odbc-include=INCLUDE_PATH],
                                     [the location of ODBC header files]),
                      [odbc_include_path=$withval])
          RODBC_CPPFLAGS="-I."
          if test [ -n "$odbc_include_path" ] ; then
             RODBC_CPPFLAGS="-I. -I${odbc_include_path}"
          else
            if test [ -n "${ODBC_INCLUDE}" ] ; then
               RODBC_CPPFLAGS="-I. -I${ODBC_INCLUDE}"
            fi
          fi
          
          dnl ditto for a library path
          AC_ARG_WITH([odbc-lib],
                      AC_HELP_STRING([--with-odbc-lib=LIB_PATH],
                                     [the location of ODBC libraries]),
                      [odbc_lib_path=$withval])
          if test [ -n "$odbc_lib_path" ] ; then
             LIBS="-L$odbc_lib_path ${LIBS}"
          else
            if test [ -n "${ODBC_LIBS}" ] ; then
               LIBS="-L${ODBC_LIBS} ${LIBS}"
            else
              if test -n "${ODBC_CONFIG}"; then
                odbc_lib_path=`odbc_config --libs | sed s/-lodbc//`
                LIBS="${odbc_lib_path} ${LIBS}"
              fi
            fi
          fi
          
          dnl Now find the compiler and compiler flags to use
          : ${R_HOME=`R RHOME`}
          if test -z "${R_HOME}"; then
            echo "could not determine R_HOME"
            exit 1
          fi
          CC=`"${R_HOME}/bin/R" CMD config CC`
          CPP=`"${R_HOME}/bin/R" CMD config CPP`
          CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
          CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
          AC_PROG_CC
          AC_PROG_CPP
          
          
          if test -n "${ODBC_CONFIG}"; then
            RODBC_CPPFLAGS=`odbc_config --cflags`
          fi
          CPPFLAGS="${CPPFLAGS} ${RODBC_CPPFLAGS}"
          
          dnl Check the headers can be found
          AC_CHECK_HEADERS(sql.h sqlext.h)
          if test "${ac_cv_header_sql_h}" = no ||
             test "${ac_cv_header_sqlext_h}" = no; then
             AC_MSG_ERROR("ODBC headers sql.h and sqlext.h not found")
          fi
          
          dnl search for a library containing an ODBC function
          if test [ -n "${odbc_mgr}" ] ; then
            AC_SEARCH_LIBS(SQLTables, ${odbc_mgr}, ,
                AC_MSG_ERROR("ODBC driver manager ${odbc_mgr} not found"))
          else
            AC_SEARCH_LIBS(SQLTables, odbc odbc32 iodbc, ,
                AC_MSG_ERROR("no ODBC driver manager found"))
          fi
          
          dnl for 64-bit ODBC need SQL[U]LEN, and it is unclear where they are defined.
          AC_CHECK_TYPES([SQLLEN, SQLULEN], , , [# include <sql.h>])
          dnl for unixODBC header
          AC_CHECK_SIZEOF(long, 4)
          
          dnl substitute RODBC_CPPFLAGS and LIBS
          AC_SUBST(RODBC_CPPFLAGS)
          AC_SUBST(LIBS)
          AC_CONFIG_HEADERS([src/config.h])
          dnl and do substitution in the src/Makevars.in and src/config.h
          AC_CONFIG_FILES([src/Makevars])
          AC_OUTPUT

where src/Makevars.in would be simply

          PKG_CPPFLAGS = @RODBC_CPPFLAGS@
          PKG_LIBS = @LIBS@

A user can then be advised to specify the location of the ODBC driver manager files by options like (lines broken for easier reading)

     R CMD INSTALL
       --configure-args='--with-odbc-include=/opt/local/include
       --with-odbc-lib=/opt/local/lib --with-odbc-manager=iodbc'
       RODBC

or by setting the environment variables ODBC_INCLUDE and ODBC_LIBS.
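The order in which the script above picks the include path (configure option first, then environment variable, then the bare default) can be restated as a small stand-alone function. This is only a sketch of that one step: the function name rodbc_cppflags is invented here, and the later override by odbc_config is left out.

```shell
#!/bin/sh
# Print the preprocessor flags chosen for a given --with-odbc-include value
# ($1, possibly empty), falling back to the ODBC_INCLUDE environment
# variable and finally to the current directory only.
rodbc_cppflags() {
  with_value=$1
  if test -n "$with_value"; then
    echo "-I. -I${with_value}"
  elif test -n "${ODBC_INCLUDE:-}"; then
    echo "-I. -I${ODBC_INCLUDE}"
  else
    echo "-I."
  fi
}

# The option wins even when the environment variable is also set.
ODBC_INCLUDE=/usr/local/include
rodbc_cppflags /opt/local/include
rodbc_cppflags ''
```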



1.2.3 Using F95 code

R currently does not distinguish between FORTRAN 77 and Fortran 90/95 code, and assumes all FORTRAN comes in source files with extension .f. Commercial Unix systems typically use an F95 compiler, but only since the release of gcc 4.0.0 in April 2005 have Linux and other non-commercial OSes had much support for F95. Only with R 2.6.0 did the Windows port adopt a Fortran 90 compiler.

This means that portable packages need to be written in correct FORTRAN 77, which will also be valid Fortran 95. See http://developer.r-project.org/Portability.html for reference resources. In particular, free source form F95 code is not portable.

On some systems an alternative F95 compiler is available: from the gcc family this might be gfortran or g95. Configuring R will try to find a compiler which (from its name) appears to be a Fortran 90/95 compiler, and set it in macro ‘FC’. Note that it does not check that such a compiler is fully (or even partially) compliant with Fortran 90/95. Packages making use of Fortran 90/95 features should use file extension .f90 or .f95 for the source files: the variable PKG_FCFLAGS specifies any special flags to be used. There is no guarantee that compiled Fortran 90/95 code can be mixed with any other type of code, nor that a build of R will have support for such packages.



1.3 Checking and building packages

Before using these tools, please check that your package can be installed and loaded. R CMD check will inter alia do this, but you will get more informative error messages doing the checks directly.



1.3.1 Checking packages

Using R CMD check, the R package checker, one can test whether source R packages work correctly. It can be run on one or more directories, or gzipped package tar archives with extension .tar.gz or .tgz. This runs a series of checks, including

  1. The package is installed. This will warn about missing cross-references and duplicate aliases in help files.
  2. The file names are checked to be valid across file systems and supported operating system platforms.
  3. The files and directories are checked for sufficient permissions (Unix only).
  4. The DESCRIPTION file is checked for completeness, and some of its entries for correctness. Unless installation tests are skipped, checking is aborted if the package dependencies cannot be resolved at run time. One check is that the package name is not that of a standard package, nor of the defunct standard packages (‘ctest’, ‘eda’, ‘lqs’, ‘mle’, ‘modreg’, ‘mva’, ‘nls’, ‘stepfun’ and ‘ts’) which are handled specially by library. Another check is that all packages mentioned in library or requires or from which the NAMESPACE file imports or are called via :: or ::: are listed (in ‘Depends’, ‘Imports’, ‘Suggests’ or ‘Contains’): this is not an exhaustive check of the actual imports.
  5. Available index information (in particular, for demos and vignettes) is checked for completeness.
  6. The package subdirectories are checked for suitable file names and for not being empty. The checks on file names are controlled by the option --check-subdirs=value. This defaults to ‘default’, which runs the checks only if checking a tarball: the default can be overridden by specifying the value as ‘yes’ or ‘no’. Further, the check on the src directory is only run if the package/bundle does not contain a configure script (which corresponds to the value ‘yes-maybe’) and there is no src/Makefile or src/Makefile.in.

    To allow a configure script to generate suitable files, files ending in ‘.in’ will be allowed in the R directory.

  7. The R files are checked for syntax errors. Bytes which are non-ASCII are reported as warnings, but these should be regarded as errors unless it is known that the package will always be used in the same locale.
  8. It is checked that the package can be loaded, first with the usual default packages and then only with package base already loaded. If the package has a namespace, it is checked if this can be loaded in an empty session with only the base namespace loaded. (Namespaces and packages can be loaded very early in the session, before the default packages are available, so packages should work then.)
  9. The R files are checked for correct calls to library.dynam (with no extension). In addition, it is checked whether methods have all arguments of the corresponding generic, and whether the final argument of replacement functions is called ‘value’. All foreign function calls (.C, .Fortran, .Call and .External calls) are tested to see if they have a PACKAGE argument, and if not, whether the appropriate DLL might be deduced from the name space of the package. Any other calls are reported. (The check is generous, and users may want to supplement this by examining the output of tools::checkFF("mypkg", verbose=TRUE), especially if the intention is always to use a PACKAGE argument.)
  10. The Rd files are checked for correct syntax and meta data, including the presence of the mandatory (\name, \alias, \title and \description) fields. The Rd name and title are checked for being non-empty, and the keywords found are compared to the standard ones. There is a check for missing cross-references (links).
  11. A check is made for missing documentation entries, such as undocumented user-level objects in the package.
  12. Documentation for functions, data sets, and S4 classes is checked for consistency with the corresponding code.
  13. It is checked whether all function arguments given in \usage sections of Rd files are documented in the corresponding \arguments section.
  14. C, C++ and FORTRAN source and header files are tested for portable (LF-only) line endings. If there is a Makefile or Makefile.in or Makevars or Makevars.in in the src directory, it is checked for portable line endings and the correct use of ‘$(BLAS_LIBS)’.
  15. The examples provided by the package's documentation are run. (see Writing R documentation files, for information on using \examples to create executable example code.)

    Of course, released packages should be able to run at least their own examples. Each example is run in a `clean' environment (so earlier examples cannot be assumed to have been run), and with the variables T and F redefined to generate an error unless they are set in the example: See Logical vectors.

  16. If the package sources contain a tests directory then the tests specified in that directory are run. (Typically they will consist of a set of .R source files and target output files .Rout.save.)
  17. The code in package vignettes (see Writing package vignettes) is executed.
  18. If a working pdflatex or latex program is available, the .pdf or .dvi version, respectively, of the package's manual is created (to check that the Rd files can be converted successfully).

Use R CMD check --help to obtain more information about the usage of the R package checker. A subset of the checking steps can be selected by adding flags.

You do need to ensure that the package is checked in a suitable locale if it contains non-ASCII characters. Such packages are likely to fail some of the checks in a C locale, and R CMD check will warn if it spots the problem. You should be able to check any package in a UTF-8 locale (if one is available). Beware that although a C locale is rarely used at a console, it may be the default if logging in remotely or for batch jobs.



1.3.2 Building packages

Using R CMD build, the R package builder, one can build R packages from their sources (for example, for subsequent release).

Prior to actually building the package in the common gzipped tar file format, a few diagnostic checks and cleanups are performed. In particular, it is tested whether object indices exist and can be assumed to be up-to-date, and C, C++ and FORTRAN source files and relevant make files are tested and converted to LF line-endings if necessary.

Run-time checks of whether the package works correctly should be performed using R CMD check prior to invoking the build procedure.

To exclude files from being put into the package, one can specify a list of exclude patterns in file .Rbuildignore in the top-level source directory. These patterns should be Perl regexps, one per line, to be matched against the file names relative to the top-level source directory. In addition, directories called CVS or .svn or .arch-ids and files GNUMakefile or with base names starting with ‘.#’, or starting and ending with ‘#’, or ending in ‘~’, ‘.bak’ or ‘.swp’, are excluded by default. In addition, those files in the R, demo and man directories which are flagged by R CMD check as having invalid names will be excluded.
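To get a feel for how such patterns select files, the following sketch tests a few relative paths against a small pattern list. It is an approximation only: it uses grep -E, so it assumes the patterns stay within the subset of syntax shared by Perl regexps and POSIX extended regexps, and the patterns, file names and the function name matches_ignore are all hypothetical.

```shell
#!/bin/sh
# Succeed if the relative path $2 matches any pattern (one per line)
# in the pattern file $1.
matches_ignore() {
  printf '%s\n' "$2" | grep -E -q -f "$1"
}

# A stand-in for a .Rbuildignore file.
patterns=$(mktemp)
cat > "$patterns" <<'EOF'
^notes/
\.orig$
^TODO$
EOF

for f in notes/plan.txt R/foo.R src/foo.c.orig TODO; do
  if matches_ignore "$patterns" "$f"; then
    echo "exclude $f"
  else
    echo "keep    $f"
  fi
done
rm -f "$patterns"
```

Anchoring the patterns with ‘^’ and ‘$’ avoids accidentally excluding files whose names merely contain the text.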

Use R CMD build --help to obtain more information about the usage of the R package builder.

Unless R CMD build is invoked with the --no-vignettes option, it will attempt to rebuild the vignettes (see Writing package vignettes) in the package. To do so it installs the current package/bundle into a temporary library tree, but any dependent packages need to be installed in an available library tree (see the Note: below).

One of the checks that R CMD build runs is for empty source directories. These are in most cases unintentional, in which case they should be removed and the build re-run.

It can be useful to run R CMD check --check-subdirs=yes on the built tarball as a final check on the contents.

R CMD build can also build pre-compiled versions of packages for binary distributions, but R CMD INSTALL --build is preferred (and is considerably more flexible). In particular, Windows users are recommended to use R CMD INSTALL --build and install into the main library tree (the default) so that HTML links are resolved.

Note: R CMD check and R CMD build run R with --vanilla, so none of the user's startup files are read. If you need R_LIBS set (to find packages in a non-standard library) you will need to set it in the environment.
Note to Windows users: R CMD check and R CMD build need you to have installed the files for building source packages (which is the default), as well as the Windows toolset (see the “R Installation and Administration” manual).



1.3.3 Customizing checking and building

In addition to the available command line options, R CMD check also allows customization by setting (Perl) configuration variables in a configuration file, the location of which can be specified via the --rcfile option and defaults to $HOME/.R/check.conf provided that the environment variable HOME is set.

The following configuration variables are currently available.

$R_check_use_install_log
If true, record the output from installing a package as part of its check to a log file (00install.out by default), even when running interactively. Default: true.
$R_check_all_non_ISO_C
If true, do not ignore compiler (typically GCC) warnings about non ISO C code in system headers. Default: false.
$R_check_weave_vignettes
If true, weave package vignettes in the process of checking them. Default: true.
$R_check_latex_vignettes
If true (and $R_check_weave_vignettes is also true), latex package vignettes in the process of checking them: this will show up Sweave source errors, including missing source files. Default: true.
$R_check_subdirs_nocase
If true, check the case of directories such as R and man. Default: false.
$R_check_subdirs_strict
Initial setting for --check-subdirs. Default: ‘default’ (which checks only tarballs, and checks in the src only if there is no configure file).
$R_check_force_suggests
If true, give an error if suggested packages are not available. Default: true.
$R_check_use_codetools
If true, make use of the codetools package, which provides a detailed analysis of visibility of objects (but may give false positives). Default: true.
$R_check_Rd_style
If true, check whether Rd usage entries for S3 methods use the full function name rather than the appropriate \method markup. Default: true.
$R_check_Rd_xrefs
If true, check the cross-references in .Rd files. Default: true.

Values ‘1’ or a string with lower-cased version ‘"yes"’ or ‘"true"’ can be used for setting the variables to true; similarly, ‘0’ or strings with lower-cased version ‘"no"’ or ‘"false"’ give false.

For example, a configuration file containing

     $R_check_use_install_log = "TRUE";
     $R_check_weave_vignettes = 0;

results in using install logs and turning off weaving.
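The rule for true and false values can be restated outside Perl as a case-insensitive comparison. The function below is a sketch of that reading of the rule, not the checker's actual code; its name and the ‘unknown’ result for other values are choices made here.

```shell
#!/bin/sh
# Map a configuration value to "true" or "false" following the rule above:
# 1, yes, true (any letter case) are true; 0, no, false are false.
config_value_to_logical() {
  v=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $v in
    1|yes|true) echo true ;;
    0|no|false) echo false ;;
    *) echo unknown ;;   # behaviour for other values is not specified here
  esac
}

# The two settings from the example configuration file.
config_value_to_logical TRUE
config_value_to_logical 0
```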

Future versions of R may enhance this customization mechanism, and provide a similar scheme for R CMD build.

There are other internal settings that can be changed via environment variables _R_CHECK_*_: see the Perl source code.



1.4 Writing package vignettes

In addition to the help files in Rd format, R packages allow the inclusion of documents in arbitrary other formats. The standard location for these is subdirectory inst/doc of a source package; the contents will be copied to subdirectory doc when the package is installed. Pointers from package help indices to the installed documents are automatically created. Documents in inst/doc can be in arbitrary format, but we strongly recommend providing them in PDF format, so that users on all platforms can easily read them. To ensure that they can be accessed from a browser, the file names should start with an ASCII letter and consist entirely of ASCII letters or digits or minus or underscore.
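The file-name rule is easy to check mechanically before building. The sketch below applies it to the part of the name before the final extension, which is an assumption about how the rule is meant to be read; the function name valid_doc_name and the sample names are invented.

```shell
#!/bin/sh
# Succeed if the base name of $1 (without its final extension) starts with
# an ASCII letter and otherwise contains only ASCII letters, digits,
# minus or underscore.
valid_doc_name() {
  base=${1%.*}
  printf '%s\n' "$base" | LC_ALL=C grep -E -q '^[A-Za-z][A-Za-z0-9_-]*$'
}

for name in intro-guide.pdf user_manual2.pdf 2nd-steps.pdf 'my notes.pdf'; do
  if valid_doc_name "$name"; then
    echo "ok:  $name"
  else
    echo "bad: $name"
  fi
done
```

Setting LC_ALL=C keeps the bracket ranges restricted to ASCII regardless of the user's locale.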

A special case are documents in Sweave format, which we call package vignettes. Sweave allows the integration of LaTeX documents and R code and is contained in package utils which is part of the base R distribution, see the Sweave help page for details on the document format. Package vignettes found in directory inst/doc are tested by R CMD check by executing all R code chunks they contain to ensure consistency between code and documentation. Code chunks with option eval=FALSE are not tested. The R working directory for all vignette tests in R CMD check is the installed version of the doc subdirectory. Make sure all files needed by the vignette (data sets, ...) are accessible by either placing them in the inst/doc hierarchy of the source package, or using calls to system.file().

R CMD build will automatically create PDF versions of the vignettes for distribution with the package sources. By including the PDF version in the package sources it is not necessary that the vignettes can be compiled at install time, i.e., the package author can use private LaTeX extensions which are only available on his machine.

By default R CMD build will run Sweave on all files in Sweave format. If no Makefile is found in directory inst/doc, then texi2dvi --pdf is run on all vignettes. Whenever a Makefile is found, then R CMD build will try to run make after the Sweave step, such that PDF manuals can be created from arbitrary source formats (plain LaTeX files, ...). The Makefile should take care of both creation of PDF files and cleaning up afterwards, i.e., delete all files that shall not appear in the final package archive. Note that the make step is executed independently from the presence of any files in Sweave format.

It is no longer necessary to provide a 00Index.dcf file in the inst/doc directory: the corresponding information is generated automatically from the \VignetteIndexEntry statements in all Sweave files when installing from source, or when using the package builder (see Checking and building packages). The \VignetteIndexEntry statement is best placed in a LaTeX comment, so that no definition of the command is necessary.

At install time an HTML index for all vignettes is automatically created from the \VignetteIndexEntry statements unless a file index.html exists in directory inst/doc. This index is linked into the HTML help system for each package.



1.5 Submitting a package to CRAN

CRAN is a network of WWW sites holding the R distributions and contributed code, especially R packages. Users of R are encouraged to join in the collaborative project and to submit their own packages to CRAN.

Before submitting a package mypkg, do run the following steps to test it is complete and will install properly. (Unix procedures only, run from the directory containing mypkg as a subdirectory.)

  1. Run R CMD build to make the release .tar.gz file.
  2. Run R CMD check on the .tar.gz file to check that the package will install and will run its examples, and that the documentation is complete and can be processed. If the package contains code that needs to be compiled, try to enable a reasonable amount of diagnostic messaging (“warnings”) when compiling, such as -Wall -pedantic for tools from GCC, the GNU Compiler Collection. (If R was not configured accordingly, one can achieve this e.g. via PKG_CFLAGS and related variables.)

Please ensure that you can run through the complete procedure with only warnings that you understand and have reasons not to eliminate. In principle, packages must pass R CMD check without warnings to be admitted to the main CRAN package area.
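The two steps can be wrapped in a small script so that the check is always run on the freshly built tarball rather than on the source directory. This is a sketch only: the function names are invented, and it relies on the usual naming of the tarball as the Package and Version fields of DESCRIPTION joined by an underscore.

```shell
#!/bin/sh
# Print the tarball name expected for the source directory $1, taken from
# the Package and Version fields of its DESCRIPTION file.
tarball_name() {
  pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$1/DESCRIPTION")
  ver=$(sed -n 's/^Version:[[:space:]]*//p' "$1/DESCRIPTION")
  printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# Build the package in directory $1, then check the resulting tarball.
# Requires R on the PATH; nothing is run until the function is called.
build_and_check() {
  R CMD build "$1" && R CMD check "$(tarball_name "$1")"
}
```

From the directory containing mypkg as a subdirectory, one would then call build_and_check mypkg.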

When all the testing is done, upload the .tar.gz file, using ‘anonymous’ as log-in name and your e-mail address as password, to

     ftp://CRAN.R-project.org/incoming/

(note: use ftp and not sftp to connect to this server) and send a message to CRAN@R-project.org about it. The CRAN maintainers will run these tests before putting a submission in the main archive.

Note that CRAN generally does not accept submissions of precompiled binaries for security reasons.



1.6 Package name spaces

R has a name space management system for packages. This system allows the package writer to specify which variables in the package should be exported to make them available to package users, and which variables should be imported from other packages.

The current mechanism for specifying a name space for a package is to place a NAMESPACE file in the top level package directory. This file contains name space directives describing the imports and exports of the name space. Additional directives register any shared objects to be loaded and any S3-style methods that are provided. Note that although the file looks like R code (and often has R-style comments) it is not processed as R code. Only very simple conditional processing of if statements is implemented.

Like other packages, packages with name spaces are loaded and attached to the search path by calling library. Only the exported variables are placed in the attached frame. Loading a package that imports variables from other packages will cause these other packages to be loaded as well (unless they have already been loaded), but they will not be placed on the search path by these implicit loads.

Name spaces are sealed once they are loaded. Sealing means that imports and exports cannot be changed and that internal variable bindings cannot be changed. Sealing allows a simpler implementation strategy for the name space mechanism. Sealing also allows code analysis and compilation tools to accurately identify the definition corresponding to a global variable reference in a function body.

Note that adding a name space to a package changes the search strategy. The package name space comes first in the search, then the imports, then the base name space and then the normal search path.



1.6.1 Specifying imports and exports

Exports are specified using the export directive in the NAMESPACE file. A directive of the form

     export(f, g)

specifies that the variables f and g are to be exported. (Note that variable names may be quoted, and reserved words and non-standard names such as [<-.fractions must be.)

For packages with many variables to export it may be more convenient to specify the names to export with a regular expression using exportPattern. The directive

     exportPattern("^[^\\.]")

exports all variables that do not start with a period.
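The effect of this pattern can be previewed outside R by filtering a list of names with an equivalent extended regular expression (in grep the period needs no escaping inside a bracket expression). The names and the function name exported_names below are hypothetical.

```shell
#!/bin/sh
# Keep only the names (one per line on stdin) that do not start with a
# period, i.e. those an exportPattern of this form would export.
exported_names() {
  grep -E '^[^.]'
}

# Hypothetical objects defined by a package.
printf '%s\n' fit summary.fit .helper .onLoad | exported_names
```

Names such as .helper and .onLoad are filtered out, which is why leading periods are a convenient convention for internal objects.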

A package with a name space implicitly imports the base name space. Variables exported from other packages with name spaces need to be imported explicitly using the directives import and importFrom. The import directive imports all exported variables from the specified package(s). Thus the directive

     import(foo, bar)

specifies that all exported variables in the packages foo and bar are to be imported. If only some of the exported variables from a package are needed, then they can be imported using importFrom. The directive

     importFrom(foo, f, g)

specifies that the exported variables f and g of the package foo are to be imported.

It is possible to export variables from a name space that it has imported from other name spaces.

If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo:::f. This is less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file, so this approach is usually not recommended. Evaluating foo:::f will cause package foo to be loaded, but not attached, if it was not loaded already; this can be an advantage in delaying the loading of a rarely used package.

Using foo:::f allows access to unexported objects: to confine references to exported objects use foo::f.



1.6.2 Registering S3 methods

The standard method for S3-style UseMethod dispatching might fail to locate methods defined in a package that is imported but not attached to the search path. To ensure that these methods are available the packages defining the methods should ensure that the generics are imported and register the methods using S3method directives. If a package defines a function print.foo intended to be used as a print method for class foo, then the directive

     S3method(print, foo)

ensures that the method is registered and available for UseMethod dispatch. The function print.foo does not need to be exported. Since the generic print is defined in base it does not need to be imported explicitly. This mechanism is intended for use with generics that are defined in a name space. Any methods for a generic defined in a package that does not use a name space should be exported, and the package defining and exporting the methods should be attached to the search path if the methods are to be found.

(Note that function and class names may be quoted, and reserved words and non-standard names such as [<- and function must be.)



1.6.3 Load hooks

There are a number of hooks that apply to packages with name spaces. See help(".onLoad") for more details.

Packages with name spaces do not use the .First.lib function. Since loading and attaching are distinct operations when a name space is used, separate hooks are provided for each. These hook functions are called .onLoad and .onAttach. They take the same arguments as .First.lib; they should be defined in the name space but not exported.

However, packages with name spaces do use the .Last.lib function. There is also a hook .onUnload which is called when the name space is unloaded (via a call to unloadNamespace) with argument the full path to the directory in which the package was installed. .onUnload should be defined in the name space and not exported, but .Last.lib does need to be exported.
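
As an illustration, the hooks for a hypothetical package foo might look as follows. This is a sketch only: the option name foo.verbose and the banner text are invented, and a real package would define these functions in its R code without exporting them.

```r
# Sketch of load hooks for a hypothetical package 'foo'.
# The option name foo.verbose is invented for illustration.

.onLoad <- function(libname, pkgname) {
    # Set a package default without overriding a user's choice.
    if (is.null(getOption("foo.verbose")))
        options(foo.verbose = FALSE)
    invisible(NULL)
}

.onAttach <- function(libname, pkgname) {
    # Start-up banner, shown only when the package is attached.
    packageStartupMessage("This is package ", pkgname)
}

.onUnload <- function(libpath) {
    # libpath is the directory in which the package was installed.
    library.dynam.unload("foo", libpath)
}
```

Keeping the option setting in .onLoad means it also takes effect when the name space is loaded through an import, without the package being attached.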

Packages are not likely to need .onAttach (except perhaps for a start-up banner); code to set options and load shared objects should be placed in a .onLoad function, with the useDynLib directive described next as an alternative for loading shared objects.

There can be one or more useDynLib directives which allow shared objects that need to be loaded to be specified in the NAMESPACE file. The directive

     useDynLib(foo)

registers the shared object foo for loading with library.dynam. Loading of registered object(s) occurs after the package code has been loaded and before running the load hook function. Packages that would only need a load hook function to load a shared object can use the useDynLib directive instead.

User-level hooks are also available: see the help on function setHook.

The useDynLib directive also accepts the names of the native routines that are to be used in R via the .C, .Call, .Fortran and .External interface functions. These are given as additional arguments to the directive, for example,

     useDynLib(foo, myRoutine, myOtherRoutine)

By specifying these names in the useDynLib directive, the native symbols are resolved when the package is loaded and R variables identifying these symbols are added to the package's name space with these names. These can be used in the .C, .Call, .Fortran and .External calls in place of the name of the routine and the PACKAGE argument. For instance, we can call the routine myRoutine from R with the code

      .Call(myRoutine, x, y)

rather than

      .Call("myRoutine", x, y, PACKAGE = "foo")

There are at least two benefits to this approach. Firstly, the symbol lookup is done just once for each symbol rather than each time the routine is invoked. Secondly, this removes any ambiguity in resolving symbols that might be present in several compiled libraries. In particular, it allows for correctly resolving routines when different versions of the same package are loaded concurrently in the same R session.

In some circumstances, there will already be an R variable in the package with the same name as a native symbol. For example, we may have an R function in the package named myRoutine. In this case, it is necessary to map the native symbol to a different R variable name. This can be done in the useDynLib directive by using named arguments. For instance, to map the native symbol name myRoutine to the R variable myRoutine_sym, we would use

     useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)

We could then call that routine from R using the command

      .Call(myRoutine_sym, x, y)

Symbols without explicit names are assigned to the R variable with that name.

In some cases, it may be preferable not to create R variables in the package's name space that identify the native routines. It may be too costly to compute these for many routines when the package is loaded if many of these routines are not likely to be used. In this case, one can still perform the symbol resolution correctly using the DLL, but do this each time the routine is called. Given a reference to the DLL as an R variable, say dll, we can call the routine myRoutine using the expression

      .Call(dll$myRoutine, x, y)

The $ operator resolves the routine with the given name in the DLL using a call to getNativeSymbolInfo. This is the same computation as above where we resolve the symbol when the package is loaded. The only difference is that this is done each time in the case of dll$myRoutine.

In order to use this dynamic approach (e.g., dll$myRoutine), one needs the reference to the DLL as an R variable in the package. The DLL can be assigned to a variable by using the variable = dllName format used above for mapping symbols to R variables. For example, if we wanted to assign the DLL reference for the DLL foo in the example above to the variable myDLL, we would use the following directive in the NAMESPACE file:

     myDLL = useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)

Then, the R variable myDLL is in the package's name space and available for calls such as myDLL$dynRoutine to access routines that are not explicitly resolved at load time.
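
The lookup performed for a DLL reference can be explored outside a package. The following is a minimal sketch, assuming the DLL of the stats package as a stand-in for a package's own DLL; the routine is picked from the registration information rather than named, since the names of the routines in that DLL are not part of this discussion.

```r
# Sketch: resolving a native symbol from a DLL reference at run time.
# The stats DLL stands in for a package's own DLL.
loadNamespace("stats")
dll <- getLoadedDLLs()[["stats"]]

# Name of some routine registered for use with .Call.
routine <- getDLLRegisteredRoutines(dll)$.Call[[1]]$name

# The lookup is confined to this DLL, as it is for dll$myRoutine.
sym <- getNativeSymbolInfo(routine, PACKAGE = dll)

class(sym)
```

The resulting object is of the kind that can be passed to .Call in place of a routine name and PACKAGE argument.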

If the package has registration information (see Registering native routines), then we can use that directly rather than specifying the list of symbols again in the useDynLib directive in the NAMESPACE file. Each routine in the registration information is specified by giving a name by which the routine is to be specified along with the address of the routine and any information about the number and type of the parameters. Using the .registration argument of useDynLib, we can instruct the name space mechanism to create R variables for these symbols. For example, suppose we have the following registration information for a DLL named myDLL:

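     /* Argument types for foo; the array name foo_t is illustrative. */
     static R_NativePrimitiveArgType foo_t[] = {
        REALSXP, INTSXP, STRSXP, LGLSXP
     };

     R_CMethodDef cMethods[] = {
        {"foo", (DL_FUNC) &foo, 4, foo_t},
        {"bar_sym", (DL_FUNC) &bar, 0},
        {NULL, NULL, 0}
     };

     R_CallMethodDef callMethods[] = {
        {"R_call_sym", (DL_FUNC) &R_call, 4},
        {"R_version_sym", (DL_FUNC) &R_version, 0},
        {NULL, NULL, 0}
     };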

Then, the directive in the NAMESPACE file

     useDynLib(myDLL, .registration = TRUE)

causes the DLL to be loaded and the R variables foo, bar_sym, R_call_sym and R_version_sym to be defined in the package's name space.

Note that the names for the R variables are taken from the entry in the registration information and do not need to be the same as the name of the native routine. This allows the creator of the registration information to map the native symbols to non-conflicting variable names in R, e.g. R_version to R_version_sym for use in an R function such as

     R_version <- function()
     {
       .Call(R_version_sym)
     }

Using argument .fixes allows an automatic prefix to be added to the registered symbols, which can be useful when working with an existing package. For example, package KernSmooth has

     useDynLib(KernSmooth, .registration = TRUE, .fixes = "F_")

which creates R variables for the FORTRAN symbols with names such as F_bkde, and so avoids clashes with R code in the name space.

More information about this symbol lookup, along with some approaches for customizing it, is available from http://www.omegahat.org/examples/RDotCall.



1.6.4 An example

As an example consider two packages named foo and bar. The R code for package foo in file foo.R is

          x <- 1
          f <- function(y) c(x,y)
          foo <- function(x) .Call("foo", x, PACKAGE="foo")
          print.foo <- function(x, ...) cat("<a foo>\n")

Some C code defines a C function compiled into DLL foo (with an appropriate extension). The NAMESPACE file for this package is

          useDynLib(foo)
          export(f, foo)
          S3method(print, foo)

The second package bar has code file bar.R

          c <- function(...) sum(...)
          g <- function(y) f(c(y, 7))
          h <- function(y) y+9

and NAMESPACE file

          import(foo)
          export(g, h)

Calling library(bar) loads bar and attaches its exports to the search path. Package foo is also loaded but not attached to the search path. A call to g produces

     > g(6)
     [1]  1 13

This is consistent with the definitions of c in the two settings: in bar the function c is defined to be equivalent to sum, but in foo the variable c refers to the standard function c in base.
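
The lookup rule behind this result can be imitated with plain environments. This is only a sketch of the scoping and does not load any name space: foo_ns, bar_imports and bar_ns are hypothetical stand-ins for the name space of foo, the imports of bar and the name space of bar.

```r
# Sketch: imitate the scoping of the two name spaces with environments.

# Stand-in for the name space of foo; its enclosure leads to base.
foo_ns <- new.env(parent = baseenv())
evalq({
    x <- 1
    f <- function(y) c(x, y)
}, foo_ns)

# Stand-in for the imports of bar: import(foo) brings in the export f.
bar_imports <- new.env(parent = baseenv())
assign("f", get("f", envir = foo_ns), envir = bar_imports)

# Stand-in for the name space of bar, enclosed by its imports.
bar_ns <- new.env(parent = bar_imports)
evalq({
    c <- function(...) sum(...)
    g <- function(y) f(c(y, 7))
    h <- function(y) y + 9
}, bar_ns)

# Inside g, c is bar's version (sum); inside f, c is the one in base.
result <- bar_ns$g(6)
```

The value of result is the vector 1 13, as in the package example: the redefinition of c in bar is invisible to the code of foo.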



1.6.5 Summary – converting an existing package

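To summarize, converting an existing package to use a name space involves several simple steps: identify the public definitions and place them in export directives; identify S3-style method definitions and write the corresponding S3method directives; identify dependencies on other packages and declare them in import directives; and replace .First.lib functions with .onLoad functions or useDynLib directives.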

Some code analysis tools to aid in this process are currently under development.



1.6.6 Name spaces with formal classes and methods

Some additional steps are needed for packages which make use of formal (S4-style) classes and methods (unless these are purely used internally). The package should have Depends: methods in its DESCRIPTION file and any classes and methods which are to be exported need to be declared in the NAMESPACE file. For example, the stats4 package has

     export(mle)
     importFrom(graphics, plot)
     importFrom(stats, AIC, coef, confint, logLik, optim, profile,
                qchisq, update, vcov)
     exportClasses(mle, profile.mle, summary.mle)
     exportMethods(BIC, coef, confint, logLik, plot, profile,
                   summary, show, update, vcov)
     export(AIC)

All formal classes need to be listed in an exportClasses directive. Generics for which formal methods are defined need to be declared in an exportMethods directive, and where the generics are formed by taking over existing functions, those functions need to be imported (explicitly unless they are defined in the base name space).

Note that exporting methods on a generic in the name space will also export the generic, and exporting a generic in the name space will also export its methods. Where a generic has been created in the package solely to add S4 methods to it, it can be declared via either or both of export or exportMethods, but the latter seems clearer (and is used in the stats4 example above). On the other hand, where a generic is created in a package without methods (such as AIC in stats4), export must be used.

Further, a package using classes and methods defined in another package needs to import them, with directives

     importClassesFrom(package, ...)
     importMethodsFrom(package, ...)

listing the classes and functions with methods respectively. Suppose we had two small packages A and B with B using A. Then they could have NAMESPACE files

          export(f1, ng1)
          exportMethods("[")
          exportClasses(c1)

and

          importFrom(A, ng1)
          importClassesFrom(A, c1)
          importMethodsFrom(A, f1)
          export(f4, f5)
          exportMethods(f6, "[")
          exportClasses(c1, c2)

respectively.
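
To make the directives for package A more concrete, here is a sketch of R code that such a NAMESPACE file could be describing. Only the names f1, ng1, c1 and "[" come from the example; the slot name and all function bodies are invented for illustration.

```r
# Sketch of definitions that package A's NAMESPACE could refer to.
# The slot 'values' and the function bodies are invented.
library(methods)

# Formal class, declared via exportClasses(c1).
setClass("c1", representation(values = "numeric"))

# Generic with a method for c1, made available via export(f1).
setGeneric("f1", function(object, ...) standardGeneric("f1"))
setMethod("f1", "c1", function(object, ...) sum(object@values))

# Method on an existing generic, declared via exportMethods("[").
setMethod("[", "c1", function(x, i, j, ..., drop = TRUE) {
    new("c1", values = x@values[i])
})

# Ordinary (non-generic) function, declared via export(ng1).
ng1 <- function(x) x + 1

obj <- new("c1", values = c(1, 2, 3))
total <- f1(obj[2:3])
```

Package B would then see the class through importClassesFrom(A, c1) and the generic f1 with its methods through importMethodsFrom(A, f1).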

Note that importMethodsFrom will also import any generics defined in the name space for those methods.

If your package imports the whole of a name space, it will automatically import the classes from that namespace. It will also import methods, but it is best to do so explicitly, especially where there are methods being imported from more than one namespace.



1.7 Writing portable packages

Portable packages should have simple file names: use only alphanumeric ASCII characters and ., and avoid those names not allowed under Windows which are mentioned above.
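
A quick check of the first rule can be scripted. The following is a minimal sketch: the function name is_simple_name and the example paths are hypothetical, and the check covers only the character set, not the names that are disallowed under Windows.

```r
# Sketch: flag file names that use anything other than ASCII letters,
# digits and '.'. The function name is hypothetical.
is_simple_name <- function(files) {
    # perl = TRUE so that the ranges are interpreted by character code.
    grepl("^[A-Za-z0-9.]+$", basename(files), perl = TRUE)
}

files <- c("R/foo.R", "man/foo.Rd", "R/my file.R", "R/plot-utils.R")
ok <- is_simple_name(files)

# Names that would need renaming under this rule.
files[!ok]
```

In a real package the vector of paths could be obtained with list.files(recursive = TRUE) run in the package directory.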

R CMD check provides a basic set of checks, but often further problems emerge when people try to install and use packages submitted to CRAN – many of these involve compiled code. Here are some further checks that you can do to make your package more portable.



1.7.1 Encoding issues

Care is needed if your package contains non-ASCII text, an