Preamble

For interactive / manual vignette building, setting caching instead to TRUE in the above code chunk can dramatically speed up the builds, by caching the results of each chunk. Note that the above chunk is not shown in the built version of the vignette because of the echo=FALSE option.

First load the analysis library:

library(ProjectAsPackage)

R Markdown

This vignette is written in Rmarkdown. You can find details about markdown here and details about code chunks here.

Data

Note that there are several ways to include data in your package, documented at http://r-pkgs.had.co.nz/data.html. ProjectAsPackage demonstrates two ways:

  1. as a prepared R object (in this case data.frame) prepared and stored as data/gardasil.rdain the package’s directory and documented in the file R/gardasil.R.
  2. as a .csv file inst/extdata/gardasil.csv

Option 1 - data cleaning done prior to saving an R object in package directory data/gardasil.rda:

data(gardasil)
class(gardasil)
#> [1] "data.frame"
summary(gardasil)
#>       Age         AgeGroup         Race         Shots       Completed
#>  Min.   :11.00   18-26:712   white   :732   Min.   :1.000   no :944  
#>  1st Qu.:15.00   11-17:701   black   :443   1st Qu.:1.000   yes:469  
#>  Median :18.00               hispanic: 52   Median :2.000            
#>  Mean   :18.55               NA's    :186   Mean   :2.069            
#>  3rd Qu.:22.00                              3rd Qu.:3.000            
#>  Max.   :26.00                              Max.   :3.000            
#>             InsuranceType MedAssist                             Location  
#>  medical assistance:275   yes: 275   Odenton                        :798  
#>  private payer     :723   no :1138   White Marsh                    :165  
#>  hospital based    : 84              Johns Hopkins Outpatient Center: 89  
#>  military          :331              Bayview                        :361  
#>                                                                           
#>                                                                           
#>    LocationType          PracticeType   RaceSummary 
#>  urban   :450   pediatric      :515   white   :732  
#>  suburban:963   family practice:365   minority:495  
#>                 OB-GYN         :533   NA's    :186  
#>                                                     
#>                                                     
#> 

The data-cleaning script used to create the data/gardasil.rda file is traditionally provided in the data-raw/ Note that there is a help file for this dataset as well, serving as a codebook. This was generated by the file R/gardasil.R using roxygen markup, then using roxygen2 to automatically create the documentation file man/gardasil.Rd:

?gardasil

Option 2 - a function to load and clean the raw data

See the function R/read_gardasil.R, with automatically-generated help page man/read_gardasil.Rd.

?read_gardasil
gardasil2 <- read_gardasil()
class(gardasil2)
#> [1] "data.frame"
summary(gardasil2)
#>       Age         AgeGroup              Race         Shots       Completed
#>  Min.   :11.00   18-26:712   white        :732   Min.   :1.000   no :944  
#>  1st Qu.:15.00   11-17:701   black        :443   1st Qu.:1.000   yes:469  
#>  Median :18.00               hispanic     : 52   Median :2.000            
#>  Mean   :18.55               other/unknown:186   Mean   :2.069            
#>  3rd Qu.:22.00                                   3rd Qu.:3.000            
#>  Max.   :26.00                                   Max.   :3.000            
#>             InsuranceType MedAssist                             Location  
#>  medical assistance:275   yes: 275   Odenton                        :798  
#>  private payer     :723   no :1138   White Marsh                    :165  
#>  hospital based    : 84              Johns Hopkins Outpatient Center: 89  
#>  military          :331              Bayview                        :361  
#>                                                                           
#>                                                                           
#>    LocationType          PracticeType   RaceSummary 
#>  urban   :450   pediatric      :515   white   :732  
#>  suburban:963   family practice:365   minority:495  
#>                 OB-GYN         :533   NA's    :186  
#>                                                     
#>                                                     
#> 

You may wish to provide arguments to this function, e.g.:

summary(gardasil2$RaceSummary)
#>    white minority     NA's 
#>      732      495      186
gardasil3 <- read_gardasil(other.as.NA = TRUE)
summary(gardasil3$RaceSummary)
#>    white minority     NA's 
#>      732      495      186

Analysis

Regression analysis

Univariate regressions like the following reproduce Table 2 of Chou et al. Relative to public insurance, holders of private, hospital-based, and military insurance are more likely to complete the vaccine series.

fitins <- glm(Completed ~ InsuranceType,
           data=gardasil,
           family=binomial(link='logit'))
summary(fitins)
#> 
#> Call:
#> glm(formula = Completed ~ InsuranceType, family = binomial(link = "logit"), 
#>     data = gardasil)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.1173  -0.9281  -0.9281   1.4129   1.7941  
#> 
#> Coefficients:
#>                             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                  -1.3863     0.1508  -9.196  < 2e-16 ***
#> InsuranceTypeprivate payer    0.7670     0.1697   4.519 6.22e-06 ***
#> InsuranceTypehospital based   1.2432     0.2657   4.679 2.88e-06 ***
#> InsuranceTypemilitary         0.8480     0.1890   4.487 7.21e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 1796.0  on 1412  degrees of freedom
#> Residual deviance: 1763.1  on 1409  degrees of freedom
#> AIC: 1771.1
#> 
#> Number of Fisher Scoring iterations: 4
round(exp(coef(fitins)), 2)
#>                 (Intercept)  InsuranceTypeprivate payer 
#>                        0.25                        2.15 
#> InsuranceTypehospital based       InsuranceTypemilitary 
#>                        3.47                        2.33

An example figure

mosaicplot(InsuranceType ~ Shots, data=gardasil)
Mosaic plot of number of shots completed against type of insurance

Mosaic plot of number of shots completed against type of insurance

Session Information

This shows which packages and versions were used for the analysis. Note that under RStudio “Tools - Project Options” you can select “Packrat” to create a project-specific library to ensure other analysts will run the exact same version of all packages.

options(width = 100)
sessionInfo()
#> R version 3.6.2 (2017-01-27)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/travis/R-bin/lib/R/lib/libRblas.so
#> LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
#>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
#> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ProjectAsPackage_0.0.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4       rprojroot_1.3-2  crayon_1.3.4     digest_0.6.25    assertthat_0.2.1
#>  [6] MASS_7.3-51.4    R6_2.4.1         backports_1.1.5  magrittr_1.5     evaluate_0.14   
#> [11] highr_0.8        stringi_1.4.6    rlang_0.4.5      fs_1.3.2         rmarkdown_2.1   
#> [16] pkgdown_1.5.0    desc_1.2.0       tools_3.6.2      stringr_1.4.0    yaml_2.2.1      
#> [21] xfun_0.12        compiler_3.6.2   memoise_1.1.0    htmltools_0.4.0  knitr_1.28

Note, if you have devtools installed you might use devtools::session_info() to provide a more detailed and structured session info.