analysis.Rmd
For interactive / manual vignette building, setting caching instead to TRUE in the above code chunk can dramatically speed up the builds, by caching the results of each chunk. Note that the above chunk is not shown in the built version of the vignette because of the echo=FALSE
option.
First load the analysis library:
library(ProjectAsPackage)
This vignette is written in Rmarkdown. You can find details about markdown here and details about code chunks here.
Note that there are several ways to include data in your package, documented at http://r-pkgs.had.co.nz/data.html. ProjectAsPackage demonstrates two ways:
data.frame
) prepared and stored as data/gardasil.rda
in the package’s directory and documented in the file R/gardasil.R
.inst/extdata/gardasil.csv
data/gardasil.rda
:data(gardasil) class(gardasil) #> [1] "data.frame" summary(gardasil) #> Age AgeGroup Race Shots Completed #> Min. :11.00 18-26:712 white :732 Min. :1.000 no :944 #> 1st Qu.:15.00 11-17:701 black :443 1st Qu.:1.000 yes:469 #> Median :18.00 hispanic: 52 Median :2.000 #> Mean :18.55 NA's :186 Mean :2.069 #> 3rd Qu.:22.00 3rd Qu.:3.000 #> Max. :26.00 Max. :3.000 #> InsuranceType MedAssist Location #> medical assistance:275 yes: 275 Odenton :798 #> private payer :723 no :1138 White Marsh :165 #> hospital based : 84 Johns Hopkins Outpatient Center: 89 #> military :331 Bayview :361 #> #> #> LocationType PracticeType RaceSummary #> urban :450 pediatric :515 white :732 #> suburban:963 family practice:365 minority:495 #> OB-GYN :533 NA's :186 #> #> #>
The data-cleaning script used to create the data/gardasil.rda
file is traditionally provided in the data-raw/
Note that there is a help file for this dataset as well, serving as a codebook. This was generated by the file R/gardasil.R
using roxygen markup, then using roxygen2 to automatically create the documentation file man/gardasil.Rd
:
?gardasil
See the function R/read_gardasil.R
, with automatically-generated help page man/read_gardasil.Rd
.
?read_gardasil
gardasil2 <- read_gardasil() class(gardasil2) #> [1] "data.frame" summary(gardasil2) #> Age AgeGroup Race Shots Completed #> Min. :11.00 18-26:712 white :732 Min. :1.000 no :944 #> 1st Qu.:15.00 11-17:701 black :443 1st Qu.:1.000 yes:469 #> Median :18.00 hispanic : 52 Median :2.000 #> Mean :18.55 other/unknown:186 Mean :2.069 #> 3rd Qu.:22.00 3rd Qu.:3.000 #> Max. :26.00 Max. :3.000 #> InsuranceType MedAssist Location #> medical assistance:275 yes: 275 Odenton :798 #> private payer :723 no :1138 White Marsh :165 #> hospital based : 84 Johns Hopkins Outpatient Center: 89 #> military :331 Bayview :361 #> #> #> LocationType PracticeType RaceSummary #> urban :450 pediatric :515 white :732 #> suburban:963 family practice:365 minority:495 #> OB-GYN :533 NA's :186 #> #> #>
You may wish to provide arguments to this function, e.g.:
summary(gardasil2$RaceSummary) #> white minority NA's #> 732 495 186 gardasil3 <- read_gardasil(other.as.NA = TRUE) summary(gardasil3$RaceSummary) #> white minority NA's #> 732 495 186
Univariate regressions like the following reproduce Table 2 of Chou et al. Relative to public insurance, holders of private, hospital-based, and military insurance are more likely to complete the vaccine series.
fitins <- glm(Completed ~ InsuranceType, data=gardasil, family=binomial(link='logit')) summary(fitins) #> #> Call: #> glm(formula = Completed ~ InsuranceType, family = binomial(link = "logit"), #> data = gardasil) #> #> Deviance Residuals: #> Min 1Q Median 3Q Max #> -1.1173 -0.9281 -0.9281 1.4129 1.7941 #> #> Coefficients: #> Estimate Std. Error z value Pr(>|z|) #> (Intercept) -1.3863 0.1508 -9.196 < 2e-16 *** #> InsuranceTypeprivate payer 0.7670 0.1697 4.519 6.22e-06 *** #> InsuranceTypehospital based 1.2432 0.2657 4.679 2.88e-06 *** #> InsuranceTypemilitary 0.8480 0.1890 4.487 7.21e-06 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> (Dispersion parameter for binomial family taken to be 1) #> #> Null deviance: 1796.0 on 1412 degrees of freedom #> Residual deviance: 1763.1 on 1409 degrees of freedom #> AIC: 1771.1 #> #> Number of Fisher Scoring iterations: 4 round(exp(coef(fitins)), 2) #> (Intercept) InsuranceTypeprivate payer #> 0.25 2.15 #> InsuranceTypehospital based InsuranceTypemilitary #> 3.47 2.33
This shows which packages and versions were used for the analysis. Note that under RStudio “Tools - Project Options” you can select “Packrat” to create a project-specific library to ensure other analysts will run the exact same version of all packages.
options(width = 100) sessionInfo() #> R version 3.6.2 (2017-01-27) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Ubuntu 16.04.6 LTS #> #> Matrix products: default #> BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so #> LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 #> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C #> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] ProjectAsPackage_0.0.1 #> #> loaded via a namespace (and not attached): #> [1] Rcpp_1.0.4 rprojroot_1.3-2 crayon_1.3.4 digest_0.6.25 assertthat_0.2.1 #> [6] MASS_7.3-51.4 R6_2.4.1 backports_1.1.5 magrittr_1.5 evaluate_0.14 #> [11] highr_0.8 stringi_1.4.6 rlang_0.4.5 fs_1.3.2 rmarkdown_2.1 #> [16] pkgdown_1.5.0 desc_1.2.0 tools_3.6.2 stringr_1.4.0 yaml_2.2.1 #> [21] xfun_0.12 compiler_3.6.2 memoise_1.1.0 htmltools_0.4.0 knitr_1.28
Note, if you have devtools installed you might use devtools::session_info()
to provide a more detailed and structured session info.