A group of helper functions for manipulating and cleaning a MultiAssayExperiment

A set of helper functions was created to help clean and manipulate a MultiAssayExperiment object. intersectRows also works for ExperimentList objects.

complete.cases: Returns a logical vector corresponding to 'colData' rows that have data across all experiments
isEmpty: Returns a logical TRUE value for zero length MultiAssayExperiment objects
intersectRows: Takes all common rows across experiments, excludes experiments with empty rownames
intersectColumns: A wrapper for complete.cases to return a MultiAssayExperiment with only those biological units that have measurements across all experiments
replicated: Identifies, via logical vectors, colnames that originate from a single biological unit within each assay
replicates: Provides the replicate colnames found with the replicated function by their name, empty list if none
anyReplicated: Whether the assay has replicate measurements
showReplicated: Displays the actual columns that are replicated per assay and biological unit, i.e., primary value (colData rowname) in the sampleMap
mergeReplicates: A function that combines replicated / repeated measurements across all experiments and is guided by the replicated return value
longFormat: A MultiAssayExperiment method that returns a long and skinny DataFrame. The colDataCols argument allows the user to append colData columns to the data.
wideFormat: A function to reshape the data in a MultiAssayExperiment to a "wide" format DataFrame. Each row in the DataFrame represents an observation (corresponding to an entry in the colData). If replicates are present, their data will be appended at the end of the corresponding row and will generate additional NA data. It is recommended to remove or consolidate technical replicates with mergeReplicates. Optional colDataCols can be added when the original object is a MultiAssayExperiment.
hasRowRanges: A function that identifies ExperimentList elements that have a rowRanges method
getWithColData: A convenience function for extracting an assay and associated colData
renamePrimary: A convenience function to rename the primary biological units as represented in the rownames(colData)
renameColname: A convenience function to rename the colnames of a particular assay
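
The relationship between complete.cases, intersectColumns, and intersectRows can be sketched on a toy object. Everything named below (assays A and B, patients P1 to P3, features g1 to g4) is hypothetical and chosen only for illustration; the sketch assumes the MultiAssayExperiment package is installed.

```r
library(MultiAssayExperiment)

# Toy data (hypothetical): assay B has no measurement for patient P3
matA <- matrix(1:9, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("a1", "a2", "a3")))
matB <- matrix(1:6, nrow = 3,
    dimnames = list(c("g2", "g3", "g4"), c("b1", "b2")))

smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2", "P3"),
        colname = c("a1", "a2", "a3")),
    B = data.frame(primary = c("P1", "P2"),
        colname = c("b1", "b2"))
))
pheno <- data.frame(age = c(31, 45, 52), row.names = c("P1", "P2", "P3"))

toy <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = matA, B = matB)),
    colData = pheno, sampleMap = smap
)

# One logical per colData row: TRUE when the unit appears in every assay
keep <- complete.cases(toy)

# Keep only the biological units measured in all assays
complete <- intersectColumns(toy)

# Keep only the features (rows) shared by all assays
shared <- intersectRows(toy)
rownames(shared)
```

Here intersectColumns filters on the column (patient) axis and intersectRows on the row (feature) axis, so the two can be combined when a fully rectangular data set is needed.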

Usage

# S4 method for class 'MultiAssayExperiment'
complete.cases(...)

# S4 method for class 'MultiAssayExperiment'
isEmpty(x)

intersectRows(x)

intersectColumns(x)

replicated(x)

# S4 method for class 'MultiAssayExperiment'
replicated(x)

anyReplicated(x)

# S4 method for class 'MultiAssayExperiment'
anyReplicated(x)

showReplicated(x)

# S4 method for class 'MultiAssayExperiment'
showReplicated(x)

replicates(x, ...)

# S4 method for class 'MultiAssayExperiment'
replicates(x, ...)

mergeReplicates(x, replicates = list(), simplify = BiocGenerics::mean, ...)

# S4 method for class 'MultiAssayExperiment'
mergeReplicates(
  x,
  replicates = replicated(x),
  simplify = BiocGenerics::mean,
  ...
)

# S4 method for class 'ANY'
mergeReplicates(x, replicates = list(), simplify = BiocGenerics::mean, ...)

longFormat(object, colDataCols = NULL, i = 1L)

wideFormat(
  object,
  colDataCols = NULL,
  check.names = TRUE,
  collapse = "_",
  i = 1L
)

hasRowRanges(x)

# S4 method for class 'MultiAssayExperiment'
hasRowRanges(x)

# S4 method for class 'ExperimentList'
hasRowRanges(x)

getWithColData(x, i, mode = c("append", "replace"), verbose = FALSE)

renamePrimary(x, value)

renameColname(x, i, value)

splitAssays(x, hitList)

# S4 method for class 'MultiAssayExperiment'
splitAssays(x, hitList)

makeHitList(x, patternList)

Arguments

...: Additional arguments. See details for more information.
x: A MultiAssayExperiment or ExperimentList
replicates: A list of LogicalLists indicating multiple / duplicate entries for each biological unit per assay, see replicated (default replicated(x)).
simplify: A function for merging repeat measurements in experiments as indicated by the replicated method for MultiAssayExperiment
object: Any supported class object
colDataCols: A character, logical, or numeric index for colData columns to be included
i: longFormat: The i-th assay in SummarizedExperiment-like objects. A vector input is supported in the case that the SummarizedExperiment object(s) has more than one assay (default 1L); renameColname: Either a numeric or character index indicating the assay whose colnames are to be renamed
check.names: (logical default TRUE) Column names of the output DataFrame will be checked for syntactic validity and made unique, if necessary
collapse: (character default "_") A single string delimiter for output column names. In wideFormat, experiments and rownames (and when replicate samples are present, colnames) are separated by this delimiter
mode: String indicating how MultiAssayExperiment column-level metadata should be added to the SummarizedExperiment colData.
verbose: logical(1) Whether to suppressMessages on subsetting operations in getWithColData (default FALSE)
value: renamePrimary: A character vector of the same length as the existing rownames(colData) to use for replacement; renameColname: A CharacterList or list with matching lengths to replace colnames(x)
hitList: a named list or List of logical vectors that indicate groupings in the assays
patternList: a named list or List of atomic character vectors that are the input to grepl for identifying groupings in the assays

Value

See the itemized list in the description section for details.

Details

The replicated function finds replicate measurements in each assay and returns a list of LogicalLists. Each element in a single LogicalList corresponds to a biological or primary unit as in the sampleMap. Below is a small graphic for one particular biological unit in one assay, where the logical vector corresponds to the number of measurements/samples in the assay:


 >      replicated(MultiAssayExperiment)
 (list str)       '-- $ AssayName
 (LogicalList str)      '-- [[ "Biological Unit" ]]
 Replicated if sum(...) > 1          '-- TRUE TRUE FALSE FALSE

anyReplicated determines if any of the assays have at least one replicate. Note: these methods are not available for the ExperimentList class due to a missing sampleMap structure (by design). showReplicated returns a list of CharacterLists where each element corresponds to the biological or primary units that are replicated in that assay element. The values in the inner list are the colnames in the assay that are technical replicates.

The replicates function (noun) returns the colnames from the sampleMap that were identified as replicates. It returns a list of CharacterLists for each assay present in the MultiAssayExperiment and an inner entry for each biological unit that has replicate observations in that assay.
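
A minimal sketch of the replicate helpers follows. The toy object (assay A, patients P1 to P3, columns s1 to s4) is hypothetical; patient P1 is deliberately mapped to two columns so that the assay contains one replicated unit.

```r
library(MultiAssayExperiment)

# Toy data (hypothetical): patient P1 was measured twice (columns s1 and s2)
mat <- matrix(1:8, nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2", "s3", "s4")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P1", "P2", "P3"),
        colname = c("s1", "s2", "s3", "s4"))
))
pheno <- data.frame(age = c(31, 45, 52), row.names = c("P1", "P2", "P3"))

toy <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = mat)),
    colData = pheno, sampleMap = smap
)

# Named logical vector: does each assay contain any replicated unit?
anyRep <- anyReplicated(toy)

# By assay, logical vectors marking which columns belong to a replicated unit
repFlags <- replicated(toy)

# By assay, the colnames of the replicate measurements
repNames <- replicates(toy)
```

The logical output of replicated is what mergeReplicates consumes, while replicates and showReplicated are meant for inspection.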

The mergeReplicates function is a house-keeping method for a MultiAssayExperiment where only complete.cases are returned. This by-assay operation averages replicate measurements (by default) and columns are aligned by the row order in colData. Users can provide their own function for merging replicates with the simplify functional argument. Additional inputs ... are sent to the 'simplify' function.

The mergeReplicates "ANY" method consolidates duplicate measurements for rectangular data structures and returns an object of the same class (endomorphic). The ellipsis or ... argument allows the user to provide additional arguments to the simplify functional argument.

The longFormat "ANY" class method works with classes such as ExpressionSet and SummarizedExperiment as well as matrix to provide a consistent long and skinny DataFrame.

The hasRowRanges method identifies assays that support a rowRanges method and returns a GRanges object.

mergeReplicates

The mergeReplicates function makes use of the output from replicated, which points out the duplicate measurements by biological unit in the MultiAssayExperiment. This function returns a MultiAssayExperiment with merged replicates. Additional arguments can be passed to the simplify function via the ellipsis (...). The merged column keeps the name of the first appearing replicate; for example, when replicates "TCGA-B" and "TCGA-A" are found in the assay, the name of the first appearing replicate is taken (i.e., "B"). Note that a typical use case of merging replicates occurs when there are multiple measurements on the same sample (within the same assay), which can therefore be averaged.
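
The merging step can be sketched as follows, again on a hypothetical toy object in which patient P1 has two measurements. The use of stats::median as an alternative simplify function is an illustrative choice, not a recommendation from the package.

```r
library(MultiAssayExperiment)

# Toy data (hypothetical): columns s1 and s2 both belong to patient P1
mat <- matrix(1:8, nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2", "s3", "s4")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P1", "P2", "P3"),
        colname = c("s1", "s2", "s3", "s4"))
))
pheno <- data.frame(age = c(31, 45, 52), row.names = c("P1", "P2", "P3"))

toy <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = mat)),
    colData = pheno, sampleMap = smap
)

# Default behaviour: replicate columns are averaged within each assay
merged <- mergeReplicates(toy)

# Alternative (illustrative): use the median; na.rm is forwarded to the
# simplify function through the ... argument
mergedMedian <- mergeReplicates(toy, simplify = stats::median, na.rm = TRUE)

# After merging, no assay should report replicates
anyReplicated(merged)
```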

longFormat

The 'longFormat' method takes data from the ExperimentList in a MultiAssayExperiment and returns a uniform DataFrame. The resulting DataFrame has columns indicating primary, rowname, colname and value. This method can optionally include columns of the MultiAssayExperiment colData named by the colDataCols character vector argument (MultiAssayExperiment method only). The i argument allows the user to specify the assay value for the SummarizedExperiment assay function's i argument.

wideFormat

The wideFormat function returns a standardized wide DataFrame where each row represents a biological unit as in the colData. Depending on the data and setup, biological units can be patients, tumors, specimens, etc. Metadata columns are generated based on the names produced in the wide format DataFrame. These can be accessed via the mcols() function. See the longFormat section for a description of the colDataCols and i arguments.
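
The two reshaping helpers can be compared side by side on a small object without replicates. The object and the age column are hypothetical; the sketch only assumes the signatures shown in the Usage section.

```r
library(MultiAssayExperiment)

# Toy data (hypothetical): one assay, three patients, no replicates
mat <- matrix(1:6, nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2", "P3"),
        colname = c("s1", "s2", "s3"))
))
pheno <- data.frame(age = c(31, 45, 52), row.names = c("P1", "P2", "P3"))

toy <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = mat)),
    colData = pheno, sampleMap = smap
)

# Long: one row per (feature, sample) measurement, with 'age' appended
long <- longFormat(toy, colDataCols = "age")

# Wide: one row per biological unit; assay and rowname are joined by 'collapse'
wide <- wideFormat(toy, colDataCols = "age", collapse = "_")

# Metadata describing where each wide column came from
S4Vectors::mcols(wide)
```

The long form tends to suit plotting and grouped summaries, while the wide form is closer to the one-row-per-patient layout expected by most modelling functions.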

hasRowRanges

The hasRowRanges method identifies assays with associated ranged row data by directly testing the method on the object. The result from the test must be a GRanges class object to satisfy the test.
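
As a small illustration of this test, the sketch below builds an ExperimentList holding a plain matrix and a ranged SummarizedExperiment. The element names plain and ranged and the toy coordinates are hypothetical.

```r
library(MultiAssayExperiment)

# A plain matrix carries no genomic ranges
plain <- matrix(1:6, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# A ranged SummarizedExperiment built from a data.frame with
# chr/start/end/strand columns; the remaining columns become the assay
rangeDF <- data.frame(chr = "chr2", start = 11:13, end = 12:14,
    strand = c("+", "-", "*"), s1 = c(0, 1, 1), s2 = c(1, 0, 1),
    row.names = c("r1", "r2", "r3"))
ranged <- SummarizedExperiment::makeSummarizedExperimentFromDataFrame(rangeDF)

el <- ExperimentList(list(plain = plain, ranged = ranged))

# TRUE only for elements whose rowRanges() result is a GRanges
has <- hasRowRanges(el)
```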

getWithColData

The getWithColData function allows the user to conveniently extract a particular assay as indicated by the i index argument. It will also attempt to provide the colData along with the extracted object using the colData<- replacement method when possible. Typically, this method is available for SummarizedExperiment and RaggedExperiment classes.

The setting of mode determines how the colData is added. If mode="append", the MultiAssayExperiment metadata is appended onto that of the SummarizedExperiment. If any fields are duplicated by name, the values in the SummarizedExperiment are retained, with a warning emitted if the values are different. For mode="replace", the MultiAssayExperiment metadata replaces that of the SummarizedExperiment, while for mode="none", no replacement or appending is performed.
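
The difference between the append and replace modes can be sketched with a SummarizedExperiment that already carries its own colData. The column names batch and age and the object names are hypothetical.

```r
library(MultiAssayExperiment)

# Assay-level object with its own column metadata ('batch')
counts <- matrix(1:6, nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = counts),
    colData = data.frame(batch = c("x", "y", "x"),
        row.names = c("s1", "s2", "s3"))
)

# MultiAssayExperiment-level metadata ('age'), keyed by patient
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2", "P3"),
        colname = c("s1", "s2", "s3"))
))
pheno <- data.frame(age = c(31, 45, 52), row.names = c("P1", "P2", "P3"))

toy <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = se)),
    colData = pheno, sampleMap = smap
)

# append: assay-level columns are kept and MultiAssayExperiment columns added
appended <- getWithColData(toy, "A", mode = "append")

# replace: the MultiAssayExperiment colData takes the place of the assay's own
replaced <- getWithColData(toy, "A", mode = "replace")

colnames(colData(appended))
colnames(colData(replaced))
```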

rename*

The renamePrimary function allows the user to conveniently change the actual names of the primary biological units as seen in rownames(colData). renameColname allows the user to change the names of a particular assay based on index i. i can either be a single numeric or character value. See colnames<- method for renaming multiple colnames in a MultiAssayExperiment.

splitAssays

The splitAssays method separates columns in each of the assays based on the hitList input. The hitList can be generated using the makeHitList helper function. To use the makeHitList helper, the user should input a list of patterns that will match on the column names of each assay. These matches should be mutually exclusive so as to avoid repetition of columns across assays. See the examples section.

Examples


example(MultiAssayExperiment)
#> 
#> MltAsE> ## Run the example ExperimentList
#> MltAsE> example("ExperimentList")
#> 
#> ExprmL> ## Create an empty ExperimentList instance
#> ExprmL> ExperimentList()
#> ExperimentList class object of length 0:
#>  
#> ExprmL> ## Create array matrix and AnnotatedDataFrame to create an ExpressionSet class
#> ExprmL> arraydat <- matrix(data = seq(101, length.out = 20), ncol = 4,
#> ExprmL+     dimnames = list(
#> ExprmL+         c("ENST00000294241", "ENST00000355076",
#> ExprmL+         "ENST00000383706","ENST00000234812", "ENST00000383323"),
#> ExprmL+         c("array1", "array2", "array3", "array4")
#> ExprmL+     ))
#> 
#> ExprmL> colDat <- data.frame(slope53 = rnorm(4),
#> ExprmL+     row.names = c("array1", "array2", "array3", "array4"))
#> 
#> ExprmL> ## SummarizedExperiment constructor
#> ExprmL> exprdat <- SummarizedExperiment::SummarizedExperiment(arraydat,
#> ExprmL+     colData = colDat)
#> 
#> ExprmL> ## Create a sample methylation dataset
#> ExprmL> methyldat <- matrix(data = seq(1, length.out = 25), ncol = 5,
#> ExprmL+     dimnames = list(
#> ExprmL+         c("ENST00000355076", "ENST00000383706",
#> ExprmL+           "ENST00000383323", "ENST00000234812", "ENST00000294241"),
#> ExprmL+         c("methyl1", "methyl2", "methyl3",
#> ExprmL+           "methyl4", "methyl5")
#> ExprmL+     ))
#> 
#> ExprmL> ## Create a sample RNASeqGene dataset
#> ExprmL> rnadat <- matrix(
#> ExprmL+     data = sample(c(46851, 5, 19, 13, 2197, 507,
#> ExprmL+         84318, 126, 17, 21, 23979, 614), size = 20, replace = TRUE),
#> ExprmL+     ncol = 4,
#> ExprmL+     dimnames = list(
#> ExprmL+         c("XIST", "RPS4Y1", "KDM5D", "ENST00000383323", "ENST00000234812"),
#> ExprmL+         c("samparray1", "samparray2", "samparray3", "samparray4")
#> ExprmL+     ))
#> 
#> ExprmL> ## Create a mock RangedSummarizedExperiment from a data.frame
#> ExprmL> rangedat <- data.frame(chr="chr2", start = 11:15, end = 12:16,
#> ExprmL+     strand = c("+", "-", "+", "*", "."),
#> ExprmL+     samp0 = c(0,0,1,1,1), samp1 = c(1,0,1,0,1), samp2 = c(0,1,0,1,0),
#> ExprmL+     row.names = c(paste0("ENST", "00000", 135411:135414), "ENST00000383323"))
#> 
#> ExprmL> rangeSE <- SummarizedExperiment::makeSummarizedExperimentFromDataFrame(rangedat)
#> 
#> ExprmL> ## Combine to a named list and call the ExperimentList constructor function
#> ExprmL> assayList <- list(Affy = exprdat, Methyl450k = methyldat, RNASeqGene = rnadat,
#> ExprmL+                 GISTIC = rangeSE)
#> 
#> ExprmL> ## Use the ExperimentList constructor
#> ExprmL> ExpList <- ExperimentList(assayList)
#> 
#> MltAsE> ## Create sample maps for each experiment
#> MltAsE> exprmap <- data.frame(
#> MltAsE+     primary = c("Jack", "Jill", "Barbara", "Bob"),
#> MltAsE+     colname = c("array1", "array2", "array3", "array4"),
#> MltAsE+     stringsAsFactors = FALSE)
#> 
#> MltAsE> methylmap <- data.frame(
#> MltAsE+     primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
#> MltAsE+     colname = c("methyl1", "methyl2", "methyl3", "methyl4", "methyl5"),
#> MltAsE+     stringsAsFactors = FALSE)
#> 
#> MltAsE> rnamap <- data.frame(
#> MltAsE+     primary = c("Jack", "Jill", "Bob", "Barbara"),
#> MltAsE+     colname = c("samparray1", "samparray2", "samparray3", "samparray4"),
#> MltAsE+     stringsAsFactors = FALSE)
#> 
#> MltAsE> gistmap <- data.frame(
#> MltAsE+     primary = c("Jack", "Bob", "Jill"),
#> MltAsE+     colname = c("samp0", "samp1", "samp2"),
#> MltAsE+     stringsAsFactors = FALSE)
#> 
#> MltAsE> ## Combine as a named list and convert to a DataFrame
#> MltAsE> maplist <- list(Affy = exprmap, Methyl450k = methylmap,
#> MltAsE+     RNASeqGene = rnamap, GISTIC = gistmap)
#> 
#> MltAsE> ## Create a sampleMap
#> MltAsE> sampMap <- listToMap(maplist)
#> 
#> MltAsE> ## Create an example phenotype data
#> MltAsE> colDat <- data.frame(sex = c("M", "F", "M", "F"), age = 38:41,
#> MltAsE+     row.names = c("Jack", "Jill", "Bob", "Barbara"))
#> 
#> MltAsE> ## Create a MultiAssayExperiment instance
#> MltAsE> mae <- MultiAssayExperiment(experiments = ExpList, colData = colDat,
#> MltAsE+     sampleMap = sampMap)

complete.cases(mae)
#> [1]  TRUE  TRUE  TRUE FALSE

isEmpty(MultiAssayExperiment())
#> [1] TRUE


## renaming biological units (primary)

mae2 <- renamePrimary(mae, paste0("pt", 1:4))
colData(mae2)
#> DataFrame with 4 rows and 2 columns
#>             sex       age
#>     <character> <integer>
#> pt1           M        38
#> pt2           F        39
#> pt3           M        40
#> pt4           F        41
sampleMap(mae2)
#> DataFrame with 16 rows and 3 columns
#>          assay     primary     colname
#>       <factor> <character> <character>
#> 1   Affy               pt1      array1
#> 2   Affy               pt2      array2
#> 3   Affy               pt4      array3
#> 4   Affy               pt3      array4
#> 5   Methyl450k         pt1     methyl1
#> ...        ...         ...         ...
#> 12  RNASeqGene         pt3  samparray3
#> 13  RNASeqGene         pt4  samparray4
#> 14  GISTIC             pt1       samp0
#> 15  GISTIC             pt3       samp1
#> 16  GISTIC             pt2       samp2


## renaming observational units (colname)

mae2 <- renameColname(mae, i = "Affy", paste0("ARRAY", 1:4))
colnames(mae2)
#> CharacterList of length 4
#> [["Affy"]] ARRAY1 ARRAY2 ARRAY3 ARRAY4
#> [["Methyl450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
#> [["RNASeqGene"]] samparray1 samparray2 samparray3 samparray4
#> [["GISTIC"]] samp0 samp1 samp2
sampleMap(mae2)
#> DataFrame with 16 rows and 3 columns
#>          assay     primary     colname
#>       <factor> <character> <character>
#> 1   Affy              Jack      ARRAY1
#> 2   Affy              Jill      ARRAY2
#> 3   Affy           Barbara      ARRAY3
#> 4   Affy               Bob      ARRAY4
#> 5   Methyl450k        Jack     methyl1
#> ...        ...         ...         ...
#> 12  RNASeqGene         Bob  samparray3
#> 13  RNASeqGene     Barbara  samparray4
#> 14  GISTIC            Jack       samp0
#> 15  GISTIC             Bob       samp1
#> 16  GISTIC            Jill       samp2



patts <- list(
    normals = "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-11",
    tumors = "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-01"
)

data("miniACC")

hits <- makeHitList(miniACC, patts)

## only tumors present
splitAssays(miniACC, hits)
#> A MultiAssayExperiment object of 5 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 5:
#>  [1] tumors_RNASeq2GeneNorm: SummarizedExperiment with 198 rows and 79 columns
#>  [2] tumors_gistict: SummarizedExperiment with 198 rows and 90 columns
#>  [3] tumors_RPPAArray: SummarizedExperiment with 33 rows and 46 columns
#>  [4] tumors_Mutations: matrix with 97 rows and 90 columns
#>  [5] tumors_miRNASeqGene: SummarizedExperiment with 471 rows and 80 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files