A group of helper functions for manipulating and cleaning a MultiAssayExperiment
Source: R/MultiAssayExperiment-helpers.R
MultiAssayExperiment-helpers.Rd
A set of helper functions was created to help clean and
manipulate a MultiAssayExperiment object. intersectRows also works
for ExperimentList objects.
- complete.cases: Returns a logical vector corresponding to 'colData' rows that have data across all experiments
- isEmpty: Returns a logical TRUE value for zero-length MultiAssayExperiment objects
- intersectRows: Takes all common rows across experiments, excludes experiments with empty rownames
- intersectColumns: A wrapper for complete.cases to return a MultiAssayExperiment with only those biological units that have measurements across all experiments
- replicated: Identifies, via logical vectors, colnames that originate from a single biological unit within each assay
- replicates: Provides the replicate colnames found with the replicated function by their name, empty list if none
- anyReplicated: Whether the assay has replicate measurements
- showReplicated: Displays the actual columns that are replicated per assay and biological unit, i.e., primary value (colData rowname) in the sampleMap
- mergeReplicates: A function that combines replicated / repeated measurements across all experiments and is guided by the replicated return value
- longForm: A MultiAssayExperiment method that returns a long and skinny DataFrame. The colDataCols argument allows the user to append colData columns to the data.
- wideFormat: A function to reshape the data in a MultiAssayExperiment to a "wide" format DataFrame. Each row in the DataFrame represents an observation (corresponding to an entry in the colData). If replicates are present, their data will be appended at the end of the corresponding row and will generate additional NA data. It is recommended to remove or consolidate technical replicates with mergeReplicates. Optional colDataCols can be added when the original object is a MultiAssayExperiment.
- hasRowRanges: A function that identifies ExperimentList elements that have a rowRanges method
- hasRowData: A function that identifies ExperimentList elements that have a rowData method
- getWithColData: A convenience function for extracting an assay and associated colData
- renamePrimary: A convenience function to rename the primary biological units as represented in the rownames(colData)
- renameColname: A convenience function to rename the colnames of a particular assay
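As a minimal sketch of the cleaning helpers listed above, the following builds a toy MultiAssayExperiment and applies complete.cases, intersectColumns, and intersectRows. The assay names, patient IDs, and values are hypothetical and chosen only for illustration.

```r
library(MultiAssayExperiment)

# Two toy assays (hypothetical data): "B" lacks patient P3 and feature g3
A <- matrix(1:9, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("a1", "a2", "a3")))
B <- matrix(1:4, nrow = 2,
    dimnames = list(c("g1", "g2"), c("b1", "b2")))

# Map each assay column to its biological unit (primary)
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2", "P3"),
        colname = c("a1", "a2", "a3")),
    B = data.frame(primary = c("P1", "P2"), colname = c("b1", "b2"))
))
cdat <- data.frame(age = c(30, 40, 50), row.names = c("P1", "P2", "P3"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = A, B = B)),
    colData = cdat, sampleMap = smap
)

complete.cases(mae)            # one logical value per colData row
full <- intersectColumns(mae)  # keep units measured in every assay
common <- intersectRows(mae)   # keep features shared by every assay
```

Here P3 has no column in assay B, so it is the unit dropped by intersectColumns, while intersectRows drops the feature g3 that only assay A contains.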
Usage
# S4 method for class 'MultiAssayExperiment'
complete.cases(...)
# S4 method for class 'MultiAssayExperiment'
isEmpty(x)
intersectRows(x)
intersectColumns(x)
replicated(x)
# S4 method for class 'MultiAssayExperiment'
replicated(x)
anyReplicated(x)
# S4 method for class 'MultiAssayExperiment'
anyReplicated(x)
showReplicated(x)
# S4 method for class 'MultiAssayExperiment'
showReplicated(x)
replicates(x, ...)
# S4 method for class 'MultiAssayExperiment'
replicates(x, ...)
mergeReplicates(x, replicates = list(), simplify = BiocGenerics::mean, ...)
# S4 method for class 'MultiAssayExperiment'
mergeReplicates(
x,
replicates = replicated(x),
simplify = BiocGenerics::mean,
...
)
# S4 method for class 'ANY'
mergeReplicates(x, replicates = list(), simplify = BiocGenerics::mean, ...)
# S4 method for class 'MultiAssayExperiment'
longForm(object, colDataCols = NULL, i = 1L, ...)
# S4 method for class 'ExperimentList'
longForm(object, colDataCols, i = 1L, ...)
# S4 method for class 'ANY'
longForm(object, colDataCols, i = 1L, ...)
wideFormat(
object,
colDataCols = NULL,
check.names = TRUE,
collapse = "_",
i = 1L
)
hasRowRanges(x)
# S4 method for class 'MultiAssayExperiment'
hasRowRanges(x)
# S4 method for class 'ExperimentList'
hasRowRanges(x)
hasRowData(x)
# S4 method for class 'MultiAssayExperiment'
hasRowData(x)
# S4 method for class 'ExperimentList'
hasRowData(x)
getWithColData(x, i, mode = c("append", "replace"), verbose = FALSE)
renamePrimary(x, value)
renameColname(x, i, value)
splitAssays(x, hitList)
# S4 method for class 'MultiAssayExperiment'
splitAssays(x, hitList)
makeHitList(x, patternList)
Arguments
- ...
Additional arguments. See details for more information.
- x
A MultiAssayExperiment or ExperimentList
- replicates
A list of LogicalLists indicating multiple / duplicate entries for each biological unit per assay, see replicated (default replicated(x)).
- simplify
A function for merging repeat measurements in experiments as indicated by the replicated method for MultiAssayExperiment
- object
Any supported class object
- colDataCols
A character, logical, or numeric index for colData columns to be included
- i
longForm: The i-th assay in SummarizedExperiment-like objects. A vector input is supported in the case that the SummarizedExperiment object(s) has more than one assay (default 1L). renameColname: Either a numeric or character index indicating the assay whose colnames are to be renamed
- check.names
logical(1) Column names of the output DataFrame will be run through make.names to ensure syntactic validity (default TRUE).
- collapse
character(1) A single string delimiter (default "_") for output column names. In wideFormat, experiments and rownames (and when replicate samples are present, colnames) are separated by this delimiter
- mode
String indicating how MultiAssayExperiment column-level metadata should be added to the SummarizedExperiment colData.
- verbose
logical(1) Whether to suppressMessages on subsetting operations in getWithColData (default FALSE)
- value
renamePrimary: A character vector of the same length as the existing rownames(colData) to use for replacement. renameColname: A CharacterList or list with matching lengths to replace colnames(x)
- hitList
A named list or List of logical vectors that indicate groupings in the assays
- patternList
A named list or List of atomic character vectors that are the input to grepl for identifying groupings in the assays
Details
The replicated function finds replicate measurements in each
assay and returns a list of LogicalLists.
Each element in a single LogicalList corresponds to
a biological or primary unit as in the sampleMap. Below is a
small graphic for one particular biological unit in one assay, where the
logical vector corresponds to the number of measurements/samples in the
assay:
 > replicated(MultiAssayExperiment)
 (list str)                  '-- $ AssayName
 (LogicalList str)               '-- [[ "Biological Unit" ]]
 Replicated if sum(...) > 1          '-- TRUE TRUE FALSE FALSE
anyReplicated determines if any of the assays have at least one
replicate. Note. These methods are not available for the
ExperimentList class due to a missing sampleMap structure
(by design).
showReplicated returns a list of CharacterLists
where each element corresponds to the biological or primary units that
are replicated in that assay element. The values in the inner list are
the colnames in the assay that are technical replicates.
The replicates function (noun) returns the colnames
from the sampleMap that were identified as replicates. It returns a
list of CharacterLists for each assay present in
the MultiAssayExperiment and an inner entry for each biological unit that
has replicate observations in that assay.
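The replicate-finding functions described above can be sketched on a toy object in which one patient is measured twice in the same assay. All names and values here (assay "A", patients "P1" and "P2") are hypothetical.

```r
library(MultiAssayExperiment)

# Toy assay (hypothetical data): patient P1 was measured twice (a1, a2)
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
    dimnames = list(c("g1", "g2"), c("a1", "a2", "a3")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P1", "P2"),
        colname = c("a1", "a2", "a3"))
))
cdat <- data.frame(age = c(30, 40), row.names = c("P1", "P2"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = A)),
    colData = cdat, sampleMap = smap
)

reps <- replicated(mae)   # list of LogicalLists, one per assay
reps[["A"]][["P1"]]       # TRUE marks the assay columns that belong to P1
anyReplicated(mae)        # one logical value per assay
showReplicated(mae)       # colnames of the replicated columns
```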
The mergeReplicates function is a house-keeping method
for a MultiAssayExperiment where only complete.cases are
returned. This by-assay operation averages replicate measurements
(by default) and columns are aligned by the row order in colData.
Users can provide their own function for merging replicates with the
simplify functional argument. Additional inputs ... are
sent to the 'simplify' function.
The mergeReplicates "ANY" method consolidates duplicate
measurements for rectangular data structures and returns an object of the same
class (endomorphic). The ellipsis or ... argument allows the
user to provide additional arguments to the simplify functional
argument.
The longForm "ANY" class method works with classes such as
ExpressionSet and
SummarizedExperiment as
well as matrix to provide a consistent long and skinny
DataFrame.
The hasRowRanges method identifies assays that support
a rowRanges
method and return a GRanges object from it.
The hasRowData method identifies assays that support a
rowData method and
return a DataFrame object from it.
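A small sketch of both checks on an ExperimentList that mixes a plain matrix with a ranged SummarizedExperiment. The element names "plain" and "ranged" and the genomic coordinates are hypothetical.

```r
library(MultiAssayExperiment)

# A plain matrix carries neither row ranges nor rowData
plain <- matrix(1:4, nrow = 2,
    dimnames = list(c("g1", "g2"), c("p1", "p2")))

# A ranged SummarizedExperiment built from a data.frame
# (hypothetical coordinates; s1 and s2 become the assay columns)
ranged <- SummarizedExperiment::makeSummarizedExperimentFromDataFrame(
    data.frame(chr = "chr1", start = c(1L, 10L), end = c(5L, 20L),
        s1 = c(0, 1), s2 = c(1, 0), row.names = c("g1", "g2"))
)

el <- ExperimentList(list(plain = plain, ranged = ranged))

hasRowRanges(el)   # which elements return a GRanges from rowRanges()
hasRowData(el)     # which elements return a DataFrame from rowData()
```

The GRanges requirement matters for the matrix element: a generic rowRanges call on a matrix may succeed but return per-row minima and maxima, which is not ranged row data.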
Functions
- hasRowData(MultiAssayExperiment): The hasRowData method identifies experiments that have a rowData method via direct testing
- hasRowData(ExperimentList): The hasRowData method identifies experiments that have a rowData method via direct testing
mergeReplicates
The mergeReplicates function makes use of the output from
replicated which will point out the duplicate measurements by
biological unit in the MultiAssayExperiment. This function will
return a MultiAssayExperiment with merged replicates. Additional
arguments can be provided to the simplify argument via the ellipsis
(...). Each merged column takes the name of the first replicate that
appears in the assay. For example, when replicates "TCGA-B" and "TCGA-A"
are found in that order, the name of the first one is taken (i.e., "B").
Note that a typical use case of merging replicates occurs when there are
multiple measurements on the same sample (within the same assay),
which can therefore be averaged.
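A minimal sketch of merging with the default simplify function, assuming a toy assay in which hypothetical patient "P1" contributes two columns:

```r
library(MultiAssayExperiment)

# Toy assay (hypothetical data): columns a1 and a2 both come from P1
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
    dimnames = list(c("g1", "g2"), c("a1", "a2", "a3")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P1", "P2"),
        colname = c("a1", "a2", "a3"))
))
cdat <- data.frame(age = c(30, 40), row.names = c("P1", "P2"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = A)),
    colData = cdat, sampleMap = smap
)

# Average the replicate columns of each biological unit (the default)
merged <- mergeReplicates(mae)

merged[["A"]]           # one column per biological unit remains
anyReplicated(merged)   # no assay should report replicates any more
```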
longForm
The 'longForm' method takes data from the ExperimentList
in a MultiAssayExperiment and returns a uniform
DataFrame. The resulting DataFrame has columns indicating
primary, rowname, colname and value. For the MultiAssayExperiment method
only, the colDataCols argument can name colData columns to include
in the output. The i argument
is passed on as the i argument of the
SummarizedExperiment assay function, selecting which assay to extract.
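As a sketch of the long format, assuming a toy object with two matrix assays and a hypothetical colData column named "age":

```r
library(MultiAssayExperiment)

# Two toy matrix assays measured on the same two patients (hypothetical data)
A <- matrix(1:4, nrow = 2,
    dimnames = list(c("g1", "g2"), c("a1", "a2")))
B <- matrix(5:8, nrow = 2,
    dimnames = list(c("g1", "g2"), c("b1", "b2")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2"), colname = c("a1", "a2")),
    B = data.frame(primary = c("P1", "P2"), colname = c("b1", "b2"))
))
cdat <- data.frame(age = c(30, 40), row.names = c("P1", "P2"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = A, B = B)),
    colData = cdat, sampleMap = smap
)

# One row per measured value, with the "age" column of colData appended
lf <- longForm(mae, colDataCols = "age")
lf
```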
wideFormat
The wideFormat function returns standardized wide DataFrame
where each row represents a biological unit as in the colData.
Depending on the data and setup, biological units can be patients, tumors,
specimens, etc. Metadata columns are
generated based on the names produced in the wide format
DataFrame. These can be accessed via the
mcols() function.
See the longForm section for description of the colDataCols and
i arguments.
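A sketch of the wide format on a toy object without replicates. The assay names, patient IDs, and the "age" column are hypothetical.

```r
library(MultiAssayExperiment)

# Two toy matrix assays measured on the same two patients (hypothetical data)
A <- matrix(1:4, nrow = 2,
    dimnames = list(c("g1", "g2"), c("a1", "a2")))
B <- matrix(5:8, nrow = 2,
    dimnames = list(c("g1", "g2"), c("b1", "b2")))
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2"), colname = c("a1", "a2")),
    B = data.frame(primary = c("P1", "P2"), colname = c("b1", "b2"))
))
cdat <- data.frame(age = c(30, 40), row.names = c("P1", "P2"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = A, B = B)),
    colData = cdat, sampleMap = smap
)

# One row per biological unit; value columns are named <assay>_<rowname>
wf <- wideFormat(mae, colDataCols = "age", collapse = "_")
wf
S4Vectors::mcols(wf)   # metadata describing the generated columns
```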
hasRowRanges
The hasRowRanges method identifies assays with associated ranged
row data by directly testing the method on the object. The result from the
test must be a GRanges class object to
satisfy the test.
getWithColData
The getWithColData function allows the user to conveniently extract
a particular assay as indicated by the i index argument. It
will also attempt to provide the
colData
along with the extracted object using the colData<- replacement
method when possible. Typically, this method is available for
SummarizedExperiment
and RaggedExperiment classes.
The setting of mode determines how the colData
is added. If mode="append", the MultiAssayExperiment
metadata is appended onto that of the SummarizedExperiment.
If any fields are duplicated by name, the values in the
SummarizedExperiment are retained, with a warning emitted if
the values are different. For mode="replace", the
MultiAssayExperiment metadata replaces that of the
SummarizedExperiment, while for mode="none",
no replacement or appending is performed.
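A minimal sketch of extracting one assay together with the patient-level metadata, assuming a toy SummarizedExperiment assay named "A" and a hypothetical colData column "age":

```r
library(MultiAssayExperiment)

# Toy SummarizedExperiment with empty colData (hypothetical data)
se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = matrix(1:4, nrow = 2,
        dimnames = list(c("g1", "g2"), c("a1", "a2"))))
)
smap <- listToMap(list(
    A = data.frame(primary = c("P1", "P2"), colname = c("a1", "a2"))
))
cdat <- data.frame(age = c(30, 40), row.names = c("P1", "P2"))

mae <- MultiAssayExperiment(
    experiments = ExperimentList(list(A = se)),
    colData = cdat, sampleMap = smap
)

# Extract assay "A" and append the MultiAssayExperiment colData to it
out <- getWithColData(mae, "A", mode = "append")
SummarizedExperiment::colData(out)
```

The returned object is the assay itself, so it can be passed to tools that expect a SummarizedExperiment with sample annotations attached.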
rename*
The renamePrimary function allows the user to conveniently change the
actual names of the primary biological units as seen in
rownames(colData). renameColname allows the user to change the
names of a particular assay based on index i. i can either be
a single numeric or character value. See colnames<- method for
renaming multiple colnames in a MultiAssayExperiment.
splitAssays
The splitAssays method separates columns in each of the assays based
on the hitList input. The hitList can be generated using
the makeHitList helper function. To use the makeHitList
helper, the user should input a list of patterns that will match on the
column names of each assay. These matches should be mutually exclusive as
to avoid repetition of columns across assays. See the examples section.
Examples
example(MultiAssayExperiment)
#>
#> MltAsE> ## Run the example ExperimentList
#> MltAsE> example("ExperimentList")
#>
#> ExprmL> ## Create an empty ExperimentList instance
#> ExprmL> ExperimentList()
#> ExperimentList class object of length 0:
#>
#> ExprmL> ## Create array matrix and AnnotatedDataFrame to create an ExpressionSet class
#> ExprmL> arraydat <- matrix(data = seq(101, length.out = 20), ncol = 4,
#> ExprmL+ dimnames = list(
#> ExprmL+ c("ENST00000294241", "ENST00000355076",
#> ExprmL+ "ENST00000383706","ENST00000234812", "ENST00000383323"),
#> ExprmL+ c("array1", "array2", "array3", "array4")
#> ExprmL+ ))
#>
#> ExprmL> colDat <- data.frame(slope53 = rnorm(4),
#> ExprmL+ row.names = c("array1", "array2", "array3", "array4"))
#>
#> ExprmL> ## SummarizedExperiment constructor
#> ExprmL> exprdat <- SummarizedExperiment::SummarizedExperiment(arraydat,
#> ExprmL+ colData = colDat)
#>
#> ExprmL> ## Create a sample methylation dataset
#> ExprmL> methyldat <- matrix(data = seq(1, length.out = 25), ncol = 5,
#> ExprmL+ dimnames = list(
#> ExprmL+ c("ENST00000355076", "ENST00000383706",
#> ExprmL+ "ENST00000383323", "ENST00000234812", "ENST00000294241"),
#> ExprmL+ c("methyl1", "methyl2", "methyl3",
#> ExprmL+ "methyl4", "methyl5")
#> ExprmL+ ))
#>
#> ExprmL> ## Create a sample RNASeqGene dataset
#> ExprmL> rnadat <- matrix(
#> ExprmL+ data = sample(c(46851, 5, 19, 13, 2197, 507,
#> ExprmL+ 84318, 126, 17, 21, 23979, 614), size = 20, replace = TRUE),
#> ExprmL+ ncol = 4,
#> ExprmL+ dimnames = list(
#> ExprmL+ c("XIST", "RPS4Y1", "KDM5D", "ENST00000383323", "ENST00000234812"),
#> ExprmL+ c("samparray1", "samparray2", "samparray3", "samparray4")
#> ExprmL+ ))
#>
#> ExprmL> ## Create a mock RangedSummarizedExperiment from a data.frame
#> ExprmL> rangedat <- data.frame(chr="chr2", start = 11:15, end = 12:16,
#> ExprmL+ strand = c("+", "-", "+", "*", "."),
#> ExprmL+ samp0 = c(0,0,1,1,1), samp1 = c(1,0,1,0,1), samp2 = c(0,1,0,1,0),
#> ExprmL+ row.names = c(paste0("ENST", "00000", 135411:135414), "ENST00000383323"))
#>
#> ExprmL> rangeSE <- SummarizedExperiment::makeSummarizedExperimentFromDataFrame(rangedat)
#>
#> ExprmL> ## Combine to a named list and call the ExperimentList constructor function
#> ExprmL> assayList <- list(Affy = exprdat, Methyl450k = methyldat, RNASeqGene = rnadat,
#> ExprmL+ GISTIC = rangeSE)
#>
#> ExprmL> ## Use the ExperimentList constructor
#> ExprmL> ExpList <- ExperimentList(assayList)
#>
#> MltAsE> ## Create sample maps for each experiment
#> MltAsE> exprmap <- data.frame(
#> MltAsE+ primary = c("Jack", "Jill", "Barbara", "Bob"),
#> MltAsE+ colname = c("array1", "array2", "array3", "array4"),
#> MltAsE+ stringsAsFactors = FALSE)
#>
#> MltAsE> methylmap <- data.frame(
#> MltAsE+ primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
#> MltAsE+ colname = c("methyl1", "methyl2", "methyl3", "methyl4", "methyl5"),
#> MltAsE+ stringsAsFactors = FALSE)
#>
#> MltAsE> rnamap <- data.frame(
#> MltAsE+ primary = c("Jack", "Jill", "Bob", "Barbara"),
#> MltAsE+ colname = c("samparray1", "samparray2", "samparray3", "samparray4"),
#> MltAsE+ stringsAsFactors = FALSE)
#>
#> MltAsE> gistmap <- data.frame(
#> MltAsE+ primary = c("Jack", "Bob", "Jill"),
#> MltAsE+ colname = c("samp0", "samp1", "samp2"),
#> MltAsE+ stringsAsFactors = FALSE)
#>
#> MltAsE> ## Combine as a named list and convert to a DataFrame
#> MltAsE> maplist <- list(Affy = exprmap, Methyl450k = methylmap,
#> MltAsE+ RNASeqGene = rnamap, GISTIC = gistmap)
#>
#> MltAsE> ## Create a sampleMap
#> MltAsE> sampMap <- listToMap(maplist)
#>
#> MltAsE> ## Create an example phenotype data
#> MltAsE> colDat <- data.frame(sex = c("M", "F", "M", "F"), age = 38:41,
#> MltAsE+ row.names = c("Jack", "Jill", "Bob", "Barbara"))
#>
#> MltAsE> ## Create a MultiAssayExperiment instance
#> MltAsE> mae <- MultiAssayExperiment(experiments = ExpList, colData = colDat,
#> MltAsE+ sampleMap = sampMap)
complete.cases(mae)
#> [1] TRUE TRUE TRUE FALSE
isEmpty(MultiAssayExperiment())
#> [1] TRUE
## renaming biological units (primary)
mae2 <- renamePrimary(mae, paste0("pt", 1:4))
colData(mae2)
#> DataFrame with 4 rows and 2 columns
#> sex age
#> <character> <integer>
#> pt1 M 38
#> pt2 F 39
#> pt3 M 40
#> pt4 F 41
sampleMap(mae2)
#> DataFrame with 16 rows and 3 columns
#> assay primary colname
#> <factor> <character> <character>
#> 1 Affy pt1 array1
#> 2 Affy pt2 array2
#> 3 Affy pt4 array3
#> 4 Affy pt3 array4
#> 5 Methyl450k pt1 methyl1
#> ... ... ... ...
#> 12 RNASeqGene pt3 samparray3
#> 13 RNASeqGene pt4 samparray4
#> 14 GISTIC pt1 samp0
#> 15 GISTIC pt3 samp1
#> 16 GISTIC pt2 samp2
## renaming observational units (colname)
mae2 <- renameColname(mae, i = "Affy", paste0("ARRAY", 1:4))
colnames(mae2)
#> CharacterList of length 4
#> [["Affy"]] ARRAY1 ARRAY2 ARRAY3 ARRAY4
#> [["Methyl450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
#> [["RNASeqGene"]] samparray1 samparray2 samparray3 samparray4
#> [["GISTIC"]] samp0 samp1 samp2
sampleMap(mae2)
#> DataFrame with 16 rows and 3 columns
#> assay primary colname
#> <factor> <character> <character>
#> 1 Affy Jack ARRAY1
#> 2 Affy Jill ARRAY2
#> 3 Affy Barbara ARRAY3
#> 4 Affy Bob ARRAY4
#> 5 Methyl450k Jack methyl1
#> ... ... ... ...
#> 12 RNASeqGene Bob samparray3
#> 13 RNASeqGene Barbara samparray4
#> 14 GISTIC Jack samp0
#> 15 GISTIC Bob samp1
#> 16 GISTIC Jill samp2
patts <- list(
normals = "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-11",
tumors = "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-01"
)
data("miniACC")
hits <- makeHitList(miniACC, patts)
## only tumors present
splitAssays(miniACC, hits)
#> A MultiAssayExperiment object of 5 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 5:
#> [1] tumors_RNASeq2GeneNorm: SummarizedExperiment with 198 rows and 79 columns
#> [2] tumors_gistict: SummarizedExperiment with 198 rows and 90 columns
#> [3] tumors_RPPAArray: SummarizedExperiment with 33 rows and 46 columns
#> [4] tumors_Mutations: matrix with 97 rows and 90 columns
#> [5] tumors_miRNASeqGene: SummarizedExperiment with 471 rows and 80 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files