
These functions take a character vector of identifiers and use the GDC API to translate from TCGA barcodes to Universally Unique Identifiers (UUIDs) and vice versa. These relationships are not one-to-one; therefore, a data.frame is returned for all inputs. The UUID to TCGA barcode translation only applies to file and case UUIDs. Two-way UUID translation is available from 'file_id' to 'case_id' and vice versa. Please double check any results before using these features for analysis. Case / submitter identifiers are translated by default; see the from_type argument for details. All identifiers are converted to lower case.
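Because identifiers are handled in lower case, it can help to tidy a vector of UUIDs before passing it to the translation functions. The following is a minimal base R sketch; `normalize_uuids` is a hypothetical helper, not part of the package, and the pattern only checks the usual 8-4-4-4-12 hexadecimal shape.

```r
# Hypothetical helper (not part of the package): lower-case, trim and
# de-duplicate a character vector, failing on anything not UUID-shaped.
normalize_uuids <- function(ids) {
    stopifnot(is.character(ids))
    ids <- tolower(trimws(ids))
    pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
    bad <- !grepl(pattern, ids)
    if (any(bad))
        stop("not UUID-shaped: ", paste(ids[bad], collapse = ", "))
    unique(ids)
}

# Two spellings of the same file UUID collapse to one clean entry
clean <- normalize_uuids(c(
    "B4BCE3FF-7FDC-4849-880B-56F2B348CEAC",
    " b4bce3ff-7fdc-4849-880b-56f2b348ceac"
))
clean
```

A TCGA barcode such as "TCGA-B0-5117" fails this check, which catches the common mistake of passing barcodes to a UUID-based function.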

Usage

UUIDtoBarcode(id_vector, from_type = c("case_id", "file_id", "aliquot_ids"))

UUIDtoUUID(id_vector, to_type = c("case_id", "file_id"))

barcodeToUUID(barcodes)

filenameToBarcode(filenames, slides = FALSE)

UUIDhistory(id, endpoint = .HISTORY_ENDPOINT)

Arguments

id_vector

character() A vector of UUIDs corresponding to either files or cases (default assumes case_ids)

from_type

character(1) One of "case_id", "file_id", or "aliquot_ids", indicating the type of id_vector entered (default "case_id")

to_type

character(1) The desired UUID type to obtain, can either be "case_id" (default) or "file_id"

barcodes

character() A vector of TCGA barcodes

filenames

character() A vector of file names usually obtained from a GenomicDataCommons query

slides

logical(1L) DEPRECATED: Whether the provided file names correspond to slides, typically with an .svs extension. Note: the barcodes returned correspond 1:1 with the filename inputs. Always triple check the output against the Genomic Data Commons Data Portal by searching the file name and comparing the associated "Entity ID" with the submitter_id given by the function.

id

character(1) A UUID whose history of versions is sought

endpoint

character(1) Generally a constant pertaining to the location of the history api endpoint. This argument rarely needs to change.

Value

Generally, a data.frame of identifier mappings

UUIDhistory: A data.frame containing a list of associated UUIDs for the given input along with file_change status, data_release versions, etc.
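Since the mappings are not one-to-one, a returned data.frame may repeat an input identifier across several rows. The sketch below shows one way to spot such cases in base R. The data.frame and its column names (`file_id`, `barcode`) are toy placeholders for illustration, not real GDC identifiers or the exact columns any function returns.

```r
# Toy mapping with placeholder values (not real GDC identifiers):
# "file-a" maps to two barcodes, "file-b" to one.
mapping <- data.frame(
    file_id = c("file-a", "file-a", "file-b"),
    barcode = c("BARCODE-1", "BARCODE-2", "BARCODE-3"),
    stringsAsFactors = FALSE
)

# Group the barcodes by input identifier
by_file <- split(mapping$barcode, mapping$file_id)

# Identifiers that map to more than one barcode
multi <- names(by_file)[lengths(by_file) > 1L]
multi
```

Inspecting the one-to-many inputs first makes it easier to decide whether to keep all rows or to select one mapping per identifier.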

Details

Based on the file UUID supplied, the appropriate entity_id (TCGA barcode) is returned. In previous versions of the package, the 'end_point' parameter required the user to specify which type of barcode was needed. This is no longer supported, as entity_id returns the appropriate one.

When providing slide file names, the function will only work if all the provided files are slide files with an .svs extension.
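The slide-file requirement can be checked before any file names are sent for translation. This is a minimal base R sketch; `all_svs` is a hypothetical helper, not part of the package, and it assumes the extension check should ignore case.

```r
# Hypothetical helper (not part of the package): TRUE only when every
# file name ends in ".svs" (case-insensitive).
all_svs <- function(filenames) {
    stopifnot(is.character(filenames), length(filenames) > 0L)
    all(grepl("\\.svs$", filenames, ignore.case = TRUE))
}

slide_files <- c(
    "TCGA-A2-A25D-01Z-00-DX1.41DADDB8-3E3F-4F8F-8BE7-C43F8FBCFD2B.svs",
    "TCGA-AR-A5QP-01Z-00-DX1.256FDB13-1F81-42DA-AF6E-8A94835550C1.svs"
)
mixed_files <- c(
    slide_files,
    "SONGS_p_TCGAb36_SNP_N_GenomeWideSNP_6_D04_585408.grch38.seg.v2.txt"
)

all_svs(slide_files)  # TRUE
all_svs(mixed_files)  # FALSE
```

When the check fails, slide and non-slide file names can be split into separate vectors and translated in separate calls.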

Author

Sean Davis, M. Ramos

Examples

## Translate UUIDs >> TCGA Barcode

uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
"5ca9fa79-53bc-4e91-82cd-5715038ee23e",
"b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")

UUIDtoBarcode(uuids, from_type = "file_id")
#>                                file_id associated_entities.entity_submitter_id
#> 1 b4bce3ff-7fdc-4849-880b-56f2b348ceac            TCGA-B0-5094-11A-01D-1421-08
#> 2 5ca9fa79-53bc-4e91-82cd-5715038ee23e            TCGA-E9-A295-10A-01D-A16D-09
#> 3 b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382            TCGA-B0-5117-11A-01D-1421-08

UUIDtoBarcode("ae55b2d3-62a1-419e-9f9a-5ddfac356db4", from_type = "case_id")
#>                                case_id submitter_id
#> 1 ae55b2d3-62a1-419e-9f9a-5ddfac356db4 TCGA-B0-5117

UUIDtoBarcode("d85d8a17-8aea-49d3-8a03-8f13141c163b", "aliquot_ids")
#>   portions.analytes.aliquots.aliquot_id portions.analytes.aliquots.submitter_id
#> 1  d85d8a17-8aea-49d3-8a03-8f13141c163b            TCGA-CV-5443-01A-01D-1510-01

## Translate file UUIDs >> case UUIDs

uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
"5ca9fa79-53bc-4e91-82cd-5715038ee23e",
"b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")

UUIDtoUUID(uuids)
#>                                file_id                        cases.case_id
#> 1 5ca9fa79-53bc-4e91-82cd-5715038ee23e fec0da58-1047-44d2-b6d1-c18cceed43dc
#> 2 b4bce3ff-7fdc-4849-880b-56f2b348ceac 8aaa4e25-5c12-4ace-96dc-91aaa0c4457c
#> 3 b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382 ae55b2d3-62a1-419e-9f9a-5ddfac356db4

## Translate TCGA Barcode >> UUIDs

fullBarcodes <- c("TCGA-B0-5117-11A-01D-1421-08",
"TCGA-B0-5094-11A-01D-1421-08",
"TCGA-E9-A295-10A-01D-A16D-09")

sample_ids <- TCGAbarcode(fullBarcodes, sample = TRUE)

barcodeToUUID(sample_ids)
#>   submitter_sample_ids                           sample_ids
#> 9     TCGA-B0-5117-11A b1116541-bece-4df3-b3dd-cec50aeb277b
#> 4     TCGA-B0-5094-11A 7519d7a8-c3ee-417b-9cfc-111bc5ad0637
#> 3     TCGA-E9-A295-10A e74183e1-f0b4-412a-8dac-a62d404add78

participant_ids <- c("TCGA-CK-4948", "TCGA-D1-A17N",
"TCGA-4V-A9QX", "TCGA-4V-A9QM")

barcodeToUUID(participant_ids)
#>   submitter_id                              case_id
#> 1 TCGA-CK-4948 5d73b382-3da3-4220-890e-2095228bbe6c
#> 2 TCGA-D1-A17N 001e0309-9c50-42b0-9e38-347883ee2cd3
#> 4 TCGA-4V-A9QX 0050d8be-1db6-4c17-8bef-3ae2eaaa63ce
#> 3 TCGA-4V-A9QM 0be4fa90-0122-4b26-b35f-7b1a4a16e63b

library(GenomicDataCommons)
#> 
#> Attaching package: ‘GenomicDataCommons’
#> The following object is masked from ‘package:stats’:
#> 
#>     filter

### Query CNV data and get file names

cnv <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-COAD" &
        data_category == "Copy Number Variation" &
        data_type == "Copy Number Segment"
    ) |>
    results(size = 6)

filenameToBarcode(cnv$file_name)
#>                                                                      file_name
#> 1           SONGS_p_TCGAb36_SNP_N_GenomeWideSNP_6_D04_585408.grch38.seg.v2.txt
#> 2 BRIAR_p_TCGA_297_298_299_300_S_GenomeWideSNP_6_F09_1362378.grch38.seg.v2.txt
#> 3     TCGA-AA-3854-01A-01D-A91V-36.WholeGenome.RP-1657.cr.igv.reheader.seg.txt
#> 4  SERVO_p_TCGA_157_158_159_SNP_N_GenomeWideSNP_6_B12_831300.grch38.seg.v2.txt
#> 5  SERVO_p_TCGA_157_158_159_SNP_N_GenomeWideSNP_6_H03_831380.grch38.seg.v2.txt
#> 6     TCGA-CA-6717-01A-11D-A91X-36.WholeGenome.RP-1657.cr.igv.reheader.seg.txt
#>                                file_id
#> 1 945941d2-5072-41d2-aed0-a62bbb468ad6
#> 2 e80d30dc-2c81-4c15-a99b-c5134e97141f
#> 3 cf625270-4fd4-40ba-8071-c9ccb66cfe73
#> 4 1d009e51-ae0e-4b89-953c-28257f18073b
#> 5 28b375dd-6aa6-43a2-bba2-473a5e0512de
#> 6 3c38f65a-833c-43ac-bbf2-0dfee62e3970
#>   samples.portions.analytes.aliquots.submitter_id
#> 1                    TCGA-AA-3854-01A-01D-0903-01
#> 2                    TCGA-QG-A5YW-10A-01D-A28F-01
#> 3                    TCGA-AA-3854-10B-01D-A91V-36
#> 4                    TCGA-D5-6926-01A-11D-1923-01
#> 5                    TCGA-D5-6926-10A-01D-1923-01
#> 6                    TCGA-CA-6717-10A-01D-A91X-36
#>   samples.portions.analytes.aliquots.submitter_id
#> 1                    TCGA-AA-3854-01A-01D-0903-01
#> 2                    TCGA-QG-A5YW-10A-01D-A28F-01
#> 3                    TCGA-AA-3854-01A-01D-A91V-36
#> 4                    TCGA-D5-6926-01A-11D-1923-01
#> 5                    TCGA-D5-6926-10A-01D-1923-01
#> 6                    TCGA-CA-6717-01A-11D-A91X-36

### Query slides data and get file names

slides <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-BRCA" &
        cases.samples.sample_type == "Primary Tumor" &
        data_type == "Slide Image" &
        experimental_strategy == "Diagnostic Slide"
    ) |>
    results(size = 3)

filenameToBarcode(slides$file_name, slides = TRUE)
#> Warning: The 'slides' argument is deprecated.
#>                                                          file_name
#> 1 TCGA-A2-A25D-01Z-00-DX1.41DADDB8-3E3F-4F8F-8BE7-C43F8FBCFD2B.svs
#> 2 TCGA-AR-A5QP-01Z-00-DX1.256FDB13-1F81-42DA-AF6E-8A94835550C1.svs
#> 3 TCGA-C8-A12P-01Z-00-DX1.670B5DE8-07B0-4E4C-93FA-FA3DFFCCE50D.svs
#>                                file_id     entity_submitter_id entity_type
#> 1 decbdda7-e62a-4436-b233-28c5353d0f61 TCGA-A2-A25D-01Z-00-DX1       slide
#> 2 eccd20fd-f1f1-4d8f-9104-6e28cedb00f2 TCGA-AR-A5QP-01Z-00-DX1       slide
#> 3 c2c93798-a4df-47ff-a281-8960ae8c5c41 TCGA-C8-A12P-01Z-00-DX1       slide
#>                                case_id                            entity_id
#> 1 3b963d72-ba5c-467b-83c9-fbdb462510a3 443dd181-f3f3-4719-a408-acccc2f36a0d
#> 2 3c275152-d04b-440c-9621-2fc05ea977b6 3b1e7cea-af95-4d3b-a8e8-d684d53a0bae
#> 3 abdc76db-f85e-4337-a57e-6d098789da03 283b9b24-5f93-4a26-af2f-8a14446ca146
#>   project.project_id samples.tumor_descriptor samples.tissue_type
#> 1          TCGA-BRCA                  Primary               Tumor
#> 2          TCGA-BRCA                  Primary               Tumor
#> 3          TCGA-BRCA                  Primary               Tumor

## Get the version history of a BAM file in TCGA-KIRC
UUIDhistory("0001801b-54b0-4551-8d7a-d66fb59429bf")
#>                                   uuid version file_change release_date
#> 1 0001801b-54b0-4551-8d7a-d66fb59429bf       1  superseded   2018-08-23
#> 2 b4bce3ff-7fdc-4849-880b-56f2b348ceac       2    released   2022-03-29
#>   data_release
#> 1         12.0
#> 2         32.0
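A history table like the one above can be reduced to the currently released UUID. The sketch below rebuilds the printed result by hand so that it runs without network access. Storing every column as character is an assumption; the actual column types may differ.

```r
# Hand-built copy of the history output shown above
# (column types are assumed to be character)
history <- data.frame(
    uuid = c("0001801b-54b0-4551-8d7a-d66fb59429bf",
             "b4bce3ff-7fdc-4849-880b-56f2b348ceac"),
    version = c("1", "2"),
    file_change = c("superseded", "released"),
    release_date = c("2018-08-23", "2022-03-29"),
    data_release = c("12.0", "32.0"),
    stringsAsFactors = FALSE
)

# Keep released entries, then take the most recent release date
current <- history[history$file_change == "released", ]
latest <- current[which.max(as.Date(current$release_date)), "uuid"]
latest
```

Here `latest` is "b4bce3ff-7fdc-4849-880b-56f2b348ceac", the UUID that replaced the superseded input.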