Translate study identifiers from barcode to UUID and vice versa
Source: R/ID-translation.R, ID-translation.Rd
These functions take a character vector of identifiers and use the GDC API
to translate from TCGA barcodes to Universally Unique Identifiers (UUIDs)
and vice versa. These relationships are not one-to-one; therefore, a
data.frame is returned for all inputs. The UUID to TCGA barcode translation
only applies to file and case UUIDs. Two-way UUID translation is available
from 'file_id' to 'case_id' and vice versa. Please double check any results
before using these features for analysis. Case / submitter identifiers are
translated by default; see the from_type argument for details. All
identifiers are converted to lower case.
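Because the mappings are not one-to-one, it can help to join a returned data.frame back onto the original input and count how many records each identifier matched. The following is a minimal base R sketch under stated assumptions: the `mapping` table is a toy stand-in for a translation result, its identifiers are placeholders rather than real GDC UUIDs, and the column names are chosen only for illustration.

```r
# Toy stand-in for a translation result; identifiers are placeholders,
# not real GDC UUIDs, and the column names are assumed for illustration.
mapping <- data.frame(
  case_id = c("case-a", "case-a", "case-b"),
  file_id = c("file-1", "file-2", "file-3"),
  stringsAsFactors = FALSE
)

# User input: mixed case, plus one identifier with no match.
input <- c("CASE-A", "case-b", "case-c")
query <- data.frame(case_id = tolower(input), stringsAsFactors = FALSE)

# Left join: every input is kept, unmatched inputs get NA,
# and inputs with several matches expand to several rows.
joined <- merge(query, mapping, by = "case_id", all.x = TRUE)

# Number of matched records per input identifier.
n_matches <- tapply(!is.na(joined$file_id), joined$case_id, sum)
print(joined)
print(n_matches)
```

Inputs with zero matches or more than one match are the ones worth checking by hand before analysis.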
Arguments
- id_vector
character() A vector of UUIDs corresponding to either files or cases (default assumes case_ids)
- from_type
character(1) Either case_id or file_id, indicating the type of id_vector entered (default "case_id")
- to_type
character(1) The desired UUID type to obtain, can either be "case_id" (default) or "file_id"
- barcodes
character() A vector of TCGA barcodes
- filenames
character() A vector of file names usually obtained from a GenomicDataCommons query
- slides
logical(1L) DEPRECATED: Whether the provided file names correspond to slides typically with an .svs extension. Note: The barcodes returned correspond 1:1 with the filename inputs. Always triple check the output against the Genomic Data Commons Data Portal by searching the file name and comparing associated "Entity ID" with the submitter_id given by the function.
- id
character(1) A UUID whose history of versions is sought
- endpoint
character(1) Generally a constant pertaining to the location of the history API endpoint. This argument rarely needs to change.
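Since id_vector expects UUIDs while barcodes expects TCGA barcodes, a quick format check before calling the translation functions can catch mixed-up inputs. The sketch below is only an illustration: `is_uuid()` is a hypothetical helper, not part of the package, and it assumes the canonical 8-4-4-4-12 hexadecimal UUID layout seen in the examples on this page.

```r
# Hypothetical helper (not part of the package): TRUE for strings shaped
# like a canonical 8-4-4-4-12 hexadecimal UUID, ignoring case.
is_uuid <- function(x) {
  grepl(
    "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    tolower(x)
  )
}

ids <- c(
  "b4bce3ff-7fdc-4849-880b-56f2b348ceac", # file UUID from the examples
  "AE55B2D3-62A1-419E-9F9A-5DDFAC356DB4", # case UUID, upper-cased
  "TCGA-B0-5117"                          # a barcode, not a UUID
)

valid <- is_uuid(ids)

# Entries that are not UUID-shaped and need a different function or a fix.
print(ids[!valid])
```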
Value
Generally, a data.frame of identifier mappings
UUIDhistory: A data.frame containing a list of associated UUIDs for the
given input along with file_change status, data_release versions, etc.
Details
Based on the file UUID supplied, the appropriate entity_id (TCGA barcode) is
returned. In previous versions of the package, the 'end_point' parameter
required the user to specify what type of barcode was needed. This is no
longer supported as entity_id returns the appropriate one.
When providing slide file names, the function will only work if all the
provided files are slide files with an .svs extension.
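The all-or-nothing requirement for slide files can be checked before any query is made. The following base R sketch uses the slide file names from the examples on this page. The prefix extraction rests on an assumption drawn from that example output, namely that the text before the first dot of a slide file name is the slide barcode; treat it as a cross-check, not a guarantee.

```r
# Slide file names taken from the examples on this page.
slide_files <- c(
  "TCGA-A2-A25D-01Z-00-DX1.41DADDB8-3E3F-4F8F-8BE7-C43F8FBCFD2B.svs",
  "TCGA-AR-A5QP-01Z-00-DX1.256FDB13-1F81-42DA-AF6E-8A94835550C1.svs",
  "TCGA-C8-A12P-01Z-00-DX1.670B5DE8-07B0-4E4C-93FA-FA3DFFCCE50D.svs"
)

# TRUE only when every file name ends in .svs (case-insensitive).
all_slides <- all(grepl("\\.svs$", slide_files, ignore.case = TRUE))

# Assumption: everything before the first dot is the slide barcode.
name_prefix <- sub("\\..*$", "", slide_files)

print(all_slides)
print(name_prefix)
```

Comparing `name_prefix` with the submitter identifiers returned by the function is one inexpensive way to follow the advice to double check results.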
Examples
## Translate UUIDs >> TCGA Barcode
uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
    "5ca9fa79-53bc-4e91-82cd-5715038ee23e",
    "b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")
UUIDtoBarcode(uuids, from_type = "file_id")
#> file_id associated_entities.entity_submitter_id
#> 1 b4bce3ff-7fdc-4849-880b-56f2b348ceac TCGA-B0-5094-11A-01D-1421-08
#> 2 5ca9fa79-53bc-4e91-82cd-5715038ee23e TCGA-E9-A295-10A-01D-A16D-09
#> 3 b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382 TCGA-B0-5117-11A-01D-1421-08
UUIDtoBarcode("ae55b2d3-62a1-419e-9f9a-5ddfac356db4", from_type = "case_id")
#> case_id submitter_id
#> 1 ae55b2d3-62a1-419e-9f9a-5ddfac356db4 TCGA-B0-5117
UUIDtoBarcode("d85d8a17-8aea-49d3-8a03-8f13141c163b", "aliquot_ids")
#> portions.analytes.aliquots.aliquot_id portions.analytes.aliquots.submitter_id
#> 1 d85d8a17-8aea-49d3-8a03-8f13141c163b TCGA-CV-5443-01A-01D-1510-01
## Translate file UUIDs >> case UUIDs
uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
    "5ca9fa79-53bc-4e91-82cd-5715038ee23e",
    "b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")
UUIDtoUUID(uuids)
#> file_id cases.case_id
#> 1 5ca9fa79-53bc-4e91-82cd-5715038ee23e fec0da58-1047-44d2-b6d1-c18cceed43dc
#> 2 b4bce3ff-7fdc-4849-880b-56f2b348ceac 8aaa4e25-5c12-4ace-96dc-91aaa0c4457c
#> 3 b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382 ae55b2d3-62a1-419e-9f9a-5ddfac356db4
## Translate TCGA Barcode >> UUIDs
fullBarcodes <- c("TCGA-B0-5117-11A-01D-1421-08",
    "TCGA-B0-5094-11A-01D-1421-08",
    "TCGA-E9-A295-10A-01D-A16D-09")
sample_ids <- TCGAbarcode(fullBarcodes, sample = TRUE)
barcodeToUUID(sample_ids)
#> submitter_sample_ids sample_ids
#> 9 TCGA-B0-5117-11A b1116541-bece-4df3-b3dd-cec50aeb277b
#> 4 TCGA-B0-5094-11A 7519d7a8-c3ee-417b-9cfc-111bc5ad0637
#> 3 TCGA-E9-A295-10A e74183e1-f0b4-412a-8dac-a62d404add78
participant_ids <- c("TCGA-CK-4948", "TCGA-D1-A17N",
    "TCGA-4V-A9QX", "TCGA-4V-A9QM")
barcodeToUUID(participant_ids)
#> submitter_id case_id
#> 1 TCGA-CK-4948 5d73b382-3da3-4220-890e-2095228bbe6c
#> 2 TCGA-D1-A17N 001e0309-9c50-42b0-9e38-347883ee2cd3
#> 4 TCGA-4V-A9QX 0050d8be-1db6-4c17-8bef-3ae2eaaa63ce
#> 3 TCGA-4V-A9QM 0be4fa90-0122-4b26-b35f-7b1a4a16e63b
library(GenomicDataCommons)
#>
#> Attaching package: ‘GenomicDataCommons’
#> The following object is masked from ‘package:stats’:
#>
#> filter
### Query CNV data and get file names
cnv <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-COAD" &
            data_category == "Copy Number Variation" &
            data_type == "Copy Number Segment"
    ) |>
    results(size = 6)
filenameToBarcode(cnv$file_name)
#> file_name
#> 1 SONGS_p_TCGAb36_SNP_N_GenomeWideSNP_6_D04_585408.grch38.seg.v2.txt
#> 2 BRIAR_p_TCGA_297_298_299_300_S_GenomeWideSNP_6_F09_1362378.grch38.seg.v2.txt
#> 3 TCGA-AA-3854-01A-01D-A91V-36.WholeGenome.RP-1657.cr.igv.reheader.seg.txt
#> 4 SERVO_p_TCGA_157_158_159_SNP_N_GenomeWideSNP_6_B12_831300.grch38.seg.v2.txt
#> 5 SERVO_p_TCGA_157_158_159_SNP_N_GenomeWideSNP_6_H03_831380.grch38.seg.v2.txt
#> 6 TCGA-CA-6717-01A-11D-A91X-36.WholeGenome.RP-1657.cr.igv.reheader.seg.txt
#> file_id
#> 1 945941d2-5072-41d2-aed0-a62bbb468ad6
#> 2 e80d30dc-2c81-4c15-a99b-c5134e97141f
#> 3 cf625270-4fd4-40ba-8071-c9ccb66cfe73
#> 4 1d009e51-ae0e-4b89-953c-28257f18073b
#> 5 28b375dd-6aa6-43a2-bba2-473a5e0512de
#> 6 3c38f65a-833c-43ac-bbf2-0dfee62e3970
#> samples.portions.analytes.aliquots.submitter_id
#> 1 TCGA-AA-3854-01A-01D-0903-01
#> 2 TCGA-QG-A5YW-10A-01D-A28F-01
#> 3 TCGA-AA-3854-10B-01D-A91V-36
#> 4 TCGA-D5-6926-01A-11D-1923-01
#> 5 TCGA-D5-6926-10A-01D-1923-01
#> 6 TCGA-CA-6717-10A-01D-A91X-36
#> samples.portions.analytes.aliquots.submitter_id
#> 1 TCGA-AA-3854-01A-01D-0903-01
#> 2 TCGA-QG-A5YW-10A-01D-A28F-01
#> 3 TCGA-AA-3854-01A-01D-A91V-36
#> 4 TCGA-D5-6926-01A-11D-1923-01
#> 5 TCGA-D5-6926-10A-01D-1923-01
#> 6 TCGA-CA-6717-01A-11D-A91X-36
### Query slides data and get file names
slides <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-BRCA" &
            cases.samples.sample_type == "Primary Tumor" &
            data_type == "Slide Image" &
            experimental_strategy == "Diagnostic Slide"
    ) |>
    results(size = 3)
filenameToBarcode(slides$file_name, slides = TRUE)
#> Warning: The 'slides' argument is deprecated.
#> file_name
#> 1 TCGA-A2-A25D-01Z-00-DX1.41DADDB8-3E3F-4F8F-8BE7-C43F8FBCFD2B.svs
#> 2 TCGA-AR-A5QP-01Z-00-DX1.256FDB13-1F81-42DA-AF6E-8A94835550C1.svs
#> 3 TCGA-C8-A12P-01Z-00-DX1.670B5DE8-07B0-4E4C-93FA-FA3DFFCCE50D.svs
#> file_id entity_submitter_id entity_type
#> 1 decbdda7-e62a-4436-b233-28c5353d0f61 TCGA-A2-A25D-01Z-00-DX1 slide
#> 2 eccd20fd-f1f1-4d8f-9104-6e28cedb00f2 TCGA-AR-A5QP-01Z-00-DX1 slide
#> 3 c2c93798-a4df-47ff-a281-8960ae8c5c41 TCGA-C8-A12P-01Z-00-DX1 slide
#> case_id entity_id
#> 1 3b963d72-ba5c-467b-83c9-fbdb462510a3 443dd181-f3f3-4719-a408-acccc2f36a0d
#> 2 3c275152-d04b-440c-9621-2fc05ea977b6 3b1e7cea-af95-4d3b-a8e8-d684d53a0bae
#> 3 abdc76db-f85e-4337-a57e-6d098789da03 283b9b24-5f93-4a26-af2f-8a14446ca146
#> project.project_id samples.tumor_descriptor samples.tissue_type
#> 1 TCGA-BRCA Primary Tumor
#> 2 TCGA-BRCA Primary Tumor
#> 3 TCGA-BRCA Primary Tumor
## Get the version history of a BAM file in TCGA-KIRC
UUIDhistory("0001801b-54b0-4551-8d7a-d66fb59429bf")
#> uuid version file_change release_date
#> 1 0001801b-54b0-4551-8d7a-d66fb59429bf 1 superseded 2018-08-23
#> 2 b4bce3ff-7fdc-4849-880b-56f2b348ceac 2 released 2022-03-29
#> data_release
#> 1 12.0
#> 2 32.0
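A history table like the one above can be reduced to the currently released UUID with a short filter. This is a minimal sketch, not package functionality: the data.frame is re-typed by hand from the output shown above, and storing every column as character is an assumption, since the actual column types returned by UUIDhistory() are not shown here.

```r
# History table re-typed from the UUIDhistory() output shown above.
# Assumption: all columns are stored as character.
history <- data.frame(
  uuid = c("0001801b-54b0-4551-8d7a-d66fb59429bf",
           "b4bce3ff-7fdc-4849-880b-56f2b348ceac"),
  version = c("1", "2"),
  file_change = c("superseded", "released"),
  release_date = c("2018-08-23", "2022-03-29"),
  data_release = c("12.0", "32.0"),
  stringsAsFactors = FALSE
)

# Keep released rows, then take the one with the highest version number.
released <- history[history$file_change == "released", , drop = FALSE]
latest <- released[which.max(as.numeric(released$version)), "uuid"]

print(latest)
```

Here the superseded UUID resolves to the file UUID used in the first translation example on this page.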