#cancerdata

2017-01-07

Sean Davis (09:08:07): > @Sean Davis has joined the channel

Lucas Schiffer (09:08:07): > @Lucas Schiffer has joined the channel

Sean Davis (09:08:07): > set the channel description: Discuss tools for accessing and using cancer datasets in Bioconductor

Levi Waldron (09:08:08): > @Levi Waldron has joined the channel

Aedin Culhane (09:08:08): > @Aedin Culhane has joined the channel

Marcel Ramos Pérez (09:08:08): > @Marcel Ramos Pérez has joined the channel

Tim Triche (09:08:08): > @Tim Triche has joined the channel

Vince Carey (09:08:08): > @Vince Carey has joined the channel

Sean Davis (09:09:07): > added an integration to this channel: github

2017-03-15

Sean Davis (22:43:06): > http://biorxiv.org/content/early/2017/03/15/117200

2017-05-01

Aedin Culhane (11:34:31): > @Sean Davis Should we do a BoF on the GDC or cancer genomics?

Marcel Ramos Pérez (12:03:40): > I would certainly be interested

Marcel Ramos Pérez (12:03:45): > Good idea

Sean Davis (14:12:27): > I had planned on doing a tutorial, but a BOF could work, also. However, I have recently learned that the GDC will likely not be accepting most NCI genomics datasets, at least in the short term. It appears that NCI needs to create a separate team to deal with submissions, so we are probably talking about a year or two before NCI has a good approach. In the meantime, NCI is working with NCBI to fund deposits into the standard repos (SRA, GEO, etc.). Very interesting to watch.

Sean Davis (14:12:57): > Broadly, cancer genomics might be a good topic for a BOF, though.

2017-05-04

Kasper D. Hansen (12:51:02): > @Kasper D. Hansen has joined the channel

2017-05-10

Artem Sokolov (14:28:09): > @Artem Sokolov has joined the channel

2017-05-19

Aedin Culhane (14:10:30): > @Sean Davis there is always a hiccup :wink:

Aedin Culhane (14:10:52): > Let's put together a cancer genomics BoF. Anyone else interested?

2017-05-22

Ludwig Geistlinger (05:10:47): > @Ludwig Geistlinger has joined the channel

Ludwig Geistlinger (05:11:07): > Yes, I would be in.

2017-05-23

Aedin Culhane (12:29:49): > @ludwig, @Sean Davis Want me to put a form together and post it here?

Ludwig Geistlinger (12:36:45): > would appreciate that, aedin

Sean Davis (14:14:29): > Perfect! Sorry to be so slow on the uptake.

2017-07-24

hcorrada (07:58:44): > @hcorrada has joined the channel

2017-08-16

Steve Tsang (19:01:31): > @Steve Tsang has joined the channel

2017-10-06

David Jenkins (16:34:10): > @David Jenkins has joined the channel

2017-11-28

Stephanie Hicks (14:20:40): > @Stephanie Hicks has joined the channel

Simina Boca (14:21:39): > @Simina Boca has joined the channel

2017-11-29

Matthew McCall (09:32:12): > @Matthew McCall has joined the channel

2018-03-14

Davide Risso (10:48:15): > @Davide Risso has joined the channel

2018-06-25

Elana Fertig (16:06:13): > @Elana Fertig has joined the channel

2018-07-25

Neke Ibeh (09:32:19): > @Neke Ibeh has joined the channel

2018-07-26

Andrea Mcewan (11:16:21): > @Andrea Mcewan has joined the channel

2018-12-14

Rena Yang (12:45:36): > @Rena Yang has joined the channel

2018-12-30

Evan Biederstedt (14:38:47): > @Evan Biederstedt has joined the channel

2019-01-24

Ming Tang (19:40:28): > @Ming Tang has joined the channel

2019-03-17

gamzeaydilek (07:17:57): > @gamzeaydilek has joined the channel

2019-04-03

Tao Liu (17:23:23): > @Tao Liu has joined the channel

2019-04-17

Craig (22:49:57): > @Craig has joined the channel

2019-04-23

darlanminussi (12:54:13): > @darlanminussi has joined the channel

2019-05-08

Sean Davis (10:05:30): > removed an integration from this channel: github

2019-05-09

Vince Carey (07:49:25): > Great idea to have a channel on this topic. I would welcome comments on the app at https://vjcitn.shinyapps.io/ca43k/ that gives access to 43000 cancer-related transcriptomes not present in TCGA. Note the ‘about’ tab which gives details on provenance of this representation of the data.

Kasper D. Hansen (09:39:40): > This looks very similar to what we have been doing with recount

Kasper D. Hansen (09:40:29): > Things I would like to improve: (1) better metadata. For an application like this, search is everything. It is easy to group by PMID, but it is, I think, harder to use unless you’re coming at it from the direction of knowing which studies you want

Kasper D. Hansen (09:41:26): > (2) I really think we should use stuff like ExperimentHub to serve this data. I would love to make a custom hub for recount and use that to let users browse and download. I don’t know if that is feasible with current code. We’re about to work on the next release of recount and I want to do this

Kasper D. Hansen (09:41:57): > (3) It is unclear to me where the data is hosted for your restSEs - perhaps that is clear with more digging around (minor)

Vince Carey (10:04:24): > Thanks @Kasper D. Hansen. The metadata concept is central to this app. Apropos (1), the search in the search box is conducted over all titles, abstracts, field names and field values of the sample.attributes component. This was based on harvesting a snapshot from Sean Davis’ Omicidx service, and the ssrch package utilities process the snapshot to generate environments that are used in the search element of ca43k. The organization is not by PMID but by SRA study.accession. Apropos (2), I would agree that I need to connect to ExperimentHub. But notice, apropos (3), that the data are ‘hosted’ via the HDF Scalable Data Service API. The SE that comes back has a delayed assay. The relationship to recount should also be discussed. The quantifications are generated by salmon in Sean’s BigRNA project, and one can generate an SE that seamlessly combines a number of SRA studies. Because there is no guarantee that the colData for these different studies use, e.g., the same field names, we do not programmatically generate a colData but place separate colData components in the metadata element of the returned restfulSE.
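A minimal sketch of the colData point in the message above, using base R only. The two sample tables, their field names, and the study labels are made up for illustration; they are not taken from ca43k or restfulSE, and this is not the package's actual code. It shows why per-study tables with different field names cannot simply be stacked, and how keeping one table per study in a list sidesteps that.

```r
# Two hypothetical per-study sample tables whose field names differ.
study_a <- data.frame(run = c("run1", "run2"),
                      tissue = c("breast", "breast"),
                      stringsAsFactors = FALSE)
study_b <- data.frame(run = "run3",
                      cell_line = "HeLa",
                      disease_state = "cervical cancer",
                      stringsAsFactors = FALSE)

# The columns disagree, so a plain rbind() raises an error.
can_rbind <- !inherits(try(rbind(study_a, study_b), silent = TRUE),
                       "try-error")

# Keeping one table per study, keyed by a (made-up) study label,
# preserves every field without forcing a common schema.
per_study_coldata <- list(study_A = study_a, study_B = study_b)

# Fields shared by all studies can still be found for a combined view.
shared_fields <- Reduce(intersect, lapply(per_study_coldata, names))
```

Here `can_rbind` ends up `FALSE`, and `shared_fields` contains only the `run` column, which is the kind of mismatch that motivates storing the tables separately.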

Kasper D. Hansen (10:05:54): > Yeah, so the metadata is really hard to do anything about. It is basically about making metadata from repositories more standardized. We do the same as you have (I think), but we are also harvesting GEO, which sometimes has more details. I don’t know about Omicidx - I’ll look at that

Kasper D. Hansen (10:06:32): > Having said that, as a user, the better the metadata, the easier it will be to search

Kasper D. Hansen (10:07:12): > So when I say “Like to improve” (1)+(2) are actually wishlist items for recount as well.

Vince Carey (10:09:11): > Omicidx can be used via SRAdbV2 in github.com/seandavi

Vince Carey (10:11:34): > Apropos metadata, you are probably familiar with this https://www.biorxiv.org/content/10.1101/618025v1 that addresses metadata improvement for brain.

Kasper D. Hansen (10:11:47): > oh yes

Kasper D. Hansen (10:21:03): > We have had issues with SRAdbV2 (but it is also work in progress), for example https://github.com/seandavi/SRAdbV2/issues/16

Kasper D. Hansen (10:21:20): > Is omicidx a replacement somehow?

Vince Carey (10:22:22): > No, I think omicidx can be used with various clients, but SRAdbV2 is the R interface. I think SRAdbV2 has been pretty stable lately but I will look at your issue.

Vince Carey (10:24:31): > Getting the query syntax exactly right for elasticsearch/lucene seems a bit delicate, I would agree. Working at the level of the swagger to see what works and then checking the mapping into the client language is probably the way to proceed.

Vince Carey (10:26:44): > language-agnostic query formulation/resolution is challenging

Kasper D. Hansen (10:27:40): > I am mostly passing stuff on re. this. I am not actually the one doing these queries. We basically need a way to retrieve information from SRA for collecting data ids / paths / etc. prior to doing a big mapping exercise

Sean Davis (14:33:12): > OmicIDX will soon contain SRA, Biosample, and GEO and be updated monthly. SRAdbV2 is a waypoint to the completed project which will be renamed and have expanded functionality.

Sean Davis (14:35:57): > It turns out that the infrastructure to support search, etc., applies equally well to GEO, SRA, and Biosample so I am just doing everything. We also have some hand-curated samples (on the order of 7k RNA-seq and ~10k metagenome samples) that I need to pull in.

Kasper D. Hansen (14:36:18): > how soon is soon?

Sean Davis (14:37:38): > Which part?

Sean Davis (14:38:34): > https://omicidx-test.cancerdatasci.org is the home of the current API. The R package is a bit behind.

Sean Davis (14:47:28): > rough use in R:
>
>     .base = 'https://omicidx-test.cancerdatasci.org'
>     q = "cancer AND sample.taxon_id:9606"
>     search = function(q, size=100) {
>         res = httr::GET(sprintf("%s/experiments/search?q=%s&size=%d",
>                                 .base, URLencode(q), size))
>         jsonlite::fromJSON(httr::content(res, as='text'), simplifyDataFrame = TRUE)$hits$`*d*`
>     }
>     res = search(q)
>
> `res` is a data frame with some nested columns (sample, study, runs). `res$sample` is the sample data frame for those experiments. `res$study` is the study data frame for those experiments.
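The request in the snippet above can be split into a URL-building step and a fetch step, which makes the query encoding easy to inspect without touching the network. This is only a sketch: the base URL, the `/experiments/search` path, and the `q` and `size` parameters come from the message above, while the helper names `build_search_url` and `fetch_hits` are hypothetical. It assumes the response has a top-level `hits` element, as the original snippet suggests, and that the test server is still reachable.

```r
# Base URL and endpoint path are taken from the message above;
# the helper names below are hypothetical.
omicidx_base <- "https://omicidx-test.cancerdatasci.org"

build_search_url <- function(q, size = 100L, base = omicidx_base) {
  # reserved = TRUE also escapes characters such as ':' and '&',
  # so the whole query string travels as a single parameter value.
  sprintf("%s/experiments/search?q=%s&size=%d",
          base, utils::URLencode(q, reserved = TRUE), as.integer(size))
}

fetch_hits <- function(q, size = 100L) {
  res <- httr::GET(build_search_url(q, size))
  httr::stop_for_status(res)  # fail loudly on HTTP errors
  txt <- httr::content(res, as = "text", encoding = "UTF-8")
  # Assumes a top-level `hits` element in the JSON response.
  jsonlite::fromJSON(txt, simplifyDataFrame = TRUE)$hits
}

# Building the URL needs no network access and can be checked by eye.
url <- build_search_url("cancer AND sample.taxon_id:9606", size = 10)
```

Calling `fetch_hits("cancer AND sample.taxon_id:9606")` would then perform the actual request; it needs the httr and jsonlite packages installed.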

2019-05-16

Sridhar N (11:21:13): > @Sridhar N has joined the channel

2019-05-23

dave_sevenbridges (11:03:12): > @dave_sevenbridges has joined the channel

2019-06-19

ZainabAlTaie (13:45:55): > @ZainabAlTaie has joined the channel

2019-06-20

Sanjeev Sariya (17:36:29): > @Sanjeev Sariya has joined the channel

Marko Zecevic (19:39:06): > @Marko Zecevic has joined the channel

2019-06-24

Sonali (09:41:53): > @Sonali has joined the channel

2019-06-26

Junhao Li (13:27:15): > @Junhao Li has joined the channel

2019-07-02

Grégoire de Streel (10:16:21): > @Grégoire de Streel has joined the channel

2019-07-05

Kevin Missault (05:31:20): > @Kevin Missault has joined the channel

2019-08-02

Jared Andrews (10:03:12): > @Jared Andrews has joined the channel

2019-11-04

Izaskun Mallona (07:56:58): > @Izaskun Mallona has joined the channel

2019-12-04

Jonathan Carroll (17:38:29): > @Jonathan Carroll has joined the channel

2019-12-12

Tim Triche (17:50:15): > @Tim Triche has left the channel

2019-12-23

Princy Parsana (11:24:09): > @Princy Parsana has joined the channel

2020-01-16

Nitin Sharma (07:27:10): > @Nitin Sharma has joined the channel

2020-02-05

Sean Davis (16:08:17): > archived the channel