#cbioportal-data-r
2019-01-21
U04K8VD5U (13:04:34): > @U04K8VD5U has joined the channel
U04KXP98M (13:04:34): > @U04KXP98M has joined the channel
UFJ9H52N4 (13:04:34): > @UFJ9H52N4 has joined the channel
2019-01-23
UFM384EEB (07:24:15): > @UFM384EEB has joined the channel
U04K8VD5U (07:25:59): > <@UFM384EEB> <@UFJ9H52N4>: welcome to cBioPortal :slightly_smiling_face:
U04K8VD5U (07:27:11): > Should we have a planning meeting to kick off the collaboration?
UFJ9H52N4 (09:14:46): > Thanks <@U04K8VD5U>! Yes that’d be great. How about next week, maybe Thursday?
U04K8VD5U (12:54:09): > <@UFJ9H52N4> sounds good. My schedule is pretty open before 4:30. Would 1/31 work for everyone <!channel>?
U04KXP98M (16:47:58): > Thursday 1/31 before 4:30 works for me
UFJ9H52N4 (17:50:09): > How about 2pm Thursday 1/31? I just checked: Martin Morgan would be able & interested in joining a kick-off meeting then.
U04K8VD5U (18:15:05): > Sounds good. Could you send a calendar invite? <@UFJ9H52N4>
2019-01-30
UFM384EEB (17:08:29): > Hi JJ! <@U04K8VD5U> We’d prefer to meet in person (either at CUNY or MSK). Do you have a meeting location set up? If so, we can head down there.
U04K8VD5U (17:56:24): > <@UFM384EEB> We can meet in my office, Zuckerman 577.
U04K8VD5U (17:56:51): > We could also meet before your meetup presentation on the 7th if you’d like to save one trip :slightly_smiling_face:
2019-01-31
UFJ9H52N4 (11:34:24): > How about we meet online today, since Martin Morgan is confirmed for today, and see each other in person next week. Would be good to avoid the extra trip with this cold today.
U04K8VD5U (11:41:48): > <!channel> sounds good!
U04KXP98M (11:50:42): > :+1:
UFJ9H52N4 (14:02:18): > I take that back, we’re in the Hangout
2019-03-13
U04K8VD5U (12:50:56): > @U04K8VD5U has left the channel
2019-12-24
UFM384EEB (17:53:55): > I’m having issues finding microRNA data for particular study identifiers. For example, I am unable to obtain gbm_tcga_pub_mirna and gbm_tcga_pub_mirna_median_Zscores molecular profile data. Is there a specific endpoint for this data, or is it still cooking? <@U04KXP98M> Thanks
U04KXP98M (18:47:17): > good question - i think microRNA is still cooking. CC <@U060QQ9ML>?
U060QQ9ML (18:47:19): > @U060QQ9ML has joined the channel
2020-01-15
U04KXP98M (11:57:29): > hey <@UFM384EEB> - we are trying to come up with ideas for Google Summer of Code. Do you think there’s someone, e.g. you or someone else in the Bioconductor community, who would be interested in improving the docs for the R client? And perhaps fixing some other outstanding bugs? We have something here for Python that we could maybe use for R: https://github.com/mskcc/cbsp-hackathon/blob/master/0-introduction/cbsp_hackathon.ipynb
U04KXP98M (11:57:55): > our list of GSoC ideas is here: https://github.com/cBioPortal/GSoC/issues
U04KXP98M (12:00:10): > Maybe we can post something in the bioconductor community slack once we have worked out the project idea a bit more to see if anybody would be interested in co-mentoring? It could be a good opportunity to improve bioconductor & cBioPortal collaboration
U04KXP98M (12:00:39): > Let me know if you think it’s a good idea
2020-02-04
UFM384EEB (15:12:23): > We’ve added our ideas here: https://github.com/cBioPortal/GSoC/issues/83 - Attachment: #83 [r-client] cBioPortalData: Example Bioconductor Workflow
> Background:
> The cBioPortalData R client opens up cBioPortal data to alternative analysis platforms such as Bioconductor and R. Bioconductor provides workflows demonstrating use cases for particular packages, analyses, visualizations, and technologies, including DESeq2, edgeR, limma, Gviz, and GenomicRanges.
> Goal:
> To create Bioconductor workflows and IPython notebooks demonstrating the use of the cBioPortalData R client and a general Bioconductor approach to data analysis.
> Approach:
> • Provide a template workflow using cBioPortalData à la Bioconductor Workflows (package)
> • Implement exploratory visualizations using MultiAssayExperiment (e.g., from trackViewer)
> Need skills:
> R (analysis and pkg dev), Bioconductor
> Possible mentors:
> @LiNk-NY @lwaldron
2020-05-18
U04KXP98M (14:03:48): > added a bioconda recipe for cBioPortalData: https://github.com/bioconda/bioconda-recipes/pull/22187 - Attachment: #22187 Add recipe for cBioPortalData from Bioconductor > Add cBioPortalData recipe https://www.bioconductor.org/packages/release/bioc/html/cBioPortalData.html
U04KXP98M (21:31:20): > got a start on the R workshop: https://cbioportal.github.io/2020-cbioportal-r-workshop/. This mostly shows how to get data and make some basic plots, so as not to overwhelm people. Thinking the next section will focus more on MultiAssayExperiment features
2020-05-23
UFJ9H52N4 (15:47:15): > Really cool, <@U04KXP98M>! Did you find the MultiAssayExperiment features easy enough to figure out?
2020-05-25
U04KXP98M (12:21:36): > Thanks <@UFJ9H52N4>! Yeah the MAE features are awesome! MAE really clicked for me more recently when I was working on some single-cell expression analysis in R. The cBioPortalData interface is so useful as well! You and Marcel have done an excellent job on it. I think it really shines when using expression data from multiple molecular profiles. Might be useful to have a data structure like MAE in JavaScript as well :slightly_smiling_face: I have a ton of ideas after using it more, specifically for mutation data. E.g. it would be nice to integrate more annotations into it. For the values, maybe a binary matrix of whether something is oncogenic in OncoKB or CIViC might be neat. Similarly, some indication of what type of mutation it is, e.g. in-frame / missense etc. Also rowData on which genes are TSGs vs oncogenes. Endless possibilities really
>
> I think for the webinar it will prolly be hard to dive into MAE fully b/c of limited time, so it’s mostly gonna be an introduction to the data structure with some example analyses and pointers to resources people can use to learn more. After the webinar I’m thinking of making more example analyses using cBioPortalData and converting https://cbioportal.github.io/2020-cbioportal-r-workshop/ into a repo showing all kinds of use cases that people can easily reuse. I think it has a ton of potential given that so many people are already running cBioPortal locally at their institutions
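The binary-matrix idea above (gene by sample, 1 where a mutation is flagged oncogenic) can be sketched in Python. This is a hypothetical illustration: the (gene, sample, is_oncogenic) record layout and the `oncogenic_matrix` name are assumptions, and the flag stands in for an OncoKB or CIViC annotation that is not looked up here.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (gene, sample, is_oncogenic), assumed layout


def oncogenic_matrix(
    records: Iterable[Record],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a gene x sample 0/1 matrix: 1 means the sample carries at least
    one mutation in that gene whose (hypothetical) oncogenic flag is True."""
    records = list(records)
    genes = sorted({gene for gene, _, _ in records})
    samples = sorted({sample for _, sample, _ in records})
    g_idx = {g: i for i, g in enumerate(genes)}
    s_idx = {s: j for j, s in enumerate(samples)}
    matrix = [[0] * len(samples) for _ in genes]
    for gene, sample, oncogenic in records:
        if oncogenic:
            matrix[g_idx[gene]][s_idx[sample]] = 1
    return genes, samples, matrix
```

For example, records `("TP53", "S1", True)`, `("TP53", "S2", False)` and `("KRAS", "S2", True)` give rows KRAS `[0, 1]` and TP53 `[1, 0]`. Samples with no mutations at all never appear as columns, so in practice the column set would come from the study's sample list rather than from the records.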
2020-05-28
U04KXP98M (18:15:41): > <@UFM384EEB> quick question from the mailing list. For cBioPortalData to point to a different unauthenticated API, is the recommended way to do it like this?
>
> .cBioPortal <- setClass("cBioPortal", contains = "Service")
>
> cBioPortalBetaInstance <- function() {
>     .cBioPortal(
>         Service(
>             service = "cBioPortal",
>             host = "beta.cbioportal.org",
>             config = httr::config(ssl_verifypeer = 0L, ssl_verifyhost = 0L,
>                 http_version = 0L),
>             authenticate = FALSE,
>             api_url = "https://beta.cbioportal.org/api/api-docs",
>             package = "cBioPortalData",
>             schemes = "http"
>         )
>     )
> }
>
> cbioBeta <- cBioPortalBetaInstance()
>
> Or is there a shorthand to pass the API URL?
U04KXP98M (18:16:01): > And should they e.g. be clearing their cache? Maybe it’s fine if they have different study names?
2020-05-29
UFM384EEB (08:59:04): > <@U04KXP98M>, it’s hard-coded into the package. Yes, they can modify the Service class themselves or create a beta one like above
U04KXP98M (09:14:33): > :ok_hand:thanks<@UFM384EEB>!
U04KXP98M (09:14:53): > I submitted a PR, let me know if something like this would work: https://github.com/waldronlab/cBioPortalData/pull/16. I still need to handle a few more cases - Attachment: #16 Allow pointing to different API > Need to handle a few more cases:
> ☐ using a port other than 80 (e.g. 8080)
> ☐ check if http works as well
> ☐ missing protocol (http / https)
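The three unchecked cases in the PR description can be illustrated with a small URL-normalizing helper. This is a hypothetical Python sketch, not the code in PR #16: the `api_docs_url` name and the choice of https when no protocol is given are assumptions, and the `/api/api-docs` path comes from the beta example above.

```python
from urllib.parse import urlsplit, urlunsplit


def api_docs_url(host: str, default_scheme: str = "https") -> str:
    """Turn 'host', 'host:port' or 'scheme://host[:port]' into an api-docs
    URL. Hypothetical helper; the default scheme is an assumption."""
    if "://" not in host:
        # Missing protocol: prepend the assumed default before parsing,
        # otherwise urlsplit would read 'localhost:8080' as scheme 'localhost'.
        host = f"{default_scheme}://{host}"
    parts = urlsplit(host)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no hostname in {host!r}")
    # netloc keeps an explicit non-default port such as :8080 intact.
    return urlunsplit((parts.scheme, parts.netloc, "/api/api-docs", "", ""))
```

For a local instance spun up on CI, passing the scheme explicitly (e.g. `http://localhost:8080`) avoids relying on the https default.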
U04KXP98M (09:17:13): > also could you assign this one to me: https://github.com/waldronlab/cBioPortalData/issues/17 - Attachment: #17 How to handle using API token > cBioPortal provides a way to connect to authenticated portals using an API token. I’m guessing we can pass the token to rapiclient somehow. This probably isn’t too tricky. Volunteering myself to solve this
U04KXP98M (10:11:35): > <@UFM384EEB> thanks for the quick review & update. Made a comment that it’s not so much a subdomain; it could be any host really: https://github.com/waldronlab/cBioPortalData/pull/16/files#r432508104
U04KXP98M (10:13:46): > this will also be useful for allowing us to run tests on CI. E.g. we can spin up a local version of cBioPortal in CircleCI on the pull request itself and run a set of cBioPortalData functions that point to localhost
U04KXP98M (10:15:40): > i think the subdomain parameter is only useful for us (i imagine not many people are running multiple instances on their domain). So maybe better to have a hostname parameter instead?
U04KXP98M (10:15:51): > https://github.com/waldronlab/cBioPortalData/commit/9b95f2e7b111a9de977577f225d7c8b6d584bde4#diff-5945a882fb99ade6194fb828fb5ede7eR103-R105
U04KXP98M (10:15:57): > also thanks for fixing the error i made :slightly_smiling_face:
UFM384EEB (10:19:26): > Cool I’ll make that change in a sec
U04KXP98M (10:21:06): > thanks so much!
U04KXP98M (10:23:21): > How does the versioning work btw, does it get pushed to CRAN automatically? I’m not so familiar with R package publishing. I guess the feature would eventually end up in Bioconductor v3.12. Or are there post-release updates to Bioconductor v3.11 as well?
UFM384EEB (10:59:50): > Bioconductor has a separate repository located at git.bioconductor.org. Any changes to a package’s API usually go into devel Bioconductor 3.12 first, but I can also push the changes to release 3.11. It takes about a day for the package to get propagated into the list of packages. In the meantime, you can use the GitHub version.
U04KXP98M (11:05:54): > Great! Thanks for the explanation. Makes sense to put it in 3.12 I think. Interesting to know that the releases aren’t static. Is there like a patch version identifier or something? E.g. 3.11.patch
UFM384EEB (11:09:56): > No, we use semantic versioning on the package itself, so any changes are picked up from there; the latest release version is 2.0.4
UFJ9H52N4 (12:24:01): > Unless my memory is failing the last release was 3.12 :).
UFJ9H52N4 (12:25:15): > In fact it was failing me!
U04KXP98M (12:30:17): > Oh i see, so how does that work with e.g. the Amazon images or the Docker images for 3.11? When do these get updated?
>
> And another question. I guess it is the developer of a package who decides whether it is a bug fix that should go into Bioconductor 3.11 or whether the package’s API has changed and it should go into 3.12?
UFM384EEB (12:34:04): > The images do not have any packages installed. They only contain the system dependencies needed. Only critical bug fixes go back into the release branch (3.11); other things like API updates go into the devel branch.
UFM384EEB (12:34:48): > Of course the developer can decide to change a lot of things in 3.11 but it wouldn’t result in a stable package
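Bioconductor's package numbering convention explains why no separate 3.11.patch identifier is needed: package versions are x.y.z, the middle number y is even in the release branch and odd in devel, and z is bumped for each pushed change. A minimal Python sketch of that rule follows; the helper names are hypothetical, and 2.0.4 is the release version quoted above.

```python
def bioc_branch(version: str) -> str:
    """Classify a Bioconductor package version 'x.y.z' by branch: by
    convention the middle number y is even in release and odd in devel."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected x.y.z, got {version!r}")
    minor = int(parts[1])
    return "release" if minor % 2 == 0 else "devel"


def bump_patch(version: str) -> str:
    """Increment z, as happens for each change (e.g. a critical bug fix)
    pushed to the release branch; x and y stay fixed there."""
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

So `bioc_branch("2.0.4")` reports `"release"`, a bug fix pushed to 3.11 would ship as 2.0.5, and the devel counterpart would carry an odd y (2.1.z).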
U04KXP98M (12:43:02): > i see, ok makes sense:+1:- Thanks for the explanation!
2020-06-30
UFJ9H52N4 (10:59:35): > <!channel> want to discuss now, using Slack?
U04KXP98M (10:59:43): > sure
U04K8VD5U (11:00:00): > @U04K8VD5U has joined the channel
UFJ9H52N4 (11:00:05): > Let me ping Marcel on our team, in case he doesn’t see here
U04KXP98M (11:00:05): > <@U04K8VD5U> :point_up:
UFM384EEB (11:00:21): > Yeah I’m here
UFJ9H52N4 (11:00:29): > Ah OK great!
UFJ9H52N4 (11:30:46): > lwaldron.research@gmail.com, “community-bioc” team
UFJ9H52N4 (11:32:34): > cBioPortalData-r
U04KXP98M (11:34:52): > ~4K users/day of cbioportal.org, and from the webinar: we had 581 total unique viewers and i think we averaged around 400 through most of it
USLACKBOT (11:45:20): > WaldronLab has joined this channel by invitation from cBioPortal.
U1LCB8WEA (11:45:21): > @U1LCB8WEA has joined the channel
USLACKBOT (11:45:23): > community-bioc has joined this channel by invitation from cBioPortal.
Levi Waldron (11:45:23): > @Levi Waldron has joined the channel
U04K8VD5U (11:46:11): > Yay! It feels like multiple universes connected
Sean Davis (11:48:57): > @Sean Davis has joined the channel
U1LCB8WEA (11:51:26) (in thread): > Yes!! I love it.
U04KXP98M (11:53:34): > <@UFM384EEB> i added an issue here: https://github.com/waldronlab/cBioPortalData/issues/20. I’m thinking of making an Rmd file of all the commands used in the supplementary material. We can worry later about where to host the Rmd file, but if you have a preference let me know. It seems like there was already a main repo that contained all the code for the figures in the manuscript, so it might make sense to add it there - Attachment: #20 Add CI test for manuscript
> ☐ Figure S2B
> ☐ Other figures: S2A, S2C + S3
> ☐ Main figures (different repo, might be able to reuse)
> We can run the CI test on cbioportal/cbioportal using GitHub actions. It will pull a Rmd file with all the figures/supplementary code from somewhere (maybe this repo waldronlab/cBioPortalData?) and make sure it can finish within reasonable time
Martin Morgan (13:09:58): > @Martin Morgan has joined the channel
U04KXP98M (15:00:48): > so one of the issues with running the S2B figure code on CI is that download.cbioportal.org is very slow: 200MB takes ~1 hour. Since the cBioDataPack function pulls all data from there, it’s pretty slow the first time (before caching). I filed an issue for that: https://github.com/cBioPortal/datahub/issues/1166. For CI I’m therefore only going to focus on the API part for now.
>
> Updated the issue accordingly
U04KXP98M (18:54:26): > i created a Dockerfile for cBioPortalData + a GitHub action to build it, see: https://github.com/waldronlab/cBioPortalData/pull/21. After looking at the code some more, one idea is to add relevant tests for the manuscript to the already existing test framework in tests/ in cBioPortalData: https://github.com/waldronlab/cBioPortalData/tree/master/tests. I included the test folder in the Docker image, so it will be pretty straightforward to run the tests on any CI service that supports Docker containers (e.g. GitHub Actions)
>
> The current Travis CI tests on waldronlab/cBioPortalData have been failing for a while. Since we are already using Travis for testing, it might make sense to fix those again. <@UFM384EEB> lmk if you need help with that. It would be good to add a daily cronjob later as well that makes sure things build (so you can easily spot build failures unrelated to code changes). If you add me as admin for the repo I’m happy to add it (https://docs.travis-ci.com/user/cron-jobs/)
U04KXP98M (19:32:24): > actually looks like Travis CI tests just passed, so I guess the problem magically solved itself :slightly_smiling_face:. But it would still be good to add a cronjob
U04KXP98M (19:59:13): > im btw not very familiar with appveyor. What are we using it for? Seems to have been failing since a month ago or so
UFM384EEB (20:16:37): > Appveyor does testing on Windows instances. I’m not sure if the bioc configuration is correct though
U04KXP98M (21:25:30): > Gotcha - thanks!
2020-07-01
U1LCB8WEA (03:13:39) (in thread): > <@U04KXP98M> thanks!! You’re invited as an admin. Yes, it seems like a good idea to add relevant tests from the manuscript to the tests/ directory for Bioconductor’s daily (40m max) or weekly (6h max, see https://bioconductor.org/developers/how-to/long-tests/) tests, although these will only be performed on version bumps, so they are a complement to, not a replacement for, testing on a schedule to catch problems not related to code changes.
U04KXP98M (03:53:04) (in thread): > Thanks! It’s cool to see how much bioconductor adds, didn’t realize they also run the tests on version bumps
Vince Carey (06:25:34): > @Vince Carey has joined the channel
U04KXP98M (10:16:19): > i enabled the cronjob on Travis, but it seems like Travis is still kinda flaky: https://travis-ci.org/github/waldronlab/cBioPortalData. It seems to hang on building the vignettes sometimes. Not sure what the cause is yet
Levi Waldron (11:17:24): > Looks related to the slowdowns…
>
> No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
>
> Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
UFM384EEB (11:48:12): > I had the Travis wait time at 20 but it doesn’t seem to have an effect. I will keep looking into it. I adjusted the timeout to 30 min and will keep checking the vignette build.
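Travis stops a job after 10 minutes without log output, as the quoted message says. Besides Travis's own `travis_wait` wrapper, one generic workaround is to run the slow step under a wrapper that prints a heartbeat line periodically. A minimal Python sketch, assuming the vignette build can be launched as a single command; `run_with_heartbeat` is a hypothetical helper, not part of any CI tool.

```python
import subprocess
import sys
import time
from typing import Sequence, TextIO


def run_with_heartbeat(cmd: Sequence[str], interval: float = 60.0,
                       stream: TextIO = sys.stdout) -> int:
    """Run `cmd` and print a line to `stream` every `interval` seconds while
    it is still running, so a CI log keeps receiving output. Returns the
    command's exit code."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    while True:
        try:
            # wait() raises TimeoutExpired if the process is still alive.
            return proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            elapsed = int(time.monotonic() - start)
            print(f"still running ({elapsed}s elapsed)...", file=stream,
                  flush=True)
```

For example, `run_with_heartbeat(["R", "CMD", "build", "."], interval=300)` would print a line every five minutes while the package and its vignettes build. A heartbeat only keeps the job alive; it does not fix a build that is genuinely hung, so an overall time limit is still worth keeping.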
2020-07-22
UFM384EEB (12:06:15): > <@U04KXP98M> I am having issues reading the data from the ccrcc_utokyo_2013/data_mutations_extended.txt file.
>
> library(cBioPortalData)
> tarloc <- downloadStudy("ccrcc_utokyo_2013")
> untar(tarloc, files = "data_mutations_extended.txt", exdir = "data")
> readr::read_tsv("data/data_mutations_extended.txt", comment = "#")
> # Error: segfault from C stack overflow
>
> This kind of works:
>
> readr::read_tsv("data/data_mutations_extended.txt", skip = 1, comment = NA_character_)
> # Warning: 78952 parsing failures.
> # row col expected   actual     file
> #   1  -- 82 columns 50 columns 'data/data_mutations_extended.txt'
> #   2  -- 82 columns 72 columns 'data/data_mutations_extended.txt'
> #   3  -- 82 columns 118 columns 'data/data_mutations_extended.txt'
> #   4  -- 82 columns 118 columns 'data/data_mutations_extended.txt'
> #   5  -- 82 columns 72 columns 'data/data_mutations_extended.txt'
>
> Maybe there are extra \t characters that throw it off?
Sean Davis (12:11:37) (in thread): > My guess is that the first 1000 rows are not representative of the entire dataset. Try using guess_max=5000 or guess_max=10000 in your read_tsv command.
Sean Davis (12:15:02) (in thread): > If you don’t need a tibble, consider using data.table::fread as a replacement.
UFM384EEB (12:32:30) (in thread): > Thanks Sean, fread gives further clues …
>
> Warning message:
> In data.table::fread(file = "data/data_mutations_extended.txt", sep="\t", skip = 1) :
>   Found and resolved improper quoting in first 100 rows.
U04KXP98M (12:42:48) (in thread): > yeah looks like there is some issue with quoting, but apparently it’s no problem for loading the data. Can you open this validation report? https://4939-63335718-gh.circle-artifacts.com/0/~/test-reports/ccrcc_utokyo_2013-validation.html
U04KXP98M (12:44:34) (in thread): > yeah looks like im able to open it in a private window
UFM384EEB (12:47:31) (in thread): > I see it. This helps, thanks!
U04KXP98M (12:48:50) (in thread): > i also added you as a contributor to the datahub repo
U04KXP98M (12:50:01) (in thread): > this should allow you to see the weekly validation runs for all studies here: https://app.circleci.com/pipelines/github/cBioPortal/datahub/429/workflows/345d1d99-0225-440a-8974-9fd80fbfb118/jobs/4939/artifacts
U04KXP98M (12:50:16) (in thread): > (that was from 4 days ago)
2020-07-30
Aedin Culhane (13:06:07): > @Aedin Culhane has joined the channel
2020-07-31
bogdan tanasa (14:00:46): > @bogdan tanasa has joined the channel
Dr Awala Fortune O. (16:17:55): > @Dr Awala Fortune O. has joined the channel
2020-10-08
Marcel Ramos Pérez (14:14:29): > @Marcel Ramos Pérez has joined the channel
2020-10-17
Kevin Blighe (10:17:31): > @Kevin Blighe has joined the channel
2020-10-21
Synnøve Yndestad (08:42:24): > @Synnøve Yndestad has joined the channel
2020-10-28
Marcel Ramos Pérez (10:52:22): > Hi <@U04KXP98M>, I don’t quite remember, but was there a separate endpoint for miRNA data? I get an empty response with the ov_tcga_pub_mirna (and similar) molecular profile(s) with this query:
>
> curl -X POST "https://www.cbioportal.org/api/molecular-profiles/ov_tcga_pub_mirna/molecular-data/fetch?projection=SUMMARY" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"entrezGeneIds\": [ 25, 142, 207, 208, 242 ], \"sampleIds\": [ \"TCGA-04-1331-01\", \"TCGA-04-1332-01\", \"TCGA-04-1336-01\", \"TCGA-04-1337-01\" ]}"
U04KXP98M (12:10:57): > Hi <@UFM384EEB> i forget how the miRNA molecular data works exactly, but we have negative Entrez gene IDs for a lot of these, e.g.:
>
> [{"entrezGeneId":-474,"hugoGeneSymbol":"MIR-29B-1/29B","type":"miRNA"}]
>
> curl -X POST "https://www.cbioportal.org/api/molecular-profiles/ov_tcga_pub_mirna/molecular-data/fetch?projection=SUMMARY" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"entrezGeneIds\": [ -474 ], \"sampleIds\": [ \"TCGA-04-1331-01\", \"TCGA-04-1332-01\", \"TCGA-04-1336-01\", \"TCGA-04-1337-01\" ]}"
>
> <@U04K8VD5U> do you remember the reasoning behind that?
U04K8VD5U (16:43:11): > <@UFM384EEB> <@U04KXP98M> it was designed to handle both precursor miRNA (mutation and CNA data) and mature miRNA (expression data). We will likely refactor the miRNA implementation using the generic assay feature when we find bandwidth.
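A minimal Python sketch of the same fetch, assuming only the endpoint, query parameter, headers and JSON fields shown in the curl commands above; it builds the request without sending it, and the `molecular_data_request` helper is hypothetical. The only change for miRNA profiles is passing the negative IDs (such as -474 for MIR-29B-1/29B) instead of positive Entrez Gene IDs.

```python
import json
import urllib.request
from typing import Iterable

API = "https://www.cbioportal.org/api"


def molecular_data_request(profile_id: str, entrez_ids: Iterable[int],
                           sample_ids: Iterable[str],
                           projection: str = "SUMMARY") -> urllib.request.Request:
    """Build (but do not send) the POST used in the curl examples. For miRNA
    profiles, entrez_ids are the negative IDs returned by the genes API."""
    url = (f"{API}/molecular-profiles/{profile_id}"
           f"/molecular-data/fetch?projection={projection}")
    body = json.dumps({"entrezGeneIds": list(entrez_ids),
                       "sampleIds": list(sample_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(molecular_data_request("ov_tcga_pub_mirna", [-474], ["TCGA-04-1331-01"]))` would mirror the second curl call; the response contents are not assumed here.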
2020-11-01
Amarinder Singh Thind (18:23:11): > @Amarinder Singh Thind has joined the channel
2021-01-22
Annajiat Alim Rasel (15:42:09): > @Annajiat Alim Rasel has joined the channel
2021-03-05
U04KXTHJD (18:36:47): > @U04KXTHJD has joined the channel
2021-03-09
U04KXTHJD (10:59:00): > <!channel> Can someone take a look at this Google group user question about the R-cBioPortalData package? Thank you
Vince Carey (11:19:30): > thanks for the tip. @Levi Waldron want to take this? i think it seeks an explanation of the ragged structure of mutation data.
Vince Carey (11:26:59): > and @Marcel Ramos Pérez maybe your recent work with maftools could be helpful in response to this question about mutation data
2021-03-20
watanabe_st (01:56:58): > @watanabe_st has joined the channel
2021-04-20
USLACKBOT (01:28:33): > cBioPortal has removed your organization from this channel. You’ll continue to have access to this archived copy.