#papersandpreprints
2019-05-20
Sean Davis (06:42:28): > @Sean Davis has joined the channel
Sean Davis (06:42:28): > set the channel description: Post your Bioconductor-related manuscripts and preprints here
Levi Waldron (06:42:28): > @Levi Waldron has joined the channel
Vince Carey (06:42:28): > @Vince Carey has joined the channel
Charlotte Soneson (06:45:00): > @Charlotte Soneson has joined the channel
Malte Thodberg (07:01:38): > @Malte Thodberg has joined the channel
Federico Marini (07:14:48): > @Federico Marini has joined the channel
Koen Van den Berge (07:42:41): > @Koen Van den Berge has joined the channel
Kayla Interdonato (07:44:59): > @Kayla Interdonato has joined the channel
Kasper D. Hansen (08:57:24): > @Kasper D. Hansen has joined the channel
Lukas Weber (09:19:16): > @Lukas Weber has joined the channel
Ludwig Geistlinger (10:32:17): > @Ludwig Geistlinger has joined the channel
Artem Sokolov (10:33:28): > @Artem Sokolov has joined the channel
Gabriele Sales (10:34:10): > @Gabriele Sales has joined the channel
Simina Boca (10:36:18): > @Simina Boca has joined the channel
darlanminussi (10:51:08): > @darlanminussi has joined the channel
Dave Tang (10:54:00): > @Dave Tang has joined the channel
Stephanie Hicks (11:49:30): > @Stephanie Hicks has joined the channel
Sehyun Oh (13:13:50): > @Sehyun Oh has joined the channel
Leonardo Collado Torres (13:33:35): > @Leonardo Collado Torres has joined the channel
Dror Berel (15:41:39): > @Dror Berel has joined the channel
Luyi Tian (19:40:06): > @Luyi Tian has joined the channel
Shian Su (20:21:09): > @Shian Su has joined the channel
2019-05-21
Almut (06:38:32): > @Almut has joined the channel
Stuart Lee (19:27:00): > @Stuart Lee has joined the channel
2019-05-22
Brendan Innes (15:47:48): > @Brendan Innes has joined the channel
2019-05-23
dave_sevenbridges (11:32:07): > @dave_sevenbridges has joined the channel
2019-06-05
Mike Smith (04:59:56): > @Mike Smith has joined the channel
2019-06-07
Jenny Drnevich (10:24:41): > @Jenny Drnevich has joined the channel
2019-06-13
Federico Marini (15:20:39): > The channel is a little silent, so…
Federico Marini (15:20:40): > https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2879-1 - Attachment (BMC Bioinformatics): pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components > Principal component analysis (PCA) is frequently used in genomics applications for quality assessment and exploratory analysis in high-dimensional data, such as RNA sequencing (RNA-seq) gene expression assays. Despite the availability of many software packages developed for this purpose, an interactive and comprehensive interface for performing these operations is lacking. We developed the pcaExplorer software package to enhance commonly performed analysis steps with an interactive and user-friendly application, which provides state saving as well as the automated creation of reproducible reports. pcaExplorer is implemented in R using the Shiny framework and exploits data structures from the open-source Bioconductor project. Users can easily generate a wide variety of publication-ready graphs, while assessing the expression data in the different modules available, including a general overview, dimension reduction on samples and genes, as well as functional interpretation of the principal components. pcaExplorer is distributed as an R package in the Bioconductor project ( http://bioconductor.org/packages/pcaExplorer/ ), and is designed to assist a broad range of researchers in the critical step of interactive data exploration.
Federico Marini (15:20:44): > :party_parrot:
Stephanie Hicks (19:28:16): > I’ll join in the fun, @Federico Marini!
Stephanie Hicks (19:28:58): > This paper led to the development of SummarizedBenchmark in Bioc: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1716-1 - Attachment (Genome Biology): A practical guide to methods controlling false discoveries in computational biology > In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
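The abstract contrasts classic FDR methods, which take only p values as input, with covariate-aware modern ones. Benjamini-Hochberg is the textbook example of a classic, p-value-only procedure; below is a minimal NumPy sketch of it (covariate-aware methods are not shown). `bh_adjust` is a hypothetical helper name, not code from the paper or from SummarizedBenchmark.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (classic method: uses p values only)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of p values, smallest first
    ranked = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i
    # Step-up rule: running minimum taken from the largest p value downwards
    monotone = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(monotone, 1.0)   # cap at 1 and restore input order
    return adjusted

adj = bh_adjust([0.01, 0.04, 0.03, 0.20])
print(adj)
```

Hypotheses with an adjusted value at or below the target level (e.g. 0.05) are the ones declared discoveries at that FDR.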
2019-06-23
Ameya Kulkarni (22:09:25): > @Ameya Kulkarni has joined the channel
2019-06-24
Komal Rathi (09:23:48): > @Komal Rathi has joined the channel
2019-06-26
Junhao Li (13:29:05): > @Junhao Li has joined the channel
2019-08-06
Stephanie Hicks (14:49:15): > https://peerj.com/preprints/27885/ Lots of great people here from the Bioc community, including @Davis McCarthy, @Mark Robinson and @Catalina Vallejos! - Attachment (PeerJ Preprints): 12 Grand challenges in single-cell data science > The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands—or even millions—of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of ‘Single Cell Data Science’. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in ‘Single Cell Data Science’ for the coming years.
2019-08-09
Catalina Vallejos (18:51:47): > @Catalina Vallejos has joined the channel
2019-08-14
Aedin Culhane (14:41:49): > @Aedin Culhane has joined the channel
Aedin Culhane (14:42:20): > Our MOGSA paper is published: https://www.mcponline.org/content/early/2019/06/26/mcp.TIR118.001251 - Attachment (Molecular & Cellular Proteomics): MOGSA: integrative single sample gene-set analysis of multiple omics data > Gene-set analysis (GSA) summarizes individual molecular measurements to more interpretable pathways or gene-sets and has become an indispensable step in the interpretation of large-scale omics data. However, GSA methods are limited to the analysis of single omics data. Here, we introduce a new computation method termed multi-omics gene-set analysis (MOGSA), a multivariate single sample gene-set analysis method that integrates multiple experimental and molecular data types measured over the same set of samples. The method learns a low dimensional representation of most variant correlated features (genes, proteins, etc.) across multiple omics data sets, transforms the features onto the same scale and calculates an integrated gene-set score from the most informative features in each data type. MOGSA does not require filtering data to the intersection of features (gene IDs), therefore, all molecular features, including those that lack annotation may be included in the analysis. Using simulated data, we demonstrate that integrating multiple diverse sources of molecular data increases the power to discover subtle changes in gene-sets and may reduce the impact of unreliable information in any single data type. Using real experimental data, we demonstrate three use-cases of MOGSA. First, we show how to remove a source of noise (technical or biological) in integrative MOGSA of NCI60 transcriptome and proteome data. Second, we apply MOGSA to discover similarities and differences in mRNA, protein and phosphorylation profiles of a small study of stem cell lines and assess the influence of each data type or feature on the total gene-set score. 
Finally, we apply MOGSA to cluster analysis and show that three molecular subtypes are robustly discovered when copy number variation and mRNA data of 308 bladder cancers from The Cancer Genome Atlas are integrated using MOGSA. MOGSA is available in the Bioconductor R package “mogsa”.
2019-08-16
Shila Ghazanfar (05:21:05): > @Shila Ghazanfar has joined the channel
2019-08-18
Luke Zappia (21:28:11): > @Luke Zappia has joined the channel
2019-08-27
Levi Waldron (16:03:27): > https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz632/5544927
Levi Waldron (16:17:11): > That’s “CNVRanger: association analysis of CNVs with gene expression and quantitative phenotypes”: @Ludwig Geistlinger’s new package paper for population copy number analysis, including identifying regions of frequent variation (think GISTIC), functional enrichment (based on regioneR), eQTL (based on edgeR), and GWAS (based on PLINK).
2019-12-02
Federico Marini (13:43:53): > Shotgun :slightly_smiling_face: https://www.nature.com/articles/s41592-019-0654-x - Attachment (Nature Methods): Orchestrating single-cell analysis with Bioconductor > This Perspective highlights open-source software for single-cell analysis released as part of the Bioconductor project, providing an overview for users and developers.
2019-12-05
Stephany Orjuela (07:23:34): > @Stephany Orjuela has joined the channel
2019-12-17
Jean Yang (02:41:32): > @Jean Yang has joined the channel
2019-12-22
Sara Fonseca Costa (16:09:07): > @Sara Fonseca Costa has joined the channel
2020-02-14
Andrew Skelton (05:09:39): > @Andrew Skelton has joined the channel
2020-02-22
Sean Davis (10:24:31): > Promoter CpG density predicts downstream gene loss-of-function intolerance: https://www.biorxiv.org/content/10.1101/2020.02.15.936351v2
Sean Davis (10:24:54): > Nice paper by @Kasper D. Hansen.
Theresa Alexander (10:26:23): > @Theresa Alexander has joined the channel
Christian Brueffer (10:35:39): > @Christian Brueffer has joined the channel
Martin Morgan (11:13:03): > @Martin Morgan has joined the channel
Robert Castelo (11:44:26): > @Robert Castelo has joined the channel
Alan O’C (12:18:35): > @Alan O’C has joined the channel
Jared Andrews (14:21:53): > @Jared Andrews has joined the channel
Peter Hickey (15:42:37): > @Peter Hickey has joined the channel
Stephanie Hicks (16:27:14): > Another one, led by @Kasper D. Hansen’s PhD student Yi Wang, was preprinted the week before (https://www.biorxiv.org/content/10.1101/2020.02.13.944777v1), on a bias (the mean-correlation bias) in co-expression analysis.
Mikhail Dozmorov (19:37:15): > @Mikhail Dozmorov has joined the channel
2020-02-23
Kevin Blighe (02:54:31): > @Kevin Blighe has joined the channel
Mikhael Manurung (11:25:14): > @Mikhael Manurung has joined the channel
Mikhael Manurung (11:52:07) (in thread): > Seems like there are a lot of biases that have to be corrected when doing co-expression analysis on bulk RNA-Seq data such as cell type composition, mean expression difference, and now this mean-correlation. I usually just correct for cell type composition, which I get from cytometry studies on the same samples. Now, I do not think that is enough… > > What would be your recommended pre-processing workflow prior to coexpression/differential coexpression analysis?
2020-02-24
Stephanie Hicks (09:12:36) (in thread): > tagging @Kasper D. Hansen
2020-02-25
Federico Marini (15:06:30): > Look look :slightly_smiling_face: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007664 Great job @Michael Love, @Charlotte Soneson, @Peter Hickey, @Lori Shepherd, @Martin Morgan and @Rob Patro (plus a couple more not here)!!! - Attachment (journals.plos.org): Tximeta: Reference sequence checksums for provenance identification in RNA-seq > Author summary Gene expression quantification from RNA sequencing is a common component of many research publications. In order that research findings can be computationally reproducible, it is critical that gene expression datasets are linked to the correct gene annotation, including the source of the annotation, the release number, and the location of the genes in a particular genome assembly. Often it is difficult for this critical metadata to be found for public datasets, and manually curating this information subjects the process to human error. We describe a solution for the missing metadata problem, whereby we embed a checksum of the RNA reference sequences in the output files during the expression quantification step. Later we use this checksum for identification and automatic attachment of the correct metadata when the dataset is loaded into R for statistical analysis. We feel this paradigm of embedded checksums and subsequent metadata retrieval will also prove useful in other computational biology contexts.
Kasper D. Hansen (20:54:59) (in thread): > We have only extensively studied co-expression analysis (not differential). We essentially take logTPM or logRPKM and then we remove principal components from the data matrix. It is unclear exactly how many PCs to remove, but it is clear that removing some (or a lot) is a good idea
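A minimal sketch of the "remove principal components" step described above, assuming a genes x samples matrix that is already log-transformed (e.g. logTPM) and a user-chosen number of PCs `k`. This is an illustrative NumPy version, not the lab's actual pipeline, and `remove_top_pcs` is a hypothetical name. Regressing each gene on the top-k sample-level PCs is equivalent to subtracting the rank-k SVD approximation of the gene-centered matrix.

```python
import numpy as np

def remove_top_pcs(log_expr: np.ndarray, k: int) -> np.ndarray:
    """Regress the top-k sample-level PCs out of a genes x samples log-expression matrix."""
    n_genes, n_samples = log_expr.shape
    if not 0 <= k <= min(n_genes, n_samples):
        raise ValueError("k must be between 0 and min(n_genes, n_samples)")
    # Center each gene across samples, as PCA with samples as observations would
    gene_means = log_expr.mean(axis=1, keepdims=True)
    centered = log_expr - gene_means
    # Thin SVD: centered = U @ diag(s) @ Vt; rows of Vt hold the normalized sample PC scores
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Least-squares fit of every gene on the top-k PC scores
    explained = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Residuals, with gene means added back so expression levels are kept
    return centered - explained + gene_means

# Usage on random data: gene-gene correlations of the residuals are the co-expression estimates
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 20))
cleaned = remove_top_pcs(expr, k=3)
coexpr = np.corrcoef(cleaned)  # rows are variables, so this is 200 x 200
```

How many PCs to remove is left open here, as it is in the thread.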
2020-02-26
Mikhael Manurung (16:05:52) (in thread): > Thanks for the reply, @Kasper D. Hansen. One more question: do you prefer to regress out PCs or known covariates? Or do you assume that removing PCs will take care of the covariates as well?
2020-03-03
Russ Bainer (21:55:03): > @Russ Bainer has joined the channel
2020-03-08
Goutham Atla (16:59:25): > @Goutham Atla has joined the channel
2020-03-10
Robert Ivánek (03:55:23): > @Robert Ivánek has joined the channel
2020-03-11
Keegan Korthauer (18:51:17): > @Keegan Korthauer has joined the channel
2020-05-01
Charlotte Rich-Griffin (05:25:39): > @Charlotte Rich-Griffin has joined the channel
2020-05-05
Devika Agarwal (09:56:34): > @Devika Agarwal has joined the channel
2020-05-12
Shani Amarasinghe (09:55:45): > @Shani Amarasinghe has joined the channel
2020-05-13
Nadine Bestard-Cuche (10:51:31): > @Nadine Bestard-Cuche has joined the channel
2020-06-04
Charlotte Soneson (07:14:00): > Just up on F1000 (package in Bioc release 3.11): https://f1000research.com/articles/9-512/v1 - Attachment (f1000research.com): F1000Research Article: ExploreModelMatrix: Interactive exploration for improved understanding of design matrices and linear models in R. > Read the latest article version by Charlotte Soneson, Federico Marini, Florian Geier, Michael I. Love, Michael B. Stadler, at F1000Research.
2020-06-06
Olagunju Abdulrahman (19:57:41): > @Olagunju Abdulrahman has joined the channel
2020-06-29
Aedin Culhane (10:36:39): > Did a mini review of PCA of scRNAseq data: https://www.frontiersin.org/articles/10.3389/fonc.2020.00973/full - Attachment (Frontiers): Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data > Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered.
2020-07-04
Umar Ahmad (08:20:58): > @Umar Ahmad has joined the channel
2020-07-15
Spencer Nystrom (08:53:47): > @Spencer Nystrom has joined the channel
wmuehlhaeuser (09:22:17): > @wmuehlhaeuser has joined the channel
2020-07-27
Sonali (08:49:16): > @Sonali has joined the channel
Petra Palenikova (18:34:21): > @Petra Palenikova has joined the channel
Shian Su (18:49:50): > @Shian Su has left the channel
2020-07-29
Riyue Sunny Bao (17:38:20): > @Riyue Sunny Bao has joined the channel
2020-07-31
Hector Climente (13:15:31): > @Hector Climente has joined the channel
2020-08-06
Laurent Gatto (13:01:25): > @Laurent Gatto has joined the channel
2020-08-17
Roye Rozov (02:12:06): > @Roye Rozov has joined the channel
2020-08-18
Will Macnair (09:08:56): > @Will Macnair has joined the channel
2020-09-06
Tyrone Chen (22:32:33): > @Tyrone Chen has joined the channel
2020-11-11
Joshua Shapiro (09:09:25): > @Joshua Shapiro has joined the channel
2020-11-12
Philippe Boileau (15:08:58): > @Philippe Boileau has joined the channel
2020-11-19
Kevin Blighe (08:30:19): > @Kevin Blighe has joined the channel
2020-11-23
Dominique Paul (08:38:22): > @Dominique Paul has joined the channel
2020-12-02
Konstantinos Geles (Constantinos Yeles) (05:43:46): > @Konstantinos Geles (Constantinos Yeles) has joined the channel
2020-12-07
Aoi Senju (21:19:46): > @Aoi Senju has joined the channel
2020-12-12
Huipeng Li (00:39:04): > @Huipeng Li has joined the channel
Jared Andrews (19:58:20): > The dittoSeq paper is out in Bioinformatics, spearheaded by @Dan Bunis: https://doi.org/10.1093/bioinformatics/btaa1011
Dan Bunis (20:00:00): > @Dan Bunis has joined the channel
2021-01-22
Annajiat Alim Rasel (15:46:16): > @Annajiat Alim Rasel has joined the channel
2021-02-07
Mikhael Manurung (11:10:16): > @Mikhael Manurung has left the channel
2021-02-12
Janani Ravi (15:53:00): > @Janani Ravi has joined the channel
2021-02-17
abdullah hanta (16:07:44): > @abdullah hanta has joined the channel
2021-02-23
Wynn Cheung (10:32:55): > @Wynn Cheung has joined the channel
2021-02-25
margherita mutarelli (09:53:06): > @margherita mutarelli has joined the channel
2021-03-20
watanabe_st (01:58:02): > @watanabe_st has joined the channel
2021-03-23
Lambda Moses (23:05:44): > @Lambda Moses has joined the channel
2021-04-28
Mahmoud Ahmed (08:06:18): > @Mahmoud Ahmed has joined the channel
2021-05-11
Megha Lal (16:45:24): > @Megha Lal has joined the channel
2021-05-21
Federico Marini (14:52:46): > Channel’s been quiet for a while… > Time to drink a GeneTonic on it to revive it :cocktail: https://www.biorxiv.org/content/10.1101/2021.05.19.444862v1 - out on bioRxiv today! - Attachment (bioRxiv): GeneTonic: an R/Bioconductor package for streamlining the interpretation of RNA-seq data > Background: The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats - normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used for the purpose of identification of relevant functional patterns, yet often their contextualization within the data and results at hand is not straightforward, especially if these analytic components are not combined together efficiently. Results: We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses, by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. Conclusion: GeneTonic is distributed as an R package in the Bioconductor project (https://bioconductor.org/packages/GeneTonic/) under the MIT license. 
Offering both bird’s-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims at simplifying the process of interpretation of complex and compelling RNA-seq datasets for many researchers with different expertise profiles. ### Competing Interest Statement The authors have declared no competing interest.
2021-05-25
Enrica Calura (03:48:12): > @Enrica Calura has joined the channel
Quang Nguyen (12:20:41): > @Quang Nguyen has joined the channel
2021-06-04
Izaskun Mallona (08:56:50): > @Izaskun Mallona has joined the channel
2021-06-11
Sebastian Worms (07:10:45): > @Sebastian Worms has joined the channel
2021-07-21
Aedin Culhane (21:05:25): > Excited that our BIRS single cell data integration workshop & hackathon was featured in Nature: https://nature.com/articles/d41586-021-01994-w… The R code and data are on GitHub: https://github.com/BIRSBiointegration… > Videos and slides of the talks are at https://birs.ca/events/2020/5-day-workshops/20w5197… - Attachment (birs.ca): 20w5197: Mathematical Frameworks for Integrative Analysis of Emerging Biological Data Types (Online) | Banff International Research Station > Workshop at the Banff International Research Station in Banff, Alberta between Jun 14 and Jun 19, 2020: Mathematical Frameworks for Integrative Analysis of Emerging Biological Data Types (Online).
2021-07-30
Tom Sinden (04:19:28): > @Tom Sinden has joined the channel
2021-08-04
TimNieuwenhuis (12:16:44): > @TimNieuwenhuis has joined the channel
2021-08-19
Ava Hoffman (she/her) (11:33:13): > @Ava Hoffman (she/her) has joined the channel
2021-09-06
Eddie (08:23:13): > @Eddie has joined the channel
2021-09-07
Andrew Jaffe (14:51:35): > @Andrew Jaffe has joined the channel
2021-09-16
Henry Miller (18:35:48): > @Henry Miller has joined the channel
2021-11-08
Paula Nieto García (03:29:05): > @Paula Nieto García has joined the channel
2022-01-03
Kurt Showmaker (17:05:01): > @Kurt Showmaker has joined the channel
2022-01-19
Stephany Orjuela (10:11:06): > @Stephany Orjuela has left the channel
2022-01-28
Megha Lal (11:14:07): > @Megha Lal has left the channel
2022-02-01
Stephanie Hicks (20:25:06): > @Stephanie Hicks has left the channel
2022-03-20
Sarvesh Nikumbh (21:41:33): > @Sarvesh Nikumbh has joined the channel
2022-04-26
Indrik Wijaya (21:39:07): > @Indrik Wijaya has joined the channel
2022-05-15
Antonija Kolobaric (15:23:54): > @Antonija Kolobaric has joined the channel
2022-05-16
Pedro Sanchez (07:04:14): > @Pedro Sanchez has joined the channel
2022-05-18
Vince Carey (06:25:12): > @Vince Carey has left the channel
2022-07-05
Sehyun Oh (16:16:21): > The GenomicSuperSignature package is out in Nature Communications. Collaborative work by @Sehyun Oh, @Levi Waldron and @Sean Davis: https://www.nature.com/articles/s41467-022-31411-3 - Attachment (Nature): GenomicSuperSignature facilitates interpretation of RNA-seq experiments through robust, efficient comparison to public databases > Nature Communications - Many transcriptomic profiles have been deposited in public archives but are underused for the interpretation of experiments. Here the authors report GenomicSuperSignature…
2022-07-15
Ashley Robbins (15:21:23): > @Ashley Robbins has joined the channel
2022-09-27
Jennifer Holmes (16:15:11): > @Jennifer Holmes has joined the channel
2022-10-21
John Ogata (16:06:01): > @John Ogata has joined the channel
2023-01-26
Chenyue Lu (16:49:33): > @Chenyue Lu has joined the channel
2023-02-26
Arda Keles (03:57:17): > @Arda Keles has joined the channel
2023-03-07
Sarvesh Nikumbh (08:07:54): > Happy to share the seqArchR preprint: https://www.biorxiv.org/content/10.1101/2023.03.02.530868v1 https://twitter.com/sarveshnikumbh/status/1632756125148893184?t=sJqnz7QXSAeCd9to_4-cGQ&s=19 - Attachment (Bioconductor): seqArchR > seqArchR enables unsupervised discovery of de novo clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does not require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos. - Attachment (twitter): Attachment > Happy to introduce seqArchR, an approach using non-negative matrix factorisation (NMF) for de novo identification of promoter sequence architectures. 1/7 > https://www.biorxiv.org/content/10.1101/2023.03.02.530868v1
Pedro Sanchez (12:00:47) (in thread): > That’s great! Congrats Sarvesh!!
2023-03-08
Sarvesh Nikumbh (17:42:03) (in thread): > Thanks, Pedro!:slightly_smiling_face:
2023-03-10
Edel Aron (15:28:51): > @Edel Aron has joined the channel
Joaquin Reyna (15:36:21): > @Joaquin Reyna has joined the channel
2023-03-21
Assa (02:13:32): > @Assa has joined the channel
Reece11 (08:35:55): > @Reece11 has joined the channel
Alexander Bender (10:47:46): > @Alexander Bender has joined the channel
2023-03-22
Fabricio Almeida-Silva (02:21:45): > @Fabricio Almeida-Silva has joined the channel
Chris Vanderaa (08:27:32): > @Chris Vanderaa has joined the channel
2023-04-20
Chris Vanderaa (04:39:24): > So glad to share with you our new preprint: https://arxiv.org/abs/2304.06654 :partying_face: In this perspective, we discuss how to deal with missing values in #mass-spectrometry #singlecell #proteomics. Here’s a thread with our key messages. - Attachment (arXiv.org): Revisiting the thorny issue of missing values in single-cell proteomics > Missing values are a notable challenge when analysing mass spectrometry-based proteomics data. While the field is still actively debating on the best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight 5 main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modelling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values and for proper encoding of missing values.
Chris Vanderaa (04:40:03) (in thread): > What should we do with these missing values? To impute, or not to impute, that is the question. While the discussion is relevant to any proteomics experiment, we frame our arguments with single-cell applications in mind. - File (PNG): image.png
Chris Vanderaa (04:40:22) (in thread): > A major challenge with SCP is the high proportion of missing values, whatever the technology used to generate and process the data. - File (PNG): image.png
Chris Vanderaa (04:40:40) (in thread): > But that’s not the only challenge. We describe 5 main challenges that future computational development should address. - File (PNG): image.png
Chris Vanderaa (04:41:23) (in thread): > We next provide recommendations, extending the SCP community guidelines initiated by Nikolai Slavov, but focusing on missing values: http://dx.doi.org/10.1038/s41592-023-01785-3
Chris Vanderaa (04:41:45) (in thread): > First, report at least these 4 metrics when describing sensitivity of an experiment: local sensitivity, total sensitivity, data completeness and number of cells acquired. The text provides a definition for each metric, along with their respective advantages and limitations. - File (PNG): image.png
Chris Vanderaa (04:42:04) (in thread): > Second, report at least these 3 pieces of information when mentioning a method to deal with missing values: name of the algorithm, name of the software and version of that software. Better, provide the code that reproduces your data analysis.
Chris Vanderaa (04:42:16) (in thread): > Software details do matter! For instance, the same KNN algorithm implemented in two pieces of software leads to different results, because one imputes by samples (cells) and the other by variables (proteins). Here’s the impact on cell correlations. - File (PNG): image.png
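To make the by-cells versus by-proteins point concrete, here is a toy sketch. It is not either of the software packages compared in the preprint: it applies one simple nearest-neighbour rule to the rows of a proteins x cells matrix, either directly (neighbours are proteins) or on its transpose (neighbours are cells). `knn_impute_rows`, the RMS distance over co-observed entries and the 3 x 3 matrix are all assumptions made for illustration.

```python
import numpy as np

def knn_impute_rows(x: np.ndarray, k: int) -> np.ndarray:
    """Fill each NaN with the mean of that column over the k nearest rows.

    Distance between two rows is the root-mean-square difference over the
    columns observed in both; rows with no shared observed column are ignored.
    """
    out = x.copy()
    observed = ~np.isnan(x)
    n_rows = x.shape[0]
    for i in range(n_rows):
        missing_cols = np.flatnonzero(~observed[i])
        if missing_cols.size == 0:
            continue
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                dists[j] = np.sqrt(np.mean((x[i, shared] - x[j, shared]) ** 2))
        for c in missing_cols:
            # Donor rows must be at a finite distance and have column c observed
            donors = np.flatnonzero(np.isfinite(dists) & observed[:, c])
            if donors.size == 0:
                continue  # no donor available: leave the value missing
            nearest = donors[np.argsort(dists[donors])[:k]]
            out[i, c] = x[nearest, c].mean()
    return out

# Toy proteins x cells matrix (log intensities) with one missing value
proteins_by_cells = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 7.0],
])
by_protein = knn_impute_rows(proteins_by_cells, k=1)   # neighbours are proteins
by_cell = knn_impute_rows(proteins_by_cells.T, k=1).T  # neighbours are cells
print(by_protein[0, 2], by_cell[0, 2])  # 3.0 2.0
```

Same rule, same data, same k, yet the single missing value is filled with 3.0 in one orientation and 2.0 in the other, which is why the software and its orientation are worth reporting.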
Chris Vanderaa (04:42:33) (in thread): > Finally, encode missing values using NA (or numpy.nan). Zero should not be used because it leads to imputation by zero, the worst thing to do when dealing with missing values in proteomics.
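A small illustration of why zero is a poor encoding for a missing value, assuming one protein's log2 intensities in six cells, two of them not measured (the numbers are made up): once missing values are written as zeros they can no longer be counted, and they pull summaries such as the mean towards an intensity that was never observed.

```python
import numpy as np

# Hypothetical log2 intensities of one protein in six cells; two were not measured
with_nan = np.array([10.0, 11.0, np.nan, 12.0, np.nan, 11.0])
with_zero = np.where(np.isnan(with_nan), 0.0, with_nan)  # the discouraged encoding

n_missing_nan = int(np.isnan(with_nan).sum())    # missingness is still visible
n_missing_zero = int(np.isnan(with_zero).sum())  # missingness is lost

mean_nan = np.nanmean(with_nan)  # mean over the four observed cells only
mean_zero = with_zero.mean()     # silently imputes the two gaps by zero
print(n_missing_nan, n_missing_zero, mean_nan, mean_zero)
```

The NaN-encoded vector reports 2 missing values and a mean of 11.0; the zero-encoded one reports none and a mean of about 7.33.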
Chris Vanderaa (04:46:09) (in thread): > We performed data exploration and illustration using our Bioconductor tools: scp and scpdata :grin: - Attachment (Bioconductor): scp > Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the ‘QFeatures’ package and relies on ‘SingleCellExperiment’ to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative table (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization. - Attachment (Bioconductor): scpdata > The package disseminates mass spectrometry (MS)-based single-cell proteomics (SCP) datasets. The data were collected from published work and formatted using the scp data structure. The data sets contain quantitative information at spectrum, peptide and/or protein level for single cells or minute sample amounts. - File (PNG): image.png - File (PNG): image.png
Chris Vanderaa (04:47:32) (in thread): > Many thanks to @Laurent for his guidance. I also thank the UCLouvain-CBIO lab for the fruitful discussions about missing values in omics and single cell data, and @Charlotte Soneson, @lievenClement and @Davide Risso for discussing missing value imputation in scRNA-Seq data :pray:
2023-05-18
Oluwafemi Oyedele (05:54:46): > @Oluwafemi Oyedele has joined the channel
2023-07-28
Benjamin Yang (15:58:23): > @Benjamin Yang has joined the channel
2023-08-02
Beth Cimini (08:21:07): > @Beth Cimini has joined the channel
2023-08-03
Ritika Giri (15:58:53): > @Ritika Giri has joined the channel
2023-08-20
Jacques SERIZAY (10:38:50): > @Jacques SERIZAY has joined the channel
2023-09-01
Kwangwoon (Jon) Lee (09:38:57): > @Kwangwoon (Jon) Lee has joined the channel
Samuel Gamboa (09:50:25): > @Samuel Gamboa has joined the channel
Maria Doyle (10:21:18): > @Maria Doyle has joined the channel
Vince Carey (11:01:54): > @Vince Carey has joined the channel
Xiuwen Zheng (15:33:19): > @Xiuwen Zheng has joined the channel
2023-09-02
Y-h. Taguchi (03:26:19): > @Y-h. Taguchi has joined the channel
Y-h. Taguchi (03:28:54): > You can perform tensor analysis without knowing about tensor https://link.growkudos.com/1e9dormza4g I have published a paper to advertise my recent Bioconductor packages. I would be happy if many people made use of these packages in their own studies. - Attachment (growkudos.com): You can perform tensor analysis without knowing about tensor > We have developed tensor-based analysis for omics data sets and successfully applied it to various bioinformatics topics. However, it has not been widely adopted, possibly because tensor algebra is unfamiliar to many researchers.
Krutika (04:01:12): > @Krutika has joined the channel
Abdullah Al Nahid (23:03:02): > @Abdullah Al Nahid has joined the channel
2023-09-03
Lea Seep (09:52:17): > @Lea Seep has joined the channel
2023-09-04
saskia (00:57:11): > @saskia has joined the channel
2023-09-05
Mariela (22:47:38): > @Mariela has joined the channel
2023-09-06
rizoic (22:32:16): > @rizoic has joined the channel
2023-09-09
Mikaila Chetty (04:58:45): > @Mikaila Chetty has joined the channel
2023-09-20
Timothy Keyes (18:14:18): > @Timothy Keyes has joined the channel
2023-09-25
Nikhita (18:40:48): > @Nikhita has joined the channel
2023-10-18
Aedin Culhane (11:55:45): > @Sean Davis @Martin Morgan have you played with the CZI software mentions dataset https://github.com/chanzuckerberg/software-mentions? It might be a nice addition to BiocPkgTools when pulling Bioconductor impact stats. @Lori Shepherd @Vince Carey
Martin Morgan (12:00:49) (in thread): > I’ve seen this but not used it. Here are some tools to investigate the 70k+ full-text citations to ‘Bioconductor’ in PubMedCentral… https://mtmorgan.github.io/pmcbioc/ - Attachment (mtmorgan.github.io): Summarize PubMedCentral Publications Mentioning Bioconductor > PubMedCentral provides full-text search functionality, returning scientific articles matching the query. pmcbioc can parse the XML result of a PubMedCentral query, extracting article metadata to a database and creating an index to facilitate fast access to individual articles. The database is then easily queried, using dbplyr and dplyr, to summarize the articles. The indexed XML file of results can be queried for one or more records using xpath to extract information not contained in the metadata.
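pmcbioc itself is an R package; the following is only a rough Python analogue of the workflow its description outlines (parse the XML result, load article metadata into a database, then summarize with queries), using the standard-library `xml.etree.ElementTree` and `sqlite3` modules. The XML is a tiny hand-written stand-in with assumed JATS-like tag names and placeholder identifiers, not real PubMedCentral output, and `extract_metadata` / `summarize_by_year` are hypothetical names, not pmcbioc functions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written stand-in for a PubMedCentral XML result; tag names are assumed
# to follow JATS conventions and the identifiers are placeholders.
PMC_XML = """
<pmc-articleset>
  <article><front><article-meta>
    <article-id pub-id-type="pmc">PMC-EXAMPLE-1</article-id>
    <title-group><article-title>First example article</article-title></title-group>
    <pub-date><year>2021</year></pub-date>
  </article-meta></front></article>
  <article><front><article-meta>
    <article-id pub-id-type="pmc">PMC-EXAMPLE-2</article-id>
    <title-group><article-title>Second example article</article-title></title-group>
    <pub-date><year>2021</year></pub-date>
  </article-meta></front></article>
  <article><front><article-meta>
    <article-id pub-id-type="pmc">PMC-EXAMPLE-3</article-id>
    <title-group><article-title>Third example article</article-title></title-group>
    <pub-date><year>2023</year></pub-date>
  </article-meta></front></article>
</pmc-articleset>
"""


def extract_metadata(xml_text):
    """Return (pmcid, title, year) tuples, one per <article> element."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for article in root.iter("article"):
        meta = article.find("front/article-meta")
        # ElementTree supports simple XPath predicates such as [@attr='value'].
        pmcid = meta.findtext("article-id[@pub-id-type='pmc']")
        title = meta.findtext("title-group/article-title")
        year = int(meta.findtext("pub-date/year"))
        rows.append((pmcid, title, year))
    return rows


def summarize_by_year(rows):
    """Load metadata into an in-memory SQLite table and count articles per year."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE article (pmcid TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    con.executemany("INSERT INTO article VALUES (?, ?, ?)", rows)
    counts = con.execute(
        "SELECT year, COUNT(*) FROM article GROUP BY year ORDER BY year"
    ).fetchall()
    con.close()
    return counts


rows = extract_metadata(PMC_XML)
counts = summarize_by_year(rows)
```

Keeping the extracted metadata in a database makes repeated summaries cheap, while the original XML remains available for records that need fields the table does not hold.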
2023-12-14
Mikhael Manurung (04:33:31): > @Mikhael Manurung has joined the channel
2024-03-11
Melysssa Minto (10:12:32): > @Melysssa Minto has joined the channel
2024-03-22
Stevie Pederson (02:12:04): > @Stevie Pederson has joined the channel
2024-04-04
Tung Trinh (23:39:10): > @Tung Trinh has joined the channel
2024-04-18
Philipp Sergeev (03:02:24): > @Philipp Sergeev has joined the channel
Weston Elison (15:53:48): > @Weston Elison has joined the channel
2024-05-15
Sunil Nahata (08:31:21): > @Sunil Nahata has joined the channel
2024-07-05
Margherita (12:29:15): > @Margherita has joined the channel
2024-07-19
Sudipta Hazra (17:25:13): > @Sudipta Hazra has joined the channel
2024-07-26
Qiwen Octavia Huang (19:52:07): > @Qiwen Octavia Huang has joined the channel
2024-08-19
Rema Gesaka (09:41:49): > @Rema Gesaka has joined the channel
2024-09-11
Y-h. Taguchi (09:08:39): > A comprehensive volume about “Tensor decomposition based unsupervised feature extraction” https://www.growkudos.com/publications/10.1007%252F978-3-031-60982-4 - Attachment (growkudos.com): A comprehensive volume about “Tensor decomposition based unsupervised feature extraction” > This volume explains principal component analysis/tensor decomposition based unsupervised feature extraction, which I proposed in 2012 and 2017, respectively. You can learn the mathematical background and various applications. The method focuses on so-called feature selection/extraction.
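As I understand the PCA-based variant of this approach, features receive P-values by assuming their scores on the selected component are Gaussian, so that the squared standardized score follows a chi-squared distribution; the P-values are then Benjamini-Hochberg (BH) adjusted and features below a threshold such as 0.01 are selected. Below is a minimal pure-Python sketch of that selection step for a single component, assuming the PC scores were already computed elsewhere. It is not code from the author's packages, and `select_features` and the toy scores are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity (capped at 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(scores, threshold=0.01):
    """Indices of features whose score on one component is an outlier.

    Scores are standardized to z, each feature gets
    P(chi2 with 1 df > z**2) = erfc(|z| / sqrt(2)),
    and features with BH-adjusted P below the threshold are kept.
    """
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    pvalues = [math.erfc(abs(s - mean) / (sd * math.sqrt(2))) for s in scores]
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, p in enumerate(adjusted) if p < threshold]


# Hypothetical PC scores: 98 unremarkable features and two outliers at the end.
scores = [0.1 * ((i % 5) - 2) for i in range(98)] + [10.0, -12.0]
selected = select_features(scores)
```

Here only the two planted outliers (indices 98 and 99) survive the adjustment. Using several components would replace the 1-df chi-squared with one whose degrees of freedom equal the number of components summed over.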
2024-09-20
Camille Guillermin (09:30:23): > @Camille Guillermin has joined the channel
2024-10-01
Caroline Schreiber (04:10:10): > @Caroline Schreiber has joined the channel
2025-02-23
Sean Davis (12:44:48): > https://www.nature.com/articles/s41592-024-02212-x - Attachment (Nature): SpatialData: an open and universal data framework for spatial omics > Nature Methods - SpatialData is a user-friendly computational framework for exploring, analyzing, annotating, aligning and storing spatial omics data that can seamlessly handle large multimodal…
2025-02-24
Vince Carey (05:07:53): - File (PDF): GrangesInRust.pdf
2025-03-07
Abdullah Al Nahid (09:47:34) (in thread): > neat!
2025-03-18
Andres Wokaty (14:28:28): > @Andres Wokaty has joined the channel
2025-03-20
Louise Morlot (12:45:59): > @Louise Morlot has joined the channel