#longread

2023-03-03

Michael Love (11:24:12): > @Michael Love has joined the channel

Michael Love (11:24:12): > set the channel description: Discussions about long-read RNA-seq in Bioconductor and beyond

Charlotte Soneson (11:25:37): > @Charlotte Soneson has joined the channel

Michael Love (11:26:35): > should I ask for a BiocViews? - File (PNG): Screenshot 2023-03-03 at 11.26.17 AM.png

Michael Love (11:27:34): > LongReadRNASeq? is this future proof?

Michael Love (11:27:47): > FullTranscriptRNASeq?

Mervin Fansler (11:32:00): > @Mervin Fansler has joined the channel

Meeta Mistry (11:33:46): > @Meeta Mistry has joined the channel

Frederick Tan (11:48:46): > @Frederick Tan has joined the channel

Stephanie Hicks (12:14:31): > @Stephanie Hicks has joined the channel

Stephanie Hicks (12:15:14): > I saw a paper a few weeks back doing long read proteomics

Stephanie Hicks (12:15:50): > Maybe just LongRead?

Michael Love (12:48:33): > true. there won’t be much build out of long read assembly in Bioc so not much conflict

2023-03-05

Peter Hickey (15:48:07): > @Peter Hickey has joined the channel

Shian Su (17:34:58): > @Shian Su has joined the channel

Shian Su (17:37:43): > Agree with LongRead, a second tag can be used to be more specific, rather than having LongReadDNA, LongReadRNA, etc.

Jonathan Goeke (23:10:27): > @Jonathan Goeke has joined the channel

Jonathan Goeke (23:13:48): > LongRead sounds good to me as well. LongReadRNASeq might duplicate RNASeq assuming that all RNA Seq will be long read in the future :wink:

Jonathan Goeke (23:16:02): > even though in practice maybe LongReadRNASeq is the tag that is most unambiguous, probably for quite some time

2023-03-06

Matt Ritchie (03:42:54): > @Matt Ritchie has joined the channel

Michael Love (06:37:07): > has renamed the channel from “long-read-rna” to “longread”

Michael Love (06:40:53): > ok i’m going to email the developer list to ask about this

Spencer Nystrom (08:18:39): > @Spencer Nystrom has joined the channel

Jeroen Gilis (08:20:31): > @Jeroen Gilis has joined the channel

Benjamin McMichael (12:17:33): > @Benjamin McMichael has joined the channel

Ying Chen (20:44:44): > @Ying Chen has joined the channel

2023-03-07

Krithika Bhuvanesh (10:38:07): > @Krithika Bhuvanesh has joined the channel

2023-03-08

Anže Lovše (12:42:25): > @Anže Lovše has joined the channel

Federico Marini (15:23:16): > @Federico Marini has joined the channel

2023-03-31

Ilaria Billato (08:57:31): > @Ilaria Billato has joined the channel

2023-05-03

Jenny Drnevich (17:09:59): > @Jenny Drnevich has joined the channel

Jenny Drnevich (17:23:38): > Hi all! Glad to see this channel established. Anyone done or heard about doing quantitative RNA-Seq using PacBio? With the new Revio, the throughput might be high enough to start to get accurate quantification of transcripts per sample instead of just the identification of transcripts per sample, with the end goal of doing differential expression testing. LOTS of issues to consider… I’m interested in getting a conversation going. https://www.pacb.com/products-and-services/applications/rna-sequencing/ - Attachment (PacBio): RNA sequencing > RNA sequencing provides full-length transcripts to characterize the full diversity of transcriptomes and reliable isoform information.

Jenny Drnevich (17:28:06): > And here is a good older discussion: https://bioinformatics.stackexchange.com/questions/6910/why-illumina-if-pacbio-provides-longer-and-better-reads - Attachment (Bioinformatics Stack Exchange): Why Illumina if PacBio provides longer and better reads? > PacBio provides longer read length than Illumina’s short-length reads. Longer reads offer better opportunity for genome assembly, structural variant calling. It is not worse than short reads for ca…

2023-05-04

Stephanie Hicks (08:39:30): > hi Jenny! I’ve seen more examples with Nanopore, but I think @Matt Ritchie has done quite a bit in this space. Here are some references I’m aware of (not pacbio specific tho) > * https://www.biorxiv.org/content/10.1101/2022.07.22.501076v2 > * https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02525-6 > * https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02907-y > * https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02399-8 > * https://www.biorxiv.org/content/10.1101/2022.11.14.516358v2 - Attachment (bioRxiv): Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures > The current lack of benchmark datasets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (“sequins”). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that, StringTie2 and bambu outperformed other tools from the 6 isoform detection tools tested, DESeq2, edgeR and limma-voom were best amongst the 5 differential transcript expression tools tested and there was no clear front-runner for performing differential transcript usage analysis between the 5 tools compared, which suggests further methods development is needed for this application. > > ### Competing Interest Statement > > The authors have declared no competing interest. 
- Attachment (BioMed Central): Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing - Genome Biology > A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity. - Attachment (BioMed Central): Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE - Genome Biology > Long-read single-cell RNA sequencing (scRNA-seq) enables the quantification of RNA isoforms in individual cells. However, long-read scRNA-seq using the Oxford Nanopore platform has largely relied upon matched short-read data to identify cell barcodes. We introduce BLAZE, which accurately and efficiently identifies 10x cell barcodes using only nanopore long-read scRNA-seq data. BLAZE outperforms the existing tools and provides an accurate representation of the cells present in long-read scRNA-seq when compared to matched short reads. BLAZE simplifies long-read scRNA-seq while improving the results, is compatible with downstream tools accepting a cell barcode file, and is available at https://github.com/shimlab/BLAZE . 
- Attachment (BioMed Central): LIQA: long-read isoform quantification and analysis - Genome Biology > Long-read RNA sequencing (RNA-seq) technologies can sequence full-length transcripts, facilitating the exploration of isoform-specific gene expression over short-read RNA-seq. We present LIQA to quantify isoform expression and detect differential alternative splicing (DAS) events using long-read direct mRNA sequencing or cDNA sequencing data. LIQA incorporates base pair quality score and isoform-specific read length information in a survival model to assign different weights across reads, and uses an expectation-maximization algorithm for parameter estimation. We apply LIQA to long-read RNA-seq data from the Universal Human Reference, acute myeloid leukemia, and esophageal squamous epithelial cells and demonstrate its high accuracy in profiling alternative splicing events. - Attachment (bioRxiv): Context-Aware Transcript Quantification from Long Read RNA-Seq data with Bambu > Most approaches to transcript quantification rely on fixed reference annotations. However, the transcriptome is dynamic, and depending on the context, such static annotations contain inactive isoforms for some genes while they are incomplete for others. > > To address this, we have developed Bambu, a method that performs machine-learning based transcript discovery to enable quantification specific to the context of interest using long-read RNA-Seq data. To identify novel transcripts, Bambu employs a precision-focused threshold referred to as the novel discovery rate (NDR), which replaces arbitrary per-sample thresholds with a single interpretable parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in presence of inactive isoforms. > > Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. 
We show that context-aware annotations improve abundance estimates for both novel and known transcripts. We apply Bambu to human embryonic stem cells to quantify isoforms from repetitive HERVH-LTR7 retrotransposons, demonstrating the ability to estimate transcript expression specific to the context of interest. > > ### Competing Interest Statement > > Jonathan Göke received travel and accommodation expenses to speak at the Oxford Nanopore Community Meeting 2018. All other authors declare no competing interest.

Jenny Drnevich (09:10:09): > Thanks, @Stephanie Hicks! These seem like a great place to start. Are there any known biases that have been identified for Nanopore? PacBio looks to be biased towards shorter sequences because of their easier migration into the SMRT cell wells (https://www.pacb.com/proceedings/getting-the-most-out-of-your-pacbio-libraries-with-size-selection/). - Attachment (PacBio): Getting the most out of your PacBio libraries with size selection. - PacBio > PacBio RS II sequencing chemistries provide read lengths beyond 20 kb with high consensus accuracy. The long read lengths of P4-C2 chemistry and demonstrated consensus accuracy of 99.999% are ideal for applications such as de novo assembly, targeted sequencing and isoform sequencing. The recently launched P5-C3 chemistry generates even longer reads with N50 often >10,000 bp, making it the best choice for scaffolding and spanning structural rearrangements. With these chemistry advances, PacBio’s read length performance is now primarily determined by the SMRTbell library itself. Size selection of a high-quality, sheared 20 kb library using the BluePippin™ System has been demonstrated to increase the N50 read length by as much as 5 kb with C3 chemistry. BluePippin size selection or a more stringent AMPure® PB selection cutoff can be used to recover long fragments from degraded genomic material. The selection of chemistries, P4-C2 versus P5-C3, is highly dependent on the final size distribution of the SMRTbell library and experimental goals. PacBio’s long read lengths also allow for the sequencing of full-length cDNA libraries at single-molecule resolution. However, longer transcripts are difficult to detect due to lower abundance, amplification bias, and preferential loading of smaller SMRTbell constructs. Without size selection, most sequenced transcripts are 1-1.5 kb. Size selection dramatically increases the number of transcripts >1.5 kb, and is essential for >3 kb transcripts.

Stephanie Hicks (09:11:10): > no documented biases that i’m aware of, but yes I can imagine similar biases in nanopore

2023-06-13

Jonathan Goeke (05:00:48): > Hi everyone, we just published bambu, a bioconductor package for transcript discovery and quantification from long read RNA-Seq data. Here is the link to the paper https://rdcu.be/deluQ (with @Ying Chen and @Michael Love who are actually in this channel as well!) - Attachment (Bioconductor): bambu > bambu is a R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can directly be used for visualisation and downstream analysis such as differential gene expression or transcript usage.

2023-06-15

Matt Ritchie (00:28:02) (in thread): > Sorry for the late reply to this @Jenny Drnevich and @Stephanie Hicks. The list of papers @Stephanie Hicks provides above is a great place to start, and do check out the bambu package mentioned below (congrats @Jonathan Goeke and co. on your paper!) > > Re: biases, you might find this recent work of interest https://academic.oup.com/nargab/article/5/2/lqad060/7192649 (pertains to direct RNA)

Jonathan Goeke (02:25:21) (in thread): > Thanks @Matt Ritchie! (and I agree this is a very interesting study + data set)

Michael Love (03:09:57): > Reminder that we now have a LongRead BiocViews: https://bioconductor.org/packages/devel/BiocViews.html#___LongRead Consider adding to your packages @Jonathan Goeke @Matt Ritchie and others…

2023-06-16

Jonathan Goeke (05:16:44): > Yes, thanks for the reminder! We will include that with the next update

2023-06-29

Sowmya Parthiban (11:44:52): > @Sowmya Parthiban has joined the channel

2023-08-04

Trisha Timpug (09:35:46): > @Trisha Timpug has joined the channel

2023-09-15

Leo Lahti (04:55:14): > @Leo Lahti has joined the channel

2023-09-25

Matt Ritchie (20:49:36): > Sharing a preprint from the LR-GASP Consortium and another one looking at isoform variation in Alzheimer’s disease in case they are of interest - Attachment (bioRxiv): Using deep long-read RNAseq in Alzheimer’s disease brain to assess clinical relevance of RNA isoform diversity > Due to alternative splicing, human protein-coding genes average over eight RNA isoforms, resulting in nearly four distinct protein coding sequences per gene. Long-read RNAseq (IsoSeq) enables more accurate quantification of isoforms, shedding light on their specific roles. To assess the clinical relevance of measuring RNA isoform expression, we sequenced 12 aged human frontal cortices (6 Alzheimer’s disease cases and 6 controls, 50% female) using one Oxford Nanopore PromethION flow cell per sample. Our study uncovered 53 new high-confidence RNA isoforms in clinically relevant genes, including several where the new isoform was one of the most highly expressed for that gene. Specific examples include WDR4 (61%; microcephaly), MYL3 (44%; hypertrophic cardiomyopathy), and MTHFS (25%; major depression, schizophrenia, bipolar disorder). Other notable genes with new high-confidence isoforms include CPLX2 (10%; schizophrenia, epilepsy) and MAOB (9%; targeted for Parkinson’s disease treatment). We identified 1,917 clinically relevant genes expressing multiple isoforms in human frontal cortex, where 1,018 had multiple isoforms with different protein coding sequences, demonstrating the need to better understand how individual isoforms from a “single” gene are involved in human health and disease. Exactly 98 of the 1,917 genes are implicated in brain-related diseases, including Alzheimer’s disease genes such as APP (Aβ precursor protein; five), MAPT (tau protein; four), and BIN1 (eight). We also found 99 differentially expressed RNA isoforms between Alzheimer’s cases and controls, despite the genes themselves not exhibiting differential expression. 
Our findings highlight the significant knowledge gaps in RNA isoform diversity and their clinical relevance. Deep long-read RNA sequencing will be necessary going forward to fully comprehend the clinical relevance of individual isoforms for a “single” gene. > > ### Competing Interest Statement > > The authors have declared no competing interest.

2023-09-26

Jenny Drnevich (10:06:57) (in thread): > Cool! Although I’ve never seen anything like this in the methods: “Long read cDNA library preparation commenced, utilizing the Oxford Nanopore Technologies PCR-amplified cDNA kit … The protocol, though not available due to a legal embargo, was followed…” (emphasis mine). :smile:

2023-09-27

Matt Ritchie (21:44:28) (in thread): > Oh! I imagine that line needs updating, as the technology they are using would surely have been released as a kit you can buy now (although it too may be obsolete, as the versions of kits seem to update often…)

2023-10-13

Stephanie Hicks (00:20:35): > This is a great read (and congrats @Matt Ritchie!) https://www.biorxiv.org/content/10.1101/2023.07.25.550582v1

2023-10-14

Michael Love (11:20:49): > And recently published https://www.nature.com/articles/s41592-023-02026-3 - Attachment (Nature): Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures > Nature Methods - This analysis leverages experimentally sequenced data and in silico mixtures to simulate transcript expression differences, which enables a performance assessment of long-read…

2024-01-20

Michael Kaufman (15:22:51): > @Michael Kaufman has joined the channel

2024-03-25

Pedro Sanchez (06:38:28): > @Pedro Sanchez has joined the channel

2024-04-28

Danielle Callan (08:32:15): > @Danielle Callan has joined the channel

2024-04-29

Jacqui Thompson (19:00:25): > @Jacqui Thompson has joined the channel

2024-05-27

Aedin Culhane (21:16:30): > @Aedin Culhane has joined the channel

2024-08-21

Laura Symul (08:57:15): > @Laura Symul has joined the channel

2024-08-26

Krithika Bhuvanesh (22:50:10): > @Krithika Bhuvanesh has left the channel

2024-09-16

Mike Morgan (06:24:51): > @Mike Morgan has joined the channel

2024-10-07

Michael Love (16:05:58): > CC @Pedro Baldoni

Pedro Baldoni (20:27:39): > @Pedro Baldoni has joined the channel

2024-11-25

rohitsatyam102 (16:17:30): > @rohitsatyam102 has joined the channel

2025-01-09

Ammar Sabir Cheema (11:39:46): > @Ammar Sabir Cheema has joined the channel

2025-04-25

Carlos Mata-Machado (21:49:22): > @Carlos Mata-Machado has joined the channel