[佳學基因 (Jiaxue Gene) Testing] Gene decoding data source: the tximport algorithm for transcript abundance

Transcript abundances and the tximport pipeline

Before we demonstrate how to align and then count RNA-seq fragments, we note that a newer and faster alternative pipeline is to use a transcript abundance quantification method such as Sailfish [8], Salmon [9], kallisto [10] or RSEM [11] to estimate abundances without aligning reads, and then to use the tximport package to assemble count and offset matrices for use with Bioconductor differential gene expression packages. We added this note during the revision of this workflow, so the material that follows still covers generating count matrices through alignment and read/fragment counting.

Using transcript abundance quantifiers together with tximport to produce gene-level count matrices and normalizing offsets has three advantages: (1) the approach corrects for any potential changes in gene length across samples (e.g., from differential isoform usage) [12]; (2) some of these methods are substantially faster and require less memory and disk space than alignment-based methods; and (3) it avoids discarding fragments that can align to multiple genes with homologous sequence [13]. Note that transcript abundance quantifiers skip the generation of large files that store read alignments and instead produce smaller files that store estimated abundances, counts, and effective lengths per transcript. For more details, see the manuscript describing the tximport approach [14] and, for software details, the tximport package vignette. After using a transcript quantifier and tximport, the entry point back into this workflow is the section on the data object below.
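To make the gene-length correction concrete, the following is a minimal Python sketch of the kind of transcript-to-gene summarization described above. It is not the tximport package itself (which is an R/Bioconductor package) and does not reproduce its API. The function name `summarize_to_gene`, the transcript and gene identifiers, and all numbers are hypothetical and chosen only for illustration. The sketch assumes that gene-level counts and abundances are obtained by summing over a gene's transcripts, and that the gene-level length is the abundance-weighted mean of the transcripts' effective lengths.

```python
from collections import defaultdict


def summarize_to_gene(transcripts, tx2gene):
    """Aggregate per-transcript estimates for ONE sample to gene level.

    transcripts: dict mapping transcript id -> (tpm, est_count, effective_length)
    tx2gene:     dict mapping transcript id -> gene id
    Returns a dict mapping gene id -> {"count", "tpm", "length"}.
    (Hypothetical helper for illustration; not part of any real library.)
    """
    counts = defaultdict(float)
    tpms = defaultdict(float)
    weighted_length = defaultdict(float)

    for tx_id, (tpm, est_count, eff_len) in transcripts.items():
        gene = tx2gene[tx_id]
        counts[gene] += est_count            # gene count = sum of transcript counts
        tpms[gene] += tpm                    # gene abundance = sum of transcript TPMs
        weighted_length[gene] += tpm * eff_len

    summary = {}
    for gene in counts:
        # Abundance-weighted mean effective length. It is undefined when the
        # gene has zero abundance in this sample, so None is returned then.
        if tpms[gene] > 0:
            length = weighted_length[gene] / tpms[gene]
        else:
            length = None
        summary[gene] = {"count": counts[gene], "tpm": tpms[gene], "length": length}
    return summary


# Assumed toy data: one gene with a short isoform (txA) and a long isoform (txB).
tx2gene = {"txA": "G1", "txB": "G1"}

# Sample 1 mostly expresses the short isoform; sample 2 mostly the long one.
sample1 = {"txA": (90.0, 900.0, 1000.0), "txB": (10.0, 300.0, 3000.0)}
sample2 = {"txA": (10.0, 100.0, 1000.0), "txB": (90.0, 2700.0, 3000.0)}

gene_s1 = summarize_to_gene(sample1, tx2gene)["G1"]
gene_s2 = summarize_to_gene(sample2, tx2gene)["G1"]

print(gene_s1)
print(gene_s2)
```

In this toy example the gene's total abundance is the same in both samples, yet the summed counts differ because the isoform mix changes the gene's average length. A per-sample, per-gene length of this kind is what a normalizing offset can account for in a downstream count model. For real analyses, the tximport vignette cited above documents the actual software.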

(Editor in charge: 佳學基因 (Jiaxue Gene))