|
Tutorial_AltExpression_RNASeq
Tutorial 3 - Analyzing RNASeq Splicing Data
Featured Tutorial 3 - Analyzing RNASeq Alternative Exons & Junctions

The tutorial below applies to the analysis of RNA sequencing data and junction arrays. A separate tutorial for the analysis of Affymetrix Exon and Gene arrays can be found here.

Introduction

The primary use of AltAnalyze is to evaluate alternative splicing, alternative promoters or other forms of alternative gene regulation. To do this, AltAnalyze imports pre-aligned exon-junctions and/or exon coordinates and read counts, calculates a reciprocal junction alternative exon-score (ASPIRE and LinearRegression), assigns exon/intron/splicing annotations to these results and further assesses protein, protein domain and microRNA binding site changes for associated isoforms. AltAnalyze makes this process relatively easy: the user only needs to download and extract the program and provide one set of basic files. In the following tutorial we will walk through these steps using a sample dataset. Note: If analyzing junction arrays as opposed to RNASeq data, the analysis options and result files are nearly identical.

Downloading Sample Data

Sample data can be downloaded here.

Building Junction and Exon Alignments

AltAnalyze can be run with junction and/or exon read counts from BED format or BioScope result files. Instructions for obtaining junction and exon-level input files from your sequencing experiments can be found here. If you already have junction.bed files, instructions for building exon.bed files can be found here.

Installing AltAnalyze and Saving Your Data

AltAnalyze version 2.0 can be downloaded for multiple operating systems from http://www.altanalyze.org. Once you have downloaded the compressed archive to your computer, extract it to an accessible folder on your hard drive (e.g., your user account). In addition to AltAnalyze, Cytoscape and DomainGraph are automatically downloaded when a species database is first installed.
See the second-to-last step in the Running AltAnalyze instructions for how to immediately start DomainGraph after generating results.

Creating a Comparison and Groups File (OPTIONAL)

If your dataset has over 30 BED files or dozens of groups, it may save you time to make the groups and comps files in advance. Although not recommended when working with this sample dataset, go here if this applies to your own dataset.

Running AltAnalyze

Now you are ready to process your input files and obtain alternative exons with alternative splicing and functional annotations. If running through the graphical user interface, follow the directions below; otherwise, follow the commandline options for RNASeq analysis. To proceed:
Interpreting the Results

While running, AltAnalyze produced a number of output files, most of them in the folder AltResults/AlternativeOutput in the user output directory. These include:
These files are tab-delimited text files that can be opened in a spreadsheet program like Microsoft Excel, OpenOffice or Google Documents. In addition to these files, similar files will be produced with the algorithm "splicing-index" (the filename contains splicing-index in place of ASPIRE). These files have a similar format but contain single junction and exon results (as opposed to reciprocal-junction pairs). These results allow users to examine the independent regulation of exon junctions.

File #1 reports gene expression values for each sample and group based on junctions present in your input BED files. The values are derived from junctions that align to regions of a gene that are common to all transcripts and thus are informative for transcription (unless the option "known junctions" is selected; see “Select expression analysis parameters”, above) and that are expressed above specified background levels (minimum group average read count). Along with the raw gene expression values (mean read counts), statistics for each indicated comparison (mean expression, folds, t-test p-values) are included, together with gene annotations from Ensembl, including putative microRNA binding sites. This file is analogous to the results file you would have with a typical microarray experiment and is saved to the folder “ExpressionOutput”.

Results from files #2-5 are produced from all junctions that may suggest alternative splicing, alternative promoter regulation, or any other variation indicated by a reciprocal junction analysis for that gene. Each set of results corresponds to a single pair-wise comparison (e.g., cancer vs. normal) and will be named with the group names you assigned (groups file). If analyzing multiple groups, the two groups with the largest difference in reciprocal junction scores will be reported along with the conditions these occur in.
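Because the result files are tab-delimited text, they can also be processed with a script rather than a spreadsheet. The following is a minimal Python sketch, not part of AltAnalyze itself, that loads a tab-delimited table and keeps rows at or below a p-value cutoff. The column names (`gene`, `fold`, `pvalue`) and the miniature example table are hypothetical placeholders; check the header row of your own result file for the actual column names.

```python
import csv
import io


def load_tab_delimited(handle):
    """Read a tab-delimited text file into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_pvalue(rows, pvalue_column, cutoff=0.05):
    """Keep rows whose p-value column parses as a float at or below the cutoff."""
    kept = []
    for row in rows:
        try:
            p = float(row[pvalue_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if p <= cutoff:
            kept.append(row)
    return kept


# Hypothetical miniature table standing in for a real results file.
EXAMPLE = (
    "gene\tfold\tpvalue\n"
    "GENE1\t2.1\t0.01\n"
    "GENE2\t1.1\t0.40\n"
    "GENE3\t-1.8\tNA\n"
)

rows = load_tab_delimited(io.StringIO(EXAMPLE))
significant = filter_by_pvalue(rows, "pvalue")
print([r["gene"] for r in significant])  # prints ['GENE1']
```

To apply the same functions to a real file, replace the `io.StringIO(EXAMPLE)` handle with `open(path, newline="")`, where `path` points to a file in your output directory.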
File #2 reports reciprocal junctions that are alternatively expressed, based on the user-defined ASPIRE or LinearRegression scores and p-values. Several statistics, gene annotations and functional predictions are provided for each reciprocal junction. A detailed description of all of the columns in this file is provided here.

File #3 is a summarization of reciprocal-junction results at the gene level from file #2. In addition to this summary, Gene Ontology terms and WikiPathways for that gene are reported.

Files #4 and #5 report over-representation results for protein domains (or other protein features) and microRNA-binding sites that AltAnalyze predicts to be regulated. These files include over-representation statistics and the genes associated with the different domains or features predicted to be regulated.

File #6 compares the reciprocal-junction (e.g., ASPIRE) and exon-level results (splicing-index) to determine which splicing events are indicated by multiple and independent lines of evidence. The direction of the fold change and the algorithm that detected the event are indicated for each row. More information about these files can be found in the AltAnalyze ReadMe (section 2.3).

Full Directory Tree of Output Files

After you run AltAnalyze, the following directory tree and set of files will be generated in the folder that you specified for output. For this example, the species is "species," and we assume there were two comparisons made, between groups A and B, and between groups B and C.
Visualizing AltAnalyze Results in DomainGraph

The text file results produced by AltAnalyze can be directly used as input in the protein domain and microRNA binding site visualization program, DomainGraph. DomainGraph is a plugin for the Java program Cytoscape, which can be immediately opened from AltAnalyze. DomainGraph currently supports only exon visualization, however, not junction visualization. Exons highlighted by the RNASeq analysis (identified directly from exons or by reciprocal exon-junctions) are associated with Affymetrix Exon 1.0 identifiers so that they can be loaded in DomainGraph.

Visualizing Over-Represented GO terms and Pathways

Either before or after finding over-represented pathways, you can see which genes on which pathways are alternatively regulated using the program PathVisio or GenMAPP 2.1. PathVisio is a cross-platform analysis program, while GenMAPP is restricted to Windows. Both tools are easy to use and have access to a large archive of curated pathways. An input file for either PathVisio or GenMAPP is found in the directory "ExpressionOutput" with the prefix "GenMAPP-". This text file will need to be imported into either program prior to building criteria for analysis. For making pathways, PathVisio or WikiPathways is recommended, since these resources produce superior pathway content (valid interactions between genes and metabolite IDs) in the same format (gpml). PathVisio can also export pathways to the GenMAPP format. A PathVisio tutorial can be found here, while a GenMAPP tutorial can be found here. |