RNA-Seq API

pyrpipe implements specialized classes for RNA-Seq processing. These classes are organized into modules, each designed to capture steps integral to the analysis of RNA-Seq data, from downloading raw data to trimming, alignment, and assembly or quantification. Each module implements classes corresponding to an RNA-Seq tool, and these classes extend the Runnable class. Specialized functions are implemented so that analysis of RNA-Seq data is intuitive and easy to code. The following table provides details about pyrpipe’s RNA-Seq related modules.

Module    Class       Purpose
assembly  Assembly    Abstract class to represent Assembler type
assembly  Stringtie   API to Stringtie
assembly  Cufflinks   API to Cufflinks
mapping   Aligner     Abstract class for Aligner type
mapping   Star        API to Star
mapping   Bowtie2     API to Bowtie2
mapping   Hisat2      API to Hisat2
qc        RNASeqQC    Abstract class for RNASeqQC type (quality control and trimming)
qc        Trimgalore  API to Trim Galore
qc        BBmap       API to bbduk.sh
quant     Quant       Abstract class for Quantification type
quant     Salmon      API to Salmon
quant     Kallisto    API to Kallisto
sra       SRA         Class to represent RNA-Seq data and API to NCBI SRA-Tools
tools     Samtools    API to Samtools and other commonly used tools

The SRA class

The SRA class, defined in the sra module, represents RNA-Seq data. It can automatically download RNA-Seq data from the NCBI SRA servers via the prefetch command. The SRA constructor accepts an SRR accession, paths to fastq files, or the path to an sra file as arguments.

The main attributes and functions are described in the following tables.

Attribute       Description
fastq_path      Path to the fastq file. For single-end data this is the only fastq file.
fastq2_path     Path to the second fastq file for paired-end data.
sra_path        Path to the sra file.
srr_accession   The SRR accession for the RNA-Seq run.
layout          RNA-Seq layout, determined automatically by the SRA class.
bam_path        Path to the bam file after running the align() function.
gtf             Path to the gtf file after running the assemble() function.

Function          Description
__init__()        The constructor. It can take an SRR accession, paths to fastq files, or the path to an sra file as input. If an accession is provided, the files are downloaded via prefetch if they aren’t present on disk. Single-end and paired-end data are handled automatically.
download_sra()    Downloads the sra file via prefetch.
download_fastq()  Runs fasterq-dump on the sra file downloaded via prefetch.
sra_exists()      Checks if the sra file is present.
fastq_exists()    Checks if the fastq files are present.
delete_sra()      Deletes the sra file.
delete_fastq()    Deletes the fastq files.
trim()            Takes an RNASeqQC type object and performs trimming. The trimmed fastq files are then stored in fastq_path and fastq2_path.
align()           Takes an Aligner type object and performs read alignment. The path to the resulting BAM file is stored in the bam_path attribute.
assemble()        Takes an Assembly type object and performs transcript assembly. The result is stored on the SRA object in the gtf attribute.
quant()           Takes a Quantification type object and performs quantification.
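
The way trim(), align() and assemble() hand work to a tool object and keep the result on the data object can be sketched as follows. This is a minimal illustration, not pyrpipe's implementation: SketchRun and the three Dummy classes are hypothetical stand-ins, no external tool is run, and returning the object from each step (so calls can be chained) is an assumption of the sketch.

```python
from typing import Optional


class SketchRun:
    """Hypothetical stand-in for an SRA-like run object (not pyrpipe's class)."""

    def __init__(self, fastq: str, fastq2: Optional[str] = None) -> None:
        self.fastq_path = fastq
        self.fastq2_path = fastq2
        # Layout is inferred from whether a second fastq file was given.
        self.layout = "PAIRED" if fastq2 else "SINGLE"
        self.bam_path: Optional[str] = None
        self.gtf: Optional[str] = None

    def trim(self, qc_tool) -> "SketchRun":
        # The tool returns the trimmed fastq path(s); store them on the object.
        trimmed = qc_tool.perform_qc(self)
        self.fastq_path = trimmed[0]
        self.fastq2_path = trimmed[1] if len(trimmed) > 1 else None
        return self

    def align(self, aligner) -> "SketchRun":
        self.bam_path = aligner.perform_alignment(self)
        return self

    def assemble(self, assembler) -> "SketchRun":
        self.gtf = assembler.perform_assembly(self)
        return self


class DummyTrimmer:
    def perform_qc(self, run):
        paths = [run.fastq_path] + ([run.fastq2_path] if run.fastq2_path else [])
        # Only the output names are derived; nothing is actually trimmed.
        return tuple(p.replace(".fastq", "_trimmed.fastq") for p in paths)


class DummyAligner:
    def perform_alignment(self, run):
        return run.fastq_path.replace(".fastq", ".bam")


class DummyAssembler:
    def perform_assembly(self, run):
        return run.bam_path.replace(".bam", ".gtf")


run = SketchRun("reads_1.fastq", "reads_2.fastq")
run.trim(DummyTrimmer()).align(DummyAligner()).assemble(DummyAssembler())
print(run.layout, run.bam_path, run.gtf)
```

Because every step updates the same object, a later step always sees the output of the earlier one: the dummy aligner here reads the trimmed fastq path rather than the original.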

The RNASeqQC class

RNASeqQC is an abstract class defined in the qc module. It extends the Runnable class and thus inherits all of its attributes. The Trimgalore and BBmap classes extend RNASeqQC and share the following attributes and functions.

Attribute  Description
_category  Represents the type: “RNASeqQC”

Function      Description
__init__()    The constructor function
perform_qc()  Takes an SRA object, performs QC and returns the paths to the resulting fastq files
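
The relationship between an abstract QC type and a concrete tool class can be sketched with Python's abc module. This is an illustration under assumptions: SketchQC, SketchTrimmer and trim_with are hypothetical names, whether pyrpipe itself uses abc is not stated here, and a tuple of fastq paths stands in for the SRA object.

```python
from abc import ABC, abstractmethod


class SketchQC(ABC):
    """Hypothetical abstract QC type; pyrpipe's real base class may differ."""

    def __init__(self) -> None:
        self._category = "RNASeqQC"

    @abstractmethod
    def perform_qc(self, fastq_paths: tuple) -> tuple:
        """Return the path(s) to the processed fastq file(s)."""


class SketchTrimmer(SketchQC):
    def perform_qc(self, fastq_paths: tuple) -> tuple:
        # A real tool would invoke an external trimming command here.
        return tuple(p.replace(".fastq", "_trimmed.fastq") for p in fastq_paths)


def trim_with(tool, fastq_paths: tuple) -> tuple:
    # Accept any object of the QC category, whichever tool it wraps.
    if getattr(tool, "_category", None) != "RNASeqQC":
        raise TypeError("expected an RNASeqQC type object")
    return tool.perform_qc(fastq_paths)


result = trim_with(SketchTrimmer(), ("a_1.fastq", "a_2.fastq"))
print(result)
```

The caller only depends on the category and on perform_qc(), so a different trimming tool can be swapped in without changing the calling code.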

The Aligner class

Aligner is an abstract class defined in the mapping module. It extends the Runnable class and thus inherits all of its attributes. The Star, Hisat2 and Bowtie2 classes extend Aligner and share the following attributes and functions.

Attribute  Description
_category  Represents the type: “Aligner”
index      Index used by the aligner tool
genome     Reference genome used by the tool

Function             Description
__init__()           The constructor function
build_index()        Builds an index for the aligner tool using the genome
check_index()        Checks if the index is valid
perform_alignment()  Takes an SRA object, performs alignment and returns the path to the bam file
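
How check_index() and build_index() might work together can be sketched as follows. Everything here is an assumption made for illustration: SketchAligner is a hypothetical class, the index is modelled as a single file with an invented ".idx" suffix, building the index from the constructor is a design choice of the sketch, and a fastq path stands in for the SRA object.

```python
import os
import tempfile


class SketchAligner:
    """Hypothetical aligner wrapper illustrating index handling."""

    def __init__(self, index: str, genome: str) -> None:
        self._category = "Aligner"
        self.index = index
        self.genome = genome
        # Build the index only when no valid one is found.
        if not self.check_index():
            self.build_index()

    def check_index(self) -> bool:
        # Assumption: the index is one file at index + ".idx".
        return os.path.isfile(self.index + ".idx")

    def build_index(self) -> None:
        if not os.path.isfile(self.genome):
            raise FileNotFoundError(self.genome)
        # A real tool would run its index-building command here.
        with open(self.index + ".idx", "w") as handle:
            handle.write("index built from " + os.path.basename(self.genome))

    def perform_alignment(self, fastq_path: str) -> str:
        # A real tool would align reads; this only derives the output path.
        return os.path.splitext(fastq_path)[0] + ".bam"


with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "genome.fa")
    with open(genome, "w") as handle:
        handle.write(">chr1\nACGT\n")
    aligner = SketchAligner(index=os.path.join(tmp, "genome_index"), genome=genome)
    index_ok = aligner.check_index()
    bam = aligner.perform_alignment("reads.fastq")

print(index_ok, bam)
```

Checking before building means an existing index is reused, which matters because index construction is usually the slowest setup step for an aligner.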

The Assembly class

Assembly is an abstract class defined in the assembly module. It extends the Runnable class and thus inherits all of its attributes. The Stringtie and Cufflinks classes extend Assembly and share the following attributes and functions.

Attribute  Description
_category  Represents the type: “Assembler”

Function            Description
__init__()          The constructor function
perform_assembly()  Takes an SRA object, performs transcript assembly and returns the path to the resulting gtf/gff files

The Quant class

Quant is an abstract class defined in the quant module. It extends the Runnable class and thus inherits all of its attributes. The Salmon and Kallisto classes extend Quant and share the following attributes and functions.

Attribute      Description
_category      Represents the type: “Quantification”
index          Index used by the quantification tool
transcriptome  Reference transcriptome used by the tool

Function         Description
__init__()       The constructor function
build_index()    Builds an index for the quantification tool using the transcriptome
check_index()    Checks if the index is valid
perform_quant()  Takes an SRA object, performs quantification and returns the path to the quantification results file
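
The four abstract types in this section each pair a _category string with one perform_* function. A dispatcher built on that pairing can be sketched as follows. The category strings and method names are taken from the tables in this section; run_step and SketchQuant are hypothetical names for illustration, and a plain string stands in for the SRA object.

```python
# Category string -> method name, as listed in the tables of this section.
METHOD_BY_CATEGORY = {
    "RNASeqQC": "perform_qc",
    "Aligner": "perform_alignment",
    "Assembler": "perform_assembly",
    "Quantification": "perform_quant",
}


def run_step(tool, run):
    """Call the method matching the tool's _category (hypothetical helper)."""
    category = getattr(tool, "_category", None)
    if category not in METHOD_BY_CATEGORY:
        raise TypeError(f"unsupported tool category: {category!r}")
    return getattr(tool, METHOD_BY_CATEGORY[category])(run)


class SketchQuant:
    """Hypothetical quantification tool; not pyrpipe's Salmon or Kallisto."""

    _category = "Quantification"

    def perform_quant(self, run):
        # A real tool would write quantification results and return their path.
        return f"{run}_quant.tsv"


quant_path = run_step(SketchQuant(), "sample")
print(quant_path)
```

Keying on the category rather than on the concrete class is what lets tools of the same type be interchanged in a pipeline.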