pyrpipe package

Submodules

pyrpipe.arg_parser module

Created on Fri Dec 18 18:17:58 2020

@author: usingh

pyrpipe.assembly module

Created on Mon Nov 25 15:21:01 2019

@author: usingh

class pyrpipe.assembly.Assembly(*args, **kwargs)

Bases: pyrpipe.runnable.Runnable

This class represents an abstract parent class for all programs which can perform transcript assembly.

perform_assembly()

Function to perform assembly using a BAM file as input. Inherited by all children.

bam_file: string
path to input BAM
Returns: path to the output GTF file or output directory, depending on the specific assembly program.
Return type: string

Bases: pyrpipe.assembly.Assembly

This class represents the Cufflinks program for transcript assembly.

threads: int
Number of threads to use
perform_assembly(bam_file, out_dir=None, out_suffix='_cufflinks', objectid='NA')

Function to run cufflinks with BAM file as input.

bam_file: string
path to bam file
out_dir:
output directory
out_suffix: string
Suffix for the output gtf file
objectid: str
Provide an id to attach with this command e.g. the SRR accession.
Returns: path to the output GTF file
Return type: string
class pyrpipe.assembly.Stringtie(*args, threads=None, guide=None, **kwargs)

Bases: pyrpipe.assembly.Assembly

This class represents Stringtie program for transcript assembly.

threads: int
number of threads
guide: str
Reference annotation gtf/gff to use as guide
perform_assembly(bam_file, out_dir=None, out_suffix='_stringtie', objectid='NA')

Function to run stringtie using a bam file.

bam_file: string
path to the bam file
out_dir: string
Path to the output directory
out_suffix: string
Suffix for the output gtf file
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the output GTF file
Return type: string

pyrpipe.benchmark module

Created on Sat Dec 21 16:58:13 2019

@author: usingh

class pyrpipe.benchmark.Benchmark(log_file, env_log, out_dir='')

Bases: object

Class to generate benchmark reports from pyrpipe logs.

log_file: string
path to the log file
env_log: string
path to the ENV log file
out_dir: string
path to the output directory
get_programtime_boxdata()

Returns a dataframe for making a box plot of program run times.

get_time_perobject(func='sum')

Returns a dataframe containing total execution time for each object in a pyrpipe log. An object is identified by the objectid e.g. SRR accession.

get_time_perprogram()

Returns a dataframe with program execution times.

parse_logs()

Parse the input logs. For each command, create a dict with the program name as key and its runtimes as a list. For each object id in the input log, create a dict. The following dicts are created:

runtimes_by_prog contains runtimes for each program. The program is the key and the runtimes are stored in a list, in the order they appear in the log file.

runtimes_by_object is a nested dict containing runtimes for each object by each program, e.g. {'ob1': {'prog1': [1, 2, 3], 'prog2': [1, 2, 3]}, 'ob2': {'prog1': [12, 22, 13], 'prog2': [1, 2, 3]}}
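The grouping that parse_logs() describes can be sketched as follows. This is a hypothetical helper, not pyrpipe code: it assumes each log entry has already been reduced to an (objectid, program, runtime_seconds) tuple, which sidesteps the real log format.

```python
from collections import defaultdict

def build_runtime_dicts(entries):
    """Group runtimes by program and by object.

    entries: iterable of hypothetical (objectid, program, runtime_seconds)
    tuples, in the order they appear in the log.
    """
    runtimes_by_prog = defaultdict(list)
    runtimes_by_object = defaultdict(lambda: defaultdict(list))
    for objectid, program, runtime in entries:
        # Program is the key; runtimes keep their log order.
        runtimes_by_prog[program].append(runtime)
        # Nested: object id -> program -> list of runtimes.
        runtimes_by_object[objectid][program].append(runtime)
    # Convert to plain dicts so the nested structure prints cleanly.
    return (dict(runtimes_by_prog),
            {obj: dict(progs) for obj, progs in runtimes_by_object.items()})

entries = [("ob1", "prog1", 1), ("ob1", "prog2", 1), ("ob2", "prog1", 12), ("ob1", "prog1", 2)]
by_prog, by_object = build_runtime_dicts(entries)
print(by_prog)    # {'prog1': [1, 12, 2], 'prog2': [1]}
print(by_object)  # {'ob1': {'prog1': [1, 2], 'prog2': [1]}, 'ob2': {'prog1': [12]}}
```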

parse_runtime(timestring)

Parse runtime as string and return seconds.

Returns: runtime in seconds
Return type: float
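A minimal sketch of parse_runtime(), written as a standalone function. It assumes the runtime string looks like the output of str(datetime.timedelta) for durations under one day ("H:MM:SS" with optional fractional seconds). The exact format pyrpipe writes to its logs is an assumption here.

```python
def parse_runtime(timestring: str) -> float:
    """Convert an 'H:MM:SS[.ffffff]' runtime string to seconds.

    Assumes the format produced by str(datetime.timedelta) for durations
    shorter than one day.
    """
    hours, minutes, seconds = timestring.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

print(parse_runtime("0:01:30.5"))  # 90.5
print(parse_runtime("2:00:03"))    # 7203.0
```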
plot_time_perobject()

Function to plot charts summarizing runtimes for each object in the pipeline. The charts are saved to the out_dir path.

plot_time_perprogram()

Function to plot charts to summarize runtimes of each program

pyrpipe.buildtools module

Created on Sun Jan 17 16:12:47 2021

@author: usingh

pyrpipe.buildtools.build_tools()

pyrpipe.mapping module

Created on Sun Nov 24 19:53:42 2019

@author: usingh

Contains classes of RNA-Seq alignment programs.

class pyrpipe.mapping.Aligner(*args, index=None, genome=None, threads=None, **kwargs)

Bases: pyrpipe.runnable.Runnable

This is an abstract class for alignment programs.

build_index()

Function to create an index used by the aligner.

check_index()

Function to check if index of this object is valid and exists

perform_alignment(sra_object)

Function to perform alignment taking an SRA object as input.

class pyrpipe.mapping.Bowtie2(*args, index=None, genome=None, threads=None, **kwargs)

Bases: pyrpipe.mapping.Aligner

This class represents the Bowtie2 aligner. Extends the Aligner class.

build_index(index_path, genome, objectid='NA')

Builds a bowtie2 index with the given parameters and saves the new index to self.index.

index_path: string
Path where the index will be created
genome: string
Path to the reference genome
objectid : string
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: status of bowtie2-build
Return type: bool
check_index()

Function to check the bowtie2 index. Returns True if the index exists on disk.

perform_alignment(sra_object, out_suffix='_bowtie2', out_dir='', objectid='NA')

Function to perform alignment using sra_object.

sra_object: SRA
An object of type SRA. The path to fastq files will be obtained from this object.
out_suffix: string
Suffix for the output sam file
out_dir: string
Directory to save the results. Default value is sra_object.directory
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the sorted BAM file, after converting SAM to BAM and sorting it
Return type: string
class pyrpipe.mapping.Hisat2(*args, index=None, genome=None, threads=None, **kwargs)

Bases: pyrpipe.mapping.Aligner

Extends the Aligner class.

build_index(index_path, genome, objectid='NA')

Builds a hisat2 index with the given parameters and saves the new index to self.index.

index_path: string
Path where the index will be created
genome: string
Path to the reference genome
objectid : string
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: status of hisat2-build
Return type: bool
check_index()

Check that self.index exists.

bool
True if index exists.
perform_alignment(sra_object, out_suffix='_hisat2', out_dir='', objectid='NA')

Function to perform alignment using sra_object.

sra_object: SRA
An object of type SRA. The path to fastq files will be obtained from this object.
out_suffix: string
Suffix for the output sam file
out_dir: string
Directory to save the results. Default value is sra_object.directory
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the sorted BAM file, after converting SAM to BAM and sorting it
Return type: string
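The alignment-then-sort flow described for perform_alignment() can be sketched by building the two command lines. This is a hypothetical helper, not pyrpipe's implementation: the output file naming is invented. The flags are standard HISAT2 and samtools options: -x (index basename), -1/-2 (paired reads), -U (single-end reads), -S (SAM output), -p (threads), and samtools sort -o (output BAM). The commands are only constructed here, not executed.

```python
def hisat2_commands(index, fastq, fastq2=None, out_prefix="sample_hisat2", threads=None):
    """Build (but do not run) a HISAT2 alignment command and a samtools sort command."""
    sam_file = out_prefix + ".sam"
    bam_file = out_prefix + "_sorted.bam"
    align = ["hisat2", "-x", index]
    if fastq2:
        align += ["-1", fastq, "-2", fastq2]   # paired-end reads
    else:
        align += ["-U", fastq]                  # single-end reads
    align += ["-S", sam_file]
    if threads:
        align += ["-p", str(threads)]
    # samtools sort reads the SAM file and writes a coordinate-sorted BAM.
    sort = ["samtools", "sort", "-o", bam_file, sam_file]
    return align, sort, bam_file

align, sort, bam = hisat2_commands("idx/genome", "r1.fq", "r2.fq", out_prefix="out/SRR1", threads=4)
print(" ".join(align))
print(" ".join(sort))
```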
class pyrpipe.mapping.Star(*args, index=None, genome=None, threads=None, **kwargs)

Bases: pyrpipe.mapping.Aligner

This class represents the STAR program. Extends the Aligner class.

build_index(index_path, genome, objectid='NA')

Builds a STAR index with the given parameters and saves the new index to self.index.

index_path: string
Path where the index will be created
genome: string
Path to the reference genome
objectid : string
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: status of the STAR index build
Return type: bool
check_index()

Function to check if index of this object is valid and exists

perform_alignment(sra_object, out_suffix='_star', out_dir='', objectid='NA')

Function to perform STAR alignment using sra_object.

sra_object: SRA
An object of type SRA. The path to fastq files will be obtained from this object.
out_suffix: string
Suffix for the output sam file
out_dir: string
Directory to save the results. Default value is sra_object.directory
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the output BAM file
Return type: string

pyrpipe.param_loader module

Created on Mon Dec 7 14:37:07 2020

@author: usingh

class pyrpipe.param_loader.YAML_loader(file)

Bases: object

Load parameters from a yaml file

get_kwargs()
get_params()
parse_params()

Store params as a dict.

pyrpipe.param_loader.add_bool(self, node)

pyrpipe.pyrpipe_engine module

Created on Tue Dec 10 16:26:02 2019

@author: usingh

Methods and classes related to execution and logging

pyrpipe.pyrpipe_engine.check_dependencies(dependencies)

Check whether specified programs exist in the environment. This uses the which command to test whether a program is present.

dependencies: list
list of programs to test
Returns: True if all dependencies are satisfied, False otherwise.
Return type: bool
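The dependency check can be sketched with the standard library. Note that this sketch swaps in shutil.which, which searches PATH in-process, in place of the external which command that the text says pyrpipe uses. The printed message is illustrative.

```python
import shutil

def check_dependencies(dependencies):
    """Return True only if every program in `dependencies` is found on PATH."""
    missing = [prog for prog in dependencies if shutil.which(prog) is None]
    for prog in missing:
        print(f"Dependency not found: {prog}")
    return not missing

print(check_dependencies(["surely-not-installed-xyz-123"]))  # False
```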
pyrpipe.pyrpipe_engine.delete_file(file_path)

Delete a given file from disk. Returns True if the file is deleted or does not exist. shell=True is not used, to keep this function secure.

file_path : String
Path to the file to be deleted
bool
True if file deleted.
pyrpipe.pyrpipe_engine.delete_files(*args)

Delete multiple files passed as arguments. Returns True if all files are deleted.
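A minimal sketch of the documented semantics of delete_file() and delete_files(). It uses os.remove directly, so no shell is involved. Whether pyrpipe itself calls os.remove or an external command is not stated, so treat this as an illustration only.

```python
import os

def delete_file(file_path):
    """Delete one file; True if it was removed or did not exist."""
    try:
        os.remove(file_path)
    except FileNotFoundError:
        return True   # nothing to delete counts as success
    except OSError as err:
        print(f"Could not delete {file_path}: {err}")
        return False
    return True

def delete_files(*args):
    """Delete every path given; True only if all deletions succeeded."""
    results = [delete_file(path) for path in args]  # attempt all, not just until first failure
    return all(results)
```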

pyrpipe.pyrpipe_engine.dryable(func)

Decorator that enables dry runs of all functions capable of executing commands.

pyrpipe.pyrpipe_engine.execute_command(cmd, verbose=None, logs=None, objectid=None, command_name='')

Function to execute commands using popen. All commands executed by this function can be logged and saved to pyrpipe logs.

cmd: list
command to execute via popen in a list
verbose: bool
Whether to print stdout and stderr. Default: False. All stdout and stderr will be saved to logs regardless of this flag.
logs: bool
Log the execution
dryrun: bool
If True, perform a dry run i.e. print commands to screen and log and exit
objectid: string
An id to be attached with the command. This is useful for storing logs for SRA objects, where the object id is the SRR id.
command_name: string
Name of the command to be saved in the log. If empty, it is determined as the first element of the cmd list.
Returns: return status; True if returncode is 0
Return type: bool
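The core of execute_command() (run a list command without a shell and map exit code 0 to True) can be sketched with subprocess.run. This sketch omits pyrpipe's logging, objectid handling, and dry-run support. The error message is illustrative.

```python
import subprocess

def execute_command(cmd, verbose=False, command_name=""):
    """Run `cmd` (a list) without a shell; True if the exit code is 0."""
    name = command_name or cmd[0]  # default name: first element of cmd
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as err:  # e.g. program not found
        print(f"{name} failed to start: {err}")
        return False
    if verbose:
        print(result.stdout, end="")
        print(result.stderr, end="")
    # A full implementation would also log name, runtime, stdout and stderr here.
    return result.returncode == 0
```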
pyrpipe.pyrpipe_engine.execute_commandRealtime(cmd)

Execute shell command and print stdout in realtime.

Example:

    for output in pe.execute_commandRealtime(['ping', '-c', '4', 'google.com']):
        print(output)
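Real-time output of this kind is typically produced by a generator that reads the child's stdout pipe line by line. The sketch below uses a hypothetical snake_case name. Raising CalledProcessError on a non-zero exit is a design choice of this sketch, not documented pyrpipe behaviour.

```python
import subprocess

def execute_command_realtime(cmd):
    """Yield stdout lines from `cmd` as the process writes them."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:          # iterates lazily, line by line
        yield line.rstrip("\n")
    proc.stdout.close()
    return_code = proc.wait()
    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)
```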
pyrpipe.pyrpipe_engine.get_program_path(program)

Get the path of an installed program. Returns the path as a string.

pyrpipe.pyrpipe_engine.get_program_version(programName)

Get the version of an installed program. Returns the version as a string.

pyrpipe.pyrpipe_engine.get_return_status(cmd)

Run a shell command and get the return status.

cmd: list
shell command in list
Returns: True if returncode is 0
Return type: bool
pyrpipe.pyrpipe_engine.get_shell_output(cmd, verbose=None)

Function to run a shell command and return the returncode, stdout and stderr. Currently (pyrpipe v0.0.4) this function is called in get_return_status() and get_program_version().

From pyrpipe v0.0.5 onwards, the get_shell_output function allows shell=True.

cmd: list or string
command to run
verbose: bool
to print messages
Returns: (returncode, stdout, stderr)
Return type: tuple: (int, str, str)
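Since get_shell_output accepts either a list or a string and runs it with shell=True, a list has to be turned into a single, safely quoted string first. The sketch below does this with shlex.join (Python 3.8+). It assumes a POSIX shell and is not pyrpipe's exact code.

```python
import shlex
import subprocess

def get_shell_output(cmd, verbose=False):
    """Run `cmd` through the shell and return (returncode, stdout, stderr)."""
    if isinstance(cmd, list):
        cmd = shlex.join(cmd)  # list -> safely quoted string, cf. parse_cmd()
    if verbose:
        print("$", cmd)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```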
pyrpipe.pyrpipe_engine.is_paired(sra_file)

Function to test whether a .sra file is paired-end or single-end.

sra_file: string
the path to the .sra file

Returns: True if the .sra file is paired
Return type: bool
pyrpipe.pyrpipe_engine.move_file(source, destination, verbose=False)

Perform the mv command to move a file from source to destination. Returns True if the move is successful.

pyrpipe.pyrpipe_engine.parse_cmd(cmd)

This function converts a command passed as a list to a str. From pyrpipe v0.0.5 onwards, the get_shell_output function uses shell=True.

pyrpipe.pyrpipe_engine.skippable(func)

Skip a function execution in safemode

pyrpipe.pyrpipe_session module

Created on Tue Dec 10 13:40:01 2019

@author: usingh

pyrpipe.pyrpipe_session.getTimestamp(shorten=False)

Return the timestamp as YYYYMMDDHHMISE.

shorten: bool
return a shorter version without spaces

pyrpipe.pyrpipe_session.restore_session(file)

Restore a session from file.

Returns: True if the session is restored
Return type: bool
pyrpipe.pyrpipe_session.save_session(filename, add_timestamp=True, out_dir='')

Save the current workspace using dill. Returns True if the save is successful.

pyrpipe.pyrpipe_utils module

Created on Mon Oct 21 12:04:28 2019

@author: usingh

pyrpipe.pyrpipe_utils.byte_to_readable(size_bytes)

Function to convert bytes to a human-readable format (MB, GB, …)

size_bytes: float
size in bytes
Returns: size in human-readable format
Return type: str
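A minimal sketch of the conversion, assuming 1024-based units and two decimal places. The exact unit labels and rounding that pyrpipe uses are assumptions.

```python
def byte_to_readable(size_bytes: float) -> str:
    """Format a byte count with 1024-based units, e.g. 1536 -> '1.50 KB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(size_bytes)
    for unit in units[:-1]:
        if abs(size) < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024          # move to the next larger unit
    return f"{size:.2f} {units[-1]}"

print(byte_to_readable(1536))  # 1.50 KB
```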
pyrpipe.pyrpipe_utils.check_bowtie2index(index)

Function to check if bowtie2 index is valid and exists.

index: str
Path to the index
Returns: True if the index is valid
Return type: bool
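A bowtie2 index consists of six files sharing one basename: .1, .2, .3, .4, .rev.1 and .rev.2, with extension .bt2, or .bt2l for large indexes. A check along these lines could look like the sketch below. It assumes `index` is the basename passed to bowtie2 -x; pyrpipe's own check may differ.

```python
import os

BT2_SUFFIXES = [".1", ".2", ".3", ".4", ".rev.1", ".rev.2"]

def check_bowtie2index(index):
    """True if all six bowtie2 index files exist for the basename `index`."""
    for ext in (".bt2", ".bt2l"):  # small and large index variants
        if all(os.path.isfile(index + s + ext) for s in BT2_SUFFIXES):
            return True
    return False
```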
pyrpipe.pyrpipe_utils.check_files_exist(*args)

Function to check if files exist.

args: tuple
a list of paths to check
Returns: True only if all files exist
Return type: bool
pyrpipe.pyrpipe_utils.check_hisatindex(index)

Function to check if hisat2 index is valid and exists.

index: str
Path to the index
Returns: True if the index is valid
Return type: bool
pyrpipe.pyrpipe_utils.check_kallistoindex(index)

Function to check if kallisto index is valid and exists.

index: str
Path to the index
Returns: True if the index is valid
Return type: bool
pyrpipe.pyrpipe_utils.check_paths_exist(*args)

Function to check if directories exist.

args: tuple
a list of paths to check
Returns: True only if all paths exist
Return type: bool
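Both existence checks reduce to all() over os.path predicates. This sketch treats check_files_exist as "regular files" and check_paths_exist as "directories", following their descriptions. Whether the real functions accept other path types is an assumption. Note that all() over no arguments returns True.

```python
import os

def check_files_exist(*args):
    """True only if every argument is an existing regular file."""
    return all(os.path.isfile(path) for path in args)

def check_paths_exist(*args):
    """True only if every argument is an existing directory."""
    return all(os.path.isdir(path) for path in args)
```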
pyrpipe.pyrpipe_utils.check_salmonindex(index)

Function to check if salmon index is valid and exists.

index: str
Path to the index
Returns: True if the index is valid
Return type: bool
pyrpipe.pyrpipe_utils.check_starindex(index)

Function to check if star index is valid and exists.

index: str
Path to the index
Returns: True if the index is valid
Return type: bool
pyrpipe.pyrpipe_utils.find_files(search_path, search_pattern, recursive=False, verbose=False)

Find files under search_path that match a regular expression, e.g. find_files(path, ".fastq$").

search_path : str
path to the directory to search.
search_pattern : str
regular expression to match, e.g. ".fastq$".
recursive : bool, optional
also search subdirectories. The default is False.
verbose : bool, optional
print messages. The default is False.
result : list
the matching file paths.

find_files('path/to/directory', '.*.txt$', recursive=False, verbose=False)
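One way to implement find_files() is to walk the directory with os.walk and stop after the top level when recursive is False. The sketch below does that. It assumes the regex is matched against the file name, not the full path; that choice is mine, not documented.

```python
import os
import re

def find_files(search_path, search_pattern, recursive=False, verbose=False):
    """Return paths under `search_path` whose file name matches the regex `search_pattern`."""
    pattern = re.compile(search_pattern)
    result = []
    for root, dirs, files in os.walk(search_path):
        for name in sorted(files):
            if pattern.search(name):
                result.append(os.path.join(root, name))
        if not recursive:
            break  # only the top-level directory
    if verbose:
        print(f"Found {len(result)} file(s) in {search_path}")
    return result
```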

pyrpipe.pyrpipe_utils.get_file_basename(file_path)

Returns file basename without extension

file_path: str
Path to file
Returns: file basename without extension
Return type: string
pyrpipe.pyrpipe_utils.get_file_directory(file_path)

Returns directory of a file

file_path: str
Path to file
Returns: directory of file_path
Return type: string
pyrpipe.pyrpipe_utils.get_file_size(file_path)

Returns file size in human readable format

file_path: str
Path to file
Returns: size in human-readable format
Return type: string
pyrpipe.pyrpipe_utils.get_fileext(file_path)

Returns file extension

file_path: str
Path to file
Returns: file extension
Return type: string
pyrpipe.pyrpipe_utils.get_filename(file_path)

Returns filename with extension

file_path: str
Path to file
Returns: filename
Return type: string
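The four path helpers map directly onto os.path functions, as sketched below. Two behaviours are assumptions: the extension keeps its leading dot, and only the last extension is stripped, so "reads.fastq.gz" has basename "reads.fastq".

```python
import os

def get_filename(file_path):
    """File name with extension: '/data/SRR1.fastq' -> 'SRR1.fastq'."""
    return os.path.basename(file_path)

def get_file_basename(file_path):
    """File name without its last extension: '/data/SRR1.fastq' -> 'SRR1'."""
    return os.path.splitext(os.path.basename(file_path))[0]

def get_fileext(file_path):
    """Last extension, including the dot: '/data/SRR1.fastq' -> '.fastq'."""
    return os.path.splitext(file_path)[1]

def get_file_directory(file_path):
    """Directory part of the path: '/data/SRR1.fastq' -> '/data'."""
    return os.path.dirname(file_path)
```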
pyrpipe.pyrpipe_utils.get_mdf(filename)

Compute and return the MD5 checksum.

filename : str
Path to input file
md5 : str
MD5 checksum value
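An MD5 checksum of a potentially large FASTQ or BAM file is usually computed in chunks so the whole file never sits in memory. The sketch below does this with hashlib. The 1 MiB chunk size is an arbitrary choice of this sketch.

```python
import hashlib

def get_mdf(filename, chunk_size=1 << 20):
    """Return the hex MD5 checksum of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```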
pyrpipe.pyrpipe_utils.get_timestamp(shorten=False)

Function to return current timestamp.

shorten: bool
return short version without space, dash and colons
Returns: timestamp as string
Return type: string
pyrpipe.pyrpipe_utils.get_union(*args)

Return the union of multiple input lists.

pyrpipe.pyrpipe_utils.mkdir(dir_path)

Create a directory

Returns: True if the directory is created
Return type: bool
pyrpipe.pyrpipe_utils.parse_java_args(valid_args_list, passed_args)

Function to create arguments to pass to Java programs.

valid_args_list: list
list of valid arguments. Invalid arguments will be ignored
passed_args: dict
keyword value argument list to be parsed
Returns: a list of command-line arguments to be used with subprocess.Popen
Return type: list
>>> parse_java_args(['A','B','-C'], {"A": "3", "B": "22","-C":""})
['A=3', 'B=22', '-C']
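A minimal sketch that reproduces the doctest above: valid keys become key=value, and an empty value yields a bare flag. Printing a warning for invalid keys mirrors the parse_unix_args doctest; whether the Java variant also warns is an assumption.

```python
def parse_java_args(valid_args_list, passed_args):
    """Build key=value arguments for Java tools; empty values become bare flags."""
    args = []
    for key, value in passed_args.items():
        if key not in valid_args_list:
            print(f"Unknown argument {key} {value}. ignoring...")
            continue
        args.append(key if value == "" else f"{key}={value}")
    return args

print(parse_java_args(['A', 'B', '-C'], {"A": "3", "B": "22", "-C": ""}))
# ['A=3', 'B=22', '-C']
```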
pyrpipe.pyrpipe_utils.parse_unix_args(valid_args_list, passed_args)

Function to create command-line arguments to pass to Unix programs.

valid_args_list: list
list of valid arguments. Invalid arguments will be ignored
passed_args: dict
keyword value argument list to be parsed
Returns: a list of command-line arguments to be used with subprocess.Popen
Return type: list
>>> parse_unix_args(['-O','-t','-q'], {"-O": "./test", "Attr2": "XX","--":("IN1","IN2")})
Unknown argument Attr2 XX. ignoring...
['-O', './test', 'IN1', 'IN2']
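A sketch consistent with the doctest above. The special key "--" carries positional arguments, which this sketch appends after all options. It also treats an empty string value as a flag without a value. Both behaviours are assumptions beyond what the doctest shows.

```python
def parse_unix_args(valid_args_list, passed_args):
    """Build Unix-style arguments; the special key "--" holds positional arguments."""
    args, positional = [], []
    for key, value in passed_args.items():
        if key == "--":
            positional.extend(value)      # e.g. input files, appended last
        elif key in valid_args_list:
            args.append(key)
            if value != "":               # "" marks a flag that takes no value
                args.append(str(value))
        else:
            print(f"Unknown argument {key} {value}. ignoring...")
    return args + positional

print(parse_unix_args(['-O', '-t', '-q'], {"-O": "./test", "Attr2": "XX", "--": ("IN1", "IN2")}))
```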
pyrpipe.pyrpipe_utils.print_blue(text)

Print in blue font

text: str
text to print

Returns: None

pyrpipe.pyrpipe_utils.print_boldred(text)

Print in bold red font

text: str
text to print

Returns: None

pyrpipe.pyrpipe_utils.print_error(*args)

Print message to stderr

pyrpipe.pyrpipe_utils.print_green(text)

Print in green font

text: str
text to print

Returns: None

pyrpipe.pyrpipe_utils.print_message(*args)

Print message to stderr

pyrpipe.pyrpipe_utils.print_notification(*args)

Print message to stderr

pyrpipe.pyrpipe_utils.print_success(*args)

Print message to stderr

pyrpipe.pyrpipe_utils.print_warning(*args)

Print message to stderr

pyrpipe.pyrpipe_utils.print_yellow(text)

Print in yellow font

text: str
text to print

Returns: None

pyrpipe.pyrpipe_utils.pyrpipe_print(color, *args, stderr=False, **kwargs)

pyrpipe.qc module

Created on Mon Nov 25 17:48:00 2019

@author: usingh

class pyrpipe.qc.BBmap(*args, threads=None, memory=None, **kwargs)

Bases: pyrpipe.qc.RNASeqQC

This class represents BBMap programs.

perform_cleaning(sra_object, bbsplit_index, out_dir='', out_suffix='_bbsplit', objectid='NA', **kwargs)

Remove contaminating reads that map to the given reference, using bbsplit.

sra_object: SRA
an SRA object
bbsplit_index: string
Path to bbsplit index or fasta file which will generate index
out_dir: string
Path to output dir. Default: sra_object.directory
out_suffix: string
Suffix for output file name
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
kwargs: dict
options passed to bbsplit
Returns: paths of the fastq files after QC. The tuple has one item for single-end files and two for paired-end files.
Return type: tuple
perform_qc(sra_object, out_dir='', out_suffix='_bbduk', objectid='NA')

Run bbduk on fastq files specified by the sra_object

sra_object: SRA
An SRA object whose fastq files will be used
out_dir: str
Path to output directory
out_suffix: string
Suffix for the output fastq files
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: paths of the fastq files after QC. The tuple has one item for single-end files and two for paired-end files.
Return type: tuple
run_bbsplit(objectid='NA', **kwargs)

Wrapper to run bbsplit.

Returns: status of the bbsplit command
Return type: bool
class pyrpipe.qc.RNASeqQC(*args, **kwargs)

Bases: pyrpipe.runnable.Runnable

This is an abstract parent class for fastq quality control programs.

perform_qc()

Perform QC on an SRA object.

class pyrpipe.qc.Trimgalore(*args, threads=None, **kwargs)

Bases: pyrpipe.qc.RNASeqQC

This class represents trimgalore

args: tuple
arguments passed to trim_galore
threads: int
Number of threads to use
kwargs:
trim_galore arguments.
perform_qc(sra_object, out_dir='', out_suffix='_trimgalore', objectid='NA')

Function to perform QC using trimgalore. The function perform_qc() is consistent across all QC classes.

sra_object: SRA
An SRA object whose fastq files will be used
out_dir: str
Path to output directory
out_suffix: string
Suffix for the output fastq files
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: paths of the fastq files after QC. The tuple has one item for single-end files and two for paired-end files.
Return type: tuple

pyrpipe.quant module

Created on Thu Jan 2 18:20:41 2020

@author: usingh

class pyrpipe.quant.Kallisto(*args, index=None, transcriptome=None, threads=None, **kwargs)

Bases: pyrpipe.quant.Quant

This class represents kallisto

index: string
path to kallisto index
threads: int
number of threads to use
build_index(index_path, transcriptome, objectid='NA')

Function to build kallisto index

index_path: str
path to the index
transcriptome: str
Path to transcriptome
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: status of the kallisto index command
Return type: bool
check_index()

Check whether the kallisto index is valid.

perform_quant(sra_object, out_suffix='', out_dir='', objectid='NA')

Run kallisto quant

sra_object: SRA
SRA object containing paths to fastq files
out_suffix: str
suffix for output file
out_dir: str
path to output directory
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the kallisto output directory
Return type: string
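The main difference between paired-end and single-end kallisto quantification is in the command line. For single-end reads, kallisto quant requires --single together with the fragment length mean (-l) and standard deviation (-s). The hypothetical helper below only builds the command. The default of 200/20 is an illustrative assumption, not a pyrpipe default.

```python
def kallisto_quant_cmd(index, out_dir, fastq, fastq2=None, threads=None,
                       fragment_length=200, fragment_sd=20):
    """Build (but do not run) a `kallisto quant` command for one sample."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    if threads:
        cmd += ["-t", str(threads)]
    if fastq2:
        cmd += [fastq, fastq2]                # paired-end: two fastq files
    else:
        # single-end: kallisto needs the fragment length mean and s.d.
        cmd += ["--single", "-l", str(fragment_length), "-s", str(fragment_sd), fastq]
    return cmd

print(" ".join(kallisto_quant_cmd("k.idx", "out", "r1.fq", "r2.fq", threads=4)))
```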
class pyrpipe.quant.Quant(*args, index=None, transcriptome=None, threads=None, **kwargs)

Bases: pyrpipe.runnable.Runnable

This is an abstract class for quantification programs.

build_index()

Function to create an index used by the quantification program.

check_index()

Function to check if index of this object is valid and exists

perform_quant(sra_object)

Function to perform quantification taking an SRA object as input.

class pyrpipe.quant.Salmon(*args, index=None, transcriptome=None, threads=None, **kwargs)

Bases: pyrpipe.quant.Quant

This class represents salmon

index: string
Path to salmon index
threads: int
Number of threads
build_index(index_path, transcriptome, objectid='NA')
Function to build a salmon index.

index_path : str
path to the index.
transcriptome : str
path to the transcriptome.
objectid : str, optional
Provide an id to attach with this command e.g. the SRR accession. The default is “NA”.
Raises: OSError
Return type: bool
check_index()

Function to check if index of this object is valid and exists

perform_quant(sra_object, out_suffix='', out_dir='', objectid='NA')

Run salmon quant.

sra_object: SRA
An SRA object with valid fastq files
out_suffix: str
suffix string for the output file
out_dir: str
path to the output directory
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: path to the salmon output file
Return type: string

pyrpipe.reports module

Created on Tue Dec 22 13:13:02 2020

@author: usingh

pyrpipe.reports.checkEnvLog(logFile)

Check that the log exists and return the path to the corresponding ENV log.

logFile : str
path to log file.
envLog : str
path to the corresponding ENV log.
pyrpipe.reports.generateBashScript(logFile, outFile, filterList, coverage='a', verbose=True)

Write commands to a bash file

logFile : str
path to input pyrpipe log file
outFile : str
path to output file.
filterList : list
list of programs to ignore.
coverage : char, optional
type of commands to include: (a)ll, (p)assed or fa(i)led. The default is ‘a’.
verbose : boolean, optional
print messages. The default is True.

None.

pyrpipe.reports.generateBenchmarkReport(logFile, envLog, filterList, tempDir, outFile='', verbose=False)

Ignores failed commands (exit code != 0).

pyrpipe.reports.generateEnvReportTable(sysInfo, progList)

Create an HTML table listing the environment.

sysInfo: dict

system information from env log file
progList: dict
programs and their information from env log
pyrpipe.reports.generateHTMLReport(templateFile, cmdLog, envLog, coverage='f')

Generates an HTML report.

templateFile: string
path to a template file
cmdLog: string
path to the log file
envLog: string
path to the env log file
coverage: string
type of report: full, summary, fail, pass
pyrpipe.reports.generate_multiqc(directory, tempDir, outDir='', coverage='a', verbose=False, cleanup=False)

Generate reports using multiqc

directory : str
path to directory containing logs.
tempDir : str
path to a temporary directory.
outDir : str, optional
output directory. The default is “”.
coverage : char, optional
commands to use in pyrpipe log: fa(i)led (p)assed or (a)ll. The default is ‘a’.
verbose : bool, optional
print messages. The default is False.
cleanup : bool, optional
remove temp files. The default is False.

None.

pyrpipe.reports.generate_multiqc_from_log(logFile, filterList, tempDir, outDir='', coverage='a', verbose=False, cleanup=False)
pyrpipe.reports.generate_summary(cmdLog, envLog, coverage='a')

Generates a summary at the end of a run. Similar to generateHTMLReport.

cmdLog: string
path to the log file
envLog: string
path to the env log file
coverage: string
type of report: full, summary, fail, pass
pyrpipe.reports.getCommandsFromLog(inFile, filterList, coverage)

Get commands from a log

inFile : str
path to pyrpipe log file.
filterList : list
list of commands to ignore.
coverage : char
type of commands to report: (a)ll, (p)assed or fa(i)led.
commands
the selected commands from the log.
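The filterList and coverage options can be illustrated with a small filter. This is a hypothetical helper operating on invented records: each record is assumed to be a dict with "program", "cmd" and "exitcode" keys, which is not pyrpipe's actual log format. The 'a'/'p'/'i' codes follow the coverage description above.

```python
def filter_commands(records, filter_list, coverage="a"):
    """Select commands by exit status: 'a' all, 'p' passed, 'i' failed."""
    if coverage not in ("a", "p", "i"):
        raise ValueError("coverage must be 'a', 'p' or 'i'")
    commands = []
    for rec in records:
        if rec["program"] in filter_list:
            continue                       # programs the caller asked to ignore
        passed = rec["exitcode"] == 0
        if coverage == "a" or (coverage == "p" and passed) or (coverage == "i" and not passed):
            commands.append(rec["cmd"])
    return commands
```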
pyrpipe.reports.getStdoutFromLog(inFile, filterList, coverage)

Return a dict with objid_program as key and the corresponding stdout as value.

pyrpipe.reports.parseEnvLog(envLog)

Parse env log file

envLog : str
Env log file path.
sysInfo : dict
system information.
progList : dict
programs used.
pyrpipe.reports.writeHtml(htmlText, outFile)
pyrpipe.reports.writeHtmlToMarkdown(htmlText, outFile)
pyrpipe.reports.writeHtmlToPdf(htmlText, outFile)

pyrpipe.runnable module

Created on Sat Dec 12 13:55:13 2020

@author: usingh

class pyrpipe.runnable.Runnable(*args, command=None, yaml=None, style='LINUX', deps=None, valid_args=None, **kwargs)

Bases: object

The runnable class

_runnable : bool
Boolean indicating the runnable command
_command : str
Name of the unix command
_param_yaml : str
the name/path of the .yaml file containing tool/command options (relative to the --param-dir)
_args_style : str
Style of commands: --key value (LINUX) or key=value (JAVA)
_valid_args : dict or list
A dict containing valid arguments for command subcommand, accessible as _valid_args[subcommand], or a list of valid options for command.
check_dependency(deps_list)

Check dependencies of a tool/command.

deps_list : List
List of commands to check.
OSError
Raised if a command is not found.
bool
True if all commands are found.
create_lock(target_list, message)

Creates a temporary .Lock file associated with a target file and writes a message in it.

target_list : List
List of target files.
message : Str
Message to write in file.
templist : List
A list of .Lock file names corresponding to the target files.
get_lock_files(target)

Returns .Lock files associated with a target

target : Str
Target file name.
lock_files : List
List of .Lock files present.
get_valid_parameters(subcommand)

Returns the valid parameter list for a command's subcommand, after looking up the self._valid_args dictionary.

subcommand : Str
The subcommand name
List
Returns the list of valid options for the subcommand.
load_args(*args, **kwargs)

Initializes the args (positional arguments) and kwargs (options) passed during object creation. These are stored in self._args and self._kwargs for all future references.

*args : tuple
Positional arguments
**kwargs : dict
the keyword arguments.

None.

load_yaml()

Loads a .yaml file containing tool options/parameters

None.

remove_locks(file_list)

Takes a list of file names and removes them using os.remove.

file_list : List
List of file names ending with .Lock.

None.

resolve_parameter(parameter_key, passed_value, default_value, parameter_variable)

Resolve a tool parameter passed as an argument. For example, the Unix command orfipy takes the parameter --procs <num_threads>. This can be converted to a Python variable accessible as an attribute of the Runnable class. The resolve_parameter function will update self._kwargs if parameter_key exists; otherwise it will create parameter_key in self._kwargs. To do this, call the function as <Runnable obj>.resolve_parameter("--procs", <passed_value>, <default_value>, '_threads'). Now <Runnable obj>._threads will point to the "--procs" value.

parameter_key : Str
The parameter/option name for the Unix command/tool e.g. --threads
passed_value : Str
The value supplied by user.
default_value : Str
Default value to use if no value is supplied
parameter_variable : Str
Name of a variable that will be stored in the Runnable class. e.g. threads

None.
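The resolve_parameter() behaviour described above can be sketched on a minimal stand-in class. RunnableSketch is hypothetical. The precedence it uses (an explicitly passed value, then an existing option in self._kwargs, then the default) is an assumption, since the text does not spell out how passed_value and an existing key interact.

```python
class RunnableSketch:
    """Minimal stand-in for Runnable, holding options in self._kwargs."""

    def __init__(self, **kwargs):
        self._kwargs = dict(kwargs)

    def resolve_parameter(self, parameter_key, passed_value, default_value, parameter_variable):
        if passed_value is not None:
            self._kwargs[parameter_key] = passed_value      # explicit value wins
        elif parameter_key not in self._kwargs:
            self._kwargs[parameter_key] = default_value     # create the option
        # Expose the resolved option as a Python attribute, e.g. self._threads.
        setattr(self, parameter_variable, self._kwargs[parameter_key])

tool = RunnableSketch(**{"--procs": "8"})
tool.resolve_parameter("--procs", None, "1", "_threads")
print(tool._threads)  # 8
```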

run(*args, subcommand=None, target=None, requires=None, objectid=None, verbose=None, logs=None, **kwargs)
*args : Tuple
Positional arguments passed to a command. These completely REPLACE the existing self._args created during initialization of the runnable object.
subcommand : String or List, optional
Subcommand passed to the command. The default is None.
target : Str or List of Str, optional
The expected output/target files produced by the run operation. False is returned if any target file is not found after the command finishes. The default is None.
requires : Str or List of Str, optional
Files required to start the run method. An exception is raised if any of them are missing. The default is None.
objectid : Str, optional
A unique id to identify the run operation in the logs. This is useful for benchmarks. The default is None.
**kwargs : Keyword arguments
The options to be passed to the command. These OVERRIDE ANY EXISTING options in self._kwargs created during initialization of the runnable object.
TypeError
If incorrect types are used for target and requires.
FileNotFoundError
Raises FileNotFoundError if any of the required files are missing.
OSError
Raises OSError if the command is incorrect or not present in path.
ValueError
Raises ValueError if args_type is something other than LINUX or JAVA.
bool
Return the status of command as True or False. True implies command had 0 exit-code and all target files were found after the command finished.
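The success rule for run() combines three checks: required files must exist beforehand, the exit code must be 0, and all targets must exist afterwards. The hypothetical function below mirrors the documented exceptions (TypeError, FileNotFoundError, OSError). It omits subcommands, argument styles, logging and lock handling.

```python
import os
import subprocess

def run_with_targets(cmd, target=None, requires=None):
    """Run `cmd`; True only if it exits with 0 and every target file exists."""
    def as_list(value, name):
        if value is None:
            return []
        if isinstance(value, str):
            return [value]
        if isinstance(value, list):
            return value
        raise TypeError(f"{name} must be a str or a list of str")

    targets = as_list(target, "target")
    missing = [f for f in as_list(requires, "requires") if not os.path.exists(f)]
    if missing:
        raise FileNotFoundError(f"Required files missing: {missing}")
    try:
        returncode = subprocess.run(cmd).returncode
    except FileNotFoundError as err:   # program not on PATH
        raise OSError(f"Command not found: {cmd[0]}") from err
    return returncode == 0 and all(os.path.exists(t) for t in targets)
```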
verify_integrity(target, verbose=False)

Verify target file is present and is not LOCKED i.e. .Lock file is not present.

target : Str
The target file.
verbose : bool, optional
Print additional messages. The default is False.
bool
True if the target is present and not locked.
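The lock mechanism (create_lock, remove_locks, verify_integrity) can be sketched as follows. The lock file name (target name plus ".Lock") is an assumption, since the text only says the .Lock file is associated with the target. The likely intent is that a lock left behind by an interrupted run marks its target as untrustworthy.

```python
import os

LOCK_SUFFIX = ".Lock"  # assumption: lock file = target name + ".Lock"

def create_lock(target_list, message):
    """Create one .Lock file per target and write `message` into it."""
    lock_files = []
    for target in target_list:
        lock = target + LOCK_SUFFIX
        with open(lock, "w") as handle:
            handle.write(message)
        lock_files.append(lock)
    return lock_files

def remove_locks(file_list):
    """Delete the given .Lock files."""
    for lock in file_list:
        os.remove(lock)

def verify_integrity(target, verbose=False):
    """True if the target exists and no .Lock file marks it as incomplete."""
    locked = os.path.exists(target + LOCK_SUFFIX)
    if verbose and locked:
        print(f"{target} is locked; it may be incomplete")
    return os.path.exists(target) and not locked
```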
verify_target(target, verbose=False)

Verify a single target file is present

target : Str
target file name.
verbose : bool, optional
Print additional messages. The default is False.
bool
True if the target file is present.
verify_target_list(target_list, verbose=False)

Verify a list of target files are present.

target_list : List
List of target files.
verbose : bool, optional
Print additional messages. The default is False.
bool
True if all targets are present.

pyrpipe.sra module

Created on Sat Nov 23 15:45:26 2019

@author: usingh

class pyrpipe.sra.SRA(srr_accession=None, directory=None, fastq=None, fastq2=None, sra=None)

Bases: object

This class represents an SRA object

srr_accession: string
A valid SRR accession
directory: string
Path where all data related to this object (e.g. .sra files, metadata, fastq files) will be stored. Default value of the path will be “./<SRR_accession>”. <SRR_accession> is added at the end of the path so that final directory is directory/<SRR_accession>. For consistency, directory and SRR Accession id are not allowed to be modified.
scan_path: string
If RNA-Seq data already exists locally, provide the scan path to scan a directory and create an SRA object.
align(mapping_object, **kwargs)
assemble(assembly_object, **kwargs)
delete_fastq()

Delete the fastq files from the disk. The files are referenced by self.fastq_path, or by self.fastq_path and self.fastq2_path.

delete_sra()

Delete the downloaded SRA files.

download_fastq(*args, **kwargs)

Function to download fastq files

download_sra(**kwargs)

This function downloads .sra file from NCBI SRA servers using the prefetch command.

NCBI sra-toolkit 2.9 or higher must be installed on the system in order to use prefetch. prefetch will create a folder with name same as <srr_accession> under the directory (path) specified. The path of downloaded file is saved in the object as localSRAPath. This localSRAPath is then used by other functions to access the downloaded data. The **kwargs is for passing arguments to the prefetch command.

kwargs: dict
dict containing additional prefetch arguments
Returns:Return status of the prefetch command: True if the download succeeded, False otherwise.
Return type:bool
>>> object.download_sra()
True
fastq_exists()

Check whether the fastq file is present on disk.

get_read_length(lines_to_examine=None)

Examine the first lines_to_examine lines and return the mode of the read lengths as an int.
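A minimal sketch of the read-length estimate, written as a standalone function rather than a method. It assumes an uncompressed fastq file with four lines per record and takes the file path as an explicit fastq_path argument, which the method itself does not have; returning 0 when no reads are examined is also an assumption.

```python
from collections import Counter
from itertools import islice


def get_read_length(fastq_path, lines_to_examine=None):
    """Return the most common read length in the first `lines_to_examine` lines.

    `fastq_path` is an assumed explicit argument; None examines the whole file.
    """
    with open(fastq_path) as fh:
        # islice(fh, None) yields every line when no limit is given.
        lines = islice(fh, lines_to_examine)
        # Each FASTQ record has 4 lines; the sequence is at line index 1, 5, 9, ...
        lengths = Counter(
            len(line.rstrip("\r\n")) for i, line in enumerate(lines) if i % 4 == 1
        )
    if not lengths:
        return 0  # assumption: no reads were examined
    # most_common(1) returns [(length, count)] for the most frequent length.
    return lengths.most_common(1)[0][0]
```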

init_from_accession(srr_accession, directory)

Create an SRA object from the provided SRR accession and the directory where data is downloaded/saved. This function initializes the SRR id and the paths to any existing sra/fastq files, so files that already exist are not downloaded again.

init_object(srr_accession, directory, fastq, fastq2, sra)
quant(quant_object)
search_fastq(path)

Search for .fastq files under a directory and create the SRA object. Returns True if found, otherwise False.
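A possible sketch of the directory scan, assuming the fastq files sit directly in path (no recursion) and end in .fastq. One match is treated as single-end data and two matches, in sorted order, as a read pair; anything else counts as not found. The returned (found, fastq, fastq2) tuple is a hypothetical stand-in for setting attributes on the SRA object.

```python
import glob
import os


def search_fastq(path):
    """Scan `path` for .fastq files; return (found, fastq, fastq2).

    Hypothetical stand-in for SRA.search_fastq: one file is treated as
    single-end, two files (sorted by name) as read 1 and read 2.
    """
    # Non-recursive search; only names ending in ".fastq" are considered.
    matches = sorted(glob.glob(os.path.join(path, "*.fastq")))
    if len(matches) == 1:
        return True, matches[0], None
    if len(matches) == 2:
        return True, matches[0], matches[1]
    # Zero matches, or more than two (ambiguous), count as not found.
    return False, None, None
```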

sra_exists()

Check whether the sra file is present on disk.

trim(qc_object, delete_original=False, **kwargs)

Perform quality control with the specified qc object. A qc object refers to one of the RNA-Seq QC programs, such as trim_galore or bbduk, and should be initialized with all its parameters. By default, the trimmed/QC fastq files are generated in the same directory as the original fastq files. After QC, this SRA object updates the fastq_path variable (or fastq_path and fastq2_path) to store the new fastq files. New variables localRawfastqPath, or rawfastq_path and rawfastq2_path, are created to store the paths of the original fastq files.

qc_object: RNASeqQC object
qc_object specifying the program to be used. The object contains the parameters needed to run the program.
delete_original: bool
Delete the raw fastq files after QC
kwargs: dict
Arguments to pass on to the perform_qc function
Returns:Return status of the QC: True if QC succeeded, False otherwise.
Return type:bool
>>> object.trim(qc.BBmap())
True

pyrpipe.test_sratools module

Created on Sun Jan 17 15:24:42 2021

@author: usingh

pyrpipe.test_sratools.runtest()

pyrpipe.tools module

Created on Wed Dec 4 14:54:22 2019

@author: usingh

class pyrpipe.tools.RNASeqTools(*args, **kwargs)

Bases: pyrpipe.runnable.Runnable

class pyrpipe.tools.Samtools(*args, threads=None, **kwargs)

Bases: pyrpipe.tools.RNASeqTools

Class to access samtools

args: tuple
Arguments to samtools
threads: int
Number of threads samtools will use
kwargs: dict
Options to samtools
merge_bam(bam_list, out_file='merged', out_dir=None, delete_bams=False, objectid=None)

Merge multiple bam files into a single file

bam_list: list
List of input bam files to merge
out_file: string
Output file name to save the results. .bam will be added at the end.
out_dir: string
Directory in which to save the merged bam file. Defaults to the directory of the first bam file.
threads: int
Number of threads. Default: Use self.threads initialized in init().
delete_bams: bool
Delete input bam files after merging.
verbose: bool
Print stdout and stderr
quiet: bool
Print nothing
logs: bool
Log this command to pyrpipe logs
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
kwargs: dict
Options to pass to samtools. This will override the existing options
Returns:Returns the path to the merged bam file.
Return type:string
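The command that merge_bam wraps can be sketched as an argument list for the samtools merge CLI, which takes the output file before the input files. build_merge_command is a hypothetical helper, not part of pyrpipe, and passing threads through samtools' -@ option is an assumption about how the threads setting is used.

```python
import os


def build_merge_command(bam_list, out_file="merged", out_dir=None, threads=None):
    """Return (cmd, out_path) for `samtools merge` (hypothetical helper)."""
    if not bam_list:
        raise ValueError("bam_list must contain at least one bam file")
    # Default output directory: the directory of the first input bam.
    if out_dir is None:
        out_dir = os.path.dirname(bam_list[0]) or "."
    # ".bam" is appended to out_file, as the parameter description states.
    out_path = os.path.join(out_dir, out_file + ".bam")
    cmd = ["samtools", "merge"]
    if threads:
        # Assumption: threads maps to samtools' -@ option.
        cmd += ["-@", str(threads)]
    # samtools merge expects the output file before the input files.
    cmd += [out_path, *bam_list]
    return cmd, out_path
```

On a system with samtools installed, the list can be executed with subprocess.run(cmd, check=True).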
sam_sorted_bam(sam_file, out_dir=None, out_suffix=None, delete_sam=False, delete_bam=False, objectid=None)

Convert a sam file to bam and sort the bam file.

sam_file: str
Path to the input sam file
out_dir: str
Path to output directory
out_suffix: str
Output file suffix
threads: int
Number of threads. Default: Use self.threads initialized in init().
delete_sam: bool
Delete input sam_file
delete_bam: bool
Delete the intermediate unsorted bam_file
verbose: bool
Print stdout and stderr
quiet: bool
Print nothing
logs: bool
Log this command to pyrpipe logs
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
kwargs: dict
Options to pass to samtools. This will override the existing options
Returns:Returns path to the sorted bam file. Returns empty string if operation failed.
Return type:string
sam_to_bam(sam_file, out_dir=None, out_suffix=None, delete_sam=False, objectid=None)

Convert a sam file to a bam file. The output bam file has the same name as the input sam file.

sam_file: string
Path to the input sam file
out_dir: string
Path to output directory
out_suffix: string
Suffix for the output bam file
threads: int
Number of threads. Default: Use self.threads initialized in init().
delete_sam: bool
Delete the sam file after conversion
verbose: bool
Print stdout and stderr
quiet: bool
Print nothing
logs: bool
Log this command to pyrpipe logs
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
kwargs: dict
Options to pass to samtools. This will override the existing options
Returns:Returns the path to the bam file. Returns empty string if operation failed.
Return type:string
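A sketch of the equivalent samtools view call, where -b requests BAM output and -o names the output file. build_sam_to_bam_command is hypothetical; defaulting out_dir to the sam file's directory and appending out_suffix to the input's base name are assumptions based on the description above.

```python
import os


def build_sam_to_bam_command(sam_file, out_dir=None, out_suffix=None, threads=None):
    """Return (cmd, out_path) for a SAM-to-BAM `samtools view` call (hypothetical helper)."""
    # Output keeps the input's base name, plus an optional suffix.
    stem = os.path.splitext(os.path.basename(sam_file))[0]
    if out_dir is None:
        # Assumption: default to the directory containing the sam file.
        out_dir = os.path.dirname(sam_file) or "."
    out_path = os.path.join(out_dir, stem + (out_suffix or "") + ".bam")
    # -b requests BAM output; -o names the output file.
    cmd = ["samtools", "view", "-b"]
    if threads:
        cmd += ["-@", str(threads)]
    cmd += ["-o", out_path, sam_file]
    return cmd, out_path
```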
sort_bam(bam_file, out_dir=None, out_suffix=None, delete_bam=False, objectid=None)

Sorts an input bam file. The output file name will end in _sorted.bam.

bam_file: str
Path to the input bam file
out_dir: str
Path to output directory
out_suffix: str
Output file suffix
threads: int
Number of threads. Default: Use self.threads initialized in init().
delete_bam: bool
Delete input bam_file
verbose: bool
Print stdout and stderr
quiet: bool
Print nothing
logs: bool
Log this command to pyrpipe logs
objectid: str
Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
kwargs: dict
Options to pass to samtools. This will override the existing options
Returns:Returns path to the sorted bam file. Returns empty string if operation failed.
Return type:string
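A sketch of the corresponding samtools sort call. build_sort_command is hypothetical; it assumes that when out_suffix is None the suffix _sorted is used, matching the _sorted.bam naming described above, and that the output directory defaults to the input file's directory.

```python
import os


def build_sort_command(bam_file, out_dir=None, out_suffix=None, threads=None):
    """Return (cmd, out_path) for `samtools sort` (hypothetical helper)."""
    # Assumption: with no suffix given, use "_sorted" so the name ends in _sorted.bam.
    suffix = "_sorted" if out_suffix is None else out_suffix
    stem = os.path.splitext(os.path.basename(bam_file))[0]
    if out_dir is None:
        out_dir = os.path.dirname(bam_file) or "."
    out_path = os.path.join(out_dir, stem + suffix + ".bam")
    cmd = ["samtools", "sort"]
    if threads:
        cmd += ["-@", str(threads)]
    # -o writes the sorted alignments to out_path instead of stdout.
    cmd += ["-o", out_path, bam_file]
    return cmd, out_path
```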

pyrpipe.valid_args module

Created on Mon Dec 7 14:15:50 2020

@author: usingh

This module contains a list of valid arguments for the tools

pyrpipe.version module

Created on Tue Dec 10 11:21:20 2019

@author: usingh

Module contents

Created on Sat Nov 23 15:17:38 2019

@author: usingh

Read pyrpipe configuration

class pyrpipe.Conf

Bases: object

Read and store pyrpipe configuration

init_sys_args()
init_threads_mem()
class pyrpipe.LogFormatter

Bases: object

A formatter for logs

format(record)
class pyrpipe.PyrpipeLogger(name, logdir=None)

Bases: object

Class to manage pyrpipe logs

env_logger: logger to log the current environment
cmd_logger: logger to log the execution status, stdout, stderr and runtime of each command run using execute_command()

create_logger(name, logfile, formatter, level=10)

Creates a logger

name: str
name of logger
logfile: str
file name to save logs
formatter: formatter object
Formatter for the log
level: int
Logging level. The default, 10, is the numeric value of logging.DEBUG.
Returns: logger
A logger object
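A minimal sketch of create_logger built on Python's standard logging module, assuming records are written to logfile through a logging.FileHandler. This is an illustration, not pyrpipe's source; the numeric default level=10 is logging.DEBUG, so debug messages and above are recorded.

```python
import logging


def create_logger(name, logfile, formatter, level=logging.DEBUG):
    """Return a logger named `name` that writes records at `level` or above to `logfile`."""
    # FileHandler opens (or creates) logfile in append mode.
    handler = logging.FileHandler(logfile)
    handler.setFormatter(formatter)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```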
init_cmdlog()

Initialize the command log (cmdlog).

init_envlog()

Initialize the environment log (envlog).

pyrpipe.goodbye()