pyrpipe package¶
Subpackages¶
Submodules¶
pyrpipe.assembly module¶
Created on Mon Nov 25 15:21:01 2019
@author: usingh
-
class
pyrpipe.assembly.
Assembly
(*args, **kwargs)¶ Bases:
pyrpipe.runnable.Runnable
This class represents an abstract parent class for all programs that can perform transcript assembly.
-
perform_assembly
()¶ Function to perform assembly using a BAM file as input. Inherited by all children.
- bam_file: string
- path to input BAM
Returns: path to output GTF or output directory depending on the specific assembly program. Return type: string
-
-
class
pyrpipe.assembly.
Cufflinks
(*args, threads=None, guide=None, **kwargs)¶ Bases:
pyrpipe.assembly.Assembly
This class represents cufflinks
- threads: int
- Number of threads to use
-
perform_assembly
(bam_file, out_dir=None, out_suffix='_cufflinks', objectid='NA')¶ Function to run cufflinks with BAM file as input.
- bam_file: string
- path to bam file
- out_dir: string
- output directory
- out_suffix: string
- Suffix for the output gtf file
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession.
Returns: Returns the path to output GTF file Return type: string
-
class
pyrpipe.assembly.
Stringtie
(*args, threads=None, guide=None, **kwargs)¶ Bases:
pyrpipe.assembly.Assembly
This class represents Stringtie program for transcript assembly.
- threads: int
- number of threads
- guide: str
- Reference annotation gtf/gff to use as guide
-
perform_assembly
(bam_file, out_dir=None, out_suffix='_stringtie', objectid='NA')¶ Function to run stringtie using a bam file.
- bam_file: string
- path to the bam file
- out_dir: string
- Path to out file
- out_suffix: string
- Suffix for the output gtf file
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the path to output GTF file Return type: string
pyrpipe.benchmark module¶
Created on Sat Dec 21 16:58:13 2019
@author: usingh
-
class
pyrpipe.benchmark.
Benchmark
(log_file, env_log, out_dir='')¶ Bases:
object
Class to generate benchmark reports from pyrpipe logs.
- log_file: string
- path to the log file
- env_log: string
- path to the ENV log file
- out_dir: string
- path to the output directory
-
get_programtime_boxdata
()¶ Return dataframe to make box plot of program times
-
get_time_perobject
(func='sum')¶ Returns a dataframe containing total execution time for each object in a pyrpipe log. An object is identified by the objectid e.g. SRR accession.
-
get_time_perprogram
()¶ Returns a dataframe with program execution times.
-
parse_logs
()¶ Parse the input logs. For each command, create a dict with the program name as key and its runtimes as a list. For each object id in the input log, create a dict. The following dicts are created:
runtimes_by_prog contains runtimes for each program. The program is the key and the runtimes are in a list, in the order they appear in the log file.
runtimes_by_object is a nested dict containing runtimes for each object by each program, e.g. {'ob1':{'prog1':[1,2,3],'prog2':[1,2,3]}, 'ob2':{'prog1':[12,22,13],'prog2':[1,2,3]}}
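The shape of the two dictionaries described above can be pictured with a small sketch. The record layout used here (a list of dicts with `objectid`, `program` and `runtime` keys) and the helper name `group_runtimes` are hypothetical stand-ins for parsed log entries; the actual pyrpipe log format may differ.

```python
from collections import defaultdict

# Hypothetical parsed log entries; real pyrpipe logs may use different fields.
records = [
    {"objectid": "ob1", "program": "prog1", "runtime": 1.0},
    {"objectid": "ob1", "program": "prog2", "runtime": 2.0},
    {"objectid": "ob2", "program": "prog1", "runtime": 12.0},
    {"objectid": "ob1", "program": "prog1", "runtime": 3.0},
]


def group_runtimes(records):
    """Group runtimes by program and by object, preserving log order."""
    runtimes_by_prog = defaultdict(list)
    runtimes_by_object = defaultdict(lambda: defaultdict(list))
    for rec in records:
        runtimes_by_prog[rec["program"]].append(rec["runtime"])
        runtimes_by_object[rec["objectid"]][rec["program"]].append(rec["runtime"])
    # Convert the defaultdicts to plain dicts for easier inspection.
    return dict(runtimes_by_prog), {k: dict(v) for k, v in runtimes_by_object.items()}


by_prog, by_object = group_runtimes(records)
print(by_prog)
print(by_object)
```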
-
parse_runtime
(timestring)¶ Parse runtime as string and return seconds.
- Returns: float
- runtime in sec
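As an illustration of converting a runtime string to seconds, here is a minimal sketch. It assumes the runtime is written as `H:MM:SS` with optional fractional seconds; this format, and the helper name `runtime_to_seconds`, are assumptions and are not taken from the pyrpipe source.

```python
def runtime_to_seconds(timestring):
    """Convert an 'H:MM:SS' or 'H:MM:SS.ffffff' string to seconds (assumed format)."""
    hours, minutes, seconds = timestring.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


print(runtime_to_seconds("0:01:30"))
print(runtime_to_seconds("1:00:00.5"))
```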
-
plot_time_perobject
()¶ Function to plot charts summarizing runtimes for each object in the pipeline. The charts are saved to the out_dir path.
-
plot_time_perprogram
()¶ Function to plot charts to summarize runtimes of each program
pyrpipe.buildtools module¶
Created on Sun Jan 17 16:12:47 2021
@author: usingh
-
pyrpipe.buildtools.
build_tools
()¶
pyrpipe.mapping module¶
Created on Sun Nov 24 19:53:42 2019
@author: usingh
Contains classes of RNA-Seq alignment programs.
-
class
pyrpipe.mapping.
Aligner
(*args, index=None, genome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.runnable.Runnable
This is an abstract class for alignment programs.
-
build_index
()¶ function to create an index used by the aligner
-
check_index
()¶ Function to check if index of this object is valid and exists
-
perform_alignment
(sra_object)¶ Function to perform alignment taking an SRA object as input
-
-
class
pyrpipe.mapping.
Bowtie2
(*args, index=None, genome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.mapping.Aligner
This is the Bowtie2 aligner class. Extends the Aligner class.
-
build_index
(index_path, genome, objectid='NA')¶ Build a bowtie2 index with given parameters and saves the new index to self.index.
- index_path: string
- Path where the index will be created
- genome: string
- Path to the reference genome
- objectid : string
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the status of bowtie2-build Return type: bool
-
check_index
()¶ Function to check the bowtie2 index. Returns True if the index exists on disk.
-
perform_alignment
(sra_object, out_suffix='_bowtie2', out_dir='', objectid='NA')¶ Function to perform alignment using sra_object.
- sra_object SRA object
- An object of type SRA. The path to fastq files will be obtained from this object.
- out_suffix: string
- Suffix for the output sam file
- out_dir: string
- Directory to save the results. Default value is sra_object.directory
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the sorted bam file path after converting sam to bam and sorting it Return type: string
-
-
class
pyrpipe.mapping.
Hisat2
(*args, index=None, genome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.mapping.Aligner
Extends the Aligner class.
-
build_index
(index_path, genome, objectid='NA')¶ Build a hisat index with given parameters and saves the new index to self.index.
- index_path: string
- Path where the index will be created
- genome: string
- Path to the reference genome
- objectid : string
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the status of hisat2-build Return type: bool
-
check_index
()¶ Check self.index exists
- bool
- True if index exists.
-
perform_alignment
(sra_object, out_suffix='_hisat2', out_dir='', objectid='NA')¶ Function to perform alignment using sra_object.
- sra_object SRA object
- An object of type SRA. The path to fastq files will be obtained from this object.
- out_suffix: string
- Suffix for the output sam file
- out_dir: string
- Directory to save the results. Default value is sra_object.directory
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the sorted bam file path after converting sam to bam and sorting it Return type: string
-
-
class
pyrpipe.mapping.
Star
(*args, index=None, genome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.mapping.Aligner
This class represents the STAR program. Extends the Aligner class.
-
build_index
(index_path, genome, objectid='NA')¶ Build a STAR index with given parameters and saves the new index to self.index.
- index_path: string
- Path where the index will be created
- genome: string
- Path to the reference genome
- objectid : string
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the status of STAR-build index Return type: bool
-
check_index
()¶ Function to check if index of this object is valid and exists
-
perform_alignment
(sra_object, out_suffix='_star', out_dir='', objectid='NA')¶ Function to perform STAR alignment using sra_object.
- sra_object SRA object
- An object of type SRA. The path to fastq files will be obtained from this object.
- out_suffix: string
- Suffix for the output sam file
- out_dir: string
- Directory to save the results. Default value is sra_object.directory
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the path to output bam Return type: string
-
pyrpipe.param_loader module¶
Created on Mon Dec 7 14:37:07 2020
@author: usingh
-
class
pyrpipe.param_loader.
YAML_loader
(file)¶ Bases:
object
Load parameters from a yaml file
-
get_kwargs
()¶
-
get_params
()¶
-
parse_params
()¶ store params as dict
-
-
pyrpipe.param_loader.
add_bool
(self, node)¶
pyrpipe.pyrpipe_engine module¶
Created on Tue Dec 10 16:26:02 2019
@author: usingh
Methods and classes related to execution and logging
-
pyrpipe.pyrpipe_engine.
check_dependencies
(dependencies)¶ Check whether specified programs exist in the environment. This uses the which command to test whether a program is present.
- dependencies: list
- list of programs to test
Returns: True if all dependencies are satisfied, False otherwise. Return type: bool
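A dependency check of this kind can be sketched with the standard library. Note that this sketch uses `shutil.which` rather than calling the `which` command, and the helper name `all_on_path` is hypothetical; it illustrates the idea and is not pyrpipe's implementation.

```python
import shutil
import sys


def all_on_path(dependencies):
    """Return True only if every program name resolves to an executable on PATH."""
    missing = [prog for prog in dependencies if shutil.which(prog) is None]
    for prog in missing:
        print(f"Dependency not found: {prog}", file=sys.stderr)
    return not missing


print(all_on_path(["definitely-not-a-real-program-xyz123"]))
```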
-
pyrpipe.pyrpipe_engine.
delete_file
(file_path)¶ Delete a given file from disk. Returns True if the file is deleted or doesn’t exist. shell=True is not used, to keep this function secure.
- file_path : String
- Path to the file to be deleted
- bool
- True if file deleted.
-
pyrpipe.pyrpipe_engine.
delete_files
(*args)¶ Delete multiple files passed as arguments. Returns True if all files are deleted.
-
pyrpipe.pyrpipe_engine.
dryable
(func)¶ decorator function for drying all functions capable of executing commands
-
pyrpipe.pyrpipe_engine.
execute_command
(cmd, verbose=None, logs=None, objectid=None, command_name='')¶ Function to execute commands using popen. All commands executed by this function can be logged and saved to pyrpipe logs.
- cmd: list
- command to execute via popen in a list
- verbose: bool
- Whether to print stdout and stderr. Default: False. All stdout and stderr will be saved to logs regardless of this flag.
- logs: bool
- Log the execution
- dryrun: bool
- If True, perform a dry run i.e. print commands to screen and log and exit
- objectid: string
- An id to be attached to the command. This is useful for storing logs for SRA objects, where the object id is the SRR id.
- command_name: string
- Name of the command to be saved in the log. If empty, it is taken from the first element of the cmd list.
Returns: Return status. True if returncode is 0 Return type: bool
-
pyrpipe.pyrpipe_engine.
execute_commandRealtime
(cmd)¶ Execute shell command and print stdout in realtime.
Example:
for output in pe.execute_commandRealtime(['ping','-c','4','google.com']):
    print(output)
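The real-time behaviour shown in the example above can be approximated with a generator around `subprocess.Popen`. This is a sketch under assumptions: the helper name `stream_command` is hypothetical, and pyrpipe's own function may handle stderr and return codes differently.

```python
import subprocess
import sys


def stream_command(cmd):
    """Yield stdout lines of cmd as they are produced."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Use the current Python interpreter as a portable stand-in for a shell command.
lines = list(stream_command([sys.executable, "-c", "print('a'); print('b')"]))
print(lines)
```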
-
pyrpipe.pyrpipe_engine.
get_program_path
(program)¶ Get path of installed program Returns the path as string
-
pyrpipe.pyrpipe_engine.
get_program_version
(programName)¶ Get version of installed program return version as string
-
pyrpipe.pyrpipe_engine.
get_return_status
(cmd)¶ run a shell command and get the return status
- cmd: list
- shell command in list
Returns: True if returncode is 0 Return type: bool
-
pyrpipe.pyrpipe_engine.
get_shell_output
(cmd, verbose=None)¶ Function to run a shell command and return returncode, stdout and stderr. Currently (pyrpipe v0.0.4) this function is called in get_return_status() and get_program_version().
From pyrpipe v0.0.5 onwards, the get_shell_output function allows shell=True.
- cmd: list or string
- command to run
- verbose: bool
- to print messages
Returns: (returncode, stdout and stderr) Return type: tuple: (int,str,str)
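The (returncode, stdout, stderr) tuple described above can be produced with `subprocess.run`. The helper below is a hypothetical sketch, not pyrpipe's code; it assumes that a string command is run through the shell and a list command is run directly.

```python
import subprocess
import sys


def shell_output(cmd):
    """Run cmd and return (returncode, stdout, stderr) as (int, str, str)."""
    result = subprocess.run(
        cmd,
        shell=isinstance(cmd, str),  # assumption: strings go through the shell
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


code, out, err = shell_output([sys.executable, "-c", "print('hello')"])
print(code, out.strip())
```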
-
pyrpipe.pyrpipe_engine.
is_paired
(sra_file)¶ Function to test whether a .sra file is paired or single.
- sra_file: string
- the path to the .sra file
Returns: True if the .sra file is paired Return type: bool
-
pyrpipe.pyrpipe_engine.
move_file
(source, destination, verbose=False)¶ Perform the mv command to move a file from source to destination. Returns True if the move is successful.
-
pyrpipe.pyrpipe_engine.
parse_cmd
(cmd)¶ Convert a command passed as a list to a str. From pyrpipe v0.0.5 onwards, the get_shell_output function uses shell=True.
-
pyrpipe.pyrpipe_engine.
skippable
(func)¶ Skip a function execution in safemode
pyrpipe.pyrpipe_session module¶
Created on Tue Dec 10 13:40:01 2019
@author: usingh
-
pyrpipe.pyrpipe_session.
getTimestamp
(shorten=False)¶ Return timestamp YYYYMMDDHHMISE shorten: return shorter version without spaces.
-
pyrpipe.pyrpipe_session.
restore_session
(file)¶ Restore a session from file.
Returns: Returns True if session is restored Return type: bool
-
pyrpipe.pyrpipe_session.
save_session
(filename, add_timestamp=True, out_dir='')¶ Save the current workspace using dill. Returns True if the save is successful.
pyrpipe.pyrpipe_utils module¶
Created on Mon Oct 21 12:04:28 2019
@author: usingh
-
pyrpipe.pyrpipe_utils.
byte_to_readable
(size_bytes)¶ Function to convert bytes to human readable format (MB,GB …)
- size_bytes: float
- size in bytes
Returns: Return size in human readable format Return type: str
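A conversion like this is usually done by dividing repeatedly by 1024. The sketch below is illustrative only: the helper name `bytes_to_readable`, the unit labels and the two-decimal formatting are assumptions, and pyrpipe's output format may differ.

```python
def bytes_to_readable(size_bytes):
    """Format a byte count using binary (1024-based) units."""
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    value = float(size_bytes)
    for unit in units[:-1]:
        if abs(value) < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    # Anything this large is reported in the biggest unit.
    return f"{value:.2f} {units[-1]}"


print(bytes_to_readable(1536))
print(bytes_to_readable(1024 ** 3))
```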
-
pyrpipe.pyrpipe_utils.
check_bowtie2index
(index)¶ Function to check if bowtie2 index is valid and exists.
- index: str
- Path to the index
Returns: Return true if index is valid Return type: bool
-
pyrpipe.pyrpipe_utils.
check_files_exist
(*args)¶ Function to check if files exist.
- args: tuple
- a list of paths to check
Returns: return true only if all files exist Return type: bool
-
pyrpipe.pyrpipe_utils.
check_hisatindex
(index)¶ Function to check if hisat2 index is valid and exists.
- index: str
- Path to the index
Returns: Return true if index is valid Return type: bool
-
pyrpipe.pyrpipe_utils.
check_kallistoindex
(index)¶ Function to check if kallisto index is valid and exists.
- index: str
- Path to the index
Returns: Return true if index is valid Return type: bool
-
pyrpipe.pyrpipe_utils.
check_paths_exist
(*args)¶ Function to check if a directory exists.
- args: tuple
- a list of paths to check
Returns: return true only if all paths exist Return type: bool
-
pyrpipe.pyrpipe_utils.
check_salmonindex
(index)¶ Function to check if salmon index is valid and exists.
- index: str
- Path to the index
Returns: Return true if index is valid Return type: bool
-
pyrpipe.pyrpipe_utils.
check_starindex
(index)¶ Function to check if star index is valid and exists.
- index: str
- Path to the index
Returns: Return true if index is valid Return type: bool
-
pyrpipe.pyrpipe_utils.
find_files
(search_path, search_pattern, recursive=False, verbose=False)¶ Example: find_files(path, ".fastq$")
- search_path : TYPE
- DESCRIPTION.
- search_pattern : TYPE
- DESCRIPTION.
- recursive : TYPE, optional
- DESCRIPTION. The default is False.
- verbose : TYPE, optional
- DESCRIPTION. The default is False.
- result : TYPE
- DESCRIPTION.
find_files('path/to/directory', '.*.txt$', recursive=False, verbose=False)
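Since the parameter descriptions above are unfilled template text, the following sketch shows one plausible reading of the examples: a regular expression is matched against file names under a directory, optionally descending into subdirectories. The helper name `find_matching_files`, the choice to match on the file name (not the full path) and the sorted return value are assumptions.

```python
import os
import re
import tempfile


def find_matching_files(search_path, search_pattern, recursive=False):
    """Return sorted paths under search_path whose file names match the regex."""
    regex = re.compile(search_pattern)
    matches = []
    for root, _dirs, files in os.walk(search_path):
        matches.extend(os.path.join(root, name) for name in files if regex.search(name))
        if not recursive:
            break  # os.walk yields the top directory first; stop after it
    return sorted(matches)


# Build a small throwaway directory tree to demonstrate both modes.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.fastq", "b.txt", os.path.join("sub", "c.fastq")):
        open(os.path.join(tmp, rel), "w").close()
    top_level = [os.path.relpath(p, tmp) for p in find_matching_files(tmp, r"\.fastq$")]
    everything = [
        os.path.relpath(p, tmp)
        for p in find_matching_files(tmp, r"\.fastq$", recursive=True)
    ]

print(top_level)
print(everything)
```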
-
pyrpipe.pyrpipe_utils.
get_file_basename
(file_path)¶ Returns file basename without extension
- file_path: str
- Path to file
Returns: file basename without extension Return type: string
-
pyrpipe.pyrpipe_utils.
get_file_directory
(file_path)¶ Returns directory of a file
- file_path: str
- Path to file
Returns: directory of file_path Return type: string
-
pyrpipe.pyrpipe_utils.
get_file_size
(file_path)¶ Returns file size in human readable format
- file_path: str
- Path to file
Returns: Return size in human readable format Return type: string
-
pyrpipe.pyrpipe_utils.
get_fileext
(file_path)¶ Returns file extension
- file_path: str
- Path to file
Returns: file extension Return type: string
-
pyrpipe.pyrpipe_utils.
get_filename
(file_path)¶ Returns filename with extension
- file_path: str
- Path to file
Returns: filename Return type: string
-
pyrpipe.pyrpipe_utils.
get_mdf
(filename)¶ Compute and return md5checksum
- filename : str
- Path to input file
- md5 : str
- MD5 checksum value
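Computing an MD5 checksum of a file is commonly done in chunks so that large sequencing files do not have to fit in memory. The sketch below uses `hashlib`; the helper name `md5_checksum` and the chunk size are assumptions, and this is not necessarily how pyrpipe computes the value.

```python
import hashlib
import os
import tempfile


def md5_checksum(filename, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
checksum = md5_checksum(tmp.name)
os.remove(tmp.name)
print(checksum)
```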
-
pyrpipe.pyrpipe_utils.
get_timestamp
(shorten=False)¶ Function to return current timestamp.
- shorten: bool
- return short version without space, dash and colons
Returns: timestamp as string Return type: string
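The effect of the shorten flag can be sketched as follows. The long format used here (`YYYY-MM-DD HH:MM:SS`) is an assumption; only the removal of spaces, dashes and colons in the short version comes from the description above. The `now` parameter is a hypothetical addition that makes the sketch reproducible.

```python
from datetime import datetime


def timestamp(shorten=False, now=None):
    """Return a timestamp string; the short form drops spaces, dashes and colons."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")  # assumed long format
    if shorten:
        stamp = stamp.replace(" ", "").replace("-", "").replace(":", "")
    return stamp


fixed = datetime(2020, 1, 2, 3, 4, 5)
print(timestamp(now=fixed))
print(timestamp(shorten=True, now=fixed))
```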
-
pyrpipe.pyrpipe_utils.
get_union
(*args)¶ Return the union of multiple input lists.
-
pyrpipe.pyrpipe_utils.
mkdir
(dir_path)¶ Create a directory
Returns: True if the directory is created Return type: bool
-
pyrpipe.pyrpipe_utils.
parse_java_args
(valid_args_list, passed_args)¶ Function creates arguments to pass to java programs
- valid_args_list: list
- list of valid arguments. Invalid arguments will be ignored
- passed_args: *dict
- keyword value argument list to be parsed
Returns: a list with command line arguments to be used with subprocess.Popen Return type: list
>>> parse_java_args(['A','B','-C'], {"A": "3", "B": "22","-C":""})
['A=3', 'B=22', '-C']
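The key=value style shown in the example above can be reproduced with a few lines. This is a hypothetical re-implementation for illustration (named `java_style_args` to avoid confusion with the library function); it assumes that an empty value means the key is a bare flag.

```python
def java_style_args(valid_args_list, passed_args):
    """Build 'key=value' arguments; keys with an empty value become bare flags."""
    args = []
    for key, value in passed_args.items():
        if key not in valid_args_list:
            print(f"Unknown argument {key} {value}. ignoring...")
            continue
        args.append(f"{key}={value}" if value != "" else key)
    return args


print(java_style_args(["A", "B", "-C"], {"A": "3", "B": "22", "-C": ""}))
```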
-
pyrpipe.pyrpipe_utils.
parse_unix_args
(valid_args_list, passed_args)¶ Function creates command line arguments to pass to unix programs
- valid_args_list: list
- list of valid arguments. Invalid arguments will be ignored
- passed_args: *dict
- keyword value argument list to be parsed
Returns: a list with command line arguments to be used with subprocess.Popen Return type: list
>>> parse_unix_args(['-O','-t','-q'], {"-O": "./test", "Attr2": "XX","--":("IN1","IN2")})
Unknown argument Attr2 XX. ignoring...
['-O', './test', 'IN1', 'IN2']
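The behaviour in the example above, where the special "--" key carries positional arguments that are placed after all options, can be sketched as below. This is a hypothetical re-implementation (`unix_style_args`), written only to match the documented example; details such as the handling of empty values are assumptions.

```python
def unix_style_args(valid_args_list, passed_args):
    """Build '-key value' arguments; values under '--' are appended as positionals."""
    args = []
    positional = []
    for key, value in passed_args.items():
        if key == "--":
            # Accept a single string or any iterable of strings.
            positional.extend([value] if isinstance(value, str) else value)
        elif key in valid_args_list:
            args.append(key)
            if value != "":  # assumption: an empty value marks a bare flag
                args.append(str(value))
        else:
            print(f"Unknown argument {key} {value}. ignoring...")
    return args + positional


result = unix_style_args(
    ["-O", "-t", "-q"],
    {"-O": "./test", "Attr2": "XX", "--": ("IN1", "IN2")},
)
print(result)
```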
-
pyrpipe.pyrpipe_utils.
print_blue
(text)¶ Print in blue font
- text: str
- text to print
Returns: None
-
pyrpipe.pyrpipe_utils.
print_boldred
(text)¶ Print in bold red font
- text: str
- text to print
Returns: None
-
pyrpipe.pyrpipe_utils.
print_error
(*args)¶ Print message to stderr
-
pyrpipe.pyrpipe_utils.
print_green
(text)¶ Print in green font
- text: str
- text to print
Returns: None
-
pyrpipe.pyrpipe_utils.
print_message
(*args)¶ Print message to stderr
-
pyrpipe.pyrpipe_utils.
print_notification
(*args)¶ Print message to stderr
-
pyrpipe.pyrpipe_utils.
print_success
(*args)¶ Print message to stderr
-
pyrpipe.pyrpipe_utils.
print_warning
(*args)¶ Print message to stderr
-
pyrpipe.pyrpipe_utils.
print_yellow
(text)¶ Print in yellow font
- text: str
- text to print
Returns: None
-
pyrpipe.pyrpipe_utils.
pyrpipe_print
(color, *args, stderr=False, **kwargs)¶
pyrpipe.qc module¶
Created on Mon Nov 25 17:48:00 2019
@author: usingh
-
class
pyrpipe.qc.
BBmap
(*args, threads=None, memory=None, **kwargs)¶ Bases:
pyrpipe.qc.RNASeqQC
This class represents bbmap programs
-
perform_cleaning
(sra_object, bbsplit_index, out_dir='', out_suffix='_bbsplit', objectid='NA', **kwargs)¶ Remove contaminated reads mapping to given reference using bbsplit
- sra_object: SRA
- an SRA object
- bbsplit_index: string
- Path to bbsplit index or fasta file which will generate index
- out_dir: string
- Path to output dir. Default: sra_object.directory
- out_suffix: string
- Suffix for output file name
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
- kwargs: dict
- options passed to bbsplit
Returns: Returns the path of fastq files after QC. The tuple has one item for single-end files and two for paired. Return type: tuple
-
perform_qc
(sra_object, out_dir='', out_suffix='_bbduk', objectid='NA')¶ Run bbduk on fastq files specified by the sra_object
- sra_object: SRA
- An SRA object whose fastq files will be used
- out_dir: str
- Path to output directory
- out_suffix: string
- Suffix for the output sam file
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the path of fastq files after QC. tuple has one item for single end files and 2 for paired. Return type: tuple
-
run_bbsplit
(objectid='NA', **kwargs)¶ wrapper to run bbsplit
Returns: Status of bbsplit command Return type: bool
-
-
class
pyrpipe.qc.
RNASeqQC
(*args, **kwargs)¶ Bases:
pyrpipe.runnable.Runnable
This is an abstract parent class for fastq quality control programs.
-
perform_qc
()¶ Perform qc on a SRA object
-
-
class
pyrpipe.qc.
Trimgalore
(*args, threads=None, **kwargs)¶ Bases:
pyrpipe.qc.RNASeqQC
This class represents trimgalore
- args: tuple
- arguments passed to trim_galore
- threads: int
- Num threads to use
- kwargs:
- trim_galore arguments.
-
perform_qc
(sra_object, out_dir='', out_suffix='_trimgalore', objectid='NA')¶ Function to perform qc using trimgalore. The function perform_qc() is consistent for all QC classes.
- sra_object: SRA
- An SRA object whose fastq files will be used
- out_dir: str
- Path to output directory
- out_suffix: string
- Suffix for the output sam file
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Returns the path of fastq files after QC. tuple has one item for single end files and two for paired. Return type: tuple
pyrpipe.quant module¶
Created on Thu Jan 2 18:20:41 2020
@author: usingh
-
class
pyrpipe.quant.
Kallisto
(*args, index=None, transcriptome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.quant.Quant
This class represents kallisto
- index: string
- path to kallisto index
- threads: int
- num threads to use
-
build_index
(index_path, transcriptome, objectid='NA')¶ Function to build kallisto index
- index_path: str
- path to the index
- transcriptome: str
- Path to transcriptome
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Status of kallisto index Return type: bool
-
check_index
()¶ Check valid kallisto index
-
perform_quant
(sra_object, out_suffix='', out_dir='', objectid='NA')¶ Run kallisto quant
- sra_object: SRA
- SRA object containing paths to fastq files
- out_suffix: str
- suffix for output file
- out_dir: str
- path to output directory
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Path to kallisto out directory Return type: string
-
class
pyrpipe.quant.
Quant
(*args, index=None, transcriptome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.runnable.Runnable
This is an abstract class for quantification programs.
-
build_index
()¶ function to create an index used by the quantification program
-
check_index
()¶ Function to check if index of this object is valid and exists
-
perform_quant
(sra_object)¶ Function to perform quantification taking an SRA object as input
-
-
class
pyrpipe.quant.
Salmon
(*args, index=None, transcriptome=None, threads=None, **kwargs)¶ Bases:
pyrpipe.quant.Quant
This class represents salmon
- index: string
- Path to salmon index
- threads: int
- Number of threads
-
build_index
(index_path, transcriptome, objectid='NA')¶ - index_path : TYPE
- DESCRIPTION.
- transcriptome : TYPE
- DESCRIPTION.
- objectid : TYPE, optional
- DESCRIPTION. The default is “NA”.
- OSError
- DESCRIPTION.
- bool
- DESCRIPTION.
-
check_index
()¶ Function to check if index of this object is valid and exists
-
perform_quant
(sra_object, out_suffix='', out_dir='', objectid='NA')¶ Run salmon quant
- sra_object: SRA
- An SRA object with valid fastq files
- out_suffix: str
- suffix string for the output file
- out_dir: str
- path to outdir
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
Returns: Path to salmon out file Return type: string
pyrpipe.reports module¶
Created on Tue Dec 22 13:13:02 2020
@author: usingh
-
pyrpipe.reports.
checkEnvLog
(logFile)¶ Check that the log exists and return the path to the corresponding ENV log
- logFile : str
- path to log file.
- envLog : TYPE
- DESCRIPTION.
-
pyrpipe.reports.
generateBashScript
(logFile, outFile, filterList, coverage='a', verbose=True)¶ Write commands to a bash file
- logFile : str
- path to input pyrpipe log file
- outFile : str
- path to output file.
- filterList : list
- list of programs to ignore.
- coverage : char, optional
- type of commands passed or failed. The default is ‘a’.
- verbose : boolean, optional
- print messages. The default is True.
None.
-
pyrpipe.reports.
generateBenchmarkReport
(logFile, envLog, filterList, tempDir, outFile='', verbose=False)¶ ignores failed commands with exitcode !=0
-
pyrpipe.reports.
generateEnvReportTable
(sysInfo, progList)¶ Create an HTML table to list the environment
- sysInfo: dict
- system information from env log file
- progList: dict
- programs and their information from env log
-
pyrpipe.reports.
generateHTMLReport
(templateFile, cmdLog, envLog, coverage='f')¶ Generates HTML report
- templatefile: string
- path to a template file
- cmdlog: string
- path to the log file
- envlog: string
- path to the env log file
- coverage: string
- type of report: full, summary, fail, pass
-
pyrpipe.reports.
generate_multiqc
(directory, tempDir, outDir='', coverage='a', verbose=False, cleanup=False)¶ Generate reports using multiqc
- directory : str
- path to directory containing logs.
- tempDir : str
- temp dir.
- outDir : str, optional
- output dir. The default is “”.
- coverage : char, optional
- commands to use in pyrpipe log: fa(i)led (p)assed or (a)ll. The default is ‘a’.
- verbose : bool, optional
- print messages. The default is False.
- cleanup : bool, optional
- remove temp files. The default is False.
None.
-
pyrpipe.reports.
generate_multiqc_from_log
(logFile, filterList, tempDir, outDir='', coverage='a', verbose=False, cleanup=False)¶
-
pyrpipe.reports.
generate_summary
(cmdLog, envLog, coverage='a')¶ Generates summary at the end of run. Similar to generateHTMLReport
- templatefile: string
- path to a template file
- cmdlog: string
- path to the log file
- envlog: string
- path to the env log file
- coverage: string
- type of report: full, summary, fail, pass
-
pyrpipe.reports.
getCommandsFromLog
(inFile, filterList, coverage)¶ Get commands from a log
- inFile : str
- path to pyrpipe log file.
- filterList : str
- list of commands to ignore.
- coverage : char
- type of commands to report all, passed or failed: a,p, i.
- commands : TYPE
- DESCRIPTION.
-
pyrpipe.reports.
getStdoutFromLog
(inFile, filterList, coverage)¶ Return a dict with objid_program as key and stdout
-
pyrpipe.reports.
parseEnvLog
(envLog)¶ Parse env log file
- envLog : str
- Env log file path.
- sysInfo : dict
- system information.
- progList : dict
- programs used.
-
pyrpipe.reports.
writeHtml
(htmlText, outFile)¶
-
pyrpipe.reports.
writeHtmlToMarkdown
(htmlText, outFile)¶
-
pyrpipe.reports.
writeHtmlToPdf
(htmlText, outFile)¶
pyrpipe.runnable module¶
Created on Sat Dec 12 13:55:13 2020
@author: usingh
-
class
pyrpipe.runnable.
Runnable
(*args, command=None, yaml=None, style='LINUX', deps=None, valid_args=None, **kwargs)¶ Bases:
object
The runnable class
- _runnable : bool
- Boolean indicating the runnable command
- _command : str
- Name of the unix command
- _param_yaml : str
- the name/path of the .yaml file containing tool/command options (relative to the --param-dir)
- _args_style : str
- Style of commands: --key value (LINUX) or key=value (JAVA)
- _valid_args : dict or list
- A dict containing valid arguments for command subcommand, accessible as _valid_args[subcommand], or a list of valid options for command.
-
check_dependency
(deps_list)¶ Check dependencies of a tool/command.
- deps_list : List
- List of commands to check.
- OSError
- If a command is not found raise OSError.
- bool
- Returns True if all commands are found.
-
create_lock
(target_list, message)¶ Creates a temporary .Lock file associated with a target file and writes a message in it.
- target_list : List
- List of target files.
- message : Str
- Message to write in file.
- templist : List
- A list of .Lock file names corresponding to the target files.
-
get_lock_files
(target)¶ Returns .Lock files associated with a target
- target : Str
- Target file name.
- lock_files : List
- List of .Lock files present.
-
get_valid_parameters
(subcommand)¶ Returns the valid parameter list for a command subcommand after looking up the self._valid_args dictionary.
- subcommand : Str
- The subcommand name
- List
- Returns the list of valid options for the subcommand.
-
load_args
(*args, **kwargs)¶ Initializes the args (positional arguments) and kwargs (options) passed during object creation. These are stored in self._args and self._kwargs for all future references.
None.
-
load_yaml
()¶ Loads a .yaml file containing tool options/parameters
None.
-
remove_locks
(file_list)¶ Take a list of file names and removes them using os.remove
- file_list : List
- List of file names ending with .Lock.
None.
-
resolve_parameter
(parameter_key, passed_value, default_value, parameter_variable)¶ Resolve a tool parameter passed as an argument. For example, the Unix command orfipy takes a parameter --procs <num_threads>. This can be converted to a Python variable accessible as an attribute of the Runnable class. The resolve_parameter function will update self._kwargs if parameter_key exists; otherwise it will create the parameter_key in self._kwargs. To do this, call the function as: <Runnable obj>.resolve_parameter("--procs", <passed_value>, <default_value>, '_threads'). Now <Runnable obj>._threads will point to the "--procs" value.
- parameter_key : Str
- The parameter/option name for the Unix command/tool e.g. --threads
- passed_value : Str
- The value supplied by user.
- default_value : Str
- Default value to use if no value is supplied
- parameter_variable : Str
- Name of a variable that will be stored in the Runnable class. e.g. threads
None.
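The parameter resolution described above can be illustrated with a small stand-in class. Everything here is hypothetical: `ToolOptions` is not pyrpipe's `Runnable`, and the precedence shown (an explicitly passed value first, then an option already in `_kwargs`, then the default) is an assumption about how the three sources are combined.

```python
class ToolOptions:
    """Hypothetical stand-in for a Runnable-like object; not pyrpipe's class."""

    def __init__(self, **kwargs):
        self._kwargs = dict(kwargs)

    def resolve_parameter(self, parameter_key, passed_value, default_value, parameter_variable):
        # Assumed precedence: explicit value, then an existing option, then the default.
        if passed_value is not None:
            self._kwargs[parameter_key] = passed_value
        elif parameter_key not in self._kwargs:
            self._kwargs[parameter_key] = default_value
        # Expose the resolved value as an attribute, e.g. self._threads.
        setattr(self, parameter_variable, self._kwargs[parameter_key])


from_options = ToolOptions(**{"--procs": "4"})
from_options.resolve_parameter("--procs", None, "1", "_threads")

explicit = ToolOptions(**{"--procs": "4"})
explicit.resolve_parameter("--procs", "8", "1", "_threads")

fallback = ToolOptions()
fallback.resolve_parameter("--procs", None, "1", "_threads")

print(from_options._threads, explicit._threads, fallback._threads)
```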
-
run
(*args, subcommand=None, target=None, requires=None, objectid=None, verbose=None, logs=None, **kwargs)¶ - *args : Tuple
- Positional arguments passed to a command. This will completely REPLACE the existing self._args created during initialization of the runnable object.
- subcommand : String or List, optional
- Subcommand passed to the command. The default is None.
- target : Str or List of Str, optional
- The expected output/target files produced by the run operation. False is returned if any target file is not found after the command. The default is None.
- requires : Str or List of Str, optional
- Files required to start the run method. An exception is thrown if files are missing. The default is None.
- objectid : Str, optional
- A unique id to identify the run operation in the logs. This is useful for benchmarks. The default is None.
- **kwargs : Keyword arguments
- The options to be passed to the command. This will OVERRIDE ANY EXISTING options in the self._kwargs created during initialization of the runnable object.
- TypeError
- If incorrect types are used for target and requires.
- FileNotFoundError
- Raises FileNotFoundError if any of the required files are missing.
- OSError
- Raises OSError if the command is incorrect or not present in path.
- ValueError
- Raises ValueError if args_type is something other than LINUX or JAVA.
- bool
- Return the status of command as True or False. True implies command had 0 exit-code and all target files were found after the command finished.
-
verify_integrity
(target, verbose=False)¶ Verify target file is present and is not LOCKED i.e. .Lock file is not present.
- target : Str
- The target file.
- verbose : bool, optional
- Print additional messages. The default is False.
- bool
- Return True if target is present and not locked.
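The lock-file idea behind create_lock and verify_integrity can be sketched as follows: a target counts as complete only if it exists and no lock file for it remains. The helper names and the `<target>.Lock` naming scheme are assumptions made for illustration; pyrpipe may name and place its lock files differently.

```python
import os
import tempfile


def write_lock(target, message):
    """Create a lock file next to target (hypothetical '<target>.Lock' naming)."""
    lock_path = target + ".Lock"
    with open(lock_path, "w") as handle:
        handle.write(message)
    return lock_path


def is_intact(target):
    """True if target exists and no lock file for it is present."""
    return os.path.exists(target) and not os.path.exists(target + ".Lock")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.bam")
    lock = write_lock(target, "running aligner")
    open(target, "w").close()          # the command has written its output
    while_locked = is_intact(target)   # lock still present: not trusted yet
    os.remove(lock)                    # the command finished cleanly
    after_unlock = is_intact(target)

print(while_locked, after_unlock)
```

A leftover lock file after a crash signals that the target may be incomplete, which is why the check requires both conditions.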
-
verify_target
(target, verbose=False)¶ Verify a single target file is present
- target : Str
- target file name.
- verbose : bool, optional
- Print additional messages. The default is False.
- bool
- True if the target file is present.
-
verify_target_list
(target_list, verbose=False)¶ Verify a list of target files are present.
- target_list : List
- List of target files.
- verbose : bool, optional
- Print additional messages. The default is False.
- bool
- True if all targets are present.
pyrpipe.sra module¶
Created on Sat Nov 23 15:45:26 2019
@author: usingh
-
class
pyrpipe.sra.
SRA
(srr_accession=None, directory=None, fastq=None, fastq2=None, sra=None)¶ Bases:
object
This class represents an SRA object
- srr_accession: string
- A valid SRR accession
- directory: string
- Path where all data related to this object (e.g. .sra files, metadata, fastq files) will be stored. Default value of the path will be “./<SRR_accession>”. <SRR_accession> is added at the end of the path so that final directory is directory/<SRR_accession>. For consistency, directory and SRR Accession id are not allowed to be modified.
- scan_path: string
- If RNA-Seq data already exists locally, provide the scan path to scan a directory and create an SRA object.
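The directory rule described above (the accession is appended to the supplied path, with "./<SRR_accession>" as the default) can be sketched as follows. `resolve_sra_directory` is a hypothetical helper written for illustration, and the accession in the usage line is a placeholder.

```python
import os


def resolve_sra_directory(srr_accession, directory=None):
    """Hypothetical helper: return directory/<SRR_accession>, defaulting to ./<SRR_accession>."""
    base = directory if directory is not None else "."
    return os.path.join(base, srr_accession)


# Placeholder accession used purely for illustration.
print(resolve_sra_directory("SRR1234567", "data"))
```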
-
align
(mapping_object, **kwargs)¶
-
assemble
(assembly_object, **kwargs)¶
-
delete_fastq
()¶ Delete the fastq files from the disk. The files are referenced by self.fastq_path, or by self.fastq_path and self.fastq2_path.
-
delete_sra
()¶ Delete the downloaded SRA files.
-
download_fastq
(*args, **kwargs)¶ Function to download fastq files
-
download_sra
(**kwargs)¶ This function downloads .sra file from NCBI SRA servers using the prefetch command.
NCBI sra-toolkit 2.9 or higher must be installed on the system in order to use prefetch. prefetch creates a folder named <srr_accession> under the specified directory (path). The path of the downloaded file is saved in the object as localSRAPath, which other functions then use to access the downloaded data. The **kwargs are passed as arguments to the prefetch command.
- kwargs: dict
- dict containing additional prefetch arguments
Returns: Return status of the prefetch command. True if the download succeeded and False if it failed. Return type: bool
>>> object.download_sra()
True
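To illustrate how extra keyword arguments could end up on the prefetch command line, here is a hedged sketch that only builds the argument list and does not run anything. `build_prefetch_command` is a hypothetical name, the mapping of kwargs to options is an assumption about the general approach, and the `-O` (output directory) and `-X` (maximum size) options are assumed to be supported by the installed prefetch version.

```python
def build_prefetch_command(srr_accession, directory, **kwargs):
    """Hypothetical helper: assemble an argument list for prefetch (not executed)."""
    # -O sets the output directory; assumed available in the installed sra-toolkit.
    cmd = ["prefetch", "-O", directory]
    for option, value in kwargs.items():
        cmd.append(option)
        # Options given with an empty value are treated as plain flags.
        if value not in (None, ""):
            cmd.append(str(value))
    cmd.append(srr_accession)
    return cmd


# Option names that are not valid Python identifiers can be passed by unpacking a dict.
print(build_prefetch_command("SRR1234567", "data", **{"-X": "20G"}))
```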
-
fastq_exists
()¶ Function to check if fastq file is present on disk
-
get_read_length
(lines_to_examine=None)¶ Examine the first lines_to_examine lines and return the mode of the read lengths. Returns: int
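The idea of taking the mode of read lengths over the first few lines can be sketched as below. This assumes plain four-line FASTQ records; the function name, the default of 4000 lines, and the return value of 0 for empty input are assumptions for illustration only.

```python
from collections import Counter
from itertools import islice


def mode_read_length(fastq_lines, lines_to_examine=4000):
    """Return the most common read length among the first lines_to_examine lines."""
    lengths = Counter()
    for index, line in enumerate(islice(fastq_lines, lines_to_examine)):
        # In a four-line FASTQ record the sequence is the second line.
        if index % 4 == 1:
            lengths[len(line.strip())] += 1
    if not lengths:
        return 0
    return lengths.most_common(1)[0][0]


records = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGTAC", "+", "IIIIII",
    "@r3", "TTGA", "+", "IIII",
]
print(mode_read_length(records))  # 4
```

Using the mode rather than the mean keeps the estimate stable when a few reads have been trimmed to shorter lengths.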
-
init_from_accession
(srr_accession, directory)¶ Create an SRA object using the provided SRR accession and the directory where data is downloaded/saved. This function initializes the SRR id and the paths to the SRA/fastq files if they already exist, so they will not be downloaded again.
-
init_object
(srr_accession, directory, fastq, fastq2, sra)¶
-
quant
(quant_object)¶
-
search_fastq
(path)¶ Search for .fastq files under a directory and create an SRA object. Returns True if found, otherwise False.
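A rough sketch of scanning a directory for fastq files is shown below. The pairing rule (`*_1.fastq` and `*_2.fastq` as mates) and the name `locate_fastq` are assumptions for illustration; the actual search logic may differ.

```python
import glob
import os


def locate_fastq(path):
    """Hypothetical helper: return (fastq, fastq2) found under path, or None.

    fastq2 is None for single-end data. The _1/_2 suffix convention for
    paired files is an assumption.
    """
    mate1 = sorted(glob.glob(os.path.join(path, "*_1.fastq")))
    mate2 = sorted(glob.glob(os.path.join(path, "*_2.fastq")))
    if mate1 and mate2:
        return mate1[0], mate2[0]
    single = sorted(glob.glob(os.path.join(path, "*.fastq")))
    if single:
        return single[0], None
    return None
```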
-
sra_exists
()¶ Function to check if sra file is present on disk
-
trim
(qc_object, delete_original=False, **kwargs)¶ Function to perform quality control with the specified qc object. A qc object refers to one of the RNA-Seq qc programs like trim_galore or bbduk. The qc_object should be initialized with all the parameters. By default the trimmed/qc fastq files will be generated in the same directory as the original fastq files. After QC, this SRA object will update the fastq_path or fastq_path and fastq2_path variables to store the new fastq files. New variables localRawfastqPath or rawfastq_path and rawfastq2_path will be created to store the paths of original fastq files.
- qc_object: RNASeqQC object
- qc_object specifying the program to be used. The object contains the necessary parameters to execute the program.
- delete_original: bool
- Delete the raw fastq files after QC
- kwargs: dict
- Arguments to pass on to perform_qc function
Returns: Return status of the QC. True if QC succeeded and False if it failed. Return type: bool
>>> object.perform_qc(qc.BBmap())
True
pyrpipe.test_sratools module¶
Created on Sun Jan 17 15:24:42 2021
@author: usingh
-
pyrpipe.test_sratools.
runtest
()¶
pyrpipe.tools module¶
Created on Wed Dec 4 14:54:22 2019
@author: usingh
-
class
pyrpipe.tools.
RNASeqTools
(*args, **kwargs)¶ Bases:
pyrpipe.runnable.Runnable
-
class
pyrpipe.tools.
Samtools
(*args, threads=None, **kwargs)¶ Bases:
pyrpipe.tools.RNASeqTools
Class to access samtools
- args: tuple
- Arguments to samtools
- threads: int
- Number of threads samtools will use
- kwargs: dict
- Options to samtools
-
merge_bam
(bam_list, out_file='merged', out_dir=None, delete_bams=False, objectid=None)¶ Merge multiple bam files into a single file
- bam_list: list
- Input bam files to merge
- out_file: string
- Output file name to save the results. .bam will be added at the end.
- out_dir: string
- Path where the merged bam file is saved. Defaults to the directory of the first bam file.
- threads: int
- Number of threads. Default: Use self.threads initialized in init().
- delete_bams: bool
- Delete input bam files after merging.
- verbose: bool
- Print stdout and std error
- quiet: bool
- Print nothing
- logs: bool
- Log this command to pyrpipe logs
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
- kwargs: dict
- Options to pass to samtools. This will override the existing options
Returns: Returns the path to the merged bam file. Return type: string
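The documented defaults (".bam" appended to out_file, output placed next to the first input when out_dir is not given) can be sketched by building a samtools merge argument list. Nothing is executed here; `build_merge_command` is a hypothetical name and the exact command pyrpipe constructs may differ.

```python
import os


def build_merge_command(bam_list, out_file="merged", out_dir=None, threads=1):
    """Hypothetical helper: return (output_path, argument list) for samtools merge."""
    if not bam_list:
        raise ValueError("bam_list must contain at least one file")
    # Default output directory: the directory of the first input bam file.
    if out_dir is None:
        out_dir = os.path.dirname(bam_list[0])
    out_path = os.path.join(out_dir, out_file + ".bam")
    cmd = ["samtools", "merge", "-@", str(threads), out_path] + list(bam_list)
    return out_path, cmd


out_path, cmd = build_merge_command(["runs/a.bam", "runs/b.bam"], threads=4)
print(" ".join(cmd))
```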
-
sam_sorted_bam
(sam_file, out_dir=None, out_suffix=None, delete_sam=False, delete_bam=False, objectid=None)¶ Convert sam file to bam and sort the bam file.
- sam_file: str
- Path to the input sam file
- out_dir: str
- Path to output directory
- out_suffix: str
- Output file suffix
- threads: int
- Number of threads. Default: Use self.threads initialized in init().
- delete_sam: bool
- Delete input sam_file
- delete_bam: bool
- Delete the intermediate unsorted bam_file
- verbose: bool
- Print stdout and std error
- quiet: bool
- Print nothing
- logs: bool
- Log this command to pyrpipe logs
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
- kwargs: dict
- Options to pass to samtools. This will override the existing options
Returns: Returns path to the sorted bam file. Returns empty string if operation failed. Return type: string
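The two-step conversion (sam to bam, then sort) can be pictured as a pair of samtools invocations. The sketch below only plans the commands and output names; `plan_sam_sorted_bam` is a hypothetical name, and placing outputs next to the input by default is an assumption.

```python
import os


def plan_sam_sorted_bam(sam_file, out_dir=None, threads=1):
    """Hypothetical helper: return (sorted_bam_path, commands) without running them."""
    if out_dir is None:
        out_dir = os.path.dirname(sam_file)
    base = os.path.splitext(os.path.basename(sam_file))[0]
    bam = os.path.join(out_dir, base + ".bam")
    sorted_bam = os.path.join(out_dir, base + "_sorted.bam")
    commands = [
        # Step 1: convert SAM to an unsorted BAM.
        ["samtools", "view", "-b", "-@", str(threads), "-o", bam, sam_file],
        # Step 2: sort the intermediate BAM.
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, bam],
    ]
    return sorted_bam, commands


sorted_bam, commands = plan_sam_sorted_bam("aln/sample.sam", threads=2)
for command in commands:
    print(" ".join(command))
```

The intermediate unsorted bam is what the delete_bam option refers to; once the sort succeeds it can be removed.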
-
sam_to_bam
(sam_file, out_dir=None, out_suffix=None, delete_sam=False, objectid=None)¶ Convert sam file to a bam file. Output bam file will have the same name as the input sam.
- sam_file: string
- Path to input sam file
- out_suffix: string
- Suffix for the output bam file
- threads: int
- Number of threads. Default: Use self.threads initialized in init().
- delete_sam: bool
- delete the sam file after conversion
- verbose: bool
- Print stdout and std error
- quiet: bool
- Print nothing
- logs: bool
- Log this command to pyrpipe logs
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
- kwargs: dict
- Options to pass to samtools. This will override the existing options
Returns: Returns the path to the bam file. Returns empty string if operation failed. Return type: string
-
sort_bam
(bam_file, out_dir=None, out_suffix=None, delete_bam=False, objectid=None)¶ Sorts an input bam file. Output file will end in _sorted.bam.
- bam_file: str
- Path to the input bam file
- out_dir: str
- Path to output directory
- out_suffix: str
- Output file suffix
- threads: int
- Number of threads. Default: Use self.threads initialized in init().
- delete_bam: bool
- Delete input bam_file
- verbose: bool
- Print stdout and std error
- quiet: bool
- Print nothing
- logs: bool
- Log this command to pyrpipe logs
- objectid: str
- Provide an id to attach with this command e.g. the SRR accession. This is useful for debugging, benchmarking and reports.
- kwargs: dict
- Options to pass to samtools. This will override the existing options
Returns: Returns path to the sorted bam file. Returns empty string if operation failed. Return type: string
pyrpipe.valid_args module¶
Created on Mon Dec 7 14:15:50 2020
@author: usingh
This module contains a list of valid arguments for the tools
Module contents¶
Created on Sat Nov 23 15:17:38 2019
@author: usingh
Read pyrpipe configuration
-
class
pyrpipe.
Conf
¶ Bases:
object
Read and store pyrpipe configuration
-
init_sys_args
()¶
-
init_threads_mem
()¶
-
-
class
pyrpipe.
PyrpipeLogger
(name, logdir=None)¶ Bases:
object
Class to manage pyrpipe logs
- env_logger: logger to log the current environment
- cmd_logger: logger to log the execution status, stdout, stderr and runtimes for each command run using execute_command()
-
create_logger
(name, logfile, formatter, level=10)¶ Creates a logger
- name: str
- name of logger
- logfile: str
- file name to save logs
- formatter: formatter object
- formatter for log
- Returns: logger
- A logger object
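A logger with this signature can be built from the standard logging module. The sketch below is an illustration under that assumption, not pyrpipe's code; `make_file_logger` is a hypothetical name, and the default level 10 corresponds to logging.DEBUG.

```python
import logging


def make_file_logger(name, logfile, formatter, level=logging.DEBUG):
    """Hypothetical helper: create a logger that writes formatted records to logfile."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(logfile)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger
```

Usage would look like `make_file_logger("cmd", "cmd.log", logging.Formatter("%(asctime)s %(message)s"))`, where the file name and format string are placeholders.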
-
init_cmdlog
()¶ init the cmdlog
-
init_envlog
()¶ init the envlog
-
-
pyrpipe.
goodbye
()¶