Setting up the environment
The first step is to set up the environment so that the analysis is as reproducible as possible. In this tutorial we use the conda environment manager to install Python, pyrpipe, and the required tools and dependencies into a single environment. Note: Conda must be installed on the system. For help with setting up conda, please see miniconda.
Create a new conda environment
To create a new conda environment with Python 3.8, execute the following command. We recommend sharing conda environment files alongside pipeline scripts to allow for reproducible analysis.
conda create -n pyrpipe python=3.8
Activate the newly created conda environment and install the required tools:
conda activate pyrpipe
conda install -c bioconda pyrpipe star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6 orfipy=0.0.3 salmon=1.4.0
To create a YAML file describing the conda environment, run the following command (the grep filter drops the machine-specific prefix line):
conda env export | grep -v "^prefix: " > environment.yml
To recreate the conda environment from environment.yml, use:
conda env create -f environment.yml
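The effect of the `grep -v "^prefix: "` filter in the export command can be seen on a small sample. The sketch below uses an abbreviated, made-up stand-in for `conda env export` output; the `sample_export` helper and the path in it are hypothetical, and a real export lists many more packages with full version and build strings.

```shell
# Illustrative, abbreviated stand-in for `conda env export` output.
# The helper name and the prefix path are made up for this example.
sample_export() {
cat <<'EOF'
name: pyrpipe
channels:
  - bioconda
  - defaults
dependencies:
  - python=3.8
prefix: /home/user/miniconda3/envs/pyrpipe
EOF
}

# Same filter as in the export command: drop the line that starts with
# "prefix: ", since it records a path that only exists on the exporting machine.
sample_export | grep -v "^prefix: "
```

Everything except the final `prefix:` line is printed, which is what makes the resulting environment.yml portable between machines.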
We also provide a utility that installs the required RNA-Seq tools with a single command:
pyrpipe_diagnostic build-tools
Note: Users must verify the versions of the tools installed in the conda environment.
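One way to carry out this check is to compare the output of `conda list` against the versions pinned in the install command above. The following is a minimal sketch, not part of pyrpipe: `check_pins` is a hypothetical helper, and it assumes the usual `conda list` column layout (name, version, build, channel).

```shell
# Versions pinned in the install command of this tutorial.
PINS="star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6 orfipy=0.0.3 salmon=1.4.0"

# check_pins (hypothetical helper): reads `conda list` output on stdin and
# reports, for each pinned package, whether the installed version matches.
# Returns non-zero if any package is missing or has a different version.
check_pins() {
    local listing pin name want got rc=0
    listing=$(cat)
    for pin in $PINS; do
        name=${pin%%=*}   # text before the first "="
        want=${pin#*=}    # text after the first "="
        # Assumed conda list columns: name, version, build, channel.
        got=$(printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { print $2; exit }')
        if [ "$got" = "$want" ]; then
            echo "OK       $name $got"
        else
            echo "MISMATCH $name wanted=$want found=${got:-none}"
            rc=1
        fi
    done
    return $rc
}

# Usage (requires conda and the environment created above):
#   conda list -n pyrpipe | check_pins
```

Reading the listing from stdin keeps the helper independent of conda itself, so the same function can also be run against a saved listing from another machine.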
Setting up NCBI SRA-Tools
After installing sra-tools, please configure prefetch to save downloads to the public user-repository. This ensures that the prefetch command downloads the data to the user-defined directory. To do this:
- Run the vdb-config -i command in a terminal to open the NCBI SRA-Tools configuration editor.
- Under the TOOLS tab, set the prefetch downloads option to public user-repository.
Users can test whether SRA-Tools has been set up properly by running the following command:
pyrpipe_diagnostic test