Choosing an Interface
All interfaces are based on Nextflow, a workflow language and executor for reproducible, containerized bioinformatics. This page is a quick comparative reference for the different ways you can run the same workflow, including workflows you write yourself. For in-depth parameter information on any of the examples, see our Bulk RNA-Seq Guide or the official nf-core site.
Feature Comparison
For a practical overview of each method, see the Launching Comparison.
Feature | Genomic Workflow Utility | Nextflow Tower | Command-line |
---|---|---|---|
Fast access of reads generated at Ramaciotti | | | |
Parameter input | UI | UI | nf-params.json |
Remote monitoring | Intermediate | Advanced | |
Initial configuration | None | Recommended: use Genomic Workflow Utility sidebar | module load |
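As a rough illustration of the command-line "Parameter input" entry, the sketch below writes a small nf-params.json file. The keys input and outdir are standard nf-core/rnaseq parameters, but the values are placeholders chosen for this example, and how your own launch script consumes the file is an assumption; Nextflow itself accepts such a file through its -params-file option.

```shell
# Write a minimal parameters file.
# The values below are placeholders for illustration, not recommendations.
cat > nf-params.json <<'EOF'
{
  "input": "samplesheet.csv",
  "outdir": "results"
}
EOF

# Print the file for review. With Nextflow loaded, it could be passed as:
#   nextflow run nf-core/rnaseq -params-file nf-params.json
cat nf-params.json
```

Keeping parameters in a file rather than on the command line makes it easier to rerun the same workflow with one value changed.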
Launching Comparison
Tip
Due to Nextflow's intermediate file size requirements, we offer /srv/scratch/genomicwf for all BABS users, with the limitation that files are deleted irreversibly after 3 days of not being read. Within this time-frame, you can -resume quickly with modified parameters. If lab scratch is preferred, we encourage the regular use of nextflow clean.
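To see which files are approaching the deletion window, something like the following sketch may help. It assumes the clean-up is driven by each file's last access time (atime), which matches "not being read" above but is not confirmed here; stale_files is a hypothetical helper name, and filesystems that update atime lazily may report older times than expected.

```shell
# List files under a directory that have not been read for at least N days.
# Assumption: the scratch clean-up is based on last access time (atime).
stale_files() {
  local dir="$1" days="${2:-2}"
  # find's "-atime +K" matches files last accessed more than K full days ago,
  # so K = days - 1 selects files unread for at least `days` days.
  find "$dir" -type f -atime "+$((days - 1))"
}

# Example (path taken from the tip above):
#   stale_files "/srv/scratch/genomicwf/$USER" 2
```

Listing files with find does not read their contents, so running this check should not itself reset the clock on them.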
New to Katana? You should review the Katana Guide before using any of the following methods.
Platform
With a few clicks, you can run highly maintained, peer-reviewed nf-core community workflows on Katana OnDemand using a graphical user interface, directly on reads from Ramaciotti.
Access the utility here. You must be at UNSW, or be logged in via VPN.
- (Optional) To uninstall, delete ~/ondemand/data/workflows_beta_4, and any datasets/runs you created
The following instructions can be applied for community workflows or your own:
-
Connect via ssh
ssh <zid>@kdm.restech.unsw.edu.au # (1)!
- It's best to use the Katana Data Mover node especially for step 3.
-
Create and enter a new project folder
mkdir -p /srv/scratch/genomicwf/$USER/myproject && cd $_ # (1)!
git clone https://github.com/WalshKieran/katana-rnaseq-start.git . # (2)!
- Files stored in the "genomicwf" scratch are deleted if unused for 3 days. Nextflow working directories can exceed 1 TB, yet within this timeframe you can still try different parameters without recomputing everything.
- This copies a PBS batch template into your project folder. It will fail if there are any files already present.
-
(Optional) Download your data from Ramaciotti and create samplesheet:
mkdir -p ./mydata1234 # tar -C requires the target directory to exist
wget -qO- https://mydata.ramaciotti...MYDATA1234.tar | tar xvz -C ./mydata1234 # (1)!
wget https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
python3 fastq_dir_to_samplesheet.py --recursive ./mydata1234 ./samplesheet.csv
- Follow the instructions from Ramaciotti - if access is via BaseSpace or a non-tar share link, you can use the "Katana OnDemand" utility to download.
-
Launch and monitor
qsub run.pbs
qstat -u $USER
tail .nextflow.log
-
(Optional) Stop your job
qsig <ID returned from qsub> # (1)!
- Why not qdel? When exiting, Nextflow waits for all child jobs to exit - this can take more time than qdel allows.
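The launch and stop steps above both depend on the job ID that qsub prints, so it can be convenient to record it at submission time. The following is a minimal sketch under that assumption; submit_job, stop_job and the .last_job_id file are hypothetical names introduced for this example and are not part of the PBS template.

```shell
# File used to remember the most recent job ID (hypothetical convention).
JOB_ID_FILE="${JOB_ID_FILE:-.last_job_id}"

# Submit a PBS script and record the job ID that qsub prints on stdout.
submit_job() {
  local script="${1:-run.pbs}" job_id
  job_id=$(qsub "$script") || return 1
  printf '%s\n' "$job_id" | tee "$JOB_ID_FILE"
}

# Signal the recorded job with qsig rather than qdel, so that Nextflow
# has time to wait for its child jobs to exit.
stop_job() {
  local job_id
  job_id=$(cat "$JOB_ID_FILE") || return 1
  qsig "$job_id"
}
```

With these defined, `submit_job run.pbs` replaces the plain qsub call, and `stop_job` can be run later from the same project folder without looking up the ID.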
Nextflow Tower is a more advanced interface for launching and monitoring Nextflow workflows, but you will need to move data on and off Katana yourself using the Katana Data Mover.
Bug
Tower is still slightly incompatible with Katana as of June 20, 2023.
One-off setup:
- Create a Tower account at https://tower.nf.
- Navigate to https://tower.nf/tokens, create and copy an access token.
- Paste your token into the "Katana OnDemand" workflow utility sidebar to automatically add Katana credentials/compute and even nf-core community workflows to your Tower personal workspace.
We currently do not support group workspaces, as sharing login credentials is against the Katana usage policy.
Resource Optimization Comparison (Advanced)
The default allocations for generalized Nextflow workflows are extremely generous for most datasets - this may negatively impact your queue priority and run duration. If your input files are reasonably similar, you should consider configuring each process based on measurements.
Optimization
See video in Launching a Workflow - the graphical interface interactively encourages the process described in "Command-Line".
Below is an illustration of how to run nf-core/rnaseq when you have no previous similar runs (e.g. runs with similar or greater read depth) to measure from. This is not a substitute for reading the nf-optimizer documentation/drawbacks carefully.
-
Limit the samples in your samplesheet, or by other means
head -n 5 samplesheet.csv > samplesheet_4.csv
-
Run Nextflow on limited samples
export NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE=false
nextflow run ... --input samplesheet_4.csv
-
Generate resources.config (limited to ~120GB, 12 hours)
nf-optimizer -m 500 120000 -t 300 43200 -o resources.config .
-
Run Nextflow on all samples
export NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE=false
nextflow run ... --input samplesheet.csv -c resources.config -resume
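The first step above keeps the header plus four rows, which equals four samples only when each sample occupies a single row. As a hedged alternative, the sketch below keeps every row belonging to the first N distinct sample IDs. It assumes the sample ID is the first comma-separated column and contains no quoted commas; subsample_samplesheet is a hypothetical helper name.

```shell
# Keep the header and all rows for the first N distinct sample IDs.
# Assumption: column 1 is the sample ID and has no embedded commas.
subsample_samplesheet() {
  local in="$1" out="$2" n="${3:-4}"
  awk -F, -v n="$n" '
    NR == 1 { print; next }                                  # always keep the header
    !($1 in seen) && count < n { seen[$1] = 1; count++ }     # admit a new sample while under the limit
    $1 in seen { print }                                     # print rows of admitted samples only
  ' "$in" > "$out"
}

# Example:
#   subsample_samplesheet samplesheet.csv samplesheet_4.csv 4
```

Rows for an admitted sample are kept even when they are not adjacent in the file, so samples split across several sequencing runs stay complete in the reduced samplesheet.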