Example bioinformatics workflow

  • Bioinformatics pipelines are often built by chaining together many command-line tools.
    These tools may have different installation methods and incompatible dependencies.
    Bioinformatics workflow managers address these problems by allowing each step to have its own environment definition or container.
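To make the idea of a per-step environment concrete, here is a minimal Python sketch. It is not taken from any particular workflow manager. Each step carries its own container image and is turned into a `docker run` command. The `Step` class, `docker_command` and the image names are hypothetical placeholders. Only the standard `docker run` options `--rm`, `-v` and `-w` are used.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One pipeline step with its own environment (hypothetical model)."""
    name: str
    image: str          # container image that provides this step's tools
    command: list[str]  # command executed inside the container


def docker_command(step: Step, workdir: str) -> list[str]:
    """Build a `docker run` invocation that mounts workdir at /data."""
    return [
        "docker", "run", "--rm",      # remove the container when the step exits
        "-v", f"{workdir}:/data",     # share input/output files with the host
        "-w", "/data",                # run the command inside the mounted directory
        step.image,
        *step.command,
    ]


# Hypothetical image names: each step gets its own image, so tools with
# conflicting dependencies never have to share one environment.
steps = [
    Step("qc", "example/fastqc:latest",
         ["fastqc", "reads_1.fq.gz", "reads_2.fq.gz"]),
    Step("index", "example/salmon:latest",
         ["salmon", "index", "-t", "transcriptome.fa", "-i", "salmon_index"]),
]

for step in steps:
    print(step.name, ":", " ".join(docker_command(step, "/tmp/run")))
```

The sketch only builds the command lines. A real workflow manager would also track dependencies between steps, stage files and handle failures.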
An example of a bioinformatics workflow, illustrating the flow of data (red ovals) through various services (blue rectangles).
Examples of registries within bioinformatics include the EMBRACE Web Service Registry [20], BioCatalogue [1], and AppDB [9]. Other systems, such as BioMoby [29]

About this repository

The repository was created to illustrate features of workflow managers that are discussed in detail in the accompanying manuscript.

Basic proof-of-concept implementations

Each workflow manager folder in this repository has a README detailing how to run the proof-of-concept pipeline.
Implementations that were contributed and reviewed by developers of these workflow management systems are marked with ⭐ (see Acknowledgements).

Contact and Call for Contribution

This repository was created by Laura Wratten.
We very much encourage contributions from users of these workflow managers.
If you would like to add an implementation for any of these workflow managers, you can follow the template.
If you would like to suggest changes to any of the existing implementations, please raise an issue and submit a pull request.

Is gitflow a good model for bioinformatics software development?

Motivated by laboratory accreditation requirements for the development of bioinformatics workflows related to high-throughput sequencing in a clinical setting, the authors have developed a formalised software development process based on gitflow, a popular model in mainstream software engineering practice.

Online Documentation for Workflow Managers

Workflow managers have many more features than those used in these implementations, and many additional workflow managers exist.
You can read more about each workflow manager in its official documentation.

Overview

Workflow managers provide an easy and intuitive way to simplify pipeline development.
Here we provide basic proof-of-concept implementations for selected workflow managers.
The analysis workflow is based on a small portion of an RNA-seq pipeline, using fastqc for quality control and salmon for transcript quantification.
These implementations are d.
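The steps of this small workflow could be driven from plain Python roughly as follows. This is a sketch, not one of the repository's implementations. It assumes fastqc and salmon are on $PATH and uses the test-data file names. The output directory names (`qc`, `salmon_index`, `quant`) and the helpers `workflow_commands` and `run` are hypothetical. The flags used are standard options of these tools: `-o` for fastqc, and `-t`, `-i`, `-l A`, `-1`, `-2` and `-o` for salmon.

```python
import os
import subprocess

READS = ("reads_1.fq.gz", "reads_2.fq.gz")
TRANSCRIPTOME = "transcriptome.fa"


def workflow_commands(qc_dir="qc", index_dir="salmon_index", quant_dir="quant"):
    """Return the workflow's commands in an order that respects dependencies."""
    return [
        # 1. Quality control of both read files (fastqc needs qc_dir to exist).
        ["fastqc", "-o", qc_dir, *READS],
        # 2. Build the salmon index from the transcriptome.
        ["salmon", "index", "-t", TRANSCRIPTOME, "-i", index_dir],
        # 3. Quantify transcripts; "-l A" lets salmon infer the library type.
        ["salmon", "quant", "-i", index_dir, "-l", "A",
         "-1", READS[0], "-2", READS[1], "-o", quant_dir],
    ]


def run(dry_run=True, qc_dir="qc"):
    """Print each command and, unless dry_run, execute it; stop on failure."""
    cmds = workflow_commands(qc_dir=qc_dir)
    if not dry_run:
        os.makedirs(qc_dir, exist_ok=True)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


if __name__ == "__main__":
    run(dry_run=True)
```

Written this way, the script has no caching, parallelism or per-step environments. Those are the features the workflow managers in this repository provide on top of the same three commands.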

Test Data

This repository contains a simulated test data set which can be used to run the example implementations.
The test data contains RNA-Seq reads (reads_1.fq.gz and reads_2.fq.gz), a transcriptome reference file (transcriptome.fa) and the true counts from the simulation experiments (truth.tsv).
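Because the simulated data come with true counts, the salmon estimates can be checked against them. The sketch below reads salmon's `quant.sf` output, whose header columns are `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`, and computes a Pearson correlation against the truth. ASSUMPTION: the layout of truth.tsv is not described here. The sketch assumes a tab-separated file with the transcript ID in the first column and the count in the second. The in-memory example values are made up and are not the repository's data.

```python
import csv
import io
import math


def read_salmon_counts(handle):
    """Map transcript name -> estimated number of reads from a quant.sf file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def read_truth(handle):
    """Map transcript ID -> true count.

    ASSUMPTION: tab-separated, ID in column 1, count in column 2;
    rows whose count is not numeric (e.g. a header) are skipped.
    """
    truth = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 2:
            continue
        try:
            truth[fields[0]] = float(fields[1])
        except ValueError:
            continue  # header or malformed line
    return truth


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare(estimated, truth):
    """Correlate estimates with truth over transcripts present in both."""
    shared = sorted(estimated.keys() & truth.keys())
    return pearson([estimated[t] for t in shared], [truth[t] for t in shared])


# Toy example with made-up values:
quant = io.StringIO("Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
                    "tx1\t1000\t800\t10.0\t100\n"
                    "tx2\t500\t300\t5.0\t20\n"
                    "tx3\t2000\t1800\t1.0\t50\n")
truth = io.StringIO("transcript\tcount\ntx1\t110\ntx2\t18\ntx3\t55\n")
print(round(compare(read_salmon_counts(quant), read_truth(truth)), 3))
```

With the real files, `open("quant/quant.sf")` and `open("truth.tsv")` would replace the `StringIO` objects. The quant directory name depends on the `-o` value given to salmon quant.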

The RNA-Seq workflow

The RNA-Seq analysis workflow performs quality control with fastqc and quantifies transcript expression using Salmon.
Here we use local installations (see the documentation for salmon and fastqc).
For local installations, you can add symbolic links to the executables in a directory that is on your $PATH.
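A minimal Python sketch of that setup, using only the standard library. `link_into_bin` and `on_path` are hypothetical helpers. The install paths in the usage comment are placeholders for wherever salmon and fastqc were unpacked. The chosen bin directory (for example ~/bin) must itself be listed in $PATH for the links to be found.

```python
import os
from pathlib import Path


def link_into_bin(executables, bin_dir):
    """Symlink each executable into bin_dir (hypothetical helper)."""
    bin_dir = Path(bin_dir).expanduser()
    bin_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for exe in executables:
        source = Path(exe).expanduser()
        link = bin_dir / source.name          # keep the tool's own name
        if link.is_symlink():
            link.unlink()                     # replace a stale link; safe to re-run
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(source.resolve())     # point at the absolute path
        links.append(link)
    return links


def on_path(directory):
    """Return True if directory is one of the entries of $PATH."""
    wanted = Path(directory).expanduser().resolve()
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(Path(p).expanduser().resolve() == wanted for p in entries if p)


# Example usage (placeholder paths; adjust to your installation):
# link_into_bin(["/path/to/salmon/bin/salmon", "/path/to/FastQC/fastqc"], "~/bin")
# if not on_path("~/bin"):
#     print('Add ~/bin to PATH, e.g. export PATH="$HOME/bin:$PATH"')
```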

What are some examples of bioinformatics workflow management systems?

In alphabetical order, some examples of bioinformatics workflow management systems include:

  • CLC bio, a bioinformatics analysis and workflow management platform from QIAGEN Digital Insights.
  • Clone Manager from Sci-Ed.

What is the biogitflow documentation?

The biogitflow documentation provides the technical details on how to configure the remote repository in GitLab to develop a new bioinformatics pipeline, and on how to use git and GitLab according to each role and its permissions throughout the development workflow, for both use cases.
Figure 1. biogitflow protocol for the nominal mode.

Why should bioinformaticians use scientific workflows?

This second generation of workflow systems has gained wider acceptance among bioinformaticians as they address many of the challenges faced by pipeline developers and users.
Thirdly, we focus on reuse, one of the major benefits that can be drawn from scientific workflows.

Workflow management system

Pegasus is an open-source workflow management system.
It provides the necessary abstractions for scientists to create scientific workflows and allows for transparent execution of these workflows on a range of computing platforms including high performance computing clusters, clouds, and national cyberinfrastructure.
In Pegasus, workflows are described abstractly as directed acyclic graphs (DAGs) using a provided API for Jupyter Notebooks, Python, R, or Java.
During execution, Pegasus translates the constructed abstract workflow into an executable workflow which is executed and managed by HTCondor.
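Pegasus's own API is not reproduced here. Instead, the plain-Python sketch below uses the standard-library `graphlib` module (Python 3.9+) to illustrate the underlying idea of an abstract workflow as a DAG. Jobs are nodes, and each job lists the jobs whose outputs it needs. A topological order then gives a valid execution sequence, and independent jobs become ready together and could run in parallel. The job names mirror the RNA-seq example. The `report` job and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Abstract workflow: each job maps to the set of jobs it depends on.
workflow = {
    "fastqc": set(),                       # QC needs only the raw reads
    "salmon_index": set(),                 # index needs only the transcriptome
    "salmon_quant": {"salmon_index"},      # quantification needs the index
    "report": {"fastqc", "salmon_quant"},  # hypothetical summary step
}


def execution_batches(dag):
    """Group jobs into batches whose members have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs whose inputs are complete
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished, unlocking dependants
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(workflow), start=1):
        print(number, batch)
```

This only orders the jobs. In Pegasus, the equivalent planning step additionally maps jobs to concrete resources before HTCondor runs them.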
