COVID-19 PubSeq: Public Sequence uploader
This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. There are two versions: one that runs on the command line and another that acts as a web interface. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers. For more information see the paper.

To get started, first install the uploader, then use the bh20-seq-uploader command to upload your data.
Installation
There are several ways to install the uploader. The most portable is with a virtualenv.
Installation with virtualenv
- Prepare your system. You need to make sure you have Python, and the ability to install modules such as pycurl and pyopenssl. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
- Create and enter your virtualenv. Go to some memorable directory and make and enter a virtualenv:
virtualenv --python python3 venv
. venv/bin/activate
Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.
- Install the tool. Once in your virtualenv, install this project:
Install from PyPI:
pip3 install bh20-seq-uploader
Install from git:
pip3 install git+https://github.com/arvados/bh20-seq-resource.git@master
- Test the tool. Try running:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
Make sure you are in your virtualenv whenever you run the tool! If you ever can't run the tool, and your prompt doesn't say (venv), try going to the directory where you put the virtualenv and running . venv/bin/activate. It only works for the current terminal window; you will need to run it again if you open a new terminal.
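If you are unsure whether the virtualenv is active, the check can also be done from Python itself. The following is a minimal sketch, not part of the uploader; the function name in_virtualenv is made up for illustration. It relies on the fact that an activated environment makes sys.prefix differ from sys.base_prefix (older virtualenv releases set sys.real_prefix instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and the
    # standard-library venv module) make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtualenv active: {sys.prefix}")
    else:
        print("no virtualenv active; run '. venv/bin/activate' first")
```

Save it under any name you like and run it with python3 from the terminal where you intend to use the uploader.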
Installation with pip3 --user
If you don't want to have to enter a virtualenv every time you use the uploader, you can use the --user feature of pip3 to install the tool for your user.
- Prepare your system. Just as for the virtualenv method, you need to install some dependencies. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
- Install the tool. You can run:
pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
- Make sure the tool is on your PATH. The pip3 command will install the uploader in .local/bin inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:
export PATH=$PATH:$HOME/.local/bin
To make this change permanent, assuming your shell is Bash, run:
echo 'export PATH=$PATH:$HOME/.local/bin' >>~/.bashrc
- Test the tool. Try running:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
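If the command is still not found, it can help to check the PATH setup directly. The sketch below is an illustration only; the helper name user_bin_on_path is hypothetical. It reports whether .local/bin in your home directory is among the PATH entries and where, if anywhere, the bh20-seq-uploader command resolves.

```python
import os
import shutil
from pathlib import Path


def user_bin_on_path(path_value: str, home: str) -> bool:
    """Check whether <home>/.local/bin appears among the PATH entries."""
    target = os.path.normpath(os.path.join(home, ".local", "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    home = str(Path.home())
    print("~/.local/bin on PATH:", user_bin_on_path(os.environ.get("PATH", ""), home))
    # shutil.which returns the full path of the command, or None if not found.
    print("bh20-seq-uploader found at:", shutil.which("bh20-seq-uploader"))
```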
Installation from Source for Development
If you plan to contribute to the project, you may want to install an editable copy from source. With this method, changes to the source code are automatically reflected in the installed copy of the tool.
- Prepare your system. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
- Clone and enter the repository. You can run:
git clone https://github.com/arvados/bh20-seq-resource.git
cd bh20-seq-resource
- Create and enter a virtualenv. From inside the cloned repository, make and enter a virtualenv:
virtualenv --python python3 venv
. venv/bin/activate
Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.
- Install the checked-out repository in editable mode. Once in your virtualenv, install with this special pip command:
pip3 install -e .
- Test the tool. Try running:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
Installation with GNU Guix
For running/developing the uploader with GNU Guix see INSTALL.md
Usage
Run the uploader with a FASTA or FASTQ file and accompanying metadata file in JSON or YAML:
bh20-seq-uploader example/metadata.yaml example/sequence.fasta
If the sample_id of your upload matches a sample already in PubSeq, it will be considered a new version and supersede the existing entry.
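When uploading from a script, the same command can be wrapped in Python. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the format check is a local convenience that only looks at the first character of an uncompressed file ('>' for FASTA, '@' for FASTQ), and it does not replace whatever validation the uploader itself performs.

```python
import subprocess


def sniff_sequence_format(first_line: str) -> str:
    """Guess the format from the first line: '>' starts FASTA, '@' starts FASTQ."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


def build_upload_command(metadata: str, sequence: str) -> list:
    """Build the argument list in the same order as the command shown above."""
    return ["bh20-seq-uploader", metadata, sequence]


def upload(metadata: str, sequence: str) -> None:
    """Run a quick local format check, then call the uploader."""
    with open(sequence) as handle:
        kind = sniff_sequence_format(handle.readline())
    if kind == "unknown":
        raise ValueError(f"{sequence} does not look like FASTA or FASTQ")
    # check=True raises CalledProcessError if the uploader exits non-zero.
    subprocess.run(build_upload_command(metadata, sequence), check=True)
```

Calling upload("example/metadata.yaml", "example/sequence.fasta") would then run the same command as the example above.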
Workflow for Generating a Pangenome
All these uploaded sequences are being fed into a workflow to generate a pangenome for the virus. You can replicate this workflow yourself.
For example, download SARS-CoV-2 sequences from GenBank into seqs.fa, then run this series of commands:
minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
odgi build -g seqs.gfa -s -o seqs.odgi
odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
In this project, such a pipeline is converted into the Common Workflow Language (CWL); the sources can be found here.
For more information on building pangenome models, see this wiki page.
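To try the four commands above outside CWL, they can be chained from a small Python driver. This is a sketch only, not the project's workflow: the function names and the prefix parameter are made up for illustration, the flags are copied verbatim from the commands above, and minimap2, seqwish and odgi are assumed to be installed and on your PATH.

```python
import subprocess


def pangenome_steps(fasta: str = "seqs.fa", prefix: str = "seqs"):
    """Return (argv, stdout_path) pairs for the four commands listed above."""
    paf, gfa = f"{prefix}.paf", f"{prefix}.gfa"
    odgi, png = f"{prefix}.odgi", f"{prefix}.png"
    return [
        # The shell version redirects minimap2's stdout to the PAF file.
        (["minimap2", "-cx", "asm20", "-X", fasta, fasta], paf),
        (["seqwish", "-s", fasta, "-p", paf, "-g", gfa], None),
        (["odgi", "build", "-g", gfa, "-s", "-o", odgi], None),
        (["odgi", "viz", "-i", odgi, "-o", png,
          "-x", "4000", "-y", "500", "-R", "-P", "5"], None),
    ]


def run_pangenome(fasta: str = "seqs.fa", prefix: str = "seqs") -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in pangenome_steps(fasta, prefix):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Keeping the command list separate from the runner makes it easy to print the steps for inspection before executing anything.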
Web Interface
This project comes with a simple web server that lets you use the sequence uploader from a browser. It will work as long as you install the package with the web extra.
To run it locally:
virtualenv --python python3 venv
. venv/bin/activate
pip install -e ".[web]"
env FLASK_APP=bh20simplewebuploader/main.py flask run
Then visit http://127.0.0.1:5000/.
Production
For production deployment, you can use gunicorn:
pip3 install gunicorn
gunicorn bh20simplewebuploader.main:app
This runs on http://127.0.0.1:8000/ by default, but can be adjusted with various gunicorn options.
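One way to keep those options in one place is a gunicorn configuration file, which is itself a Python module. The values below are illustrative assumptions, not settings recommended by this project; bind, workers and timeout are standard gunicorn settings.

```python
# gunicorn.conf.py: hypothetical example values, not project defaults.
import multiprocessing

# Address and port to listen on (gunicorn's own default is 127.0.0.1:8000).
bind = "127.0.0.1:8000"

# Number of worker processes; scaling with CPU count is a common starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted.
# Raised above gunicorn's default here on the assumption that uploads are slow.
timeout = 120
```

With this file saved as gunicorn.conf.py, start the server with gunicorn -c gunicorn.conf.py bh20simplewebuploader.main:app.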