GenBank
This directory contains the tools to pull and transform GenBank data.
Workflows
Prepare new GenBank data for upload
The following workflow prepares GenBank data and sends it into PubSeq:
# --- get list of IDs already in PubSeq
../../tools/pubseq-fetch-ids > pubseq_ids.txt
# --- get list of missing genbank IDs
python3 genbank-fetch-ids.py --skip pubseq_ids.txt > genbank_ids.txt
# --- fetch XML
python3 update-from-genbank.py --ids genbank_ids.txt --out ~/tmp/genbank
# --- Transform to YAML/JSON and FASTA
python3 transform-genbank-xml2yamlfa.py --out ~/tmp/pubseq file(s)
# --- Normalize data (validation mode)
python3 ../../workflows/tools/normalize-yamlfa.py -s ~/tmp/yamlfa/state.json --species ncbi_host_species.csv --specimen specimen.csv --validate
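The "missing IDs" step above amounts to a set difference: keep the GenBank IDs that are not yet listed in pubseq_ids.txt. The sketch below illustrates that idea only. It is not the code in genbank-fetch-ids.py; the function name `missing_ids` and the sample accessions are hypothetical, and it assumes one ID per line.

```python
from typing import Iterable, List


def missing_ids(candidates: Iterable[str], already_present: Iterable[str]) -> List[str]:
    """Return candidate IDs that are not in already_present, in first-seen order."""
    # Normalize the skip list: strip whitespace/newlines and ignore blank lines
    seen = {line.strip() for line in already_present if line.strip()}
    result = []
    for raw in candidates:
        acc = raw.strip()
        if acc and acc not in seen:
            result.append(acc)
            seen.add(acc)  # also drops duplicates within the candidate list
    return result


# Hypothetical accessions, for illustration only
pubseq = ["MT012098.1\n", "MT050493.1\n"]
genbank = ["MT012098.1", "MT066156.1", "MT066156.1", "MT050493.1", "MT072688.1"]
print(missing_ids(genbank, pubseq))
```

Because the function accepts any iterable of strings, open file handles for pubseq_ids.txt and a candidate list could be passed in directly.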
Validate GenBank data
To pull the data from PubSeq, use the list of PubSeq IDs generated above (pubseq_ids.txt).
TODO
- [X] Add ID for GenBank accession, i.e. how can we tell a record is from GenBank?
History of workflows/pull-data/genbank @master
git clone https://klaus.systemreboot.net/bh20-seq-resource/
- Removing refs. No point in guessing terms Pjotr Prins 5 years ago
- mapping sample_species using regex Pjotr Prins 5 years ago
- Started on normalization Pjotr Prins 5 years ago
- genbank: specimen source Pjotr Prins 5 years ago
- genbank: more or less complete. Need to add collection method Pjotr Prins 5 years ago
- genbank: deal with host, sex and age Pjotr Prins 5 years ago
- genbank: technology parsing Pjotr Prins 5 years ago
- genbank: submitter info Pjotr Prins 5 years ago
- genbank: get authors Pjotr Prins 5 years ago
- Move reference code to different file so it does not break python Pjotr Prins 5 years ago
- GenBank date parsing Pjotr Prins 5 years ago
- transform-genbank-xml2yamlfa.py refactoring Pjotr Prins 5 years ago
- transform-genbank-xml2yamlfa.py rewrite Pjotr Prins 5 years ago
- genbank: minor fixes Pjotr Prins 5 years ago
- gzip output Pjotr Prins 5 years ago
- update-from-genbank.py Pjotr Prins 5 years ago
- genbank-fetch-ids.py Pjotr Prins 5 years ago
- genbank-fetch-ids Pjotr Prins 5 years ago
- genbank: cleaning up Pjotr Prins 5 years ago
- genbank-fetch-ids simple call Pjotr Prins 5 years ago
- sparql: make use of pattern matching Pjotr Prins 5 years ago
- Add comment Pjotr Prins 5 years ago
- Improve SPARQL query and comments Pjotr Prins 5 years ago
- genbank: sparql-fetch-ids Pjotr Prins 5 years ago
- sparql: rename file Pjotr Prins 5 years ago
- genbank: started on SPARQL fetcher Pjotr Prins 5 years ago
- genbank: pseudo workflow Pjotr Prins 5 years ago
- genbank: header Pjotr Prins 5 years ago
- genbank: split script Pjotr Prins 5 years ago
- genbank: moving script into workflow space Pjotr Prins 5 years ago