Tree @master
- create_sra_metadata
- db_enrichment
- dict_ontology_standardization
- docker
- esr_samples
- submit_ebi
- update_virtuoso
- uthsc_samples
- cleanup.py
- delete_entries_on_arvados.py
- fetch_from_genbank.cwl
- foreach.sh
- import.cwl
- import_from_genbank.cwl
- import_to_arvados.py
- README.md
- split_into_arrays.cwl
- upload.cwl
- utils.py
Instructions for downloading and/or preparing the data and/or the metadata
Go into the download_genbank_data or download_sra_data directory and run the Python 3 script inside:
- download_genbank_data/from_genbank_to_fasta_and_yaml.py downloads the data and the metadata, preparing the FASTA and YAML files;
- download_sra_data/download_sra_data.py creates the metadata as YAML files from the SraExperimentPackage.XXX.xml.gz file in the same directory.
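As a rough illustration of the SRA step, here is a minimal sketch of turning a gzipped SraExperimentPackage XML file into flat YAML metadata. It is not the code of download_sra_data.py. The function names, the `sample_id` key, the placeholder term URI and the accession in the demo are all hypothetical, and the element names (`EXPERIMENT_PACKAGE`, `EXPERIMENT`, `INSTRUMENT_MODEL`) are assumed from the usual SRA XML layout. Only `sample_sequencing_technology` is a field name mentioned in this repository's history.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical term dictionary. The real mappings live in the repository's
# dictionary files and are not reproduced here; the URI is a placeholder.
SEQ_TECH_TERMS = {"Illumina MiSeq": "http://example.org/illumina-miseq"}


def iter_sample_metadata(xml_gz_file):
    """Yield one flat metadata dict per EXPERIMENT_PACKAGE element.

    xml_gz_file may be a path or a binary file object holding gzipped XML.
    """
    with gzip.open(xml_gz_file, "rb") as handle:
        root = ET.parse(handle).getroot()
    for package in root.iter("EXPERIMENT_PACKAGE"):
        experiment = package.find("EXPERIMENT")
        model = package.findtext(".//INSTRUMENT_MODEL")
        if experiment is None or model is None:
            # Assumption: skip packages that lack the fields we need.
            continue
        yield {
            "sample_id": experiment.get("accession"),
            # Fall back to the raw instrument name when no term is known.
            "sample_sequencing_technology": SEQ_TECH_TERMS.get(model, model),
        }


def to_yaml(record):
    """Serialize a flat dict of strings as simple YAML (no nesting, no escaping)."""
    return "".join(f'{key}: "{value}"\n' for key, value in record.items())


if __name__ == "__main__":
    # Tiny in-memory stand-in for SraExperimentPackage.XXX.xml.gz;
    # the accession is made up for the demo.
    sample = (
        b'<EXPERIMENT_PACKAGE_SET><EXPERIMENT_PACKAGE>'
        b'<EXPERIMENT accession="SRX0000001"><PLATFORM><ILLUMINA>'
        b'<INSTRUMENT_MODEL>Illumina MiSeq</INSTRUMENT_MODEL>'
        b'</ILLUMINA></PLATFORM></EXPERIMENT>'
        b'</EXPERIMENT_PACKAGE></EXPERIMENT_PACKAGE_SET>'
    )
    for record in iter_sample_metadata(io.BytesIO(gzip.compress(sample))):
        print(to_yaml(record), end="")
```

The hand-written serializer keeps the sketch free of third-party dependencies. A real script would more likely use a YAML library and validate the result against the project's metadata schema.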
History of scripts @master
git clone https://klaus.systemreboot.net/bh20-seq-resource/
- fixed bug that lead to invalid sample_sequencing_technology values Andrea Guarracino (commit: GitHub) 5 years ago
- Merge pull request #90 from AndreaGuarracino/patch-21 LLTommy (commit: GitHub) 5 years ago
- fix missing authors #91 AndreaGuarracino 5 years ago
- minimap2 returns nothing when there is no alignment. Peter Amstutz 5 years ago
- if the technology is not found, the YAML file is not created; managed longer species strings AndreaGuarracino 5 years ago
- renamed sra script; added seq technology in its additional information field if the term … AndreaGuarracino 5 years ago
- fix ncbi_countries dictionary AndreaGuarracino 5 years ago
- new terms in the ncbi_countries dictionary Andrea Guarracino (commit: AndreaGuarracino) 5 years ago
- added seq technology in its additional information field if the term is missing in the dicts AndreaGuarracino 5 years ago
- updated SraExperimentPackage info AndreaGuarracino 5 years ago
- two more terms in the ncbi_sequencing_technology dictionary Andrea Guarracino (commit: GitHub) 5 years ago
- fixed bugs in the download_sra_data Andrea Guarracino (commit: GitHub) 5 years ago
- new terms in the sequencing_technology dictionary Andrea Guarracino (commit: GitHub) 5 years ago
- Add upload.cwl Peter Amstutz 5 years ago
- Improving genbank import workflow Peter Amstutz 5 years ago
- very little readme for the scripts Andrea Guarracino (commit: GitHub) 5 years ago
- added new script to prepare metadata of sra data AndreaGuarracino 5 years ago
- moved the genbank script in his specific directory AndreaGuarracino 5 years ago
- added new dictionary entries AndreaGuarracino 5 years ago
- little fix for specimen_source Andrea Guarracino (commit: GitHub) 5 years ago
- new entries for the EBI samples AndreaGuarracino 5 years ago
- corrected the wrong entities Andrea Guarracino (commit: GitHub) 5 years ago
- Handle upload & assembly of gzipped, paired-end fastq Peter Amstutz 5 years ago
- virtuoso: remove graph before update Pjotr Prins 5 years ago
- species dictionary Andrea Guarracino (commit: GitHub) 5 years ago
- species are managed in another dictionary, try-catch added to avoid unexpected stops Andrea Guarracino (commit: GitHub) 5 years ago
- the script is more verbose; added other countries AndreaGuarracino 5 years ago
- fixed collection_location when it is not present in the dictionary terms AndreaGuarracino 5 years ago
- fixed collection-date management using a parser AndreaGuarracino 5 years ago
- fixed collection-date management; updated assembly info management for new IDs AndreaGuarracino 5 years ago