Tree @master
- create_sra_metadata
- db_enrichment
- dict_ontology_standardization
- docker
- esr_samples
- submit_ebi
- update_virtuoso
- uthsc_samples
- cleanup.py
- delete_entries_on_arvados.py
- fetch_from_genbank.cwl
- foreach.sh
- import.cwl
- import_from_genbank.cwl
- import_to_arvados.py
- README.md
- split_into_arrays.cwl
- upload.cwl
- utils.py
Instructions for downloading and/or preparing the data and/or the metadata
Go into the download_genbank_data or download_sra_data directory and run the python3 script inside.
- download_genbank_data/from_genbank_to_fasta_and_yaml.py downloads the data and the metadata, preparing the FASTA and the YAML files;
- download_sra_data/download_sra_data.py creates the metadata in the form of YAML files from the SraExperimentPackage.XXX.xml.gz file in the same directory.
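To make the SRA step more concrete, here is a minimal sketch of how a gzipped SraExperimentPackage XML file could be turned into one flat YAML document per sample. It is not the code of download_sra_data.py. The function names `iter_sra_samples` and `to_flat_yaml` are hypothetical, the element names (`EXPERIMENT_PACKAGE`, `RUN_SET/RUN`, `SAMPLE_ATTRIBUTE` with `TAG`/`VALUE`) are assumed from the public SRA XML layout, and using the run accession as `sample_id` is an assumption. The real script also maps values to ontology terms, which this sketch leaves out.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def iter_sra_samples(xml_gz):
    """Yield one flat metadata dict per EXPERIMENT_PACKAGE element.

    `xml_gz` is a path or a binary file object holding gzipped XML.
    The element names below are assumptions about the SRA XML layout.
    """
    with gzip.open(xml_gz, "rb") as handle:
        root = ET.parse(handle).getroot()

    for package in root.iter("EXPERIMENT_PACKAGE"):
        run = package.find("./RUN_SET/RUN")
        # Assumption: the run accession serves as the sample identifier.
        record = {"sample_id": run.get("accession") if run is not None else None}

        # SAMPLE_ATTRIBUTE elements carry free-form TAG/VALUE pairs.
        for attribute in package.iter("SAMPLE_ATTRIBUTE"):
            tag = attribute.findtext("TAG")
            value = attribute.findtext("VALUE")
            if tag and value:
                record[tag.strip()] = value.strip()
        yield record


def to_flat_yaml(record):
    """Serialize a flat dict of strings as YAML, one key per line.

    JSON string literals are valid YAML double-quoted scalars, so
    json.dumps is used for quoting instead of a YAML library.
    """
    lines = [
        f"{json.dumps(key)}: {json.dumps(value)}"
        for key, value in record.items()
        if value is not None
    ]
    return "\n".join(lines) + "\n"
```

Calling `to_flat_yaml(record)` for each record yielded by `iter_sra_samples("SraExperimentPackage.XXX.xml.gz")` gives the text of one YAML file per sample. Quoting every key and value avoids a PyYAML dependency at the cost of less readable output.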
History of scripts @master
git clone https://klaus.systemreboot.net/bh20-seq-resource/
- Fixing missing dot in .ttl file lltommy 5 years ago
- Adding script supporting semantic enrichment lltommy 5 years ago
- script for processing the metadata of the ESR samples; moved delete_entries_on_arvados script in scripts directory AndreaGuarracino 5 years ago
- added new New Zealand entries AndreaGuarracino 5 years ago
- fixed bugs in index management and type conversion AndreaGuarracino 5 years ago
- sra script re-enabled, ready for tests AndreaGuarracino 5 years ago
- added in the sra script an option to include only a subset of ids AndreaGuarracino 5 years ago
- sra script updated for managing more locations AndreaGuarracino 5 years ago
- synchronized the create_sra_metadata.py script with the latest updates AndreaGuarracino 5 years ago
- fixed few countries ontology terms; added a new species AndreaGuarracino 5 years ago
- added control (locally and in the validation) that sample_id has to be the same in the metadata and in the FASTA header #103 AndreaGuarracino 5 years ago
- updated dependency from clustalw to minimap2; the genbank script no longer creates YAML/FASTA pairs for too short sequences AndreaGuarracino 5 years ago
- added option in the genbank script to ignore (already validated) IDs; code cleaning; typos AndreaGuarracino 5 years ago
- the YAML/FASTA pair is not created for samples where at least one mandatory field is missing AndreaGuarracino 5 years ago
- fixed protocol for the dictionary entries that caused validation problems AndreaGuarracino 5 years ago
- genbank/sra scripts update to be more generic with the specimen sources AndreaGuarracino 5 years ago
- added new countries and specimen sources; fixed a few country entries AndreaGuarracino 5 years ago
- genbank/sra scripts updated to read the dictionaries in a more general way AndreaGuarracino 5 years ago
- lots of new dictionary terms AndreaGuarracino 5 years ago
- Comment out some broken links for now Peter Amstutz 5 years ago
- Preparing for EBI submission Pjotr Prins 5 years ago
- Started EBI submission Pjotr Prins 5 years ago
- Report similarity == 0 Peter Amstutz 5 years ago
- Cleanup script also clears errors for revalidate Peter Amstutz 5 years ago
- Catch exceptions Peter Amstutz 5 years ago
- added a suffix to distinguish which script created the error/warning files AndreaGuarracino 5 years ago
- metadata with missing host_species are not created AndreaGuarracino 5 years ago
- an output file is created with the accessions for which no YAML file is created AndreaGuarracino 5 years ago
- updated metadata source AndreaGuarracino 5 years ago
- other term for Homo sapiens (for SRA samples) Andrea Guarracino (commit: GitHub) 5 years ago