# C-c C-e h h   publish
# C-c !         insert date (use . for active agenda, C-u C-c ! for date+time, C-u C-c . for time)
# C-c C-t       task rotate
# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
# C-c C-c to run test blocks
#
# This page runs tests and the HTML export doubles as documentation on
# http://covid19.genenetwork.org/apidoc

#+TITLE: PubSeq REST API
#+AUTHOR: Pjotr Prins
#+HTML_LINK_HOME: http://covid19.genenetwork.org/apidoc
# OPTIONS: section-numbers: nil, with-drawers: t

#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />

* PubSeq REST API

Here we document the public REST API that comes with PubSeq. The tests
run in emacs [[https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-python.html][org-babel]]. See the bottom of this page for how to run
the tests inside emacs.

** Introduction

We built a REST API for COVID-19 PubSeq. The API source code can be
found in [[https://github.com/arvados/bh20-seq-resource/tree/master/bh20simplewebuploader/api.py][api.py]]. To see if the service is up try

#+begin_src sh
curl http://covid19.genenetwork.org/api/version
#+end_src

#+begin_src js
{
  "service": "PubSeq",
  "version": 0.1
}
#+end_src

The current API can search for an entry and fetch the metadata of a sample

#+begin_src js
curl http://covid19.genenetwork.org/api/search?s=MT533203.1
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence"
  }
]

curl http://covid19.genenetwork.org/api/sample/MT533203.1.json
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "date": "2020-04-27",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831"
  }
]
#+end_src


The Python3 version is

#+begin_src python :session :exports both
import requests
baseURL="http://localhost:5067" # for development
# baseURL="http://covid19.genenetwork.org"
response = requests.get(baseURL+"/api/version")
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"
response_body
#+end_src

#+RESULTS:
| service | : | PubSeq | version | : | 0.1 |
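
The endpoints used on this page all hang off one base URL. As a
minimal sketch, and not part of api.py, the helpers below build those
URLs with the standard library only. The names ~api_url~ and
~search_url~ are made up for illustration, and the sketch assumes the
search term is passed in the ~s~ query parameter as in the curl
example above.

#+begin_src python
from urllib.parse import urlencode

def api_url(base_url, path):
    """Join the base URL and an API path, tolerating extra slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

def search_url(base_url, term):
    """Build the search endpoint URL and URL-encode the search term."""
    return api_url(base_url, "/api/search") + "?" + urlencode({"s": term})

dev_base = "http://localhost:5067"  # development server, as above
print(api_url(dev_base, "/api/version"))
print(search_url(dev_base, "MT533203.1"))
print(api_url(dev_base, "/api/sample/MT533203.1.json"))
#+end_src

Encoding the term matters once a search string contains spaces or
other characters that are not safe in a URL.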

** Search for an entry

When you use the search box on PubSeq it queries the REST endpoint
for information on the search terms. For example

#+begin_src python :session :exports both
requests.get(baseURL+"/api/search?s=MT533203.1").json()
#+end_src

#+RESULTS:
| collection | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126 | fasta | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta | id | : | MT533203.1 | info | : | http://identifiers.org/insdc/MT533203.1#sequence |

where collection is the raw uploaded data. The hash value in ~c=~ is
computed on the contents of the Arvados keep [[https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-mount-gnu-linux.html][collection]] and effectively
acts as a deduplication uuid.
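
To illustrate how that hash can serve as a deduplication key, here is
a small sketch that pulls it out of a collection URL. It assumes the
URLs keep the ~c=<hash>+<number>~ shape seen in the results above; the
function name ~collection_key~ is hypothetical and not part of the
API.

#+begin_src python
from urllib.parse import urlparse

def collection_key(url):
    """Return the text between 'c=' and '+' in a collection URL, or None."""
    for segment in urlparse(url).path.split("/"):
        if segment.startswith("c="):
            return segment[2:].partition("+")[0]
    return None

collection = "http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126"
fasta = collection + "/sequence.fasta"
# Both URLs point into the same collection, so they share one key.
assert collection_key(collection) == collection_key(fasta)
print(collection_key(fasta))
#+end_src

Two uploads whose URLs yield the same key can then be treated as the
same data.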

** Fetch metadata

Using the collection link above you can fetch the metadata in JSON as it
was originally uploaded from the ShEx expression, e.g. using
https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/

It is better, however, to use the more advanced sample metadata fetcher
because it does a bit more in terms of expansion

#+begin_src python :session :exports both
requests.get(baseURL+"/api/sample/MT533203.1.json").json()
#+end_src

#+RESULTS:
| collection | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126 | date | : | 2020-04-27 | fasta | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta | id | : | MT533203.1 | info | : | http://identifiers.org/insdc/MT533203.1#sequence | mapper | : | minimap v. 2.17 | sequencer | : | http://www.ebi.ac.uk/efo/EFO_0008632 | specimen | : | http://purl.obolibrary.org/obo/NCIT_C155831 |
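
Since this page doubles as a test, a check on the shape of the
returned record may be useful. The following is a sketch only: the set
of expected keys is taken from the result above and is an assumption,
not a documented guarantee of the API, and ~missing_keys~ is a
hypothetical helper.

#+begin_src python
# Keys seen in the example result; assumed, not guaranteed by the API.
EXPECTED_KEYS = {"collection", "date", "fasta", "id", "info",
                 "mapper", "sequencer", "specimen"}

def missing_keys(record, expected=EXPECTED_KEYS):
    """Return the sorted list of expected keys that a sample record lacks."""
    return sorted(expected - set(record))

# Record copied from the result above.
sample = {
    "collection": "http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126",
    "date": "2020-04-27",
    "fasta": "http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831",
}
assert missing_keys(sample) == [], "sample record is incomplete"
print(missing_keys({"id": "MT533203.1"}))
#+end_src

Against a live server the same check could be applied to each element
of the list returned by the sample endpoint.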



** Fetch EBI XML

PubSeq provides an API that is used to export formats that are
suitable for uploading data to EBI/ENA from our [[http://covid19.genenetwork.org/export][EXPORT]] menu. This is
documented [[http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6][here]].

#+begin_src python :session :exports both
requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text
#+end_src

#+RESULTS:
#+begin_example
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>2697049</TAXON_ID>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
      <COMMON_NAME>SARS-CoV-2</COMMON_NAME>
    </SAMPLE_NAME>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>investigation type</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>sequencing method</TAG>
        <VALUE>http://purl.obolibrary.org/obo/OBI_0000759</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (latitude)</TAG>
        <VALUE></VALUE>
     <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (longitude)</TAG>
        <VALUE></VALUE>
     <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
     <TAG>geographic location (country and/or sea)</TAG>
     <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (region and locality)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>environment (material)</TAG>
        <VALUE>http://purl.obolibrary.org/obo/NCIT_C155831</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000011</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
#+end_example
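
To work with the exported XML in Python, the attributes can be read
into a dictionary. This is a minimal sketch using the standard library
~xml.etree.ElementTree~ module; it assumes the element names shown in
the output above, and ~sample_attributes~ is a hypothetical helper
that is not part of PubSeq. The embedded XML is a shortened copy of
that output.

#+begin_src python
import xml.etree.ElementTree as ET

def sample_attributes(xml_text):
    """Map each SAMPLE_ATTRIBUTE TAG to its VALUE ('' when it is empty)."""
    root = ET.fromstring(xml_text)
    attributes = {}
    for attribute in root.iter("SAMPLE_ATTRIBUTE"):
        tag = attribute.findtext("TAG")
        attributes[tag] = attribute.findtext("VALUE") or ""
    return attributes

# Shortened copy of the response shown above.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>investigation type</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000011</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
"""

attributes = sample_attributes(xml_text)
# List the tags that were exported without a value.
print([tag for tag, value in attributes.items() if value == ""])
#+end_src

Listing the empty values this way gives a quick view of which fields
still need to be filled in before a submission.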

* Configure emacs to run tests

Execute a code block with C-c C-c. You may need to set

#+begin_src elisp
  (org-babel-do-load-languages
   'org-babel-load-languages
   '((python . t)))
  (setq org-babel-python-command "python3")
  (setq org-babel-eval-verbose t)
  (setq org-confirm-babel-evaluate nil)
#+end_src

#+RESULTS:

The last setting in the block above skips the confirmation prompt
before each evaluation:

: (setq org-confirm-babel-evaluate nil)

To see the output of the interpreter, open the ~*Python*~ buffer.