Deployment Guide

We recommend deploying Spfy by simply using the Docker composition for everything; this approach is documented in Deploying in General. Specifics of the NML's deployment are given in Deploying to Corefacility.

Deploying in General

Let’s take a look at the docker-compose.yml file.

version: '2'
services:
  webserver:
    build:
      context: .
      dockerfile: Dockerfile-spfy
    image: backend
    ports:
    - "8000:80"
    depends_on:
    - redis
    - blazegraph
    volumes:
    - /datastore

  reactapp:
    build:
      context: .
      dockerfile: Dockerfile-reactapp
    image: reactapp
    ports:
    - "8090:5000"
    depends_on:
    - webserver

  worker:
    build:
      context: .
      dockerfile: Dockerfile-rq
    image: backend-rq
    ports:
    - "9181:9181" #this is for debugging, drop a shell and run rq-dashboard if you need to see jobs
    volumes_from:
    - webserver
    depends_on:
    - webserver

  worker-blazegraph-ids:
    build:
      context: .
      dockerfile: Dockerfile-rq-blazegraph
    image: backend-rq-blazegraph
    volumes_from:
    - webserver
    depends_on:
    - webserver

  worker-priority:
    build:
      context: .
      dockerfile: Dockerfile-rq-priority
    image: backend-rq-priority
    volumes_from:
    - webserver
    depends_on:
    - webserver

  redis:
    image: redis:3.2
    command: redis-server --appendonly yes # for persistence
    volumes:
    - /data

  blazegraph:
    image: superphy/blazegraph:2.1.4-inferencing
    ports:
    - "8080:8080"
    volumes:
    - /var/lib/jetty/
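
For local use, the standard Compose workflow is all you need (generic Docker commands, nothing Spfy-specific):

docker-compose up -d --build       # build (or rebuild) the images and start every service in the background
docker-compose logs -f webserver   # follow the backend logs to confirm startup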

Host to Container Mapping

There are a few key points to note:

ports:
- "8000:80"

The configuration maps host:container, so port 8000 on the host (your computer) is linked to port 80 of the container. Fields like volumes often have only a single value, such as /var/lib/jetty/; this instructs Docker to map the folder /var/lib/jetty inside the container to a generic volume managed by Docker, so the data persists across start/stop cycles.

You can also add a host path to a volume mapping, such as /dbbackup/:/var/lib/jetty/, so that Docker uses an actual path on your host instead of a generic Docker-managed volume. As before, the first term, /dbbackup/, resides on the host.
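
For example, applied to the blazegraph service (the /dbbackup/ host path is just an illustration; use any directory on your host):

  blazegraph:
    image: superphy/blazegraph:2.1.4-inferencing
    ports:
    - "8080:8080"
    volumes:
    - /dbbackup/:/var/lib/jetty/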

Warning

A caveat: if you do not specify a host folder in a volume mapping, running docker-compose down will still wipe the generic volume (the containers are removed, and a subsequent up attaches fresh anonymous volumes). Either run docker-compose stop instead, or specify a host mapping to persist the data.

Volume Mapping in Production

In production, we recommend at minimum that you map Blazegraph's volume to a backup directory. /datastore stores all the uploaded genome files and the related temporary files generated during analysis. /data stores both the parsed responses for the front-end and the task queue managing them. If you want analysis tasks, or existing results shown to the front-end, to persist after running docker-compose down, you will have to map both volumes; server failures or simply running docker-compose stop will preserve the data without requiring a host mapping.
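
As a sketch, the relevant service fragments would then look like the below; the /backups/... host paths are placeholders to adjust for your own system:

  webserver:
    volumes:
    - /backups/datastore/:/datastore

  redis:
    volumes:
    - /backups/redis/:/data

  blazegraph:
    volumes:
    - /backups/blazegraph/:/var/lib/jetty/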

Ports

reactapp is the front-end user interface for Spfy, whereas webserver serves the backend Flask APIs. Without modification, when you run docker-compose up, port 8090 is used to access the app; the front-end then calls port 8000 to submit requests to the backend. This is fine for individual users on their own computers, but it should not be used in production, as it would, at minimum, require opening an additional port.

Instead, we recommend you change the port for reactapp to the standard port 80, and also map the webserver to a subdomain.

Setting the host port mapping can be done by modifying the reactapp config with the below (the container side stays on 5000):

ports:
- "80:5000"

For networking the backend APIs, you can keep the webserver running on port 8000 and use a reverse proxy such as NGINX to map a subdomain to port 8000 on your server. In other words, we'll set it up so requests made by reactapp to the API are sent to, for example, api.mydomain.com, which maps to the IP address of your server (ideally via HTTPS). Your reverse proxy then redirects the request to port 8000 locally, while serving the reactapp interface on the main domain (mydomain.com, in this case).
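
A minimal sketch of such a reverse-proxy server block, assuming the subdomain api.mydomain.com (Certbot, described below, will add the HTTPS directives for you):

server {
    listen       80;
    server_name  api.mydomain.com;

    location / {
        # forward API requests to the Flask backend published on port 8000
        proxy_pass http://127.0.0.1:8000;
    }
}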

Setting a Subdomain

This has to be done through your domain registrar's interface. You'll have to add an Address Record (A Record), typically found under a heading like "Manage Advanced DNS Records".
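
In zone-file notation, the records would look something like the below; 203.0.113.10 is a placeholder for your server's public IP:

mydomain.com.        IN  A  203.0.113.10
api.mydomain.com.    IN  A  203.0.113.10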

Setting up a Reverse Proxy

We recommend you use NGINX as the reverse proxy. You can find their Getting Started guide at https://www.nginx.com/resources/wiki/start/

In addition, we recommend you use Certbot (part of the EFF's Let's Encrypt project) to get the required certificates and set up HTTPS on your server. You can find their interactive guide at https://certbot.eff.org/ which allows you to specify the webserver (NGINX) and operating system you are using. Certbot comes with a nice script to automatically modify your NGINX configuration as required.
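
With the NGINX plugin, obtaining and installing certificates for both the main domain and the API subdomain is a single command (domain names are placeholders):

sudo certbot --nginx -d mydomain.com -d api.mydomain.com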

Point Reactapp to Your Subdomain

To tell reactapp to point to your subdomain, you’ll have to modify the api.js settings located at reactapp/src/middleware/api.js.

The current ROOT of the target domain is:

const ROOT = window.location.protocol + '//' + window.location.hostname + ':8000/'

change this to:

const ROOT = 'https:' + '//' + 'api.mydomain.com' + '/'

and then rebuild and redeploy reactapp.

docker-compose build --no-cache reactapp
docker-compose up -d
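
You can then confirm the container restarted with the new build:

docker-compose ps   # reactapp should show State: Up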

Note

The Flask webserver has Cross-Origin Resource Sharing (CORS) enabled, so you can deploy reactapp to another server (one running only reactapp, and not the webserver, databases, or workers). The domain can be mydomain.com or any domain name you own; you'll just have to set up the A records as appropriate.

Deploying to Corefacility

Blazegraph

Looking at the filesystem:

[claing@superphy backend-4.3.3]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/superphy-root   45G   31G   14G  69% /
devtmpfs                    12G     0   12G   0% /dev
tmpfs                       12G  2.5G  9.3G  21% /dev/shm
tmpfs                       12G   26M   12G   1% /run
tmpfs                       12G     0   12G   0% /sys/fs/cgroup
/dev/vda1                  497M  240M  258M  49% /boot
/dev/mapper/docker-docker  200G   21G  180G  11% /docker
warehouse:/ifs/Warehouse   769T  601T  151T  81% /Warehouse
tmpfs                      2.4G     0  2.4G   0% /run/user/40151
tmpfs                      2.4G     0  2.4G   0% /run/user/40290

/Warehouse is used for long-term data storage and is shared across the NML. In order to write to /Warehouse, you need the permissions of either claing or superphy; there are problems with passing these permissions into Docker environments, so we run Blazegraph outside of Docker, as claing, from the folder /Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing, using:

java -server -Xmx4g -Dbigdata.propertyFile=/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing/RWStore.properties -jar blazegraph.jar

This command is run inside screen, which lets us detach it from our shell:

screen    # start a session, then run the java command inside it
# press CTRL+a, then d, to detach

and to resume:

screen -r
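
If more than one session is detached, list them first and resume by ID:

screen -ls              # lists detached sessions
screen -r <session-id>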

See https://github.com/superphy/backend/issues/159

Docker Service

[claing@superphy docker]$ sudo cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Thu Dec 24 17:40:08 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/superphy-root /                       xfs     defaults        1 1
UUID=6c62e5cf-fd55-41e8-8122-e5e78643e3cd /boot                   xfs     defaults        1 2
/dev/mapper/superphy-swap swap                    swap    defaults        0 0
warehouse:/ifs/Warehouse        /Warehouse      nfs     defaults        0 0
/dev/mapper/docker-docker /docker xfs defaults 1 2

The root filesystem for the Corefacility VM is quite small (45G), so we instead have a virtual drive at /dev/mapper/docker-docker, mounted on /docker, which holds our Docker images and unmapped volumes. This is set up using symlinks:

sudo systemctl stop docker                 # stop the daemon before touching its data directory
cd /var/lib/
sudo cp -rf docker/ /docker/backups/       # back up the existing data directory
sudo rm -rf docker/
sudo mkdir /docker/docker                  # new data directory on the large drive
sudo ln -s /docker/docker /var/lib/docker  # Docker still sees /var/lib/docker
sudo systemctl start docker
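
To confirm the relocation worked, check which filesystem backs the data directory (df follows the symlink):

df -h /var/lib/docker/   # should report /dev/mapper/docker-docker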

Docker Hub

Docker Hub is used to host pre-built images; for us, this mostly consists of our base docker-flask-conda image. The org. page is publicly available at https://hub.docker.com/u/superphy/ and you can pull without any permission issues. To push a new image, first register an account at https://hub.docker.com/

The owner account for the org. has the username superphyinfo and uses the same password as superphy.info@gmail.com. You can use it to add yourself to the org.
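
Before pushing, authenticate your local Docker client:

docker login   # prompts for your Docker Hub username and password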

You can then build and tag docker images to be pushed onto Docker Hub.

docker build -f Dockerfile-reactapp -t superphy/reactapp:4.3.3-corefacility .

or tag an existing image:

docker images
docker tag 245d7e4bb63e superphy/reactapp:4.3.3-corefacility

Either way, you can then push using the same command:

docker push superphy/reactapp:4.3.3-corefacility

Note

We occasionally use Docker Hub as a workaround when a computer can't build an image. There is a bug where Corefacility VMs can't connect to NPM, so we build the reactapp image on Cybera and pull it down on Corefacility.

Nginx

We run Nginx above the Docker layer for three reasons:

  1. To handle the /superphy prefix on all our routes, as we don't serve on /
  2. To host both the original SuperPhy and Spfy on a single VM
  3. To buffer large file uploads before sending them to Spfy's Flask API

In /etc/nginx/nginx.conf:

user spfy;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;
    error_log /var/log/nginx/error.log warn;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   2m;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    server {
        client_max_body_size 60g;
        listen       80 default_server;
        listen       443 ssl http2 default_server;
        listen       [::]:80 default_server;
        listen       [::]:443 ssl http2 default_server;
        server_name  superphy.corefacility.ca;
        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;


        location / {
            proxy_pass http://127.0.0.1:8081;
        }
        location /spfy/ {
            rewrite ^/spfy/(.*)$ /$1 break;
            proxy_pass http://localhost:8090;
            proxy_redirect http://localhost:8090/ $scheme://$host/spfy/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 20d;
        }
        location /grouch/ {
            rewrite ^/grouch/(.*)$ /$1 break;
            proxy_pass http://localhost:8091;
            proxy_redirect http://localhost:8091/ $scheme://$host/grouch/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 20d;
        }
        location /shiny/ {
            rewrite ^/shiny/(.*)$ /$1 break;
            proxy_pass http://127.0.0.1:3838;
            proxy_redirect http://127.0.0.1:3838/ $scheme://$host/shiny/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 950s;
        }

    }

    server {
        client_max_body_size 60g;
        listen       80;
        listen       443 ssl http2;
        listen       [::]:80;
        listen       [::]:443 ssl http2;
        server_name  lfz.corefacility.ca;
        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
            proxy_pass http://127.0.0.1:8081;
        }
        location = /spfy {
            return 301 /superphy/spfy/;
        }
        location = /grouch {
            return 301 /superphy/grouch/;
        }
        location = /minio {
            return 301 /superphy/minio/;
        }
        location /spfy/ {
            rewrite ^/spfy/(.*)$ /$1 break;
            proxy_pass http://localhost:8090;
            proxy_redirect http://localhost:8090/superphy/ $scheme://$host/spfy/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 20d;
        }
        location /grouch/ {
            rewrite ^/grouch/(.*)$ /$1 break;
            proxy_pass http://localhost:8091;
            proxy_redirect http://localhost:8091/superphy/ $scheme://$host/grouch/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 2h;
            proxy_send_timeout 2h;
        }
        location /shiny/ {
            rewrite ^/shiny/(.*)$ /$1 break;
            proxy_pass http://127.0.0.1:3838;
            proxy_redirect http://127.0.0.1:3838/ $scheme://$host/shiny/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 950s;
        }
    }


}

Currently, this is set up to run the new Reactapp version of Spfy at https://lfz.corefacility.ca/superphy/grouch/, and the old AngularJS version plus all the API endpoints at https://lfz.corefacility.ca/superphy/spfy/. This will probably change in the future: when backwards-incompatible changes are introduced to Spfy, we will run exclusively out of https://lfz.corefacility.ca/superphy/spfy/. The old SuperPhy is at https://lfz.corefacility.ca/superphy/

Note

There is an http://superphy.corefacility.ca/spfy/ address (but not an http://superphy.corefacility.ca/grouch/ address) that is only accessible from within the NML network (you'd have to VPN in if you're at the CFIA building), but we prefer to focus on the lfz.corefacility.ca/superphy/ routes, which are available on both external and internal networks.

Some other points to note:

  • The rewrite rules are critical to operating on Corefacility, as the /superphy/ prefix requirement can be tricky (illustrated in the sketch after this list)
  • We're unsure whether client_max_body_size 60g; has any effect when deployed on Corefacility; there may be another Nginx instance run by the NML to route its VMs. Currently we're capped at roughly 250 MB per upload on Corefacility; you can see a long debugging log of this at https://github.com/superphy/backend/issues/159
  • Nginx does not host the websites; it only proxies requests to Apache (for the old SuperPhy) or Docker (for the new Spfy)
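
As an illustration of the first point, here is our understanding of how a request flows. This assumes an upstream NML proxy strips the /superphy prefix before requests reach this Nginx, which matches the behaviour we observe but is not something we control; /jobs is a hypothetical path:

https://lfz.corefacility.ca/superphy/spfy/jobs   # what the browser requests
/spfy/jobs                                       # what reaches this Nginx once /superphy is stripped upstream
/jobs                                            # after rewrite ^/spfy/(.*)$ /$1 break;
http://localhost:8090/jobs                       # proxied into the Docker composition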

Warning

Nginx also runs inside the Docker webserver image, so the composition can be run by itself; generally, you shouldn't have to worry about it.