Run Qserv ingest on a Kubernetes cluster#

Prerequisites#

  • A running Qserv instance, managed by qserv-operator inside a k8s cluster:

$ kubectl get qserv
NAME    AGE
qserv   10d

$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
qserv-czar-0                      3/3     Running   6          10d
qserv-ingest-db-0                 1/1     Running   2          10d
qserv-repl-ctl-0                  1/1     Running   6          10d
qserv-repl-db-0                   1/1     Running   2          10d
qserv-worker-0                    5/5     Running   0          10d
qserv-worker-1                    5/5     Running   0          10d
...
qserv-worker-7                    5/5     Running   0          10d
qserv-worker-8                    5/5     Running   0          10d
qserv-worker-9                    5/5     Running   0          10d
qserv-xrootd-redirector-0         2/2     Running   0          10d
qserv-xrootd-redirector-1         2/2     Running   0          10d

The namespace containing this instance is referred to below as <QSERV_NAMESPACE>.

  • Set <QSERV_NAMESPACE> as the default namespace for all subsequent kubectl commands:

kubectl config set-context --current --namespace=<QSERV_NAMESPACE>

For additional information, see the official Kubernetes documentation on setting the namespace preference.

  • Privileges to create pods and persistent volumes inside <QSERV_NAMESPACE>.

  • An HTTP(s) server providing access to input data and metadata. All pods inside <QSERV_NAMESPACE> must be able to access this HTTP server.
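A quick way to confirm this prerequisite is to probe the server before starting the ingest. The helper below is a minimal sketch (the URL is a placeholder, and in practice the check must run from a pod inside <QSERV_NAMESPACE>, not from your workstation):

```shell
# check_url <url>: succeed only if the URL answers a HEAD request.
# --fail makes curl return a non-zero exit code on HTTP errors (4xx/5xx).
check_url() {
    curl --silent --fail --head "$1" > /dev/null
}

# Example, from a pod inside the namespace (placeholder URL):
# check_url http://dataserver/datasets/DC2/ || echo "data server unreachable" >&2
```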

  • An instance of Argo Workflows running in <QSERV_NAMESPACE>, and the argo client. The example script prereq-install.sh, located at the top level of the lsst-dm/qserv-ingest repository, installs a version supported by the ingest process. It also installs helm, a package manager for Kubernetes.

$ kubectl get pod -l app.kubernetes.io/part-of=argo-workflows
NAME                                                  READY   STATUS    RESTARTS   AGE
argo-workflows-server-9f9bd4cc4-wkm5z                 1/1     Running   0          1m
argo-workflows-workflow-controller-57779f5f96-2r9jf   1/1     Running   1          1m
  • git-lfs is required in order to retrieve integration test datasets.
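Since several command-line tools (kubectl, argo, git-lfs) are assumed throughout this page, a small hedged helper like the following can fail fast when one is missing (the function name is illustrative, not part of the repository):

```shell
# require_cmd <name>: return non-zero with a message when a required
# command-line tool is not found on the PATH.
require_cmd() {
    command -v "$1" > /dev/null || {
        echo "required command not found: $1" >&2
        return 1
    }
}

# e.g. require_cmd git-lfs  # before cloning the integration test datasets
```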

Prepare and configure Qserv ingest#

Get the project#

RELEASE="2022.1.1-rc1"
git clone --single-branch -b "$RELEASE" --depth 1 https://github.com/lsst-dm/qserv-ingest
cd qserv-ingest

Configuration#

A directory with the ingest workflow configuration and an env.sh file must be created:

cp -r manifests/in2p3-cosmo manifests/<CUSTOM_INGEST>
cp env.example.sh env.sh
  1. In env.sh, set OVERLAY to <CUSTOM_INGEST> and, optionally, INSTANCE to the name of the current Qserv instance.

  2. Edit the ingest workflow configuration file (manifests/<CUSTOM_INGEST>/configmap/ingest.yaml):

Inline documentation for this configuration file is available at manifests/base/configmap/ingest.yaml:

version: 15
ingest:
    http:
        # Optional, defaults to no time-out
        # Timeout for POST and PUT queries, in seconds
        # POST and PUT queries may be very long for the following operations:
        # - posting a contribution
        # - closing a transaction
        # - publishing a database
        # - building the "director" index
        # - building table indexes at the workers
        write_timeout: 1800
        # Optional, defaults to no time-out
        # Timeout for GET queries, in seconds
        read_timeout: 10
    metadata:
      # Optional, defaults to "ingest.input.servers[0]/ingest.input.path"
      # Allows customizing the metadata URL
      url: http://dataserver/datasets/DC2/

    input:
        # List of HTTP servers providing the input dataset
        # The ingest process will load-balance the download of input files across
        # these servers.
        # Use file:// as the first element of the list when using local data
        # TODO Add support for the webdav protocol
        servers:
            - http://dataserver
            - http://dataserver
        # Path to the input data on the HTTP servers
        path: datasets/DC2/

    ## URLs of Qserv services
    ## ----------------------
    qserv:
        # URL which serves Qserv SQL queries
        query_url: "mysql://qsmaster:@qserv-czar:4040"
        # URL which serves the input chunk contribution queue
        queue_url: "mysql://qsingest:@qserv-ingest-db-0.qserv-ingest-db/qservIngest"
        # Replication controller service URL
        replication_url: http://qserv-repl-ctl-0.qserv-repl-ctl:8080

    ## Configure the replication service
    ## Documented at https://confluence.lsstcorp.org/display/DM/1.+Setting+configuration+parameters
    ## --------------------------------------------------------------------------------------------
    ingestservice:
        # Optional, defaults to None
        # Overrides the default value stored in the input metadata (in the database.json file)
        # 1: build the secondary index when closing a transaction
        # 0: build the secondary index after the ingest
        auto_build_secondary_index: 1

        # Optional, defaults to "/etc/pki/tls/certs/ca-bundle.crt"
        # cainfo: "/etc/pki/tls/certs/ca-bundle.crt"

        # Optional, defaults to 1
        # ssl_verifypeer: 1

        # Optional, defaults to None
        # Overrides the default Qserv value
        async_proc_limit: 4

        # Optional, defaults to None
        # Overrides the default Qserv value
        low_speed_limit: 10

        # Optional, defaults to None
        # Overrides the default Qserv value
        low_speed_time: 3600
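Before submitting the workflow, a crude sanity check on the edited file can catch obvious mistakes early. This is only a sketch (the helper name is illustrative; it checks key presence with grep, not full YAML validity):

```shell
# validate_ingest_config <file>: succeed only if the file contains the
# two mandatory top-level keys of the ingest configuration.
validate_ingest_config() {
    grep -q '^version:' "$1" && grep -q '^ingest:' "$1"
}

# e.g. validate_ingest_config manifests/<CUSTOM_INGEST>/configmap/ingest.yaml
```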

Launch Qserv ingest#

Launch the workflow using Argo:

./argo-submit.sh
# monitor the workflow execution
argo get @latest

Then adapt example/query.sh to launch a few queries against the freshly ingested data.
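Such a query goes through the mysql client to the czar, using the host, port and user from the query_url configured in ingest.yaml. The helper below only builds the command line so it can be inspected first; the helper name and database name are placeholders, not part of the repository:

```shell
# qserv_query_cmd <sql>: print the mysql command line that sends a SQL
# statement to the Qserv czar (host/port/user taken from query_url).
qserv_query_cmd() {
    printf 'mysql --host=qserv-czar --port=4040 --user=qsmaster -e "%s"\n' "$1"
}

# Inspect, then eval the command from a pod that can reach the czar:
# eval "$(qserv_query_cmd 'SELECT COUNT(*) FROM <DATABASE>.Object')"
```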

Delete an existing database#

Please refer to the Qserv Replication Service documentation, then adapt the example script example/delete_database.sh.

Run interactively a workflow step#

./argo-submit.sh -s
# Retrieve the pod name for the 'interactive' step
argo get @latest
# Open a shell inside it
kubectl exec -it qserv-ingest-2hrcf-595146013 -c main -- bash
# All binaries for launching benchmark steps are located here:
ls /ingest/bin/