Run Qserv ingest on a Kubernetes cluster
Prerequisites
An up-and-running Qserv instance, managed by qserv-operator, inside a Kubernetes cluster:
$ kubectl get qserv
NAME    AGE
qserv   10d
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
qserv-czar-0                  3/3     Running   6          10d
qserv-ingest-db-0             1/1     Running   2          10d
qserv-repl-ctl-0              1/1     Running   6          10d
qserv-repl-db-0               1/1     Running   2          10d
qserv-worker-0                5/5     Running   0          10d
qserv-worker-1                5/5     Running   0          10d
...
qserv-worker-7                5/5     Running   0          10d
qserv-worker-8                5/5     Running   0          10d
qserv-worker-9                5/5     Running   0          10d
qserv-xrootd-redirector-0     2/2     Running   0          10d
qserv-xrootd-redirector-1     2/2     Running   0          10d
The namespace containing this instance will be called <QSERV_NAMESPACE>.

Set <QSERV_NAMESPACE> as the default namespace for all subsequent kubectl commands:

kubectl config set-context --current --namespace=<QSERV_NAMESPACE>

For additional information, see the official Kubernetes documentation on setting the namespace preference.

Privileges to create pods and persistent volumes inside <QSERV_NAMESPACE>.

An HTTP(S) server providing access to input data and metadata. All pods inside <QSERV_NAMESPACE> must be able to access this HTTP server.

An instance of Argo Workflows running in <QSERV_NAMESPACE>, and the argo client. The example script prereq-install.sh, located at the top level of the lsst-dm/qserv-ingest repository, installs a version supported by the ingest process. It also installs helm, a package manager for Kubernetes.
$ kubectl get pod -l app.kubernetes.io/part-of=argo-workflows
NAME                                                  READY   STATUS    RESTARTS   AGE
argo-workflows-server-9f9bd4cc4-wkm5z                 1/1     Running   0          1m
argo-workflows-workflow-controller-57779f5f96-2r9jf   1/1     Running   1          1m
git-lfs is required to retrieve the integration test datasets.
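The client-side prerequisites above can be verified with a short shell snippet. This is a sketch: it only checks that the tools named in this section are on the PATH (argo and helm can be installed with prereq-install.sh):

```shell
# Check that the client-side tools used in this guide are available.
for tool in kubectl argo helm git-lfs; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```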
Prepare and configure Qserv ingest
Get the project
RELEASE="2022.1.1-rc1"
git clone --single-branch -b "$RELEASE" --depth 1 https://github.com/lsst-dm/qserv-ingest
cd qserv-ingest
Configuration
A directory with the ingest workflow configuration and an env.sh file must be created:

cp -r manifests/in2p3-cosmo manifests/<CUSTOM_INGEST>
cp env.example.sh env.sh
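After copying, env.sh might look like the following. This is a sketch only: OVERLAY and INSTANCE are the variables described below, and the values shown are hypothetical examples.

```shell
# env.sh -- sketch; values are hypothetical examples
OVERLAY="<CUSTOM_INGEST>"   # name of the directory created under manifests/
INSTANCE="qserv"            # name of the current Qserv instance
```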
In env.sh, set OVERLAY to <CUSTOM_INGEST> and, if needed, INSTANCE to the name of the current Qserv instance. Then edit the ingest workflow configuration file, manifests/<CUSTOM_INGEST>/configmap/ingest.yaml:
Inline documentation for this configuration file is available at manifests/base/configmap/ingest.yaml:
version: 15
ingest:
  http:
    # Optional, defaults to no time-out
    # Timeout for POST and PUT queries in seconds
    # POST and PUT queries might be very long for the following operations:
    #   - posting a contribution
    #   - closing a transaction
    #   - publishing a database
    #   - building the "director" index
    #   - building table indexes at workers
    write_timeout: 1800
    # Optional, defaults to no time-out
    # Timeout for GET queries in seconds
    read_timeout: 10
  metadata:
    # Optional, defaults to "ingest.input.servers[0]/ingest.input.path"
    # Allows customizing the metadata URL
    url: http://dataserver/datasets/DC2/

  input:
    # List of HTTP servers providing the input dataset
    # The ingest process will load-balance the download of input files across
    # these servers.
    # Use file:// as the first element in the list when using local data
    # TODO Add support for webdav protocol
    servers:
      - http://dataserver
      - http://dataserver
    # Path to input data on the HTTP servers
    path: datasets/DC2/

  ## URLs of Qserv services
  ## ----------------------
  qserv:
    # URL which serves Qserv SQL queries
    query_url: "mysql://qsmaster:@qserv-czar:4040"
    # URL which serves the input chunk contribution queue
    queue_url: "mysql://qsingest:@qserv-ingest-db-0.qserv-ingest-db/qservIngest"
    # Replication controller service URL
    replication_url: http://qserv-repl-ctl-0.qserv-repl-ctl:8080

  ## Configure the replication service
  ## Documented at https://confluence.lsstcorp.org/display/DM/1.+Setting+configuration+parameters
  ## --------------------------------------------------------------------------------------------
  ingestservice:
    # Optional, defaults to None
    # Overrides the default value stored in the input metadata (database.json file)
    # 1: build the secondary index when closing a transaction
    # 0: build the secondary index after the ingest
    auto_build_secondary_index: 1

    # Optional, defaults to "/etc/pki/tls/certs/ca-bundle.crt"
    # cainfo: "/etc/pki/tls/certs/ca-bundle.crt"

    # Optional, defaults to 1
    # ssl_verifypeer: 1

    # Optional, defaults to None
    # Overrides the default Qserv value
    async_proc_limit: 4

    # Optional, defaults to None
    # Overrides the default Qserv value
    low_speed_limit: 10

    # Optional, defaults to None
    # Overrides the default Qserv value
    low_speed_time: 3600
Launch Qserv ingest
Launch the workflow using Argo:
./argo-submit.sh
# monitor the workflow execution
argo get @latest
Then adapt example/query.sh to run a few queries against the freshly ingested data.
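A sanity query can be run through the czar's SQL endpoint (mysql://qsmaster:@qserv-czar:4040, as configured in ingest.yaml). This is a sketch: <DATABASE> and <TABLE> are placeholders for the freshly ingested dataset, and the kubectl guard only makes the snippet safe to copy-paste outside the cluster.

```shell
# Sketch: count rows in an ingested table via the Qserv czar.
# <DATABASE> and <TABLE> are placeholders; adapt them to your dataset.
SQL="SELECT COUNT(*) FROM <DATABASE>.<TABLE>"
echo "Running: $SQL"
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec qserv-czar-0 -- \
    mysql --host=qserv-czar --port=4040 --user=qsmaster -e "$SQL" \
    || echo "query failed (no cluster access?)"
else
  echo "kubectl not found; run this from a host with cluster access"
fi
```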
Delete an existing database
Please refer to the Qserv Replication Service documentation, and then adapt the example script example/delete_database.sh.
Run a workflow step interactively
./argo-submit.sh -s
# Retrieve the pod name for the 'interactive' step
argo get @latest
# Open a shell inside it
kubectl exec -it qserv-ingest-2hrcf-595146013 -c main bash
# All binaries for launching benchmark steps are located here:
ls /ingest/bin/