Run Qserv ingest on a Kubernetes cluster
Prerequisites
An up-and-running Qserv instance, managed by qserv-operator, inside a Kubernetes cluster:
$ kubectl get qserv
NAME    AGE
qserv   10d
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
qserv-czar-0                  3/3     Running   6          10d
qserv-ingest-db-0             1/1     Running   2          10d
qserv-repl-ctl-0              1/1     Running   6          10d
qserv-repl-db-0               1/1     Running   2          10d
qserv-worker-0                5/5     Running   0          10d
qserv-worker-1                5/5     Running   0          10d
...
qserv-worker-7                5/5     Running   0          10d
qserv-worker-8                5/5     Running   0          10d
qserv-worker-9                5/5     Running   0          10d
qserv-xrootd-redirector-0     2/2     Running   0          10d
qserv-xrootd-redirector-1     2/2     Running   0          10d
The namespace containing this instance will be called <QSERV_NAMESPACE>.

Set <QSERV_NAMESPACE> as the default namespace for all subsequent kubectl commands:

kubectl config set-context --current --namespace=<QSERV_NAMESPACE>

For additional information, see the official Kubernetes documentation on setting the namespace preference.

Privileges to create pods and persistent volumes inside <QSERV_NAMESPACE>.

An HTTP(S) server providing access to input data and metadata. All pods inside <QSERV_NAMESPACE> must be able to access this HTTP server.

An instance of Argo Workflows running in <QSERV_NAMESPACE>, and the argo client. The example script prereq-install.sh, located at the top level of the lsst-dm/qserv-ingest repository, installs a version supported by the ingest process. It also installs helm, a package manager for Kubernetes.
$ kubectl get pod -l app.kubernetes.io/part-of=argo-workflows
NAME                                                  READY   STATUS    RESTARTS   AGE
argo-workflows-server-9f9bd4cc4-wkm5z                 1/1     Running   0          1m
argo-workflows-workflow-controller-57779f5f96-2r9jf   1/1     Running   1          1m
git-lfs is required to retrieve the integration test datasets.
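The client-side prerequisites above can be verified with a short shell snippet. This is a sketch: it only checks that the tools named in this section are on the PATH (argo and helm can be installed with prereq-install.sh):

```shell
# Check that the client-side tools used in this guide are available.
for tool in kubectl argo helm git-lfs; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```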
Prepare and configure Qserv ingest
Get the project
RELEASE="2022.1.1-rc1"
git clone --single-branch -b "$RELEASE" --depth 1 https://github.com/lsst-dm/qserv-ingest
cd qserv-ingest
Configuration
A directory with the ingest workflow configuration and an env.sh file must be created:

cp -r manifests/in2p3-cosmo manifests/<CUSTOM_INGEST>
cp env.example.sh env.sh
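After copying, env.sh might look like the following. This is a sketch only: OVERLAY and INSTANCE are the variables described below, and the values shown are hypothetical examples.

```shell
# env.sh -- sketch; values are hypothetical examples
OVERLAY="<CUSTOM_INGEST>"   # name of the directory created under manifests/
INSTANCE="qserv"            # name of the current Qserv instance
```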
In env.sh, set OVERLAY to <CUSTOM_INGEST> and, if needed, INSTANCE to the name of the current Qserv instance. Then edit the ingest workflow configuration file, manifests/<CUSTOM_INGEST>/configmap/ingest.yaml:
Inline documentation for this configuration file is available at manifests/base/configmap/ingest.yaml:
version: 15
ingest:
  http:
    # Optional, defaults to no time-out
    # Timeout for POST and PUT queries in seconds
    # POST and PUT queries might be very long for the following operations:
    #   - posting a contribution
    #   - closing a transaction
    #   - publishing a database
    #   - building the "director" index
    #   - building table indexes at workers
    write_timeout: 1800
    # Optional, defaults to no time-out
    # Timeout for GET queries in seconds
    read_timeout: 10
  metadata:
    # Optional, defaults to "ingest.input.servers[0]/ingest.input.path"
    # Allows customizing the metadata URL
    url: http://dataserver/datasets/DC2/

  input:
    # List of HTTP servers providing the input dataset
    # The ingest process will load-balance the download of input files across
    # these servers.
    # Use file:// as the first element in the list when using local data
    # TODO Add support for webdav protocol
    servers:
      - http://dataserver
      - http://dataserver
    # Path to input data on the HTTP servers
    path: datasets/DC2/

  ## URLs of Qserv services
  ## ----------------------
  qserv:
    # URL which serves Qserv SQL queries
    query_url: "mysql://qsmaster:@qserv-czar:4040"
    # URL which serves the input chunk contribution queue
    queue_url: "mysql://qsingest:@qserv-ingest-db-0.qserv-ingest-db/qservIngest"
    # Replication controller service URL
    replication_url: http://qserv-repl-ctl-0.qserv-repl-ctl:8080

  ## Configure the replication service
  ## Documented at https://confluence.lsstcorp.org/display/DM/1.+Setting+configuration+parameters
  ## --------------------------------------------------------------------------------------------
  ingestservice:
    # Optional, defaults to None
    # Overrides the default value stored in the input metadata (database.json file)
    # 1: build the secondary index when closing a transaction
    # 0: build the secondary index after the ingest
    auto_build_secondary_index: 1

    # Optional, defaults to "/etc/pki/tls/certs/ca-bundle.crt"
    # cainfo: "/etc/pki/tls/certs/ca-bundle.crt"

    # Optional, defaults to 1
    # ssl_verifypeer: 1

    # Optional, defaults to None
    # Overrides the default Qserv value
    async_proc_limit: 4

    # Optional, defaults to None
    # Overrides the default Qserv value
    low_speed_limit: 10

    # Optional, defaults to None
    # Overrides the default Qserv value
    low_speed_time: 3600
Launch Qserv ingest
Launch the workflow using Argo:
./argo-submit.sh
# monitor the workflow execution
argo get @latest
Then adapt example/query.sh to run a few queries against the freshly ingested data.
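A sanity query can be run through the czar's SQL endpoint (mysql://qsmaster:@qserv-czar:4040, as configured in ingest.yaml). This is a sketch: <DATABASE> and <TABLE> are placeholders for the freshly ingested dataset, and the kubectl guard only makes the snippet safe to copy-paste outside the cluster.

```shell
# Sketch: count rows in an ingested table via the Qserv czar.
# <DATABASE> and <TABLE> are placeholders; adapt them to your dataset.
SQL="SELECT COUNT(*) FROM <DATABASE>.<TABLE>"
echo "Running: $SQL"
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec qserv-czar-0 -- \
    mysql --host=qserv-czar --port=4040 --user=qsmaster -e "$SQL" \
    || echo "query failed (no cluster access?)"
else
  echo "kubectl not found; run this from a host with cluster access"
fi
```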
Delete an existing database
Please refer to the Qserv Replication Service documentation, and then adapt the example script example/delete_database.sh.
Run a workflow step interactively
./argo-submit.sh -s
# Retrieve the pod name for the 'interactive' step
argo get @latest
# Open a shell inside it
kubectl exec -it qserv-ingest-2hrcf-595146013 -c main bash
# All binaries for launching benchmark steps are located here:
ls /ingest/bin/