Configuring parameters of the ingests
Setting configuration parameters
Parameters are set for a database (regardless of the published status) using the following service:
The request object has the following schema:
{   "database" : <string>,
    "SSL_VERIFYHOST" : <number>,
    "SSL_VERIFYPEER" : <number>,
    "CAPATH" : <string>,
    "CAINFO" : <string>,
    "CAINFO_VAL" : <string>,
    "PROXY_SSL_VERIFYHOST" : <number>,
    "PROXY_SSL_VERIFYPEER" : <number>,
    "PROXY_CAPATH" : <string>,
    "PROXY_CAINFO" : <string>,
    "PROXY_CAINFO_VAL" : <string>,
    "CURLOPT_PROXY" : <string>,
    "CURLOPT_NOPROXY" : <string>,
    "CURLOPT_HTTPPROXYTUNNEL" : <number>,
    "CONNECTTIMEOUT" : <number>,
    "TIMEOUT" : <number>,
    "LOW_SPEED_LIMIT" : <number>,
    "LOW_SPEED_TIME" : <number>,
    "ASYNC_PROC_LIMIT" : <number>
}
Where:
- database : string (required)
- The name of the database affected by the operation.
- SSL_VERIFYHOST : number (default: 2)
- The optional flag that tells the system to verify the host name of the peer. If the value is set to 0, the system will not check the host name against the certificate. Any other value tells the system to perform the check. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html.
- SSL_VERIFYPEER : number (default: 1)
- The optional flag that tells the system to verify the peer’s certificate. If the value is set to 0, the system will not check the certificate. Any other value tells the system to perform the check. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html.
- CAPATH : string (default: /etc/ssl/certs)
- The optional path to a directory holding multiple CA certificates, which the system uses to verify the peer’s certificate. Setting the value to an empty string turns this option off, as if it had never been configured for the database. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_CAPATH.html.
- CAINFO : string (default: /etc/ssl/certs/ca-certificates.crt)
- The optional path to a file holding a bundle of CA certificates, which the system uses to verify the peer’s certificate. Setting the value to an empty string turns this option off, as if it had never been configured for the database. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_CAINFO.html.
- CAINFO_VAL : string (default: "")
- The optional value of a certificate bundle for a peer. This parameter is meant for cases where the bundle cannot be injected directly into the environments of the Ingest workers. If a non-empty value is provided, the ingest servers use it instead of the file named (if any) in the CAINFO attribute described above. Attention: values of this attribute are the actual certificates, not file paths as in the case of CAINFO.
- PROXY_SSL_VERIFYHOST : number (default: 2)
- The optional flag that tells the system to verify the host name of the proxy. If the value is set to 0, the system will not check the proxy’s host name against its certificate. Any other value tells the system to perform the check. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_PROXY_SSL_VERIFYHOST.html.
- PROXY_SSL_VERIFYPEER : number (default: 1)
- The optional flag that tells the system to verify the proxy’s certificate. If the value is set to 0, the system will not check the certificate. Any other value tells the system to perform the check. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_PROXY_SSL_VERIFYPEER.html.
- PROXY_CAPATH : string (default: "")
- The optional path to a directory holding multiple CA certificates, which the system uses to verify the proxy’s certificate. Setting the value to an empty string turns this option off, as if it had never been configured for the database. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_PROXY_CAPATH.html.
- PROXY_CAINFO : string (default: "")
- The optional path to a file holding a bundle of CA certificates, which the system uses to verify the proxy’s certificate. Setting the value to an empty string turns this option off, as if it had never been configured for the database. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_PROXY_CAINFO.html.
- PROXY_CAINFO_VAL : string (default: "")
- The optional value of a certificate bundle for a proxy. This parameter is meant for cases where the bundle cannot be injected directly into the environments of the Ingest workers. If a non-empty value is provided, the ingest servers use it instead of the file named (if any) in the PROXY_CAINFO attribute described above. Attention: values of this attribute are the actual certificates, not file paths as in the case of PROXY_CAINFO.
- CURLOPT_PROXY : string (default: "")
- The optional proxy to use for the requests. The value is a string holding the host name or dotted numerical IP address of the proxy. A numerical IPv6 address must be written within [brackets]. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_PROXY.html.
- CURLOPT_NOPROXY : string (default: "")
- An optional comma-separated list of host names that are reached without the proxy, even if one is specified. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_NOPROXY.html.
- CURLOPT_HTTPPROXYTUNNEL : number (default: 0)
- Set this optional parameter to 1 to tunnel all operations through the HTTP proxy set with CURLOPT_PROXY. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_HTTPPROXYTUNNEL.html.
- CONNECTTIMEOUT : number (default: 0)
- The optional maximum time in seconds that the system will wait for a connection to be established. A value of 0 does not mean waiting indefinitely: according to the libcurl documentation, libcurl then applies its built-in default connection timeout of 300 seconds. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_CONNECTTIMEOUT.html.
- TIMEOUT : number (default: 0)
- The optional maximum time in seconds that the entire transfer, including connection establishment, is allowed to take. The default value 0 means the transfer never times out. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_TIMEOUT.html.
- LOW_SPEED_LIMIT : number (default: 0)
- The optional average transfer speed, in bytes per second, below which the transfer is considered too slow. If the speed stays below this limit for LOW_SPEED_TIME seconds, the transfer is aborted. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_LOW_SPEED_LIMIT.html.
- LOW_SPEED_TIME : number (default: 0)
- The optional time, in seconds, that the transfer speed may stay below LOW_SPEED_LIMIT before the transfer is aborted. This attribute maps directly to https://curl.se/libcurl/c/CURLOPT_LOW_SPEED_TIME.html.
- ASYNC_PROC_LIMIT : number (default: 0)
- The optional maximum number of contributions that may be processed concurrently within the scope of the database. The actual number of parallel requests may be further lowered by the hard limit set by the Replication System worker’s configuration parameter (worker, num-async-loader-processing-threads). The parameter can be adjusted at run time as needed and takes effect immediately. Setting the value to 0 turns this option off, as if it had never been configured for the database. Unlike the attributes above, this one does not correspond to a libcurl option. Note: the parameter is available as of API version 14.
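LOW_SPEED_LIMIT and LOW_SPEED_TIME only make sense as a pair. The sketch below simulates the rule described above (abort once the speed has stayed below the limit for the given number of consecutive seconds) over hypothetical per-second speed samples. It is an illustration of the semantics only, not how libcurl or the ingest workers implement the check; treating the check as disabled unless both values are non-zero is our reading of the libcurl defaults.

```python
from typing import Iterable, Optional


def low_speed_abort_second(
    samples: Iterable[float], low_speed_limit: int, low_speed_time: int
) -> Optional[int]:
    """Return the (1-based) second at which the transfer would be aborted, or None.

    samples: hypothetical average transfer speeds, in bytes per second, one
    per consecutive one-second interval. The check is treated as disabled
    unless both parameters are non-zero (an assumption based on libcurl docs).
    """
    if low_speed_limit <= 0 or low_speed_time <= 0:
        return None
    slow_seconds = 0  # consecutive seconds spent below the limit
    for second, speed in enumerate(samples, start=1):
        if speed < low_speed_limit:
            slow_seconds += 1
            if slow_seconds >= low_speed_time:
                return second
        else:
            slow_seconds = 0  # a fast interval resets the window
    return None


# LOW_SPEED_LIMIT=1000, LOW_SPEED_TIME=3: slow during seconds 2, 3 and 4.
print(low_speed_abort_second([5000, 200, 300, 100, 8000], 1000, 3))  # → 4
```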
If the request succeeds, the service returns the standard JSON object, which carries only the standard completion status and no additional data.
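A minimal client-side sketch of the setting operation: it builds the request object from the schema above, rejects unknown names and wrongly typed values, and sends it as JSON. The service path is not reproduced in this section, so the full URL is supplied by the caller; the PUT method and the injectable opener are assumptions made for the sketch, not part of the documented API.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Attribute types taken from the request schema. All numeric parameters are
# flags, counts, byte rates, or whole seconds, so integers are required here.
NUMBER_PARAMS = {
    "SSL_VERIFYHOST", "SSL_VERIFYPEER",
    "PROXY_SSL_VERIFYHOST", "PROXY_SSL_VERIFYPEER",
    "CURLOPT_HTTPPROXYTUNNEL", "CONNECTTIMEOUT", "TIMEOUT",
    "LOW_SPEED_LIMIT", "LOW_SPEED_TIME", "ASYNC_PROC_LIMIT",
}
STRING_PARAMS = {
    "CAPATH", "CAINFO", "CAINFO_VAL",
    "PROXY_CAPATH", "PROXY_CAINFO", "PROXY_CAINFO_VAL",
    "CURLOPT_PROXY", "CURLOPT_NOPROXY",
}


def build_config_request(database: str, **params: Any) -> Dict[str, Any]:
    """Build the request object, rejecting unknown names and wrong types."""
    if not database:
        raise ValueError("'database' is required")
    body: Dict[str, Any] = {"database": database}
    for name, value in params.items():
        if name in NUMBER_PARAMS:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
        elif name in STRING_PARAMS:
            if not isinstance(value, str):
                raise TypeError(f"{name} must be a string")
        else:
            raise KeyError(f"unknown parameter: {name}")
        body[name] = value
    return body


def send_config_request(
    url: str,
    body: Dict[str, Any],
    method: str = "PUT",  # assumption: the HTTP method is not stated in this section
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Send the request object as JSON and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with opener(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

For example, build_config_request("test101", TIMEOUT=600, CAINFO="") would ask for a 10-minute transfer timeout and turn the CAINFO option off for a hypothetical database test101. With the default opener, urllib.request.urlopen raises HTTPError for non-2xx replies, so the returned dictionary is the standard completion-status object.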
Retrieving configuration parameters
Warning
As of version 14 of the API, the name of the database must be passed in the request’s query instead of the JSON body. The older implementation, which expected it in the body, was incorrect.
| method | service | query parameters | 
|---|---|---|
Where the mandatory query parameter database specifies the name of a database affected by the operation.
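Since the database name now travels in the query string rather than the body, a client only needs to append it to the service URL. A small sketch using the standard library; the URL in the example is a hypothetical placeholder, since the service path is not given in this section.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def config_query_url(service_url: str, database: str) -> str:
    """Return service_url with the mandatory 'database' query parameter added."""
    if not database:
        raise ValueError("'database' is required")
    parts = urlsplit(service_url)
    query = urlencode({"database": database})
    if parts.query:
        # Preserve any query parameters that are already present.
        query = f"{parts.query}&{query}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(config_query_url("http://controller.example/hypothetical/path", "test101"))
# → http://controller.example/hypothetical/path?database=test101
```

The resulting URL can then be fetched with urllib.request.urlopen and the reply decoded with json.loads.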
If the operation succeeds, it returns an extended JSON object with the following schema (in addition to the standard status and error-reporting attributes):
{   "database" : <string>,
    "SSL_VERIFYHOST" : <number>,
    "SSL_VERIFYPEER" : <number>,
    "CAPATH" : <string>,
    "CAINFO" : <string>,
    "CAINFO_VAL" : <string>,
    "PROXY_SSL_VERIFYHOST" : <number>,
    "PROXY_SSL_VERIFYPEER" : <number>,
    "PROXY_CAPATH" : <string>,
    "PROXY_CAINFO" : <string>,
    "PROXY_CAINFO_VAL" : <string>,
    "CURLOPT_PROXY" : <string>,
    "CURLOPT_NOPROXY" : <string>,
    "CURLOPT_HTTPPROXYTUNNEL" : <number>,
    "CONNECTTIMEOUT" : <number>,
    "TIMEOUT" : <number>,
    "LOW_SPEED_LIMIT" : <number>,
    "LOW_SPEED_TIME" : <number>,
    "ASYNC_PROC_LIMIT" : <number>
}
The attributes of the response object are the same as the ones described in the section Setting configuration parameters.
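Because the response uses the same attribute names as the request, a workflow can check that the values it set are the ones the system reports. A minimal sketch, assuming the response has already been decoded into a dictionary; the helper name is hypothetical.

```python
from typing import Any, Dict, Tuple


def config_drift(
    desired: Dict[str, Any], reported: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {name: (desired, reported)} for every desired parameter whose
    reported value differs from it or is missing from the response."""
    return {
        name: (value, reported.get(name))
        for name, value in desired.items()
        if reported.get(name) != value
    }


reported = {"database": "test101", "TIMEOUT": 0, "CURLOPT_PROXY": ""}
print(config_drift({"TIMEOUT": 600, "CURLOPT_PROXY": ""}, reported))
# → {'TIMEOUT': (600, 0)}
```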
Global configuration parameters of workers
Note
This is the same service that was described in:
The response object of the service also includes information on the workers.
There are two sections related to workers in the response object. The first section, config.general.worker,
includes the general parameters of the ingest services; their values are the same for all
workers. The second section, config.workers, has the information on the individual workers.
The general information on all workers
The schema of the relevant section of the response object is illustrated by the following example:
{   "config": {
        "general" : {
            "worker" : {
                "num-loader-processing-threads" : 64,
                "num-http-loader-processing-threads" : 8,
                "num-async-loader-processing-threads" : 8,
                "ingest-charset-name" : "latin1",
                "ingest-max-retries" : 10,
                "ingest-num-retries" : 1,
                "loader-max-warnings" : 64,
                "async-loader-auto-resume" : 1,
                "async-loader-cleanup-on-resume" : 1
            }
        }
    }
}
Where:
- config.general.worker : object
- A collection of the general parameters of the worker ingest service.
- num-loader-processing-threads : number
- The number of ingest request processing threads in the service that supports the proprietary binary protocol.
- num-http-loader-processing-threads : number
- The number of ingest request processing threads in the HTTP-based ingest service. Note that this service is used both for processing synchronous contribution requests and for submitting asynchronous requests to the service.
- num-async-loader-processing-threads : number
- The number of ingest request processing threads in the thread pool that processes asynchronous contribution requests.
- ingest-charset-name : string
- The name of the character set used for parsing the payload of the contributions.
- ingest-max-retries : number
- The maximum number of automated retries of failed contribution attempts in cases where such retries are still possible. The parameter is a hard limit on the number of retries, regardless of what is specified in the related parameter ingest-num-retries or in the contribution requests made by the workflows. Its primary purpose is to prevent accidental overloading of the ingest system should the ingest workflows accidentally specify a very large number of retries for individual contributions. Setting the parameter to 0 unconditionally disables any retries.
- ingest-num-retries : number
- The default number of automated retries of failed contribution attempts in cases where such retries are still possible. The limit can be changed for individual contributions. Note that the effective number of retries, whether set by this parameter or in the contribution requests, cannot exceed the hard limit set in the related parameter ingest-max-retries. Setting the parameter to 0 disables automatic retries (unless they are explicitly requested by the ingest workflows for individual contributions).
- loader-max-warnings : number
- The maximum number of warnings to retain after executing LOAD DATA [LOCAL] INFILE when ingesting contributions into the worker MySQL database. The warnings (if any) are recorded in the persistent state of the Replication/Ingest system and returned to the ingest workflow upon request.
- async-loader-auto-resume : number
- The flag controlling the behavior of the worker’s asynchronous ingest service after (deliberate or accidental) restarts. If the value of the parameter is not 0, the service resumes processing incomplete (queued or ongoing) requests. Setting the parameter to 0 results in unconditionally failing all incomplete contribution requests that existed prior to the restart. Warning: requests that failed in the last (loading) stage cannot be resumed and require aborting the corresponding transaction. If automatic resume is enabled, these requests are automatically closed and marked as failed.
- async-loader-cleanup-on-resume : number
- The flag controlling the behavior of the worker’s asynchronous ingest service after the service is restarted. If the value of the parameter is not 0, the service tries to clean up the temporary files that might be left on disk for incomplete (queued or ongoing) requests. The option may be disabled to allow debugging the service.
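The interplay of the retry parameters, and of ASYNC_PROC_LIMIT with the thread pool size, reduces to simple min() rules. The sketch below computes them from a decoded config.general.worker object. Both helper names are hypothetical, and the second function assumes that the database-level ASYNC_PROC_LIMIT and the worker's num-async-loader-processing-threads bound the same set of requests on a worker.

```python
from typing import Optional


def effective_retries(general_worker: dict, requested: Optional[int] = None) -> int:
    """Automated retries a contribution can get.

    general_worker: the 'config.general.worker' object of the response.
    requested: retries asked for by the workflow for this contribution,
    or None if the workflow did not specify a value.
    """
    max_retries = general_worker["ingest-max-retries"]  # hard limit
    wanted = general_worker["ingest-num-retries"] if requested is None else requested
    return max(0, min(wanted, max_retries))


def effective_async_concurrency(general_worker: dict, async_proc_limit: int) -> int:
    """Upper bound on parallel asynchronous contributions for a database.

    ASYNC_PROC_LIMIT == 0 means the database-level limit is off, leaving
    the worker's thread pool size as the only bound.
    """
    threads = general_worker["num-async-loader-processing-threads"]
    return threads if async_proc_limit <= 0 else min(async_proc_limit, threads)


worker_cfg = {"ingest-max-retries": 10, "ingest-num-retries": 1,
              "num-async-loader-processing-threads": 8}
print(effective_retries(worker_cfg))               # → 1 (default from ingest-num-retries)
print(effective_retries(worker_cfg, 50))           # → 10 (capped by ingest-max-retries)
print(effective_async_concurrency(worker_cfg, 2))  # → 2
```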
Worker-specific information
The schema of the relevant section of the response object is illustrated by the following example:
{   "config": {
        "workers" : [
            {   "name" : "db02",
                "is-enabled" : 1,
                "is-read-only" : 0,
                "loader-host" : {
                    "addr" : "172.24.49.52",
                    "name" : "sdfqserv002.sdf.slac.stanford.edu"
                },
                "loader-port" : 25002,
                "loader-tmp-dir" : "/qserv/data/ingest",
                "http-loader-host" : {
                    "name" : "sdfqserv002.sdf.slac.stanford.edu",
                    "addr" : "172.24.49.52"
                },
                "http-loader-port" : 25004,
                "http-loader-tmp-dir" : "/qserv/data/ingest"
            }
        ]
    }
}
Where:
- config.workers : array
- A collection of worker nodes, where each object represents a worker node.
- name : string
- The unique identifier of a worker node.
- is-enabled : number
- The flag that tells if the worker node is enabled. If the value is set to 0, the worker node is disabled. Workers that are not enabled do not participate in ingest activities.
- is-read-only : number
- The flag that tells if the worker node is read-only. If the value is set to 0, the worker node is read-write. Workers in the read-only state do not participate in ingest activities.
Parameters of the ingest service that supports the proprietary binary protocol:
- loader-host : object
- The object with the information about the loader host, with the following attributes:
  - addr : string. The IP address of the loader host.
  - name : string. The FQDN (fully-qualified domain name) of the host.
- loader-port : number
- The port number of the ingest service.
- loader-tmp-dir : string
- The path to the temporary directory on the loader host that is used by the ingest service as a staging area for the contributions.
Parameters of the HTTP-based ingest service:
- http-loader-host : object
- The object with the information about the loader host, with the following attributes:
  - addr : string. The IP address of the loader host.
  - name : string. The FQDN (fully-qualified domain name) of the host.
- http-loader-port : number
- The port number of the ingest service.
- http-loader-tmp-dir : string
- The path to the temporary directory on the loader host that is used by the ingest service as a staging area for the contributions.
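Putting the worker attributes together, a workflow could select the workers that take part in ingests (enabled and not read-only) and derive the address of each one's HTTP-based ingest service. A minimal sketch over a decoded config object; the helper name and the second worker (db03) are hypothetical, and the http:// scheme is an assumption that this section does not state.

```python
from typing import Any, Dict, List


def http_ingest_endpoints(config: Dict[str, Any]) -> List[str]:
    """List 'http://<name>:<port>' of HTTP loaders on workers that take ingests.

    config is the 'config' object of the response. Disabled workers
    (is-enabled == 0) and read-only workers (is-read-only != 0) are skipped.
    The 'http://' scheme is an assumption, not taken from the service docs.
    """
    endpoints: List[str] = []
    for worker in config.get("workers", []):
        if worker.get("is-enabled", 0) == 0 or worker.get("is-read-only", 0) != 0:
            continue  # worker does not participate in ingest activities
        host = worker["http-loader-host"]["name"]
        port = worker["http-loader-port"]
        endpoints.append(f"http://{host}:{port}")
    return endpoints


example = {
    "workers": [
        {"name": "db02", "is-enabled": 1, "is-read-only": 0,
         "http-loader-host": {"name": "sdfqserv002.sdf.slac.stanford.edu",
                              "addr": "172.24.49.52"},
         "http-loader-port": 25004},
        {"name": "db03", "is-enabled": 0, "is-read-only": 0,  # hypothetical, disabled
         "http-loader-host": {"name": "db03.example", "addr": "10.0.0.3"},
         "http-loader-port": 25004},
    ]
}
print(http_ingest_endpoints(example))
# → ['http://sdfqserv002.sdf.slac.stanford.edu:25004']
```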