Warning
Known Issues and Limitations:
The API has a confusing inconsistency in the naming convention of identifiers
ID
,job-id
,jobId
,query-id
andqueryId
. These are used interchangeably in the documentation and error messages posted by Qserv. In reality, they all refer to the same entity - the autogenerated unique identifiers of the user queries submitted via this interface. This inconsistency will be addressed in the future.There is no way in the current API to get errors on queries failed during the asynchronous processing. The only tests made by the
SUBMIT
command are to ensure that:the query syntax is valid
the query refers to the correct context (names of databases, tables, columns)
Any problems detected by Qserv at the query entry point will be reported right away.
Asynchronous Query API#
This is a summary of Qserv’s asynchronous query API, as developed in:
Submitting Queries#
The general syntax for submitting queries for asynchronous processing is:
SUBMIT <query-text>
This is illustrated by the following example:
SUBMIT SELECT objectId FROM dp02_dc2_catalogs.Object LIMIT 3
If the query validation succeded and the query placed into a processing queue, the command will always return one row with
two columns, where the particularly interesting column is jobId
:
+--------+---------------------+
| jobId | resultLocation |
+--------+---------------------+
| 311817 | table:result_311817 |
+--------+---------------------+
At this poing the query is running asynchronously. The jobId
is the unique identifier of the query that can be used
for checking the query status, retrieving the results, or cancelling the query.
Checking Status#
Based on the jobId
returned by the SUBMIT
command, you can check the status of the query by querying the
information_schema.processlist
table:
SELECT * FROM information_schema.processlist WHERE id=311817\G
Note: \G
is a MySQL command that formats the output. It’s not part of the SQL syntax.
The command is quite handy to display result sets comprising many columns or having long values of
the columns in a more readable format.
The query will return a row with the following columns:
*************************** 1. row ***************************
ID: 311817
USER: anonymous
HOST: NULL
DB: dp02_dc2_catalogs
COMMAND: ASYNC
TIME: NULL
STATE: COMPLETED
INFO: SELECT objectId FROM dp02_dc2_catalogs.Object LIMIT 3
SUBMITTED: 2024-10-06 20:01:18
COMPLETED: 2024-10-06 20:01:18
RETURNED: NULL
CZARID: 9
RESULTLOCATION: table:result_311817
NCHUNKS: 1477
TotalChunks: NULL
CompletedChunks: NULL
LastUpdate: NULL
Particularly interesting columns here are:
ID
: the unique identifier of the original query (it’s the same asjobId
reported by theSUBMIT
command)STATE
: the query status, which can be one of:EXECUTING
,COMPLETED
,FAILED
, orABORTED
One can also use the following information commands to get the status of all active queries:
SHOW PROCESSLIST
SHOW FULL PROCESSLIST
Note: once the query is over and the results are retrieved, the corresponding row in the information_schema.processlist
table will be deleted. And the query status will no longer be available. However, Qserv will still maintain the history
of the queries in other system tables. You may contact the Qserv administrator to get the history of the queries should
you need it.
Retrieving Results#
To retrieve the results of a query, use the following syntax:
SELECT * FROM qserv_result(<query-id>)
This will return the full results (columns and rows) of the original query corresponding to the provided identifier of the query.
For example, the following query will return the results of the query with jobId
of 311817
:
SELECT * FROM qserv_result(311817)
+---------------------+
| objectId |
+---------------------+
| 1248649384967536732 |
| 1248649384967536769 |
| 1248649384967536891 |
+---------------------+
The command may be called one time only. The query result table will be deleted after returning the result set. Any subsequent attempts to retrieve the results will return an error message:
SELECT * FROM qserv_result(311817)
ERROR 1146 (42S02) at line 1: Table 'qservResult.result_311817' doesn't exist
Cancellation#
The general syntax for cancelling running queries is:
CANCEL <query-id>
The following example illustrates the technique for cancelling a query that is supposed to take a long time to complete (the query produces a very large result set):
SUBMIT SELECT objectId FROM dp02_dc2_catalogs.Object
+--------+---------------------+
| jobId | resultLocation |
+--------+---------------------+
| 311818 | table:result_311818 |
+--------+---------------------+
CANCEL 311816
+--------------------------------------+
| command |
+--------------------------------------+
| Trying to kill query: CANCEL 311818 |
+--------------------------------------+
Note that it’s okay to call the CANCEL
command multiple times. If the query has already completed, the command will
post the following error message:
CANCEL 311818
ERROR 4005 (Proxy) at line 1: KILL/CANCEL failed: Unknown or finished query ID: CANCEL 311818