Warning

Known Issues and Limitations:

The API has a confusing inconsistency in the naming convention of identifiers ID, job-id, jobId, query-id and queryId. These are used interchangeably in the documentation and error messages posted by Qserv. In reality, they all refer to the same entity - the autogenerated unique identifiers of the user queries submitted via this interface. This inconsistency will be addressed in the future.
There is no way in the current API to get errors on queries failed during the asynchronous processing. The only tests made by the SUBMIT command are to ensure that:
- the query syntax is valid
- the query refers to the correct context (names of databases, tables, columns)
Any problems detected by Qserv at the query entry point will be reported right away.

Asynchronous Query API#

Note

As of the version 40 of the Qserv API, it’s a responsibility of the client to explicitly delete results of the asynchronous queries. The result of a completed query will not be deleted automatically after the client’s attempt to pull the result. This allows the client application to manage the lifecycle of the query results more effectively and address possible issues on the client side should any problem arise (such as network failures or timeouts). The client should use a technique described in the Deleting Results section to delete the results.

Unclaimed results of the completed asynchronous queries will be deleted by Qserv automatically after a certain period of time as defined in a configuration of Qserv Czar. The default value of this parameter is set to 3600 seconds.

This is a summary of Qserv’s asynchronous query API, as developed in:

https://rubinobs.atlassian.net/browse/DM-4451

Submitting Queries#

The general syntax for submitting queries for asynchronous processing is:

SUBMIT <query-text>

This is illustrated by the following example:

SUBMIT SELECT COUNT(*) FROM dp02_dc2_catalogs.Object WHERE coord_ra != 0

If the query validation succeded and the query placed into a processing queue, the command will always return one row with two columns, where the particularly interesting column is jobId:

+--------+---------------------+
| jobId  | resultLocation      |
+--------+---------------------+
| 313689 | table:result_313689 |
+--------+---------------------+

At this poing the query is running asynchronously. The jobId is the unique identifier of the query that can be used for checking the query status, retrieving the results, or cancelling the query.

Checking Status#

Based on the jobId returned by the SUBMIT command, you can check the status of the query by querying the information_schema.processlist table:

SELECT * FROM information_schema.processlist WHERE id=313689\G

Note: \G is a MySQL command that formats the output. It’s not part of the SQL syntax. The command is quite handy to display result sets comprising many columns or having long values of the columns in a more readable format.

If the query is still being executed the information schema query will return a row with the following columns:

*************************** 1. row ***************************
         ID: 313689
       TYPE: ASYNC
       CZAR: proxy
    CZAR_ID: 9
  SUBMITTED: 2025-02-06 08:58:18
    UPDATED: 2025-02-06 08:58:18
     CHUNKS: 1477
CHUNKS_COMP: 739
      QUERY: SELECT COUNT(*) FROM dp02_dc2_catalogs.Object WHERE coord_ra != 0

Particularly interesting columns here are:

ID: the unique identifier of the original query (it’s the same as jobId reported by the SUBMIT command)
TYPE: the query type, which is always ASYNC for asynchronous queries
SUBMITTED: the timestamp when the query was submitted
UPDATED: the timestamp of the last update of the query status
CHUNKS: the total number of chunks to be processed
CHUNKS_COMP: the number of chunks already processed

The user may periodically repeat this command to compute the performance metrics of the query execution ad to get an estimate of the remaining time to completion.

One can also use the following information commands to get the status of all active queries:

SHOW PROCESSLIST
SHOW FULL PROCESSLIST

For example the SHOW PROCESSLIST command will return:

+--------+---------+-------+---------+---------------------+---------------------+--------+-------------+----------------------------------+
| ID     | COMMAND | CZAR  | CZAR_ID | SUBMITTED           | UPDATED             | CHUNKS | CHUNKS_COMP | QUERY                            |
+--------+---------+-------+---------+---------------------+---------------------+--------+-------------+----------------------------------+
| 313689 | ASYNC   | proxy |       9 | 2025-02-06 08:58:18 | 2025-02-06 08:58:18 |   1477 |           1 | SELECT COUNT(*) FROM dp02_dc2_ca |
+--------+---------+-------+---------+---------------------+---------------------+--------+-------------+----------------------------------+

The result set of the PROCESSLIST queries will be empty if the query has already completed. In this case, the query status can be retrieved by querying the query history table:

SELECT * FROM information_schema.queries WHERE id=313689\G

The query will return:

*************************** 1. row ***************************
            ID: 313689
          TYPE: ASYNC
          CZAR: proxy
       CZAR_ID: 9
        STATUS: COMPLETED
     SUBMITTED: 2025-02-06 08:58:18
     COMPLETED: 2025-02-06 08:58:21
      RETURNED: NULL
        CHUNKS: 1477
         BYTES: 13856
ROWS_COLLECTED: 1477
          ROWS: 1
           DBS: dp02_dc2_catalogs
         QUERY: SELECT COUNT(*) FROM dp02_dc2_catalogs.Object WHERE coord_ra !=0
         ERROR:

Particularly interesting columns here are:

STATUS: the query status, which can be one of: EXECUTING, COMPLETED, FAILED, FAILED_LR, or ABORTED, where:
- EXECUTING means the query is still being processed
- COMPLETED means the query has been successfully completed
- FAILED means the query has failed during processing
- FAILED_LR means the query has failed after hitting the limit of the maximum size of the result set to be returned (the “Large Result” failure mode)
- ABORTED means the query has been cancelled by the user, or the query has been aborted after Qserv was restarted
ERROR: the error message if the query has failed during processing. This column will be empty if the query has been successfully completed.

Here is another example of a query that has failed with the “Large Result” error:

*************************** 1. row ***************************
            ID: 404591
          TYPE: ASYNC
          CZAR: proxy
       CZAR_ID: 9
        STATUS: FAILED_LR
     SUBMITTED: 2025-07-21 05:36:15
     COMPLETED: 2025-07-21 05:36:24
      RETURNED: NULL
        CHUNKS: 86
         BYTES: 860211304
ROWS_COLLECTED: 521359
          ROWS: 521359
           DBS: dp1
         QUERY: SELECT * FROM dp1.Source
         ERROR: MERGE_ERROR 1470 (QI=404591:81; cancelling the query, queryResult
                table result_404591 is too large at 571829258 bytes, max allowed
                size is 536870912 bytes) 2025-07-21T05:36:23+0000

Retrieving Results#

To retrieve the results of a query, use the following syntax:

SELECT * FROM qserv_result(<query-id>)

This will return the full results (columns and rows) of the original query corresponding to the provided identifier of the query.

For example, the following query will return the results of the query with jobId of 313689:

SELECT * FROM qserv_result(313689)
+-----------+
| COUNT(*)  |
+-----------+
| 278318452 |
+-----------+

The command may be called one time only. The query result table will be deleted after returning the result set. Any subsequent attempts to retrieve the results will return an error message:

SELECT * FROM qserv_result(313689)
ERROR 1146 (42S02) at line 1: Table 'qservResult.result_313689' doesn't exist

Deleting Results#

To delete the results of a query, use the following syntax:

CALL qserv_result_delete(<query-id>)

This will delete the result table corresponding to the provided identifier of the query.

Cancellation#

The general syntax for cancelling running queries is:

CANCEL <query-id>

The following example illustrates the technique for cancelling a query that is supposed to take a long time to complete (the query produces a very large result set):

SUBMIT SELECT objectId FROM dp02_dc2_catalogs.Object
+--------+---------------------+
| jobId  | resultLocation      |
+--------+---------------------+
| 311818 | table:result_311818 |
+--------+---------------------+

CANCEL 311816
+--------------------------------------+
| command                              |
+--------------------------------------+
| Trying to kill query: CANCEL 311818  |
+--------------------------------------+

Note that it’s okay to call the CANCEL command multiple times. If the query has already completed, the command will post the following error message:

CANCEL 311818
ERROR 4005 (Proxy) at line 1: KILL/CANCEL failed: Unknown or finished query ID: CANCEL 311818