Benchmark Driver

The benchmark driver measures the performance of queries in an openLooKeng cluster. We use it to continuously measure the performance of trunk.

Download the appropriate version of the benchmark driver executable JAR file from Maven Central, for example presto-benchmark-driver-1.0.1-executable.jar. Rename it to presto-benchmark-driver, then make it executable with chmod +x. If the specific version is not available, use 1.0.1 instead.

Suites

Create a suite.json file:

{
    "file_formats": {
        "query": ["single_.*", "tpch_.*"],
        "schema": [ "tpch_sf(?<scale>.*)_(?<format>.*)_(?<compression>.*?)" ],
        "session": {}
    },
    "legacy_orc": {
        "query": ["single_.*", "tpch_.*"],
        "schema": [ "tpch_sf(?<scale>.*)_(?<format>orc)_(?<compression>.*?)" ],
        "session": {
            "hive.optimized_reader_enabled": "false"
        }
    }
}

This example contains two suites, file_formats and legacy_orc. The file_formats suite runs queries whose names match the regular expression single_.* or tpch_.* in all schemas matching the regular expression tpch_sf.*_.*_.*?. The legacy_orc suite adds a session property to disable the optimized ORC reader and only runs in schemas matching tpch_sf.*_orc_.*?.
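The selection rule described above can be sketched in Python. This is an illustration rather than the driver's implementation: it assumes a name must match a pattern in full, it uses the simplified patterns quoted in the text (without the named groups), and the query names tpch_q01 and adhoc_report and the schema tpch_sf100_text_none are hypothetical.

```python
import re

# Simplified patterns quoted in the text above (named groups dropped).
SUITES = {
    "file_formats": {
        "query": ["single_.*", "tpch_.*"],
        "schema": ["tpch_sf.*_.*_.*?"],
    },
    "legacy_orc": {
        "query": ["single_.*", "tpch_.*"],
        "schema": ["tpch_sf.*_orc_.*?"],
    },
}


def matches_any(patterns, name):
    # Assumption: the whole name must match, not just a substring of it.
    return any(re.fullmatch(p, name) for p in patterns)


def select(suite, queries, schemas):
    """Return the (queries, schemas) that a suite would run against."""
    picked_queries = [q for q in queries if matches_any(suite["query"], q)]
    picked_schemas = [s for s in schemas if matches_any(suite["schema"], s)]
    return picked_queries, picked_schemas


# Hypothetical inputs, for illustration only.
QUERIES = ["single_varchar", "tpch_q01", "adhoc_report"]
SCHEMAS = ["tpch_sf100_orc_none", "tpch_sf100_text_none"]

for suite_name, suite in SUITES.items():
    print(suite_name, select(suite, QUERIES, SCHEMAS))
```

With these inputs, file_formats selects both schemas, while legacy_orc selects only the ORC one; adhoc_report is skipped by both suites.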

Queries

The SQL files are contained in a directory named sql and must have the .sql file extension. The name of the query is the name of the file without the extension.
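As a small illustration of the naming rule, the following Python sketch derives a query name from a file path. The function name query_name is hypothetical, and returning None for files without the .sql extension is an assumption about how such files would be treated.

```python
from pathlib import PurePosixPath


def query_name(path):
    """Return the query name for a .sql file, or None for other files."""
    p = PurePosixPath(path)
    # The query name is the file name without its .sql extension.
    return p.stem if p.suffix == ".sql" else None


print(query_name("sql/single_varchar.sql"))
print(query_name("sql/README.txt"))
```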

Output

The benchmark driver measures the wall time, the total CPU time used by all openLooKeng processes, and the CPU time used by the query. For each timing, the driver reports the median, mean, and standard deviation of the query runs. The difference between the process and query CPU times is the query overhead, which normally comes from garbage collection. The following is the output from the file_formats suite above:

suite        query          compression format scale wallTimeP50 wallTimeMean wallTimeStd processCpuTimeP50 processCpuTimeMean processCpuTimeStd queryCpuTimeP50 queryCpuTimeMean queryCpuTimeStd
============ ============== =========== ====== ===== =========== ============ =========== ================= ================== ================= =============== ================ ===============
file_formats single_varchar none        orc    100   597         642          101         100840            97180              6373              98296           94610            6628
file_formats single_bigint  none        orc    100   238         242          12          33930             34050              697               32452           32417            460
file_formats single_varchar snappy      orc    100   530         525          14          99440             101320             7713              97317           99139            7682
file_formats single_bigint  snappy      orc    100   218         238          35          34650             34606              83                33198           33188            83
file_formats single_varchar zlib        orc    100   547         543          38          105680            103373             4038              103029          101021           3773
file_formats single_bigint  zlib        orc    100   282         269          23          38990             39030              282               37574           37496            156

Note that the above output has been reformatted for readability from the standard TSV that the driver outputs.
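To make the reported columns concrete, the following Python sketch computes the three statistics for a list of per-run timings and the overhead for one row of the table. It is not the driver's code: the sample timings are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant the driver reports.

```python
import statistics


def summarize(samples):
    """Median (P50), mean, and standard deviation of per-run timings."""
    return {
        "p50": statistics.median(samples),
        "mean": statistics.mean(samples),
        # Assumption: population standard deviation; the driver may differ.
        "std": statistics.pstdev(samples),
    }


def overhead(process_cpu_time, query_cpu_time):
    """Query overhead: process CPU time minus query CPU time."""
    return process_cpu_time - query_cpu_time


# Hypothetical wall times for three measurement runs.
print(summarize([10, 20, 30]))

# processCpuTimeP50 and queryCpuTimeP50 of the single_bigint / snappy row.
print(overhead(34650, 33198))
```

For the single_bigint query on the snappy schema, the median overhead is 34650 - 33198 = 1452, in the same unit as the table.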

The driver can add additional columns to the output by extracting values from the schema name or SQL files. In the suite file above, the schema patterns contain named regular expression capturing groups for compression, format, and scale. If we run the queries in a catalog containing the schemas tpch_sf100_orc_none, tpch_sf100_orc_snappy, and tpch_sf100_orc_zlib, we get the above output.
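The extraction of column values from a schema name can be illustrated with Python's re module. This is a sketch under two assumptions: the pattern must match the whole schema name, and the named groups are rewritten from the (?<name>...) form used in the suite file to Python's (?P<name>...) spelling. The function name extract_columns is hypothetical.

```python
import re

# Same pattern as the file_formats suite, in Python's named-group syntax.
SCHEMA_PATTERN = re.compile(
    r"tpch_sf(?P<scale>.*)_(?P<format>.*)_(?P<compression>.*?)"
)


def extract_columns(schema_name):
    """Return the named-group values for a schema, or None if it does not match."""
    match = SCHEMA_PATTERN.fullmatch(schema_name)
    return match.groupdict() if match else None


for schema in ["tpch_sf100_orc_none", "tpch_sf100_orc_snappy", "tpch_sf100_orc_zlib"]:
    print(schema, extract_columns(schema))
```

Each group name becomes an output column, which is where the compression, format, and scale columns in the table come from.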

Another way to create additional output columns is by adding tags to the SQL files. For example, the following SQL file declares two tags, projection and filter:

projection=true
filter=false
=================
SELECT SUM(LENGTH(comment))
FROM lineitem

This will cause the driver to output these values for each run of this query.
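The tag header can be parsed with a few lines of Python. This sketch is only an illustration of the file layout shown above, not the driver's parser. It assumes that the separator is a line consisting solely of = characters, that each line before it is a key=value pair, and that a file without a separator is plain SQL with no tags. The function name parse_query_file is hypothetical.

```python
def parse_query_file(text):
    """Split a query file into (tags, sql)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Assumption: the separator is a line made only of '=' characters.
        if len(stripped) >= 3 and set(stripped) == {"="}:
            tags = {}
            for tag_line in lines[:index]:
                if tag_line.strip():
                    key, _, value = tag_line.partition("=")
                    tags[key.strip()] = value.strip()
            sql = "\n".join(lines[index + 1:]).strip()
            return tags, sql
    # No separator found: treat the whole file as SQL without tags.
    return {}, text.strip()


EXAMPLE = """projection=true
filter=false
=================
SELECT SUM(LENGTH(comment))
FROM lineitem
"""

tags, sql = parse_query_file(EXAMPLE)
print(tags)
print(sql)
```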

CLI Arguments

The presto-benchmark-driver program accepts many CLI arguments to control which suites and queries to run, the number of warm-up runs, and the number of measurement runs. All of the command-line arguments can be listed with the --help option.
