Cost based optimizations

openLooKeng supports several cost based optimizations, described below.

Join Enumeration

The order in which joins are executed in a query can have a significant impact on the query's performance. The aspect of join ordering that has the largest impact on performance is the size of the data being processed and transferred over the network. If a join that produces a lot of data is performed early in the execution, then subsequent stages will need to process large amounts of data for longer than necessary, increasing the time and resources needed for the query.

With cost based join enumeration, openLooKeng uses /optimizer/statistics provided by connectors to estimate the costs for different join orders and automatically pick the join order with the lowest computed costs.

The join enumeration strategy is governed by the join_reordering_strategy session property, with the optimizer.join-reordering-strategy configuration property providing the default value.

The valid values are:

  • AUTOMATIC (default) - full automatic join enumeration enabled
  • ELIMINATE_CROSS_JOINS - eliminate unnecessary cross joins
  • NONE - purely syntactic join order

If using AUTOMATIC and statistics are not available, or if for any other reason a cost could not be computed, the ELIMINATE_CROSS_JOINS strategy is used instead.

Join Distribution Selection

openLooKeng uses a hash based join algorithm. That implies that for each join operator a hash table must be created from one join input (called build side). The other input (probe side) is then iterated and for each row the hash table is queried to find matching rows.

There are two types of join distributions:

  • Partitioned: each node participating in the query builds a hash table from only fraction of the data
  • Broadcast: each node participating in the query builds a hash table from all of the data (data is replicated to each node)

Each type have their trade offs. Partitioned joins require redistributing both tables using a hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. In particular, broadcast joins will be faster if the build side is much smaller than the probe side. However, broadcast joins require that the tables on the build side of the join after filtering fit in memory on each node, whereas distributed joins only need to fit in distributed memory across all nodes.

With cost based join distribution selection, openLooKeng automatically chooses to use a partitioned or broadcast join. With cost based join enumeration, openLooKeng automatically chooses which side is the probe and which is the build.

The join distribution strategy is governed by the join_distribution_type session property, with the join-distribution-type configuration property providing the default value.

The valid values are:

  • AUTOMATIC (default) - join distribution type is determined automatically for each join
  • BROADCAST - broadcast join distribution is used for all joins
  • PARTITIONED - partitioned join distribution is used for all join

Dynamic Filter Creation

Dynamic filters are created for all JoinNode at the planning phase. However, due to the mechanism of dynamic filtering, under different scenarios, dynamic filters are harming the performance. The waiting for the build side of JoinNode that has high selectivity on the deep-subtrees’ dynamic filtering does not help the performance, while dynamic filtering on low selectivity hurts the performance. To accommodate both cases, openLooKeng can optimize the dynamic filter creation based on this characteristic.

The dynamic filter creation is governed by the optimize_dynamic_filter_generation session property.

The valid values are:

  • true (default) - enable dynamic filter creation
  • false - disable dynamic filter creation

Connector Implementations

In order for the openLooKeng optimizer to use the cost based strategies, the connector implementation must provide statistics.

有奖捉虫

“有虫”文档片段

0/500

存在的问题

文档存在风险与错误

● 拼写,格式,无效链接等错误;

● 技术原理、功能、规格等描述和软件不一致,存在错误;

● 原理图、架构图等存在错误;

● 版本号不匹配:文档版本或内容描述和实际软件不一致;

● 对重要数据或系统存在风险的操作,缺少安全提示;

● 排版不美观,影响阅读;

内容描述不清晰

● 描述存在歧义;

● 图形、表格、文字等晦涩难懂;

● 逻辑不清晰,该分类、分项、分步骤的没有给出;

内容获取有困难

● 很难通过搜索引擎,openLooKeng官网,相关博客找到所需内容;

示例代码有错误

● 命令、命令参数等错误;

● 命令无法执行或无法完成对应功能;

内容有缺失

● 关键步骤错误或缺失,无法指导用户完成任务,比如安装、配置、部署等;

● 逻辑不清晰,该分类、分项、分步骤的没有给出

● 图形、表格、文字等晦涩难懂

● 缺少必要的前提条件、注意事项等;

● 描述存在歧义

0/500

您对文档的总体满意度

非常不满意
非常满意

请问是什么原因让您参与到这个问题中

您的邮箱

创Issue赢奖品
根据您的反馈,会自动生成issue模板。您只需点击按钮,创建issue即可。
有奖捉虫