Table Statistics

openLooKeng supports statistics-based optimizations for queries. For a query to take advantage of these optimizations, openLooKeng must have statistical information for the tables in that query.

Table statistics are provided to the query planner by connectors. Currently, the only connector that supports statistics is the Hive connector.

Table Layouts

  • Statistics are exposed to the query planner through a table layout. A table layout represents a subset of a table’s data and contains information about the organizational properties of that data (such as sort order and bucketing).

    The number of table layouts available for a table and the details of those table layouts are specific to each connector. Using the Hive connector as an example:

    • Non-partitioned tables have exactly one table layout, which represents all data in the table.
    • Partitioned tables have a family of table layouts. Each set of partitions to be scanned represents one table layout. openLooKeng tries to pick the table layout with the smallest number of partitions, based on the filter predicates in the query.
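To make the partitioned case concrete, the following Python sketch models a partitioned table as a list of partitions and a table layout as the subset of partitions that survive a filter on the partition column. It is a toy model under stated assumptions, not openLooKeng's connector API: `Partition`, `TableLayout`, `pick_layout`, and the `ds` partition column are hypothetical names, and the filter is a plain Python function on the partition key rather than a real SQL predicate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Partition:
    """One Hive-style partition (hypothetical model), identified by its partition key value."""
    key: str        # value of a hypothetical `ds` partition column
    row_count: int  # rows stored in this partition


@dataclass
class TableLayout:
    """A subset of the table's data: the partitions that must be scanned."""
    partitions: List[Partition]

    @property
    def row_count(self) -> int:
        # Statistics belong to the layout, so the row count covers only its partitions.
        return sum(p.row_count for p in self.partitions)


def pick_layout(partitions: List[Partition], predicate: Callable[[str], bool]) -> TableLayout:
    """Keep only the partitions whose key can satisfy the filter (partition pruning)."""
    return TableLayout([p for p in partitions if predicate(p.key)])


partitions = [
    Partition("2024-01-01", 1000),
    Partition("2024-01-02", 1500),
    Partition("2024-01-03", 500),
]

# Models a query filter such as: WHERE ds >= '2024-01-02'
layout = pick_layout(partitions, lambda ds: ds >= "2024-01-02")
print([p.key for p in layout.partitions], layout.row_count)  # ['2024-01-02', '2024-01-03'] 2000
```

Passing a predicate that accepts every key yields a layout covering the whole table, which is analogous to the single layout of a non-partitioned table.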

Available Statistics

The following statistics are available in openLooKeng:

  • For a table:
    • row count: the total number of rows in the table layout
  • For each column in a table:
    • data size: the size of the data that needs to be read
    • nulls fraction: the fraction of null values
    • distinct value count: the number of distinct values
    • low value: the smallest value in the column
    • high value: the largest value in the column

The set of statistics available for a particular query depends on the connector being used and can also vary by table or even by table layout. For example, the Hive connector does not currently provide statistics on data size.
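The following Python sketch computes the per-column statistics listed above for a single in-memory column, together with the row count of the data it came from. It is an illustration only, assuming the data fits in a Python list with `None` standing in for SQL NULL; `ColumnStatistics` and `column_statistics` are hypothetical names, not openLooKeng classes. Every field is optional so that a statistic a connector does not supply, such as data size for the Hive connector, can be left as `None`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ColumnStatistics:
    """Hypothetical container for per-column statistics; None means 'not provided'."""
    data_size: Optional[float] = None
    nulls_fraction: Optional[float] = None
    distinct_values_count: Optional[int] = None
    low_value: Any = None
    high_value: Any = None


def column_statistics(values: List[Any]) -> ColumnStatistics:
    """Compute the column statistics from an in-memory column (None = NULL)."""
    non_null = [v for v in values if v is not None]
    return ColumnStatistics(
        data_size=None,  # deliberately omitted, like a connector that does not report it
        nulls_fraction=(len(values) - len(non_null)) / len(values) if values else None,
        distinct_values_count=len(set(non_null)),  # NULLs are not counted as a distinct value
        low_value=min(non_null) if non_null else None,
        high_value=max(non_null) if non_null else None,
    )


prices = [10, 25, None, 10, 40, None, 25, 5]
row_count = len(prices)  # table-level statistic: total number of rows
stats = column_statistics(prices)
print(row_count, stats)
```

For this sample, two of the eight values are NULL, so the nulls fraction is 0.25, and the four distinct non-null values range from 5 to 40.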

Table statistics can be displayed via the openLooKeng SQL interface using the SHOW STATS command. For the Hive connector, refer to the Hive Connector documentation to learn how to update table statistics.
