HyperLogLog Functions

openLooKeng implements the approx_distinct function using the HyperLogLog data structure.

Data Structures

openLooKeng implements HyperLogLog data sketches as a set of 32-bit buckets which store a maximum hash. They can be stored sparsely (as a map from bucket ID to bucket), or densely (as a contiguous memory block). The HyperLogLog data structure starts as the sparse representation, switching to dense when it is more efficient. The P4HyperLogLog structure is initialized densely and remains dense for its lifetime.

HyperLogLog implicitly casts to P4HyperLogLog, and a HyperLogLog can also be cast to P4HyperLogLog explicitly:

cast(hll AS P4HyperLogLog)
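As a minimal sketch, the explicit cast can be applied directly to the output of approx_set. The user_visits table and its visit_date and user_id columns are taken from the Serialization example on this page and are assumed to exist; the alias dense_hll is illustrative only.

```sql
-- Build one sketch per day and request the dense P4HyperLogLog representation.
-- user_visits(visit_date, user_id) is an assumed example table.
SELECT visit_date,
       cast(approx_set(user_id) AS P4HyperLogLog) AS dense_hll
FROM user_visits
GROUP BY visit_date;
```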

Serialization

Data sketches can be serialized to and deserialized from varbinary, which allows them to be stored for later use. Because multiple sketches can also be merged, one can compute approx_distinct for each partition of the data once, store the resulting sketches, and then obtain the value for the whole data set at very little cost by merging them.

For example, calculating the HyperLogLog for daily unique users will allow weekly or monthly unique users to be calculated incrementally by combining the dailies. This is similar to computing weekly revenue by summing daily revenue. Uses of approx_distinct with GROUPING SETS can be converted to use HyperLogLog. Examples:

CREATE TABLE visit_summaries (
  visit_date date,
  hll varbinary
);

INSERT INTO visit_summaries
SELECT visit_date, cast(approx_set(user_id) AS varbinary)
FROM user_visits
GROUP BY visit_date;

SELECT cardinality(merge(cast(hll AS HyperLogLog))) AS weekly_unique_users
FROM visit_summaries
WHERE visit_date >= current_date - interval '7' day;
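The same stored sketches can be rolled up at a coarser grain without rescanning user_visits. The following is a sketch under the assumption that visit_summaries has been populated as shown above; the aliases visit_month and monthly_unique_users are illustrative.

```sql
-- Merge the stored daily sketches into one estimate per calendar month.
SELECT date_trunc('month', visit_date) AS visit_month,
       cardinality(merge(cast(hll AS HyperLogLog))) AS monthly_unique_users
FROM visit_summaries
GROUP BY date_trunc('month', visit_date);
```

Each additional grouping level only has to merge the daily sketches again, which is why a query that calls approx_distinct under several GROUPING SETS can be rewritten to build the sketches once and merge them per level.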

Functions

approx_set(x) -> HyperLogLog

Returns the HyperLogLog sketch of the input data set of x. This data sketch underlies approx_distinct and can be stored and used later by calling cardinality().

cardinality(hll) -> bigint

Returns the approximate number of distinct values summarized by the hll HyperLogLog data sketch. The result is the same estimate that approx_distinct would produce on the original data.
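A small illustration of approx_set and cardinality used together on an inline data set. The values and the aliases t, x, and approx_distinct_x are made up for the example.

```sql
-- Sketch a tiny inline data set, then read back its estimated distinct count.
-- The input contains three distinct strings, with 'alice' and 'carol' repeated.
SELECT cardinality(approx_set(x)) AS approx_distinct_x
FROM (VALUES 'alice', 'bob', 'alice', 'carol', 'carol') AS t(x);
```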

empty_approx_set() -> HyperLogLog

Returns an empty HyperLogLog.
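One plausible use, shown here as a hedged sketch, is to substitute an empty sketch where no sketch exists, so that cardinality returns a count instead of NULL. The query assumes the visit_summaries table from the Serialization section; the two dates and the aliases d, s, and unique_users are hypothetical.

```sql
-- Dates with no row in visit_summaries produce a NULL sketch after the LEFT JOIN;
-- coalesce replaces that NULL with an empty HyperLogLog.
SELECT d.visit_date,
       cardinality(coalesce(cast(s.hll AS HyperLogLog), empty_approx_set())) AS unique_users
FROM (VALUES DATE '2024-01-01', DATE '2024-01-02') AS d(visit_date)
LEFT JOIN visit_summaries s
  ON s.visit_date = d.visit_date;
```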

merge(HyperLogLog) -> HyperLogLog

Aggregate function that returns the HyperLogLog representing the union of the individual hll HyperLogLog structures in the group.
