VACUUM

Synopsis

VACUUM TABLE table_name [FULL [UNIFY]] [PARTITION partition_value] [AND WAIT]

Description

Big data systems usually use HDFS as the storage layer to achieve durability and transparent distribution and balancing of data among the nodes in the cluster. Because HDFS is an immutable file system, data cannot be edited in place; it can only be appended. To work with an immutable file system, file formats support data mutations by writing new files, and later merge them asynchronously in the background to maintain performance and avoid accumulating many small files.

For example, with the Hive connector you can update or delete rows of an ORC transactional table one at a time. However, every update generates new delta and delete_delta files in HDFS. VACUUM merges these small files into larger files, which improves parallelism and performance.
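
As a rough sketch of that workflow, the statements below modify a transactional table and then compact it. The table name orders_acid and its columns are hypothetical, and the sketch assumes the table already exists as an ORC transactional table in the Hive connector.

```sql
-- Hypothetical table: orders_acid (orderkey, status), assumed to be an ORC transactional table.
-- Each mutation below is expected to add delta / delete_delta files in HDFS.
UPDATE orders_acid SET status = 'SHIPPED' WHERE orderkey = 1001;
DELETE FROM orders_acid WHERE orderkey = 1002;

-- Merge the accumulated small delta files and wait for the merge to finish.
VACUUM TABLE orders_acid AND WAIT;
```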

Types of VACUUMs:

Default

A default vacuum is the first level of merging for a table's small data sets. It is typically run frequently and usually completes faster than a FULL vacuum.

Hive:

Default vacuum corresponds to 'Minor Compaction' in the Hive connector. It merges all valid delta directories into one compacted delta directory and, similarly, merges all valid delete_delta directories into one delete_delta directory. The base file is not changed. The old, smaller delta files are removed once all readers have finished reading them.

FULL

A FULL vacuum is the next level of merging and covers all data sets of the table. It is run less frequently and takes longer to complete than a default vacuum.

FULL UNIFY

The UNIFY option combines the multiple bucket files of each partition into a single bucket file with bucket number zero.

Hive:

FULL vacuum corresponds to 'Major Compaction' in the Hive connector. It merges all base and delta files together. As part of this operation, deleted or updated rows are permanently removed, and all aborted transactions are removed from the transaction table in the metastore. The old delta files are removed once all readers have finished reading them.

The FULL keyword indicates that a major compaction should be started. Without it, a minor compaction is performed.

Use the PARTITION clause to specify which partition to vacuum.

Use AND WAIT to run the vacuum synchronously. Without this option, it runs asynchronously.
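
The clauses can be combined in the order given in the Synopsis. The following is a minimal sketch that reuses the table name and partition value from the Examples section purely for illustration:

```sql
-- Minor compaction of a single partition, asynchronous (no AND WAIT).
VACUUM TABLE compact_test_table_with_partition PARTITION 'partition_key=p1';

-- Major compaction of the same partition, synchronous.
VACUUM TABLE compact_test_table_with_partition FULL PARTITION 'partition_key=p1' AND WAIT;
```

The asynchronous form suits routine background maintenance, while AND WAIT is useful when a later step depends on the compaction having finished.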

Examples

Example 1: Default vacuum and wait for completion:

VACUUM TABLE compact_test_table AND WAIT;

Example 2: FULL vacuum on partition 'partition_key=p1':

VACUUM TABLE compact_test_table_with_partition FULL PARTITION 'partition_key=p1';

Example 3: FULL vacuum and wait for completion:

VACUUM TABLE compact_test_table_with_partition FULL AND WAIT;

Example 4: Unify all small files within one partition:

VACUUM TABLE catalog_sales FULL UNIFY PARTITION 'partition_key';

See Also

UPDATE, DELETE
