Audit Log openLooKeng audit logging functionality is a custom event listener, which monitors the start and stop of openLooKeng cluster and the dynamic addition and deletion of nodes in the cluster; Listen to WebUi user login and exit events; Listen for query events and call when the query is created and completed (success or failure). An audit log contains the following information: time when an event occurs user ID address or identifier of the access initiator event type (operation) name of accessed resource result of an event In an openLooKeng cluster, only a single event listener plugin can be active at a time.

Distributed sort Distributed sort allows to sort data which exceeds query.max-memory-per-node. Distributed sort is enabled via distributed_sort session property or distributed-sort configuration property set in etc/ of the coordinator. Distributed sort is enabled by default. When distributed sort is enabled, sort operator executes in parallel on multiple nodes in the cluster. Partially sorted data from each openLooKeng worker node is then streamed to a single worker node for a final merge.

Dynamic Catalog This section introduces the dynamic catalog feature of openLooKeng. Normally openLooKeng admins add data source to the engine by putting a catalog profile (e.g. in the connector directory (etc/catalog). Whenever there is a requirement to add, update or delete a catalog, all the coordinators and workers need to be restarted. In order to dynamically change the catalogs on-the-fly, openLooKeng introduced dynamic catalog feature.The principle of a dynamic catalog is to manage the catalog-related configuration files on a shared file system and then synchronize all coordinator and worker nodes from the shared file system to local and load them.

Dynamic Filtering This section describes the openLooKeng dynamic filtering features. Dynamic filtering is suitable for highly selective join scenarios, i.e., most of the tables on the probe side are filtered out after reading because they do not match the join conditions. openLooKeng generates dynamic filter conditions based on join conditions and data read from build side during query run, and is applied to the table scan stage of probe side table as an additional filter condition, to reduce the data volume of probe table participating in join operation and effectively reduce IO read and network transmission.

Extension Physical Execution Planner This section describes how to add an extension physical execution planner in openLooKeng. With the extension physical execution planner, openLooKeng can utilize other operator acceleration libraries to speed up the execution of SQL statements. Configuration To enable extension physical execution feature, the following configs must be added in extension_execution_planner_enabled=true extension_execution_planner_jar_path=file:///xxPath/omni-openLooKeng-adapter-1.6.1-SNAPSHOT.jar extension_execution_planner_class_path=nova.hetu.olk.OmniLocalExecutionPlanner The above attributes are described below: extension_execution_planner_enabled: Enable extension physical execution feature. extension_execution_planner_jar_path: Set the file path of the extension physical execution jar package.

Function Namespace Managers Warn This feature is developing now. So some interfaces and configuration may be changed in the next version. Introduction Function Namespace Managers support storing external function, and the external functions which register from connectors will be stored in it. A function namespace is in the format of catalog.schema(For example:mysqlfun.default). It is only a schema for storing function, but not for storing table and view. Every function in openLooKeng, no matter built in function or external function, belongs to a function namespace catalog.

Horizontal Scaling Use cases Auto or manual scaling Nodes can be added to or removed from openLooKeng clusters dynamically, to support scaling scenarios. While it’s the responsibility of the Resource Provider to determine when to scale and, in the case of scaling-in, which nodes to remove, openLooKeng ensures that these changes function properly. In particular, during a scale-in, removal of a node will not impact workloads on that node. For auto-scaling scenarios, the resource provider may want to base the decision on:

Meta Store This section describes the openLooKeng meta store. Meta store is used to store metadata. Meta store supports to store metadata in the RDBMS or HDFS. Configuring Meta Store Create the meta store property file etc/ first. RDBMS Storage Add the following contents in hetu.metastore.type=jdbc hetu.metastore.db.url=jdbc:mysql://.... hetu.metastore.db.user=root hetu.metastore.db.password=123456 hetu.metastore.cache.type=local The above properties are described below: hetu.metastore.type:The type of meta store, set jdbc to use RDBMS storage. hetu.metastore.db.url:URL of RDBMS to connect to.

JDBC Data Source Multi-Split Management Overview This function applies to JDBC data sources. Data tables to be read are divided into multiple splits, and multiple worker nodes in the cluster simultaneously read the splits to accelerate data reading. Properties Multi-split management is based on connectors. For a data table with this function enabled, add the following attributes to the configuration file of the connector to which the data table belong. For example, the configuration file corresponding to the mysql connector is etc/mysql.

Properties Reference This section describes the most important config properties that may be used to tune openLooKeng or alter its behavior when required. General Properties join-distribution-type Type: string Allowed values: AUTOMATIC, PARTITIONED, BROADCAST Default value: AUTOMATIC The type of distributed join to use. When set to PARTITIONED, openLooKeng will use hash distributed joins. When set to BROADCAST, it will broadcast the right table to all nodes in the cluster that have data from the left table.

Reliable Query Execution Overview When a node in a cluster fails as result of network, hardware, or software issues, all queries with tasks running on the failing node will be lost. This can significantly impact cluster productivity and waste precious resources, especially for long running queries. One way to overcome this is to automatically rerun those impacted queries. This reduces the need for human intervention and increases fault tolerance, but as a result, the total execution time can be much longer.

Resource Groups Resource groups place limits on resource usage, and can enforce queueing policies on queries that run within them or divide their resources among sub-groups. A query belongs to a single resource group, and consumes resources from that group (and its ancestors). Except for the limit on queued queries, when a resource group runs out of a resource it does not cause running queries to fail; instead new queries become queued.

Session Property Managers Administrators can add session properties to control the behavior for subsets of their workload. These properties are defaults and can be overridden by users (if authorized to do so). Session properties can be used to control resource usage, enable or disable features, and change query characteristics. Session property managers are pluggable. Add an etc/ file with the following contents to enable the built-in manager that reads a JSON config file:

Spill to Disk Overview In the case of memory intensive operations, openLooKeng allows offloading intermediate operation results to disk. The goal of this mechanism is to enable execution of queries that require amounts of memory exceeding per query or per node limits. The mechanism is similar to OS level page swapping. However, it is implemented on the application level to address specific needs of openLooKeng. Properties related to spilling are described in tuning-spilling.

State Store This section describes the openLooKeng state store. State store is used to store states that are shared between state store members and state store clients. State store cluster is composed of state store members, and state store clients can do all state store operations without being a member of cluster. It is highly recommended that configure coordinators as state store members and workers as state store clients. Usage State store is currently used by HA and dynamic filters features.

Tuning openLooKeng The default openLooKeng settings should work well for most workloads. The following information may help you if your cluster is facing a specific performance problem. Config Properties See Properties Reference. JVM Settings The following can be helpful for diagnosing GC issues: -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCCause -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:+PrintClassHistogramAfterFullGC -XX:+PrintClassHistogramBeforeFullGC -XX:PrintFLSStatistics=2 -XX:+PrintAdaptiveSizePolicy -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

Web Interface openLooKeng provides a web interface for monitoring and managing queries. The web interface is accessible on the openLooKeng coordinator via HTTP, using the HTTP port number specified in the coordinator config_properties. The main page has a list of queries along with information like unique query ID, query text, query state, percentage completed, username and source from which this query originated. The currently running queries are at the top of the page, followed by the most recently completed or failed queries.

Carbondata Connector Overview The Carbondata connector allows querying data stored in a Carbondata warehouse. Carbondata is a combination of three components: Data files in carbondata storage formats that are typically stored in the Hadoop Distributed File System (HDFS). This metadata is only for table and column schema validation. carbondata metadata is stored along with the data files and is accessed via the Hive Metastore Service(HMS). A query language called HiveQL/SparkSQL.

ClickHouse Connector Overview The ClickHouse connector allows querying on an external ClickHouse database. This can be used to join data between different systems like ClickHouse and Hive, or between two different ClickHouse instances. Configuration Basic configuration To configure the ClickHouse connector, create a catalog properties file inetc/catalog named, for example,, to mount the ClickHouse connector as the clickhouse catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup.

Data Center Connector The Data Center connector allows querying a remote openLooKeng data center. This can be used to join data between different openLooKeng clusters from the local openLooKeng environment. Local DC Connector Configuration To configure the Data Center connector, create a catalog properties file in etc/catalog named, for example, <dc-name>.properties, to mount the Data Center connector as the <dc-name> catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Elasticsearch Connector Overview The Elasticsearch Connector allows access to Elasticsearch data from openLooKeng. This document describes how to setup the Elasticsearch Connector to run SQL queries against Elasticsearch. Note It is highly recommended to use Elasticsearch 6.0.0 or later. Configuration To configure the Elasticsearch connector, create a catalog properties file etc/catalog/ with the following contents, replacing the properties as appropriate: elasticsearch.port=9200 elasticsearch.default-schema-name=default Configuration Properties The following configuration properties are available:

Greenplum Connector The Greenplum connector allows querying and creating tables in an external Greenplum database. This can be used to join data between different systems like Greenplum and Hive, or between two different Greenplum instances. The Greenplum connector is extended from PostgreSQL connector, and We enable query push down feature for Greenplum connector. Configuration The base configuration of Greenplum connector is the same as PostgreSQL connector. For example, you can mount the Greenplum connector by creating a file named greenplum.

Hana Connector Overview The Hana connector allows querying and creating tables on an external Hana database. This can be used to join data between different systems like Hana and Hive, or between two different Hana instances. Configurations Basic configuration First of all, we should finish the following steps before you start to use hana connector. JDBC Connection details to connect to the SAP HANA It should be written in form of a regular openLooKeng connector config (eg.

HBase Connector Overview The HBase Connector allows querying and creating tables on an external Apache HBase instance. Users can create a table in HBase connector, mapping it to an existing table in HBase Cluster, and support insert, select, or delete. The HBase Connector maintains a Metastore to persist HBase metadata, currently support Metastore: openLooKeng Metastore. Note: Apache HBase 2.2.3 version or ealier versions are supported in HBase connector Connector Configuration 1 .

Hive Connector GCS Tutorial Preliminary Steps Ensure Access to GCS Access to Cloud Storage data is possible thanks to Hadoop Cloud Storage connector. If your data is publicly available, you do not need to do anything here. However, in most cases data is not publicly available, and the openLooKeng cluster needs to have access to it. This is typically achieved by creating a service account which has permissions to access your data.

Hive Security Configuration Authorization You can enable authorization checks for the hive by setting the property in the Hive catalog properties file. This property must be one of the following values: Property Value Description legacy (default value) Few authorization checks are enforced, thus allowing most operations. The config properties hive.allow-drop-table, hive.allow-rename-table, hive.allow-add-column, hive.allow-drop-column and hive.allow-rename-column are used. read-only Operations that read data or metadata, such as SELECT, are permitted, but none of the operations that write data or metadata, such as CREATE, INSERT or DELETE, are allowed.

Hive Connector Overview The Hive connector allows querying data stored in a Hive data warehouse. Hive is a combination of three components: Data files in varying formats that are typically stored in the Hadoop Distributed File System (HDFS) or in Amazon S3. Metadata about how the data files are mapped to schemas and tables. This metadata is stored in a database such as MySQL and is accessed via the Hive metastore service.

Hudi Connector Release Notes Currently Hudi only supports version 0.7.0. Hudi Introduction Apache Hudi is a fast growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi enables storing vast amounts of data on top of existing DFS compatible storage while also enabling stream processing in addition to typical batch-processing. This is made possible by providing two new primitives. Specifically, Update/Delete Records: Hudi provides support for updating/deleting records, using fine grained file/record level indexes, while providing transactional guarantees for the write operation.

JMX Connector The JMX connector provides the ability to query JMX information from all nodes in a openLooKeng cluster. This is very useful for monitoring or debugging. Java Management Extensions (JMX) provides information about the Java Virtual Machine and all of the software running inside it. openLooKeng itself is heavily instrumented via JMX. This connector can also be configured so that chosen JMX information will be periodically dumped and stored in memory for later access.

Kafka Connector Tutorial Introduction The Kafka Connector for openLooKeng allows access to live topic data from Apache Kafka using openLooKeng. This tutorial shows how to set up topics and how to create the topic description files that back openLooKeng tables. Installation This tutorial assumes familiarity with openLooKeng and a working local openLooKeng installation (see Deploying openLooKeng Manually. It will focus on setting up Apache Kafka and integrating it with openLooKeng.

Kafka Connector Overview This connector allows the use of Apache Kafka topics as tables in openLooKeng. Each message is presented as a row in openLooKeng. Topics can be live: rows will appear as data arrives and disappear as segments get dropped. This can result in strange behavior if accessing the same table multiple times in a single query (e.g., performing a self join). Note The minimum supported Kafka broker version is 0.

Kylin Connector The Kylin connector allows querying tables in an external Kylin database. This can be used to join data between different systems like Kylin and Hive, or between two different Kylin instances. Configuration To configure the Kylin connector, create a catalog properties file inetc/catalog named, for example,, to mount the Kylin connector as the Kylin catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Local File Connector The local file connector allows querying data stored on the local file system of each worker. Configuration To configure the local file connector, create a catalog properties file under etc/catalog named, for example, with the following contents: Configuration Properties Property Name Description presto-logs.http-request-log.location Directory or file where HTTP request logs are written presto-logs.http-request-log.pattern If the log location is a directory this glob is used to match file names in the directory Local File Connector Schemas and Tables The local file connector provides a single schema named logs.

Memory Connector The Memory Connector stores data and metadata in RAM on workers to allow for fast queries. Data and metadata are spilled to local disk and automatically reloaded if nodes are restarted. Configuration Memory Connector Configuration To configure the Memory Connector, create or modify the catalog properties file etc/catalog/ for the Memory Connector. For example, you can write: memory.max-data-per-node=200GB memory.spill-path=/opt/hetu/data/spill Other Required Configurations This section will cover other configurations required to use Memory Connector.

MongoDB Connector The MongoDB connector allows MongoDB collections to be used as tables in the openLooKeng. Note: MongoDB 2.6 and later versions are supported, you are advised to use version 3.0 or later. Configuration To configure the MongoDB connector, create a catalog property file etc/catalog/ by referring to the following content and replace the properties as required: mongodb.seeds=host1,host:port Multiple MongoDB Clusters Multiple catalogs can be created as required. Therefore, if there is an additional MongoDB cluster, you only need to add another property file with a different name to etc/catalog (ensure that it ends with .

MySQL Connector The MySQL connector allows querying and creating tables in an external MySQL database. This can be used to join data between different systems like MySQL and Hive, or between two different MySQL instances. Configuration To configure the MySQL connector, create a catalog properties file inetc/catalog named, for example,, to mount the MySQL connector as the mysql catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

OmniData Connector Overview The OmniData connector allows querying data stored in the remote Hive data warehouse. It pushes the operators of openLooKeng down to the storage node to achieve near-data calculation, thereby reducing the amount of network transmission data and improving computing performance. For more information, please see: OmniData and OmniData connector. Supported File Types The following file types are supported for the OmniData connector: ORC Parquet Text Configuration Create etc/catalog/omnidata.

openGauss Connector The openGauss connector allows querying and creating tables in an external openGauss database. This can be used to join data between different systems like openGauss and Hive, or between two different openGauss instances. Configuration To configure the openGauss connector, create a catalog properties file in etc/catalog named, for example,, to mount the openGauss connector as the opengauss catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Oracle Connector Overview The Oracle connector allows querying and creating tables in an external Oracle database. It can be used to join data between different databases, such as Oracle and Hive, or two different Oracle instances. Configuration Basic Configuration Before using the Oracle connector, you should prepare: JDBC connection details for connecting to the Oracle database The details should be written in a regular openLooKeng connector configuration (for example, the openLooKeng catalog named oracle uses oracle.

PostgreSQL Connector The PostgreSQL connector allows querying and creating tables in an external PostgreSQL database. This can be used to join data between different systems like PostgreSQL and Hive, or between two different PostgreSQL instances. Configuration To configure the PostgreSQL connector, create a catalog properties file in etc/catalog named, for example,, to mount the PostgreSQL connector as the postgresql catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Redis Connector Overview this connector allows the use of Redis key/value pair is presented as a single row in openLooKeng. Note In Redis,key/value pair can only be mapped to string or hash value types.keys can be stored in a zset,then keys can split into multiple slice Support Redis 2.8.0 or higher Configuration To configure the Redis connector, create a catalog properties file etc/catalog/ with the following contents, replacing the properties as appropriate:

SQL Server Connector The SQL Server connector allows querying and creating tables in an external SQL Server database. This can be used to join data between different systems like SQL Server and Hive, or between two different SQL Server instances. Configuration To configure the SQL Server connector, create a catalog properties file in etc/catalog named, for example,, to mount the SQL Server connector as the sqlserver catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

System Connector The System connector provides information and metrics about the currently running openLooKeng cluster. It makes this available via normal SQL queries. Configuration The System connector doesn't need to be configured: it is automatically available via a catalog named system. Using the System Connector List the available system schemas: SHOW SCHEMAS FROM system; List the tables in one of the schemas: SHOW TABLES FROM system.runtime; Query one of the tables:

Thrift Connector The Thrift connector makes it possible to integrate with external storage systems without a custom openLooKeng connector implementation. In order to use the Thrift connector with an external system, you need to implement the PrestoThriftService interface, found below. Next, you configure the Thrift connector to point to a set of machines, called Thrift servers, that implement the interface. As part of the interface implementation, the Thrift servers will provide metadata, splits and data.

TPCDS Connector The TPCDS connector provides a set of schemas to support the TPC Benchmark™ DS (TPC-DS). TPC-DS is a database benchmark used to measure the performance of complex decision support databases. This connector can also be used to test the capabilities and query syntax of openLooKeng without configuring access to an external data source. When you query a TPCDS schema, the connector generates the data on the fly using a deterministic algorithm.

TPCH Connector The TPCH connector provides a set of schemas to support the TPC Benchmark™ H (TPC-H). TPC-H is a database benchmark used to measure the performance of highly-complex decision support databases. This connector can also be used to test the capabilities and query syntax of openLooKeng without configuring access to an external data source. When you query a TPCH schema, the connector generates the data on the fly using a deterministic algorithm.

VDM Connector There are use cases that multiple data sources need to be managed and visited together in one single session or view. Also, users may not care about the distribution and source of data at all. The VDM (Virtualize Data Market) connector is aimed at bringing in this feature to openLooKeng. The VDM connector supports to: Create, update and delete views that combines multiple catalogs Visit real data through the views Manage user privilege through the views Log the use of VDM views by each user Usage VDM uses openLooKeng metastore to store its database information.

Connectors Connectors are the source of all data for queries in openLooKeng. Even if your data source doesn't have underlying tables backing it, as long as you adapt your data source to the API expected by openLooKeng, you can write queries against this data. ConnectorFactory Instances of your connector are created by a ConnectorFactory instance which is created when openLooKeng calls getConnectorFactory() on the plugin. The connector factory is a simple interface responsible for creating an instance of a Connector object that returns instances of the following services:

Event Listener openLooKeng supports custom event listeners that are invoked for the following events: Query creation Query completion (success or failure) Split completion (success or failure) This functionality enables development of custom logging, debugging and performance analysis plugins. In a openLooKeng cluster, only a single event listener plugin can be active at a time . Implementation EventListenerFactory is responsible for creating an EventListener instance. It also defines an EventListener name which is used by the administrator in a openLooKeng configuration.

Example HTTP Connector The Example HTTP connector has a simple goal: it reads comma-separated data over HTTP. For example, if you have a large amount of data in a CSV format, you can point the example HTTP connector at this data and write a SQL query to process it. Code The Example HTTP connector can be found in the presto-example-http directory in the root of the openLooKeng source tree. Plugin Implementation The plugin implementation in the Example HTTP connector looks very similar to other plugin implementations.

External Function Registration and Push Down Introduction The connector can register external function into openLooKeng. In Jdbc connector, openLooKeng can push them down to data source which support to execute those functions. Function Registration in Connector The connector can register external function into openLooKeng. The user can find the example code about how to register external function through connector in presto-mysql/src/main/java/io.prestosql/plugin/mysql/. It is an example implement in mysql connector. There are two steps to register external functions through a connector.

Filesystem Access Utilities Overview openLooKeng project includes a set of filesystem client utilities to help access and modifying files. Currently, two sets of filesystems are supported: HDFS and local filesystem. A HetuFileSystemClient interface is provided in the SPI, which defines the common file operations to be used in the project. The goal of this client is to provide unified interface, behaviors and exceptions across different filesystems. Therefore client code can easily reuse codes and transfer their logic without having to change the code.

Functions Plugin Implementation The function framework is used to implement SQL functions. openLooKeng includes a number of built-in functions. In order to implement new functions, you can write a plugin that returns one more more functions from getFunctions(): public class ExampleFunctionsPlugin implements Plugin { @Override public Set<Class<?>> getFunctions() { return ImmutableSet.<Class<?>>builder() .add(ExampleNullFunction.class) .add(IsNullFunction.class) .add(IsEqualOrNullFunction.class) .add(ExampleStringFunction.class) .add(ExampleAverageFunction.class) .build(); } } Note that the ImmutableSet class is a utility class from Guava. The getFunctions() method contains all of the classes for the functions that we will implement below in this tutorial.

Getting Started Requirements Mac OS X or Linux Java 8 Update 161 or higher (8u161+), 64-bit. Both Oracle JDK and OpenJDK are supported. AArch64 (Bisheng JDK 1.8.262 or higher) Maven 3.3.9+ (for building) Python 2.4+ (for running with the launcher script) Building openLooKeng Core openLooKeng Core is a standard Maven project. Simply run the following command from the project root directory: ./mvnw clean install On the first build, Maven will download all the dependencies from the internet and cache them in the local repository (~/.

ORC Cache ORC Cache feature improves the query performance by caching frequently accessed data. ORC Cache reduces time spent on TableScan operation because the network IO This in turn reduces the query latency. This feature is most beneficial for caching raw data from tables that are most frequently accessed and not co-located with the openLooKeng deployment. If enabled, workers automatically cache file tail, stripe footer, row index, bloom index of all ORC files because they are small.

Password Authenticator openLooKeng supports authentication with a username and password via a custom password authenticator that validates the credentials and creates a principal. Implementation PasswordAuthenticatorFactory is responsible for creating a PasswordAuthenticator instance. It also defines the name of this authenticator which is used by the administrator in a openLooKeng configuration. PasswordAuthenticator contains a single method, createAuthenticatedPrincipal(), that validates the credential and returns a Principal, which is then authorized by the system-access-control.

SPI Overview When you implement a new openLooKeng plugin, you implement interfaces and override methods defined by the SPI. Plugins can provide additional Connectors, Types, Functions and System Access Control. In particular, connectors are the source of all data for queries in openLooKeng: they back each catalog available to openLooKeng. Code The SPI source can be found in the presto-spi directory in the root of the openLooKeng source tree. Plugin Metadata Each plugin identifies an entry point: an implementation of the Plugininterface.

System Access Control openLooKeng separates the concept of the principal who authenticates to the coordinator from the username that is responsible for running queries. When running the openLooKeng CLI, for example, the openLooKeng username can be specified using the --user option. By default, the openLooKeng coordinator allows any principal to run queries as any openLooKeng user. In a secure environment, this is probably not desirable behavior and likely requires customization.

Types The Type interface in openLooKeng is used to implement a type in the SQL language. openLooKeng ships with a number of built-in types, like VarcharType and BigintType. The ParametricType interface is used to provide type parameters for types, to allow types like VARCHAR(10) or DECIMAL(22, 5). A Plugin can provide new Type objects by returning them from getTypes() and new ParametricType objects by returning them from getParametricTypes(). Below is a high level overview of the Type interface, for more details see the JavaDocs for Type.

openLooKeng FAQs Common Questions What license is used by the openLooKeng? The openLooKeng uses Apache License 2.0. Does the openLooKeng support x86 and ARM? What OSs are supported? The openLooKeng supports both x86 and ARM. Currently, openLooKeng can run on Linux OSs, such as openEuler, CentOS, and openSUSE. What should I do if I find a vulnerability in the openLooKeng?

Aggregate Functions Aggregate functions operate on a set of values to compute a single result. Except for count, count_if, max_by, min_by andapprox_distinct, all of these aggregate functions ignore null values and return null for no input rows or when all values are null. For example, sum returns null rather than zero and avg does not include null values in the count. The coalesce function can be used to convert null into zero.

Array Functions and Operators Subscript Operator: [] The [] operator is used to access an element of an array and is indexed starting from one: SELECT my_array[1] AS first_element Concatenation Operator: || The || operator is used to concatenate an array with an array or an element of the same type: SELECT ARRAY [1] || ARRAY [2]; -- [1, 2] SELECT ARRAY [1] || 2; -- [1, 2] SELECT 2 || ARRAY [1]; -- [2, 1] Array Functions array_distinct(x) -> array

Binary Functions and Operators Binary Operators The || operator performs concatenation. Binary Functions length(binary) -> bigint Returns the length of binary in bytes. concat(binary1, ..., binaryN) -> varbinary Returns the concatenation of binary1, binary2, ..., binaryN. This function provides the same functionality as the SQL-standard concatenation operator (||). substr(binary, start) -> varbinary Returns the rest of binary from the starting position start, measured in bytes. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

Bitwise Functions bit_count(x, bits) -> bigint Count the number of bits set in x (treated as bits-bit signed integer) in 2's complement representation: SELECT bit_count(9, 64); -- 2 SELECT bit_count(9, 8); -- 2 SELECT bit_count(-7, 64); -- 62 SELECT bit_count(-7, 8); -- 6 bitwise_and(x, y) -> bigint Returns the bitwise AND of x and y in 2's complement representation. bitwise_not(x) -> bigint Returns the bitwise NOT of x in 2's complement representation.

Color Functions bar(x, width) -> varchar Renders a single bar in an ANSI bar chart using a default low_color of red and a high_color of green. For example, if x of 25% and width of 40 are passed to this function. A 10-character red bar will be drawn followed by 30 spaces to create a bar of 40 characters. bar(x, width, low_color, high_color) -> varchar Renders a single line in an ANSI bar chart of the specified width.

Comparison Functions and Operators Comparison Operators Operator Description < Less than > Greater than <= Less than or equal to >= Greater than or equal to = Equal <> Not equal != Not equal (non-standard but popular syntax) Range Operator: BETWEEN The BETWEEN operator tests if a value is within a specified range.

Conditional Expressions CASE The standard SQL CASE expression has two forms. The "simple" form searches each value expression from left to right until it finds one that equals expression: CASE expression WHEN value THEN result [ WHEN ... ] [ ELSE result ] END The result for the matching value is returned. If no match is found, the result from the ELSE clause is returned if it exists, otherwise null is returned.

Conversion Functions openLooKeng will implicitly convert numeric and character values to the correct type if such a conversion is possible. For any other types, by default, openLooKeng will not convert it implicitly. For example, a query that expects a varchar will not automatically convert a bigint value to an equivalent varchar. When necessary, values can be explicitly cast to a particular type. Or, you can enable the implicit conversion functionality, then openLooKeng will try to auto apply conversion between source type and target type.

Date and Time Functions and Operators Date and Time Operators Operator Example Result + date '2012-08-08' + interval '2' day 2012-08-10 + time '01:00' + interval '3' hour 04:00:00.000 + timestamp '2012-08-08 01:00' + interval '29' hour 2012-08-09 06:00:00.000 + timestamp '2012-10-31 01:00' + interval '1' month 2012-11-30 01:00:00.000 + interval '2' day + interval '3' hour 2 03:00:00.

Decimal Functions and Operators Decimal Literals Use DECIMAL 'xxxxxxx.yyyyyyy' syntax to define literal of DECIMAL type. The precision of DECIMAL type for literal will be equal to number of digits in literal (including trailing and leading zeros). The scale will be equal to number of digits in fractional part (including trailing zeros). Example literal Data type DECIMAL '0' DECIMAL(1) DECIMAL '12345' DECIMAL(5) DECIMAL '0000012345.

Geospatial Functions openLooKeng Geospatial functions that begin with the ST_ prefix support the SQL/MM specification and are compliant with the Open Geospatial Consortium’s (OGC) OpenGIS Specifications. As such, many openLooKeng Geospatial functions require, or more accurately, assume that geometries that are operated on are both simple and valid. For example, it does not make sense to calculate the area of a polygon that has a hole defined outside of the polygon, or to construct a polygon from a non-simple boundary line.

HyperLogLog Functions openLooKeng implements the approx_distinct function using the HyperLogLog data structure. Data Structures openLooKeng implements HyperLogLog data sketches as a set of 32-bit buckets which store a maximum hash. They can be stored sparsely (as a map from bucket ID to bucket), or densely (as a contiguous memory block). The HyperLogLog data structure starts as the sparse representation, switching to dense when it is more efficient. The P4HyperLogLog structure is initialized densely and remains dense for its lifetime.

JSON Functions and Operators Cast to JSON Casting from BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE or VARCHAR is supported. Casting from ARRAY, MAP or ROW is supported when the element type of the array is one of the supported types, or when the key type of the map is VARCHAR and value type of the map is one of the supported types, or when every field type of the row is one of the supported types.

Lambda Expressions Lambda expressions are written with ->: x -> x + 1 (x, y) -> x + y x -> regexp_like(x, 'a+') x -> x[1] / x[2] x -> IF(x > 0, x, -x) x -> COALESCE(x, 0) x -> CAST(x AS JSON) x -> x + TRY(1 / 0) Most SQL expressions can be used in a lambda body, with a few exceptions: Subqueries are not supported.

Logical Operators Logical Operators Operator Description Example AND True if both values are true a AND b OR True if either value is true a OR b NOT True if the value is false NOT a Effect of NULL on Logical Operators The result of an AND comparison may be NULL if one or both sides of the expression are NULL.

Map Functions and Operators Subscript Operator: [] The [] operator is used to retrieve the value corresponding to a given key from a map: SELECT name_to_age_map['Bob'] AS bob_age; Map Functions cardinality(x) -> bigint Returns the cardinality (size) of the map x. element_at(map(K,V), key) -> V Returns value for given key, or NULL if the key is not contained in the map. map() -> map<unknown, unknown> Returns an empty map. :

Mathematical Functions and Operators Mathematical Operators Operator Description + Addition - Subtraction * Multiplication / Division (integer division performs truncation) % Modulus (remainder) Mathematical Functions abs(x) -> [same as input] Returns the absolute value of x. cbrt(x) -> double Returns the cube root of x. ceil(x) -> [same as input] This is an alias for ceiling.

Quantile Digest Functions openLooKeng implements the approx_percentile function with the quantile digest data structure. The underlying data structure, qdigest, is exposed as a data type in openLooKeng, and can be created, queried and stored separately from approx_percentile. Data Structures A quantile digest is a data sketch which stores approximate percentile information. The openLooKeng type for this data structure is called qdigest, and it takes a parameter which must be one of bigint, double or real which represent the set of numbers that may be ingested by the qdigest.

Regular Expression Functions All of the regular expression functions use the Java pattern syntax, with a few notable exceptions: When using multi-line mode (enabled via the (?m) flag), only \n is recognized as a line terminator. Additionally, the (?d) flag is not supported and must not be used. Case-insensitive matching (enabled via the (?i) flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported.

Session Information Functions providing information about the query execution environment. current_user -> varchar Returns the current user running the query. Note This is part of the SQL standard and does not use parenthesis.

String Functions and Operators String Operators The || operator performs concatenation. String Functions Note These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8. Invalid UTF-8 data can be corrected with from_utf8. Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.

Teradata Functions These functions provide compatibility with Teradata SQL. String Functions char2hexint(string) -> varchar Returns the hexadecimal representation of the UTF-16BE encoding of the string. index(string, substring) -> bigint Alias for strpos function. substring(string, start) -> varchar Alias for substr function. substring(string, start, length) -> varchar Alias for substr function. Date Functions The functions in this section use a format string that is compatible with the Teradata datetime functions. The following table, based on the Teradata reference manual, describes the supported format specifiers:

URL Functions Extraction Functions The URL extraction functions extract components from HTTP URLs (or any valid URIs conforming to 2396. The following syntax is supported: [protocol:][//host[:port]][path][?query][#fragment] The extracted components do not contain URI syntax separators such as : or ?. url_extract_fragment(url) -> varchar Returns the fragment identifier from url. url_extract_host(url) -> varchar Returns the host from url. url_extract_parameter(url, name -> varchar Returns the value of the first query string parameter named name from url.

UUID Functions uuid() -> uuid Returns a pseudo randomly generated UUID type (type 4).

Window Functions Window functions perform calculations across rows of the query result. They run after the HAVING clause but before the ORDER BY clause. Invoking a window function requires special syntax using the OVER clause to specify the window. A window has three components: The partition specification, which separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions.

BitmapIndex BitmapIndex utilizes Bitmaps to allow for early row filtering which can help reduce CPU and memory usage. This can be beneficial in high concurrency queries. BitmapIndex works well for columns with low cardinality (i.e. not many unique values) because the size of the index increases as the number of unique values in the column increases. For example, a column like gender will have a small size. Whereas a column like id will have an extremely large size (not recommended).

BloomIndex BloomIndex utilizes Bloom Filters to allow for filtering during scheduling and while reading data. BloomIndex works well for columns with high cardinality and index size is fairly small. A Bloom Filter is constructed using column values. Then during lookup, the Bloom Filter can tell us if a given value is not in the Bloom Filter. BloomIndex can only support equality expression, e.g. name='Monkey'. Use case(s) Note: Currently, Heuristic Index is only supports the Hive connector with tables using ORC storage format.

BTreeIndex BTreeIndex utilizes the B+Tree data structure to allow for filtering during scheduling. BTreeIndex is similar to BloomIndex and works well for columns with high cardinality. However, index size is can be large because the size of the index increases as the number of unique values in the column increases. A B+Tree is constructed using the unique column values as keys and the values are where the column value can be found.

Usage Index can be managed using any of the supported clients, such as hetu-cli located under the bin directory in the installation. CREATE To create an index you can run sql statements of the form: CREATE INDEX [ IF NOT EXISTS ] index_name USING [ BITMAP | BLOOM | BTREE | MINMAX ] ON tbl_name (col_name) WITH ( 'level' = ['STRIPE', 'PARTITION'], "autoload" = true, "bloom.fpp" = '0.001', "bloom.mmapEnabled" = false, [, …] ) WHERE predicate; WHERE predicate can be used to create index on select partition(s) WITH can be used to specify index properties or index level.

MinMaxIndex MinMaxIndex simply keeps tracks of the largest and smallest value. The size of the index is extremely small. However, this index will only be useful if the table is sorted on the indexed column. Use case(s) Note: Currently, Heuristic Index only supports the Hive connector with tables using ORC storage format. MinMaxIndex is used on coordinator for filtering splits during scheduling. Selecting column for MinMaxIndex Queries that have a filter predicate on a column on which data is sorted can benefit from MinMaxIndex.

Adding your own Index Type Basic ideas ID Each index type must have an ID, which is a unique identifier and the canonical name of an index type used in CLI, server config, file path, etc. The getID() method in the Index interface returns the ID of this index type to any infrastructure that uses it. Level A heuristic index stores additional and usually partial information of a dataset in a more compact way to speed up lookups in various ways.

openLooKeng Heuristic Index Introduction Indexes can be created using one or more columns of a database table, providing faster random lookups. Most Big Data formats such as ORC, Parquet and CarbonData already have indices embedded in them. The Heuristic Indexer allows creating indexes on existing data but stores the index external to the original data source. This provides several benefits: The index is agnostic to the underlying data source and can be used by any query engine Existing data can be indexed without having to rewrite the existing data files New index types not supported by the underlying data source can be created Index data does not use the storage space of the data source Use case(s) Note: Currently, Heuristic Index only supports the Hive connector with tables using ORC storage format.

Benchmark Driver The benchmark driver can be used to measure the performance of queries in a openLooKeng cluster. We use it to continuously measure the performance of trunk. Download the appropriate version of the benchmark driver executable jar file from Maven Central, for example presto-benchmark-driver-1.0.1-executable.jar, rename it to presto-benchmark-driver, then make it executable with chmod +x. If the specific version is not available, use 1.0.1 instead. Suites Create a suite.json file:

Command Line Interface The openLooKeng CLI provides a terminal-based interactive shell for running queries. The CLI is a runnable JAR file, so it can be run as java -jar ./hetu-cli-*.jar. Download CLI jar file corresponding to the server’s version, e.g. hetu-cli-1.0.0-executable.jar, and run: java -jar ./hetu-cli-1.0.0-executable.jar --server localhost:8080 --catalog hive --schema default Run the CLI with the --help option to see the available options. By default, the results of queries are paginated using the less program which is configured with a carefully selected set of options.

Deploying openLooKeng Automatically In addition to the manual deployment of openLooKeng Sever, you can follow below guide to complete the deployment faster and easier. The script is friendly to most of Linux OS. However, to Ubuntu, you need to manually install the following dependencies: sshpass1.06 or above Deploying openLooKeng on a Single Node Executing the below command can help you download the necessary packages and deploy openLooKeng server in one-click:

Deploying openLooKeng with High Availability(HA) The openLooKeng HA solves single point failure of coordinators. Users can submit queries to any coordinator to balance the workloads. Installing HA openLooKeng with HA is required to be installed with minimum of 2 coordinators in the cluster. Make sure the time on all coordinators have been sync up. Please follow Deploying openLooKeng Manually or Deploying openLooKeng Automatically for basic installation. Configuring HA ###Configuring Coordinator and Worker Add following configuration to etc/config.

Deploying openLooKeng Manually This is a Manual deployment method, you can also use automatic deployment via script. (see Deploying openLooKeng Automatically) Installing openLooKeng Download the openLooKeng server tarball, and unpack it. The tarball will contain a single top-level directory, which we will call the installation directory. openLooKeng needs a data directory for storing logs, etc. We recommend creating a data directory outside of the installation directory, which allows it to be easily preserved when upgrading openLooKeng.

JDBC Driver openLooKeng can be accessed from Java using the JDBC driver. Add the jar to the classpath of your Java application. Starting from version 1.0.1, the driver is also available from Maven Central. Specify an appropriate version number: <dependency> <groupId>io.hetu.core</groupId> <artifactId>hetu-jdbc</artifactId> <version>1.0.1</version> </dependency> Driver Name The driver class name is io.hetu.core.jdbc.OpenLooKengDriver. Most users will not need this information as drivers are loaded automatically. Connection The following JDBC URL formats are supported:

openLooKeng ODBC User Manual Overview Introduction This user manual contains information about the openLooKeng ODBC driver for Windows, including driver installation, ODBC data source configuration, and basic driver information. Open Database Connectivity (ODBC) is an interoperable interface protocol proposed by Microsoft for applications to access different DBMSs. It defines a universal database access mechanism and provides a set of ODBC APIs for accessing databases to simplify the interoperability between clients and different DBMSs.

Web Connector for Tableau The openLooKeng web connector for Tableau lets users run queries from Tableau against openLooKeng. It implements the functions in the Tableau web connector API. When creating a new web data source, Tableau will ask for the URL of the web connector. Use the following URL, replacing with the hostname and port number of the openLooKeng coordinator (the default port is 8080): When Tableau first loads the openLooKeng web connector, it will render an HTML form.

openLooKeng Verifier The openLooKeng Verifier can be used to test openLooKeng against another database (such as MySQL), or to test two openLooKeng clusters against each other. We use it to continuously test trunk against the previous release while developing openLooKeng. Create a MySQL database with the following table and load it with the queries you would like to run: CREATE TABLE verifier_queries( id INT NOT NULL AUTO_INCREMENT, suite VARCHAR(256) NOT NULL, name VARCHAR(256), test_catalog VARCHAR(256) NOT NULL, test_schema VARCHAR(256) NOT NULL, test_prequeries TEXT, test_query TEXT NOT NULL, test_postqueries TEXT, test_username VARCHAR(256) NOT NULL default 'verifier-test', test_password VARCHAR(256), control_catalog VARCHAR(256) NOT NULL, control_schema VARCHAR(256) NOT NULL, control_prequeries TEXT, control_query TEXT NOT NULL, control_postqueries TEXT, control_username VARCHAR(256) NOT NULL default 'verifier-test', control_password VARCHAR(256), session_properties_json VARCHAR(2048), PRIMARY KEY (id) ); Next, create a properties file to configure the verifier:

Reserved Keywords The following table lists all of the keywords that are reserved in openLooKeng, along with their status in the SQL standard. These reserved keywords must be quoted (using double quotes) in order to be used as an identifier. Keyword SQL:2016 SQL-92 ‘AGGREGATIONS’ ALTER reserved reserved AND reserved reserved AS reserved reserved BETWEEN reserved reserved BY reserved reserved CASE reserved reserved CACHE CAST reserved reserved CONSTRAINT reserved reserved CREATE reserved reserved CROSS reserved reserved CUBE reserved CURRENT_DATE reserved reserved CURRENT_PATH reserved CURRENT_ROLE reserved reserved CURRENT_TIME reserved reserved CURRENT_TIMESTAMP reserved reserved CURRENT_USER reserved DEALLOCATE reserved reserved DELETE reserved reserved DESCRIBE reserved reserved DISTINCT reserved reserved DROP reserved reserved ELSE reserved reserved END reserved reserved ESCAPE reserved reserved EXCEPT reserved reserved EXECUTE reserved reserved EXISTS reserved reserved EXTRACT reserved reserved FALSE reserved reserved FOR reserved reserved FROM reserved reserved FULL reserved reserved GROUP reserved reserved GROUPING reserved HAVING reserved reserved IN reserved reserved INNER reserved reserved INSERT reserved reserved INTERSECT reserved reserved INTO reserved reserved IS reserved reserved JOIN reserved reserved LEFT reserved reserved LIKE reserved reserved LOCALTIME reserved LOCALTIMESTAMP reserved NATURAL reserved reserved NORMALIZE reserved NOT reserved reserved NULL reserved reserved ON reserved reserved OR reserved reserved ORDER reserved reserved OUTER reserved reserved OVERWRITE reserved PREPARE reserved reserved RECURSIVE reserved RIGHT reserved reserved ROLLUP reserved SELECT reserved reserved TABLE reserved reserved THEN reserved reserved TRUE reserved reserved UESCAPE reserved UNION reserved reserved UNNEST reserved UPDATE reserved USING reserved reserved VALUES reserved reserved VACUUM WHEN reserved reserved WHERE reserved reserved WITH reserved reserved

Data Types openLooKeng has a set of built-in data types, described below. Additional types can be provided by plugins. Note Connectors are not required to support all types. See Connectors for details on supported types. Boolean BOOLEAN This type captures boolean values true and false. Integer TINYINT A 8-bit signed two's complement integer with a minimum value of -2^7 and a maximum value of 2^7 - 1.

Migrating From Hive openLooKeng uses ANSI SQL syntax and semantics, whereas Hive uses a SQL-like language called HiveQL. Use subscript for accessing a dynamic index of an array instead of a udf The subscript operator in SQL supports full expressions, unlike Hive (which only supports constants). Therefore you can write queries like: SELECT my_array[CARDINALITY(my_array)] as last_element FROM ... Avoid out of bounds access of arrays Accessing out of bounds elements of an array will result in an exception.

Overall Design In openLooKeng, we support dynamically loading user-defined Hive scalar functions. Basically openLooKeng will load metadata of Hive functions, register them, convert parameters of the evaluate methods from Hive internal data types to openLooKeng internal data types, and then dynamically generate the functions. Configuration In order to dynamically load Hive functions, users should add function metadata into, with the format: function_name class_path. An example configuration in is presented as below: booleanudf io.

SQL Migration Tool SQL Migration tool helps user to transform SQL Syntax to ANSI 2003 SQL syntax. Currently, only Hive and Impala SQL syntax are supported. Usage of SQL Migration Tool Download hetu-sql-migration-cli-{version number}-executable.jar, rename it to openlk-sql-migration-cli, make it executable with chmod +x, then run it. Interactive mode This tool can be run in interactive mode. The example is like: ./openlk-sql-migration-cli --type hive lk:HIVE> lk:HIVE> INSERT INTO TABLE table1 VALUES(10, "NAME"); ==========converted result========== INSERT INTO table1 VALUES ROW (10, 'NAME') =================Success============= Parameter Description --type or -t The type of input SQL statement, such as hive, impala.

Cost based optimizations openLooKeng supports several cost based optimizations, described below. Join Enumeration The order in which joins are executed in a query can have a significant impact on the query's performance. The aspect of join ordering that has the largest impact on performance is the size of the data being processed and transferred over the network. If a join that produces a lot of data is performed early in the execution, then subsequent stages will need to process large amounts of data for longer than necessary, increasing the time and resources needed for the query.

Cost in EXPLAIN During planning, the cost associated with each node of the plan is computed based on the table statistics for the tables in the query. This calculated cost is printed as part of the output of an EXPLAIN statement. Cost information is displayed in the plan tree using the format {rows: XX (XX), cpu: XX, memory: XX, network: XX}. rows refers to the expected number of rows output by each plan node during execution.

Rule based optimizations openLooKeng supports one rule based optimization, described below. Join Conversion Inner-join is a mature operation with great efficiency in our newest version. To utilize the advantage, the semi-join queries can be converted to inner-join for better performance. With rule-based join conversion, openLooKeng automatically converts the semi-join query to inner-join with equal purpose. The join conversion is governed by the rewrite_filtering_semi_join_to_inner_join session property, with optimizer.rewrite-filtering-semi-join-to-inner-join configuration property providing the default value.

Table Statistics openLooKeng supports statistics based optimizations for queries. For a query to take advantage of these optimizations, openLooKeng must have statistical information for the tables in that query. Table statistics are provided to the query planner by connectors. Currently, the only connector that supports statistics is the hive connector. Table Layouts Statistics are exposed to the query planner by a table layout. A table layout represents a subset of a table’s data and contains information about the organizational properties of that data (like sort order and bucketing).

Table Pushdown Optimizer Rule openLooKeng supports Table Pushdown optimizer rule to improve latency for eligible queries. The user must execute the following command to generate stats for all the tables which are part of the SQL query so that the join sources are reordered based on the size of the tables- ANALYZE tableName; User Hint Comment- The user may also add a special comment in the following format to leverage the benefit from the Table Pushdown rule by specifying the table names and their distinct(primary key) column names.

openLooKeng Concepts Server Types There are two types of openLooKeng servers: coordinators and workers. The following section explains the difference between the two. Coordinator The openLooKeng coordinator is the server that is responsible for parsing statements, planning queries, and managing openLooKeng worker nodes. It is the "brain" of a openLooKeng installation and is also the node to which a client connects to submit statements for execution. Every openLooKeng installation must have a openLooKeng coordinator alongside one or more openLooKeng workers.

Use Cases Cross-Source Heterogeneous Query Scenario Data management systems like RDBMS (such as MySQL and Oracle) and NoSQL (such as HBase, ES, and Kafka) are widely used in various application systems of customers. With the increase of data volume and better data management, customers gradually build data warehouses based on Hive or MPPDB. These data storage systems are often isolated from each other, resulting in independent data silos. Data analysts often suffer from the following problems:

Join Query Support StarTree Cube can help optimize aggregation over join queries as well. The optimizer looks for aggregation subtree pattern in the logical plan that typically looks like following. AggregationNode |- ProjectNode[Optional] . |- ProjectNode[Optional] . . |- FilterNode[Optional] . . . |- JoinNode . . . . [More Joins] . . . . . . . . . |- ProjectNode[Optional] - Left . . . . . |- TableScanNode [Fact Table] .

StarTree Cube Introduction StarTree Cubes are materialized pre-aggregation results stored as tables. This technique is built to optimize low latency iceberg queries. Iceberg queries are a special case of SQL queries involving GROUP BY and HAVING clauses, wherein the answer set is small relative to the data scanned size. Queries can be characterized by their huge input-small output. This technique allows user to build Cubes on an existing table with aggregates and dimensions that are intended to optimize specific queries.

Usage Cubes can be managed using any of the supported clients, such as hetu-cli located under the bin directory in the installation. CREATE CUBE Synopsis CREATE CUBE [ IF NOT EXISTS ] cube_name ON table_name WITH ( AGGREGATIONS = ( expression [, ...] ), GROUP = ( column_name [, ...]) [, FILTER = (expression)] [, ( property_name = expression [, ...] ) ] ) [WHERE predicate] Description Create a new, empty Cube with the specified group and aggregations.

Release 0.1.0 (30 Jun 2020) Key Features Feature Description Adaptive Dynamic Filter The dynamic feature is enhanced so that in addition to bloom filter, hashsets can be used to store the build side values to filter out the probe side. The filters are stored in a distributed memory store so that they can be reused by subsequent queries without having to be rebuilt. Dynamically Add Catalog Add catalog REST api allows an administrator to add new catalogs of any connector during run time.

Release 1.0.0 (23 Sep 2020) Key Features This release focused on making improvements in 3 main areas: Performance, Security, and Usability. Performance New enhancements were made to the engine to further improve the performance of ad-hoc interactive queries. These enhancements include changes to the dynamic filter feature to use a more efficient implementation of bloom filters, as well as optimizing the dynamic filter source operator so that dynamic filters can be collected and used as soon as possible.

Release 1.0.1 (30 Sep 2020) Key Features This release is mainly for fixing version number issue and index-cli issues. Besides, it has enhanced the code to improve the security. Area Feature PR #s Index Merge the index-cli into hetu-cli, and introduce the basic access control to index command 283, 298 Security Handle CVEs 289,287,295,288,285,292,297,249,302 Obtaining the Document For details, see https://gitee.

Release 1.1.0 (30 Dec 2020) Key Features Area Feature PR #s Heuristic Index Btree index – BTree index is used for split filtering, and is used only by the coordinator nodes. If an index is created on a column which is part of a predicate in the query, then openLooKeng may be able to improve the performance of the query by filtering out splits during scheduling time.

Release 1.2.0 (31 Mar 2021) Key Features Area Feature PR #s Heuristic Index Several Hindex improvements:1. Preload index when server starts via config2. Clean up index record if index creation fails (previously, some records may be left behind and required manual deletion)3. Determine index creation level dynamically (previously, for BtreeIndex user had to specify level=table or level=partition, this is no longer required)4. Use HetuMetastore for index metadata management (removes file based metadata management)5.

Release 1.3.0 (30 Jun 2021) Key Features Area Feature PR #s Task Recovery Fixed a few important bugs and further improved the stability of this function. It now can work with spill-to-disk feature. 812,813,837,838,842,843,847,863,868,874,875,885,889,891,901,906,917,930,932 CTE (Common Table Expressions) Additional optimization on top of 1.2.0 CTE optimization. Added cost based decision to decide whether to enable CTE or not. Added support for pushdown of dynamic filters and predicates into CTE nodes.

Release 1.4.0 Key Features Area Feature Star Tree 1. Update APIs to support creating star tree cube for jdbc connectors2. Implemented support for Clickhouse connector, support for additional JDBC based connectors will be added in the future3. Bug fixes Heuristic Index 1. Added support for UPDATE INDEX (see docs for details)2. Added index size, memory usage and disk usage info in SHOW INDEX3.

Release 1.4.1 (12 Nov 2021) Key Features This release mainly adds the introduction of OmniData Connector and jdk8 support under arm architecture. Area Feature PR #s Data Source Connector openLooKeng supports offload operators to the near data side for process through omnidata connector, reducing the transmission of invalid data on the network and effectively improving the performance of big data computing. 1219 Arm architecture Eliminate the mandatory requirements for Java version under arm architecture caused by JDK paused problem, and support jdk1.

Release 1.5.0 Key Features Area Feature Star Tree 1. Support for optimizing join queries such as star schema queries.2. Optimized the query plan by eliminating unnecessary aggregations on top of the cube since the cube already contains the rolled up results. The performance optimizations benefits queries whose group by clause exactly matches the cube’s group.3. Bug fixes to further enhance the usability, and robustness of cubes.

Release 1.6.0 Key Features Area Feature Star Tree Support update cube command to allow admin to easily update an existing cube when the underlying data changes Bloom Index Hindex-Optimize Bloom Index Size-Reduce bloom index size by 10X+ times Task Recovery 1. Improve failure detection time: It need take 300s to determine a task is failed and resume after that. Improving this would improve the resume & also the overall query time2.

Release 1.6.1 (27 Apr 2022) Key Features This release is mainly about modification and enhancement of some SPIs, which are used in more scenarios. Area Feature PR #s Data Source statistics The method of obtaining statistics is added so that statistics can be directly obtained from the Connector. Some operators can be pushed down to the Connector for calculation. You may need to obtain statistics from the Connector to display the amount of processed data.

Release 1.7.0 Key Features Area Feature Operator processing extension Users can customize the physical execution plan of worker nodes. Users can use their own operator pipelines to replace the native implementation to accelerate operator processing. Task Recovery 1. Enables query recovery by restarting the query, without capturing snapshots during query execution 2. Support snapshot related information display in CLI debug mode: Number of snapshots being captured currently/ Number of snapshots captured already/ Number of times query recovered (either by resuming or restarting)/ Size of the snapshots captured and Time taken to capture snapshots/ Size of snapshots restored and time taken to restore 3.

Node Resource GET /v1/node Returns a list of nodes known to a openLooKeng Server. This call doesn't require a query parameter of headers, it simply returns an array with each known node in a openLooKeng installation. In the response, the values recentRequests, recentFailures, and recentSuccesses are decaying counters set to decay exponentially over time with a decay parameter of one minute. This decay rate means that if a openLooKeng node has 1000 successes in a few seconds, this statistic value will drop to 367 in one minute.

Query Resource The Query REST service is the most complex of the rest services. It contains detailed information about nodes, and other details that capture the state and history of a query being executed on a openLooKeng installation. GET /v1/query This service returns information and statistics about queries that are currently being executed on a openLooKeng coordinator. When you point a web browser at a openLooKeng coordinate you'll see a rendered version of the output from this service which will display recent queries that have executed on a openLooKeng installation.

Stage Resource GET /v1/stage Returns detail about a stage in a openLooKeng query. DELETE /v1/stage/{stageId} Deletes a stage in a openLooKeng query.

Statement Resource POST /v1/statement query query SQL Query to execute reqheader X-Presto-User User to execute statement on behalf of (optional) reqheader X-Presto-Source Source of query reqheader X-Presto-Catalog Catalog to execute query against reqheader X-Presto-Schema Schema to execute query against Submits a statement to openLooKeng for execution. The openLooKeng client executes queries on behalf of a user against a catalog and a schema.

Task Resource The Task resource provides a set of REST endpoints that give openLooKeng servers the ability to communicate about tasks and task output. This isn't a service that will be used by end users, but it supports the execution of queries on a openLooKeng installation. GET /v1/task Returns information about all tasks known to a openLooKeng server. Note that the output of a call to /v1/task can be quite large.

Built-in System Access Control A system access control plugin enforces authorization at a global level, before any connector level authorization. You can either use one of the built-in plugins in openLooKeng or provide your own by following the guidelines in System Access Control. openLooKeng offers three built-in plugins: Plugin Name Description allow-all (default value) All operations are permitted. read-only Operations that read data or metadata are permitted, but none of the operations that write data or metadata are allowed.

CLI Kerberos Authentication The openLooKeng Command Line Interface can connect to a openLooKeng coordinator that has Kerberos authentication enabled. Environment Configuration Kerberos Services You will need a Kerberos KDC running on a node that the client can reach over the network. The KDC is responsible for authenticating principals and issuing session keys that can be used with Kerberos-enabled services. KDCs typically run on port 88, which is the IANA-assigned port for Kerberos.

Hazelcast Security Hazelcast is embedded in the openLooKeng. When hazelcast is used in openLooKeng, in order to ensure the communication security between the client and the server of hazelcast, as well as between the members of each server, It is recommended to enable hazelcast authentication and SSL/TLS channel encryption. Hazelcast Authentication Hazelcast only supports Kerberos authentication. Because the hazelcast is used by state-store module, so user wants to enable Hazelcast authentication, the state-store has to enable first.

Secure Internal Communication The openLooKeng cluster can be configured to use secured communication. Communication between openLooKeng nodes can be secured with SSL/TLS. Internal SSL/TLS configuration SSL/TLS is configured in the file. The SSL/TLS on the worker and coordinator nodes are configured using the same set of properties. Every node in the cluster must be configured. Nodes that have not been configured, or are configured incorrectly, will not be able to communicate with other nodes in the cluster.

LDAP Authentication openLooKeng can be configured to enable frontend LDAP authentication over HTTPS for clients, such as the cli_ldap, or the JDBC and ODBC drivers. At present only simple LDAP authentication mechanism involving username and password is supported. The openLooKeng client sends a username and password to the coordinator and coordinator validates these credentials using an external LDAP service. To enable LDAP authentication for openLooKeng, configuration changes are made on the openLooKeng coordinator.

Password Encryption Overview openLooKeng manages configuration details in properties files of catalogs. These files may need to include values such as usernames, passwords and other strings, the password often required to be kept secret, that can’t be stored as plaintext. Here is a typical configuration file of a MySQL connector: connection-url=jdbc:mysql://localhost:3306 connection-user=root connection-password=123456 openLooKeng can be configured to enable password encryption, these passwords will be encrypted. Principle The asymmetric encryption algorithm (RSA) is used for encrypting password.

Ranger Access Control Overview Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a centralized platform to define, administer and manage security policies consistently across Hadoop components. Check Apache Ranger Wiki for detail introduction and user guide. openlookeng-ranger-plugin is developed based on Ranger 2.1.0, which is a ranger plugin for openLooKeng to enable, monitor and manage comprehensive data security. Build Process Check out openlookeng-ranger-plugin code from GIT repository

Coordinator Kerberos Authentication The openLooKeng coordinator can be configured to enable Kerberos authentication over HTTPS for clients, such as the openLooKeng CLI, or the JDBC and ODBC drivers. To enable Kerberos authentication for openLooKeng , configuration changes are made on the openLooKeng coordinator. No changes are required to the worker configuration; the worker nodes will continue to connect to the coordinator over unauthenticated HTTP. However, if you want to secure the communication between openLooKeng nodes with SSL/TLS, configure Secure Internal Communication.

Java Keystores and Truststores Java Keystore File for TLS Access to the openLooKeng coordinator must be through HTTPS when using Kerberos and LDAP authentication. The openLooKeng coordinator uses a Java Keystore <server_java_keystore> file for its TLS configuration. These keys are generated using keytool and stored in a Java Keystore file for the openLooKeng coordinator. The alias in the keytool command line should match the principal that the openLooKeng coordinator will use.

Authentication User Mapping Authentication user mapping defines rules for mapping from users in the authentication system to openLooKeng users. This mapping is particularly important for Kerberos or LDAP authentication where the user names are complex like alice@example or CN=Alice Smith, OU=Finance, O=Acme, C=US. Authentication user mapping can be configured with a simple regex extraction pattern, or more complex rules in a separate configuration file. Pattern Mapping Rule The pattern mapping rule maps the authentication user to the first matching group in the regular expression.

ALTER SCHEMA Synopsis ALTER {SCHEMA|DATABASE} name RENAME TO new_name Description Change the definition of an existing schema. Examples Rename schema web to traffic: ALTER SCHEMA web RENAME TO traffic ALTER DATABASE web RENAME TO traffic Limitations Some connectors do not support renaming schema, such as Hive Connector. See connector documentation for more details. See Also CREATE SCHEMA

ALTER TABLE Synopsis ALTER TABLE name RENAME TO new_name ALTER TABLE name ADD COLUMN column_name data_type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ] ALTER TABLE name DROP COLUMN column_name ALTER TABLE name RENAME COLUMN column_name TO new_column_name Description Change the definition of an existing table. Examples Rename table users to people: ALTER TABLE users RENAME TO people; Add column zip to the users table:

ANALYZE Synopsis ANALYZE table_name [ WITH ( property_name = expression [, ...] ) ] Description Collects table and column statistics for a given table. The optional WITH clause can be used to provide connector-specific properties. To list all available properties, run the following query: SELECT * FROM system.metadata.analyze_properties Currently, this statement is only supported by the Hive connector. Examples Analyze table web to collect table and column statistics: ANALYZE web; Analyze table stores in catalog hive and schema default:

CACHE TABLE Synopsis CACHE TABLE table_name WHERE condition; Description CACHE TABLE updates coordinator metadata with table name and partition that should be cached by the connector. condition must be provided and is defined on only partition column(s) at this time. Examples Cache all sales data for stores located in ‘CA’: CACHE TABLE store_sales WHERE location = 'CA'; Cache all sales data from stores sold after 20 Feb 2020: CACHE TABLE store_sales WHERE ss_sold_date_sk > 20200220; Cache data with complex predicate string:

CALL Synopsis CALL procedure_name ( [ name => ] expression [, ...] ) Description Call a procedure. Procedures can be provided by connectors to perform data manipulation or administrative tasks. For example, the /connector/system defines a procedure for killing a running query. Some connectors, such as the PostgreSQL Connector, are for systems that have their own stored procedures. These stored procedures are separate from the connector-defined procedures discussed here and thus are not directly callable via CALL.

COMMENT Synopsis COMMENT ON TABLE name IS 'comments' Description Set the comment for a table. The comment can be removed by setting the comment to NULL. Examples Change the comment for the users table to be master table: COMMENT ON TABLE users IS 'master table';

COMMIT Synopsis COMMIT [ WORK ] Description Commit the current transaction. Examples COMMIT; COMMIT WORK; See Also ROLLBACK, START TRANSACTION

CREATE ROLE Synopsis CREATE ROLE role_name [ WITH ADMIN ( user | USER user | ROLE role | CURRENT_USER | CURRENT_ROLE ) ] Description CREATE ROLE creates the specified role in the current catalog. The optional WITH ADMIN clause causes the role to be created with the specified user as a role admin. A role admin has permission to drop or grant a role. If the optional WITH ADMIN clause is not specified, the role is created with current user as admin.

CREATE SCHEMA Synopsis CREATE {SCHEMA|DATABASE} [ IF NOT EXISTS ] schema_name [ WITH ( property_name = expression [, ...] ) ] Description Create a new, empty schema. A schema is a container that holds tables, views and other database objects. The optional IF NOT EXISTS clause causes the error to be suppressed if the schema already exists. The optional WITH clause can be used to set properties on the newly created schema.

CREATE TABLE AS Synopsis CREATE TABLE [ IF NOT EXISTS ] table_name [ ( column_alias, ... ) ] [ COMMENT table_comment ] [ WITH ( property_name = expression [, ...] ) ] AS query [ WITH [ NO ] DATA ] Description Create a new table containing the result of a SELECT query. Use CREATE TABLE to create an empty table. The optional IF NOT EXISTS clause causes the error to be suppressed if the table already exists.

CREATE TABLE Synopsis CREATE TABLE [ IF NOT EXISTS ] table_name ( { column_name data_type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ] | LIKE existing_table_name [ { INCLUDING | EXCLUDING } PROPERTIES ] } [, ...] ) [ COMMENT table_comment ] [ WITH ( property_name = expression [, ...] ) ] Description Create a new, empty table with the specified columns. Use create-table-as to create a table with data.

CREATE VIEW Synopsis CREATE [ OR REPLACE ] VIEW view_name [ SECURITY { DEFINER | INVOKER } ] AS query Description Create a new view of a SELECT query. The view is a logical table that can be referenced by future queries. Views do not contain any data. Instead, the query stored by the view is executed every time the view is referenced by another query. The optional OR REPLACE clause causes the view to be replaced if it already exists rather than raising an error.

DEALLOCATE PREPARE Synopsis DEALLOCATE PREPARE statement_name Description Removes a statement with the name statement_name from the list of prepared statements in a session. Examples Deallocate a statement with the name my_query: DEALLOCATE PREPARE my_query; See Also PREPARE

DELETE Synopsis DELETE FROM table_name [ WHERE condition ] Description Delete rows from a table. If the WHERE clause is specified, only the matching rows are deleted. Otherwise, all rows from the table are deleted. Examples Delete all line items shipped by air: DELETE FROM lineitem WHERE shipmode = 'AIR'; Delete all line items for low priority orders: DELETE FROM lineitem WHERE orderkey IN (SELECT orderkey FROM orders WHERE priority = 'LOW'); Delete all orders:

DESCRIBE INPUT Synopsis DESCRIBE INPUT statement_name Description Lists the input parameters of a prepared statement along with the position and type of each parameter. Parameter types that cannot be determined will appear as unknown. Examples Prepare and describe a query with three parameters: PREPARE my_select1 FROM SELECT ? FROM nation WHERE regionkey = ? AND name < ?; DESCRIBE INPUT my_select1; Position | Type -------------------- 0 | unknown 1 | bigint 2 | varchar (3 rows) Prepare and describe a query with no parameters:

DESCRIBE OUTPUT Synopsis DESCRIBE OUTPUT statement_name Description List the output columns of a prepared statement, including the column name (or alias), catalog, schema, table, type, type size in bytes, and a boolean indicating if the column is aliased. Examples Prepare and describe a query with four output columns: PREPARE my_select1 FROM SELECT * FROM nation DESCRIBE OUTPUT my_select1; Column Name | Catalog | Schema | Table | Type | Type Size | Aliased -------------+---------+--------+--------+---------+-----------+--------- nationkey | tpch | sf1 | nation | bigint | 8 | false name | tpch | sf1 | nation | varchar | 0 | false regionkey | tpch | sf1 | nation | bigint | 8 | false comment | tpch | sf1 | nation | varchar | 0 | false (4 rows) Prepare and describe a query whose output columns are expressions:

DESCRIBE Synopsis DESCRIBE table_name Description DESCRIBE is an alias for SHOW COLUMNS

DROP CACHE Synopsis DROP CACHE table WHERE condition; Description DROP CACHE deletes cache metadata of the table from coordinator only. Workers' caches purged automatically by expiry time or by reaching size limit but recurring splits will not reuse any cached node assignments. Examples DROP CACHE command supports dropping specific cache records by matching the corresponding predicate string. For example, if SHOW CACHE command shows that there is a record with predicate string of sale_id = 24 under the table of sales, then running the following query can delete it without affecting any other cache record:

DROP ROLE Synopsis DROP ROLE role_name Description DROP ROLE drops the specified role in the current catalog. For DROP ROLE statement to succeed, the user executing it should possess admin privileges for the given role. Examples Drop role admin : DROP ROLE admin; Limitations Some connectors do not support role management. See connector documentation for more details. See Also CREATE ROLE, SET ROLE, GRANT ROLES, REVOKE ROLES

DROP SCHEMA Synopsis DROP {SCHEMA|DATABASE} [ IF EXISTS ] schema_name [{CASCADE | RESTRICT}] Description Drop an existing schema. The schema must be empty. The optional IF EXISTS clause causes the error to be suppressed if the schema does not exist. Examples Drop the schema web: DROP SCHEMA web DROP DATABASE web Drop the schema sales if it exists: DROP TABLE IF EXISTS sales Limitations Functionally, CASCADE and RESTRICT is not supported yet.

DROP TABLE Synopsis DROP TABLE [ IF EXISTS ] table_name Description Drop an existing table. The optional IF EXISTS clause causes the error to be suppressed if the table does not exist. Examples Drop the table orders_by_date: DROP TABLE orders_by_date Drop the table orders_by_date if it exists: DROP TABLE IF EXISTS orders_by_date See Also ALTER TABLE, CREATE TABLE

DROP VIEW Synopsis DROP VIEW [ IF EXISTS ] view_name Description Drop an existing view. The optional IF EXISTS clause causes the error to be suppressed if the view does not exist. Examples Drop the view orders_by_date: DROP VIEW orders_by_date Drop the view orders_by_date if it exists: DROP VIEW IF EXISTS orders_by_date See Also CREATE VIEW

EXECUTE Synopsis EXECUTE statement_name [ USING parameter1 [ , parameter2, ... ] ] Description Executes a prepared statement with the name statement_name. Parameter values are defined in the USING clause. Examples Prepare and execute a query with no parameters: PREPARE my_select1 FROM SELECT name FROM nation; EXECUTE my_select1; Prepare and execute a query with two parameters: PREPARE my_select2 FROM SELECT name FROM nation WHERE regionkey = ? and nationkey < ?

EXPLAIN ANALYZE Synopsis EXPLAIN ANALYZE [VERBOSE] statement Description Execute the statement and show the distributed execution plan of the statement along with the cost of each operation. The VERBOSE option will give more detailed information and low-level statistics; understanding these may require knowledge of openLooKeng internals and implementation details. Note The stats may not be entirely accurate, especially for queries that complete quickly. Examples In the example below, you can see the CPU time spent in each stage, as well as the relative cost of each plan node in the stage.

EXPLAIN Synopsis EXPLAIN [ ( option [, ...] ) ] statement where option can be one of: FORMAT { TEXT | GRAPHVIZ | JSON } TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO } Description Show the logical or distributed execution plan of a statement, or validate the statement. Use TYPE DISTRIBUTED option to display fragmented plan. Each plan fragment is executed by a single or multiple openLooKeng nodes. Fragments separation represent the data exchange between openLooKeng nodes.

GRANT ROLES Synopsis GRANT role [, ...] TO ( user | USER user | ROLE role) [, ...] [ GRANTED BY ( user | USER user | ROLE role | CURRENT_USER | CURRENT_ROLE ) ] [ WITH ADMIN OPTION ] Description Grants the specified role(s) to the specified principal(s) in the current catalog. If the WITH ADMIN OPTION clause is specified, the role(s) are granted to the users with GRANT option.

GRANT Synopsis GRANT ( privilege [, ...] | ( ALL PRIVILEGES ) ) ON [ TABLE ] table_name TO ( user | USER user | ROLE role ) [ WITH GRANT OPTION ] Description Grants the specified privileges to the specified grantee. Specifying ALL PRIVILEGES grants DELETE, INSERT and SELECT privileges. Specifying ROLE PUBLIC grants privileges to the PUBLIC role and hence to all users. The optional WITH GRANT OPTION clause allows the grantee to grant these same privileges to others.

INSERT OVERWRITE Synopsis INSERT OVERWRITE [TABLE] table_name [ ( column [, ... ] ) ] query Description Insert overwrite basically do two things: 1) drop the data rows according to the dataset created by the query. 2) insert the new data created by query. Insert overwrite can work on both partition and non-partition table, but the behaviors are different: If the table is non-partition table, the existing data will be all deleted directly, and then insert the new data.

INSERT Synopsis INSERT INTO table_name [ ( column [, ... ] ) ] query Description Insert new rows into a table. If the list of column names is specified, they must exactly match the list of columns produced by the query. Each column in the table not present in the column list will be filled with a null value. Otherwise, if the list of columns is not specified, the columns produced by the query must exactly match the columns in the table being inserted into.

JMX Synopsis SELECT * FROM jmx.current."table_name"; Description All JMX metrics are stored as tables in schema current of catalog jmx. table_name must be put in quotation mark. Users need to find the table corresponding to the jmx metrics first, and then run SELECT queries to view these metrics. Examples View row data cache stats: SELECT * FROM jmx.current."io.prestosql.orc:name=hive,type=rowdatacachestatslister"; View bloom filter cache stats: SELECT * FROM jmx.current."io.prestosql.orc:name=hive,type=bloomfiltercachestatslister";

PREPARE Synopsis PREPARE statement_name FROM statement Description Prepares a statement for execution at a later time. Prepared statements are queries that are saved in a session with a given name. The statement can include parameters in place of literals to be replaced at execution time. Parameters are represented by question marks. Examples Prepare a select query: PREPARE my_select1 FROM SELECT * FROM nation; Prepare a select query that includes parameters.

REFRESH META CACHE Synopsis REFRESH META CACHE [FOR CATALOG] Description REFRESH META CACHE refreshes the current catalog connector metadata cache. REFRESH META CACHE FOR CATALOG refreshes the specified catalog connector metadata cache. Examples Refresh the current catalog connector metadata cache REFRESH META CACHE Refresh the specified catalog connector metadata cache REFRESH META CACHE FOR catalog Limitations Only Hive connector support this functionality at this time.

RESET SESSION Synopsis RESET SESSION name RESET SESSION Description Reset a session property value to the default value. Examples RESET SESSION optimize_hash_generation; RESET SESSION hive.optimized_reader_enabled; See Also SET SESSION, SHOW SESSION

REVOKE ROLES Synopsis REVOKE [ ADMIN OPTION FOR ] role [, ...] FROM ( user | USER user | ROLE role) [, ...] [ GRANTED BY ( user | USER user | ROLE role | CURRENT_USER | CURRENT_ROLE ) ] Description Revokes the specified role(s) from the specified principal(s) in the current catalog. If the ADMIN OPTION FOR clause is specified, the GRANT permission is revoked instead of the role.

REVOKE Synopsis REVOKE [ GRANT OPTION FOR ] ( privilege [, ...] | ALL PRIVILEGES ) ON [ TABLE ] table_name FROM ( user | USER user | ROLE role ) Description Revokes the specified privileges from the specified grantee. Specifying ALL PRIVILEGES revokes DELETE, INSERT and SELECT privileges. Specifying ROLE PUBLIC revokes privileges from the PUBLIC role. Users will retain privileges assigned to them directly or via other roles.

ROLLBACK Synopsis ROLLBACK [ WORK ] Description Rollback the current transaction. Examples ROLLBACK; ROLLBACK WORK; See Also COMMIT, START TRANSACTION

SELECT Synopsis [ WITH with_query [, ...] ] SELECT [ ALL | DISTINCT ] select_expression [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ] [ HAVING condition] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ OFFSET count [ ROW | ROWS ] ] [ LIMIT { count | ALL } | FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] where from_item is one of

SET ROLE Synopsis SET ROLE ( role | ALL | NONE ) Description SET ROLE sets the enabled role for the current session in the current catalog. SET ROLE role enables a single specified role for the current session. For the SET ROLE role statement to succeed, the user executing it should have a grant for the given role. SET ROLE ALL enables all roles that the current user has been granted for the current session.

SET SESSION Synopsis SET SESSION name = expression SET SESSION = expression Description Set a session property value. Examples SET SESSION optimize_hash_generation = true; SET SESSION hive.optimized_reader_enabled = true; See Also RESET SESSION, SHOW SESSION

SHOW CACHE Synopsis SHOW CACHE; Description SHOW CACHE displays the Split cache coordinator metadata. Split cache contains information about table and partition information that are cached. Examples Show all cache metadata SHOW CACHE; Show cache metadata for sales table SHOW CACHE sales; Limitations Only Hive connector support this functionality at this time. See connector documentation for more details. See Also CACHE TABLE, DROP CACHE

SHOW CATALOGS Synopsis SHOW CATALOGS [ LIKE pattern ] Description List the available catalogs. The LIKE clause can be used to restrict the list of catalog names.

SHOW COLUMNS Synopsis SHOW COLUMNS FROM table Description List the columns in table along with their data type and other attributes.

SHOW CREATE CUBE Synopsis SHOW CREATE CUBE cube_name Description Show the SQL statement that creates the specified cube. Examples Create a cube orders_cube on orders table as follows CREATE CUBE orders_cube ON orders WITH (AGGREGATIONS = (avg(totalprice), sum(totalprice), count(*)), GROUP = (custKEY, ORDERkey), format= 'orc') Use SHOW CREATE CUBE command to show the SQL statement that was used to create the cube orders_cube: SHOW CREATE CUBE orders_cube; CREATE CUBE orders_cube ON orders WITH (AGGREGATIONS = (avg(totalprice), sum(totalprice), count(*)), GROUP = (custKEY, ORDERkey), format= 'orc')

SHOW CREATE TABLE Synopsis SHOW CREATE TABLE table_name Description Show the SQL statement that creates the specified table. Examples Show the SQL that can be run to create the orders table: SHOW CREATE TABLE sf1.orders; Create Table ----------------------------------------- CREATE TABLE tpch.sf1.orders ( orderkey bigint, orderstatus varchar, totalprice double, orderdate varchar ) WITH ( format = 'ORC', partitioned_by = ARRAY['orderdate'] ) (1 row) See Also CREATE TABLE

SHOW CREATE VIEW Synopsis SHOW CREATE VIEW view_name Description Show the SQL statement that creates the specified view. See Also CREATE VIEW

SHOW EXTERNAL FUNCTION Synopsis SHOW EXTERNAL FUNCTION function_name [ ( parameter_type[, ...] ) ] Description Show the external function information which have been registered in our system. If parameter type list is omitted, show one row for each signature with the given function_name. Example Show the external function information of example.default.format(double, integer): show external function example.default.format(double, integer); External Function | Argument Types --------------------------------------------------------------------------------------------------------------------------------------------+----------------- External FUNCTION example.default.format ( | double, integer k double, | h integer | ) | RETURNS varchar | | COMMENT 'format the number ''num'' to a format like''#,###,###.

SHOW FUNCTIONS Synopsis SHOW FUNCTIONS [ LIKE pattern [ ESCAPE 'escape_character' ] ] Description List all the functions available for use in queries. The LIKE clause can be used to restrict the list of function names. Note If you want to list the external functions, you need to set session’s list_built_in_functions_only property to false.

SHOW GRANTS Synopsis SHOW GRANTS [ ON [ TABLE ] table_name ] Description List the grants for the current user on the specified table in the current catalog. If no table name is specified, the command lists the grants for the current user on all the tables in all schemas of the current catalog. The command requires the current catalog to be set. Note Ensure that authentication has been enabled before running any of the authorization commands.

SHOW ROLE GRANTS Synopsis SHOW ROLE GRANTS [ FROM catalog ] Description List non-recursively the ROLEs that have been granted to the session user in catalog, or the current catalog if catalog is not specified.

SHOW ROLES Synopsis SHOW [CURRENT] ROLES [ FROM catalog ] Description SHOW ROLES lists all the roles in catalog or in the current catalog if catalog is not specified. SHOW CURRENT ROLES lists the enabled roles for the session in catalog, or in the current catalog if catalog is not specified.

SHOW SCHEMAS Synopsis SHOW SCHEMAS [ FROM catalog ] [ LIKE pattern ] Description List the schemas in catalog or in the current catalog. The LIKE clause can be used to restrict the list of schema names.

SHOW SESSION Synopsis SHOW SESSION Description List the current session properties. See Also RESET SESSION, SET SESSION

SHOW STATS Synopsis SHOW STATS FOR table SHOW STATS FOR ( SELECT * FROM table [ WHERE condition ] ) Description Returns approximated statistics for the named table or for the results of a (limited) query. Statistics are returned for each column, along with a summary row. Column Description column_name The name of the column (NULL for the summary row) data_size The total size in bytes of all of the values in the column distinct_values_count The number of distinct values in the column nulls_fractions The portion of the values in the column that are NULL row_count The number of rows (only returned for the summary row) low_value The lowest value found in this column (only for some types) high_value The highest value found in this column (only for some types)

SHOW TABLES Synopsis SHOW TABLES [ FROM schema ] [ LIKE pattern ] Description List the tables in schema or in the current schema. The LIKE clause can be used to restrict the list of table names.


UPDATE Synopsis UPDATE table_name SET column_name = expression[, column_name = expression, ... ] [ WHERE condition ] Description UPDATE changes the values of the specified columns in all rows that satisfy the condition. Only the columns to be modified need be mentioned in the SET clause; columns not explicitly modified retain their previous values. Examples Update table users, change the name Francisco where the id equal to 1: UPDATE users SET name = 'Francisco' WHERE id=1; Limitations Right now only Hive Connector and transactional ORC table support UPDATE.

USE Synopsis USE catalog.schema USE schema Description Update the session to use the specified catalog and schema. If a catalog is not specified, the schema is resolved relative to the current catalog. Examples USE; USE information_schema;

VACUUM Synopsis VACUUM TABLE table_name [FULL [UNIFY]] [PARTITION partition_value]? [AND WAIT]? Description BigData systems usually use HDFS as the storage to achieve durability and transparent distribution and balancing of data among the nodes in the cluster. HDFS being a immutable file system, data cannot be edited in between, but can only be appended. In order to work with the immutable file system, different file formats resort to writing new files in order to support data mutations, and later use asynchronous background merging to maintain the performance and avoid many small files.

VALUES Synopsis VALUES row [, ...] where row is a single expression or ( column_expression [, ...] ) Description Defines a literal inline table. VALUES can be used anywhere a query can be used (e.g., the FROM clause of a SELECT, an INSERT, or even at the top level). VALUES creates an anonymous table without column names, but the table and columns can be named using an AS clause with column aliases.






● 拼写,格式,无效链接等错误;

● 技术原理、功能、规格等描述和软件不一致,存在错误;

● 原理图、架构图等存在错误;

● 版本号不匹配:文档版本或内容描述和实际软件不一致;

● 对重要数据或系统存在风险的操作,缺少安全提示;

● 排版不美观,影响阅读;


● 描述存在歧义;

● 图形、表格、文字等晦涩难懂;

● 逻辑不清晰,该分类、分项、分步骤的没有给出;


● 很难通过搜索引擎,openLooKeng官网,相关博客找到所需内容;


● 命令、命令参数等错误;

● 命令无法执行或无法完成对应功能;


● 关键步骤错误或缺失,无法指导用户完成任务,比如安装、配置、部署等;

● 逻辑不清晰,该分类、分项、分步骤的没有给出

● 图形、表格、文字等晦涩难懂

● 缺少必要的前提条件、注意事项等;

● 描述存在歧义