Overall Design

In openLooKeng, we support dynamically loading user-defined Hive scalar functions. Basically openLooKeng will load metadata of Hive functions, register them, convert parameters of the evaluate methods from Hive internal data types to openLooKeng internal data types, and then dynamically generate the functions.



  1. In order to dynamically load Hive functions, users should add function metadata into udf.properties, with the format: function_name class_path. An example configuration in udf.properties is presented as below:
booleanudf io.hetu.core.hive.dynamicfunctions.examples.udf.BooleanUDF
shortudf io.hetu.core.hive.dynamicfunctions.examples.udf.ShortUDF
byteudf io.hetu.core.hive.dynamicfunctions.examples.udf.ByteUDF
intudf io.hetu.core.hive.dynamicfunctions.examples.udf.IntUDF
  1. Users should upload Hive functions jars and dependencies onto a separate directory under ${node.data-dir} which is setting in node.properties. The directory is user configurable by setting external-functions.dir and the default value is externalFunctions. An example configuration in config.properties is presented as below:

so by default, user should upload their hive functions to the externalFunctions file folder.

  1. Users should upload Hive functions configuration files udf.properties into ${node.data-dir}/etc.

Asynchronous execution

Considering the system safety, we provide a mechanism to execute hive function asynchronously.

  1. The user-defined Hive functions can be executed in another thread by setting max-function-running-time-enable. The default value is false
  2. Users can limit the maximum running time of hive functions by setting max-function-running-time-in-second. The default value is 600 seconds.
  3. Users can also limit the thread pool size of running the hive functions by setting function-running-thread-pool-size. The default value is 100

An example configuration in config.properties is presented as below:


Attention: Since each row data of the table may use the hive function once, so enable the hive function asynchronous execution may lead to seriously performance degradation.Please choose a balance between security and performance according to the actual situation


  1. In openLooKeng, we only support UDF with the following types:
boolean, byte, short, int, long, float, double
Boolean, Byte, Short, Int, Long, Float, Double
List, Map
  1. Only functions with equals to or less than five parameters are supported. If users add functions with more than five parameters, openLooKeng will ignore the function and print error logs.
  2. If users add inaccurate function metadata into udf.properties, such as wrong formats or non-existing class paths, openLooKeng will ignore the metadata and print error logs.
  3. If users add duplicated function metadata into the properties file, openLooKeng will recognize and discard the duplicated ones.
  4. If user-defined functions have same signatures as the internal ones, openLooKeng will ignore user-defined ones and print error logs.
  5. Users can add functions with overloaded evaluate methods. openLooKeng will recognize all signatures and create functions for each signature.
  6. If users execute functions with null parameters, the system will directly return null instead of parsing the null values to the function.
  • Notes for UDAF

Currently we don’t support loading user defined Hive UDAFs. But user can still use their UDAF functions which are developed by the openLooKeng’s function framework under this feature to use the asynchronous execution mechanism.User can copy the functions and dependencies into a directory under ${node.data-dir} dir. The directory is also user configurable by setting external-functions-plugin.dir and the default value is externalFunctionsPlugin. An example configuration in config.properties is presented as below:







● 拼写,格式,无效链接等错误;

● 技术原理、功能、规格等描述和软件不一致,存在错误;

● 原理图、架构图等存在错误;

● 版本号不匹配:文档版本或内容描述和实际软件不一致;

● 对重要数据或系统存在风险的操作,缺少安全提示;

● 排版不美观,影响阅读;


● 描述存在歧义;

● 图形、表格、文字等晦涩难懂;

● 逻辑不清晰,该分类、分项、分步骤的没有给出;


● 很难通过搜索引擎,openLooKeng官网,相关博客找到所需内容;


● 命令、命令参数等错误;

● 命令无法执行或无法完成对应功能;


● 关键步骤错误或缺失,无法指导用户完成任务,比如安装、配置、部署等;

● 逻辑不清晰,该分类、分项、分步骤的没有给出

● 图形、表格、文字等晦涩难懂

● 缺少必要的前提条件、注意事项等;

● 描述存在歧义