Dataflow Column Used in Aggregate Function
Dataflow between a column used as an aggregate function argument and the aggregate function
An aggregate function usually takes a column as its argument. This article discusses what kind of dataflow is created between the column used as the function argument and the aggregate function.
1. COUNT()
COUNT() may take a star (*), a column name, or no argument at all.
If the argument is empty or a star, no dataflow is generated between the argument and the function.
If a column is used as the argument, a direct dataflow is generated between the column and the function when the /treatArgumentsInCountFunctionAsDirectDataflow option is set to true.
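The rule above can be sketched as a small Python function. This is only an illustration of the behavior described in this section: the function name `count_argument_dataflow` and its parameters are hypothetical and are not part of SQLFlow or Dlineage.

```python
from typing import Optional


def count_argument_dataflow(argument: Optional[str],
                            treat_arguments_as_direct: bool) -> Optional[str]:
    """Return the kind of dataflow from COUNT()'s argument to COUNT().

    Hypothetical helper: returns "direct" or None (no dataflow).
    """
    # An empty argument or a star gives no column to draw a dataflow from.
    if argument is None or argument.strip() in ("", "*"):
        return None
    # A column argument yields a direct dataflow only when the option is on.
    return "direct" if treat_arguments_as_direct else None


star_result = count_argument_dataflow("*", True)
column_with_option = count_argument_dataflow("empId", True)
column_without_option = count_argument_dataflow("empId", False)
```

Here `star_result` and `column_without_option` are `None`, while `column_with_option` is `"direct"`.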
1.1 A direct dataflow
For a query that calls COUNT(empId), SQLFlow Cloud generates a direct dataflow from the empId column to the COUNT() function by default.
The Dlineage command-line tool, however, does not generate this direct dataflow by default.
To enable it in the Dlineage command-line tool, use the /treatArgumentsInCountFunctionAsDirectDataflow option.
This dataflow may seem strange, since the result of COUNT() does not depend on the values in the empId column. It is nevertheless available as an option for users who prefer to have such a dataflow.
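To see why the result of COUNT() does not depend on the argument's values, consider the following minimal sketch using Python's built-in `sqlite3` module. The `emp` table and its rows are assumptions made up for this illustration, and all `empId` values are non-NULL (COUNT(column) skips NULLs, so only the presence of a value matters, not the value itself).

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empId INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 1000), (2, 2000), (3, 3000)])

count_before = conn.execute("SELECT COUNT(empId) FROM emp").fetchone()[0]

# Change every empId value; the number of non-NULL rows is unchanged.
conn.execute("UPDATE emp SET empId = empId + 100")
count_after = conn.execute("SELECT COUNT(empId) FROM emp").fetchone()[0]

conn.close()
```

Both `count_before` and `count_after` are 3, even though every `empId` value changed.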
1.2 No dataflow
If you prefer, you can use an option to suppress the dataflow between empId and COUNT().
Note that whether or not a direct dataflow is generated between empId and COUNT(), an indirect dataflow into COUNT() is always created.
2. Aggregate functions other than COUNT()
COUNT() is handled slightly differently from the other aggregate functions when dataflows are created. All other aggregate functions, such as SUM(), create a direct dataflow from the column used as the argument.
A direct dataflow will be created from SAL to SUM().
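The contrast with COUNT() can be illustrated with a minimal sketch using Python's built-in `sqlite3` module: the result of SUM() changes when the values of its argument change, which is consistent with a direct dataflow from SAL to SUM(). The `emp` table, the `deptNo` column, and the sample rows are assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptNo INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(10, 1000), (10, 2000), (20, 3000)])

query = "SELECT deptNo, SUM(sal) FROM emp GROUP BY deptNo ORDER BY deptNo"
sums_before = conn.execute(query).fetchall()

# Double every salary; unlike COUNT(), the SUM() result follows the values.
conn.execute("UPDATE emp SET sal = sal * 2")
sums_after = conn.execute(query).fetchall()

conn.close()
```

Here `sums_before` is `[(10, 3000), (20, 3000)]` and `sums_after` is `[(10, 6000), (20, 6000)]`.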