Data Flows¶
Table of Contents
A central concept in HTSQL is data flow, a sequence of homogeneous values.
HTSQL is a data flow transformation language. Every HTSQL expression operates on flows; that is, it accepts an input flow and transforms it into an output flow.
(diagram: input flow -> expression -> output flow)
The initial input flow consists of a single empty record. HTSQL processor interprets the given HTSQL query as a sequence (or rather a directed graph) of flow operations, which it applies one by one. The resulting flow is then displayed to the user in a tabular form.
Different operations affect the flow in various ways: multiply it, or remove elements from it, apply a scalar function to each element, etc. In the next sections, we discuss different types of flow operations.
Scalar Expressions¶
A simplest example of a flow operation is an application of some expression to each value in a flow. The output flow consists of a results of the expression.
That kind of expression does not change the number of elements in the flow; we call such expressions scalar.
42 |
In this example, a scalar expression (3+4)*6
is applied to the
initial flow; the value of this expression forms the resulting flow.
(diagram: [] -> (3+4)*6 -> 42)
code | count(department) |
---|---|
art | 1 |
bus | 3 |
edu | 2 |
eng | 4 |
In this example, two scalar expressions code
and
count(department)
are applied to the flow consisting of school
records. For each school entity, they extract the value of the
attribute code
and the number of associated departments.
- (diagram: [art], [bus], [edu], [eng], ...
- -> {‘art’,2}, {‘bus’,3}, {‘edu’,2}, {‘eng’,4}, ...)
A scalar expression is an example of a singular expression; one which does not increase the number of elements in the flow, as opposed to a plural expression, which may produce more output elements than in the input flow.
Filtering¶
A sieve expression filters the input flow leaving only those elements which satisfy the given condition.
A sieve expression takes one argument: a scalar logical expression called the filter. It applies the filter to each element of the input flow. The output flow consists of those elements of the input flow for which the filter is evaluated to TRUE value.
code | name | campus |
---|---|---|
eng | School of Engineering | north |
la | School of Arts and Humanities | old |
mus | School of Music & Dance | south |
ns | School of Natural Sciences | old |
(diagram)
In this example, the sieve expression evaluates a filter condition
count(department)>3
for each record from the school class; those
records for which this condition is valid generate the output of the
query.
A sort expression reorders elements in the flow according to a given argument.
code | name | campus |
---|---|---|
edu | College of Education | old |
ph | Public Honorariums | |
art | School of Art & Design | old |
la | School of Arts and Humanities | old |
(diagram)
In this example, the school records are ordered in the ascending order
with respect to the value of name
attribute.
A truncation operation makes a slice of the input flow.
code | name | campus |
---|---|---|
art | School of Art & Design | old |
bus | School of Business | south |
edu | College of Education | old |
(diagram)
In this case, we take the top 3 records from the school class.
Aggregates¶
An aggregate function converts a plural expression into a scalar.
The argument of an aggregate function must be a plural expression. Then for each element of the input flow, the aggregate evaluates the respective sub-flow and applies a set function to the result to generate a scalar value.
(diagram)
In this example, count()
aggregates produces the number of elements
in the flow generated by expression school
.
code | max(course.credits) |
---|---|
acc | 6 |
arthis | 6 |
astro | 5 |
be | 8 |
(diagram)
In this example, max(course.credits)
starts with evaluating the flow
department.course.credits
. Then for each department record of the
input flow, max()
finds the maximum value in the respective
sub-flow.
Projection¶
A projection expression takes a scalar argument called the kernel. The output flow of projection consists of all unique values of the kernel as it runs over the input flow.
campus |
---|
north |
old |
south |
(diagram)
The output of this query consists of all distinct values of
school.campus
attribute.