HTSQL Grammar¶
Table of Contents
A valid input of an HTSQL processor is called an HTSQL query.
In a regular mode of operation, an HTSQL processor is started as a web
service and accepts queries as HTTP GET
requests. However, an HTSQL
query can also be executed using a command-line utility htsql-ctl
or
via internal Python API.
Encoding¶
An HTSQL query is a string of characters in UTF-8 encoding. Octets
composing the string could be written literally or percent-encoded. A
percent-encoded octet is serialized as three characters: %
followed
by two hexdecimal digits encoding the octet value.
‘HTSQL’ | ‘HTSQL’ | ‘HTSQL’ |
---|---|---|
HTSQL | HTSQL | HTSQL |
Percent-encoding is useful for transmitting an HTSQL query via channels
that forbid certain characters in literal form. The list of characters
that should be encoded depends on the channel type, but the percent
(%
) character itself must always be percent-encoded.
‘%25’ |
---|
% |
A NUL
character cannot appear in an HTSQL query, neither in literal
nor in percent-encoded form.
The HTSQL processor decodes percent-encoded octets before parsing the query. As a consequence, a percent-encoded punctuation or operator character still plays its syntax role.
Lexical Structure¶
An HTSQL query is parsed into a sequence of tokens. The following tokens are recognized.
Name¶
A sequence of alphanumeric characters that does not start with a digit.
code | name | campus |
---|---|---|
art | School of Art & Design | old |
bus | School of Business | south |
edu | College of Education | old |
Number¶
A numeric literal: integer, decimal and exponential notations are recognized.
60 | 2.125 | 271828e-5 |
---|---|---|
60 | 2.125 | 2.71828 |
String¶
A string literal enclosed in single quotes; any single quote character should be doubled.
‘HTSQL’ | ‘O’‘Reilly’ |
---|---|
HTSQL | O’Reilly |
Symbol¶
A valid symbol in the HTSQL grammar; that includes operators and
punctuation characters. Some symbols are represented by more than one
character (e.g. <=
, !~
).
Individual tokens may be separated by whitespace characters.
See htsql.tr.scan.Scanner
for detailed description of HTSQL
tokens.
Syntax Structure¶
A sequence of HTSQL tokens must obey the HTSQL grammar.
An HTSQL query starts is a segment expression optionally followed by a sequence of commands.
The following table lists HTSQL operations in the order of precedence, lowest to highest.
Operation | Description | Example Input | Output |
---|---|---|---|
/ T |
segment | /school |
|
x :fn |
infix function call | 'HTSQL':length |
5 |
x :fn y |
1/3 :round 2 |
0.33 |
|
x :fn (y,z,...) |
'HTSQL':slice(1,-1) |
'TSQ' |
|
x + , x - |
sorting direction | program{degree+} |
|
T ? p |
sieve | program?degree='ms' |
|
T ^ x |
projection | program^degree |
|
T {x,y,...} |
selection | school{code,name} |
|
p | q |
logical OR | true|false |
true |
p & q |
logical AND | true&false |
false |
! p |
logical NOT | !true |
false |
x = y , x != y , |
comparison | 2+2=4 |
true |
x == y , x !== y |
'HTSQL'==null |
false |
|
x ~ y , x !~ y |
'HTSQL'~'SQL' |
true |
|
x < y , x <= y , |
12<7 |
false |
|
x > y , x >= y |
12>=7 |
true |
|
x + y , x - y |
addition, subtraction | 'HT'+'SQL' |
'HTSQL' |
x * y , x / y |
multiplication, division | 12*7 |
84 |
- x |
negation | -42 |
|
x -> T |
attachment | 'south' -> school{campus} |
|
T := x |
assignment | num_prog := count(program) |
|
S . T |
composition | school.program |
|
T [id] |
locator | course[comp.304] |
|
@ T |
detachment | @school |
|
{x,y,...} |
record | {'bs','ms'} |
|
(...) |
grouping | (7+4)*2 |
22 |
* |
wildcard selection | school.* |
|
* number |
school.*1 |
||
^ |
projection complement | count(^) |
|
$ name |
reference | $code |
|
fn (...) |
function call | round(1/3,2) |
0.33 |
name |
school |
||
number |
60 , 2.125 ,
271828e-5 |
||
string |
'HTSQL' |
A command starts with /
followed by a function application in infix
notation.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
Next we describe individual syntax elements.
Atomic Expressions¶
An atomic expression is a basic syntax unit. HTSQL recognizes the following atoms.
Identifier¶
An identifier is a sequence of characters which contains Latin letters,
underscores (_
), decimal digits and those Unicode characters that
are classified as alphanumeric. An identifier must not start with a
digit.
In HTSQL, identifiers are case-insensitive.
Identifiers are used to refer to database entities such as tables and attributes, to define calculated attributes, and to call functions.
name,count(department)
School of Art & Design,1
School of Business,3
…
In this example, four identifiers school
, name
, count
and
department
represent respectively a table, a table attribute, a
built-in function and a table link.
Literal¶
HTSQL supports two types of literal values: quoted and unquoted.
An unquoted (or numeric) literal is a number written in integer, decimal or exponential notation.
60 | 2.125 | 271828e-5 |
---|---|---|
60 | 2.125 | 2.71828 |
The range of allowed numeric values depends on the database backend.
The type of a numeric literal is determined from notation: literals
written in integer, decimal and exponential notation are assigned to
integer
, decimal
and float
data type respectively.
A quoted literal is a (possibly empty) sequence of arbitrary characters enclosed in single quotes. Any single quote in the value must be doubled.
‘HTSQL’ | ‘O’‘Reilly’ |
---|---|
HTSQL | O’Reilly |
The data type of a quoted literal is inferred from the context in which
the literal is used; the default data type is string
.
Wildcard¶
A wildcard selection (*
) selects all output columns of the table.
code | name | campus | code | name | school_code |
---|---|---|---|---|---|
bus | School of Business | south | acc | Accounting | bus |
la | School of Arts and Humanities | old | arthis | Art History | la |
ns | School of Natural Sciences | old | astro | Astronomy | ns |
When followed by an integer literal N
, a wildcard selects N
-th
output column of the table. N
starts from 1
and should not
exceed the number of output columns.
name | count(department) |
---|---|
School of Engineering | 4 |
School of Arts and Humanities | 6 |
School of Music & Dance | 4 |
Complement¶
A projection complement (^
) represents a complement link from a
projection to the projected flow.
Do not confuse a projection complement with a binary projection
operator, which is also represented with the ^
character.
degree | count(^) |
---|---|
ba | 10 |
bs | 10 |
ct | 4 |
In this example, the first and the second occurrences of ^
indicate
a projection operator and a projection complement respectively.
Grouping¶
Any expression enclosed in parentheses ((...)
) is treated
syntactically as a single atom. Use grouping to override the default
operator precedence.
22 |
Do not confuse a grouping operation with a function call, which also uses parentheses.
Record¶
A comma-separated list of expressions enclosed in curly brackets
({...}
) is called a record expression. Some functions and operators
accept records as a way to specify multiple values.
code | name | campus |
---|---|---|
eng | School of Engineering | north |
ns | School of Natural Sciences | old |
Reference¶
A reference is an identifier preceded by a dollar sign ($
). A
reference is used to access a value defined in a different naming scope.
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
In this example, a reference $avg_credits
is defined in the root
scope, but accessed in the scope of course
.
For a more detailed description of references, see the section on naming scopes.
Function Calls¶
HTSQL has a large library of built-in functions and can be extended with user-defined functions.
A function call is represented as a function name followed by (
, a
comma-separated list of arguments, and )
.
0.33 |
A function may accept no arguments, but the parentheses are still required.
2016-10-13 |
For functions with at least one argument, HTSQL supports an alternative
infix call notation. In this notation, the expression starts with the
first argument followed by :
and a function name, and then the rest
of the arguments. The trailing arguments must be enclosed in
parentheses if their number is greater than one.
today() :year | 1/3 :round 2 | ‘HTSQL’ :slice(1,-1) |
---|---|---|
2016 | 0.33 | TSQ |
This example could be equivalently expressed as
year(today()) | round(1/3,2) | slice(‘HTSQL’,1,-1) |
---|---|---|
2016 | 0.33 | TSQ |
Infix function calls are composable and have the lowest precedence among the operators.
‘h’+’t’+’t’+’p’ :replace(‘tp’,’sql’) :upper |
---|
HTSQL |
A function which argument is a segment expression is called a command.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
To use infix call notation with a command, prepend the :
indicator
with /
.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
Consider the difference between regular infix function call and a command application.
code,name,campus
art,School of Art & Design,old
This example could be equivalently expressed as
code,name,campus
art,School of Art & Design,old
For a list and description of built-in functions, see Functions and Operators.
Operators¶
An HTSQL operator is denoted by a special character or a sequence of
characters (e.g. +
, <=
). HTSQL has infix, prefix and postfix
operators, and some operators admit all three forms.
The current version of HTSQL does not support user-defined operators; future versions may add this ability.
In HTSQL, the order in which operators are applied is determined by operator precedence. For example, multiplication and division operators have a higher precedence than addition and subtraction.
Some HTSQL operators are composable (e.g. arithmetic operators) and some are not (e.g. equality operators). We call the former associative and the latter non-associative.
Below we describe the syntax of HTSQL operators. For a more comprehensive description, see Functions and Operators.
Logical Operators¶
HTSQL supports the following logical operators:
- logical OR
p | q
- logical AND
p & q
- logical NOT
! p
In this list, the operators are sorted by the order of precedence, from lowest to highest. All logical operators are left-associative.
true|false | true&false | !false |
---|---|---|
true | false | true |
Comparison Operators¶
HTSQL supports the following comparison operators:
- equality operators
x = y
,x != y
,x == y
,x !== y
- containing operators
x ~ y
,x !~ y
- ordering operators
x < y
,x <= y
,x > y
,x >= y
2+2=4 | ‘HTSQL’~’SQL’ | 12>7&7>=2 |
---|---|---|
true | true | true |
All comparison operators have the same precedence and are not associative.
Future versions of HTSQL may make ordering operators left-associative
to express between operation (e.g. a <= x <= b
).
Arithmetic Operators¶
HTSQL supports the usual set of arithmetic operators:
- addition
x + y
- subtraction
x - y
- multiplication
x * y
- division
x / y
- negation
- x
Arithmetic operators have standard precedence and associativity.
Flow Operators¶
HTSQL supports specialized operators to work with flow expressions:
- sieve
T ? p
- projection
T ^ x
- selection
T {x,y,...}
The sieve operator (T ? p
) produces rows of T
satisfying
condition p
.
code | name | campus |
---|---|---|
bus | School of Business | south |
mus | School of Music & Dance | south |
The projection operator (T ^ x
) produces a flow of unique values of
x
as it ranges over T
. Do not confuse the projection operator with
a projection complement.
degree |
---|
ba |
bs |
ct |
The selection operator (T {x,y,...}
) specifies output columns.
code | name |
---|---|
art | School of Art & Design |
bus | School of Business |
edu | College of Education |
Sieve, projection and selection operators have the same precedence and are left-associative.
campus | avg(school.count(department)) |
---|---|
north | 4.0 |
old | 5.0 |
south | 3.5 |
Composition¶
HTSQL supports a flow composition operator:
T . x
The composition operator evaluates x
for each row of the flow T
.
The values of x
form the resulting flow.
code | name | school_code |
---|---|---|
acc | Accounting | bus |
econ | Economics | bus |
mm | Management & Marketing | bus |
The composition operator is left-associative.
Locator¶
The locator operation selects an individual entity by its identity:
T [id]
The identity is a sequence of attributes and links which uniquely
identify an entity. Values of separate attributes are separated by a
period (.
). An individual value could be left unquoted when it
consists of digits, alphabetical characters or dash (-
). Values
could be grouped using pairs of brackets ([]
) or parentheses
(()
).
code | name | school_code |
---|---|---|
comp | Computer Science | eng |
department_code | no | title | credits | description |
---|---|---|---|---|
comp | 515 | Software Design | 3 | Study of good software development techniques: UML, object-oriented design, design patterns, GUI design principles, testing, debugging and profiling. |
department_code | course_no | year | season | section | instructor_code | class_seq |
---|---|---|---|---|---|---|
comp | 515 | 2010 | spring | 001 | rmejia | 1405 |
Attachment and Detachment¶
HTSQL has two operators for generating ad-hoc links:
- detachment
@ T
- attachment
x -> T
The detachment operator (@ T
) generates an ad-hoc link to T
by
associating each row from the input flow with all rows from T
.
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
The precedence of the detachment operator is higher than of any other operator. Therefore, to apply the operator to any expression other than an identifier or a function call, use parentheses.
The attachment operator (x -> T
) generates ad-hoc link to T
by
associating each row from the input flow with all rows from T
such
that values of expression x
evaluated against respective rows
coincide.
name | dob |
---|---|
Virginia Hester | 1986-03-20 |
Marvin Hall | 1986-03-20 |
Douglas Robles | 1986-03-20 |
Segment Operator¶
A segment expression is a prefix operator:
/ T
A segment expression converts a flow to a list value.
name | department |
---|---|
name | |
School of Art & Design | Studio Art |
School of Business | Accounting |
Economics | |
Management & Marketing | |
Sorting Decorators¶
The following postfix decorators indicate ascending and descending sorting order respectively:
x +
,x -
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
Sorting decorators have the same precedence as infix function call.
Sorting decorators are only meaningful when used as arguments of the
sort()
function and in a selector expression.
Assignment¶
An assignment expression has the form:
T := x
The left side of an assignment expression indicates the name and formal parameters (if any) of a calculated attribute. It must be an identifier, a reference or a function call and can be preceded by an optional dot-separated sequence of identifiers.
The right side of an assignment is an arbitrary expression indicating the value of a calculated attribute.
name | num_dept |
---|---|
School of Engineering | 4 |
School of Arts and Humanities | 6 |
School of Music & Dance | 4 |
An assignment expression could be used only as an argument of functions
define()
and given()
, or in a selector expression.