Using and Configuring HTSQL¶
Table of Contents
Usage¶
Invoking htsql-ctl
¶
Installing HTSQL creates a command-line application htsql-ctl
:
$ htsql-ctl
The htsql-ctl
script is a collection of subcommands called
routines. The command-line syntax of htsql-ctl
is
$ htsql-ctl <routine> [options] [arguments]
<routine>
is the routine name;options
are any routine options in short (-X
) or long (--option-name
) form;arguments
are routine arguments.
To get a list of routines, run:
$ htsql-ctl help
To describe a specific routine, run:
$ htsql-ctl help <routine>
Database Connection¶
Many routines require a connection URI parameter, which specifies how to connect to a database. The connection URI has the form:
<engine>://<user>:<pass>@<host>:<port>/<database>
<engine>
is the type of the database server;sqlite
for SQLite,pgsql
for PostgreSQL,mysql
for MySQL,mssql
for MS SQL Server,oracle
for Oracle.<user>:<pass>
are authentication parameters;<host>:<port>
is the address of the database server;<database>
is the name of the database.
For SQLite, <user>:<pass>
and <host>:<port>
are omitted, and
<database>
specifies the path to the database file. Thus, to
connect to SQLite database htsql_demo.db
located in the
current directory, use the URI:
sqlite:htsql_demo.db
For PostgreSQL, if user:pass
is omitted, the credentials of the
current user are used; if host:port
is omitted, the server is
assumed to run on the local machine. Thus, to connect to a database
htsql_demo
running on the same host under credentials of the
current user, use the URI:
pgsql:htsql_demo
Other database servers use similar conventions.
You can use option -p
to prompt for a password if you do not want
to specify the database password in a command line.
Command-line Shell¶
To start a command-line HTSQL shell, run:
$ htsql-ctl shell <DBURI>
That starts an interactive HTSQL shell, where you could type and execute HTSQL queries against the specified database.
For example, to start the shell on a PostgreSQL database htsql_demo
,
run:
$ htsql-ctl shell pgsql:htsql_demo
Interactive HTSQL Shell
Type 'help' for more information, 'exit' to quit the shell.
htsql_demo$
For more details on the shell
routine, run:
$ htsql-ctl help shell
HTTP Server¶
To start a demonstration web server running HTSQL, run:
$ htsql-ctl server <DBURI> [<HOST> [<PORT>]]
That starts an HTTP server on the address <HOST>:<PORT>
.
If <HOST>
and <PORT>
are omitted, the server is started on
*:8080
.
For example, to start the HTSQL web server against PostgreSQL
database htsql_demo
on localhost:3128
, run:
$ htsql-ctl server pgsql:htsql_demo localhost 3128
Starting an HTSQL server on localhost:3128 over htsql_demo
If database connection configuration is provided
by -C
, you could use -
as a place holder for the mandatory
database URI parameter so that you could provide a HOST and PORT. For
example, to run the server on localhost:80
you would write:
# htsql-ctl serve - localhost 80 -C demo-config.yaml
For more details on the server
routine, run:
$ htsql-ctl help server
Extension Mechanism¶
HTSQL has an extensive addon system that can be used to override almost every aspect of server operation or query construction with an adapter. Extensions can live in third party modules or be included in the HTSQL distribution as part of our supported “tweaks”. To list supported extensions, you could type:
$ htsql-ctl extension
To find out more about an extension, such as tweak.autolimit, write:
$ htsql-ctl extension tweak.autolimit
Using Extensions¶
An extension can be enabled using -E
parameter on the htsql-ctl
command line. For example, to enable the tweak.meta addon on the
HTSQL demo database, you’d write:
$ htsql-ctl shell pgsql:htsql_demo -E tweak.meta
Then, you could use the /meta()
command registered by this addon:
Interactive HTSQL Shell
Type 'help' for more information, 'exit' to quit the shell.
htsql_demo$ /meta(/table)
Some addons have parameters which can be added to the command line.
For example, the tweak.autolimit extension truncates output at
limit
number of rows. The default is 10k, but this value
can be changed:
$ htsql-ctl shell pgsql:htsql_demo -E tweak.autolimit:limit=10
If more than one parameter is possible, use ”,” to separate them:
$ htsql-ctl serve pgsql:htsql_demo \
-E "tweak.shell:server-root='http://demo.htsql.org',limit=10"
HTSQL plugins are found using Python’s entry point feature. When a
Python package is installed, it can register itself as an
htsql.addon
extension so that it could be loaded in this manner.
Configuration Files¶
Extension configuration can be provided with a YAML (or JSON) file
using -C
on the command line. The top level of this file is a
dictionary listing the plugins that are enabled. The second nesting
level are plugin parameters, if any.
# demo-config.yaml
htsql:
db:
engine: pgsql
database: htsql_demo
username: htsql_demo
password: secret
host: localhost
port: 5432
tweak.autolimit:
limit: 1000
tweak.shell.default:
In this example, there are three plugins enabled, htsql
(which is a
mandatory plugin), tweak.autolimit and tweak.shell.default.
The htsql
plugin has one argument, db
which has sub-structure
providing connection information. You could then use this
configuration file using -C
:
# htsql-ctl shell -C demo-config.yaml
If both -E
and -C
are used, explicit command line options override
values provided in the configuration file. This permits a configuration
file to be used as a default perhaps using a different database URI.
A configuration parameter may be loaded from another file using
!include
tag. In the following example, definitions of custom
functions for tweak.override addon are loaded from
globals.yaml
.
# demo-config.yaml
htsql:
db:
engine: pgsql
database: htsql_demo
tweak.override:
globals: !include "globals.yaml"
# globals.yaml
num_school: (count(@school))
trunc_month($d): (date(year($d), month($d), 1))
Extension Reference¶
The HTSQL distribution ships with several built-in extensions which we describe here.
htsql
¶
The HTSQL core that provides the SQL translator and an HTTP server
is implemented in form of an addon htsql
. This addon is always
included.
Parameters:
db
- The connection URI.
The parameter db
specifies how HTSQL connects to the database. It
could be written in a short or an expanded form. In the short form,
the parameter is an URI:
htsql:
db: pgsql://htsql_demo:secret@localhost:5432/htsql_demo
In the expanded form, a mapping notation is used:
htsql:
db:
engine: pgsql
username: htsql_demo
password: secret
host: localhost
port: 5432
database: htsql_demo
Every component except engine
and database
is optional.
password
- The database password.
The parameter password
allows you to override the database password
keeping the other connection parameters intact.
debug
- Enable debugging output.
When debug
option is set, text output, as well as some error
messages will contain a respective SQL query.
engine.*
¶
Extensions in the engine
namespace implement database backends.
An appropriate extension is loaded automatically based on the engine
parameter of the connection URI.
HTSQL supports the following database servers:
- SQLite 3+ (
engine.sqlite
) - PostgreSQL 8.3+ (
engine.pgsql
) - MySQL 5.1+ (
engine.mysql
) - Oracle 10g+ (
engine.oracle
) - Microsoft SQL Server 2005+ (
engine.mssql
)
tweak.autolimit
¶
This addon truncates query output to a given number of rows (10,000 by default). It could be used to prevent accidental denial of service caused by queries returning a large number of rows.
The addon adds LIMIT <limit>
to every generated SQL query.
Parameters:
limit
- Truncation threshold (default: 10,000).
tweak.autolimit:
limit: 1000
tweak.cors
¶
This addon adds CORS support to permit AJAX requests to the HTSQL server by web pages hosted on a different domain.
To prevent data leaks, web browsers do not allow AJAX requests to cross domain boundaries. The CORS (Cross-Origin Resource Sharing) specification defines a way for a server to provide a list of domains which are permitted to make AJAX requests.
CORS relies on browser support and may not work with older web browsers.
Parameters:
origin
- Domains allowed to access the server (default:
*
).
The origin
parameter is a list of domains which are
allowed to access the server. The value must either be
*
(means any, which is the default) or a space-separated
list of host names:
http[s]://domain[:port]
Warning
The default settings permit HTSQL queries from any domain. Do not use the default settings with non-public data!
For example, to enable AJAX requests to the HTSQL demo server (located at http://demo.htsql.org/) from domains http://htsql.org/ and http://htsql.com/, we could write:
tweak.cors:
origin: http://htsql.org http://htsql.com
tweak.csrf
¶
This extension provides protection against cross-site request forgery (CSRF) attacks.
A CSRF attack tricks the user to visit the attacker’s website, which then submits database queries to the HTSQL service from the user’s account. Even though the browser would not permit the malicious website to read the output of the queries, this form of attack can be used for denial of service or changing the data in the database. For background on CSRF, see http://en.wikipedia.org/wiki/Cross-site_request_forgery.
This addon requires all HTSQL requests to submit a secret token in two forms:
- as a cookie
htsql-csrf-token
; - as HTTP header
X-HTSQL-CSRF-Token
.
If the token is not submitted, the addon prevents the request from reading or updating any data in the database.
Parameters:
allow_cs_read
- If set, a request is permitted to read data from the database even when the secret token is not provided.
allow_cs_write
- If set, a request is permitted to update data in the database even if the secret token is not provided.
tweak.csrf:
allow_cs_read: true
allow_cs_write: false
tweak.django
¶
This extension provides integration with Django web framework. It replaces built-in database introspection and connection handling with facilities provided by Django.
When using tweak.django
addon, do not specify the connection
URI as it is determined from the Django project database
settings.
Parameters:
settings
- Path to the settings module (default: use
DJANGO_SETTINGS_MODULE
).
tweak.django:
settings: mysite.settings
tweak.etl
¶
This extension provides the following commands:
insert(feed)
- Adds records to a table.
update(feed)
- Updates table records.
merge(feed)
- Adds or updates records in a table.
delete(feed)
- Deletes records from a table.
truncate(name)
- Truncates a table.
do(command, ...)
- Performs a series of command in a single transaction.
tweak.etl:
Warning
This extension is work-in-progress. The interface and implementation of commands may change in the future. Currently only PostgreSQL is supported.
tweak.filedb
¶
This extension makes a database from a set of CSV files. Each source CSV file becomes a table in the database. The name of the table is derived from the file name; the column names are taken from the first row of the CSV file. The remaining rows become the records in the table.
The database is realized as an in-memory SQLite database.
Use optional parameter cache-file
to specify a persistent
storage for the database.
Parameters:
sources
List of entries describing the source files; each entry has the following fields:
file
- The path to the CSV file.
encoding
- Encoding of CSV files (
UTF-8
by default). cache-file
- Persistent storage for the database.
tweak.filedb:
sources:
- file: school.csv
- file: department.csv
- file: program.csv
- file: course.csv
cache-file: cache.sqlite
tweak.gateway
¶
This extensions allows you to create a gateway to another database and execute HTSQL queries against it.
Parameters:
gateways
A mapping of names to datebase configuration.
Each mapping entry creates a function which takes a query as a parameter and execute it against the gateway database.
Database configuration could be either connection URI or nested HTSQL configuration.
tweak.gateway:
gateways:
sqlite_gw: sqlite:database.sqlite
remote_gw:
htsql:
db:
engine: pgsql
database: remote_db
The gateway name becomes a command which executes a query against the gateway database:
/table/:sqlite_gw
tweak.meta
¶
This extension provides a meta database describing tables, columns and links of the primary database.
The tweak.meta
addon has no parameters.
tweak.meta:
The meta database is composed of the following tables:
table
- all available tables
field
- columns and links for a given table
column
- all columns for a given table
link
- all links from one table to another
Use function meta()
to make a query against the meta database.
To get a list of tables:
name |
---|
appointment |
class |
classification |
confidential |
Mapping call notation is also permitted:
name |
---|
appointment |
class |
classification |
confidential |
To list all columns of a given table in the output order:
table_name | name | domain | is_mandatory |
---|---|---|---|
course | department_code | text | true |
course | no | integer | true |
course | title | text | true |
course | credits | integer | false |
To get all links to and from a specific table:
table_name | name | is_singular | target_name | reverse_name |
---|---|---|---|---|
appointment | department | true | department | appointment |
course | department | true | department | course |
department | appointment | false | appointment | department |
department | course | false | course | department |
To describe the meta database itself, apply meta()
twice:
name |
---|
column |
field |
link |
table |
tweak.meta:
tweak.override
¶
This extension provides several ways to adjust database metadata. It allows the user to restrict access to specific tables and columns, specify additional database constraints, change the generated names for tables, columns and links, and define calculated attributes.
Parameters:
included-tables
- Tables permitted to use.
excluded-tables
- Tables forbidden to use.
included-columns
- Columns permitted to use.
excluded-columns
- Columns forbidden to use.
not-nulls
- Additional
NOT NULL
constraints. unique-keys
- Additional
PRIMARY KEY
andUNIQUE
constraints. foreign-keys
- Additional
FOREIGN KEY
constraints. class-labels
- Labels for tables and top-level calculations.
field-labels
- Labels for columns, links and calculated fields.
field-orders
- Default table fields.
unlabeled-tables
- Tables to hide.
unlabeled-columns
- Columns to hide.
globals
- Global definitions.
To restrict access to a specific set of tables, use parameters
included-tables
and excluded-tables
. Parameter
included-tables
is a list of tables allowed to be used
by HTSQL. If this parameter is provided, any table not
in this list is completely hidden from the HTSQL processor.
Parameter excluded-tables
allows you to forbid access
to a set of tables.
To forbid use of table confidential
:
tweak.override:
excluded-tables: [confidential]
To allow access only to tables in ad
and ed
schemas:
tweak.override:
included-tables: [ad.*, ed.*]
We could also use block form of a sequence:
tweak.override:
included-tables:
- ad.*
- ed.*
In general, the table name may have the form <table>
or
<schema>.<table>
and could include *
meta-character to
indicate any number of characters. Table names are
case-insensitive and normalized: any non-alphanumeric
character is replaced with _
.
Similarly, to restrict access to a specific set of columns,
use parameters included-columns
and excluded-columns
.
Parameter exclude-columns
is a list of column forbidden
for use by the HTSQL processor.
To exclude column SSN
of table confidential
, write:
tweak.override:
excluded-columns: [confidential.ssn]
The column name may have the form <column>
, <table>.<column>
,
or <schema>.<table>.<column>
and could include *
meta-character.
Note that columns listed in excluded-columns
are removed
together with all associated key constraints. If you want
to hide a column from output, but keep associated primary and
foreign keys, use the unlabeled-columns
parameter.
HTSQL discovers database constraints from the schema definition.
If some constraints are not explicitly defined in the schema,
you may provide them using parameters not-nulls
, unique-keys
and foreign-keys
.
Warning
When specifying additional constraints, make sure they are respected by the data; otherwise, the output produced by HTSQL may be invalid.
Parameter not-nulls
is a list of columns with NOT NULL
constraints.
To indicate that all columns named code
and id
, as well as
column student.full_name
do not contain NULL
value, write:
tweak.override:
not-nulls: ["*.code", "*.id", student.full_name]
Note that we need to put the column patterns into quotes
since YAML syntax does not permit *
character at the
beginning of a scalar value.
Parameter unique-keys
is a list of key specifications
of the form <table>(<column>,...)[!]
. The trailing
symbol !
indicates a PRIMARY KEY
constraint.
All columns in a primary key are marked as NOT NULL
.
To indicate that school.code
is a primary key and
school.name
is unique, write:
tweak.override:
unique-keys:
- school(code)!
- school(name)
Parameter foreign-keys
is a list of foreign key
specifications, which have the form
<origin>(<column>,...) -> <target>(<column>,...)
.
Target columns could be omitted when they coincide with
the target primary key.
To define two foreign keys on table program
, write:
tweak.override:
foreign-keys:
- program(school_code) -> school(code)
- program(school_code, part_of_code) -> program
In HTSQL, database tables, columns and links have a label,
an identifier by which they are referred in HTSQL queries.
Normally, entity labels coincide with their names, but parameters
class-labels
and field-labels
allow you to assign
them arbitrary labels. In addition, these parameters allow
you to assign a label to an arbitrary HTSQL expression.
Use parameter class-labels
to assign custom labels
to tables and top-level HTSQL expressions. Parameter class-labels
is a mapping; each key is a label, the corresponding value
is either a table name or an HTSQL expression enclosed in
parentheses.
To rename table classification
to c14n
and to assign
a label to expression school^campus
, write:
tweak.override:
class-labels:
c14n: classification
campus: (school^campus)
Calculated classes defined by class-labels
may accept
parameters. For instance, to add a class students_by_year()
which takes the year of admission as an argument, write:
tweak.override:
class-labels:
students_by_year($year): (student?year(start_date)=$year)
Use parameter field-labels
to assign custom labels
to table fields. This parameter is a mapping; each
key has a form <table>.<field>
, where <table>
is the table label, <field>
is the field label
to define. The corresponding value is one of:
- a column name;
- a link specification;
- an HTSQL expression enclosed in parentheses.
A link specification is a comma-separated list of
patterns <origin>(<column>,...) -> <target>(<column>,...)
.
Each pattern must match a foreign key or a reverse foreign
key. Column lists could be omitted if the foreign key
could be determined uniquely.
To rename a column student.name
to full_name
, write:
tweak.override:
field-labels:
student.full_name: name
To add a many-to-many link between student
and class
via enrollment
table, write:
tweak.override:
field-labels:
student.class: student -> enrollment, enrollment -> class
class.student: class -> enrollment, enrollment -> student
Note that link specifier student -> enrollment
uniquely
matches foreign key enrollment(student_id) -> student(id)
while enrollment -> class
matches foreign key
enrollment(class_seq) -> class(class_seq)
so we do not
need to provide column lists.
The self-referential link from program
to all included
programs is called, by default, program.program_via_part_of
.
To assign a different label to this link, write:
tweak.override:
field-labels:
program.includes:
program(school_code, code) -> program(school_code, part_of_code)
To define a calculated field student.avg_grade
, write:
tweak.override:
field-labels:
student.avg_grade: (avg(enrollment.grade))
Calculated fields may accept a parameter. To define a calculated
field department.students_by_year()
accepting the year of
admission as a parameter, write:
tweak.override:
field-labels:
department.students_by_year($year): (student?year(start_date)=$year)
By default, when an HTSQL query does not contain a selector
expression, all table columns are displayed. To set a custom
list of fields for this case, use parameter field-orders
.
tweak.override:
field-orders:
program: [code, title, degree]
Parameter unlabeled-tables
is a list of tables without an
assigned labels, which effectively hides the tables from the users.
The tables could still be used in SQL generated by the HTSQL
translator.
tweak.override:
unlabeled-tables: [enrollment]
Parameter unlabeled-columns
is a list of columns without
an assigned tables. Unlabeled columns are hidden from the users,
but could be used in SQL generated by the HTSQL translator.
To hide all id
columns, write:
tweak.override:
unlabeled-columns: [id, "*_id"]
Use parameter globals
define global attributes and functions.
This parameter is a mapping: each key is the attribute name with
an optional list of parameters, the value is an HTSQL expression.
tweak.override:
globals:
num_school: (count(@school))
trunc_month($d): (date(year($d), month($d), 1))
tweak.pool
¶
This addons caches open database connections so that the same connection could be reused to execute more than one query. Use this addon with backends where opening a database connection is an expensive operation.
tweak.pool:
tweak.resource
¶
This extension adds a mechanism for serving static files via HTTP. This mechanism is used by other extensions to provide access to static resources such as Javascript and CSS files.
Parameters:
indicator
- HTTP root for static files, excluding leading and trailing
/
(default:-
)
Usually you don’t need to enable this extension explicitly as it is done by extensions with static resources.
tweak.shell
¶
This extension adds an in-browser HTSQL editor called the HTSQL shell. The shell provides a visual query editor (based on CodeMirror) with support for syntax highlighting and code completion.
The shell is invoked by command /shell()
, which takes an optional
query to edit.
Parameters:
server-root
- The root URL of the HTSQL server (default: guess)
limit
- Truncation threshold for shell output (default: 1000)
tweak.shell:
server-root: http://demo.htsql.org
limit: 100
tweak.shell.default
¶
Enable addon tweak.shell.default
to make the shell the default
output format.
tweak.shell.default:
tweak.sqlalchemy
¶
This extension provides integration with SQLAlchemy toolkit. It replaces built-in HTSQL database introspection and connection handling with SQLAlchemy facilities.
When using tweak.sqlalchemy
addon, do not specify the
connection URI as it is determined from the SQLAlchemy
engine settings.
Parameters:
engine
- The SQLAlchemy engine object.
metadata
- The SQLAlchemy metadata object.
The value must have the form <module>.<attr>
or
<package>.<module>.<attr>
.
tweak.sqlalchemy:
engine: sademo.engine
metadata: sademo.metadata
tweak.system
¶
This extension adds access to system catalog tables.
Currently, only PostgreSQL backend is supported.
tweak.timeout
¶
This extension limits query execution to a given amount of time (1 minute by default). Use it to ensure against accidental denial of service caused by complex queries.
Parameters:
timeout
- The timeout value, in seconds (default: 60).
tweak.timeout:
timeout: 300
Currently, this addon is only supported with PostgreSQL.