Unity 0.5
Parser for unit strings
NOTE: The library should currently be regarded as alpha quality – the implementation and interface may change in response to experience and comments.
This is the unity parser (version 0.5), which is a C library to help parse unit specification strings such as W mm^-2. There is also an associated Java class library, which uses the same grammars.
As well as parsing various unit strings, the library can serialise a parsed expression in various formats: the three formats that it can parse, a LaTeX format named latex (which uses the siunitx package), and a debug format which lists the parsed unit in an unambiguous, but not otherwise useful, form.
Current limitations: the CDS specification permits non-round factors (that is, factors which aren't a power of ten). These are not permitted in this CDS parser, partly because they're arguably quantities rather than units, but more practically because supporting them would significantly complicate the implementation.
The library's home page is (at present) at http://www.astro.gla.ac.uk/users/norman/ivoa/unity/; the source is on Bitbucket.
You can parse units using several different syntaxes since, unfortunately, there is no general consensus on a single syntax. The ones supported (and their names within this library) are FITS (fits), OGIP (ogip) and CDS (cds).
See also the IAU style manual, section 5.1, 1989, though this is by now rather old.
Each of these has an associated writer, which allows you to write a parsed UnitExpression to a string, in a format which should conform to that syntax's standard. See unity_write_formatted.
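The following minimal sketch illustrates what a writer does, using a hypothetical flat list of (prefix, unit, power) terms. The term type and write_fits_style are assumptions for illustration; they are not unity's UnitExpression or unity_write_formatted, and the real writers also deal with factors, non-integer powers and the other syntaxes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one term of a parsed expression. */
struct term {
    const char *prefix;   /* SI prefix, e.g. "m", or "" for none */
    const char *name;     /* base unit symbol, e.g. "m" or "s" */
    int power;            /* integer exponent, e.g. 2 or -1 */
};

/* Write terms in a FITS-like style: whitespace-separated terms, with
   any power other than 1 appended directly, as in "mm s-1".
   Returns the number of characters written (excluding the NUL), or
   -1 if buf is too small. */
int write_fits_style(const struct term *terms, size_t n, char *buf, size_t bufsize)
{
    assert((terms != NULL || n == 0) && buf != NULL && bufsize > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char piece[64];
        int len;
        if (terms[i].power == 1)
            len = snprintf(piece, sizeof piece, "%s%s%s",
                           i ? " " : "", terms[i].prefix, terms[i].name);
        else
            len = snprintf(piece, sizeof piece, "%s%s%s%d",
                           i ? " " : "", terms[i].prefix, terms[i].name,
                           terms[i].power);
        /* reject truncated pieces and output that would not fit */
        if (len < 0 || (size_t)len >= sizeof piece || used + (size_t)len >= bufsize)
            return -1;
        memcpy(buf + used, piece, (size_t)len + 1);   /* includes the NUL */
        used += (size_t)len;
    }
    return (int)used;
}
```

Given the terms mm and s with powers 1 and -1, it produces mm s-1, which matches the FITS output for mm/s in the example session.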
In addition, there is a latex writer, which produces a formatted version of the expression suitable for inclusion in a LaTeX document, using the siunitx package. This is incompletely developed. To use the resulting output in a LaTeX document, include the following in the preamble of the file:
\usepackage{siunitx}
\DeclareSIQualifier\solar{$\odot$}
You may add any siunitx options that seem convenient, and you may omit the declaration of \solar if the units in the document do not include the various solar ones.
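For a sense of what siunitx markup for a unit looks like, here is a minimal sketch that turns a list of terms into a \si{...} string. It is not how unity's latex writer is implemented: the si_term type, write_siunitx and the pre-looked-up macro names are all assumptions. It uses the standard siunitx macros \per and \tothe, and \si (later siunitx versions also provide \unit).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical term: siunitx macro names for the prefix and unit, plus
   an integer power. A real writer would look these names up from the
   unit symbols; that table is not shown. */
struct si_term {
    const char *prefix_macro;   /* e.g. "\\milli", or "" for no prefix */
    const char *unit_macro;     /* e.g. "\\metre" */
    int power;                  /* e.g. 2, or -1 for "per second" */
};

/* Write the terms as \si{...}: a negative power becomes \per before the
   unit, and any magnitude other than 1 becomes \tothe{n} after it.
   Returns 0 on success, or -1 if buf is too small. */
int write_siunitx(const struct si_term *terms, size_t n, char *buf, size_t bufsize)
{
    assert((terms != NULL || n == 0) && buf != NULL && bufsize > 0);
    int len = snprintf(buf, bufsize, "\\si{");
    if (len < 0 || (size_t)len >= bufsize)
        return -1;
    size_t used = (size_t)len;
    for (size_t i = 0; i < n; i++) {
        int p = terms[i].power;
        int mag = p < 0 ? -p : p;
        len = snprintf(buf + used, bufsize - used, "%s%s%s",
                       p < 0 ? "\\per" : "",
                       terms[i].prefix_macro, terms[i].unit_macro);
        if (len < 0 || (size_t)len >= bufsize - used)
            return -1;
        used += (size_t)len;
        if (mag != 1) {
            len = snprintf(buf + used, bufsize - used, "\\tothe{%d}", mag);
            if (len < 0 || (size_t)len >= bufsize - used)
                return -1;
            used += (size_t)len;
        }
    }
    len = snprintf(buf + used, bufsize - used, "}");
    if (len < 0 || (size_t)len >= bufsize - used)
        return -1;
    return 0;
}
```

For mm^2 s^-1 this yields \si{\milli\metre\tothe{2}\per\second}.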
The parsing is permissive, and permits non-recognised and deprecated units. The result of the parse may be checked for conformance with one or other standard using the functions unity_check_unit and unity_check_expression. Note that SI prefixes are still recognised for unrecognised units: thus furlongs/fortnight will be parsed as femto-urlongs per femto-ortnight. The same is not true of recognised units: a pixel/s is a pixel per second, and does not involve a pico-ixel.
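The prefix rule just described can be sketched as follows. The tiny known_units table and split_prefix are hypothetical; only the rule itself (recognised symbols are never split, unrecognised ones lose a leading SI prefix) comes from the description above. Where more than one prefix could match, this sketch simply takes the first entry in its table; how unity resolves such cases is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical table of recognised symbols; the library's real tables
   are much larger and depend on the syntax being parsed. */
static const char *const known_units[] = { "m", "s", "g", "pixel", "erg", NULL };

/* SI prefix symbols, with "u" as the ASCII stand-in for micro. "da" is
   listed before "d" so that the two-character prefix is tried first. */
static const char *const si_prefixes[] = {
    "da", "y", "z", "a", "f", "p", "n", "u", "m", "c", "d",
    "h", "k", "M", "G", "T", "P", "E", "Z", "Y", NULL
};

static bool is_known(const char *symbol)
{
    for (size_t i = 0; known_units[i] != NULL; i++)
        if (strcmp(symbol, known_units[i]) == 0)
            return true;
    return false;
}

/* Return the length of the SI prefix to split off symbol (0 for none).
   Recognised symbols are never split; an unrecognised symbol loses the
   first matching prefix, provided something is left for the base unit. */
size_t split_prefix(const char *symbol)
{
    assert(symbol != NULL);
    if (is_known(symbol))
        return 0;
    for (size_t i = 0; si_prefixes[i] != NULL; i++) {
        size_t plen = strlen(si_prefixes[i]);
        if (strncmp(symbol, si_prefixes[i], plen) == 0 && symbol[plen] != '\0')
            return plen;
    }
    return 0;
}
```

So furlongs splits as f + urlongs, while pixel, being in the table, is left alone; removing it from known_units would make it p + ixel.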
If you want to experiment with the library, build the program src/c/unity (in the distribution):
% ./unity -icds -oogip 'mm2/s'
mm**2 /s
% ./unity -icds -ofits -v mm/s
mm s-1
check: all units recognised? yes
check: all units recommended? yes
check: all units satisfy constraints? yes
% ./unity -ifits -ocds -v 'merg/s'
merg/s
check: all units recognised? yes
check: all units recommended? no
check: all units satisfy constraints? no
% ./unity -icds -ofits -v 'merg/s'
merg s-1
check: all units recognised? no
check: all units recommended? no
check: all units satisfy constraints? yes
In the latter three cases, the -v option validates the input string against various constraints. The expression mm/s is completely valid in all the syntaxes. In the FITS syntax, the erg is a recognised unit, but it is deprecated; and although it is recognised, it is not permitted to have SI prefixes, so merg fails the constraint check. In the CDS syntax, the erg is neither recognised nor (a fortiori) recommended; since no constraints apply to an unrecognised unit, it trivially satisfies all of them (this latter behaviour is admittedly slightly counterintuitive).
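The three checks reported by -v can be pictured as lookups in a per-syntax table of unit properties. The sketch below is an illustration only, not how unity_check_unit works internally: the unit_info and check_result types, check_unit and the tiny tables in the usage are assumptions, chosen to reproduce the merg results in the session above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-syntax knowledge about one base unit. */
struct unit_info {
    const char *name;
    bool recommended;     /* recognised and not deprecated */
    bool allows_prefix;   /* may carry an SI prefix */
};

struct check_result {
    bool recognised;
    bool recommended;
    bool satisfies_constraints;
};

/* Check a (prefix, base unit) pair against one syntax's unit table.
   An unrecognised unit has no constraints, so it satisfies them
   vacuously, mirroring the CDS merg case. */
struct check_result check_unit(const struct unit_info *table, size_t n,
                               const char *prefix, const char *name)
{
    assert(prefix != NULL && name != NULL);
    struct check_result r = { false, false, true };
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            r.recognised = true;
            r.recommended = table[i].recommended;
            /* the only constraint modelled here: prefixes allowed? */
            r.satisfies_constraints = table[i].allows_prefix || prefix[0] == '\0';
            break;
        }
    }
    return r;
}
```

With a FITS-like table listing erg as deprecated and unprefixable, m + erg gives yes/no/no; with a CDS-like table that omits erg, it gives no/no/yes.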
The three supported syntaxes have a fair amount in common, but the differences are significant enough to require separate grammars. The most important differences are in the number of solidi they allow in a unit specification, and the symbols they use for products and powers.
In the grammars below, the terminals are as follows:
FLOAT: [+-]?[1-9][0-9]*\.[0-9]+ -- that is, there are no exponents allowed

The FITS grammar:

input: product_of_units
    | factor product_of_units
    // The FITS spec isn't completely clear on the topic of
    // solidi, saying "Parentheses are used for symbol grouping and are
    // strongly recommended whenever the order of operations might be
    // subject to misinterpretation" and "The IAU style manual forbids
    // the use of more than one solidus (/) character in a units
    // string. However, since normal mathematical precedence rules apply
    // in this context, more than one solidus may be used but is
    // discouraged" (p27). Therefore, it's not clear whether, for
    // example, "kg/m s" should be parsed as "kg m-1 s-1", as "kg m-1 s", or
    // forbidden on the grounds that 'normal mathematical precedence
    // rules' would forbid it (it's probably arguable all ways, but I
    // don't think that 'normal mathematical precedence rules' are going
    // to resolve it). Here, we resolve this by declaring that there can
    // be only a single expression to the right of the solidus. That is,
    // we do not have "product_of_units DIVISION product_of_units" here.
    //
    // Note: it is I think a consequence of this that nothing can be
    // successfully parsed in two different grammars, with different
    // meanings. If the right-hand-side of the division could be a
    // product_of_units, then "kg /m s" would parse in both FITS and OGIP,
    // but mean "kg m-1 s-1" in FITS and "kg m-1 s" in OGIP.
    | product_of_units DIVISION unit_expression
    | factor product_of_units DIVISION unit_expression
    // The FITS spec may or may not be intended to permit "10+3 /m",
    // but we don't (because ... for heaven's sake!)
    | DIVISION unit_expression
    ;
unit_expression: unit | OPEN_P product_of_units CLOSE_P ;
product_of_units: unit_expression | product_of_units product unit_expression ;
// We require the following prefix factors to be powers of ten,
// as an extra-syntactic condition
factor: INTEGER power numeric_power
    // Represent exponents as a pair of integers, eg 10+3
    | INTEGER INTEGER
    ;
unit: STRING | STRING power numeric_power | STRING numeric_power ;
numeric_power: INTEGER | OPEN_P INTEGER CLOSE_P | OPEN_P FLOAT CLOSE_P | OPEN_P INTEGER DIVISION INTEGER CLOSE_P ;
power: CARET | STARSTAR ;
product: WHITESPACE | STAR | DOT;
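The numeric_power rule (the same in the FITS and OGIP grammars) admits integer, parenthesised integer, float and rational powers. A minimal sketch of turning such a power into a number, assuming the whole power is available as one string; parse_numeric_power is a hypothetical helper rather than part of the library. It returns the value as a double for simplicity (a real implementation might keep (1/2) as an exact rational), and it is more lenient than the FLOAT terminal, since strtod would also accept an exponent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the text of a numeric_power: INTEGER, (INTEGER), (FLOAT) or
   (INTEGER/INTEGER), e.g. "2", "(-1)", "(1.5)", "(1/2)". Returns true
   and stores the value in *out on success; *out is untouched on failure. */
bool parse_numeric_power(const char *s, double *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    double value;
    if (*s != '(') {                       /* bare INTEGER, e.g. "2" */
        long v = strtol(s, &end, 10);
        if (end == s || *end != '\0')
            return false;
        *out = (double)v;
        return true;
    }
    const char *p = s + 1;                 /* skip the opening parenthesis */
    long num = strtol(p, &end, 10);
    if (end == p)
        return false;
    if (*end == '/') {                     /* (INTEGER/INTEGER), e.g. "(1/2)" */
        const char *q = end + 1;
        long den = strtol(q, &end, 10);
        if (end == q || den == 0)
            return false;
        value = (double)num / (double)den;
    } else if (*end == '.') {              /* (FLOAT), e.g. "(1.5)" */
        value = strtod(p, &end);
    } else {                               /* (INTEGER), e.g. "(-1)" */
        value = (double)num;
    }
    if (*end != ')' || end[1] != '\0')     /* must end with the closing parenthesis */
        return false;
    *out = value;
    return true;
}
```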
The OGIP grammar:

input: product_of_units | factor product_of_units ;
unit_expression: unit | OPEN_P product_of_units CLOSE_P ;
// We conceive of the product_of_units as a sequence of terms 'times
// expression' or 'dividedby expression', multiplying them together
// after, in the latter case, reciprocating them.
product_of_units: unit_expression
    | division unit_expression
    | product_of_units product unit_expression
    | product_of_units division unit_expression
    ;
// We require the following prefix factors to be powers of ten,
// as an extra-syntactic condition
factor: INTEGER power numeric_power
    | FLOAT
    | INTEGER
    ;
// OGIP recommends no whitespace after the slash
division: DIVISION
    | WHITESPACE DIVISION
    | WHITESPACE DIVISION WHITESPACE
    | DIVISION WHITESPACE
    ;
unit: STRING | STRING power numeric_power ;
numeric_power: INTEGER | OPEN_P INTEGER CLOSE_P | OPEN_P FLOAT CLOSE_P | OPEN_P INTEGER DIVISION INTEGER CLOSE_P ;
power: STARSTAR;
product: WHITESPACE | STAR | WHITESPACE STAR | WHITESPACE STAR WHITESPACE | STAR WHITESPACE;
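The OGIP product_of_units comment describes a sequence of 'times' and 'dividedby' terms, multiplied together after reciprocating the latter. A minimal sketch of that left-to-right combination, assuming each unit_expression is a single unit with an integer power (no parentheses); the step and accum types and combine_steps are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum op { TIMES, DIVIDED_BY };

/* One step of a flattened OGIP product_of_units (hypothetical form). */
struct step {
    enum op op;
    const char *unit;
    int power;
};

#define MAX_UNITS 8

/* Accumulated result: distinct units with their summed powers. */
struct accum {
    const char *unit[MAX_UNITS];
    int power[MAX_UNITS];
    size_t n;
};

/* Combine the steps left to right: each term is multiplied in, after
   being reciprocated (power negated) if it follows a division. Powers
   of repeated units are added. Returns false if there are too many
   distinct units. */
bool combine_steps(const struct step *steps, size_t n, struct accum *acc)
{
    assert(acc != NULL && (steps != NULL || n == 0));
    acc->n = 0;
    for (size_t i = 0; i < n; i++) {
        int p = steps[i].op == DIVIDED_BY ? -steps[i].power : steps[i].power;
        size_t j = 0;
        while (j < acc->n && strcmp(acc->unit[j], steps[i].unit) != 0)
            j++;
        if (j == acc->n) {                 /* first time this unit appears */
            if (acc->n == MAX_UNITS)
                return false;
            acc->unit[j] = steps[i].unit;
            acc->power[j] = 0;
            acc->n++;
        }
        acc->power[j] += p;
    }
    return true;
}
```

With the steps kg, /m, s it yields kg m-1 s, the OGIP reading of "kg /m s" mentioned in the FITS grammar comments; repeated units have their powers added, so m /s /s yields m s-2.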
The CDS grammar is quite similar to the OGIP grammar, but with more restrictions:
input: product_of_units | factor product_of_units ;
unit_expression: unit | OPEN_P product_of_units CLOSE_P ;
// We conceive of the product_of_units as a sequence of terms 'times
// expression' or 'dividedby expression', multiplying them together
// after, in the latter case, reciprocating them.
product_of_units: unit_expression
    | division unit_expression
    | product_of_units product unit_expression
    | product_of_units division unit_expression
    ;
// We require the following prefix factors to be powers of ten,
// as an extra-syntactic condition
factor: INTEGER power numeric_power
    // Represent exponents as a pair of integers, eg 10+3
    | INTEGER INTEGER
    | FLOAT
    | INTEGER
    ;
division: DIVISION ;
unit: STRING | STRING numeric_power ;
numeric_power: INTEGER ;
power: STARSTAR;
product: DOT;
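In the CDS grammar a power is written directly after the unit name as an INTEGER (unit: STRING numeric_power), so mm2 and s-1 are single terms. A minimal sketch of splitting such a term, assuming unit names are purely alphabetic and that the term arrives as one string rather than as separate STRING and INTEGER tokens; split_cds_unit is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Split a CDS unit term such as "mm2" or "s-1" into its name (copied
   into name_buf) and its integer power (1 if no power is written).
   Assumes, hypothetically, that unit names are purely alphabetic. */
bool split_cds_unit(const char *term, char *name_buf, size_t bufsize, long *power)
{
    assert(term != NULL && name_buf != NULL && power != NULL);
    size_t len = 0;
    while (isalpha((unsigned char)term[len]))
        len++;
    if (len == 0 || len >= bufsize)        /* no name, or name too long */
        return false;
    memcpy(name_buf, term, len);
    name_buf[len] = '\0';
    const char *rest = term + len;
    if (*rest == '\0') {                   /* no power written */
        *power = 1;
        return true;
    }
    /* the power must start with a sign or digit (no whitespace) */
    if (!isdigit((unsigned char)*rest) && *rest != '+' && *rest != '-')
        return false;
    char *end;
    long p = strtol(rest, &end, 10);
    if (end == rest || *end != '\0')
        return false;
    *power = p;
    return true;
}
```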