This document describes the basic structure of the IRATE format.
All IRATE files are HDF5 files, and hence usually have either a .h5 or .hdf5 extension.
The main data file in the IRATE format is referred to simply as an “IRATE file”. These files may store any number of the actual outputs of a simulation, associated halo and/or galaxy catalogs and merger trees, and any other data that fits in such a format (e.g. black hole catalogs). To conform to the IRATE standard, such a file must satisfy the following conditions:
At the root of the file, there must be a Group ‘Cosmology’. This Group must have the following HDF5 attributes to specify the cosmology that defines the data:
- ‘HubbleParam’
- ‘OmegaMatter’
- ‘OmegaLambda’
- ‘OmegaBaryon’
- ‘PowerSpectrumIndex’
- ‘sigma_8’
Furthermore, if the cosmology used has an accepted name (e.g. WMAP-7), it is strongly recommended that the Group have an additional attribute, ‘Name’, for human readability; such an attribute, however, is not required.
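As a rough illustration of the requirement above, the following sketch reports which of the six required attributes are absent. The helper name missing_cosmology_attrs and the sample values are hypothetical and not part of the IRATE specification. A plain dict stands in for the attributes of the ‘Cosmology’ Group; with h5py, the attrs object of a Group supports the same mapping-style membership test.

```python
# Required attributes of the 'Cosmology' Group, as listed above.
REQUIRED_COSMOLOGY_ATTRS = (
    "HubbleParam",
    "OmegaMatter",
    "OmegaLambda",
    "OmegaBaryon",
    "PowerSpectrumIndex",
    "sigma_8",
)


def missing_cosmology_attrs(attrs):
    """Return the required attribute names absent from `attrs` (hypothetical helper)."""
    return [name for name in REQUIRED_COSMOLOGY_ATTRS if name not in attrs]


# Illustrative values only. 'Name' is optional, so its absence is never reported.
example_attrs = {
    "HubbleParam": 0.7,
    "OmegaMatter": 0.27,
    "OmegaLambda": 0.73,
    "OmegaBaryon": 0.045,
    "PowerSpectrumIndex": 0.96,
}
print(missing_cosmology_attrs(example_attrs))  # 'sigma_8' is missing here
```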
The root of the file should also contain a Group named ‘SimulationProperties’. Various properties of the simulation, such as the boxsize and assorted flags, should be provided in this Group. If possible, they should be given as attributes; however, the format also allows this Group to contain Datasets.
Also at the root of the file, there may be any number of Groups named in the format ‘Snapshot#’, where the # is typically a number identifying the output in the context of the simulation. Each Snapshot Group may contain only other Groups, which may be ‘ParticleData’ or ‘GridData’ (whose individual requirements will be discussed later), along with any number of halo or galaxy catalogs (or any other type of catalog that fits with the data).
The root of the file may also contain a ‘MergerTrees’ Group, which holds information about the merger trees in the simulation.
The root of the file may also contain any other data that is defined across several different times. None are required, but if they exist, they should follow the same conventions with regard to units and naming structure that are laid out elsewhere in this documentation. (Readers, do we want to allow this, or should there be nothing else allowed at the root level?)
There must not be spaces in any group names so as not to confuse some HDF5 tools that don’t play well with spaces.
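The naming rules for root-level Groups can be sketched with two small helpers. Both function names are hypothetical. The pattern below assumes the ‘#’ in ‘Snapshot#’ is a non-negative integer, which the text above describes only as typical, so a file using some other identifier would need a looser pattern.

```python
import re

# Assumption: the '#' in 'Snapshot#' is a run of digits.
SNAPSHOT_PATTERN = re.compile(r"^Snapshot(\d+)$")


def snapshot_number(group_name):
    """Return the integer in a 'Snapshot#' name, or None if the name does not match."""
    match = SNAPSHOT_PATTERN.match(group_name)
    return int(match.group(1)) if match else None


def has_valid_group_name(group_name):
    """Group names must not contain spaces."""
    return " " not in group_name
```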
CHANGE THIS: To conform to the “strict” format, there must also be no other groups in the root beyond the five previously mentioned, aside from optionally two more that match the halo catalog format described in the next section. Further, there can be at most one nested group in the ‘Dark’, ‘Gas’, and ‘Star’ groups, and all datasets in those groups must have the same first dimension. In the non-strict format, deeper nested groups, mismatched datasets, and additional groups at the root are allowed.
Units should be stored as attributes, either on the individual Datasets or on the Group that contains them. (To those reading this draft, how does this work?) In either case, they should be given both in human-readable form and as a conversion factor to CGS units.
If the units are attached directly to the Dataset that they relate to, they must be named ‘unitname’ and ‘unitcgs’; if they are instead attached to a Group above them, the names should be prepended with the exact name of the Dataset that they relate to; e.g. the units for the Dataset ‘R200b’ would be named ‘R200bunitname’ and ‘R200bunitcgs’, if they are attributes to the group that contains that Dataset.
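The two naming schemes can be sketched as a lookup that tries the Dataset first and then the containing Group. The function find_units is hypothetical, and the choice to prefer Dataset-level attributes when both are present is an assumption; the text above does not define a precedence. Plain dicts stand in for the attribute collections.

```python
def find_units(dataset_name, dataset_attrs, group_attrs):
    """Return (unitname, unitcgs) for a Dataset, or None if no units are found.

    Assumption: attributes on the Dataset itself take precedence over
    prefixed attributes on the containing Group.
    """
    if "unitname" in dataset_attrs and "unitcgs" in dataset_attrs:
        return dataset_attrs["unitname"], dataset_attrs["unitcgs"]

    # Group-level names are the Dataset name followed by 'unitname'/'unitcgs'.
    name_key = dataset_name + "unitname"
    cgs_key = dataset_name + "unitcgs"
    if name_key in group_attrs and cgs_key in group_attrs:
        return group_attrs[name_key], group_attrs[cgs_key]
    return None


# Units for 'R200b' stored on the containing Group (illustrative values).
group_attrs = {"R200bunitname": "kpc/h", "R200bunitcgs": [3.0857e21, -1, 0]}
print(find_units("R200b", {}, group_attrs))
```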
The ‘unitname’ attribute should be a string defining the unit, e.g. ‘kpc/h’. The ‘unitcgs’ attribute must be a three-element array whose values are, in order, the numerical conversion factor to CGS, the exponent on the Hubble parameter by which the conversion factor should be multiplied, and the exponent on the scale factor by which the conversion factor should be multiplied.
For example, if ‘unitname’ is ‘comoving Mpc/h’, ‘unitcgs’ should be an array containing [3.0857e24, -1, 1]. (Readers, we mostly took this from Andrew’s work–any complaints?)
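Read this way, a stored value converts to CGS as value × factor × h^(Hubble exponent) × a^(scale factor exponent). The sketch below applies that reading; the function name to_cgs and the sample values of h and a are hypothetical.

```python
def to_cgs(value, unitcgs, hubble_param, scale_factor):
    """Convert a stored value to CGS using a three-element 'unitcgs' array."""
    factor, h_exponent, a_exponent = unitcgs
    return value * factor * hubble_param ** h_exponent * scale_factor ** a_exponent


# One comoving Mpc/h expressed in physical centimetres,
# for illustrative values h = 0.7 and a = 0.5.
comoving_mpc_over_h = [3.0857e24, -1, 1]
print(to_cgs(1.0, comoving_mpc_over_h, hubble_param=0.7, scale_factor=0.5))
```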
The ParticleData Group, if it exists, must contain at least one group, of which the most common are ‘Dark’, ‘Gas’, and ‘Star’; these contain the data for dark matter, gas, and stars, respectively. All three share a common format: they may contain only datasets, contain only groups, or be empty (if no particles of that type are present).
If they contain groups, those groups are used to sub-divide the data in whatever way the simulation likes. In a simulation of a single galaxy, the stars might include groups for Bulge, Thin Disk, Thick Disk, and Halo stars, with no separate groups for the dark matter and gas. A cosmological simulation, on the other hand, might sub-divide all three classes into Cluster, Group, and Field based on some overdensity criterion.
Regardless of whether the three classes are subdivided into groups, all groups of datasets must have a certain format. For particle data, the following Dataset objects must be present, even if they have 0 particles:
- ‘Position’ (N x d)
- ‘Velocity’ (N x d)
- ‘Mass’ (N)
where d is the dimensionality (almost always 3) and N is the total number of particles. Additional datasets (e.g. ‘Metallicity’, ‘Entropy’, ‘Density’, etc.) may be present, but the above 3 are the minimum required. Any other datasets are encouraged to have shape N for scalar data, or N x d for vector data.
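The shape requirements above can be sketched as a check on a mapping from dataset name to shape tuple (with h5py, the shape of a Dataset is such a tuple). The helper check_particle_shapes and its messages are hypothetical.

```python
REQUIRED_PARTICLE_DATASETS = ("Position", "Velocity", "Mass")


def check_particle_shapes(shapes):
    """Return a list of problems found in a {dataset name: shape tuple} mapping."""
    problems = ["missing " + name
                for name in REQUIRED_PARTICLE_DATASETS if name not in shapes]
    if problems:
        return problems

    position, velocity, mass = (shapes[name] for name in REQUIRED_PARTICLE_DATASETS)
    if len(position) != 2:
        problems.append("Position should have shape (N, d)")
    if velocity != position:
        problems.append("Velocity should have the same shape as Position")
    if len(mass) != 1:
        problems.append("Mass should have shape (N,)")
    elif position and mass[0] != position[0]:
        problems.append("Mass and Position disagree on N")
    return problems
```

A group with zero particles still passes, as long as the three datasets exist with shapes such as (0, 3), (0, 3), and (0,).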
Note
Particle IDs are not specifically required in the format. The recommended convention (unless particle number changes over multiple timesteps) is that the implicit IDs match the index of the particle in the dataset. The standard order should be Dark, Star, Gas, and indices should be 0-based. Thus, if there are 100 dark particles, 50 star particles, and 50 gas particles, the last dark particle has ID 99, the first and last star particles are 100 and 149, and the first gas particle is 150. If particle IDs are not contiguous (e.g. datasets from multiple timesteps over which particles are removed or added), the above convention should be used when possible for the initial dataset (e.g. the first timestep).
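The implicit ID convention in this note can be sketched as follows. The function implicit_id_ranges is hypothetical; it takes the particle count for each type and returns the first and last implicit ID of each, in the Dark, Star, Gas order.

```python
# Order in which implicit IDs are assigned, per the convention above.
PARTICLE_ORDER = ("Dark", "Star", "Gas")


def implicit_id_ranges(counts):
    """Map each particle type to (first_id, last_id), or None if it has no particles."""
    ranges = {}
    next_id = 0  # IDs are 0-based
    for particle_type in PARTICLE_ORDER:
        n = counts.get(particle_type, 0)
        ranges[particle_type] = (next_id, next_id + n - 1) if n > 0 else None
        next_id += n
    return ranges


print(implicit_id_ranges({"Dark": 100, "Star": 50, "Gas": 50}))
# Dark: (0, 99), Star: (100, 149), Gas: (150, 199)
```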
The grid data specification has not yet been defined.
Halo catalogs must include, as a part of their name, the phrase ‘HaloCatalog’. For example, both ‘AHFHaloCatalog1’ and ‘RockstarHaloCatalog’ are valid names; ‘AHFCatalog’, however, is not. (Sound ok readers?)
Any halo catalogs that are contained within a Snapshot Group should have, as attributes, any parameters that are relevant to the halo finder, such as FOF linking lengths, overdensity criterion, or the code used to produce that catalog (though the latter may be obvious from the name of the group).
Any halo catalogs must contain a Dataset with the name ‘Center’ that has shape N x d, where N is the number of halos in the catalog, and d is the dimensionality (typically 3). All other datasets in the catalog must have a matching first dimension, and should be in the same order. That is, the ith entry in ‘Center’ should correspond to the same halo as the ith entry in any of the other datasets.
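The naming and shape rules for halo catalogs can be sketched in the same style. Both helpers are hypothetical, and the shapes mapping (dataset name to shape tuple) stands in for the contents of a catalog Group. Only the first dimension is compared; whether the rows are actually in the same order cannot be checked from shapes alone.

```python
def is_halo_catalog_name(group_name):
    """A halo catalog name must contain the phrase 'HaloCatalog'."""
    return "HaloCatalog" in group_name


def check_halo_catalog(shapes):
    """Return a list of problems found in a {dataset name: shape tuple} mapping."""
    center = shapes.get("Center")
    if center is None or len(center) != 2:
        return ["Center must be present with shape (N, d)"]

    n_halos = center[0]
    problems = []
    for name in sorted(shapes):
        shape = shapes[name]
        # Every dataset must share the first dimension of 'Center'.
        if len(shape) == 0 or shape[0] != n_halos:
            problems.append(name + " does not match the first dimension of Center")
    return problems
```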
The specifications for galaxy catalogs have not yet been defined.
Merger tree specifications have not yet been defined.
A sample IRATE Format file might have the following structure. Note that the ‘Halo’, ‘Bulge’, and ‘ThinDisk’ groups are not actually a part of the specification, but are examples of possible ways one might wish to sub-divide the particle data. Also note that a typical IRATE file will contain many more datasets, particularly in the catalogs:
/
/Cosmology (contains attributes with information about the cosmology)
/SimulationProperties (contains attributes with information about the simulation)
/Snapshot300
/Snapshot300/ParticleData
/Snapshot300/ParticleData/Dark
/Snapshot300/ParticleData/Dark/Halo (may contain units relating to datasets below it)
/Snapshot300/ParticleData/Dark/Halo/Position (dataset)
/Snapshot300/ParticleData/Dark/Halo/Velocity (dataset)
/Snapshot300/ParticleData/Dark/Halo/Mass (dataset)
/Snapshot300/ParticleData/Dark/Bulge/Position (dataset)
/Snapshot300/ParticleData/Dark/Bulge/Velocity (dataset)
/Snapshot300/ParticleData/Dark/Bulge/Mass (dataset)
/Snapshot300/ParticleData/Gas/Position (dataset)
/Snapshot300/ParticleData/Gas/Velocity (dataset)
/Snapshot300/ParticleData/Gas/Mass (dataset)
/Snapshot300/AHFHaloCatalog (contains attributes with information about the halo finder; may contain units)
/Snapshot300/AHFHaloCatalog/Center (dataset)
/Snapshot300/AHFHaloCatalog/Rvir (dataset)
/Snapshot305
/Snapshot305/ParticleData
/Snapshot305/ParticleData/Dark
/Snapshot305/ParticleData/Dark/Halo
/Snapshot305/ParticleData/Dark/Halo/Position (dataset)
/Snapshot305/ParticleData/Dark/Halo/Velocity (dataset)
/Snapshot305/ParticleData/Dark/Halo/Mass (dataset)
/Snapshot305/ParticleData/Star
/Snapshot305/ParticleData/Star/ThinDisk
/Snapshot305/ParticleData/Star/ThinDisk/Position (dataset)
/Snapshot305/ParticleData/Star/ThinDisk/Velocity (dataset)
/Snapshot305/ParticleData/Star/ThinDisk/Mass (dataset)
/Snapshot305/RockstarHaloCatalog/Center (dataset)
/Snapshot305/RockstarHaloCatalog/M200b (dataset)
...