Data Organiser

The data organiser (DO) is the backbone of the pipeline. It is required to sort and prepare data within a workspace, predict which products will be produced during data reduction, write out all of the set-of-files (SOF) files required by each of the soxspipe recipes and keep track of all data-products generated during the reduction cascade. The DO also provides functionality to rewrite SOF files on the fly if a recipe fails to produce a product required by a future recipe (e.g. a master flat frame), switching out the failed product for the next-best product (e.g. the next master flat frame generated closest in time to the recipe data). Finally, on subsequent executions of the pipeline, the organiser prevents data from being re-reduced if the products already exist (unless the user chooses to override this feature).
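The 'next-best product' fallback described above can be pictured as a nearest-in-time search. The following is only a minimal sketch under stated assumptions, not the pipeline's own code: the function name `next_best_product`, the `file` and `mjd` dictionary keys, and the idea of passing in a set of failed filenames are all hypothetical.

```python
def next_best_product(target_mjd, products, failed=()):
    """Return the usable product closest in time to target_mjd, or None.

    products -- list of dicts with hypothetical 'file' and 'mjd' keys
    failed   -- filenames of products that a recipe failed to produce
    """
    failed = set(failed)
    # Drop every product that is known to have failed
    usable = [p for p in products if p["file"] not in failed]
    if not usable:
        return None
    # Pick the product with the smallest absolute time offset
    return min(usable, key=lambda p: abs(p["mjd"] - target_mjd))
```

If the closest master flat is listed in `failed`, the function falls through to the next-closest one, which mirrors the swap the DO is described as making when it rewrites a SOF file.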

The algorithm the DO uses to prepare a workspace is shown in Fig. 33.

Fig. 33 The algorithm used by the soxspipe data-organiser to prepare a workspace for data reduction.

At the heart of the DO is a SQLite database called soxspipe.db. Here, the organiser’s bookkeeping is performed, recorded, and maintained.

The ESO Science Archive Facility delivers FITS data in a .Z compressed format. When running soxspipe prep, the DO first finds and uncompresses any .Z compressed FITS frames within the workspace root. The DO then reads the FITS headers of all of the FITS frames in the workspace root and selects out the raw (unreduced) frames, recording one entry per raw frame in the raw_frames table of soxspipe.db. The DO then moves these raw frames to a raw directory within the workspace. Any remaining files are moved out of the workspace root and into a misc directory.

A sanity check is performed to ensure that the data in the raw_frames database table matches the data in the raw directory. If frames have been removed from the file system, the corresponding records in the database table are deleted. Conversely, frames in the raw directory that are missing from the database table are added to it.
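This sanity check amounts to a two-way set difference between the filenames recorded in the database and the filenames on disk. A minimal sketch, assuming both sides can be reduced to plain lists of filenames (the function name `reconcile` is hypothetical and not part of soxspipe):

```python
def reconcile(db_frames, disk_frames):
    """Compare database records with files on disk.

    Returns (to_delete, to_add):
      to_delete -- recorded in the database but no longer on disk
      to_add    -- present on disk but missing from the database
    """
    db_frames, disk_frames = set(db_frames), set(disk_frames)
    to_delete = sorted(db_frames - disk_frames)
    to_add = sorted(disk_frames - db_frames)
    return to_delete, to_add
```

For example, `reconcile(["a.fits", "b.fits"], ["b.fits", "c.fits"])` reports `a.fits` for deletion from the table and `c.fits` for insertion.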

The next step is for the DO to define all sets of raw frames that can be used to produce next-stage products (master bias, master flat, order location tables, etc.). The rules for these associations are read from the soxs_sof_map.yaml file, which is shipped with the pipeline code. These sets are recorded in the raw_frame_sets database table. From these raw frame sets, the DO derives the raw frame content for all possible SOF files. This content is recorded in the sof_map database table, which assigns a human-readable ‘tag’ (e.g. BIAS_UVB) to individual frames and maps the frames to named SOF files.
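As a rough illustration of how frames might be collected into tagged sets, the sketch below groups frames by a type and an arm and builds a tag such as BIAS_UVB from the two. This is an assumption-laden toy: the real association rules live in soxs_sof_map.yaml and are not reproduced here, and the `dpr_type`, `arm` and `file` keys and the function name `group_frames` are hypothetical.

```python
from collections import defaultdict


def group_frames(frames):
    """Group frames into sets keyed by a human-readable tag.

    frames -- list of dicts with hypothetical 'file', 'dpr_type' and 'arm' keys
    Returns {tag: [filenames]}, e.g. {'BIAS_UVB': [...]}.
    """
    groups = defaultdict(list)
    for frame in frames:
        # Build the tag from the frame type and the instrument arm
        tag = f"{frame['dpr_type']}_{frame['arm']}"
        groups[tag].append(frame["file"])
    return dict(groups)
```

Each resulting group corresponds loosely to one candidate SOF file: a named list of frames that share a tag.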

The initial set of SOF files in the sof_map table is used to predict the product files written when soxspipe recipes are executed on the SOF files. The expected product information is written to a product_frames database table. From this product_frames table, products are assigned to SOF files later in the reduction cascade (recorded again in the sof_map table).

Finally, all SOF files from the sof_map table are written to a sof directory in the workspace root and are ready to be used by the various soxspipe recipes during a data-reduction session.
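Because the bookkeeping lives in an ordinary SQLite file, the tables named above (raw_frames, raw_frame_sets, sof_map, product_frames) can be inspected with Python's standard sqlite3 module. The helper below is a sketch for exploration only; the function name `summarise_tables` is hypothetical, and the column layout of each table is not assumed.

```python
import sqlite3


def summarise_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object stored in the database file
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )]
        counts = {}
        for name in names:
            # Names come from sqlite_master itself, so quoting them is sufficient
            counts[name] = conn.execute(
                f'SELECT COUNT(*) FROM "{name}"'
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `summarise_tables` with the path to a workspace's soxspipe.db gives a quick overview of how many raw frames, frame sets, SOF mappings and predicted products the DO has recorded. The helper only issues SELECT statements, so it does not modify the bookkeeping.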

During the running of each pipeline recipe, Quality Control (QC) metrics are generated, and within the pipeline settings file, there are qc-acceptable-ranges for each recipe. These acceptable ranges act as guardrails for the pipeline, so that if a QC metric falls outside an acceptable range, the pipeline forces a ‘fail’ on this data, preventing it from cascading into further data-reduction stages.
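The guardrail logic can be sketched as a simple range check. The structure of the real qc-acceptable-ranges settings is not shown in this section, so the `(low, high)` tuples, the metric names and the function name `qc_failures` below are assumptions made purely for illustration.

```python
def qc_failures(metrics, acceptable_ranges):
    """Return the names of QC metrics that fall outside their acceptable range.

    metrics           -- {metric_name: measured_value}
    acceptable_ranges -- {metric_name: (low, high)}, inclusive bounds (assumed layout)
    Metrics without a configured range are not checked.
    """
    failures = []
    for name, value in metrics.items():
        if name not in acceptable_ranges:
            continue
        low, high = acceptable_ranges[name]
        if not (low <= value <= high):
            failures.append(name)
    return failures
```

An empty list means the product may cascade into later reduction stages. Any entry in the list would correspond to the forced 'fail' described above.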

Utility API

class data_organiser(log, rootDir, vlt=False, dbConnect=True)[source]

Bases: object

The soxspipe Data Organiser

Key Arguments:

  • log – logger

  • rootDir – the root directory of the data to process

  • vlt – prepare the workspace using the standard vlt /data directory

Usage:

To set up your logger, settings and database connections, please use the fundamentals package (see tutorial here https://fundamentals.readthedocs.io/en/master/initialisation.html).

To initiate a data_organiser object, use the following:

from soxspipe.commonutils import data_organiser
do = data_organiser(
    log=log,
    rootDir="/path/to/workspace/root/"
)
do.prepare()

Initialization

build_sof_files()[source]

scan the raw frame table to generate the listing of products that are expected to be created and then write out all of the needed SOF files

Usage:

self.build_sof_files()

close()[source]

close the database connection

Usage:

do.close()

get_incomplete_raw_frames_set()[source]

get_raw_frames_and_groups(ttype=None, arm=None, tech=None, recipe=None, recipeOrder=None, filterName=None, unprocessedOnly=False)[source]

Process raw frames to group and calculate mean MJD values.

Key Arguments:

  • ttype – optional data product eso dpr type to filter by

  • arm – optional instrument eso seq arm to filter by

  • tech – optional list of eso dpr tech to filter by

  • recipe – recipe name to assign to groups

  • recipeOrder – recipe reduction order

  • filterName – optional name to filter the groups

  • unprocessedOnly – if True, only return unprocessed raw frames

Return:

  • rawFrames – processed raw frames dataframe

  • rawGroups – grouped raw frames with calculated MJD values

Usage:

rawFrames, rawGroups = self.get_raw_frames_and_groups()

list_obs()[source]

list all observation names and IDs in the current workspace

list_raw(sofFile)[source]

list all the raw frames associated with a given science object SOF file

list_sofs()[source]

list all science object SOF files in the current workspace

predict_product_frames(productTypes, rawGroups, recipe)[source]

Process product frames for a given set of product types and raw groups.

Key Arguments:

  • productTypes – List of product types to process.

  • rawGroups – DataFrame containing raw groups.

  • recipe – Recipe name.

Return:

  • incompleteProducts – Number of incomplete products.

prepare(refresh=False, report=True)[source]

Prepare the workspace for data reduction by generating all SOF files and reduction scripts.

Key Arguments:

  • refresh – trigger a complete refresh of the workspace during preparation (delete the database and run a complete prepare)

raw_frames_to_sof_map(rawGroups, containerSofs)[source]

Generate the SOF map from raw groups and complete product SOFs.

Key Arguments:

  • rawGroups – DataFrame containing raw frame groups.

  • containerSofs – array of complete product SOFs.

Return:

  • sofMapDF – DataFrame containing the generated SOF map.

session_create(sessionId=False)[source]

create a data-reduction session with accompanying settings file and required directories

Key Arguments:

  • sessionId – optionally provide a sessionId (A-Z, a-z, 0-9, _ and - allowed; 16-character limit)

Return:

  • sessionId – the unique ID of the data-reduction session
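The sessionId constraints stated above (letters, digits, underscore and hyphen, at most 16 characters) can be checked up front with a regular expression. How soxspipe itself validates the ID is not shown here, so the helper `is_valid_session_id` is a hypothetical pre-check rather than the library's behaviour. It also assumes an ID must contain at least one character.

```python
import re

# Letters, digits, underscore and hyphen; between 1 and 16 characters
SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,16}")


def is_valid_session_id(session_id):
    """Return True if session_id satisfies the documented character and length limits."""
    return SESSION_ID_PATTERN.fullmatch(session_id) is not None
```

Under this check `"my_supernova"` is accepted, while an ID containing a space or running to 17 characters is rejected.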

Usage:

do = data_organiser(
    log=log,
    rootDir="/path/to/workspace/root/"
)
sessionId = do.session_create(sessionId="my_supernova")

session_list(silent=False)[source]

list the sessions available to the user

Key Arguments:

  • silent – don’t print listings if True

Return:

  • currentSession – the single ID of the currently used session

  • allSessions – the IDs of the other sessions

Usage:

from soxspipe.commonutils import data_organiser
do = data_organiser(
    log=log,
    rootDir="."
)
currentSession, allSessions = do.session_list()

session_refresh(silent=False, failure=True)[source]

refresh a session’s SOF files (needed if a recipe fails)

Usage:

from soxspipe.commonutils import data_organiser
do = data_organiser(
    log=log,
    rootDir="."
)
do.session_refresh()

session_switch(sessionId)[source]

switch to an existing workspace data-reduction session

Key Arguments:

  • sessionId – the sessionId to switch to

Usage:

from soxspipe.commonutils import data_organiser
do = data_organiser(
    log=log,
    rootDir="."
)
do.session_switch(mySessionId)

use_vlt_environment_folders()[source]

use vlt environment folders

Return:

  • None
