5. Data organization

The built-in data organizer arranges files in a hierarchical scheme composed of four classes: Workspace, Project, Dataset, and Reduction Unit.

_images/workspaces3.png

Fig. 5.1 This figure shows the SIPGI data organization scheme.

5.1. Workspace

A Workspace is a SIPGI environment that depends on the instrument (e.g. MODS or LUCI) and must be defined by the user before starting the ingestion of raw frames. In a Workspace, all the recipes necessary to reduce and calibrate data of the selected instrument are activated. The behavior of each recipe depends on a number of parameters. When a Workspace is created, the parameters of all the recipes are set to their default values. If a user changes the parameters of a specific recipe, these changes apply throughout the Workspace.

5.2. Project

A Project is the ensemble of all the data necessary for the reduction (both science and calibration frames). Several Projects may exist within the same Workspace. Before ingesting raw data, the user must specify both the Project name and a Project directory on their disk, in which the built-in data organizer will store a copy of the raw frames in the SIPGI format (see The importing procedure).

Users can create many Workspaces and Projects. This allows reductions to be organized in the most efficient way. For example, the user can create one Workspace with all the parameters tuned for the reduction of some targets and another Workspace with different settings better suited to other targets.

5.3. Dataset

A Dataset is a collection of data with the same characteristics, as described below. Datasets are defined by SIPGI: the organizer splits the raw data of a Project into several Datasets. Datasets are of two types: scientific datasets and calibration datasets. Scientific datasets contain data belonging to the same PI and the same target. Normally, for one run, the user will have one scientific dataset collecting the frames of the standard star and as many scientific datasets as the number of observed targets, each collecting the frames of one target. Calibration datasets contain calibration data (flat fields, lamps, darks, biases, etc.). The organizer groups calibration frames into different Datasets depending on whether they are imaging, LS, or MOS frames.

The type of file (e.g. science or calibration), the PI and target name, and the observing mode (e.g. imaging, LS, MOS) are all identified by the organizer by reading and combining specific keywords in the header of the raw files. These keywords are automatically written to the file header at the time of the observations. SIPGI will not ingest or categorize raw files with incorrect entries for these keywords. Appendix B lists the keywords that SIPGI requires for a successful import.

5.4. Reduction Unit

In each Dataset, SIPGI organizes raw files in Reduction Units. A Reduction Unit collects all the files with the same instrument configuration, i.e. the same instrument, grism, filter, and mask. For example, the scientific frames of a target observed with MODS with the G400L and the G670L grisms will be split into two Reduction Units: one for the G400L data on both instrument arms, and one for the G670L data on both arms [1]. Within each Reduction Unit, data can be further grouped according to, e.g., the observing night or the exposure time, using the filter facilities provided by SIPGI (see Frames panel). This organization gives the user easy access to the group of data that must be processed together, minimizing the risk of passing the wrong input data to the SIPGI reduction recipes.
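The grouping rule described above can be illustrated with a minimal Python sketch. This is not SIPGI code: the frame records, the field names (`instrument`, `grism`, `filter`, `mask`, `arm`) and the file names are hypothetical and chosen only for illustration. The sketch assumes that frames sharing the same instrument, grism, filter, and mask end up in the same Reduction Unit regardless of the arm.

```python
from collections import defaultdict

# Hypothetical frame records. The field names and file names are
# illustrative only; they are not the keywords SIPGI actually reads.
frames = [
    {"file": "frame_001.fits", "instrument": "MODS", "arm": 1,
     "grism": "G400L", "filter": "Dual", "mask": "ID140777"},
    {"file": "frame_002.fits", "instrument": "MODS", "arm": 2,
     "grism": "G400L", "filter": "Dual", "mask": "ID140777"},
    {"file": "frame_003.fits", "instrument": "MODS", "arm": 1,
     "grism": "G670L", "filter": "Dual", "mask": "ID140777"},
    {"file": "frame_004.fits", "instrument": "MODS", "arm": 2,
     "grism": "G670L", "filter": "Dual", "mask": "ID140777"},
]


def group_into_reduction_units(frames):
    """Group frames by (instrument, grism, filter, mask).

    The arm is deliberately left out of the key, so frames taken on
    both arms with the same configuration share one group.
    """
    units = defaultdict(list)
    for frame in frames:
        key = (frame["instrument"], frame["grism"],
               frame["filter"], frame["mask"])
        units[key].append(frame["file"])
    return dict(units)


reduction_units = group_into_reduction_units(frames)
for key, files in reduction_units.items():
    print(key, files)
```

With these four hypothetical frames the sketch yields two groups, one for G400L and one for G670L, mirroring the MODS example in the text.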

5.5. The importing procedure

During the import of the data in their Datasets and Reduction Units, SIPGI:

  • Rotates and flips the files. The orientation of raw data is modified, mostly for historical reasons. In particular, raw LBT frames are rotated by 90 degrees and flipped. This means that while in raw frames the spectral dispersion is along the x-axis, in SIPGI raw spectra are dispersed along the y-axis, with wavelength increasing from bottom to top.

_images/sipgi_frame_rot.png

Fig. 5.2 This figure shows the new orientation of spectra after ingestion in SIPGI. The solid yellow line indicates the target spectrum, while the bluish/reddish pattern indicates the direction of the wavelength scale.

  • Corrects LUCI files for non-linearity. Raw LUCI data are linearized: the ADU of raw frames (\(ADU_{raw}\)) are corrected to \(ADU_{lin}=ADU_{raw} + k \left(ADU_{raw}\right)^2\), with \(k=2.898 \cdot 10^{-6}\) for LUCI1 and \(k=2.767 \cdot 10^{-6}\) for LUCI2 (LBT staff, private communication).

  • Reads all of the keywords necessary to categorize the data

  • Renames the files. The MODS LBT raw frames are named as follows:

    modsArmFilter.yyyymmgg.????.fits

    where Arm can be 1 or 2 depending on whether the data were taken on LBT arm 1 or 2, Filter can be r or b depending on the filter used, yyyymmgg is the observing date (year, month, day), and ???? is a four-digit incremental number.

    Similarly, LUCI raw frames are in the form:

    luciArm.yyyymmgg.????.fits

    These names say little about the nature of the data contained in the file: whether it is a bias, a flat, a science image, or a spectroscopic observation.

    Once SIPGI has acquired the file information from the header keywords, it renames each file in the form:

    CC_target_grism_filter_m.fits

    where CC is the type of data (sc for science, ff for flat field, lp for arc lamp images, bs for bias, dk for dark, off for lamp off), target is the target name, and grism_filter is a string that indicates the grism/filter. The file name ends with m, the exposure sequence number of the frame. For MOS observations, the mask ID is inserted as well.

    Example:

    sc_MyQuasar_ID140777_G400L_Dual_001.fits

    is a scientific frame of the target MyQuasar observed with the mask ID140777 using the grism G400L in Dual configuration. These new names simplify the browsing of the data.

  • Appends calibration tables to the files. As part of the organization step, a number of auxiliary files are appended to the raw FITS files. The built-in data organizer appends the following as FITS file extensions: the Grism table (GRS), the CCD table (CCD), the Lines catalog (LIN), and the Mask file (MMS). The content of the Paf file is stored in the primary header of the file.

Table 5.1 The auxiliary files appended to each imported frame for LUCI observations.

LUCI frame          CCD Table    Paf File    Grism Table    Lines Cat
dark
slitless flat
throughslit flat
lamp
science

Table 5.2 The auxiliary files appended to each imported frame for MODS observations.

MODS frame          Mask File    CCD Table    Paf File    Grism Table    Lines Cat
bias
slitless flat
throughslit flat
lamp
science

  • Copies the raw files to the Project directory and organizes them in Datasets and Reduction Units. The rotated, renamed raw files, with all of the auxiliary files appended, are organized in the Project directory according to their type, PI, target name, and observing mode.
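The LUCI linearity correction listed among the import steps can be written out as a short Python sketch. It only restates the quadratic formula and the two coefficients quoted above. The function names and the list-of-lists frame representation are assumptions made for illustration, and this is not SIPGI's own implementation.

```python
# Quadratic non-linearity coefficients quoted in the text for each LUCI.
K_COEFF = {"LUCI1": 2.898e-6, "LUCI2": 2.767e-6}


def linearize(adu_raw, instrument):
    """Return ADU_lin = ADU_raw + k * ADU_raw**2 for one pixel value."""
    k = K_COEFF[instrument]
    return adu_raw + k * adu_raw ** 2


def linearize_frame(pixels, instrument):
    """Apply the correction to a frame given as a list of pixel rows.

    The list-of-lists layout is a stand-in for the real image array.
    """
    return [[linearize(value, instrument) for value in row]
            for row in pixels]


corrected = linearize_frame([[1000.0, 10000.0]], "LUCI1")
print(corrected)
```

As a worked example, a raw value of 10000 ADU on LUCI1 gains 2.898e-6 × 10000² = 289.8 ADU, giving about 10289.8 ADU. The same raw value on LUCI2 becomes about 10276.7 ADU.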

Caution

The user must not manually delete data from the Project directory. If a file needs to be removed, this must be done within SIPGI.
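The renaming step of the importing procedure can be illustrated with a minimal Python sketch that parses the raw file name patterns and builds a name in the SIPGI form. The function names, the regular expression, and the dictionary keys are hypothetical. The three-digit zero padding of the sequence number and the position of the mask ID (between target and grism) are assumptions inferred from the example name `sc_MyQuasar_ID140777_G400L_Dual_001.fits`. SIPGI itself derives these fields from the header keywords, not from the raw file name.

```python
import re

# Raw name patterns described in the text:
#   modsArmFilter.yyyymmgg.????.fits  and  luciArm.yyyymmgg.????.fits
RAW_NAME = re.compile(
    r"^(?:mods(?P<mods_arm>[12])(?P<filt>[rb])|luci(?P<luci_arm>[12]))"
    r"\.(?P<date>\d{8})\.(?P<seq>\d{4})\.fits$"
)


def parse_raw_name(name):
    """Split a raw MODS or LUCI file name into its components."""
    match = RAW_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized raw file name: {name}")
    is_mods = match["mods_arm"] is not None
    return {
        "instrument": "MODS" if is_mods else "LUCI",
        "arm": match["mods_arm"] or match["luci_arm"],
        "filter": match["filt"],  # None for LUCI names
        "date": match["date"],
        "sequence": match["seq"],
    }


def sipgi_name(kind, target, grism_filter, seq, mask=None):
    """Build a name of the form CC_target_grism_filter_m.fits.

    Assumption: the mask ID, when present, follows the target name and
    the sequence number is zero-padded to three digits.
    """
    parts = [kind, target]
    if mask is not None:
        parts.append(mask)
    parts.extend([grism_filter, f"{seq:03d}"])
    return "_".join(parts) + ".fits"


print(parse_raw_name("mods1r.20120315.0042.fits"))
print(sipgi_name("sc", "MyQuasar", "G400L_Dual", 1, mask="ID140777"))
```

The raw name `mods1r.20120315.0042.fits` is made up for the example. The second call reproduces the example name given in the renaming step.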