Tag data file formats
When you recover an archival tag you connect it to a computer and offload the data. For some tags, this might be as simple as copying files from a USB drive; other tags require a proprietary program supplied by the tag manufacturer. Some tags (e.g., DTAG, Acousonde, Little Leonardo) require a second data translation step to unpack a data archive file. In either case, you end up with one or more files that represent the totality of data collected by the tag. These files could have any of a number of formats. Common ones are as follows:
- CSV: comma-separated values text file.
- TXT: plain text file.
- XLS: Excel format binary file.
- WAV: multi-channel binary data file.
- XML: markup text file.
- BIN: packed binary data file.
For example, unpacking a DTAG .dtg archive file will generate a set of files for each dtg file, typically a WAV-format file containing the audio data, another WAV-format file (with a suffix .swv) containing the sensor data, a CSV file (with a suffix .wavt) containing timing information for the audio and sensor data, and an XML file containing information about the configuration of the tag and the sensor channels.
Why are there so many file formats?
This in part reflects the different sensor suites, applications, and manufacturers of biologging tags as well as the plethora of software platforms that may be used to work with tag data. The variety of data formats has also grown out of a need to represent increasingly complex and large datasets. Some biologging tags collect gigabytes of sensor data and support a dozen different sensor channels each sampling at a different rate and which may separately turn on and off to conserve power.
Despite the diversity of tag data formats, no single format is best. Each has strengths and weaknesses: some support different sensor sampling rates, some allow data and metadata in the same file, some are easily read by text editors so that you can look at the raw data, and others store large amounts of data more efficiently and are quick to read into data analysis packages such as MATLAB or R.
Given the variety of raw data file formats, there is a need for a standard sensor data archive and exchange format that is independent of tag type and software platform. This file format needs to support the different sensor types and sampling rates of biologging data. It also needs to allow both data and metadata in the same file to provide a complete self-contained archive of the deployment. Finally, the file format must be readable by R, MATLAB, Octave, Python, Igor-Pro and other software platforms used for tag data analysis. These requirements led us to choose NetCDF (.nc suffix) as the file format supported by the animaltags tools. NetCDF is a professional open file format for scientific data that has a massive user community. Originally developed over 30 years ago for sharing geospatial data, it is now used for all kinds of time-series and spatial data.
What is metadata?
Metadata is all of the additional information needed to make sense of the raw sensor data. This includes information that is specific to the sensor such as the sampling rate, the time at which sampling started, the type of sensor and its configuration and calibration information. Metadata also includes more general information that applies to all of the sensor channels of the tag, e.g., the species of the animal, the location of tagging, the method of tag attachment, the name and contact information of the data owner, and the conditions for use of the data. Although all of this information could be stored in a text file that you keep with the raw data, it is much better to put the metadata and raw data together in a single file – this forms a self-contained archive that preserves all of the information needed to understand the sensor data.
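To make the distinction between general and sensor-specific metadata concrete, here is a minimal sketch using plain Python dictionaries. It is only an illustration of the idea of a self-contained archive: every field name and value below is a hypothetical placeholder and does not reflect the actual field names used by the animaltags tools.

```python
# General metadata: applies to every sensor channel of the deployment.
# All field names and values are hypothetical placeholders.
info = {
    "deployment_id": "xx01_001a",
    "species": "placeholder species",
    "tagging_location": "placeholder site",
    "attachment_method": "suction cups",
    "owner_contact": "owner@example.org",
    "usage_conditions": "contact the data owner before use",
}

# Sensor-specific metadata travels with the data it describes.
accelerometer = {
    "data": [(0.01, -0.02, 0.98), (0.02, -0.01, 0.99)],  # one row per sample
    "sampling_rate_hz": 50.0,
    "start_offset_s": 0.0,  # time of first sample relative to tag start
    "sensor_type": "3-axis accelerometer",
    "unit": "g",
}

# A self-contained archive bundles both kinds of metadata with the data.
archive = {"info": info, "sensors": {"accelerometer": accelerometer}}


def duration_seconds(sensor):
    """Length of a sensor record, computed from its own metadata."""
    return len(sensor["data"]) / sensor["sampling_rate_hz"]
```

Because the sampling rate is stored next to the samples, a helper such as `duration_seconds()` needs nothing beyond the sensor structure itself, which is the practical benefit of keeping data and metadata together.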
The animaltags NC file format
The NetCDF file format is very general and we use only a small subset of the features in the animaltags toolbox. An animaltags NC file contains a set of general metadata and one or more sensor structures. Each sensor structure is a package containing both the sensor data and the sensor-specific metadata. The tools for creating, changing and reading NC files are:
Workflow to convert raw tag data into NC files
The steps needed to generate an NC file for a tag deployment depend on the tag type, the complexity of the data, and the analysis goals. Because of this, there is no single turn-key function that does all of the steps. However, the animaltags tools include functions to help with each step. Once you figure out the steps needed for your tags and data needs, you can easily make a script that combines the steps and does most of the hard work for you.
The workflow involves the following steps, some of which are not needed for some tag types. It may seem like a lot of work, but very little of it actually has to do with generating the NC file; most of the work lies in dealing with the metadata and in massaging the data into a format that can be used for later analyses. The steps are:
- Reading the raw data into MATLAB/Octave/R: In this step you read data from the native file format into variables (vectors or matrices) in your workspace.
- Grouping sensor channels: Some tags store data from each axis of a multi-axis sensor such as an accelerometer in separate files. These axes need to be grouped together into a matrix with a column for each axis.
- Removing redundant data: When data from multiple sensors are stored together in a text file (e.g., a CSV file), the data from slow-sampling sensors is repeated to match the sampling rate of faster sensors. These redundant samples can be removed to restore each sensor channel to its original sampling rate.
- Dealing with data gaps: The data from each sensor may have some missing samples, e.g., because it was turned off to save power or to avoid interference with a GPS acquisition. If you know the precise time and duration of each gap, you can add filler samples in each sensor channel so that the time series are continuous. This makes it easier to plot and analyze the data.
- Collecting the metadata: Some metadata may be contained in the files offloaded from the tag and it may be possible to read in this information automatically. Other metadata will need to be entered by hand. To save time, metadata that is common across several deployments can be typed into a script that can be run for each tag dataset.
- Picking a deployment name: This name will be used to refer to the deployment and to avoid overwriting data. You may have your own in-house naming convention or you could use the species initials and date.
- Generating an info structure: The general metadata (i.e., information that applies to all of the sensors) is collected together in a structure called 'info'. This structure contains a field for each piece of information. You can use the template generated by make_info() and then add your own additional metadata.
- Generating sensor structures: The raw data for each sensor and the corresponding sensor-specific metadata are combined in a sensor structure. The tool sens_struct() automatically adds metadata for common sensor types and you can add further metadata to the structure.
- Calibrating and correcting the sensor data: Although some tags apply preset calibration constants to the sensor data, the raw data from a biologging tag are rarely ready for reliable analysis without some post-calibration and correction. Sensor drift and changes in sensitivity during deployments need to be corrected, and the sensor axes may need to be rearranged to match the cardinal axes of the tag. Finally, the data may need to be corrected for the orientation of the tag on the animal.
- Saving sensor data to an NC file: This is easy: use save_nc() to make an NC file that contains the info structure and all of the sensor structures for the deployment.
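The first few steps above (reading, grouping, removing redundant samples, and filling gaps) can be sketched in a few lines. The example below is a minimal illustration in standard-library Python, not the animaltags tools themselves. The CSV layout, column names, sampling rates, data values, and the helper `fill_gaps()` are all assumptions made up for the example. It also assumes that timestamps fall exactly on the sampling grid.

```python
import csv
import io

FS_ACC = 4.0    # assumed accelerometer rate, Hz
FS_DEPTH = 1.0  # assumed true depth-sensor rate, Hz

# Hypothetical CSV export: depth is repeated to match the 4 Hz rows,
# and the tag recorded nothing between t = 2 s and t = 3 s.
RAW = """t,ax,ay,az,depth
0.00,0.01,0.00,0.98,5.0
0.25,0.02,0.01,0.97,5.0
0.50,0.01,0.02,0.98,5.0
0.75,0.00,0.01,0.99,5.0
1.00,0.01,0.00,0.98,5.5
1.25,0.02,-0.01,0.97,5.5
1.50,0.03,0.00,0.98,5.5
1.75,0.02,0.01,0.98,5.5
3.00,0.01,0.00,0.99,6.2
3.25,0.00,0.01,0.98,6.2
3.50,0.01,0.02,0.97,6.2
3.75,0.02,0.01,0.98,6.2
"""

# Step 1: read the raw data into variables.
rows = list(csv.DictReader(io.StringIO(RAW)))
times = [float(r["t"]) for r in rows]
depth_raw = [float(r["depth"]) for r in rows]

# Step 2: group the three accelerometer axes, one row per sample.
acc = [(float(r["ax"]), float(r["ay"]), float(r["az"])) for r in rows]

# Step 3: remove redundant depth samples by keeping only the rows that
# fall on the slower sensor's sampling grid.
ratio = round(FS_ACC / FS_DEPTH)
grid_index = [round(t * FS_ACC) for t in times]
depth_t = [t for t, k in zip(times, grid_index) if k % ratio == 0]
depth = [d for d, k in zip(depth_raw, grid_index) if k % ratio == 0]


def fill_gaps(sample_times, values, fs, filler):
    """Return a continuous series at rate fs, with filler in the gaps."""
    t0 = sample_times[0]
    n = round((sample_times[-1] - t0) * fs) + 1
    out = [filler] * n
    for t, v in zip(sample_times, values):
        out[round((t - t0) * fs)] = v
    return out


# Step 4: insert NaN filler samples so each time series is continuous.
nan = float("nan")
acc_filled = fill_gaps(times, acc, FS_ACC, (nan, nan, nan))
depth_filled = fill_gaps(depth_t, depth, FS_DEPTH, nan)
```

Selecting depth samples by their position on the time grid, rather than by dropping repeated values, avoids discarding genuine readings that happen to equal their neighbours. Using NaN as the filler keeps the gaps visible in later plots and analyses.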
You have now seen why NetCDF files matter: they provide a standard sensor data archive and exchange format that is independent of tag type and software platform.
If you already have tag deployment data in a different format, you should go ahead and convert it into NetCDF files. For a detailed explanation of how to convert .mat files into NetCDF files, refer to the tutorial: converting .mat files into .nc files.