What is a Format?
By Ruth Wainman, on 7 September 2018
A format is the way your data is encoded and stored once you collect and archive it. Researchers are strongly advised to think carefully about the final format their data will take so that it can be preserved for future use.
There are two main categories of file formats: proprietary and non-proprietary. Proprietary formats are more limited because they can often be opened only with the software supplied by the vendor that controls the format. Non-proprietary (open) formats, on the other hand, can be used by anyone and are usually free of charge, so they have more utility for future researchers. Open formats also make data quicker and easier to access. In most cases, you should aim for formats with the following characteristics:
- Non-proprietary
- Unencrypted
- Uncompressed
- Open, documented standard
- Commonly used by your research community
- Encoded with a common character encoding (ASCII or a Unicode encoding such as UTF-8)
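Two of these characteristics, uncompressed and commonly encoded, can be checked automatically. The following is a minimal Python sketch, not a complete validator: the function name describe_file is made up for illustration, and it only recognises gzip and zip files by their leading bytes. Plain ASCII files pass the UTF-8 check because ASCII is a subset of UTF-8.

```python
from pathlib import Path

# Leading "magic" bytes of two common compressed containers.
# This list is an illustrative assumption, not an exhaustive one.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def describe_file(path):
    """Report whether a file looks compressed and whether it decodes as UTF-8."""
    data = Path(path).read_bytes()

    # Compare the start of the file against each known signature.
    compression = None
    for signature, name in COMPRESSED_SIGNATURES.items():
        if data.startswith(signature):
            compression = name

    # A file is treated as UTF-8 text if every byte sequence decodes cleanly.
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False

    return {"compression": compression, "utf8": is_utf8}
```

Calling describe_file on a file returns a small dictionary with the keys "compression" and "utf8". A binary format such as a spreadsheet workbook will usually fail the UTF-8 check, which is a hint that it may need converting before archiving.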
There will inevitably be cases where you need to change the format of your data during the course of your research. This is why it is important to describe in your data management plan (DMP) the format your data will take and any features that may be lost when you convert it for archiving. Open formats may not support all of the original functionality of proprietary formats, so you should keep both your raw and converted data sets. Some funders also have specific requirements for the final form your data should take, so be sure to check their policies before committing to a format.
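One way to keep raw and converted data together, along with a note of what the conversion lost, is to store both files next to a small manifest. The Python sketch below is only an illustration under stated assumptions: the function name archive_pair, the manifest.json file name and the manifest fields are all hypothetical, and the conversion itself is assumed to have been done already in whatever software produced the data.

```python
import hashlib
import json
import shutil
from pathlib import Path


def archive_pair(raw_path, converted_path, archive_dir, lost_features):
    """Copy the raw and converted files side by side and write a manifest."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    entries = []
    for role, source in (("raw", raw_path), ("converted", converted_path)):
        source = Path(source)
        target = archive / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        entries.append({
            "role": role,
            "file": target.name,
            # The extension is used as a rough label for the format.
            "format": source.suffix.lstrip(".").lower(),
            # A checksum lets future users confirm the file is unchanged.
            "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        })

    manifest = {"files": entries, "lost_in_conversion": list(lost_features)}
    (archive / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest
```

For example, archive_pair("survey.xlsx", "survey.csv", "archive", ["cell formulas", "formatting"]) would copy both files into an archive folder and record that formulas and formatting did not survive the conversion. The same details can then be summarised in the DMP.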
Further links
- Scientific Data Formats: http://justsolve.archiveteam.org/index.php/Scientific_Data_formats
- Library of Congress (LOC) Recommended Formats Statement: http://www.loc.gov/preservation/resources/rfs/
- Digital Preservation Coalition: https://www.dpconline.org/handbook/technical-solutions-and-tools/file-formats-and-standards