File formats

File formats determine how data is encoded within a file and, ultimately, what type of information the file can contain. Examples include JPEG and PNG for images. Every file format has its own characteristics, so choosing the right one often begins with identifying which types of information you need to preserve (and which you can afford to lose).

When choosing a file format, consider the following:

Restrictions on choice

You may have to work within restrictions on which file formats you can use. These typically stem from software, lab equipment, or similar tools that support only a limited set of formats.

Discipline-specific standards

Some scientific disciplines have established standards for file formats. If you are working to make your data comply with the FAIR principles (Findable, Accessible, Interoperable, Reusable), you should in particular choose file formats that support interoperability and reusability.

Demands for certain file types

If you plan to deposit or archive your data, the repository or archive will often accept only certain file formats. To avoid spending unnecessary time converting data between file types, choose a file format that suits the entire research process.

Can my data fit within this file format?

Choosing the right file format is often a matter of finding one whose structure can hold the data you want to save. One solution is to store some data in one file format, supplement them with data in another file, and link the two using, e.g., a shared identifier. Another solution is to convert between file formats.
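The first solution (two files linked by an identifier) can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data: measurements are kept in CSV, supplementary descriptions in JSON, and a shared `sample_id` column links them. `io.StringIO` stands in for real files so the example is self-contained; `link_records` is a hypothetical helper, not part of any library.

```python
import csv
import io
import json

# Hypothetical data: tabular measurements in CSV, richer per-sample
# descriptions in JSON. "sample_id" is the identifier linking the two.
measurements_csv = io.StringIO(
    "sample_id,temperature_c\n"
    "S1,21.5\n"
    "S2,22.1\n"
)
descriptions_json = io.StringIO(json.dumps({
    "S1": {"site": "north field", "notes": "collected after rain"},
    "S2": {"site": "south field", "notes": "dry conditions"},
}))


def link_records(csv_file, json_file):
    """Join each CSV row with its JSON description via sample_id."""
    descriptions = json.load(json_file)
    linked = []
    for row in csv.DictReader(csv_file):
        # Look up the supplementary record using the shared identifier;
        # rows without a matching description are kept unchanged.
        extra = descriptions.get(row["sample_id"], {})
        linked.append({**row, **extra})
    return linked


records = link_records(measurements_csv, descriptions_json)
for record in records:
    print(record)
```

Each format does what it is good at: CSV keeps the table simple for spreadsheets and analysis tools, while JSON holds nested, free-form descriptions that would not fit neatly in a column.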

Lossy compression vs lossless file formats

When working with media (e.g. images, sound, and video) in particular, you will often need to choose between file formats that use lossy compression and lossless file formats. Lossy formats use an algorithm that discards some of the data to reduce the file size; JPEG images are one example. Often this is perfectly acceptable, but at other times you may want to preserve the data “intact”.
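The difference between the two approaches can be shown with a toy example using Python's standard `zlib` module. It is only a sketch: real lossy formats such as JPEG discard detail in a far more sophisticated way, whereas the hypothetical "lossy" step here simply zeroes the four least significant bits of each sample before compressing.

```python
import zlib

# Toy "signal": 1000 bytes standing in for pixel or audio sample values.
samples = bytes((i * 37 + (i % 7)) % 256 for i in range(1000))

# Lossless: DEFLATE via zlib. Decompressing restores every byte exactly.
lossless = zlib.compress(samples, 9)
restored = zlib.decompress(lossless)
print("lossless round trip identical:", restored == samples)

# Lossy (toy): throw away the low 4 bits of each sample, then compress.
# The result is smaller, but the discarded detail is gone for good.
quantized = bytes(s & 0xF0 for s in samples)
lossy = zlib.compress(quantized, 9)
restored_lossy = zlib.decompress(lossy)
print("lossy round trip identical:", restored_lossy == samples)
print("compressed sizes (lossless, lossy):", len(lossless), len(lossy))
```

Decompressing the lossy version gives back only the quantized values, never the original samples, which is exactly the trade-off to weigh when deciding whether data must be preserved intact.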

Open (non-proprietary) vs closed (proprietary) file formats

Some file formats are based on open standards with publicly available documentation describing how files in that format are structured and read. Examples include CSV and plain-text (TXT) files. Other file formats are closed, or proprietary, which means that the “recipe” for reading them is not publicly documented. Whenever possible, use open, non-proprietary file formats; this makes it much more likely that the data can be preserved and read in the future.

Embedded metadata

Some file formats can store embedded metadata. For example, JPEG (.jpg) images can contain EXIF data about the date, time, camera model, etc. Word files also contain metadata, such as the document's review history.
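Reading EXIF from a JPEG requires a fairly involved parser, so this standard-library-only sketch uses a related mechanism instead: PNG `tEXt` chunks, which store keyword/text pairs inside the image file itself. It builds a tiny 1x1 grayscale PNG carrying two hypothetical metadata entries and then reads them back by walking the file's chunks. The helper names are illustrative; for real EXIF data you would typically use an imaging library such as Pillow.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_png_with_text(metadata: dict) -> bytes:
    """Create a 1x1, 8-bit grayscale PNG that carries tEXt metadata chunks."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, depth, color type...
    idat = zlib.compress(b"\x00\x80")  # filter byte 0, then one mid-gray pixel
    chunks = [make_chunk(b"IHDR", ihdr)]
    for key, value in metadata.items():
        # tEXt payload: Latin-1 keyword, a NUL separator, then Latin-1 text.
        payload = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
        chunks.append(make_chunk(b"tEXt", payload))
    chunks.append(make_chunk(b"IDAT", idat))
    chunks.append(make_chunk(b"IEND", b""))
    return PNG_SIGNATURE + b"".join(chunks)


def read_text_chunks(png: bytes) -> dict:
    """Walk the chunk list and collect every tEXt keyword/value pair."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
        if chunk_type == b"IEND":
            break
    return found


# Hypothetical metadata values, for illustration only.
png_bytes = make_png_with_text({"Author": "Jane Doe", "Description": "Field sample S1"})
print(read_text_chunks(png_bytes))
```

Because the metadata travels inside the file, it survives copying and renaming, but it can also be lost if a tool rewrites the file without preserving these chunks; it is worth checking this before relying on embedded metadata alone.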