This brief guide presents a set of good data management practices that researchers can adopt, regardless of their data management skills and levels of expertise.
Save your raw data in original format
- Don't overwrite your original data with a cleaned version.
- Protect your original data by locking it or making it read-only.
- Refer to this original data if things go wrong (as they often do).
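One way to lock raw files is to remove write permission from them. The following is a minimal sketch, assuming the raw files live together in one folder; the function names are hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import stat
from pathlib import Path


def make_read_only(path: Path) -> None:
    """Remove write permission for owner, group, and others."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def protect_raw_data(raw_dir: Path) -> list[Path]:
    """Make every file under raw_dir read-only and return the files touched."""
    protected = []
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            make_read_only(path)
            protected.append(path)
    return protected
```

Calling `protect_raw_data(Path("data/raw"))` (a hypothetical folder) once, right after collecting the data, means cleaning scripts can still read the files but cannot overwrite them by accident.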
Back up your data
- Use the 3-2-1 rule: Save three copies of your data, on two different devices, and one copy off site.
- Do not back up or store sensitive data on a commercial cloud service (Dropbox, Google Drive, etc.).
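The 3-2-1 rule can be partly scripted. The sketch below copies one file to two backup folders, so that the original plus the two copies make three. It assumes the two folders are mount points for a second device and an off-site location; the script cannot check that itself. The checksum comparison is an extra safeguard that this guide does not require, and all names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

For sensitive data, the destinations should be storage your institution has approved, not a commercial cloud folder.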
Describe your data
- Machine-friendly: Describe your dataset with a metadata standard for discovery (e.g. DataCite, Dublin Core, DDI).
- Human-friendly: Describe your variables, so your colleagues will understand what your data means. Data without good metadata is useless. Give your variables clear names.
- Use "NA" for missing data rather than leaving cells blank, since a blank cell is ambiguous and programs interpret it inconsistently.
- Convert your data to open, non-proprietary formats (more sustainable and easier to preserve in the long term).
- Name your files well, with basic metadata in the file names. Make sure that the names:
  - Are machine-readable: the characters can be handled by all computer systems, and the names are brief and easily searchable
  - Are human-readable: the names provide concise information and are easily understandable to anyone who may access them in future
  - Play well with default ordering (e.g. start with a date in YYYY-MM-DD form or a zero-padded number, so that alphabetical order matches the intended order)
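The file naming advice above can be sketched as a small helper. The naming pattern (date, project, description, version) and the function names are assumptions chosen for illustration, not a convention prescribed by this guide.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase text and replace each run of other characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, description: str, day: date,
                    version: int, extension: str) -> str:
    """Build a name like 2023-04-05_soil-survey_ph-readings-site-a_v02.csv."""
    parts = [
        day.isoformat(),      # ISO date first, so names sort chronologically
        slug(project),
        slug(description),
        f"v{version:02d}",    # zero-padded, so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lower().lstrip(".")
```

For example, `build_file_name("Soil Survey", "pH readings (site A)", date(2023, 4, 5), 2, "CSV")` returns `2023-04-05_soil-survey_ph-readings-site-a_v02.csv`. Note that this `slug` replaces accented and other non-ASCII letters with hyphens, which keeps names portable but loses some readability.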
Process your data
- Store your data in a tabular layout: variables in columns and observations in rows.
- Store units of measurement as separate variables in their own columns (e.g. 25kg -> 25 | kg).
- Document each step of processing your data in a README file (see Cornell University's example). Some tools (like OpenRefine or Git) can record these steps automatically.
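The layout rules above can be illustrated with a short sketch that splits a combined cell such as `25kg` into a value column and a unit column, and writes "NA" for blank cells. It assumes each row is a dictionary of strings and that measurements are a plain number followed by a unit; the function and column names are hypothetical.

```python
import csv
import io
import re

# A plain decimal number followed by a unit made of letters or "%".
MEASUREMENT = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)\s*$")


def split_measurement(cell: str) -> tuple[str, str]:
    """Split '25kg' into ('25', 'kg'); a blank cell becomes ('NA', 'NA')."""
    if not cell.strip():
        return ("NA", "NA")
    match = MEASUREMENT.match(cell)
    if match is None:
        # Fail loudly rather than silently dropping a value we cannot parse.
        raise ValueError(f"Cannot split measurement: {cell!r}")
    return (match.group(1), match.group(2))


def tidy_rows(rows: list[dict[str, str]], column: str) -> list[dict[str, str]]:
    """Replace `column` with `<column>_value` and `<column>_unit` in every row."""
    tidy = []
    for row in rows:
        value, unit = split_measurement(row.get(column, ""))
        cleaned = {key: (val if val.strip() else "NA")
                   for key, val in row.items() if key != column}
        cleaned[f"{column}_value"] = value
        cleaned[f"{column}_unit"] = unit
        tidy.append(cleaned)
    return tidy


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV text: one variable per column, one observation per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping a script like this alongside the README is one way to document the processing step, since the script itself records exactly what was done to the raw file.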
Archive and preserve your data
- Submit final data files to a repository that assigns persistent identifiers (e.g. DOIs).
- Provide good metadata for your study so others can find it (use your discipline's metadata standard, e.g. Darwin Core or DDI).
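As an illustration of study-level metadata, the sketch below builds a simple record using the fifteen Dublin Core element names and rejects anything else. Dublin Core itself makes every element optional, so the required set here is an assumed local policy, and the function names are hypothetical. Repositories usually collect this information through their own deposit forms, so treat this as a way to draft the record beforehand.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Assumption: a local minimum, not a requirement of Dublin Core.
REQUIRED_HERE = frozenset({"title", "creator", "date", "rights"})


def build_record(**fields: str) -> dict[str, str]:
    """Return a metadata record, rejecting unknown or missing elements."""
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    missing = REQUIRED_HERE - set(fields)
    if missing:
        raise ValueError(f"Missing elements: {sorted(missing)}")
    return dict(sorted(fields.items()))


def to_json(record: dict[str, str]) -> str:
    """Serialize the record as readable JSON for a sidecar metadata file."""
    return json.dumps(record, indent=2, ensure_ascii=False)
```

The `identifier` element is where the DOI assigned by the repository would go once the dataset has been deposited.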
This guide was reproduced by the Portage Training Expert Group with permission from its original creator, Eugene Barsky, University of British Columbia (original version). It can be modified and re-used freely under the CC-BY license.