This brief guide presents a set of good data management practices that researchers can adopt, regardless of their data management skills and levels of expertise.
Save your raw data in original format
- Don't overwrite your original data with a cleaned version.
- Protect your original data by locking it or making it read-only.
- Refer to this original data if things go wrong (as they often do).
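
One way to protect an original file is to clear its write permission. The following is a minimal Python sketch, not a prescribed method: the function name `make_read_only` and the file name `raw_survey.csv` are hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import stat
import tempfile
from pathlib import Path


def make_read_only(path: Path) -> None:
    """Clear the write bits for owner, group, and others; keep the rest."""
    current = stat.S_IMODE(path.stat().st_mode)
    path.chmod(current & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # Demonstration on a throwaway file (hypothetical name and content).
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_survey.csv"
        raw.write_text("id,weight\n1,25kg\n")
        make_read_only(raw)
        mode = stat.S_IMODE(raw.stat().st_mode)
        print("owner can write:", bool(mode & stat.S_IWUSR))
        # Restore write permission so the temporary folder can be removed.
        raw.chmod(stat.S_IRUSR | stat.S_IWUSR)
```

A read-only flag guards against accidental overwrites only; it does not replace a backup.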
Back up your data
- Use the 3-2-1 rule: Keep three copies of your data, on two different devices, with one copy off site.
- Do not back up or store sensitive data on a commercial cloud service (Dropbox, Google Drive, etc.).
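
A backup is only useful if the copy matches the original. The sketch below copies a file to several folders and compares SHA-256 checksums, assuming each destination folder sits on a different device or off-site mount. The names `sha256_of` and `back_up` are hypothetical helpers, not part of any backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Passing two destination folders gives three copies in total, counting the original.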
Describe your data
- Machine-friendly: Describe your dataset with a metadata standard for discovery (e.g. DataCite, Dublin Core, DDI, etc.).
- Human-friendly: Describe your variables, so your colleagues will understand what your data means. Data without good metadata is useless. Give your variables clear names.
- Use "NA" for missing data, since many computer programs handle blank cells inconsistently.
- Convert your data to open, non-proprietary formats.
- Name your files well with basic metadata in file names.
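
Marking missing values can be automated. This is a minimal sketch that assumes the data is CSV text and that a cell holding only whitespace counts as missing. The function name `fill_missing` is hypothetical.

```python
import csv
import io


def fill_missing(csv_text: str, marker: str = "NA") -> str:
    """Replace empty or whitespace-only cells with a missing-data marker."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell.strip() else marker for cell in row])
    return out.getvalue()
```

Run this on a copy of the data, never on the original file.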
Process your data
- Store your data in a tabular layout: variables in columns and observations in rows.
- Store units of measurement as separate variables in their own columns (e.g. 25kg -> 25 | kg).
- Document each step of processing your data in a README file. Some tools (like OpenRefine or Git) can document steps automatically.
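
The unit-splitting step (25kg -> 25 | kg) can be sketched with a regular expression. The pattern below assumes values are written as a plain decimal number followed by a unit. The name `split_value_unit` is hypothetical, and returning "NA" when no unit is found is an assumption that follows the missing-data advice above.

```python
import re

# A number (optional sign, optional decimals) followed by a unit string.
_VALUE_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([^\d\s].*?)\s*$")


def split_value_unit(cell: str) -> tuple[str, str]:
    """Split a cell such as '25kg' into its value and its unit."""
    match = _VALUE_UNIT.match(cell)
    if match is None:
        # No recognizable unit: keep the cell as the value, mark unit missing.
        return cell.strip(), "NA"
    return match.group(1), match.group(2)
```

For example, `split_value_unit("25kg")` returns `("25", "kg")`, which can then be written to two separate columns.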
Archive and preserve your data
- Submit final data files to a repository that assigns persistent identifiers (e.g. DOIs).
- Provide good metadata for your study so others can find it (use your discipline’s metadata standard, e.g. Darwin Core, DDI, etc.).
This guide was reproduced by the Portage Training Expert Group with permission from its original creator, Eugene Barsky, University of British Columbia. It can be modified and re-used freely under the CC-BY license.