The Data Management Program provides resources and consultations on metadata standards, controlled vocabularies, codebooks, and README files. We also assist with designing data collection templates and database schemas.

Metadata is information about a dataset. It is typically created to help potential users understand how the data was produced, along with other important details that cannot be determined by looking at the data itself. Various organizations have created metadata standards that guide data creators in providing key metadata and standardize how metadata is written within a given field of research.

A controlled vocabulary is a list of words or phrases that can be used in response to a question in a survey or field in a database.  Reasons to use a controlled vocabulary include reducing variation in responses, preventing extraneous variants of the same term (such as spelling mistakes or plurals), or making it easier for participants to provide a response.
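As a minimal sketch of how a controlled vocabulary can be enforced during data entry, the following checks free-text responses against an allowed list. The `habitat` field, its terms, and the `normalize_term` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical controlled vocabulary for a "habitat" field.
HABITAT_TERMS = {"forest", "grassland", "wetland", "urban"}


def normalize_term(raw, vocabulary):
    """Return the matching vocabulary term, or None if the response is not allowed.

    Only whitespace and letter case are normalized; plurals and misspellings
    are rejected rather than guessed at.
    """
    candidate = raw.strip().lower()
    return candidate if candidate in vocabulary else None


responses = ["Forest", " wetland ", "forests", "urbn"]
accepted = [normalize_term(r, HABITAT_TERMS) for r in responses]

# Responses that need to be corrected by the person entering the data.
rejected = [r for r, a in zip(responses, accepted) if a is None]
```

Rejecting unknown terms instead of silently correcting them keeps the decision with the data collector. In a survey or spreadsheet, a drop-down list serves the same purpose.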

Resources coming soon!

Data collection and analysis need to be well documented for the data to be useful. Different disciplines have different conventions for recording this information. If you do not have an established convention available to you, consider adopting one of the following:

A protocol or a standard operating procedure (SOP) documents the actions involved in sample processing and data collection.

A log documents actions taken to either collect data or analyze a dataset with specific software.

A codebook is a document that lists each code used in a research project along with its meaning.

A README file describes the files present in a file collection, gives more information about a given file, or describes a piece of software.
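To illustrate the codebook described above, here is a minimal sketch that stores a codebook in machine-readable form and uses it to decode values. The `smoker` variable, its codes, and the `decode` helper are hypothetical examples, not a recommended coding scheme.

```python
# Hypothetical codebook: each variable maps its numeric codes to their meanings.
CODEBOOK = {
    "smoker": {1: "never", 2: "former", 3: "current", 9: "missing"},
}


def decode(variable, code, codebook=CODEBOOK):
    """Look up the meaning of a code, flagging codes the codebook does not list."""
    return codebook[variable].get(code, f"undocumented code {code}")


raw_values = [1, 3, 9, 7]
decoded = [decode("smoker", value) for value in raw_values]
```

Keeping the codebook in a structured form like this makes it possible to detect undocumented codes automatically, but a plain table in a text document serves the same documentary purpose.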

Resources coming soon!

Databases are a useful way to structure and organize your data or metadata. Choosing a type of database that suits the data you will collect will make it easier to draw conclusions from your data.

A database by itself does not guarantee data quality or reusability. When planning a database, set up data validation wherever possible and use self-explanatory names for column headers. Databases should be accompanied by documentation explaining their structure. For example, the documentation for a relational database should describe the tables, column headers, codes, data types, and validation rules, as well as the relationships between the tables.
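The following is a minimal sketch of built-in validation in a relational database, using Python's standard `sqlite3` module. The `site` and `measurement` tables, their column names, and the temperature range are hypothetical. The point is that self-explanatory column names, `CHECK` constraints, and foreign keys let the database itself reject invalid rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL UNIQUE
);
CREATE TABLE measurement (
    measurement_id     INTEGER PRIMARY KEY,
    site_id            INTEGER NOT NULL REFERENCES site(site_id),
    collected_on       TEXT NOT NULL,  -- ISO 8601 date, e.g. 2024-05-01
    water_temp_celsius REAL NOT NULL
        CHECK (water_temp_celsius BETWEEN -5 AND 50)
);
""")


def try_insert(sql, params):
    """Attempt an insert; return False if a constraint rejects the row."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


MEASUREMENT_SQL = (
    "INSERT INTO measurement (site_id, collected_on, water_temp_celsius) "
    "VALUES (?, ?, ?)"
)

site_ok = try_insert("INSERT INTO site (site_id, site_name) VALUES (?, ?)", (1, "Mill Creek"))
valid_row = try_insert(MEASUREMENT_SQL, (1, "2024-05-01", 12.4))
out_of_range = try_insert(MEASUREMENT_SQL, (1, "2024-05-02", 500.0))  # fails the CHECK
unknown_site = try_insert(MEASUREMENT_SQL, (99, "2024-05-03", 11.0))  # fails the foreign key
```

The schema here doubles as part of the documentation, since it records the tables, data types, validation rules, and relationships in one place. A separate document is still needed to explain what each column means and how the values were collected.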