Describe Your Data

The Data Management Program provides resources and consultations on metadata standards, controlled vocabularies, codebooks and readMe files. We also assist with designing data collection templates and database schemas.

Metadata is information about a dataset. Typically, metadata is created to help potential users understand how the data was created, along with other important factors that cannot be determined by looking at the data itself. Various organizations have created metadata standards that guide data developers in providing key metadata and that standardize how metadata is written within a given field of research. For example, if you are working with sequencing data, in many cases you will be required to submit data to the Sequence Read Archive. We can help you prepare to collect the right metadata, so that the submission process goes smoothly at the end of your research project.
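As a rough illustration of collecting metadata from the start of a project, the sketch below stores a metadata record alongside a dataset and reports which required fields are still empty. The field names and sample values are hypothetical and do not come from any particular standard; in practice the required fields would be taken from the metadata standard used in your field.

```python
import json

# Hypothetical required fields, for illustration only. A real project would
# take this list from the metadata standard used in its field of research.
REQUIRED_FIELDS = ["title", "creator", "date_collected", "method", "units"]


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up example record; "units" is deliberately left empty.
record = {
    "title": "Stream temperature readings",
    "creator": "Example Lab",
    "date_collected": "2020-06-01",
    "method": "Handheld probe, one reading per site",
    "units": "",
}

# Report gaps, then show the record as it could be saved next to the data.
print("Missing:", missing_fields(record))
print(json.dumps(record, indent=2))
```

Checking for gaps while data is still being collected is usually easier than reconstructing the missing details at submission time.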

A controlled vocabulary is a list of words or phrases that can be used in response to a question in a survey or a field in a database. Reasons to use a controlled vocabulary include reducing variation in responses, preventing extraneous variants of the same term (such as spelling mistakes or plurals), and making it easier for participants to provide a response.
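A minimal sketch of how a controlled vocabulary might be enforced at data entry is shown below. The vocabulary terms and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical controlled vocabulary for a "tree species" field.
ALLOWED_SPECIES = {"oak", "maple", "pine"}


def normalize(response):
    """Trim surrounding whitespace and lowercase a free-text response."""
    return response.strip().lower()


def check_response(response, vocabulary):
    """Return the normalized term if it is in the vocabulary, else None."""
    term = normalize(response)
    return term if term in vocabulary else None


# Variants in case and spacing are accepted; plurals and other terms are not.
for raw in [" Oak ", "oaks", "PINE", "birch"]:
    result = check_response(raw, ALLOWED_SPECIES)
    status = "accepted as " + result if result else "rejected"
    print(repr(raw), status)
```

Rejected responses can then be corrected at entry time rather than cleaned up later, which is the main benefit of restricting a field to a fixed list.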

Data collection and analysis need to be well documented for the data to be useful. Different disciplines have different conventions for recording these steps. If you do not have an established convention available to you, consider adopting one of the following:

A protocol or a standard operating procedure (SOP) documents the actions involved in sample processing and data collection.

A log documents actions taken to either collect data or analyze a dataset with specific software.

A codebook is a document that lists each code used in a research project together with the meaning assigned to it.

A readMe file describes the files present in a file collection, gives more information about a given file, or describes a piece of software or an analysis script. These structural_readme_and_naming_conventions and analysis_readme are based on Georgia Tech Library and Stanford Library recommendations and will help you get started in organizing your research files.
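To make the idea of a codebook described above more concrete, the sketch below represents one as a lookup table and uses it to decode stored values. The variable name, codes, and labels are hypothetical examples, not taken from any real study.

```python
# Hypothetical codebook: for each variable, the codes in use and their meanings.
CODEBOOK = {
    "smoking_status": {
        1: "Never smoked",
        2: "Former smoker",
        3: "Current smoker",
        9: "Missing",
    },
}


def decode(variable, code):
    """Look up the meaning of a code; flag codes the codebook does not list."""
    return CODEBOOK[variable].get(code, "UNDOCUMENTED CODE")


# Decode a few stored values; 4 is not documented and is flagged as such.
for value in [1, 3, 4, 9]:
    print(value, "->", decode("smoking_status", value))
```

Keeping the codebook in a machine-readable form like this lets the same file serve as documentation and as a check for codes that were never documented.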