Michele Tobias
DataLab
Geospatial Data Specialist
(530) 752‑7532
There are two important components to storing data. First, the data itself must be in a format that is likely to remain usable for several years to come. Second, the data needs metadata. Metadata is documentation about the data. It should include contact information for the data’s producer, information about the methods used to create the data, and any other information a person would reasonably need to know to use the data properly.
Data contained in a single file is easier to exchange. Consider using GeoJSON (.geojson, vector), GeoPackage (vector or raster), or GeoTIFF (raster) for sending data to others. Because each of these formats keeps a dataset in one file, they are easier to move between computers. As with storage, the second and equally important part of data exchange is to include metadata.
Geospatial science is one of the few academic disciplines that teaches students about metadata as a part of their core learning (Library Science is another). It will not surprise geospatial professionals to know that, ideally, data stored for long-term use or sent to another user should be accompanied by metadata. Metadata should contain information that a person using the data might need to know to use the data effectively. For example, you might include the name of the person who made the data, which organization they work for, how to contact the person, a summary of methods or a citation for a paper that explains the methods, and data facts like the projection and the size of raster cells or the minimum mapping unit for digitized vector data. Geospatial data typically stores metadata in an .xml file (structured plain text, with tags similar to html), but you could also store the necessary information in a text file called README.txt. The advantage of using the .xml standard for geospatial data is that graphical user interface-based GIS programs can read the metadata and display it in the layer properties for easy reading.
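As an illustration of the README.txt option, the following is a minimal sketch in Python that assembles the kinds of details listed above into plain text and refuses to continue if a key detail is missing. The field names, the example values, and the `build_readme` helper are all hypothetical and are not drawn from the FGDC standard or any other metadata schema.

```python
# Hypothetical field names chosen for illustration; they do not come from
# the FGDC standard or any other metadata schema.
REQUIRED_FIELDS = ("creator", "organization", "contact", "methods", "projection")


def build_readme(metadata: dict) -> str:
    """Return README text, raising ValueError if a required field is missing or empty."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    # One "key: value" line per entry, in the order the entries were supplied.
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with the real details for your dataset.
example = {
    "creator": "Jane Example",
    "organization": "Example University",
    "contact": "jane@example.org",
    "methods": "Digitized from aerial imagery; see the project report.",
    "projection": "EPSG:4326 (WGS 84)",
    "minimum_mapping_unit": "0.5 hectares",
}

readme_text = build_readme(example)
print(readme_text)
```

Saving `readme_text` as README.txt in the same folder as the data keeps the documentation with the dataset when it is archived or sent to someone else.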
The USGS has a helpful collection of information and software related to metadata. The Federal Geographic Data Committee (FGDC) sets standards for metadata.
GeoPackage was designed as an exchange format, but also functions well for storage because it has been incorporated into the GDAL library and can be opened by all the common GIS programs. GeoPackage can contain raster or vector data, but it is not a good choice for every raster dataset. Because it stores rasters as either .png (for data with an alpha or transparency channel) or .jpg images, GeoPackage can only store three data bands (plus alpha for .png).
GeoJSON stores vector data in structured human-readable text. Vertex locations are stored as longitude/latitude coordinates in decimal degrees. As a storage format, this is ideal because minimal technology would be needed to recover the data in the event of massive technology failure. Because the files are text-based (rather than binary), simple text comparison programs can be used to determine differences between files. GeoJSON is a good exchange format for open source GIS programs, but can be tricky to use in the Esri suite of software.
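To make the text-comparison point concrete, here is a small sketch that uses only the Python standard library. It writes the same point feature twice, with one coordinate changed, and compares the two texts line by line. The coordinates, the "Site A" name, the file labels, and the helper functions are made up for illustration. The GeoJSON specification (RFC 7946) orders each position as longitude first, then latitude.

```python
import difflib
import json


def point_feature(lon, lat, name):
    """Build a GeoJSON Feature holding a single point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


def to_geojson_text(features):
    """Serialize features as an indented FeatureCollection so a diff works line by line."""
    collection = {"type": "FeatureCollection", "features": features}
    return json.dumps(collection, indent=2, sort_keys=True)


# Two versions of the same made-up dataset; only the latitude differs.
old_text = to_geojson_text([point_feature(-121.75, 38.54, "Site A")])
new_text = to_geojson_text([point_feature(-121.75, 38.55, "Site A")])

diff = list(
    difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old.geojson",
        tofile="new.geojson",
        lineterm="",
    )
)

# Keep only the added and removed lines, skipping the two file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

Writing the files with consistent indentation and key order, as `to_geojson_text` does here, is a design choice that keeps such comparisons focused on real changes in the data rather than on formatting.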
GeoTIFF stores raster data. It can store far more than three bands and supports “no data” cells. Both of these features are an advantage over GeoPackage raster data. GeoTIFF is a good exchange format because it can be natively loaded in all of the common GIS programs.