PG-Farm

PG-Farm plans to support research projects by providing hosted, read-only database access to important data sets.

Many data management research projects require public access to their data. In addition, projects often hire web developers to build graphical query tools. These requirements bring the added challenge of giving the web application, the public, or both secure access to the database. Campus IT security rules often do not allow access from outside the VPN or campus network, and when the database is hosted in the cloud, researchers are often not versed in the threats that come with opening up a database.

A possible solution, benefiting both research projects and the library’s mission to the public, is for the library to host a read-only replica of the database and provide a public API via hosted PostgREST, as well as a public socket-level connection to the database. This would give web developers and the public multiple access mechanisms, allowing them to focus on their applications or studies. The library would play a central role in the discovery and distribution of the data without being in charge of administering or hosting the primary database. Further, the library could provide access to the read-only data for an agreed-upon period after the project or grant has ended. If a time limit is placed on database hosting, the database could still be snapshotted and stored once that limit expires, allowing public access via snapshot download.
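To make the public API idea concrete, here is a minimal sketch of how a client might build a query against a hosted PostgREST endpoint. It relies only on PostgREST's standard URL conventions (`select=`, `column=operator.value`, `limit=`). The base URL, the `samples` table, and its columns are hypothetical placeholders, not actual PG-Farm endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_postgrest_url(base_url, table, columns=None, filters=None, limit=None):
    """Build a PostgREST query URL for a table.

    filters maps a column name to an (operator, value) pair,
    for example {"year": ("gte", 2020)}.
    """
    params = []
    if columns:
        params.append(("select", ",".join(columns)))
    for column, (operator, value) in (filters or {}).items():
        params.append((column, f"{operator}.{value}"))
    if limit is not None:
        params.append(("limit", str(limit)))
    # Keep commas readable in the select list; everything else is escaped.
    query = urlencode(params, safe=",")
    url = f"{base_url.rstrip('/')}/{table}"
    return f"{url}?{query}" if query else url


def fetch_rows(url):
    """Fetch the rows for a query URL as a list of dicts (needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


# Hypothetical endpoint and table, for illustration only.
example_url = build_postgrest_url(
    "https://example.org/api",
    "samples",
    columns=["id", "name"],
    filters={"year": ("gte", 2020)},
    limit=5,
)
```

Because the replica is read-only, a client like this only ever needs GET requests. The same data would also be reachable over a direct Postgres connection for users who prefer SQL.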

General Concept

The overall concept is relatively simple. For database applications that require a public interface, we will deploy a small set of Docker containers. These containers hold a Postgres database instance, a JWT authentication system, and a standardized database API (PostgREST). The database instance acts as a read-only replica of the research project's primary database and periodically syncs with that primary using standard Postgres tooling.
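As an illustration of how the JWT piece can fit with PostgREST, the sketch below signs and verifies an HS256 token using only the Python standard library. PostgREST reads the `role` claim of a verified token to decide which database role a request runs as. The role name `public_reader` and the secret are assumptions made for this example, and the authentication system PG-Farm actually deploys may issue tokens differently.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_hs256(claims: dict, secret: str) -> str:
    """Return a compact JWT (header.payload.signature) signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Check the signature and return the claims; raise ValueError on mismatch."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _signature(signing_input, secret)):
        raise ValueError("invalid signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical secret and role name, for illustration only.
SECRET = "replace-with-a-long-random-secret-value"
token = sign_hs256({"role": "public_reader"}, SECRET)
```

A client would send such a token in an `Authorization: Bearer <token>` header. This sketch does not check expiry (`exp`) or other claims, which a production setup would normally enforce.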

By using containers, each project can maintain its own preferred version. In addition, specific DNS entries and Postgres ports can be assigned on a per-project basis.
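Per-project DNS names and ports mean that each project's replica can be addressed with an ordinary Postgres connection URI. The sketch below shows one way to assemble such a URI. The host name, port, database name, and user are hypothetical, and requiring TLS via `sslmode=require` is an assumption about how a public endpoint would be exposed.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ProjectEndpoint:
    """Connection details for one project's read-only replica (hypothetical)."""
    host: str
    port: int
    dbname: str

    def connection_uri(self, user: str, sslmode: str = "require") -> str:
        """Return a libpq-style URI usable with psql or any Postgres driver."""
        return (
            f"postgresql://{quote(user, safe='')}@{self.host}:{self.port}"
            f"/{quote(self.dbname, safe='')}?sslmode={sslmode}"
        )


# Hypothetical project entry, for illustration only.
endpoint = ProjectEndpoint(host="project-a.example.org", port=5433, dbname="project_a")
uri = endpoint.connection_uri("public_reader")
```

The resulting string can be passed directly to `psql` or to a driver's connect call, so each project only needs to publish its host name and port.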

A preview (alpha) version of PG-Farm is available online at https://pg-farm.library.ucdavis.edu