Creating a Data Services Model

By: John Meier, Physical and Mathematical Sciences Library

One of the key objectives of our Strategic Plan is to develop and implement programs to promote discovery, access, and preservation.  In the 2017-2018 Action Plan, this includes developing a library data warehouse.  A data warehouse is a central system for gathering and storing data collected from all parts of an organization (for example, Google BigQuery).  The University Libraries collect data by extremely diverse methods and in extremely diverse formats, but the overall goal is to put the data we gather to use.  A data warehouse would let us easily keep track of that data and give staff access to the data they need to work more effectively.  To better understand the need for a data warehouse and how it is being developed at the University Libraries, I sat down with Rob Olendorf, the Prystowsky Early Career Science Librarian.  Rob came to Penn State with a background in software and systems, supports research data management in the Eberly College of Science, and is the product owner for ScholarSphere.  The Data Warehouse Steering Committee includes Steve Borelli, Sherry Lonsdale, Julia Proctor, and Matt Ciszek.

The team interviewed staff from around the Libraries and developed user personas for the various ways we use and gather data.  They attempted to identify all data sources within our organization, including Desk Tracker, circulation, gate counts, ILLiad, teaching, and more.  For each data source, they need to develop a data dictionary that explains how to interpret that source and that will guide later development.  This can take quite a while, as some systems allow only a few users to access the data.  Even during the information-gathering process, the team has already made recommendations for cleaning up data and streamlining how we collect it, so that the data we gather is more usable.  They have also given feedback to different units in the University Libraries to help improve data entry.

You can follow their work in progress on their GitHub website.  The team is setting goals and defining what needs to be developed over the next few years to make the data warehouse a reality.  This includes policies about what data can be added, with some data added manually and some automatically.  It is also important to sketch out how open the data can be and to determine staff access levels for different parts of the collection.  The design has to be open and flexible enough to work with other application programming interfaces (APIs).  One example might be a data dashboard that visualizes data from the warehouse.  This flexibility will also leave the warehouse open to new data inputs, such as an app that counts patrons and indicates where they are in our spaces.

We can look forward to exciting new ways to gather data and to present the data we already gather in visual, high-impact forms.  While the full data warehouse is still a few years away, stakeholders from around the University Libraries and Penn State can help build it.