Big Data or Big Brother? Information collection at an academic library

Academic libraries are interesting places. Right now there is a huge push towards assessment and linking library use to student success. In the library world, library use goes beyond book checkout, and encompasses online use (whether on or off campus), instruction classes, tutorials, physical use of the space, reference questions asked, and the use of special services. So basically, anything library-related that can be quantitatively or qualitatively measured. Student success can be defined in many different ways as well: graduation rate, graduation rate within a time frame, grades, or even how quickly a graduate lands a job. So why is this important to libraries? Well, if a library can prove that it is essential to the university and the university’s continued existence, if it can prove that it will increase (or at least maintain) the university’s reputation and attraction of new students, then the university will be more likely to allocate funds to said library. Yes. No matter how you dress it up, it’s all about the Benjamins.

Ok, it may not be completely finance-driven. I personally feel that I make a difference in the lives of students, faculty and staff. I get all warm and fuzzy just thinking about it. But am I really making a difference? And what if I’m wasting my time doing something when I should be doing the exact opposite? You can see why data-driven decisions are important.

But there is another factor in the works. Where do we get this data? From our users. This means monitoring what they do, what they check out, and how they use our services. This type of monitoring has become ubiquitous in today’s society. My Fitbit literally knows every step I take. My phone knows where I’m going before I do, and has a much better sense of direction. My Amazon account knows me better than my husband. As consumers, we have become accustomed to this kind of data collection and think it is the norm.

But libraries are a little different. All libraries, academic libraries included, have a strong commitment to intellectual freedom and against censorship. The very idea of being observed may influence how people use the library. One way to think of it is this: The book that goes missing from the shelves the most in our library is a very large book about human sexuality. We’ve had to replace it several times. It doesn’t get checked out…it just disappears. Our users don’t want to be seen checking out this book about human sexuality, because they don’t want to be observed by their peers and library staff. What happens, then, to circulation when students know that we are observing their behavior? Will it change their behavior? Will they self-censor and only check out books they think are “appropriate”?

Now let’s move a step further into predicting user behavior, or even steering user behavior. Is this appropriate for an academic institution where the focus is on learning and critical thinking? This has ethical implications that are not often considered in the business world.

The library I currently work for made a decision in 2004, when it changed its ILS (integrated library system), to deliberately blind itself to certain kinds of data. We are unable to access records of patron activity once a patron has returned an item (as long as there is no fine, etc.). So while we can see that John Doe has checked out 23 books since the account was created, that is the only information we can access. As someone who wanted to attempt to make a simple recommender system, similar to (but more primitive than) what you would find on Amazon or Netflix, I found this both frustrating and disappointing.

But there was a history there. The decision was made in the wake of 9/11, with the passage of the Patriot Act and the fear of government surveillance. This is not an unfounded fear, even today, although as of 2016, Section 215, the “libraries provision,” was allowed to expire. But even so, the prevailing method of prevention in this case was to sidestep the issue. If we don’t collect the information in the first place, then we can’t provide it. This is not standing up for what is right; it is avoiding the issue, and avoiding trouble.

There is nothing wrong with wanting to avoid trouble! But this avoidance is hamstringing future possibilities. We are basically saying as librarians that we cannot be trusted with data.

My library director, my supervisor and I have had discussions about data collection, most recently in relation to a change in our interlibrary loan article service. We are working in cooperation with other libraries, and as such had to make data collection decisions together. I, of course, recommended getting the data in its raw form, before it had been anonymized. I was asked why we would want the requesting patron’s identification if the details are just for statistical purposes. I replied with the following (via email…I’m not this rhetorical in person):

I’ve been thinking about this for a while now… and while I understand and respect the historical reasons for the reluctance to collect this kind of information (Patriot Act, etc), I think we need to also look at what is happening with data all around us every day, and have a conversation about what we are potentially missing out on by not collecting this kind of data.

In the past we’ve had patrons who request the exact same article multiple times. This may be a way to identify these patterns systematically. This information could also be useful for collection development purposes: This journal has been requested 8 times…by the same person. This journal has been requested 7 times…by 7 different people. I would think that this would make a difference. It wouldn’t have to be the patron’s name, but some unique identifier.

This can be used for data analysis purposes as well: Is there a correlation between student success and library usage (in this case interlibrary loan usage)? Jane Smith has ordered xxx articles and has xxx GPA. John Doe has ordered 1 article and has xxx GPA. Is it totally necessary? No, but I would argue that by collecting this kind of information we can answer some of the broader questions that allow us to justify budgets.

Much of our consortial reporting is currently confusing. If the information is pulled one way, we get one answer; pulled another way, the numbers can be completely different. But currently we are unable to identify why the numbers are so different: the process is opaque (even after speaking with consortial staff). By being able to track a single person, it may be easier to see where the discrepancies and variances happen, and thus get a more accurate report.

I would just like to add that I strongly urge us to recommend this, but considering how conservative the cluster has been historically about collecting data, I don’t anticipate it being adopted. This could start a conversation, however. As universities rely increasingly on metrics and data collection, as public libraries use past checkouts to drive recommendation systems, and as society grows accustomed to data collection by corporations in our everyday lives, why do we not trust that entities such as academic libraries can be ethically responsible with this type of data collection? Why do we continue a “no touch” policy when this information could be used to impact budgets? We are missing opportunities to demonstrate our relevance, get buy-in and help our students, faculty and staff by building recommender systems, or even just giving our users access to their past checkout history for their own edification.
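
To make the collection development point from the email a little more concrete, here is a minimal sketch in Python of how "some unique identifier" could stand in for a patron's name. Everything in it is an assumption for illustration: the `SECRET_KEY`, the `pseudonymize` and `summarize_requests` helpers, and the sample requests are all made up, and none of this reflects how our ILS or the consortium actually stores data. The idea is simply that a keyed hash gives each patron a stable token, so we can tell "8 requests from 1 person" apart from "7 requests from 7 people" without the report ever containing a name.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the library. Anyone without it cannot
# recompute a token from a patron ID, and discarding it breaks linkability.
SECRET_KEY = b"replace-with-a-locally-held-secret"


def pseudonymize(patron_id: str) -> str:
    """Return a stable token that stands in for the patron's real ID."""
    digest = hmac.new(SECRET_KEY, patron_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def summarize_requests(requests):
    """Map each journal to (total requests, number of distinct requesters)."""
    totals = defaultdict(int)
    requesters = defaultdict(set)
    for patron_id, journal in requests:
        totals[journal] += 1
        requesters[journal].add(pseudonymize(patron_id))
    return {journal: (totals[journal], len(requesters[journal])) for journal in totals}


# Made-up sample data: one patron asks for Journal A eight times,
# while seven different patrons each ask for Journal B once.
sample = [("p001", "Journal A")] * 8
sample += [(f"p{n:03d}", "Journal B") for n in range(100, 107)]

summary = summarize_requests(sample)
for journal, (total, distinct) in sorted(summary.items()):
    print(f"{journal}: {total} requests from {distinct} distinct patron(s)")
```

A token like this is pseudonymous rather than anonymous: whoever holds the key can still link a token back to a person, which is exactly why the questions about responsibility and data security matter.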

We can’t just ignore what is happening with data and data collection today. If we do, we will be in danger of becoming obsolete in order to maintain the status quo. But it is our responsibility to consider the ethical implications that data collection may have on user behavior. It is important to think about and come up with a plan. Can we foresee unintended consequences? Who is responsible for data security? What kind of information will we make publicly available? These are all considerations that need to be taken into account.


About Ellie Kohler

I'm the Access and Learning Services Librarian at Rockhurst University, and was a founding member of the ILL Special Interest Group. My specialties include interlibrary loan, instruction, reference, circulation, reserves, and wrangling 40 (or so) student assistants. I continue to defy the librarian stereotype by keeping a cat-free household.
This entry was posted in Data Science, etc., Library Trends.
