These are the notes I used at a panel discussion at the 2017 AIR Forum. The panel was "Big Questions About Big Data" with Jeffrey Alan Johnson and Loralyn Taylor.
Being here today, I wonder if we are becoming the people William Gibson warned us about. If you haven’t read Gibson’s works, especially some of his early works like Neuromancer and Virtual Light, it might be time. His vision in the early 1990s of identifying and tracking people through their economic transactions helped define how I, and I think others, began to think about tracking students across time and location.
What we see happening today with Big Data goes beyond the early days of Gibson’s cyberpunk. He did not imagine social media extending quite the way it has and becoming another dataset, at least as I recall. I remember, though, how it struck me to think of people as being nothing more than their economic transactions. I could see the same applied to students: track their economic transactions, financial aid, tuition and fee payments, course and credit attempts, and credentials earned. All of this, combined with demographics and definitions to add shape and structure to the patterns, became my interest for a while.
The reporting almost became secondary. It was the attempt to envision the world through the flow of what was then very limited data.
Of course, it all came at the price of not looking at the world, only at its representation in the data. It was too easy to forget that these images in the data were real people, and that I was making recommendations for policies and interventions that could change their lives.
Remember Jeff Goldblum’s character in Jurassic Park, Dr. Malcolm? “Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should.” I know we all have good intentions. We want to serve students better and to ensure their success. We want to be more efficient and effective, always seeking better use of each dollar regardless of its source. But what are the implications of what we are doing?
Two years ago, I sat in a six-year planning meeting for a college, where institution leadership announced its partnership with a leading tech company to significantly increase retention and graduation through predictive analytics and intrusive advising. In fact, they went so far as to say, “If the model predicts a student heading for trouble and is unresponsive to attempts at communication, we will go so far as to have someone waiting for them in the parking lot at their car.”
Say what? I was taken aback. I understood the intent and appreciated their commitment to student success, but asked something along the lines of, “At what point does an intrusive advisor look different than a prospective rapist? Aren’t you normalizing behavior, a use of data, and student identification that you would not normalize or even accept under other circumstances?”
This was when I blogged about the need for ethics and suggested, at each following meeting, that every Virginia institution considering the use of big data and predictive analytics should develop a statement of ethics about its use.
If we take Big Data and predictive analytics to their idealistic conclusion, we would seem to remove all doubt about outcomes and about changing individual performance. But what would happen to maturation along the way? Would some number of our students fail to mature during college because they failed to fail and thus never learned from the experience? Would we end many of the failures and happy accidents, like those that led me to become an art major, which set off a chain of events that put me here today? Admittedly, I had privilege, support, and a fair amount of well-being, so I was not that much at risk, save for the riskiness of being in the Army, which is where I went after dropping out of college.
One of the most exciting things about my job is the Virginia Longitudinal Data System. Using processes of de-identified matching, we have the ability to observe individual and group outcomes across nine agencies (and growing) to learn how to better deliver education, workforce, and social services. We do this under some of the most restrictive state privacy laws in the nation, while complying fully with federal privacy laws and regulations. The underlying story of longitudinal data is that time, place, demography, and experience create a “dataprint” of an individual that is at least as unique as a thumbprint. Patterns emerge from large datasets describing not just unique groups, but unique individuals. In committing to the VLDS privacy promise, we rely on adherence to the letter and spirit of the law, and on best practices.
It is a real understanding of privacy laws that guides us in this work. There are three critical components to know:
- A data subject (someone from whom you collect data) is entitled to know what data the government has on her. This includes any identified data acquired from a third party.
- Data and information, whether through analysis or sharing, cannot be used to cause “harm” to a data subject unless the possibility of such harm was disclosed to the subject at the time of disclosure.
- The subject has the ability to give or withhold consent to the provision of data and is informed of any limits on service that withholding it may create.
Big Data can push us in a direction away from these principles. We can’t allow that. We must embrace these to ensure fair treatment of each person in the data.
I had to leave the conference after the first day for a meeting in Williamsburg. While away, I followed some of the action on Twitter. I saw a photo from a session on analytics that listed “Black,” “Hispanic,” and other demographic variables as “risk factors.” It seems to me that when demographics and identity are risk factors in your education enterprise, you are not trying to educate the students you have, but the students you wish you had. There is a fundamental difference.