Social Data and Academic Research

Farida Vis, Kevin Driscoll, and Chris Cieri discuss the role that social data plays in research.

Social data and academia go hand in hand. Social data has nearly limitless research applications, as demonstrated by the diverse backgrounds of our Social Data and Academic Research panel. Farida Vis’ interest in social data originated in 2005, during Hurricane Katrina, around crisis communication. As a Faculty Research Fellow at the University of Sheffield, UK, she now runs a funded social media lab and focuses her studies on image sharing, using computer science, sociology and art history expertise to analyze visual culture. Dr. Kevin Driscoll is a recent graduate of the Annenberg School for Communication and Journalism at the University of Southern California. His research examines the media infrastructures that undergird popular culture and political communication. More specifically, his recent dissertation research details the origins of social computing in the dial-up bulletin-board systems of the 1980s. As the third and final member of our academic panel, Chris Cieri began his career as a sociolinguist, studying how language changes and varies across demographics. He now oversees the Linguistic Data Consortium (LDC), a non-profit organization that collects, enhances and shares language resources to support linguistic education, research and technology development. As Stu Shulman led the conversation, it became clear that while the research applications for social data are many, the academic community also faces several prominent challenges.

Sourcing & Paying for Social Data

Most academic research falls under non-profit classification, according to Cieri. That classification comes with several specific requirements: research often must aim for the public good, and while certain legal provisions allow for tax breaks, they may also carry special fees and other “strings attached” that make acquiring data a lengthy and costly process. To combat this, the LDC pools financial resources (membership fees) and works with social data providers to arrange favorable licensing conditions and fees; its members then agree to those terms, gaining access to data at lower cost. Driscoll added that it’s fairly costly to build out local infrastructure, so the academic community relies heavily on relationships and collaborations between institutions to maximize funding. This engagement of broader communities with the academic world presents a unique opportunity to build an industry from the beginning, according to Vis. However, it also makes for a tricky, uncertain business model, as it is difficult to secure future funding in advance. The “buy-as-you-go” model is commonplace and works best for today’s academic world.

Replication: Its Role in Data Science & the Pressure to Make Data Sets Available

The notion of replication of social data in the academic realm is not a straightforward concept, according to Driscoll. While the expectation and norm is to replicate data and make it freely available for others in the research community to reuse, traditional boundaries are becoming harder to define in certain overlapping areas. In the UK, data reuse is an important element of the industry; however, social data comes with terms of service that those who fund research do not fully understand. As a result, researchers often struggle to communicate to funders what they believe they can do with the data they aim to acquire. Cieri stressed that replicating data sets is critical for the research community (the LDC has done so for the past 20 years), but growing challenges such as compliance make it difficult to keep data sets current.

The Unsettled Social Data Terrain

In many ways, the social data industry is still young and constantly evolving. This presents a number of key challenges for academics. The topic of ethics was discussed at length: Vis noted that the standard framework for ethics in academic research has been to do no harm, but this traditional view isn’t necessarily tailored to the current moment. Users don’t often think about how their data is used in different spaces. Driscoll cited his research on the origins of social computing in the dial-up bulletin-board systems of the 1980s and 1990s, noting that the decentralization of the movement has resulted in a loss of personal connection and hierarchy that makes it extremely hard to address wrongdoing in data collection and use.

The Terms of Service that accompany social data also present dilemmas for the academic research community. There are complex restrictions and differing bodies of law that affect social data. The panel agreed that the industry shouldn’t expect researchers to have expertise in these laws. It is through the help of intermediaries like the LDC and initiatives such as Big Boulder that we can continue to educate the community, enable it to remain nimble and adapt, and build confidence within industry.

Acknowledging Social Data Research

The recognition of social data research as a profession has been uncertain, and some disciplines have greeted it with more skepticism than others. Vis began her work in 2005, at a time when the profession was viewed less favorably and as unsustainable. No one knew how the academic world would progress to where it is now. The risks are still very real: at the end of the day, what matters is that academic research results in published articles and positive revenue. If these two things aren’t being realized, support falters. But there is promise for the future. Cieri noted that different stakeholders hold different expectations for academic research in social data, and that it is to the mutual benefit of all parties to work in parallel to justify everyone’s work. The panel advised that, as the community moves forward, removing the barriers between social data and academic researchers will be essential, for it is the products of academic research that result in new technology, which in turn affects the greater world. Today, most technology is based on machine learning, and content providers can almost guarantee this technology will be trained and function on their data… if it is made available.
