Changing the World with Social Data (Part 1)

Twitter #DataGrants Winners (Part 1)

Twitter #DataGrants offer academics access to social data with the intention of changing the world. At today’s panel, three researchers spoke on how they plan to use Twitter data to answer big questions around health, disaster response, and sentiment analysis. They also discussed the best ways for the social data industry to work with academia at large to encourage new ideas and collaboration, and to train the next generation of scientists to use social data effectively.

John Brownstein of Boston Children’s Hospital / Harvard Medical School plans to use Twitter data to track foodborne illness, which generally goes unreported due to its fleeting presence. Tomas Holderness of the University of Wollongong will use Twitter data to track and test disaster response and decision making during annual flooding in Jakarta, Indonesia, so that future flood damage can be mitigated in real time. Finally, Mehrdad Yazdani of UCSD is using machine learning and artificial intelligence to track and predict sentiment via selfies and other images.

One theme that persisted through this panel was the need for joint effort between the social data industry and academia. The researchers were primarily interested in the sharing of information and resources between both groups and in how the creativity and novelty of academia can bring big gains to the social data industry as a whole. Currently, researchers face two main challenges in dealing with social data. First, storing such large data sets and building the infrastructure for them can be challenging for academics working with limited budgets and limited expertise in big data. Second, it would help if the industry clearly communicated, standardized, and defined things such as compliance, privacy, and data protection, so that academics aren’t unknowingly using data incorrectly.

Overall, the tone for the use of social data in academic and policy-making capacities was optimistic. The researchers pointed out that the social data stream is reflective of a person’s life as a whole, not necessarily just of, for example, their medical history. This whole-picture view of someone’s life can provide clearer data for academics to test their hypotheses, so that they can use their findings to effect real-world policy change and to promote and protect public health.

Social Data and Academic Research

Farida Vis, Kevin Driscoll, and Chris Cieri discuss the role that social data plays in research.

Social data and academia go hand-in-hand. Social data possesses near-limitless research applications, as demonstrated by the diverse backgrounds of our Social Data and Academic Research panel. Farida Vis’ interest in social data originated in 2005, during Hurricane Katrina, around crisis communication. As a Faculty Research Fellow at the University of Sheffield, UK, she now runs a funded social media lab and focuses her studies on image sharing – using computer science, sociology and art history expertise to analyze visual culture. Dr. Kevin Driscoll is a recent graduate of the Annenberg School for Communication and Journalism at the University of Southern California. His research examines the media infrastructures that undergird popular culture and political communication. More specifically, his recent dissertation research details the origins of social computing in the dial-up bulletin-board systems of the 1980s. The third and final member of our academic panel, Chris Cieri, began his career as a sociolinguist, studying how language changes and varies across demographics. He now oversees the Linguistic Data Consortium (LDC), a non-profit organization that collects, enhances and shares language resources to support linguistic education, research and technology development. As Stu Shulman led the conversation, it became clear that while the research applications for social data are many, the academic community also faces several prominent challenges.

Sourcing & Paying for Social Data

Most academic research falls under non-profit classification, according to Cieri. That classification comes with several specific requirements: research often must aim for the public good, and while certain legalities allow for tax breaks, they may also include special fees and other “strings attached” that make acquiring data a lengthy and quite costly process. To combat this, the LDC pools financial resources (membership fees) and works with social data providers to arrange favorable licensing conditions and fees, which its members then agree to, resulting in access to data at lower cost. Driscoll added that it’s fairly costly to build out local infrastructure, so the academic community relies heavily on relationships and collaborations between institutions to maximize funding. This engagement of broader communities with the academic world presents a unique opportunity to build an industry from the beginning, according to Vis. However, it also makes for a tricky, uncertain business model, as it is difficult to secure future funding in advance. The “buy-as-you-go” model is commonplace and works best for today’s academic world.

Replication: Its Role in Data Science & the Pressure to Make Data Sets Available

The notion of replication of social data in the academic realm is not a straightforward concept, according to Driscoll. While the expectation and norm is to replicate data and make it freely available for others in the research community to reuse, traditional boundaries are becoming harder to define in certain overlapping areas. In the UK, data reuse is an important element of the industry; however, social data comes with terms of service that those who fund research do not fully understand. As a result, researchers often struggle to communicate to funders what they believe they can do with the data they aim to acquire. Cieri stressed that replicating data sets is critical for the research community (the LDC has done so for the past 20 years), but there are growing challenges, such as compliance, that make it difficult to keep data sets current.

The Unsettled Social Data Terrain

In many ways, the social data industry is still young and constantly evolving. This presents a number of key challenges for academics. The topic of ethics was discussed at length: Vis noted that the standard framework for ethics in academic research has been to do no harm, but this traditional view isn’t necessarily tailored for the current moment. Users don’t often think about how their data is used in different spaces. Driscoll cited his research into the origins of social computing in the dial-up bulletin-board systems of the 1980s and 1990s, noting that the decentralizing of the movement has resulted in a loss of personal connection and hierarchy that makes it extremely hard to address wrongdoing in data collection and use.

The Terms of Service that accompany social data also present dilemmas for the academic research community. There are complex restraints and differing bodies of law that affect social data. The panel agreed that the industry shouldn’t expect researchers to have expertise in these laws. It is through the help of intermediaries like the LDC and initiatives such as Big Boulder that we can continue to educate the community, enable it to remain nimble and adapt, and build confidence within industry.

Acknowledging Social Data Research

Recognition of social data research and of the profession has been uncertain, and some disciplines have greeted it with more skepticism than others. Vis began her work in 2005, at a time when the profession was viewed less favorably and as not sustainable. No one knew how the academic world would progress to where it is now. The risks are still very real – at the end of the day, what matters is that academic research results in published articles and positive revenue. If these two items aren’t being realized, support falters. But there is promise for the future. Cieri noted that different stakeholders hold different expectations for academic research in social data, and it’s to the mutual benefit of all parties to work in parallel to justify everyone’s work. The panel advised that, as the community moves forward, removing the barriers that keep social data from academic researchers will be essential, for it is the products of academic research that result in new technology that in turn affects the greater world. Today, most technology is based on machine learning, and content providers can almost guarantee this technology will be trained and function on their data… if it is made available.

Social Data Analytics in China

CIC conducts social media research and monitoring in China for global brands that have a presence in China, such as L’Oreal. Social Touch is an end-to-end marketing solutions provider in China for more than 50 internationally recognized brands, including P&G and Airbnb, their newest customer.

Yu and Zhang began by describing the current landscape of social data as an evolution unique to the preferences of Chinese consumers. Not surprisingly, China’s social landscape did not always mirror changes in the US or global social communities. One reason may be the different cultural context. In general, Chinese consumers have been less focused on privacy issues, although both later mentioned that this has recently begun to change, with privacy possibly playing a more prominent role. Also, the large population requires social companies to ‘tier’ different regions and cities. This, of course, introduces differences in adoption and usage, and therefore challenges in collecting and measuring the data.

The dialogue shifted to potential challenges and opportunities in China, at least as understood by Yu and Zhang. One potential challenge is data integration across multiple media: for example, understanding relationships (and possibly correlation) between television viewership and social data. This is likely to be a challenge and an ongoing dialogue in the years to come, and one that extends beyond China. As for opportunities, tapping into China’s older generation was suggested as a new potential market. Improved technology has introduced gadgets and software that are more intuitive to use, and this alone has removed a significant barrier to adoption.

This conversation only touches the tip of the iceberg when considering opportunities within social data analytics in China. Hopefully, Big Boulder will facilitate more in-depth conversations among its attendees and within the larger, global social data community.

Journalism in a Realtime World

Andrew Fitzgerald, Chris Anderson, and John Melloy discuss the role of social data in breaking news.

The morning’s presentations started with a discussion about how social media is truly transforming the face of journalism. In a world full of self-reporting eyewitnesses, social is often the source for breaking news well in advance of traditional media outlets. Chris Moody moderated a panel of guests from Stocktwits, Pixable and Twitter who’ve collectively worked at CNN, CNBC, and Bloomberg. They shared their unique perspectives on the current state of journalism and the role “citizens-as-reporters” play in it.

We heard how sources for breaking stories are often now found via social platforms, and sometimes those sources are even documented in court records as testimonial evidence. Major natural disasters are another area where social has played a major role, both in emergency alerting and in first-hand reporting from those affected. The 2008 presidential election was one of the earliest events to show the power of socially generated content and the societal shift that comes with it. And on a lighter note, social has also given movie studios incredibly accurate predictions for box office reception thanks to early audience conversations. Can you say, “Sharknado, Part 2??”

The discussion then moved on to the actual content itself, and how videos and photos in social activities dramatically increase engagement and tell more of the story than text alone. The panel finally talked about how journalists need to be careful to monitor the validity (or lack thereof) behind news events shared via social media. Editorial judgement still plays a key role in reporting, and it is up to journalists to ensure that they are vetting their sources appropriately.

While each panelist shared a unique take on the role of social in journalism, they all held the common belief that it brings an undeniably positive and disruptive influence. The way in which we report and consume the news will never be the same again.

A Picture is Worth 1,000 Words


David Rose from Ditto Labs and Sharad Verma from Piqora discuss the challenges and opportunities in visual media.

In her recent Internet Trends Report, Mary Meeker states that 1.8 billion images are shared daily across Flickr, Snapchat, Instagram, Facebook, and WhatsApp. That figure does not include images shared on highly visual networks such as Pinterest and Tumblr or mixed-media networks such as Twitter. The potential to analyze images and derive insights is huge; if a picture is worth 1,000 words, then the potential value in analyzing images is at least as large as the value in analyzing text.

The conversation opened with a discussion of the features of the networks that facilitate the creation and sharing of visual content. David and Sharad briefly debated whether mixed-media networks would be able to harness the ease of consumption and emotional response that the more visually focused networks have used to catalyze their growth. Their conclusion was that it is too soon to tell. They also discussed the motives of users of the more visual networks, citing discovery and self-selected feeds as reasons why people opt to use highly visual social networks.

David shared some stats around image analysis: 130 million images a day are shared on Tumblr, and 28% of those images contain text within the image that can be extracted. This began the discussion of the types of analysis that are possible with images. David mentioned that 3-4% of those Tumblr images were selfies and that there is a “smile score” that makes it possible to quantify the emotion of the person in the image. They also shared some statistics around the cross-posting of images on various social networks: 20% of Tumblr images get posted on Pinterest within two weeks of their initial post, and 40% of Tumblr images live on Pinterest. According to Sharad, this indicates two types of users: people who cross-post strategically and optimize per network, and people who simply cross-post their content to as many networks as possible.

David shared a real-time feed of clothing-related images from Tumblr, Instagram and Twitter. He said that they can take feeds filtered on brand names, expressions, content and more and sell them to brands. According to David, approximately 85% of user-generated images on social networks that are relevant to a specific brand cannot be identified with text or hashtags; the image itself must be analyzed. Being able to do this allows brands to use social analytics to improve their consumer research, audience discovery, and advertising, and to understand the ROI of visual social posts.

E-commerce and the marketing funnel were strong themes throughout the discussion. Sharad made it clear that images posted on Pinterest are toward the bottom of the marketing funnel. They can link directly to a product page and can be analyzed for ROI easily, but measuring ROI beyond that direct response could be very valuable. He considered images on Instagram and other networks to be more top-of-funnel, and in his experience, measuring their impact on ROI, along with things like category or price point, was very insightful. David mentioned that the ability to analyze images from users on Twitter, find people with an affinity for a brand or a competitor’s brand, and then reach out to them with tailored audiences could represent a unique marketing opportunity from visual social data.

Both agreed that social media analytics dashboards needed to be more visual and include more images to be more effective.


Brands and Social Data: Minding (and Mining) the Fine Line Between Helpful and Creepy

Marketing Use Case Panel

Chad Parizman from Scripps Networks Interactive and Vince Golla from Kaiser Permanente discuss marketing use cases for social data.

Few topics are more deeply personal and intensely guarded than information related to our health. Enter Vince Golla, Digital Media and Syndication Director for Kaiser Permanente and its 9.3 million members. With millions of health-related conversations taking place each day on social platforms, Vince is charged with deciding where and when Kaiser should engage – if at all. The question of the day, every day, is, “What is the balance between being helpful and being creepy?” The answer, unfortunately, is rarely binary, with Vince and his team relying on a mixture of common sense and trial and error.

Kaiser has thousands of doctors and other subject matter experts at its disposal, so Vince and his team figure out ways to get them into social conversations. “Any time we can we’re going to involve one of our physicians who are on Twitter already.” Still, there’s a catch related to what Vince described as the “tyranny of metrics”: how do brands resolve the conflict between responding with an institutional handle that may have tens of thousands of followers and responding from the account of an expert who may only have a hundred? Again, the panelists had no single answer.

Chad Parizman, Director of Convergent Media at Scripps Networks Interactive, faces a different set of challenges with audiences of networks such as HGTV. Programming and live content driven by social data analysis is a pillar of his team’s approach – a strategy solidified by the tremendous success of HGTV’s New Year’s Day campaign that saw “more Twitter mentions by noon than during the best week of the brand’s history.” Chad added, “We put 1,100 Tweets on TV over the course of the entire day and the analytics around that were resoundingly positive.” One thing became abundantly clear: HGTV’s audience is active on Twitter in a big way.

Despite very different customer bases and social media audiences, both Chad and Vince described the pain of quantifying the impact of their work – for business leaders, advertisers, etc. – as a growing issue. Chad sees increased interest from Scripps’ sales force, adding, “Coke, GM, Lexus, they’re spending money with us to reach potential buyers. At some point, someone is going to say, what is the value of all of these eyeballs. Is it the same eyeballs? What’s the value of an individual Tweet?” For advertisers and ad sales teams, those sorts of questions have always existed. “The answers are more mature on the print, radio, and TV side and people generally agree,” argued Chad. “There’s way less agreement on digital.”


A Code of Ethics for Social Data: We Need Your Help!

Update: Nov 14, 2014. Revised Draft Code of Ethics

One of the most important functions that the Big Boulder Initiative can provide is to help establish and clarify a code of ethics for the proper use of social data for industry, academia and other organizations who use it. To this end, the Big Boulder Initiative Board of Directors has drafted the following document, which we hope will serve as a starting-point for a final code of ethics to be posted on this blog and shared widely elsewhere.

Thank you to all the board members who have contributed, and also to the many others whose work provided a foundation for our thinking, in particular, the work of Jon Lovett and Eric Peterson of the Web Analytics Association (now Digital Analytics Association). We need your help and hope you’ll add your thoughts and comments so we can finalize it knowing that it was a collaborative effort by the social data community. If you don’t want to comment publicly, please feel free to email feedback to


– – – – – – – – – – –

DRAFT Code of Ethics for Social Data


The Big Boulder Initiative was founded to establish the foundation for the long-term success of the social data industry. To accomplish that, we must address the many and complex issues that social data poses: to interpretation, to analysis, to custodianship, to business value, and, of course, to individual protection and privacy.

The following Code of Ethics represents an effort to begin to define a set of ethical values and practices for the treatment of social data. It represents a commitment of the Big Boulder Initiative to proper data stewardship and an effort to educate the industry about ethical social data collection, processing and utilization practices.

Consider: what’s the worst that can happen?

About Social Data

Social media offers an unprecedented set of opportunities and risks for individuals and organizations. For individuals, social media offers new routes to self-expression mixed with a complex and ever-shifting set of contexts and expectations regarding ownership and privacy of that data. For organizations, social data offers new ways to glean insight into customer and consumer attitudes, but also raises ethical dilemmas with regard to proper use of that data in areas such as privacy, stewardship and storage.

The Code of Ethics

This document represents a starting point for articulating and honoring the most ethical business practices surrounding social data and its use for organizations.

1. Privacy

First, do no harm. Because of the many platforms, privacy settings and contexts for social data, privacy is much more complex than a simple “on or off” setting. It is highly contextual. For example, while tweets are generally public, broadcasting a specific tweet on television, with attribution, may represent more public scrutiny than an individual intends. The BBI board of directors believes that, in addition to honoring explicit privacy settings, organizations should do their best to honor implicit privacy preferences where possible. This may mean broadcasting a tweet without attribution, or with a blurring of the name. Specifically, the best practice is to preserve content within its original context so as not to surprise the user.

2. Transparency and Methodology

Social data can be used to make business or personal decisions, so it is critical that data sources are as clearly articulated as possible. A best practice is to include methodology, including sources and sample percentages, where possible, to enable readers to draw their own conclusions about the scientific validity of a particular set of recommendations. Be honest, especially when you don’t have all the answers.

3. Education

Because much of social data is unstructured, and its applications still relatively new, you must consider the implications when working with it. Be curious: what’s the worst that can happen? Your job is to facilitate effectively positive conversations and education within the industry versus fear and hype, and provide actionable and practical advice to users of social data, whether in the public sector or industry.

4. Accountability

Finally, prepare an action/crisis plan in case something goes wrong. As we’ve seen with many, many social media crises, social data can give rise to a host of unintended consequences. Do scenario planning: what options will you offer your consumers, providers, partners, customers if something—an outage, data corruption, hacking, privacy breach, or just poor judgment—goes wrong?


By agreeing to the four sections outlined in this Social Data Code of Ethics, I pledge to uphold these standards across the Internet. I will support the Big Boulder Initiative’s efforts to safeguard consumer data and privacy by providing feedback, referencing this Code and other related publications, and by advocating for adherence to these standards. If I observe a violation of these standards, I will make a reasonable effort to notify the site owner and provide feedback directly and privately, referencing this Code of Ethics as warranted.

[Note: the following will have live links when we finalize the COE:]

I agree to the above Social Data Code of Ethics and am ready to pledge.

View the current list of supporters.

Many thanks to the BBI Board for their input, and many thanks in advance to all of you who contribute!

About the Author

Susan Etlinger is a founding board member of the Big Boulder Initiative. She is an industry analyst at Altimeter Group, where she works with global organizations to develop social data and analytics strategies that support their business objectives. Susan has a diverse background in marketing and strategic planning within both corporations and agencies. She’s a frequent speaker on big data, social data and analytics and has been extensively quoted in outlets including Fast Company, BBC, New York Times and The Wall Street Journal. Find her on Twitter at @setlinger and at her blog, Thought Experiments, at

The Future of Social Data Starts Now

We first announced the formation of the Big Boulder Initiative last June to build the foundation for the long-term success of the social data industry. We’ve made a lot of progress since then.

Last fall, we held a series of small workshops in four cities to discuss the future of social data and the challenges we all face in creating the future we believe is possible. Across all of the workshops, participants – publishers, brands, solution providers, analysts, academics, public sector, finance and more – represented a range of perspectives. We achieved two key outcomes from these events: we identified six high-level topic areas to address, and we elected a board of directors.

This year, the board has gone to work. We established the Big Boulder Initiative as a 501(c)(6) nonprofit organization. We refined and focused the topic areas identified in the workshops. We developed a working draft of a code of ethics for the social data industry. We created a structure and definition for what it means to be a member of the Initiative. And last but not least, we pulled together the 2014 edition of the Big Boulder conference.

Today, we’re releasing several items so we can drive the discussion forward with the entire industry. First and foremost, our draft Code of Ethics is now available and open for discussion. Stewardship and appropriate use of data is a key topic that we’re planning to address, and the Code of Ethics is the foundation for this discussion. Second, you can now sign up to be a member of the Big Boulder Initiative. We have memberships for companies, academics, nonprofits and individuals. Third, we’re launching a new website for the Initiative. This includes membership information, this blog, and a forum where members can discuss topics we’re addressing. And finally, we’ll kick the conversation into high gear with the start of the Big Boulder conference tomorrow morning.

I’m thrilled to see us take this big step forward with this Initiative. I hope you’ll consider becoming a member to take part in defining the future of the social data industry. This is the beginning of something big.