Cutting through social data noise with machine learning

By Madeline Para, CEO and Co-Founder at Twizoo.  (Madeline will be speaking at Big Boulder 2017!)

All of us who work with social data know the noise keeps growing relative to the signal, and not in our favor. As the irrelevant baby picture or the 1000th Trump meme pops up in our social feeds, that micro-dopamine hit of ingesting content from social media may not fire like it used to, and we lose interest. In fact, it is this increasing noise that can cause businesses to become disillusioned or skeptical that they can use social data to achieve their goals.

As these noise levels increase, it’s not hard to recognize that businesses need ever more powerful tools to find those sparkly signal-diamonds in the social rough. The basics of attributing sentiment scores and broad topic categories are outliving their usefulness and no longer offer enough competitive advantage to drive significant results.

So, what is the next step for reaching an even higher level of value when tackling the signal-to-noise problem with your social data? After all, the more accurately you collect contextually relevant social data for your use case, the better your analysis and outcome will be.

You may have thought about writing some of your own rules on top of your dataset to improve its relevance and quality. We’ve all seen the complex Boolean search queries your teams may use in TweetDeck or similar tools. For example, perhaps you want to analyze all social posts that are about US President Donald Trump. You might create some search queries for his handle(s), as well as keywords like “Donald Trump” and “President”. However, the latter keyword will bring back contextually irrelevant noise, compromising your dataset. You might then add some heuristics on top to reduce the noise, such as: must mention the keyword “President” but NOT “Obama”. However, these rules can soon grow out of control and become unscalable, or give you misleading results.
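To make the problem with hand-written rules concrete, here is a minimal sketch of the “President” but NOT “Obama” heuristic in Python. The function name `matches_rule` and the sample posts are invented for illustration; they do not come from any real tool or dataset.

```python
import re

def matches_rule(post: str) -> bool:
    """Hypothetical heuristic: must mention "president" but NOT "obama"."""
    text = post.lower()
    has_president = re.search(r"\bpresident\b", text) is not None
    has_obama = re.search(r"\bobama\b", text) is not None
    return has_president and not has_obama

# Made-up sample posts, for illustration only.
posts = [
    "President Trump signs a new executive order",          # signal, kept
    "President Obama gives a farewell speech",              # noise, excluded
    "Our club president just resigned",                     # noise, but still kept
    "Trump responds to Obama's comments on the President",  # signal, but excluded
]

kept = [p for p in posts if matches_rule(p)]
```

In this toy sample the rule keeps one irrelevant post and drops one relevant post. Each new exception needs another clause, which is how such rule sets tend to grow out of control.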

Machine learning is a term coined in the 1950s, but it has only recently become so prevalent in the tech press that it may leave you feeling left behind. Put simply, machine learning gives the computer the ability to learn from the data provided and predict an outcome for incoming data, like predicting which social posts mentioning the keyword “President” are actually about Donald Trump versus someone else. With the massive increases in computing power at ever cheaper costs, improved tool sets and a growing abundance of data, we are living in a golden age for machine learning.
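As a rough illustration of “learning from data”, the sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing in plain Python. Naive Bayes is our choice for the example, not a method the article prescribes, and the six training posts and their labels are invented; a real model would need far more labelled data.

```python
import math
from collections import Counter

# Hypothetical hand-labelled posts: 1 = about Donald Trump, 0 = someone else.
TRAIN = [
    ("president trump tweets about the wall", 1),
    ("the president spoke at trump tower", 1),
    ("president signs order at the white house", 1),
    ("our club president resigned today", 0),
    ("student body president election results", 0),
    ("the company president announced a merger", 0),
]

def train(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the higher log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
about_trump = predict(model, "president trump at the white house")
someone_else = predict(model, "club president election results")
```

Both test posts contain “president”, yet the surrounding words push them toward different labels. That context is what a single keyword rule cannot capture.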

So, can machine learning solve all your complex signal to noise problems? Maybe. If you have enough data, patience and time – almost certainly. Here are the steps to get you started:

  • Define the desired outcome clearly. Take samples from your dataset and examine them carefully to really understand what is signal and what is noise. Have your team manually tag, say, 20 samples with what they think is the desired outcome. This may sound silly, or you may think the desired outcome is obvious, but I promise you will find that your team (or even your customers) will differ in opinion at this step, especially with any noisy or complex dataset. Getting on the same page about what is actually signal and what is truly noise will save you a lot of pain down the road.
  • Evaluate if this is actually more than one problem. When you examine the data as part of step 1, does it seem like there are multiple different contributing factors that define noise? For example, are there certain types of accounts you don’t want in your dataset regardless of the content they contribute? If there are multiple different contributing factors, each sub-problem is usually better tackled separately. Define these problems here.
  • Don’t re-invent the wheel. Next, understand if any of your problems have already been solved. For example, if you think accounts with profile pictures that aren’t people are contributing to your noise, don’t go build a face detection machine learning model, as this is a well-solved problem. Instead, use face detection output from existing technologies, and test it as a feature in your machine learning model unique to your problem.
  • Get technical. For any of your problems that are not already publicly solved, you now need to pick the best machine learning approach to apply. Unless you’re feeling completely wild, crazy and academic, you will likely not need to invent a new machine learning algorithm; you just need to find the most suitable one for your task to build a model unique to your problem. (You may also need to gather a large labelled dataset using your clear desired outcome definition, but this is a separate blog post!) Microsoft has a great cheat-sheet to help you through this step.
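The disagreement described in the first step above can be put into numbers. One common way, which is our suggestion and not something the article prescribes, is Cohen’s kappa: agreement between two taggers, corrected for the agreement expected by chance. The sketch below is a minimal version, and the two sets of 20 tags are made up.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two taggers, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both taggers pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical tags from two team members on the same 20 sample posts.
tagger_a = ["signal"] * 10 + ["noise"] * 10
tagger_b = ["signal"] * 8 + ["noise"] * 2 + ["signal"] * 2 + ["noise"] * 8

kappa = cohen_kappa(tagger_a, tagger_b)  # 16 of 20 tags agree, kappa is about 0.6
```

A kappa near 1 suggests the team shares one definition of signal. A value well below that is a cue to revisit the definition before labelling a larger dataset.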

We know how tempting it is to jump straight to step 4, especially for a team of smart engineers and scientists. At Twizoo, we’ve been solving signal to noise problems with machine learning for years, and have the battle wounds to show first-hand how diligently completing steps 1-4 may save you months of pain and ultimately drive higher precision and accuracy. If you want to talk more about your machine learning problem, or if you want to learn more about how Twizoo can help you mine social media for great user-generated content, feel free to reach out at madeline <at> twizoo <dot> com. See you at Big Boulder 2017!

Call for Big Boulder 2017 Sponsors!

We’re excited to announce Big Boulder’s first ever sponsorship opportunities! Check out the details below to see how you can reach the influencers in social data. If you’re interested in being one of our sponsors, please contact sponsorship@bigboulderinitiative.org.

  • Mt. Elbert: All Attendee Happy Hour Sponsor
  • Gray’s Peak: All Attendee Dinner Sponsor
  • Torreys Peak: Welcome Reception Sponsor
  • Mt. Evans: Lunch or Breakfast Sponsor
  • Longs Peak: Round Table Sponsor
  • Pikes Peak: Break Sponsor

Mt. Elbert (elevation 14,443’)
All Attendee Happy Hour Sponsor

  • Sponsor of Thursday evening all attendee happy hour
    • Opportunity to provide promotional signage with prior approval from BBI
    • Opportunity to provide promotional give-aways with prior approval from BBI
  • Branded sponsorship signage in prominent conference locations
  • Recognition and thank you during keynote by BBI board
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Full page advertisement in conference brochure
  • Logo on footer of conference emails
  • Five full conference tickets
  • $200 discount code to apply to full conference passes for sponsor employees only

Gray’s Peak (elevation 14,270’)
All Attendee Dinner Sponsor

  • Sponsor of Thursday night all attendee dinner
    • Opportunity to provide promotional signage with prior approval from BBI
    • Opportunity to provide promotional give-aways with prior approval from BBI
  • Branded sponsorship signage in prominent conference locations
  • Recognition and thank you during keynote by BBI board
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Full page advertisement in conference brochure
  • Logo on footer of conference emails
  • Five full conference tickets
  • $200 discount code to apply to full conference passes for sponsor employees only

Torreys Peak (elevation 14,267’)
Welcome Reception Sponsor

  • Sponsor of Wednesday night all attendee welcome reception
    • Opportunity to provide promotional signage with prior approval from BBI
    • Opportunity to provide promotional give-aways with prior approval from BBI
  • Branded sponsorship signage in prominent conference locations
  • Recognition and thank you during keynote by BBI board
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Full page advertisement in conference brochure
  • Logo on footer of conference emails
  • Five full conference tickets
  • $200 discount code to apply to full conference passes for sponsor employees only

Mt. Evans (elevation 14,264’)
Lunch or Breakfast Sponsor

  • Sponsor of Thursday or Friday breakfast or lunch
    • Opportunity to provide promotional signage with prior approval from BBI
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Half page advertisement in conference brochure
  • Logo on footer of conference emails
  • Four full conference tickets
  • $100 discount code to apply to full conference passes for sponsor employees only

Longs Peak (elevation 14,255’)
Round Table Sponsor

  • Sponsor of round table discussion for up to 20 self-selected attendees during lunch on Thursday or Friday
    • Opportunity to select a topic related to conference themes with prior approval from BBI
    • Opportunity to provide promotional signage with prior approval from BBI
    • BBI moderator will be provided to help facilitate the conversation
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Quarter page advertisement in conference brochure
  • Logo on footer of conference emails
  • Four full conference tickets
  • $100 discount code to apply to full conference passes for sponsor employees only

Pikes Peak (elevation 14,110’)
Break Sponsor

  • Sponsor of Thursday or Friday morning or afternoon break
    • Opportunity to provide promotional signage with prior approval from BBI
  • Conference website recognition
    • Logo on sponsor section
    • 500 word company description
  • Conference brochure recognition
    • Logo on sponsorship section
    • 500 word description
  • Quarter page advertisement in conference brochure
  • Logo on footer of conference emails
  • Three full conference tickets

Call for Big Boulder 2017 Speakers

Big Boulder 2017 is just around the corner and we can’t wait to welcome you to Boulder in a few months. The board has been working hard to organize amazing content for this year’s conference on June 1st & 2nd. Along with our traditional set of compelling speakers and topics, we’re also looking for community contributions. As such, we’re opening a call for speakers!

Themes

Every year, the BBI board works to determine the most relevant topics in the industry. After much deliberation, the themes for this year’s event are:

  • Abuse on Social Media
  • Messaging Apps Continued Rise
  • AI & Machine Learning
  • The Political Landscape of Data
  • Disruption: The New Norm

Do you have something important to share with others in the industry? Are you passionate about one (or more) of the themes listed above? Well, here’s your chance. We’re using the Pecha Kucha format to give you the opportunity to share it with the member community. So, craft your best ideas and send ’em in! If your Pecha Kucha is selected, you get a free ticket to the conference!

How to Submit

We invite provocative Pecha Kucha submissions related to our themes. Please send your 20-slide Pecha Kucha-compliant slide deck to mike@bbi.org by Friday, April 28, 2017. No product mentions or sales pitches please!

If you have any questions, please email mike@bbi.org.

Big Boulder 2017 Registration is Open!

Register now!
Big Boulder 2017 will be June 1-2 and registration is now open! It’s hard to believe, but this is our 6th year and we’re excited to see everyone. Our themes for this year’s Big Boulder are:

  • Messaging Apps Continued Rise
  • AI & Machine Learning
  • The Political Landscape of Data
  • Abuse on Social Media
  • Disruption: The New Norm

In order to ensure the sustainability of our conference, we will be charging $1,000 for tickets this year. We are no longer able to provide comped tickets, unless your membership is current.  Please contact Michael at mike@bbi.org if you have any questions.

We’re also looking for sponsors! If you’re interested please contact Michael at mike@bbi.org.

Book Your Accommodations for the Conference
As with other years, Big Boulder will take place at the St Julien Hotel and Spa. If you’d like to reserve a room at the St Julien, please click here or call the St Julien at 720.406.9696 or 877.303.0900. If you make your reservations over the phone, please be sure to mention that you are part of the Big Boulder Conference in order to receive discounted pricing.

Again, if you have any questions, please contact mike@bbi.org.
Thank you,
Michael