Crowdsourcing and Big Data

The Wikipedia article on Big Data says it “requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.” The examples given (Hadoop, MapReduce, Cloud Computing, etc.) do not include one very exceptional technology, the human brain, and a new way to harness its power, “crowdsourcing.” In the 2006 Wired article in which he coined the term, Jeff Howe wrote: “Just as distributed computing projects like UC Berkeley’s SETI@home have tapped the unused processing power of millions of individual computers, so distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains.” Isn’t crowdsourcing one of the “exceptional technologies” required by Big Data?

To find out more about crowdsourcing and its role in the service of Big Data, yesterday I attended a Crowdsortium Meetup. Karim Lakhani of the Harvard Business School opened with a brief keynote, reminding us of (Bill) Joy’s Law: “No matter who you are, most of the smartest people work for someone else.” He was followed by a panel with the aforementioned Howe, Dwayne Spradlin (CEO of InnoCentive), Doron Reuveni (CEO of uTest), and Dan Sullivan (CEO of Appswell), moderated expertly by Jim Savage, partner and co-founder of Longworth Venture Partners. From the wide-ranging discussion it became clear to me that the term crowdsourcing is applied today to many different activities, and four words kept popping up: competition, collaboration, crowd, community. This led me to think that a 2×2 matrix, with competition vs. collaboration on one axis and crowd vs. community on the other, could clarify where crowdsourcing start-ups fall within the landscape. Here are my definitions of these terms in the context of crowdsourcing:

Crowd — Individuals or teams working on an activity and completing it with no visibility to other individuals or teams

Community — Individuals or teams working on an activity with some level of visibility to other individuals and teams

Competition — Individuals or teams working on and completing an activity independently (only one winner)

Collaboration — Individuals or teams working on parts of an activity and contributing to its completion (everybody wins)


Crowd/collaboration: Amazon’s Mechanical Turk

Crowd/competition: InnoCentive

Community/collaboration: Appswell

Community/competition: uTest

A Big Data example? Kaggle, where “companies or researchers post a prediction problem and the world’s best data scientists compete to offer the best answer.” Kaggle is a crowdsourcing venture of the crowd/competition type.
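The 2×2 taxonomy above is essentially a small lookup structure, and it can be sketched in a few lines of code. This is purely an illustrative encoding of the quadrants and examples listed in the post; the names `VENTURES`, `quadrant`, and `ventures_in` are my own, not from any real API.

```python
# A minimal sketch of the 2x2 crowdsourcing taxonomy described above.
# Each venture maps to a (visibility, mode) pair:
#   visibility: "crowd" or "community"
#   mode:       "competition" or "collaboration"
VENTURES = {
    "Amazon Mechanical Turk": ("crowd", "collaboration"),
    "InnoCentive": ("crowd", "competition"),
    "Appswell": ("community", "collaboration"),
    "uTest": ("community", "competition"),
    "Kaggle": ("crowd", "competition"),
}

def quadrant(venture):
    """Return the (visibility, mode) cell for a known venture."""
    return VENTURES[venture]

def ventures_in(visibility, mode):
    """List all ventures falling in a given cell of the 2x2 matrix."""
    return sorted(v for v, q in VENTURES.items() if q == (visibility, mode))
```

For example, `ventures_in("crowd", "competition")` groups Kaggle with InnoCentive, which matches the placement argued for in the text.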

Is this taxonomy helpful? And do you know of other examples where a crowdsourcing venture is focused on Big Data tasks?

About GilPress

I launched the Big Data conversation; writing, research, marketing services; &

8 Responses to Crowdsourcing and Big Data

  1. Great article, Gil. I also came away from the event thinking about the variety of objectives and implementation methods for crowdsourcing. As interesting as the differences are the threads of commonality. For example, we may each want to drive participation from different kinds of users toward different kinds of goals, but we’re all still looking at driving participation, or building an experience that warrants deep participation. I very much liked your codification of crowd/community and collaboration/competition, and will keep it in mind as I continue to evaluate other crowdsourcing companies.

    • gp says:

      Which leads me to think that maybe this typology describes different user experiences (possibly all four in one crowdsourcing site) rather than different types of crowdsourcing ventures.

  2. Julian Awad says:

    Very helpful Gil. Thanks!

  3. Gil, wish we had met the other night — great extrapolation from Karim’s talk. Your reply to Dan’s comment is right on: the typology does describe the different user experiences that come from crowdsourcing sites. In many ways, each of the four quadrants in your codification drives toward the goal of community engagement.

    I wonder if a crowdsourcing site that sits closer to the center axis would have the highest levels of engagement. Would be interesting to plot various sites as you started to do.

    Would you say that sites that are focused on competition but have multiple winners (i.e., runners-up, bonus points) are moving toward collaboration? For example, I work at Article One Partners, and we have found that our crowd is more engaged now that we have additional smaller rewards beyond the grand prize for each Study. I don’t know if I would classify this as collaborative (in the normal sense of the word), though. Not trying to go deep into semantics, but you’ve got me thinking!

    • gp says:

      Thanks, David. Given the approach I took to thinking about the types of crowdsourcing, I would say that multiple winners (as opposed to one) move a site a bit toward the “community” end of the crowd/community spectrum, as will any other dimension that makes the participants less anonymous and more visible (e.g., uTest promoting and ranking the testers as opposed to keeping them unknown). The competition/collaboration spectrum in my typology is more about how the task is performed and completed: by a single individual/team from beginning to end, or by multiple individuals/teams working on different pieces of the task.

  4. Pingback: 12 Rules for Managing Crowdsourcing Communities | What's The Big Data?

  5. Pingback: Revisiting Big Data and Crowdsourcing: Kaggle Today | What's The Big Data?

  6. Pingback: Learning Big Data « TechnoBuzz
