Crowdsourcing and Big Data

By Gil Press
June 22nd, 2011
- Data Scientists

The Wikipedia article on Big Data says it “requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.” The examples given (Hadoop, MapReduce, Cloud Computing, etc.) do not include one very exceptional technology, the human brain, and a new way to harness its power, “crowdsourcing.” In the 2006 Wired article in which he coined the term, Jeff Howe wrote: “Just as distributed computing projects like UC Berkeley’s SETI@home have tapped the unused processing power of millions of individual computers, so distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains.” Isn’t crowdsourcing one of the “exceptional technologies” required by Big Data?

To find out more about crowdsourcing and its role in the service of Big Data, I attended a Crowdsortium Meetup yesterday. Karim Lakhani of the Harvard Business School opened with a brief keynote, reminding us of (Bill) Joy’s Law: “No matter where you are, most smart people work for someone else.” Following him was a panel with the aforementioned Howe, Dwayne Spradlin (CEO of InnoCentive), Doron Reuveni (CEO of uTest), and Dan Sullivan (CEO of Appswell), moderated expertly by Jim Savage, partner and co-founder of Longworth Venture Partners. From the wide-ranging discussion it became clear to me that the term crowdsourcing is applied today to many different activities. Four words kept popping up: competition, collaboration, crowd, and community. That led me to think of a 2×2 matrix as a way to clarify where crowdsourcing start-ups fall within the crowdsourcing landscape, with competition vs. collaboration on one axis and crowd vs. community on the other. Here are my definitions of these terms in the context of crowdsourcing:

Crowd: Individuals or teams working on an activity and completing it with no visibility to other individuals or teams

Community: Individuals or teams working on an activity with some level of visibility to other individuals and teams

Competition: Individuals or teams working on and completing an activity independently (only one winner)

Collaboration: Individuals or teams working on parts of an activity and contributing to its completion (everybody wins)

Examples?

Crowd/collaboration: Amazon’s Mechanical Turk

Crowd/competition: InnoCentive

Community/collaboration: Appswell

Community/competition: uTest

A Big Data example? Kaggle, where “companies or researchers post a prediction problem and the world’s best data scientists compete to offer the best answer.” Kaggle is a crowdsourcing venture of the crowd/competition type.
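For readers who think in code, the 2×2 matrix can be sketched as a small lookup table. This is only an illustration of the taxonomy described here: the names `Visibility`, `Mode`, `VENTURES`, `build_matrix`, and `describe` are my own assumptions, and the placements are simply the ones listed in this post.

```python
from collections import defaultdict
from enum import Enum


class Visibility(Enum):
    """First axis: how much participants see of each other (hypothetical name)."""
    CROWD = "crowd"          # no visibility to other individuals or teams
    COMMUNITY = "community"  # some level of visibility to others


class Mode(Enum):
    """Second axis: how the activity gets completed (hypothetical name)."""
    COMPETITION = "competition"      # completed independently, only one winner
    COLLABORATION = "collaboration"  # parts contributed, everybody wins


# Placements taken from the examples in this post, not from any external source.
VENTURES = {
    "Amazon Mechanical Turk": (Visibility.CROWD, Mode.COLLABORATION),
    "InnoCentive": (Visibility.CROWD, Mode.COMPETITION),
    "Appswell": (Visibility.COMMUNITY, Mode.COLLABORATION),
    "uTest": (Visibility.COMMUNITY, Mode.COMPETITION),
    "Kaggle": (Visibility.CROWD, Mode.COMPETITION),
}


def build_matrix(ventures):
    """Group venture names by their (Visibility, Mode) cell."""
    matrix = defaultdict(list)
    for name, cell in ventures.items():
        matrix[cell].append(name)
    return dict(matrix)


def describe(name):
    """Return a venture's cell as a 'crowd/competition' style label."""
    visibility, mode = VENTURES[name]
    return f"{visibility.value}/{mode.value}"


if __name__ == "__main__":
    for (visibility, mode), names in build_matrix(VENTURES).items():
        print(f"{visibility.value}/{mode.value}: {', '.join(names)}")
```

Calling `describe("Kaggle")` returns the same label used above, and adding a new venture is a one-line change to `VENTURES`, which makes it easy to see which cells of the matrix are crowded and which are sparse.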

Is this taxonomy helpful? And do you know of other examples where a crowdsourcing venture is focused on Big Data tasks?

Last updated on June 22nd, 2011.