Post by anhong on Apr 23, 2016 22:37:14 GMT
[written jointly by Toby and Anhong]
Authors
Anand Kulkarni: Was a Ph.D. student in the UC Berkeley Department of Industrial Engineering and Operations Research at the time of this paper. Now president and co-founder of LeadGenius, a startup using human computation and machine intelligence to automate sales at scale.
Matthew Can: Stanford University Computer Science Department. No bio found.
Björn Hartmann: Associate Professor of EECS at UC Berkeley. His research in Human-Computer Interaction focuses on novel design, prototyping, and implementation tools for the era of post-personal computing. His group investigates how better software and hardware can facilitate the exploration of interactive devices that leverage novel form factors and technologies (e.g., sensors and actuators). He also investigates how software can help students, designers, and makers learn and share their expertise online. He received his Ph.D. in Computer Science from Stanford, advised by Scott Klemmer.
Venue:
CSCW ‘12
Same year, same conference from the same group:
Steven Dow, Anand Kulkarni, Scott Klemmer, and Björn Hartmann. 2012. Shepherding the crowd yields better work. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW '12). ACM, New York, NY, USA, 1013-1022. DOI=http://dx.doi.org/10.1145/2145204.2145355
Summary:
Background: Crowdsourcing problems are often solved as a series of microtasks. For quality assurance and to accomplish complex work, multiple microtasks are frequently chained together into workflows. Such workflows may decompose larger tasks into smaller subtasks, and later recompose subtask solutions into an overall work product.
Problem: However, workflow design remains a major challenge. Requesters commonly rely on an iterative process to construct good workflows. The high cost and complexity of workflow design limit participation in crowdsourcing marketplaces to experts willing to invest substantial time, and also limit the kinds of work that get crowdsourced.
Solution: The “Turkomatic” system, a novel crowdsourcing tool that allows the crowd to collaboratively design and execute workflows in conjunction with a requester. It accepts a requester’s specification of a broad objective, then asks crowd workers to determine how to structure workflows to achieve the objective.
Turkomatic executes a continuous price-divide-solve loop that asks workers to recursively divide complex steps into simpler ones until they reach an appropriately simple level, and then to solve them. Other workers are asked to verify the solutions and to combine the results into a coherent answer to the original request. See the figures below for an illustration of the 4 kinds of subtasks: subdivision, solution, verification, and merging.
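To make the loop concrete, here is a minimal sketch in Java of the divide-verify-merge recursion described above. This is our own illustration, not Turkomatic’s actual implementation or API: postToCrowd, isSimpleEnough, and the other names are hypothetical placeholders, and pricing and real HIT posting are only stubbed out.

```java
// A minimal, hypothetical sketch of the divide-and-solve recursion described
// above (not Turkomatic's actual code). postToCrowd() stands in for posting a
// HIT on Mechanical Turk and waiting for a worker's answer; pricing is omitted.
import java.util.ArrayList;
import java.util.List;

public class DivideAndSolveSketch {

    // Ask a worker whether the task is already simple enough to solve directly.
    static boolean isSimpleEnough(String task) {
        return postToCrowd("Can this task be completed in one sitting? " + task).equals("yes");
    }

    static String solve(String task) {
        if (isSimpleEnough(task)) {
            // Solution subtask, followed by a verification subtask by another worker.
            String answer = postToCrowd("Please complete this task: " + task);
            return verify(task, answer);
        }
        // Subdivision subtask: one worker breaks the task into simpler steps...
        List<String> subtasks = subdivide(task);
        List<String> partialResults = new ArrayList<>();
        for (String subtask : subtasks) {
            partialResults.add(solve(subtask));  // ...and the loop recurses on each step.
        }
        // Merging subtask: another worker combines the partial results into one answer.
        return postToCrowd("Combine these results into one coherent answer for '"
                + task + "': " + partialResults);
    }

    static List<String> subdivide(String task) {
        String plan = postToCrowd("Divide this task into a list of simpler subtasks: " + task);
        return List.of(plan.split("\n"));
    }

    static String verify(String task, String answer) {
        boolean ok = postToCrowd("Is this a good answer to '" + task + "'? " + answer).equals("yes");
        return ok ? answer : solve(task);  // re-post the task if verification fails
    }

    // Placeholder only: a real implementation would create a HIT and block on its result.
    static String postToCrowd(String prompt) {
        throw new UnsupportedOperationException("illustrative stub");
    }
}
```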
Evaluation - Unsupervised Crowd Planning:
The authors test Turkomatic on the Amazon Mechanical Turk platform with 4 types of tasks under “unsupervised crowd planning” (i.e., the crowd is guided algorithmically to plan and solve problems without any input from requesters):
Essay writing: “Write a 3-paragraph essay.”
Natural language query: “Create a list of the names of the Department Chairs of the top 20 computer science college programs in the U.S.”
Itinerary planning: “Plan a complete road trip from San Francisco, California, to New York City. Completely include the location of all necessary hotels, restaurants, and sights along the way.”
Java programming: “Please write a short piece of Java code to reverse a string. The algorithm should take as input a string and output its reverse. Make sure it compiles.” (An illustrative solution is sketched just below.)
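As a point of reference for task difficulty, the Java programming request above is small enough for a single worker to answer directly; one illustrative solution (our own, not an answer collected in the paper) looks like this:

```java
// One straightforward answer to the Java programming task above: reverse a
// string. (Our own illustration, not a worker's submission from the paper.)
public class ReverseString {
    public static String reverse(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("Turkomatic"));  // prints "citamokruT"
    }
}
```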
In the results, 2 of the tasks were successful: one was solved by snap judgement (a turker marked the top-level problem as solvable, and the next turker gave a correct solution), and one was solved with complex planning, divided into 7 subtasks.
Two tasks failed because of derailment, which occurred because (1) workers were confused about appropriate task granularity; (2) workers authored subdivisions that could be executed by a single worker but not split across separate workers; and (3) workers who had lost the context of the overall workflow generated decompositions that restated previous tasks as subtasks, leading to cyclic behavior.
One task failed because of starvation: after a while, no new workers attempted the available tasks, and the time limit for the experiment expired without the work progressing further.
Evaluation - Collaborative Planning and Execution:
The authors then informally re-ran the experiment with 3 similar tasks under 2 conditions: the first used an expert crowd of workers recruited at UC Berkeley; the second used a Mechanical Turk crowd, augmented by active requester participation and intervention through Turkomatic’s monitoring interface. All tasks succeeded in both conditions.
Discussion Questions:
1. Three of the tasks used in this paper for evaluating unsupervised crowd planning failed because of starvation or derailment in the recursive workflow-generation process. Do you think non-expert turkers can ever reliably design workflows for complex tasks? Why or why not?
2. The authors examined the effectiveness of unsupervised crowd planning as well as collaborative planning. Do you think this is enough? Should the authors also compare their system against a baseline in which the requesters design the workflow themselves on paper, rather than using the Turkomatic system? How would you do the evaluation?
3. What kind of information do you think is missing from the different steps of the Turkomatic workflow that made it fail in the unsupervised crowd planning part? How can it be improved to better support this “meta-crowdsourcing” of workflows?
4. Collaboration and coordination are important challenges in designing HCI systems. How can we leverage theories of communication, collaboration, etc., to inform this area of research?
5. What can we learn about crowdworkers and the design of crowd computing systems from this paper?