Post by kjholste on Apr 17, 2016 4:53:52 GMT
[Below is a joint discussion post from Qian and Ken]

Brief recap:
The authors develop a boundedly rational model, DIV (Discriminating Information Value), for information foraging in the context of a decision among a known set of alternatives (e.g. using rating information on an online marketplace to decide which of a set of products to purchase). The focus is on how prior beliefs about the relative utilities of the alternatives ought to guide an agent's reasoning about the relative value of alternative information-gathering actions (e.g. choosing to examine a particular review of a particular product), and thus shape that agent's information search behavior. The authors then evaluate this model, as an account of human decision making based on online rating information, through an experimental study.

Comparison of DIV against some related models:
- IFT (Information Foraging Theory) - model(s) of the search problem (not necessarily including the embedding decision task): efficient search for more information about each option.
- BSM (Bayesian Satisficing Model) - model of the 'stopping problem': when to terminate search and make a decision among the available options?
- DIV and rational models based on EVSI (Expected Value of Sample Information) - models of the search problem: how should an agent's current preferences among the available options inform search behavior (gather more information about options that are currently more strongly preferred, or about ones that are currently less preferred)?

Very coarse summary of the prescriptions of DIV and EVSI (see the toy sketch at the end of this post):
- EVSI - given its current beliefs about the world, an agent should aim to gather information that seems likely to result in a profitable/worthwhile change to its current preferences between alternatives.
- DIV - given its current beliefs about the world, an agent should aim to gather information that seems likely to widen the 'gap' in preference between alternatives.

An example of an empirical effect that EVSI does not explain: in a previous empirical study of human information search behavior, people tended to gather information with negative valence (e.g. reading low-star reviews on an online marketplace) about options they already believed to be inferior to the other available alternatives. An EVSI-based model (at least, a naive one) would not predict this behavior, because accessing negatively valenced information about already unfavorable options "cannot reverse the choice, while information with positive valence about inferior alternatives can".

Possible entry-points for discussion (feel free to answer your favorite subset of the questions below, and to introduce new questions into the conversation):
1) Compare the (predictions of the) DIV and/or EVSI models against your own intuitions about how you make choices based on online rating information.
2) Any qualms with the empirical evaluation of DIV and/or any of its underlying assumptions?
3) Model limitations: can you think of any interesting choice/information-search contexts where you might expect DIV and/or EVSI to be very poor models? Why?
4) What potentially relevant features/considerations (if any) are missing from the paper and its framing of the related work it cites?
4.b) Do you feel the models discussed in this paper adequately consider the potential role of trust and other social information in guiding human inferences and decisions?
4.c) Slightly related, but mostly a tangent: it might be fun to try applying some of these models to online environments where the subjects being rated/reviewed (either explicitly or implicitly) are humans, and the user is tasked with making choices in this context.
5) What are your thoughts on the overall approach this paper takes? The paper addresses its questions via a bounded rational analysis (see: www.dectech.co.uk/publications/LinksNick/FoundationsTheoryAndMethodology/ten%20years%20of%20the%20rational%20analysis%20of%20cognition.pdf): an attempt to characterize a task and the goals and constraints of an agent faced with that task, to specify optimal behavior given this characterization, and to compare observed human behavior against this specification of optimal behavior.
6) How would you edit the paper's "Design Implications", "Alternative Explanation", and/or "Limitations" sections? Feel free to make additions, deletions, or revisions. Also, feel free to delete entire sections if you're so inclined (and justify this choice however you see fit).
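To make the EVSI/DIV contrast above a bit more concrete, here is a very rough Python sketch of the two scoring ideas. This is our own toy formalization (Gaussian beliefs over option utilities, a single hypothetical rating observation, made-up numbers), not the formulas from the paper, so treat it only as an illustration of the qualitative difference:

```python
import numpy as np

def updated_means(means, sds, option, observed, obs_sd=1.0):
    """Posterior means after observing one hypothetical rating `observed`
    for `option`, under a crude Normal-prior / Normal-likelihood belief model."""
    means = np.array(means, dtype=float)
    weight = sds[option] ** 2 / (sds[option] ** 2 + obs_sd ** 2)
    means[option] += weight * (observed - means[option])
    return means

def choice_value_change(old_means, new_means):
    """EVSI-flavored criterion: does the new information improve the (estimated)
    utility of the option we would end up choosing?"""
    return new_means.max() - np.asarray(old_means).max()

def gap_change(old_means, new_means):
    """DIV-flavored criterion: does the new information widen the gap between
    the best and the second-best option?"""
    def gap(m):
        top_two = np.sort(np.asarray(m))[-2:]
        return top_two[1] - top_two[0]
    return gap(new_means) - gap(old_means)

# Toy example: option 0 currently looks inferior (mean 3.2 vs 4.1).
# Reading a 1-star review of the inferior option cannot change which option
# we would pick (EVSI-style value is 0), but it does widen the gap between
# the options (positive DIV-style value).
means, sds = [3.2, 4.1], [0.8, 0.8]
new_means = updated_means(means, sds, option=0, observed=1.0)
print(choice_value_change(means, new_means))  # 0.0
print(gap_change(means, new_means))           # ~0.86
```

(Strictly speaking, EVSI is an expectation taken over possible observations before you read anything, and the paper's DIV measure is richer than a raw gap difference, but the toy version hopefully conveys why a 1-star review of an already-losing option can be worthless under one criterion and valuable under the other.)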
Post by mkery on Apr 17, 2016 23:46:24 GMT
Much of what this paper attempted to model matches my intuitive understanding of how we use reviews (their modeling choices were informed by marketing literature as well). We spend more time on negative reviews across products, for example, to understand whether those defects are mitigated in the best alternatives. For instance, my cat recently destroyed all of my dishes, so I was searching online to quickly replace drinking glasses. One review of a poor alternative said "poor packaging, some arrived broken", leading me to check that the better alternatives did not share this flaw.

Several things this paper does not address: The experiment measured a choice between several near-identical cameras. How does comparison between objects that are only categorically similar relate to these models? For example, a clothing website will show you "similar items", which may be various dresses and skirts that allow direct comparison on only a few dimensions. Also, how do these models cover difficult choices? Over the winter I was purchasing a litter box for a recently adopted cat. All alternatives had terrible reviews. This necessitated comparing "badness", and the shape of the distributions of ratings, to discover the least terrible of a necessary evil. When faced with distributions of ratings that are all low, what strategies do people use to make a choice?
Post by anhong on Apr 18, 2016 1:55:53 GMT
I think it is interesting that people tended to gather information with negative valence (e.g. reading low-star reviews on an online marketplace) about options that they already believe to be inferior to other available alternatives. Besides DIV, I think another reason for this is that people are always trying to find evidence that confirms their existing decisions, and are generally blind to evidence that conflicts with their existing perspective. I'm not sure how online decision support tools can help with this.
As for ratings of humans, Uber and Lyft are good examples, where passengers and drivers rate each other after a trip. However, these ratings are often not comprehensive: they don't show the categories the person was rated on, or the number of ratings behind the overall score. For example, when I see a driver with 5 stars, I assume they are a new driver, since it's extremely hard to keep a perfect 5-star average. Maybe showing how long this person has been driving, or how many trips they have completed, would help in this specific scenario. Amazon and other sites already show this kind of information with their ratings, so I'm wondering why Uber doesn't.
Post by kjholste on Apr 18, 2016 3:51:39 GMT
mkery wrote: "...Over the winter I was purchasing a litter box for a recently adopted cat. All alternatives had terrible reviews. This necessitated comparing "badness", and the shape of the distributions of ratings, to discover the least terrible of a necessary evil. When faced with distributions of ratings that are all low, what strategies do people use to make a choice?"

Another tangent:
- In addition to the above question, a more general question: does anyone else regularly use rating distributions to guide online purchasing (or similar) decisions? If so, how? And (if applicable) how might your observations relate to the arguments presented in this thread's paper?
- Also, can anyone think of any literature focused on human decision-making when presented with representations of distributions over online (or more generally: human-generated) ratings?

The shape of the rating distribution (when available) is usually one of the first features I attend to when making purchasing decisions like these. From previous introspection, I think I usually make decisions in roughly the following way (in reality, I may be using much subtler features of the shapes of distributions); see also the toy sketch at the end of this post.

If the distribution is bimodal: If the product otherwise seems promising, I might examine reviews around each peak of the distribution, to try to infer contextual variables that might cause the bimodality... then decide whether these contextual variables are ones I care about (e.g. in the case of litter boxes, maybe a quick skim of positive and negative reviews will suggest that the litter box is poorly designed for kittens, but a good choice for older cats).

If the distribution is weighted towards lower ratings: If I already have a negative view of the product (e.g. it seems poorly designed), I might take a look at some negative reviews, for entertainment purposes. Also, if I'm relatively unfamiliar with a certain type of product, I might check out some negative reviews just to familiarize myself with the domain, and figure out what design features I should watch out for when looking at other alternatives (as Mary Beth mentioned). This seems consistent with a more complex EVSI-based model, and the authors note that Chater et al. made a similar argument.

If the distribution is weighted towards higher ratings: If I already have a positive view of the product, I might initially avoid looking at the highest ratings, because I worry that these reviewers may not be trustworthy (for a variety of possible reasons). I may first look at 4-star ratings instead of 5-star ratings, followed by very negative ratings.

If the distribution is near-uniform: I might just continue searching. I could try looking for contextual variables, as I might in the bimodal case, but I suspect I'm unlikely to discover anything useful without spending an unreasonable amount of time reading reviews. ...Well, if the distribution looks like it might actually be trimodal (e.g. peaks at 1 star, 3 stars, and 5 stars), rather than uniform, I might be curious.
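Purely for fun, here is a tiny Python sketch of the heuristic I just described. The shape tests and thresholds are invented on the spot, so don't take them seriously; it's just a way of making the introspection explicit:

```python
def distribution_strategy(hist):
    """Toy encoding of the informal heuristic above.
    `hist` is a list of review counts for 1..5 stars; thresholds are arbitrary."""
    total = sum(hist)
    share = [count / total for count in hist]
    low_mass, high_mass = share[0] + share[1], share[3] + share[4]
    bimodal = share[0] > 0.2 and share[4] > 0.2 and share[2] < min(share[0], share[4])
    if bimodal:
        return "read reviews near each peak; look for a contextual variable"
    if high_mass > 0.6:
        return "skim 4-star reviews first, then the most negative ones"
    if low_mass > 0.6:
        return "skim negative reviews to learn which failure modes to watch for"
    return "near-uniform: keep searching for other options"

print(distribution_strategy([40, 5, 5, 10, 40]))   # bimodal case
print(distribution_strategy([2, 3, 10, 45, 40]))   # weighted towards high ratings
```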
Post by kjholste on Apr 18, 2016 4:01:04 GMT
anhong wrote: "I think it is interesting that people tended to gather information with negative valence (e.g. reading low-star reviews on an online marketplace) about options that they already believe to be inferior to other available alternatives. Besides DIV, I think another reason for this is that people are always trying to find evidence that confirms their existing decisions, and are generally blind to evidence that conflicts with their existing perspective. I'm not sure how online decision support tools can help with this."

This is an interesting possibility, and it raises another question for discussion: is confirmation bias necessarily distinct from DIV? Is DIV a model that predicts confirmation bias? Linking back a bit to question 5, about the overall approach this paper takes: you might be interested in this paper, which argues that confirmation bias can actually be understood as the optimal strategy for testing hypotheses (under certain assumptions): cocosci.berkeley.edu/tom/papers/seekconfisrational.pdf
Post by JoselynMcD on Apr 18, 2016 21:27:09 GMT
anhong wrote: "I think it is interesting that people tended to gather information with negative valence (e.g. reading low-star reviews on an online marketplace) about options that they already believe to be inferior to other available alternatives. Besides DIV, I think another reason for this is that people are always trying to find evidence that confirms their existing decisions, and are generally blind to evidence that conflicts with their existing perspective. I'm not sure how online decision support tools can help with this. [...]"

I thought the same thing as anhong when I considered why humans would seek out low-star reviews (a time-suck and wholly unnecessary) once they had already decided to move on from that product or idea. I think the cause is related to DIV, but also to self-esteem maintenance in humans. I was reminded of all the times I've heard people talk about a narrow escape from a bad situation or a bad relationship. I think we are built to seek out opportunities where our decision-making self-esteem is reinforced. From the other readings this week, we know that humans don't like making tough decisions, so it would make sense that our brains seek out opportunities to feel competent at doing so. Perhaps, in our field, this insight could inspire design principles for tools/platforms that require rapid decision-making to include some sort of "wind-down" interaction that reinforces that the user has made good decisions up to that point.
Post by aato on Apr 18, 2016 23:42:08 GMT
Re: 1) Compare the (predictions of the) DIV and/or EVSI models against your own intuitions about how you make choices based on online rating information.
I *really* rarely look at online ratings of things because I find it extremely stressful. When I do look online, I think I fall into the DIV model camp because I'm struggling to make a choice. Again, though, this is only when I'm desperate, and more often when I'm looking for something new (vs. trying to differentiate between known options).
For example, when I moved to Pittsburgh last summer, I really wanted to eat at new restaurants every week. So I sometimes browsed Yelp and sometimes browsed TripAdvisor trying to 'gather information that would widen a gap' but I was also simultaneously discovering those things that I needed to widen the gap between. And the reason I find this stressful is that it's really impossible for me to discount negative reviews. So for a restaurant that is extremely highly rated but has one hugely negative review, I can't ever forget about that and I end up wanting to go nowhere. So for me the model is more like having a basket where I'm picking fruit. And once the basket is full of fruit I keep examining each piece and if I find even a tiny bruise or blemish I throw it out of the basket until nothing is left and I'm sad and I have to eat a piece of fruit off the ground where I threw it and now I'm sad about eating the dirty fruit.
Post by Anna on Apr 18, 2016 23:49:58 GMT
I was kind of surprised the paper didn't bring up trust and authenticity in online reviews, and how those might have affected participants' preferences. The authors said they couldn't fully explain, with either DIV or EVSI, why participants would prefer 4-star over 5-star ratings, regardless of whether they were examining the best alternative or an inferior one. My first thought was that you often can't trust 5-star ratings of products because you never know when one is a planted review by the seller. I'm much more skeptical of a 5-star review than I would be of a 4-star review. And I actually think I might be more skeptical when those 5-star reviews are longer, because they often feel a little too oily to be true. (The researchers kept all reviews between 100 and 150 words.)
Post by stdang on Apr 18, 2016 23:58:21 GMT
I think one of the limitations of the study is that, in attempting to sculpt an ecologically valid task, the authors failed to recognize and account for the different types of information embedded in reviews. To some degree, by excluding reviews that draw comparisons, the study intentionally excludes any means for participants to make a decision based on an absolute criterion. While this is likely a natural decision-making method for many purchases, it does seem to leave out the most natural process people might follow for some purchases, like appliances or vehicles, as evidenced by the popularity of Consumer Reports. Additionally, the prevalence of robot-generated and false reviews likely leads to the heuristic of ignoring the top reviews (as mentioned in the discussion). Thus in reality, due to an unfactored heuristic latent in the data, the participants were likely operating on a compressed information scale of reviews from 1-4 instead of from 1-5.
Post by francesx on Apr 19, 2016 2:04:06 GMT
I agree with what MaryBeth says, in that I also find myself looking at the ratings when I want to buy something online. Just the other day, I was picking a birthday gift on Amazon for a friend, and for two similar products I was deciding between, the ratings (4 and 5 stars) helped me make up my mind (after spending a while reading reviews of the 4-star product). I think I mostly do this with Amazon, and rarely with other sites. Maybe because (as Ken mentioned earlier) Amazon pays more attention to ratings: it gives you an overall value, the number of people who reviewed, and a distribution of the reviews per rating (1-5), plus all the comments and answered questions.
Post by Amy on Apr 19, 2016 2:39:05 GMT
This ties back to confidence in decision making. I agree with Joselyn - we look at the bad reviews of things we know we don't want, to feel confident in our decision not to want them.
I think the number of comments or ratings an item receives is important, but I also think there's a difference between "overall satisfaction" ratings and more specific ratings. For example, on Amazon I pay much more attention to the ratings for whether clothing runs small or large than to the comments. I don't know if that's related to trust, or more to which factors I care about for that particular product.
Post by judy on Apr 19, 2016 2:49:52 GMT
I use Yelp pretty often. I used to rely heavily on the rating system, but now I notice that I balance reading Yelp reviews with "expert" reviews from local papers and magazines. Side note: I have a friend who's written over 2,400 Yelp reviews. On the strength of her Yelp reviews, she got a regular gig writing food reviews for the LA Times. But, as Anna said, how do you decide which reviews are credible?
It's far more difficult to rely on reviews for items that are not evaluated by "experts." For example, I bought a couch online last year. It was a big purchase. I looked for "top 10 couches under xxx $'s" lists, but there were not many of those! I ended up buying from a well-known, name-brand store. On that store's site, I found myself reading reviews trying to decide, "how much is this person like me? Am I likely to have the same issues as this person?"
Post by jseering on Apr 19, 2016 3:25:52 GMT
There are of course lots of ways to evaluate credibility of reviews. I find myself in a similar position to Alexandra, where I tend to be preoccupied with overly negative reviews. It's a little creepy in this context, but I'm inspired here to think about Fannie's work on biofeedback. Could we evaluate reviews better if we could see skin conductance readings or brain waves for the people who posted the reviews? I don't think I could fake being particularly angry about a restaurant for long enough to write a convincing negative review. I'm imagining a (yes, creepy) future state where when we're all fully bio-integrated with all of our devices the honesty of our reviews is judged by biofeedback signals captured by our devices, and they're only posted if we can ~prove that we're really feeling what we're writing.
Post by fannie on Apr 19, 2016 3:59:30 GMT
Thanks Joseph for pointing out an interesting application for the biofeedback work! We've been talking about using it as a signal of honesty, but people might try to control their biofeedback deceptively (e.g. looking like they're paying attention while thinking hard about something else), so false reviewers may also try to abuse it in that way. On the other hand, it could make a rater seem "human" and therefore more relatable (and potentially trustworthy). Also, in the broader decision-making realm, it could potentially be useful for reducing the stress that comes with decision-making, by monitoring stress levels and cognition. Like others here, I rely a lot on online ratings for lots of things (Amazon, Yelp, AirBnB, etc.). Amazon has the whole "top positive review" and "top negative review" feature, which I find useful. In terms of rating/reviewing people, there are also signals for reviewing the reviewers themselves: like Judy said, someone who has written 2,400 reviews, or someone who bought similar things as you / seems similar to you (actually, in terms of similarity/synchrony, biofeedback could also be interesting here). Uber and AirBnB have an interesting setup because both the reviewer and the person providing the service get rated, so credibility is checked on both sides.
Post by judithodili on Apr 19, 2016 4:15:48 GMT
aato wrote: "I *really* rarely look at online ratings of things because I find it extremely stressful. [...] So for a restaurant that is extremely highly rated but has one hugely negative review, I can't ever forget about that and I end up wanting to go nowhere."

Very interesting - I ALWAYS look at reviews for everything before going anywhere or purchasing anything. Unfortunately, my spirit is not very adventurous, and I really want someone to just tell me what to do (most of the time), so the idea of going to a restaurant and looking through the menu stresses me out. I'd rather have you pick the restaurant AND tell me what to try (thank you aato), or have a reviewer tell me whether the place is worth going to and what is worth trying there. I guess I fall more into the BSM camp, which pretty much echoes the maximizer/satisficer paper.