A/B testing / survey results analysis

Chegg snt124
Oct 15, 2018 13 Comments

I saw a Glassdoor post with this question and thought I'd ask the smart people on Blind how they would go about tackling it:

A user satisfaction survey was conducted for two groups for a social media platform. Assume large sample sizes.

Group 1: opted into security features
Group 2: has not opted into security features

It was found that user satisfaction (with the overall app) for group 1 was 30% lower than for group 2.

Why do you think that is?

What can we conclude?

Should we recommend eliminating this feature?

I think it's quite obvious that there is a huge sampling bias here. The sampling is not random, and people who opt into security features may have inherent characteristics that make them less satisfied with a social media app in the first place.

I was told to consider stratified sampling, and I've read up on stratified sampling, but I'm not sure how it really helps us answer the main questions such as "should we recommend eliminating this feature?"

Side question: how does stratified sampling help us in A/B testing/experiments? Is it just to help us ensure that our samples are truly random/balanced?

Thanks!

TOP 13 Comments
  • Microsoft pMqT16
    From the group of users who didn't opt-in, could you turn the security features on anyway as your treatment group, and leave the control group alone? I'm not sure how obtrusive the feature is. Forcing multi-factor authentication on people who don't want it, for example, seems like something that would really anger some users and isn't likely a good experiment.

    It seems like you really needed to have some satisfaction score for users before they had the option, and then see the delta for those users after they were presented with the option to opt-in.
    Oct 16, 2018 1
    • Microsoft pMqT16
      Or I guess really it's that I would have had a treatment group that was presented with the option to opt-in, and a control that wasn't. If the opt-in option has 100% exposure already, I'm not sure what I'd do. Hopefully someone else has a better idea. :-/
      Oct 16, 2018
  • Bloomberg / Eng iVX372
    Seems like a fairly obvious conclusion to me that average app satisfaction rates are lower with the security features enabled. Security features (e.g. multi factor authentication) are intrusive, and likely to frustrate users when enabled.
    Oct 16, 2018 2
    • Chegg snt124
      OP
      I believe this ignores the biases we might have. We can't make any strong inferences, let alone causal inferences, with such biased samples.
      Oct 16, 2018
    • Bloomberg / Eng iVX372
      If you know what security features are being A/B tested here and that there aren't any confounding variables, then it seems clear that enabling the security features causes the dissatisfaction. The reasons for that dissatisfaction can't be inferred from statistics alone unless you have a survey that directly addresses them, but as long as you're familiar with the security features, it should be possible to infer the likely reasons even without such a survey.
      Oct 16, 2018
  • Chegg snt124
    OP
    Sorry, but I believe you can't just turn those features on. The user has to opt in; otherwise you can't enable them.
    Oct 16, 2018 1
    • Bloomberg / Eng iVX372
      Depends. Some platforms require it
      Oct 16, 2018
  • Chegg snt124
    OP
    Upon thinking about it, how does this alternative experiment sound:

    Because of the strong biases that might be at play here, such as the inherent characteristics of a person who opts into security features, we cannot make any inferences about the feature from the original survey.

    Instead, we can randomly assign users to two groups and give group A the option to opt in (note we are not forcing them to opt in, because we can't do that). Group B does not get this option. We can then observe different metrics and rerun the survey.

    We can also use stratified sampling to make sure the users who are biased (i.e., the users who opted in before our new experiment) are balanced across both groups.
    Oct 16, 2018 0
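That design can be sketched in a few lines of Python. This is a minimal simulation with made-up satisfaction counts (every number here is hypothetical), ending in a standard two-proportion z-test on the survey results:

```python
import math
import random

random.seed(0)

# Hypothetical user IDs; in the real experiment these come from the platform.
users = list(range(10_000))

# Randomize exposure: group A is *offered* the opt-in, group B is not.
random.shuffle(users)
group_a, group_b = users[:5_000], users[5_000:]

# Suppose the post-experiment survey yields these satisfied counts
# (made-up numbers for illustration).
sat_a, n_a = 3_600, 5_000   # satisfied / surveyed in group A
sat_b, n_b = 3_750, 5_000   # satisfied / surveyed in group B

# Two-proportion z-test on the satisfaction rates.
p_a, p_b = sat_a / n_a, sat_b / n_b
p_pool = (sat_a + sat_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
print(f"p_A={p_a:.3f}, p_B={p_b:.3f}, z={z:.2f}")
```

With |z| > 1.96 the gap would be significant at the 5% level; note it's the randomized assignment, not the test itself, that licenses a causal reading.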
  • Along with the opt-in bias of the sample, another bias that needs to be considered is among the survey respondents themselves. Is the sample size equal in both groups? It could be that dissatisfied users respond at a higher rate and more negatively. Also, was the 30% satisfaction drop relative or absolute?

    My opinion is that surveys are generally a bad tool for causal inference. A better tool is a randomized controlled trial (an online A/B test) that tracks product usage metrics to make a causal inference.
    Oct 16, 2018 0
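The response-bias point above is easy to demonstrate with a toy simulation (all rates made up): both groups have identical true satisfaction, but unhappy users in group 1 answer the survey twice as often, and group 1's measured rate drops well below the truth.

```python
import random

random.seed(1)

# Toy model: a group's true satisfaction rate vs. what the survey measures
# when response probability depends on satisfaction.
def surveyed_rate(n, true_sat, respond_if_sat, respond_if_not):
    sat_responses = unsat_responses = 0
    for _ in range(n):
        satisfied = random.random() < true_sat
        respond_p = respond_if_sat if satisfied else respond_if_not
        if random.random() < respond_p:
            if satisfied:
                sat_responses += 1
            else:
                unsat_responses += 1
    return sat_responses / (sat_responses + unsat_responses)

# Equal true satisfaction (70%) in both groups, but group 1's unhappy
# users are twice as likely to answer the survey.
g1 = surveyed_rate(100_000, 0.70, 0.20, 0.40)
g2 = surveyed_rate(100_000, 0.70, 0.20, 0.20)
print(f"measured: group1={g1:.2f}, group2={g2:.2f}")
```

Group 1 measures around 54% satisfied despite the identical 70% truth, purely from differential nonresponse.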
  • Uber BrettK
    What they mean by stratified sampling is generating a synthetic control. Even if you do this though, you cannot determine if users opted in because they were unsatisfied or opting in caused them to be unsatisfied. You can never be sure you've removed all of the selection bias.
    Oct 16, 2018 0
  • KPMG / Mgmt Bad_Kitty
    Can you gather more data about these users? For example, product usage data. Make segments across the entire sample population (opted in/out) of security and then compare satisfaction by security/no security within these tight usage segments. If there is still a difference, redesign your security features until the CSAT gap is eliminated.
    Oct 16, 2018 0
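That within-segment comparison is essentially a group-by: bucket users by usage, then compare satisfaction for opted-in vs. not inside each bucket. A stdlib-only sketch with hypothetical survey rows:

```python
from collections import defaultdict

# Hypothetical survey rows: (usage_segment, opted_in, satisfied)
rows = [
    ("heavy", True, 1), ("heavy", True, 0), ("heavy", False, 1),
    ("heavy", False, 1), ("light", True, 0), ("light", True, 1),
    ("light", False, 0), ("light", False, 1), ("light", False, 1),
]

# Satisfaction rate within each (segment, opted_in) cell.
counts = defaultdict(lambda: [0, 0])  # cell -> [satisfied, total]
for segment, opted_in, satisfied in rows:
    cell = counts[(segment, opted_in)]
    cell[0] += satisfied
    cell[1] += 1

for (segment, opted_in), (sat, total) in sorted(counts.items()):
    print(f"{segment:5} opted_in={opted_in}: {sat/total:.2f} ({total} users)")
```

If a gap between opted-in and opted-out persists inside every usage segment, usage intensity isn't the confounder explaining it (though other confounders may still be).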
  • Northrop Grumman / Eng exp(iπ)+1=
    So for the stratified sampling, it's saying you can take subsamples from each of the two groups and recompute the satisfaction score on each. Averaging those results would reduce the variance of your estimate, since in the case described above you only have one observation per group. Whether this helps with the sampling bias I'm unsure, but it could turn out that there is no significant difference in the means after you resample.
    Oct 16, 2018 0
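One way to read that suggestion is as a bootstrap: resample each group many times, recompute the satisfaction gap on each resample, and you get an interval around the gap instead of a single 30% number. A sketch with hypothetical per-user responses:

```python
import random

random.seed(2)

# Hypothetical per-user satisfaction responses (1 = satisfied).
group1 = [1] * 500 + [0] * 500   # 50% satisfied (opted in)
group2 = [1] * 800 + [0] * 200   # 80% satisfied (did not opt in)

# Bootstrap: resample each group with replacement and recompute the gap.
diffs = []
for _ in range(1000):
    s1 = sum(random.choices(group1, k=len(group1))) / len(group1)
    s2 = sum(random.choices(group2, k=len(group2))) / len(group2)
    diffs.append(s2 - s1)

diffs.sort()
lo, hi = diffs[25], diffs[974]  # rough 95% interval on the gap
mean_gap = sum(diffs) / len(diffs)
print(f"satisfaction gap: {mean_gap:.3f}, 95% CI ~ ({lo:.3f}, {hi:.3f})")
```

This quantifies the uncertainty in the gap, but as noted above it does nothing about the selection bias itself: a tight interval around 30% is still a biased 30%.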
  • Intel Ia64
    I think the data does indicate that users are not happy with the feature. Rather than turning it off completely, you can try to break the feature down into smaller pieces and see if you can narrow down which part is least valuable. Once identified, try reworking the UX and explanation of that part rather than eliminating the feature. If user satisfaction is still low after these steps, at least you now have fine-grained data to justify your decision to remove it.
    Oct 16, 2018 0
