【英文版05】Todd Yellin:Optimize the User Experience: A Case Study in A/B Testing, with Netflix


【Background】

【背景介绍】


今天,我们很荣幸地邀请到了Netflix,网飞公司的产品创新副总裁托德·耶伦,他将用实际案例向我们展示,如何利用AB测试的方法,去探索用户真正喜爱的节目。


Netflix started out as a more convenient alternative to brick-and-mortar video stores like Blockbuster. At this point, most of those stores no longer exist and Netflix is competing against Amazon, Hulu, and more for control of the streaming video space.


说到网飞,你一定不会感到陌生,它出品了《纸牌屋》、《超胆侠》等众多大热的美剧,一直以来都保持了高质量的制作水准和关注度。在事业起步阶段,网飞作为更加便捷的视频租赁提供商,替代了像百视达这样的实体录像带租赁公司。在这之后,绝大多数出租实体录像带的商店都销声匿迹了,而网飞的竞争对手,也变成了亚马逊和Hulu这样的IT企业,他们在流媒体领域展开了激烈的争夺。


What sets Netflix apart is its unique approach to user experience. Beyond just acquiring great content, the company has focused most of its efforts on connecting individual viewers with precisely the content they’ll enjoy, at a level of specificity heretofore unseen in the industry. Todd Yellin, VP of Product Innovation, explains the core principles by which Netflix continually refines its user experience.


网飞是凭借什么脱颖而出的呢?答案就在于它为用户提供的独特的使用体验。这家公司并不仅仅满足于获取优质内容;更重要的是,它把绝大多数资源聚焦在一件事上,那就是,把每一位观众,和他(她)所喜爱的内容精确地联系起来。这种极具专门化的定制服务,在整个业内称得上前所未见。今天,网飞的产品创新副总裁,托德·耶伦将向我们解释公司在持续优化用户体验方面时所秉持的核心原则。


【Course】

【课程】


Optimize the User Experience

优化用户体验


Big data has been kind of a cliché in Silicon Valley for the last few years, big data this, big data that. Big data is really one big mountain of garbage with little gems buried in this tremendous trash heap, and you want to find those gems; you really want to find out what’s going to make the experience better. Once you find those gems, what it does is it doesn’t make it a more alienated, you know, more machine experience, it actually makes it a more personal experience; it becomes much more about the individual member.


这些年来,在硅谷,大数据这个词听起来就像是某种陈词滥调,这里也是大数据,那里也是大数据。

事实上,大数据就像是一座庞大的垃圾山,在这座巨大的垃圾山里,埋藏着一些极小的,又极为珍贵的宝石;只有找到了这些宝石,你才能把用户的体验做到更好。


一旦你找到了这些宝石,你就会发现,这些珍贵的数据能够提供的并不是充满距离感的机器体验,而是更具个人化的用户体验。换句话说,用户体验变得更加关注每一个个体用户。


Actual Behavior Trumps Qualitative Research

观察现实行为,胜过进行定性研究


So, there’s a lot of ways we do research on Netflix to keep on improving the experience. We do a lot of qualitative research and then we do a lot of quantitative research. More important than quantitative, call it behavioral research. So in other words, qualitative research is really about what people say they’re going to do or what they say they like. And that’s helpful to give ideas or fuel us with hypotheses about what we want to test, it also helps us refine what we have, but it doesn’t answer the questions about what really people want. What they want is more reflected in their behavior. What answers the core questions about what they’re going to do we don’t find out until we do large-scale A/B testing.


所以,为了不断优化用户的体验,我们使用了许多不同的手段来对网飞的用户进行研究。我们做了大量的定性研究,也做了大量的定量研究。更重要的是,与其把后者叫作定量研究,不如称之为行为研究。


换句话说,定性研究探索的是人们口中所说的内容,比如他们宣称自己打算怎么做,或者他们宣称自己喜欢什么东西。这确实能为我们的测试带来新的想法,或者给我们的测试提供一些新的假设,让我们知道如何改进现有的服务。但是,这类定性研究始终无法回答一个问题,那就是:人们真正想要的是什么?


人们真正的需求,体现在他们的实际行为中。


用户究竟打算做什么?在我们在网飞开始进行大规模的AB测试之后,我们才能够真正回答这个核心问题。


So what that means is we might have an idea. We might have an idea for, let’s change the whole design for how people interact with Netflix on the TV. Maybe we should have more rows. Maybe we should forget about rows. Maybe we should have big hero images. Maybe we should use trailers. Maybe trailers are too long. Maybe we should use short montage videos. So we try all kinds of things like that. And so what we’ll do is we’ll do double-blind experiments with 300,000 Netflix members who are brand-new; they’ll get a new Netflix experience that has all these video montages instead of regular box shots, and that will help them choose what they want to watch. And then another 300,000 people will be randomly chosen at the same time and they’ll get the existing experience. And they don’t even know they’re in a test, they’re brand-new. We think that new idea about video montages is a better way to go, but we’re not sure.


比如说,有时候我们会产生新的想法。比如,我们可以彻底改变人们与网飞在电视上进行交互的模式。也许我们应该提供更多的栏目选择?也许我们应该彻底抛弃栏目这种东西?或者,我们应该使用大幅的主视觉图(hero image)?我们可能需要使用预告片;但是,说不定预告片又会显得太冗长。做一些短视频剪辑,说不定会是个好主意?这些想法,我们通通都尝试过。


如何来判断这些想法的好坏呢?我们会安排三十万网飞的新用户进行双盲测试。比如,人们可以通过观看视频剪辑,而不是常规的封面图来挑选自己想要观看的节目,这是一种全新的网飞体验。与此同时,另外的三十万名用户,也会被随机抽取出来,他们使用的是现有的观看模式,这样就可以作为参照。这些人完全不知道自己参加了测试,他们完全都是新用户。


在我们看来,使用视频剪辑来吸引用户的方法似乎更好,但是我们并不肯定。
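The random split described above can be sketched in a few lines. This is only an illustration, not Netflix's actual mechanism: the function name `assign_arm`, the experiment label, and the member IDs are all hypothetical, and hashing is just one common way to get a stable, roughly 50/50 assignment.

```python
import hashlib

def assign_arm(member_id: str, experiment: str = "video_montage_test") -> str:
    """Return 'treatment' or 'control' for a member; the result is stable across calls."""
    # Hash the (hypothetical) experiment name together with the member ID so the
    # same member can land in different arms of different experiments.
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).hexdigest()
    # Use the parity of the hash value as a fair coin flip.
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Assign 10,000 made-up member IDs and count how many land in each arm.
counts = {"treatment": 0, "control": 0}
for i in range(10_000):
    counts[assign_arm(f"member-{i}")] += 1
print(counts)
```

Because the assignment depends only on the ID and the experiment label, a member sees the same experience on every visit without anyone having to store or reveal which group they are in.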


So we let the test run for six to eight weeks and we see based on behavior, not what people say, but what they actually do. Are they retaining better at Netflix? Are they, in other words, unbeknownst to them, voting with their wallet and realizing they’re getting more value out of Netflix maybe with those video montages? Or, are they abandoning at a higher rate? Are they watching more? Because in the end, it’s all about delivering value and the more they watch on Netflix, the more we know we’re doing a better job of giving them more value for their monthly subscription fee.


所以我们会让这个测试跑上6到8周,在这期间,我们观察用户行为,也就是他们的实际行动,而不是他们口中所宣称的说法。我们会看,用户对网飞的粘性是否增加了?或者说,在不自觉的情况下,他们是否在用钱包投票,并意识到,也许正是因为这些视频剪辑,他们从网飞获得了更多的价值?还是说,因为我们使用了新的策略,更多的用户开始流失了?他们是否观看了更多的节目呢?


追根究底,我们想要知道网飞是否为用户提供了价值。只有当用户观看网飞节目的时长越来越长,我们才能确信,我们为这些每月支付订阅费的用户们提供了更多价值。
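Once the test has run, the retention comparison comes down to asking whether the difference between the two groups is larger than chance would explain. A minimal sketch follows, assuming a two-proportion z-test (the talk does not say which statistical method Netflix uses) and entirely made-up retention counts; only the group size of 300,000 comes from the talk.

```python
import math

def two_proportion_z_test(retained_a: int, n_a: int, retained_b: int, n_b: int):
    """Compare retention in control (a) and treatment (b).

    Returns (difference in retention rate, z statistic, two-sided p-value).
    """
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled rate under the null hypothesis that both arms retain equally well.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical numbers: 300,000 members per arm, as in the talk; the retained
# counts are invented purely for illustration.
diff, z, p_value = two_proportion_z_test(240_000, 300_000, 241_500, 300_000)
print(f"retention lift: {diff:.4f}, z = {z:.2f}, p = {p_value:.2g}")
```

With groups this large, even a half-point difference in retention stands out clearly from noise, which is one reason tests of this size can settle questions that opinions and surveys cannot.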


Common Sense Might Be Nonsense

你以为的常识,可能是胡说八道?


So constantly, and this is always amusing, you think you have a great idea. You think it’s going to work. Then you A/B test it, you put it in front of users and you see, does the idea work? And sometimes it doesn’t work. And it’s taught us a bunch of things, this testing. It’s taught us things that aren’t as obvious as they seem. For example, the first thing we came in that we thought about, I’ve been at Netflix over ten years, is we thought, if you really want to connect someone to a perfect movie or TV show, if you just understand their age and their gender, that will really help. So in other words, if I know it’s a 71-year-old woman watching versus a 19-year-old guy, well that solves a lot of it because then I’ll know to put the 19-year-old guy in front of action superhero TV shows and movies and put the 71-year-old woman in front of very sentimental female-driven romantic comedies, whatever. That’s clichéd and actually it doesn’t work.


说来有趣,你常常会觉得自己想出了一个绝妙的主意,而且你觉得它肯定能行。于是你就用AB测试的方法来验证它。你把这个想法放到观众面前,然后看看,观众会怎么做?说真的,有时候你会发现,自己的想法不一定是对的。不过,不管结果如何,AB测试总能让我们学到很多东西。我们会发现,事物并不像表面看上去那样显而易见。举个例子,我在网飞工作超过10年了,一开始,我们认为,只要我们知道了观众的年龄和性别,我们就能准确地为他推荐他喜欢的电影或电视剧。换句话说,如果现在有两位观众,一位是71岁的女性,而另一位是19岁的男性,那么我们就会非常自信地为这位71岁的女性推荐多愁善感的女性向的浪漫喜剧,然后为这位19岁的男性推荐超级英雄的动作剧。但事实上,这些你自以为是的做法根本不能解决任何问题。


Because what you quickly learn is these superficial traits about people, age, gender, even where they live, they don’t really matter. The most important thing that matters is, what do they actually watch? Yes, there are 19-year-old guys who will watch reality shows about wedding dresses. There are 70-year-old women who will watch Daredevil, a violent blind superhero action show we have on Netflix. So don’t stereotype, what we learn is, pay attention to what people’s actions are.


这是为什么呢?因为你很快就会发现,用户的那些表面特征,诸如年龄、性别乃至居住地,对于分辨用户的喜好而言并不重要。最重要的是这些观众到底看些什么?


没错,在19岁的小伙子当中,同样有人喜欢看有关婚纱的真人秀;而在70岁的老太太里,也会有人喜欢看《超胆侠》,要知道,这是网飞平台上一部充满暴力场景、讲述盲人超级英雄的动作剧。所以,我们应该忘记那些刻板印象,转而去关注人们的实际行动。
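The point about behavior versus demographics can be made concrete with a toy example. The sketch below is not Netflix's recommendation system; it assumes a simple Jaccard overlap of viewing histories as the similarity measure, and the members and all titles other than Daredevil are invented placeholders.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical members; ages and genders echo the talk's example.
members = {
    "A": {"age": 19, "gender": "M",
          "watched": {"Wedding Dress Reality Show", "Baking Contest", "Romantic Comedy"}},
    "B": {"age": 19, "gender": "M",
          "watched": {"Daredevil", "Space Action Movie"}},
    "C": {"age": 70, "gender": "F",
          "watched": {"Daredevil", "Space Action Movie", "Crime Drama"}},
}

def most_similar(target: str) -> str:
    """Return the other member whose viewing history overlaps most with the target's."""
    others = [m for m in members if m != target]
    return max(others, key=lambda m: jaccard(members[target]["watched"],
                                             members[m]["watched"]))

# B and A share age and gender, but B's viewing history matches C's.
print(most_similar("B"))
```

Here the two 19-year-old men have nothing in common as viewers, while the 19-year-old and the 70-year-old overlap heavily, so grouping by what people watch gives a different answer than grouping by age and gender.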


Man and Machine, not Man vs. Machine

人类携手机器,而不是人类对抗机器


The way we do things at Netflix, it’s very much a collaboration of people and machines. If we just had these very sophisticated, very mathematical algorithms doing everything we would make a lot of mistakes. We have a big team at Netflix of people who are just tagging all of our content, watching everything we have, understanding all the content. So basically, we can use all that human knowledge about all the content, we can use all the little diamonds in the rough of the big data and we can put all that together to make a better experience that’s better for each person.


我们在网飞工作的方式,很像是让人类与机器携起手来,通力合作。如果我们放手把一切交给机器,去依赖那些非常精密、纯粹由数学决定的算法,让它们去完成所有事情,我们肯定会犯下不少错误。所以,在网飞,除了机器之外,我们同时拥有一个庞大的,由人组成的团队,这些人的任务就是去为我们所有的内容打标签,观看我们的每一档节目,理解我们全部的内容。只有这样,我们才能更好地将我们对于内容的深刻理解,和我们在海量大数据中发现的真正有意义的数据结合起来,为我们的每一位观众提供更好的用户体验。


【Summary】

【总结】


在这节课里,托德·耶伦结合了实际案例,向我们展示如何利用AB测试的方法,去探索用户真正喜爱的节目。


Big data is an enormous amount of trivial information that is only valuable when mined properly.

Use big data wisely to create a more organic, personal user experience.


他告诉我们:人们常说的大数据,其实只是海量的琐碎信息。只有经过合理的挖掘,这些信息才能真正发挥价值;如果我们能够明智地利用大数据,那么我们就能创造出更原汁原味,同时也更加精准的个性化用户体验。


现在,我们再来回顾一下这节课的要点: 


Actual Behavior Trumps Qualitative Research

1.用户的实际行为,比定性研究更有用


Qualitative research is about what people say they like.


This approach can be good for generating new hypotheses or refining ideas, but reveals nothing about actual user behavior.


Conduct double-blind experiments to evaluate how different user experiences affect user behavior. Where are users deriving the greatest value?


定性研究所取得的信息,只是关于人们宣称他们喜欢些什么的信息;


假如你想要获取新的假设,或者是优化现存的观点,这些方法都会有所帮助;但假如你想了解用户在现实中的行为,定性研究就显得无能为力了;


通过AB测试,我们可以了解不同的用户体验是如何影响用户的行为的。也能够帮我们搞清楚,用户究竟是从什么地方获得最大的观看价值的。


Common Sense Might Be Nonsense

2.你以为的常识,可能是胡说八道


A/B testing often reveals user behavior that is not obvious or even counterintuitive.


AB测试常常能揭示出用户的行为真相,它们可能并不十分明显、甚至违背你的直觉。


Man and Machine, not Man vs. Machine

3.人类应该与机器携起手来,而不是进行对抗


Mining big data and algorithmic decision-making must be blended with human innovation and creativity to be effective.


无论是进行大数据挖掘,还是使用算法进行决策,这些机器的工作,必须与人类的创造力结合在一起,才能发挥最大的效果。


感谢收听,我们下集节目见!






