Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics

Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252, June 1, 2009
Class 8_2 Summary (Part 2 of 2)

This transcript: http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_2summary_2009.06.01.doc
Corresponding audio file: http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_2summary_2009.06.01.mp3
Previous transcript, 8_1 Ads (Part 1 of 2): http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_1ads_2009.06.01.doc
To see the whole series, containing folder: http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/

Andreas: We started off by talking about how communication costs have dramatically changed the world; that we can hear, basically only using TCP/IP, and we can reach the world with Ustream. We can be very open and free. People can see us and interact with us by potentially posting stuff here. That bidirectional communication, which has been possible because of the costs having basically gone to zero, is what we are exploring here, in the Social Data Revolution. Going back twenty or thirty years, the task was to connect computers. That was an infrastructure play and it was about hooking up the back office computers with each other. Think about airlines, about number crunching, and I think it was the President of Intel who said that he believes the main use of computers would be to store recipes; he couldn’t really think of many personal computer uses.
It was about connecting programmers to computers, at best. Then, it was connecting front office people, people who work in the airline office, for instance, to the computer. What really changed is that in 1994 we suddenly all got connected, essentially, to the back office. Looking at the next class, I want to emphasize the Mars landing, which must have been in 1998; it was a big event because more people were watching it on the web than on television. In 1998, people were still thinking that the web was just a better television. Think about MySpace and Facebook and how dramatically that has changed. Facebook and MySpace is not just better television. It’s really about interaction. It’s about Nokia’s slogan of connecting people – my apologies to Orange. Here is the first distinction I want to make. List versus stream, when we say connecting people do we view the stuff we get from people as list, being defined as something we need to work through, or do we see it as a stream; you stand in the streaming water and you might grab a fish. If you don’t grab this fish, you could grab the next fish. That’s one of the differences; the streaming, real time search on Twitter that we discussed two weeks ago is very different in terms of expectations to what we had before. Leaving a trace and not making us feel bad because we haven’t gotten to it is one of the things we really think about differently from how we thought about it a few years ago. 0:03:28.7 So many people have their laptops out. If some want to use the EtherPad to convince ourselves that it actually works, and drop down some of the things as I sometimes do when I’m listening to somebody else; it gives me the feeling that the technology is not failing us, here. It does not require registration. It seems open to abuse. It very rarely happens. That’s one of the good things we learned on the web. What are the expectations? In email the expectation is that it gets read. 
Where do we have the expectation that it doesn’t get read? I think tweets have the expectation that nobody reads them. That real time aspect in tweets, in this communication, is very unclear to me. Where does real time really matter? What is important is the difference between Twitter and Facebook. Facebook and other predecessors are symmetrical. Facebook, at its heart, is about distribution. The “share this” button is the essence of Facebook. Think about newsfeeds; it was about distribution. The crutch of chronology or the “sorry state of relevance,” as Ray calls it in his blog post, is an example where people have not yet mastered machine learning to actually make that stream a little bit more interesting. There is sharing and news feed. Photo tagging is a beautiful example of social data. I tag somebody in a photo and that is social data that is very different from what we expect the web to be. Why do people do it? You had some slides on that; why do people do this? I think it’s self-recognition. People want to be immortal. If I tag somebody on a photo or I get tagged, maybe that survives me. People actually want to be funny. It’s interesting; when Facebook asked people, 8 out of 100 people said they made comments because they wanted to be funny. I think none of us want to be funny, but it’s an interesting element. It’s not about giving attention, but about getting attention. It’s this reciprocity. I give you some attention so maybe you will give me some back. The next step here was that we go beyond connecting people, beyond trying to measure the attention stream of people, to connecting data.
As we were driving here today, John said, “You had somebody talk about the semantic web; I thought it was dead?” It actually has been declared dead many times. Still, some people are doing it, hoping that if we have the right metadata, then someday we’ll be able to actually have those data connected to each other and become smart. I am one step behind that. I am a big believer that people should do what people do well, and computers should be doing what computers do well. What we call the Social Data Revolution means that we share, and the mindset is where the revolution happens. How people look at data differently and what they want to get back in exchange for the data they give is, for me, the revolution. It’s about consumer behavior and expectation shifting. Do you remember the Coca Cola example, where there were two guys who had a fan page on Facebook? A couple of years ago, a company like Coca Cola would probably have decided to sue them for trademark violation. When I was teaching my course at Berkeley, somebody was worried about using the term BART, the Bay Area Rapid Transit, because he would be using the logo without having permission. How has that changed? 0:07:24.7 Interesting companies like Coca Cola have realized that by giving those couple of people a video camera and a tour through the office, they were getting much more bang for the buck than they would receive from negative press. That whole thing about putting up things in the public that you don’t like; for instance, at the conference somebody put up the list of attendees publicly saying, “These are all the rich people who can actually afford to go to this conference,” and nobody could do anything about it. He then put up the letter from the conference organizers saying, “Please, take down this list,” and doubly enforced that they tried to keep it private. That shift from the private to the public certainly has happened. To me, the Social Data Revolution is a shift in the individual’s expectations towards what they can get in exchange for sharing data, sharing data about themselves, and sharing data about relationships to others. The plumbing is here. Knowingly and voluntarily sharing data, that is where the new generation of data sits. It’s on MySpace, on Facebook, on dating sites, on Craigslist; people actually realize that knowingly and willingly sharing data is where to go. It’s not sniffing the digital exhaust or going through the digital trash. We are using the EtherPad here. I love people’s creativity here. Is there anything I should know about? I see all of you laughing here, but I don’t quite know what’s going on. Are there any comments? This was the first part where I just set the stage before we move to the applications of the Social Data Revolution. [Laughter] Can I get some attention please? [Laughs] It’s not about the plumbing. It’s not about smart algorithms. It’s about getting people to share stuff. How do you get people to ultimately share stuff? It’s by giving them [0:09:59.7 unclear] in return. I now want to explore and review with you what that means for companies. I’m not thinking about Facebook or MySpace. I’m thinking about companies like Coca Cola, Fortune 500 companies, or smaller companies; what does it mean for them? I want to investigate one of the four Ps of marketing, which you remember are Product, Placement, Pricing, and Promotion. I want to investigate one of them, namely the P for product. Who knows the product best?
It used to be that the company that produced the product knew the product best. If you have a Nokia phone, Nokia engineers think hard about it and then push it to the market, as they pushed tires onto the market before; those guys thought they really knew the product best. When you look for Nokia Map Activation, none of the top links lead you to Nokia. All the top ten links on Google, at least when I checked it, led you to other sites where people know more about the product than Nokia does. Ultimately, Google really knows more about the product by indexing, storing, and searching the web, than any company does. 0:11:48.1 Earlier this week, I heard Steve Ballmer, Microsoft’s CEO, give a great performance talking about Bing. Do you know what Bing stands for? It’s a new search engine. Bing is not Google – funny. What we have learned is that there is so much information that people voluntarily and knowingly share that if we incorporate this into the process of product marketing, we can do an amazingly better job. At the heart of it, it’s the bidirectional communication. It’s that feedback loop, which is so fast. I am always surprised, and I think the same is true for pretty much everybody, to come up with any company or product you are thinking about – your JVC camera there – and do a Google search or search on Twitter. I was just at Zuni Cafe two nights ago. The maître d’, the manager, didn’t know about Twitter. I showed him what people had been tweeting in the last half hour about Zuni.
The real time element is that people often don’t realize how much they’re being talked about, or how much their enterprise is being talked about, or how much their product is talked about. That is part of the Social Data Revolution for me. One question is: should the company provide the platform? One example is www.mystarbuckidea.com, which is a site that had 60,000 ideas by people about anything from pricing to the atmosphere in the restaurant, to Wi-Fi and so on. 60,000 people share stuff, vote for stuff, and discuss stuff. Isn’t that the dream of any marketer, to have people truly engage with your brand, without even paying anything but just to share, vote, and discuss, and see? When I added this up, the number of people was about 60,000 to 70,000. That’s a platform that is hosted, in this case, by Starbucks. An alternative example is www.getsatisfaction.com, a company here in San Francisco that advocates people-powered customer service. They are a neutral platform. They call themselves the “Switzerland of customer service.” If people have a question, they can ask that question and usually get it answered. They have company representatives sitting on that. Another neutral example is a website called www.flatseats.com, which is about business and first class seats. Things are described in amazing detail, by people who actually sit in those seats, not by people who write the marketing materials that we all love, but by people who actually endure the 14 hours to Singapore from here. They share their experiences. What are their incentives? Maybe it’s contributing to the community. It is very interesting that this site is sponsored by a company called Skytrax, which is in the business of aggregating information for the airline industry. They saw the opportunity in getting people to actually contribute to their site, by not intermediating it but by keeping their space and giving the information back.
You probably heard about that Yelp problem; apparently for a certain fee Yelp would help you by making not so positive reviews disappear. There is a question about whether or not that should be happening. 0:16:16.9 Does this help anybody to bring this up, or do you see it anyway – the EtherPad. I’m not sure whether I want to see it. [Laughter] I will leave that for your entertainment. The point is that these platforms, which help people with product development and product marketing, can be organized and sponsored by the company, and Starbucks is an example, or they can be neutral like FlatSeats or GetSatisfaction. FixYa, an Israeli company, is another example; if you have any tech problems you have a chance of actually finding the answer there. The product isn’t the message, but the perception of the product is the message. Think for a moment about how dramatically that differs from how traditional advertising used to be. In traditional advertising, you don’t try to sell stuff to people. You don’t tell them that the product really rocks. Geoffrey Miller explained in his half hour podcast on my blog that you make them believe that you will have certain qualities that other people will see in you if you have that product. You wear that cologne, have that car, or wear that diamond. Traditional advertising actually is not about the product; it’s about what other people think about you when you buy the product. That’s a very interesting twist. Now, I would say the perception about the product is the message. Moving towards marketing, I want to spend the next 15 minutes or so on what I think truly is the core of data mining and e-business.
This is how recommendations have changed. Amazon.com makes 10% to 20% of its revenues, depending on the product group, through recommendations. I don’t know whether you have some figures from Yahoo or from other companies, but I think low double digits is pretty common if you have a decent recommendation engine. On the one hand that’s amazing; 10% to 20% of the bottom line is just from having a little piece of code that figures out what to show to people. That’s about the best bang for the buck that I could get. On the other hand, think about how people don’t know what they want; people don’t know what they get and people don’t know how things are devolving. It’s not surprising that they can be influenced by recommendations. I want to give two examples from Amazon and then show you how things have developed. We talked about “people who bought X also bought Y” and “people who clicked on X also clicked on Y,” and “people who clicked on X eventually bought Y” as three different ways of using the same algorithm, which is similar to what we talked about before today. The good thing about this is it’s a good example of a simple algorithm, where the problems are to get the data and to make it scale. 0:19:44.4 The other example from Amazon is social recommendations, “Share the Love.” Again, it is voluntarily and knowingly sharing information that I just bought that book with a friend, and in a perceived fair way, getting something in exchange; for example, if that friend buys it within a week I get a 10% discount and he gets 10% off. Amazon has sold another book. That was a pretty good idea that Amazon had about 10 years ago. In some way, what we learned here about leveraging the social graph, Amazon did. What is the difference? Amazon did it in a private way. If I forward that link or if I tell Gary about me having bought that book, you won’t know about it. How this has changed in the shift from private to public is that if I post on Facebook that I just bought this book, then all my Facebook friends potentially get notified, if there happens to be this crutch of chronology in their little window where they see what I just did. If I do it on Twitter, it’s there for eternity because it gets saved, indexed, and people will know five or ten years down the road when you apply for a job that you bought that book on How To Sue Your Employer, or whatever the book is. I’m distinguishing now the architecture of interaction, which is C-to-C, and the architecture of sharing, which I call C-to-W or customer to world. Let me give you a few examples of C-to-C, customer-to-customer. Those of you who remember; that term used to be used for eBay, where people buy things from each other. Purchasing stuff is really a small part of the social interactions we have. How often do we buy stuff? We may buy breakfast, maybe lunch, maybe dinner, maybe a bus ticket and that’s about it, during the day. Contrast this with the interactions we have in a less [0:21:54.0 unclear], more an intentional exchange way, our orders might be more. An example here is Skydeck. If you give them your phone number, your carrier, and your password to the website, it shows you insights to your calling behavior. That’s very interesting. It suggests actions; that dear friend of yours in L.A., did you notice that it’s been you who has been calling him in the last couple of weeks? What does that mean? Maybe it’s time for a month’s free subscription on a dating site. Those are data that are passive data. Rapleaf, a company I mentioned earlier today, is another example of a company that uses the C-to-C data.
It looks at my social network data, across social networks. It figures out what kind of people I hang out with. In German we say, [0:23:03.3 unclear], or “birds of a feather flock together.” If my friends tend to be fraudsters, maybe that insurance should not be super interested in me. If my friends are all super clean, then no worries. Interestingly enough, if there are wrong data about you, unfortunately, you have no way of fixing it. That pretty much sucks. If somebody makes himself an accountant – I’m not suggesting this – let’s say Jay [0:23:29.0 unclear] and he does interesting things, Rapleaf may indicate that this may be the same guy that worked at Fox, which might cause John a problem. It’s not clear how we can fix that. He knows the people. John can fix many things; breaking into computers is one of them. For normal people, that’s not easy. 0:23:51.8 There are good things about it; it helps companies save money, and insurers can use their scarce resources to investigate the right claims. On the other hand, it can cause people a lot of trouble. One of the open things that remain is how do we deal with wrong data about us? That’s very difficult; I don’t know. How do we deal with aging data? In this C-to-C area, we briefly talked about a redefinition of customer value. When we talked about customer value, sometimes called customer lifetime value if you bring the time component into the game, we asked first of all “what do you mean by customer value?” Is it the value the company has for the customer, or is it the value the customer has for the company?
Flip the arrow sometimes, as in not asking a carrier what they can do for the customer, but what can the customer do for the carrier? How can the customer help the carrier? What is the value of the carrier for the customer is a very interesting, new way that evolves from this bi-directional communication, which is very democratic now. We learned from our Facebook friends, from Eric Sun and his talk, that it’s no longer this notion that you try to find the influencers. With the threshold of interactions being so low now, it’s so easy for anybody to propagate stuff and distribute information. The notion has come to make it as easy as possible to show you an attention stream, to give you ways and to allow you, with lightweight interactions, to become influencers for those few things you care for. In my case, and in your case, it’s audio – as you found out on Facebook last night. You love to spend too much money on audio. That is very interesting; in C-to-C, it is customers talking to customers, not customers to the world, yet. When people talk about architecture of interaction, three years ago at DLD I called it web 3.0, because web 2.0 was participation. Web 3.0 was for interaction. It’s that we leverage the graph of connections between customers, which could be phone, Facebook, email, and the underlying hope is that people rub off what their neighbors say, in some way. How does one do this in an advanced way? I could specify coupling constants between me and my friends. I really trust Ron Chung. If Ron does something well, I want to get .9% of the credit he gets. If he screws up, I am happy to take .9% of the blame he gets. If we allow people to attach metadata to the connections, then you can actually learn much more about the graph than that sorry state of binary connections that we live in right now. 0:27:12.7 It’s also interesting from the perspective that maybe everyone trusts Ron, but maybe Ron doesn’t trust anybody.
Maybe he is a trust sink, maybe he’s a trust source. If you give people metadata, that could be reflected in the purchases he makes getting prioritized higher than the purchases from somebody you don’t trust and aren’t willing to couple your reputation to. That would be an example of getting people to create metadata so their life, their feed, their stream is more interesting than otherwise. That was C-to-C and I think that is about 5 years old, or something like that. What has been happening in the last year or two is this C-to-W, customer-to-world. Tim O’Reilly, maybe five years ago, coined the term “web 2.0” and used it for architecture of participation. I prefer the term architecture of sharing because participating and sharing may be different. I think the sharing aspect is more important for me than participating. The key feature is the shift from private to public. I want to use two perspectives on this. First, I want to use the perspective of the individual. Identity has become socially constructed, more than ever. How much time do people spend in thinking about the self marketing they do on their favorite social network? That’s how people socially construct their reality. On the one hand, it’s really what they would like to say; on the other hand, if they say something that isn’t true, some friends or maybe not so close friends would quickly debunk that. When we had Reid Hoffman in class, he made the point that the resumes that people post on LinkedIn tend to be much closer to the truth than what they post on some drop site; it made total sense to me.
If I see that my friend claims to be something that he wasn’t, the least I would do is send him an email and say, “Typo, or what happened?” It’s social pressure or an alignment of this social reality, but yet allowing us to constantly recreate it by which photos we put up – at Enrique’s birthday party last Saturday, the question was, “Is this a good picture for Facebook?” I still have the same Facebook picture I had up when I created my Facebook account, which shows that I don’t care all that much about my picture. With that marketing, should I be in the picture with somebody like Gary is in the picture with Mark Zuckerberg, is very interesting from the social construction of self identity, in relationship to others. From our homework, from our questions about how the notion of relationship has changed, by potentially knowing whatever other people are doing, I want to put it like this; who knows Robert Scoble? That’s a bad example. Let’s say Tim O’Reilly, who has heard of Tim O’Reilly. Who is popular, Oprah? [Laughter] If you follow Oprah on Twitter, you probably know more about Oprah than you know about your mom. Do you know what your mom had for breakfast? Probably not, but you may know what Oprah had for breakfast. That’s quite interesting, to see how our notion of friendships and relationships have changed so much. That is the individual perspective. 0:31:58.4 Now, let’s talk about the company perspective. On the one hand, there are amazing new possibilities. I’m going to give you two. JetBlue listening in on Twitter, when people use real time search, and that’s where real time actually is an important element. They search for United or airline, or complaints, and having agents who say, “We can help you out here; that wouldn’t have happened at JetBlue.” The other example is BestBuy talking to customers who are unhappy with competitors. The third example is Comcast, where people complaining about Comcast get attended to by a special department at Comcast. 
Those are opportunities where you can actually listen to customers, and market yourself to them very differently from this broad, mass marketing that you are used to. Somebody just had a problem with his refrigerator. He is more open to refrigerator buying than if you put a beautiful GE ad in the newspaper. There is risk, on the other hand, and a standard example is Kryptonite. There was a funny video out there where somebody showed how to pick a Kryptonite lock. I see people looking it up, now. Kryptonite decided the blogosphere had nothing to do with them. They had a significant hit in revenues because people felt if someone could pick my lock, why buy a Kryptonite lock. Between those two, the CEO of Continental Airlines was told by his PR department that he should look into this blogosphere thing. He told them, “Whoever wants to come, I’ll treat them to a tour through the back offices here.” 280 people showed up, bought their own ticket, and got the tour. That was a big PR effect. The best example of all PR effects ever was the $1 million that Netflix never had to pay out for the Netflix contest. That was C-to-W, the consumer-to-world and how people post a blog entry about Kryptonite. The last element, and on that I think we are not there yet, is how can we model situations. Besides C-to-C and C-to-W, what models can we build of situations? Women’s shopping behavior is significantly different from men’s shopping behavior. I just had a phone call this morning with a company in the U.K. that does apparel recommendations. Any study ever done, internally or externally, shows that women shop differently from men.
One example, it turns out, is if men buy an expensive item like a car, they are done with it. Ads for cars don’t get picked up. If women buy an expensive item like a car, it turns out they go back and back and think about what other car they might have bought instead. That was in a study that was done at the University of Michigan a few years ago. That is psychographics; knowing that it’s a person, and whether the person is a male or a female, is certainly one element that Facebook and MySpace know quite well. That’s not situation. What is situation? Situation is trying to understand which situation somebody is in. For instance, on the mobile, how is he moving or is it on a mobile at all or is he on his PC at home? What is the bottleneck then? 0:35:53.2 I think attention is the bottleneck. How can we get the person’s attention and how can we retain the person’s attention? The real time interactions triggered by what’s happening at the moment are still not well understood. We have understood the C-to-C, and the social part. We have said it’s pretty much out of control. I think we are understanding the C-to-W by finding out what people are thinking and then hitting them at that moment. Modeling the situation is certainly a machine learning problem where richer data sources, and for me the mobile is the richest element of creating data, will actually have some promise. We had a question on the survey of whether you think if your mobile is not used the right way in marketing, how it would influence your relationship to the company. All of this is really about how the expectations of people are changing.
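As an editorial illustration of the “people who bought X also bought Y” idea mentioned in the recommendations discussion above, here is a minimal sketch of item co-occurrence counting. It is not Amazon's implementation; the function names (`build_cooccurrence`, `also_bought`) and the sample baskets are hypothetical, and a production system would also have to address the data collection and scaling problems the lecture points to.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """For every item, count how often each other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate within a basket so repeat purchases count once.
        for x, y in combinations(sorted(set(basket)), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def also_bought(counts, item, k=3):
    """Return up to k items most often seen with `item` (ties broken alphabetically)."""
    neighbours = counts.get(item, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase histories; each inner list is one customer's basket.
baskets = [
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_b"],
    ["book_a", "book_c"],
    ["book_b", "book_d"],
    ["book_a", "book_b", "book_d"],
]
counts = build_cooccurrence(baskets)
print(also_bought(counts, "book_a"))  # ['book_b', 'book_c', 'book_d']
```

The same counting works for the other two variants from the lecture if the baskets are built differently: sessions of clicked items for “clicked X, also clicked Y,” or sessions that pair clicked items with the item eventually bought.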
There are three revolutions we could say have happened, with the apologies to the French people in the room, and Russian people etc. The first one was the Industrial Revolution in which we could transport energy. That meant that by transporting energy we could place the factories not necessarily where the coal was or where the energy was, where the windmills or the water was, but wherever other things were, where the resources were. The Information Revolution is about transporting bits. That meant we no longer had to be physically where the bit generation was. There was a famous war in America, I forgot which one, but it had ended already and the people in the war didn’t have word that an armistice or treaty had been reached. Does anyone know what that war was? Perhaps it was the Civil War. In this war, people had already agreed that it was over but people were still fighting because bits were tied to atoms. What is the Social Data Revolution? What is it for you? What is now movable? Is it reputation? Is it the creation that is the important part? Is it the distribution that is the important part, or the interaction? What do you think? What is at the heart of it? We already have the energy moving. We have the bits moving; bits move quite well out of the classroom. What is really at the heart, for you; this is actually a question for you now. What is at the heart – we have a Facebook page for Social Data Revolution, which is actually super well going. We have all kinds of stuff. We’ve been talking about it for 3 times 8 hours. What is it, for you? To help you, I will talk for two more minutes and then we will have a discussion for the last ten. I want to give two perspectives that might be useful. First, I am a physicist as you know; I got my PhD in physics. One of the most important things in physics is time scales. 
Danny Kahneman at DLD said, “People have time scales of seasons; of weeks, as in you should rest every seventh day; of days; of hours.” Interestingly enough, the Greeks actually divided the time between sunrise and sunset into a fixed number of intervals every day, as opposed to us, who have a more global time scale of hours. We have this really high frequency stuff, extremely transient, real time stuff, where we know what somebody did just a few seconds ago. 0:40:12.1 When does real time matter? Real time is possible; there is no question about it. When does it matter? It matters in the case where we want to hit somebody up who is in a certain situation right then and there. It matters if we want to learn about what’s cool, what’s hot, or whatever the right word is. More important than the real time element, I think, is the feedback element, the product marketing element. We can throw stuff out and see what’s coming back, the experimentation. The other way I want to look at time scales is biology versus technology. I don’t know how old or [0:40:53.0 unclear] our DNA is, ten thousand? What’s a good estimate here? I don’t know. How old are you, ten thousand years, one hundred, one thousand years? I’m not good at these things. How old is our DNA? What’s the half width of DNA? If you take my DNA, ten thousand years until half of it is swapped out? It’s a reasonable sum. It should be more than a thousand. The people a thousand years ago looked pretty similar to us. Student: It depends on how it’s preserved. Andreas: Let’s say it’s ten thousand years. What is the time scale of technology?
It’s on the order of one year; whether you take Moore’s Law or the amount of data created this year, it’s about one year. That’s about four orders of magnitude different. That’s damn interesting, where we may be limited by biology and fool ourselves in the belief that we’re not limited. Continuous partial attention is one of the examples where we are genuinely able to embrace the technology. When I was coordinating with Enrique today about the food for the reception, we were wondering how we would have done that fifteen years ago, pre-mobile phones. How could I reach you if I couldn’t send you an email saying, “Here is another idea”? That’s one way to think about what the Social Data Revolution is about. The other way is to think about economics. I’m not an economist, but for me economics means three things. One, we can measure things. Two, we focus on measuring meaning, like what the costs are, or the hidden costs that we have. Three, we know what is scarce and what’s abundant. The economics of attention means that we understand where things shift; things that used to be difficult to get to, like the examples I gave you, are now abundant. What has become scarce now? What are the bottlenecks? For the sender, has it become trivially easy to create stuff? No, creativity is still a bottleneck. It’s become trivially easy to distribute stuff, C-to-W. For the recipient, it’s pretty difficult to select and to consume. What I want to ask you is whether you have any thoughts about what you learned here, about this meta level, about data, about social data, about the Social Data Revolution. I want to open up the last five or ten minutes for discussion to learn from you. My question is what is it for you? What did you learn? Where did the bits flip? 0:43:51.3 At the beginning of the quarter, I was telling you that my objective was to have you think about things differently. My metric is how many bits flip per class for you.
I want to know, from you, and you can write it here, maybe you can write it on the Ustream or whatever works. I want to understand what the bits that flipped were. How do you think about things differently now than before? Is it privacy issues that we will talk about on June 10? Where have you changed the way you think about the world, about your identity, about what you share with the world, about relevance? Where did something change for you? Student: You talked about how you thought real time search really isn’t worth that much, but then you talked about how important promotion was and knowing real time feedback. I was wondering about the discrepancy in those two comments. Andreas: I didn’t want to over hype real time search. I gave some examples where knowing what is good in a restaurant, for instance, if you are somewhere right now, right there, that evening is certainly interesting. What is the opportunity cost? By looking at stuff in real time, you don’t look at stuff in the broader time horizon. I just want to make sure that we understand the tradeoffs, given that the day is twenty-four hours and our attention is limited. I didn’t want to put down real time search, in general; I just don’t think it’s the solution to everything. By the way, real time search really has two elements to it. There is the real time element, where Ray and I are not so sure that it’s the best thing since sliced bread, but the search element is an important one. That makes real time extend into the past. If you search for something on Twitter, which is not that frequent if you search for your favorite rapper. The question is how do things change?
If things don’t change the real time information is probably useless. At one startup, I was at ShockMarket. We realized that the statistics we had were just too low. A super smart guy, J.C., built the [0:46:45.5 compilation] and the board, put the pen down and said, “This is why I am now quitting.” A guy who went to GSB said, “Great! New things every day.” We have [0:47:00.2 unclear] random fluctuations and people will look at it every day. That shows the two world views, whether you’re interested in the deep, underlying structure such as what do we learn about stocks, or whether you’re interested in the random fluctuations. In finance you have both, people constantly look at how tickers change, and other people are interested in seeing how things change in the long term. Your question was about real time search and its role. I don’t think we know its role yet. I think like any technology, it’s not just a better or faster search; it’s a different search. In generating content, which may get picked up by real time search engines, they may change the way, like search engine optimization made a whole cottage industry of companies that help people actually get better Google, Yahoo, etc. rankings – not real time. What other comments do you have? Do you think differently about recommendations? Do you think differently about your own data? Do you think differently about what companies should do with their data? 0:48:23.6 Matt: I sort of got more excited about recommendation stuff. I think a couple of speakers who came had interesting points; the amount that you can glean from really simple analysis on a lot of data is almost as powerful as what you can glean with very advanced analysis. 
There is definitely a dimension where … returns to what degree of analysis you do on data. That is sort of encouraging in that it’s not that hard to play with stuff. I had a lot of fun doing stuff with Twitter and Delicious and all that. Simple is good. Andreas: The fact is those data just didn’t exist five years ago; there was no Delicious. There was no Twitter five years ago. There was no data to mine. The fact is it’s still kind of difficult on the Ustream today, to get data. Part of this experiment today was that Enrique thought it would be really interesting to see, and Ron did the work as always, if we give people the chance to say stuff in real time, what is it that they do say. It’s very difficult for me to concentrate at this end and at the same time watch the screen, so I don’t know what they said. I’ll look at it later. Those data sources – it’s probably just like how we wonder how we lived without mobile phones; in a couple of years people will say, “How did we live without Twitter?” Recommendations – many people get stuck in this because at Amazon all this was done way before I was there, “People who bought X also bought Y,” and they think that is state of the art. There is no social component to it. In some way, explaining the way, you have this situation to do it but if you look at [0:50:11.2 unclear], that means that you are in the high end camera range. But, there is no C-to-W component to it. There is very little of a search component to it, all these things that we now have. There is almost no behavior component, no gender component, and still it makes them 10% to 20%.
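To make the “People who bought X also bought Y” idea concrete, here is a minimal sketch of one simple way such a list could be computed: count how often two items show up in the same basket and rank by that count. This is an illustration only, not a description of how Amazon actually does it; the function names `build_co_purchase_counts` and `also_bought` and the toy baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(baskets):
    """For each item, count how often every other item appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate within a basket so a repeated item is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    # Sort by descending count, breaking ties alphabetically for a stable order.
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical toy purchase data, invented for illustration.
baskets = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "lens"],
    ["novel", "bookmark"],
]

counts = build_co_purchase_counts(baskets)
print(also_bought(counts, "camera"))  # ['memory card', 'lens', 'tripod']
```

Note that this sketch uses nothing but purchase co-occurrence, which matches the point made above: no social, search, behavior, or gender component goes into it.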
Student: I think what I got out of this class was when the LinkedIn people came, I thought that was the best one. These short term... fooling around with the data got a lot of the short drives and algorithms that … more long drive is the rare thing. It was taking this data mining work and thinking of it as an active process, instead of just looking at the static data, doing something, and not interacting with people. Andreas: Yes, the notion of data mining really has changed. I said in the first class that it used to be: given a set of data, what insights can you get? It’s now: given a problem, what data can you get? That’s what I said in the first class. What I’ve learned in the last eight weeks is it’s not only that. It’s also how quickly you can actually interact with the data. What questions can you ask? Thinking about it, the PHAME framework is a powerful framework and it’s not about data mining; it’s about doing experiments. 0:51:51.9 For you, P stands for problem, what really is the problem. Hypothesis is, if you think about cognitive psych, maybe people value something more in the future, maybe they value it more now. M is for metrics; what are they? A is for the different actions you could take, and E is for running the experiment. There is no explicit data or data mining in the PHAME framework. What you said, that it is interacting with data, is one thing. It’s also interacting with people, and that’s the experiment element. Nobody thought about that when they talked about data mining. If you live in genetics, where things take a while to change, you can’t do this.
If you live on the web and in the mobile space, that’s where you have these amazing opportunities to really interact with stuff on a very fast time scale. For me, the interaction part is the most exciting one. That’s why I went to Amazon. Student: One thing that really flipped for me was around instrumenting the world. It’s so easy to get caught up on the data because it’s so exciting, but you’re ultimately bottlenecked by how much you have instrumented. That was a good awareness point for me, particularly around areas like healthcare, where there is so much that could be done but there isn’t yet the instrumentation, necessarily, around all the things you would want, and the records are piss poor. That’s one thing that jumped out for me. I enjoyed it. Andreas: So healthcare, for me, we all carry our mobile phones with us. Potentially, they could measure an incredible amount, about how we walk, how we talk, how we feel, how we interact with our friends, which probably have huge predictive character about whether we’re going to have back problems, or whether we’re sleeping enough. I was preparing class at 4:00 in the morning last night, with Ron. It is probably not good for our long term health. Fitbit is one of the things we tried to get but unfortunately, they’re not ready. Look at our mobile phones. I’m not sure; does Orange have any initiatives in the healthcare world? Not just getting healthcare records together, but to use the phone as a device for constantly collecting stuff? Student: [too far from mic] Andreas: That’s an interesting question because if your health insurance has access to those data, or your car insurance knows your blood alcohol content before you get in the car, maybe you’re not that comfortable having those data collected by your mobile. Student: Since you mentioned healthcare, I’m mentioning education. They aren’t allowed to use social networks in the school. 
The bit that flipped for me was when you originally started with the premise that optimization happens with the volume of use. I began to see that potentially, sponsors that could solve some of the funding problems in education, by sponsored content, could have a better relationship because it could have a monitored relationship with education instead of being seen as evil. When you mentioned that social networks create a different relationship with the sponsors, it opened a door for me that said that potentially, we can solve education problems with sponsored content. 0:55:43.0 Andreas: The notion about sponsoring stuff – due to a hundred years of being mass marketed to, where marketing dollars were primarily spent on bad products; you need to tell people the good things about the bad product. It’s probably carrying over into education. I don’t know how much sharing is going on there. I would expect that a lot of sharing is going on in the educational world. Google Docs etc. is probably pretty standard in classes, now. We will continue at the reception, but before doing this I am going to acknowledge money we received from John’s company, MySpace, which is paying for the reception - not only for that, but also for an award. Having realized at the [0:56:42.2 unclear] conference, last week, that video is a new medium that many people are using, here is what I am going to spend $1,000 on. I ask you, and many of you have a mobile phone that is a camera or something. It doesn’t matter what the quality is. I sent out the email, which hopefully by now is reaching more people than before. Come up with some short, funny video that you create.
I am giving a lot of talks this year. If I use it at least once, I will give you $200, for the first five videos. That’s $1,000, 5 times $200. It should be somewhat related to social data, the Social Data Revolution, to marketing, to the stuff which we’re doing, recommendations. It doesn’t really matter to me what it is. Maybe you want to interview your parent, or your parrot, or your RA or your prof. I don’t know what. That’s really up to you. I will give you some examples here of a couple of videos. I don’t want to influence you. Think about it. It doesn’t have to be in English. One of my talks is in Milano. I want to get some cool stuff, maybe 30 seconds or a minute. Quality is not primary. What do you think would be stuff you could use in a talk? I suggest you upload this. There are 20 different video formats that Facebook takes. You upload it to our Social Data Revolution site, and in order to give us 24 hours to look at it before the last class, I need to ask you to do this within a week, actually, by Tuesday next week at noon, so I can get a feeling about what’s happening. Please, do me the favor; Facebook things tend to disappear at some stage, like my first videos did. Send an email to this www.video.aweigend.gmail.com with a link. If it’s less than 10 MB you can attach it so we have it in one space. That’s what I wanted to say about the $1,000. It’s not much, but still, I think $200 for a 30 second video is not a bad hourly wage. I have three things to announce. Thanks to MySpace, to [0:59:21.3 unclear] Fox. Thanks to Enrique who got the food. We now have food, some punch, and an equally attractive non-alcoholic beverage. I was an RA when I was a grad student. It’s waiting for us in the lounge in Sequoia Hall, in the STATS department. If you invited some friends, they will be there. If you still have some other friends you want to bring over, I think there should be enough for all of us. I know 40 people RSVP’d on the Facebook page.
It’s just an opportunity for us to talk a little more, and for you to tell me more privately what you think, what you liked, what you didn’t like. If you have questions about homework, Ron is here. Ron does this totally voluntarily. He actually will not be in the last class, so I want to really say thank you, Ron, for having helped out in so many ways. [Applause] You certainly have a lot of social capital accrued here, from me, to help you with the startup, and probably from the students, as well. 1:00:29.2 For the last class, privacy is the topic. We will have Cynthia Dwork, formerly from IBM Research, and now with Microsoft Research. If you have time beforehand, you should watch a half hour video of the IBM [1:00:44.3 unclear] 2008 workshop. I got to know her a little bit. We shared a taxi in San Diego once, but that’s really the first time we talked about stuff. She is extremely smart, a cryptologist by training, and understands about the risks of sharing data.