MS&E237 Spring 2010 Stanford University Andreas S. Weigend, Ph.D. The Social Data Revolution: Data Mining and Electronic Business
Andreas Weigend (www.weigend.com)
The Social Data Revolution: Data Mining and Electronic Business: MS&E 237, Stanford University
March 29, 2010
Class 1 Overview
This transcript: http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.doc
Corresponding audio file: http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.mp3
To see the whole series: Containing folder: http://weigend.com/files/teaching/stanford/2010/recordings/audio/
Course Wiki: http://stanford2010.wikispaces.com
Transcript - Tamara Bentzur - Testimonials – www.tbentzur.wordpress.com, www.outsourcestranscriptionservices.com http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.doc
Andreas: Welcome to MS&E 237, the Social Data Revolution. This term was coined a year and a week ago when we were thinking about what we should call this phenomenon, that people now knowingly and willingly share all kinds of data about themselves, about their relationships with others, data the KGB wouldn’t have gotten out of them under torture.
If you think back in history, I think we could say there were three revolutions. The first revolution was enabled by people being able to transport energy. That was the Industrial Revolution. What happened there is that people were independent from where the coal was, or where the water was, and they could basically start manufacturing wherever they wanted. Then people learned how to transport bits. That meant the knowledge creation was independent from where the knowledge production was, and that of course gave rise to the Information Revolution.
What has happened in the last few years, I think, is that in addition to being able to communicate over a distance, to move the bits, the creation and sharing of those bits has become very, very easy. For instance, last Tuesday, there was a group here at Stanford called Quantified Self. A friend of mine, Kevin Kelly, who was the founder of Wired magazine, runs that group. It’s amazing how many devices some people carry with them, where they record all kinds of things. For instance, the Fitbit is a box which you carry with you and it tells you whether you sleep enough, most of us probably don’t; whether you walk enough, and exercise enough, probably most of us don’t either. The question is why do we need such a device? Ultimately, it’s about changing behavior.
What really happened with the Social Data Revolution is that it’s not only that the technology is there, but people do things differently now, and not only young people. I talked to Danny Kahneman, the Nobel Prize winner in 2002 for Behavioral Economics. He was a teacher at Princeton then. He said, “Andreas, it’s amazing that what people now have when they apply for a junior faculty position is more than I needed when I got tenure.” He said, “It’s just because of this people sharing stuff.”
I call it “glue programmers” versus “industrial strength programmers”. I’m a glue programmer. I take building blocks, put them together, and hope it works. Physicists are all glue programmers because we want stuff to work EE (double E) and I know that some of you are people who don’t worry about the deep underlying structure as long as we have something which works, as long as we need it to work and if it breaks afterwards it breaks.
What I’m doing in this class, this is the first time that we’re teaching it, here as MS&E 237. I want to have it much more project focused than I had the previous one, which was STATS 252, which I’ve taught for 6 or 7 years. Who here has heard about STATS 252, which I used to teach?
It’s about a quarter of the people. It was more problem-set focused, where we had to build a recommender system in Delicious using Python and stuff like that. This one is much different in that at the end of the quarter I want 10-12 groups which all have pretty much startup quality ideas in the area of people and data, of social data, of shared data.
0:04:18
In order to get there, I’m also bringing some cool people in. For instance, on April 20, we have the chief product guy of Bitly come to class, and tell us what they can figure out by Bitly because it’s basically the measurement of the world’s attention right now. Who here knows Bitly? That is great. We’re rolling.
This is Triptia, by the way. Triptia just came back from India today, fresh off the plane. She is our TA, and as some of you know, I’ve talked to some other people because, given the class size, we actually have some more resources. She doesn’t have to worry about 80 people, but only around 40 or so.
Foursquare, geolocation and here - “Hang on while I check into Foursquare, Yelp, Google, MyTown, MySpace, Facebook, and Twitter.” That’s an example of sharing data. Two years ago we had a really interesting project where I put the students in class, and I was wearing one of these devices that recorded my location every 15 minutes. I didn’t turn it off. I told them, “You come up with some interesting insights.” That is one of the other areas we want to have, geolocation.
We had Foursquare coming, Bitly coming. I have Auren Hoffman who runs a company called Rapleaf. Does anybody know what Rapleaf does?
Isn’t it cool, that they sort of spy on your social graph? Maybe you have an insurance claim and they say to the insurance company, “Let me check out how her friends are, because friends of a feather steal together. It turns out that her friends are kind of shady. Maybe you should invest a little bit more resources into investigating that claim, as opposed to her neighbor’s claim, who of course has super clean friends and no worries about it.” You think that’s the future? No, that’s actually practiced. I worked with an insurance company in Chicago that actually uses Rapleaf and, for those of you in MS&E who think of decisions as irrevocable commitments of resources, they think precisely about what the cost is of collecting data for that claim and how much they can learn from the social graph for that.
We have a bunch of guest speakers, but it’s not a speaker series. I tortured these people well enough before class, so they only say what I want them to say, basically. That is the difference from last year, which was less project emphasis and slightly more of an emphasis on problem sets. We talked about the change of behavior in people, how that has changed.
The plan for today is I want to give you about a half hour of the material we will talk about. It’s more like a lecture. If you have questions, do interrupt. Then I want to see what questions you have and I want to give you an opportunity, if some of you have great ideas, please briefly speak up.
Class ends at 5:30. I’m a big fan of starting on time and I apologize for the walk today. I’m a big fan of ending on time. Do we need a break? It’s not clear for a 75 minute class, or given that we had a walk, can we go straight through to 5:30 today? Is that okay? I want to make sure I leave enough time for you to ask questions about logistics or whatever you want to know. Not everything has been figured out yet. I just came back from Beijing myself.
I did the course at [0:08:31 Xien Xua], a 4-day course on this stuff, so by Thursday we’ll have it figured out.
0:08:38
That’s the Social Data Revolution. You can find me relatively easily at www.weigend.com. As I give you examples, I want to show you where I had invested something, where I sort of figured out how something should be, and then how the world has changed.
For instance here is my simple home page. When I created this page, I thought it was pretty cool a few years ago, to have a simple box like this where people can simply put in their name or their friend’s name and some email address, and then just hit return and reach me that way. The world has changed. How? If you think about Facebook Connect to contrast this, for instance on my blog, what is the difference between leaving a comment on Facebook Connect versus having a nice box? Both are boxes, a simple implementation. What is the deep difference between those two things, in terms of social data?
Jess: I was going to say whether or not all of your Facebook friends can see what your comment is and what blog you’re…
Andreas: Right, so if you say your name as well, then I have a chance to actually learn it. What is your major?
Jess: Economics.
Andreas: Okay, Jess’s point is that in Facebook your friends can see it and that actually is a very powerful driver. Frankly, who is interested in just me seeing some random thing they’re putting on there? Very few people are. Why do you think people like to actually have their friends see what they’re commenting on? They might appear smart. They may be “self-pimping”, just to create a reason.
One of the survey results from last year was that most people give attention to get attention. Harvard Business Review had an article about 11 months ago on the Social Data Revolution and it’s really interesting to see that most of the articles, most of the comments people left were “I will give attention but please give me some love back.” That’s another element of the course, to constantly think about what has really changed and why is it that the world is a different place now from what it was a few years ago, when it was just pretty cool not to only have an email address but to have a little box where people could write stuff.
While we walked, I deflated this from 3 points to 2 points. Does anyone want to share any of the points you thought about, that you want to make sure the other students understand clearly? If anybody has a point from when we were walking, this would be the time to bring it up. This was a point I want to make, Facebook is about distribution.
Devin: I’m Devin. One of the points that I thought was important was the sense that people still don’t realize how much of their Facebook data is accessible when they give access to the application.
0:12:35
Andreas: One of the questions is do you worry more about your own data when you share with a Facebook application or do you worry more about your friends getting spammed? I’m not an economist, but there is this notion about capital, about social capital. I think capital is used if you want people to do stuff for you, right?
With your financial capital you pay someone and they fix a website or fix slides for you or make you an ice cream or bake bread for you. Social capital probably has a similar purpose in that it helps you get stuff done through other people. What’s the tradeoff here between financial capital and social capital? Can it be measured in the same currency? I think that’s another very interesting question of the Social Data Revolution; what is the unit of capital here. Does anybody else want to share something?
Jonathan: For me mobile social in real time is a very compelling real time application…
Andreas: I wish I had a white board where I could jot some stuff down. I haven’t used chalk for about 10 years. I learned in this room, actually. I was a grad student here, about the Cuban Missile Crisis, from Bernstein. In those days they still had chalk.
What I want to say are three things. The 1990’s were clearly search. Before the 1990’s, people had to tell people where something was. It’s amazing; you had to tell people on an FTP site where you could find something. Then the 1990’s came along, it was search again but social data - people publicly sharing links. PageRank was an important element for a good search. Allowing us to actually find stuff was the first big innovation, I remember, after my PhD.
The second one was clearly social in the 2000’s. Why? Because of the social filter that we now discover, through our friends. I had one question in Beijing, at [Xian Xua] that you will also get, and that was a question which came from a friend of mine in Singapore who said, “Imagine that Facebook was shutting down and all your data will be destroyed in 2 days. What would you do?” That’s a good question. Most of the people said “nothing”. But then I realized that Facebook is banned in China so I think it was a fair answer.
The notion of using the social graph to discover stuff is something which simply wasn’t around before and I believe what the third dimension is, for the 2010’s is clearly mobile. Mobile - what is the device which is always with us, talking to us every now and then? If you’re in a shop you would use RedLaser and it suddenly tells you what that item that sells for so much in [0:16:23] would actually cost at Amazon.com or, even better, you can take a picture and use the Amazon app and it tells you what the item is. That is very powerful.
I think search has certainly changed the universe. Social has changed the universe, and what we’re seeing here is we are just seeing the beginning of mobile. I haven’t gotten through all of your 80-something forms yet, so I’m surprised about the variety of mobile phones and how few people actually have iPhones.
0:16:54
Jonathan made the point of real time. What is real time for you? Right now, in the big history of the universe. I’m a physicist by training and time scales is something super important for us. When we talk real time, do we mean that we know where the attention of the world is right now? Or, do we mean that we’re talking about a day, a week? Product development cycles, if you think about a new logo, it used to be the timescale of months for companies. The packaging for toothpaste takes months to change. Amazon takes minutes to make such changes.
One of the things you’ll hear many times in class is the PHAME methodology. Does anybody know what the meaning of PHAME is? P stands for problem. H stands for hypothesis. A stands for action. M stands for metric.
E stands for experiment. This timescale aspect has been reduced from months to minutes, by pretty rigorously following this framework.
You start with what the problem is. An example is Amazon has a co-branded credit card. It makes the company about $60 million a year. The deal is this: if you sign up for this co-branded Chase-Amazon card, Amazon gets to keep $100 for each customer they get to Chase, and in addition to that, Chase gives them $30 which they pass on to the customer. The problem is how do we get most people to actually sign up for the card. Amazon doesn’t care whether they use them. There might be a second order effect that if they don’t use them then Chase reduces the bounty, but the current game is get as many people as possible to sign up for the card.
The hypotheses we had, Jeff Bezos and I discussed this back and forth. There were two reasonable competing hypotheses. One was we give them $30 towards their next purchase, and that means they will come back, repeat customers, it’s ingrained to think about Amazon all the time and the $30 voucher, etc. The other hypothesis was giving them $30 now; they’ll sign up now and forget about the future. What would you think would be the right thing for Amazon?
Student: $30 delayed would probably promote return customers and that’s really what you’d be hoping to do with a card, a loyalty program.
Andreas: Whatever you said, I would have said wrong, because we haven’t decided on what the metrics are. It totally depends on what it is you’re looking for. If you’re trying to get loyalty you do different things than if you’re trying to go for the quick buck. What was your point?
0:21:26
It’s likely that more people will sign up for it right now, but how many more? 10%? What is the tradeoff?
It’s a beautiful example where we ran the experiment with two different actions, which was just different wording in the two cases. Half of the customers, the ones who had an even customer ID, were shown the wording “You get $30 off today. You have $48 worth in your shopping cart. You get this for only $18.” The other customers were shown $30 on their next purchase. The difference was a factor of almost 5: almost 5 times more people signed up for the offer right now, as opposed to the delayed one. And if it is a factor of 5 even for a company that has a good reputation like Amazon.com, how bad would that be for a company that didn’t have as good a brand or as good a reputation as Amazon.com? That ends all arguments.
Another example that ended the argument was in about 2002-2003. Amazon.com was scanning in the Philippines, a lot of Filipinos scanning page after page, without telling the publishers anything about it. Basically, your computer got busy at night OCR-ing all these pages to our friends in the Philippines for scanning. Suddenly, there were a lot of scanned OCR books available at Amazon.com. What we saw a few years later at Google, the publishers said, “What about copyright?” We anticipated something like this so we had pairs of similar books, similar price, similar page rank, similar Amazon sales rank etc., and for one of them we actually surfaced the scanned version. For the other one, we didn’t surface the scanned version. For instance, if someone searched for one of my books, they wouldn’t see all the references in those books which were in the hold-out group. We measured what the difference was in sales. Sure enough, it was 7% more sales for those books that were scanned and indexed. That quickly ended the argument with the publishers.
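As a minimal sketch of the even/odd customer-ID split described above for the credit-card offer, the following Python assigns each customer to one of two wordings and compares the sign-up rates of the two arms. The function names, the arm labels "now" and "later", and the synthetic event data are assumptions made purely for illustration; they are not Amazon’s code and do not reproduce the numbers quoted in the lecture.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Counts for one arm of an A/B test."""
    shown: int = 0
    signed_up: int = 0

    @property
    def rate(self) -> float:
        # Sign-up rate; 0.0 when nobody has been shown this wording yet.
        return self.signed_up / self.shown if self.shown else 0.0


def assign_arm(customer_id: int) -> str:
    """Even customer IDs get the '$30 off today' wording, odd IDs the delayed one."""
    return "now" if customer_id % 2 == 0 else "later"


def run_experiment(events):
    """Tally (customer_id, signed_up) pairs into per-arm statistics."""
    stats = {"now": ArmStats(), "later": ArmStats()}
    for customer_id, signed_up in events:
        arm = stats[assign_arm(customer_id)]
        arm.shown += 1
        arm.signed_up += int(signed_up)
    return stats


def lift(stats) -> float:
    """Ratio of the 'now' sign-up rate to the 'later' sign-up rate (the metric)."""
    later_rate = stats["later"].rate
    return stats["now"].rate / later_rate if later_rate else float("inf")


# Synthetic, made-up events: 1,000 hypothetical customers with arbitrary
# sign-up patterns chosen only so that the two arms differ.
events = [
    (cid, (cid % 20 == 0) if cid % 2 == 0 else (cid % 50 == 1))
    for cid in range(1000)
]

stats = run_experiment(events)
print(f"now:   {stats['now'].rate:.1%} of {stats['now'].shown}")
print(f"later: {stats['later'].rate:.1%} of {stats['later'].shown}")
print(f"lift:  {lift(stats):.2f}x")
```

Splitting on ID parity is deterministic, so the same customer always sees the same wording without any assignment table having to be stored. The metric here is the ratio of sign-up rates; as the lecture stresses, a different metric (say, repeat purchases) could favor the other arm.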
This methodology is very much a physics or science-oriented methodology, which now with communication, not just that exists in the Social Data Revolution between people and people, or people and companies, but here from a company perspective it’s now possible. Experiments are what the game is. There, it’s not complicated multivariable testing, but just doing simple A/B testing, just do it. There were a couple of hands up.
Student: When you made the comment about 5 times as many people signed up when they got instant gratification, did you track how many returned within a week or a month or a year? Were they one-time-only shoppers or did they come back more and more?
Andreas: That’s a good point. It’s easy to measure short-term effects, and it’s quite hard to measure long-term effects because people might forget their customer name, ID, password, etc. I did some long-term studies, not on this one. We did one very good long-term study with a Stanford intern I had a few years ago, Jimmy [0:25:08 Pang] and I ran the first survey ad at Amazon.com.
Let me give you some background. [0:25:15], who is a VC here, asked me at a board meeting, “Why do people come to Amazon,” thinking I would do my data mining and then come back with beautiful results for the next board meeting. My view was very simple. I just ask people. One of the questions I snuck into a survey which I’ll give you a link for at the end of the meeting today, is what would you do if you could ask a million people on Facebook or at Amazon any question? What question would you ask them?
0:25:43
The question I asked them was “Why did you come to Amazon.com today?” Then I had another Stanford student, a former intern of mine, manually classify the first thousand, writing a little postscript on the side, which then did the remaining 900,000 automatically. Interesting findings came out of that. Correlating what they say with what they actually do is where the important information lies.
They say, “I’m going to buy…” a specific item. Only about a third of the people who say that end up buying within that session. A session is defined as within that hour. You have many people who say they want to buy something, they want to do something, yet they don’t get to do it. Now you look at what they actually did. It turned out that product search really sucked. You enter “blue pants” because you want to buy blue pants and it showed you books that were titled with “blue” or “pants,” which wasn’t exactly what you wanted. Jeff said, how about ten million, and you do some search companies [0:26:56].
On the other hand, the other interesting thing was if you conditioned the other way around, if you conditioned on people actually making a purchase, and then see what those guys said at the beginning of the session, it turned out that only half of them actually came with an intent to buy. Only one half of the people who actually buy something wanted to buy something. They said, “I’m just trying to kill some time,” or “wanted to see what’s new.” That’s powerful. That’s the power we’re talking about here, the power of recommendations.
I want to get through another few things here. The next point I’ve borrowed from Kevin Kelly, who I mentioned before. Kevin is wondering what this connection business is really about. What are we actually connecting here? Again, 20 years ago, the world was about connecting computers. I remember when I used Quicken. I was actually logging into Citibank. You remember trying to connect, and the modem was busy.
Eventually, I got through, so it was really a computer talking to a computer. Afterwards, it was connecting pages. The web was not about connecting computers primarily. It was truly about having the data about what objects, what pages were connected. What came afterwards? It was connecting people. This course is really trying to look at what is underlying all that. I would say that underlying it is data. That’s what we call social data.
I found an estimate that I made a long time ago, in The Economist, in the special issue that came out about 4 weeks ago, that I sent to people in the class. The amount of data each person creates doubles every 1.5 - 2 years. We take a certain day and the amount of data we create in a day, after 5 years, is about a factor of 10; in 10 years it’s a factor of 100; in 15 years it’s a factor of 1,000; a factor of [0:30:50 10,000] after 20 years.
This is the cover sheet of the file the Secret Police in East Germany had about me. After 1989 when East Germany fell, we could look at our stuff. There were people who were paid to actually collect information about other people. It was pretty thin. I was a grad student at the time, so there was not all that much there. Compare this to what we have now. There are a billion plus connected Flash players.
0:31:43
When I got that screen, allowing Amazon.com web service to access my camera and microphone, of course the answer is yes. Remember it. I want whoever is using web services here to know whether I’m smiling, whether I’m in a good mood, whether I look tired, whether I may be watching something with a friend.
I want to be able to actually pick up the quality of my voice because the voice tells a lot about the mood I’m in; then, serve me the relevant ads. What do you think about that?
Student: That sounds horrible. It seems like privacy …
Andreas: But your time is better spent that way. If you’re ready for a beer in the evening, assuming you’re 21 here, then it will show you a beer ad; whereas if you’re in class it will show you a coffee ad or something.
Student: Can I make my own decisions about what I’m going to drink?
Andreas: Totally, but what is the scarcest good we have? Our attention. Recommendation systems try to play to attention. The more data you can collect about someone’s state, the better you can serve their attention, the less time you waste. If I see you are slowly nodding away, then I know I’m not doing a super good job. There is actually this trick in class which I sometimes use when I actually have benches, that people have a cube in front of them. Green means yeah, you’re rolling. Red means I’m basically falling asleep, so it’s good to get that feedback.
Student: I think my attention is valuable to other people but my attention, myself, I prefer not to be distracted.
Andreas: Yes, the cost of interrupt, it’s vastly underestimated. Here, your attention is, you’re willing to give your attention to whatever is being shown to you. Then we might as well give you the highest ROA, return on attention.
I’m just making some points about data here and I’m very much looking forward to the discussion about privacy. It goes beyond privacy. It goes to the discussion about identity. Who are we? Is identity now socially constructed? Presentation of self, is it just what people show on Facebook? An interesting study has shown that people are pretty damn good at managing their profiles on Facebook. People really are presenting that “self” that they want to present. Whoever they are is a different story.
Another example was the world’s third-largest retailer called Metro Group. They’re in Europe, Asia. I don’t think they’re in the United States. They have this Future Store near Dusseldorf, Germany. Every single item has an RFID, a radio frequency identifier. It costs $0.02. It’s about as big as a grain of salt. As you’re going through the aisle and thinking about that cream cheese, “Maybe I should lose a couple of pounds,” and you go for the low-fat cottage cheese version. They just learned something about you. They learned that you’re slightly worried about your weight so as you walk down the aisles - by the way the prices are not fixed. The prices vary as you’re walking along. As you’re walking, more dieting items like Weight Watchers might have arrows pointing to them. I’m not making this up. That’s the Future Store. The arrows pointing to them might say, “Weight Watchers is for you” right now because you just exhibited that maybe that date last night didn’t go so well because the person thought you had a couple of pounds too many. It got them the Big Brother Award, not an award you necessarily want.
0:36:06
Here is another one. I love geolocation so if you’re willing to put a device like that in your car, then you’re only charged when you drive. That’s why it’s called Pay As You Drive car insurance. I live in San Francisco. I was out of town the last month and my car was in the garage. No insurance. What do you think would be the risks here? How would you actually compute how much insurance I should pay driving to Stanford, for instance?
Student: It depends on your driving style.
Andreas: One might be my driving style.
Student: Speed, velocity, what roads you’re taking.
Andreas: Maybe it also knows when my class starts. Of course, it knows because it knows my Outlook file or my Google calendar, and it knows that I’m pretty damn late so in this case I might be speeding so that would be $40 to actually insure the car; whereas after class I take it easy; unless of course, it’s after dinner at a restaurant which is known for its good wines, where the risk might be going up again. If you can change behavior by telling people this is what it is, then maybe they’ll take [0:37:38] instead.
This is unfortunately no longer in business, or fortunately, depending on your view. There were a couple of cases where it’s like a black box in airplane crashes. After the fatal crashes, the insurance company looked at the boxes and the dude was speeding, like on the German Autobahn, 200 km an hour. You don’t want to pay for somebody who has clearly broken the law. It’s not my idea of insurance.
Another example I got from The Economist, and I don’t know much myself about the example, was a woman 16 or 18 on the East Coast who was anorexic. Apparently she was pretty bad so she had to go to the hospital. Apparently there are two kinds of anorexia. In one case, the insurance company needs to pay. In the other case, the insurance doesn’t need to pay. The moment the girl ended up in the hospital, the insurance company checked out her Facebook profile, became friends with her friends, the same on MySpace, and then made a case against her so they didn’t have to pay. For the insurance company and their lawyers, it was clearly a case of the second kind of anorexia. What do you think about that? How do you know the next guy who is trying to be your friend on Facebook isn’t a cop who wants to check out what the pot situation is on campus, or something? We don’t know.
Timescales are the potpourri of ingredients for the class. Timescale of data and technology is one year. Social norms may be 10 years. Biology, I would say, may be 100 thousand years. Let me ask you, what do these two have in common? 99% of their DNA overlaps.
Somebody brought up before the point about cost of interrupt being pretty high. For the last 100 thousand years, we got used to if there is a little bit of information, run [0:40:08]. If there is a little bit of sugar, grab it. If there is a little bit of salt, grab it because you never know when the next [0:40:15] of sugar or fat comes along.
0:40:20
That reflex we have, where if something changes, a shining object, to go after it, is actually a reflex that might not serve us all that well. Brian [0:40:30], who was a grad student with me and is teaching in the Psych Department, has some wonderful studies on fMRI, functional magnetic resonance imaging, where he basically shows that how we react to an ad, meaning what do we buy weeks down the road, is happening in the first second. He knows from the activations how our brain is firing, how the oxygen content is distributed there. He knows whether we’re buying Pepsi or Coke when we go to the store weeks later. It’s interesting how much is really in the biology, what we can’t change, and how much is learnable.
I have two more slides here. This one I stole from Mark Zuckerberg, not exactly but sort of in spirit. We have some EEs (double E’s). I am personally a big fan of EEs. I started off in EE myself. I learned about channels and [0:41:36] and all that stuff.
You learn that the purpose of a communication channel, these modems, is to transmit information. Zuckerberg said, unfortunately, you got it wrong. It is really that information is an excuse for communication. If you think about it, there is some truth to it. Many of the apps we see are just good examples of excuses to hit somebody up, to just make sure we’re not alone. On YouTube, that means 20 hours of video are uploaded every minute. That’s social data, shared data. One billion videos are watched every day. That’s about a seventh of the world population. That’s a high number. That’s shared data. Twitter, what do you think here? How long does it take to create a billion tweets? A minute, a day, a year, a decade? Right now it’s about a month. What we all have in common is we have Web 0, which is computers. Web 1.0 is pages. Web 2.0 is people. We have all of this due to the ability to very cheaply create data and to share data. I have a couple of blog posts that Eric Sun, who is now at Facebook, wrote for me in class about 2 years ago. He’s a brilliant guy. He thought about what the effect of all this is, what we should do for communication media now, given that the world has changed, and [0:43:34] have become cheap.
I have a couple of things in terms of housekeeping URLs before I jump to whatever questions you have for the last 10 minutes. The first question I have for you, and I just made the slide today: who is this guy? You probably don’t know him, but his name is [0:44:13] Young. He created an amazing survey for us, fresh off the press. Triptia checked it out after she got back from India this morning.
It’s at bitly/sdr2010. It should take you about 45 minutes, and that’s the first assignment. I hope you will genuinely enjoy those questions. The guy is creative. I told you about one of them, how would your life change in China with Facebook, and the answer was nothing. We also have a couple of visitors coming next week. Norbert Schwarz is a social psychology professor in Michigan. He’s visiting at the Center of Behavioral Studies, and he is the best survey guy in the world. Norbert and I worked together in a startup called [0:45:10] 8 years ago. When he saw that survey, I asked him for feedback. He said he is coming and sitting in the class, which is pretty cool. 0:45:20 Write the URL down if you can’t remember it. It’s bitly - the guy who is coming on April 20th - bitly/sdr2010. SDR stands for Social Data Revolution. It’s case sensitive. 2010 is this year. That’s what I want you to do before Thursday. It’s about 45 minutes, to an hour maximum. Why do I want you to do this? Two things. First, I want you to think about some of these issues: how society, individuals, and businesses are changing with respect to the Social Data Revolution. Second, we are creating interesting data. The Dean of the University in Berlin will run it there, and [0:46:02] is running it in Singapore, so it will be quite interesting to see the cultural differences; not huge sample sizes, but still 50-100 smart kids.
We have a couple of resources. This is my email address. I made a Gmail address, which is [email protected]. Our TA situation isn’t yet totally stable. We have Triptia, but we have some money for more people, so whoever thinks they have the skills and some extra time to deal with the projects, mainly of other students, come see me after class today. Right now, MS&E 237 is the only thing you need to know when you want to reach TAs or course assistants. Is this a global, campus-wide shift from TA to CA? Or is it that Stats and Humanities courses say TA and MS&E calls it CA?
They call it CA, I’ve learned. Then stanford2010.wikispaces.com is probably, for the moment, the wiki that I will use. Who is the person who emailed me last night that he has set up a wiki, wiki media for his company? Did he make it to class, or did we lose him on the way? My plan is to migrate it somewhere else where we have more control. I know the Wikispaces people are good people, but what I’m planning is to have every single class transcribed, so you’ll get a transcript of every single class after a couple of days. Then it will be great if we can simply annotate this. I don’t know yet whether this software is something we can easily get installed or not, so for the time being this is where we’ll update stuff. I’ll let you know there if we move it somewhere else. We did lose a bit of time at the beginning. We have 7 minutes left to answer questions you might have.
Student: The materials you mentioned … whether you think we can still be successful without ….?
Andreas: It is somewhat of an iterative process. The main thing in class is your group project. If you’re not a strong programmer, then look around and find someone who looks like a strong programmer. Make friends with them. I know that undergrad computer science students are in high demand in this class, so grab them while they last. I personally think that having people from GSB (we have 16 GSB students), undergrads, MS&E grad students, and computer grads is actually a super great mix of people. I’m actually happy about this. If you don’t have the skill, then find somebody in the group who has it.
I was thinking about trying to assign groups. I don’t know the interpersonal preferences between people, so keep in mind that in the next class, and maybe Tuesday next week, you really need to find some people. Maybe we should take a break in the middle of the next class. It’s too big to do an introduction for everybody. 0:49:48 I was thinking whoever is not comfortable with me putting what they sent to me yesterday out in some form, maybe password protected for the class, should come see me. That’s where you said what you’re interested in, what you’re good at, what you’re known for. If somebody is not comfortable with me putting out what they sent me, then let me know. Otherwise, that’s probably one good way to find people: when you have nothing better to do, you might as well look through there.
I forgot one other resource, which is facebook.com/socialdatarevolution, the Facebook fan page where people post stuff that is interesting for the class. Sorry about not having a unique name here, but things grow over time. I also would like to introduce some friends we have here from Intuit. Intuit is giving us some money so we can get another TA. Angus, do you want to say a couple of words about why you’re here, about what you did before, that this school isn’t alien to you?
I’m working with another couple of companies. It’s not that easy to get people to share data with you, but two years ago we had Friends For Sale, where on Facebook you get a pet and you sell the pet, etc. They shared all their data with us, which was very interesting. We found out that Stanford students stay up later than Harvard students, just by looking at time-of-day access and stuff like that. We always have a couple of companies who are interested in playing with us. What you do in your project is really up to you. Find people you want to work with. I will show you the structure of the quarter on Thursday, when the first proposals are due.
I’ll need to know who is in the group with whom, and then when we need to pitch. At the very last class, I’m inviting a few VC friends of mine who will be sitting there giving us their opinions. If they really like it, maybe even more than their opinions. That’s the first concrete case. Angus will be coming to a good bunch of the classes, and so will his colleagues. Maybe you’ll introduce yourself, too, because I knew you as the CTO of BooRah, which was a social media sentiment analysis company.
All I can invite you to do is to think. It really is different because this is about the world we live in, which is really changing as we speak here. It is not that we’ll clearly have “these are the take-home messages”. In some cases, we have to create them together, particularly when it comes to issues like privacy. There is no “this is the way it ought to be.” It is very much shifting as we speak.
It is 5:30. I apologize for the move. It was not our fault here. We lost a bit of time but we ended on time. I will let you know by sending out email. One quick question: whoever did not get an email from me in the last 48 hours, please see Triptia. I hope that by Thursday we have things ironed out. Those people who are interested in potentially doing some TA work for me, or have some ideas about annotating pages, please come see me right away and we’ll try to settle those things today. Thank you very much. See you Thursday.