MS&E237
Spring 2010
Stanford University
Andreas S. Weigend, Ph.D.
The Social Data Revolution:
Data Mining and Electronic Business
Andreas Weigend (www.weigend.com)
The Social Data Revolution: Data Mining and Electronic Business:
MS&E 237, Stanford University
March 29, 2010
Class 1 Overview:
This transcript:
http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.mp3
To see the whole series: Containing folder:
http://weigend.com/files/teaching/stanford/2010/recordings/audio/
Course Wiki:
http://stanford2010.wikispaces.com
Transcript - Tamara Bentzur - Testimonials – www.tbentzur.wordpress.com, www.outsourcestranscriptionservices.com
http://weigend.com/files/teaching/stanford/2010/recordings/audio/weigend_stanford2010_1overview_2010.03.29.doc
Andreas:
Welcome to MS&E 237, the Social Data Revolution. This term was coined a year and a
week ago when we were thinking about what we should call this phenomenon, that
people now knowingly and willingly share all kinds of data about themselves, about their
relationship with others, data the KGB wouldn’t have gotten out of them under torture.
If you think back in history, I think we could say there were three revolutions. The
first revolution was enabled by people being able to transport energy. That was
the Industrial Revolution. What happened there is that people were independent from
where the coal was, or where the water was, and they could basically start manufacturing
wherever they wanted.
Then people learned how to transport bits. That meant the knowledge creation
was independent from where the knowledge production was, and that of course
gave rise to the Information Revolution.
What has happened in the last few years, I think, is that in addition to being able to
communicate over a distance, to move the bits, the creation and sharing of those
bits has become very, very easy. For instance, last Tuesday, there was a group here
at Stanford called Quantified Self. A friend of mine, Kevin Kelly, who was the founder of
Wired magazine, runs that group. It’s amazing how many devices some people carry
with them, where they record all kinds of things. For instance, the Fitbit is a box which
you carry with you and it tells you whether you sleep enough, most of us probably
don’t; whether you walk enough, and exercise enough, probably most of us don’t
either. The question is why do we need such a device? Ultimately, it’s about
changing behavior.
What really happened with the Social Data Revolution is that it’s not only that the
technology is there, but people do things differently now, and not only young people.
I talked to Danny Kahneman, the Nobel Prize winner in 2002 for Behavioral Economics.
He was a teacher at Princeton then. He said, “Andreas, it’s amazing that what people
now have when they apply for a junior faculty position is more than I needed when I got
tenure.” He said, “It’s just because of this people sharing stuff.”
I call it “glue programmers” versus “industrial strength programmers”. I’m a glue
programmer. I take building blocks, put them together, and hope it works. Physicists are
all glue programmers because we want stuff to work EE (double E) and I know that some
of you are people who don’t worry about the deep underlying structure as long as we
have something which works, as long as we need it to work and if it breaks afterwards it
breaks.
What I’m doing in this class, this is the first time that we’re teaching it, here as MS&E
237. I want to have it much more project focused than I had the previous one, which was
STATS 252, which I’ve taught for 6 or 7 years. Who here has heard about STATS 252,
which I used to teach? It’s about a quarter of the people. It was more problem-set
focused, where we had to build a recommender system in Delicious using Python and
stuff like that. This one is much different in that at the end of the quarter I want 10-12
groups which all have pretty much startup quality ideas in the area of people and data, of
social data, of shared data.
0:04:18
In order to get there, I’m also bringing some cool people in. For instance, on April 20, we
have the chief product guy of Bitly come to class, and tell us what they can figure out by
Bitly because it’s basically the measurement of the world’s attention right now. Who here
knows Bitly? That is great. We’re rolling.
This is Triptia, by the way. Triptia just came back from India today, fresh off the plane.
She is our TA and, as some of you know, I’ve talked to some other
people because, given the class size, we actually have some more resources. She
doesn’t have to worry about 80 people, but only around 40 or so.
Foursquare, geolocation and here - “Hang on while I check into Foursquare, Yelp,
Google, MyTown, MySpace, Facebook, and Twitter.” That’s an example of sharing
data and two years ago we had a really interesting project where I put the students in
class, and I was wearing one of these devices that recorded my location every 15
minutes. I didn’t turn it off. I told them, “You come up with some interesting insights.”
That is one of the other areas we want to have, geolocation.
We had Foursquare coming, Bitly coming. I have Auren Hoffman who runs a company
called Rapleaf. Does anybody know what Rapleaf does? Isn’t it cool that they sort of spy
on your social graph? Maybe you have an insurance claim and they say to the
insurance company, “Let me check out how her friends are, because friends of a
feather steal together. It turns out that her friends are kind of shady. Maybe you should
invest a little bit more resources into investigating that claim, as opposed to her
neighbor’s claim, who of course has super clean friends and no worries about it.”
You think that’s the future? No, that’s actually practiced. I worked with an insurance
company in Chicago that actually uses Rapleaf. For those of you in MS&E who think of
decisions as irrevocable commitments of resources: they think precisely about what the
cost is of collecting data for that claim and how much they can learn from the social graph
for that.
We have a bunch of guest speakers, but it’s not a speaker series. I tortured these people
well enough before class, so they only say what I want them to say, basically. That is the
difference from last year, which was less project emphasis and slightly more of an
emphasis on problem sets. We talked about the change of behavior in people, how that
has changed.
The plan for today is I want to give you about a half hour of the material we will talk
about. It’s more like a lecture. If you have questions, do interrupt. Then I want to see
what questions you have and I want to give you an opportunity, if some of you have great
ideas, please briefly speak up. Class ends at 5:30. I’m a big fan of starting on time and I
apologize for the walk today. I’m a big fan of ending on time. Do we need a break? It’s
not clear for a 75 minute class, or given that we had a walk, can we go straight through to
5:30 today? Is that okay?
I want to make sure I leave enough time for you to ask questions about logistics or
whatever you want to know. Not everything has been figured out yet. I just came back
from Beijing myself. I did the course at [0:08:31 Xien Xua], a 4-day course on this stuff,
so by Thursday we’ll have it figured out.
0:08:38
That’s the Social Data Revolution. You can find me relatively easily at
www.weigend.com. As I give you examples, I want to show you where I had invested
something, where I sort of figured out how something should be, and then how the world
has changed.
For instance here is my simple home page. When I created this page, I thought it was
pretty cool a few years ago, to have a simple box like this where people can simply put in
their name or their friend’s name and some email address, and then just hit return and
reach me that way.
The world has changed. How? If you think about Facebook Connect to contrast
this, for instance on my blog, what is the difference between leaving a comment on
Facebook Connect versus having a nice box? Both are boxes, a simple
implementation. What is the deep difference between those two things, in terms of
social data?
Jess:
I was going to say whether or not all of your Facebook friends can see what your
comment is and what blog you’re…
Andreas:
Right, so if you say your name as well, then I have a chance to actually learn it. What is
your major?
Jess:
Economics
Andreas:
Okay, Jess’s point is that in Facebook your friends can see it and that actually is a
very powerful driver. Frankly, who is interested in just me seeing some random
thing they’re putting on there? Very few people are. Why do you think people like
to actually have their friends see what they’re commenting on? They might appear
smart. They may be “self-pimping”, just to create a reason. One of the survey
results from last year was that most people give attention to get attention.
Harvard Business Review had an article about 11 months ago on the Social Data
Revolution and it’s really interesting to see that most of the articles, most of the
comments people left were “I will give attention but please give me some love
back.” That’s another element of the course, to constantly think about what has really
changed and why is it that the world is a different place now from what it was a few years
ago, when it was just pretty cool not to only have an email address but to have a little box
where people could write stuff.
While we walked, I deflated this from 3 points to 2 points. Does anyone want to share
any of the points you thought about, that you want to make sure the other students
understand clearly? If anybody has a point from when we were walking, this would be
the time to bring it up. This was a point I want to make, Facebook is about
distribution.
Devin:
I’m Devin. One of the points that I thought was important was the sense that people still
don’t realize how much of their Facebook data is accessible when they give access to the
application.
0:12:35
Andreas:
One of the questions is do you worry more about your own data when you share
with a Facebook application or do you worry more about your friends getting
spammed? I’m not an economist, but there is this notion about capital, about
social capital. I think capital is used if you want people to do stuff for you, right? With
your financial capital you pay someone and they fix a website or fix slides for you or
make you an ice cream or bake bread for you. Social capital probably has a similar
purpose in that it helps you get stuff done through other people.
What’s the tradeoff here between financial capital and social capital? Can it be
measured in the same currency? I think that’s another very interesting question of
the Social Data Revolution; what is the unit of capital here. Does anybody else want
to share something?
Jonathan:
For me mobile social in real time is a very compelling real time application…
Andreas:
I wish I had a white board where I could jot some stuff down. I haven’t used chalk for
about 10 years. I actually learned in this room, when I was a grad student here, about the
Cuban Missile Crisis, from Bernstein. In those days they still had chalk. What I want to
say are three things.
The 1990’s were clearly search. Before the 1990’s, people had to tell people where
something was. It’s amazing; you had to tell people on which FTP site they
could find something. Then the 1990’s came along; it was search, but again it was social
data: people publicly sharing links. PageRank was an important element for a
good search. Allowing us to actually find stuff was the first big innovation, I
remember, after my PhD.
The second one was clearly social in the 2000’s. Why? Because of the social filter
that we now discover, through our friends. I had one question in Beijing, at [Xian
Xua] that you will also get, and that was a question which came from a friend of mine in
Singapore who said, “Imagine that Facebook was shutting down and all your data will be
destroyed in 2 days. What would you do?” That’s a good question.
Most of the people said “nothing”. But then I realized that Facebook is banned in China
so I think it was a fair answer. The notion of using the social graph to discover stuff is
something which simply wasn’t around before, and I believe the third dimension,
for the 2010’s, is clearly mobile.
Mobile - what is the device which is always with us, talking to us every now and then? If
you’re in a shop you would use RedLaser and it suddenly tells you what that item that
sells for so much in [0:16:23] would actually cost at Amazon.com or even better; you can
take a picture and use the Amazon app and it tells you what the item is. That is very
powerful.
I think search has certainly changed the universe. Social has changed the
universe, and what we’re seeing here is we are just seeing the beginning of mobile.
I haven’t gotten through all of your 80-something forms yet, so I’m surprised about the
variety of mobile phones and how few people actually have iPhones.
0:16:54
Jonathan made the point of real time. What is real time for you? Right now, in the
big history of the universe. I’m a physicist by training and time scales are
something super important for us. When we talk real time, do we mean that we
know where the attention of the world is right now? Or, do we mean that we’re
talking about a day, a week?
Product development cycles, if you think about a new logo, it used to be the
timescale of months for companies. The packaging for toothpaste takes months to
change. Amazon takes minutes to make such changes. One of the things you’ll
hear many times in class is the PHAME methodology. Does anybody know what the
meaning of PHAME is?
P stands for problem.
H stands for hypothesis.
A stands for action.
M stands for metric.
E stands for experiment.
This timescale aspect has been reduced from months to minutes, by pretty
rigorously following this framework. You start with what the problem is. An
example is Amazon has a co-branded credit card. It makes the company about $60
million a year. The deal is this; if you sign up for this co-branded Chase-Amazon
card, Amazon gets to keep $100 for each customer they get to Chase, and in
addition to that, Chase gives them $30 which they pass on to the customer.
The problem is how do we get the most people to actually sign up for the card.
Amazon doesn’t care whether they use them. There might be a second order effect that
if they don’t use them then Chase reduces the bounty, but the current game is get as
many people as possible to sign up for the card.
The hypotheses we had, Jeff Bezos and I discussed this back and forth. There
were two reasonable competing hypotheses. One was we give them $30 towards
their next purchase, and that means they will come back as repeat customers; it becomes
ingrained to think about Amazon all the time with the $30 voucher, etc. The other
hypothesis was giving them $30 now; they’ll sign up now and forget about the
future. What would you think would be the right thing for Amazon?
Student:
$30 delayed would probably promote return customers and that’s really what you’d be
hoping to do with a card, a loyalty program.
Andreas:
Whatever you said, I would have said it was wrong, because we haven’t decided on what
the metrics are. It totally depends on what it is you’re looking for. If you’re trying
to get loyalty you do different things than if you’re trying to go for the quick buck.
What was your point?
0:21:26
It’s likely that more people will sign up for it right now, but how many more? 10%?
What is the tradeoff? It’s a beautiful example where we ran the experiment where
we had two different actions which was just different wording in the two cases, and
we ran the experiment where half of the customers, the ones who had an even
customer ID, were shown the wording “You get $30 off today. You have $48 worth
in your shopping cart. You get this for only $18.” The other customers were
shown $30 on their next purchase.
It was a difference of almost a factor of 5: almost 5 times more people signed up for the
immediate discount than for the delayed one. And that is for a company with a good reputation like
Amazon.com. With a factor of 5 there, how bad would it be for a company that didn’t have as
good a brand or as good a reputation as Amazon.com? That ends all arguments.
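The even/odd split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Amazon's code: the function names, the variant labels, and the toy event log are all made up for the example, and the numbers are not the experiment's data.

```python
from collections import Counter

def assign_variant(customer_id: int) -> str:
    """Even customer IDs see the immediate offer, odd IDs the delayed one."""
    return "discount_today" if customer_id % 2 == 0 else "discount_next_purchase"

def signup_rates(events):
    """events: iterable of (customer_id, signed_up) pairs -> sign-up rate per variant."""
    shown = Counter()
    signed = Counter()
    for customer_id, signed_up in events:
        variant = assign_variant(customer_id)
        shown[variant] += 1
        if signed_up:
            signed[variant] += 1
    return {variant: signed[variant] / shown[variant] for variant in shown}

# Hypothetical toy log (customer_id, signed_up); NOT real data.
toy_events = [(2, True), (4, True), (6, False), (8, True), (10, True),
              (1, False), (3, False), (5, True), (7, False), (9, False)]

rates = signup_rates(toy_events)
ratio = rates["discount_today"] / rates["discount_next_purchase"]
print(rates)
print(f"immediate vs. delayed sign-up ratio: {ratio:.1f}")
```

Splitting on the parity of the customer ID keeps the assignment stable: the same customer always sees the same wording, so the metric (sign-up rate per variant) can be compared directly.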
Another example that ended the argument: in about 2002-2003, Amazon.com
was scanning books in the Philippines, with a lot of Filipinos scanning page after page, without
telling the publishers anything about it. Basically, the computers got busy at night
OCR-ing all these pages scanned by our friends in the Philippines. Suddenly,
there were a lot of scanned OCR books available at Amazon.com.
What we saw a few years later at Google was the publishers saying, “What about
copyright?” We anticipated something like this, so we had pairs of similar books,
similar price, similar page rank, similar Amazon sales rank etc., and for one of them
we actually surfaced the scanned version. For the other one, we didn’t surface the
scanned version. For instance, if someone searched for one of my books, they
wouldn’t see all the references in those books which were in the hold-out group.
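The matched-pair design just described (one book of each pair searchable, its twin held out) can be summarized with a simple relative lift. The sketch below is only an illustration: the function name and the sales figures are invented for the example and are not the numbers from the Amazon experiment.

```python
def relative_lift(pairs):
    """pairs: list of (sales_with_scanned_text, sales_of_holdout_twin).

    Returns the relative difference in total sales between the books whose
    scanned text was surfaced and their matched hold-out books.
    """
    treated = sum(t for t, _ in pairs)
    holdout = sum(h for _, h in pairs)
    return (treated - holdout) / holdout

# Hypothetical sales counts for three matched pairs; NOT real data.
toy_pairs = [(110, 100), (52, 50), (215, 200)]

lift = relative_lift(toy_pairs)
print(f"relative lift in sales: {lift:.1%}")
```

Pairing books by price and sales rank is what makes the comparison fair: each hold-out book stands in for what its twin would have sold without the scanned text.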
We measured what the difference was in sales. Sure enough, it was 7% more sales
for those books that were scanned and indexed. That quickly ended the argument
with the publishers. This methodology is very much a physics or science-oriented
methodology, which is now possible because of the communication that exists in the Social Data
Revolution, not just between people and people, or people and companies, but here from a
company perspective. Experiments are what the game is. There, it’s
not complicated multivariable testing, but just doing simple A/B testing; just do it. There
were a couple of hands up.
Student:
When you made the comment about 5 times as many people signed up when they
got instant gratification, did you track how many returned within a week or a month
or a year? Were they one-time-only shoppers or did they come back more and
more?
Andreas:
That’s a good point. It’s easy to measure short-term effects, and it’s quite hard to
measure long-term effects because people might forget their customer name, ID,
password, etc. I did some long-term studies, not on this one. We did one very good
long-term study with a Stanford intern I had a few years ago, Jimmy [0:25:08 Pang] and
I ran the first survey ad at Amazon.com. Let me give you some background.
[0:25:15], who is a VC here, asked me at a board meeting, “Why do people come to
Amazon,” thinking I would do my data mining and then come back with beautiful
results for the next board meeting. My view was very simple. I just ask people.
One of the questions I snuck into a survey which I’ll give you a link for at the end
of the meeting today, is what would you do if you could ask a million people on
Facebook or at Amazon any question? What question would you ask them?
0:25:43
The question I asked them was “Why did you come to Amazon.com today?” Then
I had another Stanford student, a former intern of mine, manually classify the first
thousand, writing a little postscript on the side, which then did the remaining
900,000 automatically. Interesting findings came out of that.
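The workflow described here, hand-labeling a small sample and letting a script label the rest, can be sketched as follows. The lecture does not say how the original script worked, so the word-overlap scoring, the category names, and the sample answers below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(labeled):
    """labeled: list of (answer_text, category). Count words seen per category."""
    word_counts = defaultdict(Counter)
    for text, category in labeled:
        word_counts[category].update(text.lower().split())
    return word_counts

def classify(text, word_counts):
    """Pick the category whose hand-labeled answers share the most words with text."""
    words = text.lower().split()
    scores = {category: sum(counts[w] for w in words)
              for category, counts in word_counts.items()}
    return max(scores, key=scores.get)

# Hypothetical hand-labeled survey answers; NOT the real survey data.
hand_labeled = [
    ("buy a book for class", "buy"),
    ("buy blue pants", "buy"),
    ("kill some time", "browse"),
    ("see what is new", "browse"),
]
model = train(hand_labeled)

# The remaining, unlabeled answers are then classified automatically.
for answer in ["want to buy pants", "just trying to kill time"]:
    print(answer, "->", classify(answer, model))
```

A real version would need a much larger labeled sample and some handling of answers that match no category; this sketch only shows the shape of the idea.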
Correlating what they say with what they actually do is where the important
information lies. They say, “I’m going to buy…” a specific item. Only about a third
of the people who say that, end up buying within that session. A session is
defined as within that hour.
You have many people who say they want to buy something, they want to do
something, yet they don’t get to do it. Now you look at what they actually did. It
turned out that product search really sucked. You enter “blue pants” because you
want to buy blue pants and it showed you books that were titled with “blue” or
“pants,” which wasn’t exactly what you wanted.
Jeff said, how about ten million, and you do some search companies [0:26:56]. On
the other hand, the other interesting thing was if you conditioned the other way
around, if you conditioned on people actually making a purchase, and then see
what those guys said at the beginning of the session, it turned out that only half of
them actually came with an intent to buy. Only one half of the people who actually
buy something wanted to buy something. They said, “I’m just trying to kill some
time,” or “wanted to see what’s new.” That’s powerful. That’s the power we’re
talking about here, the power of recommendations.
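The two ways of conditioning described above (purchases among people who stated an intent to buy, and stated intent among people who purchased) can be computed from the same session log. This is a minimal sketch: the function name and the toy sessions are assumptions, and the toy data is constructed only to mirror the rough proportions mentioned in the lecture.

```python
def conditional_rates(sessions):
    """sessions: list of (stated_intent_to_buy, purchased) booleans.

    Returns (P(purchased | stated intent), P(stated intent | purchased)).
    """
    with_intent = [s for s in sessions if s[0]]
    buyers = [s for s in sessions if s[1]]
    p_buy_given_intent = sum(1 for s in with_intent if s[1]) / len(with_intent)
    p_intent_given_buy = sum(1 for s in buyers if s[0]) / len(buyers)
    return p_buy_given_intent, p_intent_given_buy

# Hypothetical sessions, NOT real data: (stated_intent_to_buy, purchased)
toy_sessions = (
    [(True, True)] * 2      # said "buy" and bought
    + [(True, False)] * 4   # said "buy" but did not
    + [(False, True)] * 2   # came to browse, bought anyway
    + [(False, False)] * 4  # came to browse, did not buy
)

p_buy_given_intent, p_intent_given_buy = conditional_rates(toy_sessions)
print(f"P(buy | intent) = {p_buy_given_intent:.2f}")
print(f"P(intent | buy) = {p_intent_given_buy:.2f}")
```

The two numbers answer different questions: the first points at friction (people who wanted to buy but could not), the second at purchases that came from discovery rather than prior intent.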
I want to get through another few things here. The next point I’ve borrowed from
Kevin Kelly, who I mentioned before. Kevin is wondering what this connection
business is really about. What are we actually connecting here? Again, 20 years
ago, the world was about connecting computers. I remember when I used Quicken.
I was actually logging into Citibank. You remember trying to connect, and the modem
was busy. Eventually, I got through, so it was really a computer talking to a computer.
Afterwards, it was connecting pages. The web was not about connecting
computers primarily. It was truly about having the data about what objects, what
pages were connected. What came afterwards? It was connecting people.
This course is really trying to look at what is underlying all that. I would say that
underlying it is data. That’s what we call social data. I found an estimate that I made
a long time ago, in The Economist, in the special issue that came out about 4 weeks ago,
that I sent to people in the class. The amount of data each person creates doubles
every 1.5 - 2 years. We take a certain day and the amount of data we create in a
day; after 5 years, it is about a factor of 10; in 10 years it’s a factor of 100; in 15 years
it’s a factor of 1,000; a factor of [0:30:50 10,000] after 20 years.
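The growth factors follow from the doubling time alone: after t years with a doubling time of d years, the factor is 2^(t/d). A quick check in Python, assuming the shorter doubling time of 1.5 years (the function name is made up for the example):

```python
def growth_factor(years: float, doubling_time_years: float = 1.5) -> float:
    """Factor by which a quantity grows if it doubles every doubling_time_years."""
    return 2 ** (years / doubling_time_years)

for years in (5, 10, 15, 20):
    print(f"after {years:2d} years: factor of about {growth_factor(years):,.0f}")
```

With a 1.5-year doubling time, every 5 years adds roughly one factor of 10. With a 2-year doubling time the growth is noticeably slower (about a factor of 32 after 10 years), so the round numbers in the lecture correspond to the faster end of the range.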
This is the cover sheet of the file the Secret Police in East Germany had about me. After 1989
when East Germany fell, we could look at our stuff. There were people who were paid to
actually collect information about other people. It was pretty thin. I was a grad student at
the time, so there was not all that much there. Compare this to what we have now.
There are a billion plus connected Flash players.
0:31:43
When I got that screen, allowing Amazon.com web service to access my camera
and microphone, of course the answer is yes. Remember it. I want whoever is
using web services here to know whether I’m smiling, whether I’m in a good mood,
whether I look tired, whether I may be watching something with a friend. I want to
be able to actually pick up the quality of my voice because the voice tells a lot
about the mood I’m in; then, serve me the relevant ads. What do you think about
that?
Student:
That sounds horrible. It seems like privacy …
Andreas:
But your time is better spent that way. If you’re ready for a beer in the evening,
assuming you’re 21 here, then it will show you a beer ad; whereas if you’re in class
it will show you a coffee ad or something.
Student:
Can I make my own decisions about what I’m going to drink?
Andreas:
Totally, but what is the scarcest good we have? Our attention. Recommendation
systems try to play to attention. The more data you can collect about someone’s
state, the better you can serve their attention, the less time you waste. If I see you
are slowly nodding away, then I know I’m not doing a super good job. There is actually
this trick in class which I sometimes use when I actually have benches, that people have
a cube in front of them. Green means yeah, you’re rolling. Red means I’m basically
falling asleep, so it’s good to get that feedback.
Student:
I think my attention is valuable to other people but my attention, myself, I prefer not to be
distracted.
Andreas:
Yes, the cost of interrupt; it’s vastly underestimated. Here, you’re
willing to give your attention to whatever is being shown to you. Then we
might as well give you the highest ROA, return on attention.
I’m just making some points about data here and I’m very much looking forward to
the discussion about privacy. It goes beyond privacy. It goes to the discussion
about identity. Who are we? Is identity now socially constructed? Presentation of
self, is it just what people show on Facebook? An interesting study has shown that
people are pretty damn good at managing their profiles on Facebook. People really are
presenting that “self” that they want to present. Whoever they are is a different story.
Another example was the world’s third-largest retailer called Metro Group. They’re
in Europe, Asia. I don’t think they’re in the United States. They have this Future
Store near Dusseldorf, Germany. Every single item has an RFID tag, a radio frequency
identifier. It costs $0.02. It’s about as big as a grain of salt.
As you’re going through the aisle and thinking about that cream cheese, “Maybe I
should lose a couple of pounds,” and you go for the low-fat cottage cheese
version. They just learned something about you. They learned that you’re slightly
worried about your weight so as you walk down the aisles - by the way the prices
are not fixed. The prices vary as you’re walking along. As you’re walking, more
dieting items like Weight Watchers might have arrows pointing to them. I’m not
making this up. That’s the Future Store. The arrows pointing to them might say, “Weight
Watchers is for you” right now because you just exhibited that maybe that date last night
didn’t go so well because the person thought you had a couple of pounds too many. It
got them the Big Brother Award, not an award you necessarily want.
0:36:06
Here is another one. I love geolocation so if you’re willing to put a device like that in your
car, then you’re only charged when you drive. That’s why it’s called Pay As You Drive
car insurance. I live in San Francisco. I was out of town the last month and my car
was in the garage. No insurance. What do you think would be the risks here?
How would you actually compute how much insurance I should pay driving to
Stanford, for instance?
Student:
It depends on your driving style.
Andreas:
One might be my driving style.
Student:
Speed, velocity, what roads you’re taking.
Andreas:
Maybe it also knows that when I’m starting, it knows when my class starts. Of
course, it knows because it knows my Outlook file or my Google calendar, and it
knows that I’m pretty damn late so in this case I might be speeding so that would
be $40 to actually insure the car; whereas after class I take it easy; unless of
course, it’s after dinner at a restaurant which is known for its good wines, where
the risk might be going up again.
If you can change behavior by telling people this is what it is, then maybe they’ll
take [0:37:38] instead. This is unfortunately no longer in business, or fortunately,
depending on your view. There were a couple of cases where it’s like a black box in
airplane crashes. After the fatal crashes, the insurance company looked at the boxes
and the dude was speeding, like on the German Autobahn, 200 km an hour. You don’t
want to pay for somebody who has clearly broken the law. It’s not my idea of insurance.
Another example I got from The Economist, and I don’t know much myself about the
example, was a woman 16 or 18 on the East Coast who was anorexic. Apparently
she was pretty bad so she had to go to the hospital. Apparently there are two kinds of
anorexia. In one case, the insurance company needs to pay. In the other case, the
insurance doesn’t need to pay.
The moment the girl ended up in the hospital, the insurance company checked out
her Facebook profile, became friends with her friends, the same on MySpace, and
then made a case against her so they didn’t have to pay. For the insurance company
and their lawyers, it was clearly a case of the second kind of anorexia. What do you think
about that?
How do you know the next guy who is trying to be your friend on Facebook isn’t a cop
who wants to check out what the pot situation is on campus, or something? We don’t
know.
Timescales are the potpourri of ingredients for the class. Timescale of data and
technology is one year. Social norms may be 10 years. Biology, I would say, may be
100 thousand years. Let me ask you, what do these two have in common? 99% of their
DNA overlaps.
Somebody brought up before the point about cost of interrupt being pretty high.
For the last 100 thousand years, we got used to if there is a little bit of information,
run [0:40:08]. If there is a little bit of sugar, grab it. If there is a little bit of salt,
grab it because you never know when the next [0:40:15] of sugar or fat comes
along.
0:40:20
That reflex we have, where if something changes, a shining object, to go after it, is
actually a reflex that might not serve us all that well. Brian [0:40:30], who was a
grad student with me and is teaching in the Psych Department, has some
wonderful studies on fMRI, functional magnetic resonance imaging, where he
basically shows that how we react to an ad, meaning what do we buy weeks down
the road, is happening in the first second. He knows from the activations how our
brain is firing, how the oxygen content is distributed there. He knows whether
we’re buying Pepsi or Coke when we go to the store weeks later. It’s interesting
how much is really in the biology, what we can’t change, and how much is
learnable.
I have two more slides here. This one I stole from Mark Zuckerberg, not exactly but
sort of in spirit. We have some EEs (double E’s). I am personally a big fan of EEs. I
started off in EE myself. I learned about channels and [0:41:36] and all that stuff. You
learn that the purpose of a communication channel is to transmit information, these
modems. Zuckerberg said unfortunately you got it wrong.
It is really that information is an excuse for communication. If you think about it,
there is some truth to it. Many of the apps we see are just good examples of
excuses to hit somebody up, to just make sure we’re not alone. That means on
YouTube, that every minute 20 hours of video is uploaded. That’s social data,
shared data. 1 billion videos are watched every day. That’s about a seventh of the
world population. That’s a high number. That’s shared data. Twitter, what do you
think here? How long does it take to create a billion tweets? A minute, a day, a year, a
decade? Right now it’s about a month.
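A quick back-of-the-envelope check of the figures just quoted can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the 30-day month is an assumption, since the lecture only says "about a month".

```python
# Back-of-the-envelope arithmetic on the 2010 figures quoted in the lecture.
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: "about a month" taken as 30 days

# YouTube uploads: 20 hours of video every minute.
upload_hours_per_minute = 20
upload_hours_per_day = upload_hours_per_minute * MINUTES_PER_DAY
upload_years_per_day = upload_hours_per_day / (24 * 365)

# YouTube views: 1 billion a day, described as about a seventh
# of the world population, which implies roughly 7 billion people.
views_per_day = 1_000_000_000
implied_world_population = views_per_day * 7

# Twitter: a billion tweets in about a month.
tweets_per_month = 1_000_000_000
tweets_per_day = tweets_per_month / DAYS_PER_MONTH
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

print(f"Video uploaded per day: {upload_hours_per_day:,} hours "
      f"(about {upload_years_per_day:.1f} years of footage)")
print(f"Implied world population: {implied_world_population:,}")
print(f"Tweets per day: {tweets_per_day:,.0f}, per second: {tweets_per_second:,.0f}")
```

Under these assumptions, one day of uploads is more than three years of continuous video, and a billion tweets a month works out to a few hundred tweets per second.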
What we all have in common is we have Web 0 which is computers. Web 1.0 is
pages. Web 2.0 is people. We have all of this, due to the ability to very cheaply
create data and to share data.
I have a couple of blog posts that Eric Sun, who is now at Facebook, wrote for me in
class about 2 years ago. He’s a brilliant guy. He thought about what the effect of all this
is, what should we do for communication media now, given that the world has changed,
and [0:43:34] have become cheap.
I have a couple of things in terms of housekeeping URLs before I jump to whatever
questions you have for the last 10 minutes. The first question I have for you, and I just
made the slide today. Who is this guy? You probably don’t know him but his name is
[0:44:13] Young. He created an amazing survey for us, fresh off the press. Triptia
checked it out after she got back from India this morning. It’s a bitly/sdr2010. It should
take you about 45 minutes and that’s the first assignment.
I hope you will genuinely enjoy those questions. The guy is creative. I told you about
one of them, how would your life change in China with Facebook, and the answer was
nothing. We also have a couple of visitors coming next week. Norbert Schwarz is a
social psychology professor in Michigan. He’s visiting at the Center of Behavioral
Studies, and he is the best survey guy in the world. Norbert and I worked together in a
startup called [0:45:10] 8 years ago. When he saw that survey, I asked him for feedback.
He said he is coming and sitting in the class, which is pretty cool.
0:45:20
Write the URL down if you can’t remember it. It’s bitly - the guy who is coming on April
20th - bitly/sdr2010. SDR stands for Social Data Revolution. It’s case sensitive. 2010 is
this year. That’s what I want you to do before Thursday. It’s about 45 minutes, to an
hour maximum.
Why do I want you to do this? Two things - I want you to think about some of
these issues: how society, individuals, and businesses are changing with respect to
the Social Data Revolution. We are creating interesting data. The Dean of the
University in Berlin will run it there, and [0:46:02] is running it in Singapore, so it will
be quite interesting to see the cultural differences, not huge sample sizes, but still 50-100
smart kids.
We have a couple of resources. This is my email address. I made a Gmail
address, which is [email protected]. Since our TA situation isn’t yet totally stable,
we have Triptia but we have certain money for more people, so whoever thinks they have
skills, some extra time to deal with the projects mainly of other students, come see me
after class today. Right now, MS&E 237 is the only thing you need to know when you
want to reach TAs or course assistants. Is this a global, campus-wide shift from TA to
CA? Or is it that STATS and Humanities courses call it TA and MS&E calls it CA? They
call it CA, I’ve learned.
Then stanford2010.wikispaces.com is probably, for the moment, the wiki that I will
use. Who is the person who emailed me yesterday night that he has set up a wiki,
wiki media for his company? Did he make it to class, or did we lose him on the way?
My plan is to migrate it somewhere else where we have more control. I know the
Wikispaces people are good people but what I’m planning is to have every single class
transcribed so you’ll get a transcript of every single class after a couple of days. Then it
will be great if we can simply annotate this. I don’t know yet whether this software is
something we can easily get done, get installed, or not, so for the time being this is where
we’ll update stuff. I’ll let you know there if we move it somewhere else.
We did lose a bit of time at the beginning. We have 7 minutes left to answer questions
you might have.
Student:
The materials you mentioned … whether you think we can still be successful without ….?
Andreas:
It is somewhat of an iterative process. The main thing in class is your group project. If
you’re not a strong programmer, then look around and find someone who looks like a
strong programmer. Make friends with them. I know that undergrad computer science
students are in high demand in this class, so grab them while they last.
I personally think having people from GSB (we have 16 GSB students), undergrads,
MS&E grad students, and computer science grads is actually a super great mix
of people. I’m actually happy about this. If you don’t have the skill, then find somebody
in the group who has it. I was thinking about trying to assign groups. I don’t know the
interpersonal preference between people, so I think keep in mind that in the next class
and maybe Tuesday next week, you really need to find some people - maybe we should
take a break in the middle of the next class. It’s too big to do an introduction for
everybody.
0:49:48
I was thinking whoever is not comfortable with me putting what they sent to me yesterday
out in some form, maybe password protected for the class, should come see me. That’s
where you said what you’re interested in, what you’re good at, what you’re known for. If
somebody is not comfortable with me putting out what they sent me, then let me know.
Otherwise, that’s probably one good way that when you have nothing better to do, you
might as well look through there.
I forgot one other resource, which is facebook.com/socialdatarevolution, which is the
Facebook fan page where people post stuff that is interesting for the class. Sorry about
not having a unique name here but things grow, over time.
I also would like to introduce some friends we have here from Intuit. Intuit is giving us
some money so we can get another TA. Angus, do you want to say a couple of words,
why you’re here, about what you did before, that this school isn’t alien to you? I’m
working with another couple of companies.
It’s not that easy to get people to share data with you, but two years ago we had Friends
For Sale, where on Facebook you get a pet and you sell the pet, etc. They shared all
their data with us, which was very interesting. We found out that Stanford students stay
up later than Harvard students, by just looking at time of day access and stuff like that.
We always have a couple of companies who are interested in playing with us.
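The "time of day access" comparison mentioned above could be sketched roughly as follows. This is not the analysis that was actually run on the Friends For Sale data: the records below are made up purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Made-up (school, local access time) records, for illustration only.
access_log = [
    ("Stanford", "2010-03-01 23:40"),
    ("Stanford", "2010-03-02 01:15"),
    ("Stanford", "2010-03-02 02:05"),
    ("Harvard", "2010-03-01 22:10"),
    ("Harvard", "2010-03-01 23:30"),
    ("Harvard", "2010-03-02 00:20"),
]


def hours_after_noon(timestamp: str) -> float:
    """Hours since the most recent noon, so 1 a.m. (13.0) counts as later than 11 p.m. (11.0)."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return ((t.hour - 12) % 24) + t.minute / 60


def mean_access_time(log):
    """Average access time per school, measured in hours after noon."""
    by_school = defaultdict(list)
    for school, timestamp in log:
        by_school[school].append(hours_after_noon(timestamp))
    return {school: sum(hours) / len(hours) for school, hours in by_school.items()}


result = mean_access_time(access_log)
for school, hours in sorted(result.items()):
    print(f"{school}: average access {hours:.2f} hours after noon")
```

Measuring from noon rather than midnight is a design choice: it keeps late-night accesses just after midnight from being averaged as if they were early in the day.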
What you do in your project is really up to you. Find people you want to work with. I will
show you the structure of the quarter on Thursday, when the first proposals are due. I’ll
need to know who is in the group with whom, and then when we need to pitch. At the
very last class, I’m inviting a few VC friends of mine who will be sitting there giving us
their opinions. If they really like it, maybe even more than their opinions.
That’s the first concrete case. Angus will be coming to a good bunch of the classes, so
will his colleagues. Maybe you’ll introduce yourself, too, because I knew you as the CTO
of BooRah, which was a social media sentiment analysis company.
All I can invite you to do is to think. It really is different because this is about the world we
live in, which is really changing as we speak here. It is not that we’ll have clearly “these
are the take-home messages”. In some cases, we have to create them together,
particularly when it comes to issues like privacy. There is no “this is the way it ought to
be.” It is very much shifting as we speak.
It is 5:30. I apologize for the move. It was not our fault here. We lost a bit of time but we
ended on time. I will let you know, by sending out email - one quick question. Who did
not get an email from me should see Triptia. Whoever did not get an email from me in
the last 48 hours, please see Triptia. I hope that by Thursday we have things ironed out.
Those people who are interested in potentially doing some TA work for me, or have some
ideas about annotating pages, please come see me right away and we’ll try to settle
those things today.
Thank you very much. See you Thursday.