Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics
Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
June 1, 2009
Class 8_2Summary: (Part 2 of 2)
This transcript:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_2summary_2009.06.01.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_2summary_2009.06.01.mp3
Previous Transcript – 8_1 Ads: (Part 1 of 2):
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_8_1ads_2009.06.01.doc
To see the whole series: Containing folder:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/
Andreas:
We started off by talking about how communication costs have dramatically changed the
world; that we can hear, basically only using TCP/IP, and we can reach the world with
Ustream. We can be very open and free. People can see us and interact with us by
potentially posting stuff here. That bidirectional communication, which has been
possible because of the costs having basically gone to zero, is what we are
exploring here, in the Social Data Revolution.
Going back; twenty or thirty years ago, the task was to connect computers. That
was infrastructure play and it was about hooking up the back office computers with
each other. Think about airlines, about number crunching, and I think it was the
President of Intel who said that he believes the main use of computers would be to store
recipes; he couldn’t really think of many personal computer uses. It was about
connecting programmers to computers, at best. Then, it was connecting front office
people, people who work in the airline office, for instance, to the computer. What
really changed is that in 1994 we suddenly all got connected, essentially, to the
back office.
Looking at the next class, I want to emphasize the Mars landing, which must have been
in 1998; it was a big event because more people were watching it on the web than on
television. In 1998, people were still thinking that the web was just a better
television. Think about MySpace and Facebook and how dramatically that has
changed. Facebook and MySpace are not just better television. It’s really about
interaction. It’s about Nokia’s slogan of connecting people – my apologies to
Orange.
Here is the first distinction I want to make. List versus stream, when we say
connecting people do we view the stuff we get from people as list, being defined as
something we need to work through, or do we see it as a stream; you stand in the
streaming water and you might grab a fish. If you don’t grab this fish, you could
grab the next fish.
That’s one of the differences; the streaming, real time search on Twitter that we
discussed two weeks ago is very different in terms of expectations to what we had
before. Leaving a trace and not making us feel bad because we haven’t gotten to it is
one of the things we really think about differently from how we thought about it a few
years ago.
0:03:28.7
So many people have their laptops out. If some want to use the EtherPad to convince
ourselves that it actually works, and drop down some of the things as I sometimes do
when I’m listening to somebody else; it gives me the feeling that the technology is not
failing us, here. It does not require registration. It seems open to abuse. It very rarely
happens. That’s one of the good things we learned on the web.
What are the expectations? In email the expectation is that it gets read. Where do
we have the expectation that it doesn’t get read? I think tweets have the expectation
that nobody reads them. That real time aspect in tweets, in this communication, is
very unclear to me. Where does real time really matter?
What is important is the difference between Twitter and Facebook. Facebook and
other predecessors are symmetrical. Facebook, at its heart, is about distribution.
The “share this” button is the essence of Facebook. Think about newsfeeds; it
was about distribution. The crutch of chronology or the “sorry state of relevance,” as
Ray calls it in his blog post, is an example where people have not yet mastered machine
learning to actually make that stream a little bit more interesting.
There is sharing and news feed. Photo tagging is a beautiful example of social data.
I tag somebody in a photo and that is social data that is very different from what we
expect the web to be. Why do people do it? You had some slides on that; why do
people do this? I think it’s self recognition. People want to be immortal. If I tag
somebody on a photo or I get tagged, maybe that survives me.
People actually want to be funny. It’s interesting; when Facebook asked people, 8 out
of 100 people said they made comments because they wanted to be funny. I think none
of us want to be funny, but it’s an interesting element. It’s not about giving attention,
but about getting attention. It’s this reciprocity. I give you some attention so
maybe you will give me some back.
The next step here was that we go beyond connecting people, beyond trying to
measure the attention stream of people, to connecting data. As we were driving
here today, John said, “You had somebody talk about the semantic web; I thought it was
dead?” It actually has been declared dead many times. Still, some people are doing it,
hoping that if we have the right metadata, then someday we’ll be able to actually have
those data connected to each other and become smart.
I am one step behind that. I am a big believer in that people should do what people
do well, and computers should be doing what computers do well. What we call the
Social Data Revolution means that we share, and the mind set is where the
revolution happens. How people look at data differently and what they want to get
back in exchange for the data they give is, for me, the revolution. It’s about
consumer behavior and expectation shifting.
Do you remember the Coca Cola example, where there were two guys who had a fan
page on Facebook? A couple of years ago, a company like Coca Cola would probably
have decided to sue them for trademark violation. When I was teaching my course at
Berkeley, somebody was worried about using the term BART because he would be using
the logo without having permission, the Bay Area Rapid Transit. How has that changed?
0:07:24.7
Interesting companies like Coca Cola have realized that by giving those couple of
people a video camera and a tour through the office, they were getting much more
bang for the buck than they would receive from negative press. That whole thing
about putting up things in the public that you don’t like, for instance, at the
conference somebody put up the list of attendees publicly saying, “These are all
the rich people who can actually afford to go to this conference,” and nobody
could do anything about it. He then put up the letter from the conference
organizers saying, “Please, take down this list,” and doubly enforced that they
tried to keep it private. That shift from the private to the public certainly has
happened.
To me, the Social Data Revolution is a shift in the individual’s expectations
towards what they can get in exchange for sharing data, sharing data about
themselves, and sharing data about relationships to others.
The plumbing is here. Knowingly and voluntarily sharing data, that is where the new
generation of data sits. It’s on MySpace, on Facebook, on dating sites, on Craig’s List;
people actually realize that knowingly and willingly sharing data is where to go. It’s not
sniffing the digital exhaust or going through the digital trash.
We are using the EtherPad here. I love peoples’ creativity here. Is there anything I
should know about? I see all of you laughing here, but I don’t quite know what’s going
on. Are there any comments? This was the first part where I just set the stage before we
move to the applications of the Social Data Revolution. [Laughter] Can I get some
attention please? [Laughs]
It’s not about the plumbing. It’s not about smart algorithms. It’s about getting
people to share stuff. How do you get people to ultimately share stuff? It’s by giving
them [0:09:59.7 unclear] in return.
I now want to explore and review with you what that means for companies. I’m not
thinking about Facebook or MySpace. I’m thinking about companies like Coca
Cola, Fortune 500 companies, or smaller companies; what does it mean for them?
I want to investigate one of the four Ps of marketing, which you remember are Product,
Placement, Pricing, and Promotion. I want to investigate one of them, namely the P
for product.
Who knows the product best? It used to be that the company that produced the
product knew the product best. If you have a Nokia phone, Nokia engineers who think
hard about it and then push it to the market, as they pushed tires onto the market before;
those guys thought they really knew the product best.
When you look for Nokia Map Activation, none of the top links lead you to Nokia.
All the top ten links on Google, at least when I checked it, led you to other sites
where people know more about the product than Nokia does. Ultimately, Google
really knows more about the product by indexing, storing, and searching the web,
than any company does.
0:11:48.1
Earlier this week, I heard Steve Ballmer, Microsoft’s CEO, give a great performance
talking about Bing. Do you know what Bing stands for? It’s a new search engine. Bing
is not Google – funny. What we have learned is that there is so much information
that people voluntarily and knowingly share that if we incorporate this into the
process of product marketing, we can do an amazingly better job. At the heart of
it, it’s the bidirectional communication. It’s that feedback loop, which is so fast.
I am always surprised, and I think the same is true for pretty much everybody, to come up
with any company or product you are thinking about – your JVC camera there – and do a
Google search or search on Twitter. I was just at Zuni Cafe two nights ago. The
maître d’, the manager, didn’t know about Twitter. I showed him what people had
been tweeting in the last half hour about Zuni. People often don’t realize how
much they’re being talked about, or how much their enterprise or their product
is being talked about. That real-time element is part of the Social Data
Revolution for me.
One question is should the company provide the platform? One example is
www.mystarbuckidea.com, which is a site that had 60,000 ideas by people about
anything from pricing to the atmosphere in the restaurant, to Wi-Fi and so on. 60,000
people share stuff, vote for stuff, and discuss stuff. Isn’t that the dream of any
marketer, to have people truly engage with your brand, without even paying
anything but just to share, vote, and discuss, and see. When I added this up, the
number of people was about 60,000 to 70,000. That’s a platform that is hosted, in this
case, by Starbucks.
An alternative example is www.getsatisfaction.com, a company here in San
Francisco that advocates people-powered customer service. They are a neutral
platform. They call themselves the “Switzerland of customer service.” If people have a
question, they can ask that question and usually get it answered. They have
company representatives sitting on that.
Another neutral example is a website called www.flatseats.com, which is about
business and first class seats. Things are described in amazing detail, by people
who actually sit in those seats, not by people who write the marketing materials that
we all love, but by people who actually endure the 14 hours to Singapore from here.
They share their experiences. What are their incentives? Maybe it’s contributing
to the community.
It is very interesting that this site is sponsored by a company called Skytrax, which
is in the business of aggregating information for the airline industry. They saw the
opportunity in getting people to actually contribute to their site, by not
intermediating it but by keeping their space and giving the information back. You
probably heard about that Yelp problem; apparently for a certain fee Yelp would help you
by making not so positive reviews disappear. There is a question about whether or not
that should be happening.
0:16:16.9
Does this help anybody to bring this up, or do you see it anyway – the EtherPad. I’m not
sure whether I want to see it. [Laughter] I will leave that for your entertainment.
The point is that these platforms, which help people with product development and
product marketing, can be organized and sponsored by the company, as with
Starbucks, or they can be neutral, like FlatSeats or GetSatisfaction. FixYa, an
Israeli company, is another example; if you have any tech problems you have a
chance of actually finding the answer there.
The product isn’t the message, but the perception of the product is the message.
Think for a moment about how that has dramatically differed from how traditional
advertising used to be. In traditional advertising, you don’t try to sell stuff to people.
You don’t tell them that the product really rocks.
Geoffrey Miller explained in his half hour podcast on my blog that you make them
believe that you will have certain qualities that other people will see in you if you
have that product. You wear that cologne, have that car, or wear that diamond.
Traditional advertising actually is not about the product; it’s about what other
people think about you when you buy the product. That’s a very interesting twist.
Now, I would say the perception about the product is the message.
Moving towards marketing, I want to spend the next 15 minutes or so on what I
think truly is the core of data mining and e-business. This is how
recommendations have changed. Amazon.com makes 10% to 20% of its revenues,
depending on the product group, through recommendations. I don’t know whether
you have some figures from Yahoo or from other companies, but I think low double
digits is pretty common if you have a decent recommendation engine.
On the one hand that’s amazing; 10% to 20% of the bottom line is just from having
a little piece of code that figures out what to show to people. That’s about the best
bang for the buck that I could get. On the other hand, people don’t know what
they want, don’t know what they get, and don’t know how things are evolving, so
it’s not surprising that they can be influenced by recommendations.
I want to give two examples from Amazon and then show you how things have
developed. We talked about “people who bought X also bought Y” and “people
who clicked on X also clicked on Y,” and “people who clicked on X eventually
bought Y” as three different ways of using the same algorithm which is similar to
what we talked about before today. The good thing about this is it’s a good
example for a simple algorithm, where the problems are to get the data and to
make it scale.
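The "people who bought X also bought Y" idea can be sketched as plain co-occurrence counting. This is only a minimal illustration, not Amazon's actual implementation; the basket data, item names, and the function names `build_cooccurrence` and `also_bought` are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count, for each pair of items, how many baskets contain both."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        # De-duplicate and sort so each unordered pair is visited once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, k=3):
    """Return up to k items most often seen together with `item`."""
    neighbours = counts.get(item, {})
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase histories; each inner list is one customer's basket.
baskets = [
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_b"],
    ["book_a", "book_c"],
    ["book_a", "book_b", "book_d"],
]
counts = build_cooccurrence(baskets)
print(also_bought(counts, "book_a"))
```

Feeding click sessions instead of purchase baskets gives the "clicked on X also clicked on Y" variant. The "clicked on X eventually bought Y" variant would need directed counts from clicked items to bought items. As the lecture notes, the counting itself is simple; the hard parts are getting the data and making it scale.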
0:19:44.4
The other example from Amazon is social recommendations, “Share the Love.”
Again, voluntarily and knowingly sharing information that I just bought that book
with a friend, and in a perceived fair way, getting something in exchange, like if that
friend buys it within a week I get a 10% discount and he gets 10% off. Amazon has
sold another book. That was a pretty good idea that Amazon had about 10 years ago.
In some way, what we learned here about leveraging the social graph; Amazon did.
What is the difference? Amazon did it in a private way.
If I forward that link or if I tell Gary about me having bought that book, you won’t
know about it. How this has changed in the shift from private to public is that if I
post on Facebook that I just bought this book, then all my Facebook friends
potentially get notified, if there happens to be this crutch of chronology in their
little window where they see what I just did. If I do it on Twitter, it’s there for
eternity because it gets saved, indexed, and people will know five or ten years
down the road when you apply for a job that you bought that book on How To Sue
Your Employer, or whatever the book is.
I’m distinguishing now the architecture of interaction, which is C-to-C and the
architecture of sharing, which I call C-to-W or customer to world. Let me give you a
few examples of C-to-C, customer-to-customer. Those of you who remember; that term
used to be used for eBay, where people buy things from each other. Purchasing stuff
is really a small part of the social interactions we have. How often do we buy stuff?
We may buy breakfast, maybe lunch, maybe dinner, maybe a bus ticket and that’s about
it, during the day. Contrast this with the interactions we have in a less [0:21:54.0
unclear], more an intentional exchange way, our orders might be more.
An example here is Skydeck. If you give them your phone number, your carrier,
and your password to the website, it shows you insights to your calling behavior.
That’s very interesting. It suggests actions; that dear friend of yours in L.A., did you
notice that it’s been you who has been calling him in the last couple of weeks? What
does that mean? Maybe it’s time for a month’s free subscription on a dating site. Those
are data that are passive data.
Rapleaf, a company I mentioned earlier today, is another example of a company
that uses the C-to-C data. It looks at my social network data, across social
networks. It figures out what kind of people I hang out with. In German we say,
[0:23:03.3 unclear], or “birds of a feather flock together.” If my friends tend to be
fraudsters, maybe that insurance should not be super interested in me. If my friends
are all super clean, then no worries.
Interestingly enough, if there are wrong data about you, unfortunately, you have no
way of fixing it. That pretty much sucks. If somebody makes himself an account –
I’m not suggesting this – let’s say Jay [0:23:29.0 unclear] and he does interesting things,
Rapleaf may indicate that this may be the same guy that worked at Fox, which might
cause John a problem. It’s not clear how we can fix that. He knows the people. John
can fix many things; breaking into computers is one of them. For normal people, that’s
not easy.
0:23:51.8
There are good things about it; it helps companies save money, insurance can use
their scarce resources to investigate the right claims. On the other hand, it can
cause people a lot of trouble. One of the open things that remain is how do we
deal with wrong data about us? That’s very difficult; I don’t know. How do we deal
with aging data?
In this C-to-C area, we briefly talked about a redefinition of customer value. When
we talked about customer value, sometimes called customer lifetime value if you
bring the time component into the game, we asked first of all “what do you mean
by customer value?” Is it the value the company has for the customer, or is it the
value the customer has for the company? Flip the arrow sometimes, as in not asking
a carrier what they can do for the customer, but what can the customer do for the carrier?
How can the customer help the carrier? What is the value of the carrier for the
customer is a very interesting, new way that evolves from this bi-directional
communication, which is very democratic now.
We learned from our Facebook friends, from Eric Sun and his talk, that it’s no longer
this notion that you try to find the influencers. With the threshold of interactions
being so low now, it’s so easy for anybody to propagate stuff and distribute
information.
The notion has come to make it as easy as possible to show you an attention
stream, to give you ways and to allow you, with lightweight interactions, to become
influencers for those few things you care for. In my case, and in your case, it’s audio –
as you found out on Facebook last night. You love to spend too much money on audio.
That is very interesting, in C-to-C, customers talking to customers, it’s not
customers to the world, yet. This is what people talk about as architecture of interaction.
Three years ago at DLD, I called it web 3.0 because web 2.0 was participation. Web 3.0
was for interaction. It’s that we leverage the graph of connections between
customers that could be phone, Facebook, email, and the underlying hope is that
people rub off what their neighbors say, in some way.
How does one do this in an advanced way? I could specify coupling constants
between me and my friends. I really trust Ron Chung. If Ron does something well, I
want to get .9% of the credit he gets. If he screws up, I am happy to take .9% of the
blame he gets. If we allow people to attach metadata to the connections, then you can
actually learn much more about the graph than that sorry state of binary connections that
we live in right now.
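One way to picture coupling constants is as weights on the directed edges of the social graph. The sketch below is only an illustration under stated assumptions: the names, the `coupling` dictionary, and the `propagate` function are hypothetical, the weight 0.9 is one reading of the figure mentioned above, and credit is passed along a single hop only.

```python
# Hypothetical coupling constants: coupling[me][friend] is the share of the
# friend's credit (or blame) that I have agreed to take on. Values are made up.
coupling = {
    "andreas": {"ron": 0.9, "gary": 0.2},
    "gary": {"ron": 0.5},
    "ron": {},
}


def propagate(coupling, person, amount):
    """Return the reputation changes caused by `person` earning `amount`.

    Positive amounts are credit, negative amounts are blame. Only people
    directly coupled to `person` are affected (one hop, an assumption).
    """
    changes = {person: amount}
    for follower, links in coupling.items():
        weight = links.get(person, 0.0)
        if weight:
            changes[follower] = weight * amount
    return changes


print(propagate(coupling, "ron", 10.0))   # Ron does something well
print(propagate(coupling, "ron", -4.0))   # Ron screws up
```

With binary connections every edge would carry the same weight, so Andreas and Gary would be indistinguishable here. The weights are the metadata that make the graph informative.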
0:27:12.7
It’s also interesting; from the perspective that maybe everyone trusts Ron, but maybe
Ron doesn’t trust anybody. Maybe he is a trust sink, maybe he’s a trust source. If you
give people metadata, which could be reflected in them making the same
purchases he makes getting higher prioritized than the purchases from somebody
you don’t trust and aren’t willing to couple your reputation to. That would be an
example of getting people to create metadata so their life, their feed, their stream is
more interesting than otherwise. That was C-to-C and I think that is about 5 years old,
or something like that.
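The same kind of edge metadata can be used both to see who receives trust without giving any, and to prioritize a feed. This is a rough sketch only: the weights, names, and the functions `trust_balance` and `rank_feed` are hypothetical, and calling someone who only receives trust a "sink" is an assumption about the terminology.

```python
# Hypothetical trust weights: coupling[truster][trusted] = strength of trust.
coupling = {
    "andreas": {"ron": 0.9, "gary": 0.2},
    "gary": {"ron": 0.5},
    "ron": {},
}


def trust_balance(coupling):
    """Return {person: (trust_received, trust_given)} summed over all edges."""
    people = set(coupling)
    for links in coupling.values():
        people.update(links)
    received = {p: 0.0 for p in people}
    given = {p: 0.0 for p in people}
    for truster, links in coupling.items():
        for trusted, weight in links.items():
            given[truster] += weight
            received[trusted] += weight
    return {p: (received[p], given[p]) for p in people}


def rank_feed(coupling, me, events):
    """Order (friend, item) purchase events by how much `me` trusts the friend."""
    mine = coupling.get(me, {})
    # sorted() is stable, so equally trusted friends keep their original order.
    return sorted(events, key=lambda event: -mine.get(event[0], 0.0))


balance = trust_balance(coupling)
received, given = balance["ron"]
print("ron receives", received, "and gives", given)

events = [("gary", "headphones"), ("ron", "amplifier"), ("stranger", "cable")]
print(rank_feed(coupling, "andreas", events))
```

In this toy graph Ron receives trust from everyone and gives none, which matches the case described above, and his purchases land at the top of the feed of anyone who coupled strongly to him.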
What has been happening in the last year or two is this C-to-W, customer-to-world.
Tim O’Reilly, maybe five years ago, coined the term “web 2.0” and used it for
architecture of participation. I prefer the term architecture of sharing because
participating and sharing may be different. I think the sharing aspect is more
important for me than participating.
The key feature is the shift from private to public. I want to use two perspectives on
this. First, I want to use the perspective of the individual. Identity has become
socially constructed, more than ever. How much time do people spend in thinking
about the self marketing they do on their favorite social network? That’s how
people socially construct their reality. On the one hand, it’s really what they would
like to say; on the other hand, if they say something that isn’t true, some friends or
maybe not so close friends would quickly debunk that.
When we had Reid Hoffman in class, he made the point that the resumes that people
post on LinkedIn tend to be much closer to the truth than what they post on some job
site. It made total sense to me. If I see that my friend claims to be something that he
wasn’t, the least I would do is send him an email and say, “Typo, or what happened?” It’s
social pressure or an alignment of this social reality, but yet allowing us to constantly
recreate it by which photos we put up – at Enrique’s birthday party last Saturday, the
question was, “Is this a good picture for Facebook?” I still have the same Facebook
picture I had up when I created my Facebook account, which shows that I don’t care all
that much about my picture. With that marketing, should I be in the picture with
somebody like Gary is in the picture with Mark Zuckerberg, is very interesting from the
social construction of self identity, in relationship to others.
From our homework, from our questions about how the notion of relationship has
changed, by potentially knowing whatever other people are doing, I want to put it
like this; who knows Robert Scoble? That’s a bad example. Let’s say Tim O’Reilly; who
has heard of Tim O’Reilly? Who is popular, Oprah? [Laughter] If you follow Oprah on
Twitter, you probably know more about Oprah than you know about your mom. Do
you know what your mom had for breakfast? Probably not, but you may know
what Oprah had for breakfast. That’s quite interesting, to see how our notion of
friendships and relationships have changed so much. That is the individual
perspective.
0:31:58.4
Now, let’s talk about the company perspective. On the one hand, there are
amazing new possibilities. I’m going to give you two. JetBlue listening in on Twitter,
when people use real time search, and that’s where real time actually is an
important element. They search for United or airline, or complaints, and having
agents who say, “We can help you out here; that wouldn’t have happened at
JetBlue.” The other example is BestBuy talking to customers who are unhappy
with competitors. The third example is Comcast, where people complaining about
Comcast get attended to by a special department at Comcast. Those are
opportunities where you can actually listen to customers, and market yourself to them
very differently from this broad, mass marketing that you are used to. Somebody just had
a problem with his refrigerator. He is more open to refrigerator buying than if you put a
beautiful GE ad in the newspaper.
There is risk, on the other hand, and a standard example is Kryptonite. There was a
funny video out there where somebody showed how to pick a Kryptonite lock. I see
people looking it up, now. Kryptonite decided the blogosphere had nothing to do
with them. They had a significant hit in revenues because people felt if someone
could pick my lock, why buy a Kryptonite lock.
Between those two, the CEO of Continental Airlines was told by his PR department
that he should look into this blogosphere thing. He told them, “Whoever wants to
come, I’ll treat them to a tour through the back offices here.” 280 people showed
up, bought their own ticket, and got the tour. That was a big PR effect.
The best example of all PR effects ever was the $1 million that Netflix never had to
pay out for the Netflix contest. That was C-to-W, the consumer-to-world, and how
people post a blog entry about Kryptonite. The last element, and on that I think we are not
there yet, is how we can model situations.
Besides C-to-C and C-to-W; what models can we build of situation and modeling?
Women’s shopping behavior is significantly different from men’s shopping behavior. I just
had a phone call this morning with a company in the U.K. that does apparel
recommendations. Any study ever done, internally or externally, shows that women
shop differently from men.
One example, it turns out, is if men buy an expensive item like a car, they are done with
it. Ads for cars don’t get picked up. If women buy an expensive item like a car, it turns
out they go back and back and think about what other car they might have bought
instead. That was in a study that was done at the University of Michigan a few years
ago.
That is psychographics. Knowing whether the person is male or female is
certainly one element that Facebook and MySpace know quite well.
That’s not situation. What is situation? Situation is trying to understand which
situation somebody is in. For instance, on the mobile, how is he moving? Is he on a
mobile at all, or is he on his PC at home? What is the bottleneck then?
0:35:53.2
I think attention is the bottleneck. How can we get the person’s attention and how
can we retain the person’s attention? The real-time interactions triggered by
what’s happening at the moment are still not well understood. We have understood
the C-to-C, and the social part. We have said it’s pretty much out of control. I think we
are understanding the C-to-W by finding out what people are thinking and then hitting
them at that moment. Modeling the situation is certainly a machine learning
problem where richer data sources, and for me the mobile is the richest element of
creating data, will actually have some promise. We had a question on the survey of
whether you think if your mobile is not used the right way in marketing, how it would
influence your relationship to the company.
All of this is really about how the expectations of people are changing. There are
three revolutions we could say have happened, with apologies to the French
people in the room, and Russian people etc.
The first one was the Industrial Revolution in which we could transport energy.
That meant that by transporting energy we could place the factories not necessarily
where the coal was or where the energy was, where the windmills or the water was, but
wherever other things were, where the resources were.
The Information Revolution is about transporting bits. That meant we no longer
had to be physically where the bit generation was. There was a famous war in
America, I forgot which one, but it had ended already and the people in the war didn’t
have word that an armistice or treaty had been reached. Does anyone know what that
war was? Perhaps it was the Civil War. In this war, people had already agreed that it
was over but people were still fighting because bits were tied to atoms.
What is the Social Data Revolution? What is it for you? What is now movable? Is it
reputation? Is it the creation that is the important part? Is it the distribution that is the
important part, or the interaction? What do you think? What is at the heart of it? We
already have the energy moving. We have the bits moving; bits move quite well out of
the classroom. What is really at the heart, for you; this is actually a question for you now.
What is at the heart – we have a Facebook page for Social Data Revolution, which is
actually going super well. We have all kinds of stuff. We’ve been talking about it for 3
times 8 hours. What is it, for you? To help you, I will talk for two more minutes and then
we will have a discussion for the last ten. I want to give two perspectives that might be
useful.
First, I am a physicist as you know; I got my PhD in physics. One of the most
important things in physics is time scales. Danny Kahneman at DLD said, “People
have time scales of seasons; of weeks, as in you should rest every seventh day; of days;
of hours.” Interestingly enough, the Greeks actually divided the time between sunrise
and sunset into a fixed number of intervals every day, as opposed to us, who have a
more global time scale of hours. Now we have this really high frequency,
extremely transient, real time stuff, where we know what somebody did just a few
seconds ago.
0:40:12.1
When does real time matter? Real time is possible; there is no question about it.
When does it matter? It matters in the case where we want to hit somebody up
who is in a certain situation right then and there. It matters if we want to learn about
what’s cool, what’s hot, or whatever the right word is. More important than the real
time element, I think, is the feedback element, the product marketing element. We
can throw stuff out and see what’s coming back, the experimentation.
The other way I want to look at time scales is biology versus technology. I don’t
know how old or [0:40:53.0 unclear] our DNA is, ten thousand? What’s a good estimate
here? I don’t know. How old are you, ten thousand years, one hundred, one thousand
years? I’m not good at these things. How old is our DNA. What’s the half width of DNA?
If you take my DNA, ten thousand years until half of it is swapped out? It’s a reasonable
sum. It should be more than a thousand. The people a thousand years ago looked
pretty similar to us.
Student:
It depends on how it’s preserved.
Andreas:
Let’s say it’s ten thousand years. What is the time scale of technology? It’s an order of
one year, whether you take Moore’s Law or the amount of data created this year, it’s
about one year. That’s about four orders of magnitude different. That’s damn interesting,
where we may be limited by biology and fool ourselves in the belief that we’re not limited.
Continuous partial attention is one of the examples where we are genuinely able to
embrace the technology. When I was coordinating with Enrique today about the food
for the reception, we were wondering how we would have done that fifteen years ago,
pre-mobile phones. How could I reach you if I couldn’t send you an email saying, “Here
is another idea”? That’s one way to think about what the Social Data Revolution is
about.
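[Editorial sketch, not part of the spoken lecture: the "four orders of magnitude" comparison above can be checked with one line of arithmetic. The ten-thousand-year figure is only the rough guess agreed on in the discussion, and the one-year figure is the order of magnitude stated for technology; neither is a measured value, and the names below are illustrative.]

```python
import math

# Rough time scales from the discussion, in years (both are loose estimates).
BIOLOGY_YEARS = 10_000   # guessed time scale on which our DNA changes
TECHNOLOGY_YEARS = 1     # order of magnitude stated for technology


def orders_of_magnitude(slow: float, fast: float) -> float:
    """Return log10 of the ratio between a slow and a fast time scale."""
    return math.log10(slow / fast)


gap = orders_of_magnitude(BIOLOGY_YEARS, TECHNOLOGY_YEARS)
print(f"Biology is about {gap:.0f} orders of magnitude slower than technology")
```

Changing the biology guess to one thousand years would shrink the gap to three orders of magnitude, so the qualitative point does not depend on the exact estimate.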
The other way is to think about economics. I’m not an economist, but for me
economics means three things. One, we can measure things. Two, we focus on
measuring meaning like what the costs are, or the hidden costs that we have.
Three, we know what is scarce and what’s abundant. The economics of attention
means that we understand where things shift: things that used to be difficult to
get, like the examples I gave you, are now abundant. What has become scarce
now? What are the bottlenecks?
For the sender, it’s become trivially easy to create stuff? No, creativity is still a
bottleneck. It’s become trivially easy to distribute stuff, C-to-W. For the recipient,
it’s pretty difficult to select and to consume. What I want to ask you is do you have
any thoughts about what you learned here, about this meta level, about data, about social
data, about the Social Data Revolution. I want to open up the last five or ten minutes for
discussion to learn from you. My question is what is it for you? What did you learn?
Where did the bits flip?
0:43:51.3
At the beginning of the quarter, I was telling you that my objective was to have you think
about things differently. My metric is how many bits flip per class for you. I want to know,
from you, and you can write it here, maybe you can write it on the Ustream or
whatever works. I want to understand what the bits that flipped were. How do you think
about things differently now than before? Is it privacy issues that we will talk about on
June 10? Where have you changed the way you think about the world, about your
identity, about what you share with the world, about relevance? Where did something
change for you?
Student:
You talked about how you thought real time search really isn’t worth that much, but then
you talked about how important promotion was and knowing real time feedback. I was
wondering about the discrepancy in those two comments.
Andreas:
I didn’t want to over hype real time search. I gave some examples where knowing
what is good in a restaurant, for instance, if you are somewhere right now, right
there, that evening is certainly interesting. What is the opportunity cost? By
looking at stuff in real time, you don’t look at stuff in the broader time horizon. I
just want to make sure that we understand the tradeoffs, given that the day is twenty-four
hours and our attention is limited.
I didn’t want to put down real time search, in general; I just don’t think it’s the
solution to everything. By the way, real time search really has two elements to it.
There is the real time element, which Ray and I are not so sure is the best thing
since sliced bread, but the search element is an important one. That makes real time
extend into the past. If you search for something on Twitter, which is not that frequent if
you search for your favorite rapper. The question is how do things change? If things
don’t change the real time information is probably useless.
At one startup, I was at ShockMarket. We realized that the statistics we had were just
too low. A super smart guy, J.C., built the [0:46:45.5 compilation] and the board, put the
pen down and said, “This is why I am now quitting.” A guy who went to GSB said, “Great!
New things every day.” We have [0:47:00.2 unclear] random fluctuations and people will
look at it every day. That shows the two world views, whether you’re interested in the
deep, underlying structure such as what do we learn about stocks, or whether
you’re interested in the random fluctuations. In finance you have both, people
constantly look at how tickers change, and other people are interested in seeing how
things change in the long term.
Your question was about real time search and its role. I don’t think we know its
role yet. I think like any technology, it’s not just a better or faster search; it’s a
different search. Content that gets picked up by real time search engines may
change the way people generate content, much as search engine optimization created a whole
cottage industry of companies that help people actually get better Google, Yahoo, etc.
rankings – not real time.
What other comments do you have? Do you think differently about recommendations?
Do you think differently about your own data? Do you think differently about what
companies should do with their data?
0:48:23.6
Matt:
I sort of got more excited about recommendation stuff. I think a couple of speakers who
came had interesting points; the amount that you can glean from really simple analysis
on a lot of data is almost as powerful as what you can glean with very advanced analysis.
There is definitely a dimension where … returns to what degree of analysis you do on
data. That is sort of encouraging in that it’s not that hard to play with stuff. I had a lot of
fun doing stuff with Twitter and Delicious and all that. Simple is good.
Andreas:
The fact is those data just didn’t exist five years ago. There was no Delicious.
There was no Twitter five years ago. There was no data to mine. The fact is it’s
still kind of difficult on the Ustream today, to get data. Part of this experiment
today was that Enrique thought it would be really interesting to see, and Ron did the work
as always, if we give people the chance to say stuff in real time, what is it that they
do say. It’s very difficult for me to concentrate at this end and at the same time watch
the screen, so I don’t know what they said. I’ll look at it later.
Those data sources – it’s probably just like how we wonder how we lived without mobile
phones; in a couple of years people will say, “How did we live without Twitter?”
Recommendations – many people get stuck in this because at Amazon all this was
done way before I was there, “People who bought X also bought Y,” and they think that
is state of the art. There is no social component to it. In some way, explaining the way,
you have this situation to do it but if you look at [0:50:11.2 unclear], that means that you
are in the high end camera range. But, there is no C-to-W component to it. There is
very little of a search component to it, all these things that we now have. There is
almost no behavior component, no gender component, and still it makes them 10%
to 20%.
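[Editorial sketch, not part of the spoken lecture: a minimal illustration of the "People who bought X also bought Y" idea using raw co-purchase counts. This is not Amazon's actual algorithm, which the transcript does not describe; the baskets, item names, and function names are made up. As in the point above, it uses no social, search, behavior, or gender signal.]

```python
from collections import Counter, defaultdict
from itertools import permutations

# Toy purchase baskets; the item names are hypothetical.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "lens"},
    {"novel", "bookmark"},
]


def co_purchase_counts(baskets):
    """For each item X, count how often each other item Y shares a basket with X."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for x, y in permutations(basket, 2):
            counts[x][y] += 1
    return counts


def also_bought(counts, item, k=1):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(k)]


counts = co_purchase_counts(baskets)
print(also_bought(counts, "camera", k=1))
```

Even this simple counting surfaces a plausible companion item for the camera, which is consistent with the remark that simple analysis on a lot of data goes a long way.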
Student:
I think what I got out of this class was when the LinkedIn people came, I thought that was
the best one. These short term... fooling around with the data got a lot of the short drives
and algorithms that … more long drive is the rare thing. It was taking this data mining
work and thinking of it as an active process, instead of just looking at the static data,
doing something, and not interacting with people.
Andreas:
Yes, the notion of data mining really has changed. I said in the first class that it
used to be: given a set of data, what insights can you get? It’s now: given a
problem, what data can you get? That’s what I said in the first class. What I’ve learned
in the last eight weeks is it’s not only that. It’s also how quickly you can actually
interact with the data. What questions can you ask? Thinking about it, the PHAME
framework is a powerful framework, and it’s not about data mining; it’s about doing
experiments.
0:51:51.9
For you, P stands for problem: what really is the problem? H is for hypothesis; if you
think about cognitive psych, maybe people value something in the future more,
maybe they value something now more. M is for metrics; what are they? A is for the different
actions you could take, and E is for running the experiment. There is no explicit
data or data mining in the PHAME framework.
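[Editorial sketch, not part of the spoken lecture: one way to write down the five PHAME steps as a small experiment loop. The class, the function names, the example problem, and the response rates are all hypothetical, and the users are simulated; a real experiment would observe real people instead.]

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class PhamePlan:
    problem: str        # P: what really is the problem
    hypothesis: str     # H: what we believe about people's behavior
    metric: str         # M: what we measure
    actions: List[str]  # A: the different actions we could take


def run_experiment(plan, respond, users_per_action=1000, seed=0):
    """E: try each action on its own group and report the response rate."""
    rng = random.Random(seed)
    results = {}
    for action in plan.actions:
        hits = sum(respond(action, rng) for _ in range(users_per_action))
        results[action] = hits / users_per_action
    return results


plan = PhamePlan(
    problem="Few visitors sign up",
    hypothesis="People value a reward now more than one in the future",
    metric="sign-up rate",
    actions=["reward now", "reward later"],
)

# Assumed response rates, used only to simulate users.
ASSUMED_RATES = {"reward now": 0.20, "reward later": 0.10}


def simulated_user(action, rng):
    """Return True if a simulated user responds to the given action."""
    return rng.random() < ASSUMED_RATES[action]


results = run_experiment(plan, simulated_user)
best = max(results, key=results.get)
print(results, "best action:", best)
```

Note that the plan itself contains no data set: the data only appears once the experiment runs, which matches the remark that PHAME is about doing experiments rather than mining existing data.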
What you said, that it is interacting with data, is one thing. It’s also interacting with
people, and that’s the experiment element. Nobody thought about that when they
talked about data mining. If you live in genetics, where things take a while to change,
you can’t do this. If you live on the web and in the mobile space, that’s where you
have these amazing opportunities to really interact with stuff on a very fast time
scale. For me, the interaction part is the most exciting one. That’s why I went to
Amazon.
Student:
One thing that really flipped for me was around instrumenting the world. It’s so easy to
get caught up on the data because it’s so exciting, but you’re ultimately bottlenecked by
how much you have instrumented. That was a good awareness point for me, particularly
around areas like healthcare, where there is so much that could be done but there isn’t
yet the instrumentation, necessarily, around all the things you would want, and the
records are piss poor. That’s one thing that jumped out for me. I enjoyed it.
Andreas:
So healthcare, for me, we all carry our mobile phones with us. Potentially, they could
measure an incredible amount, about how we walk, how we talk, how we feel, how we
interact with our friends, which probably has huge predictive value for whether
we’re going to have back problems, or whether we’re sleeping enough. I was preparing
class at 4:00 in the morning last night, with Ron. It is probably not good for our long term
health. Fitbit is one of the things we tried to get but unfortunately, they’re not ready.
Look at our mobile phones. I’m not sure; does Orange have any initiatives in the
healthcare world? Not just getting healthcare records together, but to use the phone as a
device for constantly collecting stuff?
Student:
[too far from mic]
Andreas:
That’s an interesting question because if your health insurance has access to those data,
or your car insurance knows your blood alcohol content before you get in the car, maybe
you’re not that comfortable having those data collected by your mobile.
Student:
Since you mentioned healthcare, I’m mentioning education. They aren’t allowed to use
social networks in the school. The bit that flipped for me was when you originally started
with the premise that optimization happens with the volume of use. I began to see that
potentially, sponsors that could solve some of the funding problems in education, by
sponsored content, could have a better relationship because it could have a monitored
relationship with education instead of being seen as evil. When you mentioned that
social networks create a different relationship with the sponsors, it opened a door for me
that said that potentially, we can solve education problems with sponsored content.
0:55:43.0
Andreas:
The notion about sponsoring stuff suffers from a hundred years of being mass
marketed to, where marketing dollars were primarily spent on bad products; you
need to tell people the good things about the bad product. That’s probably carrying
over into education. I don’t know how much sharing is going on there. I would
expect that a lot of sharing is going on in the educational world. Google Docs etc.
is probably pretty standard in classes, now.
We will continue at the reception, but before doing this I am going to acknowledge money
we received from John’s company, MySpace, which is paying for the reception - not only
for that, but also for an award. Having realized at the [0:56:42.2 unclear] conference, last
week, that video is a new medium that many people are using, here is what I am going to
spend $1,000 on. I ask you, and many of you have a mobile phone that is a camera or
something. It doesn’t matter what the quality is. I sent out the email, which hopefully by
now is reaching more people than before. Come up with some short, funny video that
you create. I am giving a lot of talks this year. If I use it at least once, I will give you $200,
for the first five videos. That’s $1,000, 5 times $200. It should be somewhat
related to social data, the Social Data Revolution, to marketing, to the stuff which we’re
doing, recommendations. It doesn’t really matter to me what it is.
Maybe you want to interview your parent, or your parrot, or your RA or your prof. I don’t
know what. That’s really up to you. I will give you some examples here of a couple of
videos. I don’t want to influence you. Think about it. It doesn’t have to be in English.
One of my talks is in Milano. I want to get some cool stuff, maybe 30 seconds or a
minute. Quality is not primary. What do you think would be stuff you could use in a talk?
I suggest you upload this. There are 20 different video formats that Facebook takes.
You upload it to our Social Data Revolution site, and in order to give us 24 hours to look
at it before the last class, I need to ask you to do this within a week, actually, by Tuesday
next week at noon, so I can get a feeling about what’s happening. Please, do me the
favor; Facebook things tend to disappear at some stage, like my first videos did. Send an
email to this www.video.aweigend.gmail.com with a link. If it’s less than 10 MB you can
attach it so we have it in one space. That’s what I wanted to say about the $1,000. It’s
not much, but still, I think $200 for a 30 second video is not a bad hourly wage.
I have three things to announce. Thanks to MySpace, to [0:59:21.3 unclear] Fox. Thanks
to Enrique who got the food. We now have food, some punch, and an equally attractive
non-alcoholic beverage. I was an RA when I was a grad student. It’s waiting for us in the
lounge in Sequoia Hall, in the STATS department. If you invited some friends, they
will be there. If you still have some other friends you want to bring over, I think there
should be enough for all of us. I know 40 people RSVP’d on the Facebook page. It’s just
an opportunity for us to talk a little more, and for you to tell me more privately what you
think, what you liked, what you didn’t like.
If you have questions about homework, Ron is here. Ron does this totally voluntarily. He
actually will not be in the last class, so I want to really say thank you, Ron, for having
helped out in so many ways. [Applause] You certainly have a lot of social capital
accrued here, from me, to help you with the startup, and probably from the students, as
well.
1:00:29.2
For the last class, privacy is the topic. We will have Cynthia Dwork, formerly from IBM
Research, and now with Microsoft Research. If you have time beforehand, you should
watch a half hour video of the IBM [1:00:44.3 unclear] 2008 workshop. I got to know her
a little bit. We shared a taxi in San Diego once, but that’s really the first time we talked
about stuff. She is extremely smart, a cryptologist by training, and understands the
risks of sharing data.