Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics
Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
April 20, 2009
Class 3 - Data: (1 of 2)
This transcript:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_3data-1_2009.04.20.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_3data-1_2009.04.20.mp3
Next Transcript: (Part 2 of 2)
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_3data-2_2009.04.20.doc
To see the whole series: Containing folder:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/
Andreas:
Welcome to class number three. I want to start by giving you a picture of what’s coming.
Reid Hoffman agreed to come in to class number six. As some of you might know, Reid
was Co-Founder of PayPal. Prior to that, he was an undergrad student and a grad
student, co-term, here at CSLI. He ran the speaker series in 1989-1990. He now runs a
company called LinkedIn, which is a professional networking site, basically the world
standard. He’s going to come to class with a guy who runs data mining at LinkedIn, for
class six, which is May 11th. If any of you are not on LinkedIn, familiarize yourself a little
bit with what’s happening. Keep the following question in mind.
We talked about the economics of data and the problem with LinkedIn is those
people who are eager to make contact with people, tend to be the ones who don’t
have that much to offer, like they want to get a job. On the other hand, those
people who have their plate full and have no time, are the ones they want to reach.
There is this intrinsic asymmetry that those who have the time may not be the
desirable ones, and those who don’t have the time are the ones people want to
reach. What do you think could be ways of how this fundamental asymmetry, this
imbalance could be addressed by LinkedIn?
Student:
I think having the notion of currency – you are interested in people being happy….
Andreas:
The problem is that those people who actually have the money are the ones who
build shields around themselves and those who don’t have the money don’t have
the money to actually get the currency. It has to be currency where we introduce
some artificial scarcities, like having one golden bullet message you could send a
week. We could have something like people ranking, not in a discriminatory way,
but where people could actually show the reputation – some kind of reputation
system. The question we would discuss then is what should we surface? For
instance, if you are a super reliable guy, you will get hit up by somebody and you always
respond within a day. Maybe we should actually show that information about you, so
people know he might not have the biggest influence in the universe, but at least he’s
going to respond to us.
I’m very happy and grateful to Reid. He’s a super busy guy who also invested in a whole
bunch of startups but he’s going to give us three hours of his time. I am very much
looking forward to that.
The week afterwards, we’ll have the CEO of Twine, Nova Spivack coming. He confirmed,
as well. Until then, we are among ourselves. Maybe we will get [0:02:51.9 Ida Marosen],
from Facebook, to show up for the next class or the class afterwards, to help us a little bit
with our metrics.
0:03:03.6
In terms of homework assignments, I will talk the second half of class today about
homework. I will get things in perspective here, which is further down. The main worry
you had is if you don’t get much traffic to your website for assignment two, part B, don’t
worry about that. The point is to show you what is simple and what is difficult to get to in
terms of data. Don’t worry if you don’t have more than you and your girlfriend visiting
your site twice a day; the purpose is really to make it clear to you what is easily gotten
and what is very hard to get. That was the philosophy behind the second homework set.
It was not meant to be a very hard homework set. The first one was what can you get
by simply understanding what Google [0:03:46.4 unclear] provides you with. The
second one was how easy it is to set up something and what you can measure,
HTTP, referrer, and so forth. The third one was how do you use two off-the-shelf
things, Yahoo Pipes and Craigslist, to find places you want to live or whatever
your heart desires. Don’t worry too much about homework.
Homework 1, we will discuss after break, today. There is a lot of feedback. I love some
of the things I saw from you. I’m actually very excited about it. Before I show you that
excitement, let’s talk about what we’re doing in the first half of class.
The first half consists of three parts: about five minutes of a warm-up exercise,
the thought experiment that I announced at the end of the last class; a conversation
about data mining in e-business: what are good data mining problems for
e-business? What are their characteristics? In the last part before the break, I’m
going to talk about one problem in more detail. That will be one action people and
companies often take, namely, figuring out who they should give resources to.
The traditional, statistical term for that is “customer lifetime value”. I will construct
with you what customer lifetime value is for, and then deconstruct the current term
to reconstruct one that takes into account the network we have between people, plus
the historical component in which we sometimes say we move from transaction
economics to relationship economics. That’s the plan for today.
Brian Knutson, a friend of mine who used to be a grad student with me in Psych and is
now a professor, does these wonderful fMRI experiments, where he finds out
that if you show somebody a stimulus and within seconds measure what gets
activated, the oxygen content in the brain, the rest is history; the rest is mechanics. If
they like Pepsi over Coke, at that moment, they will always buy Pepsi over Coke.
I want to ask you; if you had devices, like I am, wired up here with all kinds of
devices, which recorded everything conceivably possible right now – even if somebody
walks by and you decide to turn your head toward that person, or if your concentration
slacks off. What I am saying here is if you could measure those things, how would
your behavior change, if at all? If we put you with a set of stuff around yourself, it
might be heavy but we won’t worry about that. We recorded it and backed it up in
some salt mine, how would your behavior change?
0:06:34.9

Case one is if only you have access, maybe with some signs of life required, like a
retina that is still alive or something, some blood pulsing or DNA samples, whatever it
might need to show that it’s really you, alive.

Case two is after your death, maybe other people in your will have been given
access to this data.

Case three is it would be password protected. You could give Enrique the
password and he can check out who makes your head turn.

Case four would be that law enforcement agencies and other friends would be
able to get into those data.
How would your actions change if everything you did was recorded? What would
you do differently? Would it not change? Maybe you’re so used to having
everything recorded that it makes no difference.
Student:
I would think I would do things less spontaneously.
Andreas:
For how long, for a day, two days?
Student:
No, it might be … what’s being recorded….
Andreas:
Okay, what do other people think? Would there be things you wouldn’t be doing or doing
differently? Would you take shorter showers, longer showers?
Student:
I would probably be aware of things the first few days, but then I think I would
adapt to it, just like having to wear eyeglasses. I would probably once a day or
once a week, I would look at parameters that they record… have this device alert
me while I’m performing them so I can actually change my behavior.
Andreas:
So, self insight would be something of interest to you, knowing that it’s not only
seven minutes a day that you spend on Facebook, but actually seven hours a day.
Student:
Yeah, I would essentially have this – it’s actually a program I’ve started writing. I would
have this thing telling me to stop, or jotting this down. If it were recording my daily
actions and it were to actually analyze my daily actions and not just web pages that I
view, I would also want it…. For example, let’s say that I don’t want to talk in a loud
voice in public places. It would detect that I’m talking very loudly and I’m getting
too loud; it would actually give me input and tell me to lower my voice or
something like that.
0:09:40.3
Andreas:
I heard that in Israel they now have banned phones at funerals. They thought it was not
the right thing for people to be talking very loudly there. Is this true?
Student:
I’m very sensitive to people talking too loudly, I guess, because I’m from Israel.
Student:
I think it would also depend on who can see what action that you take. If you had
control over who sees a particular action, for example, you don’t want your boss to
know that you did something stupid on Facebook…. If you could control those
permissions at a fine grained level, you wouldn’t have to lose that spontaneity.
Andreas:
What about if it got indexed? I think one dimension that makes a huge difference
is not just whether it’s recorded or not, but whether, with a few keystrokes you can
find and align all those moments where you did something. Is that where the main
value or danger lies, or where do you think the value of recording versus the value
of indexing and making it searchable.
Student:
Indexing that is available, is that what you mean?
Andreas:
If you did video indexing or [0:11:05.6 unclear] indexing, if knowing when that part of your
brain is firing. It would be interesting, whenever certain people walked by…
Student:
I think if two out of a hundred people do it versus ninety out of a hundred people do it, they
create different results. Two people doing it … increase awareness, attention,
intention, so that it becomes the isolated… in that case. Whereas, if many people
out of hundreds start doing it, that becomes the norm. There is less attention…
habits within that crowd….
Andreas:
That’s an interesting one. In Germany, in the mid 1980’s, [0:11:54.0 unclear] era, there
was a big movement against passports that could be read by machine. People felt that if
we could have our passports or ID cards machine read, then that would give the police
too much power compared to if they had to manually write down name after name.
Now, the German passport has my RFID chip in it so if I would just walk in a shopping
mall and I don’t have it protected, other people can read out my fingerprint. In the last
twenty or thirty years, things have really changed a lot in terms of what people find
acceptable. 9/11 has also had its part in that, the percentage of people doing it and
creating norms, de facto norms, which people then adhere to.
Student:
I think indexing makes it a lot easier to run ad hoc… show me every time such and such
happens. If it’s not indexed, assuming you’re still able to… that will require you to say,
“What are the really interesting things I am curious about in my life,” and I think that could
work to decrease abuse of data mining, where you can’t make… periods…. You could
say, “I want to know how I am spending my time,” and it would shift queries
towards interesting things better.
0:13:22.3
Andreas:
Who of you would be interested – and I have no contact myself, but it’s not hard to get it
– the company in San Francisco that I mentioned before called Fitbit. Who would be
interested in hitting them up to see if they are willing to give all of us a $99 device that we
would be willing to carry around for a couple of weeks to see what sort of data on us we
can draw out of it? It’s not a class assignment. I could do it myself, but if one of you
were willing to take it… Would you want to do it together? Talk to them, make the
contact and I will be happy to show up and go with you if that helps. If it hurts, you can
do it yourself.
For them, they have eighty some-odd smart students helping to debug their device before
it’s released into the market. It’s a great deal for them. For us, it would be interesting to
have a device to collect stuff. It’s not easy to find a lot of information about them on the
web, so I think the right way is to try to contact them. If you need help, I’ll help you. If
you don’t need help, maybe you could come with fifty to eighty devices in the next class
and that would be great.
That was the warm-up exercise. Note that nowhere in this discussion did any
business aspect come up. I was curious about that. Nobody said somebody
would be able to provide services to me, somebody would be able to sell me stuff,
somebody would be able to find me matches, or to suggest people I should be
talking to. That was interesting; the e-business aspect, the money opportunities
were not primarily on your mind here. It was more the worry about having it
indexed and who would see it and what would happen to it after we die.
I now want to spend the next ten minutes or so talking with you about some of the key
traditional data mining problems that we have in e-business. One problem that
everybody always mentioned was the problem of recommendations. The way I
want to phrase these problems is what is scarce, and what’s abundant.
Recommender systems have the property that your attention is
scarce and a company, such as Amazon.com or Netflix, is trying to give you
something in return for your attention, something you will eventually buy or find useful.
We will have a class in about four weeks, two weeks after Reid, where I will talk solely
about recommender systems. In the first class, I said recommender systems make
between 20% and in some cases 50% of the revenues of e-business companies. It is a
super important ingredient. I am advising a couple of companies in that space.
0:16:26.1
What other problems can you think of, besides recommender systems in the traditional
sense of recommending products to people? What other data mining problems; put Jeff
Bezos’ hat on. Jeff Bezos has three hats. One hat is the guy who sells books, i.e. he
has a retail store. His second hat is the guy who has a platform that enables others to
sell stuff. The third hat is an amazing technologist who just changed the world by
providing cloud computing and so forth. Right now, just take the one third of Jeff where he is
the one who runs an e-business company, Amazon.com, as most of you know it, as retail
customers.
When I was Chief Scientist at Amazon.com, what data mining problems do you
think I would have been grappling with?
Student:
A system that gives actions that are relevant to a user. Instead of recommending
something, just having … for instance, buying tickets for this week’s …
Andreas:
Cross-selling, basically: given that you always buy certain mp3s, and the artist is in your
neighborhood (and Amazon has to know what your neighborhood is, otherwise how would
they be sending physical items to you), that artist is going to play, so, “How about 5%
off on the tickets?” It’s interesting; more money is apparently made with merchandise sold in
relation to concerts than on the tickets, I’ve heard.
What other problems? Can you think about if you were to run Amazon, you would
get all the data in the universe, every click, every call to the call center?
Student:
…
Andreas:
In [0:18:32.8 Double E] here… you know very well the distinction between prediction
and control. Prediction is a key ingredient. I say this is going to happen but the
real money is being made by taking actions in response to the prediction. Stephen
Boyd has this wonderful example. He teaches [0:18:59.9 unclear] information. He says,
“What are the most expensive double-integer… ever computed?” Any idea? The most
expensive, like dollars per bit ever computed.
Student:
…
Andreas:
That’s in the right direction, what other ideas do you have? Think more commercial than
NASA. That’s the right direction.
Student:
GPS
Andreas:
No, it is, according to Steve…
Student:
…
Andreas:
Nothing compared to his example, which is the coefficients for the Airbus controller.
There are about a hundred parameters and he makes the argument that virtually
hundreds of millions of dollars went into each of these coefficients because if you get
those wrong, the cost function is pretty high. Some numbers take a lot of money to be
computed. That is only the controller aspects.
The point I want to make is prediction is good. Control is better. If you can predict
something is happening and you don’t know what action to take, you are not in as
good a shape as you would be if you actually knew what action to take, in order to
influence what the person is doing.
0:20:39.4
Let me give you an example to foreshadow what I will be doing before break, regarding
customer lifetime value. It’s kind of interesting if we know that on average we will
probably make from each one of you, with this [0:20:52.3 error…], so and so dollars.
Wouldn’t it be more interesting if we knew what we needed to do in order to shift you up a
little bit?
The prediction problem would just be that this person is going to put $200, on
average, into Amazon’s pockets. Much more powerful than that would be if we
actually show him a certain thing, give him a certain offer, or whatever it takes, we
could get him to not give $200 to Amazon but $300 to Amazon.com. The mindset I
think is more powerful – we’ve come a long way from this old world of having
boring reports that nobody looks at, to a new world of having a fast-paced action,
trying out stuff and seeing how we can actually move people in the right direction.
That was my comment on prediction.
Student:
I guess…
Andreas:
Willingness to pay is an interesting one. We already talked about Ron Howard’s
example of how much I would sell my t-shirt for. People have no incentive to ever reveal
and tell you their willingness to pay. If you say, “Do you want to buy my mobile, old …
$10,” but how can we incent them to honestly tell us what they’re willing to pay? That’s
where we move to experiments.
I can give you some experiments where we compared stuff and varied prices;
where we had some set of items that were similar to another set of items, and
these we discounted 10% and those we discounted 20% and we would see that of
course we would sell more of the 20% discounted ones, but taking into account the
profit margin, do we actually make more money from selling three times as many
at a 20% discount, versus a given number with the 10% discount?
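The comparison in this experiment can be written as a short worked inequality. This is only a sketch under assumptions that are not in the lecture: p is the list price, c is the unit cost, q is the number of units sold at the 10% discount, and 3q stands for the hypothetical "three times as many" sold at the 20% discount.

```latex
\begin{align*}
\text{profit at 10\% off} &= q\,(0.9p - c) \\
\text{profit at 20\% off} &= 3q\,(0.8p - c) \\
3q\,(0.8p - c) > q\,(0.9p - c)
  &\iff 2.4p - 3c > 0.9p - c \\
  &\iff c < 0.75\,p
\end{align*}
```

Under these assumptions, the deeper discount earns more only when the unit cost is below 75% of the list price. With thinner margins, tripling the volume still makes less money.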
Student:
…
Andreas:
Do you mean how you message things?
Student:
Yes, if you give a 10% versus a 20% discount… at full price…
Student:
The personal …
Andreas:
Who actually believes that when they read “30%” that it is really 30% off? Nobody, of
course. I think it’s about the psychology. I am actually not sure; it’s one of the questions
where I don’t know whether the world has changed or not, whether, with the transparency
that we are creating on the web, the prices that one company touts as 30% off and the other
one as 50% off happen to all be the same price, just indexed to a different,
non-relevant baseline. I don’t know if those old tricks still work or not. I do know we can
simply do experiments where we compare two different price points and we see the
conversion rate times the profit margin to discover which of them was bigger. That is
how companies do adjust the prices; it’s purely experimental.
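A minimal sketch of that comparison, conversion rate times profit margin for two price points. The class name `PriceArm`, the helper `better_arm`, and all the numbers are hypothetical illustrations, not anything Amazon is stated to use.

```python
from dataclasses import dataclass


@dataclass
class PriceArm:
    """One arm of a price experiment (all numbers below are made up)."""
    price: float      # price shown to visitors in this arm
    unit_cost: float  # what the item costs the seller
    visitors: int     # visitors who saw this price
    orders: int       # visitors who bought at this price

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.visitors

    @property
    def margin(self) -> float:
        return self.price - self.unit_cost

    @property
    def profit_per_visitor(self) -> float:
        # conversion rate times profit margin
        return self.conversion_rate * self.margin


def better_arm(a: PriceArm, b: PriceArm) -> PriceArm:
    """Return the arm with the higher expected profit per visitor."""
    return a if a.profit_per_visitor >= b.profit_per_visitor else b


# Hypothetical experiment: the lower price converts better but earns less.
arm_a = PriceArm(price=18.0, unit_cost=12.0, visitors=1000, orders=40)
arm_b = PriceArm(price=16.0, unit_cost=12.0, visitors=1000, orders=55)
winner = better_arm(arm_a, arm_b)
print(winner.price, round(winner.profit_per_visitor, 3))
```

In this made-up example the higher price wins even though it converts fewer visitors, which is the reason for multiplying by the margin rather than comparing conversion rates alone.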
0:24:27.5
Note, no information from competitors needs to be looked at. When I talk about this
people say, “We need to spy on the competitors.” No, we just do it in an endogenous
way. Of course, we could break into the competitor’s computer systems, but that doesn’t
last that long. I asked my friend if he would be willing to add one of my friends as Oprah
following him at Twitter, but he didn’t think that would be cool, although I’m sure it would
have gotten my friend a lot of traffic. You heard that Oprah is now on Twitter with
something like 300,000 or 400,000 followers within a day – old media meets new media.
The example here is what can you do endogenously, what can you do internally at
the company, and certainly varying prices you can do. There is some rule that you
shouldn’t discriminate against people. That means one country, one time, one
price. Time is finally [0:25:21.7 unclear] and nobody is saying that at 7:00 and for
24:30:20 versus 24:30:21 people should get the same price. Are there any other
problems here? What are problems in business?
Student:
I think I have a Netflix example. Maybe we could also incorporate the weather…? …
certain area… might want to stop more that type of … warmer weather…
Andreas:
That’s an example of a hypothesis, which may or may not be right, but it allows us
to come up with different ideas, with different actions. Then we run the PHAME
framework, where the problem is to rent out more movies, the hypothesis is that
the weather matters. The action is we stock this one versus that one. The metrics
are how many movies did we rent? The experiment is to do an A/B test.
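The experiment step can be sketched as a standard two-proportion z-test. This is an illustration under assumptions: the lecture only says "do an A/B test", the metric is treated here as the share of customers who rent, and the counts are invented.

```python
import math

# The five PHAME steps from the example above, written out as data.
phame = {
    "problem": "rent out more movies",
    "hypothesis": "the weather matters",
    "action": "stock this one versus that one",
    "metrics": "how many movies did we rent",
    "experiment": "A/B test",
}


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference in conversion rates between B and A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 2000 customers per arm, 200 vs. 260 rentals.
z = two_proportion_z(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest the difference between the two arms is unlikely to be noise. It says nothing about whether weather is the cause unless the arms were assigned at random.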
By the way, [0:26:32.0 unclear] did a study, perhaps ten years ago, and showed that
whether there is sun in New York, on a given day or not, has a positive correlation with
the stock prices going up. He looked at the delta, day after day, of, I think, the S&P 500,
NASDAQ, and found that if there was a sunny day, people were more optimistic about it.
There are precedents for that.
Student:
… risk of being … try to mine … space of consumers… too narrowly … what you want.
0:24:27.5
Andreas:
I think one dimension that has been missing so far, and what you brought in is the
dimension that if you actually ask people they might actually tell you something.
All we did here was pretty much in the area of sniffing the digital exhaust. Based
on what people have to give you, we just figure out what we can do based on that.
That would be about recommendations. I was really surprised, because we talked
about sharing data and why we would share data, but nowhere in the discussion
did you actually say to give people the chance to go from a CRM, customer
relationship management to CMR, which is customer managed relationships,
allowing people to say what they are interested in. That dimension of providing
pipes for people, of bidirectional communication, of a two-way communication,
that they can tell you what they’re interested in; that is something I believe is a
deep, fundamental change in business, one that will have changed it forever.
Student:
… if sometimes, when you’re given a recommendation and it’s too focused, and I see
how wrong or off that recommendation is, I lose credibility with the system, it loses
credibility in my mind. With the Amazon recommendations you get at the bottom, are
they relevant to you guys? How many of you have purchased something from Amazon
and they….
Andreas:
Let me ask another one. Who has not purchased… would you mind expanding on why
you have not purchased?
Student:
… I know exactly what I want. I’ve done the research on it before, so I just buy, and I
usually don’t even look at the recommendations.
Andreas:
I think all of us are influenced easily. You are the minority, actually. Who else hasn’t?
Student:
My buying patterns are I don’t buy the same things over and over. I have very specific
purchases when I go in to buy something.
Student:
I think the recommendations are also moved out of the context … looking for a
particular… recommended on that subject, it makes sense that Amazon, at some point…
on that subject it makes sense, but then Amazon at some point has things like… out of
context recommendations. I don’t think they’re relevant.
Andreas:
So, actually, in some way we need to really break down the many places the
recommendations show up on Amazon.com. There is the blog stuff, which is
pretty random. On the other hand, for people who bought this item they also
bought those items, that is something that is super useful; you might buy a spare
battery or whatever it is, by helping, through the user behavior, to support new
people in their decision-making process.
I think that in many cases, you are not quite sure what battery might go with that laptop,
but if 87% of the people who clicked on this item also bought that item, then you can be
sure that 87% of the people can’t be stupid.
Student:
I think it’s interesting, for the recommending the items that are…
Student:
I was wondering why… cross promotions… services at Amazon. If they see I have a
trend of looking at … and cloud-computing related literature and they’re watching the
service, it’s pretty safe to say they’re probably interested in an Amazon EC2 service. If
I’m starting to buy, because their main service is businesses… books by categories
and now they see there is a new category returning to my purchase history. For
example, I started looking at things about cooking but I never looked at books about
cooking. They can offer me products from the kitchen section. I’ve never seen that.
0:32:00.4
Andreas:
Just to spend two minutes on making sure you know what we’re talking about, could we
kindly get the camera on top of the paper here? The problem we’re talking about here
is typically called the cold-start problem. The basics work like this: there is a
co-something matrix, say a co-purchasing matrix, where we have an “item I” and then
“item J”. This cell here gets incremented by one when people within a certain time
period, “Delta T,” which is 24 hours, buy both item I and item J. This one here gets
incremented by one, as any person, we don’t know whether it’s you or you, buys
those two items.
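The counting step just described can be sketched as follows. This is a minimal illustration, not Amazon's implementation: the function name `co_purchase_counts`, the event format, and the sample log are assumptions, and the buyer key is used only to pair up purchases inside the window and is not kept in the result.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

WINDOW = timedelta(hours=24)  # "Delta T" from the lecture


def co_purchase_counts(purchases):
    """Count item pairs bought by the same buyer within WINDOW.

    purchases: iterable of (buyer, item, timestamp) tuples.
    Returns a sparse symmetric matrix as dict[(item_i, item_j)] -> count.
    """
    by_buyer = defaultdict(list)
    for buyer, item, ts in purchases:
        by_buyer[buyer].append((ts, item))

    counts = defaultdict(int)
    for events in by_buyer.values():
        events.sort()  # time order, so t1 <= t2 below
        seen = set()   # each buyer increments a given cell at most once
        for (t1, i), (t2, j) in combinations(events, 2):
            if i != j and t2 - t1 <= WINDOW and (i, j) not in seen:
                seen.add((i, j))
                seen.add((j, i))
                counts[(i, j)] += 1
                counts[(j, i)] += 1
    return counts


t0 = datetime(2009, 4, 20, 9, 0)
log = [
    ("s1", "camera", t0),
    ("s1", "battery", t0 + timedelta(hours=2)),
    ("s2", "camera", t0),
    ("s2", "battery", t0 + timedelta(hours=30)),  # outside the window
    ("s3", "camera", t0),
    ("s3", "battery", t0 + timedelta(hours=1)),
]
counts = co_purchase_counts(log)
print(counts[("camera", "battery")])  # 2
```

Storing only the non-zero cells in a dictionary reflects the fact that a matrix with tens of millions of items on each side is almost entirely empty.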
Now, if another person comes and looks at item I, we normalize all the possible
items to J here; that’s about a fifty million times a fifty million matrix. It’s very
sparse. Then we see there are some peaks here, and naively, those would be the
items we would recommend. I have two remarks.
One, if you do this, then items that are popular for everybody will be the ones that
are recommended. You just bought that 64 GB SD card. People who bought this
card also bought Harry Potter. While true, it’s not exactly what you’re looking for.
Like everything, whether it’s search or anything else, it’s a bag of tricks. The trick
here is that you want to normalize things.
You normalize them and then you have the relative probability that this item J is of
interest, rather than being told, whatever you do, to buy Harry Potter, or whatever the
popular item is right now.
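One common way to do this normalization is "lift": divide the probability of J given I by the overall probability of J. The lecture does not say which normalization Amazon uses, so the sketch below, including the function name `raw_and_lift` and the toy baskets, is an assumption for illustration.

```python
from collections import Counter
from itertools import permutations


def raw_and_lift(baskets, item):
    """Return (raw co-purchase counts, lift scores) for candidates of `item`.

    lift(j) = P(j | item) / P(j); values above 1 mean j is bought with
    `item` more often than its overall popularity would predict.
    """
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        unique = set(basket)
        item_count.update(unique)
        pair_count.update(permutations(unique, 2))

    raw, lift = {}, {}
    for (i, j), c in pair_count.items():
        if i == item:
            raw[j] = c
            lift[j] = (c / item_count[i]) / (item_count[j] / n)
    return raw, lift


# Toy data: harry_potter is popular with everybody.
baskets = [
    ["sd_card", "harry_potter", "card_reader"],
    ["sd_card", "harry_potter"],
    ["sd_card", "card_reader"],
    ["sd_card", "harry_potter"],
    ["harry_potter", "novel"],
    ["harry_potter"],
    ["harry_potter", "dvd"],
    ["novel"],
]
raw, lift = raw_and_lift(baskets, "sd_card")
print(raw)   # harry_potter has the highest raw count
print(lift)  # card_reader has the highest lift
```

On the toy data the raw counts put the popular book first, while the normalized score puts the card reader first, which matches the point of the SD card example.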
Student:
It’s like TF-IDF.
Andreas:
Exactly, term frequency – inverse document frequency, I was wondering whether I
should use it. In speech you use term frequency – inverse document frequency.
Basically, you normalize by what is in the corpus. If one term appears quite often
in a document, but not often in the corpus, it’s more relevant than if a term appears
frequently in both the corpus and the specific document.
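A minimal sketch of TF-IDF in plain Python, using one common variant of the weighting (relative term frequency times the log of inverse document frequency). The function name `tf_idf` and the three-document corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights


corpus = [
    "the camera needs the battery".split(),
    "the book about the wizard".split(),
    "the cloud and the server".split(),
]
w = tf_idf(corpus)
print(w[0]["the"], round(w[0]["battery"], 4))
```

A term that appears in every document, like "the" here, gets weight zero no matter how often it occurs, while a term confined to one document keeps a positive weight.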
Amazon has its own version, which is Statistically Improbable Phrases (SIPs),
which is another way of using the same idea of how does this document, or this item
differ from the corpus? TF-IDF is the way it was first studied, twenty or thirty years ago.
That is the basic idea. The second point I wanted to make, besides the
normalization segment is that of course, you have this sort of positive feedback
loop. Once you recommend something, people are more likely to click on it. To
get back to your example, if people are interested in cloud computing, you need to
actually seed it in some way, with stuff to build tunnels or bridges from one area to
another area. That’s not easy. Cross-selling is actually a hard problem. Partly
because the statistics are so low.
0:35:52.7
If you think about it; how many people who buy a camera also buy a battery versus how
many people who buy a certain book then actually buy EC2 services? That’s not a
trivial one and it needs human intervention. You try things out, use text, and if people
click on it you are doing the right thing.
The framework, as you traditionally know it, goes by many names. I use the
term “item-by-item” filtering. The basis here is individual items. It was extended
in a number of very good ways. My favorite one of those is not “people who
bought X also bought Y,” because that’s quite a rare event, that you buy things
together. It was “people who looked at X eventually bought Y”. That is a beautiful
example of how the clicks of individuals (in an aggregate way, within a 24-hour
window, no persistent ID behind that) allow Amazon to support people in their
decision-making process.
Now, what is the genius in that idea? It is that somebody really thought about the
problem very deeply. I don’t know anymore who came up with the idea. Somebody
thought about it and said, “We have this machinery; let’s not just create data only
by co-clicking behavior, not only by co-purchasing behavior, but by crossing
clicking and downstream purchasing.”
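A rough sketch of "people who looked at X eventually bought Y" might look like the following. Everything here is an assumption made for illustration: the event layout `(session_id, hour, action, item)`, the function name `viewed_then_bought`, and the choice to count each (X, Y) pair at most once per session. It is not Amazon's implementation.

```python
from collections import defaultdict

WINDOW_HOURS = 24  # the 24-hour window mentioned in the lecture

def viewed_then_bought(events):
    """Count, per viewed item X, the sessions in which Y was bought later.

    events: iterable of (session_id, hour, action, item) tuples, where
    action is "view" or "buy". This layout is hypothetical.
    """
    by_session = defaultdict(list)
    for session_id, hour, action, item in events:
        by_session[session_id].append((hour, action, item))

    counts = defaultdict(lambda: defaultdict(int))
    for session_events in by_session.values():
        session_events.sort(key=lambda e: e[0])  # order by time
        pairs = set()  # each (X, Y) pair counts once per session
        for i, (view_hour, action, viewed) in enumerate(session_events):
            if action != "view":
                continue
            for hour, later_action, item in session_events[i + 1:]:
                if later_action == "buy" and hour - view_hour <= WINDOW_HOURS:
                    pairs.add((viewed, item))
        for viewed, item in pairs:
            counts[viewed][item] += 1
    return counts

# Hypothetical anonymous sessions: no persistent user ID is needed.
events = [
    ("s1", 0, "view", "camera"), ("s1", 1, "view", "tripod"),
    ("s1", 2, "buy", "battery"),
    ("s2", 0, "view", "camera"), ("s2", 30, "buy", "battery"),  # too late
    ("s3", 0, "view", "camera"), ("s3", 5, "buy", "camera"),
]
counts = viewed_then_bought(events)
```

The counts would still need normalizing by overall item popularity, as discussed earlier in the class, before they are used as recommendations.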
Those are the ideas that really make the difference. I deeply appreciate people who
actually come up with those things. It’s a very simple framework. It’s a few lines of code.
The last few years, I did the Delicious recommender system. I’m still talking with a
couple of companies about who would give us data this year so we can write a live
recommender system. It’s not very complicated with the Python implementation in the
Collective Intelligence book, you basically just run it. The idea is deep here; that’s a
simple click behavior thing.
Student:
Netflix has a prize for recommending movies… is that because renting movies is much
harder…
Andreas:
The big difference is that movies are about ranking, 1, 2, 3, 4, … You associate a
certain rating or ranking, or judgment with an individual item, versus at Amazon, it
is based on clicking. These are very different problems. In one case, you have the
user go through the movies and say, “I like this one, 5 stars; I like this one, 4
stars,” and so forth. Amazon has a much lower threshold. These are very distinct
problems. What I told you right now is totally different from an algorithm where
you actually rank items.
The second remark is that the million dollars spent is the most brilliant PR coup ever
done in the universe. I think the day it was done, about 50 people emailed me about it,
“Absolutely brilliant.” New York Times talks about it but they haven’t spent the money
yet. If everybody tries, they have about 1,000 groups. I know the team that won last
year, two Austrian guys I met with. The end result is pretty much random. You might as
well draw numbers. But, from a PR perspective, it was absolutely amazing. That is the
true genius of that idea.
0:39:52.5
This one here has a different intellectual quality from saying…
Student:
When you did the people that bought x bought y, is that truly endpoint to endpoint, or did
they actually look at different paths?
Andreas:
Paths are actually damn hard. It really [0:40:17.8 unclear] centric kills you. A path
means that you don’t only have two point correlations, as we call them in physics,
like here, but that you actually look at path dependence or hysteresis or whichever
department you’re in, you have different words for that.
If you have fifty million items, [0:40:39.0 unclear] kills you basically, at the three-point
correlation. One interesting question here is how do you treat time? There are two
time scales involved. One is the delta in the number of clicks between looking at
an item and then eventually buying an item. There, not knowing what to do, you
might as well collapse them all into within a session of 24 hours.
You could try to be smart and say, “Hey, maybe it is e^(-n/m), where n is the number
of clicks,” or somehow decay things. That is a hypothesis, in the PHAME
framework. The problem is what is the right exponential decay or non-exponential
decay? Maybe it’s a [0:41:25.0 unclear]. Then, the hypothesis is that things decay
exponentially. The actions are: let’s compare a flat boxcar, within the 24 hours,
versus exponential decay, and exponential, having no memory, is the only one you can
easily implement. Then, you just measure. You have metrics. Are people going to
buy? Are people going to come back? Are people clicking on the item? Then you
go on to experiment. I’m not sure you will actually learn much more. I think the
brilliant thing was to have that idea and then the implementation. In this case, and
fortunately as in many cases, it is as long as it is fast, reliable, doesn’t break,
doesn’t collapse, it probably doesn’t make a big difference which one you use.
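The two weighting hypotheses compared here, a flat 24-hour boxcar versus exponential decay in the number of clicks, can be sketched as below. The decay scale (called `scale` here, standing in for the m in e^(-n/m)) is not given in the lecture, so the value 5.0 is an arbitrary assumption, and all names are hypothetical.

```python
import math

def boxcar_weight(hours_since_view, window_hours=24.0):
    """Flat weight: 1 inside the window, 0 outside."""
    return 1.0 if 0 <= hours_since_view <= window_hours else 0.0

def exponential_weight(clicks_since_view, scale=5.0):
    """Weight e^(-n/m): n clicks since the view, m an assumed decay scale."""
    return math.exp(-clicks_since_view / scale)

class DecayedCounter:
    """Running exponentially decayed total that stores no event history."""

    def __init__(self, scale=5.0):
        self.decay = math.exp(-1.0 / scale)  # shrink factor per click
        self.value = 0.0

    def step(self, signal=0.0):
        # One click passes: shrink the old total, then add any new signal.
        self.value = self.value * self.decay + signal
        return self.value

# A view worth 1.0, followed by three further clicks with no new signal.
counter = DecayedCounter(scale=5.0)
counter.step(1.0)
for _ in range(3):
    counter.step()
```

The counter illustrates the "no memory" remark: the exponential version needs only the current running value to update, whereas a boxcar has to remember when each event happened so that it can drop events as they leave the window.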
I now want to pick one specific example. Let me ask you; who of you has heard of
this thing called customer lifetime value? About half of the class. For me, the
question always is why do people compute stuff. Let me start with a story here.
About three years ago, I was in an office in Hamburg. My mobile phone rang. Singapore
Airlines called so I thought I might need to reconfirm a flight. They said, “Is this Professor
Weigend?” I said, “Yes, what can I do for you?” “We would like to know what the right
coefficient is.” I replied, “Good question, the right coefficient for what?” They said, “For
customer lifetime value.” I told them “.73” and they said, “Thank you so much, that’s
great. When you’re in Singapore next, please come by and we’ll treat you to lunch.” I
said, “Is that it?” “Yes, thank you so much for helping.” I asked them if they would
consider anything else, but they said, “We were told that you know customer lifetime
value and we should give you a call. Thank you so much for telling us what the
coefficient is.”
0:43:33.0
I actually went there when I was next in Singapore, and I actually talked with those
people. We tried to really reframe the problem and ask, “What is the problem you
are trying to solve? What do you mean by customer lifetime value? What can you
try to get people to do?” There is the frequent flyer program. Is it that you will
actually try to get people to buy more tickets, more full-price tickets, to bring their
family, to get the company? There are so many possible actions you might want to
get people to do. Don’t just reduce it to one number. I’m not sure whether they’re
still using .73. I’m not sure if they’re using the frequent flyer program or how the
customer lifetime-value calculation is doing for them.
What I want to say here is that traditionally, it goes back a long time. It solved the
following problem: If we have a certain scarcity – that could be sending people
mailings, or it could be giving them upgrades on an airline, or giving them
replacements when they call into a call center and say an item wasn’t working.
These are all examples of scarcities.
Who should we distribute them to? Of course, as customers, we would all love free
upgrades on every single flight we take. It wouldn’t work for the airline this way. They
would be out of business so it ultimately wouldn’t work for us, either. That is one of the
justifications of why people move to this, why they do customer lifetime value; in
order to make decisions.
Other examples would be that you might be interested in lead generation. You get
leads on people who might be interested in buying a pool or whatever people are buying
in California; who should you actually call? As you come in the morning, here is your list
of customers and you want to have it listed in some way. You can do it in alphabetical
order, or you can have some model behind it that tries to tell you what it is that you might
be getting out of them.
When we talk about value, customer lifetime value, or when we talk about costs,
one thing I really want you to take away at the end of this course is that costs and
values are much more than what most people consider. I want to hear some
examples from you about costs you think are not typically considered in you getting
mailings, or some values you think might not be considered in the typical sense. That’s
an important discussion for me.
Student:
What is the client going to say about the company.
Andreas:
Here we have the word of mouth example. If somebody gets treated poorly, they
might actually have a voice in this day and age of democratization of media. It might be
much more damaging to the company to have shut off that one client than they ever
thought because he’s going to blog about it; she’s going to talk about it with friends;
they’re going to start a group, for instance “Dell Sucks”; the effect is much higher than if
they had given that person satisfaction.
0:47:10.4
There is a book by Jeff Jarvis called What Would Google Do? It came out about two
months ago and it’s a very good book. It’s precisely about the word of mouth case of
Dell Sucks and Dell’s response with IdeaStorm and other ways of trying to re-engage
with the community, trying to get them back, as opposed to trying to be painted as
“they don’t do anything for us.” That would be an example for the word of mouth
you spoke about.
I’ll repeat the question so we’re on the same page here: What are costs that
companies typically don’t consider, and what are values that companies or
individuals typically don’t consider?
Student:
I’d say morale of the company if you get a lot of complaints. It could actually affect
the morale of the company.
Andreas:
How would you measure that; pictures of people and see how many smile when they
walk in each morning, productivity, voice pitch, how fast people walk – are they running
up the stairs or just slogging down them; how long do they spend in the cafeteria?
Student:
How late they stay at work.
Student:
Do analysis on their G-Chat.
Student:
Voice pitch
Andreas:
Employee retention – that was a great example; how long people stay at the company.
Student:
You could just ask them.
Andreas:
Gee, that’s a good idea, and they might tell you, particularly if you ask them
anonymously; you might find out amazing things. I was once in a session with a
CEO of a company where he thought he would just ask the people what they were feeling
about him. I know I went on a walk with him afterwards and told him, “You have a
day to fix this. If you don’t fix this within a day, you will totally lose credibility. If
you ask them and they tell you, unambiguously, what to do, it is a very finite
window of time you have to fix it.”
What other values; think about you as an individual, as a user. Think about the costs you
incur. Think about the costs the company may not know about.
Student:
I think there is a huge cost of frustration…
Andreas:
For years, I used to show this movie of somebody eventually banging on his computer
and then shoving it off his cube. That is sort of the cost – in this case you know because
you have to buy him a new monitor. In most cases, the frustration we all have, by not
finding something or by something being broken, damnit, and we know it’s not our
problem because we can reproduce it, but who would you tell and how do we fix
it? What do you think can be done about this? How can we channel this in?
0:50:20.2
Student:
I think, going back to asking people, there is a big difference between someone asking
you and having a computer asking you…. The person is very likely to help in the future if
you give her an answer. The computer is much less likely. There has to be a clever way
to prove to people that it’s worth their while to answer.
Student:
… crash, how many of you send a report back to Apple… 20% of the class? What else
could you do besides just send a report? All of you who don’t send a report, why don’t
you send a report?
Student:
It doesn’t change.
Student:
… those error reports don’t capture the state, they don’t capture the context.
Student:
I think sending reports to the actual software… crashes, it sends reports to Mac, who
then says there is a problem with our OS, … API. If you send them to Microsoft and say
… messed up… again, that might be more….
Student:
I think the other reason why you wouldn’t send a report is what you get back.
Andreas:
Right, and I think it is if you at least knew that someone is reading it. I’m still
working, more work than I ever thought, on getting this Google forum up, not getting the
forum up but getting the questions up. One thing I promised you is that one of us will
actually read every single answer you are giving to us. We mean it. I will probably read
all of them. Enrique will probably read all of them, as well. That makes a big difference
for you to think about it a little bit if you know that somebody will be reading it as opposed
to [0:52:23.0 unclear] on the best case.
Student:
… their users in terms of how they improve their software based on the feedback they
get. They tell you within each piece what they’ve changed, and they tell you through the
feedback they wrote that within the next two or three versions they will act upon it.
Andreas:
Although the companies can’t promise to act on every feedback they get. As we just saw
with the airlines, we would all love to get free upgrades on every single flight, but it’s not
possible from an economic perspective. It has to be very carefully and clearly
communicated from the company, where is it that you can make sure somebody
reads it, where somebody actually makes sure you get an answer. These are two
very different worlds.
Student:
I think your question was very generic. I am not sure – in terms of costs, there are
products that actually are hazardous and consumers are not aware of this. For example,
this is an … costs…. A lot of emotions in the cost… in particular, airline companies.
They are completely unaware of the costs to some people refusing to fly, or losing
baggage. It can be terrible if you are missing… or something very important. There is no
way to collect these data, but if it were possible, it would be amazing with respect to
what they could do about it.
0:54:14.1
Andreas:
This is actually a very good example you are using. If you are American Airlines,
and it’s JFK in the winter, and the airport is closed for a day. Now, you have the
first flight out. Who do you let on that flight? You can’t just magically create new
aircraft. These are good questions. Should you take that guy who has a blog and
who will create American Airlines Sucks, afterwards, waving his book that he
already wrote about Dell. Would you let him on the flight? Those are good
questions about finite resources.
Let’s come up with some more costs. One cost that I’m passionate about is the
cost of interrupt. I think people, and a lot of studies have shown, people really
underestimate the cost of interrupt. I think I told you in the first class, Danny
Kahneman asked me, “Okay, Andreas, I’ve talked to you for four hours; can you please
turn this notification in Outlook off? That’s the deal we had. I would talk to you for four
hours and you would turn the notifications off.” The cost of interrupt is something many
people know they shouldn’t permanently [0:55:17.1 unclear] their email. Who of us hasn’t
been caught ourselves in a meeting, instead of listening to what the interesting person
has to say, just texting under the table?
Student:
Maybe building on that, attention overload?
Andreas:
Attention overload or information overload, yes. You are so quiet in that corner over
there. You can hear me okay?
Any costs there? Any long term costs versus short term benefits?
Enrique:
… how many of you actually know all the rules that go into something being organic? You
know everything that goes into something being organic, or you just look at the label and
trust that it’s organic? How many of you know all the regulations?
Student:
Not every single one.
Enrique:
Two people?
Student:
… there are things we … in an organic store.
Andreas:
Fair trade is another example. There is a company called GoodGuide.com, which
in a wikified way, tries to have people actually contribute to what all the good stuff
and bad stuff is in your drink. For instance, did you know that drink has as much sugar
as eleven sugar cubes?
Student:
This one has no sugar at all.
Andreas:
25 grams and it’s two servings in there, so it’s like 50 grams of sugar. How many grams
does it have?
Student:
I don’t know, it says “no extra sugar.”
0:57:11.6
Andreas:
No extra sugar, of course. It’s a good example of having community create the data for
you. By itself, what’s the agency problem? Of course, the company that makes the juice
here has no incentive to point out to you that maybe after 6:00 in the evening you
shouldn’t drink this juice anymore because it will just make you fat. The body turns sugar
into fat after 6:00; that’s what the rules are.
Some other people might make a comparison saying, “This is labeled this way, this
is labeled that way; the serving size is different,” and so on. That’s a good example
of having a community create more awareness of the cost. A good example there
is about lotions you put on your skin, there is a huge discrepancy in quality and
some of it you are better not putting on your skin, apparently.
Student:
Would you consider the cost of misinformation? Companies make mistakes all the
time, putting wrong phone numbers on things, telling a customer – airlines will tell you the
plane might come, but they’ll tell you information that may not be – they could just
be making mistakes or they could not really…. Companies struggle with that all the
time. The cost … but I don’t know how to quantify that.
Andreas:
I think we have to distinguish. Last time I was at O’Hare, in Chicago, there was a
United flight that just wasn’t leaving. Finally, it turned out that a crew didn’t arrive. They
have 100 or 200 people sitting there for hours because they couldn’t ship four people
who serve us the lousy drinks, the flight attendants. Once you know these things, do you
communicate to the customers and say, “Dear Customer, sorry you’ve been waiting for
three hours, but we couldn’t find four people to serve you drinks. Would some of you
maybe volunteer to help out here? Then we can leave.” Or, do they not know?
I think in many cases of information problems, the poor people at the gate may not
know. They may be told every ten minutes that the plane is going to leave in ten
minutes. It’s unclear where the problem is but if I was going to run that problem, I would
make it damn clear that I understand where the root cause of the problem is and
then go and solve it; I just can’t believe that four flight attendants can’t be
found at O’Hare, for United Airlines, within three hours. What do I know?
Student:
Too much complexity.
Andreas:
We talked a little bit about decision making. It is so interesting; I told you in the
first class, and it totally flopped. Let me try this again. One of the beliefs I have is
that when you talk about core costs of complexity, and I said it this way and I’ll say
it this way again and hopefully you can make more sense of it now; people don’t
know what they want. I remember in the first class you said, “What are you talking
about? Of course we know what we want.”
1:00:26.0
Ultimately, we don’t know what we want. If you go to a restaurant, and they have
twenty dishes, just bring me something reasonably healthy and good, and I’m good. In
contrast to “Would you like to have combination A) double prime, with….?” Who knows.
Why do people create this complexity? There, going back to misinformation, an
interesting point is people look for comparisons.
I gave you the example of The Economist, of the different prices. Banks and airlines
are particularly good at shifting your attention to irrelevant stuff. There is the big
14% APR, but we’re not talking about this, “You get one free flight every seven years if
the moons are aligned.” Compared to their competitors, they only give you a free flight
every eight years.
Shifting attention of people in this complex world, to the irrelevant stuff, is
something where we can all help by making movement, as Enrique always calls it,
to get people to actually focus on the relevant stuff. GoodGuide is a good example
there, where they try to focus you on the stuff that matters, and not on the color of
the label.
As far as complexity, there is way more information out there. There is a cost of
acquiring information. In the cost of decision making, I think it is up to us to create
a world where we actually surface relevant information, as opposed to drowning
people in irrelevant information. Banks are beautiful examples of that.
Amazon actually doesn’t do that. Amazon really tries to give you the information
you want. Google doesn’t do that because Google’s incentives are aligned with
yours; if Google shows you the relevant stuff, Google is better off because you’re
more likely to click. If Google tries to do some sort of back door deal with somebody,
moving [1:02:19.8 unclear] so about Yelp; do you know Yelp, the community site? “Well,
young man, for a certain fee I can help you out with that review. If you want to click now,
the review has disappeared.” I don’t know if it’s really true or not, but that’s what I hear.
Your reputation is very quickly gone. Do any of you speak German here? [1:02:45.7
German phrase] “Once your reputation is gone, you live without any problems.”
I want to push through SDR and customer lifetime value. The typical, old
computation of customer lifetime value – and we did establish right now that it’s
about resource. I loved your airline example of the flight being grounded and who do
you give the seats to. In the old world, you went through the digital traces, the
digital exhaust, the garbage and stuff like this. Traditionally, it was transactions.
In the new world, we share data. We understand how much influence somebody
has by being a blogger, by having created Dell hell. This social notion means that
what we left companies with involuntarily, we now do much more of this
voluntarily, knowingly and voluntarily supplying them with data.
1:03:50.7
Private becomes public. This is one of the deep insights of what’s happening right
now. Formerly private data has now become public data. Before I take the company
perspective, I want to take the customer perspective. What is it that people, us as
individuals, take into account when we make decisions? Before we go to
customer lifetime value and what they should be considering, let’s look at us and
how we actually make our decisions.
When we talk about recommender systems, I totally missed you saying friends
before. It’s a factor of five, eight, maybe even ten that friends’ recommendations
are more powerful than company recommendations. Any study you look up
comes up with a number of between five and ten.
Who do we trust? We trust our friends and our peers. Friends are people I know
personally, and peers are people who have similar social status, live in a similar
area or whatever it is. That is how the customer decision making has changed.
I would argue that customers now compute their CLV, “company lifetime value”:
“Am I going to take the Visa card, or am I going to take an American Express card, and
which of the 73 American Express cards being offered do I pick?” “Will my favorite airline
still be in business in a year or two from now? How do I discount it if now, the roundtrip
flight between San Diego and Los Angeles will be 300,000 miles?”
Student:
How do experts play into this whole equation? Do we still trust them?
Andreas:
Yes, I took them out. Who trusts experts? I think the question is where do we trust
experts. Do we trust experts when it’s about finding which airline you should take if you
could fly business class somewhere? Of course not. You go to Flatseats.com where you
have honest people reporting about the flights they took and how the seats were.
Where do we trust experts; in electronics, computers?
Student:
Maybe when it’s critical, health issues?
Andreas:
Health is always interesting. Health, maybe legal?
Student:
Cars, like with Consumer Reports.
Andreas:
Consumer Reports is the largest publication in the United States.
Student:
… way to do both… experts have better… best people …
Andreas:
I wouldn’t necessarily call that an expert. There used to be, but I don’t think any of you
know what I’m talking about right now, there used to be a profession called travel agents.
These were people who had offices. I know I had one in [1:07:38.1 unclear] Park. You
would come in, talk with them, and they would say, “You should go to Costa Rica,” or
wherever. I think that pretty much is extinct now.
1:07:47.8
The value that Expedia etc. provide by having customer reviews – they’re very
interesting business models – really exceeds what that girl sitting there provided, who
happened to have read about Costa Rica the night before because her boyfriend said, “Let’s
go to Costa Rica”; she says, “Costa Rica would be right for you. There is this wonderful hotel with a
sea view, and a breeze that comes in the evening and the palm trees…” I think people
were fooling themselves.
Yes, we were cutting down complexity but I’m
not sure if we were better off in those days. That’s an example where experts really don’t
matter. Ultimately, I don’t care for an expert telling me whether I would like that hotel; I
care about people like me telling me whether they liked that hotel or not.
Booking.com has this brilliant conditioning that you pick who you are; are you a
single guy, a family with children, and then they show you people like you, conditioned on
your status, and the hotels they like. I think that is the right way to go because what poor
expert can check out all the hotels in the world?
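The conditioning described here (show ratings only from reviewers in the same segment as you) can be sketched as a grouped average. The segment labels, the review layout `(segment, hotel, rating)` and the function name are assumptions for illustration and say nothing about how Booking.com actually does it.

```python
from collections import defaultdict

def mean_rating_by_segment(reviews):
    """Average hotel rating conditioned on the reviewer's segment.

    reviews: iterable of (segment, hotel, rating) tuples (hypothetical layout).
    Returns {segment: {hotel: mean rating}}.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for segment, hotel, rating in reviews:
        entry = totals[segment][hotel]
        entry[0] += rating  # running sum of ratings
        entry[1] += 1       # number of reviews
    return {
        segment: {hotel: total / n for hotel, (total, n) in hotels.items()}
        for segment, hotels in totals.items()
    }

# Made-up reviews: the same hotel can look very different per segment.
reviews = [
    ("family", "Hotel A", 9), ("family", "Hotel A", 7),
    ("single", "Hotel A", 4), ("single", "Hotel B", 8),
]
ratings = mean_rating_by_segment(reviews)
```

A family would then be shown `ratings["family"]` and a single traveler `ratings["single"]`, rather than one average over everybody.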
Student:
There has to be a place for both.
Andreas:
There are areas for both, clearly. I think travel is a beautiful example where Booking.com
just eats the lunch of all the other travel agents.
Student:
My point is… might have friends who have been there once before, to one place, but
there is a friend of a friend who has gone there for years. I have to look at that person.
Andreas:
Right, but we still have three categories: friends, people we know or friends of
friends; then there are peers, where do Stanford students like to go during Spring
break to party, even if you don’t know them. The third one is experts, the travel
organizers who tell you where you should be going. I don’t think that in the travel
space, experts still play much of a role.
Student:
I’ll go with them.
Andreas:
All right, to Costa Rica?
Student:
I’ve been to Costa Rica too, but I also want to travel because I want a certain
underground or non-touristy experience. I want to talk to some local people or somebody
who really knows what’s going on and not do something…
Student:
Especially someone … more likely to talk about it in the first place, even though you’re
more excited about it. That linked together, there is a very good chance…
Student:
I don’t consider those professional experts.
Andreas:
These are peers. Thank you for taking sides with me. [laughter] Another couple of quick
points?
Student:
Financial planners, financial planning…
1:10:40.5
Andreas:
These are different things. Accountants are necessary. I think financial planning, I’m
not sure whether those people still have the right of existence here, after what happened
the last couple of years. Trust me; just hang in there and some day it will be good again.
Student:
Certain books are still better than …
Andreas:
I was in Buenos Aires for Christmas with Daniel [1:11:05.1 unclear name], who is my
accountant. He went to Stanford when I was here. His sister, Sarah [1:11:09.4 unclear
name], wrote The [1:11:09.5 unclear title] from Buenos Aires. I have to say, compared to
everything else I read on the web, that was our bible. I am sort of neutral; I think it
depends on the area.
Student:
I think the overwhelming sensation that I was getting from this is that people look
to experts for things they don’t have any experience in. If you’re a doctor and you
feel sick, you might not need to talk to a doctor about it. If you know nothing about
medicine and you’re feeling sick, then you might want to go to talk to a doctor. How do
we make a distinction…?
Student:
… notion of curation… a whole experience… it’s not just a single point that … expert that
someone’s curating a particular experience…
Andreas:
There is a site on health information aggregation. I have the note at home; I didn’t bring
it. There is apparently a site on symptoms. We just talked about it at our last TA
meeting. There is some company that tries to go from symptoms to actually building
models by people entering the symptoms they have. I can get it for you, or if anybody
knows about it… not the traditional way, but when people enter their symptoms and then
through statistics or the law of large numbers, it comes out that we have much finer
granularity and better predictions, earlier predictions.
Student:
…use that as a precursor… what’s happening with them…
Andreas:
One good example we talked briefly about before, and one of your graders, Ryan, works
at 23andWe, where once you had your DNA analyzed, you answer questions such as
how do you experience bitter, how many beers does it take for your nose to turn red, and
stuff like this. They try to relate this to your DNA snippets.
How is customer lifetime value affected by the data we’re collecting? It is in two
ways. One is across time and the other is across the network. With customer lifetime
value, traditionally you look at how much money they gave us last year, and we
discount it in some way into the future, and that’s all we did.
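The traditional calculation described here can be sketched in a few lines of Python. This is only an illustration, not something shown in class: the function name `traditional_clv`, the fixed yearly `discount_rate`, and the `years` horizon are all assumptions chosen for the example.

```python
def traditional_clv(last_year_revenue: float,
                    discount_rate: float,
                    years: int = 5) -> float:
    """Project last year's revenue forward and discount each future year.

    Assumptions (not from the lecture): revenue stays flat, the discount
    rate is constant, and the horizon is a fixed number of years.
    """
    total = 0.0
    for t in range(1, years + 1):
        # Revenue received t years from now is worth less today.
        total += last_year_revenue / ((1.0 + discount_rate) ** t)
    return total
```

With a 10% discount rate and a two-year horizon, 100 dollars of last year's revenue counts as 100/1.1 + 100/1.21, about 173.55. Nothing in this number reflects the relationship with the customer or the customer's friends.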
We ignore what our relationship is; we assume there is no relationship we can
influence, and we totally ignore the point you made, that you might have friends
who you try to influence in a positive or negative way, depending on the negative
experience you had.
1:14:46.2
The talk with Chuck in San Francisco, I called up that number 1-800-call-chuck and said,
“I would like to talk to Chuck.” Chuck didn’t answer. I really felt misled. [laughter]
Equally well, Fidelity, and they said, “Here is a conversation with a fund manager.” I had
a conversation with somebody I had never met before – unbelievable. People essentially
tell you stuff that is not true. I call this fake relations, which are just pretending.
What actually is a relationship? If we talk here about how we want to go from
transaction economics to relationship economics, we need to know what a
relationship is. What kind of relationship do you want to have with a bank? What kind
of relationship do you want with your stock broker, in general? What do you expect from
that? You would like them to know you, what your interests are, what your values are.
When you once told them something, you don’t want to have to tell them again. What
other things? What are the properties of relationship here?
Student:
Reciprocity
Andreas:
Ah, reciprocity, he does your taxes, you do his?
Student:
… otherwise, if I don’t do … I will not feel as good. People tend to … the reciprocity.
Andreas:
This is interesting. The last point I’m making here is that the question is of cost.
Are we done? You write her a check. Is that all you do for her? What are the costs? It
goes back to this. What do you truly want in a relationship? Data is at the heart of
it.
For me, one of the key discussions I had at Amazon was should we actually allow
people to erase past purchases. “I don’t want that toothbrush to be on my record.”
The toothbrush breaks the next day; “Amazon.com, how can I help you?” “I bought that
toothbrush.” “What toothbrush?” People need to understand that once they nuke it,
it’s gone. That’s an important element there.
I think the data is at the heart of the relationship. They get you to give them data
and then they do good things with it, whatever that means in a specific example.
Having to re-enter data, for instance, is the death of any relationship. If that girl
asked you for the third time, “What was your name, again?” it is probably not going to be
a good start.
Enrique:
On the note of reciprocity, does anybody use any type of tool or technology – you
owe somebody something. You bought somebody lunch and you want to keep track of it,
or tips – do you pay for more tips these days – kind of a social capital that we… some
kind of tool.
Andreas:
What tool do you use for that?
Student:
…
1:17:57.5
Andreas:
How do you spell that? BUXFOR. What’s it called, Socialfly? I should keep track that
you bought bagels and pizza? That’s what it’s for?
Student:
…
Andreas:
What other tools?
Student:
…
Andreas:
I use Outlook; it says “Andreas owes lunch to John Lee.” That’s my way. I look up a
phone number and invite him. What was the other tool here? BillMonk.
Student:
There’s also … and …
Andreas:
This is about social capital. We talked about the relationship across time. We
talked about how truly a data-centered approach, of getting people to give you data
and then doing the right thing with those data – in this generality, across companies –
from airlines to shoe shine shops is all we can say here. One recommendation I would
give to companies.
The second one is generalizing customer lifetime value across the networks. If
you want a C-to-C perspective, there the fact is that social recommendations are
super important. Friends, peers, versus anonymous and I am happy to add
experts here so you don’t feel not heard. The traditional customer lifetime value
ignores this most important source. Why? Because it didn’t have access to the
data. What’s different now? People voluntarily and knowingly share data about
themselves, their relationships, and that is where this concept we deconstructed in
the beginning as pretty useless is now reconstructed and reconstituted.
For me, at the heart and truly the heart of this class here is it’s all about
communication. It’s lightweight communication that I don’t have to write a letter to
United Airlines, “Dear Customer Service representative…” It’s one button like
GetSatisfaction.com, “I have that problem, too,” or “I’m going to comment on your page,”
or “I’m using Facebook Connect to comment on Enrique’s blog.”
There is no latency to the time scale aspect. I don’t have to wait for days, weeks,
months until I see whether things have been resolved. I see it right away.
It ultimately is about symmetrical relationships. We are all aware that we can’t want
everything for every company. It won’t work. Understanding that what is called
“idiosyncratic fit”, what do I value, versus what do I not value – not what does the
company want me to value. Think about it; people fly 50,000 miles to have a glass of
champagne, which is worth $7 as opposed to $5. It’s pathetic.
1:21:34.8
It’s companies that put their systems on us. At the heart, we have ability, through
lightweight communication, through bidirectional communication. Not only
communication between company and customer, but between customer and
customer. Observing those communications, such as on Facebook, who talks about
lousy phone coverage on campus, for instance? I have to put my phone into the window
in my office for it to ring here in the STATS department; versus, what conversation do
people have. That is really the future of customer lifetime value, to understand the data
that are flowing between individuals.
To summarize: historically, it was current value, exponentially
decayed, nothing fancy about this. We then realized that other sources are more
important. One is your own source, over time, because you get influenced by the
company doing the right thing for you. That is what I mean by we move from
customer transaction economics to customer relationship economics.
Secondly, and more in line with your first assignment here, look at the network.
For looking at the network, what we have there is of course that we understand
how influential you are, who you can bring. We look at second-order
phenomena; of those people you could bring, what do they actually do? Do they
do the desired things, or are they all fraudsters?
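The network view described here can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a method presented in the lecture: the `Customer` class, the single `bring_probability` applied to every friend, and the flat `fraud_cost` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    name: str
    own_value: float                 # e.g. a traditional discounted value
    is_fraudster: bool = False       # hypothetical flag, assumed to be known
    friends: List["Customer"] = field(default_factory=list)


def network_value(customer: Customer,
                  bring_probability: float = 0.1,
                  fraud_cost: float = 0.0) -> float:
    """Own value plus the expected value of the friends the customer brings.

    Second-order effect: a friend who does the desired things adds value,
    while a friend who turns out to be a fraudster subtracts an expected cost.
    """
    total = customer.own_value
    for friend in customer.friends:
        if friend.is_fraudster:
            total -= bring_probability * fraud_cost
        else:
            total += bring_probability * friend.own_value
    return total
```

For example, a customer worth 100 with one good friend worth 200 and one fraudulent friend, using `bring_probability=0.5` and `fraud_cost=40`, comes out at 100 + 100 - 20 = 180.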
One company I still haven’t decided whether I want to bring them in, is Rapleaf,
which gives companies access to who your friends are. If Ron has really good
friends that say, “Yeah, he’s cool,” and “Don’t worry about his friends,” if all of his friends
are fraudsters then you have that insurance claim there and I’m not sure; maybe we should
use some more of our scarce resources for investigating insurance claims, to look into yours.
That is what I wanted to say. At the heart; voluntarily, knowingly sharing data can
actually lead to new constructs of customer relationship value, of customer
network value. Not to be forgotten, it’s also just as much the customer who makes
the decision about the companies he or she wants to have a relationship with.
We have seventeen minutes until 4:00. What we will do in the last hour, today, is in a
very tight schedule, starting at 4:00 sharp, we will have group number 12. Number 12
hasn’t entered themselves on the wiki, yet. There are eleven groups, plus a twelfth group
that hasn’t entered the wiki. Each group gets exactly five minutes. To be fair, I really
have to watch that. You present what it is that your homework number one is about.
1:24:34.7
Enrique and I will probably give you quick feedback, and I want to make sure we get
through this so that everybody in the class has a feeling about what exciting ideas others
are working on. I think it is a good use of time to have one hour here where we just, in rapid fire,
go through everything you have done. Everybody here is able to stand up for one minute
to say what they’re doing. The reason I didn’t want to tell you about it beforehand is I
wanted to avoid you over-preparing and making PowerPoint slides here, and getting all
worried about it. We’ll stick to the schedule. Luckily, it’s twelve groups so it’s five
minutes each. We will start exactly at 4:00.