Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics
Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
May 4, 2009
Class 5 Facebook: (Part 1 of 2)
This transcript:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_5facebook-1_2009.05.04.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_5facebook-1_2009.05.04.mp3
Next Transcript: (Part 2 of 2):
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_5facebook-2_2009.05.04.doc
To see the whole series: Containing folder:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/
Andreas:
Welcome to Class 5 on Data Mining and E-Business. Today, in the first half, we will wrap
up some loose ends. I will give you some feedback on the homework. We then will have
Itamar here, from Facebook, who is willing to answer any questions you might have.
In the second half, we will have a more formal talk by him, as well as by Eric Sun, who
took the class last year. Eric is talking about Gesundheit, which is how people infect each
other. Maybe that’s not the right word to use for Facebook; how you get new users and
whether you should try to find out who the influencers are or directly go and try to find out
who you can influence.
What I put on the agenda for today is that we will focus a bit on some of the learnings
in e-business, namely marketing. There, I want to start with segmentation. Those
of you who have done statistics know that there is a notion of clustering. That is
the idea that you find groups of data where the distance between data points
within the data cluster, distance in some loosely defined way is smaller than the
distance between the clusters.
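The clustering idea described above can be checked numerically. The following is a minimal sketch, assuming plain Euclidean distance and two hand-made groups of 2-D points; the data and the function names are illustrative and do not come from the lecture.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_within(clusters):
    """Average distance between pairs of points that share a cluster."""
    return mean(
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    )


def mean_between(clusters):
    """Average distance between pairs of points from different clusters."""
    return mean(
        dist(p, q)
        for (_, a), (_, b) in combinations(clusters.items(), 2)
        for p in a
        for q in b
    )


# Hypothetical toy data: two tight groups that sit far apart.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}

within = mean_within(clusters)
between = mean_between(clusters)
print(f"within={within:.2f} between={between:.2f}")
```

A grouping is a reasonable clustering in the sense used here when the first number is clearly smaller than the second. Any other notion of distance could be swapped in for `dist`.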
A lot of people have made a lot of money by telling companies who their users are,
by clustering their user base according to old data like demographics and psychographics.
Let me give you an example: if you know that 70% of the people on your site are female,
what action do you derive from that? You have a site, you are selling something, and now
you find that 70% of the people on your site are female. Is it actionable? You are
nodding your head. How?
Student:
…
Andreas:
What he is saying is if you assume you don’t have ways – and marketing is often about
getting new people to the site – if you have no way of actually changing the
constituencies and that is all you have, then you might need to have more content
that is interesting for females. But the question a marketer would have is: if I
spend $1, will I have a higher chance of making money if I spend that $1 on another
female? Or, should I target those underrepresented males? Those are the
questions people traditionally couldn’t answer because they couldn’t experiment.
Furthermore, in many cases, gender might not be the key variable that describes your
interest in a given point of time.
0:03:26.4
When we think about segmentation, it’s very important to realize that two groups
of people use this term very differently. Segmentation Sub 1, if you will, is the
segmentation based on demographics, psychographics, based on variables people
could buy from Acxiom, for instance, and variables that don’t change much – your
zip code, how often do people move? That segmentation is pretty dead. Why is it
dead? Compared to alternatives of not spending a dollar trying to find out which age
somebody is, but trying to find out, given their click behavior, what they are actually
interested in, you can get much more leverage and much better targeting to people if you
actually use a much richer data set.
Now the second way the word segmentation is used is basically to do any
conditioning to whatever variable seems to make sense. That is more alive than it
has ever been. Finding out what good variables are, testing hypotheses, and then
actually playing to those in a PHAME framework is the segmentation that is very
powerful.
When people in traditional companies come to you and ask you about segmentation, be
careful. Try to understand what they really have in mind. If it is that they went to school
ten years ago and they learned about demographic data, PRIZM has 60, 70, 80 different
groups of people, soccer moms, etc.; it’s probably not as powerful as what you can
leverage if you leverage the data that people give in the social data revolution.
That’s one thing I wanted to make clear here. Related to one of the points you made
in the homework, which was that you would be interested in Google Trends to actually
understand what are the sub-networks or domains of the countries where different
things happen. That’s a good example to form hypotheses, but that’s only one
step. The real proof in the pudding is to run experiments and to figure out, from
the experiments, what works and what doesn’t work.
The other thing I wanted to do is show you a table I made here. Many of you probably
will be asked, as you are interning somewhere in the summer, “What should my
company actually learn from this? How does it affect us?” This is my
segmentation here. Here are three groups of people you might have that
conversation with.
The first group is the top executives, the people who run the company. Believe me,
unless they are geniuses like Jeff Bezos or my friend who runs Swiss, they have no idea
what you’re talking about when you’re talking about data; they have no idea. Don’t think
they share the enthusiasm we have for data mining. What you need to tell them is that
it’s cheap, it works, and in these hard times, it’s one avenue you should take.
What do I mean by it’s cheap? Here are some examples of what you can do.
Engage with somebody who, on Twitter, is criticizing your company. For instance, if you
are JetBlue and somebody has a negative tweet about United, talk to them. See what
effect that has. I bet you that person will be very impressed if he complains about
something from one airline, United, and somebody from JetBlue says, “I think I can help
you out here.” First of all, they listened to it, and secondly they are probably
offering something. That’s very different from these mass marketing and targeting
segmentations. People tell you what they think. You just have to go there and
listen.
0:08:15.3
Top executives often are worried about big capital expenditures, yet another $100
thousand system. Tell them they should just hire you as an intern for a few hundred
bucks a day and you would be willing to help them out with that.
What other things do you think those people might be receptive to, small things that don’t
cost much but that have a big effect?
Student:
… expensive bottle necks…
Andreas:
That is a true statement, but what would it be in the context of what we do in
class?
Student:
You could map… bottleneck of people are hitting a buy page but not buying…
Andreas:
That is probably not something the CEO will be excited about. What can you tell
him that he will spend two hours next weekend on? That’s always a question I
have. If you manage to do that, getting him to go to Twitter or whatever it is for
him, and search for United Airlines. Contact that person. That is something that is
very finite. Are there any other examples, as opposed to big website analysis?
He should just use Technorati or some blog search engine, get a feeling about
what 50 or 100 comments people are saying about us. He gets filtered information
from his PR department or whoever talks to him and doesn’t want to get fired. He can
just go there and listen to what’s out there.
I found that works extremely well, if you give people very concrete, small things
that take a couple of hours and zero dollars; they will actually do it. They will go to
the company on Monday and they will tell them how things need to change.
Besides that, the core difference is between online and offline. Many people, such
as airlines, still think about their world as a transaction business. They should
think that due to persistent identity that we have online, such as Facebook
Connect, due to persistent identity it’s no longer a transaction business but it’s a
relationship business. That means their decisions will often be different because you
want to not only take the present transaction into account but you want to take the future
into account, as well.
Maybe the first point should be having them understand that experimentation is
super cheap. They are used to working with agencies. Agencies or consultants get a
fraction of the information across, they misunderstand half of it, and they may outsource
this. It is a long and noisy feedback loop where most of it gets lost.
If they do an experiment and if you think about it, what’s really needed to do experiments
online? I’m not talking about offline; it’s very little. If you have some ideas, try them out.
It’s a world that most of these people don’t realize just how cheap and easy it is.
0:11:47.1
For instance what you said about identifying bottlenecks or seeing conversions is so
much harder than just trying out something and measuring the effect. Of course,
metrics is an important thing to understand. When I say metrics, I don’t mean just
quarterly KPIs which they need to have their CFO sign off. I mean event-driven,
minute-by-minute, variables that tell you a lot of stuff, and we have to use our
intelligence to figure out which ones matter, how they correlate and stuff like that.
Student:
Isn’t part of the problem … have metrics. In the second part, you’re assuming you have
metrics already, so … given metrics, it’s not that hard to find bottlenecks. The difficult
thing is how you get the right data.
Andreas:
I think there are a number of reasons why people resist. One is typical statistics
education. It’s not what you need to deal with large databases. They might hire
[0:12:58.5 unclear] statistician from Stanford and they throw the data at them and the
person doesn’t know how to deal with large amounts of data. That’s one problem.
The second problem is that what we learned was optimized, what we as a
discipline optimized for a world where data was expensive, where experiments
were expensive. When you wanted to breed better rabbits, even rabbits [0:13:24.3
cough] to breed. When you wanted to create better rice, more resistant, all these things
had intrinsic time scales.
It’s not so with most of the experiments we can do. What we optimized for, making
the most out of every subject we could potentially get into the door, is not a problem if
you have a million people coming every day. It’s no longer a problem of statistical
significance. When people ask you that, you know that they have never thought about
the problem. It’s about relevance; what are the relevant questions? That’s actually
hard to teach in a standard course.
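The point about statistical significance can be illustrated with a standard two-proportion z-test. This is a sketch under stated assumptions: the conversion counts below are invented, and the test itself is a textbook method rather than anything prescribed in the lecture.

```python
from math import erfc, sqrt


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the normal distribution
    return z, p_value


# Hypothetical numbers: the same 2.0% vs 2.2% conversion rates,
# once with a thousand visitors per variant and once with a million.
z_small, p_small = two_proportion_z(20, 1_000, 22, 1_000)
z_large, p_large = two_proportion_z(20_000, 1_000_000, 22_000, 1_000_000)
print(f"1,000 per arm:     z={z_small:.2f}, p={p_small:.3g}")
print(f"1,000,000 per arm: z={z_large:.2f}, p={p_large:.3g}")
```

With a thousand visitors per variant the difference is indistinguishable from noise; with a million it clears any conventional threshold. At that scale the open question is whether a 0.2-point lift is relevant to the business, which matches the argument made above.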
Thirdly, people might be afraid that as they get stuff clearly measured, it might
show that what they always believed to be working or the agency told them would
be working might actually not be working. In general, for a company it’s a good thing
to create transparency. For the individual who just invested their money based on the
usual paradigm, half of all marketing dollars, but we don’t know which of them. Do you
know that saying? Fifty percent of all marketing works, but we don’t know which fifty
percent.
Now, if you can tell them these things, maybe they will even be out of a job. People
haven’t learned the skills, don’t have the mindset, may be worried. What other problems
do you think make it difficult for companies to do this? It’s not a technological issue.
The reason I gave you Google Analytics was to see how truly easy it is. They
might think it’s a big problem. Here is one example for that.
0:15:08.1
BestBuy wanted to have a wiki so employees could do this web 2.0 thing. They
went to their favorite technology provider and they said they could have five stores in half
a year on a wiki. That would cost about a million dollars, as a prototype. Now they have
Geek Squad, about thirty thousand guys who fix PCs. Geek Squad said, “If you fly us out
for a nice dinner to headquarters, we’ll do it over the weekend.”
Banks are probably different. I wouldn’t want my bank to run on a wiki, but for most
things, we don’t need that level of triple checking. Yes, they believe it’s expensive to
build simple things. They haven’t realized that cloud computing, storage, all of this stuff
is basically free. It’s just gluing it together. It won’t work perfectly, but at least you have
something we can get some insights with.
However, of course the CTO hates that message. The CTO hasn’t realized the people
he is employing might have used Google Docs in high school, and now they’re pushed
into a Windows 97 environment where certain forms have to be filled out, but please,
make sure there are no parentheses around the area codes in phone numbers and stuff like that.
How pathetic is that? How long does it take to parse out the number and let people add
dashes or whatever they want to add?
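To give a sense of how little work the phone-number parsing is, here is a minimal sketch. It assumes ten-digit US numbers with an optional leading country code 1; the function name is hypothetical.

```python
import re


def normalize_us_phone(raw):
    """Return the 10 digits of a US phone number, or None if it does not parse."""
    digits = re.sub(r"\D", "", raw)  # drop parentheses, dashes, dots, spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading country code
    return digits if len(digits) == 10 else None


for entry in ["(650) 555-0100", "650.555.0100", "+1 650 555 0100", "555-0100"]:
    print(repr(entry), "->", normalize_us_phone(entry))
```

The form can then accept whatever punctuation people type and store one canonical value.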
What other reasons do you think these people have that doesn’t immediately make them
buy into it?
Student:
…
Andreas:
All the cash cows are kind of rare these days. Many cash cows suddenly disappear once
you stick too hard to them. The real estate market is a good example: why pay 6%,
3%+3%, if you can actually have a piece of code to help you out?
There is clearly the fear in every single conversation I’ve had at that level, of “The
loss of control.” The only answer to this is, “What are you talking about? You
already lost control. People are talking about you, even if you are not listening.
Your choice is not to control or not to control; it’s do you want to know what they’re saying
and then maybe actually change your product, or will you pretend that nobody is talking
about you?” That’s a very powerful argument there, and that’s why a couple of things
you said to me were to spend a couple of hours and see what people are saying about
your product. It’s nothing fancy; if you know how to use a browser, and most people do,
then you’re in good shape.
I don’t want to put you down here. These are valid points. I just want to debunk what
people think because you’re not the only one who thinks this. Anybody else?
Student:
…
Andreas:
Do you think it takes a bigger leap of faith to do that over the coupons? I could argue
with coupons. People would have bought this stuff anyway. It’s called the “stockpiling”
problem. Of course, they will use your coupons if you give them, but otherwise, you
would not have to give them that $10 off.
Student:
…worked for.. .the summer. They had a wonderful database…
0:19:58.6
Student:
…
Andreas:
I have no luck with my Internet connection here. At Berkeley, I made a joke about do
they have Internet there, and it didn’t go down well. I think I’ll refrain here. Okay, I want
to show you stuff.
That is the top level. The second one is marketing people. They need to
understand how the traditional essence of marketing, such as product, placement,
pricing, and promotions, has absolutely and fundamentally changed in the last few
years. For instance, take product and product marketing. The feedback loops were
long. They brought people into the office for focus groups and all of that stuff. Now, we
do experiments in real time. We actually see what people are saying. It’s a dramatic
shift and very few marketing people have realized how that impacts what they need to
do. They still think about optimizing the one message and trying to reach as many
people with that message. This is opposed to figuring out what people actually
want and then giving it to the individual. This contrast of hitting up somebody who
criticizes United and then helping him out as JetBlue is so different from coming up with
nicer brochures and sending them out.
I’m not sure whether for you that is already so natural that you don’t even think about it
anymore, but I guarantee that most people who are top marketing executives in
companies don’t yet live in that world.
They ask, “Should we go and do this?” and all these questions, but they have no idea
what is promising and what isn’t. What do you think about Second Life? Should they be
on Second Life? How would you answer that question?
Student:
I think it was a fad for a while but hasn’t quite turned out to be what…
Andreas:
I think the press loved Second Life when it came out. I know these stories, “Our
employees are required to wear suits on Second Life.” I am not convinced. I think it is
not what people are looking for when they really want to have interactions. Ultimately
people are interested in people and not in some avatars. That doesn’t mean you can’t
make a lot of money by having virtual items in virtual worlds.
The next point is we now can systematically capture the irrationalities in decision
making. You try to tell people things that might be more or less true about your product.
It was the product, [0:24:31.4 unclear] marketing. Understand how people actually
make decisions and support them in their decision making process and help them
avoid the traps that served us well when we lived in caves, but might not serve us well
anymore, like our love for options. We like so many options, even more options, but we
will never get anything done. We actually never buy the product. That behavioral
economics perspective, behavioral decision making perspective is very important.
0:25:02.0
Recommender systems, as you know, make on the order of 20% of Amazon’s
revenues. That’s very hard to get if you try to do anything else. Having them
understand how recommender systems have shifted from product based to people
based, social recommendations, and situation based, which you couldn’t measure
before because you didn’t know how somebody uses the catalog. The average time
a catalog gets looked at in the United States is 14 seconds. That’s a lot of trees for 14
seconds. Having them understand that recommender systems are not some
random thing for techies, but can make a big difference for them.
Surveys – I’m trying to show you the survey I put up. Do you have Internet access
today? I’m getting suspicious that neither of my two computers has access. If I have
you type a URL from my email, would you be willing to plug your computer in so I can
show you? Just come up and read off my email. This would be the form.
I think recommender systems are so key and the other thing they need to
understand is that a fifth P has arrived. That P is the P of platforms; you don’t
make your money by just having a better product, but by trying to get on the
platform.
These are marketing people; I find they’re much more difficult to deal with than
executives and CEOs. They grew up in a different world, in terms of what they consider
constants. They’re interested in sniffing the digital exhaust and all that stuff, as
opposed to just having the big picture in mind. Did any of you work for a marketing
company or in a marketing department at Dell?
Student:
I had a marketing…
Andreas:
At Dell, it’s interesting that the people do Ideastorm. It’s a very different part of the
organization, from the traditional people who actually do the marketing and send out
emails, which is another organizational issue; people haven’t understood that this is
actually one and the same thing.
One thing I sometimes say is, “Everybody is a marketer.” Not only are the
employees marketers, but every customer is now a marketer, too. That mind shift
from coming up with the most important message and your target customer; having
persona research talk about John and so on; to just being pragmatic and saying,
“Everybody is a marketer.”
How do you talk to these people? You talk to them and say that Amazon actually fired its
marketing department in 2003. There is no traditional marketing person there. Amazon
stopped buying stuff from other companies, like comScore; why should we
spend $100 thousand on data from other companies that showed us how many
customers we had? When I saw the first report and was told how much it cost, that was
pretty much the end of that report.
HTTP referrer [0:29:50.3 unclear]; you can figure out where people are coming from by looking at
data.
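Assuming the remark above refers to the HTTP `Referer` request header that browsers send along with a page request, finding out where visitors come from can be sketched as a simple tally. The sample values are invented, and a real server log would need its own parsing step first.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_hosts(referrers):
    """Count visits per referring host; empty values count as direct traffic."""
    counts = Counter()
    for ref in referrers:
        host = urlsplit(ref).hostname if ref else None
        counts[host or "(direct)"] += 1
    return counts


# Hypothetical Referer values pulled from a web server log.
sample = [
    "http://www.google.com/search?q=books",
    "http://www.google.com/search?q=cds",
    "",
    "http://example.org/blog",
]
counts = referrer_hosts(sample)
for host, n in counts.most_common():
    print(host, n)
```

This is information a site already has in its own logs, which is the argument against paying an outside vendor for it.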
0:30:02.4
Pricing is also very interesting, how pricing is changing on an individual level.
There are actually implications that we don’t understand yet for pricing.
Finally, this instrumenting the world – it’s not only instrumenting the site but
instrumenting the world. Billboards, for instance, have cameras that try to figure out
whether you’re walking to them and what’s your likelihood of looking at them. I worked
with a company that had in-store displays. We tried to change, for the sake of simplicity,
now we’re showing ads for Pepsi. Then you know how many people there are. You can
measure the warm bodies there. You really just needed an infrared sensor. You could
tell how many people had seen it and then you know at the cash register down the road,
whether it had any effect. Instrumenting the world is more than just instrumenting your
website.
The last group is every company has a few people who are sort of visionaries, who
want to know where it’s going. For them, I think it is the social data revolution,
trying to figure out what is really out there, and how they could use it in their
business. The fact that communication costs have dropped, what can they give back to
their users in response to their users giving them data? That’s a conversation to engage
them in.
I’m not a good organizational guy. I don’t talk about change agents, but I do have a fair
bit of experience of knowing what works with these three groups of people. If you get it
wrong, I think your chances of succeeding in changing people there are greatly
diminished.
There are a couple of remarks on the homework. I’m talking about homework two. The
purpose, again, of homework two was to take away any fears you might have had that it’s
difficult to get to data.
The first one was to look at Google Trends. There were very nice remarks. You found
good ideas here about how it helps with wording. Criticism, of course, and I totally agree
with this – why don’t we have a higher granularity than just day? For instance, people’s
behavior in the evening could be very different from their behavior during work. However,
even in the United States, which has three time zones, some person’s evening might not be
another person’s evening. Things quickly become more complicated. I told you in the
very first class, when we showed the number of clicks per session at amazon.com, that
because we poorly defined a session, people in Japan cut their sessions in the middle
because Japan is twelve hours different from us.
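One reading of the session problem is that sessions were cut at a fixed clock boundary, so a visit that crossed the server's midnight was counted as two. A common alternative is to split on inactivity instead. The sketch below assumes a 30-minute gap, which is a convention and not a figure from the lecture, and uses made-up click times.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Group click times into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still the same visit
        else:
            sessions.append([ts])  # a long pause starts a new session
    return sessions


# Hypothetical clicks from one visitor, spanning midnight on the server clock.
clicks = [
    datetime(2009, 5, 3, 23, 50),
    datetime(2009, 5, 3, 23, 58),
    datetime(2009, 5, 4, 0, 5),
    datetime(2009, 5, 4, 9, 0),
]
sessions = sessionize(clicks)
print([len(s) for s in sessions])
```

The three clicks around midnight stay together as one session, while a calendar-day rule would have split them after the second click.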
0:32:53.6
Query refinement – a good point; it’s often very interesting to see this arrow of time. In
the decision making process, or in the query refinement process, how do people
actually learn? If you can capture that, and make it easier for people to see what
other people who started with the same problem and then show them where they
actually ended up. Why do you think Google doesn’t give us that information? Or
should I ask you why does Google give us any information, at all, like Google Trends?
What’s in it for Google? It depends on where you’re coming from.
Student:
… information because they want us to … make more money…
Andreas:
In general, that’s also an answer, but specifically here giving you related words is a
brilliant move. I might not have thought about word XYZ which happens to be
related, but if I bid on that as well, Google makes more money. But, more
specifically here, why would they give us Google Trends?
Student:
… Google is cool, they let us play with their data, which from a public – how people think
of the company, and it’s a really good thing…. I think the trends tells you … exposure on
your site and don’t have enough… makes sense … without that trend information… as to
who is going to your site.
Andreas:
You’re talking Google Analytics right now; I’m talking about more of the trends that you
take a certain word, “swine flu” and then see how that takes off. It’s more indirect, I think,
than that you think about your site. Maybe I don’t know how people actually use it.
Student:
… If everybody knows that Google is a one stop shop…
Andreas:
I think it’s always a little tricky to try to figure out what people could use against you, like
how could you empower a new competitor by giving them data. Amazon is also super
worried about giving out data because in retrospect, and I have to say Jeff was right, you
never know if you’re dealing with a hundred million people; one or two people could
probably think of something that not even Jeff Bezos thought about. If you release data,
you could be in trouble. As I said, in the example of Netflix, where Cynthia Dwork
showed that if you have at least 8 movies rated, those anonymous people, 98% can be
identified.
A very good remark was the device. It’s very interesting to see from which device
people are querying. Does this change over time? I personally think, being
interested in mobile, that I would like to see some of that.
Itamar:
… Related to the Google Trends question… there is a product similar to Google
Trends on Facebook, called Lexicon. It’s called Lexicon Pro, and it allows you to
see the frequency of words in utterances on Facebook and messages and wall
posts and comments. I think the point you made about improving one’s brand is a
good point, but in the case of Facebook, there’s another reason to make this
public. It allows people like our advertisers to see that Facebook actually provides
a very good signal. When stuff is happening in the news, like swine flu, people are
actually talking about it. When there’s an advertisement that is heavily invested so
people are seeing an ad for a movie that is coming out soon, the talk about that movie is
actually increasing. It’s a demonstration that there is real social effect on the site. I
think the same can be said for Google.
0:37:40.2
Andreas:
I don’t know about swine flu on Facebook, so it’s very interesting how people pick up
these things. Google Analytics – I like the group that compared it to Yahoo Web
Analytics. Who was that? Do you want to make a couple of points here? Why don’t we
share the microphone, otherwise, remote people are always in bad shape.
Student:
My name is [0:38:17.1 unclear]. I was comparing Google Analytics and Yahoo
Analytics. Google offers so many rich features in their metrics and dimensions
reporting. Their power is in their custom reporting. However, if you compare
it to Yahoo Analytics, Yahoo Analytics is geared towards micro sites and it is very
low latency, so in a couple of seconds you can get feedback, sooner than Google
Analytics because Google Analytics you may have to wait 24 hours to get your
data. I was commenting about the easiness of the interface. Yahoo Analytics style
is like a web 1.0 thing. I don’t know if you have used it.
Andreas:
I haven’t. By the way, we do Google Analytics – there is somebody who had pipes, so
it’s not that I’m pushing one company versus another.
Student:
Yahoo Analytics is from a company called IndexTools. They bought IndexTools and they
put stuff…
Andreas:
Tell me what Google Analytics does.
Student:
It’s pretty much the same. They’re competing products; it’s just the look and feel. Yahoo
Analytics is more geared towards how to improve your sites for more business.
Andreas:
Like a site optimizer
Student:
Yes
Andreas:
So the purpose of this, and I thought I made this clear; not that you worry about driving
traffic to your website, but I wanted you to see how easy it is so that when you get that
question from your favorite CEO, you will know how trivially easy it is, and it’s free to get
information that ten years ago you would have never dreamt of having.
If you were Amazon.com, would you use Google Analytics?
Itamar:
You could …
Andreas:
One thing is latency is not great, but more importantly, you don’t want to have
other people knowing everything that is going on with you. Companies like
comScore, Hitwise, they can actually do a pretty good job in predicting Amazon’s
earnings announcement by seeing a fraction, not even a representative fraction,
but for a company that is in the consumer space, they know extremely well what’s
going on.
0:41:07.9
If you used Google Analytics, Google would know it with probably better accuracy
than most companies would know it themselves. The fear is not about where the
data sits, but that, by mistake or by aggregation, if they have a report on
eBusiness dominated by Amazon, people will know stuff about Amazon without
Amazon even releasing it.
Student:
I was going to say what you said about … giving away your data.
Andreas:
Yahoo Pipes was pretty straightforward. The idea there was that you don’t have to
be a deep programmer in order to understand how to get at stuff. It’s interesting
what this “atomization,” – what making this knowledge atomically small really
implies.
Who uses RSS readers? Who has an RSS reader they read more than once a month?
Less than half of the people. What do you do with it? It can’t be something that is
important, otherwise there would be more hands up in this class.
Student:
Andreas:
What do you do with it? Why do you use it? Why don’t you go to the websites?
Student:
I have a lot of stuff that I look at every day, so it’s efficient to go to it quickly, instead of
typing a website…
Andreas:
Does your newsreader learn what you’re interested in?
Student:
Some of them sort of learn to the extent that… this blog or that …
Andreas:
I think the key is not that we have a delivery mechanism, RSS, Atom, etc. Correct
me if I’m wrong, but I don’t think that’s the key. The key is that we can measure
attention data, which is related to much smaller units than buying a
newspaper and taking it home. For me, one of the things the social data revolution
truly implies is that we can now, given our own attention data and the attention data
of the world, do a much better job of having the right atoms bubble up; having
those things that we are actually interested in be more likely for us to see than
if we just pick up a paper.
It’s not so much about doing natural language processing and figuring out what the text is
about; [0:44:18.2 deducing] is actually very difficult. If you have two almost identical
articles, and you want to know more, does it mean you want to know more at that
level of detail, or more by the same author, or more from the same country? It’s very
difficult. I think the social element of sharing, such as Facebook Share, is what will
ultimately make the difference there, just as we had in recommendations.
Student:
I think the potential… not only as individual… as source… potential network effect that
source can create so it’s like the social attention is slightly different from individual
attention.
0:45:00.5
Andreas:
Actually, that needs some explanation. The aggregate of the social attention, what is
the sum missing compared to the whole?
Student:
There is a website called 8 RSS. I’m not sure if you guys are familiar with it, but it’s a
third-party tool that can plug into popular news readers like Google Newsreader, or
NewsGator or something like that. It measures the amount of time you spend on each
source per article, or per source, and then it tries to come up with this little attention score
that is specific to that source. I think, when I look at something like that, what I see is
it inherently does a good job for me, looking at a source, but I don’t learn too much
from looking at that score. When I learn something, it usually comes from what my
friends are sharing with me in Google Reader, so that I have shared attention with
a group of people that I have in Google Reader. What I’m talking about is the
social component of looking at news or anything new, when it’s layered on an
efficient reading interface like Google Reader, it makes it much more interesting, if
not more efficient, because it bubbles up certain things. The things that bubble up
are basically the shared attention items.
Andreas:
Amazon had a feature about ten years ago where, depending on which domain name
you were ordering from, they showed stuff people were ordering, for instance, at
Microsoft.com and at Sun Microsystems, etc. It was not a very popular feature by those
companies. Seeing that all the Microsoft employees were interested in Linux
seemed like something that Microsoft wasn’t happy about Amazon surfacing. That’s one
easy way, an old way of showing the attention of a community where you find interesting
things.
The points of this were the atomization of information, as opposed to the
unbundling, and that progress is made not by techniques or clustering of
articles, but by measuring social data, measuring the attention of individuals,
and, from my perspective, empowering them to give explicit attention or metadata:
who among their friends they think this might be interesting for.
The last thing I want to do before handing over here is to give a couple of remarks on the
survey that I promised to you. It was much harder than I thought; I boiled it down to 25
questions. I promise, I will read everything you write here, personally. The reason I am
asking you for your name and email is because I need to associate it with a grade. It’s
part of participation. The email is a way for me to get back if you have any questions.
Some of these are very simple questions. I am curious about how you learn about stuff
like the swine flu outbreak, what the biggest surprise you had was, and any feedback you
have. If you think some questions don’t make sense or are misunderstood, give me a
quick call or send me a quick email. We will fix it. This is going into a Google Doc and I
was planning on removing the personal information, then making it accessible to
everyone and summarizing what we find out here.
0:49:27.1
Student:
…
Andreas:
Unfortunately given that we have only 1 TA who works 10 hours a week, I had this up
here. The URL is long. I will put the URL instead of reading it out here. I will put the
URL on the wiki. How do you feel about [0:49:50.9 unclear] and those things, tinyURL
and those things? Neutral, great, awful?
Student:
I like it.
Andreas:
Okay, I promise if we can use your computer, I will put it up on today’s wiki. I would ask
you to do it. It’s 25 questions. I don’t know how long it will take but I really thought these
would also be interesting things for you to think about. They were partly formed by
your first survey that we did in class in the first class. Give me feedback. Tell me if
things are not clear. Tell me if you think I’m missing a big point here and I will add it at
the bottom. This goes into a Google Form so it’s pretty low tech.
Does anybody know how I could extract IP addresses to prevent people from filling it in
multiple times, and put it into the Google Doc? Does somebody happen to know this? If
so, you know Google Forms? It basically takes what you see here and puts it into a Google
Spreadsheet. If somebody knows how to easily get additional data then please talk to
me in the break, because I would like to do what I preach, to know these things, like what
time of the day people do these things and stuff like that.
The homework for the prediction market hasn’t been graded yet. It’s due today so I have
no feedback on that yet. The purpose of that was to make you think about how we set up
a market. We will plan on creating one here. I don’t know if we have a volunteer for that;
if anybody wants to use Inkling Markets, it’s not that hard to set it up. Then we will run it,
but the purpose is not just to play in it but to understand how to do this, as yet another
source to get to data, and yet another source where the aggregation happens in a very
smart way.
Student:
You said that 20% of revenues in Amazon come from recommender systems. How
is that measured?
Andreas:
It’s people who click on an item that they eventually buy, which showed up on
recommendations. It’s like the coupons and that’s why I know about the problems here.
Would people have bought it anyway? We don’t know. Don’t nail me down on 20%. It
clearly depends on the specific category, and on whether it is 20% of revenues or, sometimes, of items.
The point I’m super comfortable making is it is a two-digit percentage increase that
you wouldn’t have if you didn’t have this piece of code on the site. In that sense, it
is really what I consider the new marketing.
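The attribution rule described here (a purchase counts toward recommendations if the buyer clicked the item from a recommendation before buying it) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Purchase` record, the `clicked_from_recommendation` flag, and the sample numbers are hypothetical, not Amazon's data or code. It also shows why the share of revenue and the share of items can differ, and it says nothing about whether people would have bought anyway.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    item: str
    revenue: float
    # Hypothetical flag: True if the buyer clicked this item in a
    # recommendation slot at some point before buying it.
    clicked_from_recommendation: bool


def attributed_share(purchases):
    """Share of revenue and of items attributed to recommendations."""
    total_revenue = sum(p.revenue for p in purchases)
    recommended = [p for p in purchases if p.clicked_from_recommendation]
    recommended_revenue = sum(p.revenue for p in recommended)
    return {
        "share_of_revenue": recommended_revenue / total_revenue if total_revenue else 0.0,
        "share_of_items": len(recommended) / len(purchases) if purchases else 0.0,
    }


# Made-up purchases, for illustration only.
purchases = [
    Purchase("book", 20.0, True),
    Purchase("camera", 300.0, False),
    Purchase("cd", 15.0, True),
    Purchase("lamp", 65.0, False),
]
shares = attributed_share(purchases)
print(shares)
```

In this toy sample half of the items but well under a tenth of the revenue is attributed to recommendations, which is why the quoted percentage depends on whether it refers to revenues or to items. Measuring the true increase would require an experiment with the recommendation code turned off for a control group.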
I will put here on the wiki the URL and ask you to do it by tomorrow evening. I will look at
it on Wednesday and I can tell my last class at Berkeley on Thursday what people are
saying because they are also doing it.
0:53:40.2
Enrique is not here. [0:53:47.0 unclear] is not here. Matt, do you have anything to say
about the www.socialdatarevolution.com site?
Matt:
I can give a little demo.
Andreas:
I love that attitude.
Matt:
This is just a dashboard of how all the teams are doing with the different pages. When it
pulls up there are four metrics we’re using so far: the total number of fans, the total
interactions on the last day that there are data available for, the number of unique views,
and the number of page views. There it is; teams are ordered by how well they’re doing.
Andreas:
Have you seen it yet? Who has seen it yet?
Matt:
I haven’t sent it out yet.
Student:
…
Andreas:
We’ll put the websites up. Partly, the process is broken. We have 1 TA who has 10
hours to support us here, that’s about 5 minutes per student per week. That’s why I
heavily rely on volunteers who actually have super grades.
Matt:
It will be on www.socialdatarevolution.com shortly. If you are interested in it now, you
can go to www.mattkjones.com/sdr and it will bring you there. The idea is red teams are
Stanford and blue teams are Berkeley. They’re ordered by how well they’re doing.
[Laughter] There is a little heat map on the right that shows who is doing well in each
category so you can see that this team isn’t doing so hot overall, but it’s doing better than
a lot of them in this particular metric.
Student:
… rumor they would…
Andreas:
In The Favorite Dish, they changed their model. It’s up to people. What I love about it is
I love the transparency. If they want to do something else, they shouldn’t expect that
people still go to The Favorite Dish.
Matt:
An important thing to note here is that these are all stats from May 2. The only thing that
is aggregated over time is total fans. Page views, total interactions, unique views are just
from that single day. That might change; I don’t know, that’s up to you. That’s the way
it’s implemented right now.
Andreas:
We did put some thought into making it robust, so for each of these metrics we rank the
teams, and then it doesn’t matter whether you have ten times as many people or just one
person more. Your rank is one above the other one. It’s very robust against outliers and
we discussed for quite a while what would be fair. If people have comments, once again,
I’m a super open person to listen to how we could do it better. I would rather err on the
side of being a bit too robust over one team doing well in one variable and then just
winning, and all the rest sucks. The CSV thing is pathetic.
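The rank-based scoring described here can be sketched as follows. This is a minimal sketch, not the dashboard's actual code, which the transcript does not show: the team names and numbers are invented, and summing the per-metric ranks is an assumption about how the ranks are combined.

```python
def rank_by_metric(values):
    """Map each team to its rank on one metric (1 = best, ties share a rank)."""
    ordered = sorted(set(values.values()), reverse=True)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return {team: position[value] for team, value in values.items()}


def overall_order(metrics):
    """Order teams by the sum of their per-metric ranks (lower is better)."""
    totals = {}
    for per_team in metrics.values():
        for team, rank in rank_by_metric(per_team).items():
            totals[team] = totals.get(team, 0) + rank
    # Team name is used only as a deterministic tie-breaker.
    return sorted(totals, key=lambda team: (totals[team], team))


# Invented numbers: team_a has roughly ten times the fans of the others
# but is last on the other three metrics.
metrics = {
    "total_fans": {"team_a": 1000, "team_b": 120, "team_c": 110},
    "interactions": {"team_a": 5, "team_b": 40, "team_c": 30},
    "unique_views": {"team_a": 50, "team_b": 90, "team_c": 80},
    "page_views": {"team_a": 60, "team_b": 200, "team_c": 150},
}
order = overall_order(metrics)
print(order)
```

Because only the ordering within each metric matters, having ten times as many fans earns team_a the same single rank step as having one fan more would, so one outlier metric cannot decide the overall standing.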
0:57:08.4
Matt:
Andreas:
How do you do that?
Matt:
Work at Facebook.
Andreas:
I think it’s great. Let’s give Matt a round of applause for putting this up here. [Applause]
It’s all hooked up so it’s automated every night and updates.
On that note, I want to welcome Itamar Rossen. Two years ago, we had his previous
boss, Jeff Hammerbacher here. That was a big hit. It turned out that four students went
to work with him afterwards. We just met a couple of days ago. My dear friend, Eric Sun,
who will come and join us towards the end of class, worked for Itamar last summer.
Itamar was born in Jerusalem and went to Stanford. He graduated four years ago (five)
in symbolic systems. Here is how we tried to organize it.
I want to give you the chance for about five minutes now to ask him things you want to
know so we get it out of our system. Any questions you have, it could be on metrics, why
isn’t this that way or the other way, and then we will get it out of the way. After break, I
promise to shield him from any attacks and we will open it at the end for discussion.
Right now is your time to think about what bugged you in the first homework, what you
wish we would have done, what metrics we were missing, or whatever you have; thank
you for coming. He’s super nice so please be nice to him.
Student:
… swine flu… leverage the social graph to find out more about it.
Itamar:
About the swine flu? I can actually show you. Can I use this computer? I need to
refresh my memory here because I haven’t looked at this in a while, but we published
something about the occurrence of the phrase.
Andreas:
The questions could also be higher level. This is one specific data point about swine flu.
Think more about data strategy or data team. This is a great chance to lift it up. He will
be here until the end of class, so in the break you can ask other things.
Itamar:
I’ll stay after class, too, if you want to talk.
Andreas:
I don’t get any kick back if they hire people, let’s be very clear. [Laughs]
Student:
… higher level… what approaches do you try…
Itamar:
About swine flu?
Student:
… suppose you have something going on in the world and you want to try to figure out
some greater social graph… of this. Do you have something besides the wall?
Andreas:
He can’t type while you’re talking. Let me collect a couple of questions to free you up.
Itamar:
I can multi-task; it’s okay.
Andreas:
I will take a couple of questions here and then you decide what you want to answer.
Itamar:
I’ll answer all the questions.
1:01:16.9
Andreas:
I asked the TA to send this out last night so you could send in the questions beforehand
but it will work out.
Student:
There are a lot of big name people using Facebook pages… what kind of stuff are they
asking for?
Andreas:
Big name people, what is a big name person?
Student:
Coca-Cola, some celebrity.
Andreas:
Coca-Cola is a beautiful example of the mind shift that happened with that
company. Two years ago, I ran a workshop in Europe for marketing 2.0, where the
Coca-Cola CMO was there and he saw something on the web and said, “We’ll sue
them.” Somebody had used the name Coca-Cola in some context.
Now, they found out that people had a very successful fan page, a Coca-Cola fan
page on Facebook. They said, “Great, we’ll fly them out and give them a tour and
maybe we’ll even give them a video camera or whatever…” so do you see that mind
shift here? By big name people, you mean companies. What was the question?
Student:
What kind of metrics are they asking for besides the one we already named?
Itamar:
I’m happy to answer that. We’ve generally found that big companies aren’t terribly
sophisticated in the metrics that they ask for, and that’s not a negative thing, at all.
They’re just looking for the very high level signals. For example, we’ve been able to
make a lot of headway playing [1:02:39.3 unclear] against one another, telling [1:02:42.5
Xiao Ming] when Shaq has more fans, that the Shaq page is getting more fans and
they should invest more money and we’ll publish more advertisements for the [1:02:50.1
Xiao Ming] page. That’s a pretty clear example of a very basic metric that is driving them
there, and really what [Xiao] and Shaq’s agents or publicists are probably interested in is
broad distribution.
I mentioned another example earlier, with a movie. That’s actually a real example.
We’ve had many movies create pages and have [1:03:14.7 EMU] advertisements. That’s
basically the name of an interactive advertisement that we have on the homepage. They paid a
lot of money for it, and really the thing they were most excited about is this lexicon-type
feedback. They wanted to see where people were actually talking about their movies.
1:03:32.8
A few people in the brand advertising world have gotten a little bit more
sophisticated, where they’ve sort of approached us and have asked us to work
with them on surveys and qualitative experiments that show whether users are
able to actually recall a brand and are interested in buying that brand’s goods after
seeing an advertisement. We’ve run some polls where we look at users who have seen
a brand advertisement and see if they can recall a particular fact that the advertisement
mentioned.
Andreas:
What is interesting is that whenever there is a new technology, people look at it
from the old perspective. When radio was new, apparently, it was trying to imitate
newspapers. When television was new, apparently they had people sitting there
like with big microphones, as if it were the radio, until people realized that
television is not just a better radio.
More interesting, when the web came along, at the beginning, it very much was
just a better television. That is what it was meant to be, until people realized that
it’s something very different. Most of these poor old brand advertiser guys in the
companies only know that they can give a lot of money to people to tell them how
[1:04:57.7 unclear] recall, and so forth. That’s why they’re asking for this; it’s the world
they know.
How can we get them out of that world and potentially out of the Google world,
which is just performance based, and really come up with ways in which you could
do better than anybody else?
Itamar:
That’s actually what Eric Sun’s talk will be all about. It’s about social distribution of trends
and phenomena on Facebook. We can actually map the path by which a brand’s
Facebook page grows organically through the website, via newsfeed. If we show a
brand data demonstrating that, as information diffuses from a core of initial
individuals to millions and millions of users, the diffusion is very quick and the
branching factor very high, then it’s very compelling data. That’s really one of the main
contributions of Eric’s work that he’ll be talking about at the end of the class.
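One simple way to put a number on the branching factor mentioned here is to compare the size of each generation of a diffusion cascade with the one before it. The sketch below assumes the cascade is available as (source, new fan) pairs; that data layout, the function name, and the toy cascade are assumptions for illustration, not Facebook's method or data.

```python
from collections import defaultdict


def branching_factors(edges, seeds):
    """Ratio of each generation's size to the previous generation's size.

    edges: (parent, child) pairs meaning the child became a fan via the parent.
    seeds: the initial core of individuals (generation 0).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    generation_sizes = []
    current = list(seeds)
    seen = set(seeds)
    while current:
        generation_sizes.append(len(current))
        next_generation = []
        for node in current:
            for child in children[node]:
                if child not in seen:  # count each person only once
                    seen.add(child)
                    next_generation.append(child)
        current = next_generation

    return [
        generation_sizes[i + 1] / generation_sizes[i]
        for i in range(len(generation_sizes) - 1)
    ]


# Toy cascade: one seed, two generations that double, then a tail.
edges = [
    ("a", "b"), ("a", "c"),
    ("b", "d"), ("b", "e"), ("c", "f"), ("c", "g"),
    ("d", "h"),
]
factors = branching_factors(edges, seeds=["a"])
print(factors)
```

A ratio above 1 means the page is still spreading faster than it started; a ratio below 1 means the cascade is dying out.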
Andreas:
We should ask them; what do you think – are you more likely - let’s say it’s four years
ago, before most of us were on Facebook. Are you more likely to accept an invite if it
comes from three friends who are in different networks? Let’s say one of them is in
Singapore and the next one is in Germany and the third one is in Stanford; “Gee, this
must be a global phenomenon. Why haven’t I heard about it?” Or, are you more likely to
accept, again keeping three as the constant, if there are three people on the same
network? Three people from Stanford say, “I invite you to Facebook.” What is your gut
feeling here? These are two hypotheses and in the PHAME framework, two hypotheses
driving very different actions. Who would think that having three different networks, each
one having a person saying you should join, versus having the same network? Let’s be
specific. Who thinks three different networks?
Student:
You control how well…
1:07:15.6
Andreas:
Let’s keep everything else constant. We talked about this. Let’s assume that everything
else is constant. They know very little about people who are not on Facebook, yet, so it’s
actually hard to know how well they can control it. Let’s assume number three is fixed,
and let’s assume you know all the people equally well. What do you think; if you didn’t do
an experiment? Covering your bases, not putting all the eggs in one basket, all these
metaphors, who thinks it should be three different networks? A show of hands? One,
two, three, four, five, about eight or nine. Who thinks it should be from one network?
Itamar:
Could you guys speak about the reasons for your answers?
Andreas:
Just one word before they influence each other here. It’s steady for people here with one
network, but not a huge difference. Let’s hear the reasons.
Student:
It’s easier to … three in one network. Otherwise, the other three separate networks…
Andreas:
I said this poorly. What I meant is it is one thing…
Itamar:
Assume that you’re a non-user of the site and you receive three invitations. In one
case, those three invitations come from three individuals that are in three
completely disparate networks on this site. For example, one is in Singapore, one
is in Stanford, and one is in Argentina. In the second case, you receive three
invitations from people who are all part of the same network, all part of the
Stanford network. The question is, holding everything else constant, which
scenario is more likely to lead you to accept the invitation and join the site?
Andreas:
As an example of a fair question, I think a fair problem, right? Let’s hear more
hypotheses here.
Student:
…
Andreas:
I explained it poorly. Once again, who thinks three different people, as disparate as
possible, spanning this space, each sending an invitation for one network called Facebook.com?
Itamar:
It went down.
Andreas:
It’s still 7 or 8 people. Who thinks all from the same group? It’s interesting that again,
by explaining, we don’t come to a clear decision. If you didn’t do the experiment, how
would you go about it? Would you go to Mark [1:10:09.0 unclear] who tells you about the
strength of weak ties, or how would you now do that if you couldn’t experiment?
Student:
If holding everything else constant… Facebook… why does that…
Andreas:
Maybe there is no difference. Let’s hear some hypotheses about why are you on
one side of the camp or the other side?
Student:
… Stanford that you might be more interested in knowing people, like keeping in contact
with people from different networks … why I would say if I had … people outside
Stanford… I would say why would I join that…
Student:
Those people sending invites are my friends anyway…
1:11:13.2
Andreas:
I love these arguments because it’s so powerful in showing how experiments
actually set an end to discussions. We’ll hear some more. I have not seen you in
class before?
Student:
I’m always here. [Laughter]
Andreas:
Maybe your hair has grown, or something. Sorry, I’m an honest guy. I haven’t noticed
you so I welcome your comments. [Laughter]
Student:
I would go for being all on the same network because it seems like there’s enough critical
mass that maybe all of them know each other so that … better use to you than if they’re
all three different networks…
Andreas:
Ron
Ron:
We feel if you were in the same network the air is stronger… for … you don’t know who is
in the system. If you have three of your friends from disparate networks, you’re more
likely to go, “Oh, … might actually be bigger in terms of all the friends that they also know
that I could hook up with, rather than saying my one isolated network of … only … why
would I want to use a social network that is designed to get me more audience…
Andreas:
So, more fuel and then we’ll get to the answer, or should we wait until after the break.
Itamar:
I think more people want to talk.
Andreas:
Let’s have you, we have no female voice so far, today.
Student:
I guess my – the reason I voted that I would want three from the same network –
assuming that three people from the same network … Stanford, I would say that would
actually occur… three people from random networks… that network… at that time I
wasn’t at Stanford so I would have said…
Andreas:
That is an interesting point that what might have been the case four years ago might no
longer be the case now.
Student:
I think it’s the three people network that I would feel that all of that is…
Andreas:
Shall we do four more?
Student:
For the same reasons, I have different sides. Since I want critical mass and … seems
like it’s a random site. I’m from the east coast so most of my friends are outside of
Stanford, from the east coast. It’s very weird for a site to somehow be… both geographic
areas. To me, if that happens, if it’s three random networks, it’s more of an
indication that it has reached critical mass, that it’s something more major that is
not just this local thing. I would be more willing to check it out there. I think …
non-visionary is a big problem that when four years ago, when it first started out, I think
there was a lot more of everyone joining these massive groups that really didn’t have
much meaning.
1:15:07.1
Andreas:
Of course, we could translate it as should you invite people – for your invitations for your
pages, or for the ad dollars you spend and stuff like this, how do you spend it? That was
just one problem; the problem persists even if the underlying questions are different.
You are hiding behind the wall. Is there any reason for that?
Student:
No… the reason I say that being invited by the people from the same network would have
a stronger effect than people from three different networks would be that … same
network, interaction with them based on … face to face, there is another level of
protection there. People from different networks… as often as people from the same
networks.
Andreas:
Anybody else want to bring something in?
Student:
It might just be… run your A/B experiment, … better but might not be the best solution…
Andreas:
I think the conditioning is where the fun always starts for me, to see that not only
50%, 70%, or 90% do this, but then understanding what the differences are
between the groups. I did a paper ten years ago where I analyzed the behavior of 30
million transactions of [1:16:40.6 unclear] futures. People fell into different groups, very
distinct groups, and then it really became interesting to see the properties of those
groups.
Two more people?
Student:
… essence I think we… difference between … and Facebook.
Andreas:
Not really, I set it up poorly. I didn’t mean that there are multiple, like 900 thousand
different social networks. What I meant was: would you be more likely to react if you
get three invitations to the same thing from people in one group, or three invitations to the
same thing from people across the world?
Student:
It signals that people are from one group that they have similar interests or similar
locations, or some associated factor that is going to be … network or a very directed
network, versus like everyone and their mother is on this site…
Student:
I think it depends on … big part of … success… closed network…. At scale…
Andreas:
So, the reason I took fifteen minutes on this was that this is a very typical discussion people
have, and it’s a good discussion. The parallel to this was the discussion I brought up last time,
when I asked: should we give a discount now and have a higher conversion rate, or
should we give the discount for the next purchase and have a higher repurchasing rate?
Now comes the answer.
1:18:22.2
Itamar:
I’ll set it a bit more broadly. We were actually interested in a question that is
related to this, which is a bit more broad, which is if you were invited by several
people, does the fact that they have shared characteristics affect your likelihood of
joining? One of those shared characteristics is being part of the same network.
In order to answer this question, we ran a statistical experiment nine months ago,
where we took some sample data of non-users who had been invited by a fixed number
of users on this site. We
estimated a logistic regression model, predicting whether the invite recipient
would accept the invitation as a function of various demographic characteristics of
the inviters, whether they were in the same network, whether they were friends,
whether their age was similar, how many mutual friends they had and so on.
For all these types of shared characteristics, like whether they were friends and
whether they were in a shared network, we found pretty meaningful, significant,
positive coefficients in the logistic regression model, which signaled to us that
shared characteristics really do have a positive effect.
You hit upon the right hypotheses that if you receive invitations from three people
and they’re all part of the same network, it gives you a feeling that there’s a
cohesive social experience there, that it’s not just a random friend that you had in
high school and a random friend that you made during your year in China, and a
random friend you met in an internship. It’s really people who know each other
who are communicating together that you then want to be a part of.
Another important factor is the response variable here is just acceptance. We
didn’t study what the effect was on actual engagement, given that the user would
sign up. That’s a different study altogether.
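To make the modelling step concrete, here is a minimal sketch of a logistic regression that predicts acceptance from shared characteristics of the inviters. Everything in it is an assumption for illustration: the data are simulated, only two features are used (`same_network` and `inviters_are_friends`), the "true" effect sizes are invented, and the plain gradient-ascent fit stands in for whatever statistical software was actually used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, steps=500, learning_rate=1.0):
    """Fit intercept and coefficients by gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    n = len(rows)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for features, accepted in zip(rows, labels):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
            error = accepted - sigmoid(z)
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w + learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


# Simulated invitations (NOT real data). Each row describes the inviters of
# one non-user; the invented effects make shared characteristics help.
random.seed(0)
TRUE_INTERCEPT, TRUE_SAME_NETWORK, TRUE_FRIENDS = -1.0, 1.2, 1.0
rows, labels = [], []
for _ in range(1000):
    same_network = random.randint(0, 1)
    inviters_are_friends = random.randint(0, 1)
    p_accept = sigmoid(
        TRUE_INTERCEPT
        + TRUE_SAME_NETWORK * same_network
        + TRUE_FRIENDS * inviters_are_friends
    )
    rows.append((same_network, inviters_are_friends))
    labels.append(1 if random.random() < p_accept else 0)

intercept, coef_same_network, coef_friends = fit_logistic(rows, labels)
print(intercept, coef_same_network, coef_friends)
```

A positive fitted coefficient on `same_network` corresponds to the finding described above: holding the other feature fixed, invitations from people in the same network raise the estimated odds of acceptance. As in the study, the response here is acceptance only, not later engagement.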
Student:
…
Itamar:
Exactly, it’s not an empirical experiment where we ran an A/B test because it’s not clear
what you would want to vary there, what system you would turn on versus off.
Student:
…
Itamar:
Absolutely, that’s absolutely right. One problem here is a sort of selection bias where
people might be more likely to invite together in groups, where the three inviters
are part of the same network. What we did was we basically constructed the
samples so we would have an equal proportion of invite senders and invite
recipients that had these shared characteristics and those that did
not have shared characteristics. We controlled for other features like the age of
the inviters, how long they had been on this site, how active they were as users,
countries they were in, and so on. By controlling for these other unrelated
factors, you hope to be able to make valid inferences from the logistic regression
model.
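One way to picture the balanced sample construction is the sketch below. It assumes a stratified approach: within each combination of control variables, keep equal numbers of invites with and without the shared characteristic. The record fields (age_bucket, country, shared_network) and the synthetic data are hypothetical, and the actual procedure used may have differed.

```python
import random
from collections import defaultdict

def balanced_sample(records, rng):
    """Within each stratum of control variables, keep equal numbers of
    records with and without the shared characteristic."""
    strata = defaultdict(lambda: {True: [], False: []})
    for r in records:
        key = (r["age_bucket"], r["country"])  # hypothetical control variables
        strata[key][r["shared_network"]].append(r)

    sample = []
    for groups in strata.values():
        # Downsample the larger group so both groups have k records.
        k = min(len(groups[True]), len(groups[False]))
        sample.extend(rng.sample(groups[True], k))
        sample.extend(rng.sample(groups[False], k))
    return sample

rng = random.Random(1)
# Synthetic invites in which shared-network invites are over-represented.
records = [
    {
        "age_bucket": rng.choice(["18-24", "25-34"]),
        "country": rng.choice(["US", "UK"]),
        "shared_network": rng.random() < 0.7,
    }
    for _ in range(1000)
]

sample = balanced_sample(records, rng)
n_shared = sum(r["shared_network"] for r in sample)
print(len(sample), n_shared)
```

After balancing, half of the sampled invites have the shared characteristic within every stratum, so the control variables can no longer explain a difference in acceptance between the two groups.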
1:21:31.8
Andreas:
If you look at the coefficients and draw conclusions from that, particularly of
[1:21:36.8 unclear], the power you have – and you will give other examples after break – is where
you can vary things. In that sense, it was a poor example, but I was curious about
what people actually think about it. The purpose was not whether you should send
invitations this way or that way. The purpose was to show you that these are
totally valid discussions, and it’s great to come up with these hypotheses. Then, it
doesn’t end with the highest-paid officer making the decision. It should end with
people saying, “These are the two actions we will take, and these are the metrics we will
measure it by.” The metric is only acceptance versus engagement afterwards. You
would get different results.
Itamar:
I’d say that one thing this did motivate was a feature called an invitation suggester.
When you send an invitation to someone, we now prompt you to go through your
friends and suggest for some of your friends to also invite them. This seems like a
pretty sensible feature in any case, but given these results, it seems especially
powerful because if I, as an inviter, manage to get Andreas to also invite someone
we both know, that compound invitation seems like it would be more powerful.
Andreas:
The coffee closes at 4:00, so I think in the interest of having people awake for the second
half of class, shall we take our break now and be back at 4:00? Would that work? Okay.