Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics
Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
April 13, 2009
Class 2 Ecosystems: (Part 1 of 2)
This transcript:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_2ecosystems-1_2009.04.13.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_2ecosystems-1_2009.04.13.mp3
Next Transcript: (Part 2 of 2):
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_2ecosystems-2_2009.04.13.doc
To see the whole series: Containing folder:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/
Andreas:
STATS 252, Spring 2009 lecture 2. Welcome to the class. Here is the agenda for today.
I will start walking you through our wiki, specifically, through the logistics. Then, we will
talk and recap what we did last time, which was the PHAME framework. After that, we
have Linus Young, who was a student in class three years ago. He then became a hotty
by building what was the most popular Facebook application for a while. He then took what he
knew from Facebook and became a mafia guy by running the most popular iPhone app
for a while. He is now going to turn to babies, but we will talk about babies at the end of
the first part.
In the second half, we will have the founder of RockYou, [0:00:51.7 unclear], come to
class and he will talk about ecosystems and platform. At the end of class I will tell you
what we’re going to do during the rest of the quarter. Somewhere in between, I will
sneak in a discussion of the first homework and tell you about what makes good metrics
good metrics. That’s the agenda for today. Are there any questions?
If not, then let’s quickly run through the wiki. As all of you know, you should all have
editing privileges on the course wiki. The point I want to make is that this is actually a wiki,
which means it relies on all of you contributing to it.
There is another page, the one I edit on www.weigend.com, which is the Stanford
teaching description. I will start uploading the mp3 files after each class, starting
probably tomorrow. There will be mp3s of each class, so that people who either miss the
class or have nothing better to do while they are taking airplanes or driving can actually
listen to them.
What is this class about? I will talk about this in the last part of today’s class. That will be
filled in, in real time. Who is in the class? Most of you have actually found the Ning
social network that we created for class. I will ask you to please upload your pictures so I
know how to associate the names with the pictures. I do give a grade for class
participation of 35%. I don’t want to know you just as the CS guy in the back; I want to know which name
corresponds to that picture so I can do a fair job of grading.
0:02:39.0
The communication goes via email to me, and I’ve set up communication paths to
the TAs. Don’t use Ning for mission-critical communication. Use Ning if you want to
get to know each other or if you want to talk about your stuff. Also, there is a page on
Stanford 2009, the Student Summary, which the TAs created by mistake. I’ve locked this
page because I don’t want you to have to enter the information multiple times. What I
ask you to do is to take that information, whatever you’re comfortable with since
everything is public, and put it out on the Ning network. I may unlock it again, but don’t add
stuff on this page; add it on Ning.
Back to the boring logistics here. Grading policy – homework is 60%. Contribution to
the wiki is 30%. Class participation is 5%. Contributions elsewhere, like the
Facebook group, are also 5%.
Most of you have figured out that the homework submission email was actually broken.
The TA didn’t get around to creating it. I created it Sunday night. If people had
problems, I apologize for this. This is [email protected], where you submit
homework so we have a clear record of when it was submitted. When you need help, hit
up the TAs. There is one email address for all of them,
[email protected], which all of them share, and they will grab questions
you have and try to answer them.
In the last class, you met Enrique Allen, our social media TA, who has been more than
pulling his weight on the first homework. He really rocks. I met with all my 6 teaching
people on Saturday at my house in San Francisco, so most of the stuff you see, he built.
We have [0:04:42.9 unclear], who is actually grading your first homework. We have
another grader, Ryan Mason, who took the class last year and is now working at
23andMe.
I told you about the Facebook page. We started as a group and moved it over to a
Facebook page, which is either www.facebook.com/socialdatarevolution, or
www.socialdatarevolution.com. That is also where Matt is going to help me build the
dashboard for the various metrics you have been creating, where we will see,
updated daily, how the various groups here and at Berkeley are doing on
the homework.
I just wanted to make sure I take the five minutes to walk you through this once. If you
have questions about this, if you see typos in the course description, or
if you think something is missing, email me and tell me, “Andreas, you got this wrong.”
0:05:33.4
All right, time to get to content. I want to start by recapping from last class, the PHAME
framework. PHAME stands for problems, hypotheses, actions, metrics, and
experiments. One question is, can you actually see this? Is this a large enough font for
people to read? Okay, good. I will give my file, as a habit, at the end of class to one
of the people who are responsible for making this wiki. The wiki is due three
days after class. That would be Thursday evening. That way, you have all the
notes I wrote and you just wikify them and add stuff that I may say but didn’t put in
the notes. One person who is responsible for this week’s wiki should come up to
me afterwards and say, “Email me your notes.”
Problems, Hypotheses, Actions, Metrics, and Experiments – I contrasted this last
class against the old-style hope, the eternal hope of “Let’s just do data mining
and hope for insights to emerge.” We are still waiting for those insights.
As you see, data is not actually the primary thing, mining is not the primary thing,
but the problem is the primary thing. Problems, for most people, actually start with
what are we doing it for? What is the business model or what is the monetization?
I want to start the conversation today by giving you a pretty complete view of everything
there is to monetization. Is that a good start? Okay, so first of all, you know in the web
1.0 era, it was ads that fueled the Internet economy. I want to do this in the context
of PHAME, and work with you in figuring out, if we have a certain monetization
model, what the corresponding metrics that matter are.
For instance, if you wonder about the ads, what would be the metric you would like
to have as large a number on as possible? Exactly – views, ad views, ad clicks.
Conversion rate comes in here. You want to have people stay on your page as long as
possible.
Then, the next thing is you can actually sell stuff, for example Amazon.com. There,
it’s already no longer clear whether you want to have people stay on your site as
long as possible, or whether you want to give them what they’re looking for so they’re done
and they know it was cool, it was smooth, as opposed to putting stumbling blocks in
their way.
The best example of versioning is dating sites. On the one hand, you want to have
a lot of people who are actually on the dating site. There is a good inventory. On
the other hand, you want to make some money. You have two different versions,
the free version to get the inventory, and the paid version. You need to
differentiate it in a smart way.
For instance, you could differentiate it so that you can find out contact details about the
person if you pay, or you can send an unlimited number of messages as opposed to only
a couple of messages a day.
What other ways of monetization do we have? Virtual goods is actually one of the
upcoming ways to monetize stuff. People buy bits. We talked last time about whether
a virtual gift is actually different from a real gift, besides the obvious. What other ways of
monetization are there, since we want to have a complete picture here?
0:09:21.7
Lead gen? Do you want to explain how lead generation works?
Student:
…
Andreas:
For instance, what’s the one where you get a certain number of points at Facebook apps
and then they sell your name to some mortgage broker or some Russian mail order
brides, or whatever they’re selling there? [0:09:57.7 Offer Power] is one example. There
are a whole bunch of them that do lead generation. Lead generation actually has
its roots way before that. For instance, if you look for mortgage, at least the way it used
to be, it was pretty expensive. People on average make a lot of money on the key word
mortgage.
On the other hand, if you look for primary school or relocation, you tended to get those
key words much cheaper. People always tried to understand how we could come up
with key words that are cheaper, but correspond to the same idea of lead
generation as the very expensive ones.
Google – we talked last time about the data sources that Google has. By knowing
the sequence of what people do, by knowing the key words they search for, Google understands what
people subsequently do. Google can do an awesome job in suggesting to people,
“You might also consider those key words.” Why does it make sense for Google?
Because by suggesting this, more people bid on those key words so the price goes up.
Google runs a second price auction, which means if you bid $1 per click for your key word
and the second guy only bids $.10, you don’t have to pay the $1 if
someone clicks on yours; you only have to pay the $.10. This is a more stable algorithm
for auctions than if you actually pay what you say you would pay. It also makes people
hope, “We only have to pay what the second guy is paying,” so maybe the average price
is going to be higher for Google. Are there any other monetization ways?
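The second price rule described above can be written down as a minimal sketch. It assumes a single ad slot and ignores anything else a real ad auction may take into account; the function name and the advertiser names are hypothetical and not from the lecture.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price_paid): the highest bidder wins and pays the runner-up's bid."""
    if len(bids) < 2:
        raise ValueError("a second price auction needs at least two bids")
    # Rank bidders from the highest bid to the lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price_paid = ranked[1][1]  # the second-highest bid, not the winner's own bid
    return winner, price_paid


if __name__ == "__main__":
    # The numbers from the lecture: one bid of $1, the next bid of $.10.
    print(second_price_auction({"advertiser_a": 1.00, "advertiser_b": 0.10}))
```

With the lecture's numbers, the $1 bidder wins the click and is charged $.10.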
We have subscriptions and one-time payments. For instance, some games are one-time
payments and you own it. Other games are subscriptions where you keep on
paying to play it. With dating sites, you keep on paying until you don’t need it anymore.
Student:
Mining information… your site versus…
Andreas:
Information products is a term I use for that. We’ll have one class toward the end of
the quarter, where I will talk about how we can leverage information, how we can use
information we collect on the web for finance, for trading. That’s a classic example. A
good specific example here is a company that had a great product. They figured
out how to understand something about a pharma product by going to doctors and doing
surveys with them, and measuring the implicit and explicit behavior.
0:12:31.2
How did they make money? It was not by going to the pharma industry. It was by
going to Wall Street and telling Wall Street, “We know something about that
product. We know the patterns of how doctors prescribe, we know how the
patterns change,” and that was their way of monetizing their insights as opposed
to trying to help the pharma industry to create better products or sell them better.
That’s a specific case of a more general case called freemium, which means
sometimes it is free and then when you want to have a premium version, you pay
for it. Or, Chris Anderson had a wonderful article in Wired, about a year ago, with the
title, “Free,” which describes the classic Gillette model. You might know that you get something
for free, such as a shaver, but then you buy the blades and you continue buying blades.
With Google that is yet another level. You pay nothing; you pay with your attention and
your clicks, but Google gets the money from somebody else. Understanding who the
agents are, who is willing to pay, and what the incentives are, and aligning those incentives, is
a key thing.
Student:
… for access to API…
Andreas:
There is a company in San Francisco which does the API stuff. Rapleaf is another
company we will talk about later, in terms of data products. Mashery – that’s right.
Mashery is a company that does just APIs for you. They know how to make sure that
the calls happen the way they should happen and stuff like this. Most of these things
exist in very cheap building blocks where the company sells it thousands of times, makes
their money that way, and you can just take the building block and now you have an API.
For instance, BestBuy actually uses the Mashery API.
Student:
… franchising… sell the brand … domain overseas…
Andreas:
That may be part of the selling and versioning, already.
Student:
…
Andreas:
Those are actually already included here. The versioning might be the LinkedIn thing,
that certain things are free, but then you have a premium membership that allows
you to do other things. You can message people. Information products, access to
data; I think we pretty much have this space here, up there. Ray, one other thing?
Ray:
…
Andreas:
What’s missing is the ecosystem, and the App Store is a very good example here.
Widgets, which [0:15:29.1 Jao Shung] is going to talk about at the end of class today,
are another example, where you have [0:15:33.5 unclear] building a widget and each
of them has certain stakes in the game.
0:15:40.7
The point I’m making here is that these are different problems. The problems are
driven by the way we monetize them. If we now focus on one thing, namely on getting
users, then acquisition x retention is the right thing. Nobody cares for users who only
come once, unless they buy something that one time, which is the case where you are not
a subscription service. Most people want users to actually like the product
and to come again. That’s what you want.
One problem is acquisition and the other problem is retention. Retention is
where I sometimes say the product is the message. It’s not that
marketing is just money spent because you have a bad product. What the
marketing spend does is get people to decide once, but then people experience the
product and that’s what makes them come back. That’s the problem space.
For your homework we will focus on acquisition and retention. Sometimes,
acquisition is called the viral loop. Linus will be talking about this. Sometimes retention
is phrased as, “We want to have really good engagement.”
Hypotheses is the second of those five letters of PHAME. Hypotheses, in my
perspective, often come from cognitive science and from behavioral economics.
Ultimately, what we’re interested in is helping people make better decisions, and
so this means we need to figure out how people make decisions.
For instance, if I give you either $10 right now, or $100 in a week from now, who would
pick the $10 right now? I guess you know where to find me in a week from now. Let’s
say if you see something on the street or you meet somebody at [0:17:31.7 Tresida] and
you say, “Hey, do you want $10 right now, or do you want $100 in a week?” Who would
take the $10 right now; about half of you? Who would take the $100 in a week; roughly
the other half, okay. I guess it depends on the person.
Now, I’m shifting this into the future and it’s the same guy at [0:17:53.4 Tresida]. Do you
want to have $10 in fifty-one weeks from now, or do you want to have $100 in fifty-two
weeks from now? It’s the same delta of one week. What do you think? Everybody
would basically go for the $100 because between fifty-one and fifty-two weeks is not a big
difference. This is called hyperbolic discounting.
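The preference reversal just described can be checked numerically. The sketch below assumes the common hyperbolic form, value = amount / (1 + k * delay), which the lecture does not spell out; the impatience parameter k and the exponential discount factor used for contrast are hypothetical values chosen only to reproduce the classroom answers.

```python
def hyperbolic_value(amount: float, delay_weeks: float, k: float) -> float:
    """Present value under hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay_weeks)


def exponential_value(amount: float, delay_weeks: float, delta: float) -> float:
    """Present value under exponential discounting: amount * delta ** delay."""
    return amount * delta ** delay_weeks


def choice(value_fn, param: float, start_week: int) -> str:
    """Compare $10 at start_week with $100 one week after start_week."""
    sooner = value_fn(10.0, start_week, param)
    later = value_fn(100.0, start_week + 1, param)
    return "$10 sooner" if sooner > later else "$100 later"


if __name__ == "__main__":
    K = 10.0      # assumed impatience per week (hypothetical)
    DELTA = 0.05  # assumed weekly discount factor (hypothetical)
    for week in (0, 51):
        print(week, choice(hyperbolic_value, K, week), choice(exponential_value, DELTA, week))
```

Under these assumptions the hyperbolic chooser takes the $10 when it is available right now but takes the $100 when both payments are shifted a year out, while the exponential chooser makes the same choice at both horizons. That reversal is what distinguishes hyperbolic from exponential discounting.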
0:18:15.9
One thing, when I was preparing with Linus here, was that we really came up with a very
good example. It’s an example from Amazon. Here is how that example worked. Mike
[0:18:26.9 Shaw], who runs Wikinvest, tested the following; he tested what was the
economics behind co-branded credit cards. The economics behind co-branded credit
cards, like the Amazon Visa card, is that if Amazon gets a new customer to Chase,
Chase pays Amazon $100 and Chase pays Amazon $30, which goes to the customer.
Basically, for each time Amazon brings a new customer to Chase, Amazon makes $100
and the customer gets $30. The question is how do you message that to the
customer?
The two alternatives were, basically one of them was saying, “You have $43 worth
in your shopping cart. You can get that for only $13, today.” The other one was
saying, “Come next time, and we’ll give you $30 off of your next purchase.”
It is not as trivial as you think, at first sight. You can argue both ways. You can
argue the former, $30 off right now, is actually better because people would rather
take – that’s why I talked about the hyperbolic discount – the $10 or $30 right now,
than wait for some point in the future. It depends on what you want to measure.
If you are measuring how much is Amazon making over all, then the fact that somebody
has a $30 credit may make them come and buy something later since they have that
credit there. That is a good example of two hypotheses where we don’t know a priori,
we don’t know before the experiment, which one will actually work better.
Now, it clearly depends on the metrics we are looking at. If we are looking at the metrics
of people signing up for the card, we’ll have more the first time. If we take a different
view such as we want to know how much is Amazon going to make down the road,
including the $100 but also including selling more stuff and having better retention of
customers, maybe the latter one is better. It turns out the former one was the one
that actually worked better. Give people a discount right now. In the dating
community, that’s typically called, “Whether Mr. Right or Mr. Right Now; we’ll take
Mr. Right Now.”
That was the hypotheses part. I’m actually curious what hypotheses you have for
building your pages for the social data revolution. You have to think big, think
broad. The hypotheses don’t come by staring at code. They come by having
ideas, by talking with friends, and they come by looking at data, this [0:21:10.5
unclear] process, and coming up with new ideas.
We will do a class on visualization later in the quarter. In the visualization class, I will
make the big distinction between real time and interactive. For most things, with the
exception of performance metrics, we don’t care all that much about whether
they’re real time or not. We do care that they are interactive because it is our time.
So, it is the interaction time, how long it takes between having the idea and
getting an answer back, that counts much more so than whether this hit was a
minute or an hour old.
0:21:44.4
The actions clearly depend on what problem you’re trying to solve. I already
mentioned, as an action, the varying of the text. Another example of an action is if
you send emails out to people, reminder emails, marketing emails, so you have
another channel and not just the website. It could be Facebook notifications.
When do you send those emails? That’s another experimental question and it turns
out that sending them a little bit earlier than the time when the person last read the
previous email works well. If you read emails on Tuesday, at 10:00 in the morning, we’ll send out an
email on Tuesday at 9:30 in the morning. For some reason it seems to be that people
read last in, first out. That seems to be a time that works well for people reading email.
You don’t want to bury it by sending it on Friday evening. People will come to the office
on Monday morning and have a stack of stuff and never get to the bottom of it. These
would be two variations here, two of the parameters that can vary; text and time of
sending out an email.
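The send-time rule above can be sketched in a few lines. The 30-minute lead is taken from the Tuesday example; the weekly cadence, the function name, and the default argument are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def next_send_time(last_read: datetime,
                   lead: timedelta = timedelta(minutes=30)) -> datetime:
    """Schedule the next email a week after the last read time, shifted earlier by `lead`."""
    # Assumption: the recipient reads email at about the same time each week.
    return last_read + timedelta(weeks=1) - lead


if __name__ == "__main__":
    # Last email was read on a Tuesday at 10:00 in the morning.
    last_read = datetime(2009, 4, 14, 10, 0)
    print(next_send_time(last_read))
```

In an experiment, `lead` would be one of the parameters to vary, alongside the text of the email.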
The lesson for the endpoint is twofold. One is you want to have many metrics. It
is not one metric that matters. I got, from the TA, the whole union of all this stuff you
created here. I will talk to you a little bit about a framework on how to put them together.
There are a lot of metrics you want to consider, not just one.
Student:
I was wondering why…
Andreas:
What I call experiment is running the experiment. For me, experiment means the
one-liner in the code when doing a split test, an A/B test. The point I try to make
here is that I want to have differential actions. I want to have it either this way, $30
off now, or that way, $30 in the future. I believe that splitting it conceptually is
better than lumping it all together and calling it an experiment.
Also, metrics are part of the experiment as well. That was a good question. Any
other questions?
Student:
…
Andreas:
I hate to tell you, but no, the actions are the actions you are taking. The consumer
action or reaction comes in the last point, as you do the experiment. When I’m
talking about actions here, these are differential actions where you decide it could
be this way or it could be that way.
Last time, I gave you the example of having a shopping cart on the left versus having the
shopping cart on the right. Today, I gave you other examples. We have a lot more
examples here. It’s the actions you are taking, so it’s a very empowering approach, as
opposed to hoping for the insights. You start with the actions. I sometimes call it the
primacy of the action, that’s the starting point. That was a good question, thank you.
0:24:52.5
Now, we get to the heart of it. We already talked a little bit about metrics. It is not
just metrics that measure user behavior. It is metrics that measure site behavior,
as well. For instance, if somebody sees a fatal, if something crashes, there will be
negative effects for the user. If he is trying to buy something, the guy is not going to buy
now. There will probably be long-term effects. As I said the last time, short-term effects are
easy. Long-term effects are hard.
Then, we do the experiment. For the experiment, what matters is to have this one
line of pseudo-code here: if a certain condition is met, it shows them the new stuff.
If the condition is not met, it shows them the old stuff.
Here is a fun exercise for the computer scientist behind you. If you want to have 2% of
all customers see the new stuff, how would you pick 2% of the customers?
Student:
…
Andreas:
So one is you take the user ID. You take the remainder modulo 100 and see where
[0:26:14.1 unclear]. That’s reasonable. What other ways do you have? The problem
there is that you always have the same 2%.
Student:
…
Andreas:
You can add a random number to this. There are many ways of doing it. They are all
good. What’s important is that the answer was not, “In the morning you take the first 100
people, and then you take the next 500 or 5000.” You pretty much decide, before the
person comes, whether they’re going to be in the control set, which sees the old stuff, or in the
new set where the new stuff is. Once you have decided this, then we actually show them
the thing. The advantage of this is that you don’t have biases in any way.
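The two answers from the discussion can be sketched as follows. The modulo rule is the student's suggestion. The salted variant is one possible reading of "add a random number": it mixes a per-experiment label into a hash so that different experiments do not always hit the same 2%. The function names, the use of SHA-256, and the experiment label are assumptions, not anything described in the lecture.

```python
import hashlib


def in_new_group_mod(user_id: int, percent: int = 2) -> bool:
    """Remainder rule: users whose ID modulo 100 falls below `percent` see the new stuff."""
    return user_id % 100 < percent


def in_new_group_salted(user_id: int, experiment: str, percent: int = 2) -> bool:
    """Hash the user ID together with an experiment label, then bucket into 0..99."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def page_for(user_id: int, experiment: str) -> str:
    """The one-line split test: the assignment is fixed before the page is shown."""
    return "new stuff" if in_new_group_salted(user_id, experiment) else "old stuff"


if __name__ == "__main__":
    users = range(100_000)
    share = sum(in_new_group_salted(u, "cart-position") for u in users) / len(users)
    print(f"share of users seeing the new stuff: {share:.3%}")
```

Both functions are deterministic, so a returning user always lands in the same group for a given experiment, and the group is decided by the ID rather than by arrival time.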
The way it’s done at Amazon is that each person has an Amazon ID, which you
probably don’t even know. You don’t want to start user IDs counting up from 1. That way
your competitors would know how many customers you have. You want to start in a
big space and take random numbers without replacement, maybe ten billion, a hundred billion,
some big number. Storage is cheap. Assign a random number. Then, if you want to
have 2% of the people, you first pick one of those ten digits. You say, “Okay, it’s
the third digit.” You pick another random number; if the third digit is a 7, then you
have 10% of the people. You have to AND this with another digit, and then you say,
“And that digit is between this random number and that random number,” and that will get you
2%.
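A minimal sketch of the digit-based selection, assuming ten-digit random IDs drawn without replacement. The digit positions (third and fifth), the value 7, and the range 0 to 1 are hypothetical picks; in practice they would themselves be chosen at random for each experiment.

```python
import random

ID_DIGITS = 10  # assumed ID length: a space of ten billion possible IDs


def new_customer_id(rng: random.Random, used: set) -> int:
    """Draw a random ID that has not been handed out yet (sampling without replacement)."""
    while True:
        candidate = rng.randrange(10 ** ID_DIGITS)
        if candidate not in used:
            used.add(candidate)
            return candidate


def digit(customer_id: int, position: int) -> int:
    """Digit at `position` (1 = leftmost) of the zero-padded ID."""
    return int(f"{customer_id:0{ID_DIGITS}d}"[position - 1])


def in_new_group(customer_id: int) -> bool:
    """One digit equal to 7 selects 10%; AND a second digit in 0..1 narrows that to 2%."""
    return digit(customer_id, 3) == 7 and 0 <= digit(customer_id, 5) <= 1


if __name__ == "__main__":
    rng = random.Random(0)
    used = set()
    ids = [new_customer_id(rng, used) for _ in range(50_000)]
    share = sum(in_new_group(i) for i in ids) / len(ids)
    print(f"share of customers in the new group: {share:.3%}")
```

Because the IDs are random rather than sequential, any fixed digit condition selects an unbiased slice of customers, and changing the positions or values gives a fresh slice for the next experiment.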
There are many ways of doing it. The big difference is don’t try to do it through a
soft launch such as “We did this and now we are moving over and we’ll compare
it,” because you have other external effects which are typically much more
important – day of the week effect, or some other apps, than what it is you want to
measure.
Are there any questions about that? It’s an important methodological point.
0:28:09.9
In terms of framework, that completes my part of the review of the PHAME model. For
about five minutes, leading over to Linus, I want to talk with you now about a few dimensions
of what I am looking for in good metrics, not on an
example basis, but by contrasting two things. The contrasts always have in common that there
is something called deep structure, which we are looking for, versus surface
structure.
People sometimes think this is like seeing the forest or seeing the trees. The answer is
no. We are not making the distinction between seeing the forest, which is some
aggregate phenomenon, versus seeing the trees, which is some highly granular
phenomenon. We are actually looking at the dynamics. We’re coming up with
models that tell us, for instance, that if the sunlight is changing because of
photosynthesis or whatever is happening there, there will be effects to the
dynamics. Things will be growing differently.
Don’t think forest versus trees. Think how we can actually do experiments that
allow us to understand the underlying dynamics.
Here are a few ways I try to explain this. I hope we will get to it. I have three examples
for this. The first one is we talk about a model versus a description of the trends. A
model tends to be something where you can plug something in. For instance, a
linear operator is a model. Some input is going in and some stuff is coming out. A
model for me, is a predictive model. I can make predictions. I can compare these
predictions to what actually happens in reality.
This is in contrast to the descriptive model where I just try to summarize things,
talk about trends, but we don’t know whether there is any predictive power in that.
Deep structure, for me, means something where I can make predictions. In the equation
of the business, if I change this parameter, if I, in the viral loop, change this, I make
a prediction that we get 7% more. If we get 7% more, we know we have a damn
good model, which has some understanding about the underlying dynamics.
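The difference between a predictive model and a description can be sketched with a straight-line fit: fit on past observations, predict held-out ones, then compare the predictions with what actually happened. The data below are made up purely for illustration (invites sent per user against signups per user), and all function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Plug an input into the model and get a prediction out."""
    slope, intercept = model
    return slope * x + intercept


def mean_abs_error(model: Tuple[float, float], xs: List[float], ys: List[float]) -> float:
    """Average gap between the predictions and what actually happened."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Made-up observations: fit on the first four, hold out the last two.
    invites = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    signups = [0.9, 2.1, 2.9, 4.2, 5.0, 6.1]
    model = fit_line(invites[:4], signups[:4])
    print("held-out error:", mean_abs_error(model, invites[4:], signups[4:]))
```

A summary of the trend stops after `fit_line`; the model in the lecture's sense is only tested once its predictions are compared against data it has not seen.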
The second way I talk about this is that I’m interested in the axes of the space. I’m
interested in the underlying dimensions, much more so than only being interested
in the instances and the points in this space. When you work with clients, they often
come and ask, “What software package should I buy?” That’s of no interest to us here in
an academic setting.
We want to understand what the dimensions of this space are. What are the
characteristics of these packages? Why? Because statistics is a science that
deals with noise. Statistics is a science that deals with generalization. The
assumption we have everywhere in statistics is that if I move a little bit in input space,
the output is only moving finitely.
0:31:29.8
If you want to be fancy you can speak Latin and say [0:31:32.6 unclear], which means
nature doesn’t jump around. Germans always have Latin sayings for stuff.
It’s very different from people who just know that here are packages, and here are
instances, but they don’t know whether they’re close by or whether they’re miles
apart and there is no generalization from one to the other.
The third way I like to talk about this deep structure versus surface structure is
that I like to talk about tools versus art. We are more or less engineers here, so we
like to build stuff that people can use. The output of course, depends on what we put
in. That’s very different from people who create art. To create art, you make that
movie and when you are done with the movie you are done; you release it, and that’s it.
It is up to people how they want to interpret it.
Some visualization is that way, but that’s not how we really learn stuff. That’s the art
aspect and it can be very beautiful. But, it is not how to make progress in data mining
and understanding what the data is.
We are doing deep structure here, which means we try to build models that make
predictions. We try to look at the axes of the space and we try to build tools. That
doesn’t mean that there is something wrong with art. Art says things in other ways. For
instance, there are great movies we love that tell us things. I want to make the
distinction between the surface structure, which something fixed like a movie has, and the
underlying structure, which we are interested in, here in class.
Are there any questions?
Student:
…
Andreas:
6 to 10 is a number pulled out of a hat. I figured there are roughly ten groups. There are
five groups at Berkeley, so it’s still a manageable thing. It makes it clear to you that I’m
not asking for the one most important metric. In other cases, I will ask what the one most
important reason for things is. It is not that I always believe in 6 to 10. It’s a reasonable,
manageable size. There are 3 to 5 people in each group so if each of you have two
metrics you are really passionate about, you can each have them without having to fight
over them.
0:33:50.3
Metrics fall in certain groups. There are the traditional metrics that come from the
olden days when people actually looked at newspapers, unique users, how many
people read my paper. Then, we have metrics that concern the individual. Those
are engagement metrics, how often does a person come back. Customer lifetime
value is one of the things that falls in here. The new thing for the last couple of
years is social metrics which are metrics between people. These could either be
implicit metrics like I forward something to Rohit. I see what he is doing with it. I
post on somebody’s wall; these are all things between people. Of course, explicit
feedback where the standard example is from Amazon; I found this really useful, or
this answered my questions, or this was helpful. You give feedback between
things. It’s either about the site, or it is about people/individuals, or it’s about the
relationship between individuals. It’s the same as we had when we talked about
the social data revolution; it was sniffing the digital exhaust, which scientists can
do by themselves. It’s people revealing things about themselves, as a first stage,
and then as a second stage, people revealing stuff about their relationships with
others. Qualitatively, computationally, they are very different.
How do I analyze them? We look at the properties of individual metrics. What is
expected; what is good; what is bad? If we are also looking at the set of metrics,
we see what are the tradeoffs between the metrics? If this one goes up, do we
expect that one to go up, as well? Or if this one goes up, do we expect that one to go
down? That’s how I want you to think about metrics, in that space.
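One way to check whether two metrics move together or trade off is to correlate them over the same period. The following is a minimal sketch, assuming hypothetical daily series; the metric names and numbers are invented for illustration and do not come from the lecture.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)

# Hypothetical daily values, invented for illustration only.
invites_sent = [100, 120, 150, 170, 200]
new_users = [30, 35, 46, 50, 61]
return_visits = [80, 76, 70, 66, 60]

print(pearson(invites_sent, new_users))      # positive: they rise together
print(pearson(invites_sent, return_visits))  # negative: a tradeoff
```

A correlation near +1 suggests the two metrics go up together; a value near -1 suggests that pushing one up costs you the other.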
Finally, to visualize them, an important thing is to look at the distribution. I gave
you some examples from last time. It’s not one number, like the average page views,
which we are interested in. It is the distribution across all of the sample. The
second one is you want to look at how things change over time.
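To illustrate why a single average can mislead, here is a small sketch that reports the mean next to a few points of the distribution. The page-view numbers are hypothetical; only the Python standard library is used.

```python
from statistics import mean, median, quantiles

# Hypothetical page views per user; one heavy user dominates the average.
page_views = [1, 1, 2, 2, 3, 3, 4, 5, 8, 250]

def summarize(values):
    """Return the mean next to a few points of the distribution."""
    q1, _q2, q3 = quantiles(values, n=4)  # quartile cut points
    return {"mean": mean(values), "median": median(values),
            "q1": q1, "q3": q3, "max": max(values)}

summary = summarize(page_views)
print(summary)
```

In this made-up sample the mean lies above the third quartile, so it describes almost nobody in the sample.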
You can also look at how robust they are by removing a certain number of data points
that go into the metric. Does it have a big effect? If so, you are very sensitive towards
outliers. If it does not have a big effect, then it’s probably more robust. You want to go
for stuff that is robust. For example, I gave you the slope of the [0:36:15.7 log-log]
plot, which is the exponent in a power law.
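The robustness check described here can be sketched as follows: fit the slope of log(y) against log(x), then drop each data point in turn and refit. The data below are synthetic and follow an exact power law chosen for illustration; with real data the spread would be larger.

```python
from math import log

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic data following y = 1000 * x**-2, invented for illustration.
ranks = [1, 2, 4, 8, 16, 32]
counts = [1000 * r ** -2 for r in ranks]

full = loglog_slope(ranks, counts)
# Robustness check: drop each point in turn and refit.
refits = [loglog_slope(ranks[:i] + ranks[i + 1:], counts[:i] + counts[i + 1:])
          for i in range(len(ranks))]
spread = max(refits) - min(refits)
print(full, spread)  # a small spread means no single point drives the slope
```

If removing one point changes the fitted exponent a lot, the metric is sensitive to outliers in the sense discussed above.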
After this introduction, it is a great pleasure for me to introduce my friend Linus Young.
Linus took the class three years ago and I won’t embarrass you by showing you the
pictures we took at the party. You can find them if you just look for “ling,” which is
basically his first initial and last name. He’s so modest, he didn’t even put his name on
the presentation, on Flickr. I’m super happy you’re here.
Linus:
Thank you. I’m very happy to be here, too. It’s been a while. This class is a lot of fun. I
just miss class, in general. Like Andreas said, I took his class three years ago. I just
recently graduated. Since then, what I’ve been doing is I started a Facebook company,
right after class, which I eventually sold. Then, I did an iPhone app company, which I
eventually left. Now, I do something completely different, which I’ll talk about a little bit
later.
0:37:17.9
Today, Andreas asked me to talk a little bit about developing on the Facebook platform,
and the iPhone platform, and to give some of my thoughts, my insights, and my
experiences about how I did this. Feel free to interrupt me at any time. This is more of a
discussion than me giving a formal talk. This is the first time I have given it, so it may not
be as polished. Andreas, please feel free to interrupt me with any questions you might
have.
Just to give you some relevant background about how I got started, the Facebook
platform opened up in the summer of 2007. I was just doing nothing during that summer,
between summer school and things like that. I had a lot of time. My buddy and I went to
a bar one night and were talking about this. It was very interesting. He said, “What can
we do with this?”
For the first time, we have access to two hundred million users, their data, their
social connections and things like that. We came up with this brilliant idea, after a
few beers; what if you could hook up virtually with someone on Facebook? That’s
what we found, on Facebook when we go there we look at a lot of pictures, and things
like that.
We had this brilliant idea; what if you could pick a friend, virtually mate with them,
and then create a baby and take care of that baby? It was a cute little game. We
coded it up that summer, and it did very well. It got a lot of users. We got a lot of
data, and we got really interested in this. We said, “Here’s a market that we can go
after.”
Back then, not a lot of people knew whether this was something just for fun or whether
you could build a business out of it. We wanted to build a business. We thought there
was market potential.
Framing it in Andreas’ PHAME model, what was the problem? Our problem was
how do you get a lot of users. My partner and I knew that this was not something we
wanted to be in, for a long time. Basically, we wanted to make a lot of money and
capitalize on this hot market. To do that, we needed to get a lot of people, very quickly.
That was one problem.
The second problem was how do you scale this? We saw in the summer, that even
with a hundred thousand users or so, our server started getting hiccups, started crashing,
and we had little problems. If we wanted to get to the next order of magnitude, to a
million users, how were we going to scale that? That’s a big problem.
Finally, related to the first problem, how do you get an application compelling
enough to get this huge amount of growth? Those were our problems.
0:39:34.5
When the summer ended, I actually took the Facebook class here, and I met one of the
most brilliant guys I’ve ever met. He is a physicist. He brought the virality model from
physics; it’s a very simple model you have in biology… he brought it from physics and
basically put it in the context of Facebook. He created this “viral loop”.
These are some of the metrics of it. It’s kind of hard to see, but basically, it’s a formula
between your invitation rate, acceptance rate, and conversion rate. You just multiply
it together and if you get a number over 1, you’re viral. I can talk more about this later.
That kind of solved our first problem.
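The multiplication described here can be written as a short sketch. The meanings attached to the three rates are assumptions, since the talk does not define them at this point, and the example numbers are hypothetical.

```python
def viral_coefficient(invitation_rate, acceptance_rate, conversion_rate):
    """Product of the three rates named in the talk.

    Assumed meanings (not spelled out in the talk at this point):
    invitation_rate  - invites sent per new user
    acceptance_rate  - fraction of invites that are accepted
    conversion_rate  - fraction of accepters who become active users
    """
    return invitation_rate * acceptance_rate * conversion_rate

def is_viral(k):
    """Each user brings in more than one new user when k > 1."""
    return k > 1

# Hypothetical numbers for illustration only.
k = viral_coefficient(invitation_rate=10, acceptance_rate=0.3, conversion_rate=0.5)
print(k, is_viral(k))
```

A product above 1 means every new user brings in, on average, more than one further user, so each wave of users is larger than the one before.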
He proved the model. He basically said if you follow this, you can grow very
quickly. It came down to basically forcing people to invite their friends. That kind
of solved the first problem because now we knew we could grow.
The second problem was how do you scale this? Luckily, I had a roommate who was
doing a PhD thesis on scaling. He was at Berkeley at the time. We brought him on
board and with our skills together, we felt we could properly scale this.
The last problem was harder to solve. What do you make that’s compelling enough to
grow quickly? That brings me to my next slide. We had several hypotheses.
We knew from what was hot at the time, and what was really growing, was that
people wanted to know something about themselves. More importantly, they also
wanted to know what their friends thought about them. We wanted to figure out an
application that was compelling enough that they would “annoy their friends” by forcing
the invites that would grow our apps.
Basically, we threw a lot of ideas at the wall. We threw about twenty or thirty ideas and
we saw what stuck. These are some of the applications that stuck. One was, “What
does my birthday mean,” so you enter in your birthday and you figure out what the
meaning of it was by inviting friends, popular friends, second friends. The one that really
did well was “You’re a hotty,” which was basically how hot your friends are.
Andreas:
One important element here is that it’s not that people try very hard to figure this
out by arguing from first principles. With technology being so easy, you just throw
it out and see what works. That’s a very different way of thinking about this
compared to developing a nuclear power plant or something.
Linus:
That’s a great thing about software. You can try a lot of stuff. You change your logo,
change some text, and just start throwing the thing out there. Certain things stuck really
well. I’m going to dive deeper into this one, “You’re a hotty,” and explain what it was.
0:42:08.1
It was very simple. It’s two pages. You get the invite and it says, “Your friends think
you’re hot. See how hot you are.” You click through it and you get presented the first
page on the left, which is basically here is your top ten friends. See where you rank by
inviting all these people. You force them to invite.
Once they invite twenty or so friends, they get that second page which allows them to
readjust those rankings. It’s very simple. We coded the thing in about six hours. You
can see how it is.
Some of the actions we took, and some of the things we varied was what happens when
you rate someone hotter, do you send emails to all their friends, do you send it to one
friend, when do you send emails to them, when do you send applications. All these
different things we kind of varied. We thought of ways to optimize. We did manage to
optimize.
We had exponential growth. That was over the course of about a month when we were
exponential. We were getting about three hundred thousand daily actives. All this
growth was from new users. We eventually hit a peak where we burned through
the first field of Facebook. We’re just getting residual users in that viral loop, and
eventually we trailed off. I’ll go into that later.
It was pretty amazing. We were getting three hundred thousand new users a day, and
eventually burned through all of Facebook.
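The shape described here, exponential growth followed by a peak and a trail-off, can be reproduced with a very simple saturating model. This is only a sketch under stated assumptions: a fixed pool of reachable users, a constant number of recruits per new user, and recruitment scaled by the share of the pool not yet reached. None of these parameters come from the talk.

```python
def simulate_growth(population, seed_users, k, generations):
    """New users per generation when each new user recruits k others,
    scaled by the share of the population not yet reached."""
    total = seed_users
    new = seed_users
    history = [new]
    for _ in range(generations):
        remaining_share = (population - total) / population
        # Never recruit more people than are still left in the pool.
        new = min(new * k * remaining_share, population - total)
        total += new
        history.append(new)
    return history

# Hypothetical parameters, chosen only to show the shape of the curve.
history = simulate_growth(population=1_000_000, seed_users=100, k=2.0, generations=30)
peak_generation = history.index(max(history))
print(peak_generation, round(sum(history)))
```

Early generations roughly double, and once most of the pool has been reached the number of new users per generation falls toward zero.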
Andreas:
Let me ask you a question. What were the hypotheses that led you to come up
with hotties? Was it that you saw James [0:43:42.8 Hong’s] “Hot or Not,” was it you
figured out people love to know about themselves, was it that people don’t know what
they want, but they are good at comparing stuff? What led you to this one and what led
you to the other ones, in terms of the hypotheses creation?
Linus:
Our hypotheses were that most of the people on Facebook are younger users, or
new college students. That’s the first time they actually opened it up to all the
colleges. We wanted something very colloquial, like very fun. We wanted
something that would attract their ego. All those things were kind of related
around that. It’s also because that was what the top apps were doing at the time.
They all revolved around that.
Andreas:
One of the questions that we always discuss was what of this is actually people
2.0, where are people different from the way they used to be and what is just the
same old same old; people have certain things. If you could comment on that a
little bit.
0:44:46.6
Linus:
I think people are always the same. People are always concerned about
themselves and what other people think about them. I think we see that in Facebook
and eventually, in the iPhone…
Student:
…
Linus:
We actually knew you could make money off this model because it was proven by the
other people in the Facebook cloud. We knew that advertisers were paying very high
CPMs and it was a very easy thing to do. Does that answer your question?
Student:
…
Linus:
I’ll get back a little later on in the slides about how we have to re-engage them. Yes, to
answer your first question, it was a couple of thousand dollars a day, at its peak. It was a
pretty decent amount of money.
Student:
Sort of going off that, it seems like you’re optimizing a lot for getting a lot of new users…
future engagement… brand recognition…
Linus:
We’ll get to that in the metrics section. Basically, we did. Actually, that’s a good tie into
the next set. These are some of the metrics we came up with. I’m not sure why it’s
not showing very well. We spent six hours putting the application together, but we
spent about six weeks trying to collect these metrics. It was a pretty hard thing to
do. We coded it in Ruby, which luckily has a lot of packages to allow you to collect these
statistics, but it was very hard to integrate it, especially on a platform. It’s much easier on
a website, but when you have this middleman, Facebook, it’s very hard.
We had to go and figure out new ways to figure out these metrics. We collected a
ton of metrics. We would do this all asynchronously. Every night, we compiled
gigabytes of data, and every morning we would have reports on these types of
things.
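A nightly batch job of this kind can be sketched as an aggregation over an event log. The team worked in Ruby; the sketch below uses Python purely for illustration, and the record layout, user IDs, and action names are hypothetical.

```python
from collections import defaultdict

def daily_active_users(events):
    """Count distinct users per day from (day, user_id, action) records."""
    users_by_day = defaultdict(set)
    for day, user_id, _action in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}

# Hypothetical event log; field names and values are made up.
events = [
    ("2008-01-01", "u1", "invite_sent"),
    ("2008-01-01", "u2", "page_view"),
    ("2008-01-01", "u1", "page_view"),
    ("2008-01-02", "u3", "invite_accepted"),
]
print(daily_active_users(events))
```

Running the aggregation offline keeps it out of the request path, which matches the idea of compiling the data overnight and reading reports in the morning.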
Even though we collected all this, we didn’t look at all these every day but only when we
had a certain problem we would come back and look at these. Feel free to ask me
questions on any of these. It was a lot of metrics and a very grueling process to actually
code all this stuff to record it.
Then we ran experiments. We did a lot of A/B testing. Going back to the other
question. We burned through all our users. We may have annoyed a lot of people, so
our numbers started trailing down. When we got bought out, one of the criteria was to
bring that user base back. We had twenty million users at the time; how do you re-engage them?
We did a lot of A/B testing. We did a lot of changing layouts, laying buttons and
things like that. We basically built an engine, called a sticky engine. We wanted to
see how engaging, how sticky the app was by changing all these various metrics.
0:48:15.9
What we did was we had different branches. We did A/B testing, tagged each user,
like you said, 2% of the users or 5% of the users with the new features to see how
it performed. We compiled these stats to give us graphs. Unfortunately, I don’t have
any more data so the only screenshot was the original design of the engine.
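Tagging a small percentage of users for a new feature is often done with a deterministic hash, so the same user always lands in the same branch. The sketch below shows that general technique; it is not the team's engine, and the experiment name and user IDs are hypothetical.

```python
import hashlib

def in_treatment(user_id, experiment, percent):
    """Deterministically tag roughly `percent`% of users for an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent

# Hypothetical user IDs and experiment name.
users = [f"user{i}" for i in range(10_000)]
tagged = [u for u in users if in_treatment(u, "new_layout", 5)]
print(len(tagged) / len(users))  # close to 0.05
```

Including the experiment name in the hash means different experiments tag different, roughly independent slices of the user base.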
We optimized it. We optimized the hell out of it and then our conclusion was
actually that the best way to get new users was actually to do one-to-many
notifications, or one-to-many actions.
If I do an action, I inform many friends of mine. We built our whole application
around doing this type of stuff. We eventually changed “You’re a Hotty” to something
where it’s like buy and sell your friends. This was also very engaging at the time. One of
the actions we had was you would buy all these friends and a one-to-many action, for
example, would be you could pet them. Their value increases. You email all their friends,
notify all their friends that they’ve been petted and then you get re-engagement coming
back.
We did a lot of stuff like this. It is kind of interesting. You kind of have to do this dance
with Facebook where you do something that is kind of evil but not too evil and then they
push back and you push back and you eventually get to this equilibrium state where
everyone is sort of happy.
That’s how we would re-engage our numbers. Actually, we did better at re-engaging
users than we did in our viral growth. That was fun. Eventually, this got boring. We
came with the expectation of making a lot of money. We kind of met that; we got bought
out. Now, we’re in this company and it’s kind of boring because we have more
developers but it’s very slow. You don’t have the freedom to do what you really want to
do.
I was able to renegotiate my contract, get out of it and I moved onto something else.
Andreas:
Before we move to the iPhone app, do you have any questions about what Linus did?
Student:
Did you look at the number of …or uninstalls? Do you think it makes sense to set either
… if I’m over this…
Linus:
That’s very interesting that you say that. In terms of uninstalls, it doesn’t happen very
often, actually. In the old profile, it did. It would always be under 5% or something. It
was very low. I don’t know if people just don’t know how to uninstall or they just didn’t
bother to. We didn’t really concern ourselves that much.
0:50:57.0
In terms of annoying users, that was a very interesting question. To tell you the truth; I
didn’t really care about the users. I would never spam my friends. These networks
would be bigger in the European countries and different countries. I didn’t know any of
them and they didn’t personally affect me so I really didn’t care.
To play in this game, if you want to be in this game, you have to annoy people.
Even the top companies, like Playfish, which is one of the biggest app companies right now;
they annoy the hell out of the users. They do it subtly but they do it all the time. The
second company was the one that bought us out, actually, they do it all the time too. This
is the price of playing the game, you alienate your friends.
Student:
Do you think that devalues the platform for the brand….
Linus:
I think they do better at it now. At the time, it did significantly hurt their brand. They were
considered spam-type companies, basically.
Student:
I was wondering…
Linus:
It’s funny, I don’t know if this completely answers your question but a lot of the
A/B testing we would get would be marginal differences, like .1% or .2%. It was
very insignificant. We would never get orders of magnitude difference on any text
changes. They always say that all these .1% add up and eventually you get to
something that really helps, but it didn’t really do much. At the end, we just kind of
did away with text changes, button placement, and things like that. We kind of
went with our intuition and figured out what it was. Then, we cracked … if you
send one-to-many notifications, you’ll grow, very easily actually.
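One common way to check whether a difference of a few tenths of a percent is distinguishable from noise is a two-proportion z-test. The sketch below is a standard textbook test, not something the talk says the team used, and the counts are hypothetical.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 10.0% vs 10.2% conversion, 5,000 users per branch.
z, p = two_proportion_z_test(500, 5000, 510, 5000)
print(z, p)
```

With these made-up counts the p-value is large, so a 0.2 percentage point gap on 5,000 users per branch cannot be told apart from chance; the same gap on a much larger sample would be detectable.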
Student:
Did you try any method of rewarding those people who were spreading the word?
Linus:
Sort of like an incentive program, right? Towards the end we did that. At the time,
when we were doing the applications, we didn’t really need to. You have to do a lot
of statistics. You have to figure out who are the most engaging users. We had
those statistics, but then you have to go out and engage them or create some kind
of incentive program to do that. We didn’t really feel we needed to do that because we
had something that worked at the time. We were meeting our goals so we didn’t need to
work harder, basically.
Student:
Do you feel that people are going to get sufficiently annoyed with this stuff sometime in
the future… or do you think it’s just going to go on… and take more and more of it?
Linus:
I thought that we reached the tipping point, at some point. Facebook did crack down.
They said, “This is getting out of control,” first with the request system. They completely
closed that down. Then, with the notification system, they closed that down. They put
more metrics, so the app developers did basically push that point. They annoyed the hell
out of the users. Now, I think it’s in this equilibrium state. I think app growth is not in the
U.S., primarily, but in different countries that are growing. They’re very popular there.
0:54:24.1
I’m not really sure. I think I’m still out on that one. In the U.S., I think it’s already reached
that point.
Andreas:
One distinction we can make here is between deep structure and surface structure.
The surface structure is some developer changing a parameter on Facebook and
allowing you to only send out N as opposed to M invitations. This can make or
break a company. You know that it’s sort of some developer who makes these
distinctions. By contrast, it’s the deep structure of getting the incentives aligned.
For instance, Google’s incentives in search are very well aligned with our
incentives to search. Google tries to show us stuff that is most relevant to us.
That way, we’re more likely to click.
I think if we just try to reverse engineer, what a company like Facebook is doing, it
is not at all as interesting as if you tried to build a two-sided market, where there is
price discovery, where we set it up so that whatever happens there is some fair
solution where the equilibrium is being found as opposed to the equilibrium being
set by somebody tuning something up or tuning something down.
Linus:
That’s a much better model, I agree. Are there any other questions?
Student:
…
Linus:
It was basically daily actives. It was when we were viral. All our apps were viral so we
sold at the peak. You guys are very interested in the acquisition. I should probably move
on and take more questions later.
Then, I left the company and basically started another company to do iPhone
applications. We would try to take what we learned from Facebook and try to mimic that
on the iPhone. It’s very interesting when you have a mobile phone, the games
change a little bit. You don’t really have that social aspect anymore. You have
more location-based aspects. Based on that, we tried to tailor an app to do that.
We tried to make better apps. Our first app, to the far left, was “What’s Hot”. We like to
use the word hot a lot; it seemed to work well on Facebook. You basically take a picture
of where you are. If you are on the street and see something interesting, you take a
picture, upload it to our server, it shows whoever has that app. It did fairly well. This was
the first app we created. It wasn’t the best. It did okay but not as well as we thought.
Mainly, the problem was it was our first app, but people started using it for porn. They
started taking really explicit pictures, they were very hard to manage, so we shut it down.
0:57:11.5
Based on that, we moved on to the next one. A lot of people would take pictures of
themselves and want to know how hot they were. We created a “Hot or Not” type
application on the iPhone. We built more viral loops into it. We tried to bring that
social aspect, the stuff we learned from Facebook, into the iPhone. We developed
our own invite system. It would go through your contact list and you could pick
who you wanted to SMS. It would send SMSs to come to the app.
That didn’t actually work as well, either. It was very interesting. People don’t want to
send SMSs to other people and they have to send it to people who have iPhones
for one thing, and basically try to push them to use the app, when the reward is just
figuring out how hot they are, there are multiple websites that do this, on the web, on
Facebook or wherever. That app kind of failed.
We were at a problem now. We didn’t know really what was working on the iPhone. We
decided what else is very engaging that will actually make users go out there and
actively go to their friends and pull people onto the app. We came up with this
mafia game.
This is not rocket science. This was actually a game developed about twenty years ago,
on the TI83. It took off on the web. It took off on Facebook. You’ve probably seen
invites for this. Why don’t we just port this to the iPhone?
I don’t know how many people have played this game, but it’s literally a text-based game.
It’s very simple, but the whole game revolved around you bringing in more people to the
game, to become the biggest mobster. This turned out to be wildly successful because
people would just go out there and pull people into the game.
The app was number two on the App Store, for about a month. Our users grew. That
was that. Around this point, I started feeling I wanted to do something different. I did
annoy a lot of users. I did do a lot of things; coming to Stanford, getting a good education
in engineering, I could probably spend my engineering skills doing something better.
This is fun, it was fun to code this, it was fun to scale it, it was fun to do all these different
things, but it wasn’t very rewarding. It wasn’t something that was changing the world very
much, and frankly, it was taking other people’s stuff and being first to market, capitalizing,
and taking advantage of the situation. It was easy to do but it wasn’t very challenging.
Right now, I’ve started a non-profit that makes low-cost infant incubators for developing
countries. It’s called “Embrace,” so check it out on the web, if you are interested more in
this. I’ll take some questions and I can talk about anything. I can compare Facebook
with the iPhone, talk about where I think the future is going, talk about Embrace.
Student:
…
1:00:29.8
Linus:
This is completely different. Everything I’ve done was electrons and bits and things like
that. This is a physical product. I think some of the key skills I can bring in is just in
terms of how you run a startup, the processes you go through and what you need to do
and figuring out what is most important at a certain time. As far as general engineering
skills, I also bring those to the table. There is actually a lot of physics and engineering
that goes into this product, like how you heat – it’s basically a pouch in a sleeping bag,
and how you heat that pouch. We actually had to write code, very special code, to make
it so it doesn’t overheat, so it’s very safe. I was actually coding two weeks ago, non-stop,
to make this heater actually work. Finally, I did my focus on design, so there is a lot of
user-centric design and a lot of that type of stuff.
Student:
... back to the… on the iPhone. Did you stop because you weren’t …
Linus:
We were getting problems with Apple, but it wasn’t what we intended the thing to be. It
wasn’t doing anything. It wasn’t making money. It was giving us a lot of liability. The risk
to reward was not there to continue doing it.
Student:
…
Linus:
It’s crazy, they would go on forums, they would post everywhere. There really wasn’t a
viral loop. They would actually be very proactive and go out there and get all their friends
with iPhones to join their mob. They would have Twitter groups and things like that. It is
a very addicting game.
Student:
… pricing
Linus:
Pricing on the App Store is very interesting too. The Mafia game itself is like
Scrabble. You can’t really copyright a game. You can copyright the trademark, the
images, the text you use, and things like that. The game itself, you can’t really
copyright. We just did our own version, new text, new graphics. The game itself is the
same but everything else is different.
What we found works is to actually make it free for a little bit of time, getting a
buzz, getting it to generate on the App Store, in the free section. Then you have a
time it is free and then you start charging for it. You generate an initial buzz, initial
user base where people are going out and grabbing people, and then you slowly
charge more and more money.
Student:
Do you think that’s better than the mode where you have …
Linus:
We tried that model and it works okay but it didn’t generate as much revenue as doing
something like that, for that particular application.
Student:
The concept for the iPhone application, where you try to get users to draw them in and
get them to invite their friends…
1:03:54.8
Linus:
You’re completely right, the business models are different. More specifically, you’re just
saying how did we go about creating a business model?
Student:
How did it affect your business process…
Linus:
I always believed that when you go in you have to figure out what your expectations are.
Our expectations were always to just make money and exit quickly. For the Facebook
one, it was basically do ads. That worked really well and when the CPMs dropped, we
tried to find the exit.
For the iPhone business model, we tried many different things. We made a free app to
see how ads did. It constantly evolved and eventually turned into this. We started where
basically all the revenues were generated by ads. That actually did pretty poorly. It’s
very hard to get money through ads on the iPhone. I guess there are not enough users
and the CPMs are too low.
Then, we tried a free version and then a paid version. That worked fairly well, but I’m not
really sure why it didn’t do as well. You initially get this user base and then – it’s kind of
like a bait and switch. You get them and get them to pull in other people, and by then,
you charge money. I still think the business model on the iPhone App Store is not
settled, yet. It’s also very tailored to the specific type of application.
Student:
How did the user response change when you changed your application from the free
mode to paid mode?
Linus:
When you go from free to paid, I don’t think the users are actually that upset. There were
a lot of users, so I tended not to read any of the emails that they sent to us. That’s
another thing, when you get to twenty million users, by the way, you get hundreds of
emails every morning. I just stopped reading emails.
I look at the empirical evidence. If the revenues don’t go down, that’s fine.
Student:
Can you talk about the viral…
Linus:
Sure, on which one? On Facebook, right? There are three components. One is the
invite rate. How many people, when they come to your app, do you require them to
invite? If it is fifteen or whatever, you get that number multiplied by the number of invites
they actually send out. In this case, we force them to send out the full fifteen, multiplied
by the conversion rate. Once the other person gets the invite, how many people
actually accept that? We found the sweet spot was about fifteen. If you force
someone to invite fifteen people, about half of them would drop out and won’t do it. If the
people that stay do send out the invites, and the conversion rate is
fairly high, you will actually get a number greater than one. It’s very simple.
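The arithmetic Linus describes can be sketched as a product of three factors. This is a minimal illustration, not the team's actual code: the function names are hypothetical, and the 20% conversion rate is an assumed value, since the talk only says it was "fairly high". The fifteen required invites and the roughly 50% drop-out come from the talk.

```python
def viral_coefficient(invites_required, send_rate, conversion_rate):
    """Expected number of new users brought in by each arriving user.

    invites_required: invites a user is forced to send (15 in the talk)
    send_rate:        fraction of users who actually send them (about 0.5 in the talk)
    conversion_rate:  fraction of invites that are accepted (assumed value here)
    """
    return invites_required * send_rate * conversion_rate


def project_users(seed_users, k, generations):
    """Size of each successive wave of new users when every user recruits k more."""
    sizes = [float(seed_users)]
    for _ in range(generations):
        sizes.append(sizes[-1] * k)
    return sizes


if __name__ == "__main__":
    # 0.2 is an assumed conversion rate, not a figure from the talk.
    k = viral_coefficient(invites_required=15, send_rate=0.5, conversion_rate=0.2)
    print("viral coefficient:", k)
    print("waves of new users:", project_users(1000, k, 3))
```

With fifteen invites and half of the users sending them, each arriving user sends 7.5 invites on average, so under these assumptions the coefficient exceeds one whenever more than about 13% of invites are accepted.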
1:07:03.8
Andreas:
It’s actually very intuitive. The viral loop might be scary, but it really shouldn’t be.
It is the interaction between people being asked to do stuff. I gave you the example
last time with the $10 and if you think it’s fair you keep it and if you think it’s not fair, you
have to return it. It’s the same thing here.
If people think it’s fair to invite 500 people, they will do it. If they don’t think it’s fair, then
you’re [1:07:31.6 unclear]
You are really trading off things here which are people constants. Ultimately, you
are doing experiments about what people think is fair. What is the right price to
pay? How much do I annoy my friends?
Do you think that a model where people would have, in the back of their minds,
some cost, some price they have to pay in the friendship; if I hit up Linus five
times, he’s not going to talk to me anymore, versus if I hit up Enrique two times,
he’s not going to talk to me anymore. Do you think people have models like this?
Should we try to model the cost of annoyance, the cost of interrupt? You
mentioned the cost of unsubscribe, which is super high. If somebody says, “I
don’t want this anymore,” one action to take is never to invite that person again.
The ones that unsubscribed are probably not in the market anymore.
What are your thoughts about having it in a more decision-theoretical framework?
Linus:
We actually thought about that. I would love to know how many times I can annoy
particular friends. I don’t think people think that way. They can kind of get a sense.
They can get feedback from their friends when they get annoyed.
I used to pet a lot of my friends. They would say, “Stop petting me,” and things
like that. I would stop petting them. That’s one way to get feedback. We actually
tried stuff where with the metrics, we found the users who would accept requests
more often and we tried to tailor it so that when other people came to the site and
invited those people, they would pop up to the top of the list. People who are more
susceptible to trying new applications. That’s one thing.
I rarely invited my friends. I always had fake accounts. It’s hard for me to say because I
really didn’t care about the users. I don’t know if that’s a good or bad thing.
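The idea of moving the friends most likely to accept to the top of the invite list can be sketched as a simple sort. This is only an illustration under stated assumptions: the `FriendStats` record, the function names, and the smoothing constants are hypothetical, and the talk does not say how acceptance was actually scored.

```python
from dataclasses import dataclass


@dataclass
class FriendStats:
    name: str
    invites_received: int   # app requests this friend has been sent
    invites_accepted: int   # how many of those the friend accepted


def acceptance_score(friend, prior_accepts=1.0, prior_invites=2.0):
    """Smoothed acceptance rate.

    The prior (an assumption, not from the talk) keeps friends with little
    or no history near 0.5 instead of ranking them on noise.
    """
    return (friend.invites_accepted + prior_accepts) / (
        friend.invites_received + prior_invites
    )


def rank_invite_list(friends):
    """Return friends ordered from most to least likely to accept a request."""
    return sorted(friends, key=acceptance_score, reverse=True)


if __name__ == "__main__":
    friends = [
        FriendStats("rarely_accepts", 10, 1),
        FriendStats("no_history", 0, 0),
        FriendStats("often_accepts", 10, 8),
    ]
    for f in rank_invite_list(friends):
        print(f.name, round(acceptance_score(f), 3))
```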
1:07:03.8
Student:
Dr. Weigend, you said about the fairness model, right?...
Andreas:
I just gave you the example last class, where I give you $10. It’s culturally dependent
that if Matt gives Blake $.01, what is Blake going to say versus if Matt is going to give
Blake $3, and Blake says, “Okay, that sounds fair.” People have some intrinsic notions
about fairness. What we are probing now is whether the costs they have of bugging their
friends seems worth it for them.
What Linus is saying is people don’t care about their friends.
Linus:
Oh no, some people do.
Andreas:
Should we have a model, for instance, where we figure out, based on past
behavior, how much that person can bring us? What about the second order
model? If I only invite five friends, but they are really good friends and will do
whatever I ask them to do, versus Matt invites five hundred friends and they don’t
even click; did you take that into account?
Linus:
Yes, I probably misunderstood the question. I do care about annoying my friends. That’s
why I created other accounts, to spread the word about my other app. We wanted to
find out users, and we got it from our statistics about who invites the most people,
who are most susceptible, who are good users, who liked to invite friends and
those friends actually invite other people too? It takes a long time, but you can
actually compile these statistics. That was helpful.
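The statistics Linus mentions, who invites the most people and whose friends go on to invite others, can be compiled from a log of accepted invites. The sketch below assumes a hypothetical log of `(inviter, invitee)` pairs; the talk does not describe the real data format or schema.

```python
from collections import defaultdict


def downstream_counts(accepted_invites):
    """Count direct and second-order recruits for every inviter.

    accepted_invites: iterable of (inviter, invitee) pairs, one per accepted
    invite (a hypothetical log format).
    Returns {inviter: (direct_recruits, second_order_recruits)}.
    """
    recruits = defaultdict(list)
    for inviter, invitee in accepted_invites:
        recruits[inviter].append(invitee)

    stats = {}
    for inviter in list(recruits):
        direct = recruits[inviter]
        # Second order: people recruited by this inviter's own recruits.
        second = sum(len(recruits.get(person, [])) for person in direct)
        stats[inviter] = (len(direct), second)
    return stats


if __name__ == "__main__":
    log = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("c", "f")]
    for user, (direct, second) in sorted(downstream_counts(log).items()):
        print(user, "direct:", direct, "second order:", second)
```

Comparing the two counts separates a user who brings in a few friends who each recruit further from one who brings in many friends who never invite anyone.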
Student:
… before people start using your application…
Andreas:
People don’t know what they’re getting, basically. It’s like you click on a link at Google;
you don’t know what’s behind it.
Linus:
That’s why the text has to be very compelling.
Student:
Any words on future of the next cool app….
Linus:
If I had to do something right now, if I had to do it all over again today, I was wondering
what I would actually do. It depends on what you want to do. Do you want to build a
company, or do you want to make a quick exit? If you want to make a quick exit, I
would say Facebook is still a very good platform to develop on. You can get rapid
users. It’s amazing; you have more than two hundred million users. You have access to
all of them. If you want to make a big splash, just create something very innovative and
creative, or copy something and just do it better than everyone else, out-execute them.
That would be a good way to do it. You can make a lot of money, still. It’s a very good
platform.
1:12:09.9
If you want to build a company, I would say mobile is probably a better idea. With
Android coming out, with iPhone being dominant, I feel the mobile space is finally going
to take off. Before this, I actually did a mobile startup. We built it on the Microsoft mobile
platform, which was not as good. I think it’s kind of dying.
The problem with doing mobile startups is there are so many different operating
systems out there. You have to work with all these different companies and
they’re also very private. They’re not as open. This was three years ago. It’s very
hard to develop on all these different phones, port your app to all these different
phones, to get them to work together.
Now, I think and hope that there is going to be one dominant player, hopefully the iPhone
or the Android. They are more open. Google is very open to opening their APIs and
working with developers. So is Apple, to some extent. Hopefully, there will be two
big players for now, and maybe later there will be one player. Then you can really
explore the space. The phones are getting more powerful. I really believe that’s the
next big market. If I had to do something, I would probably do a startup in the
mobile space.
Andreas:
Linus told me I can ask him five questions. These are the five questions we agreed on
when we prepared here, in the framework that you took the class three years ago and
you can look at it here.
1. The first question is what has not changed and what has remained the same? If you
just focus on what changes, you miss half of it.
2. The second one is how has the world changed and what is different now from what
was three years ago?
3. What will be there in another three years?
4. What would you have done differently?
5. What advice would you give to the class this year?
Are these fair questions?
Linus:
Yes, these are fair questions. I sort of answered some of these. What has not
changed? People have not changed. Just from the Mafia game; that was popular
twenty years ago. It’s popular now, and it’s popular on the iPhone. People tend to
stay the same, their behaviors stay the same. I think that’s a very interesting thing.
1:14:11.4
Andreas:
But let me play Devil’s Advocate here. People talk about continuous partial
attention, that people are constantly moderating what’s happening on their mobile
phones, what’s happening on their short messages, their IMs and so forth. That
certainly was not the case before. The way we deal with communication and
information overload is quite different from
what it was ten years ago.
Linus:
There is much more noise.
Andreas:
For instance, the way people communicate on Facebook is pretty [1:14:44.0
unclear]. You might hit me up on Facebook, on Ning, on Twitter, and I might see it
or I might not see it. By contrast, if you send me a short message, you can be
pretty sure that I will be looking at it.
Linus:
That is what has changed, though.
Andreas:
It’s both. I think it’s important to understand where we changed and where we
didn’t change. I think people changed and people didn’t change. What do you guys
think?
Student:
…
Andreas:
As we always agree, people getting paid and people getting laid. [Laughter]
Linus:
It works pretty well on Facebook, at least. It’s human behavior. The Mafia game is
inherently addicting. I have no idea why. I’m not a game player, but whatever we did, we
mimicked it. It’s addicting.
Student:
…
Linus:
That’s definitely a great point. What has changed is this whole app thing.
Applications are hot right now. Three years ago, there were apps out there, it’s just
that no one had heard about them. I think Amazon even had their API opened up.
People could create little things but it just wasn’t as hot. I think Facebook was the first
player. It wasn’t hot because people didn’t know you could make money off of it. A lot of
people said, “This is just a tool I could leverage for my existing website for this existing
business.” You can actually build a company just by making apps. Once people found
that out, it became hot.
I think people are more open with their information, definitely. You see all these
celebrities now, on Twitter. It’s amazing what they share and how open they are. I would
never have dreamed, three years ago, that it would have taken off so far, like Barack
Obama and things like that, it’s leveraging those technologies and showing everything
you have.
1:17:10.5
Andreas:
Another big difference is that technology has been totally commoditized. Amazon
spent a lot of money building its first round of service. Now, you can use elastic
computing, scaling, you don’t have to worry about putting anything [1:17:29.2 unclear]
anymore. The great thing about this is it has freed up our creativity, of coming up with
ideas, and testing them rapidly, failing rapidly, and then 19 out of 20, one works. The
technology we have really moved up the stack. The primitives are way higher than the
way it used to be.
Linus:
Programming language, like Ruby, it’s an easy programming language. If you
spend a week doing it, if you have some computer science background, you can pick it
up. Back then, PHP was a little easier than Python or whatever, but I really believe Ruby
is a great technology to build on. People can learn it very quickly.
What will it be like in three years? Again, I said mobile will be very big. I think data
will just become more freely shared. Services based around that will do very well,
as well.
What would I have done differently? I don’t know, actually. I have been kind of happy
with what I’ve accomplished. Maybe I wouldn’t have been so spammy, I don’t know. It’s
a very good experience about what really matters. I found out, through this process, that
I do like technology but I don’t like it at the expense of doing things that aren’t very
creative. I think it was a good learning experience. Maybe I would have done something
more creative, but I wouldn’t have gone down the path I am right now. I’m happy with
what I’m doing. I probably would not have done much differently.
What advice would I give? The best advice, and I really think this is important, is if
you want to start a company, really figure out with your partners what the
expectations are. Make sure all your partners are on the same page. Then just
hammer at it.
Student:
…
Linus:
In terms of apps sort of dying in the U.S., I’m not saying they’re dying but people are kind
of less sensitive to it. When the first request came, people were like “What the hell is
this,” and they would go and check it out. Now, they’ve become desensitized to it. Also
Facebook changed their profile so it’s less prominent. I don’t know if their behavior
necessarily changed, it’s just that they got used to it.
Student:
…
1:20:56.9
Linus:
That’s a great question to ask [1:20:59.4 unclear], when he comes. He probably knows
more about it. I’ve been out of this game for a while. Just look at your requests; they’ve
probably gone down. I know that I’ve seen reports that international markets are
growing. Whether they’ll eventually get desensitized to it, I don’t know, I assume they
will.
Student:
…
Linus:
I’m all about giving out data, but I have to check with my partners. If you are really
interested, maybe we should talk afterwards. Ask Andreas for my email. We share data
with the research group. One of my partners was doing his PhD thesis on scaling, so he
has shared data with his old group. There are interesting insights. I think there are a
couple of papers on it.
Andreas:
Okay, given that the [1:22:08.8 unclear] café closes at 4:00 and we need our shot of
caffeine to get through the second half of class, let’s give Linus a very, very big thank
you. Let’s be back here in 15 minutes, which is shortly before 4:00. Thank you.