Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics
Andreas Weigend (www.weigend.com)
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
May 18, 2009
Class 7: Recommend (Part 3 of 3)
This transcript:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_7dinner_2009.05.18.doc
Corresponding audio file:
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_7dinner_2009.05.18.mp3
Previous transcript (Part 2 of 3):
http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_7recommend-2_2009.05.18.doc
To see the whole series (containing folder):
http://weigend.com/files/teaching/stanford/2009/recordings/audio/
Transcript by Tamara Bentzur, http://outsourcetranscriptionservices.com/
Ron:
I’m Ron Chung, and I’m here with Nick Kallen, Yu-Shan Fung, and Nova Spivack,
along with Andreas Weigend, who wants to talk with us all about the social data
revolution.
Andreas:
In class today, we were talking about real time search. That means we have a
time scale which is no longer the time scale of indexing and re-indexing the web
that we are used to from Google. It is a time scale of minutes, of really seeing the
most recent stuff. The question that came up was: where does that matter? Where
does it matter what’s happening in the world right now, as opposed to the
integrated view, which gives a more balanced picture going back into the past? Who
wants to take this one?
Nick:
I would start by asking a more basic question: what is the difference
between real time data and non-real time data? Is this actually a new phenomenon?
That is the first thing; we should have some skepticism. Is it clear that there is this
new type of data and that it needs to be indexed in a different way?
There are real time search engines, like Twitter Search; we claim that Google is not
a real time search engine. What is the difference? Why can’t Google be a real time
search engine right now?
Andreas:
The question of real time and time scales has come up on Wall Street over the
years. It turns out that now the key differentiator is where your box sits in the
colocation facility: for Wall Street trading, the speed of light makes a difference.
Making decisions by front-running somebody else is very different from being the
one who comes in second, after the price has shifted.
Coming back to the question we are talking about: Nova, you gave some good examples in
your presentation of where you think real time actually makes a difference. They are not
uncontroversial, but I would be interested if you could share them with the
audience.
Nova:
I think having real time access, and being able to search the current moment, is really
important when there is some kind of live or ongoing event that you want to
keep track of and participate in when you’re not there. You need to see what’s
happening right when it’s happening, not afterwards, and you want to see the
perspectives of a lot of people rather than read one person’s perspective later on.
That applies whether it’s a movie, and you want to see what people think about it
right when it opens, from many perspectives and not just one review, or it’s
something like Burning Man, and you want to see what’s going on when you’re not there.
0:02:40.6
Andreas:
I would argue that there is a reason I don’t want to see ten thousand tweets about
the plane landing in the Hudson River. The opportunity cost is really high; I
could be doing other things with that time. I’m not happy if I see random stuff, because I believe we can
do better than random. I’m actually surprised, because at Amazon.com, let’s say,
we don’t see whichever item happens to be the most recent one to arrive at the
warehouse. At Google, we don’t see whichever page happens to be the most recently changed.
But when it comes to Twitter, to the live web, to real time search, suddenly
we throw away everything we have learned about the cost of interruption, the cost of
search, and attention costs; we throw it away and succumb to an ordering which is
pretty much random, namely whichever came last. How is it possible that people
give up their hope of having someone, or some group, edit or curate for them,
which actually helps increase the signal-to-noise ratio, and instead they just go
for random stuff?
Nick:
I beg to differ on whether it’s random. We’ve experimented a bit at Twitter with
presenting information in different ways, using traditional relevancy algorithms to present
tweets, and, honestly, we’ve found that presenting information in chronological
order is the most relevant way to present it to the end user.
That’s a function of the type of questions that people are asking of Twitter Search.
When you ask a question of Twitter Search, it tends to be the case that the older the
information is, the less accurate it is. Tweets that are produced contemporaneously, as you’re issuing
the query, are more relevant because they’re temporal in character. They’re
about events that are unfolding. I’m at a conference and I want to know which talks
are interesting. I’m going to see a movie tonight and I want to see where the crowds are,
or I want to get tacos at the Korean Taco Truck: where is it right now? Something crazy is
happening; there is smoke on Geary Street; what’s going on? Really, the accuracy of
the data is as much about the recency of the data as it is about anything else.
Yu-Shan:
I would like to challenge that: is that really the right dichotomy to be thinking about? Does it
have to be chronological versus relevance? If you think about search, the Holy Grail
of search is to return only one result. In that case, the presentation question
of chronological versus relevance doesn’t matter; you want to find the one
thing you want to know at that point. So shouldn’t you really want to
factor in the temporal nature of the tweets, take into account how long ago
something was said, and so forth, and really come up with what is going to be most
relevant at that given moment?
Ron:
Are you talking about in the context of Mr. Tweet?
Yu-Shan:
Not necessarily; we’re talking about real time search. I don’t really see that it has to be
one way or the other.
0:06:19.6
Nick:
I agree that these are not mutually exclusive, although I would take issue with the
comment you just made that the Holy Grail of search is presenting one result. We, as
people who are thinking about these issues, should have mental models of what
exactly the purpose of a search query is. Are people asking questions that they’re
hoping to get a factual answer to, like who killed John F. Kennedy, or when was so-and-so
born? If so, then there is a single document that answers such a question. If
that’s not the class of queries that people are asking, then maybe one single result is not the Holy Grail.
Nova:
Even with factual questions, there could be different answers. You could ask a
question about geopolitical boundaries, and the answer changes over time. In 1960,
a country might have been part of another country, and afterwards it became independent.
Similarly, even in the present, different parties might disagree about how to represent the
world.
I think you need to provide multiple perspectives; even when you think you know
the answer, you should still show alternatives. When we look at the real
time domain, its purpose is really to see what’s going on
in the present. From that perspective, there are many things going on in the
present, and there is no one right answer.
Andreas:
My perspective is that dialog is the mode of interactivity, just as in real life: we
don’t give speeches when we talk with our friends. Those people who do tend not to
fare all that well with their friends. We like to throw something out and see what
comes back.
What I don’t understand is why the dialogical mode is not as well supported
by search engines as it should be. We saw Twine today, and what I really enjoyed there was that
you seem to support it, saying, “Here are a number of categories that are
coming back; which one are you interested in?”
Nova:
Where Twine is heading is increasingly towards helping users filter the web and
track their interests. To do that, we have to understand what they really want. It’s
really hard to ask users to tell us that when they enter their query, because often they
don’t even know.
What we can do, based on what they tell us, is say, “Here are some
other things we think you might be interested in,” to keep narrowing it down. It’s
a kind of dialog interaction: the user gives us a little information, which
enables us to give them some possible next steps, and we keep going like that.
That is missing from a lot of search engines today, and even from websites. I think this
conversational interface is coming back, and certainly, with the rise of real time, the
stream, and services like Twitter, conversation is actually becoming more and
more a part of the web. I think being able to converse with search engines
and other applications will also become part of the user interface as
people get more and more used to it.
0:09:41.9
Andreas:
Throughout history, we have seen that a newly deployed technology
tends to imitate the old technology. Then we found out that television really
wasn’t just a better radio. We found out that the web definitely wasn’t just a better
television. I think what we are going to see soon is that Twitter is not just a better
messaging service. I am personally very excited to see what people are going to
use these things for, to support new uses, and then to see what works and
what doesn’t.
Ron:
I would like to thank everyone for coming and chatting a little bit about real time
messaging systems with Twitter, Twine and its search-engine-like
categorization system, and Mr. Tweet and its focus on the relevance of Twitter users, and lastly
Andreas Weigend for really driving the social data revolution.