IBM Research

Information Flow Prediction and People Mining
Ching-Yung Lin
IBM T. J. Watson Research Center
May 27, 2007

5/27/2007 | Information Flow Prediction and People Mining | Ching-Yung Lin © 2007 IBM Corporation

Data Flow through an Internet Gateway
• 10 Gbit/s continuous feed coming into the system
• Types of data
  - Speech, text, moving images, still images, coded application data, machine-to-machine binary communication
• System mechanisms
  - Telephony: 9.6 Gbit/s (including VoIP)
  - Internet
    - Email: 250 Mbit/s (about 500 pieces per second)
    - Dynamic web pages: 50 Mbit/s
    - Instant messaging: 200 Kbit/s
    - Static web pages: 100 Kbit/s
    - Transactional data: TBD
  - TV: 40 Mb/s (equivalent to about 10 stations)
  - Radio: 2 Mb/s (equivalent to about 20 stations)

Network Monitoring and Stream Analysis
[Figure: dataflow graph of processing elements (PEs), by the IBM Dense Information Gliding Team. Inputs go through packet content analysis (ip, tcp, udp, http, ftp, ntp, rtsp, rtp, session, audio, video), interest routing and interest filtering (keywords, id), and advanced content analysis of the interested multimedia streams. Data rates marked on the graph: 200-500 MB/s, ~100 MB/s, 10 MB/s.]

Borrow this from Hoover...

One of the Issues: Speech Recognition, Speaker & Social Network Detection
[Figure: Streams A to D pass through speaker detection, then denoising and social network analysis. Speaker labels: Olivier, Mihalis, Upendra, Ching-Yung, Deepak; after denoising, relations such as "Mihalis talks to Upendra" are identified.]
• Social network
• Fusion technique
• Iterative method
• What can be achieved by combining content analysis and social network analysis?

Challenge: every node in the network is unique
[Photo source: New York Times, 3/2/2005]

Part I: Dynamic Probabilistic Complex Network and Information Flow

The Most Difficult Challenge: State of the Art?
• Our objectives: find important people, community structures, or information flow in a network that is dynamic, probabilistic, and complex, in order to allocate resources in a large-scale mining system.
• Social networks in the sociological and statistical fields: focus on (1) overall network characteristics, (2) dynamic random graphs, (3) binary edges, etc.
  - Do not consider probabilistic nodes/edges or individual nodes/edges.
• Epidemic networks and computer virus networks: focus on (1) overall network characteristics (when an outbreak will occur), (2) regular / random graphs.
  - Do not focus on individual nodes/edges.
• (Computer) communication networks: focus on (1) packet transmission, where information is not duplicated, or (2) broadcasting, which does not consider individual nodes/edges or complex network topology.
• WWW: focus on (1) topology description, (2) binary edges and ranked nodes (e.g., Google PageRank).
  - Does not consider probabilistic edges.

What is a Dynamic Probabilistic Complex Network?

Modeling a Dynamic Probabilistic Complex Network
• [Assumption] A DPCN can be represented by a dynamic transition matrix P(t), a dynamic vertex status random vector Q(t), and two dependency functions f_M and g_M.
$$
P(t) = \begin{bmatrix}
p_{1,1}(t) & p_{2,1}(t) & \cdots & p_{N,1}(t) \\
p_{1,2}(t) & p_{2,2}(t) & \cdots & p_{N,2}(t) \\
\vdots & \vdots & \ddots & \vdots \\
p_{1,N}(t) & p_{2,N}(t) & \cdots & p_{N,N}(t)
\end{bmatrix},
\qquad
Q(t) = \begin{bmatrix} q_1(t) \\ q_2(t) \\ \vdots \\ q_N(t) \end{bmatrix}
$$

$$
p_{i,j}(t) = \begin{bmatrix} \Pr(y_{i,j}(t) = SE_1) \\ \Pr(y_{i,j}(t) = SE_2) \\ \vdots \\ \Pr(y_{i,j}(t) = SE_E) \end{bmatrix},
\qquad
q_i(t) = \begin{bmatrix} \Pr(x_i(t) = SV_1) \\ \Pr(x_i(t) = SV_2) \\ \vdots \\ \Pr(x_i(t) = SV_V) \end{bmatrix}
$$

where

$$
\sum_{e=1}^{E} \Pr(y_{i,j}(t) = SE_e) = 1, \qquad \sum_{v=1}^{V} \Pr(x_i(t) = SV_v) = 1,
$$

$$
P(t + \Delta t) = f_M(Q(t), P(t)), \qquad Q(t + \Delta t) = g_M(P(t + \Delta t), Q(t), P(t)),
$$

and
• x_i(t): the status value of vertex i at time t.
• y_{i,j}(t): the status value of edge i → j at time t.

Information Flow in Dynamic Probabilistic Complex Network
(Let's call it the Behavioral Information Flow (BIF) Model)
• [Assumption] Edges can be represented by a four-state S-D-A-R (Susceptible-Dormant-Active-Removed) Markov model. Nodes can be represented by a three-state S-A-I (Susceptible-Active-Informed) model.

With P(t) and Q(t) as defined above,

$$
P(t + \Delta t) = f(M, Q(t), P(t)), \qquad Q(t + \Delta t) = g(P(t + \Delta t), Q(t), P(t)),
$$

where

$$
p_{i,j}(t) = \begin{bmatrix} \Pr(y_{i,j}(t) = S) \\ \Pr(y_{i,j}(t) = D) \\ \Pr(y_{i,j}(t) = A) \\ \Pr(y_{i,j}(t) = R) \end{bmatrix},
\qquad
q_i(t) = \begin{bmatrix} \Pr(x_i(t) = S) \\ \Pr(x_i(t) = A) \\ \Pr(x_i(t) = I) \end{bmatrix},
$$

and the entries of each p_{i,j}(t) and of each q_i(t) sum to 1.

Major Difference between BIF and Prior Modeling Methods in Epidemic Research and Computer Virus Fields
• Prior models:
  - Model human nodes as S-I-R (Susceptible, Infected, and Removed).
  - Did not consider how an individual node's behavior differs with network structure/topology.
  - Did not consider edge status.
• We propose to model edge status as an (autonomous) S-D-A-R Markov model (Susceptible, Dormant, Active, Removed).
• We propose to model human node behavior as S-A-I (Susceptible, Active, and Informed).

Edges are Markov State Machines, Nodes are not
• State transitions of edges: S-D-A-R model (Susceptible, Dormant, Active, and Removed). This captures how the state of an edge changes over time.
  [Figure: edge view. Chain S → D → A → R; the S → D transition fires on a trigger, and the remaining transitions and self-loops carry probabilities from the edge parameters.]
• States of nodes: S-A-I model (Susceptible, Active, and Informed). A trigger occurs when the start node of the edge changes from state S to state I.
  [Figure: node view (S, A, I, with trigger) and network view.]

Edge State Probability and Network Configuration Model
• Nodes and edges:

$$
P(t + \Delta t) = f(M, Q(t), P(t))
$$

• Network configuration model M (learned by training). It includes the network topology information, the long-term edge probability, and the delay parameters:

$$
M = \begin{bmatrix}
(\alpha_{1,1}, \beta_{1,1}, \gamma_{1,1}) & (\alpha_{2,1}, \beta_{2,1}, \gamma_{2,1}) & \cdots & (\alpha_{N,1}, \beta_{N,1}, \gamma_{N,1}) \\
(\alpha_{1,2}, \beta_{1,2}, \gamma_{1,2}) & (\alpha_{2,2}, \beta_{2,2}, \gamma_{2,2}) & \cdots & (\alpha_{N,2}, \beta_{N,2}, \gamma_{N,2}) \\
\vdots & \vdots & \ddots & \vdots \\
(\alpha_{1,N}, \beta_{1,N}, \gamma_{1,N}) & (\alpha_{2,N}, \beta_{2,N}, \gamma_{2,N}) & \cdots & (\alpha_{N,N}, \beta_{N,N}, \gamma_{N,N})
\end{bmatrix}
$$

• \alpha_{i,j} = 0 means there is no edge between i and j.
• Our KDD 2005 paper is a special case in which \alpha_{i,j} is 1 or 0 and (\beta_{i,j}, \gamma_{i,j}) are not modeled.

Define Edge State Probability Update Function
• Edge state probability update function f(.) such that P(t + \Delta t) = f(M, Q(t), P(t)).
• Three different cases:
  1. On trigger: x_i(t + \Delta t) = I, x_i(t) \neq I. Then p_{i,j}(t + \Delta t) is a 4×4 S-D-A-R trigger transition matrix applied to p_{i,j}(t). [Equation: the matrix entries, built from the edge parameters in M, are not recoverable from this transcript.]
  2. No trigger, node not informed yet: x_i(t + \Delta t) \neq I, x_i(t) \neq I. Then p_{i,j}(t + \Delta t) = p_{i,j}(t).
  3. No trigger, node has been informed: x_i(t + \Delta t) = I, x_i(t) = I. Then p_{i,j}(t + \Delta t) = F \, p_{i,j}(t).
• Therefore, taking the probabilities of the node states into account, f(.) becomes

$$
p_{i,j}(t + \Delta t) = w_i \, F \, p_{i,j}(t) + (1 - w_i) \, p_{i,j}(t),
$$

  where w_i is a state probability of source node i taken from q_i(t). [The original symbol for w_i is lost in this transcript.]

Nodes: State Transitions Determined by Incoming Edges

$$
Q(t + \Delta t) = g(P(t + \Delta t), Q(t), P(t))
$$

• Node state probability update function g(.): q_i(t + \Delta t) is a 3×3 S-A-I transition matrix applied to q_i(t). Its entries are products over the incoming edges n \in V_{\cdot,i} of terms built from the edge state probabilities, including the probability that some incoming edge moves from A at time t to R at time t + \Delta t. [Equation: the matrix entries are not recoverable from this transcript.]
• V_{\cdot,i} is the set of all source nodes of the incoming edges of node i:

$$
V_{\cdot,i} = \{\, n \mid n \in \{1, \ldots, N\},\ \alpha_{n,i} \neq 0 \,\}
$$

  [Figure: node view (S, A, I, with trigger) and network view.]

An Application of Information Flow Prediction: find important people
• Who are the most likely people to talk about this information at a specific time given the current observation?
$$
(m, n) = \arg\max_{m,n \in \{1, \ldots, N\}} \Pr\big(y_{m,n}(t + \tau) = A\big) \quad \text{given } Q(t) \text{ or } (P(t), Q(t))
$$

• For a given concrete observation, the values in the given priors P(t), Q(t) are either 0 or 1.
• For speaker recognition results, the priors can be confidence values between 0 and 1.

Case Study I: Switchboard data from 679 people
• Monte Carlo method: simulate each DPCN information flow 1000 times.
• One MC simulation of the process takes 12 seconds. (For a given model, testing all 679 nodes, that is, calculating the probabilities when the information flow starts from each of the 679 seeds, takes a PC 130 minutes.)
[Figure: the probabilities that each node receives the information, for Seed ID 100. x-axis: node index (1 to 676); y-axis: probability (0 to 0.3).]

[Figure: distribution histogram of the alpha values of the edges in the Enron dataset. Series: All Topics, Market Opportunity, California Market, North America Product. x-axis: alpha bins 0, 0+~0.1, 0.1+~0.2, ..., 0.9+~1; y-axis: count, log scale from 1 to 100000.]

Noise Factor I: Impact of Classification Error from Speaker Recognition
• Assume the classification precision rate on speaker (node) i is f_i, and the false alarm rate on speaker i is \varphi_i.
• Then the expected number of times that the node is counted is

$$
K \rightarrow f_i K + \varphi_i \cdot 2Z,
$$

  and the expected number of times that the link is counted is

$$
L \rightarrow f_i f_j L + \varphi_i \varphi_j Z.
$$

• Therefore,

$$
\alpha^{\text{truth}}_{i,j} = \frac{L}{K}
\quad\rightarrow\quad
\alpha^{\text{detected}}_{i,j} = \frac{f_i f_j L + \varphi_i \varphi_j Z}{f_i K + \varphi_i \cdot 2Z}.
$$

• If we assume a universal precision rate f and false alarm rate \varphi for all speakers, then

$$
\alpha'_{i,j} = \frac{f^2 L + \varphi^2 Z}{f K + \varphi \cdot 2Z}.
$$

• Assume the average waiting time of links and the average transmission duration of links are the same regardless of which links are observed; then

$$
\beta'_{i,j} = \beta_{i,j} \quad \text{and} \quad \gamma'_{i,j} = \gamma_{i,j}.
$$

• If we assume the false alarm rate is small and can be neglected when the number of nodes is large, then

$$
\alpha'_{i,j} \approx f \cdot \alpha_{i,j}.
$$

Speaker Recognition Accuracy can be Improved by Fusion of Original Speaker Recognition and Predicted Node Probability
• We can use this fusion method to combine the speaker recognition result and the estimated node probability:

$$
f'_i = \frac{f_i \, \rho_i}{f_i \, \rho_i + \sum_k f_{i,k} \, \rho_k},
$$

  which is guaranteed to increase when \rho_i > \rho_k. Here \rho_i is the estimated (BIF-predicted) node probability of speaker i, and f_{i,k} is the recognizer's score for the confusable speaker k.
[Figure: before fusion, the speaker i recognizer outputs f_i alongside confusion scores f_{i,k1}, f_{i,k2}, f_{i,k3}; after fusion with the BIF prediction, the output is f'_i.]

Recognition Result from Switchboard-2 Telephone Conversation Set
[Figure: recognition accuracy (y-axis, 0 to 1.2) versus model update step (x-axis, 0 to 5). Series: Node 218, no false alarm; Node 164, no false alarm; Node 218, mutually confused; Node 164, mutually confused; Node 218, prob. false alarm = 0.3; Node 164, prob. false alarm = 0.3.]
• Improvement in recognition accuracy on Node 171. The x-axis is the number of times the model has been updated based on the recognition result after fusion. The y-axis is the recognition accuracy. In the six test cases, Node 171 is usually confused with Node 218 or Node 164.
In the first two cases, there is no false alarm from the classification of Node 218 or Node 164. In the next two cases, the two nodes are usually confused with each other. In the last two cases, the false alarm rate from Node 218 or Node 164 is 0.3.

Case Study (II): our experiments on Enron Emails

Modeling and Predicting Topic-Related Personal Information Flow
• Content-Time-Relation Model: combines content, time, and social relation information using Dirichlet allocations and a causal Bayesian network. [Song et al., KDD, August 2005] (first paper combining content analysis and social network analysis)
• Given the sender and the time of an email:
  1. Get the probability of a topic given the sender.
  2. Get the probability of the receiver given the sender and the topic.
  3. Get the probability of a word given the topic.
[Figure: graphical model. a: sender/author; z: topic; S: social network (Exponential Random Graph Model / p* model); D: document/email; r: receivers; w: content words; N: word set; T: topic. Marked nodes are observations; boxes represent iteration.]

Corporate Topic Trend Analysis Example: Yearly repeating events
[Figure: topic trend comparison. Popularity (0 to 0.03) by month, Jan to Nov, for Topic45 (y2000), Topic45 (y2001), Topic19 (y2000), Topic19 (y2001).]
• Topic 45, which concerns a scheduling issue, peaks from June to September. Topic 19 concerns a meeting issue. The trend repeats from year to year.

Topic Detection and Key People Detection of "California Power" Match Their Real-Life Roles
[Figure: topic analysis for Topic 61. Popularity (0 to 0.018) by month, Jan-00 to Oct-01.]

Key Words        Score
power            0.089361
California       0.088160
electrical       0.087345
price            0.055940
energy           0.048817
generator        0.035345
market           0.033314
until            0.030681

Key People       Score
Jeff_Dasovich    0.249863
James_Steffes    0.139212
Richard_Shapiro  0.096179
Mary_Hain        0.078131
Richard_Sanders  0.052866
Steven_Kean      0.044745
Vince_Kaminski   0.035953

• The "California Energy Crisis" event occurred in exactly this time period. The key people are active in this event, except Vince_Kaminski …

Social Network of Enron Managers
• If we try to find social networks based on all communications, it is difficult.

Information Flow in Enron: California Market
• Actor 151 (Rosalee Fleming, assistant to the Enron CEO Ken L.) is the key information spreader on this issue.

Information Flow in Enron: Market Opportunities
• Rosalee Fleming also played an important role in "Market Opportunities." She received information from Actor 119 (Mike Carson) and Actor 23 (James Steffes, VP of Government Affairs at Enron).
• Actor 68 (Rod Hayslett, CFO) is also a major information spreader.

Information Flow in Enron: North American Products
• Two disjoint communities can be observed.
Actor 21 (Keith Holst) and Actor 142 (Dan Hyvl) are the main bridges between the two communities.

This kind of analysis is wonderful, but..
• We cannot wait until our company has a scandal and goes bankrupt....
• What kinds of valuable applications can come out of network analysis?

Part II: Small Blue

Social Network: a key differentiator for corporate performance
• The informal social network within a formal organization is a major factor affecting a company's performance: Krackhardt (CMU, 2005) showed that companies with strong informal networks perform five or six times better than those with weak networks.
• Brydon (Visible Path, 2006) reported the performance gains of companies utilizing social networks:
  - 16x in sales
  - 4x in marketing
  - 10x in hiring

We hope social network and expertise mining can dramatically increase our colleagues' knowledge and collaboration

Social Networks: beyond the organizational chart
• Organization charts are not the best indicator of how work gets done.
• Senior people are not always central; peripheral people can represent untapped knowledge.
• Making the network visible makes it actionable, and it becomes the basis for a collaboration action plan.
Source: Cross, R., Parker, A., Prusak, L. & Borgatti, S.P. 2001. Knowing What We Know: Supporting Knowledge Creation and Sharing in Social Networks. Organizational Dynamics 30(2): 100-120.
Provided by Drs. Tony Mobbs and Kate Ehrlich, IBM

Group and Roles
• Central people: Sam. Could be a bottleneck, or could be holding the group together.
• Peripheral people: Earl. Goes to others, but no one goes to him for information. At risk of leaving. Potentially unrealized expertise.
• Sub-groups: the group is split by function (Marketing, Finance, Manufacturing). Very little information is shared across groups.
[Figure: network of Andy, Frank, Indojit, Carl, Karen, Darren, Bob, Sam, Ming, Neo, Leo, Earl, Gerry, Harry, Jeff, grouped into Marketing, Finance, Manufacturing.]
This slide is excerpted from SNA Theory, Concepts and Practice by Dr. T. Mobbs, BCS and Dr. K. Ehrlich, Research

Some Roles are especially critical
• What happens if Sam leaves the group through layoffs, job reassignment, attrition, merger, or retirement?
[Figure: the same network with Sam removed.]
This slide is excerpted from SNA Theory, Concepts and Practice by Dr. T. Mobbs, BCS and Dr. K. Ehrlich, Research

Relationships are multi-dimensional and (traditionally) uncovered through network questions
Categories shown: Awareness, Actions, Emotional

Dimension         Network question
Communication     How often do you communicate with this person?
Awareness         I am aware of this person's knowledge and skills.
Trust             I believe there is a high personal cost in seeking advice or support from this person.
Innovation        How often do you turn to this person for new ideas?
Valued Expertise  How likely are you to turn to this person for specialized expertise?
Access            I believe this person will respond to my request in a reasonable and timely manner.
Advice            How often do you seek advice from this person before making an important decision?
Learning          How likely are you to rely on this person for advice on new methods and processes?
Energy            I generally feel energized when I interact with this person.

Provided by Drs. Tony Mobbs and Kate Ehrlich, IBM

Personal Network: preferred source for information and collaboration
• Forces:
  - Time constrained
  - Delivery activity focus
  - What gets measured gets done
  - Expedience
  - Perceived value (return on time investment)
• Personal network (preferred / primary mode):
  - Fast turnaround of request
  - Specific response
  - Small number of relevant items returned
  - Recommendation of quality
  - Ability to quickly understand the supplied resource and determine the relevant parts
  - Additional context / value-add information not available in electronic materials
• High reliance on:
  - 50% to 75%: personal networks (Gartner Report, 2006)
  - Hard-drive materials
  - What has worked for them previously (personal experience)
[Figure: a GBS practitioner with a task in a project / delivery environment, surrounded by W3 stubs and clients for: Project Tools, Project Repositories, Knowledge View, PSN, Methods, Education, Communities, Other w3 content, Collaboration.]
• Existing resources provided: standalone, disparate, poorly integrated, large number of sources, steep learning curve (identify, understand & synthesise into specific work context), difficult to locate, choose & use.
• This leads to:
  - Under-utilisation of electronic products and services.
  - Content has lower performance impact and does not realise its full potential benefits.
  - Widely inconsistent working practices.
• Who knows what? How to reach them? Who plays what hidden roles?

Mining Expertise, Interests and Social Network
• People can be "known" by (ordered from public to private; the private end offers timely and abundant resources for expertise modeling):
  - Public resources: publications, personal webpages, blogs, presentations, wiki
  - Organizational resources: patent applications, bluepages
  - Personal resources: emails, instant messaging, meetings, phone calls, face-to-face interactions
• A person's expertise can also be inferred from her friends' recommendations or expertise.

SmallBlue Clients (Distributed Automatic Social Sensors)
[Figure: SmallBlue architecture. The labels recoverable from the diagram are grouped below; the diagram also carries the access labels "Private & Personalized", "Public", and "Public & Personalized".]
• SmallBlue clients:
  - My personal network (Ego net), inferred from my Notes emails in server/local/archive and my SameTime chats
  - Inference of my understanding of my friends' expertise
• Other IBMers' Ego nets and expertise inferences:
  - I cannot see their communications, Ego nets, or expertise inferences.
• External data: Bluepages, BlueGroups, CommunityMap, BlogCentral, IBM Forum, KnowledgeView, Social Bookmark
• SmallBlue Inference Engines and Servers
• SmallBlue Find: the user searches for experts or for a person
  - Corporate-wide ranked experts
  - Ranked experts in my extended personal network, in a business unit, and/or in a country
  - Only public information is shown
• SmallBlue Connect:
  - Social network analysis of the Top-K experts
  - Social network analysis of a list of people
  - Social network analysis (SNA): who are the key persons in this network? Who are the major hubs? Who are the major bridges?
  - SNA of a formal group, a bluegroup, or a community
• SmallBlue Ego:
  - My friends' social values to me
  - Evolution of my Ego net
• SmallBlue Reach: how to reach a person; social network information
  - My social paths to her: which friends can introduce her, which friends work with her, ..
  - Trust, awareness, collaboration
  - Her public postings, profiles, and communities, to judge whether she is the right person
• SmallBlue Expand:
  - Who I may want to know, which communities I may want to join, which documents I may want to look at

Major Use of SmallBlue Find
• Find out who the experts are for any search term. (Right now, zillions of possible terms.)
• Rank them based on collaborative expert recommendation.
• Can show experts based on:
  - the whole corporation
  - business unit
  - country
  - my personal proximity

Collaborative Expert Recommendation
• Combine everyone's knowledge of the expertise of our colleagues.
• The more recommendations from more colleagues, the higher the score.
• The more recommendations from my trusted colleagues, the higher the score.
• The higher the recommendation scores from colleagues, the higher the overall score.
Combining all IBMers' knowledge, we can build an advanced expert-finding search engine. Using the expert search engine, we can enhance all IBMers' knowledge and social connections.

SmallBlue Reach Paths help users to reach another person
• SmallBlue Reach Paths show the shortest paths for me to reach a person up to 6 degrees away.
• SmallBlue Reach Paths can be initiated from any one of three SmallBlue applications.
• Can be used for:
  - Access: knowing who can help introduce me to this person.
  - Trust: knowing who in my social networks knows this person.
  - Getting familiar: knowing what kinds of people are in contact with this person.
  - Initiating communication: knowing whom we know in common.

SmallBlue Ego
• How healthy is my personal social capital?
• What is the social value of Alice to me?
• What are the changes and trends in the evolution of my social capital?
• For instance, I have to talk to Alice soon. She is valuable to me in terms of social connections, and she is drifting out of my Ego net circle..

SmallBlue Connect
• Enterprise social network analysis tool
• Shows social networks of people based on:
  - expertise keywords
  - formal hierarchy
  - any list of emails
• Uses social network analysis to show:
  - who the important hubs among experts are
  - who the important bridges linking groups are

Privacy Consideration: Bottom Line
• Employees' communications (e.g., time, from, to, cc, subject, content of emails, SameTime, etc.) are NOT searchable by or retrievable to anyone.
• Employees' knowledge of other employees is INFERRED. Only the aggregated inferred knowledge is searchable. It is NOT possible to guess which part of the aggregated inferred knowledge was contributed by whom.
• In the social network analysis graphs, relationships between people are modeled by their multimodal generic relationships. They give NO clue to the communication content.
• Only an employee's outgoing emails and instant messages, and only the portion authored by the employee, are utilized.
• Anyone can suggest keywords that should not be searched or search terms that should not find him, or ask to be removed from the system.

Preliminary User Evaluation
Scores: 5 = very satisfied

                        5     4     3     2     1
Capability             24%   42%   17%   17%    0%
Usability              28%   33%    5%   25%   10%
Search                 10%   43%   23%   22%    2%
Reliability            28%   38%   17%   12%    5%
Performance            15%   45%   25%   13%    3%
Privacy                29%   34%   34%    3%    0%
Personal Network       15%   50%   13%   23%    0%
Overall Satisfaction   17%   49%   17%   15%    2%

Demo

Coincidence ??
• SmallBlue Ego Trial Release (8/21)
• SmallBlue Find and Connect Trial Release (9/20)
• SmallBlue on TAP (11/07)

Acknowledgements
• Thanks to the SmallBlue team members: Vicky Griffits-Fisher, Kate Ehrlich, Christopher Desforges, Michael Ackerbaruer, Reynold Khachatourian, Irina Fedulova, Ekaterina Zaytseva, Jeffrey Borden, Jennifer Xu, Yi Gu, Jie Lu, Dima Rekesh, Belle Tseng, Xiaodan Song
• Contact: Ching-Yung Lin ([email protected]) ( http://www.research.ibm.com/people/c/cylin )
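
Sketch: Monte Carlo estimate of node receive probabilities (Case Study I)
The Monte Carlo procedure described in Case Study I (simulate the information flow many times from a seed and count how often each node receives the information) can be illustrated with a much simpler model than the BIF model. The sketch below is an assumption-laden simplification: each edge fires once with a fixed probability (standing in for the long-term edge probability), and the S-D-A-R delay states are ignored. The names `simulate_spread`, `receive_probabilities`, and `toy_edges` are hypothetical and not from the deck.

```python
import random


def simulate_spread(edges, seed, rng):
    """Run one simulated cascade and return the set of nodes that got the information.

    `edges` maps a node to a list of (neighbor, probability) pairs.
    Assumption: each edge is tried once, independently, with its probability.
    """
    informed = {seed}
    frontier = [seed]
    while frontier:
        node = frontier.pop()
        for neighbor, prob in edges.get(node, []):
            if neighbor not in informed and rng.random() < prob:
                informed.add(neighbor)
                frontier.append(neighbor)
    return informed


def receive_probabilities(edges, seed, runs=1000, rng_seed=0):
    """Estimate, per node, the fraction of runs in which it received the information."""
    rng = random.Random(rng_seed)
    counts = {}
    for _ in range(runs):
        for node in simulate_spread(edges, seed, rng):
            counts[node] = counts.get(node, 0) + 1
    return {node: count / runs for node, count in counts.items()}


# Toy network (made-up probabilities, not Switchboard data).
toy_edges = {
    "a": [("b", 1.0), ("c", 0.5)],
    "b": [("d", 0.0)],
    "c": [("d", 0.5)],
}
probs = receive_probabilities(toy_edges, "a")
for node in sorted(probs):
    print(node, round(probs[node], 3))
```

Running one estimate per seed node, as the slide does for all 679 seeds, is just a loop over `receive_probabilities`; the cost grows linearly with the number of seeds and runs.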
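
Sketch: fusing a speaker recognition score with a predicted node probability
The fusion slide combines a recognizer score with a BIF-predicted node probability by normalizing over the confusable speakers. The sketch below follows one reading of that formula, f'_i = f_i ρ_i / (f_i ρ_i + Σ_k f_{i,k} ρ_k); the symbol names and the function `fuse_score` are assumptions, and the numbers are invented for illustration.

```python
def fuse_score(recognizer_score, node_prob, confusions):
    """Fuse a recognizer score with a predicted node probability.

    recognizer_score: recognizer confidence f_i for speaker i.
    node_prob: predicted probability rho_i that speaker i is the one talking.
    confusions: list of (f_ik, rho_k) pairs for the confusable speakers k.
    Returns the normalized, fused confidence for speaker i.
    """
    numerator = recognizer_score * node_prob
    denominator = numerator + sum(f_ik * rho_k for f_ik, rho_k in confusions)
    if denominator == 0:
        return 0.0  # no evidence for any candidate
    return numerator / denominator


# Speaker i is favored by the network prediction (0.5 versus 0.1),
# so the fused confidence rises above the raw recognizer score of 0.6.
fused_favored = fuse_score(0.6, 0.5, [(0.4, 0.1)])

# With equal node probabilities the prediction adds no information,
# and the fused score equals the (already normalized) recognizer score.
fused_neutral = fuse_score(0.6, 0.3, [(0.4, 0.3)])

print(round(fused_favored, 4), round(fused_neutral, 4))
```

The two calls mirror the slide's condition: the fused score increases only when the node probability of the true speaker exceeds that of the confusable speakers.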
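
Sketch: shortest reach path within 6 degrees
The SmallBlue Reach slide states that reach paths are the shortest paths to a person up to 6 degrees away. A breadth-first search with a depth limit is one standard way to compute such a path; the deck does not say how SmallBlue does it, so this is only an illustration. The function `reach_path` and the toy graph are hypothetical.

```python
from collections import deque


def reach_path(graph, source, target, max_degrees=6):
    """Return one shortest path from source to target, or None if the
    target is more than `max_degrees` hops away.

    `graph` maps a person to an iterable of their direct contacts.
    """
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the degree limit
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue  # already reached by a path that is at least as short
            parent[neighbor] = node
            if neighbor == target:
                # Walk the parent links back to the source.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((neighbor, depth + 1))
    return None


toy_graph = {
    "me": ["ann", "bob"],
    "ann": ["cara"],
    "bob": ["cara"],
    "cara": ["dee"],
}
print(reach_path(toy_graph, "me", "dee"))
print(reach_path(toy_graph, "me", "dee", max_degrees=2))
```

The intermediate people on the returned path are the candidates for the "Access" use listed on the slide: contacts who could introduce me to the target.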
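
Sketch: collaborative expert recommendation score
The Collaborative Expert Recommendation slide lists three monotonic rules: more recommenders, more trusted recommenders, and higher recommendation scores each raise an expert's overall score. A trust-weighted sum satisfies all three. This is a minimal sketch under that assumption, not the SmallBlue ranking function, which the deck does not specify; `rank_experts`, `default_trust`, and the data are hypothetical.

```python
def rank_experts(recommendations, trust, default_trust=0.1):
    """Rank experts by a trust-weighted sum of recommendation strengths.

    recommendations: list of (recommender, expert, strength) tuples,
        where strength is the recommender's inferred score for the expert.
    trust: maps a recommender to how much the searching user trusts them.
    default_trust: weight for recommenders outside the user's network.
    """
    scores = {}
    for recommender, expert, strength in recommendations:
        weight = trust.get(recommender, default_trust)
        scores[expert] = scores.get(expert, 0.0) + weight * strength
    # Highest overall score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


recommendations = [
    ("ann", "dee", 0.9),   # trusted colleague, strong recommendation
    ("bob", "dee", 0.5),   # second recommender for the same expert
    ("cara", "eli", 0.9),  # strong recommendation from an unknown colleague
]
my_trust = {"ann": 1.0, "bob": 0.5}
ranking = rank_experts(recommendations, my_trust)
print(ranking)
```

Because the trust weights depend on who is searching, the same recommendations can produce a different ranking for each user, which matches the slide's "my personal proximity" view.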
