NLP and Big Data
Xiaoge LI
Shanxi HPC Research Center
[email protected]
WBDB2013, Xi’an, China

Introduction
• The Internet is a big knowledge base, but it is unstructured
• NLP & IE "understand" human language
• Unstructured data → structured data

Problems
• Human language changes
  • "Let's Google it!"
  • Net language (LOL, 给力 "awesome")
  • Compound words (JFK airport)
• Domain knowledge
  • Domain-specific training sets
• Chinese tokenization (example sentence: 小菊的生活很给力, "Xiaoju's life is awesome")
  • New word split into two tokens: 小菊/nr 的/u 生活/vn 很/d 给/v 力/vg
  • New word kept as one token: 小菊/nr 的/u 生活/vn 很/d 给力/a

NLP needs big data
• Unsupervised (weakly supervised) learning
• Knowledge acquisition
  • Relationships
  • New words
  • NE gazetteer

System Architecture
• Components: knowledge acquisition, NLP & IE, information fusion, entity graph
• Platform: HDFS, MapReduce, HBase on a Linux cluster

Knowledge acquisition
• Large-scale corpus from the Web
• Weakly supervised learning
• Bootstrapping technique
• MapReduce, HBase
• Location NE and new word: P = 87.28%, 72.1%

Chinese NLP & IE engine
• Pipeline
• FST & statistical mixture model
• Input: plain text
• Output: structured XML
• MapReduce
• Speed: 500 KB/s on 10 nodes

Information object
• Profile and Event

Information Object
• Named Entity: Person, Organization, Location, Product, Time
• Event (事件): Pre-defined Event, General Event

Example Profile
• In a concept-based profile, the attributes are filled by its participant profiles.
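Note on the Chinese tokenization slide: the effect of an unknown net word on segmentation can be illustrated with a small sketch. This is not the engine described in the deck (which uses an FST & statistical mixture model); it swaps in plain dictionary-based forward maximum matching, and the lexicon contents and the function name `forward_max_match` are assumptions made only for illustration. It shows how adding the acquired new word 给力 to the lexicon changes the token sequence for the example sentence.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right longest-match segmentation (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon: it knows 给 and 力 separately, but not the net word 给力.
base_lexicon = {"小菊", "的", "生活", "很", "给", "力"}
sentence = "小菊的生活很给力"

before = forward_max_match(sentence, base_lexicon)
# Same lexicon after the new word has been acquired.
after = forward_max_match(sentence, base_lexicon | {"给力"})

print("/".join(before))
print("/".join(after))
```

With the toy lexicon the last two characters come out as separate tokens; once 给力 is in the lexicon they are kept together, which mirrors the two segmentations on the slide.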
Information Network
• NLP
  • Tokenization
  • POS
  • Shallow parsing
  • Deep parsing
• IE
  • NE tag
  • CE linkage
  • NE Profile
  • Profile Merge
• Cross-document information fusion
  • Information Object network
  • Vertex: Profile
  • Edge: relationship

Cross-Document Information Fusion
• Hierarchical clustering
• MapReduce, HBase
• Half a million profiles
• Computing complexity
• P = 94.65%, R = 88.24%, F = 91.33%

Information Graph
• Multi-dimensional
• Color legend: orange = location, gray = organization, blue = person
• Source: 2012 People’s Daily
• Query: China Agricultural University, expand 1 level

Organization-Organization Network
• Query: China Agricultural University; filter: Organization

Location-Person Network
• Query: 青岛港 (Qingdao Port); filter: Location

Person-Location Network
• Query: 金日成 (Kim Il-sung)

Future Work
• Query language
• Graph mining
• Enhance NLP engine
• Visualization

Questions?
Thank you
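Note on the Information Graph slides: the "expand 1 level" query with a type filter can be sketched as a breadth-limited neighbor lookup over a graph whose vertices are profiles and whose edges are relationships. This is only a minimal in-memory sketch under stated assumptions, not the system's query mechanism (the deck lists a query language as future work). The function name `expand` and all vertex names and edges below are hypothetical placeholders, not data from the People's Daily corpus.

```python
def expand(edges, types, query, levels=1, type_filter=None):
    """Return the vertices reachable from `query` within `levels` hops.

    edges: iterable of (vertex, vertex) pairs, treated as undirected.
    types: dict mapping each vertex to "Person", "Organization" or "Location".
    type_filter: if given, only vertices of this type are kept and expanded.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {query}
    frontier = [query]
    for _ in range(levels):
        next_frontier = []
        for vertex in frontier:
            for neighbor in sorted(adjacency.get(vertex, ())):
                if neighbor in seen:
                    continue
                if type_filter is not None and types[neighbor] != type_filter:
                    continue
                seen.add(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return sorted(seen - {query})


# Hypothetical placeholder profiles and relationships.
types = {
    "Org Q": "Organization",
    "Org A": "Organization",
    "Person B": "Person",
    "Location C": "Location",
    "Org D": "Organization",
}
edges = [
    ("Org Q", "Org A"),
    ("Org Q", "Person B"),
    ("Org Q", "Location C"),
    ("Org A", "Org D"),
]

one_level = expand(edges, types, "Org Q", levels=1)
org_only = expand(edges, types, "Org Q", levels=1, type_filter="Organization")
print(one_level)
print(org_only)
```

Raising `levels` widens the neighborhood, and the filter corresponds to the single-type networks shown on the slides (for example, Organization-Organization).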