Information Extraction (IE) is the task of identifying "facts", such as attack/arrest events, people's jobs, people's whereabouts, and merger and acquisition activity, from unstructured texts. In this talk I will give an overview of the most successful techniques for each IE task and point out the remaining challenges. I will also discuss recent advances in cross-source IE (e.g. across different documents, genres, languages and data modalities). Traditional IE techniques extract information from individual documents in isolation. However, users need to gather information that may be scattered among a variety of sources. These facts may be redundant, complementary, incorrect or ambiguously worded. Furthermore, the information extracted from a document may need to augment an existing Knowledge Base (KB). I will discuss several new extensions to state-of-the-art IE and systematically present the foundation, methodologies, algorithms, and implementations for these advanced extraction capabilities.
Bio: Heng Ji is the Edward G. Hamilton Development Chair Associate Professor of Computer Science at Rensselaer Polytechnic Institute. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University in 2000 and 2002, respectively, and her M.S. and Ph.D. in Computer Science from New York University in 2005 and 2007, respectively. Her research interests focus on Natural Language Processing, especially cross-source Information Extraction. She has received a Google Research Award, an NSF CAREER Award, a Sloan Junior Faculty Award, an IBM Watson Faculty Award, the PACLIC2012 Best Paper Runner-up, a "Best of SDM2013" paper selection, and the AI's Top 10 to Watch Award from IEEE Intelligent Systems.