Fri 09/23

LMSS @ Cornell Tech: Wei Xu (Georgia Tech)

Importance of Data and Controllability in Neural Language Generation

Natural language generation has become a popular playground for deep learning techniques. In this talk, I will demonstrate that creating high-quality training data and introducing controllability over different editing operations (such as paraphrasing and sentence splitting) can lead to significant performance improvements that overshadow gains from model variations. In particular, I will focus on text simplification, a task that improves text accessibility, and cover: (1) a monolingual word alignment model that can identify semantically related text spans between two sentences for analyzing human editing operations; (2) a controllable text generation approach that incorporates syntax through pairwise ranking and data augmentation; and (3) a neural conditional random field (CRF)-based semantic model for creating parallel training data. I will also briefly discuss our other work on large-scale paraphrase acquisition from Twitter.

Speaker Bio

Wei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology. She was previously a postdoctoral researcher at the University of Pennsylvania and holds a Ph.D. in Computer Science from NYU. Her research lies at the intersection of machine learning, natural language processing, and social media. She has received the NSF CAREER Award (2022), the NSF CRII Award (2018), a Best Paper Award at COLING 2018, the Criteo Faculty Research Award (2018), and the CrowdFlower AI for Everyone Award (2018).