Projects

 Ongoing Projects

Proficiency in argumentative writing contributes to one's academic and professional success. However, the Nation's Report Card shows that most adolescents are not skilled in argumentation and frequently experience difficulty when comprehending arguments and constructing well-rounded essays. This project will generate new insights into artificial intelligence and human-computer interaction capabilities for enhancing student learning of argumentative writing. The proposed research will advance the understanding of how people learn argumentative writing and argumentation. The project will improve the state-of-the-art in natural language processing by developing techniques for argument mining and argument quality measurement. 

Media narratives play a critical role in echoing, influencing, and reinforcing public opinion. Yet, it is challenging to represent narratives structurally to empower efficient extraction, let alone to understand their dissemination and influence. This project’s objectives are: (1) designing a computational framework with a unified narrative representation, grounded in social psychological theories, (2) gaining deeper insights into how narratives emerge and spread through the lens of argumentation, and (3) providing practical quantitative measures of narrative influence. 

This project aims to build text summarization systems that can understand and aggregate information from long documents, allowing users to explore their content through summaries generated in the styles they prefer. The summarization tools will make long documents more accessible and comprehensible, making it easier for the general public to learn from them. Researchers and practitioners can also use the tools to summarize long documents relevant to their work, and educators can incorporate them in their classes to bolster students' reading and writing skills. The project also broadens the investigator's efforts to engage young students in immersive research opportunities, allowing them to participate in the design and implementation of advanced summarization systems.

Effective and Fine-grained Feedback for Enhanced Language Model Reasoning and Alignment
LG AI Research

While large language models (LLMs) dominate the NLP scene, they continue to display numerous shortcomings, especially on complex reasoning problems. For instance, LLMs often produce reasoning that contains logical fallacies and lacks common sense. For complicated tasks involving multiple steps, carefully curated instructions are essential for LLMs to generate quality solutions. The primary objective of this project is to enhance LLM reasoning and alignment by (1) improving the natural language (NL) feedback that is automatically generated by LLMs, and (2) creating fine-grained feedback to allow LLMs to be trained with dense and detailed signals.

Multi-Document Reasoning with Large Language Models
LG AI Research

Large language models (LLMs) have demonstrated impressive performance across a variety of knowledge-intensive tasks, such as text summarization, question answering, and math reasoning. However, LLMs still struggle with multi-document (MD) tasks that require unique reasoning skills over heterogeneous sources of information. The primary goal of this project is to analyze and improve the multi-document reasoning capacities of LLMs.

This project aims to build computational systems to detect and quantify how media ideology affects the creation and presentation of news at the level of articles and their constituent events. This project will promote the transparency of news production and enhance public awareness of media decisions. The developed tools can effectively and efficiently support the measurement of media ideology at the organization and article levels, which facilitates research in broad areas, including political science, social science, and communications. The proposed research will involve graduate and undergraduate students from a diverse array of backgrounds, especially underrepresented groups. The developed datasets and methods will form the basis of modules in newly developed courses. The knowledge produced in the project will be distributed to the public via demos, published blogs, talks on podcasts, and guest essays in newspapers.

This project investigates design processes where the unmet needs of users are elicited from social media, online forums, and e-commerce platforms, and translated into new concept recommendations for designers using artificial intelligence (AI). The motivation stems from the growing abundance of user-generated feedback and a lack of advanced computational methods for drawing useful design knowledge and insights from that data. The research will establish a rigorous computational foundation that (1) enables large-scale elicitation of user needs from online reviews using advanced natural language processing (NLP) algorithms, and (2) translates the elicited needs into the visual and functional aspects of new concepts using novel generative adversarial network (GAN) algorithms. The theoretical innovations will advance the fundamental understanding of how AI can augment the performance and creativity of designers in early-stage product development processes. This project will boost national competitiveness in innovation by creating tacit opportunities for designing innovative, inclusive, and competitive products. The convergent research team will create outreach initiatives for STEM students, teachers, and underrepresented minorities, and engage with industry and research stakeholders to ensure technology-market fit and successful dissemination.

 Past Projects

Knowledge-grounded Scientific Reasoning
LG AI Research

Teaching machines to reason like humans has received significant attention in the past decades. This project aims to empower large language models with knowledge-grounded reasoning skills. First, we will develop a new reasoning framework that is equipped with knowledge indexing and retrieval ability to support grounded inference, analysis, and content creation. Second, we will investigate new decoding methods that allow the inclusion of knowledge constraints that branch multi-step reasoning processes. Finally, we will study new evaluation methods that measure models' true reasoning ability beyond fact memorization.

Understanding, evaluating and generating arguments are all crucial elements in the decision-making and reasoning process. Not surprisingly then, a multitude of arguments are encountered and constructed on a daily basis as decisions are made at work and at home, in our social life and in our civic life. In spite of their ubiquity in our lives, most people are not particularly skilled in the interpretation or generation of arguments. At best, making sense of the often massive amount of argumentative online text on a topic of interest remains a daunting task. And while numerous tools exist for representing, modeling and visualizing arguments and argumentative discussions, they are limited by the substantial human effort required to input, organize and annotate arguments for use by the tools. Thus there exists a pressing need for, and this project aims to develop, automated techniques from the field of Natural Language Processing to support all facets of argumentation. This project will have a wide array of broader impacts, including providing other researchers with annotated datasets and tools for the analysis and generation of arguments, enhancing education through graduate and undergraduate mentoring, and promoting STEM education diversity through programs for middle and high school girls.

Reasoning with Large Models
LG AI Research

Large pre-trained language models have obtained impressive performance on diverse NLP tasks. However, their ability to reason about events in complex narratives is still limited. The overarching goal of this proposed project is to enable reasoning over different types of data based on large pre-trained Transformers. To address the above challenges, three concrete tasks are proposed, with the following outcomes. First, new models are proposed to perform reasoning over both structured (i.e., symbolic) and unstructured knowledge (in the forms of neural representation and text). Second, methods for automatically constructing prompts are developed to provoke the large models to retrieve embedded knowledge, which can be combined with externally provided knowledge for downstream inference tasks. Third, reasoning path generation models are developed that provide proofs, enhancing model interpretability and offering new perspectives for large model evaluation as well as a better understanding of the limitations of their reasoning ability.

A large portion of the ever-increasing amount of text, audio, and video data produced in today’s world is being generated by populations of emerging importance in lower-resource languages. This rich source of data is of little value if the information cannot be effectively searched. Launched in October 2017, the MATERIAL program is a 47-month venture seeking to address this challenge by building robust, automated language capabilities with limited linguistic resources, expertise, and tools.

MATERIAL’s ultimate goal is to build a Cross-Language Information Retrieval (CLIR) system that finds speech and text content in diverse lower-resource languages, using English search queries. This system will allow analysts to submit queries in English and receive short English summaries of relevant foreign language items that saliently display relevance to their information needs. Success is measured by a novel end-to-end retrieval metric that will assess the system’s ability to retrieve all relevant documents, while producing few false alarms.

Why do people consume and share fake news online? Previous work has shown that news consumption and sharing emerge from complex interactions among news sources, news content, and user characteristics: users consume and share ideologically aligned news and shun the opposite. This behavior is further complicated by fake news, which can amplify content’s ideological and emotional characteristics without the constraints of truth, and by peer sharing, which may reduce institutional barriers to fake news and amplify local, peer-to-peer polarization. Identifying the mechanisms by which peers amplify fake and polarized news remains challenging, however, because social media has coevolved with polarization and shifts in the news media landscape. To illuminate these mechanisms, we begin by developing new natural language processing (NLP) methods to measure the ideology and emotion of news content and to assess how the ideology and emotion of news content affect sharing and consumption. Having established this baseline, we then exploit a natural discontinuity to identify specifically peer-related effects on sharing: recent public changes in the Facebook algorithm abruptly shifted the balance between peer- and media-sourced news, allowing us to use difference-in-differences and other longitudinal estimators to measure changes in polarized or fake news and discover how peer sharing affects these tendencies. This combination of NLP, network, and discontinuity approaches should provide unique insights into the interactions between news, ideology, falsity, and peer sharing, and shed light on important questions such as how social media may have affected polarization, fake news, and political knowledge in the recent era.
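For readers unfamiliar with the estimator mentioned above, the following is a minimal sketch of a basic two-group, two-period difference-in-differences calculation. It is not the project's actual analysis: the split into "treated" and "control" groups, the function name `diff_in_diff`, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate.

    Subtracts the control group's before/after change from the treated
    group's before/after change, so that trends shared by both groups
    cancel out.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (e.g., share of polarized news in a feed).
# These numbers are invented for illustration, not taken from any study.
treated_pre = [0.10, 0.12, 0.08]   # more-exposed users, before the change
treated_post = [0.20, 0.22, 0.18]  # more-exposed users, after the change
control_pre = [0.10, 0.11, 0.09]   # less-exposed users, before the change
control_post = [0.12, 0.13, 0.11]  # less-exposed users, after the change

estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"difference-in-differences estimate: {estimate:.3f}")
```

The estimate is only interpretable as an effect of the change under the usual parallel-trends assumption, namely that both groups would have moved in the same way had the change not occurred.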

Meetings are a common way to collaborate, share information, and exchange opinions. Many available meeting transcripts, however, are lengthy, unstructured, and thus difficult to navigate. It would be time-consuming for users to access important meeting output by reading the full transcripts. Consequently, automatically generated meeting summaries are of great value to people and businesses alike, providing quick access to the essential content of past meetings. The core objective of this research project is to automatically generate abstract-style focused meeting summaries to help users digest the vast amount of meeting content easily. The project also helps the research community better understand the characteristics of the meeting domain, define the summarization task in meetings in a more consistent way, improve speech summarization evaluation metrics, and enable the wide use of speech summarization techniques in many applications (such as generating meeting minutes or lecture outlines). The broader impacts of this project include sharing insights on conversational text with social scientists, providing natural language processing research training to students, and contributing effective methods for meeting summarization to the general public.