
By Sarah Marquart

Four researchers from Cornell Tech received an Outstanding Paper Award at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), held in December. The winning paper, “Text Embeddings Reveal (Almost) As Much As Text,” was co-authored by Associate Professor of Computer Science Alexander “Sasha” Rush, Professor of Computer Science Vitaly Shmatikov, Assistant Professor of Computer Science Volodymyr Kuleshov, and PhD student Jack Morris.

The paper explores privacy concerns surrounding text embeddings, a natural language processing (NLP) technique for handling the nuanced and sometimes ambiguous nature of words and phrases. Machines can quickly and efficiently process numbers, but human language is much trickier, so text data is converted into numerical vectors that a machine learning algorithm can process adeptly. In some systems, such as those built on large language models, auxiliary data is stored as dense embeddings in a vector database until it needs to be retrieved.
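To make the idea concrete, the minimal sketch below shows how a handful of documents might be embedded and kept in a toy vector store; the encoder model name and the in-memory “database” are illustrative assumptions, not the systems examined in the paper.

```python
# A minimal sketch of embedding text and keeping only the vectors.
# The model name and toy "database" are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf encoder

documents = [
    "Patient John Smith was seen on March 3 for follow-up.",
    "Quarterly revenue projections remain confidential.",
]

# Each document becomes a dense vector; only these vectors are stored.
embeddings = model.encode(documents, normalize_embeddings=True)
vector_db = {i: vec for i, vec in enumerate(embeddings)}

# Retrieval compares a query vector against the stored vectors.
query_vec = model.encode(["patient follow-up visit"], normalize_embeddings=True)[0]
scores = {i: float(np.dot(vec, query_vec)) for i, vec in vector_db.items()}
print(max(scores, key=scores.get))  # index of the closest stored document
```

The question the paper raises is whether handing out those stored vectors is meaningfully safer than handing out the documents themselves.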

But just how private are these vector databases? If someone with malicious intent were to attempt to reverse engineer text embeddings, how much private information could they reveal about the original text?

As it turns out, quite a bit. Using a multi-step method called Vec2Text, the authors were able to exactly reconstruct 92 percent of the original texts in a dataset. Further, the team successfully retrieved 94 percent of first names, 95 percent of last names, and 89 percent of full names from a dataset of clinical notes. Their findings have profound implications for data privacy, especially in sensitive domains like healthcare.

“Large language models are causing us to rethink lots of assumptions about privacy and natural language. While it was known that this technique was theoretically possible, it was quite surprising to see it work so well on real instances,” says Rush.

The researchers conclude that text embeddings and raw data expose similar amounts of sensitive information. Consequently, they advocate for treating both with equal precautions, both technically and perhaps legally.


Beginning July 1, 2024, Samuel Curtis Johnson Graduate School of Management Nakashimato Professor of Marketing Manoj Thomas will assume the newly created role of Associate Dean of NYC Initiatives, signaling a new level of partnership between Cornell Tech and the SC Johnson College of Business.

At Cornell Tech, Thomas will work closely with and report to Jack and Rilla Neafsey Dean and Vice Provost of Cornell Tech Greg Morrisett, exploring strategic growth efforts for business programs at the Cornell Tech campus.

“Cornell Tech looks forward to working closely with Professor Thomas in his new role as Associate Dean of NYC Initiatives,” said Dean Morrisett. “His leadership will enhance the strategic opportunities presented by Cornell Tech’s unique mission and its partnership with Cornell’s SC Johnson College of Business.”

“I am delighted to be involved in this initiative to increase Cornell’s footprint in New York City,” said Professor Thomas. “Cornell has some unique strengths that can galvanize business education in NYC—the breadth of business programs at SC Johnson College of Business ranging from data science to hospitality, the advances in AI at Cornell Tech, and our deep-seated commitment to work for the greater good.”

In addition to this new role, Thomas will continue to serve as Senior Director of Programs for the Johnson School with responsibility for the EMBA programs and the Johnson Cornell Tech MBA program.

Johnson School Dean Vishal Gaur noted, “This important strategic role will prepare us well for the future. Our EMBA programs are largely NYC-based. Thus, locating this position in New York is very beneficial for those programs and will help strengthen our JCT Program, which, at nearly 10 years old, is positioned to enter the next stage of maturity. In addition, this will add to our faculty presence in New York.”

For the SC Johnson College of Business, Thomas will serve as a member of the College Leadership Team reporting to Charles Field Knight Dean Andrew Karolyi, with responsibility for leading the college’s strategic vision and execution for NYC-based executive education, co-curricular programming, faculty growth, and related activities.

“I am thrilled to have an exceptional scholar and academic leader in Professor Manoj Thomas take on this vital role for Cornell University,” said Dean Karolyi. “The SC Johnson College of Business continues to partner closely with Cornell Tech, building on existing programs and exploring new opportunities in NYC to pursue our shared mission.”


Director of the XR Collaboratory Harald Haraldsson, PhD student Shuo Feng, and Cornell Tech Assistant Professor Thijs Roumen posing with a custom wire-bent Cornell Tech Twisted T.

By Sarah Marquart

The XR Collaboratory (XRC) at Cornell Tech has selected two winning proposals for its inaugural XR Collaboratory Prototyping Grant, which provides technical guidance and monetary support for developing augmented and virtual reality (AR/VR) applications. Thijs Roumen, assistant professor of information science at Cornell Tech, who is supporting PhD student Shuo Feng, and François Guimbretière, professor of information science at Cornell University, who is supporting PhD student Mose Sakashita, are the recipients of the new grant award.

The XR Collaboratory at Cornell Tech accelerates activities in augmented and virtual reality through course offerings, projects, and cross-campus collaborations. The team’s primary interests are in 3D user interfaces and interaction design for head-mounted displays. They build high-fidelity prototypes with students through projects and coursework, and collaborate with Cornell faculty on exploratory AR/VR-related research across disciplines such as computer vision, computer graphics, and human-computer interaction.

“Conducting research involving the development of 3D user interfaces for AR/VR can be both challenging and time consuming,” says Harald Haraldsson, Director of the XR Collaboratory. “Our aim with this grant is to provide resources to help with applying best practices to AR/VR prototyping for research purposes. This includes how to develop an inventory of modular, reusable components, being consistent across the research team in the use of design patterns and code style, all with the objective of speeding up the prototyping process and ensuring continuity from project to project.”

Cornell Bowers CIS Professor François Guimbretière and PhD student Mose Sakashita posing with the VRoxy robotic system.

Roumen and Feng’s XR project focuses on wire bending, a crucial aspect of industrial manufacturing in which items like springs, paperclips, and hooks are mass-produced. Recently, there’s been a growing interest in custom wire bending for medical applications, entertainment, soft robotics, and fabrication support. To enhance the design process and meet this need, the team has proposed building an AR interface to design wire-bent structures.

Guimbretière and Sakashita’s ongoing work explores the potential of integrating VR with physical robots to enhance remote collaboration. It’s based on a prior initiative by the pair called the VRoxy system, a mobile robot with a monitor for a head that can be present on a remote user’s behalf. While VRoxy can mirror the user’s non-verbal cues — such as pointing or face tilting — in real-time, the bot can’t physically interact with objects, limiting certain types of collaboration. Guimbretière and Sakashita’s proposal includes addressing these limitations by implementing a 3D user interface that enables the remote VR user to navigate larger environments and manipulate physical objects with robotic arms.


By Tom Fleischman

A consortium aiming to make New York a global leader in artificial intelligence would help Cornell play a role in shaping the future of AI, promote responsible research and development, create jobs and unlock opportunities focused on the public good.

Cornell is one of seven institutions in the state poised to become part of Empire AI, proposed by Gov. Kathy Hochul during her State of the State address on Jan. 9.

The $400 million consortium, which also includes industry partners, would create and launch a state-of-the-art AI computing facility in upstate New York. In addition to providing computing power, the facility would prioritize sustainability in both power generation and cooling of the system’s hardware.

“Our reputation across the globe has always been synonymous with boldness and innovation. So where else but New York should this be happening?” Hochul said. “AI is already the single most consequential technological commercial advancement since the invention of the internet. Global AI has already been valued at $100 billion just last year, and it’s brand new. And it’s projected to reach $1.3 trillion by 2030.”

The seven founding institutions would be Cornell, Columbia University, New York University, Rensselaer Polytechnic Institute, the State University of New York (SUNY), the City University of New York (CUNY), and the Simons Foundation and its research partner, the Flatiron Institute.

“I am excited to see the development of this shared computing facility, which will fast-track cutting-edge research and responsible AI tools to the benefit of all New Yorkers,” President Martha E. Pollack said. “As artificial intelligence promises to transform our economy, accelerate medical breakthroughs, and offer unprecedented tools for research, it is imperative that academic research institutions like Cornell partner to optimize AI technology in service of the public good.”

The project, which needs approval from the state Legislature, would be funded by public and private investment. That includes $275 million from the state in grant and other funding, and contributions from Cornell and other founding members and individuals, including philanthropist Tom Secunda, a co-founder of Bloomberg LP and one of the driving forces behind the consortium.

“Cornell faculty, staff and students are innovating and using AI computing approaches to address societal challenges – from sustainable agriculture to improved urban design to personalized medicine and health,” said Krystyn J. Van Vliet, vice president for research and innovation. “We look forward to Empire AI enabling computing resources in New York state to advance purposeful, cutting-edge AI research. We’re excited to support this effort to put New York at the forefront of artificial intelligence innovation, and to advance research, innovation and translation that develops and uses AI creatively and responsibly for the greater good.”

Van Vliet said such a shared facility can be located in a place where renewable energy for power and cooling is accessible and inexpensive. “That really aligns with Cornell’s 2030 Project and sustainability goals for Cornell’s research community,” she said.

The consortium could be transformational for Cornell, said Greg Morrisett, the Jack and Rilla Neafsey Dean and Vice Provost of Cornell Tech.

“What Empire AI can provide is a real jolt for the university overall, to explore a range of opportunities,” he said. “Without it, there’s a whole set of things that we just won’t be able to explore that we’re going to have to leave to industry to explore. And industry doesn’t necessarily have the same incentives that we do around, for example, fairness, accountability, transparency, all of the things that academic researchers are going to push the frontiers on.”

Kavita Bala, the dean of the Cornell Ann S. Bowers College of Computing and Information Science and lead on the Cornell AI Initiative, agreed.

“This can help us recruit incredible talent in AI,” she said. “This will give AI researchers across campus the opportunity to realize their vision and do the kind of research we want and need to do.”


Have you recently completed your PhD and want to harness your deep tech expertise to start a company? Want to do it in New York City and get funding from day one? Want to build a startup while keeping your academic track record? If so, consider the Runway Startup Postdoc Program.

The Runway Startup Postdoc Program is part business school, part research institution, and part startup incubator. Based at the Jacobs Technion-Cornell Institute, Runway ushers recent PhDs in digital technology fields through a paradigm shift — from an academic mindset to an entrepreneurial outlook.

Startup Postdocs arrive with ideas for unproven products and markets that require time and specialized guidance to develop. These startups demand more than a few months to launch. They need a bit of a “runway.” That’s why our program lasts 12–24 months and incorporates academic and business mentorship.

Runway provides an impressive package valued at $175,000 in the first year and $102,000 in the second year, which includes a salary, research budget, housing allowance, space and more. Learn more about benefits and perks here.

Runway also provides corporate support and a new approach to intellectual property. You can learn more about Runway companies built at Cornell Tech here and more about the Runway program on our FAQ page.

 

Apply Now


The Siegel PiTech PhD Impact Fellowship supports Cornell Tech PhD students in technical fields to conduct 12-week summer externships with nonprofit and public sector organizations across NYC. Students immerse themselves in real-world projects, gain exposure to the technology challenges facing public interest organizations, and contribute critical skills and expertise to advancing their host organization’s mission.

The fellowship term is 12 weeks between June and August. Impact Fellows work 20 hours per week, to leave enough space in their schedules for independent thesis research.

We accept applications through 5 PM EST on February 5, 2024. We select 8 to 12 Impact Fellows per cohort.

 

Apply Now


By Sarah Marquart

The computing sector accounted for an estimated three percent of global CO2 emissions in 2022 — more than Spain, Italy, France, and Portugal combined. Still, compared to the staggering impacts of fossil fuels, that number may seem insignificant. But every emission counts when it comes to combating the climate crisis, and this sector is only expected to grow over the coming decades as the demand for computing increases.

That’s why tech giants like Amazon, Google, Meta, Microsoft, and Intel are committing to sustainable computing — to curb this potential impact. Joining them in this collective action is Udit Gupta (BS ‘16), assistant professor at the Jacobs Technion-Cornell Institute at Cornell Tech and member of the School of Electrical and Computer Engineering at Cornell University.

Gupta will have the opportunity to confront these challenges directly through projects funded by two grants he received from the U.S. National Science Foundation (NSF) through its Design for Environmental Sustainability in Computing program.

“There is a dire need to think critically about sustainable computing,” Gupta stresses. “Already computing accounts for three percent of worldwide emissions. But the total environmental impact is even broader in terms of energy consumed by data centers and charging mobile phones, [the] water consumed to run data centers and semiconductor fabs, and e-waste of discarded devices.”

Gupta received a $2 million grant to study the environmental impact of edge computing devices — such as smartwatches, tablets, Internet of Things (IoT) devices, sensors, and smartphones — and address their escalating environmental toll. Joining him in the effort are Professor Amit Lal from the Cornell School of Electrical and Computer Engineering, Associate Professor Vijay Janapa Reddi of Harvard University, and Associate Professor Josiah Hester and Professor Omer Inan of Georgia Tech.

With the funding, the team plans to develop an end-to-end, open-source framework named Delphi, a sort of toolkit for future designers, engineers, and manufacturers to reference as they create the next generation of edge computing devices. The suite of design tools will emphasize environmental impact, sustainability, and longevity without sacrificing user experience or performance.

To create Delphi, the researchers will collect a groundbreaking dataset documenting the actual emissions and resources associated with the creation of edge devices. Not only will this data help with the development of Delphi, but it will also help establish an Electronic Sustainability Record for edge devices — sort of like the nutrition labels on our food. According to Gupta, this transparency is crucial for both consumers and manufacturers.

“Electronic Sustainability Records are a key way to raise awareness of the environmental impact of devices,” Gupta explains. “Consumers get visibility on the climate impact their products have, empowering them to make sustainability-focused decisions. For manufacturers, the electronics sustainability records enable fine-grained tracking of individual components over the lifetime of devices; this allows us to balance device performance, efficiency, and application quality with sustainability.”
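As a rough illustration of the concept, the sketch below shows one way a per-component sustainability record could be represented in code. The field names and figures are hypothetical placeholders, not a format defined by the Delphi project.

```python
# A toy sketch of a per-component electronic sustainability record.
# Field names and numbers are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class ComponentRecord:
    name: str
    embodied_kg_co2e: float        # manufacturing emissions attributed to the part
    water_liters: float            # water used during fabrication
    expected_lifetime_years: float

@dataclass
class SustainabilityRecord:
    device: str
    components: list[ComponentRecord] = field(default_factory=list)

    def total_embodied_kg_co2e(self) -> float:
        return sum(c.embodied_kg_co2e for c in self.components)

record = SustainabilityRecord(
    device="example smartwatch",
    components=[
        ComponentRecord("SoC", embodied_kg_co2e=8.0, water_liters=1500,
                        expected_lifetime_years=5),
        ComponentRecord("display", embodied_kg_co2e=3.5, water_liters=400,
                        expected_lifetime_years=4),
    ],
)
print(f"{record.device}: {record.total_embodied_kg_co2e():.1f} kg CO2e embodied")
```

Tracking components at this granularity is what would let a manufacturer weigh performance and efficiency against sustainability over a device’s lifetime, as Gupta describes.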

These types of disclosures, tools, and resources couldn’t come at a better time, as 2023 marks the hottest year on record, drawing more eyes to the problem with each new generation.

In September 2023, Gupta also received a $300,000 Early-concept Grant for Exploratory Research (EAGER) award from the NSF to develop cloud infrastructures for designing sustainable electronics. As principal investigator, Gupta hopes to use cloud resources to build a shared community infrastructure and tools to measure the carbon footprint of computing platforms across their lifetimes — from assembly line to daily operation. Future engineers and designers can use these resources to weigh sustainability statistics alongside traditional performance metrics.
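In simplified terms, the kind of accounting such tools would support adds embodied emissions from manufacturing to operational emissions from use. The sketch below walks through that arithmetic with placeholder numbers, which are assumptions for illustration rather than measurements from Gupta’s projects.

```python
# A toy lifetime carbon estimate: embodied (manufacturing) emissions plus
# operational emissions from electricity use. All constants are assumed values.

EMBODIED_KG_CO2E = 60.0          # assumed manufacturing footprint of one device
AVG_POWER_WATTS = 2.5            # assumed average power draw during use
HOURS_PER_YEAR = 365 * 24
GRID_KG_CO2E_PER_KWH = 0.4       # assumed grid carbon intensity

def lifetime_footprint_kg(years_of_use: float) -> float:
    """Embodied plus operational CO2e over the device's service life."""
    energy_kwh = AVG_POWER_WATTS / 1000 * HOURS_PER_YEAR * years_of_use
    return EMBODIED_KG_CO2E + energy_kwh * GRID_KG_CO2E_PER_KWH

for years in (2, 4, 8):
    print(f"{years} years of use: {lifetime_footprint_kg(years):.1f} kg CO2e")
```

With assumed numbers like these, the embodied share dominates for short-lived devices, which is one reason device longevity figures so prominently in sustainable-computing research.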


By Tom Fleischman, Cornell Chronicle

Researchers from Cornell Tech have developed a method to identify delays in the reporting of incidents such as downed trees and power lines, which could lead to practical insights and interventions for more equitable, efficient government service.

Their method, which works without knowing exactly when an incident occurred, uses the frequency of reports of the same incident by separate individuals to estimate how long it took for the incident to be first reported. The first report establishes that the incident occurred, and subsequent reports are used to establish the reporting rate.

Applying their method to more than 1 million incident reports in New York City and Chicago, the researchers also determined that a neighborhood’s socioeconomic characteristics are correlated with reporting rates.

“We’ve devised a fairly general method that works for a large class of these problems, known as ‘benchmark problems,’ where you can get duplicate reports of an incident,” said Nikhil Garg, assistant professor of operations research and information engineering (ORIE) at Cornell Tech, as part of the Jacobs Technion-Cornell Institute.

Garg is senior author of “Quantifying Spatial Under-reporting Disparities in Resident Crowdsourcing,” which published Dec. 5 in Nature Computational Science.

“We’re optimistic that this method can be used to understand underreporting,” he said, “not just in 311 (citizen ‘hotline’) systems, but more broadly where these benchmark problems appear.”

Garg’s co-authors are Zhi Liu, lead student author and a doctoral student in ORIE, and Uma Bhandaram, deputy chief for data systems and analytics for the New York City Department of Parks and Recreation.

Crowdsourcing is an essential component of city management; crews can’t be everywhere at the same time, and they rely on residents to report issues to the proper authorities so they can be addressed. Large cities – including New York, Chicago, Los Angeles and Houston, the four largest U.S. cities – have reporting systems that residents can log into to report problems.

“The 311 system is a big one,” Garg said. “New York City, for instance, can’t know where all the problems are all the time with something like 700,000 street trees – NYC gets over 3 million service requests a year from the public. For us, this started with a general question: Who is actually participating in all of these participatory mechanisms underlying government?”

“That’s also one of the questions that city agencies are interested in – the fact that people behave differently,” Liu said. “So how do they respond to these requests?”

Garg and Liu’s model takes the available information – the occurrence of an incident and the public’s reporting behavior related to it – and converts it into a Poisson rate estimation task; a Poisson model gives the probability of a given number of events occurring in a fixed interval of time or space.

Without knowing exactly when the incident happened, the method uses the number of reports between the time of the first report (but not including it) and an estimated incident resolution time to quantify an incident’s rate function. The method could allow city managers to determine the reporting rates of different types of incidents in different neighborhoods, and address problems more equitably.
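In simplified form, the estimation reduces to a Poisson rate calculation: count the duplicate reports that arrive after the first one and divide by the length of the observation window. The sketch below illustrates that core idea with made-up numbers; it is not the authors’ exact procedure, which also accounts for incident characteristics.

```python
# A minimal sketch of the core idea: if duplicate reports arrive as a Poisson
# process, the reporting rate is the number of reports after the first one
# divided by the observation window. Inputs below are made up for illustration.

def estimate_report_rate(first_report_day: float,
                         resolution_day: float,
                         num_later_reports: int) -> float:
    """Maximum-likelihood Poisson rate (reports per day) after the first report."""
    window = resolution_day - first_report_day
    if window <= 0:
        raise ValueError("resolution must come after the first report")
    return num_later_reports / window

# Example: first report on day 2, incident resolved on day 12,
# and 5 duplicate reports arrived in between.
rate = estimate_report_rate(first_report_day=2.0, resolution_day=12.0,
                            num_later_reports=5)
print(f"estimated reporting rate: {rate:.2f} reports/day")
# Under a Poisson model, the expected wait before the first report is 1/rate,
# so a higher rate implies a shorter reporting delay.
print(f"implied expected delay to first report: {1 / rate:.1f} days")
```

Comparing these estimated rates across neighborhoods, rather than raw report counts, is what lets the method separate how often incidents occur from how quickly residents report them.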

The researchers applied their method to more than 100,000 resident reports made to the New York City Department of Parks and Recreation, and to more than 900,000 reports made to the Chicago Department of Transportation and Department of Water Management. Even after controlling for incident characteristics, such as the level of emergency response needed, they found that some neighborhoods reported incidents three times faster than others.

The disparities corresponded to socioeconomic characteristics of the neighborhoods. In New York City, reporting rates were positively correlated with higher population density; the fraction of people with college degrees; income; and the fraction of the population that is white.

The researchers were able to further validate their method by testing it on incidents for which exact times were known.

“We find overwhelming evidence that people use 311 systems differently,” Liu said. “And when we’re thinking about the downstream response to those reports, this can serve as a very good reference point. Say no one reports an incident and it’s been sitting there for a prolonged period: We might want to respond to it faster, so that the overall delay is similar across neighborhoods.”

And as Liu said, their system promotes equity in terms of responding to the most urgent problem first.

“One key finding is that equity and efficiency don’t have to trade off,” he said. “Sometimes they’re in accordance – the most severe incidents should be addressed across the city at a faster rate, no matter where they are. So in that sense, equity and efficiency are actually aligned.”

Said Garg: “There’s so much work left to do, and that our team is continuing to do, to make these systems more efficient and equitable.”

This work was funded in part by the Urban Tech Hub at Cornell Tech.

This story originally appeared in the Cornell Chronicle.


By Jim Schnabel, Weill Cornell Medicine

Researchers at Weill Cornell Medicine, Cornell Tech and Cornell’s Ithaca campus have demonstrated the use of artificial-intelligence (AI)-selected natural images and AI-generated synthetic images as neuroscientific tools for probing the visual processing areas of the brain. The goal is to apply a data-driven approach to understand how vision is organized while potentially removing biases that may arise when looking at responses to a more limited set of researcher-selected images.

In the study, published Oct. 23 in Communications Biology, the researchers had volunteers look at images that had been selected or generated based on an AI model of the human visual system. The images were predicted to maximally activate several visual processing areas. Using functional magnetic resonance imaging (fMRI) to record the brain activity of the volunteers, the researchers found that the images did activate the target areas significantly better than control images.

The researchers also showed that they could use this image-response data to tune their vision model for individual volunteers, so that images generated to be maximally activating for a particular individual worked better than images generated based on a general model.

“We think this is a promising new approach to study the neuroscience of vision,” said study senior author Amy Kuceyeski, professor of mathematics in radiology and of mathematics in neuroscience in the Feil Family Brain and Mind Research Institute at Weill Cornell Medicine.

The study was a collaboration with the laboratory of Mert Sabuncu, professor of electrical and computer engineering at Cornell Engineering and at Cornell Tech, and of electrical engineering in radiology at Weill Cornell Medicine. The study’s first author was Dr. Zijin Gu, who was a doctoral student co-mentored by Sabuncu and Kuceyeski at the time of the study.

Making an accurate model of the human visual system, in part by mapping brain responses to specific images, is one of the more ambitious goals of modern neuroscience. Researchers have found, for example, that one visual processing region may activate strongly in response to an image of a face, whereas another may respond to a landscape. Scientists must rely mainly on noninvasive methods in pursuit of this goal, given the risk and difficulty of recording brain activity directly with implanted electrodes. The preferred noninvasive method is fMRI, which essentially records changes in blood flow in small vessels of the brain – an indirect measure of brain activity – as subjects are exposed to sensory stimuli or otherwise perform cognitive or physical tasks. An fMRI machine can read out these tiny changes in three dimensions across the brain, at a resolution on the order of cubic millimeters.

For their own studies, Kuceyeski and Sabuncu and their teams used an existing dataset comprising tens of thousands of natural images, with corresponding fMRI responses from human subjects, to train an AI-type system called an artificial neural network (ANN) to model the human brain’s visual processing system. They then used this model to predict which images, across the dataset, should maximally activate several targeted vision areas of the brain. They also coupled the model with an AI-based image generator to generate synthetic images to accomplish the same task.
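Schematically, the selection step amounts to scoring candidate images with a model that predicts activation in a target brain region and keeping the top scorers. The sketch below illustrates that loop with a stand-in network and random placeholder images; it is an assumed toy setup, not the study’s actual encoder, training data, or brain-region definitions.

```python
# A schematic sketch: score candidate images with a model that predicts fMRI
# activation in one target region, then keep the images predicted to activate
# it most. The network and images below are placeholders, not the study's.
import torch
import torch.nn as nn

class ActivationPredictor(nn.Module):
    """Toy encoder mapping an image to a predicted activation for one region."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, 1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(images)).squeeze(-1)

model = ActivationPredictor()             # in the study, trained on image/fMRI pairs
candidates = torch.rand(1000, 3, 64, 64)  # placeholder for a large natural-image set

with torch.no_grad():
    predicted = model(candidates)           # predicted activation per image
top_images = predicted.topk(10).indices     # indices of the predicted top activators
print(top_images.tolist())
```

The synthetic-image variant swaps the fixed candidate pool for an image generator whose outputs are steered by the same kind of predictor.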

“Our general idea here has been to map and model the visual system in a systematic, unbiased way, in principle even using images that a person normally wouldn’t encounter,” Kuceyeski said.

The researchers enrolled six volunteers and recorded their fMRI responses to these images, focusing on the responses in several visual processing areas. The results showed that, for both the natural images and the synthetic images, the predicted maximal activator images, on average across the subjects, did activate the targeted brain regions significantly more than a set of images that were selected or generated to be only average activators. This supports the general validity of the team’s ANN-based model and suggests that even synthetic images may be useful as probes for testing and improving such models.

In a follow-on experiment, the team used the image and fMRI-response data from the first session to create separate ANN-based visual system models for each of the six subjects. They then used these individualized models to select or generate predicted maximal-activator images for each subject. The fMRI responses to these images showed that, at least for the synthetic images, there was greater activation of the targeted visual region, a face-processing region called FFA1, compared to the responses to images based on the group model. This result suggests that AI and fMRI can be useful for individualized visual-system modeling, for example to study differences in visual system organization across populations.

The researchers are now running similar experiments using a more advanced version of the image generator, called Stable Diffusion.

The same general approach could be useful in studying other senses such as hearing, they said. Kuceyeski also hopes ultimately to study the therapeutic potential of this approach.

“In principle, we could alter the connectivity between two parts of the brain using specifically designed stimuli, for example to weaken a connection that causes excess anxiety,” she said.

Many Weill Cornell Medicine physicians and scientists maintain relationships and collaborate with external organizations to foster scientific innovation and provide expert guidance. The institution makes these disclosures public to ensure transparency. For this information, see the profile for Amy Kuceyeski.

Jim Schnabel is a freelance writer for Weill Cornell Medicine.

This story originally appeared in the Cornell Chronicle.


By Patricia Waldron, Cornell Ann S. Bowers College of Computing and Information Science

As journalists and professional fact-checkers struggle to keep up with the deluge of misinformation online, fact-checking sites that rely on loosely coordinated contributions from volunteers, such as Wikipedia, can help fill the gaps, Cornell research finds.

In a new study, Andy Zhao, a doctoral candidate in information science based at Cornell Tech, compared professional fact-checking articles to posts on Cofacts, a community-sourced fact-checking platform in Taiwan. He found that the crowdsourced site often responded to queries more rapidly than professionals and handled a different range of issues across platforms.

“Fact-checking is a core component of being able to use our information ecosystem in a way that supports trustworthy information,” said senior author Mor Naaman, professor of information science at the Jacobs Technion-Cornell Institute at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science. “Places of knowledge production, like Wikipedia and Cofacts, have proved so far to be the most robust to misinformation campaigns.”

The study, “Insights from a Comparative Study on the Variety, Velocity, Veracity, and Viability of Crowdsourced and Professional Fact-Checking Services,” published Sept. 21 in the Journal of Online Trust and Safety.

The researchers focused on Cofacts because it is a crowdsourced fact-checking model that had not been well-studied. The Taiwanese government, civil organizations and the tech community established Cofacts in 2017 to address the challenges of both malicious and innocent misinformation – partially in response to efforts by the Chinese government to use disinformation to push public opinion in Taiwan in a more pro-China direction. Much like Wikipedia, anyone on Cofacts can become an editor and post answers, submit questions, and upvote or downvote responses. Cofacts also has a bot that fact-checks claims in a popular messaging app.

Starting with more than 60,000 crowdsourced fact-checks and 2,641 professional fact-checks, Zhao used natural language processing to match up responses posted on Cofacts with articles addressing the same questions on two professional fact-checking sites. He looked at how quickly the sites posted responses to queries, the accuracy and persuasiveness of the responses and the range of topics covered.
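As a rough sketch of this kind of matching, the example below embeds one crowdsourced post and a few professional articles and pairs the post with its nearest neighbor; the multilingual model name and the similarity threshold are assumptions for illustration, not the pipeline used in the study.

```python
# A simplified sketch of matching a crowdsourced fact-check to professional
# articles via sentence embeddings. Model name and threshold are assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed encoder

cofacts_posts = ["This message claims drinking hot water cures the flu."]
professional_articles = [
    "Fact check: hot water does not cure influenza.",
    "Fact check: the new tax rule applies only to imports.",
]

post_vecs = model.encode(cofacts_posts, normalize_embeddings=True)
article_vecs = model.encode(professional_articles, normalize_embeddings=True)

similarity = post_vecs @ article_vecs.T        # cosine similarities (normalized vectors)
best = similarity.argmax(axis=1)               # nearest professional article per post
for i, j in enumerate(best):
    if similarity[i, j] > 0.5:                 # assumed threshold for a usable match
        print(f"post {i} matched to article {j} (score {similarity[i, j]:.2f})")
```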

He found the Cofacts users often responded faster than journalists, but mostly because they could “stand on the shoulders of giants” and repurpose existing articles from professionals. In this way, Cofacts acts as a distributor for information. “They carry those stories across language, across the nation, or across time, to this exact moment to answer people’s questions,” Zhao said.

Importantly, Zhao found that the Cofacts posts were just as accurate as the professional sources. And according to seven native Taiwanese graduate students who acted as raters, articles by journalists were more persuasive, but Cofacts posts often were clearer.

Further analysis showed the crowdsourced site covered a slightly different range of topics compared with those addressed by professionals. Posts on Cofacts were more likely to address recent and local issues – such as regional politics and small-time scams – while journalists were more likely to write about topics requiring expertise, including health claims and international affairs.

“We can leverage the power of the crowds to counter misinformation,” Zhao concluded. “Misinformation comes from everywhere, and we need this battle to happen in all corners.”

The need for fact-checking is likely to continue to grow. While it’s not yet clear how generative artificial intelligence (AI) models, such as ChatGPT or Midjourney, will impact the information landscape, Naaman and Zhao said it is possible that AI programs that generate text and fake images may make it even easier to create and spread misinformation online.

However, despite the success of Cofacts in Taiwan, Zhao and Naaman caution that the same approach may not transfer to other countries. “Cofacts has built on the user habits, the cultures, the background, and political and social structures of Taiwan, which is how they succeed,” Zhao said.

But understanding Cofacts’ success may assist in the design of other fact-checking systems, especially in non-English-speaking regions that have access to few, if any, fact-checking resources.

“Understanding how well that kind of model works in different settings could hopefully provide some inspiration and guidelines to people who want to execute similar endeavors in other places,” Naaman said.

The study received partial support from the National Science Foundation.

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.

This story originally appeared in the Cornell Chronicle.