Output list
Conference proceeding
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
Date presented 11/2025
Findings of the Association for Computational Linguistics: EMNLP 2025, 4574 - 4592
2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), 04/11/2025–09/11/2025, Suzhou, China
Large language models (LLMs) face persistent challenges when handling long-context tasks, most notably the "lost in the middle" issue, where information located in the middle of a long input tends to be underutilized. Existing methods that reduce the input risk discarding key information, while those that extend context windows often lead to attention dispersion. To address these limitations, we propose Tree of Agents (TOA), a multi-agent reasoning framework that segments the input into chunks processed by independent agents. Each agent generates its local cognition, and agents then dynamically exchange information for collaborative reasoning along tree-structured paths. TOA enables agents to probe different reasoning orders for multi-perspective understanding, effectively mitigating position bias and reducing hallucinations. To improve processing efficiency, we incorporate prefix-hash caching and adaptive pruning strategies, achieving significant performance improvements with comparable API overhead. Experiments show that TOA, powered by the compact LLaMA3.1-8B, significantly outperforms multiple baselines and performs comparably to much larger recent commercial models, such as Gemini1.5-pro, on various long-context tasks. Code is available at https://github.com/Aireduce952/Tree-of-Agents.
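As a rough, non-authoritative illustration of the workflow the abstract outlines (chunking, per-agent local cognition, tree-structured exploration of reasoning orders with prefix caching and pruning), consider the Python sketch below. Every function name, the toy relevance score, and the beam-style pruning are assumptions made for illustration, not the authors' implementation:

```python
from functools import lru_cache

def split_into_chunks(text, size=200):
    """Segment the input; each chunk is handled by one independent agent."""
    return tuple(text[i:i + size] for i in range(0, len(text), size))

def local_cognition(chunk):
    """Stand-in for an agent's LLM call that distills its own chunk."""
    return chunk.lower()

@lru_cache(maxsize=None)
def score_prefix(prefix):
    """Prefix caching: a shared chunk-order prefix is scored only once.
    Toy relevance score: occurrences of a query term in the ordered prefix."""
    return sum(local_cognition(c).count("answer") for c in prefix)

def explore_orders(chunks, beam=3):
    """Walk tree-structured reasoning paths (orderings of chunk indices),
    keeping only the `beam` best partial paths at each depth as a
    stand-in for adaptive pruning."""
    frontier = [()]
    for _ in range(len(chunks)):
        children = [path + (i,) for path in frontier
                    for i in range(len(chunks)) if i not in path]
        frontier = sorted(
            children,
            key=lambda p: score_prefix(tuple(chunks[i] for i in p)),
            reverse=True)[:beam]
    return frontier[0]  # best full reasoning order found

doc = "filler " * 30 + "the answer is 42. " + "filler " * 30
chunks = split_into_chunks(doc, size=40)
best = explore_orders(chunks)
print(best, score_prefix(tuple(chunks[i] for i in best)))
```

Here `lru_cache` plays the role of prefix-hash caching: any two reasoning paths that share a chunk-order prefix reuse the cached score instead of re-evaluating it.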
Conference paper
Date presented 08/2025
Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025), AI4Tech: AI Enabling Technologies track, 16/08/2025–22/08/2025, Montreal, Canada
Fake news and misinformation pose a significant threat to society, making efficient mitigation essential. However, manual fact-checking is costly and lacks scalability. Large Language Models (LLMs) offer promise in automating counter-response generation to mitigate misinformation, but a critical challenge lies in their tendency to hallucinate non-factual information. Existing models mainly rely on LLM self-feedback to reduce hallucination, but this approach is computationally expensive. In this paper, we propose MisMitiFact, Misinformation Mitigation grounded in Facts, an efficient framework for generating fact-grounded counter-responses at scale. MisMitiFact generates simple critique feedback to refine LLM outputs, ensuring responses are grounded in evidence. We develop lightweight, fine-grained critique models, trained on data sourced from readily available fact-checking sites, to identify and correct errors in key elements such as numerals, entities, and topics in LLM generations. Experiments show that MisMitiFact generates counter-responses of comparable quality to LLMs' self-feedback while using significantly smaller critique models. Importantly, it achieves a ∼5x increase in feedback generation throughput, making it highly suitable for cost-effective, large-scale misinformation mitigation. Code and LLM prompt templates are at https://github.com/xxfwin/MisMitiFact.
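The critique-and-refine loop described above can be pictured with the toy sketch below, which checks one key element (numerals) of an LLM draft against fact-checking evidence. The regex and the generate/refine stubs are assumptions; a real deployment would use the trained fine-grained critique models and re-prompt the LLM with the feedback:

```python
import re

def extract_numerals(text):
    """Pull numeric tokens (integers and decimals) out of a string."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def numeral_critique(draft, evidence):
    """Lightweight stand-in for a fine-grained critique model: flags
    numbers in the draft that are unsupported by the evidence."""
    unsupported = extract_numerals(draft) - extract_numerals(evidence)
    return [f"numeral '{n}' not found in evidence" for n in sorted(unsupported)]

def generate(prompt):
    """Placeholder LLM call producing a (hallucinated) draft response."""
    return "Vaccines caused 500 deaths last year."

def refine(draft, critiques):
    """Placeholder for re-prompting the LLM with the critique feedback;
    here we just surface the feedback alongside the draft."""
    if not critiques:
        return draft
    return f"[needs revision: {'; '.join(critiques)}] {draft}"

evidence = "Official records attribute 3 deaths to the vaccine last year."
draft = generate("counter-response prompt")
print(refine(draft, numeral_critique(draft, evidence)))
```

Because the critique model only has to spot element-level mismatches rather than generate full text, it can be far smaller than the LLM it supervises, which is where the throughput gain comes from.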
Conference paper
FineRR-ZNS: Enabling Fine-Granularity Read Refreshing for ZNS SSDs
Date presented 06/2025
2025 62nd ACM/IEEE Design Automation Conference (DAC), 22/06/2025–25/06/2025, San Francisco, CA
Zoned namespace (ZNS) SSDs are emerging storage devices offering low cost, high performance, and software definability. By adopting host-managed, zone-based sequential programming, ZNS SSDs effectively eliminate the space overhead associated with on-board DRAM memory and garbage collection. However, while background read refreshing (RR) serves as a data protection mechanism in conventional block-interface SSDs, state-of-the-art ZNS SSDs lack read refreshing functionality to guarantee data reliability. Moreover, implementing zone-level read refreshing in ZNS SSDs incurs significant overhead due to the large volume of valid data movements within a zone, leading to degraded I/O performance. To efficiently enable read refreshing for ZNS SSDs, this paper proposes FineRR-ZNS, a fine-granularity read refreshing mechanism for ZNS SSDs. FineRR-ZNS employs a host-controlled, fine-granularity read refreshing scheme that selectively performs block-level read refreshing via metadata remapping. A zone reconstruction method is also designed to retrieve remapped data and restore a complete zone during zone-level RR. Notably, remapped data remain available and are prioritized for read access after zone reconstruction until their respective blocks require the next RR. Evaluation results show that FineRR-ZNS significantly enhances read refreshing efficiency and I/O throughput compared to zone-level read refreshing implemented in the state-of-the-art ZenFS file system.
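A toy simulation of the block-level read-refresh idea: rather than rewriting a whole zone when reads disturb a single block, only the hot block is copied out and tracked in a remap table, and zone reconstruction later merges remapped blocks back into a complete sequential image. The threshold and data structures below are illustrative assumptions, not the FineRR-ZNS design:

```python
READ_DISTURB_LIMIT = 3  # assumed threshold before a block needs refreshing

class Zone:
    def __init__(self, blocks):
        self.blocks = list(blocks)      # sequentially programmed block contents
        self.reads = [0] * len(blocks)  # per-block read-disturb counters
        self.remap = {}                 # block index -> refreshed copy (metadata remapping)

    def read(self, i):
        self.reads[i] += 1
        if self.reads[i] >= READ_DISTURB_LIMIT and i not in self.remap:
            # Block-level read refresh: copy out just this block instead of
            # migrating the entire zone's valid data.
            self.remap[i] = self.blocks[i]
            self.reads[i] = 0
        # Remapped copies stay available and are served preferentially.
        return self.remap.get(i, self.blocks[i])

    def reconstruct(self):
        """Zone reconstruction: merge remapped blocks back into a complete,
        sequential zone image, as needed for a later zone-level refresh."""
        return [self.remap.get(i, b) for i, b in enumerate(self.blocks)]

zone = Zone(["b0", "b1", "b2"])
for _ in range(4):
    zone.read(1)                 # block 1 becomes read-hot and is remapped
print(zone.remap, zone.reconstruct())
```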
Conference paper
Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales
Date presented 02/05/2025
Findings of the Association for Computational Linguistics: NAACL 2025, 29/04/2025–04/05/2025, Albuquerque, New Mexico
Number-focused headline generation is a summarization task requiring both high textual quality and precise numerical accuracy, which poses a unique challenge for Large Language Models (LLMs). Existing studies in the literature focus on either textual quality or numerical reasoning alone and are thus inadequate to address this challenge. In this paper, we propose a novel chain-of-thought framework that uses rationales comprising the key elements of Topic, Entities, and Numerical reasoning (TEN) in news articles to enhance the capability of LLMs to generate topic-aligned, high-quality texts with precise numerical accuracy. Specifically, a teacher LLM is employed to generate TEN rationales as supervision data, which are then used to teach and fine-tune a student LLM. Our approach teaches the student LLM to automatically generate rationales, with enhanced capability for numerical reasoning and topic-aligned numerical headline generation. Experiments show that our approach achieves superior performance in both textual quality and numerical accuracy.
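The teacher-student setup can be sketched as a data-construction step: a teacher LLM emits a TEN rationale for each article, and the (article, rationale followed by headline) pair becomes fine-tuning data for the student, so numerical reasoning is made explicit before the headline is generated. The prompt layout, field names, and stub content below are assumptions, not the paper's exact format:

```python
def teacher_rationale(article: str) -> dict:
    """Placeholder for the teacher LLM call; returns the three key elements
    (Topic, Entities, Numerical reasoning) for the given article."""
    return {"topic": "quarterly earnings",
            "entities": ["Acme Corp"],
            "numerical_reasoning": "revenue rose from $2.0B to $2.3B, i.e. +15%"}

def build_training_example(article: str, gold_headline: str) -> dict:
    r = teacher_rationale(article)
    # The student is fine-tuned to emit the rationale first, then the
    # headline, so the numeric derivation precedes generation.
    target = (f"Topic: {r['topic']}\n"
              f"Entities: {', '.join(r['entities'])}\n"
              f"Numbers: {r['numerical_reasoning']}\n"
              f"Headline: {gold_headline}")
    return {"input": article, "output": target}

example = build_training_example("Acme Corp reported ...",
                                 "Acme revenue jumps 15% to $2.3B")
print(example["output"])
```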
Conference paper
CoupledCB: Eliminating Wasted Pages in Copyback-based Garbage Collection for SSDs
Published 2025
2025 Design, Automation & Test in Europe Conference (DATE), 31/03/2025–02/04/2025, Lyon, France
The management of garbage collection poses significant challenges in high-density NAND flash-based SSDs. The copyback command was introduced to expedite the migration of valid data; however, its odd/even constraint causes wasted pages during migrations, limiting the efficiency of garbage collection. Additionally, while full-sequence programming enhances write performance in high-density SSDs, it increases write granularity and exacerbates the issue of wasted pages. To address the problem of wasted pages, we propose a novel method called CoupledCB, which utilizes coupled blocks to fill up the wasted space in copyback-based garbage collection. By taking into account the access characteristics of the candidate coupled blocks and workloads, we develop a coupled block selection model assisted by logistic regression. Experimental results show that our proposal significantly enhances garbage collection efficiency and I/O performance compared to state-of-the-art schemes.
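The selection model can be pictured as a logistic score over per-block access features that ranks candidate coupled blocks for filling the wasted pages. The feature names and (pretend pre-trained) weights below are illustrative assumptions, not the trained model from the paper:

```python
import math

# Assumed features: fewer valid pages to migrate is better (negative weight),
# while hot or old blocks are assumed better coupling candidates.
WEIGHTS = {"valid_page_ratio": -2.0, "read_hotness": 1.5, "age": 0.5}
BIAS = 0.1

def couple_score(features: dict) -> float:
    """Logistic-regression-style probability that coupling this block
    with the GC victim is beneficial."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def select_coupled_block(candidates: dict) -> str:
    """Pick the candidate block with the highest coupling score."""
    return max(candidates, key=lambda b: couple_score(candidates[b]))

candidates = {
    "blk7":  {"valid_page_ratio": 0.9, "read_hotness": 0.2, "age": 0.1},
    "blk12": {"valid_page_ratio": 0.3, "read_hotness": 0.8, "age": 0.6},
}
print(select_coupled_block(candidates))  # blk12: few valid pages to migrate
```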
Conference proceeding
Harnessing Network Effect for Fake News Mitigation: Selecting Debunkers via Self-Imitation Learning
Date presented 25/02/2024
Proceedings of the 38th AAAI Conference on Artificial Intelligence, 38, 20, 22447 - 22456
38th AAAI Conference on Artificial Intelligence, 20/02/2024–27/02/2024, Vancouver, Canada.
This study aims to minimize the influence of fake news on social networks by deploying debunkers to propagate true news. This is framed as a reinforcement learning problem where, at each stage, one user is selected to propagate true news. A challenging issue is the episodic reward: the "net" effect of selecting individual debunkers cannot be discerned from the interleaving information propagation on social networks, and only the collective effect of mitigation efforts can be observed. Existing Self-Imitation Learning (SIL) methods have shown promise in learning from episodic rewards, but are ill-suited to the real-world application of fake news mitigation because of their poor sample efficiency. To learn a more effective debunker selection policy for fake news mitigation, this study proposes NAGASIL, Negative sampling and state Augmented Generative Adversarial Self-Imitation Learning, which comprises two improvements geared towards fake news mitigation: learning from negative samples, and an augmented state representation that captures the "real" environment state by integrating the current observed state with the previous state-action pairs from the same campaign. Experiments on two social networks show that NAGASIL yields superior performance to standard GASIL and state-of-the-art fake news mitigation models.
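The augmented state representation can be sketched as follows: the current observed network state is concatenated with the campaign's most recent state-action pairs and zero-padded to a fixed length, so the policy input reflects earlier debunker selections despite the episodic reward. Dimensions and feature meanings below are assumptions for illustration:

```python
from collections import deque

def augment(state, past_pairs, k=2, state_dim=3):
    """Concatenate the current state with the last k (state, action) pairs
    from the same mitigation campaign, zero-padded to a fixed length so
    the policy network always sees an input of the same size."""
    flat = list(state)
    for s, a in list(past_pairs)[-k:]:
        flat += list(s) + [float(a)]
    target_len = state_dim + k * (state_dim + 1)
    return flat + [0.0] * (target_len - len(flat))

history = deque(maxlen=2)        # rolling (state, action) memory per campaign
s0 = [0.2, 0.1, 0.0]             # e.g., fractions of users reached so far
x0 = augment(s0, history)        # first stage: no history yet, fully padded
history.append((s0, 5))          # debunker with id 5 was selected
s1 = [0.3, 0.2, 0.1]
x1 = augment(s1, history)
print(len(x0), len(x1))          # both 11: fixed-size policy input
```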
Journal article
Published 2024
Database: The Journal of Biological Databases and Curation, 2024, baae044
We launched the initial version of FishTEDB in 2018, aiming to establish an open-source, user-friendly, data-rich transposable element (TE) database. Over the past five years, FishTEDB 1.0 has gained approximately 10 000 users, accumulating more than 450 000 interactions. With the unveiling of extensive fish genome data and the increasing emphasis on TE research, FishTEDB needed to be extended in both data richness and functionality. To achieve these goals, we introduced 33 new fish species to FishTEDB 2.0, encompassing a wide array of fish belonging to 48 orders. To make the updated database more functional, we added a genome browser to visualize the positional relationship between TEs and genes, as well as the estimated TE insertion time in different species. In conclusion, we released a new version of the fish TE database, FishTEDB 2.0, designed to assist researchers in the future study of TE functions and to promote the progress of biological theories related to TEs. Database URL: https://www.fishtedb.com/.
Conference proceeding
Veracity-aware and Event-driven Personalized News Recommendation for Fake News Mitigation
Date presented 04/2022
WWW '22: Proceedings of the ACM Web Conference 2022, 3673 - 3684
WWW '22: The ACM Web Conference 2022, 25/04/2022–29/04/2022, Virtual Event, Lyon, France
Despite the tremendous efforts by social media platforms and fact-check services for fake news detection, fake news and misinformation still spread wildly on social media platforms (e.g., Twitter). Consequently, fake news mitigation strategies are urgently needed. Most existing work on fake news mitigation focuses on overall mitigation across a whole social network while neglecting concrete mitigation strategies to deter individual users from sharing fake news. In this paper, we propose a novel veracity-aware and event-driven recommendation model to recommend personalized corrective true news to individual users for effectively debunking fake news. Our proposed model, Rec4Mit (Recommendation for Mitigation), not only effectively captures a user's current reading preference, with a focus on which event (e.g., US election), from her/his recent reading history containing true and/or fake news, but also accurately predicts the veracity (true or fake) of candidate news. As a result, Rec4Mit can recommend the most suitable true news to best match the user's preference as well as to mitigate fake news. In particular, for those users who have read fake news of a certain event, Rec4Mit is able to recommend the corresponding true news of the same event. Extensive experiments on real-world datasets show Rec4Mit significantly outperforms state-of-the-art news recommendation methods in terms of the capability to recommend personalized true news for fake news mitigation.
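The recommendation policy, reduced to its skeleton: infer the user's current event of interest from recent reads, filter candidates by predicted veracity, and prefer true news on events for which the user has read fake news. The oracle veracity stub and record layout below are assumptions standing in for Rec4Mit's learned components:

```python
from collections import Counter

def predict_veracity(news):
    """Stand-in for Rec4Mit's veracity predictor; here a toy oracle
    that just reads an assumed ground-truth label off the record."""
    return news["label"]

def recommend(history, candidates):
    # Event-driven: identify the event dominating the user's recent reads.
    events = Counter(n["event"] for n in history)
    current_event = events.most_common(1)[0][0]
    fake_events = {n["event"] for n in history if predict_veracity(n) == "fake"}
    # Veracity-aware: only true news is eligible for recommendation.
    true_news = [c for c in candidates if predict_veracity(c) == "true"]
    # Corrective matching: true news on an event the user saw fake news about.
    corrective = [c for c in true_news if c["event"] in fake_events]
    pool = corrective or [c for c in true_news if c["event"] == current_event]
    return pool[0] if pool else None

history = [{"event": "US election", "label": "fake"},
           {"event": "US election", "label": "true"}]
candidates = [{"event": "US election", "label": "true", "id": "n1"},
              {"event": "sports", "label": "true", "id": "n2"}]
print(recommend(history, candidates))  # n1: corrective true news, same event
```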
Conference proceeding
Identifying Cost-effective Debunkers for Multi-stage Fake News Mitigation Campaigns
Date presented 02/2022
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 1206 - 1214
WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, 21/02/2022–25/02/2022, Virtual Event, AZ, USA
Online social networks have become a fertile ground for spreading fake news, and methods to automatically mitigate its propagation have been proposed. Some studies focus on selecting the top-k influential users on social networks as debunkers, but the social influence of debunkers may not translate into the wide propagation of mitigation information as expected. Other studies assume a given set of debunkers and focus on optimizing the intensity with which debunkers publish true news; but because the debunkers are fixed, even with high social influence and/or high posting intensity, the true news may not reach users exposed to fake news, and the mitigation effect may therefore be limited. In this paper, we propose the multi-stage fake news mitigation campaign, where debunkers are dynamically selected within a budget at each stage. We formulate it as a reinforcement learning problem and propose a greedy algorithm, optimized by predicting future states, so that debunkers are selected in a way that maximizes the overall mitigation effect. We conducted extensive experiments on synthetic and real-world social networks and show that our solution outperforms state-of-the-art baselines in terms of mitigation effect.
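The per-stage selection step can be pictured as a budgeted greedy loop over a predicted-gain model: at each stage, pick the debunkers whose forecast future mitigation effect is largest until the stage budget is exhausted. The gain predictor, cost model, and coverage update below are illustrative assumptions, not the learned model from the paper:

```python
def predicted_gain(user, state):
    """Stand-in for the learned model that forecasts the future mitigation
    effect of activating `user` as a debunker in the current `state`."""
    return user["influence"] * (1.0 - state["coverage"])

def select_debunkers(users, state, budget):
    """Greedy, budget-constrained selection for one campaign stage."""
    chosen = []
    for u in sorted(users, key=lambda u: predicted_gain(u, state), reverse=True):
        if u["cost"] <= budget:
            chosen.append(u["id"])
            budget -= u["cost"]
            # Toy state update: each selected debunker mitigates a bit more.
            state["coverage"] = min(1.0, state["coverage"] + 0.05)
    return chosen

stage_state = {"coverage": 0.2}   # fraction of exposed users already mitigated
users = [{"id": "u1", "influence": 0.9, "cost": 3},
         {"id": "u2", "influence": 0.6, "cost": 1},
         {"id": "u3", "influence": 0.4, "cost": 1}]
print(select_debunkers(users, stage_state, budget=4))  # ['u1', 'u2']
```

Re-running this loop at every stage with an updated state is what makes the campaign multi-stage: debunkers are reselected as the propagation unfolds rather than fixed up front.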