Abstract
Vertical Federated Learning (VFL) enables institutions to train a model collaboratively while keeping their data private. The first step in VFL is ID alignment, in which the participants identify their common set of users. Private Set Intersection (PSI) is a commonly used method for ID alignment, allowing two parties to compute the intersection of their sets without revealing any additional information. In practical applications, however, the participants' datasets are often unbalanced, which raises communication and computational complexity challenges that traditional PSI methods cannot address. We therefore propose an efficient unbalanced circuit private set intersection protocol that achieves ID alignment while protecting users’ privacy. First, we optimize the oblivious key–value retrieval protocol using the ring-switching technique in homomorphic encryption. Working in a subring yields more efficient private information retrieval (PIR) query compression and response packing while keeping throughput constant, which significantly improves the efficiency of the protocol. Second, we improve the assignment of datasets in unbalanced circuit PSI using probabilistic batch codes, so that the participant only needs to compute on a specific location, which significantly improves computation speed. We evaluate the protocol's performance experimentally across various unbalanced dataset configurations and assess its efficiency and feasibility by comparing it with other representative protocols. The online communication volume is approximately 30% of that of other leading protocols, and the computational cost is reduced by a factor of 1.01x to 4.38x.
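As a rough illustration of the bucket-assignment idea behind probabilistic batch codes, the following Python sketch assumes a cuckoo-hashing construction with three hash functions. This is one common way to build such codes and is not necessarily the construction used in this work. All names (`bucket`, `simple_hash`, `cuckoo_hash`, `bucketwise_intersection`) are hypothetical, and the comparison is done in plaintext only to show which locations need to be compared; an actual protocol would perform it obliviously.

```python
import hashlib

NUM_HASHES = 3  # assumption: three hash functions, a common cuckoo-hashing choice


def bucket(item: str, i: int, num_buckets: int) -> int:
    """Candidate bucket of `item` under the i-th hash function."""
    digest = hashlib.sha256(f"{i}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def simple_hash(items, num_buckets):
    """Large-set side: store every item in all of its candidate buckets."""
    table = [set() for _ in range(num_buckets)]
    for item in items:
        for i in range(NUM_HASHES):
            table[bucket(item, i, num_buckets)].add(item)
    return table


def cuckoo_hash(items, num_buckets, max_evictions=500):
    """Small-set side: place each item in exactly one candidate bucket."""
    table = [None] * num_buckets  # each slot holds (item, hash_index) or None
    for item in items:
        entry = (item, 0)
        for _ in range(max_evictions):
            slot = bucket(entry[0], entry[1], num_buckets)
            if table[slot] is None:
                table[slot] = entry
                break
            # Slot is taken: evict the occupant and retry it with its next hash.
            evicted_item, evicted_idx = table[slot]
            table[slot] = entry
            entry = (evicted_item, (evicted_idx + 1) % NUM_HASHES)
        else:
            raise RuntimeError("cuckoo insertion failed; use more buckets")
    return table


def bucketwise_intersection(large_table, small_table):
    """Plaintext stand-in for the private comparison: one lookup per location."""
    return {
        entry[0]
        for slot, entry in enumerate(small_table)
        if entry is not None and entry[0] in large_table[slot]
    }
```

Because the large-set holder stores each item under every candidate bucket, the small-set holder's single placement of an item always lands on a bucket where a matching item would be found. Each of its items is therefore checked at only one location.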