Abstract
Non-rigid registration of 3D animal bodies is challenging because of their morphological diversity, symmetries that cause correspondence ambiguities, and complex articulated structures with varying degrees of freedom. Existing geometric methods fail to capture the semantic relationships between anatomically equivalent regions, while learning-based approaches require extensive manually labeled training data that are often unavailable for diverse animal species. In this paper, we propose a training-free framework that combines the semantic richness of DINO vision transformer features with anatomical constraints to achieve robust animal shape registration without 3D supervision. Our approach follows a coarse-to-fine pipeline with three stages. First, it renders multiple views of the input meshes to extract semantic DINO features and anatomical keypoints. Second, it establishes initial correspondences through DINO feature matching and performs skeletal pose alignment to produce a coarsely registered mesh. Third, it refines the dense registration through iterative energy optimization that combines correspondence, as-rigid-as-possible (ARAP), and smoothness terms, with correspondences updated by the Hungarian algorithm. Our results show that this training-free method achieves performance comparable to existing methods while requiring no 3D supervision, enabling practical application to in-the-wild animal data for which labeled 3D training datasets are unavailable.
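To illustrate the correspondence update mentioned above, the following is a minimal sketch of one-to-one feature matching with the Hungarian algorithm. It is not the authors' implementation: it assumes that per-vertex (or per-keypoint) feature vectors have already been aggregated from the rendered views into two arrays, and the function name `match_features` and the cosine-distance cost are illustrative choices rather than details taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_features(src_feats, tgt_feats):
    """Match source to target points one-to-one by feature similarity.

    src_feats, tgt_feats: arrays of shape (n, d) and (m, d) holding one
    feature vector per point (hypothetical inputs, e.g. aggregated DINO
    features). Returns matched source indices, matched target indices,
    and the cosine distance of each matched pair.
    """
    # Normalize rows so that the dot product equals cosine similarity.
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)

    # Cost matrix of cosine distances (assumed cost; lower is better).
    cost = 1.0 - src @ tgt.T

    # Hungarian algorithm: minimum-cost one-to-one assignment.
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, cost[rows, cols]
```

In an iterative refinement loop of the kind the abstract describes, a step like this would be re-run after each deformation update so that the correspondence term is evaluated against the current matches. A one-to-one assignment is one way to keep symmetric parts, such as left and right limbs, from collapsing onto the same target region.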