Spatial omics data are increasingly "fused" with classical biomedical imaging modalities to complement them with spatially resolved molecular profiling, enabling a deeper understanding of the biochemical basis of health and disease. Before the multimodal information can be fused, all contributing imaging data must be spatially aligned through a process called coregistration, which remains challenging. Here we present a generic and accurate multimodal coregistration framework for spatial omics data. Specifically, an original dimensionality reduction algorithm, "PseudoRep", bridges cross-modal gaps: PseudoRep generates a single-channel representation of the highly multiplexed molecular image from spatial omics, so that the resulting representation appears "pseudo-unimodal" relative to the image from the other modality. A deep learning-based transformation algorithm, "DeepReg", is then adopted to accurately model the nonlinear tissue deformations caused by anatomical variation or sample manipulation. Our method demonstrated its versatility by coregistering mass spectrometry imaging-based spatial metabolomics data with diverse modalities (magnetic resonance imaging, brain atlas, microscopy, and spatial transcriptomics) and reduced coregistration errors by 38\%--69\% compared with existing methods. Notably, we open-source a toolbox that implements both the coregistration and its downstream tasks (co-expression analysis, anatomical annotation, and pansharpening), which we expect to greatly facilitate the democratization of multimodal fusion studies in the spatial omics community.
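
To make the "pseudo-unimodal" step concrete, the following minimal Python sketch collapses a highly multiplexed molecular image (height $\times$ width $\times$ channels) into a single-channel intensity image. PCA is used here purely as a hypothetical stand-in for PseudoRep, whose actual algorithm is not detailed in this abstract, and the function name \texttt{collapse\_to\_single\_channel} is illustrative rather than part of the released toolbox.

\begin{verbatim}
# Minimal sketch of the single-channel representation step, assuming a
# PCA projection as a stand-in for PseudoRep (not the actual algorithm).
import numpy as np
from sklearn.decomposition import PCA

def collapse_to_single_channel(omics_cube: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) multiplexed molecular image onto its first
    principal component, yielding a single-channel (H, W) image."""
    h, w, c = omics_cube.shape
    flat = omics_cube.reshape(-1, c)              # one row per pixel
    scores = PCA(n_components=1).fit_transform(flat)
    img = scores.reshape(h, w)
    # Rescale to [0, 1] so the result resembles an intensity image that
    # can be compared against the other modality during registration.
    span = img.max() - img.min()
    return (img - img.min()) / (span + 1e-12)

# Toy example: a 64 x 64 image with 200 molecular channels.
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 200)).astype(np.float32)
pseudo = collapse_to_single_channel(cube)
print(pseudo.shape)                               # (64, 64)
\end{verbatim}

The resulting single-channel image could then be passed, together with the image from the other modality, to a deformable registration step such as the deep learning-based DeepReg stage described above.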