WT-MVSNet: Window-based Transformers for Multi-view Stereo

Jinli Liao, Yikang Ding, Yoli Shavit, Dihe Huang, Shihao Ren, Jia Guo, Wensen Feng, Kai Zhang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

11 Scopus citations

Abstract

Recently, Transformers have been shown to enhance the performance of multi-view stereo by enabling long-range feature interaction. In this work, we propose Window-based Transformers (WT) for local feature matching and global feature aggregation in multi-view stereo. We introduce a Window-based Epipolar Transformer (WET), which reduces matching redundancy by using epipolar constraints. Since point-to-line matching is sensitive to erroneous camera pose and calibration, we match windows near the epipolar lines. A second Shifted WT is employed for aggregating global information within the cost volume. We present a novel Cost Transformer (CT) to replace 3D convolutions for cost volume regularization. To better constrain the estimated depth maps from multiple views, we further design a novel geometric consistency loss (Geo Loss), which penalizes unreliable areas where multi-view consistency is not satisfied. Our WT multi-view stereo method (WT-MVSNet) achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark.

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
State: Published - 2022
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35
ISSN (Print): 1049-5258

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22

Bibliographical note

Publisher Copyright:
© 2022 Neural information processing systems foundation. All rights reserved.

Funding

This work is supported by the Key-Area Research and Development Program of Guangdong Province (No. 2020B0909050003), the Science and Technology Innovation Project of Shenzhen (JSGG20210802154807022), and the National Natural Science Foundation of China under Grant 61902415.

Funders (funder number)
National Natural Science Foundation of China: 61902415
Special Project for Research and Development in Key areas of Guangdong Province: 2020B0909050003
Shenzhen Science and Technology Innovation Program: JSGG20210802154807022
