Abstract
Recently, Transformers have been shown to enhance the performance of multi-view stereo by enabling long-range feature interaction. In this work, we propose Window-based Transformers (WT) for local feature matching and global feature aggregation in multi-view stereo. We introduce a Window-based Epipolar Transformer (WET), which reduces matching redundancy by using epipolar constraints. Since point-to-line matching is sensitive to erroneous camera poses and calibration, we match windows near the epipolar lines. A second Shifted WT is employed to aggregate global information within the cost volume. We present a novel Cost Transformer (CT) to replace 3D convolutions for cost volume regularization. To better constrain the estimated depth maps from multiple views, we further design a novel geometric consistency loss (Geo Loss), which penalizes unreliable areas where multi-view consistency is not satisfied. Our WT multi-view stereo method (WT-MVSNet) achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark.
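To make the multi-view consistency idea behind Geo Loss concrete, the sketch below shows the standard forward–backward reprojection check commonly used in multi-view stereo: a reference pixel is lifted to 3D with its estimated depth, projected into a source view, re-lifted with the source depth, and projected back; pixels with large pixel or relative-depth reprojection error are treated as unreliable. This is a minimal NumPy illustration only; the function names, the 1-pixel and 1% thresholds, and the dense per-pixel formulation are assumptions for exposition, not the exact Geo Loss definition from the paper.

```python
# Minimal sketch of a forward-backward reprojection consistency check (NumPy).
# All names and thresholds here are illustrative assumptions.
import numpy as np

def reproject(depth_ref, depth_src, K_ref, K_src, R, t):
    """Project every reference pixel into the source view and back.

    depth_ref, depth_src: (H, W) depth maps (assumed same resolution here)
    K_ref, K_src: (3, 3) camera intrinsics
    R, t: rotation (3, 3) and translation (3,) from reference to source camera
    Returns reprojected pixel coordinates and depths in the reference view.
    """
    H, W = depth_ref.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Lift reference pixels to 3D in the reference camera, then map to the source camera.
    xyz_ref = np.linalg.inv(K_ref) @ (pix * depth_ref.reshape(1, -1))
    xyz_src = R @ xyz_ref + t.reshape(3, 1)
    proj_src = K_src @ xyz_src
    uv_src = proj_src[:2] / np.clip(proj_src[2:], 1e-6, None)

    # Sample the source depth map (nearest neighbour for simplicity).
    u_s = np.clip(np.round(uv_src[0]).astype(int), 0, W - 1)
    v_s = np.clip(np.round(uv_src[1]).astype(int), 0, H - 1)
    d_src = depth_src[v_s, u_s]

    # Lift the sampled source pixels back to 3D and project into the reference view.
    pix_src = np.stack([uv_src[0], uv_src[1], np.ones_like(uv_src[0])], axis=0)
    xyz_src2 = np.linalg.inv(K_src) @ (pix_src * d_src.reshape(1, -1))
    xyz_ref2 = R.T @ (xyz_src2 - t.reshape(3, 1))
    proj_ref = K_ref @ xyz_ref2
    uv_reproj = proj_ref[:2] / np.clip(proj_ref[2:], 1e-6, None)
    depth_reproj = xyz_ref2[2]

    return (uv_reproj.reshape(2, H, W),
            depth_reproj.reshape(H, W),
            pix[:2].reshape(2, H, W))

def consistency_mask(depth_ref, depth_src, K_ref, K_src, R, t,
                     pix_thresh=1.0, depth_thresh=0.01):
    """Mark reference pixels whose depth agrees with the source view."""
    uv_reproj, depth_reproj, uv_ref = reproject(depth_ref, depth_src, K_ref, K_src, R, t)
    pix_err = np.linalg.norm(uv_reproj - uv_ref, axis=0)
    depth_err = np.abs(depth_reproj - depth_ref) / np.clip(depth_ref, 1e-6, None)
    return (pix_err < pix_thresh) & (depth_err < depth_thresh)
```

A consistency-based loss can use such a mask to down-weight or penalize regions where the estimated depth fails the cross-view check; the actual weighting and thresholds used by WT-MVSNet are given in the paper itself.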
Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
Editors | S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh |
Publisher | Neural information processing systems foundation |
ISBN (Electronic) | 9781713871088 |
State | Published - 2022 |
Event | 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States (28 Nov 2022 → 9 Dec 2022) |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 35 |
ISSN (Print) | 1049-5258 |
Conference
Conference | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 28/11/22 → 9/12/22 |
Bibliographical note
Publisher Copyright: © 2022 Neural information processing systems foundation. All rights reserved.
Funding
This work is supported by the Key-Area Research and Development Program of Guangdong Province (No. 2020B0909050003), the Science and Technology Innovation Project of Shenzhen (JSGG20210802154807022), and the National Natural Science Foundation of China under Grant 61902415.
Funders | Funder number |
---|---|
National Natural Science Foundation of China | 61902415 |
Special Project for Research and Development in Key areas of Guangdong Province | 2020B0909050003 |
Shenzhen Science and Technology Innovation Program | JSGG20210802154807022 |