
      Feature matching based on local windows aggregation

      research-article
      iScience
      Elsevier
      Applied sciences, Computer science, Network modeling


          Summary

The core goal of feature matching is to establish correspondences between two images. Current detector-free methods achieve impressive results but tend to focus on global features while neglecting regions with subtle textures, yielding fewer matches in weakly textured areas. This paper proposes a feature-matching method based on local window aggregation that balances global features against local texture variations to produce more accurate matches, especially in weak-texture regions. The method first applies a local window aggregation module, which uses window attention to suppress irrelevant interference, followed by global attention, producing coarse and fine-grained feature maps. A matching module then processes these maps: coarse matches are obtained first via the nearest-neighbor principle and subsequently refined on the fine-grained maps through local window refinement. Experimental results show that the method surpasses state-of-the-art techniques in pose estimation, homography estimation, and visual localization under the same training conditions.
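As a rough illustration of the pipeline the summary describes, the PyTorch sketch below shows the two generic ingredients: self-attention restricted to non-overlapping local windows, and coarse matching by mutual nearest neighbors on a feature-similarity matrix. All function names, shapes, and the single-head formulation are our own assumptions for illustration, not the paper's code; the actual local window aggregation and matching modules will differ in detail.

```python
import torch

def window_attention(feat, window_size=8):
    """Single-head self-attention restricted to non-overlapping local
    windows (illustrative; the paper's module is more elaborate).

    feat: (B, C, H, W) feature map; H and W are assumed to be
    divisible by window_size.
    """
    B, C, H, W = feat.shape
    ws = window_size
    # Partition the map into (B * num_windows, ws*ws, C) token groups.
    x = feat.reshape(B, C, H // ws, ws, W // ws, ws)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
    # Scaled dot-product attention within each window only, so a token
    # never attends outside its local neighborhood.
    attn = torch.softmax(x @ x.transpose(1, 2) / C ** 0.5, dim=-1)
    x = attn @ x
    # Reverse the window partition back to (B, C, H, W).
    x = x.reshape(B, H // ws, W // ws, ws, ws, C)
    return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

def mutual_nearest_coarse_matches(desc0, desc1, threshold=0.2):
    """Coarse matching by mutual nearest neighbors.

    desc0: (N, C) and desc1: (M, C) L2-normalized coarse descriptors
    from the two images; the threshold value here is arbitrary.
    """
    sim = desc0 @ desc1.t()                # (N, M) cosine similarity
    nn01 = sim.argmax(dim=1)               # best match, image 0 -> 1
    nn10 = sim.argmax(dim=0)               # best match, image 1 -> 0
    idx0 = torch.arange(desc0.shape[0])
    mutual = nn10[nn01] == idx0            # cross-consistency check
    confident = sim[idx0, nn01] > threshold
    keep = mutual & confident
    return idx0[keep], nn01[keep]
```

Restricting attention to windows keeps the cost linear in the number of windows and, as the summary notes, limits interference from irrelevant distant regions; the subsequent global attention stage (not shown) restores cross-window context.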


          Highlights

• We propose an optimization method to obtain more accurate sub-pixel matching positions (see the refinement sketch after this list)

• We design a local window aggregation module to obtain better image feature points

• Integrating coarse and fine features yields outstanding matching in weak-texture regions
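The sub-pixel optimization mentioned in the first highlight is commonly realized as correlation followed by a soft-argmax: correlate the descriptor at a coarse match against a fine-level patch around its counterpart, then take the probability-weighted average of pixel offsets. The sketch below shows that generic recipe under our own naming and shape assumptions; the paper's local window refinement may differ in detail.

```python
import torch

def subpixel_refine(query_desc, fine_patch):
    """Refine one coarse match to a sub-pixel offset (illustrative).

    query_desc: (C,) fine descriptor at the match center in image 0.
    fine_patch: (C, w, w) fine features around the coarse match in
    image 1, with w odd so the patch has a well-defined center.
    """
    C, w, _ = fine_patch.shape
    # Correlate the query against every position in the local window.
    scores = (fine_patch.reshape(C, -1) * query_desc[:, None]).sum(0)
    # Softmax turns correlation scores into a match distribution.
    prob = torch.softmax(scores / C ** 0.5, dim=0).reshape(w, w)
    # Expected offset from the patch center (soft-argmax), which is
    # continuous and hence sub-pixel accurate.
    coords = torch.arange(w, dtype=prob.dtype) - w // 2
    dy = (prob.sum(dim=1) * coords).sum()
    dx = (prob.sum(dim=0) * coords).sum()
    return torch.stack([dx, dy])           # (x, y) sub-pixel offset
```

The refined location is the coarse match position plus this offset, scaled back to full image resolution.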



                Author and article information

Journal: iScience (Elsevier)
ISSN: 2589-0042
Published online: 28 August 2024
Issue date: 20 September 2024
Volume: 27
Issue: 9
Article number: 110825
Affiliations
[1] Heilongjiang University, No. 74 Xuefu Road, Harbin 150080, Heilongjiang, China
[2] Qiqihar University, No. 42 Wenhua Street, Qiqihar 161006, Heilongjiang, China
[3] Anhui Wenda University of Information Engineering, No. 3 Forest Avenue, Hefei 231201, Anhui, China
Author notes
Corresponding author: leewenpeng@126.com
[4] Lead contact

Article
PII: S2589-0042(24)02050-9
DOI: 10.1016/j.isci.2024.110825
PMCID: PMC11416493
PMID: 39310757
                © 2024 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 19 May 2024
Revised: 31 July 2024
Accepted: 22 August 2024
Categories: Article

Keywords: applied sciences, computer science, network modeling
