Paper Reading: MVSNet: Depth Inference for Unstructured Multi-view Stereo

Related Work

Categories of MVS algorithms

1. Direct point-cloud reconstruction [1-2]. Drawback: hard to parallelize fully.
2. Volumetric reconstruction [3-4]. Drawbacks: discretization error and high memory consumption.
3. Depth-map reconstruction [5-9].

Learning-based stereo

Matching two patches [10-12]; cost regularization [13-15].

Learning-based MVS

SurfaceNet [3] and LSM [4] (both limited to small-scale scenes).

Network Architecture

Key points:

1. When building the cost volume, the variance rather than the mean of the feature volumes is used [16], which better expresses the differences among the multi-view feature maps (sketched below).
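A minimal sketch of this variance-based aggregation (PyTorch-style; the tensor layout and names are my own, not the official repo's): the reference feature volume and the warped source feature volumes are stacked along a view axis, and the cost volume is their element-wise variance across that axis.

```python
import torch

def variance_cost_volume(feature_volumes):
    """feature_volumes: (N, C, D, H, W) -- the reference-view feature volume
    plus the N-1 source volumes warped into its frustum via homographies.
    Returns the (C, D, H, W) cost volume."""
    # Population variance across views: E[x^2] - (E[x])^2.
    mean = feature_volumes.mean(dim=0)
    return (feature_volumes ** 2).mean(dim=0) - mean ** 2

# Toy sizes: 3 views, 8 channels, 48 depth hypotheses, 16x16 pixels.
cost = variance_cost_volume(torch.randn(3, 8, 48, 16, 16))
print(cost.shape)  # torch.Size([8, 48, 16, 16])
```

Unlike a mean, the variance is zero exactly where all views agree, so the cost directly measures multi-view photo-inconsistency; it also keeps the network independent of the number of input views N.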

2. A probability volume is introduced (a softmax applied to the cost volume along the depth direction). It serves not only to estimate each pixel's depth but also to express the confidence of that estimate (see the combined sketch after point 3).

3. Initial depth map computation: the most direct operation is an argmax, but it cannot provide sub-pixel estimates and is not differentiable for backpropagation, so the expectation along the depth direction is computed instead (a soft argmax).
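A combined sketch for points 2 and 3 (my own PyTorch rendering; the sign convention and shapes are assumptions): a softmax along the depth axis converts the regularized cost volume into a probability volume, and the initial depth is the probability-weighted expectation over the depth hypotheses, which is differentiable and produces sub-pixel (sub-hypothesis) estimates.

```python
import torch

def soft_argmax_depth(cost_volume, depth_values):
    """cost_volume: (D, H, W) regularized matching cost (assumed: lower is
    better, hence the negation); depth_values: (D,) sampled depth hypotheses.
    Returns the (H, W) initial depth map and the (D, H, W) probability volume."""
    prob_volume = torch.softmax(-cost_volume, dim=0)               # point 2
    # Expectation along depth: sum_d d * P(d), computed per pixel.  # point 3
    depth_map = (depth_values.view(-1, 1, 1) * prob_volume).sum(dim=0)
    return depth_map, prob_volume

depths = torch.linspace(425.0, 935.0, steps=48)  # toy DTU-like depth range
depth_map, prob = soft_argmax_depth(torch.randn(48, 16, 16), depths)
```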

4. Probability map computation: for mismatched pixels, the probability distribution tends to be scattered rather than concentrated at a single peak. The probability map is therefore simply computed as the sum of the probabilities of the four nearest depth hypotheses (sketched below).
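A sketch of that confidence measure (my own implementation of the idea; the exact index bookkeeping is an assumption): locate the hypothesis nearest to the estimated depth at every pixel, then sum the probability over the four hypotheses around it. A sharp, unimodal distribution yields a value near 1; a scattered one yields a low value.

```python
import torch

def probability_map(prob_volume, depth_map, depth_values):
    """prob_volume: (D, H, W); depth_map: (H, W) expected depth;
    depth_values: (D,) increasing hypotheses. Returns an (H, W) confidence map."""
    D = prob_volume.shape[0]
    # Per-pixel index of the hypothesis closest to the estimated depth.
    idx = torch.argmin((depth_values.view(-1, 1, 1) - depth_map).abs(), dim=0)
    conf = torch.zeros_like(depth_map)
    # Sum over the four nearest hypotheses i-1 .. i+2 (clamped at the borders,
    # so boundary pixels may count an end bin twice -- acceptable for a sketch).
    for offset in (-1, 0, 1, 2):
        k = (idx + offset).clamp(0, D - 1)
        conf += prob_volume.gather(0, k.unsqueeze(0)).squeeze(0)
    return conf
```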

5. Depth refinement: the reference image is added as guidance and a residual is learned. Furthermore, to avoid bias toward any particular depth scale, the initial depth map is normalized to [0, 1] beforehand and denormalized once refinement is done (a sketch follows).
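A minimal residual-refinement sketch (the layer and channel counts are illustrative guesses, not the paper's exact network): the initial depth is scaled to [0, 1], concatenated with the 3-channel reference image, passed through a small 2D conv network that predicts a residual, and scaled back.

```python
import torch
import torch.nn as nn

class DepthRefinement(nn.Module):
    """Residual refinement of the initial depth map, guided by the reference
    image. Channel counts are illustrative only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),  # predicts a depth residual
        )

    def forward(self, init_depth, ref_image):
        # init_depth: (B, 1, H, W); ref_image: (B, 3, H, W).
        d_min = init_depth.amin(dim=(2, 3), keepdim=True)
        d_max = init_depth.amax(dim=(2, 3), keepdim=True)
        norm = (init_depth - d_min) / (d_max - d_min + 1e-8)  # to [0, 1]
        refined = norm + self.net(torch.cat([norm, ref_image], dim=1))
        return refined * (d_max - d_min) + d_min              # denormalize
```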

6. Loss computation: the L1 differences of both the initial and the refined depth maps against the ground truth are summed over valid pixels (sketched below).
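A sketch of that loss (the valid-pixel convention is assumed: invalid ground truth marked as 0; the paper weights the refined term with a lambda set to 1):

```python
import torch
import torch.nn.functional as F

def mvsnet_loss(init_depth, refined_depth, gt_depth, lam=1.0):
    """Mean L1 error of both estimates over valid ground-truth pixels.
    All tensors: (B, 1, H, W); pixels with gt_depth <= 0 are ignored."""
    valid = gt_depth > 0
    return (F.l1_loss(init_depth[valid], gt_depth[valid])
            + lam * F.l1_loss(refined_depth[valid], gt_depth[valid]))
```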

Training and Post-processing

1. Training

Dataset: DTU [17]

2. Post-processing

Filtering: discard points whose probability is below 0.8, and discard points whose depths are inconsistent across views (a sketch of the probability filter follows the fusion step).

Fusion of the filtered depth maps into a single point cloud [18].
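A sketch of the probability (photometric) filter referenced above, using the 0.8 threshold from the notes; the geometric filter, which keeps a pixel only if its depth reprojects consistently into several neighboring views, needs camera parameters and is omitted here:

```python
import torch

def photometric_filter(depth_map, prob_map, threshold=0.8):
    """Zero out depth estimates whose confidence falls below the threshold.
    depth_map, prob_map: (H, W); a zero depth marks a discarded pixel."""
    return torch.where(prob_map >= threshold, depth_map,
                       torch.zeros_like(depth_map))
```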

Experiments

    1. DTU

Performs comparatively well in reflective regions and weakly-textured regions.

    2. Tanks and Temples

    3. Ablations

Limitations

The ground-truth depth maps do not contain complete occlusion and background information.

References

[1] Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2010)

[2] Lhuillier, M., Quan, L.: A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2005)

[3] Ji, M., Gall, J., Zheng, H., Liu, Y., Fang, L.: SurfaceNet: An end-to-end 3D neural network for multiview stereopsis. International Conference on Computer Vision (ICCV) (2017)

[4] Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Advances in Neural Information Processing Systems (NIPS) (2017)

[5] Campbell, N.D., Vogiatzis, G., Hernández, C., Cipolla, R.: Using multiple hypotheses to improve depth-maps for multi-view stereo. European Conference on Computer Vision (ECCV) (2008)

[6] Galliani, S., Lasinger, K., Schindler, K.: Massively parallel multiview stereopsis by surface normal diffusion. International Conference on Computer Vision (ICCV) (2015)

[7] Schönberger, J.L., Zheng, E., Frahm, J.M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. European Conference on Computer Vision (ECCV) (2016)

[8] Tola, E., Strecha, C., Fua, P.: Efficient large-scale multi-view stereo for ultra high-resolution image sets. Machine Vision and Applications (MVA) (2012)

[9] Yao, Y., Li, S., Zhu, S., Deng, H., Fang, T., Quan, L.: Relative camera refinement for accurate dense reconstruction. 3D Vision (3DV) (2017)

[10] Han, X., Leung, T., Jia, Y., Sukthankar, R., Berg, A.C.: MatchNet: Unifying feature and metric learning for patch-based matching. Computer Vision and Pattern Recognition (CVPR) (2015)

[11] Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. Computer Vision and Pattern Recognition (CVPR) (2016)

[12] Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research (JMLR) (2016)

[13] Seki, A., Pollefeys, M.: SGM-Nets: Semi-global matching with neural networks. Computer Vision and Pattern Recognition Workshops (CVPRW) (2017)

[14] Knöbelreiter, P., Reinbacher, C., Shekhovtsov, A., Pock, T.: End-to-end training of hybrid CNN-CRF models for stereo. Computer Vision and Pattern Recognition (CVPR) (2017)

[15] Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P.: End-to-end learning of geometry and context for deep stereo regression. Computer Vision and Pattern Recognition (CVPR) (2017)

[16] Hartmann, W., Galliani, S., Havlena, M., Van Gool, L., Schindler, K.: Learned multi-patch similarity. International Conference on Computer Vision (ICCV) (2017)

[17] Aanæs, H., Jensen, R.R., Vogiatzis, G., Tola, E., Dahl, A.B.: Large-scale data for multiple-view stereopsis. International Journal of Computer Vision (IJCV) (2016)

[18] Merrell, P., Akbarzadeh, A., Wang, L., Mordohai, P., Frahm, J.M., Yang, R., Nistér, D., Pollefeys, M.: Real-time visibility-based fusion of depth maps. International Conference on Computer Vision (ICCV) (2007)

Paper: https://arxiv.org/abs/1804.02505

Code: https://github.com/YoYo000/MVSNet