Notes on Pavement Crack Segmentation Papers

Published: 2020-01-29 | Category: Papers

I have recently been working on pavement crack segmentation; this post records notes on the related papers.

Reading notes

1. A Deep Neural Networks Approach for Pixel-Level Runway Pavement Crack Segmentation Using Drone-Captured Images

arXiv:2001.03257 [pdf] cs.CV eess.IV
Authors: Liming Jiang, Yuanchang Xie, Tianzhu Ren
Abstract: Pavement conditions are a critical aspect of asset management and directly affect safety. This study introduces a deep neural network method called U-Net for pavement crack segmentation based on drone-captured images to reduce the cost and time needed for airport runway inspection. The proposed approach can also be used for highway pavement conditions assessment during off-peak periods when there are few vehicles on the road. In this study, runway pavement images are collected using drone at various heights from the Fitchburg Municipal Airport (FMA) in Massachusetts to evaluate their quality and applicability for crack segmentation, from which an optimal height is determined. Drone images captured at the optimal height are then used to evaluate the crack segmentation performance of the U-Net model. Deep learning methods typically require a huge set of annotated training datasets for model development, which can be a major obstacle for their applications. An online annotated pavement image dataset is used together with the FMA data to train the U-Net model. The results show that U-Net performs well on the FMA testing data even with limited FMA training images, suggesting that it has good generalization ability and great potential to be used for both airport runways and highway pavements.
Submitted 9 January, 2020; originally announced January 2020.
Comments: 13 pages, 5 figures

1. Summary

The paper proposes a deep-neural-network (U-Net) method for pavement crack segmentation from drone-captured images. The approach can also be applied to highway pavement condition assessment during off-peak periods when there are few vehicles on the road.
The results show that U-Net performs well on the FMA test data even with limited FMA training images, indicating good generalization ability and strong application potential for both airport runways and highway pavements.

2. Dataset

Runway pavement images were collected with a drone at various heights at the Fitchburg Municipal Airport (FMA) in Massachusetts, USA.
The U-Net model is trained on an online annotated pavement dataset together with the FMA data.
The drone images are 5472 x 3648 pixels; they are cropped into 256 x 256 pixel patches, which are then annotated.

3. Method

  • U-Net

Before applying it to the drone-collected runway pavement images, a few hyperparameter adjustments are made to the original U-Net. Inspired by [4, 5], a deeper architecture is considered and the number of channels in each convolutional layer is increased by 50% to improve the model's fitting and generalization ability. In addition, the input image size is set to 256 x 256 pixels (a minimal sketch of such a widened U-Net is given below).

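As a concrete reference, here is a minimal PyTorch sketch of such a widened U-Net. This is my own illustration, not the authors' code: the class name, `width` multiplier, and channel counts are assumptions; only the 50% widening and the 256 x 256 input come from the paper.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 conv + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class WidenedUNet(nn.Module):
    """U-Net whose channel counts are scaled by `width` (1.5 = +50%)."""
    def __init__(self, in_ch=3, out_ch=1, base=64, width=1.5, depth=4):
        super().__init__()
        chs = [int(base * width * 2 ** i) for i in range(depth + 1)]  # e.g. 96, 192, 384, 768, 1536
        self.downs = nn.ModuleList()
        prev = in_ch
        for c in chs:
            self.downs.append(double_conv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)
        self.ups, self.dec = nn.ModuleList(), nn.ModuleList()
        for c_skip, c_in in zip(reversed(chs[:-1]), reversed(chs[1:])):
            self.ups.append(nn.ConvTranspose2d(c_in, c_skip, 2, stride=2))
            self.dec.append(double_conv(c_skip * 2, c_skip))
        self.head = nn.Conv2d(chs[0], out_ch, 1)  # sigmoid is applied in the loss / at inference

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.downs):
            x = block(x)
            if i < len(self.downs) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.ups, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)

# 256x256 crack patches, as in the paper
y = WidenedUNet()(torch.randn(1, 3, 256, 256))
print(y.shape)  # torch.Size([1, 1, 256, 256])
```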

  • Data Augmentation

Data augmentation is applied to further enlarge the training set.

  • Hyperparameters

Optimizer: Adam with a learning rate of 0.0001.

Loss function: binary cross entropy.

Activation functions: sigmoid for the final layer, ReLU for all other layers.

Training episodes: 1000.

Batch size: 5
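A hedged sketch of this training configuration in PyTorch (the paper does not publish code; the model and data pipeline below are dummy stand-ins, only the optimizer, loss, batch size, and episode count come from the paper):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Conv2d(3, 1, 3, padding=1)      # stand-in for the modified U-Net sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, learning rate 0.0001
criterion = torch.nn.BCEWithLogitsLoss()         # binary cross entropy; sigmoid folded into the loss

# dummy stand-in for the 256x256 crack patches and their binary masks
data = TensorDataset(torch.randn(20, 3, 256, 256),
                     torch.randint(0, 2, (20, 1, 256, 256)).float())
loader = DataLoader(data, batch_size=5, shuffle=True)        # batch size 5

for episode in range(1000):                      # 1000 training episodes, as reported
    for images, masks in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
```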

Training

Two settings are used: training on the Crack500 training set and testing on the Crack500 and FMA test sets, and training on a combined Crack500 & FMA dataset and testing on the FMA test set.
The combined Crack500 & FMA dataset is constructed because the FMA dataset alone is too small to train the U-Net model.

4. Results

The U-Net_Crack500 model also performs well on highway images collected by a laser pavement scanning system.

2. Automated Pavement Crack Segmentation Using Fully Convolutional U-Net with a Pretrained ResNet-34 Encoder

arXiv:2001.01912 [pdf] cs.CV
Authors: Stephen L. H. Lau, Xin Wang, Xu Yang, Edwin K. P. Chong
Abstract: Automated pavement crack segmentation is a challenging task because of inherent irregular patterns and lighting conditions, in addition to the presence of noise in images. Conventional approaches require a substantial amount of feature engineering to differentiate crack regions from non-affected regions. In this paper, we propose a deep learning technique based on a convolutional neural network to perform segmentation tasks on pavement crack images. Our approach requires minimal feature engineering compared to other machine learning techniques. The proposed neural network architecture is a modified U-Net in which the encoder is replaced with a pretrained ResNet-34 network. To minimize the dice coefficient loss function, we optimize the parameters in the neural network by using an adaptive moment optimizer called AdamW. Additionally, we use a systematic method to find the optimum learning rate instead of doing parametric sweeps. We used a “one-cycle” training schedule based on cyclical learning rates to speed up the convergence. We evaluated the performance of our convolutional neural network on CFD, a pavement crack image dataset. Our method achieved an F1 score of about 96%. This is the best performance among all other algorithms tested on this dataset, outperforming the previous best method by a 1.7% margin.
Submitted 10 January, 2020; v1 submitted 7 January, 2020; originally announced January 2020.
Comments: 9 pages, 6 figures

1. Summary

The proposed architecture is a modified U-Net in which the encoder is replaced by a ResNet-34 pretrained on ImageNet. The dice coefficient is used as the loss function, and the network parameters are optimized with the AdamW adaptive moment optimizer. A systematic method is used to find the optimal learning rate instead of parameter sweeps. The final F1 score on CFD is about 96%.

2. Dataset

The CFD dataset (320 x 480 images) is used, augmented as follows: with a stride of 20 pixels, patches of size 128 x 128, 256 x 256, and 320 x 320 are cropped along the horizontal and vertical axes of each image, and the same cropping is applied to the corresponding ground truth masks. The resulting number of patches for each size is tabulated in the paper.
During training, three types of augmentation are applied to each image: rotation, flipping, and brightness changes.

3. Method

  • ResNet-34-based U-Net

A ResNet-34 pretrained on ImageNet is used as the encoder, with its final average-pooling and fully connected layers removed, and an upsampling path is attached. The decoder consists of repeated upsampling blocks (the magenta and purple blocks in Fig. 2), each of which doubles the spatial resolution of the output activations while halving the number of feature channels. Each upsampling block consists of a BN layer, a ReLU layer, and a transposed convolution layer (2x2 kernel, stride 2); a concurrent spatial and channel squeeze-and-excitation (SCSE) module is inserted between the BN layer and the transposed convolution layer.
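A sketch of one such decoder block, including an SCSE module, based on my reading of the description above; the exact ordering of BN/ReLU/SCSE and the SCSE internals may differ from the authors' implementation:

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.cse = nn.Sequential(                       # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        self.sse = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())  # spatial attention

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)

class UpBlock(nn.Module):
    """BN -> ReLU -> SCSE -> transposed conv (2x2, stride 2): doubles resolution, halves channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)
        self.scse = SCSE(in_ch)
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(self.scse(self.relu(self.bn(x))))

x = torch.randn(1, 512, 10, 15)       # e.g. ResNet-34 bottleneck features for a 320x480 input
print(UpBlock(512)(x).shape)          # torch.Size([1, 256, 20, 30])
```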

  • dice coefficient loss

The dice coefficient is equivalent to the F1 score, so using the dice coefficient loss amounts to optimizing the F1 score directly.
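For reference, a common soft (differentiable) formulation of the dice coefficient loss; this exact form is my assumption, the paper may use a slightly different variant:

```python
import torch

def dice_loss(logits, target, eps=1e-7):
    """Soft dice loss: 1 - 2|P∩G| / (|P|+|G|); equals 1 - F1 for hard binary masks."""
    prob = torch.sigmoid(logits)                     # logits -> probabilities
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

logits = torch.randn(2, 1, 320, 480)
masks = torch.randint(0, 2, (2, 1, 320, 480)).float()
print(dice_loss(logits, masks).item())
```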

Training

  • Parameter initialization: the initialization of He et al. (reference below) is used, i.e., weights drawn from a zero-mean Gaussian with variance $2/n_l$, where $n_l$ is the number of input connections of layer $l$; the ResNet-34 encoder uses its ImageNet-pretrained weights.

K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE ICCV, Santiago, Chile, Dec. 2015.

  • Optimizer: AdamW ($\lambda = 0.01$, $\alpha$ is the learning rate, $\epsilon = 10^{-8}$)
$$ \boldsymbol{\theta}_{t}=(1-\lambda) \boldsymbol{\theta}_{t-1}-\alpha\left(\frac{\widehat{\boldsymbol{m}}_{t}}{\sqrt{\widehat{\boldsymbol{v}}_{t}}+\epsilon}\right) $$
  • Learning rate: the paper devotes considerable space to learning-rate selection, using a systematic learning-rate range test instead of parameter sweeps, together with a "one-cycle" schedule based on cyclical learning rates to speed up convergence (see the sketch below).
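A sketch of this optimizer and schedule in PyTorch using `torch.optim.AdamW` and `OneCycleLR`; the `max_lr`, epoch count, and steps per epoch below are placeholders, since the actual value comes from the paper's learning-rate range test:

```python
import torch

model = torch.nn.Conv2d(3, 1, 3, padding=1)           # stand-in for the ResNet-34 U-Net
opt = torch.optim.AdamW(model.parameters(), lr=1e-3,  # lr would come from the LR range test
                        weight_decay=0.01, eps=1e-8)  # lambda = 0.01, epsilon = 1e-8
steps_per_epoch, epochs = 100, 30                     # placeholders, not the paper's values
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=1e-3, steps_per_epoch=steps_per_epoch, epochs=epochs)  # "one-cycle" schedule

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # ... forward pass, loss, loss.backward() would go here ...
        opt.step()
        opt.zero_grad()
        sched.step()                                  # the LR is updated every step
```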

4. Results

  • Evaluation metrics
$$ \begin{aligned} \operatorname{Pr} &=\frac{TP}{TP+FP} \\ \operatorname{Re} &=\frac{TP}{TP+FN} \\ F1 &=\frac{2 \times \operatorname{Pr} \times \operatorname{Re}}{\operatorname{Pr}+\operatorname{Re}} \end{aligned} $$
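A minimal sketch of computing these pixel-level metrics from binary masks (crack papers often also allow a small pixel tolerance around the ground truth; that refinement is omitted here):

```python
import numpy as np

def precision_recall_f1(pred, gt):
    """Pixel-level Pr, Re, F1 for binary masks (1 = crack, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    pr = tp / (tp + fp + 1e-12)
    re = tp / (tp + fn + 1e-12)
    return pr, re, 2 * pr * re / (pr + re + 1e-12)

pred = np.random.randint(0, 2, (320, 480))
gt = np.random.randint(0, 2, (320, 480))
print(precision_recall_f1(pred, gt))
```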


3. CrackGAN: A Labor-Light Crack Detection Approach Using Industrial Pavement Images Based on Generative Adversarial Learning

arXiv:1909.08216 [pdf, other] cs.CV cs.LG eess.IV
Authors: Kaige Zhang, Yingtao Zhang, Heng-Da Cheng
Abstract: Fully convolutional network is a powerful tool for per-pixel semantic segmentation/detection. However, it is problematic when coping with crack detection using industrial pavement images: the network may easily “converge” to the status that treats all the pixels as background (BG) and still achieves a very good loss, named “All Black” phenomenon, due to the data imbalance and the unavailability of accurate ground truths (GTs). To tackle this problem, we introduce crack-patch-only (CPO) supervision and generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while reserves both crack and BG-image translation abilities by feeding a larger-size crack image into an asymmetric U-shape generator to overcome the “All Black” issue. The proposed approach is validated using four crack datasets; and achieves state-of-the-art performance comparing with that of the recently published works in efficiency and accuracy.
Submitted 18 September, 2019; originally announced September 2019.

1. Summary

FCNs are powerful pixel-level segmentation networks, but they are problematic for crack segmentation on industrial pavement images. Because crack and background samples are severely imbalanced (and accurate ground truths are hard to obtain), the network easily "converges" to a state in which every pixel is treated as background (BG) while still achieving a very good loss, called the "All Black" phenomenon.
To address this, the authors introduce crack-patch-only (CPO) supervision and generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while preserving both crack- and BG-image translation abilities.
The "All Black" problem is overcome by feeding a larger-size crack image into an asymmetric U-shaped generator.
Key points

  • 1. The "All Black" problem: the network converges to a state in which every pixel is predicted as background (a numerical illustration is given after this list).
  • 2. Crack-patch-only (CPO) supervision combined with generative adversarial learning is proposed.
  • Only a small amount of manually annotated GT is required, reducing annotation labor. Although the network is trained on small image patches, it can still detect cracks effectively in full-size images.
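A small numerical illustration of the "All Black" effect (my own example, not from the paper): when crack pixels occupy well under 1% of the image, a prediction of "background everywhere" already achieves a very low per-pixel BCE.

```python
import torch

# a 256x256 mask in which a one-pixel-wide "crack" covers about 0.25% of the pixels
gt = torch.zeros(1, 1, 256, 256)
gt[..., 100, :165] = 1.0

all_black = torch.full_like(gt, -6.0)        # logits strongly favouring background everywhere
bce = torch.nn.BCEWithLogitsLoss()
print(bce(all_black, gt).item())             # already a small loss, although no crack is detected
```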

2. Dataset

A self-built CrackGAN dataset (2048 x 4096 pixels) is used, captured by a line-scan industrial camera mounted on top of a vehicle travelling at 100 km/h. The camera scans a 4.096 m wide pavement strip; each scan yields a 2048 x 4096 pixel pavement image (1 pixel represents a 1 x 1 mm² area).

3. Method

  • Model architecture

D is a pretrained discriminator, obtained from a DC-GAN trained only on crack-GT patches.

Training

The following two objectives are optimized alternately:

$$ \begin{aligned} \max _{D} V(D, G) &=E_{x \sim p_{d}(x)}[\log D(x)]+E_{z \sim p_{d}(z)}[\log (1-D(G(z)))] \\ \max _{G} V(D, G) &=E_{z \sim p_{d}(z)}[\log (D(G(z)))] \end{aligned} $$
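A generic sketch of alternating between these two objectives with toy networks; the actual CrackGAN generator, the CPO-supervised discriminator, and its additional pixel-level loss are more involved than this:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh())    # toy generator: 64x8x8 -> 1x16x16
D = nn.Sequential(nn.Conv2d(1, 1, 16), nn.Flatten(), nn.Sigmoid())  # toy discriminator -> scalar in (0,1)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(100):
    real = torch.rand(8, 1, 16, 16)            # stand-in for real crack-GT patches
    z = torch.randn(8, 64, 8, 8)               # stand-in for the generator input
    fake = G(z)

    # max_D  E[log D(x)] + E[log(1 - D(G(z)))]
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
    loss_d.backward()
    opt_d.step()

    # max_G  E[log D(G(z))]  (the non-saturating form of the second objective)
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(8, 1))
    loss_g.backward()
    opt_g.step()
```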

4. Results

In this work the authors propose a new deep generative adversarial network, named CrackGAN, for pavement crack detection. The method solves the "All Black" problem of FCN-based pixel-level crack detection. A generative adversarial loss with CPO supervision is introduced to regularize the objective function and overcome the data imbalance inherent to crack-like objects, and an asymmetric U-shaped generator trained with CPO supervision is proposed that preserves the ability to handle BG images. In addition, the network is designed as an FCN, so it can be trained on small image patches yet process full-size images seamlessly. Experiments demonstrate the effectiveness of the approach, which compares favorably with recently published work. Furthermore, the analysis of receptive-field neuron properties can explain many phenomena in deep learning, such as boundary blur in semantic segmentation [6] and the blurriness of images generated by GANs [33], [35]; the authors believe this per-neuron analysis will become a routine tool for designing effective networks.

4. A Cost Effective Solution for Road Crack Inspection using Cameras and Deep Neural Networks

arXiv:1907.06014 [pdf] cs.CV cs.LG eess.IV
Authors: Qipei Mei, Mustafa Gül
Abstract: Automatic crack detection on pavement surfaces is an important research field in the scope of developing an intelligent transportation infrastructure system. In this paper, a cost effective solution for road crack inspection by mounting commercial grade sport camera, GoPro, on the rear of the moving vehicle is introduced. Also, a novel method called ConnCrack combining conditional Wasserstein generative adversarial network and connectivity maps is proposed for road crack detection. In this method, a 121-layer densely connected neural network with deconvolution layers for multi-level feature fusion is used as generator, and a 5-layer fully convolutional network is used as discriminator. To overcome the scattered output issue related to deconvolution layers, connectivity maps are introduced to represent the crack information within the proposed ConnCrack. The proposed method is tested on a publicly available dataset as well our collected data. The results show that the proposed method achieves state-of-the-art performance compared with other existing methods in terms of precision, recall and F1 score.
Submitted 22 October, 2019; v1 submitted 13 July, 2019; originally announced July 2019.

1. Summary

Images are collected with a commercial GoPro sport camera mounted on the rear of a moving vehicle. The paper proposes ConnCrack, a road crack detection method that combines a conditional Wasserstein generative adversarial network (cWGAN) with connectivity maps.
The generator is a 121-layer densely connected network with deconvolution layers for multi-level feature fusion, and the discriminator is a 5-layer fully convolutional network. To overcome the scattered-output problem associated with deconvolution layers, connectivity maps are introduced to represent the crack information (a sketch of the connectivity-map idea follows).
Compared with existing methods, the approach achieves state-of-the-art precision, recall, and F1 score.
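A hedged sketch of the connectivity-map idea as I understand it (one plausible construction, not necessarily the authors' exact definition): each crack pixel is described by whether it is connected to each of its 8 neighbours, giving an 8-channel target instead of a single binary map.

```python
import numpy as np

def connectivity_maps(mask):
    """Build 8 connectivity channels from a binary crack mask.

    Channel d is 1 at pixel p iff p and its neighbour in direction d are both crack pixels.
    (One plausible construction; consult the ConnCrack paper for the exact recipe.)
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    h, w = mask.shape
    maps = np.zeros((8, h, w), dtype=mask.dtype)
    for d, (dy, dx) in enumerate(offsets):
        shifted = np.zeros_like(mask)
        ys_dst = slice(max(-dy, 0), h + min(-dy, 0))
        xs_dst = slice(max(-dx, 0), w + min(-dx, 0))
        ys_src = slice(max(dy, 0), h + min(dy, 0))
        xs_src = slice(max(dx, 0), w + min(dx, 0))
        shifted[ys_dst, xs_dst] = mask[ys_src, xs_src]   # shifted[p] = mask[p + (dy, dx)]
        maps[d] = mask * shifted
    return maps

mask = np.zeros((8, 8), dtype=np.float32)
mask[3, 2:6] = 1                           # a short horizontal crack
conn = connectivity_maps(mask)
print(conn.shape, conn.sum(axis=(1, 2)))   # 8 channels; the horizontal directions carry the links
```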

2. Dataset

The camera is mounted on the rear windshield.
There are three main reasons for using the rear-mounted configuration:

  • The windshield reflects light from inside the car, which degrades image quality in a front-mounted configuration.
  • A front-mounted camera is farther from the ground, and much of its field of view (FOV) is blocked by the hood, so a front-mounted configuration sacrifices too much spatial resolution according to the authors' analysis.
  • The ultimate goal is to use the vehicle's built-in backup camera directly for crack detection while driving, in which case no external equipment needs to be installed.

Data were collected while the vehicle travelled at 40-80 km/h. The camera recorded at 240 frames per second with a 1/3840 s shutter speed, and one image was extracted every 6 frames. In total, three hours of footage were collected on various road surfaces in Edmonton, Canada to build the EdmCrack600 dataset, which the authors describe as the largest such dataset to date.
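A sketch of the frame-extraction step, assuming OpenCV; the file names are placeholders, not the authors' pipeline:

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("gopro_run.mp4")   # placeholder path to a 240 fps GoPro recording
idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 6 == 0:                      # keep one image out of every 6 frames
        cv2.imwrite(f"frames/frame_{saved:06d}.png", frame)
        saved += 1
    idx += 1
cap.release()
```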

  • Dataset comparison


3. Method

  • ConnCrack


  • Generator: the cWGAN generator, a 121-layer densely connected network with deconvolution layers for multi-level feature fusion.

Training

  • Loss function (a code sketch follows this list):
$$ \begin{aligned} &L_{cWGAN}(G, D)=E_{x, y}[D(x, y)]-E_{x}[D(x, G(x))] \\ &G^{*}=\arg \min _{G} \max _{D}\left(\lambda L_{cWGAN}(G, D)+L_{\text {content }}(G)\right) \end{aligned} $$
  • Pretraining

On ImageNet and the CFD dataset.

Learning rate: $1\times10^{-6}$

  • EdmCrack600 dataset

Learning rate: $1\times10^{-5}$; $\lambda$ is set to $5\times10^{-6}$.
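A sketch of how the combined objective above might be evaluated for one batch (toy stand-in networks; the L1 content loss is my placeholder, not necessarily the authors' choice of $L_{content}$):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())  # stand-in generator: image -> crack map
D = nn.Sequential(nn.Conv2d(4, 1, 3, padding=1))                # stand-in critic on (image, map) pairs
l1 = nn.L1Loss()
lam = 5e-6                                                      # lambda used on EdmCrack600

x = torch.rand(2, 3, 128, 128)                                  # input images
y = torch.randint(0, 2, (2, 1, 128, 128)).float()               # ground-truth crack maps
fake = G(x)

# L_cWGAN(G, D) = E[D(x, y)] - E[D(x, G(x))]   (Wasserstein critic: no log / sigmoid)
l_cwgan = D(torch.cat([x, y], 1)).mean() - D(torch.cat([x, fake], 1)).mean()

# generator objective: min_G max_D ( lambda * L_cWGAN + L_content ); L1 used here as a stand-in content loss
g_loss = lam * l_cwgan + l1(fake, y)
print(l_cwgan.item(), g_loss.item())
```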

4. Results

  • Results on the pretraining datasets


  • Results on the EdmCrack600 dataset

5. FPCNet: Fast Pavement Crack Detection Network Based on Encoder-Decoder Architecture

arXiv:1907.02248 [pdf, other] cs.CV
Authors: Wenjun Liu, Yuchun Huang, Ying Li, Qi Chen
Abstract: Timely, accurate and automatic detection of pavement cracks is necessary for making cost-effective decisions concerning road maintenance. Conventional crack detection algorithms focus on the design of single or multiple crack features and classifiers. However, complicated topological structures, varying degrees of damage and oil stains make the design of crack features difficult. In addition, the contextual information around a crack is not investigated extensively in the design process. Accordingly, these design features have limited discriminative adaptability and cannot fuse effectively with the classifiers. To solve these problems, this paper proposes a deep learning network for pavement crack detection. Using the Encoder-Decoder structure, crack characteristics with multiple contexts are automatically learned, and end-to-end crack detection is achieved. Specifically, we first propose the Multi-Dilation (MD) module, which can synthesize the crack features of multiple context sizes via dilated convolution with multiple rates. The crack MD features obtained in this module can describe cracks of different widths and topologies. Next, we propose the SE-Upsampling (SEU) module, which uses the Squeeze-and-Excitation learning operation to optimize the MD features. Finally, the above two modules are integrated to develop the fast crack detection network, namely, FPCNet. This network continuously optimizes the MD features step-by-step to realize fast pixel-level crack detection. Experiments are conducted on challenging public CFD datasets and G45 crack datasets involving various crack types under different shooting conditions. The distinct performance and speed improvements over all the datasets demonstrate that the proposed method outperforms other state-of-the-art crack detection methods.
Submitted 4 July, 2019; originally announced July 2019.

1. Summary

An encoder-decoder structure is used to automatically learn crack features with multiple context sizes and to achieve end-to-end crack detection.
Specifically, a Multi-Dilation (MD) module is first proposed, which synthesizes crack features of multiple context sizes via dilated convolutions with multiple rates; the resulting crack MD features can describe cracks of different widths and topologies.
Next, an SE-Upsampling (SEU) module is proposed, which uses the squeeze-and-excitation operation to optimize the MD features. Finally, the two modules are integrated into the fast crack detection network FPCNet.

2. Dataset

  • CFD datasets
  • G45 crack datasets

3. Method


  • Multi-Dilation (MD)

The MD module combines several dilated convolutions [23] with different rates and a global pooling branch to extract crack features with different context sizes, so that cracks of different widths and topologies can be detected (a sketch is given after the list below).

  • rate = 1: such a convolution works for thin, simple cracks, but cannot effectively detect wide cracks or cracks with complex topology.
  • Those cracks can be detected robustly with dilated convolutions using a larger rate (e.g., 4).
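A sketch of a Multi-Dilation module along these lines; the dilation rates and the fusion step are my reading of the description, not the exact published configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDModule(nn.Module):
    """Parallel 3x3 dilated convolutions (several rates) plus a global-pooling branch,
    concatenated and fused back to `out_ch` channels."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r),
                          nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates)
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = self.global_branch(x)
        feats.append(F.interpolate(g, size=x.shape[-2:], mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

x = torch.randn(1, 512, 20, 30)        # encoder features for a 320x480 image after 4 poolings
print(MDModule(512, 512)(x).shape)     # torch.Size([1, 512, 20, 30])
```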
  • SE-Upsampling (SEU)

Specifically, the MD features are first upsampled by a transposed convolution, which doubles their resolution and halves the number of channels. The multi-convolution features (MCF) from the encoder, which have the same resolution as the upsampled MD features, are then added to them. Global average pooling is applied to the summed MD features to obtain per-channel global information. This global information goes through a squeeze operation ($F_{sq}$): a fully connected layer reduces the number of channels by a fixed ratio (16 in this work), followed by a ReLU nonlinearity. An excitation operation ($F_{ex}$) then restores the squeezed output to the original number of channels with another fully connected layer, and a sigmoid layer produces the channel weights; a larger weight indicates that the channel contributes more to crack detection. Finally, each MD feature channel is multiplied by its corresponding weight to obtain the optimized MD features (see the sketch below).
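A sketch of the SEU module as described above (the squeeze ratio of 16 is from the paper; the remaining details are my interpretation):

```python
import torch
import torch.nn as nn

class SEUModule(nn.Module):
    """SE-Upsampling: transposed conv doubles resolution and halves channels, the encoder
    features (MCF) of the same resolution are added, and an SE block reweights the channels."""
    def __init__(self, in_ch, ratio=16):
        super().__init__()
        out_ch = in_ch // 2
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fc1 = nn.Linear(out_ch, out_ch // ratio)   # squeeze (F_sq)
        self.fc2 = nn.Linear(out_ch // ratio, out_ch)   # excitation (F_ex)

    def forward(self, mdf, mcf):
        x = self.up(mdf) + mcf                          # fuse with encoder features of equal size
        w = x.mean(dim=(2, 3))                          # global average pooling -> per-channel stats
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(w))))   # channel weights in (0, 1)
        return x * w[:, :, None, None]                  # larger weight = more useful channel

mdf = torch.randn(1, 512, 20, 30)      # MD features
mcf = torch.randn(1, 256, 40, 60)      # encoder features at twice the resolution
print(SEUModule(512)(mdf, mcf).shape)  # torch.Size([1, 256, 40, 60])
```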

  • FPCNet

Overall architecture (Fig. 4 of the paper): four "Convs" blocks (each two 3x3 convolutions with ReLUs) followed by max pooling form the encoder that extracts features; the MD module then gathers information over multiple context sizes; four SEU modules act as the decoder. H and W denote the original image size; red, green, and blue arrows denote max pooling, transposed convolution, and 1x1 convolution + sigmoid, respectively. MCF denotes the multi-convolution features extracted in the encoder and MDF the MD features.

Training

  • Loss function: binary cross entropy (BCE) + dice coefficient loss (a sketch follows this list):
$$ L\left(Y^{*}, Y\right)=-\frac{1}{N} \sum_{P \in N}\left(Y_{P}^{*} \cdot \lg Y_{P}+\left(1-Y_{P}^{*}\right) \cdot \lg \left(1-Y_{P}\right)\right)+1-\frac{2 \times TP}{2 \times TP+FP+FN} $$
  • Optimizer: SGD with momentum 0.9, a batch size of 1, and a weight decay of 0.0001.
  • Learning rate: initially 0.01, reduced by a factor of 10 at epochs 50, 80, and 110; training stops at epoch 120.
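A sketch of the corresponding loss and optimization schedule in PyTorch (the model is a stand-in and the data are random tensors; only the loss form, optimizer settings, and LR milestones come from the paper):

```python
import torch
import torch.nn as nn

def bce_dice_loss(logits, target, eps=1e-7):
    """Binary cross entropy plus (1 - dice), matching the loss above."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + 1 - dice

model = nn.Conv2d(3, 1, 3, padding=1)                   # stand-in for FPCNet
opt = torch.optim.SGD(model.parameters(), lr=0.01,      # initial lr 0.01
                      momentum=0.9, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[50, 80, 110], gamma=0.1)

for epoch in range(120):                                # training stops at epoch 120
    img = torch.rand(1, 3, 320, 480)                    # batch size 1, as in the paper
    gt = torch.randint(0, 2, (1, 1, 320, 480)).float()
    loss = bce_dice_loss(model(img), gt)
    opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                                        # LR drops by 10x at the milestones
```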

4. Results


  • Efficiency


Author: HeoLis
Original link: https://ishero.net/%E8%B7%AF%E9%9D%A2%E8%A3%82%E7%BC%9D%E5%88%86%E5%89%B2%E7%9B%B8%E5%85%B3%E8%AE%BA%E6%96%87%E7%AC%94%E8%AE%B0.html
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.

Learn, record, share, gain.
