

































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
1 Department of Materials Science and Engineering, University of Wisconsin-Madison,. 7. Madison, Wisconsin, 53706, USA. 8. 2 Oak Ridge National Laboratory, ...
Typology: Schemes and Mind Maps
1 / 41
This page cannot be seen from the preview
Don't miss anything!


































1 A Deep Learning Based Automatic Defect Analysis Framework for
2 In-situ TEM Ion Irradiations
Mingren Shen
1
, Guanzhao Li
1
, Dongxia Wu
3
, Yudai Yaguchi
5
, Jack Haley
4
, Kevin G. Field
2 ,
Dane Morgan
1
1
7 Department of Materials Science and Engineering, University of Wisconsin-Madison,
8 Madison, Wisconsin, 53706, USA
2
9 Oak Ridge National Laboratory, Oak Ridge, Tennessee, 37830, USA
3
10 Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin,
4
12 Department of Materials, University of Oxford, Oxford, OX1 3PH, UK
5
13 Department of Computer Sciences, University of Wisconsin–Madison, Madison,
14 Wisconsin, 53706, USA
6
15 Now at Nuclear Engineering and Radiological Sciences, University of Michigan - Ann
16 Arbor, Michigan, 48109 USA
25 Abstract
26 Videos captured using Transmission Electron Microscopy (TEM) can encode details regarding
27 the morphological and temporal evolution of a material by taking snapshots of the microstructure
28 sequentially. However, manual analysis of such video is tedious, error-prone, unreliable, and
29 prohibitively time-consuming if one wishes to analyze a significant fraction of frames for even
30 videos of modest length. In this work, we developed an automated TEM video analysis system
31 for microstructural features based on the advanced object detection model called YOLO and
32 tested the system on an in-situ ion irradiation TEM video of dislocation loops formed in a
33 FeCrAl alloy. The system provides analysis of features observed in TEM including both static
34 and dynamic properties using the YOLO-based defect detection module coupled to a geometry
35 analysis module and a dynamic tracking module. Results show that the system can achieve
36 human comparable performance with an F1 score of 0.89 for fast, consistent, and scalable frame-
37 level defect analysis. This result is obtained on a real but exceptionally clean and stable data set
38 and more challenging data sets may not achieve this performance. The dynamic tracking also
39 enabled evaluation of individual defect evolution like per defect growth rate at a fidelity never
40 before achieved using common human analysis methods. Our work shows that automatically
41 detecting and tracking interesting microstructures and properties contained in TEM videos is
42 viable and opens new doors for evaluating materials dynamics.
47 Introduction
48 Transmission Electron Microscope (TEM) has widely been used to characterize a
49 material or material system since TEM provides resolution limits at or below common
50 microstructural features of interest. Recently, a surge in the use of in-situ TEM techniques has
51 occurred, partially due to the advent of digital capture devices. In-situ TEM experiments have a
52 distinct advantage over ex-situ experiments as they allow researchers to study materials’ intrinsic
53 properties and responses as external conditions are manipulated such as temperature, pressure,
and type of reaction cells
1
55 challenging problems like elucidating mechanisms for catalysis, atomic behavior during material
reactions, and nanoscale property changes under loads
1,
57 The value extracted from an in-situ TEM experiment requires careful analysis of the
58 observed processes. For many of these experiments, this analysis includes dynamically detecting
59 features present in the microstructure and analyzing the microstructural evolution in each frame
60 of the experiment, typically captured in a digital video form. For decades, quantification of
61 defects in in-situ TEM data has been completed by humans, which is tedious, time-intensive,
62 error-prone, biased, and impractical to scale. For example, a typical manual workflow of
63 counting defects in TEM images requires an experienced researcher carefully going through
64 every frame for different types of defects and labeling objects in the images one by one. Such
65 manual analysis typically takes many minutes per frame (e.g., in this study, we found it takes
66 about 20 to 60 minutes to process 1 frame, depending on the complexity of the TEM images).
67 Typical in-situ TEM experiments can generate tens to hundreds of frames of video data per
68 minute, so a long video can rapidly become impractical to analyze. Moreover, the labeling
69 quality also depends on the attentiveness of researchers, which may be reduced after spending
70 hours on this repetitive work. Furthermore, other factors such as researchers’ proficiency and
71 personal preference when analyzing TEM images contribute to inaccurate or at least inconsistent
72 labeling. Human interpretations are often required for analyzing TEM images that the same
73 researcher may give different labeling results at different times. The above observations imply
74 that manual counting and analyzing methods are hard to scale and prone to human-based errors.
75 In the future, the demand for better TEM analysis methods will only grow, as recent advances in
76 TEM equipment, e.g., high-speed cameras, and fast microprocessors will keep accelerating the
rate of data acquisition
3
78 Automatic analysis of TEM/STEM data, especially identifying microstructural defects, is
79 a long-standing pursuit of both the academic community and the industrial sector. To
80 automatically analyze defects contained in TEM/STEM images, various methods have been
applied, such as matching key-points in different regions of interest
4
81 , applying different
thresholding values to segment different defects
5,
82 , representing the texture of various targeted
structures by the bag of visual words (BoW)
7
or synthesizing artificial image dataset
8
83 , and using
84 traditional machine learning methods, e.g., k-means clustering, to find defects contained in
TEM/STEM images
3
86 that they still require extensive human knowledge and time to apply to a given system, and each
87 new material system requires a significant new investment to find an effective approach.
88 Recently, modern deep learning methods have been applied to solve the defect identification
problem in static TEM/STEM image data
9 – 11
89 , suggesting this is a promising approach that could
90 be reliable and highly flexible across many materials and problems. However, deep learning
91 approaches have not yet been applied to the problem of automatic detection and analysis of
92 defects contained in in-situ TEM video. Imaging-related research around TEM/STEM video
work
9
, computer vision methods like Local Binary Patterns (LBP)
30
115 descriptor were used to
describe local pixel environment, machine learning methods like AdaBoost
31
116 were used to select
the most useful visual features, and shallow Convolutional Neural Networks (CNNs)
32
117 were used
118 to refine the final output to find the defects. This type of methods requires extensive hand-tuning
119 and integration of multiple stages which is unlikely to provide a practical general approach
120 compared to relying on just one deep learning model. The second main type of methods belongs
to the encoder-decoder type of deep learning methods
33
122 weakly supervised encoder-decoder type neural network system to extract atomic locations and
123 defect types from atomically resolved scanning transmission electron microscopy images to
124 interpret complex atomic and defect transformation of silicon dopants in graphene as a function
of time
26
126 widely used for analyzing material images like studying segmentation of nanoparticles in bright
field TEM images
34
and defects in STEM images of steels
11
128 the most relevant information and builds a useful inner state, which usually requires careful
training
33
35
16
, Mask R-CNN
36
, and SSD
37
130 etc. For example, Mask R-CNN is used by Chen
et al. to study the microstructure of aluminum alloy
38
131 and Faster R-CNN is utilized by Anderson
et al. to investigate helium bubbles in X-750 alloy under irradiation
27
133 quite accurate and fairly easy to train. Given their wide use and appealing properties, in this
134 study, we use one of the third type of methods (YOLO) to study geometrical and temporal
135 changes of loop type defects in the videos of In-situ TEM ion irradiations. YOLO is similar to
136 methods used by others for TEM but unique in its speed to analyze videos in real time.
137 One of the most successful applications of deep learning is computer vision, where the
138 ultimate goal is teaching a computer to do the image-related task(s) like finding which object
139 contained in an image (object detection) and which pixel belongs to different objects (image
segmentation)
39
141 Challenge (ILSVRC) was made in 2012, the deep Convolutional Neural Network (CNN) based
approach has demonstrated its success in many image-related tasks
39
143 problem (trying to find the location and category of all the objects contained in an image), which
144 is also the focus of this project, there are two general categories of methods: two-stage methods
and one-stage methods
39
. For a two-stage method like Faster R-CNN
35
145 , the object detector will
146 first propose some candidate bounding boxes containing the object location information and then
147 classify the category of those candidate bounding boxes. One-stage methods like YOLO will
output the object location and category at the same time
16
149 slower but more accurate than one-stage methods. YOLO (which is an acronym for You Only
150 Look Once), is one of the most widely used one-stage methods and offers speed, accuracy, and
fast engineering application potentials
39
152 (or video frames) into grids and predicting the location and the category of the potential
bounding boxes with a set of pre-defined anchor boxes in each cell of the grid
16
154 improving its design and implementation details over time and during the writing of this draft,
YOLOv
40
155 and YOLOv5 (https://github.com/ultralytics/yolov5) were developed. However, we
will use YOLOv3 as this is still the most widely used and recognized version
16,41– 43
157 In this paper, we focus on the specific task of adapting the deep learning based YOLOv
158 model into an automated framework for analyzing in-situ TEM video data. Specifically, we
159 focus on the problem of detection and analysis of radiation-induced dislocation loops generated
condition used where g =011 and the deviation parameter, s g
183 , close to zero, only a fraction of
184 a/2〈 111 〉 and a〈 100 〉 loops are visible in TEM. To enable direct comparison to the previous
human-based analysis where multiple g - vectors were used for detection and analysis
47,
185 , we
186 applied a fractional visibility constant 7/4 to make YOLO detection results comparable with
published results using other g conditions
44
188 are elliptical and the defect size is the length of major axes of the ellipse. The video image size is
189 1344 pixel x 962 pixel with 2.6884 pixel/nm conversion factor which gives the physical size as
190 500.0 nm x 357.8 nm. The video consisted of 1175 frames which were linearly related to dpa and
191 time through Equation 1 which means each frame corresponds to 1.75 seconds.
− 4
193 Equation 1
194 The video data was acquired via frame-by-frame image registration to eliminate sample
195 drift and relevant camera movement. Since there is no landmark frame or feature that can be
196 used to align the video across the whole irradiation dose range, the video was divided into
197 smaller batches for primary image registration, and the final sets from the previous batch were
used to carry over the alignment
15,
selected landmark frame with a template matching and slice alignment plugin
50
200 dissemination of data and codes for this paper is described in the Data and Code Dissemination
201 section.
202 We opted not to use any previous data labeling and decided to label the data ourselves for
203 this project to establish the ground truth data. This choice was because we did not have the exact
pixel positions of each defect in the previous study by Haley, et al
44
process that has been used in other studies
9
206 researchers and they checked each other’s labeling and explanations for 3 frames before labeling
the real data via an open-source software called ImageJ
51
208 ground truth in this study.
209 The YOLOv3 model was adopted from an open GitHub repository
210 (https://github.com/qqwweee/keras-yolo3). To train the model, we first converted the pretrained
211 darknet53 weight via COCO dataset into Keras format and then modified the final class number
212 to our defect number, which was one class in our case since we treated all a/2〈 111 〉 and a〈 100 〉
213 loops as the same type of defect. This single class approach was necessary as Burgers vector
214 determination is not possible using only the g =011 condition in the video. We then applied the
215 transfer learning technique to fine-tune the model by freezing the first 245 layers of YOLOv
and training the last 7 layers
52
217 composed of 1176 frames and 21 frames were selected and labeled. The sampling was done at
218 random except an effort was made to assure that the sampled frames were approximately
219 uniformly distributed throughout the full set. Among the sampled 21 frames, 15 frames were
220 used for training (trained on 12 frames and validated on 3 frames) and 6 frames were used for
221 testing, where these 6 frames were not seen by the YOLO model during training. The model was
222 trained on GeForce GTX 1080 for 18300 epoch and the learning rate of Adam optimizer was
switched between 10
and 10
223 with batch size equal to 4 and Non-Max Suppression (NMS) IoU
224 equals 0.45 to find the optimal weights. Real-time data augmentation operations, e.g., left-right
225 flip, changing hue, saturation, lightness, were applied for the training dataset to enrich the dataset
226 and enhance the performance of the CNN for variations in defect contrast, size, and morphology
242 Figure 1. Selected images from the test dataset for various damage doses (e.g. time scale). Subfigure (a), (b),
243 and (c) are the ground truth labeling developed by two researchers, while subfigure (d), (e), and (f) are
244 labeled by the automated machine learning program. Here, (a) and (d) are for frame number 120 , (b) and (e)
245 are for frame number 472 , and (c) and (d) are for frame number 824. 1 frame increment equates to about
246 0.00140 dpa, see Equation 1. F1 score compares the machine detection results with human labeling of each
247 column separately.
250 Figure 2. The visualized comparison of the human labeling results (blue boxes) to the YOLO detector results
251 (yellow boxes) for frame number 824.
253 The initial qualitative comparison was encouraging so additional quantitative analysis
254 was conducted. The statistics of the model performance were examined based on the metrics of
255 precision and recall and their harmonic mean which is also called the F1 score. The precision,
256 recall, and F1 score were generated using the six test images that were never used in the training
257 process. The test was iterated with different cut-off Intersection-over-Union (cut-off IoU) values
258 as shown in Figure 3. Here and elsewhere in the paper the IoU refers to the ratio of the area of
259 overlap (intersection) to the combined areas (union) of predicted and ground truth bounding
260 boxes. The cut-off IoU refers to the threshold above which a predicted bounding box is
261 considered as a candidate match for a ground truth bounding box. Predicted bounding box
262 matches are assigned by building a matrix of all IoU values between all predicted and ground
288 Figure 3 .The performance of the YOLO detector with different cut-off IoU thresholds.
290 The developed YOLO model was run on each frame of the in-situ TEM video to extract
291 geometry information of each visible defect for the duration of the experiment. After obtaining
292 the geometry and position of each defect per frame, we used this information to extract defect
293 properties, such as median size and number density. Such properties of materials are widely used
294 in the nuclear materials field from which the dataset originated and provide insights into the
295 interplay between the imparted damage and the change in microstructure. We picked four typical
296 frames and compared the machine learning prediction results with ground-truth labeling. Those 4
297 frames were not used in training and testing the machine learning model. We first compared
298 defect density. Defect density is important for many materials properties and for nuclear
299 materials as it is strongly correlated to mechanical properties, e.g., through the dispersed barrier
model
46,
301 are summarized in Figure 4. The densities of defects per frame were determined via machine
302 learning (ML) method and manual labeling for ground truth by dividing the total number of the
303 loops by the volume of the sample for each frame. The sample was treated as a rectangle bulk
with dimensions of 416.6 x 264 x 75 nm
3
304 and both results were corrected based on the loop
305 invisibility for the given imaging condition.
308 Figure 4. Loop number density from the whole TEM video. The plot compares the loop number density
309 obtained from the ground truth labeling done by experts in this study and the result obtained from the
310 machine learning detector. All the data shown on the plot uses the corrected proposed density, which is 7/4 of
335 background were found by applying a thresholding method, specifically Otsu's binarization and
336 Distance Transformations. To remove noise, we use a morphological opening operation with a
337 3x3 kernel. We followed the official tutorial from OpenCV, and more details can be found
there
58
339 very smooth. OpenCV’s fitEllipse() function was called to fit the needed defects and the
340 major axis length of the fitted ellipse was defined as the defect size. Detailed fitting results with a
341 cropped region of interests are provided in the Data and Code Dissemination section. The
342 machine learning results of the defect size distributions were compared to ground truth labeling
343 in Figure 5. Although differences were observed in defect median size of each frame in Figure 5 ,
344 investigation indicated that these differences did not exceed 13 .0% difference in median size,
345 and the average difference is only 5.5% and the standard deviation of difference is 5.3% across
346 all doses investigated. The exact formula used to calculate these statistics is given in the
347 Supplemental Information (SI) Section 1. These results indicated that a well-trained machine
348 learning based model could be used for loop detection and analysis and achieve human-like
349 performance comparable or better than the large differences that can be expected by manual
labelling
9
351 learning results. At the same time, it needs to be emphasized again that the true strength of such a
352 technique lies in its ability to detect defect information for every frame quickly and accurately
353 instead of just focusing on a small subset of frames.
356 Figure 5. Box plot comparing the distribution of median size for two methods the ground truth labeling done
357 by experts in this study, and the result from machine learning detector. All distributions are separately
358 analyzed and compared by their irradiation condition, which is 1.0, 1.5, 2.0, and 2.5 dpa.
360 Figure 6 shows the size distribution for the entire duration of the in-situ ion irradiation
361 TEM experiment where, for each frame, the blue line represents the median of loop size, the top
362 of the gray boundary indicates the third quartile of loop size distribution and the bottom of the
363 gray boundary indicates the first quartile of loop size distribution. With the YOLO-based
364 machine learning detector, we could extract data generated in every frame and investigate the
365 material properties with hundreds of times more data than previously collected by hand for this
366 data set (the data collected by hand are the red points shown for four different typical dpa values
367 where these four points are not seen in training data, with red lines connecting them as a guide to
368 the eye). The large amount of analyzed data makes subtle trends easy to identify. For example,
369 although there are some noises, a clear trend can be seen in Figure 6 that the median size, Q1,