Correcting the Triplet Selection Bias for Triplet Loss

Baosheng Yu^1, Tongliang Liu^1, Mingming Gong^2,3, Changxing Ding^4, and Dacheng Tao^1

^1 UBTECH Sydney AI Centre and SIT, FEIT, The University of Sydney
^2 Department of Biomedical Informatics, University of Pittsburgh
^3 Department of Philosophy, Carnegie Mellon University
^4 School of Electronic and Information Engineering, South China University of Technology

Abstract. Triplet loss, popular for metric learning, has achieved great success in many computer vision tasks, such as fine-grained image classification, image retrieval, and face recognition. Because the number of triplets grows cubically with the size of the training data, triplet selection is indispensable for efficient training with triplet loss. In practice, however, training is usually very sensitive to the selection of triplets, e.g., it almost does not converge with randomly selected triplets, and selecting the hardest triplets also leads to bad local minima. We argue that the bias in the selection of triplets degrades the performance of learning with triplet loss. In this paper, we propose a new variant of triplet loss, which tries to reduce the bias in triplet selection by adaptively correcting the distribution shift on the selected triplets. We refer to this new triplet loss as adapted triplet loss. We conduct a number of experiments on MNIST and Fashion-MNIST for image classification, and on CARS196, CUB200-2011, and Stanford Online Products for image retrieval. The experimental results demonstrate the effectiveness of the proposed method.

Keywords: Triplet Loss · Selection Bias · Domain Adaptation

1 Introduction

Deep metric learning aims to learn a similarity or distance metric with a small intra-class variation and a large inter-class variation [42]. Triplet loss is a popular loss function for deep metric learning and has achieved great success in many computer vision tasks, such as fine-grained image classification [39], image retrieval [17, 22], person re-identification [6, 14], and face recognition [34, 31]. Recently, deep metric learning approaches employing triplet loss have attracted a lot of attention due to their efficiency in dealing with an enormous number of labels, e.g., the extreme multi-label classification problem [32]. More specifically, for conventional classification approaches, the number of parameters will increase linearly with

2 B. Yu, T. Liu, M. Gong, C. Ding, and D. Tao

[Figure 1 diagram: prepare data (k identities), extract feature embedding (128-D), select triplets, evaluate loss]

Fig. 1: The pipeline of triplet loss based deep metric learning. In the first stage, a mini-batch is sampled from the training data, which usually contains k identities with several images per identity. Deep neural networks then are used to learn a feature embedding, e.g., a 128-D feature vector. In the third stage, a subset of triplets are selected using some triplet selection methods. Lastly, the loss is evaluated using the selected triplets.

the number of labels, and it is impractical to learn an N-way softmax classifier with millions of labels [29]. However, with triplet loss, deep metric learning can efficiently handle an extreme multi-label classification problem by learning a compact embedding, which is known as large margin nearest neighbor (LMNN) classification [42]. As a result, deep metric learning exploiting triplet loss is very efficient for applications with an enormous number of labels, e.g., the number of objects in image retrieval [17] and the number of identities in face recognition [34] and person re-identification [14].

To learn a discriminative feature embedding, triplet loss maximizes the margin between the intra-class distance and the inter-class distance. As a result, for each triplet $(x_a, x_p, x_n)$, where $x_a$ is called the anchor point, $x_p$ is called the positive point and has the same label as $x_a$, and $x_n$ is called the negative point and has a different label, the intra-class distance $d(x_a, x_p)$ will be smaller than the inter-class distance $d(x_a, x_n)$ in the learned embedding space. As the number of triplets grows cubically with the size of the training data, triplet selection is indispensable for efficient training with triplet loss. Specifically, triplet selection usually works in an online manner, i.e., triplets are constructed within each mini-batch [34], and we describe a typical pipeline of deep metric learning using triplet loss in Fig. 1.

However, the performance of triplet loss is heavily influenced by the triplet selection method [6, 14]: training with randomly selected triplets almost does not converge, while training with the hardest triplets often leads to a bad local solution [34]. To ensure fast convergence, it is crucial to select “good” hard triplets [34], and a variety of triplet selection methods have been designed for different applications [39, 17, 34, 14]. Although selecting hard triplets leads to fast


data and label, respectively. More specifically, we propose a distribution matching loss function by employing Maximum Mean Discrepancy (MMD) [16], which measures the difference between $P^S(\Phi(X) \mid Y)$ and $P^T(\Phi(X) \mid Y)$. As a result, we learn a discriminative and conditional invariant embedding by jointly training with the triplet loss and the distribution matching loss.

In this paper, we first introduce the problem of triplet selection bias for learning with triplet loss. We then address this problem by reducing the distribution shift between the triplet-induced data $\hat{D}_S$ and $\hat{D}_T$. As the proposed distribution matching loss adaptively corrects the distribution shift, we refer to this new variant of triplet loss as adapted triplet loss. Lastly, we conduct a number of experiments on MNIST [23] and Fashion-MNIST [45] for image classification, and on CARS196 [20], CUB200-2011 [38], and Stanford Online Products [29] for image retrieval. The experimental results demonstrate the effectiveness of the proposed method.

2 Related Work

Deep Metric Learning and Triplet Loss. Many problems in machine learning and computer vision depend heavily on learning a distance metric [42]. Inspired by the great success of deep learning [21], deep neural networks have been widely used to learn a discriminative feature embedding [39, 15]. Deep metric learning with triplet loss attracted a lot of attention after its impressive performance in FaceNet [34] for face verification and recognition. Since then, triplet loss has been widely used to learn a discriminative embedding for a variety of applications, such as image classification [39] and image retrieval [17, 22, 49, 12, 47]. Most applications of triplet loss lie in visual object recognition, such as action recognition [33], vehicle recognition [26], place recognition [1], 3D pose recognition [43], face recognition [34, 31, 9], and person re-identification [10, 46, 6, 25, 4, 14].

Triplet Selection Methods. Triplet selection is key to the success of triplet loss, and a variety of triplet selection methods have been used in different applications [39, 15, 34, 31, 40, 7]. More specifically, in the deep ranking model proposed by [39], triplets are selected according to the pair-wise relevance score. [40] selects the top k triplets in each mini-batch according to the margin $d(x_a, x_p) - d(x_a, x_n)$. [15] selects only hard triplets, i.e., those violating $d(x_a, x_p) < d(x_a, x_n)$, while [34, 31] select semi-hard triplets, which satisfy this ordering but violate the triplet constraint $d(x_a, x_p) + \alpha < d(x_a, x_n)$, where $\alpha$ is a positive scalar. Unlike [34], which defines semi-hard triplets using moderate negatives, [35] selects semi-hard triplets based on moderate positives. [7] proposes an online hard negative mining method for triplet selection to boost the performance of triplet loss. [14] proposes a batch-hard triplet selection method, i.e., it first selects a set of hard anchor-positive pairs and then selects the hardest negatives within the mini-batch. Recently, [44] proposed a weighted sampling method, arguing that sampling matters in deep metric learning.
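To make two of the selection rules above concrete, the following is a minimal NumPy sketch under stated assumptions. It uses squared Euclidean distances on already computed embeddings, reads "semi-hard" as in [34] (the negative is farther than the positive but within the margin), and reads "batch-hard" as picking, for every anchor, the farthest positive and the closest negative in the mini-batch. The function names are hypothetical and not taken from the cited papers.

```python
import numpy as np

def pairwise_sq_dists(emb):
    """Squared Euclidean distances between all rows of emb (n x d)."""
    sq = np.sum(emb ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def semi_hard_triplets(emb, labels, alpha):
    """All (a, p, n) with d(a, p) < d(a, n) < d(a, p) + alpha."""
    d = pairwise_sq_dists(emb)
    size = len(labels)
    triplets = []
    for a in range(size):
        for p in range(size):
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(size):
                if labels[neg] == labels[a]:
                    continue
                if d[a, p] < d[a, neg] < d[a, p] + alpha:
                    triplets.append((a, p, neg))
    return triplets

def batch_hard_triplets(emb, labels):
    """For each anchor: farthest positive and closest negative in the batch."""
    d = pairwise_sq_dists(emb)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    index = np.arange(len(labels))
    triplets = []
    for a in index:
        pos = np.flatnonzero(same[a] & (index != a))
        neg = np.flatnonzero(~same[a])
        if len(pos) == 0 or len(neg) == 0:
            continue  # anchor has no valid positive or negative
        p = pos[np.argmax(d[a, pos])]
        n_idx = neg[np.argmin(d[a, neg])]
        triplets.append((int(a), int(p), int(n_idx)))
    return triplets
```

Both functions return index triplets into the mini-batch, so the same loss routine can consume either selection.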

Correcting the Triplet Selection Bias for Triplet Loss 5

Domain Adaptation. Domain adaptation methods can be divided into four categories according to their assumptions about how the distribution shifts across domains. (1) Covariate shift [16] assumes that the marginal distribution $P(X)$ changes across domains while the conditional distribution $P(Y \mid X)$ stays the same. (2) Model shift [41] assumes that both $P(X)$ and $P(Y \mid X)$ change independently across domains. (3) Target shift [48] assumes that the marginal distribution $P(Y)$ shifts while $P(X \mid Y)$ stays the same. (4) Generalized target shift [11, 27, 24] assumes that both $P(Y)$ and $P(X \mid Y)$ change independently. Since triplet loss is widely used for extreme multi-label classification problems, we model the triplet selection bias by the change of $P(X \mid Y)$ in this paper.

3 Formulation

In this section, we first introduce triplet loss for deep metric learning and a widely used triplet selection method, i.e., semi-hard triplets [34]. We then formulate the problem of triplet selection bias as the distribution shift problem on triplet induced data. To minimize the distribution shift, we propose a distribution matching loss, which jointly works with the triplet loss to adaptively correct the distribution shift. As a result, we refer to this new triplet loss as adapted triplet loss.

3.1 Triplet Loss for Deep Metric Learning

Let $X$, $Y$ denote two random variables, which indicate data and label, respectively. Let $D$ denote a set of training data sampled from $P(X, Y)$, i.e., $D = \{(x_i, y_i) \mid (x_i, y_i) \sim P(X, Y)\}$. Metric learning aims to learn a distance function that assigns small (or large) distance to a pair of similar (or dissimilar) examples. A widely used distance metric, i.e., the Mahalanobis distance, is defined as follows:

$$d_K^2(x_i, x_j) = (x_i - x_j)^\top K (x_i - x_j), \tag{2}$$

where $K$ is a symmetric positive semi-definite matrix. As $K$ can be decomposed as $K = L^\top L$, we then have

$$d_K^2(x_i, x_j) = \|L(x_i - x_j)\|_2^2 = \|x_i' - x_j'\|_2^2, \tag{3}$$

where $x_i' = L x_i$ and $x_j' = L x_j$. Inspired by this, deep metric learning uses deep neural networks to learn a feature embedding $x' = \Phi(x)$, which generalizes the linear transformation $x' = Lx$ to a non-linear transformation $\Phi(x)$. That is, the learned distance metric is

$$d_K^2(x_i, x_j) = \|\Phi(x_i) - \Phi(x_j)\|_2^2. \tag{4}$$
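The equivalence between Eq. (2) and Eq. (3) can be checked numerically. The snippet below is only an illustrative sketch: the matrix `L`, its shape, and the random points are arbitrary assumptions chosen for the check, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 4))      # hypothetical linear map from 4-D to 3-D
K = L.T @ L                      # symmetric positive semi-definite by construction
xi, xj = rng.normal(size=4), rng.normal(size=4)

diff = xi - xj
d2_mahalanobis = diff @ K @ diff                 # Eq. (2): (xi - xj)^T K (xi - xj)
d2_embedded = np.sum((L @ xi - L @ xj) ** 2)     # Eq. (3): ||L xi - L xj||^2

print(np.isclose(d2_mahalanobis, d2_embedded))
```

Replacing `L @ x` with a neural network output gives the non-linear case of Eq. (4).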

To learn a discriminative feature embedding $\Phi(x)$, i.e., one in which the intra-class distance is smaller than the inter-class distance [42], triplet loss is defined as follows:

$$L^*_{\text{triplet}} = \sum_{(x_a, x_p, x_n) \in D_T} \left[ d_K^2(x_a, x_p) - d_K^2(x_a, x_n) + \alpha \right]_+, \tag{5}$$
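As a rough illustration of Eq. (5), the sketch below evaluates the hinge over a given list of index triplets, using squared Euclidean distances between embeddings as in Eq. (4). The function name and the representation of triplets as `(anchor, positive, negative)` row indices are assumptions made for this example.

```python
import numpy as np

def triplet_loss(emb, triplets, alpha):
    """Sum of [d^2(a, p) - d^2(a, n) + alpha]_+ over the given index triplets."""
    a, p, n = np.asarray(triplets).T
    d_ap = np.sum((emb[a] - emb[p]) ** 2, axis=1)  # squared anchor-positive distances
    d_an = np.sum((emb[a] - emb[n]) ** 2, axis=1)  # squared anchor-negative distances
    return float(np.sum(np.maximum(d_ap - d_an + alpha, 0.0)))
```

A triplet that already satisfies the margin contributes zero, so only violating triplets produce gradient.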


[Figure 3 panels: input space; embedding space]

Fig. 3: An example illustrating the conditional invariant representation. There is a distribution shift between the source domain and the target domain in the input space, i.e., $P^S(X \mid Y) \neq P^T(X \mid Y)$. By learning a conditional invariant representation $\Phi(x)$, the source domain and the target domain share a similar distribution in the embedding space, i.e., $P^S(\Phi(X) \mid Y) = P^T(\Phi(X) \mid Y)$. That is, the embedding $\Phi(x)$ generalizes well to the target domain although it is learned on the source domain. Intuitively, the source domain consists of the selected triplets $D_S$ while the target domain consists of all triplets $D_T$. That is, we learn a conditional invariant embedding using the selected triplets, and it will generalize well to all triplets.

i.e., a fixed-dimensional feature vector. Inspired by [48, 11], we learn a shared conditional invariant representation between $\hat{D}_S$ and $\hat{D}_T$, i.e.,

$$P^S(\Phi(X) \mid Y) = P^T(\Phi(X) \mid Y). \tag{11}$$

See Fig. 3 for an example of the conditional invariant representation. Maximum Mean Discrepancy (MMD) has been widely used to estimate the difference between two distributions [16], and we thus use the conditional mean feature embedding to estimate the difference between $P^S(\Phi(X) \mid Y)$ and $P^T(\Phi(X) \mid Y)$. As a result, the distribution matching loss can be defined as follows:

$$L_{\text{match}} = \sum_y \left\| \Phi_y^S - \Phi_y^T \right\|_2^2, \tag{12}$$


where $\Phi_y^S$ and $\Phi_y^T$ are the class-specific mean feature embeddings on $\hat{D}_S$ and $\hat{D}_T$, respectively, i.e.,

$$\Phi_y^S = \sum_{(X, Y=y) \in \hat{D}_S} P^S(\Phi(X) \mid Y)\, \Phi(X) \tag{13}$$

and

$$\Phi_y^T = \sum_{(X, Y=y) \in \hat{D}_T} P^T(\Phi(X) \mid Y)\, \Phi(X). \tag{14}$$

To correct the distribution shift in learning with triplet loss, we thus learn a discriminative and conditional invariant feature embedding by jointly minimizing the triplet loss as well as the distribution matching loss, i.e.,

$$L = L_{\text{triplet}} + \lambda L_{\text{match}}, \tag{15}$$

where $\lambda$ is a trade-off parameter. Considering that this new variant of triplet loss adaptively corrects the triplet selection bias, we refer to it as adapted triplet loss.
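A minimal sketch of Eqs. (12) to (15) follows. It makes one simplifying assumption that should be read as such: the probability weights in Eqs. (13) and (14) are replaced by empirical per-class averages, i.e., each embedded point in the triplet-induced data counts equally. The function names are hypothetical, and `emb_s` / `emb_t` stand for the embeddings of the points appearing in the selected triplets and in all triplets, respectively.

```python
import numpy as np

def class_means(emb, labels):
    """Empirical class-specific mean embedding for every label present."""
    labels = np.asarray(labels)
    return {y: emb[labels == y].mean(axis=0) for y in np.unique(labels)}

def matching_loss(emb_s, labels_s, emb_t, labels_t):
    """Eq. (12): sum over classes of the squared distance between class means."""
    mu_s = class_means(emb_s, labels_s)
    mu_t = class_means(emb_t, labels_t)
    shared = sorted(set(mu_s) & set(mu_t))  # classes observed in both sets
    return float(sum(np.sum((mu_s[y] - mu_t[y]) ** 2) for y in shared))

def adapted_loss(triplet_loss_value, match_loss_value, lam):
    """Eq. (15): triplet loss plus lambda times the matching loss."""
    return triplet_loss_value + lam * match_loss_value
```

With `lam = 0` the combined objective reduces to the original triplet loss, which matches how the baseline is described in the experiments.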

3.4 Semi-supervised Adapted Triplet Loss

Unlabeled data are usually very helpful for domain adaptation, and we believe that they will also help correct the triplet selection bias. To scale the adapted triplet loss to large-scale unlabeled data, we extend it to the semi-supervised setting.

Suppose we are given a set of labeled data $D_1$ and a set of unlabeled data $D_2$. Let $D_{T_1}$ denote all triplets constructed from $D_1$ and $D_S$ denote the subset of selected triplets, i.e., $D_S \subseteq D_{T_1}$. Let $D_{T_2}$ be the latent triplets constructed from the unlabeled data $D_2$, which are unavailable in practice since we do not know the latent labels of $D_2$. Unlike the supervised setting, in which we learn a conditional invariant representation between $D_S$ and $D_{T_1}$, we consider how to learn a conditional invariant representation among $D_S$, $D_{T_1}$, and $D_{T_2}$, i.e.,

$$P^S(\Phi(X) \mid Y) = P^{T_1}(\Phi(X) \mid Y) = P^{T_2}(\Phi(X) \mid Y). \tag{16}$$

Given the target $P^S(\Phi(X) \mid Y) = P^{T_2}(\Phi(X) \mid Y)$, we then have

$$\sum_y P^{T_2}(\Phi(X) \mid Y)\, P^{T_2}(Y) = \sum_y P^S(\Phi(X) \mid Y)\, P^{T_2}(Y). \tag{17}$$

The left-hand side is the marginal distribution $P^{T_2}(\Phi(X))$, which can be estimated from the unlabeled data without labels. That is, if we know the class ratio $P^{T_2}(Y)$ for the triplet-induced data $\hat{D}_{T_2}$, we are able to estimate the difference between $P^S(\Phi(X) \mid Y)$ and $P^{T_2}(\Phi(X) \mid Y)$. Inspired by [18], we estimate the class ratio $P^{T_2}(Y)$ by converting it into an optimization problem, i.e.,

$$\theta^{T_2} = \arg\min_{\theta} \Big\| \sum_y \theta_y^{T_2}\, \Phi_y^S - \Phi^{T_2} \Big\|_2^2, \quad \text{s.t.} \quad \sum_y \theta_y = 1, \tag{18}$$
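One way to read Eq. (18) is as a least-squares problem with a single equality constraint, which can be solved in closed form through its KKT system. The sketch below is an assumption about how such an estimate could be computed, not the authors' procedure: it imposes only the sum-to-one constraint stated above (no non-negativity), and `class_means_s` and `mean_t2` are hypothetical inputs holding the class means on the selected triplets and the overall mean embedding of the unlabeled data.

```python
import numpy as np

def estimate_class_ratio(class_means_s, mean_t2):
    """Minimize ||sum_y theta_y * Phi^S_y - Phi^{T2}||^2 subject to sum_y theta_y = 1.

    class_means_s: array of shape (C, d), one class-mean embedding per row.
    mean_t2:       array of shape (d,), mean embedding of the unlabeled data.
    """
    A = np.asarray(class_means_s, dtype=float).T   # d x C, columns are class means
    b = np.asarray(mean_t2, dtype=float)
    C = A.shape[1]
    # KKT system of the equality-constrained least-squares problem:
    # [2 A^T A   1] [theta]   [2 A^T b]
    # [1^T       0] [nu   ] = [1      ]
    kkt = np.zeros((C + 1, C + 1))
    kkt[:C, :C] = 2.0 * A.T @ A
    kkt[:C, C] = 1.0
    kkt[C, :C] = 1.0
    rhs = np.concatenate([2.0 * A.T @ b, [1.0]])
    solution = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return solution[:C]  # drop the Lagrange multiplier
```

If a valid probability vector is required, a non-negativity constraint would have to be added, which turns the problem into a small quadratic program.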


randomly cropped to 224×224. We use a learning rate of 0.0005 and a batch size of 120, and we train for at most 15k iterations on CARS196, 20k iterations on CUB200-2011, and 50k iterations on Stanford Online Products. To ensure enough triplets in each mini-batch, we prepare the training data using a method similar to [34], i.e., each mini-batch is randomly sampled from 20 classes with 6 images per class.
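The mini-batch construction described above (20 classes, 6 images per class, 120 images in total) can be sketched as follows. This is an illustrative sampler under the assumption that classes and images are drawn uniformly without replacement; the function names are hypothetical. The second helper counts how many (anchor, positive, negative) triplets such a batch contains, which is 68,400 for the 20 × 6 setting.

```python
import numpy as np

def sample_batch(labels, num_classes=20, images_per_class=6, rng=None):
    """Return dataset indices for a batch of num_classes x images_per_class images."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    eligible = classes[counts >= images_per_class]  # classes with enough images
    chosen = rng.choice(eligible, size=num_classes, replace=False)
    per_class = [
        rng.choice(np.flatnonzero(labels == c), size=images_per_class, replace=False)
        for c in chosen
    ]
    return np.concatenate(per_class)

def count_triplets(num_classes, images_per_class):
    """Number of ordered (anchor, positive, negative) triplets in one batch."""
    anchors = num_classes * images_per_class
    positives = images_per_class - 1                    # same class, excluding the anchor
    negatives = (num_classes - 1) * images_per_class    # all images of other classes
    return anchors * positives * negatives
```

Fixing the number of images per class guarantees that every anchor has at least one positive in the batch, which random sampling over the whole dataset would not.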

4.2 Experiment on Image Classification

In this subsection, we describe the experimental results on MNIST and Fashion-MNIST datasets. To demonstrate the effectiveness of the proposed method, we compare the classification accuracy of models trained using the original triplet loss function (baseline) and the adapted triplet loss function. The evaluation metric can be described as follows: to learn a fixed dimensional feature embedding Φ(x), we train our models using the original triplet loss function and the adapted triplet loss function respectively.

[Figure 4 plots: (a) accuracy versus iteration (0 to 20k) for the original and adapted triplet loss; (b) accuracy versus λ]

Fig. 4: Results on MNIST dataset. In figure (a), we use λ = 2.0 for adapted triplet loss and compare its performance with the original triplet loss for every 100 iterations. In figure (b), we compare the test accuracy for using different λ.

For testing, we first evaluate the conditional mean embedding $\mathbb{E}[\Phi(x) \mid y]$, i.e., the mean point in the embedding space, for each class $y$ using the training data. For each input $x$ in the test set, we then assign it to a class $\hat{y}$ according to the nearest mean point, i.e.,

$$\hat{y} = \arg\min_y \left\| \Phi(x) - \mathbb{E}[\Phi(x) \mid y] \right\|_2^2. \tag{21}$$
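The nearest-mean rule of Eq. (21) can be sketched in a few lines of NumPy. The helper names are hypothetical, and the inputs are assumed to be embeddings already produced by the trained network.

```python
import numpy as np

def fit_class_means(train_emb, train_labels):
    """Mean embedding per class, estimated on the training data."""
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    means = np.stack([train_emb[train_labels == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(test_emb, classes, means):
    """Assign each test embedding to the class with the closest mean (Eq. 21)."""
    # Squared distances between every test point and every class mean.
    d = np.sum((test_emb[:, None, :] - means[None, :, :]) ** 2, axis=2)
    return classes[np.argmin(d, axis=1)]
```

Test accuracy is then the fraction of test points whose predicted class equals the true label.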

We demonstrate the results on MNIST dataset in Fig. 4. More specifically, we find that: in figure (a), the adapted triplet loss brings improvement after 5k iterations. Possible explanations for this improvement can be described as follows: for


the original triplet loss, the gradient might be dominated by the noise triplets or the hard triplets from some specific classes, while the distribution matching loss can adaptively correct the triplet selection bias between the selected triplets and all possible triplets. That is, the adapted triplet loss will generate more balanced gradients for each iteration. Another reason is that the distribution matching loss acts as a regularizer for the original triplet loss, which reduces the risk of overfitting. We evaluate the performance of the adapted triplet loss using different loss weights λ, i.e., λ = 0, 0.1, 0.5, 1.0, 2.0, 5.0, in figure (b). Specifically, we use λ = 0 for the original triplet loss, which is a special case of the adapted triplet loss. We find that a trade-off on λ is required when using the adapted triplet loss to learn a discriminative and conditional invariant embedding. Furthermore, we demonstrate similar results on Fashion-MNIST in Fig. 5.

[Figure 5 plots: (a) accuracy versus iteration (0 to 50k) for the original and adapted triplet loss; (b) accuracy versus λ]

Fig. 5: Results on Fashion-MNIST dataset. In figure (a), we use λ = 2.0. For the original triplet loss, the test accuracy is reduced after 40k iterations, while the adapted triplet loss does not suffer from the problem of overfitting.

To visualize the feature embeddings learned by both the original triplet loss and the adapted triplet loss, we use t-SNE [28], which has been widely used for the visualization of high dimensional data, to project the embeddings into 2D space. In Fig. 6, we show the embeddings learned by the adapted triplet loss. Compared with the embeddings learned by the original triplet loss, we find that the embedding learned by the adapted triplet loss forms uniform margins between different classes, while the embedding learned by the original triplet loss fails to keep clear margins between some classes.

4.3 Experiment on Image Retrieval

In this subsection, we evaluate the adapted triplet loss on image retrieval. For CARS196, CUB200-2011, and Stanford Online Products datasets, we use similar


Loss function         R@1     R@2     R@3     R@4     R@5     R@10    R@
Original (λ = 0)      0.7781  0.8582  0.8903  0.9105  0.9217  0.9523  0.
Adapted (λ = 0.001)   0.7858  0.8587  0.8921  0.9094  0.9228  0.9525  0.
Adapted (λ = 0.005)   0.7912  0.8666  0.8966  0.9133  0.9250  0.9535  0.
Adapted (λ = 0.010)   0.7917  0.8627  0.8939  0.9135  0.9237  0.9570  0.
Adapted (λ = 0.050)   0.7917  0.8627  0.8939  0.9135  0.9237  0.9570  0.
Adapted (λ = 0.100)   0.7631  0.8463  0.8774  0.8996  0.9130  0.9449  0.
(a) CARS

Loss function         R@1     R@2     R@3     R@4     R@5     R@10    R@
Original (λ = 0)      0.4450  0.5724  0.6435  0.6913  0.7275  0.8207  0.
Adapted (λ = 0.005)   0.4439  0.5763  0.6464  0.6884  0.7250  0.8253  0.
Adapted (λ = 0.010)   0.4660  0.5861  0.6555  0.6997  0.7343  0.8288  0.
Adapted (λ = 0.100)   0.4512  0.5768  0.6475  0.6904  0.7230  0.8160  0.
Adapted (λ = 0.500)   0.4483  0.5682  0.6381  0.6879  0.7245  0.8114  0.
(b) CUB200-

Loss function         R@1     R@2     R@3     R@4     R@5     R@10    R@
Original (λ = 0)      0.6274  0.6865  0.7170  0.7384  0.7524  0.7955  0.
Adapted (λ = 0.010)   0.6303  0.6882  0.7206  0.7416  0.7550  0.7982  0.
Adapted (λ = 0.050)   0.6303  0.6876  0.7191  0.7386  0.7530  0.7964  0.
Adapted (λ = 0.100)   0.6297  0.6874  0.7183  0.7378  0.7526  0.7957  0.
(c) Stanford Online Products

Table 1: Recall rate on CARS196, CUB200-2011, and Stanford Online Products datasets. For the adapted triplet loss, we train multiple models on all datasets using different λ, i.e., we use λ = 0.001, 0.005, 0.01, 0.05, 0.1 on CARS196, λ = 0.005, 0.01, 0.1, 0.5 on CUB200-2011, and λ = 0.01, 0.05, 0.1 on Stanford Online Products. For the original triplet loss, we use the adapted triplet loss with λ = 0.
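For readers unfamiliar with the R@K columns, the sketch below computes Recall@K under its common definition in the retrieval literature, which is assumed here rather than taken from this text: a query counts as a hit if at least one of its K nearest neighbors (excluding the query itself) shares its label. The function name is hypothetical.

```python
import numpy as np

def recall_at_k(emb, labels, ks):
    """Fraction of queries with a same-label item among their k nearest neighbors."""
    labels = np.asarray(labels)
    sq = np.sum(emb ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T   # squared Euclidean distances
    np.fill_diagonal(d, np.inf)                          # a query never retrieves itself
    order = np.argsort(d, axis=1)                        # neighbors from nearest to farthest
    hits = labels[order] == labels[:, None]
    return {k: float(np.mean(hits[:, :k].any(axis=1))) for k in ks}
```

Because a hit at rank K is also a hit at every larger rank, the values are non-decreasing in K, as in each row of the table.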

query images and 10 retrieval results for each query image using the adapted triplet loss and the original triplet loss respectively.

5 Conclusion

In this paper, we address the problem of triplet selection bias for triplet loss by using a domain adaptation method. We propose an adapted triplet loss, which adaptively corrects the selection bias of the original triplet loss. Considering that selection bias is common in deep metric learning, the proposed method can be extended to a variety of loss functions, e.g., pair-based [36], triplet-based [29], and quadruplet-based [5] loss functions, which will be the subject of future study.


[Figure 7 images: each row shows a query image followed by its retrieval results ranked 1 to 10; panels (a) CARS and (b) CUB200-]

Fig. 7: Retrieval results on CARS196 and CUB200-2011. The first column is the query image. For each query image, the first row contains the 10 nearest neighbors for the original triplet loss; the second row contains the 10 nearest neighbors for the adapted triplet loss. We highlight false positive examples with a white/black cross (best viewed in color).

6 Acknowledgement

Baosheng Yu, Tongliang Liu, and Dacheng Tao were partially supported by Australian Research Council Projects FL-170100117, DP-180103424, LP-150100671. Changxing Ding was partially supported by the National Natural Science Foundation of China (Grant No.: 61702193) and Science and Technology Program of Guangzhou (Grant No.: 201804010272).

22. Lai, H., Pan, Y., Liu, Y., Yan, S.: Simultaneous feature learning and hash coding with deep neural networks. In: CVPR. pp. 3270–3278 (2015)
23. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
24. Li, Y., Gong, M., Tian, X., Liu, T., Tao, D.: Domain generalization via conditional invariant representations. In: AAAI (2018)
25. Liu, H., Feng, J., Qi, M., Jiang, J., Yan, S.: End-to-end comparative attention networks for person re-identification. IEEE T-IP (2017)
26. Liu, H., Tian, Y., Yang, Y., Pang, L., Huang, T.: Deep relative distance learning: Tell the difference between similar vehicles. In: CVPR (2016)
27. Liu, T., Yang, Q., Tao, D.: Understanding how feature structure transfers in transfer learning. In: IJCAI. pp. 2365–2371 (2017)
28. Maaten, L.v.d., Hinton, G.: Visualizing data using t-sne. JMLR 9(Nov), 2579– (2008)
29. Oh Song, H., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: CVPR. pp. 4004–4012 (2016)
30. Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE T-NN 22(2), 199–210 (2011)
31. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC. vol. 1, p. 6 (2015)
32. Prabhu, Y., Varma, M.: Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In: SIGKDD. pp. 263–272. ACM (2014)
33. Ramanathan, V., Li, C., Deng, J., Han, W., Li, Z., Gu, K., Song, Y., Bengio, S., Rosenberg, C., Fei-Fei, L.: Learning semantic relationships for better action retrieval in images. In: CVPR (2015)
34. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: CVPR. pp. 815–823 (2015)
35. Shi, H., Yang, Y., Zhu, X., Liao, S., Lei, Z., Zheng, W., Li, S.Z.: Embedding deep metric for person re-identification: A study against large variations. In: ECCV. pp. 732–748. Springer (2016)
36. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: NIPS. pp. 1857–1865 (2016)
37. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., et al.: Going deeper with convolutions. In: CVPR (2015)
38. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset (2011)
39. Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: CVPR. pp. 1386–1393 (2014)
40. Wang, L., Li, Y., Lazebnik, S.: Learning deep structure-preserving image-text embeddings. In: CVPR (2016)
41. Wang, X., Huang, T.K., Schneider, J.: Active transfer learning under model shift. In: ICML. pp. 1305–1313 (2014)
42. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. JMLR 10(Feb), 207–244 (2009)
43. Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3d pose estimation. In: CVPR. pp. 3109–3118 (2015)
44. Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: CVPR. pp. 2840–2848 (2017)
45. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
46. Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: CVPR. pp. 1249– (2016)
47. Yuan, Y., Yang, K., Zhang, C.: Hard-aware deeply cascaded embedding. In: ICCV. pp. 814–823. IEEE (2017)
48. Zhang, K., Schölkopf, B., Muandet, K., Wang, Z.: Domain adaptation under target and conditional shift. In: ICML. pp. 819–827 (2013)
49. Zhuang, B., Lin, G., Shen, C., Reid, I.: Fast training of triplet-based deep binary embedding networks. In: CVPR. pp. 5955–5964 (2016)