All right. Before I start my talk, let me thank the organizers, first of all, for this invitation. The title of my talk is "Detecting New Physics as Novelty".

At the start of this talk, let me bring two papers to your attention, both submitted to arXiv this month. The first one is "Adversarially Learned Anomaly Detection on CMS Open Data: re-discovering the top quark", and the other one is from ATLAS; here is an extract: "This Letter describes a search for resonant new physics using a machine-learning anomaly detection procedure that does not rely on a signal model hypothesis." These two papers mark an important step in the application of machine-learning techniques to novelty detection, from proof of concept to real data analysis.

What is novelty detection? Here is the definition: novelty detection is the task of classifying test data that differ in some respect from the data that are available during training. The important step in novelty detection is to evaluate the novelty response of the testing data, and then, based on that novelty response, we are able to analyze the detection sensitivity. This sounds like a BDT-style analysis: first we analyze the BDT response of the testing data, then we are able to estimate the analysis sensitivity.

If we take a survey of the literature on novelty detection, we will see that the history of novelty detection is basically a history of developing novelty evaluators, or scoring methods, for the testing sample. It is worthwhile to point out that in the literature we sometimes see the terminology "semi-supervised learning" or "fully unsupervised"; that terminology is actually separate from the discussion here, since it refers to how the backgrounds are simulated or evaluated. So, no matter what evaluator or method is suggested, roughly speaking the evaluators can be classified into two classes: isolation based and clustering based. The first class is isolation-based evaluators: the novelty of a given testing data point is evaluated according to its distance to, or isolation from, the distribution of the known-pattern data in the feature space.
So here, the point is that the evaluation is purely based on the relation between the given testing data point and the distribution of the known-pattern data in the feature space; all the other testing points in the same sample are irrelevant for this evaluation.

Here is one example, from the method of k-nearest neighbors. The left one is the raw measure and the right one is normalized using the cumulative distribution function. In this definition, the d_train term in the numerator represents the mean distance of a testing data point to its k nearest neighbors, the ⟨d_train⟩ term represents the average of the mean distances defined for those k nearest neighbors, and σ_train represents the standard deviation of the latter. The point is that all three of these quantities are defined with respect to the training sample. So you can imagine, in the feature space, if a testing point falls far away from the distribution of the training sample, then the value of d_train is large and this point is scored high. That is the idea.

Another proposal is the reconstruction error, or loss function, which was proposed by two collaborations two years ago. The reconstruction error, in essence, is isolation based. This can be understood since, for each testing point, the reconstruction error is computed basically just from the relation between this testing point and the known-pattern data; it has nothing to do with the other data points in the same testing sample.
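Before moving on, here is a minimal Python sketch of the isolation-based k-nearest-neighbor score described above, not the speaker's actual implementation: it assumes the score is the standardized mean kNN distance of a testing point with respect to the training sample, mapped to [0, 1] with a Gaussian cumulative distribution function; the function name, the choice k=20, and the 1e-12 regulator are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors


def isolation_novelty(train, test, k=20):
    """Isolation-based kNN novelty score (sketch).

    For each testing point x:
      d_train    = mean distance of x to its k nearest training points
      mu, sigma  = mean and standard deviation of the same quantity,
                   evaluated at those k training neighbours themselves
    The score is Phi((d_train - mu) / sigma): the standardized isolation
    of x from the known-pattern data, mapped to [0, 1] with a Gaussian CDF.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train)

    # Mean kNN distance of every training point to the rest of the training
    # sample (the extra neighbour and the [:, 1:] slice drop the point itself).
    d_tt = nn.kneighbors(train, n_neighbors=k + 1)[0][:, 1:]
    d_tt_mean = d_tt.mean(axis=1)

    # Mean kNN distance of each testing point to the training sample,
    # plus the indices of those k training neighbours.
    d_te, idx = nn.kneighbors(test)
    d_train = d_te.mean(axis=1)

    # Average and spread of the neighbours' own mean kNN distances.
    d_ref = d_tt_mean[idx]
    mu, sigma = d_ref.mean(axis=1), d_ref.std(axis=1) + 1e-12

    # Points far from the bulk of the training data score close to 1.
    return norm.cdf((d_train - mu) / sigma)


# Toy usage: Gaussian "known-pattern" data vs one bulk point and one outlier.
rng = np.random.default_rng(0)
bkg = rng.normal(0.0, 1.0, size=(5000, 2))
print(isolation_novelty(bkg, np.array([[0.0, 0.0], [4.0, 4.0]]), k=20))
```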
Another class of evaluators is clustering based: the novelty of a given testing point is evaluated according to the data clustering around this testing point on top of the known-pattern data distribution in the feature space. Different from the first class, here the evaluation takes into account the correlation between the testing data point and the other points around it in the same data sample. This is the generic difference between these two classes of evaluators.

One example, again, is from the method of k-nearest neighbors: the left one is isolation based, the right one is clustering based. Here, d_train in the denominator of the clustering-based one is the mean distance of this testing data point to its k nearest neighbors in the training data sample, and d_test is the mean distance of the testing data point to its k nearest neighbors in the testing data sample. As for m, it is a hyperparameter; it can be chosen to be the dimension of the feature space, and in that case d_test^(-m) represents the local density of this testing data point in the testing sample, and likewise for d_train^(-m). So in this case the novelty response is evaluated by comparing the local densities of the testing point in the training sample and in the testing sample. Actually, there is an approximate statistical interpretation: such an evaluator is proportional to S/sqrt(B) in the local bin of this testing data point.

Another proposal was made recently [names unclear in the recording] in a similar spirit. The difference between the two is just the definition of the density: in the first one the density is defined as the local density in the feature space, and in the second case the density is replaced with a probability density. This is the only difference between them.
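Here is a minimal Python sketch of the clustering-based kNN evaluator just described, again not the speaker's code: turning the two mean kNN distances into local densities via d^(-m) follows the talk, while the specific combination (rho_test − rho_train)/sqrt(rho_train), chosen to mimic the S/sqrt(B) interpretation, and the final Gaussian-CDF normalization are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors


def clustering_novelty(train, test, k=20, m=None):
    """Clustering-based kNN novelty score (sketch).

    d_train = mean distance of a testing point to its k nearest neighbours
              in the TRAINING (known-pattern) sample
    d_test  = mean distance of the same point to its k nearest neighbours
              in the TESTING sample
    With m equal to the feature-space dimension, d**(-m) behaves like a
    local density, so the score compares the density observed in the
    testing sample with the one expected from the known-pattern data.
    """
    m = train.shape[1] if m is None else m

    nn_train = NearestNeighbors(n_neighbors=k).fit(train)
    nn_test = NearestNeighbors(n_neighbors=k + 1).fit(test)

    d_train = nn_train.kneighbors(test)[0].mean(axis=1) + 1e-12
    # Drop the first neighbour within the testing sample (the point itself).
    d_test = nn_test.kneighbors(test)[0][:, 1:].mean(axis=1) + 1e-12

    rho_train = d_train ** (-m)   # local density expected from training data
    rho_test = d_test ** (-m)     # local density seen in the testing data

    # Local excess over the known-pattern density, normalized like S/sqrt(B),
    # then mapped to [0, 1] with a Gaussian CDF (both choices are assumptions).
    return norm.cdf((rho_test - rho_train) / np.sqrt(rho_train))
```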
But this is not the whole story. We want to synergize these two different classes of evaluators. Why would we like to do that? We can consider an example, a two-dimensional Gaussian sample, to show the reason. The left column shows the performance of the isolation-based evaluator. In these samples, the red points represent the known-pattern data and the blue points represent the unknown-pattern data; let's see which points receive a strong response to the isolation-based evaluator. The point is that, in addition to the red points in the original signal bin in the two-dimensional plane, the red points between the two yellow circles also have a strong response to the isolation-based evaluator, simply because for all red points between these two yellow circles the local densities are comparable. The middle column shows the performance of the clustering-based novelty evaluator. Here, for the red points, in addition to the ones from the original signal bin, the red points from the non-signal region with an upward fluctuation can also have a strong response to the clustering-based evaluator. This can also be understood, because statistically it is not easy to distinguish an upward fluctuation from a resonance.

So we can see that in both cases the bin at the bottom receives contributions from non-signal regions. If these two classes of evaluators can be properly synergized, then we would expect that only the red points in the intersection of these two sets will contribute to the bin at the bottom, and in that case the analysis sensitivity is expected to be improved. This is the idea. The first proposal is the geometric mean of the isolation-based evaluator and the clustering-based evaluator. Why does this help? Let's consider the example in the right panel, where the black curve represents the novelty response of the known-pattern data. Some of these events have a high response to the clustering-based evaluator; of these black points, some are from the genuine signal region and some are from the non-signal region with an upward fluctuation. However, these two classes of contributions tend to be scored differently by the isolation-based evaluator, because one lies on the tail and the other one does not. So here we expect the low scoring by the isolation-based evaluator to compensate for the high scoring by the clustering-based evaluator for these fake signal points. That is basically the rough idea.

Here is a two-dimensional toy example. The bottom-right panel shows the significance performance of these different types of evaluators. In this panel, the black curve represents the performance of supervised learning; it is used as a reference. The magenta curve represents the performance of the isolation-based evaluator, and the blue curve represents the performance of the clustering-based evaluator. As for the red curve, it represents the synergy-based evaluator, and we can see that, indeed, in this simple context the performance can be improved significantly by this synergy-based evaluator. But such a design is very intuitive, which means it is not fully optimized.
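For reference, a sketch of this first synergy-based evaluator, taking it literally as the geometric mean of the two scores as stated in the talk; the helper names in the usage comment refer to the earlier sketches and are illustrative.

```python
import numpy as np


def synergy_novelty(o_iso, o_clu):
    """First synergy-based evaluator (sketch): the geometric mean of the
    isolation-based and clustering-based scores.  A point scores high only
    if BOTH evaluators score it high, so tail points that merely look
    isolated, or bulk fluctuations that merely look clustered, are damped.
    """
    return np.sqrt(np.asarray(o_iso, dtype=float) * np.asarray(o_clu, dtype=float))


# e.g., with the earlier sketches:
#   o_syn = synergy_novelty(isolation_novelty(bkg, test),
#                           clustering_novelty(bkg, test))
```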
So here, the design of this clustering-based evaluator, and hence of this enhanced synergy-based evaluator, somehow breaks the independence of the testing points. Why? Because when we evaluate their response, we take into account the correlation between the given testing point and its neighbors. So the latter will lose some nice statistical properties of the original sample, and then, if we want to calculate the significance based on the data response to these evaluators, it means we have to do something extra.

Motivated by this, we proposed a second type of synergy-based evaluators. The idea is that, based on the novelty responses of the testing data sample to the isolation-based evaluator and to the clustering-based evaluator, we define a new set of "signals" and a new set of "backgrounds". They are not the real signal events and the real background events; instead, for the signal set and the background set, the ratios between real signal and real background events are different. Then we introduce a deep neural network for supervised learning and train it on the new signal set and the new background set, and we use its output as the new synergy-based evaluator, comparing its performance with the isolation-based one, the clustering-based one, and the first synergy-based one.
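The talk does not spell out how the new "signal" and "background" sets are selected, so the following Python sketch is only an illustrative guess at the procedure: it labels the testing points with the highest combined novelty response as the new signal set and the rest as the new background set (so the two sets contain real signal and background in different proportions), then trains a small network to separate them and uses its output as the score. The quantile cut, the network size, and the use of `MLPClassifier` are all assumptions, not the authors' prescription.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def second_synergy_evaluator(test, o_iso, o_clu, quantile=0.90):
    """Second type of synergy-based evaluator (illustrative guess).

    Build a new 'signal' set and a new 'background' set from the novelty
    responses to the isolation-based and clustering-based evaluators, train
    a supervised network to separate them, and use its output as the score.
    The selection rule below (top combined-response quantile vs the rest)
    is an assumption made for illustration.
    """
    combined = np.sqrt(np.asarray(o_iso) * np.asarray(o_clu))
    labels = (combined >= np.quantile(combined, quantile)).astype(int)

    # Small fully connected network standing in for the DNN in the talk.
    clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
    clf.fit(test, labels)

    # The classifier output on each testing point is the new novelty score.
    return clf.predict_proba(test)[:, 1]
```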
Here we have nine two-dimensional Gaussian benchmarks. Again, the red points represent background and the blue points represent signal; the rows correspond to different widths of the signal distributions, and the columns to different center positions of the signal. Here are the ROC curves and AUC values. I will not go into details, but if you compare these curves you will see the performance complementarity between the isolation-based evaluator and the clustering-based evaluator, and this complementarity results in improvements of the AUC values for the synergy-based evaluators, especially for this alternative synergy-based evaluator, which has the best AUC universally in all nine benchmarks. For example, for the AUC values of supervised learning versus this evaluator: in the first row, 0.99 versus 0.99, 0.9[?] versus 0.99, and 1.0 versus 1.0; in the second row, 0.91 versus 0.9[?], 0.93 versus 0.89, and 0.99 versus 0.98; in the last row, 0.72 versus 0.71, 0.84 versus 0.81, and 0.96 versus 0.93. So you can see that the gap between supervised learning and this new evaluator is very narrow. Here is a list of efforts in this direction over the last eight years. Since I am running out of time, I will stop here. Here is the summary. Okay, thank you very much.

[Chair] Are there any questions or comments for Tao?

[Question] Yeah, I have a question on, I think, the first few slides, on the paper that you showed, what is it, "Adversarially Learned Anomaly Detection... re-discovering the top quark". Do you know how they did it? Did they have a pre-selection applied and then run the anomaly detector, or did they just run on reconstructed data, or raw data, or...?

[Tao] Wow, this is... this is a challenging question for me. I think it is not easy to read out this sort of information from these two papers. So, yes, I'm sorry, I don't know the answer.

[Question] And then, with a lot of these anomaly detectors, you train on a certain data set, but our data sets are not static, because the running conditions are changing. Well, not constantly, but they change within a run quite often. So, has it been validated that each of these algorithms is able to detect that the running conditions are changing, and to change the expectation of the distribution?

[Tao] Okay. This is a good question. The answer is yes: indeed, for different scenarios, different evaluators may have different performance. One example is the first type of synergy-based evaluator that I just mentioned. For that case, let me go to the slide. Okay. So this evaluator works pretty well if the generic signal bin is essentially on the tail; in that case this synergy-based evaluator works nicely.
But if these two are switched, that is, if the generic signal bin sits on the bulk and the fluctuation is on the tail, then in that case the performance of this evaluator is not that good. Actually, you can also see this point from the ROC curves in slides 19 and 20. But here, it is true that we are trying to build an evaluator which can work universally well, and this is also one of the motivations for us to propose the second type of synergy-based evaluators. If you compare the ROC curves in this slide, you will see that, indeed, this new synergy-based evaluator works universally better than the other ones.

[Question] Okay, thanks.

[Tao] Oh, sorry.

[Chair] Yeah, I'm afraid we are running a bit late, so only if this is a very quick question; otherwise, no further discussion. Okay, so thanks again, Tao, for the nice talk, and we move to the following one, which is by Oscar.

[Next speaker] Hello. Hey, I cannot share... why does it say that? No, it's...