I assume you can see this now. Yes. And since you so kindly... okay, I will start.

Thank you very much for the opportunity to speak. Following Isaac's talk on detector simulation, I will now discuss generative models for event generation. What I am going to present was done together with my collaborators.

Precision simulations based on first-principle Monte Carlo generators are currently at the heart of any analysis. They link the fundamental theory side, which is essentially given by a minimal set of parameters, to a very complex detector side, where we are facing very large amounts of data. To interpret that data correctly, the simulations have to deliver very precise, unweighted event samples.

These simulations face a number of challenges. One of them, the slow detector simulation, has already been mentioned. In addition, we find an increasingly complex, high-dimensional phase space, which has to be mapped out very precisely; if this mapping is not precise, we end up with a very low unweighting efficiency, making the entire simulation chain inefficient. This is where neural networks can offer a solution.

In particular, their ability to provide very flexible parameterizations and their interpolation properties allow us to map this complex phase space accurately. They are also very fast once trained, as was already pointed out, and there are by now many generative models on the market that we can essentially pick up and use, the main three being generative adversarial networks, variational autoencoders and normalizing flows.

This is not that new anymore. Here you see a by no means complete overview of papers that have already used these techniques for event generation, roughly sorted into four categories. First, we can integrate machine learning into the Monte Carlo process itself, for instance by estimating the matrix element or through neural importance sampling; Magnus is going to talk about this later. Then we can use it for event generation, where we start from random numbers and generate the four-momenta of an event, which is what I will be focusing on.
Then there is the detector simulation, which Isaac already explained, and finally, somewhat orthogonal to these three categories, we can try to invert the entire chain with a neural network, unfolding the detector-level distributions. Looking at this overview, you see that many of these papers focus on the generative adversarial network, and in the following I will do the same. Let me therefore give a very short recap for those who are not familiar with it.

A generative adversarial network starts from training data, shown up here; this is the truth. On the other side, we have random numbers, which are fed into a generator to produce an event sample, and we have a second neural network, the discriminator, which tries to distinguish the two; its training loss is shown here. This is counteracted by the generator, which tries to fool the discriminator by making its events more and more similar to the data, so that in the end, when the training has converged successfully, the generator produces events that follow the underlying distribution of the data.

Why do we use GANs so often and not variational autoencoders? This cannot really be stated in such absolute terms. However, it turns out that on average GANs are very often better at generating faithful, realistic-looking events than variational autoencoders, although there is a large community supporting both approaches. On the other hand, GANs come with a more or less well-known disadvantage: many people who have worked with them have faced unstable training. But since so many people work with them, there are a lot of solutions on the market which you can use to tackle these problems. First, you can regularize the discriminator, for instance by implementing a gradient penalty; this is applied in all the projects I am going to present. You can modify the training objective by using a different kind of loss function, most prominently, I guess, the Wasserstein GAN. And you can pre-process the data, for instance by making use of symmetries, by whitening, or by engineering additional features. With these tools the GAN training can be stabilized very nicely; a rough sketch of the gradient-penalty idea is given below.
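As an illustration of the gradient-penalty regularization just mentioned, here is a minimal PyTorch sketch, not the speaker's code: it shows one common variant (in the WGAN-GP style, driving the discriminator's gradient norm towards one) combined with a standard binary cross-entropy GAN loss. The network interfaces, the penalty weight and the batch handling are illustrative assumptions, and the exact regularizer used in the papers mentioned in the talk may differ in detail.

```python
import torch
import torch.nn as nn

def gradient_penalty(discriminator, real, fake, weight=10.0):
    """Penalize the discriminator gradient norm on events interpolated
    between real and generated samples (WGAN-GP-style regularization)."""
    eps = torch.rand(real.size(0), 1, device=real.device)      # per-event mixing
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(mixed)
    grads = torch.autograd.grad(scores.sum(), mixed, create_graph=True)[0]
    return weight * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_step(D, G, real_events, noise, opt_D):
    """One regularized discriminator update with a BCE adversarial loss."""
    bce = nn.BCEWithLogitsLoss()
    fake_events = G(noise).detach()                             # freeze the generator
    ones = torch.ones(real_events.size(0), 1, device=real_events.device)
    zeros = torch.zeros(fake_events.size(0), 1, device=fake_events.device)
    loss = (bce(D(real_events), ones)
            + bce(D(fake_events), zeros)
            + gradient_penalty(D, real_events, fake_events))
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    return loss.item()
```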
Once this is done, we can look at a realistic final state that we want to generate, and we start here with the ttbar process. With this we have six particles in the final state, and since we fix the external masses and do not enforce momentum conservation by hand, this results in an 18-dimensional output. Particular challenges here are the intermediate particles, in particular their invariant masses, and the phase-space boundaries that can arise.

The very first thing one wants to check are the non-flat observables, for instance from intermediate particles like the top quark. Here you see the pT of the top, and the GAN is very nicely able to follow the data that we feed it with. However, when we look at the ratio of GAN to data, we also find a systematic undershoot in the tails. So in particular where we have low statistics, we have to be very careful about what is being learned. This will be important going forward: a big target for the future is to estimate and control these kinds of uncertainties if we really want to use neural networks for event generation.

The next observables are the invariant masses of the intermediate particles. These are again very difficult, since the network has to combine information from all the different final-state particles to reconstruct the invariant mass of, for instance, the W or the top. A normal vanilla GAN simply generates the green curve down here, which really does not coincide with the true distribution. But we can use the so-called maximum mean discrepancy (MMD) loss, where we compare the two distributions through some kernel. With its help, the network starts to reproduce the data distribution very nicely; you see this when we use, for instance, a Gaussian or a Breit-Wigner kernel, and it does not disappoint. A rough sketch of such an MMD term is given below.
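To make the MMD idea concrete, here is a small sketch, again an illustration rather than the speaker's implementation: a squared MMD between the generated and true invariant-mass values of one intermediate particle, using a Gaussian kernel (a Breit-Wigner kernel works analogously). The kernel width, the four-momentum layout and the way the term is added to the generator loss are assumptions for the example.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel matrix between two 1D batches."""
    d2 = (x.unsqueeze(1) - y.unsqueeze(0)) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(m_gen, m_true, sigma=1.0):
    """(Biased) squared maximum mean discrepancy between two mass batches."""
    k_gg = gaussian_kernel(m_gen, m_gen, sigma).mean()
    k_tt = gaussian_kernel(m_true, m_true, sigma).mean()
    k_gt = gaussian_kernel(m_gen, m_true, sigma).mean()
    return k_gg + k_tt - 2.0 * k_gt

def invariant_mass(p):
    """p: (batch, 4) four-momenta ordered as (E, px, py, pz)."""
    m2 = p[:, 0] ** 2 - p[:, 1] ** 2 - p[:, 2] ** 2 - p[:, 3] ** 2
    return torch.sqrt(torch.clamp(m2, min=0.0))

# Schematically, the generator loss then becomes
#   loss_G = loss_adversarial + lambda_mmd * mmd2(invariant_mass(p_gen_W),
#                                                 invariant_mass(p_true_W))
# where p_gen_W / p_true_W are the summed decay momenta reconstructing the W.
```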
Now that we have solved this, we can move on to two-dimensional correlations, and here you see for instance the correlation between the top and the bottom. If we want to quantify how well the generated distribution corresponds to the true one, we can take a slice, here at a top pT of 100 GeV, which is shown on the right-hand side. You see again that the generated events nicely follow the true data distribution all the way up to the phase-space boundary and down into the tails.

So with this we have shown that the GAN is actually able to learn the underlying distribution of the data it is trained on, and we thought, okay, maybe it can also do a more complex task. For this, we look at two distributions that are represented by samples, and we want to train a GAN that can consistently generate the multi-dimensional difference between these two distributions, thereby keeping track of all the correlations in the data. A side effect we are hoping for is that it can tame the statistical uncertainty that arises when we subtract two high-statistics distributions that lie close to each other, which leads to very large relative statistical uncertainties in the difference; here the interpolation properties of the GAN can act in our favour.

Such a setup would have many applications, for instance collinear dipole subtraction, multi-jet merging, on-shell subtraction, or background subtraction, where it is particularly important that we preserve the correlations in the underlying data. First, we have to extend our GAN setup. The most prominent changes are that we now use two discriminators, one for each of the training data sets representing the two distributions, and that one of the discriminators only sees a subset of the generated events; this two-discriminator idea is sketched below, after the first example.

Now some results. The first one is a toy example: we give the network two different training data sets. Dotted are the training data, solid lines the GAN output. The blue dotted data are events from pp → e+ e-, and the red dotted data are pp → γ → e+ e-, so just the photon background, and we train the network with the red and the blue data points. Afterwards we look at what the network can actually generate, and we see that it not only reproduces exactly these two distributions, but it can also properly generate the difference between the two.
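The following is a loose sketch of how such a two-discriminator subtraction GAN could be wired up, based on my reading of the description above rather than on the speaker's actual implementation: the generator emits events together with a class label, one discriminator compares all generated events against the B sample, while the second discriminator only sees the subset labelled as S and compares it against the S sample; the events labelled B-S then approximate the subtracted distribution. All interfaces, shapes and loss weights are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_step(G, D_B, D_S, noise, opt_G):
    """One generator update against two discriminators.

    G(noise) is assumed to return (events, label_logit), where the label
    decides whether an event belongs to the subtracted class (B-S) or to S.
    """
    events, label_logit = G(noise)            # events: (N, d), label_logit: (N, 1)
    is_S = torch.sigmoid(label_logit)         # soft assignment to the S class

    # D_B should accept the *full* generated sample as B-like,
    # since (B-S) combined with S has to reproduce B.
    ones = torch.ones(events.size(0), 1, device=events.device)
    loss_B = F.binary_cross_entropy_with_logits(D_B(events), ones)

    # D_S should accept only the S-labelled subset as S-like;
    # weight each event by its soft label instead of hard slicing.
    scores_S = D_S(events)
    per_event = F.binary_cross_entropy_with_logits(
        scores_S, torch.ones_like(scores_S), reduction="none")
    loss_S = (is_S * per_event).mean()

    loss = loss_B + loss_S
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

# The two discriminators are trained in the usual way, each against its own
# training data set (the B sample and the S sample, respectively).
```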
The second example is a bit more difficult. The physics background here is, as said, theory uncertainties, which will be a limiting factor for the high-luminosity LHC. To attack them, we have to push higher-order calculations, and one of the tasks there is to subtract the collinear dipoles from the real-emission terms. In this case, we have some phase-space distribution, here in blue, and we have to subtract the dipoles, which are the dotted data in red. Again, we train the network with the blue and the red data, and the result is the black line down here, for which we zoom in on the right-hand side. The grey bands indicate the statistical uncertainty coming from a naive bin-wise subtraction of the two distributions, and you see how the neural network very nicely interpolates through these data over two to three orders of magnitude. Invoking the idea of big data, we train on much larger event numbers here and GAN the difference between the two.

Having completed this, I want to come to the last project, which goes the other way: here we want to use a GAN to unfold detector-level events. A detector simulation is typically some kind of stochastic process, so even though it will be prior dependent, in principle it should be possible to invert this process, that is, to map the distributions from after the detector simulation back to before the detector simulation. A first attempt at this with neural networks was already made in a nice paper shown here. Our aim is now to unfold on the full phase space, so to do the multi-dimensional version of this. For this we cannot use a simple GAN, because there would be no connection between the input and the discriminator. It is very important to include a notion of statistical or probabilistic behaviour here, and also to obtain a notion of distance between before and after the detector: the basic idea is that if I have a very energetic lepton at parton level, it should also end up in some kind of high-energy region after the detector, and vice versa. Here is the setup for this, and you see that the detector-level information enters both the generator and the discriminator as a condition.
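A minimal sketch of such a conditional setup, assuming a simple fully connected architecture: the detector-level event is concatenated to the noise in the generator and to the parton-level candidate in the discriminator. All dimensions, layer sizes and the way the networks are wired are illustrative guesses, not the architecture used in the talk.

```python
import torch
import torch.nn as nn

DIM_PARTON, DIM_DETECTOR, DIM_NOISE = 12, 8, 8     # placeholder dimensions

class CondGenerator(nn.Module):
    """Maps (noise, detector-level event) to a parton-level candidate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_NOISE + DIM_DETECTOR, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DIM_PARTON),
        )
    def forward(self, noise, detector_event):
        return self.net(torch.cat([noise, detector_event], dim=1))

class CondDiscriminator(nn.Module):
    """Judges (parton-level candidate, detector-level event) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_PARTON + DIM_DETECTOR, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),                      # real/fake logit
        )
    def forward(self, parton_event, detector_event):
        return self.net(torch.cat([parton_event, detector_event], dim=1))

# Unfolding a measured event then amounts to sampling the generator several
# times with fresh noise for the same detector-level input, which yields a
# distribution of parton-level candidates rather than a single point.
```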
Now, how can we test whether our unfolding was successful? We train on the full data set, but we already know that the GAN is very good at learning the full structure of a data set and mapping it to a new distribution. So afterwards we slice our data set for testing, for instance into regions of the pT of the leading and second-leading jet, taking roughly 38% of the data, and then we unfold only this part and see whether it actually matches the corresponding parton-level distribution. Again, dotted is the data and the solid lines are the generated distributions, and you see that this works very nicely.

A very interesting question that might arise is: okay, we know unfolding is in some way prior dependent; what happens if we put into the test data something that was never in the training data? For this, we generated new events, namely W′ events, and inserted them into our test data set. We of course know the truth distribution, which is given here in red, and the Standard Model distribution, given in green, which is what we trained on. Unfolding this yields the blue distribution down here, and even though the width is obviously somewhat broader than the red truth distribution, it very nicely finds the correct position of the W′ mass peak.

With this I already come to the conclusion. We have seen that with GANs we can learn the underlying distributions of events, and if there are special features we want to learn, we can do this with the MMD without having to put the information in by hand. This is very important when we potentially do not know what the width of a particle actually is: we can just tell the network to look at certain combinations and learn them from the data. In particular for GANs, people often struggle with stabilizing the training, but there are a lot of solutions one can try out, from modified loss functions to gradient penalties, and the list from before contains a lot of nice papers where you can find inspiration.

Having learned such distributions, we can go to more complex models: for instance, we have a successful implementation of a sample-based subtraction, which could be used for background subtraction or dipole subtraction in event generation.
Finally, we can unfold high-dimensional detector distributions with a GAN by employing a fully conditional GAN, because this has a notion of locality. And what's next? With this I come to the end, and I am curious for questions. Thank you very much.

Okay, great. I guess I should also encourage people: if they come up with a question during the talk, they can raise their hands during the talk as well. But yeah, are there any questions from the room?

I was kind of curious about one thing. In the last application you mentioned, you were able to sort of discover a new physics signal that you injected, having already trained on standard, known physics. Yes. I guess one question, which is probably a difficult one to answer: if you know that your known physics is not modelled correctly, and you then try to use this technique, how are you distinguishing between mis-modelling and something that might be new physics?

Good question. It is obviously very difficult, and at this stage I would not say that we can answer this properly already; we were actually rather surprised that this worked as well as it did. But of course the problem always occurs the moment we have new physics in our data set and we try to model it with Standard Model physics. So I think this is just offered as a possibility, to see how far we can go with unfolding. We know that the unfolding process is prior dependent; whatever we do, it depends in some way on the data set we trained on, because this is what the network is trained to map onto. So I think the most important thing is to understand this prior dependence, how much goes into it, and which parts are more independent. For this, we just have to look at more of these kinds of distributions and try to understand what happens.

Yeah, I guess it kind of goes in the direction of whether you can parameterize certain types of flexibility into the model, but I think that is a much deeper question.

Yes. At this point, I mean, you could for instance train this with W′ data, making it conditional on the mass. But we wanted to start with a Standard Model training and see, okay, what happens if we now put something inside that it has just never seen; what is the outcome?