I assume you can see this now. Yes. And since you so kindly... okay, I will start.

Thank you very much for the opportunity to speak. Following Isaac's talk on detector simulation, I will now discuss generative models for event generation. What I am going to present was done together with my collaborators.

Precision simulations based on first-principle Monte Carlo generators are currently at the heart of any analysis. They link the fundamental theory side, which is essentially given by a minimal set of parameters, to a very complex detector side, where we are facing very large amounts of data. To interpret that data correctly, the simulations have to deliver very precise, unweighted event samples.

These simulations face a number of challenges. One of them, the slow detector simulation, has already been mentioned. In addition, we find an increasingly complex, high-dimensional phase space, which has to be mapped out very precisely; if this mapping is not precise, we end up with a very low unweighting efficiency, making the entire simulation chain inefficient. This is where neural networks can offer a solution.

In particular, their ability to provide very flexible parameterizations and their interpolation properties allow us to map this complex phase space accurately. They are also very fast once trained, as was already pointed out, and there are by now many generative models on the market that we can essentially pick up and use, the main three being generative adversarial networks, variational autoencoders and normalizing flows.

This is not that new anymore. Here you see a by no means complete overview of papers that have already used these techniques for event generation, roughly sorted into four categories. First, we can integrate machine learning into the Monte Carlo process itself, for instance by estimating the matrix element or through neural importance sampling; Magnus is going to talk about this later. Then we can use it for event generation, where we start from random numbers and generate the four-momenta of an event, which is what I will be focusing on.
Then there is the detector simulation, which Isaac already explained, and finally, somewhat orthogonal to these three categories, we can try to invert the entire chain with a neural network, unfolding the detector-level distributions. Looking at this overview, you see that many of these papers focus on the generative adversarial network, and in the following I will do the same. Let me therefore give a very short recap for those who are not familiar with it.

A generative adversarial network starts from training data, shown up here; this is the truth. On the other side, we have random numbers, which are fed into a generator to produce an event sample, and we have a second neural network, the discriminator, which tries to distinguish the two; its training loss is shown here. This is counteracted by the generator, which tries to fool the discriminator by making its events more and more similar to the data, so that in the end, when the training has converged successfully, the generator produces events that follow the underlying distribution of the data.

Why do we use GANs so often and not variational autoencoders? This cannot really be stated in such absolute terms. However, it turns out that on average GANs are very often better at generating faithful, realistic-looking events than variational autoencoders, although there is a large community supporting both approaches. On the other hand, GANs come with a more or less well-known disadvantage: many people who have worked with them have faced unstable training. But since so many people work with them, there are a lot of solutions on the market which you can use to tackle these problems. First, you can regularize the discriminator, for instance by implementing a gradient penalty; this is applied in all the projects I am going to present. You can modify the training objective by using a different kind of loss function, most prominently, I guess, the Wasserstein GAN. And you can pre-process the data, for instance by making use of symmetries, by whitening, or by engineering additional features. With these tools the GAN training can be stabilized very nicely; a rough sketch of the gradient-penalty idea is given below.
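As an illustration of the gradient-penalty regularization just mentioned, here is a minimal PyTorch sketch, not the speaker's code: it shows one common variant (in the WGAN-GP style, driving the discriminator's gradient norm towards one) combined with a standard binary cross-entropy GAN loss. The network interfaces, the penalty weight and the batch handling are illustrative assumptions, and the exact regularizer used in the papers mentioned in the talk may differ in detail.

```python
import torch
import torch.nn as nn

def gradient_penalty(discriminator, real, fake, weight=10.0):
    """Penalize the discriminator gradient norm on events interpolated
    between real and generated samples (WGAN-GP-style regularization)."""
    eps = torch.rand(real.size(0), 1, device=real.device)      # per-event mixing
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(mixed)
    grads = torch.autograd.grad(scores.sum(), mixed, create_graph=True)[0]
    return weight * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_step(D, G, real_events, noise, opt_D):
    """One regularized discriminator update with a BCE adversarial loss."""
    bce = nn.BCEWithLogitsLoss()
    fake_events = G(noise).detach()                             # freeze the generator
    ones = torch.ones(real_events.size(0), 1, device=real_events.device)
    zeros = torch.zeros(fake_events.size(0), 1, device=fake_events.device)
    loss = (bce(D(real_events), ones)
            + bce(D(fake_events), zeros)
            + gradient_penalty(D, real_events, fake_events))
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    return loss.item()
```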
Once this is done, we can look at a realistic final state that we want to generate, and we start here with the ttbar process. With this we have six particles in the final state, and since we fix the external masses and do not enforce momentum conservation by hand, this results in an 18-dimensional output. Particular challenges here are the intermediate particles, in particular their invariant masses, and the phase-space boundaries that can arise.

The very first thing one wants to check are the non-flat observables, for instance from intermediate particles like the top quark. Here you see the pT of the top, and the GAN is very nicely able to follow the data that we feed it with. However, when we look at the ratio of GAN to data, we also find a systematic undershoot in the tails. So in particular where we have low statistics, we have to be very careful about what is being learned. This will be important going forward: a big target for the future is to estimate and control these kinds of uncertainties if we really want to use neural networks for event generation.

The next observables are the invariant masses of the intermediate particles. These are again very difficult, since the network has to combine information from all the different final-state particles to reconstruct the invariant mass of, for instance, the W or the top. A normal vanilla GAN simply generates the green curve down here, which really does not coincide with the true distribution. But we can use the so-called maximum mean discrepancy (MMD) loss, where we compare the two distributions through some kernel. With its help, the network starts to reproduce the data distribution very nicely; you see this when we use, for instance, a Gaussian or a Breit-Wigner kernel, and it does not disappoint. A rough sketch of such an MMD term is given below.
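To make the MMD idea concrete, here is a small sketch, again an illustration rather than the speaker's implementation: a squared MMD between the generated and true invariant-mass values of one intermediate particle, using a Gaussian kernel (a Breit-Wigner kernel works analogously). The kernel width, the four-momentum layout and the way the term is added to the generator loss are assumptions for the example.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel matrix between two 1D batches."""
    d2 = (x.unsqueeze(1) - y.unsqueeze(0)) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(m_gen, m_true, sigma=1.0):
    """(Biased) squared maximum mean discrepancy between two mass batches."""
    k_gg = gaussian_kernel(m_gen, m_gen, sigma).mean()
    k_tt = gaussian_kernel(m_true, m_true, sigma).mean()
    k_gt = gaussian_kernel(m_gen, m_true, sigma).mean()
    return k_gg + k_tt - 2.0 * k_gt

def invariant_mass(p):
    """p: (batch, 4) four-momenta ordered as (E, px, py, pz)."""
    m2 = p[:, 0] ** 2 - p[:, 1] ** 2 - p[:, 2] ** 2 - p[:, 3] ** 2
    return torch.sqrt(torch.clamp(m2, min=0.0))

# Schematically, the generator loss then becomes
#   loss_G = loss_adversarial + lambda_mmd * mmd2(invariant_mass(p_gen_W),
#                                                 invariant_mass(p_true_W))
# where p_gen_W / p_true_W are the summed decay momenta reconstructing the W.
```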
Now that we have solved this, we can move on to two-dimensional correlations, and here you see for instance the correlation between the top and the bottom. If we want to quantify how well the generated distribution corresponds to the true one, we can take a slice, here at a top pT of 100 GeV, which is shown on the right-hand side. You see again that the generated events nicely follow the true data distribution all the way up to the phase-space boundary and down into the tails.

So with this we have shown that the GAN is actually able to learn the underlying distribution of the data it is trained on, and we thought, okay, maybe it can also do a more complex task. For this, we look at two distributions that are represented by samples, and we want to train a GAN that can consistently generate the multi-dimensional difference between these two distributions, thereby keeping track of all the correlations in the data. A side effect we are hoping for is that it can tame the statistical uncertainty that arises when we subtract two high-statistics distributions that lie close to each other, which leads to very large relative statistical uncertainties in the difference; here the interpolation properties of the GAN can act in our favour.

Such a setup would have many applications, for instance collinear dipole subtraction, multi-jet merging, on-shell subtraction, or background subtraction, where it is particularly important that we preserve the correlations in the underlying data. First, we have to extend our GAN setup. The most prominent changes are that we now use two discriminators, one for each of the training data sets representing the two distributions, and that one of the discriminators only sees a subset of the generated events; this two-discriminator idea is sketched below, after the first example.

Now some results. The first one is a toy example: we give the network two different training data sets. Dotted are the training data, solid lines the GAN output. The blue dotted data are events from pp → e+ e-, and the red dotted data are pp → γ → e+ e-, so just the photon background, and we train the network with the red and the blue data points. Afterwards we look at what the network can actually generate, and we see that it not only reproduces exactly these two distributions, but it can also properly generate the difference between the two.
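The following is a loose sketch of how such a two-discriminator subtraction GAN could be wired up, based on my reading of the description above rather than on the speaker's actual implementation: the generator emits events together with a class label, one discriminator compares all generated events against the B sample, while the second discriminator only sees the subset labelled as S and compares it against the S sample; the events labelled B-S then approximate the subtracted distribution. All interfaces, shapes and loss weights are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_step(G, D_B, D_S, noise, opt_G):
    """One generator update against two discriminators.

    G(noise) is assumed to return (events, label_logit), where the label
    decides whether an event belongs to the subtracted class (B-S) or to S.
    """
    events, label_logit = G(noise)            # events: (N, d), label_logit: (N, 1)
    is_S = torch.sigmoid(label_logit)         # soft assignment to the S class

    # D_B should accept the *full* generated sample as B-like,
    # since (B-S) combined with S has to reproduce B.
    ones = torch.ones(events.size(0), 1, device=events.device)
    loss_B = F.binary_cross_entropy_with_logits(D_B(events), ones)

    # D_S should accept only the S-labelled subset as S-like;
    # weight each event by its soft label instead of hard slicing.
    scores_S = D_S(events)
    per_event = F.binary_cross_entropy_with_logits(
        scores_S, torch.ones_like(scores_S), reduction="none")
    loss_S = (is_S * per_event).mean()

    loss = loss_B + loss_S
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

# The two discriminators are trained in the usual way, each against its own
# training data set (the B sample and the S sample, respectively).
```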
The second example is a bit more difficult. The physics background here is, as said, theory uncertainties, which will be a limiting factor for the high-luminosity LHC. To attack them, we have to push higher-order calculations, and one of the tasks there is to subtract the collinear dipoles from the real-emission terms. In this case, we have some phase-space distribution, here in blue, and we have to subtract the dipoles, which are the dotted data in red. Again, we train the network with the blue and the red data, and the result is the black line down here, for which we zoom in on the right-hand side. The grey bands indicate the statistical uncertainty coming from a naive bin-wise subtraction of the two distributions, and you see how the neural network very nicely interpolates through these data over two to three orders of magnitude. Invoking the idea of big data, we train on much larger event numbers here and GAN the difference between the two.

Having completed this, I want to come to the last project, which goes the other way: here we want to use a GAN to unfold detector-level events. A detector simulation is typically some kind of stochastic process, so even though it will be prior dependent, in principle it should be possible to invert this process, that is, to map the distributions from after the detector simulation back to before the detector simulation. A first attempt at this with neural networks was already made in a nice paper shown here. Our aim is now to unfold on the full phase space, so to do the multi-dimensional version of this. For this we cannot use a simple GAN, because there would be no connection between the input and the discriminator. It is very important to include a notion of statistical or probabilistic behaviour here, and also to obtain a notion of distance between before and after the detector: the basic idea is that if I have a very energetic lepton at parton level, it should also end up in some kind of high-energy region after the detector, and vice versa. Here is the setup for this, and you see that the detector-level information enters both the generator and the discriminator as a condition.
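A minimal sketch of such a conditional setup, assuming a simple fully connected architecture: the detector-level event is concatenated to the noise in the generator and to the parton-level candidate in the discriminator. All dimensions, layer sizes and the way the networks are wired are illustrative guesses, not the architecture used in the talk.

```python
import torch
import torch.nn as nn

DIM_PARTON, DIM_DETECTOR, DIM_NOISE = 12, 8, 8     # placeholder dimensions

class CondGenerator(nn.Module):
    """Maps (noise, detector-level event) to a parton-level candidate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_NOISE + DIM_DETECTOR, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DIM_PARTON),
        )
    def forward(self, noise, detector_event):
        return self.net(torch.cat([noise, detector_event], dim=1))

class CondDiscriminator(nn.Module):
    """Judges (parton-level candidate, detector-level event) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_PARTON + DIM_DETECTOR, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),                      # real/fake logit
        )
    def forward(self, parton_event, detector_event):
        return self.net(torch.cat([parton_event, detector_event], dim=1))

# Unfolding a measured event then amounts to sampling the generator several
# times with fresh noise for the same detector-level input, which yields a
# distribution of parton-level candidates rather than a single point.
```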
Now, how can we test whether our unfolding was successful? We train on the full data set, but we already know that the GAN is very good at learning the full structure of a data set and mapping it to a new distribution. So afterwards we slice our data set for testing, for instance into regions of the pT of the leading and second-leading jet, taking roughly 38% of the data, and then we unfold only this part and see whether it actually matches the corresponding parton-level distribution. Again, dotted is the data and the solid lines are the generated distributions, and you see that this works very nicely.

A very interesting question that might arise is: okay, we know unfolding is in some way prior dependent; what happens if we put into the test data something that was never in the training data? For this, we generated new events, namely W′ events, and inserted them into our test data set. We of course know the truth distribution, which is given here in red, and the Standard Model distribution, given in green, which is what we trained on. Unfolding this yields the blue distribution down here, and even though the width is obviously somewhat broader than the red truth distribution, it very nicely finds the correct position of the W′ mass peak.

With this I already come to the conclusion. We have seen that with GANs we can learn the underlying distributions of events, and if there are special features we want to learn, we can do this with the MMD without having to put the information in by hand. This is very important when we potentially do not know what the width of a particle actually is: we can just tell the network to look at certain combinations and learn them from the data. In particular for GANs, people often struggle with stabilizing the training, but there are a lot of solutions one can try out, from modified loss functions to gradient penalties, and the list from before contains a lot of nice papers where you can find inspiration.

Having learned such distributions, we can go to more complex models: for instance, we have a successful implementation of a sample-based subtraction, which could be used for background subtraction or dipole subtraction in event generation.
Finally, we can unfold high-dimensional detector distributions with a GAN by employing a fully conditional GAN, because this has a notion of locality. And what's next? With this I come to the end, and I am curious for questions. Thank you very much.

Okay, great. I guess I should also encourage people: if they come up with a question during the talk, they can raise their hands during the talk as well. But yeah, are there any questions from the room?

I was kind of curious about one thing. In the last application you mentioned, you were able to sort of discover a new physics signal that you injected, having already trained on standard, known physics. Yes. I guess one question, which is probably a difficult one to answer: if you know that your known physics is not modelled correctly, and you then try to use this technique, how are you distinguishing between mis-modelling and something that might be new physics?

Good question. It is obviously very difficult, and at this stage I would not say that we can answer this properly already; we were actually rather surprised that this worked as well as it did. But of course the problem always occurs the moment we have new physics in our data set and we try to model it with Standard Model physics. So I think this is just offered as a possibility, to see how far we can go with unfolding. We know that the unfolding process is prior dependent; whatever we do, it depends in some way on the data set we trained on, because this is what the network is trained to map onto. So I think the most important thing is to understand this prior dependence, how much goes into it, and which parts are more independent. For this, we just have to look at more of these kinds of distributions and try to understand what happens.

Yeah, I guess it kind of goes in the direction of whether you can parameterize certain types of flexibility into the model, but I think that is a much deeper question.

Yes. At this point, I mean, you could for instance train this with W′ data, making it conditional on the mass. But we wanted to start with a Standard Model training and see, okay, what happens if we now put something inside that it has just never seen; what is the outcome?