All right. Before I start my talk, let me thank the organizers, first of all, for this invitation. The title of my talk is "Detecting New Physics as Novelty".

At the start of this talk, let me bring two papers to your attention, both submitted to arXiv this month. The first one is "Adversarially Learned Anomaly Detection on CMS Open Data: re-discovering the top quark", and the other one is from ATLAS; here is an extract: "This Letter describes a search for resonant new physics using a machine-learning anomaly detection procedure that does not rely on a signal model hypothesis." These two papers mark an important step in the application of machine-learning techniques to novelty detection, from proof of concept to real data analysis.

What is novelty detection? Here is the definition: novelty detection is the task of classifying test data that differ in some respect from the data that are available during training. The important step in novelty detection is to evaluate the novelty response of the testing data, and then, based on that novelty response, we are able to analyze the detection sensitivity. This sounds like a BDT-style analysis: first we analyze the BDT response of the testing data, then we are able to estimate the analysis sensitivity.

If we take a survey of the literature on novelty detection, we will see that the history of novelty detection is basically a history of developing novelty evaluators, or scoring methods, for the testing sample. It is worthwhile to point out that in the literature we sometimes see the terminology "semi-supervised learning" or "fully unsupervised"; that terminology is actually separate from the discussion here, since it refers to how the backgrounds are simulated or evaluated. So, no matter what evaluator or method is suggested, roughly speaking the evaluators can be classified into two classes: isolation based and clustering based. The first class is isolation-based evaluators: the novelty of a given testing data point is evaluated according to its distance to, or isolation from, the distribution of the known-pattern data in the feature space.
So here, the point is that the evaluation is purely based on the relation between the given testing data point and the distribution of the known-pattern data in the feature space; all the other testing points in the same sample are irrelevant for this evaluation.

Here is one example, from the method of k-nearest neighbors. The left one is the raw measure and the right one is normalized using the cumulative distribution function. In this definition, the d_train term in the numerator represents the mean distance of a testing data point to its k nearest neighbors, the ⟨d_train⟩ term represents the average of the mean distances defined for those k nearest neighbors, and σ_train represents the standard deviation of the latter. The point is that all three of these quantities are defined with respect to the training sample. So you can imagine, in the feature space, if a testing point falls far away from the distribution of the training sample, then the value of d_train is large and this point is scored high. That is the idea.

Another proposal is the reconstruction error, or loss function, which was proposed by two collaborations two years ago. The reconstruction error, in essence, is isolation based. This can be understood since, for each testing point, the reconstruction error is computed basically just from the relation between this testing point and the known-pattern data; it has nothing to do with the other data points in the same testing sample.
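Before moving on, here is a minimal Python sketch of the isolation-based k-nearest-neighbor score described above, not the speaker's actual implementation: it assumes the score is the standardized mean kNN distance of a testing point with respect to the training sample, mapped to [0, 1] with a Gaussian cumulative distribution function; the function name, the choice k=20, and the 1e-12 regulator are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors


def isolation_novelty(train, test, k=20):
    """Isolation-based kNN novelty score (sketch).

    For each testing point x:
      d_train    = mean distance of x to its k nearest training points
      mu, sigma  = mean and standard deviation of the same quantity,
                   evaluated at those k training neighbours themselves
    The score is Phi((d_train - mu) / sigma): the standardized isolation
    of x from the known-pattern data, mapped to [0, 1] with a Gaussian CDF.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train)

    # Mean kNN distance of every training point to the rest of the training
    # sample (the extra neighbour and the [:, 1:] slice drop the point itself).
    d_tt = nn.kneighbors(train, n_neighbors=k + 1)[0][:, 1:]
    d_tt_mean = d_tt.mean(axis=1)

    # Mean kNN distance of each testing point to the training sample,
    # plus the indices of those k training neighbours.
    d_te, idx = nn.kneighbors(test)
    d_train = d_te.mean(axis=1)

    # Average and spread of the neighbours' own mean kNN distances.
    d_ref = d_tt_mean[idx]
    mu, sigma = d_ref.mean(axis=1), d_ref.std(axis=1) + 1e-12

    # Points far from the bulk of the training data score close to 1.
    return norm.cdf((d_train - mu) / sigma)


# Toy usage: Gaussian "known-pattern" data vs one bulk point and one outlier.
rng = np.random.default_rng(0)
bkg = rng.normal(0.0, 1.0, size=(5000, 2))
print(isolation_novelty(bkg, np.array([[0.0, 0.0], [4.0, 4.0]]), k=20))
```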
Another class of evaluators is clustering based: the novelty of a given testing point is evaluated according to the data clustering around this testing point on top of the known-pattern data distribution in the feature space. Different from the first class, here the evaluation takes into account the correlation between the testing data point and the other points around it in the same data sample. This is the generic difference between these two classes of evaluators.

One example, again, is from the method of k-nearest neighbors: the left one is isolation based, the right one is clustering based. Here, d_train in the denominator of the clustering-based one is the mean distance of this testing data point to its k nearest neighbors in the training data sample, and d_test is the mean distance of the testing data point to its k nearest neighbors in the testing data sample. As for m, it is a hyperparameter; it can be chosen to be the dimension of the feature space, and in that case d_test^(-m) represents the local density of this testing data point in the testing sample, and likewise for d_train^(-m). So in this case the novelty response is evaluated by comparing the local densities of the testing point in the training sample and in the testing sample. Actually, there is an approximate statistical interpretation: such an evaluator is proportional to S/sqrt(B) in the local bin of this testing data point.

Another proposal was made recently [names unclear in the recording] in a similar spirit. The difference between the two is just the definition of the density: in the first one the density is defined as the local density in the feature space, and in the second case the density is replaced with a probability density. This is the only difference between them.
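Here is a minimal Python sketch of the clustering-based kNN evaluator just described, again not the speaker's code: turning the two mean kNN distances into local densities via d^(-m) follows the talk, while the specific combination (rho_test − rho_train)/sqrt(rho_train), chosen to mimic the S/sqrt(B) interpretation, and the final Gaussian-CDF normalization are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors


def clustering_novelty(train, test, k=20, m=None):
    """Clustering-based kNN novelty score (sketch).

    d_train = mean distance of a testing point to its k nearest neighbours
              in the TRAINING (known-pattern) sample
    d_test  = mean distance of the same point to its k nearest neighbours
              in the TESTING sample
    With m equal to the feature-space dimension, d**(-m) behaves like a
    local density, so the score compares the density observed in the
    testing sample with the one expected from the known-pattern data.
    """
    m = train.shape[1] if m is None else m

    nn_train = NearestNeighbors(n_neighbors=k).fit(train)
    nn_test = NearestNeighbors(n_neighbors=k + 1).fit(test)

    d_train = nn_train.kneighbors(test)[0].mean(axis=1) + 1e-12
    # Drop the first neighbour within the testing sample (the point itself).
    d_test = nn_test.kneighbors(test)[0][:, 1:].mean(axis=1) + 1e-12

    rho_train = d_train ** (-m)   # local density expected from training data
    rho_test = d_test ** (-m)     # local density seen in the testing data

    # Local excess over the known-pattern density, normalized like S/sqrt(B),
    # then mapped to [0, 1] with a Gaussian CDF (both choices are assumptions).
    return norm.cdf((rho_test - rho_train) / np.sqrt(rho_train))
```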
But this is not the whole story. We want to synergize these two different classes of evaluators. Why would we like to do that? We can consider an example, a two-dimensional Gaussian sample, to show the reason. The left column shows the performance of the isolation-based evaluator. In these samples, the red points represent the known-pattern data and the blue points represent the unknown-pattern data; let's see which points receive a strong response to the isolation-based evaluator. The point is that, in addition to the red points in the original signal bin in the two-dimensional plane, the red points between the two yellow circles also have a strong response to the isolation-based evaluator, simply because for all red points between these two yellow circles the local densities are comparable. The middle column shows the performance of the clustering-based novelty evaluator. Here, for the red points, in addition to the ones from the original signal bin, the red points from the non-signal region with an upward fluctuation can also have a strong response to the clustering-based evaluator. This can also be understood, because statistically it is not easy to distinguish an upward fluctuation from a resonance.

So we can see that in both cases the bin at the bottom receives contributions from non-signal regions. If these two classes of evaluators can be properly synergized, then we would expect that only the red points in the intersection of these two sets will contribute to the bin at the bottom, and in that case the analysis sensitivity is expected to be improved. This is the idea. The first proposal is the geometric mean of the isolation-based evaluator and the clustering-based evaluator. Why does this help? Let's consider the example in the right panel, where the black curve represents the novelty response of the known-pattern data. Some of these events have a high response to the clustering-based evaluator; of these black points, some are from the genuine signal region and some are from the non-signal region with an upward fluctuation. However, these two classes of contributions tend to be scored differently by the isolation-based evaluator, because one lies on the tail and the other one does not. So here we expect the low scoring by the isolation-based evaluator to compensate for the high scoring by the clustering-based evaluator for these fake signal points. That is basically the rough idea.

Here is a two-dimensional toy example. The bottom-right panel shows the significance performance of these different types of evaluators. In this panel, the black curve represents the performance of supervised learning; it is used as a reference. The magenta curve represents the performance of the isolation-based evaluator, and the blue curve represents the performance of the clustering-based evaluator. As for the red curve, it represents the synergy-based evaluator, and we can see that, indeed, in this simple context the performance can be improved significantly by this synergy-based evaluator. But such a design is very intuitive, which means it is not fully optimized.
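For reference, a sketch of this first synergy-based evaluator, taking it literally as the geometric mean of the two scores as stated in the talk; the helper names in the usage comment refer to the earlier sketches and are illustrative.

```python
import numpy as np


def synergy_novelty(o_iso, o_clu):
    """First synergy-based evaluator (sketch): the geometric mean of the
    isolation-based and clustering-based scores.  A point scores high only
    if BOTH evaluators score it high, so tail points that merely look
    isolated, or bulk fluctuations that merely look clustered, are damped.
    """
    return np.sqrt(np.asarray(o_iso, dtype=float) * np.asarray(o_clu, dtype=float))


# e.g., with the earlier sketches:
#   o_syn = synergy_novelty(isolation_novelty(bkg, test),
#                           clustering_novelty(bkg, test))
```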
So here, the design of this clustering-based evaluator, and hence of this enhanced synergy-based evaluator, somehow breaks the independence of the testing points. Why? Because when we evaluate their response, we take into account the correlation between the given testing point and its neighbors. So the latter will lose some nice statistical properties of the original sample, and then, if we want to calculate the significance based on the data response to these evaluators, it means we have to do something extra.

Motivated by this, we proposed a second type of synergy-based evaluators. The idea is that, based on the novelty responses of the testing data sample to the isolation-based evaluator and to the clustering-based evaluator, we define a new set of "signals" and a new set of "backgrounds". They are not the real signal events and the real background events; instead, for the signal set and the background set, the ratios between real signal and real background events are different. Then we introduce a deep neural network for supervised learning and train it on the new signal set and the new background set, and we use its output as the new synergy-based evaluator, comparing its performance with the isolation-based one, the clustering-based one, and the first synergy-based one.
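The talk does not spell out how the new "signal" and "background" sets are selected, so the following Python sketch is only an illustrative guess at the procedure: it labels the testing points with the highest combined novelty response as the new signal set and the rest as the new background set (so the two sets contain real signal and background in different proportions), then trains a small network to separate them and uses its output as the score. The quantile cut, the network size, and the use of `MLPClassifier` are all assumptions, not the authors' prescription.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def second_synergy_evaluator(test, o_iso, o_clu, quantile=0.90):
    """Second type of synergy-based evaluator (illustrative guess).

    Build a new 'signal' set and a new 'background' set from the novelty
    responses to the isolation-based and clustering-based evaluators, train
    a supervised network to separate them, and use its output as the score.
    The selection rule below (top combined-response quantile vs the rest)
    is an assumption made for illustration.
    """
    combined = np.sqrt(np.asarray(o_iso) * np.asarray(o_clu))
    labels = (combined >= np.quantile(combined, quantile)).astype(int)

    # Small fully connected network standing in for the DNN in the talk.
    clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
    clf.fit(test, labels)

    # The classifier output on each testing point is the new novelty score.
    return clf.predict_proba(test)[:, 1]
```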
Here we have nine two-dimensional Gaussian benchmarks. Again, the red points represent background and the blue points represent signal; the rows correspond to different widths of the signal distributions, and the columns to different center positions of the signal. Here are the ROC curves and AUC values. I will not go into details, but if you compare these curves you will see the performance complementarity between the isolation-based evaluator and the clustering-based evaluator, and this complementarity results in improvements of the AUC values for the synergy-based evaluators, especially for this alternative synergy-based evaluator, which has the best AUC universally in all nine benchmarks. For example, for the AUC values of supervised learning versus this evaluator: in the first row, 0.99 versus 0.99, 0.9[?] versus 0.99, and 1.0 versus 1.0; in the second row, 0.91 versus 0.9[?], 0.93 versus 0.89, and 0.99 versus 0.98; in the last row, 0.72 versus 0.71, 0.84 versus 0.81, and 0.96 versus 0.93. So you can see that the gap between supervised learning and this new evaluator is very narrow. Here is a list of efforts in this direction over the last eight years. Since I am running out of time, I will stop here. Here is the summary. Okay, thank you very much.

[Chair] Are there any questions or comments for Tao?

[Question] Yeah, I have a question on, I think, the first few slides, on the paper that you showed, what is it, "Adversarially Learned Anomaly Detection... re-discovering the top quark". Do you know how they did it? Did they have a pre-selection applied and then run the anomaly detector, or did they just run on reconstructed data, or raw data, or...?

[Tao] Wow, this is... this is a challenging question for me. I think it is not easy to read out this sort of information from these two papers. So, yes, I'm sorry, I don't know the answer.

[Question] And then, with a lot of these anomaly detectors, you train on a certain data set, but our data sets are not static, because the running conditions are changing. Well, not constantly, but they change within a run quite often. So, has it been validated that each of these algorithms is able to detect that the running conditions are changing, and to change the expectation of the distribution?

[Tao] Okay. This is a good question. The answer is yes: indeed, for different scenarios, different evaluators may have different performance. One example is the first type of synergy-based evaluator that I just mentioned. For that case, let me go to the slide. Okay. So this evaluator works pretty well if the generic signal bin is essentially on the tail; in that case this synergy-based evaluator works nicely.
But if these two are switched, that is, if the generic signal bin sits on the bulk and the fluctuation is on the tail, then in that case the performance of this evaluator is not that good. Actually, you can also see this point from the ROC curves in slides 19 and 20. But here, it is true that we are trying to build an evaluator which can work universally well, and this is also one of the motivations for us to propose the second type of synergy-based evaluators. If you compare the ROC curves in this slide, you will see that, indeed, this new synergy-based evaluator works universally better than the other ones.

[Question] Okay, thanks.

[Tao] Oh, sorry.

[Chair] Yeah, I'm afraid we are running a bit late, so only if this is a very quick question; otherwise, no further discussion. Okay, so thanks again, Tao, for the nice talk, and we move to the following one, which is by Oscar.

[Next speaker] Hello. Hey, I cannot share... why does it say that? No, it's...