Yes. Okay, yeah. So this is a data and analysis preservation presentation, touching on the different approaches across the different LHC experiments. Due to time it is going to be fairly high level, but I will try to touch on the main things.

First, some motivation for why we are doing this at all. I think sometimes, in our day-to-day life, we forget how special the LHC is. But it really is special: the data and the results that we are extracting from this machine are pretty unique, and the analyses that we use to extract these results are also unique. Both of these things merit preservation for posterity. So you can ask yourself what the scientific output is that we have beyond the papers. Since we have a unique machine, we should strive to make our results as useful as possible, and also make our data available in formats that are as useful as possible.

In this area of activity there are three main directions. There is the preservation of data products, the high-level data products that we extract from our analyses. Then there is analysis preservation, where you try to preserve the analysis workflow itself. And then there is a third branch, which goes more towards open data, where you open up the data for researchers and people outside of the collaborations.

I will go through all three. So, HEPData: I think everybody is familiar with it. HEPData has really been a crucial piece of cyberinfrastructure for our field. It is basically the main destination where we put numeric, machine-readable data that relates to our publications online for people to reuse, and it is really a destination for high-quality but small data products. Traditionally this started as just a digitization of the tables that are in the papers, but it has evolved into a much wider set of data products. All the LHC experiments are participating; there are different levels, different percentages of analyses that have a corresponding HEPData record, but in principle most of the experiments are using it as a platform. So that's good.
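As a small illustration of what "machine readable" means in practice, here is a hedged sketch of pulling a record programmatically. It assumes HEPData's JSON export (requesting a record page with format=json); the record ID is a placeholder and the exact response layout should be checked against the live service.

```python
import requests

# Placeholder record ID: substitute the INSPIRE ID of any published analysis.
record_url = "https://www.hepdata.net/record/ins1234567"

# Assumed JSON export: the same record page requested with format=json.
response = requests.get(record_url, params={"format": "json"}, timeout=30)
response.raise_for_status()
record = response.json()

# Inspect what the record provides without assuming its internal layout.
print(list(record) if isinstance(record, dict) else type(record))
```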
As I said, we started out with just tables, but now the experiments upload all types of information that relates to the analysis there. It goes from C++ snippets and pseudo-code to spectrum files for searches, likelihoods, machine learning models, and all that. So HEPData has proven to be a really crucial piece.

Just as an example of the kinds of things we cover there: one of the things that a lot of people outside the collaborations like to do is re-implement an analysis so that they have a fast, approximate version of the event selection procedure. So, for example, ATLAS puts up C++ snippets and pseudo-code; sometimes we also have Rivet analyses for the different analyses, and they are linked on HEPData. One thing I want to point out: as Ben was saying, we all expect machine learning to be a major component of our analysis tool chain going forward, and one thing that is a little bit hard is, if you use machine learning heavily, how do you actually put that out there? Normally, if you write a re-implementation routine, you can go through the cuts you are applying; with machine learning this becomes a little trickier. But at least in ATLAS we have one example where we put the entire machine learning model publicly on HEPData, on the record associated with the analysis. So you have all the weights, and if you have a simulation that is good enough to reproduce the inputs to this machine learning model faithfully, you can use that public model to evaluate the multivariate function. So that's pretty good.
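A sketch of how an outside user might evaluate such a published model. The concrete serialization depends on what the analysis released; purely as an assumption here, suppose the model was exported as an ONNX file, so it can be run with onnxruntime. The file name, input shape, and feature ordering are hypothetical.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical file name for a classifier attached to a HEPData record.
session = ort.InferenceSession("published_classifier.onnx")
input_name = session.get_inputs()[0].name

# Stand-in for kinematic inputs reconstructed from your own simulation;
# the number and ordering of features must match what the experiment documents.
events = np.random.rand(10, 12).astype(np.float32)  # 10 events, 12 features

scores = session.run(None, {input_name: events})[0]
print(scores[:5])
```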
One of the things that I want to point out in this talk is the public likelihoods. There has been a lot of attention on likelihoods, so I just want to motivate why that is. Basically, what we try to do in our field is extract information about the theory that we assume produces our data, from the data itself. If you try to specify this inference problem, you have maybe your prior over the theory; that part is the job of the theorists. The likelihood is then really the part that summarizes what the experiments do: it quantifies how likely the data is given a theory. It is really a focal point of the entire analysis chain, where all the different decisions about performance optimizations, data-acquisition operations, or the analysis choices you make all get reflected. It is a really high-information-density data product if we are able to preserve it, and all the standard inference results, like limits, yield tables, or data/Monte Carlo plots, are basically derived from this likelihood. So it is kind of a bottleneck, and that makes it really valuable to preserve.
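To make this concrete, a binned likelihood of the HistFactory type used for such searches has the standard form

\[ L(\mu, \boldsymbol{\theta}) = \prod_{i \in \mathrm{bins}} \mathrm{Pois}\!\left(n_i \mid \mu\, s_i(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta})\right) \times \prod_{j} c_j(a_j \mid \theta_j), \]

where \(\mu\) is the parameter of interest, \(\boldsymbol{\theta}\) are the nuisance parameters, \(n_i\) the observed counts, \(s_i\) and \(b_i\) the signal and background predictions, and \(c_j\) the constraint terms encoding auxiliary measurements. Limits, yield tables, and data/Monte Carlo comparisons are all functions of this one object.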
Often what theorists outside the collaborations do is go to HEPData, take the tables, like the background distributions published there, and try to construct a likelihood out of that information. But that is lossy: the fidelity of the published information does not allow you to reconstruct the likelihood. So there is this idea: what would happen if we just provided the likelihood directly on HEPData? Because internally, as experiments, we have it. This is not a new idea; it has been long in the making. At the first PhyStat conference at CERN in the year 2000 there was a discussion around this, and more or less everybody in that group of experts agreed that it would be a good idea for the LHC experiments to publish the likelihood function. But there were various technical limitations to that idea, and also sociological ones. The first step in that direction was to find a serialization format, and this was the introduction of the RooFit workspace. Then in 2012 ATLAS published the first profile likelihood. That is of course limited, because it is the likelihood after you have profiled out all the nuisance parameters; it is useful, but it does not allow you to do combinations and things like that. A couple of years later CMS also published simplified likelihoods, which have the same problem: very useful as a simple form of the likelihood, but they do not allow you to do actual combinations where you vary the nuisance parameters across multiple analyses systematically.

One thing that happened maybe half a year or a year ago is that we now have the first full likelihood release from an LHC experiment. This is an ATLAS effort where we took the binned likelihood; in ATLAS we use the HistFactory format, and a lot of analyses use it. It is nice because you can define a pretty clean schema for it, so you can define a data product around it. So we have a publicly available likelihood on HEPData, and that allows external people to reproduce the key part of the analysis, the exclusion contour that delineates which theories are excluded and which are not, to the exact same fidelity as inside the experiment, which I think is pretty good. This has been a bit of a milestone for open data products at the LHC.
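As a rough sketch of what an external user can do with such a published JSON likelihood, here is a minimal example using the pyhf library, which implements the HistFactory JSON schema these releases follow. The file name is a placeholder; the real files are attached to the analysis record on HEPData, and some releases ship a background-only workspace plus a set of signal patches (handled with pyhf.PatchSet) rather than a single complete workspace.

```python
import json
import pyhf

# Placeholder name for a complete (signal plus background) workspace file.
with open("workspace.json") as f:
    spec = json.load(f)

workspace = pyhf.Workspace(spec)
model = workspace.model()      # full probability model, nuisance parameters included
data = workspace.data(model)   # observed counts plus auxiliary measurements

# Observed and expected CLs for a signal-strength hypothesis mu = 1,
# i.e. the ingredient behind the published exclusion contours.
cls_obs, cls_exp = pyhf.infer.hypotest(
    1.0, data, model, test_stat="qtilde", return_expected=True
)
print(f"CLs observed: {float(cls_obs):.3f}, CLs expected: {float(cls_exp):.3f}")
```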
Okay, so aside from open data products, there is also the use case of internal reuse. This is the idea of preserving the analysis for internal projects. There are efforts across all of the LHC experiments to foster analysis preservation, and there are different ingredients you need for it. First of all, the analysis is basically implemented in software, so there are different software packages that need to be preserved. But having the software alone is not enough; you also need to know what to do with it, so you need to capture which commands to run. And since your analysis is likely a multi-step procedure, you also need to capture the workflow, so that you know you first run the event selection, then the statistical analysis, and so on. Finally, you also need to preserve data assets, like your background estimates, ntuples, and all that.

CERN provides some infrastructure to assist the experiments in this effort. There is the REANA project, which basically provides workflows as a service: you can describe your analysis as a sequence of steps and then run it on this platform. And then there is CAP, which stands for the CERN Analysis Preservation portal. Once you have defined your analysis as this workflow, as a sequence of steps, you can take that description and put it into the analysis preservation framework, so that later on, if you want to reuse your analysis, you can pull it up, extract the workflow definition, and rerun the analysis.

Okay, so how do we do this? The capturing of the software is basically done through containers. Probably by now a lot of people are familiar with this: it is a technology that grew out of industry to package software in a portable way, sometimes referred to as Docker containers. It has been revolutionizing how we can package software, it has been picked up by all of the experiments, and it is pretty much universally seen as the solution to this problem. In ATLAS we are providing official base images, and CMS, ALICE, and LHCb are doing something similar. So that part is largely solved, even though initially, if you think about it, it sounds like the most complicated thing to be doing: you need to preserve not only your analysis software but all the dependencies, the ROOT version, the compiler version, all that. It sounds daunting, but through this technology it is actually almost the easiest part of the analysis preservation problem.

And then for the workflow, as I said, there is REANA, the platform that CERN provides to run these workflows, and it uses the concept of workflow languages. If you are familiar with continuous integration or things like that, it is similar: it allows you to define a pipeline in a declarative way. This is actually quite heavily used in fields outside of high-energy physics, like bioinformatics, and it is now starting to creep into high-energy physics. It allows you to go beyond pure software preservation and actually preserve the full analysis workflow, so that you can actually execute the analysis and do not need to remember what the steps are.
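As a toy sketch of the declarative idea: the preserved description is just data that records, for each step, which container image to use and which commands to run, and an engine executes it. Real REANA or yadage workflows are written in YAML and add data staging, scheduling, and provenance on top; the step names, image tags, and commands below are made up.

```python
import os
import subprocess

# Toy declarative description of a two-step analysis (hypothetical names and images).
workflow = [
    {"name": "event_selection",
     "image": "atlas/analysisbase:21.2.100",
     "command": "python select_events.py --out selected.root"},
    {"name": "statistical_analysis",
     "image": "atlas/statanalysis:latest",
     "command": "python run_fit.py --in selected.root --out limits.json"},
]

# Naive local runner: execute each step's command inside its container image,
# mounting the working directory so steps can pass files to each other.
for step in workflow:
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{os.getcwd()}:/work", "-w", "/work",
         step["image"], "bash", "-lc", step["command"]],
        check=True,
    )
```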
Okay, so one of the major use cases for this type of internal preservation and reuse is reinterpretation, and this goes under the name of RECAST. The idea here is that you have your analysis, which might be a search, exploring some corner of phase space that is interesting for some class of beyond-the-Standard-Model physics. You are probably not finding new physics, but you might be able to set limits on this class of models, and the phase-space region that you studied might actually also be sensitive to a different class of models. So the idea is that, since you already studied this phase space and you have your analysis workflow as a tool to look into it, you preserve the analysis; then, once a new class of models comes around to which this phase-space region seems to be sensitive, you can reuse it and extract limits.

ATLAS has been pushing this, and in ATLAS all the major search groups that probe beyond-the-Standard-Model physics now require the analysis teams to preserve their analysis in this reusable way, so that maybe a year down the line, if there is a new model, we can actually redo the analysis and extract new limits. And we have seen this happen: we have new scientific results based on what is, to be honest, a pretty technical requirement (you require people to provide container images, documentation, all that), but after you go through this exercise you actually extract new science out of it, at a cost that is much lower than the cost of setting up a new analysis dedicated to the new class of models. I have put up three different publications from ATLAS that use this technology to extract new limits in regions of theory space that were previously uncovered, which is nice to see.

Okay, so now the CERN Analysis Preservation portal. This is the portal where we can save information related to the analysis, and it is also meant more for internal usage. All the experiments are working on integrating their internal databases; there are two screenshots here, for LHCb and CMS. This is developed by the library section at CERN, the same people behind INSPIRE and these other services.
And so the focus here is to make it easy for analysis teams to submit information about what their analysis entails, and then also to make it easy to discover analyses that have specific features. Ideally it would work in such a way that if you are looking for analyses that use a specific trigger or a specific collection of objects, you would be able to query this database and find the analyses that match those criteria.

Okay, so the third column of this topic is the open data. All the experiments have open data programs, and here again CERN is providing infrastructure with the CERN Open Data portal. For ATLAS, LHCb, and ALICE this is mostly focused on outreach, and I have put in some examples of the different plots that you can make with the open data from these experiments. CMS has a much more expansive open data program, which is not only for outreach and educational resources but also for research. What is nice to see is that an external ecosystem is slowly developing around this type of open data. There is a workshop in October for external people to learn more about this data; I have put the link on the slide. And we have seen a number of papers appear based on this open data. One major theme there is the development of machine learning methods based on the CMS open data, and I have put down some examples here. There is also a schedule, defined well in advance, for where these open data releases will be at different points in time, and that makes it predictable and allows this ecosystem to develop outside of the experiments.

Okay, so this brings me to my conclusion. I think the LHC experiments have pretty strong analysis and data preservation programs, and some of the technological progress actually helps drive this kind of preservation, for example the use of containers and things like that. One important component in this entire endeavor is the availability of cyberinfrastructure for the different pieces: HEPData, CERN Analysis Preservation, the Open Data portal, REANA, RECAST, and all these things that have allowed the community to adopt these practices in a systematic way. And yeah, that is basically my conclusion. Thanks.
Thank you very much. So we have time for a couple of questions. Everyone, can you hear me? Yes? Yes, please.

Thanks for the presentation and the work. I was wondering, on RECAST and reinterpretation: what is the policy for citing the underlying source of data and information, if the reinterpretation is not done within the collaborations? I see that here you have papers that are reinterpretations by the collaboration itself, but one could also be requesting this from outside. So how do you do this, how do you give credit for the data and to the experiments?

So in this example it is all done by the collaboration: these are reinterpretations performed by the collaboration, and if you read these papers they reference the input analyses that were used for the reinterpretation. The idea is that, as this gets more streamlined, you might have a first analysis with a benchmark model that just explores the phase space, and then a sequence of follow-up publications that reference the original publication but explore somewhat more specific models to which that phase space is sensitive. If you are talking about external reinterpretation, where we have an analysis and we put out some information on HEPData that allows external people to reinterpret it, then the assumption is that those external reinterpretations cite the HEPData record. So if you, for example, use the public likelihood or the Rivet analysis that was released as part of the publication procedure, you need to cite those things and the tools that you used. But that is basically it.

Thanks a lot. I actually have a very quick question, more on the data preservation side. I think that in the last period we have seen more and more groups and collaborations using derived datasets that are not ROOT-based. In some sense, at least in my experience, this could create some long-term preservation problems. Of course these are derived data, so the argument is that one can always go back to the original or reconstructed data. But how do you see this evolving in the long term at CERN? In particular, I am also referring to the effort of having data-frame tooling that allows better storage formats for people who use Python-based data analysis and software, for example.
So I mean, the crucial thing is that whatever format you use for long-term data preservation has a specification and, hopefully, more than one implementation for reading the data. And you need to somehow balance things. I do not think there is going to be one specific open data format that all the experiments agree on, because in order to make this work at all you need to make it easy for these collaborations to release the data. So it will likely be in formats that the experiments define, but then the experiments also need to release the software to read the data. In the ATLAS case, for example, we are not using flat TTrees, we have an event data model on top of that, but that is also public. It is the same for CMS, where CMSSW is also open source. So as long as the software to read the data is available, I think you are covered. Whether, separately, the experiments explore alternative data formats, like Parquet or HDF5 or something like that, is I think a separate discussion, but in the end the open data format will likely be what the experiments use internally, whether that is NanoAOD or something similar.

Okay, thanks.

Are there any other questions for Lucas? If not, I think we can move to the last presentation of the session, on analysis description languages for the LHC.