{"language": "en", "segments": [{"text": " Thank you for coming to my talk, sharing knowledge with truly", "start": 1.260000000000005, "stop": 4.68, "id": 0}, {"text": " open and decentralized AI.", "start": 4.68, "stop": 6.28, "id": 1}, {"text": " My name is Zacchaeus, or just Zach, and I will be presenting.", "start": 7.12, "stop": 10.42, "id": 2}, {"text": " As an overview of the talk, I will start with an introduction to machine", "start": 12.460000000000003, "stop": 15.74, "id": 3}, {"text": " learning, followed by a description of how it has been used.", "start": 15.74, "stop": 18.72, "id": 4}, {"text": " Then I will describe the first attempts at", "start": 19.14, "stop": 22.16, "id": 5}, {"text": " decentralizing machine learning frameworks.", "start": 22.16, "stop": 24.2, "id": 6}, {"text": " Finally, I will point out some of the major design considerations when developing", "start": 24.779999999999998, "stop": 29.28, "id": 7}, {"text": " decentralized machine learning frameworks.", "start": 29.8, "stop": 31.54, "id": 8}, {"text": " The last line, avoid OpenAI, embrace open AI, is the", "start": 32.38, "stop": 37.86, "id": 9}, {"text": " unofficial subtitle of this talk.", "start": 37.86, "stop": 39.5, "id": 10}, {"text": " It is unofficial and buried at the bottom of my overview page because OpenAI is", "start": 39.98, "stop": 44.42, "id": 11}, {"text": " a major sponsor of this event.", "start": 44.42, "stop": 46.66, "id": 12}, {"text": " By the end of this talk, I want to convince you that we don't need closed source", "start": 47.94, "stop": 52.04, "id": 13}, {"text": " and centralized AI gatekeepers like OpenAI, and that a decentralized machine", "start": 52.04, "stop": 56.54, "id": 14}, {"text": " learning framework is not only ethically but technologically superior.", "start": 56.54, "stop": 60.12, "id": 15}, {"text": " From a high level, machine learning can be broken down into three components.", "start": 61.88, "stop": 66.4, "id": 16}, 
{"text": " The first component is a neural network.", "start": 66.8, "stop": 68.68, "id": 17}, {"text": " A neural network is a function f which takes parameters", "start": 69.22, "stop": 74.28, "id": 18}, {"text": " and gives an output.", "start": 74.28, "stop": 79.34, "id": 19}, {"text": " For illustrative purposes, suppose f is a quadratic function and the parameters", "start": 83.46000000000001, "stop": 88.26, "id": 20}, {"text": " are the coefficients a, b, and c.", "start": 88.26, "stop": 90.82, "id": 21}, {"text": " The data is a resource which contains knowledge.", "start": 91.36, "stop": 94.16, "id": 22}, {"text": " Often it is sensor data from the neural network, but can also be the personal", "start": 94.64, "stop": 99.94, "id": 23}, {"text": " data like your message history.", "start": 99.94, "stop": 101.56, "id": 24}, {"text": " The last component is training, which is an algorithm that takes a neural", "start": 102.22, "stop": 105.84, "id": 25}, {"text": " network and data and extracts knowledge from the data into", "start": 105.84, "stop": 109.54, "id": 26}, {"text": " parameters for the neural network.", "start": 109.54, "stop": 111.44, "id": 27}, {"text": " In our toy example, it finds the a, b, and c which give", "start": 112.04, "stop": 118.9, "id": 28}, {"text": " you the correct y for any input x.", "start": 118.9, "stop": 121.92, "id": 29}, {"text": " The current state of things.", "start": 123.66, "stop": 124.76, "id": 30}, {"text": " Machine learning requires a neural network, data, and training algorithm.", "start": 125.58, "stop": 129.66, "id": 31}, {"text": " More data means more knowledge and organizations rarely like to share algorithms.", "start": 130.3, "stop": 134.86, "id": 32}, {"text": " Organizations have begun to hoard as much data as possible", "start": 135.5, "stop": 139.8, "id": 33}, {"text": " to collect the most knowledge.", "start": 139.8, "stop": 141.34, "id": 34}, {"text": " Sometimes organizations do share algorithms 
and sometimes they do share their", "start": 141.92, "stop": 146.72, "id": 35}, {"text": " neural networks, but they will never share their data.", "start": 146.72, "stop": 149.18, "id": 36}, {"text": " That should tell you how valuable your data is.", "start": 150.08, "stop": 152.54000000000002, "id": 37}, {"text": " This includes private personal data.", "start": 155.1, "stop": 158.24, "id": 38}, {"text": " There are many motivations to decentralize machine learning.", "start": 159.34, "stop": 162.3, "id": 39}, {"text": " For one, with so many cameras and other sensors in existence, it's not possible", "start": 162.3, "stop": 166.92, "id": 40}, {"text": " to centralize all the data because of data network constraints.", "start": 166.92, "stop": 171.34, "id": 41}, {"text": " More obviously, there are issues with privacy.", "start": 172.06, "stop": 176.04, "id": 42}, {"text": " Maybe you could steal my password or impersonate me and steal my identity.", "start": 177.52, "stop": 183.12, "id": 43}, {"text": " If you share enough data, such things inevitably become possible.", "start": 183.9, "stop": 187.6, "id": 44}, {"text": " Some might say we should do away with machine learning and keep everything", "start": 188.26000000000002, "stop": 192.98, "id": 45}, {"text": " secret, but machine learning has a great potential for good.", "start": 192.98, "stop": 196.32, "id": 46}, {"text": " Suppose I find out that I have been growing cancer for the last three months.", "start": 197.02, "stop": 200.68, "id": 47}, {"text": " Now my data, including the medical diagnosis, contains knowledge that might help", "start": 201.08, "stop": 205.4, "id": 48}, {"text": " others detect cancer before it is too late.", "start": 205.4, "stop": 207.6, "id": 49}, {"text": " Additionally, as I found out at CVPR this year, the carbon footprint of", "start": 208.64, "stop": 214.02, "id": 50}, {"text": " decentralized frameworks can actually be lower than centralized systems because", 
"start": 214.02, "stop": 218.54, "id": 51}, {"text": " more powerful GPUs are less power efficient.", "start": 218.54, "stop": 222.0, "id": 52}, {"text": " A lot of their power ends up going into active cooling.", "start": 223.34, "stop": 228.28, "id": 53}, {"text": " This is a plot that was provided by Nicholas Lane from the University of", "start": 229.2, "stop": 233.6, "id": 54}, {"text": " Cambridge and Samsung AI.", "start": 233.6, "stop": 235.22, "id": 55}, {"text": " He presented this at CVPR 2023, less than a week ago,", "start": 235.96, "stop": 240.28, "id": 56}, {"text": " in the Federated Learning Workshop.", "start": 240.28, "stop": 242.08, "id": 57}, {"text": " Can you say the name one more time?", "start": 242.34, "stop": 243.42, "id": 58}, {"text": " The name of the person?", "start": 243.42, "stop": 245.0, "id": 59}, {"text": " Nicholas Lane.", "start": 245.9, "stop": 246.96, "id": 60}, {"text": " So, Federated Learning is a machine learning framework which allows you to train", "start": 251.92000000000002, "stop": 256.16, "id": 61}, {"text": " neural networks without centralizing data.", "start": 256.16, "stop": 258.42, "id": 62}, {"text": " It actually achieves the same performance as", "start": 258.94, "stop": 261.72, "id": 63}, {"text": " centralized machine learning methods.", "start": 261.72, "stop": 263.48, "id": 64}, {"text": " It was first designed to contend with laws that prohibit the", "start": 263.88, "stop": 267.36, "id": 65}, {"text": " unauthorized sharing of medical data.", "start": 267.36, "stop": 269.36, "id": 66}, {"text": " Hospitals trained neural networks locally and communicated those", "start": 270.08000000000004, "stop": 274.12, "id": 67}, {"text": " parameters to a central server.", "start": 274.12, "stop": 275.96, "id": 68}, {"text": " The central server then combined and redistributed the resulting parameters.", "start": 276.48, "stop": 280.4, "id": 69}, {"text": " This process is repeated until the combined 
parameters stop improving.", "start": 280.86, "stop": 285.9, "id": 70}, {"text": " Here we have a diagram.", "start": 286.62, "stop": 287.58, "id": 71}, {"text": " Notice that each line here is a hospital and that's the training happening.", "start": 289.0, "stop": 294.62, "id": 72}, {"text": " So you're converting the data into parameters.", "start": 294.84, "stop": 296.82, "id": 73}, {"text": " The data never leaves the hospital and it is eventually", "start": 296.82, "stop": 301.42, "id": 74}, {"text": " aggregated in a central server.", "start": 301.42, "stop": 304.62, "id": 75}, {"text": " The downside here is that it does still require a central server.", "start": 306.64, "stop": 312.68, "id": 76}, {"text": " The basic idea of decentralized federated learning is that instead of exchanging", "start": 314.44, "stop": 319.12, "id": 77}, {"text": " parameters with a central server, you treat each participant as a node in a graph", "start": 319.12, "stop": 323.9, "id": 78}, {"text": " and share with your graph neighbors.", "start": 323.9, "stop": 325.52, "id": 79}, {"text": " There are many ways to do this.", "start": 325.52, "stop": 327.62, "id": 80}, {"text": " Some suggest blockchain.", "start": 328.04, "stop": 328.92, "id": 81}, {"text": " I think blockchain is overkill.", "start": 329.36, "stop": 330.4, "id": 82}, {"text": " Maybe Holochain is the answer.", "start": 331.02, "stop": 332.12, "id": 83}, {"text": " Here I have an example graph.", "start": 333.8, "stop": 335.12, "id": 84}, {"text": " In this case, instead of hospitals, I want you to think of each node as a person.", "start": 335.94, "stop": 339.92, "id": 85}, {"text": " Each node shares knowledge with adjacent nodes and has a certain degree of trust", "start": 340.68, "stop": 345.04, "id": 86}, {"text": " to the knowledge it obtains from its neighbors.", "start": 346.76000000000005, "stop": 349.38, "id": 87}, {"text": " I think I train a neural network locally on my compute using my", "start": 
350.32, "stop": 355.24, "id": 88}, {"text": " data collected by my sensors.", "start": 355.24, "stop": 357.0, "id": 89}, {"text": " This could be my phone's GPS, my fitness wearable, accelerometer,", "start": 357.6, "stop": 361.68, "id": 90}, {"text": " camera, security system, etc.", "start": 361.96, "stop": 363.84, "id": 91}, {"text": " And then I share that knowledge with others in the network.", "start": 364.58, "stop": 367.44, "id": 92}, {"text": " Unfortunately, the same math that allows you to turn data into parameters can", "start": 368.88, "stop": 373.24, "id": 93}, {"text": " turn parameters into data, your private data.", "start": 373.24, "stop": 375.96, "id": 94}, {"text": " This can be mitigated in several ways, including", "start": 376.58000000000004, "stop": 379.34, "id": 95}, {"text": " just sharing parameters less frequently.", "start": 379.34, "stop": 381.24, "id": 96}, {"text": " This is an ongoing area of research, but in general,", "start": 381.78, "stop": 384.36, "id": 97}, {"text": " this problem can be dealt with.", "start": 384.42, "stop": 386.0, "id": 98}, {"text": " It's a losing game.", "start": 386.96, "stop": 388.52, "id": 99}, {"text": " Yes, there's a lot of research that's been done into this.", "start": 390.74, "stop": 394.64, "id": 100}, {"text": " And in general, it is a losing game for the attackers.", "start": 394.96, "stop": 397.54, "id": 101}, {"text": " The ways that you can extract it that have been done, it's very contrived,", "start": 398.46, "stop": 402.48, "id": 102}, {"text": " very... not realistic scenarios.", "start": 402.84, "stop": 407.5, "id": 103}, {"text": " But talk to me after this if you want to discuss this more. The next one is", "start": 408.7, "stop": 412.36, "id": 104}, {"text": " a bit more of a problem. The other major consideration is malicious actors. 
For", "start": 412.36, "stop": 418.02, "id": 105}, {"text": " instance, I could take a bunch of photos of myself, label them as not a burglar,", "start": 418.02, "stop": 422.6, "id": 106}, {"text": " contribute to a federated learning network, and suddenly people's home security", "start": 423.58, "stop": 427.26, "id": 107}, {"text": " system labels me as not a burglar because I lied to the network and told it that", "start": 427.26, "stop": 432.22, "id": 108}, {"text": " things that look like me are not burglars. Basically, we need to develop", "start": 432.22, "stop": 436.4, "id": 109}, {"text": " mechanisms for detecting and ignoring liars. This is also a big area of research.", "start": 436.4, "stop": 443.22, "id": 110}, {"text": " I think it's certainly this is the major hard point of decentralized federated", "start": 444.44, "stop": 450.8, "id": 111}, {"text": " learning, but I do think that it's not an obstacle that cannot be overcome. I", "start": 450.8, "stop": 455.64, "id": 112}, {"text": " think we can overcome this obstacle. So in conclusion, internet networks cannot", "start": 455.64, "stop": 460.88, "id": 113}, {"text": " handle the data throughput to centralize all data. That means more data is", "start": 460.88, "stop": 466.42, "id": 114}, {"text": " available for federated learning frameworks, and so better machine learning", "start": 466.42, "stop": 470.48, "id": 115}, {"text": " models are possible. Federated learning is superior because it uses that data.", "start": 470.48, "stop": 476.58, "id": 116}, {"text": " Organizations want to take your resources, in this case data, and sell you the", "start": 477.4, "stop": 481.84, "id": 117}, {"text": " value, in this case the knowledge stored therein. 
Free neural networks like LLaMA", "start": 481.84, "stop": 488.16, "id": 118}, {"text": " have largely closed the gaps on large language models like ChatGPT-3.", "start": 488.16, "stop": 493.74, "id": 119}, {"text": " GPT-4, I know, is a leg up, but I do trust that we", "start": 494.74, "stop": 501.62, "id": 120}, {"text": " are making strides and that we will eventually outperform them simply because", "start": 501.62, "stop": 506.22, "id": 121}, {"text": " there's more data. We don't need centralized AI gatekeepers to control AI and", "start": 506.22, "stop": 512.24, "id": 122}, {"text": " protect us from ourselves. I'll end my talk with an updated subtitle,", "start": 512.24, "stop": 518.62, "id": 123}, {"text": " and here I mean free as in freedom. Avoid OpenAI, embrace free AI.", "start": 519.22, "stop": 525.32, "id": 124}, {"text": " Thank you. Any questions? We've got one minute for questions.", "start": 527.42, "stop": 533.78, "id": 125}, {"text": " I have a question. So the kind of framing of the threat model here is", "start": 535.2, "stop": 541.86, "id": 126}, {"text": " companies want to take your data so that they can train models off of it and sell", "start": 541.86, "stop": 548.28, "id": 127}, {"text": " it back to you. How does federated learning change that? Now they don't need to", "start": 548.28, "stop": 553.5, "id": 128}, {"text": " take the data, but they still are training a model without me that they can sell", "start": 553.5, "stop": 557.34, "id": 129}, {"text": " back to me. But they're not training it without you. So every node has to have", "start": 557.34, "stop": 561.94, "id": 130}, {"text": " access to the model weights in order for you to converge on one set of model", "start": 561.94, "stop": 567.26, "id": 131}, {"text": " weights. So the community owns the model together. So you're not being sold", "start": 567.26, "stop": 572.24, "id": 132}, {"text": " it. 
You put in your work.", "start": 572.24, "stop": 573.98, "id": 133}, {"text": " You said, I have this data. I want valuable tools. Maybe it's like I have a voice", "start": 574.16, "stop": 579.2, "id": 134}, {"text": " recognizer. I have a different accent, so I need to add some data for my accent", "start": 579.2, "stop": 584.1, "id": 135}, {"text": " so that it can work. And so I contribute to learn the shared model and I get", "start": 584.1, "stop": 587.92, "id": 136}, {"text": " something out of it and the overall global model benefits as well.", "start": 587.92, "stop": 592.4, "id": 137}, {"text": " So when it's federated, you are always getting a copy of the model", "start": 593.24, "stop": 598.28, "id": 138}, {"text": " locally. Right. Right.", "start": 598.28, "stop": 599.94, "id": 139}, {"text": " At that moment in time. Yes. That's the whole training loop is", "start": 601.14, "stop": 605.62, "id": 140}, {"text": " you get a model parameters.", "start": 605.62, "stop": 607.48, "id": 141}, {"text": " As far as we know, it's the best model weights or parameters, and you tune them a", "start": 608.86, "stop": 613.7, "id": 142}, {"text": " little bit more to improve them just a little bit. And then you send those back", "start": 613.7, "stop": 616.66, "id": 143}, {"text": " out into the world and you're constantly doing that. And if you do that enough,", "start": 616.66, "stop": 619.66, "id": 144}, {"text": " you converge to a better model because there's more data. Question in the back.", "start": 620.08, "stop": 624.74, "id": 145}, {"text": " Is there a something here that we're like essentially averaging the model weights", "start": 625.18, "stop": 629.64, "id": 146}, {"text": " across all of the individuals? 
And if so, is that appropriate?", "start": 629.64, "stop": 633.2, "id": 147}, {"text": " So the early works on federated learning do literally just average the weights.", "start": 634.5800000000002, "stop": 639.66, "id": 148}, {"text": " There are more advanced schemes, especially ones that can help with data privacy.", "start": 640.06, "stop": 645.74, "id": 149}, {"text": " And so sometimes it's a bit more complicated, but at a high level, you should", "start": 646.96, "stop": 650.48, "id": 150}, {"text": " think of it that way is that you're just averaging the parameters. There's some", "start": 650.48, "stop": 654.06, "id": 151}, {"text": " wrinkles in there, but that's how you should think of it.", "start": 654.06, "stop": 656.78, "id": 152}, {"text": " Great. Thank you very much, Zach.", "start": 658.0600000000001, "stop": 660.1, "id": 153}]}