Linus git tech talk download
When I say I hate CVS with a passion, I have to also say that if there are any SVN (Subversion) users in the audience, you might want to leave, because my hatred of CVS has meant that I see Subversion as being the most pointless project ever started, because the slogan for Subversion for a while was "CVS done right," or something like that.
The positive credit is BitKeeper. And I realize that a lot of people thought there was a lot of strife over BitKeeper and that the parting was very painful in many ways. As far as I know, BitKeeper is the only commercial source control management system that actually does distribution.
Early versions of Git did require a certain amount of brainpower to really wrap your mind around. With those out of the way-- okay, so this slide is now one day old. I will start off trying to explain why distribution is so important.
So branching is much more inherent when you do distribution. One of the other things that, to me, is very important is that by being distributed, you also automatically get to be slightly more trustworthy. I have a theory of backups, which is that I don't do them. I put stuff up on one site and everybody else mirrors it. And if I crash my own machine I don't really care, because I can just download my own work right back.
And it works beautifully well, and I don't have to have an MIS department. I heartily suggest everybody else do the same. But this only really works in a distributed environment.
If you use CVS, you can't do this. What do you use here? I'm sorry. I'm sure it's better than CVS. So that's part of it. One of the really nice things which is also-- maybe you don't have this issue inside a company, but we certainly have it in every single open source community I've ever seen that uses CVS or Subversion or something like that-- is you have this notion of commit access.
Because you have a central repository, it means that everybody who is working on that project needs to write to the central repository, which means that since you don't want everybody to write to the central repository because most people are morons, you create this class of people who are ostensibly not morons. And most of the time, what happens is you make that class too small because it's really hard to know if a person is smart or not, and even when you make it too small, you will have problems.
So this whole commit access issue, which some companies are able to ignore by just giving everybody commit access, is a huge psychological barrier and causes endless hours of politics in most open source projects. If you have a distributed model, it goes away. Everybody has commit access. You can do whatever you want to your project. You just get your own branch, you do great work or you do stupid work. Nobody cares. It's your copy, it's your branch. And later on, if it turns out you did a good job, you can tell people hey, here's my branch.
And by the way, it performs 10 times faster than anybody else's branch, so nyah nyah nyah, how about pulling from me? And people do. And that's actually how it works, and we never have any politics. That's not quite true, but we have other politics. We don't have to worry about the commit access thing. And I think this is a huge issue and that alone should mean that every single open source system should never use anything but a distributed model.
You get rid of a lot of issues. For commercial companies, distributed models also help with the release process. You can have a verification team that has its own tree, and they pull from people and they verify it. And when they've verified it, they can push it to the release team and say, hey, we have now verified our version. And the development people, they can go on playing with their head. Instead of having to create tagged branches, or whatever you do to try to keep off each other's toes, you keep off each other's toes by letting every single group have its own tree and track its work and what they want done.
So distributed is really, really central to any SCM you should ever use. So get rid of Perforce now. That was my only real slide about distribution.
If you had this monstrously awesomely big code base, and you wanted to use this without stopping business for six months, how would you do it? OK, he went away. How would you do this? And that means that for a while, that feature will be very, very broken, right? Because nobody actually creates perfect code the first time around except me, but there's only one of me.
So what happens is they need to have their own tree that they can work in without affecting other people. You can do this many different ways. In CVS, one of the most common ways, because branches are so painful, is that you don't actually commit. You never commit until it passes every single test. For example, at your company you have a very strict committing rule saying you will never, ever commit until it has passed the whole test suite.
And by the way, the fact that the test suite takes two hours to run, tough. You cannot afford to commit. And this is something that happens at every single company. I bet it happens even here at Google.
You probably have a strict test suite, and you are not supposed to commit unless it passes. And then in practice, people make one-liner changes and ignore the test suite because they know the one-liner changes can't possibly break. This happens. This is a horrible, horrible model.
It just means that you make huge commits because you commit something after you've worked on it for two weeks, and you have three people working in the same sandbox because before they commit, they can't see the changes that the other people made.
This is common. It happens everywhere, it's scary. The other alternative is to use branches even in a centralized environment. But branches always end up being pretty expensive to do, so you can't do them for experimental features. You don't know beforehand if it's something that's going to take one day or two weeks, but most of the time most programmers say, hey, I can do this in 48 hours.
And it turns out, yeah, no you couldn't. But because you feel you can do it in 48 hours, creating a branch, even in systems that are better at creating branches than CVS, is a big pain. So you don't do it because you think you can get it resolved and you're back to case number one. But if you decide to create a branch, you will affect everybody else's repository because in a centralized environment, branches are global.
So you're kind of screwing with everybody else, but at least you're not screwing with their main, head branch. You are adding stuff to their repositories, but hopefully in a way that they won't notice. But it does make everybody's repositories bigger. So either way, you can't win. In contrast, in a distributed environment, what you do is you have five people, they pull the current head, which is hopefully good and tested, and they start working on it and they start committing on it.
And you don't need to wait for two weeks until your commits are stable because your commits are always local. And what happens is within that group of five people, you can pull from each other.
That's what distributed means. There's no central location, it means everybody's the same. So you can merge between yourselves. So not only can you commit every single line if you want to without having to run the two-hour test suite, but you can then communicate by pulling and merging each other's work, and one person finds the bug, commits the fix, and tells the other four people, hey, my repository has a fix for this.
And then when that group is done two weeks later, they can tell their manager, hey, we've done this. Can you ask the main group to pull, and they'll get this new feature and by the way, we've tested it over two weeks and it works and it performs this much better because we have actually been able to time it before we even ask anybody else to look at it.
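As a rough illustration of the workflow just described (not something shown in the talk), here is a toy Python model in which a repository is nothing more than a set of commit IDs. The `Repo` class, its method names, and the commit IDs are all hypothetical, and real Git tracks history as a graph rather than a flat set. The point is only that commits stay local until someone pulls, and that peers can pull from each other before the main tree pulls from them.

```python
class Repo:
    """Toy repository: a named set of commit IDs (hypothetical model, not Git's)."""

    def __init__(self, name, commits=()):
        self.name = name
        self.commits = set(commits)  # copy, so clones do not share state

    def commit(self, commit_id):
        # Committing is purely local: no other repository changes.
        self.commits.add(commit_id)

    def pull(self, other):
        # Pulling brings in whatever the other repository has that we lack.
        new = other.commits - self.commits
        self.commits |= new
        return new


main = Repo("main", {"c0"})
alice = Repo("alice", main.commits)  # each developer starts from the current head
bob = Repo("bob", main.commits)

alice.commit("feature-1")            # local work, invisible to everyone else
bob.commit("bugfix-1")

alice_got = alice.pull(bob)          # peers sync directly with each other
main_before = set(main.commits)      # main is still untouched at this point
main_got = main.pull(alice)          # later, main pulls the finished work
```

In this sketch `main` sees nothing until it chooses to pull, which is the property the talk contrasts with a central repository where every commit lands in the shared tree.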
And that's a hugely better model for doing development. And this is the model that the kernel uses. It turns out in many places, we don't need all that power, even in the kernel. So people usually don't pull within one group, but it does happen. For example, the networking people sometimes affect the NFS people, and the fact that they can synchronize actually helps. So this is a real, practical advantage. Somebody else has a question. If everyone's got access and they're all playing with their branches and they have their sandbox and they're having fun, at the end of the day there has to be merging and resolving unless you have 80 billion flavors of every Linux kernel.
There will be 1, or maybe 20, different branches, but in practice you won't ever see them because they won't care. You will see like a few main branches, maybe you'll see only one. In the case of the kernel, a lot of people they only really look at my branch. So even though there are lots of branches, you can ignore them. What happens is the way merging is done is the way real security is done, by a network of trust. If you have ever done any security work and it did not involve the concept of network of trust, it wasn't security work.
It was masturbation. I don't know what you were doing, but trust me, it's the only way you can do security, it's the only way you can do development. The way I work, I don't trust everybody.
In fact, I am a very cynical and untrusting person. I think most of you are completely incompetent. The whole point of being distributed is I don't have to trust you. I don't have to give you commit access. But I know that among the multitude of average people, there are some people that just stand out, that I trust because I've been working with them. I only need to trust 5, 10, 15 people.
If I have a network of trust that covers those 5, 10, 15 people that are outstanding and I know they're outstanding, I can pull from them. I don't have to spend a lot of brain power on the question. When Andrew sends me patches-- he doesn't actually use Git, it's some kind of defect-- other than that, he's a very solid person. When he asks me to pull, he does it by sending me a million patches. Instead, I just do it. Sometimes I disagree with some of these patches, but at some point, trust means never having to say you're sorry.
I don't know. It basically means you have to accept other people's decisions. The nice thing about trust is it does network, that's where the network of trust comes in. I only need to trust a few people that much.
They have other people, they have determined, hey, that guy is actually smarter than I am. That's actually a really good measure of who you should pull from. If you have determined that somebody else is smarter than you, go for it. You can't lose, right? Even if it turns out you pulled crap and somebody else starts complaining, you know who you pulled from and you can just point to the other person and say, hey, I just pulled. Go to him, he knows what he's doing.
So that's how I work. That's probably how most of my lieutenants work. I pull the networking changes from one person, he gets them from many other people that he's worked with over time. So this is how it all comes together.
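The chain of pulls described above can be sketched as a small graph search. This is only an illustration under stated assumptions: the names in `pulls_from` are made up, and `path_to` is a hypothetical helper, not part of Git or of the kernel workflow. Each person lists only the few people they pull from directly, and a breadth-first search finds the chain through which one developer's work would reach the top.

```python
from collections import deque

# Hypothetical trust graph: each person lists only the people they pull from.
pulls_from = {
    "top_maintainer": ["net_maintainer", "fs_maintainer"],
    "net_maintainer": ["dev_a", "dev_b"],
    "fs_maintainer": ["dev_c"],
    "dev_a": [],
    "dev_b": [],
    "dev_c": [],
}


def path_to(top, author, graph):
    """Return the chain of pulls that carries `author`'s work to `top`, or None."""
    queue = deque([[top]])
    seen = {top}
    while queue:
        path = queue.popleft()
        if path[-1] == author:
            return path
        for person in graph.get(path[-1], []):
            if person not in seen:
                seen.add(person)
                queue.append(path + [person])
    return None  # nobody in the network of trust pulls from this author


chain = path_to("top_maintainer", "dev_b", pulls_from)
```

Here the person at the top trusts two people directly, yet work from `dev_b` still arrives through `net_maintainer`, and the chain itself records whom to ask if something turns out to be wrong.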
It doesn't have to come together to one point. In the kernel, it comes together to one point largely I think for historical reasons. And actually, I've always tried to encourage people to have more trees. So we do have vendor trees, we do have -mm trees, we have multiple one points, and it happens to be that my one point is getting maybe more attention than it always should. But even if it doesn't come down to one point, it means that you can take these thousands of branches and ignore them. And you know that, hey, there are five branches that are really interesting to follow because I'm interested in those sub-areas.
And it all works very naturally. One of the nice things about this whole network of trust is it's not just easy to do technically, it's actually how every single person in this room is very fundamentally wired to work. It is how we think. We don't know people. We have five, seven, ten close, personal friends.
Well, we're geeks, so we have two. But I mean, that's basically how humans work, is that we have these people that we really trust. It's family, it's close friends. And it really fits. You don't even have to have a mental model.
It fits how we are wired up. So there's huge advantages to this whole network-of-trust model. It seems like there might be a risk of balkanizing the code base as people not being in the same sandbox don't contribute back. There is BitKeeper. It is clearly being used at commercial companies. We might have somebody in the audience who actually knows. I'm sure they have a lot more companies. In the open source world, there are two distributed systems that are worth looking at right now.
One of them is obviously Git and you really should pick that one, but the other one is Mercurial, which actually has pretty much the same design. There are huge differences in implementation and there are some differences in details, but it boils down to a very similar model. Git just does it better. Everything else, it's either centralized or it is too unstable or too slow to use for anything big.
I know that inside companies, I don't think a lot of companies use Git knowingly in the sense that it is a company decision. I know several companies who use Git internally, not knowing that they do so because they actually have their main repository in Subversion and a lot of developers then import it into Git because Git can actually merge things for you. So you can take a Subversion tree, just import it into Git, let Git do the merge, which would be a major headache to do in Subversion, create a merge commit, and actually export it back to Subversion, and nobody else even knew you used Git.
It's kind of sad, but we have cases of people talking about doing exactly that inside companies. Git has not been around in a form where a lot of people will be comfortable using it for more than half a year or so. We have had such huge improvements to the user interfaces that realistically, a year ago at a commercial company a lot of people would just have said it's too hard to use. I think we're way past that hump.