** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
I don't know anything about their architectures -- are you saying they are monorepos?
Sure. And there's definitely such a thing as breaking things up too early, but I assumed we were talking about larger systems. But even there, like with a medium sized web application with a SPA frontend, I'd generally prefer to have 2 repos. Because those two things are only supposed to touch through the API.
technically true, but in practice it affects how people think about it. Having multiple projects in separate repos establishes boundaries in a way that's clearer and harder to ignore.
Also, 1) lots of companies are actually smaller than an Amazon team.
2) monorepo does not dictate anything about the run-time architecture of your system.
https://cacm.acm.org/magazines/2016/...itory/fulltext
https://code.facebook.com/posts/2186...l-at-facebook/
Quote:
Sure. And there's definitely such a thing as breaking things up too early, but I assumed we were talking about larger systems. But even there, like with a medium sized web application with a SPA frontend, I'd generally prefer to have 2 repos. Because those two things are only supposed to touch through the API.
Quote:
technically true, but in practice it affects how people think about it. Having multiple projects in separate repos establishes boundaries in a way that's clearer and harder to ignore.
In the node world specifically, with all these babel plugins, webpack/eslint configs, not to mention conflicting versions of the same frameworks and libraries and whatnot, it becomes very easy for code bases in separate repos to diverge enough that, despite being merely features in a single app, you can't even move things around any more. It sort of allows developers to move faster with more control at the module level at the expense of a much higher total cost of development. None of the complexity actually goes away - all the components still need to come together and work together. Over time it also leads to legacy repos/systems that everyone depends on but no one owns or knows anything about. Monorepo keeps everything, even ugliness, in plain sight.
Whenever I write code I think of my variable and function names as telling a story, and use naming conventions like suzzer mentioned. If it is not clear what the code does and why it does it from that, then I have failed and should re-write it.
Edit: Also, the premise is only mostly true. There are times where it's not possible for the code to reflect why it does something in naming. For example, when dealing with performance issues or fixing certain types of bugs.
Code:
Initialize.goFaster()
goFaster(allTheThings)
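To make the names-as-story idea concrete, here's a tiny sketch. The function and field names are invented for illustration, not from anyone's actual codebase:

```javascript
// A comment that restates the code adds nothing:
//   // check if user can edit
//   if (u.r === 'a' || u.id === d.oid) { ... }
//
// Well-chosen names make that comment unnecessary:
function canEditDocument(user, document) {
  const isAdmin = user.role === 'admin';
  const isOwner = user.id === document.ownerId;
  return isAdmin || isOwner;
}

console.log(canEditDocument({ id: 1, role: 'admin' }, { ownerId: 2 })); // true
console.log(canEditDocument({ id: 3, role: 'user' }, { ownerId: 2 })); // false
```

The "story" here is the condition reading aloud as policy: an admin or the owner can edit.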
This is bs. As far as I'm concerned, anybody who says this is just trying to justify being lazy. Would you rather work in a code base that is heavily documented such that you know where you need to make your updates or your "self-documenting" code base? The non-documented code base assumes that whoever worked on it did a good job. That is a very generous assumption.
All else being equal, I would prefer that a poorly written codebase be accurately commented. The problem is, comments are not a defense against bad programmers; in fact, they frequently make matters worse. I once inherited a disaster of a codebase that had been written by a bad offshore team. While refactoring it, I deleted all comments on sight. 99% of them were either obvious or wrong, or both. Frequently code had been copied from one place to another, then the code had been modified and the comments had not been. I think programmers who write shoddy code but useful and well-maintained comments are pretty rare beasts.
I don't think it's that tricky, actually. If you need comments to explain what specific code is doing, the code needs to be rewritten.
Comments are appropriate for high-level concerns like explaining architecture, the reason module X was chosen, or the history behind some oddly chosen name.
As a rule of thumb, comments are the kind of thing you'd say to someone before you opened up a text editor: "Oh, before we get into this, let me tell you X....".
Quote:
Comments are appropriate for high-level concerns like explaining architecture, the reason module X was chosen, or the history behind some oddly chosen name.
As a rule of thumb, comments are the kind of thing you'd say to someone before you opened up a text editor: "Oh, before we get into this, let me tell you X....".
Occasionally code will be complex enough that an explanatory note is needed, yet rewriting it a different way is not practical. An example might be a complex regular expression. Those instances should be kept to a minimum though.
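To illustrate the regular-expression case: the pattern below is a made-up example, but it shows the kind of comment that earns its keep, since no amount of renaming can make the pattern itself readable:

```javascript
// Matches a simplified ISO 8601 date-time, e.g. "2017-03-05T14:30:00Z":
// YYYY-MM-DD, a literal "T", then HH:MM:SS and a trailing "Z".
// Without this note, a reader has to decode the pattern token by token.
const isoDateTime = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/;

console.log(isoDateTime.test('2017-03-05T14:30:00Z')); // true
console.log(isoDateTime.test('03/05/2017'));           // false
```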
I know I see far more bad code than good code, but I'd say a common thread among good code is its self-documenting properties. Sadly, good and bad code feature little to no commenting, but I don't think comments would improve either good or bad codebases, and I think comments would be more detrimental than useful in bad codebases.
Clearly I'd need to see the whole thing in action and maybe it would make more sense, but for now I'm going with it's a conspiracy.
Also, there's something wrong about focusing on the "repo" part. Like, no matter what you do, you break things down somehow. You can't get around it. They draw the boundaries somewhere. You don't have a mono anything that big. Not really.
Quote:
Occasionally code will be complex enough that an explanatory note is needed, yet rewriting it a different way is not practical. An example might be a complex regular expression. Those instances should be kept to a minimum though.
The most common good reason for commenting that I've found was when you're working around something that you can't do much about. A bug in runtime/framework/library for instance often necessitates that you write wrong-looking code - I think one time I had to write something that looks like a no-op to work around a bug in dealing with iframes in jQuery that was IE-specific. Outside of browser issues, if you deal with low-level code, it's very common to find bugs in libraries that relate to thread/exception/memory safety, which you need to write wrong-looking code to work around. While I'm perfectly fine with well-written, self-documenting code with no comments, it's a bit of fantasy to think that we can do that all the time - real software engineering isn't always pretty.
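A sketch of what such wrong-looking workaround code might look like. The browser bug here is hypothetical, purely to illustrate why the comment is the only thing standing between the workaround and a well-meaning cleanup:

```javascript
// Hypothetical example of code that *needs* a comment: without the note,
// a reviewer would reasonably delete the seemingly pointless property read.
function readFrameHeight(frame) {
  // WORKAROUND (hypothetical browser bug): reading offsetHeight before
  // forcing a layout pass returns a stale value. Touching clientWidth
  // first forces the reflow. Do not remove this "useless" read.
  void frame.clientWidth;
  return frame.offsetHeight;
}

console.log(readFrameHeight({ clientWidth: 300, offsetHeight: 150 })); // 150
```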
https://arstechnica.com/information-...it-repository/
Microsoft, Facebook and Google are probably the three most respected software engineering organizations in the world.
But that was what was being discussed:
I personally favor mono-repo and having all technical design docs in that repo, so that code changes and documentation changes can be reviewed together. What typically happens when you have lots of repos and lots of applications -- which also means lots of different teams doing their own thing without anyone else knowing -- is that things leak outside of repos as people build and use their own documentation solutions and are no longer aware of how other teams are doing things.
Quote:
Like, no matter what you do, you break things down somehow. You can't get around it. They draw the boundaries somewhere. You don't have a mono anything that big. Not really.
Quote:
Microsoft, Facebook and Google are probably the three most respected software engineering organizations in the world.
I even touch on the theme in my node talk - "Node for the Rest of Us". I came up with a node framework that offshore devs can use to build features w/o getting themselves into too much trouble. I imagine at Google or FB I'd give feature devs a lot more leash, to the point where my framework would be a very light library.
We tried to replicate the Netflix microservice back end. Jury is still out on whether it's going to work. But there have been massive growing pains. One of the biggest is a vacuum of leadership over the entire over-arching application. I'm not really sure how that came about but usually these things are a combination of political inertia and lack of caring. I imagine Netflix doesn't have those problems.
/ramble
i think i literally don't understand what it means. surely every engineer at google is not cloning a repo of all of google's code to their laptop. so what does it mean?
It exactly does those things. The really big stuff, the whole world stuff, isn't managed top down. It's an emergent property from the smaller domains that people actually work in.
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
there is absolutely no way anyone is seeing "all of google's code" and its dependencies and making decisions about it in that way. it's not possible. code that big can't be managed like that. this was my point. ok, it's a monorepo. i still don't know what that means. but i know it doesn't mean that people understand it all. they know generally what services are available, and then how to search down with more granularity. exactly the way you'd find a library on npm or rubygems or wherever.
I'm not sure what your point is here - obviously a monorepo doesn't preclude project or component or service-level organization within the repo - but the flipside of this is that no matter how you divide things, you still have to deal with the whole. Having national borders doesn't make the concept of the earth go away and having a bunch of rooms in a house doesn't mean you don't have to maintain the house.
Quote:
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
I think a huge part of the recent movement towards microservices and microrepos and microapps and what not is due to some developers not understanding this and being driven by this irrational fear of large systems, perhaps due to having been burned by large poorly architected systems in the past. And maybe resume-driven development and a degree of hubris that some companies have regarding their true nature (we are a platform company - no you're not). Yet, somebody still needs to look at the whole thing and decisions have to be made at that level.
Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer.
All writes to files are stored as snapshots in CitC, making it possible to recover previous stages of work as needed. Snapshots may be explicitly named, restored, or tagged for review.
Quote:
It exactly does those things. The really big stuff, the whole world stuff, isn't managed top down. It's an emergent property from the smaller domains that people actually work in.
Google is a company with a small number of products and a CEO who is accountable to the board of directors and shareholders. All of the stuff in it has to be absolutely manageable top down, even if that capability isn't utilized all the time. If you're in charge of Google Search or GMail or some other humongous project, you can't tell your boss, you know what, I have no idea what's going on, the product is just an emergent property of all these devs doing their own thing. Obviously the centralized planning is informed and influenced by what happens in the trenches but that doesn't mean you don't want to have that capacity.
Quote:
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
Quote:
there is absolutely no way anyone is seeing "all of google's code" and its dependencies and making decisions about it in that way.
they know generally what services are available, and then how to search down with more granularity. exactly the way you'd find a library on npm or rubygems or wherever.
This is because npm doesn't own the contents and each of the repos is more or less its own user-facing product with a specific owner, and no one needs to be able to make changes to something they don't own. There's no overarching architecture that ties these repos together. Google owns all of that source code; they are not letting things randomly evolve. All of these (maybe not all, but large overlapping portions of it) work together to produce more or less a single product.
It's like saying, you know what, we can't be expected to follow the laws because we're too big and can't watch everyone.
Or there are too many cells in our body so they must all be working independently and there can't be a centralized mechanism for controlling our behavior.
You can't just throw your hands in the air, I can't micromanage everything so manage nothing and let things be.
No it's more like, you know what I need to change 5000 places where my library is used in a way that I don't want to support in the future, let's change all of them programmatically and see how many tests I break. Or, welp, there's this security vulnerability found in a commonly used library - let's see if we can fix this across the entire codebase.
I think I framed this kind of poorly but the meaningful question here isn't whether one person understands everything, but rather can you draw up boundaries such that no one needs to do things that lie across boundaries. The answer is quite clearly no. Another issue with a multi-repo system is that someone has to make a decision as to whether something is a top-level repo and this decision cannot easily be undone later. And changes in this high-level structure cannot easily be tracked. What happens when you move a project from one repo to another?
The real point has always been that without a monorepo, the whole isn't even an entity that your source control system understands. The real-world equivalent of this isn't some kind of loose federated systems we see in the wild - it's more like states not even knowing that there are other states and things that happen not entirely within one state becoming entirely unknown. Changes that cross the repo boundaries become untrackable. We're not arguing about degrees of centralization vs decentralization in the world - we're talking about whether it's useful to be able to model the whole system. The emergence of the type you're talking about happens specifically as a result of interaction between entities and it's this interaction that multi-repo systems cannot model. If you think this through, this whole decentralization and emergence and all this are another argument for the monorepo, as the major thing a multi-repo system enforces is the centralization of boundaries.
The monorepo is just for google.com and it can be seen as a single product or a suite of products, just as Excel can be seen as both a standalone product or part of Microsoft Office. But how we name this is irrelevant, since you need to be able to run and test the system as a whole. User data and session information, for instance, are clearly shared across almost their entire suite of products. If I change something major in the core, I need to be able to model and understand the impact on the whole system. As most major platform vendors at scale now understand, most APIs are poorly specified and testing in isolation is not enough if you care about reliability.
The real thing to understand is that the whole exists no matter what - the sum of all your repos is a thing that exists. The only remaining question is whether you should have tools that allow you to work on the whole, across the arbitrary boundaries you may have placed in the first place. It's like having drive letters in Windows - this doesn't help in any meaningful way and merely constrains how you organize things. Nothing about the monorepo prevents you from having a directory structure that maps to ownership/project structure such that most of your work happens within your team directory/project. It just makes meta-operations massively easier to perform and track.
except that's true. the laws happen to be viable because human beings behave more or less predictably when raised within a given culture and living in an environment with certain stable properties: infrastructure, food supply, law enforcement with some degree of effectiveness, and so on. it's that environment + people's nature that gives rise to a lawful society emergently. it's not at all as if the society is being controlled in a top down way, even though we have that mental model for convenience.
I think we're talking past each other. Google is not a single product as I see it.
ah, ok. THIS makes sense to me. this is what i was missing, or what was not made clear in the articles.
I think we're starting to converge. I still have a few points of disagreement, or possibly misunderstanding....
Is it? This is precisely how the web works. For me the web as a whole is like the multi-repo approach in this conversation. And I'm still not really seeing any other way to make things truly scale. There are only services, responsible for themselves, and the APIs they expose... I don't see why a product as gigantic as google would work differently. But perhaps I'm still misunderstanding something here.
I don't understand this part.
Okay, I thought it was for the entire company, as in every product alphabet offers.
What about oauth and other forms of SSO? Separating login/session functionality is done all the time. You have some simple API, everything uses it. Why is that a problem? Why is that not a good thing? Why does any code that uses SSO ever need to know about its implementation? Or are you saying something else?
This statement still feels to me like a wholesale rejection of encapsulation, and an endorsement of fully entangling every part of your system because it's still a "whole." I'm assuming that's _not_ what you mean, but I'm struggling to understand how what you're saying is different.
Quote:
If you think this through, this whole decentralization and emergence and all this are another argument for the monorepo, as the major thing a multi-repo system enforces is the centralization of boundaries.
Quote:
The monorepo is just for google.com and it can be seen as a single product or a suite of products,
Quote:
User data and session information, for instance, are clearly shared across almost their entire suite of products. If I change something major in the core, I need to be able to model and understand the impact on the whole system. As most major platform vendors at scale now understand, most APIs are poorly specified and testing in isolation is not enough if you care about reliability.
Quote:
The real thing to understand is that the whole exists no matter what - the sum of all your repos is a thing that exists. The only remaining question is whether you should have tools that allow you to work on the whole, across the arbitrary boundaries you may have placed in the first place.
Quote:
And I'm still not really seeing any other way to make things truly scale. There are only services, responsible for themselves, and the APIs they expose... I don't see why a product as gigantic as google would work differently. But perhaps I'm still misunderstanding something here.
Quote:
Okay, I thought it was for the entire company, as in every product alphabet offers.
Quote:
What about oauth and other forms of SSO? Separating login/session functionality is done all the time. You have some simple API, everything uses it. Why is that a problem? Why is that not a good thing? Why does any code that uses SSO ever need to know about its implementation? Or are you saying something else?
Quote:
This statement still feels to me like a wholesale rejection of encapsulation, and an endorsement of fully entangling every part of your system because it's still a "whole." I'm assuming that's _not_ what you mean, but I'm struggling to understand how what you're saying is different.
Just a general comment after reading your responses... you seem to be arguing so strongly in favor of monorepo now and how it alone addresses this "whole" problem that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
I think the level of our discussion is so abstract at this point that I don't know if we're disagreeing or talking past one another. Nevertheless, it's a new POV to me so I'm interested in it, and I'll respond again. Though I fear the conversation might be approaching the tedium threshold for others (though not for me).
http. rack. oauth. this statement just isn't true.
as a general point i think appeals to "real life" are dangerous and should be made more specific -- not that the umbrella of "real life" doesn't capture real complex phenomena, but that it's not a useful concept for understanding things deeply.
but that's exactly the point of true separation. you seem to be arguing that true separation of concerns is a chimera of optimistic engineers. you keep saying you can't ignore the whole, but i don't know what that means because any time I create a self-contained unit of functionality at any level in the stack that's what I'm doing. I think this is a point where we're talking past one another. I'm not sure how to solve it. The level of abstraction of the examples we're using is clearly not the right one (at least for me).
But you don't. That's the whole point. You can build a unit of functionality with no knowledge of the whole. Because it can be used across many "whole"s. what about standard library functions? what whole do they recognize?
you can exactly define and enforce the limits of the entanglement by limiting it through API boundaries. There is no more than that.
everyone knows exactly how they're entangled to any of their dependencies and they are in full control of that entanglement.
which problem are you referring to? there is the leftpad-apocalypse stuff but that's a problem at a very different level -- depending on npm itself to serve your versioned files. there are many solutions to that. but depending on specific versions of modules isn't a problem as long as you have a copy of the version you need. if you're talking about migrating forward i don't see how that problem has anything to do with monorepos or not. except that in a monorepo you have a clear idea of the impact a certain change would have on all your clients, and you could let them inform the design of your next version. i can see the use of that. is that all we're talking about?
Nothing at this scale is a "simple API" and in real life
Quote:
as a general point i think appeals to "real life" are dangerous and should be made more specific -- not that the umbrella of "real life" doesn't capture real complex phenomena, but that it's not a useful concept for understanding things deeply.
Separation doesn't mean you can't benefit from the additional testing coverage provided by client code or that you wouldn't want to test it if you could.
You have to recognize that there is a whole before you can do any encapsulation.
Also, not acknowledging the entanglement doesn't make it go away - in a multi-repo system where people don't know what others are doing, it's much easier for dependencies to build up and for apps and services to get entangled in a way that's unhealthy, because no one is empowered to untangle stuff.
npm is actually a great example of a decentralized entangled mess that nobody can fix. Not only do you get entanglements, but you get entanglements across time through versioning - this massively increases the complexity and makes reliability nearly unattainable at scale.
Quote:
Just a general comment after reading your responses... you seem to be arguing so strongly in favor of monorepo now and how it alone addresses this "whole" problem that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
Quote:
I think the level of our discussion is so abstract at this point that I don't know if we're disagreeing or talking past one another. Nevertheless, it's a new POV to me so I'm interested in it, and I'll respond again. Though I fear the conversation might be approaching the tedium threshold for others (though not for me).
I'm reading it, so it's not too tedious. I think you guys are putting too strong of a focus on the importance of monorepo vs multirepo (is that what we call it?).
This is kind of like when people argue for one branching strategy over another. Maybe there's one better way, but it's close and hard to tell in practice.
Using google as an example is kind of flawed. They've invested so much in tooling that they almost certainly could have done almost as well with multiple repos and the same amount of custom tooling.
My biggest reason that the monorepo seems oversold is because code versioning is just a part of the complexity. Even with a monorepo you're going to be dealing with changing apis / versions / etc in the actual running services (at least if you're at a scale where this discussion makes any sense at all).
It's not like a merge to master magically updates all running machines all at once. And just because you can make a change with a single commit doesn't mean you don't have to think of all of the teams impacted and figure out how the change will get rolled out. In practice you're almost always going to be backwards compatible for awhile and then eventually remove the deprecated code anyway.
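The rollout pattern described above might look something like this in miniature. The function names are invented for illustration: even after a single atomic commit, the old entry point stays around until every running service has picked up the new one:

```javascript
// New API: callers opt in to the extra data explicitly.
function fetchUserV2(id, options = {}) {
  return { id, includeProfile: Boolean(options.includeProfile) };
}

// Deprecated shim: kept backwards compatible while older deploys are still
// calling it, then deleted in a later change once traffic reaches zero.
function fetchUser(id) {
  console.warn('fetchUser is deprecated; use fetchUserV2');
  return fetchUserV2(id, { includeProfile: true });
}
```

The monorepo makes the final cleanup commit easy to write and verify, but it doesn't remove the need for the deprecation window itself.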
I'm still a fan of the monorepo model. I think candybar makes a lot of good points. And even more simply it's clearly the right option at the start and so an alternative needs to be significantly better to justify the cost of changing at some point in the future.
Also the tooling Google has seems pretty kickass.
When the mono-repo approach is technically feasible without bespoke tools, using a multi-repo approach over a mono-repo approach essentially amounts to something like using RCS over git or svn and talking about how this prevents entanglements. If you can see why it's valuable to use svn over RCS (as it pertains to the ability to have a single commit that spans multiple files), mono-repo over multi-repo adds the same additional capability, but at a larger scale.
Quote:
that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
Quote:
http. rack. oauth. this statement just isn't true.
Quote:
but that's exactly the point of true separation. you seem to be arguing that true separation of concerns is a chimera of optimistic engineers. you keep saying you can't ignore the whole, but i don't know what that means because any time I create a self-contained unit of functionality at any level in the stack that's what I'm doing. I think this is a point where we're talking past one another. I'm not sure how to solve it. The level of abstraction of the examples we're using is clearly not the right one (at least for me).
Quote:
But you don't. That's the whole point. You can build a unit of functionality with no knowledge of the whole. Because it can be used across many "whole"s. what about standard library functions? what whole do they recognize?
Quote:
you can exactly define and enforce the limits of the entanglement by limiting it through API boundaries. There is no more than that.
Quote:
everyone knows exactly how they're entangled to any of their dependencies and they are in full control of that entanglement.
Quote:
which problem are you referring to?
Quote:
but depending on specific versions of modules isn't a problem as long as you have a copy of the version you need.
Quote:
My biggest reason that the monorepo seems oversold is because code versioning is just a part of the complexity. Even with a monorepo you're going to be dealing with changing apis / versions / etc in the actual running services (at least if you're at a scale where this discussion makes any sense at all).
Quote:
It's not like a merge to master magically updates all running machines all at once. And just because you can make a change with a single commit doesn't mean you don't have to think of all of the teams impacted and figure out how the change will get rolled out. In practice you're almost always going to be backwards compatible for awhile and then eventually remove the deprecated code anyway.
Quote:
And even more simply it's clearly the right option at the start and so an alternative needs to be significantly better to justify the cost of changing at some point in the future.
and you have not listed a single advantage for the multi-repo approach - we're just talking philosophically about the nature of things or something.
it's getting late, but your other points are interesting. you're definitely getting me to think about the problem differently.
Noticed this in this week's Hacker Newsletter: New CA law taking effect Jan 1 makes it illegal for employers to ask candidates their previous salary, and they must present a salary range for the position if asked