** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
I don't know anything about their architectures -- are you saying they are monorepos?
Sure. And there's definitely such a thing as breaking things up too early, but I assumed we were talking about larger systems. But even there, like with a medium sized web application with a SPA frontend, I'd generally prefer to have 2 repos. Because those two things are only supposed to touch through the API.
technically true, but in practice it affects how people think about it. Having multiple projects in separate repos establishes boundaries in a way that's clearer and harder to ignore.
Also, 1) lots of companies are actually smaller than an Amazon team.
2) monorepo does not dictate anything about the run-time architecture of your system.
https://cacm.acm.org/magazines/2016/...itory/fulltext
https://code.facebook.com/posts/2186...l-at-facebook/
Quote:
Sure. And there's definitely such a thing as breaking things up too early, but I assumed we were talking about larger systems. But even there, like with a medium sized web application with a SPA frontend, I'd generally prefer to have 2 repos. Because those two things are only supposed to touch through the API.
Quote:
technically true, but in practice it affects how people think about it. Having multiple projects in separate repos establishes boundaries in a way that's clearer and harder to ignore.
In the node world specifically, with all these babel plugins, webpack/eslint configs, not to mention conflicting versions of the same frameworks and libraries and whatnot, it becomes very easy for code bases in separate repos to diverge enough that, despite being merely features in a single app, you can't even move things around any more. It sort of allows developers to move faster with more control at the module level at the expense of a much higher total cost of development. None of the complexity actually goes away - all the components still need to come together and work together. Over time it also leads to legacy repos/systems that everyone depends on but no one owns or knows anything about. Monorepo keeps everything, even ugliness, in plain sight.
Whenever I write code I think of my variable and function names as telling a story, and use naming conventions like suzzer mentioned. If it is not clear what the code does and why it does it from that, then I have failed and should re-write it.
Edit: Also, the premise is only mostly true. There are times where it's not possible for the code to reflect why it does something in naming. For example, when dealing with performance issues or fixing certain types of bugs.
Code:
Initialize.goFaster()
goFaster(allTheThings)
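To make the names-as-story idea concrete, here's a tiny sketch. The function and field names are invented for illustration, not from anyone's actual codebase:

```javascript
// A comment that restates the code adds nothing:
//   // check if user can edit
//   if (u.r === 'a' || u.id === d.oid) { ... }
//
// Well-chosen names make that comment unnecessary:
function canEditDocument(user, document) {
  const isAdmin = user.role === 'admin';
  const isOwner = user.id === document.ownerId;
  return isAdmin || isOwner;
}

console.log(canEditDocument({ id: 1, role: 'admin' }, { ownerId: 2 })); // true
console.log(canEditDocument({ id: 3, role: 'user' }, { ownerId: 2 })); // false
```

The "story" here is the condition reading aloud as policy: an admin or the owner can edit.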
This is bs. As far as I'm concerned, anybody who says this is just trying to justify being lazy. Would you rather work in a code base that is heavily documented such that you know where you need to make your updates or your "self-documenting" code base? The non-documented code base assumes that whoever worked on it did a good job. That is a very generous assumption.
All else being equal, I would prefer that a poorly written codebase be accurately commented. The problem is, comments are not a defense against bad programmers; in fact, they frequently make matters worse. I once inherited a disaster of a codebase that had been written by a bad offshore team. While refactoring it, I deleted all comments on sight. 99% of them were either obvious or wrong, or both. Frequently code had been copied from one place to another, then the code had been modified and the comments had not been. I think programmers who write shoddy code but useful and well-maintained comments are pretty rare beasts.
I don't think it's that tricky, actually. If you need comments to explain what specific code is doing, the code needs to be rewritten.
Comments are appropriate for high-level concerns like explaining architecture, the reason module X was chosen, or the history behind some oddly chosen name.
As a rule of thumb, comments are the kind of thing you'd say to someone before you opened up a text editor: "Oh, before we get into this, let me tell you X....".
Quote:
Comments are appropriate for high-level concerns like explaining architecture, the reason module X was chosen, or the history behind some oddly chosen name.
As a rule of thumb, comments are the kind of thing you'd say to someone before you opened up a text editor: "Oh, before we get into this, let me tell you X....".
Occasionally code will be complex enough that an explanatory note is needed, yet rewriting it a different way is not practical. An example might be a complex regular expression. Those instances should be kept to a minimum though.
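To illustrate the regular-expression case: the pattern below is a made-up example, but it shows the kind of comment that earns its keep, since no amount of renaming can make the pattern itself readable:

```javascript
// Matches a simplified ISO 8601 date-time, e.g. "2017-03-05T14:30:00Z":
// YYYY-MM-DD, a literal "T", then HH:MM:SS and a trailing "Z".
// Without this note, a reader has to decode the pattern token by token.
const isoDateTime = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/;

console.log(isoDateTime.test('2017-03-05T14:30:00Z')); // true
console.log(isoDateTime.test('03/05/2017'));           // false
```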
I know I see far more bad code than good code, but I'd say a common thread among good code is its self-documenting properties. Sadly, good and bad code feature little to no commenting, but I don't think comments would improve either good or bad codebases, and I think comments would be more detrimental than useful in bad codebases.
Clearly I'd need to see the whole thing in action and maybe it would make more sense, but for now I'm going with it's a conspiracy.
Also, there's something wrong about focusing on the "repo" part. Like, no matter what you do, you break things down somehow. You can't get around it. They draw the boundaries somewhere. You don't have a mono anything that big. Not really.
Quote:
Occasionally code will be complex enough that an explanatory note is needed, yet rewriting it a different way is not practical. An example might be a complex regular expression. Those instances should be kept to a minimum though.
The most common good reason for commenting that I've found was when you're working around something that you can't do much about. A bug in runtime/framework/library for instance often necessitates that you write wrong-looking code - I think one time I had to write something that looks like a no-op to work around a bug in dealing with iframes in jQuery that was IE-specific. Outside of browser issues, if you deal with low-level code, it's very common to find bugs in libraries that relate to thread/exception/memory safety, which you need to write wrong-looking code to work around. While I'm perfectly fine with well-written, self-documenting code with no comments, it's a bit of fantasy to think that we can do that all the time - real software engineering isn't always pretty.
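A sketch of what such wrong-looking workaround code might look like. The browser bug here is hypothetical, purely to illustrate why the comment is the only thing standing between the workaround and a well-meaning cleanup:

```javascript
// Hypothetical example of code that *needs* a comment: without the note,
// a reviewer would reasonably delete the seemingly pointless property read.
function readFrameHeight(frame) {
  // WORKAROUND (hypothetical browser bug): reading offsetHeight before
  // forcing a layout pass returns a stale value. Touching clientWidth
  // first forces the reflow. Do not remove this "useless" read.
  void frame.clientWidth;
  return frame.offsetHeight;
}

console.log(readFrameHeight({ clientWidth: 300, offsetHeight: 150 })); // 150
```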
https://arstechnica.com/information-...it-repository/
Microsoft, Facebook and Google are probably the three most respected software engineering organizations in the world.
But that was what was being discussed:
I personally favor mono-repo and having all technical design docs in that repo, so that code changes and documentation changes can be reviewed together. What typically happens when you have lots of repos and lots of applications -- which also means lots of different teams doing their own thing without anyone else knowing -- is that things leak outside of repos as people build and use their own documentation solutions and are no longer aware of how other teams are doing things.
Quote:
Like, no matter what you do, you break things down somehow. You can't get around it. They draw the boundaries somewhere. You don't have a mono anything that big. Not really.
Quote:
Microsoft, Facebook and Google are probably the three most respected software engineering organizations in the world.
I even touch on the theme in my node talk - "Node for the Rest of Us". I came up with a node framework that offshore devs can use to build features w/o getting themselves into too much trouble. I imagine at Google or FB I'd give feature devs a lot more leash, to the point where my framework would be a very light library.
We tried to replicate the Netflix microservice back end. Jury is still out on whether it's going to work. But there have been massive growing pains. One of the biggest is a vacuum of leadership over the entire over-arching application. I'm not really sure how that came about but usually these things are a combination of political inertia and lack of caring. I imagine Netflix doesn't have those problems.
/ramble
i think i literally don't understand what it means. surely every engineer at google is not cloning a repo of all of google's code to their laptop. so what does it mean?
It exactly does those things. The really big stuff, the whole world stuff, isn't managed top down. It's an emergent property from the smaller domains that people actually work in.
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
there is absolutely no way anyone is seeing "all of google's code" and its dependencies and making decisions about it in that way. it's not possible. code that big can't be managed like that. this was my point. ok, it's a monorepo. i still don't know what that means. but i know it doesn't mean that people understand it all. they know generally what services are available, and then how to search down with more granularity. exactly the way you'd find a library on npm or rubygems or wherever.
I'm not sure what your point is here - obviously a monorepo doesn't preclude project or component or service-level organization within the repo - but the flipside of this is that no matter how you divide things, you still have to deal with the whole. Having national borders doesn't make the concept of the earth go away and having a bunch of rooms in a house doesn't mean you don't have to maintain the house.
Quote:
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
I think a huge part of the recent movement towards microservices and microrepos and microapps and what not is due to some developers not understanding this and being driven by this irrational fear of large systems, perhaps due to having been burned by large poorly architected systems in the past. And maybe resume-driven development and a degree of hubris that some companies have regarding their true nature (we are a platform company - no you're not). Yet, somebody still needs to look at the whole thing and decisions have to be made at that level.
Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer.
All writes to files are stored as snapshots in CitC, making it possible to recover previous stages of work as needed. Snapshots may be explicitly named, restored, or tagged for review.
Quote:
It exactly does those things. The really big stuff, the whole world stuff, isn't managed top down. It's an emergent property from the smaller domains that people actually work in.
Google is a company with a small number of products and a CEO who is accountable to the board of directors and shareholders. All of the stuff in it has to be absolutely manageable top down, even if that capability isn't utilized all the time. If you're in charge of Google Search or GMail or some other humongous project, you can't tell your boss, you know what, I have no idea what's going on, the product is just an emergent property of all these devs doing their own thing. Obviously the centralized planning is informed and influenced by what happens in the trenches but that doesn't mean you don't want to have that capacity.
Quote:
By your argument, how does npm work? you don't make npm go away... but that doesn't matter. nobody has to deal with that whole.
Quote:
there is absolutely no way anyone is seeing "all of google's code" and its dependencies and making decisions about it in that way.
they know generally what services are available, and then how to search down with more granularity. exactly the way you'd find a library on npm or rubygems or wherever.
This is because npm doesn't own the contents and each of the repos is more or less its own user-facing product with a specific owner, and no one needs to be able to make changes to something they don't own. There's no overarching architecture that ties these repos together. Google owns all of that source code; they are not letting things randomly evolve. All of these (maybe not all, but large overlapping portions of it) work together to produce more or less a single product.
It's like saying, you know what, we can't be expected to follow the laws because we're too big and can't watch everyone.
Or there are too many cells in our body so they must all be working independently and there can't be a centralized mechanism for controlling our behavior.
You can't just throw your hands in the air, I can't micromanage everything so manage nothing and let things be.
No it's more like, you know what I need to change 5000 places where my library is used in a way that I don't want to support in the future, let's change all of them programmatically and see how many tests I break. Or, welp, there's this security vulnerability found in a commonly used library - let's see if we can fix this across the entire codebase.
I think I framed this kind of poorly but the meaningful question here isn't whether one person understands everything, but rather can you draw up boundaries such that no one needs to do things that lie across boundaries. The answer is quite clearly no. Another issue with a multi-repo system is that someone has to make a decision as to whether something is a top-level repo and this decision cannot easily be undone later. And changes in this high-level structure cannot easily be tracked. What happens when you move a project from one repo to another?
The real point has always been that without a monorepo, the whole isn't even an entity that your source control system understands. The real-world equivalent of this isn't some kind of loose federated systems we see in the wild - it's more like states not even knowing that there are other states and things that happen not entirely within one state becoming entirely unknown. Changes that cross the repo boundaries become untrackable. We're not arguing about degrees of centralization vs decentralization in the world - we're talking about whether it's useful to be able to model the whole system. The emergence of the type you're talking about happens specifically as a result of interaction between entities and it's this interaction that multi-repo systems cannot model. If you think this through, this whole decentralization and emergence and all this are another argument for the monorepo, as the major thing a multi-repo system enforces is the centralization of boundaries.
The monorepo is just for google.com and it can be seen as a single product or a suite of products, just as Excel can be seen as both a standalone product or part of Microsoft Office. But how we name this is irrelevant, since you need to be able to run and test the system as a whole. User data and session information, for instance, are clearly shared across almost their entire suite of products. If I change something major in the core, I need to be able to model and understand the impact on the whole system. As most major platform vendors at scale now understand, most APIs are poorly specified and testing in isolation is not enough if you care about reliability.
The real thing to understand is that the whole exists no matter what - the sum of all your repos is a thing that exists. The only remaining question is whether you should have tools that allow you to work on the whole, across the arbitrary boundaries you may have placed in the first place. It's like having drive letters in Windows - this doesn't help in any meaningful way and merely constrains how you organize things. Nothing about the monorepo prevents you from having a directory structure that maps to ownership/project structure such that most of your work happens within your team directory/project. It just makes meta-operations massively easier to perform and track.
except that's true. the laws happen to be viable because human beings behave more or less predictably when raised within a given culture and living in an environment with certain stable properties: infrastructure, food supply, law enforcement with some degree of effectiveness, and so on. it's that environment + people's nature that gives rise to a lawful society emergently. it's not at all as if the society is being controlled in a top down way, even though we have that mental model for convenience.
I think we're talking past each other. Google is not a single product as I see it.
ah, ok. THIS makes sense to me. this is what i was missing, or what was not made clear in the articles.
I think we're starting to converge. I still have a few points of disagreement, or possibly misunderstanding....
Is it? This is precisely how the web works. For me the web as a whole is like the multi-repo approach in this conversation. And I'm still not really seeing any other way to make things truly scale. There are only services, responsible for themselves, and the APIs they expose... I don't see why a product as gigantic as google would work differently. But perhaps I'm still misunderstanding something here.
I don't understand this part.
Okay, I thought it was for the entire company, as in every product alphabet offers.
What about oauth and other forms of SSO? Separating login/session functionality is done all the time. You have some simple API, everything uses it. Why is that a problem? Why is that not a good thing? Why does any code that uses SSO ever need to know about its implementation? Or are you saying something else?
This statement still feels to me like a wholesale rejection of encapsulation, and an endorsement of fully entangling every part of your system because it's still a "whole." I'm assuming that's _not_ what you mean, but I'm struggling to understand how what you're saying is different.
Quote:
If you think this through, this whole decentralization and emergence and all this are another argument for the monorepo, as the major thing a multi-repo system enforces is the centralization of boundaries.
Quote:
The monorepo is just for google.com and it can be seen as a single product or a suite of products,
Quote:
User data and session information, for instance, are clearly shared across almost their entire suite of products. If I change something major in the core, I need to be able to model and understand the impact on the whole system. As most major platform vendors at scale now understand, most APIs are poorly specified and testing in isolation is not enough if you care about reliability.
Quote:
The real thing to understand is that the whole exists no matter what - the sum of all your repos is a thing that exists. The only remaining question is whether you should have tools that allow you to work on the whole, across the arbitrary boundaries you may have placed in the first place.
Quote:
And I'm still not really seeing any other way to make things truly scale. There are only services, responsible for themselves, and the APIs they expose... I don't see why a product as gigantic as google would work differently. But perhaps I'm still misunderstanding something here.
Quote:
Okay, I thought it was for the entire company, as in every product alphabet offers.
Quote:
What about oauth and other forms of SSO? Separating login/session functionality is done all the time. You have some simple API, everything uses it. Why is that a problem? Why is that not a good thing? Why does any code that uses SSO ever need to know about its implementation? Or are you saying something else?
Quote:
This statement still feels to me like a wholesale rejection of encapsulation, and an endorsement of fully entangling every part of your system because it's still a "whole." I'm assuming that's _not_ what you mean, but I'm struggling to understand how what you're saying is different.
Just a general comment after reading your responses... you seem to be arguing so strongly in favor of monorepo now and how it alone addresses this "whole" problem that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
I think the level of our discussion is so abstract at this point that I don't know if we're disagreeing or talking past one another. Nevertheless, it's a new POV to me so I'm interested in it, and I'll respond again. Though I fear the conversation might be approaching the tedium threshold for others (though not for me).
http. rack. oauth. this statement just isn't true.
as a general point i think appeals to "real life" are dangerous and should be made more specific -- not that the umbrella of "real life" doesn't capture real complex phenomena, but that it's not a useful concept for understanding things deeply.
but that's exactly the point of true separation. you seem to be arguing that true separation of concerns is a chimera of optimistic engineers. you keep saying you can't ignore the whole, but i don't know what that means because any time I create a self-contained unit of functionality at any level in the stack that's what I'm doing. I think this is a point where we're talking past one another. I'm not sure how to solve it. The level of abstraction of the examples we're using is clearly not the right one (at least for me).
But you don't. That's the whole point. You can build a unit of functionality with no knowledge of the whole. Because it can be used across many "whole"s. what about standard library functions? what whole do they recognize?
you can exactly define and enforce the limits of the entanglement by limiting it through API boundaries. There is no more than that.
everyone knows exactly how they're entangled to any of their dependencies and they are in full control of that entanglement.
which problem are you referring to? there is the leftpad-apocalypse stuff but that's a problem at a very different level -- depending on npm itself to serve your versioned files. there are many solutions to that. but depending on specific versions of modules isn't a problem as long as you have a copy of the version you need. if you're talking about migrating forward i don't see how that problem has anything to do with monorepos or not. except that in a monorepo you have a clear idea of the impact a certain change would have on all your clients, and you could let them inform the design of your next version. i can see the use of that. is that all we're talking about?
Nothing at this scale is a "simple API" and in real life
Quote:
as a general point i think appeals to "real life" are dangerous and should be made more specific -- not that the umbrella of "real life" doesn't capture real complex phenomena, but that it's not a useful concept for understanding things deeply.
Separation doesn't mean you can't benefit from the additional testing coverage provided by client code or that you wouldn't want to test it if you could.
You have to recognize that there is a whole before you can do any encapsulation.
Also, not acknowledging the entanglement doesn't make it go away - in a multi-repo system where people don't know what others are doing, it's much easier for dependencies to build up and for apps and services to get entangled in a way that's unhealthy, because no one is empowered to untangle stuff.
npm is actually a great example of a decentralized entangled mess that nobody can fix. Not only do you get entanglements, but you get entanglements across time through versioning - this massively increases the complexity and makes reliability nearly unattainable at scale.
Quote:
Just a general comment after reading your responses... you seem to be arguing so strongly in favor of monorepo now and how it alone addresses this "whole" problem that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
Quote:
I think the level of our discussion is so abstract at this point that I don't know if we're disagreeing or talking past one another. Nevertheless, it's a new POV to me so I'm interested in it, and I'll respond again. Though I fear the conversation might be approaching the tedium threshold for others (though not for me).
I'm reading it, so it's not too tedious. I think you guys are putting too strong of a focus on the importance of monorepo vs multirepo (is that what we call it?).
This is kind of like when people argue for one branching strategy over another. Maybe there's one better way, but it's close and hard to tell in practice.
Using google as an example is kind of flawed. They've invested so much in tooling that they almost certainly could have done almost as well with multiple repos and the same amount of custom tooling.
My biggest reason that the monorepo seems oversold is because code versioning is just a part of the complexity. Even with a monorepo you're going to be dealing with changing apis / versions / etc in the actual running services (at least if you're at a scale where this discussion makes any sense at all).
It's not like a merge to master magically updates all running machines all at once. And just because you can make a change with a single commit doesn't mean you don't have to think of all of the teams impacted and figure out how the change will get rolled out. In practice you're almost always going to be backwards compatible for awhile and then eventually remove the deprecated code anyway.
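The rollout pattern described above might look something like this in miniature. The function names are invented for illustration: even after a single atomic commit, the old entry point stays around until every running service has picked up the new one:

```javascript
// New API: callers opt in to the extra data explicitly.
function fetchUserV2(id, options = {}) {
  return { id, includeProfile: Boolean(options.includeProfile) };
}

// Deprecated shim: kept backwards compatible while older deploys are still
// calling it, then deleted in a later change once traffic reaches zero.
function fetchUser(id) {
  console.warn('fetchUser is deprecated; use fetchUserV2');
  return fetchUserV2(id, { includeProfile: true });
}
```

The monorepo makes the final cleanup commit easy to write and verify, but it doesn't remove the need for the deprecation window itself.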
I'm still a fan of the monorepo model. I think candybar makes a lot of good points. And even more simply it's clearly the right option at the start and so an alternative needs to be significantly better to justify the cost of changing at some point in the future.
Also the tooling Google has seems pretty kickass.
When the mono-repo approach is technically feasible without bespoke tools, using a multi-repo approach over a mono-repo approach essentially amounts to something like using RCS over git or svn and talking about how this prevents entanglements. If you can see why it's valuable to use svn over RCS (as it pertains to the ability to have a single commit that spans multiple files), mono-repo over multi-repo adds the same additional capability, but at a larger scale.
Quote:
that I don't understand how you explain the success of companies like amazon, or many others that don't embrace it.
Quote:
http. rack. oauth. this statement just isn't true.
Quote:
but that's exactly the point of true separation. you seem to be arguing that true separation of concerns is a chimera of optimistic engineers. you keep saying you can't ignore the whole, but i don't know what that means because any time I create a self-contained unit of functionality at any level in the stack that's what I'm doing. I think this is a point where we're talking past one another. I'm not sure how to solve it. The level of abstraction of the examples we're using is clearly not the right one (at least for me).
Quote:
But you don't. That's the whole point. You can build a unit of functionality with no knowledge of the whole. Because it can be used across many "whole"s. what about standard library functions? what whole do they recognize?
Quote:
you can exactly define and enforce the limits of the entanglement by limiting it through API boundaries. There is no more than that.
Quote:
everyone knows exactly how they're entangled to any of their dependencies and they are in full control of that entanglement.
Quote:
which problem are you referring to?
Quote:
but depending on specific versions of modules isn't a problem as long as you have a copy of the version you need.
Quote:
My biggest reason that the monorepo seems oversold is because code versioning is just a part of the complexity. Even with a monorepo you're going to be dealing with changing apis / versions / etc in the actual running services (at least if you're at a scale where this discussion makes any sense at all).
Quote:
It's not like a merge to master magically updates all running machines all at once. And just because you can make a change with a single commit doesn't mean you don't have to think of all of the teams impacted and figure out how the change will get rolled out. In practice you're almost always going to be backwards compatible for awhile and then eventually remove the deprecated code anyway.
Quote:
And even more simply it's clearly the right option at the start and so an alternative needs to be significantly better to justify the cost of changing at some point in the future.
and you have not listed a single advantage for the multi-repo approach - we're just talking philosophically about the nature of things or something.
it's getting late, but your other points are interesting. you're definitely getting me to think about the problem differently.
Noticed this in this week's Hacker Newsletter: New CA law taking effect Jan 1 makes it illegal for employers to ask candidates their previous salary, and they must present a salary range for the position if asked