How to start an engineering team
Founders often ask me how we did software engineering in the early days of Blend, what our biggest mistakes were, and how I would do it if I were starting a new team today. This is my (opinionated) guide to getting a startup engineering team up and running based on what I've learned.
Disclaimer: startup success is path-dependent (non-ergodic). A hack that allows you to survive in the short term is worth it no matter how much it will cost to fix in the future. Much of this advice is based on unnecessary mistakes I've made out of ignorance, not do-or-die hacks. The key is to be deliberate about each tradeoff, understanding what you gain and lose.
Building the team
The minimum viable team consists of a software engineer, a designer, and a product manager. More than one of these roles can be played by the same person at first. Soon you'll become bottlenecked on engineering and need to bring on more engineers. Try to hire smart, full-stack generalists. Depending on what you're building, it may be okay to bring on specialists in web front end or mobile engineering early on. All engineers, including specialists, should have solid programming and CS fundamentals. For more on evaluating engineers, check out How to interview engineers.
Define your company principles early and only hire people who believe in them. Disagreement on principles will lead to pointless arguments that slow down the company and create toxic disunity. It can be tempting to hire talented people who don't agree with your principles, but it will cost you. Every principle should be controversial to the general talent pool. If it's not controversial, there's no need to state it explicitly. The bigger your team is when you add a principle, the more conflict and disruption it will cause. So it's best to define them as early as possible.
Once you have more than one engineer, one person should become the engineering manager and own technical and hiring/firing decisions going forward. As the team grows, ownership of technical decisions can be delegated to someone else. The engineering manager should do a one-on-one with each engineer every two weeks or so to surface and address frustrations, doubts, and misalignment with the principles.
Choosing a tech stack
You'll need to choose a programming language, a database, and an infrastructure provider. In general, choose a tech stack consisting of popular technologies that will be easy for a future team to maintain and build upon. Paraphrasing Warren Buffett, "choose a tech stack that a ham sandwich could manage".
For your programming language, choose one that's popular and has static typing. Popularity is important because it allows you to use open source libraries for everything that isn't core to your app and to rely more on your search engine and Stack Overflow to get unstuck. Static typing is important because it makes the codebase easier to understand and refactor as it grows more complicated and changes hands. At Blend, we started out using Scala, which has static typing but wasn't popular. When we realized we were being forced to waste valuable time debugging and reverse engineering Scala build tools, we switched to Node.js with plain JavaScript, which is popular but lacks static typing. This was more productive because it allowed us to use higher-quality, easier-to-use open source libraries and made build speed a non-issue. However, within a couple of years, it resulted in a codebase that was messy, bug-prone, and extremely hard to refactor.
Avoid introducing more than one backend language. Using the same language across your entire backend makes it easier to reuse common libraries and to reassign engineers to different codebases. You don't want to end up in a situation where half of your team thinks of themselves as "C++ engineers" and the other half as "Java engineers". Smart engineers will join and pitch you on trying out different languages, but you just have to say no. There are benefits to having multiple competing stacks in your company eventually, but only much further down the line. If you must switch languages, at least stop all new work in the original language and force all new code to be written in the new language.
For your database, choose something popular and with types. As with languages, popularity is important because it gives you more options for hosting/admin and makes debugging easier. Types make it easy to ensure that your dataset remains sane and consistent. At Blend, we started out using Mongo because it was easy to get started with and allowed us to add and modify data types without worrying about migrations. As the team grew and more product edge cases came up, the data structures stored in Mongo became less consistent. We had to mitigate this (before Mongo had a schema feature) by adding slow, unreliable data consistency checks along with validation checks before database writes to ensure that data stayed in the expected schema. Later we recommended that all new services use PostgreSQL by default.
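To make the cost of that mitigation concrete, here is a minimal sketch of the kind of validate-before-write check you end up hand-rolling on top of a schemaless store. It is not Blend's actual code: the schema, the field names, and the plain list standing in for a database collection are all assumptions for illustration. A typed database gives you this check for free.

```python
# Hypothetical schema for illustration: field name -> required Python type.
LOAN_SCHEMA = {"borrower_id": str, "amount_cents": int, "status": str}


def validate_document(doc: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the document matches."""
    problems = []
    for field, expected_type in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    # Reject fields the schema does not know about, so typos cannot sneak in.
    for field in doc:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def safe_insert(collection: list, doc: dict, schema: dict) -> None:
    """Write the document only if it passes validation.

    `collection` is a plain list standing in for a real database collection.
    """
    problems = validate_document(doc, schema)
    if problems:
        raise ValueError("; ".join(problems))
    collection.append(doc)
```

Every code path that writes to the store has to remember to go through `safe_insert`, which is exactly the kind of discipline that erodes as a team grows.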
For your infrastructure provider, AWS or GCP will scale to whatever you need to do. It may be fine to start out with Heroku or similar for simplicity. Regardless, run your app in Docker from the beginning to make it somewhat portable. Avoid on-prem or any architecture that requires you to administer customer-specific infrastructure/instances.
Work in a single codebase and deploy a single service (unless you're going serverless). It doesn't make sense to organize your codebase into microservices until you have more certainty about your product and your team is big enough to have multiple pods that would benefit from shipping independently. If you do microservices prematurely, you'll end up with awkward, unnecessary seams between codebases that can be surprisingly difficult to merge.
Keeping quality and velocity high
It's much easier to maintain quality and velocity than to restore them in the future. Without diligently keeping the bar high from the beginning, your codebase will tend to decay, becoming more fragile and harder to build upon and maintain. Quality enables velocity; having a robust test suite and the ability to quickly fix bugs allows you to move fast.
One part of keeping the bar high from the beginning is building a solid test suite as you build your app. Coding around testability also forces you to organize your code modularly and with minimal branching. Aim for a comprehensive unit test suite and a small end-to-end test suite. Require tests to pass on every PR before merging.
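As a rough illustration of what coding around testability looks like, here is a small sketch: the business rule lives in a pure function with no I/O, so the unit tests need no setup or mocks. The function name and the discount rule are made up for the example.

```python
import unittest


def apply_discount(total_cents: int, percent: int) -> int:
    """Pure function: no I/O, no clock, no globals, so it is trivial to test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer math avoids floating-point rounding surprises with money.
    return total_cents - (total_cents * percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(10_000, 15), 8_500)

    def test_zero_percent_is_a_no_op(self):
        self.assertEqual(apply_discount(10_000, 0), 10_000)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(10_000, 101)
```

Saved in a test module, this runs with `python -m unittest`, and the same command can be the check your CI requires before a PR is merged.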
Start out with 100% unit test coverage (including on the front end) but use inline coverage exclusions liberally to avoid wasting time writing tests for unimportant edge cases or highly speculative / prototype-esque features. The important thing is thinking about unit test coverage as the default rather than a nice-to-have. The lack of unit test coverage for any given code branch is tech debt.
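Here is a sketch of what an inline coverage exclusion can look like, assuming a Python codebase measured with coverage.py, whose default exclusion marker is the comment `# pragma: no cover`. Other languages and coverage tools have their own equivalents. The two functions are hypothetical.

```python
def parse_port(raw: str) -> int:
    """Core logic: covered by unit tests like everything else by default."""
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def debug_dump(config: dict) -> None:  # pragma: no cover
    # Prototype-only helper: deliberately excluded from coverage, and the
    # exclusion is visible in code review as an explicit piece of tech debt.
    for key, value in sorted(config.items()):
        print(f"{key}={value}")
```

With exclusions marked inline, you can have CI fail on anything below 100% (for example with `coverage report --fail-under=100`), so every uncovered branch is a decision someone made on purpose rather than an accident.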
Write a few end-to-end tests to cover your app's most critical workflows. End-to-end tests are not a replacement for unit tests. They're much better than manual tests, but having too many will still slow down your dev cycle because they're flaky and slow relative to unit tests.
If you somehow end up with insufficient testing but need to really start shipping to customers without breaking stuff, do not hire QA testers to run through manual tests. This is a slippery slope to a huge manual regression test suite. The manual test suite will become a release bottleneck and a major hole to dig yourself out of. If you don't have time to write automated tests, start out by writing them as checklists and have the engineer doing the code review run through them manually. If you offload this to a non-engineer, it removes the engineer's incentive to automate it. We made the mistake of outsourcing pre-release manual QA, and it took a coordinated effort lasting over a year across a 50+ person team to automate the manual test suite that we had built up.
Pull requests should be scoped as minimum viable independent improvements, and code reviews should be required on all pull requests (once you have at least two people coding). This is another benefit of hiring multiple full-stack generalists: all changes can be reviewed in depth. Packing multiple orthogonal changes into a single PR reduces clarity, makes the change more difficult to read, and increases the likelihood of shipping a bug. If you receive a PR with multiple orthogonal changes, insist that the author break it up into multiple PRs.
Deploy continuously all the way to production before merging. This ensures that your master branch is always in a healthy, production-ready state. Pushing multiple changes to production simultaneously makes it nontrivial to find and revert the faulty change. GitHub flow spells out this process. Basically prepare your change, pass the tests, get an approval, deploy to your staging environment, and then deploy to production. If your change isn't working as expected in production, revert production to master. Once you're confident that it's working, merge your PR. I've been surprised by how many people fervently disagree with this approach, but it makes perfect sense.
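The ordering matters more than the tooling, so here is a toy sketch of the flow just described: run each step in order, stop at the first failure, revert production if it was touched, and merge only at the very end. The step names and the callables are placeholders I made up; in practice each one would be a CI job or a deploy script.

```python
from typing import Callable

# Order taken from the paragraph above; merging comes last.
PIPELINE = [
    "run_tests",
    "get_approval",
    "deploy_staging",
    "deploy_production",
    "verify_production",
    "merge",
]


def release(
    actions: dict[str, Callable[[], bool]],
    revert_production: Callable[[], None],
) -> list[str]:
    """Run the steps in order and return a log of what happened.

    `actions` maps each step name to a callable returning True on success.
    """
    log = []
    production_touched = False
    for step in PIPELINE:
        if step == "deploy_production":
            production_touched = True
        ok = actions[step]()
        log.append(f"{step}: {'ok' if ok else 'failed'}")
        if not ok:
            if production_touched:
                # Put production back on what master currently contains.
                revert_production()
                log.append("reverted production to master")
            break
    return log
```

Because the merge is the last step, a change that fails in production never reaches master, and reverting is just redeploying master.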
Whenever there's a major failure, run a Five Whys to determine what went wrong and how you will mitigate it so that similar issues can never happen again. Whoever caused the issue should prepare a timeline of key events. Then you should get everyone in a room, start with the top-level error, ask why it happened, ask why that happened, etc., discussing and recording mitigations as you drill down. This is essential for learning as an organization. For bonus points, convince teams outside of engineering to follow this same practice.
Establishing an operating rhythm
Work in weekly sprints with daily stand-ups. Once a week, have a sprint planning meeting where each team member commits to the tasks they'll complete in the next sprint. The sprint planning meeting is also a good time to talk about what went well or poorly in the previous sprint. At the daily stand-up, each team member should give an update on their progress, raise any blockers, and commit to one thing they'll get done by the next stand-up. Committing to exactly one thing forces you to focus on the most important task. People should feel some shame if they don't complete their stand-up commitment.
To create accountability to the rest of the company, host a companywide demo session every two weeks or so. Each engineer should demo what they worked on since the last demo session. This helps keep engineers focused on changes with clear user impact and minimizes procrastination on fun internal improvements.
The team should work together side-by-side in person. It's important to be able to frictionlessly ask other team members questions and get unstuck. Code reviews are much more efficient when you can talk through them live. This has become next to impossible since the COVID-19 outbreak, but many startups are working on online simulations of working together in an office.
It's also important for everyone to feel accountable to each other to put in their fair share of energy and discipline. Startups don't get off the ground without hard work and dedication. The amount of time and effort each person puts in will trend down as the company grows and mainstream corporate tech culture starts to take hold. So it's important to at least start out with everyone working long hours and weekends and giving it as much energy as you can sustain.