<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Notes by Kaustav Chakraborty]]></title><description><![CDATA[Strategic DevOps & Cloud Security insights for CTOs. Architecting scalable, secure infrastructure and high-performance backend systems. 12+ years of expertise.]]></description><link>https://notes.iamkaustav.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1688247556527/5IXKGlkTk.png</url><title>Notes by Kaustav Chakraborty</title><link>https://notes.iamkaustav.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sun, 19 Apr 2026 21:10:03 GMT</lastBuildDate><atom:link href="https://notes.iamkaustav.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Terraform Apply Crashed in CI? Here's How to Recover Your S3 State]]></title><description><![CDATA[TL;DR

A terraform apply killed mid-run in GitHub Actions leaves behind two DynamoDB artefacts: a stale lock and a mismatched MD5 digest.

Most guides only mention force-unlock. That fixes the lock, b]]></description><link>https://notes.iamkaustav.com/terraform-apply-crashed-in-ci-here-s-how-to-recover-your-s3-state</link><guid isPermaLink="true">https://notes.iamkaustav.com/terraform-apply-crashed-in-ci-here-s-how-to-recover-your-s3-state</guid><category><![CDATA[Terraform]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Devops]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[Infrastructure as code]]></category><category><![CDATA[S3]]></category><category><![CDATA[terraform-state]]></category><category><![CDATA[DynamoDB]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Sat, 14 Mar 2026 09:05:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/649e01fbd32fe6db996c3051/ae03f217-b11f-4237-95ef-2f81f6d55688.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h3>TL;DR</h3>
<ul>
<li><p>A <code>terraform apply</code> killed mid-run in <strong>GitHub Actions</strong> leaves behind <strong>two DynamoDB artefacts</strong>: a stale lock and a mismatched MD5 digest.</p>
</li>
<li><p>Most guides only mention <code>force-unlock</code>. That fixes the lock, but you'll still get <em>"state data in S3 does not have the expected content"</em> until you patch the digest.</p>
</li>
<li><p>This post walks through the <strong>why</strong>, the <strong>diagnosis</strong>, and the <strong>exact 7-step fix</strong> so you can recover cleanly without recreating state from scratch.</p>
</li>
</ul>
<h3>The Incident</h3>
<p>I was rolling out ECR repositories for four microservices via a reusable Terraform module. The pipeline, a standard <code>plan → apply</code> workflow on <strong>GitHub Actions</strong>, had been reliable for months.</p>
<p>One afternoon the CI runner was terminated mid-<code>apply</code>. The reason didn't matter much (runner preemption, timeout, OOM — pick your favourite). What mattered was the aftermath: every subsequent <code>terraform plan</code> failed with this:</p>
<pre><code class="language-shell">Initializing modules...
- orders_api_service_ecr_repo      in ../../../modules/aws_ecr
- notifications_service_ecr_repo   in ../../../modules/aws_ecr
- inventory_service_ecr_repo       in ../../../modules/aws_ecr
- gateway_service_ecr_repo         in ../../../modules/aws_ecr

Initializing the backend...

Successfully configured the backend "s3"!

Error refreshing state: state data in S3 does not have the expected content.

This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
</code></pre>
<p>Terraform told me <em>what</em> to do, <strong>update a Digest</strong>, but not <em>where</em> or <em>why</em>. If you’ve landed here from the same error, read on.</p>
<h3>How the S3 Backend Actually Works</h3>
<p>Before jumping to the fix, it helps to understand the moving parts. Terraform’s S3 backend uses <strong>two AWS services</strong> in tandem:</p>
<img src="https://cdn-images-1.medium.com/max/1600/1*-uNl7UvE49riuq3IebmY9A.png" alt="" style="display:block;margin:0 auto" />

<p>Key insight — DynamoDB stores two items per state file, not one:</p>
<img src="https://cdn-images-1.medium.com/max/1600/1*CbflEsRRZLxM5Vu6qpkFtA.png" alt="" style="display:block;margin:0 auto" />

<p>When <code>apply</code> finishes normally, Terraform:</p>
<ol>
<li><p>Writes the new state to S3.</p>
</li>
<li><p>Computes the MD5 of that file and stores it in the <code>-md5</code> item.</p>
</li>
<li><p>Releases the lock by deleting the lock item.</p>
</li>
</ol>
<p>When the runner is <strong>killed mid-apply</strong>, steps 2 and 3 never happen. That leaves you with two problems, not one.</p>
<h3>Diagnosis: Two Problems, Not One</h3>
<img src="https://cdn-images-1.medium.com/max/1600/1*sAx1cJ6oftcMiH8BmuYzqA.png" alt="" style="display:block;margin:0 auto" />

<h3>Problem 1: Stale Lock</h3>
<p>The lock item at <code>…/terraform.tfstate</code> was never released because the runner was killed. Any future <code>plan</code> or <code>apply</code> will fail with <strong>"state is locked"</strong>.</p>
<h3>Problem 2: Digest Mismatch</h3>
<p>The interrupted <code>apply</code> may have written a <em>partial or updated</em> state file to S3, but the MD5 in the <code>-md5</code> DynamoDB item still reflects the <em>previous</em> state. Terraform computes the MD5 of the current S3 object, compares it to the stored digest, and <strong>refuses to proceed</strong> because they don't match.</p>
<blockquote>
<p><em>Most Stack Overflow answers jump straight to</em> <code>force-unlock</code><em>. That fixes Problem 1 but leaves Problem 2 untouched, and you can't even run</em> <code>force-unlock</code> <em>until</em> <code>init</code> <em>succeeds, which it won't until the digest is fixed.</em></p>
</blockquote>
<hr />
<h3>The 7-Step Recovery</h3>
<img src="https://cdn-images-1.medium.com/max/1600/1*lELWeOrnYdPK4iuDmDB8zw.png" alt="" style="display:block;margin:0 auto" />

<h3>Step 1: Confirm nothing is running</h3>
<p>Check GitHub Actions for any in-flight runs of your apply workflow. Check local terminals too. Running <code>force-unlock</code> while a legitimate operation is in progress <strong>will corrupt state</strong>.</p>
<h3>Step 2: Back up the S3 state file</h3>
<p>In the S3 bucket, locate <code>global/ecr/terraform.tfstate</code> (or your equivalent key):</p>
<ul>
<li><p>Verify it exists and is non-zero.</p>
</li>
<li><p>If S3 versioning is enabled, download the current <em>and</em> previous version. The current one may be partially written.</p>
</li>
</ul>
<pre><code class="language-shell">aws s3 cp s3://your-bucket/global/ecr/terraform.tfstate ./terraform.tfstate.bak
</code></pre>
<h3>Step 3: Patch the digest in DynamoDB</h3>
<p>Open <strong>DynamoDB → your lock table → Explore items</strong>. Search for the item whose <code>LockID</code> ends with <code>-md5</code>:</p>
<pre><code class="language-shell">your-bucket/global/ecr/terraform.tfstate-md5
</code></pre>
<ul>
<li><p><strong>If the item exists:</strong> update its <code>Digest</code> attribute to the value from the error message (e.g. <code>a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6</code>).</p>
</li>
<li><p><strong>If it doesn’t exist:</strong> create a new item with <code>LockID</code> = <code>…-md5</code> and <code>Digest</code> = that hash.</p>
</li>
</ul>
<blockquote>
<p><strong>Why this value?</strong> Terraform already computed the MD5 of the current S3 object and told you in the error. You’re simply telling DynamoDB “yes, that’s the right file.”</p>
</blockquote>
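<p>If you would rather cross-check that hash yourself before editing the table, a minimal Go sketch along these lines computes the hex-encoded MD5 of the backup downloaded in Step 2. The file name <code>terraform.tfstate.bak</code> comes from that step, and <code>fileMD5</code> is a hypothetical helper name. It assumes the local copy is byte-for-byte the object currently in S3.</p>
<pre><code class="language-go">package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// fileMD5 returns the hex-encoded MD5 of the file at path.
func fileMD5(path string) (string, error) {
    f, err := os.Open(path)
    if err != nil {
        return "", err
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
    // Assumed file name: the backup created in Step 2.
    sum, err := fileMD5("terraform.tfstate.bak")
    if err != nil {
        fmt.Println("could not hash state file:", err)
        return
    }
    fmt.Println(sum)
}
</code></pre>
<p>If the printed value matches the one in the error message, the digest you are about to store describes the file you just backed up.</p>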
<h3>Step 4: Run <code>terraform init</code></h3>
<pre><code class="language-shell">terraform init
</code></pre>
<p>This should now succeed. If it still fails with the digest error, double-check the <code>LockID</code> key — the path must exactly match.</p>
<h3>Step 5: Force-unlock the stale lock</h3>
<pre><code class="language-shell">terraform force-unlock &lt;LOCK-ID&gt;
</code></pre>
<p>The lock ID is the UUID from the lock item’s <code>Info</code> JSON. Terraform will prompt for confirmation.</p>
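<p>If you copy the <code>Info</code> value out of the console, a small Go sketch like the one below can pull the ID out of it. The sample JSON is made up, and the field names (<code>ID</code>, <code>Operation</code>, <code>Who</code>, <code>Created</code>) are assumptions about what the lock item contains, so compare them against your own item first. <code>parseLockInfo</code> is a hypothetical helper.</p>
<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
)

// lockInfo mirrors the fields this sketch assumes are present in the Info JSON.
type lockInfo struct {
    ID        string `json:"ID"`
    Operation string `json:"Operation"`
    Who       string `json:"Who"`
    Created   string `json:"Created"`
}

// parseLockInfo decodes the raw Info attribute into a lockInfo value.
func parseLockInfo(raw string) (lockInfo, error) {
    info := new(lockInfo)
    err := json.Unmarshal([]byte(raw), info)
    return *info, err
}

// sampleInfo is a made-up value for illustration only.
const sampleInfo = `{"ID":"11111111-2222-3333-4444-555555555555","Operation":"OperationTypeApply","Who":"runner@ci","Created":"2026-03-14T08:00:00Z"}`

func main() {
    info, err := parseLockInfo(sampleInfo)
    if err != nil {
        fmt.Println("could not parse Info JSON:", err)
        return
    }
    // This is the value to pass to terraform force-unlock.
    fmt.Println("lock ID:", info.ID)
}
</code></pre>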
<h3>Step 6: Plan and review</h3>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>Review carefully. Some resources may have been created by the interrupted apply. The plan shows exactly what’s pending.</p>
<h3>Step 7: Apply</h3>
<pre><code class="language-shell">terraform apply
</code></pre>
<h3>Why Order Matters</h3>
<img src="https://cdn-images-1.medium.com/max/1600/1*JMBnJJPPgBwmU6YmUIo6Jg.png" alt="" style="display:block;margin:0 auto" />

<p>You <strong>cannot</strong> skip ahead. <code>init</code> needs a valid digest. <code>force-unlock</code> needs a successful <code>init</code>. <code>plan</code>/<code>apply</code> need the lock released. The dependency chain is strict.</p>
<hr />
<h3>Preventing This Next Time</h3>
<p>A few guardrails I’ve added since this incident:</p>
<ol>
<li><p><strong>S3 versioning:</strong> Always enabled on the state bucket. Gives you a rollback path if the state file itself is corrupted.</p>
</li>
<li><p><strong>CI timeouts with grace periods:</strong> Set workflow <code>timeout-minutes</code> generously and add a cleanup step that logs the lock ID on failure.</p>
</li>
<li><p><strong>Alerting on stale locks:</strong> A simple scheduled Lambda that scans the DynamoDB lock table for items older than <em>N</em> hours and posts to Slack.</p>
</li>
<li><p><strong>State backup before apply:</strong> Add a pre-apply step in CI that copies the current state to a versioned “backup” prefix in S3.</p>
</li>
</ol>
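<p>The core of the stale-lock alert is just an age check. Here is a minimal Go sketch of that check, assuming you have already read the lock items and know when each was created. The <code>lock</code> type, the <code>staleLocks</code> helper, and the second <code>LockID</code> are hypothetical; the DynamoDB scan and the Slack call are left out.</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "time"
)

// lock is a simplified, hypothetical view of a lock item.
type lock struct {
    LockID  string
    Created time.Time
}

// staleLocks returns the locks that were created more than maxAge before now.
func staleLocks(locks []lock, maxAge time.Duration, now time.Time) []lock {
    var stale []lock
    for _, l := range locks {
        if l.Created.Add(maxAge).Before(now) {
            stale = append(stale, l)
        }
    }
    return stale
}

func main() {
    now := time.Now()
    // Made-up sample data: one old lock and one recent lock.
    locks := []lock{
        {LockID: "your-bucket/global/ecr/terraform.tfstate", Created: now.Add(-6 * time.Hour)},
        {LockID: "your-bucket/global/vpc/terraform.tfstate", Created: now.Add(-5 * time.Minute)},
    }
    for _, l := range staleLocks(locks, 2*time.Hour, now) {
        fmt.Println("stale lock:", l.LockID)
    }
}
</code></pre>
<p>Remember to skip the <code>-md5</code> items when scanning the table: they are digests, not locks, and are expected to live for a long time.</p>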
<blockquote>
<p><strong>Note on Terraform 1.10+</strong>: Terraform now supports <a href="https://developer.hashicorp.com/terraform/language/backend/s3">S3-native state locking</a> without DynamoDB. If you’re starting fresh, consider this path: the digest/lock split issue goes away entirely.</p>
</blockquote>
<hr />
<h3>References</h3>
<ul>
<li><p><a href="https://developer.hashicorp.com/terraform/language/backend/s3">Terraform S3 Backend Documentation</a>: Official backend config reference including new S3-native locking.</p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/cli/commands/force-unlock">terraform force-unlock Command</a>: CLI reference for manual lock removal.</p>
</li>
<li><p><a href="https://github.com/hashicorp/terraform/issues/20708">GitHub Issue #20708</a>: Community thread on the exact “state data does not have expected content” error.</p>
</li>
<li><p><a href="https://aws.plainenglish.io/425-terraform-state-corruption-in-s3-backend-how-to-detect-recover-and-prevent-it-b86df2ff1b8f">Terraform State Corruption Recovery (Medium)</a>: A complementary deep dive on state corruption scenarios.</p>
</li>
<li><p><a href="https://terrateam.io/blog/terraform-state-aws-s3-backend">Managing Terraform State on AWS (Terrateam)</a>: Solid end-to-end guide on S3 + DynamoDB setup with GitHub Actions.</p>
</li>
</ul>
<hr />
<p>Thank you for reading this article! 🙏 If you’re interested in <strong>DevOps</strong>, <strong>Security</strong>, or <strong>Leadership</strong> for your startup, feel free to reach out at <a href="mailto:hi@iamkaustav.com"><strong>hi@iamkaustav.com</strong></a> or book a slot in <a href="https://cal.com/iamkaustav/30min">my calendar</a>.</p>
<p>👉 Don’t forget to subscribe to my newsletter for more insights on my security and product development journey. Stay tuned for more posts!</p>
<p>💡 One shameless promotion: I’m building an <strong>easy-to-use freelance management service for technical freelancers</strong>. Check it out here → <a href="https://www.getprismo.app/">https://www.getprismo.app/</a>. If you would like one of the limited early-adopter seats, <em>join the</em> <a href="https://www.getprismo.app/#cta"><em>waitlist</em></a>.</p>
<hr />
<div>
<div>💡</div>
<div>This post was originally published on <a target="_self" rel="noopener noreferrer nofollow" class="text-primary underline underline-offset-2 hover:text-primary/80 cursor-pointer" href="https://iamkaustav.medium.com/terraform-apply-crashed-in-ci-heres-how-to-recover-your-s3-state-ad5f2f6adfa5" style="pointer-events:none">Medium</a>.</div>
</div>]]></content:encoded></item><item><title><![CDATA[Aliasing your existing git branch]]></title><description><![CDATA[If you happened to miss the master branch and your organization has adopted the more popular main naming convention as the default branch, don’t worry! You can easily switch back to the previous environment without causing any issues. This can be don...]]></description><link>https://notes.iamkaustav.com/aliasing-your-existing-git-branch</link><guid isPermaLink="true">https://notes.iamkaustav.com/aliasing-your-existing-git-branch</guid><category><![CDATA[Git]]></category><category><![CDATA[tools]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Developer]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[efficiency]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Tue, 16 Jul 2024 12:22:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1721132432158/088b6515-c326-45fa-aa7f-a4c0901feddd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you happened to miss the <code>master</code> branch and your organization has adopted the more popular <code>main</code> naming convention as the default branch, don’t worry! You can easily switch back to the previous environment without causing any issues. This can be done using a simple <a target="_blank" href="https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases">Git alias command</a>, which allows you to create shortcuts for longer Git commands. By setting up an alias, you can quickly and efficiently switch between branches without having to remember the full command each time. This not only saves time but also reduces the chance of making errors when typing out commands. Here’s how you can set up a Git alias to switch back to the <code>master</code> branch:</p>
<pre><code class="lang-sh">git config --global alias.switch-master <span class="hljs-string">'checkout master'</span>
</code></pre>
<p>With this alias in place, you can simply type <code>git switch-master</code> to switch to the <code>master</code> branch.</p>
<p>You can also create a symbolic ref so that <code>master</code> points at <code>main</code>, for example:</p>
<pre><code class="lang-bash">git symbolic-ref refs/heads/master refs/heads/main
</code></pre>
<p>This method ensures that you can seamlessly navigate between the <code>main</code> and <code>master</code> branches, maintaining your workflow without any disruptions.</p>
<p>This tool won’t replace everything, but it can help you use old commands like:</p>
<pre><code class="lang-bash">git checkout master

git rebase -i master

git diff master
</code></pre>
<hr />
<p><em>Thank you for reading this article! If you're interested in DevOps, Security, or Leadership for your startup, feel free to reach out at</em> <strong><em>hi@iamkaustav.com</em></strong> <em>or book a slot in</em> <a target="_blank" href="https://calendly.com/iamkaustav/30min"><em>my calendar</em></a><em>. Don't forget</em> <em>to subscribe to my newsletter for more insights on my security and product development journey. Stay tuned for more posts!</em></p>
]]></content:encoded></item><item><title><![CDATA[How to Monitor Custom IAM Users in Your AWS Organization]]></title><description><![CDATA[Managing AWS accounts is one of the most challenging jobs for any Platform / Security team. While maintaining availability and security, monitoring who is getting access to AWS accounts is essential.

Let’s assume the CTO of PlatformSecurityTech (an ...]]></description><link>https://notes.iamkaustav.com/how-to-monitor-custom-iam-users-in-your-aws-organization</link><guid isPermaLink="true">https://notes.iamkaustav.com/how-to-monitor-custom-iam-users-in-your-aws-organization</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Config]]></category><category><![CDATA[Security]]></category><category><![CDATA[cloudsecurity]]></category><category><![CDATA[IAM]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Mon, 17 Jun 2024 04:00:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718452172893/569ad208-851b-41b3-86b6-d7b42fdac6eb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Managing AWS accounts is one of the most challenging jobs for any Platform / Security team. While maintaining availability and security, monitoring who is getting access to AWS accounts is essential.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718450252285/57259e6a-78e7-474b-9a7e-7d4cd056e955.png" alt="Sample AWS organization accounts" class="image--center mx-auto" /></p>
<p>Let’s assume the CTO of <strong>PlatformSecurityTech</strong> (<em>an imaginary IT company</em>) asked their Platform team to create a few accounts for their product offerings. Right after the request, the team set up 3 accounts per the above diagram and handed them to the company. The team used AWS Identity Center to manage its users securely and maintain further security.</p>
<p>Since users of <strong>Account A</strong> and <strong>Account B</strong> had unrestrictive permissions for their day-to-day needs (<em>it could be a lack of understanding of</em> <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html"><em>programmatic access</em></a> <em>through SSO</em>), many IAM users were created. Among them, a few were not even used for months 🤔. It was getting difficult to maintain their usage. The team decided to perform bi-weekly audits on existing IAM users in different accounts and later monitor them.</p>
<p>It’s easy if you have one or two accounts to monitor. But in practice, there are usually many of them. Going through each one is not only tiring but also time-consuming. They decided to go ahead with <a target="_blank" href="https://docs.aws.amazon.com/config/latest/developerguide/querying-AWS-resources.html">Advanced queries</a> in <a target="_blank" href="https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html">AWS Config</a> to increase productivity and provide a seamless experience in monitoring those IAM users.</p>
<p>Given they restructured the organization as below,</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718450766237/0175d695-6fef-4b04-82fa-fa270c8b1d97.png" alt class="image--center mx-auto" /></p>
<p>Here, they created another AWS account called <strong>Delegated Admin</strong>, which manages all the administrative work (a non-root account). Due to its nature, this new account is connected with all other accounts and can fetch data and perform specific activities in those accounts. Assuming <a target="_blank" href="https://docs.aws.amazon.com/config/latest/developerguide/gs-console.html">AWS Config</a> is configured in the new account, you can now observe your account’s activity. Now, to check the existing IAM users,</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*AQflGMiEgKilC55cqUCtIQ.png" alt="AWS Config Dashboard" class="image--center mx-auto" /></p>
<p>Go to https://console.aws.amazon.com/config/ for the AWS Config dashboard. There, you will find a similar-looking sidebar. Now, go to <strong>Advanced queries</strong> from the sidebar.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*xI6nxLnssin7g-i6U-KR_g.png" alt="AWS Config — Advance Query dashboard" class="image--center mx-auto" /></p>
<p>You will find something similar to the above page, which might be pre-populated with a few queries through the team-configured conformance pack. Click on the new query option and paste the following query into the query editor.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
  accountId,
  resourceName,
  resourceId,
  resourceCreationTime
<span class="hljs-keyword">WHERE</span>
  resourceType = <span class="hljs-string">'AWS::IAM::User'</span>
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span>
  accountId
</code></pre>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*nmHUXJKnswo7i6JAHkoV5A.png" alt="AWS Config — Advance Query Editor" class="image--center mx-auto" /></p>
<p>Running this query will show the list of IAM users in the AWS organization, including the other details mentioned in the selected query. Once the team collects the data from their audits, they can contact the creator of those IAM users for their needs and perform the corrections accordingly. One can export the data to any sharable format for frequent updates.</p>
<hr />
<p><em>Thank you for reading this article! If you're interested in DevOps, Security, or Leadership for your startup, feel free to reach out at</em> <a target="_blank" href="mailto:hi@iamkaustav.com"><em>hi@iamkaustav.com</em></a> <em>or book a slot in</em> <a target="_blank" href="https://calendly.com/iamkaustav/30min"><em>my calendar</em></a><em>. Don't forget to subscribe to my newsletter for more insights on my security and product development journey. Stay tuned for more posts!</em></p>
]]></content:encoded></item><item><title><![CDATA[Parallel Function Execution in Go Using Concurrency]]></title><description><![CDATA[Introduction
As part of my exploration of Golang, I came across a popular feature: first-class support for concurrency. I believe we all understand the benefit or importance of concurrency. In the HTTP way, when an endpoint needs to fetch data from m...]]></description><link>https://notes.iamkaustav.com/parallel-function-execution-in-go-using-concurrency</link><guid isPermaLink="true">https://notes.iamkaustav.com/parallel-function-execution-in-go-using-concurrency</guid><category><![CDATA[Go Language]]></category><category><![CDATA[concurrency]]></category><category><![CDATA[parallelism]]></category><category><![CDATA[goroutines]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[basics of golang]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Sat, 15 Jun 2024 10:35:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718447044558/9685f74f-dd6e-4320-bf0f-765f65504949.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>As part of my exploration of Golang, I came across a popular feature: first-class support for concurrency. I believe we all understand the benefit or importance of concurrency. In an HTTP service, when an endpoint needs to fetch data from multiple <a target="_blank" href="https://en.wikipedia.org/wiki/Upstream_server">upstreams</a>, aggregate the data, and return it as a response, Go concurrency helps to reduce the latency of that API request. Two features in Go, <a target="_blank" href="https://golangdocs.com/goroutines-in-golang">goroutines</a> and <a target="_blank" href="https://golangdocs.com/channels-in-golang">channels</a>, make concurrency easier when used together.</p>
<h2 id="heading-goroutines-example-run-functions-in-parallel">Goroutines example: Run functions in parallel</h2>
<p>Modern computers are equipped with processors, or <a target="_blank" href="https://en.wikipedia.org/wiki/Central_processing_unit">CPUs</a>, designed to efficiently handle multiple streams of code simultaneously. These processors are built with one or more "cores," each capable of running one code stream at a given time. To fully utilize the speed boost <a target="_blank" href="https://en.wikipedia.org/wiki/Multi-core_processor">multiple cores</a> offer, programs must be able to split into various streams of code. This division can be challenging, but Go was explicitly developed to simplify this process.</p>
<p>Go achieves this through a feature known as <em>goroutines</em>, special functions that can run alongside other goroutines. When a program is built to execute multiple streams of code simultaneously, it operates <a target="_blank" href="https://en.wikipedia.org/wiki/Concurrency_(computer_science)">concurrently</a>. Unlike traditional foreground operations, in which a function runs to completion before the following code executes, goroutines allow for background processing, enabling the following code to run while the goroutine is still active. This background operation ensures that the code doesn't block other processes from running.</p>
<p>Goroutines provide the advantage of running on separate processor cores simultaneously. For instance, if a computer has four processor cores and a program has four goroutines, all four can run concurrently. This simultaneous execution of multiple code streams on different cores is called <a target="_blank" href="https://en.wikipedia.org/wiki/Parallel_computing"><em>parallel</em></a> processing.</p>
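<p>As a quick aside, you can check how much parallelism is available on your machine with a sketch like this one. <code>parallelism</code> is a hypothetical helper name; <code>runtime.NumCPU</code> and <code>runtime.GOMAXPROCS</code> are from the standard library, and calling <code>GOMAXPROCS</code> with <code>0</code> only reads the current setting.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "runtime"
)

// parallelism reports the number of logical CPUs and the maximum number of
// CPUs that can execute Go code at the same time.
func parallelism() (int, int) {
    return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
    cpus, procs := parallelism()
    fmt.Println("logical CPUs:", cpus)
    fmt.Println("GOMAXPROCS:", procs)
}
</code></pre>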
<p>Jumping into the example, create a directory named <code>go-concurrency-project</code> and move into it.</p>
<pre><code class="lang-bash">mkdir go-concurrency-project
<span class="hljs-built_in">cd</span> go-concurrency-project
</code></pre>
<p>Once you’re in the <code>go-concurrency-project</code> directory, open a file named <code>main.go</code> using <code>nano</code>, or the editor of your choice:</p>
<pre><code class="lang-bash">nano main.go
</code></pre>
<p>Add the following code to the <code>main.go</code> file,</p>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">make</span><span class="hljs-params">(total <span class="hljs-keyword">int</span>)</span></span> {
    number := <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> number &lt; total {
        number = number + <span class="hljs-number">1</span>
        fmt.Printf(<span class="hljs-string">"Generated number %d\n"</span>, number)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">print</span><span class="hljs-params">()</span></span> {
    number := <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> number &lt; <span class="hljs-number">2</span> {
        number = number + <span class="hljs-number">1</span>
        fmt.Printf(<span class="hljs-string">"Print: number %d\n"</span>, number)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-built_in">print</span>()
    <span class="hljs-built_in">make</span>(<span class="hljs-number">2</span>)
}
</code></pre>
<p>Based on the above setup, the <code>make</code> and <code>print</code> functions run in sequence. <code>make</code> accepts the number to generate up to, while <code>print</code> always prints exactly two numbers.</p>
<p>This is what it looks like when we execute <code>main.go</code>:</p>
<pre><code class="lang-powershell">go run main.go

// Output
Print: number <span class="hljs-number">1</span>
Print: number <span class="hljs-number">2</span>
Generated number <span class="hljs-number">1</span>
Generated number <span class="hljs-number">2</span>
</code></pre>
<p>If you notice, the function printed the output in sequence based on its execution pattern.</p>
<p>When running two functions <em>synchronously</em>, the program takes the <strong>total time</strong> for both functions to run. But if the functions are independent, you can speed up the program by running them concurrently using <a target="_blank" href="https://go.dev/doc/effective_go#goroutines">goroutines</a>, potentially cutting the time in half. To run a function as a goroutine, use the <code>go</code> keyword before the function call. However, you need to add a way for the program to wait until both goroutines have finished; otherwise <code>main</code> may exit before they complete.</p>
<p>To synchronize functions and wait for them to finish in Go, you can use a <a target="_blank" href="https://pkg.go.dev/sync#WaitGroup"><code>WaitGroup</code></a> from the <a target="_blank" href="https://pkg.go.dev/sync"><code>sync</code></a> package. The <code>WaitGroup</code> primitive counts how many things it needs to wait for using the <code>Add</code>, <code>Done</code>, and <code>Wait</code> methods. <code>Add</code> increases the count, <code>Done</code> decreases the count, and <code>Wait</code> blocks until the count reaches zero.</p>
<p>To do that update <code>main.go</code>,</p>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"sync"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">make</span><span class="hljs-params">(total <span class="hljs-keyword">int</span>, wg *sync.WaitGroup)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()

    number := <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> number &lt; total {
        number = number + <span class="hljs-number">1</span>
        fmt.Printf(<span class="hljs-string">"Generated number %d\n"</span>, number)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">print</span><span class="hljs-params">(wg *sync.WaitGroup)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()

    number := <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> number &lt; <span class="hljs-number">2</span> {
        number = number + <span class="hljs-number">1</span>
        fmt.Printf(<span class="hljs-string">"Print: number %d\n"</span>, number)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Add(<span class="hljs-number">2</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-built_in">print</span>(&amp;wg)
    <span class="hljs-keyword">go</span> <span class="hljs-built_in">make</span>(<span class="hljs-number">2</span>, &amp;wg)

    fmt.Println(<span class="hljs-string">"Awaiting...."</span>)
    wg.Wait()
    fmt.Println(<span class="hljs-string">"Done!"</span>)
}
</code></pre>
<p>After declaring the <code>WaitGroup</code>, specify how many goroutines to wait for. In the example, the <code>main</code> function waits for two <code>Done</code> calls before finishing. If the count is not set before starting the goroutines, things might happen out of order, or the code may panic because <code>wg</code> doesn't know if it should wait for any <code>Done</code> calls.</p>
<p>Each function will use <code>defer</code> to call <code>Done</code>, which decreases the count by one after the function finishes. The <code>main</code> function is updated to include a call to <code>Wait</code> on the <code>WaitGroup</code>. This ensures that the <code>main</code> function waits until both functions call <code>Done</code> before continuing and exiting the program.</p>
<p>After saving your <code>main.go</code> execute the file,</p>
<pre><code class="lang-bash">go run main.go

// Output
Awaiting....
Generated number 1
Generated number 2
Print: number 1
Print: number 2
Done!
</code></pre>
<p>Your output may vary each time you run the program. With both functions running concurrently, the output depends on how much time Go and your operating system allocate to each function. Sometimes each function runs to completion in turn, and you'll see their complete sequences. Other times, the lines will be interleaved.</p>
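<p>To make the ordering point concrete, here is a minimal sketch, separate from the example above. The <code>runWorkers</code> helper and the worker names are hypothetical. The order of the collected lines can differ between runs, but because <code>Wait</code> blocks until every goroutine has called <code>Done</code>, the total number of lines is always the same.</p>
<pre><code class="lang-go">package main

import (
	"fmt"
	"sync"
)

// runWorkers (hypothetical helper) starts one goroutine per name. Each
// goroutine appends one line per number to a shared slice guarded by a
// mutex, and the function returns only after every goroutine has finished.
func runWorkers(names []string, numbers []int) []string {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		lines []string
	)

	wg.Add(len(names)) // set the counter before starting any goroutine
	for _, name := range names {
		go func(name string) {
			defer wg.Done() // decrement the counter when this goroutine returns
			for _, n := range numbers {
				mu.Lock()
				lines = append(lines, fmt.Sprintf("%s: number %d", name, n))
				mu.Unlock()
			}
		}(name)
	}

	wg.Wait() // blocks until every goroutine has called Done
	return lines
}

func main() {
	lines := runWorkers([]string{"Generated", "Print"}, []int{1, 2})
	for _, line := range lines {
		fmt.Println(line)
	}
	fmt.Println("total lines:", len(lines))
}
</code></pre>
<p>The individual lines may appear in a different order on each run, while the total stays at four.</p>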
<h2 id="heading-conclusion">Conclusion</h2>
<p>If you’re interested in learning more about concurrency in Go, the <a target="_blank" href="https://golang.org/doc/effective_go#concurrency">Effective Go</a> document created by the Go team provides much more detail. The <a target="_blank" href="https://go.dev/blog/waza-talk">Concurrency is not parallelism</a> Go blog post is also an exciting follow-up about the relationship between concurrency and parallelism. These two terms are sometimes mistakenly thought to mean the same thing.</p>
<hr />
<p><em>Thank you for reading this article! If you're interested in DevOps, Security, or Leadership for your startup, feel free to reach out at</em> <a target="_blank" href="mailto:hi@iamkaustav.com"><em>hi@iamkaustav.com</em></a> <em>or book a slot in</em> <a target="_blank" href="https://calendly.com/iamkaustav/30min"><em>my calendar</em></a><em>. Don't forget to subscribe to my newsletter for more insights on my security and product development journey. Stay tuned for more posts!</em></p>
]]></content:encoded></item><item><title><![CDATA[The Crucial Role of DevOps Security in the Success of Early-Stage Startups]]></title><description><![CDATA[Are you ready to dive into the dynamic world of early-stage startups? Picture the scene: a few passionate engineers gathered at a co-working space, fueled by coffee and the thrill of bringing their disruptive ideas to life. It's an exhilarating ride ...]]></description><link>https://notes.iamkaustav.com/the-crucial-role-of-devops-security-in-the-success-of-early-stage-startups</link><guid isPermaLink="true">https://notes.iamkaustav.com/the-crucial-role-of-devops-security-in-the-success-of-early-stage-startups</guid><category><![CDATA[Devops]]></category><category><![CDATA[Security]]></category><category><![CDATA[DevSecOps]]></category><category><![CDATA[leadership]]></category><category><![CDATA[Startups]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[securityawareness]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Sat, 02 Mar 2024 11:09:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1709377651580/d2787fa5-b290-4a67-9925-fe09e4ca55a5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Are you ready to dive into the dynamic world of early-stage startups? <em>Picture the scene</em>: a few passionate engineers gathered at a co-working space, fueled by coffee and the thrill of bringing their disruptive ideas to life. It's an exhilarating ride but one that's not without risks, especially when it comes to cybersecurity. From data breaches that can cripple a startup's reputation to regulatory fines that can drain its resources, the dangers are real and daunting.</p>
<p>But fear not, my friend! In this blog post series, we will explore the exciting and essential world of DevOps security and how it's the key to safeguarding the success of early-stage startups. We'll take a deep dive into the unique challenges these startups face, the potential consequences of neglecting security, and the actionable steps they can take to mitigate risks and thrive in an ever-evolving market. So, let's grab a cup of coffee, buckle up, and join me on this thrilling journey where innovation meets safeguarding and where startups can unlock their full potential while staying one step ahead of the game. Let's go!</p>
<h2 id="heading-what-is-devops-security">What is DevOps Security?</h2>
<p>DevOps security, often called <a target="_blank" href="https://aws.amazon.com/what-is/devsecops/"><strong>DevSecOps</strong></a>, is a crucial aspect of any early-stage startup's operations. With the integration of development and operations teams (DevOps), security becomes an inherent part of the entire cycle rather than an afterthought. This approach allows startups to innovate rapidly while minimising the potential risks associated with cybersecurity threats. This balance is crucial in today's digital age, where data breaches and cyber threats can cause significant damage to a company's reputation and resources. In the following sections, we will delve deeper into the importance of DevOps security in the landscape of early-stage startups.</p>
<h3 id="heading-principles-to-secure-product-environments">Principles to secure product environments.</h3>
<p><strong>Collaboration</strong>, <strong>automation</strong>, and <strong>continuous monitoring</strong> enhance an organisation's security posture.</p>
<h4 id="heading-collaboration">Collaboration</h4>
<p>Collaboration integrates security considerations into every stage of software development, reducing overlooked vulnerabilities. Infrastructure experts and product teams need a solid partnership to handle critical security issues, e.g., <em>password strength for authentication, access control on a cloud provider, a security key accidentally pushed to VCS, etc</em>.</p>
<h4 id="heading-automation">Automation</h4>
<p>Automation speeds up processes, minimises errors, and enables early detection of vulnerabilities by incorporating security checks and tests into the development pipeline. <em>One of the simplest examples is Dependabot alerts on GitHub. A good first step is to fix them as soon as they arise.</em></p>
<h4 id="heading-continuous-monitoring">Continuous monitoring</h4>
<p>Continuous monitoring allows for ongoing assessment of vulnerabilities and immediate response to issues, minimising opportunities for attackers.</p>
<p>These DevSecOps principles strengthen defences, mitigate risks, and contribute to a startup's success.</p>
<h2 id="heading-story-time-challenge-faced-by-an-early-stage-startup">Story Time: Challenge faced by an early-stage startup</h2>
<p>A few years back, I was collaborating with a startup in the healthcare space. They were not new to the business but had a young engineering department. This company (let's call it <strong><em>Meditech INV</em></strong>; <em>it is a made-up name, unrelated to any past or existing company</em>) built a modern analytics and reporting SaaS platform. The platform catered to hospitals in a government program that provides healthcare benefits to a selected patient group. The market already had veteran players with relatively dated technology. Meditech INV wanted to establish itself as a cutting-edge SaaS provider in its <a target="_blank" href="https://en.wikipedia.org/wiki/Total_addressable_market">TAM</a>.</p>
<p>One fine morning, news broke that one of Meditech INV's competitors, a market leader, had experienced a data breach. In healthcare, different countries have different laws and penalties for such breaches. The breach destroyed the competitor's reputation, and within a few days they started losing existing clients. This was an opportunity for Meditech INV to acquire those clients almost for free (without any <a target="_blank" href="https://en.wikipedia.org/wiki/Customer_acquisition_cost">CAC</a>). However, the incident changed the market's mood, as clients now worried about their patients' data. They needed SaaS platforms to demonstrate their security practices before moving forward with a contract. Even after 6 months in the market and working with 5-6 big hospitals, our product &amp; infrastructure didn't have enough security observability. That means if we had a data breach, we would barely know what had happened (however, <em>we had pretty strong network rules in place since the beginning</em>). It also meant we could lose all of the business that was organically coming our way.</p>
<p>It was a great weekend challenge. My phone rang. Thirty minutes later, I was in front of my laptop to hack around security. We were setting up our first security modules 6 months after going live. I remember our only principal engineer &amp; I spent a precious Saturday setting up security tools on our AWS infrastructure (6 AWS accounts back then) and polishing them on Sunday. We were rushing towards the new opportunities while carrying the fear that we might lose our existing clients if we couldn't show them some visualisation, which could put us out of business and close offices with 30+ engineers and 400+ people in other roles. Scary, right?</p>
<h3 id="heading-the-impact">The Impact</h3>
<p>This situation created panic among the senior leadership. The days were exciting but stressful, and I think that's what happens when you delay your security practices. We took these four primary measures:</p>
<ul>
<li><p>Audit every activity across AWS as a preventive measure.</p>
</li>
<li><p>Introduce compliance checks across all AWS accounts.</p>
</li>
<li><p>Guard AWS so that we know when malicious activity has happened.</p>
</li>
<li><p>Build an informative dashboard with a security score that we can present to our existing and new clients to gain their trust.</p>
</li>
</ul>
<p>Over a few weeks, the veteran company went out of business, and here we are with $400M+ revenue and 30+ new clients. We are looking forward to acquiring a renowned company. This incident taught us a few things:</p>
<ul>
<li><p>Security practices are vital from the beginning, not at the end of your feature development.</p>
</li>
<li><p>Allocate some budget in every sprint to improve current security conditions. <em>You don't know when opportunity will knock on your door</em>.</p>
</li>
<li><p>When a tech organisation grows with more than 2 products at scale, you need dedicated security experts. It can be in the same Platform team or a separate team.</p>
</li>
<li><p>Most importantly, in a startup, and <em>when you are in some answerable position, don't keep any advance plans for weekends :P</em>.</p>
</li>
</ul>
<h2 id="heading-the-conclusion">The Conclusion</h2>
<p>Early-stage startups must prioritise security from the get-go. As they navigate the complexities of DevSecOps, seeking guidance from experienced professionals can be immensely beneficial. By doing so, startups will foster a security-first culture that will protect their venture and boost its credibility and trustworthiness in the eyes of stakeholders and customers. So, never underestimate the power of security.</p>
<hr />
<p><em>I help startups with DevOps, Security &amp; Leadership. If you are interested in or have a requirement for any of them, or are starting a startup and want a quick conversation, let's</em> <a target="_blank" href="https://calendly.com/iamkaustav/30min"><em>book time on my calendar</em></a> <em>or don't hesitate to contact me at</em> <a target="_blank" href="mailto:hi@iamkaustav.com"><em>hi@iamkaustav.com</em></a><em>.</em></p>
<p><em>There will be a few more posts about my journey with security &amp; product development. Please subscribe to my newsletter. Thanks for sticking with me so far :)</em></p>
]]></content:encoded></item><item><title><![CDATA[An experience with Golang as a Backend Engineer]]></title><description><![CDATA[In the emerging world where apps must be performed fast, process heavy loads of data and build an amazing ecosystem, Golang is at the most forward into the race. It's known for its speed, simplicity and scalability, Golang is helping developers redef...]]></description><link>https://notes.iamkaustav.com/an-experience-with-golang-as-a-backend-engineer</link><guid isPermaLink="true">https://notes.iamkaustav.com/an-experience-with-golang-as-a-backend-engineer</guid><category><![CDATA[Go Language]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[backend developments]]></category><category><![CDATA[Freelancing]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Mon, 16 Oct 2023 20:03:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1697486235286/ca8652eb-757c-4f15-a256-7d18910bd562.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a world where apps must perform fast, process heavy loads of data and sit within a rich ecosystem, <strong>Golang</strong> is at the front of the race. Known for its speed, simplicity and scalability, Golang is helping developers redefine the way they architect backend services.</p>
<p>In my experience, starting to work with Golang after almost a decade with NodeJS and JVM-based languages was a significant step forward in my coding journey. Coming from a DevOps and automation background, I found that Golang puts a strong focus on how you define the architecture of your project.</p>
<h1 id="heading-the-speed">The speed</h1>
<p>The speed of Golang can be attributed to several factors:</p>
<ol>
<li><p><strong>The Compilation Factor:</strong> Unlike VM-dependent languages, Golang compiles directly to a native binary, so there is no virtual machine layer at runtime, and the fast compiler keeps the development loop short.</p>
</li>
<li><p><strong>Automated Garbage Collection</strong>: Golang handles garbage collection automatically with a concurrent collector designed for short pauses, so memory management rarely gets in the way of throughput.</p>
</li>
<li><p><strong>Concurrency First:</strong> This beautiful language is designed with high concurrency in mind, which makes it a good fit for data-heavy operations. Lightweight goroutines communicate through channels, letting program segments coordinate cheaply and resulting in a highly performant application.</p>
</li>
</ol>
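<p>As a rough illustration of the concurrency point, here is a minimal sketch of one goroutine handing values to another part of the program over a channel. The <code>squares</code> and <code>sum</code> functions are hypothetical names chosen for this example.</p>
<pre><code class="lang-go">package main

import "fmt"

// squares (hypothetical producer) sends the square of each input on the
// returned channel from its own goroutine, then closes the channel.
func squares(inputs []int) &lt;-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range inputs {
			out &lt;- n * n
		}
	}()
	return out
}

// sum drains the channel until it is closed and adds up what it receives.
func sum(values &lt;-chan int) int {
	total := 0
	for v := range values {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(squares([]int{1, 2, 3}))) // prints 14
}
</code></pre>
<p>The producer and the consumer never share memory directly. The channel is the only point of contact, which is what keeps this style easy to reason about.</p>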
<p>So far in my career, running tests and building took ~5-7 minutes for large products; bringing that down to 3 minutes was a big achievement, and we were okay with it in JVM-based languages. It took my breath away to see unit and integration tests complete in a few seconds.</p>
<p>These factors combined make Golang a highly optimized language, capable of executing tasks quickly and efficiently.</p>
<h1 id="heading-learning-and-development">Learning and Development</h1>
<ol>
<li><p><strong>A Small Set of Features:</strong> Golang has a minimal set of core features to get the job done, which flattens the learning curve and helps you onboard quickly.</p>
</li>
<li><p><strong>The Clean Code Approach</strong>: Although it depends on how you write it, Golang code tends to be clean and easy to understand. This simplifies debugging and improving code written by others.</p>
</li>
<li><p><strong>Inline Documentation</strong>: Code can be documented with doc comments, Go's take on <a target="_blank" href="https://en.wikipedia.org/wiki/Docstring">docstrings</a>, which are simple for beginners to remember.</p>
</li>
</ol>
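<p>For the inline documentation point, a doc comment in Go is just a regular comment placed directly above a declaration, conventionally starting with the name it describes. The sketch below uses a hypothetical <code>Celsius</code> type to show the shape; the <code>go doc</code> tool reads comments written this way.</p>
<pre><code class="lang-go">// Command tempconv shows Go doc comments on a small, hypothetical converter.
package main

import "fmt"

// Celsius is a temperature in degrees Celsius.
type Celsius float64

// ToFahrenheit converts c to degrees Fahrenheit.
//
// The comment starts with the name it documents, which is the convention
// the Go documentation tooling relies on.
func (c Celsius) ToFahrenheit() float64 {
	return float64(c)*9/5 + 32
}

func main() {
	fmt.Println(Celsius(100).ToFahrenheit()) // prints 212
}
</code></pre>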
<p>Due to the above key reasons and plenty of others around learning and development, Golang is now one of the popular choices for building core automation and heavy backend work.</p>
<h1 id="heading-effective-compiled-language">Effective Compiled Language</h1>
<ol>
<li><p><strong>Easy to Compile</strong>: Compiling a Golang program produces a self-sufficient executable binary. This executable can run on the target system without installing anything additional.</p>
</li>
<li><p><strong>Performance</strong>: Being a compiled language, Golang is typically much faster than interpreted languages.</p>
</li>
<li><p><strong>Statically Typed</strong>: Golang is a statically typed language. The compiler checks the type of every variable at compile time, which can lead to more efficient code and fewer runtime errors.</p>
</li>
</ol>
<h1 id="heading-what-happened-when-i-picked-it"><strong>What happened when I picked it?</strong></h1>
<p>I was super critical and concerned when I picked Golang. With pointers, structs and more to learn, and teammates who had been working with Golang for 5+ years, it was quite challenging to bring myself up to speed. Taking small steps and letting go of my past knowledge and restrictions helped me fall in love with this beautiful language and its ecosystem.</p>
<p>Practising TDD, clean code, and refactoring while learning key features like goroutines and channels was the icing on the cake, and that's what makes Golang stand out from other languages.</p>
<h1 id="heading-conclusion"><strong>Conclusion</strong></h1>
<p>I was an extreme <strong>NodeJS</strong> fan when it came to quick development, and I relied on JVM-based languages for heavy data processing API services, until I met Golang. Our paths crossed during my experiments with Terraform providers. However, when I started writing REST APIs in Go, it gave me a different experience, and it is now quite difficult to move back to other languages unless there is a strong need.</p>
<p>It is a powerful language that is reshaping backend development. Its speed, simplicity, and scalability make it an excellent choice for developers looking to build efficient and robust applications. With concurrency, garbage collection, networking APIs, and speed, Golang provides a comprehensive toolset for backend development. As we continue to see the rise of Golang in the tech industry, one thing is clear: Golang is an experience that every backend developer wants to have.</p>
<hr />
<p><em>Thanks for reading the article. If you are interested in building something exciting together, let's</em> <a target="_blank" href="https://calendly.com/iamkaustav/30min"><em>have a chat</em></a><em>. You can also find the services that I provide</em> <a target="_blank" href="https://iamkaustav.com/#service"><em>here</em></a><em>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Deep Drive to Production Readiness Checklist]]></title><description><![CDATA[As software development gets increasingly complex, moving from monolith to the distributed system, It's now essential to have a detailed checklist before we go live with our product.
The Story
Before we jump in into the list, let me tell you a story ...]]></description><link>https://notes.iamkaustav.com/deep-drive-to-production-readiness-checklist</link><guid isPermaLink="true">https://notes.iamkaustav.com/deep-drive-to-production-readiness-checklist</guid><category><![CDATA[Productivity]]></category><category><![CDATA[best practices]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Kaustav Chakraborty]]></dc:creator><pubDate>Thu, 03 Aug 2023 18:45:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1691087055876/f0259943-79b9-4c53-9f12-a8686ce69599.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As software development gets increasingly complex, moving from <a target="_blank" href="https://martinfowler.com/bliki/MonolithFirst.html">monolith</a> to the <a target="_blank" href="https://www.atlassian.com/microservices/microservices-architecture/distributed-architecture">distributed system</a>, It's now essential to have a detailed checklist before we go live with our product.</p>
<h1 id="heading-the-story">The Story</h1>
<p>Before we jump into the list, let me tell you a story from my experience:</p>
<p>Seeing how startups build their product with limited developers is always interesting. I was part of a startup that didn't have a QA (<a target="_blank" href="https://en.wikipedia.org/wiki/Software_quality_assurance_analyst"><em>Quality Analyst</em></a>) initially. As a company, we were trying to move from a <a target="_blank" href="https://en.wikipedia.org/wiki/Brick_and_mortar">brick-and-mortar</a> business to a digital company with our very first SaaS product. It's always a risk because it will be an initial setback if the first product fails. The developer group ensured we tested each feature properly before going live. A few days before THE SPECIAL DAY, we drafted a detailed checklist to ensure we considered all possible items to support our go-live, and only if all of them were marked positive would we make the production deployment. This process helped us successfully deploy the application, mark the product as a massive success for the organization, and support the product as a monolith with 30+ developers and 3+ teams working on it for over 2 years. We used to call them <strong>production checklists</strong>.</p>
<h1 id="heading-production-readiness-checklist">Production Readiness Checklist</h1>
<p>From <a target="_blank" href="https://www.opslevel.com/resources/production-readiness-in-depth">OpsLevel</a>,</p>
<blockquote>
<p><em>Production readiness checklists can reduce the cognitive load of having to remember all the different vulnerability and failure points we need to consider in our complex landscape.</em></p>
</blockquote>
<p>To simplify it, the Production Readiness Checklist is the universal language for your team to answer, "<em>Is the product ready for launch?</em>" However, the checklist may vary between domains or teams, and different tiers of an application may have different checklists. It gives the team a clear starting point for reviewing pending items before go-live.</p>
<p>Production Readiness Checklists can be opinionated. Below are my recommendations based on the nature of the products I have dealt with.</p>
<h2 id="heading-general">General</h2>
<ol>
<li><p><strong>Ownership:</strong> Ownership of the service is clearly defined. This can be achieved by having the <a target="_blank" href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners">CODEOWNERS</a> file on your Git project or assigning <a target="_blank" href="https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors">contributors</a> to a repository.</p>
</li>
<li><p><strong>Documentation:</strong> This can cover multiple items, such as API documentation, development steps, a README, onboarding instructions for the service and FAQs. Once the service is mature enough, any significant defect that is found can be documented in a troubleshooting guide, preferably kept in the repository.</p>
</li>
<li><p><strong>SLA / SLO:</strong> Since this product was a customer-facing SaaS product, it was highly important to provide a clear SLA (<a target="_blank" href="https://www.freshworks.com/freshdesk/sla/"><em>Service Level Agreement</em></a>) to the customer and SLOs (<a target="_blank" href="https://sre.google/sre-book/service-level-objectives/">Service Level Objectives</a>) for the internal teams. Depending on the business, a violation of the SLA can impact the company's revenue, so it is important to commit to a realistic SLA rather than an optimistic one.</p>
</li>
</ol>
<h2 id="heading-data-management">Data Management</h2>
<ol>
<li><p><strong>Persistency:</strong> Ensure a resiliency setup is in place so that the failure of a data source does not cause a loss of performance. It must have continuous backups and a one-click restore mechanism.</p>
</li>
<li><p><strong>Scaling:</strong> Have a strategy to scale the data source under peak load.</p>
</li>
</ol>
<h2 id="heading-testing">Testing</h2>
<ol>
<li><p><strong>Unit Tests:</strong> Unit tests are highly important for any service to operate with an additional safety net.</p>
</li>
<li><p><strong>Integration Tests:</strong> When one is working with multiple components inside a single repository, <em>for example, we had a multi-module structure connecting Controller, Service and Repository layers</em>, it is important to have integration tests covering the contracts.</p>
</li>
<li><p><strong>E2E Acceptance Test:</strong> End-to-end <em>(E2E)</em> tests are costly but extremely helpful when you are short on people. When we were about to go live with our first release, we drafted a basic E2E test covering all of the business-critical components.</p>
</li>
<li><p><strong>Operational Testing:</strong> There are a few optional business or customer-level tests that can be added as an extra safety check for your application to avoid any unpleasant circumstances. Tests for poor customer experience, load tests, and sanity checks help to make your product operation-ready.</p>
</li>
</ol>
<p>It is recommended to execute these tests periodically on your CI/CD environments.</p>
<blockquote>
<p>It is important to structure your automated tests well to get the most benefit. Here are a few articles with ideas based on industry best practices:</p>
<ul>
<li><p><a target="_blank" href="https://www.thoughtworks.com/insights/blog/guidelines-structuring-automated-tests">Guidelines for Structuring Automated Tests</a></p>
</li>
<li><p><a target="_blank" href="https://martinfowler.com/bliki/IntegrationTest.html">Integration test by Martin Fowler</a></p>
</li>
</ul>
</blockquote>
<h2 id="heading-deployment">Deployment</h2>
<ol>
<li><p><strong>Continuous integration:</strong> Each repository must have its own automated pipeline. When an engineer pushes changes, it runs the tests, builds, static code analysis, security checks etc. Based on the scale of the application, the team can decide what kind of automated steps they want to have in their pipeline.</p>
</li>
<li><p><strong>Continuous Delivery:</strong> Depending upon organization standards or repository needs, one may promote deployments from lower-level environments to higher environments. As a standard, CD should execute after CI tasks (<em>i.e. test, lint and build</em>). Single-click deployments are quite popular and make it easy to release new changes and roll back applications. <a target="_blank" href="https://keepachangelog.com/en/1.0.0/">Changelog</a> and <a target="_blank" href="https://www.tutorialspoint.com/software_testing_dictionary/release_note.htm">Release Notes</a> are mandatory after each production deployment.</p>
</li>
</ol>
<h2 id="heading-operational-excellence">Operational Excellence</h2>
<ol>
<li><p><strong>Escalation Strategy:</strong> Escalation strategy itself is a big topic, but to keep it short, there are a few things one needs to set for each product/service,</p>
<ol>
<li><p>On-Call Rotation</p>
</li>
<li><p>Incident Management with practices for short-term and long-term remediation</p>
</li>
<li><p>Post Mortem of incidents</p>
</li>
</ol>
</li>
<li><p><strong>Runbooks:</strong> Runbooks are essential to understanding known failures and their quick remediations. Ensure they are kept up to date.</p>
</li>
<li><p><strong>Observability:</strong> This contains multiple touchpoints,</p>
<ol>
<li><p>Logging</p>
</li>
<li><p>Tracing</p>
</li>
<li><p>Metrics</p>
</li>
<li><p>Customer Impact / Operational Dashboard</p>
</li>
<li><p>SLO/SLA dashboard</p>
</li>
<li><p>Error Budget dashboard</p>
</li>
</ol>
</li>
</ol>
<h2 id="heading-security">Security</h2>
<ol>
<li><p><strong>Authentication/authorization:</strong> Ensure authentication and authorization strategies are in place for any public-facing API, whether it is consumed by internal or external apps.</p>
</li>
<li><p><strong>Secrets:</strong> It is recommended to use a secret management store to keep all secret information such as database passwords. Recommendations are <a target="_blank" href="https://www.vaultproject.io/">Vault</a>, AWS <a target="_blank" href="https://docs.aws.amazon.com/managedservices/latest/userguide/secrets-manager.html">Secrets Manager</a> etc.</p>
</li>
<li><p><strong>Dependency Scan:</strong> Ensure all dependencies are up to date with their security fixes. My go-to tool is <a target="_blank" href="https://github.com/dependabot">Dependabot</a>; I have also used <a target="_blank" href="https://snyk.io/">Snyk</a> for dependency scans and other security practices, and have had positive experiences.</p>
</li>
</ol>
<p>There are many other items that can be included based on the practices a team/organization is following. I would recommend structuring the checklist based on product and grouping them in a one-time or repetitive list. As an organization when a system matures, we must update the checklist. In our case, we kept updating our touch points every time we learned something new and focused on the automation-first approach. This helped us to ship our product with low defects that earned the trust of our users.</p>
<p>I prefer to keep the checklist as close as possible to the source of the product, usually at the GitHub repository in markdown format. This helps us to build automation if needed.</p>
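<p>As one possible starting point for that automation, here is a minimal sketch that scans a markdown checklist for unchecked task-list items. The <code>pendingItems</code> function and the checklist contents are hypothetical, and it assumes the checklist uses GitHub-style <code>- [ ]</code> and <code>- [x]</code> items.</p>
<pre><code class="lang-go">package main

import (
	"fmt"
	"strings"
)

// pendingItems (hypothetical helper) returns the text of every unchecked
// task-list item ("- [ ] ...") in a markdown checklist. Checked items
// ("- [x] ...") and all other lines are ignored.
func pendingItems(markdown string) []string {
	var pending []string
	for _, line := range strings.Split(markdown, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "- [ ] ") {
			pending = append(pending, strings.TrimPrefix(line, "- [ ] "))
		}
	}
	return pending
}

func main() {
	// Hypothetical checklist content; in practice this would be read from
	// a markdown file kept in the repository.
	checklist := `## Security
- [x] Secrets stored in a secret manager
- [ ] Dependency scan enabled
## Operational Excellence
- [ ] Runbook is up to date`

	pending := pendingItems(checklist)
	for _, item := range pending {
		fmt.Println("pending:", item)
	}
	if len(pending) != 0 {
		fmt.Println("not ready for go-live")
	}
}
</code></pre>
<p>A pipeline step could run a check like this and fail the build while any item is still pending, which keeps the go-live decision tied to the checklist itself.</p>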
]]></content:encoded></item></channel></rss>