How Productboard monitors frontend applications

Matej Minarik

2nd August 2021

Life at Productboard, Engineering

Photo by Dave Phillips on Unsplash

“No matter how careful or good you are, sh!t will happen.” — Anonymous

And that is correct! Network glitches, deployment glitches, dependency upgrades, developer carelessness, or just small data inconsistencies: there’s plenty in web development that can go wrong, and at some point it will. Far more important than trying to prevent anything from ever going south is having solid monitoring in place and clear processes to follow.

At Productboard, we already have 5 different teams contributing to our frontend monorepo. There are over 290 Nx libraries used throughout the applications, and different teams are responsible for different sets of libraries or flows. Solid monitoring and clear ownership are crucial for future growth.

We originally used Rollbar as our logging and exception monitoring platform of choice. Rollbar is a great tool and worked just fine for quite some time. However, when anyone wanted to find out more about the current situation in production, Rollbar was not much help. A pile of log messages and random errors got mixed in with really important issues affecting our customers. We needed to improve this!

What I truly enjoy about Productboard is that when we realize something can be improved and has a reasonable price-to-value ratio, nobody is going to stop you from taking a stab at it. On the contrary, different people in different roles will support you and sponsor your initiative. Productboard engineers Michal Těhník (@MicTech) and Jakub Beneš (@jukben) as well as lower and upper management played a crucial role in this initiative, and I would like to thank them for their support! Having this kind of sponsorship empowers everyone not only to improve technical aspects within an organization they care about, but also to grow both personally and professionally.

As with all other technical initiatives, I drafted an RFC and shared it with the rest of my colleagues. After further analysis of the current state and a number of discussions, we realized we were dealing with two issues instead of one:

Mixing logging and exception monitoring services into one tool was generating too much noise
There was a clear lack of ownership as to which tribe/team/person is the best candidate for taking care of an incident should one appear

Logging

Logging takes care of any messages or errors that are handled, where we just want to record a trace that a certain scenario occurred.

“The user should always be defined here, but let’s make sure by adding a check and logging an error.” — a Productboard developer

Our frontend monorepo offers a Logger service, which supports logging messages and additional properties. In the development setup, it only forwards the messages to the console object, which logs them to the developer tools console. We decided to use Datadog in staging and production environments. Datadog is a great tool that supports all kinds of advanced monitoring, pre- and post-processing, aggregating, filtering, and grouping of logs based on your needs. We use Datadog for our backend microservices, so it was a natural choice for us to handle logs coming from the frontend monorepo.
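To make the environment split more concrete, here is a minimal sketch of a Logger whose destination depends on the environment. Every name in it (`LogTransport`, `ConsoleTransport`, `BufferedRemoteTransport`, `createLogger`) is hypothetical and not our actual service. The remote transport only buffers entries in memory; in a real setup it would hand them to Datadog’s browser logs SDK (`@datadog/browser-logs`).

```typescript
// Minimal sketch of an environment-aware Logger. All names are hypothetical;
// they illustrate the idea, not Productboard's actual Logger service.
type LogLevel = "debug" | "info" | "warn" | "error";
type LogProperties = Record<string, unknown>;

interface LogEntry {
  level: LogLevel;
  message: string;
  properties: LogProperties;
}

// A transport decides where log entries end up.
interface LogTransport {
  send(entry: LogEntry): void;
}

// Development: forward entries to the developer tools console.
class ConsoleTransport implements LogTransport {
  send({ level, message, properties }: LogEntry): void {
    console[level](message, properties);
  }
}

// Staging/production stand-in: buffers entries in memory. A real
// implementation would pass them to the Datadog browser logs SDK instead.
class BufferedRemoteTransport implements LogTransport {
  readonly buffer: LogEntry[] = [];

  send(entry: LogEntry): void {
    this.buffer.push(entry);
  }
}

class Logger {
  constructor(private readonly transport: LogTransport) {}

  debug(message: string, properties: LogProperties = {}): void {
    this.transport.send({ level: "debug", message, properties });
  }

  info(message: string, properties: LogProperties = {}): void {
    this.transport.send({ level: "info", message, properties });
  }

  warn(message: string, properties: LogProperties = {}): void {
    this.transport.send({ level: "warn", message, properties });
  }

  error(message: string, properties: LogProperties = {}): void {
    this.transport.send({ level: "error", message, properties });
  }
}

type Environment = "development" | "staging" | "production";

// Only the development setup writes to the console; everything else goes remote.
function createLogger(environment: Environment, remote: LogTransport): Logger {
  return new Logger(environment === "development" ? new ConsoleTransport() : remote);
}
```

Keeping the transport behind an interface means application code only ever talks to the Logger, and swapping Datadog for something else later would not touch any call sites.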

That’s nice, pal, you might be thinking to yourself, but how can you ensure a shared level of ownership if different teams work on different parts of the monorepo?

Great question! I’m glad you asked. Let’s pretend there is a spike of error logs after a recent deployment. How could either the system or the person on-call know whom to notify about it? Sure, we could ping everyone, but that’s a huge hassle. At first, we considered having different Logger factories that would simply know the ownership based on the current library. However, we wanted finer-grained ownership and to avoid a larger manual or even semi-automatic refactoring.

This was precisely why we decided to go a different route. There is a relatively simple piece of engineering that parses the CODEOWNERS file, finds all occurrences of the Logger method calls, and attaches the corresponding team to the message.
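A rough sketch of that idea could look like the following. It is one possible approach, not necessarily how our tool works: it assumes the Logger instance is always called `logger`, and its glob handling covers only the common CODEOWNERS patterns (`*`, `**`, anchored and directory patterns). As on GitHub, the last matching rule wins. Injecting the owner into each call, for example at build time, is left out.

```typescript
// Simplified sketch: map source files to owners via CODEOWNERS, then find
// Logger call sites. The `logger.<level>(` naming convention is an assumption.
interface OwnershipRule {
  pattern: string;
  owners: string[];
  regex: RegExp;
}

interface LoggerCallSite {
  file: string;
  line: number;
  method: string;
  owners: string[];
}

function escapeRegExp(char: string): string {
  return char.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Converts a gitignore-style pattern into a RegExp. Patterns containing a
// slash (other than a trailing one) are anchored to the repository root;
// a match on a directory also covers everything beneath it.
function globToRegExp(pattern: string): RegExp {
  const withoutTrailingSlash = pattern.replace(/\/$/, "");
  const anchored = withoutTrailingSlash.includes("/");
  const body = withoutTrailingSlash.replace(/^\//, "");
  let source = "";
  for (let i = 0; i < body.length; i++) {
    if (body[i] === "*" && body[i + 1] === "*") {
      // "**/" matches zero or more directories; a bare "**" matches anything.
      if (body[i + 2] === "/") {
        source += "(?:.*/)?";
        i += 2;
      } else {
        source += ".*";
        i += 1;
      }
    } else if (body[i] === "*") {
      source += "[^/]*"; // a single "*" stays within one path segment
    } else {
      source += escapeRegExp(body[i]);
    }
  }
  const prefix = anchored ? "^" : "^(?:.*/)?";
  return new RegExp(`${prefix}${source}(?:/.*)?$`);
}

function parseCodeowners(text: string): OwnershipRule[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const [pattern, ...owners] = line.split(/\s+/);
      return { pattern, owners, regex: globToRegExp(pattern) };
    });
}

// The last matching rule wins, as with GitHub's CODEOWNERS precedence.
function ownersFor(filePath: string, rules: OwnershipRule[]): string[] {
  for (let i = rules.length - 1; i >= 0; i--) {
    if (rules[i].regex.test(filePath)) return rules[i].owners;
  }
  return [];
}

function findLoggerCalls(files: Record<string, string>, rules: OwnershipRule[]): LoggerCallSite[] {
  const sites: LoggerCallSite[] = [];
  for (const file of Object.keys(files)) {
    const owners = ownersFor(file, rules);
    files[file].split("\n").forEach((text, index) => {
      // A fresh global RegExp per line so `lastIndex` never leaks between lines.
      const call = /\blogger\.(debug|info|warn|error)\(/g;
      let match: RegExpExecArray | null;
      while ((match = call.exec(text)) !== null) {
        sites.push({ file, line: index + 1, method: match[1], owners });
      }
    });
  }
  return sites;
}
```

Because ownership is derived from CODEOWNERS, moving a library to another team only requires a CODEOWNERS change; no Logger call has to be edited by hand.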

Since we were already using Datadog, we simply added a pre-processing rule that extracts this ownership information from the log and makes it a searchable property of all of our logs. This enables us to create custom alerting rules that notify the relevant owners during a spike.
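Inside Datadog, such an alert is a log monitor scoped by the extracted attribute. In Datadog’s search syntax that would be something like `status:error @owner:team-insights`, where `owner` is a hypothetical attribute name. The sketch below does not use Datadog’s API; it only spells out the logic of routing a spike to the right team: count recent error logs per owner and return the owners above a threshold.

```typescript
// Hypothetical shape of a log entry after the pre-processing rule has
// extracted the owning team into its own attribute.
interface ProcessedLog {
  timestamp: number; // epoch milliseconds
  status: "info" | "warn" | "error";
  owner: string;
  message: string;
}

// Counts error logs per owner within the last `windowMs` milliseconds and
// returns the owners whose count exceeds `threshold`: the teams to notify.
function ownersToNotify(logs: ProcessedLog[], now: number, windowMs: number, threshold: number): string[] {
  const counts = new Map<string, number>();
  for (const log of logs) {
    if (log.status !== "error" || now - log.timestamp > windowMs) continue;
    counts.set(log.owner, (counts.get(log.owner) ?? 0) + 1);
  }
  const owners: string[] = [];
  counts.forEach((count, owner) => {
    if (count > threshold) owners.push(owner);
  });
  return owners;
}
```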

Exception monitoring

Exception monitoring takes care of any unhandled errors. Luckily, there are great solutions out there that help you catch them. We decided to use Sentry. There were multiple reasons for this, but the main one was that Sentry has the concept of ownership baked in. This feature is called Issue Owners. It lets us define custom matching rules based on the URL, associated tags, or originating file, which goes hand in hand with the CODEOWNERS file for us. This works great for 90% of new issues reported to Sentry: it automatically assigns an owner based on our custom rules and sends an e-mail or a notification right away.
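Sentry’s ownership rules are plain text lines of the form `path:<glob> #team-slug` (with `url:` and `tags.<name>:` variants for the other matchers). The sketch below shows one hypothetical way to keep them in sync with CODEOWNERS by generating `path:` rules from CODEOWNERS entries. The mapping from GitHub team handles to Sentry team slugs is an assumption, and it is worth checking Sentry’s documentation for its exact glob semantics before relying on the converted patterns.

```typescript
// Hypothetical sketch: derive Sentry Issue Owners `path:` rules from
// CODEOWNERS entries. Handle-to-slug mapping and glob conversion are assumptions.
interface CodeownersEntry {
  pattern: string;
  owners: string[];
}

// Hypothetical mapping from GitHub team handles to Sentry team slugs.
type TeamMapping = Record<string, string>;

function toSentryOwnershipRules(entries: CodeownersEntry[], teams: TeamMapping): string[] {
  const rules: string[] = [];
  for (const { pattern, owners } of entries) {
    // Drop the leading "/" and turn directory patterns into "dir/*" globs.
    let path = pattern.replace(/^\//, "");
    if (path.endsWith("/")) path += "*";
    // Keep only owners we know a Sentry team for, written as "#slug".
    const sentryOwners = owners
      .filter((owner) => teams[owner] !== undefined)
      .map((owner) => `#${teams[owner]}`);
    if (sentryOwners.length > 0) rules.push(`path:${path} ${sentryOwners.join(" ")}`);
  }
  return rules;
}
```

Generating the rules rather than maintaining them by hand keeps CODEOWNERS as the single source of truth for both logging and exception ownership.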

Apart from Issue Owners, Sentry has a pretty nice UI, great search capabilities, and an awesome Releases integration. With Releases, it’s easy to benefit from features like “Mark as fixed in current/next release” or to search for the original release that introduced a particular issue. It even lets us track the health of the current release, which could drive an automatic revert if anything goes south.
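Sentry expresses release health through metrics such as the crash-free session rate. The decision a hypothetical automatic revert would have to make can be sketched as a comparison between the new release and the previous one. The input shape, the minimum session count, and the allowed drop are illustrative assumptions, not values we use.

```typescript
// Hypothetical release-health numbers, e.g. as read from Sentry for one release.
interface ReleaseHealth {
  release: string;
  totalSessions: number;
  crashedSessions: number;
}

// Crash-free session rate in percent; a release with no sessions yet is
// treated as healthy because there is nothing to judge.
function crashFreeRate({ totalSessions, crashedSessions }: ReleaseHealth): number {
  if (totalSessions === 0) return 100;
  return ((totalSessions - crashedSessions) / totalSessions) * 100;
}

// Revert only once enough sessions have been observed, and only if the new
// release is worse than the previous one by more than `maxDropPercent` points.
function shouldRevert(
  current: ReleaseHealth,
  previous: ReleaseHealth,
  minSessions = 500,
  maxDropPercent = 1,
): boolean {
  if (current.totalSessions < minSessions) return false;
  return crashFreeRate(previous) - crashFreeRate(current) > maxDropPercent;
}
```

Waiting for a minimum number of sessions avoids reverting a release because of a handful of early crashes.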

Last, but definitely not least, Sentry offers great support for custom Alerts. We have several Alerts in place. My favourite one is a Slack notification that’s sent out every time the application renders an error page. If it affects more than 10 users, we immediately notify the on-call engineer, who is responsible for mitigating the incident.
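Sentry’s alert rules can express a “seen by more than N users” condition natively, so none of this needs to be hand-written. The sketch below only spells out what that condition means: count distinct users who hit the error page within a time window. The event shape and the window length are hypothetical, since the rule above does not specify an interval.

```typescript
// Hypothetical record of the error page being rendered for a user.
interface ErrorPageEvent {
  userId: string;
  timestamp: number; // epoch milliseconds
}

// Returns true when more than `maxUsers` distinct users hit the error page
// within the last `windowMs` milliseconds, i.e. when on-call should be paged.
function shouldPageOnCall(events: ErrorPageEvent[], now: number, windowMs: number, maxUsers = 10): boolean {
  const users = new Set<string>();
  for (const event of events) {
    if (now - event.timestamp <= windowMs) users.add(event.userId);
  }
  return users.size > maxUsers;
}
```

Counting distinct users rather than raw events keeps a single user repeatedly refreshing a broken page from paging anyone.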

Parting words of wisdom

The future is bright! We will be tweaking our internal monitoring and alerting rules for a little while, but I’m very happy with the progress and improvements we’ve made so far. The end game with Sentry at Productboard is that developers shouldn’t even know there is an exception monitoring tool. It just works on its own and lets you know if something’s wrong. Maybe we’ll implement or enable automatic reverts at some point in the future, who knows.

What is your setup for frontend monitoring? Let us know in the comments, or write me an email.

Interested in joining our growing team? Well, we’re hiring across the board! Check out our careers page for the latest vacancies.
