Archive for March, 2022

One More Week: Which debt to pay first?

March 7, 2022

The Firefox Relay sprint is wrapping up, and I can talk about the features soon. I haven’t done much on this one, but I did get to review some PRs and get more familiar with the code. I’m seeing plenty that I’d like to change, but I’m still thinking about the order of work. I’m often way off on my estimates, and if something takes longer than planned, I want to be sure that it is still worth completing. We have a short “innovation sprint” planned next, which should allow me to get one or two things completed.

Luke orders from So Bahn, the pop-up kitchen run by Se Yeon and family

There aren’t many automated tests, but they have been added rapidly in the last few months, getting to 60% coverage. Some additional tests would be useful, and it would also be nice to have metrics showing this progress. Luke added code coverage to the test runs, and I added XML test and coverage output (see PR 1576), but I’ve stopped short of integrating with a tracking service like Coveralls or Codecov. It was not easy to get this to work with CircleCI’s remote Docker environment, and once I get all the pieces working, it will probably be worth a stand-alone blog post.

I’m suspicious of coverage for the sake of coverage, although it is useful and possible to get 100% for new projects. I do think there are benefits to structuring the code for testing, including easier development environments and a clean separation from services. It will take some work to create seams between the services and the code, which will allow application code to be completely tested, while interface code is dumb and monitored in deployments. As the project grows, the application code will grow larger, and the glue should be a smaller fraction of the code.
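
As a rough sketch of what such a seam might look like: the application logic depends on a small interface, and the real service client would sit behind it. The names `EmailSender`, `FakeEmailSender`, and `forward_email`, and the "drop empty messages" rule, are hypothetical illustrations and not taken from Relay's code.

```python
from dataclasses import dataclass, field
from typing import Protocol


class EmailSender(Protocol):
    """The seam: the only thing application code knows about the email service."""

    def send(self, to: str, subject: str, body: str) -> None: ...


@dataclass
class FakeEmailSender:
    """In-memory stand-in for tests and local development (hypothetical)."""

    sent: list[tuple[str, str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> None:
        # Record the message instead of calling a cloud service.
        self.sent.append((to, subject, body))


def forward_email(sender: EmailSender, real_address: str, subject: str, body: str) -> bool:
    """Application logic: decide whether to forward, then hand off to the seam."""
    if not body.strip():
        # Hypothetical rule for illustration: drop empty messages.
        return False
    sender.send(real_address, f"[Relay] {subject}", body)
    return True
```

With this shape, `forward_email` can be tested completely with the fake, while a thin real implementation of `EmailSender` stays small enough to be watched in deployment rather than unit tested.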

In deployments, Sentry is used for capturing exceptions, as well as 5xx return codes and missing translations. Se Yeon has been interested in Sentry issues for a while, and has started a weekly meeting to triage the new ones, so we’ve all been staring at Sentry events recently. Some of the data is duplicated: exceptions are logged with tracebacks, and then again as a 500 Server Error. There are also a lot of unactionable warnings from security probes. There’s some work to ensure we’re using Sentry effectively. I started with adding the deployment version (PR 1573), and there’s a little more to go.
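
One way to cut the probe noise is a filter function that discards events before they are sent. The sketch below is an assumption about how that could be done, not what Relay does: the `PROBE_MARKERS` list and the function name are made up for illustration. The Sentry Python SDK accepts such a function through the `before_send` option of `sentry_sdk.init`, alongside `release` for the deployment version.

```python
from typing import Any, Optional

# Hypothetical path fragments that security scanners commonly probe.
PROBE_MARKERS = ("/wp-login.php", "/.env", "/phpmyadmin")


def drop_probe_noise(event: dict[str, Any], hint: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return None to discard an event, or the event itself to keep it."""
    url = event.get("request", {}).get("url", "")
    if any(marker in url for marker in PROBE_MARKERS):
        return None
    return event


# Wiring it up would look roughly like this (not executed here):
#   sentry_sdk.init(dsn=..., release=..., before_send=drop_probe_noise)
```

Keeping the filter a plain function means the rules for what counts as unactionable can be unit tested without a Sentry account.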

We’re sending some statsd-style metrics to our InfluxDB server, and ingesting some from our cloud logging tools. There’s a lot, but I see some opportunities to use tagging for more effective displays, and more features of Grafana that we could use to display the data. There are also stats from other services that may be useful to bring in, but some of those will be cross-team efforts. I don’t plan to tackle those until later in the year at the earliest.
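
To make the tagging idea concrete, here is a small sketch that formats a statsd counter with InfluxDB-style tags appended to the metric name. It assumes the collector understands that tag syntax (Telegraf's statsd input does), and the metric and tag names are invented for the example.

```python
from typing import Optional


def statsd_counter(name: str, value: int = 1, tags: Optional[dict[str, str]] = None) -> str:
    """Format a counter in statsd line format, with InfluxDB-style tags on the name."""
    # Sort the tags so the same tag set always produces the same line.
    tag_part = "".join(f",{key}={val}" for key, val in sorted((tags or {}).items()))
    return f"{name}{tag_part}:{value}|c"


# Hypothetical metric: one series, split by a small fixed set of tag values.
print(statsd_counter("email.forwarded", tags={"status": "ok"}))
```

One tagged metric like this can replace several separately named counters, and a dashboard can then group or filter by the tag.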

InfluxDB does well with operational data and limited use of tags but, like most real-time metrics systems, has trouble with high-cardinality data. Brian Pitts, a former Mozilla SRE, convinced me to emit operational data as metrics, and longer-term detailed data as a canonical log, one per request or backend transaction. I like to use structlog, coercing it to emit Mozilla’s favored MozLog format, with processes to ingest the data into BigQuery data stores. There’s some work to integrate the tools, document the format, and get to one log per request.
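
A canonical log line could be sketched with only the standard library, as below. This is not the structlog setup described above. The envelope keys follow the MozLog 2.0 layout as best understood here and should be checked against the format's documentation; the logger name, log type, and field names are hypothetical.

```python
import json
import os
import socket
import time
from typing import Any


def canonical_log_line(logger: str, log_type: str, fields: dict[str, Any]) -> str:
    """Build one MozLog-style JSON line summarizing a whole request."""
    record = {
        "Timestamp": time.time_ns(),  # nanoseconds since the Unix epoch
        "Type": log_type,
        "Logger": logger,
        "Hostname": socket.gethostname(),
        "EnvVersion": "2.0",
        "Severity": 6,  # syslog "informational"
        "Pid": os.getpid(),
        "Fields": fields,  # high-cardinality details belong here, not in metric tags
    }
    return json.dumps(record)


# Hypothetical request summary, emitted once at the end of request handling.
print(canonical_log_line(
    "fx-private-relay",
    "request.summary",
    {"path": "/api/v1/", "status": 200, "duration_ms": 12},
))
```

Because each request produces exactly one line, details such as user or address identifiers can go in `Fields` for later querying without inflating the number of metric series.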

Relay requires a bunch of cloud services to work, which means that local development is partial or requires provisioning resources. We have a development deployment, which becomes a contested shared resource toward the end of a sprint. There may be ways to emulate the services in development, either by swapping in fake versions via configuration, or mock tools like localstack or moto. Or maybe we should lean into using real services, and automate per-developer provisioning.

There are some other possibilities for quick work. Others are preparing to convert the frontend to React, which may allow some automated front-end testing. Black left beta recently, and Django is considering re-formatting in DEP-0008. Relay could use it and other linting tools. Relay is also due for an upgrade to Python 3.9 and Django 3.2.

There’s a lot I could do, but my job is not to polish the code to perfection. Relay is still finding market fit, and features are a big part of that. I’ll need to ensure I’m shipping some of those features, even if my interest is in code polishing.

I’m leaning toward this order:

  • Ensure Sentry is tracking only actionable errors, so we can discover issues before users
  • Document logging and metrics, and then write a short proposal for future changes
  • Update to Python 3.9 and Django 3.2, to avoid lagging behind our toolset
  • Implement canonical logging and an example report, and build as we build features
  • Refactor code, moving external services to the edges

I’m looking for ways to minimize the time I spend implementing the first versions, and save the bulk of the work as collaborative efforts or as part of feature work. I also need to get started while no one is expecting much of me!

In other work, Mozilla is getting its GitHub projects in order again. I was the last to touch several projects, so I’ve been asked about a few. I think mozilla/ci-docker-bases should be deprecated by the fall, replaced by CircleCI’s convenience images like cimg/base. The app mozilla/django-dnt should be archived, since the W3C DNT working group shut down in January 2019, and DNT detection should be in the client-side code anyway. The app mozilla/django-tidings should be absorbed into kitsune, the SUMO engine, now that MDN has moved from kuma to yari, and uses GitHub for content notifications. I expect to do a lot of the lifting to retire these projects.

Finn recovers from his traumatic week

On the home front, our dog Finn caught a plastic latch in the lower eyelid, and required some emergency vet work. The latch was part of the “safety net” for our trampoline, and he enjoys chasing his sister Asha around it. It appears to have missed damaging his eye, and after a day of recovery, he’s back to chasing Asha around. He is getting some additional attention, and we’re taking daily close-ups to monitor the swelling.

My kid continues to be interested in PC upgrades, and spent money on an SSD since he’s used 950 GB of his 1 TB drive. We hooked it up, and thought it was a dud, but after a night’s sleep, I remembered how electronics work, and we connected it to the power supply as well. Now he has 2 TB to fill up with Minecraft and Roblox downloads.

Recommendations:

  • So Bahn 82, Se Yeon’s family restaurant, took over the takeover kitchen at Mother Road Market. I enjoyed the So Bahn fried chicken, the rice cakes, and the baked corn cheese. I’m excited to see it come to downtown Tulsa soon!
  • Sarah Bird is implementing https://github.com/mozilla-services/cjms in Rust, and recommends Zero To Production In Rust: An introduction to backend development as a practical guide.
  • Twitter is a waste of time, and I continue to read it every day. I feel like it is surfacing good content about the war in Ukraine, so maybe it has figured out how to be more useful and less of a weapon.
  • I finished Get Back, the lengthy Beatles documentary. I endorse breaking it up into “days” – watch a day of filming, take a break, and watch the next. Most of the drama is in the first part, and they seem to get into the groove of the project by the third part.