One More Week: gpg key refresh

January 14, 2022

I’m going to blog about what happened in a week, instead of only blogging when I think I have something significant and permanent to say and I’m in the mood to blog.

I’m still moving teams within Mozilla, but the transition is now pending without a date. I’m trying to make the best of this in-between time.

The transition continues to be slow.
“Snail” by jamieanne, CC BY-ND 2.0

At Mozilla, we use GPG for signing commits. We sometimes use it for encrypting secrets at rest, but the current favorite secret sharing tools are 1Password and magic-wormhole. When I was setting up my keys, I was convinced that an expiring key was a good idea, because if I lose access to the private key and can’t revoke it, at least it will expire eventually. However, this means that I need to refresh the expiration date. The article How to change the expiration date of a GPG key by George Notaras was published in 2010, but GPG doesn’t change much, so it is still relevant.

Signed commits with “Verified” tag

I scheduled expiration for February, and set myself a reminder to refresh in January, along with the steps I take to do it. I published keys 9ECA 5960 3107 8B1E and 082C 735D 154F B750 to a keyserver, after the previous one was taken down following a June 2019 attack. I also synced them to Keybase and recreated them in my GitHub account.
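The refresh itself is a short interactive gpg session. Here is a sketch of the usual steps, with a placeholder key ID; your subkey numbers and keyserver settings may differ:

```shell
# Extend the expiration date (KEYID is a placeholder)
gpg --edit-key KEYID
# at the gpg> prompt:
#   expire        # pick a new expiration, e.g. 1y
#   key 1         # also select the first subkey
#   expire        # refresh its expiration too
#   save
# then publish the refreshed public key
gpg --send-keys KEYID
```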

I cleaned up the MLS / Ichnaea documentation. PR 1764 includes a refresh of the deployment docs, which is a couple of years overdue. I also updated the Location entry on the wiki and some other internal documents. This is the end of my “quick” list of MLS updates, so I’m moving on to Taskcluster: fixing some build issues, reviewing and merging dependency updates, and thinking about how to monitor deployment health. I got a little stuck with building the generic-worker on my M1 MacBook, and with a failing CI test, but both are surmountable.

For my next team, I read through the fx-private-relay codebase. I found this tip for getting the list of files tracked in a repo:

git ls-tree --full-tree --name-only -r HEAD

I then manipulated the output to turn it into a list of Markdown links to the GitHub repository, and checked off each file as I viewed it in SourceTree, or on GitHub if it was more than 50 lines of code. Most of the action is in the emails app, and a lot of that in emails/. There are fewer tests than I would expect, and some integration tests may cover a bunch of functionality.
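The munging step was roughly this shape, a minimal sketch with a made-up repository URL and branch name:

```python
# Turn `git ls-tree --full-tree --name-only -r HEAD` output into a
# Markdown checklist of GitHub links. The repo URL is a placeholder.
def to_markdown_checklist(paths, repo_url="https://github.com/example/repo"):
    lines = []
    for path in paths:
        # "- [ ]" renders as an unchecked task list item on GitHub
        lines.append("- [ ] [{0}]({1}/blob/main/{0})".format(path, repo_url))
    return "\n".join(lines)

print(to_markdown_checklist(["README.md", "emails/models.py"]))
```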

In my non-work life, the schools are struggling with Covid-19. Isaac was in remote learning one day, and got two notices of exposure, meaning he was in class with a Covid-positive person. Ainsley has two remote learning days. I’m so glad they are both vaccinated, and wear their masks. I’m so disappointed with Oklahoma’s leadership.

A local pizza place ran a 17th anniversary promotion, and I ordered it for delivery. It was delivered almost 5 hours after ordering. It was a bad combination of a fantastic promotion, an online ordering system that didn’t turn itself off, and a gig economy service for delivery. I don’t want to shame anyone, so I’m avoiding names. I went from a hero for ordering pizza to an adequate dad for making spaghetti.

Finally, my grandmother turned 93 this week! I’m so grateful for her, and that she and my grandfather have stayed safe through this pandemic.

Recommendations and links:

  • magic-wormhole is magic for transferring files securely. If you need to resend, change the filename first, to avoid problems conflicting with the original file.
  • I’m taking notes in Markdown, and started using Marked 2 again to view them. The plugin itspriddle/vim-marked adds a :MarkedOpen command that opens a rendered version and updates when I save.
  • I printed a few photos and a board-backed poster with Walgreens Photo. The photo webapp is OK, but had a couple of hiccups. If you can, crop to the desired size in a desktop photo app first. I got them printed the same day, and the results were decent and competitive with printing myself.
  • You can support the authors of some of the best MDN Web Docs content through Open Collective.

One More Week: Slow Transitions, Multi-Platform Image Creation on CircleCI

January 7, 2022

I’m going to blog about what happened in a week, instead of only blogging when I think I have something significant and permanent to say and I’m in the mood to blog.

I’m moving teams within Mozilla. This was supposed to be my first week on the Privacy and Security team, working on Firefox Relay, an email aliasing service. I spent some time talking with Se Yeon Kim, my on-boarding buddy, about the team, the Slack channels, and the other side projects we continue to maintain (she has shavar, I have ichnaea). The transfer process between teams is slow, so I have unexpected bonus time to work on old projects for the System Engineering team.

The mascot for team transitions
“turtle” by Jazminator

For Relay, I got the development environment working, mostly following the README. The tricky bit was getting the requirements installed with Python 3.7.12 via pyenv. I was getting errors for cryptography ('openssl/opensslv.h' file not found) and psycopg2 (ld: library not found for -lssl). The error message led to the solution from Building cryptography on macOS:

LDFLAGS="-L$(brew --prefix openssl@1.1)/lib" CFLAGS="-I$(brew --prefix openssl@1.1)/include" pip install -r requirements.txt

I didn’t contribute a change, because I feel this is standard for developing on the native system versus a Docker container – you need to figure out stuff like this. However, I did notice a warning, which led to re-opening issue #1201, django.urls.path warning when calling runserver function, and PR #1447, Fix swagger API schema endpoint. I reported a further warning to axnsan12/drf-yasg as issue #765, ruamel.yaml 0.17.0 has deprecated dump(), load().

For my SysEng work, I’m tackling small stuff that I can work to completion, but not anything that can risk breaking production when I’m gone. I have plenty of these things, too small to make it to quarterly planning but too large to work on in the gaps between other projects.

I spent some time on generating multi-platform docker images for Ichnaea, following the guide Building Docker images for multiple operating system architectures. This required moving from the docker executor to a machine executor, to enable buildx, and installing engines for QEMU emulation of other architectures. This was the easy part, and the guide was very useful. However, building just linux/amd64 and linux/arm64 took 35 minutes, versus 5 minutes for linux/amd64 alone in a docker executor. I decided the slower builds were not worth it, since I’ve worked out the kinks in M1 MacBook development, and we’re not planning to ship on ARM. I closed the experiment in PR 1762, made some config changes in PR 1763, and added notes to issue 1636. If I try again, I think I will build the images on native machines (i.e., ARM images on ARM machines), and combine the Docker manifests in a final step.
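That native-build plan would look roughly like this, a sketch with placeholder image names:

```shell
# On an amd64 machine:
docker build -t example/ichnaea:amd64 .
docker push example/ichnaea:amd64

# On an arm64 machine:
docker build -t example/ichnaea:arm64 .
docker push example/ichnaea:arm64

# Final step, on either machine: stitch the per-arch images
# into one multi-arch tag.
docker manifest create example/ichnaea:latest \
    example/ichnaea:amd64 example/ichnaea:arm64
docker manifest push example/ichnaea:latest
```

Each build runs at native speed with no QEMU emulation, at the cost of needing two executors and a fan-in step.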

There’s persistent slowness in provisioning Azure workers, and I’ve looked into it between other projects. I used SourceTree to walk through the history of the Taskcluster Azure provider, and added my notes to bug 1735411. It looks like the incremental provisioning and de-provisioning of Azure workers is required, and has been built up over months. The problem is each increment requires a worker scanner loop, and these loops can be as long as 90 minutes, meaning it takes hours to get a worker up and running. The next step is to speed up the worker scanner loop, ideally a minute or less, so that each of those iterations is shorter. That could be optimization, or it could be time-limiting the work done in each loop. It will hopefully be interesting work for the next person.

Ichnaea dependency updates are batched at the start of the month, and I was able to get through them quickly with pmac (PR #1759 and PR #1761). None of the updates were significant. We’re still blocked on the SQLAlchemy update (unknown work to update database code) and Python 3.10 (packaging issue with a dependency).

In my non-work life, the kids were back in school Tuesday for a 4-day week. Masks are now recommended again, but we’re not as bad as NYC (yet). Both kids missed some time for health reasons, but not Covid. Asha, our Great Dane puppy, is up to 60 lbs, and will be back in training class tonight after a winter break.

Asha and Finn hanging out on the couch

Random recommendations from me and others:

  • The family has gotten into Project Zomboid, a top-down zombie survival game where you always eventually die, but you feel like you’re getting better.
  • I continue to play Castlevania: Grimoire of Souls, a free-to-play-but-pay-to-win game that was ported to Apple Arcade, lost all the pay-to-win features, but continues to have an addictive upgrade cycle. It takes 5-10 minutes to play a level, but I find it easy to repeat for hours.
  • I’m slowly watching Get Back, the epic Beatles documentary. It’s 8 hours total, but each day is 15-30 minutes, so I’m watching a day at a time. I like the Beatles, but I am also fascinated by the team dynamics between the band members, and with those that support them and rely on them for their job.
  • My client-side Sentry event filtering code in Socorro silently broke. WillKG fixed it, and launched a new project kent as a fake Sentry server for testing integrations.
  • WillKG recommends CPython Internals, still likes Introduction to Algorithms, but thinks the Dragon book is showing its age.
  • Pete Moore recommends tup as a faster alternative to make, which is capable of building Firefox.

Faster virtualenv workflows with mkvirtualenvhere and workonthis

May 25, 2016

I use virtualenvwrapper to create Python virtual environments. I have a couple of functions in my ~/.bash_profile that make it even easier to use:

if [ -f $(brew --prefix)/bin/ ]; then
    . $(brew --prefix)/bin/;
    function mkvirtualenvhere() { mkvirtualenv "$@" "${PWD##*/}" ; setvirtualenvproject; }
    function workonthis() { workon "${PWD##*/}" ; }
fi

This is for OS X and homebrew. You’ll need something slightly different for your flavor of Linux.

mkvirtualenvhere creates a new virtualenv with the same name as the current directory, and I use it like this:

$ cd ~/src
$ git clone
$ cd drf-cached-instances
$ mkvirtualenvhere
$ pip install -r requirements.txt

The virtualenv is called “drf-cached-instances”, and I can start work two ways. The standard way is:

$ workon drf-cached-instances

This will activate the virtualenv and change the current directory to ~/src/drf-cached-instances, because of the setvirtualenvproject call.

The second way is:

$ cd ~/src/drf-cached-instances
$ workonthis

This activates the virtualenv that shares the name of the current directory. Often, I think I am just looking at a project, using ack to explore the code, but then switch to running the project so I can experiment in an interactive prompt.

Another tool to help this workflow is GitHub’s hub, which can be faster than git for working with GitHub hosted projects.  I often use it for code review work, but forget to use it when cloning a new project.

Reopening Accounts on MDN

May 22, 2016

MDN has paid writing staff, but is also supported by community contributions. Much like Wikipedia, anyone can see a problem, create an account, and fix it. Some of the most valuable community contributions are translating content into non-English languages, which would be very expensive to do with paid staff.

MDN assumes users want to be helpful, and publishes changes immediately. This openness leaves MDN vulnerable to spam. Over the years, new features have been added to monitor changes, revert unwanted ones, and ban users with ill intent. These small mitigations were enough for the small volume of spam, about 1% of all edits.

On February 12th, a new phase in the spam fight started. Over 50% of the new accounts created that day were used to create spam, and spam was being created faster than staff could handle it. We responded by shutting down new accounts, which stopped the problem, but also shut the door on legitimate new contributors for almost three months.

There are a few things we tried that didn’t work:

  • Adding reCAPTCHA to account creation. The spammers were able to bypass it and create as much spam as before. This suggests humans are in the spam creation loop.
  • Turning on our first Akismet integration. We had developed custom code to send edits to Akismet, but hadn’t had a chance to train it on our content before the attack. Too much spam was published, and too many legitimate edits were blocked.

After some further mitigations, we turned on account creation on April 26th, and have been able to manage the amount of new spam. Here’s what did work:

  • Analyzing the attack. With some scripting, I was able to gather data about the attack, make some calculations, and create some graphs. We were able to reject some planned mitigations, such as delaying the time to the first edit (it turns out legit editors are just as fast as spammers).  The time we saved not implementing bad fixes meant more time for the fixes that worked.
  • New users can no longer create pages by default. During the spam attack, over 90% of new pages were spam. Removing this ability has cut spam back down to manageable levels. The page creation privilege can be added when requested, but the spammers don’t bother.
  • Content training is improved. At first, we trained Akismet on historical spam and historical “good” edits. We rolled out training on live edits, and then turned on edit blocking, with further training for incorrectly blocked edits.
  • Removing spam magnets from profiles. Some fields on the user profiles are used more by spammers than legitimate users, and have been removed.

Spam is back down to less than 2%. There’s still more to do. I expect that the spammers are temporarily blocked, and will return once they have a new strategy. I want to be ready.

It’s good to have account creation open. Many new pages have been translated, many small typos fixed. We’re starting to see ambitious new changes as well. It also has increased MDN staff workload, monitoring all the changes, finding bugs, and keeping a busy site working.  The increased workload broke my one-a-week blog habit.  Still, better than one post every two years.

This is not XP

April 25, 2016

Today is day 1 of iteration 1 of the new MDN durable team. There’s a lot of new vocabulary, including the agile vocabulary that means something different on every team. We did prep work last week. Some of the team took classes in user stories and formulating tasks. We went over some of the tasks as a team, and broke up into smaller groups for the other tasks. This gives us a backlog for our first 3 week iteration.

My experience is on co-located development teams, using Extreme Programming (XP). In the past, I’ve used Shore and Warden’s The Art of Agile Development, which describes the authors’ 37 practices modeled after XP and Scrum. It was very useful when I first led an agile team, and I’ve used it in agile teams since.

Chapter 4, “Adopting XP”, includes some prerequisites for using XP. Here’s how the MDN durable team measures up:

  1. Management Support – The marketing organization is mandating the agile process. We have to work with other Mozilla organizations who don’t.
  2. Team Agreement – The team attitudes range from acceptance to hostility.
  3. A Colocated Team – No two people share an office. The team members are all remote and cover US and European time zones.
  4. On-Site Customers – We have a product manager, who is new to MDN this year. MDN has many customers, from internal Mozilla teams that use MDN to document their interfaces to code schools that use MDN as a reference.
  5. The Right Team Size – There are 10 team members: 2 developers, 5 content creators, and 3 managers (Product, Project, and Program). We’re augmenting this with contractors and volunteers.
  6. Use all the Practices – It’s not being run as an XP project, and we couldn’t if we wanted to.

I’d rate us around 50% for the prerequisites. We can’t overcome some of these without big changes in the team.

The chapter also includes some recommendations:

  1. A Brand-New Codebase – Kuma is 6 years old, and forked from another project. Some of the content goes back to 2005.
  2. Strong Design Skills – I’m good when I start from scratch, like BrowserCompat. An existing code base takes a different set of skills, and I have some experience here too. The legacy content is a different problem, both working with it and convincing others to prioritize tackling it. I’m also not in charge, so my skills are often irrelevant.
  3. A Language That’s Easy to Refactor – Python wins. KumaScript loses, but maybe is fixable.
  4. An Experienced Programmer-Coach – I think everyone is new to the Marketing org’s process. We have lots of agile experience as individuals, but that seems neutral at best when working with the new process.
  5. A Friendly and Cohesive Team – Most of the team members still think of themselves as either on the content team or the development team.

I rate us below 50% on being able to follow the recommendations.

I wouldn’t recommend this as an XP project. But it’s not. It’s adopting this other process, new to me. I’m realizing how irrelevant my experience is, other than learned trust that we’ll adjust as we go. There’s time for 11 iterations this year, 11 chances to adjust.

It will be an interesting 3 weeks, and quite a year.

Starting a new phase at MDN

April 18, 2016

At the start of 2016, I was the API designer and back end developer for the BrowserCompat  project, with the goal of serving Mozilla Developer Network’s (MDN’s) compatibility data from an API.  I had been doing this for a while, one year as a part-time contractor, and six months as a Mozilla employee. Technically, I was an MDN web developer, but I was happy to let the other four developers do the bulk of the MDN work while I focused on implementing BrowserCompat.

It’s four months later, and everything has changed.

MDN development has moved from the Firefox organization to the Marketing organization.  Marketing developers work on important codebases, such as bedrock, which runs, and snippets, which has to quickly serve content for the Firefox Start Page.  MDN will benefit from their expertise in moving these services to modern cloud infrastructures.

While MDN development moved, the three other back end developers stayed in the Firefox organization. This makes me the sole back end developer on MDN. I had some light experience with Kuma, the MDN engine, mostly reviewing others’ pull requests. I’ve been diving into the deep end of Kuma development for about a month.

Moving MDN to Marketing puts it in the same organization as the writers, allowing the formation of the MDN durable team. I’m not 100% sure how “agile marketing practice” works (I haven’t read the book yet). We’ve gained some new management, and will start learning the new processes in earnest this week. It will be interesting to see what MDN goals look like, as opposed to separate content and development goals.

Our first goal was decided for us. In February, MDN was attacked by automated spammers, creating hundreds of pages advertising garbage that has nothing to do with the open web.  Stopping the spam required halting new account creation. I’ve been working on spam mitigations since March, and we hope to open account registration again soon.

This has been a hard transition. I really liked my job in January, and was very comfortable with the next steps to success. This new job is a lot more confusing, and changes every week. It is also a lot more important to Mozilla than my little experiment ever was. I’m sure it will take a few months for me to get comfortable with the code, the infrastructure, the new processes, and the new co-workers.  There’s no time for a gentle transition, since the site is under attack, and everyone is trying to figure out how MDN and its millions of visitors fit in the new organization.

I still get to develop open-source software every week, at a company dedicated to openness.  Being open means showing off stuff before it is 100% ready. That includes me. I hope blogging about this next phase of MDN will help me figure it out for myself.

multigtfs 0.4.0 released (and 0.4.1, and 0.4.2)

August 2, 2014

I released multigtfs 0.4.0 on June 21, 2014, then 0.4.1 on July 11th, and 0.4.2 on July 20th.

multigtfs is a Django app that supports importing and exporting of GTFS feeds, a standard format for describing transit schedules. multigtfs allows multiple feeds to be stored in the database at once.

These releases are a big step forward. The highlights are:

  • Much faster imports and exports. For one large GTFS feed, the import went from 12 hours to 41 minutes. In a more typical feed, the import went from 29 minutes to 1 minute, and export from 5 minutes to 24 seconds.
  • Support for extra columns, used by many agencies to extend GTFS.
  • Extra configuration to make the Django admin usable after importing lots of data.
  • Python3 support

Read the rest of this entry »

Django management commands and out-of-memory kills

June 19, 2014

tl;dr: Smart developers run database-heavy operations as management commands or celery tasks.  If DEBUG=True, you’ll be recording each database access in memory.  Ensure DEBUG=False to avoid running out of memory.

I’m working on issue 28 for django-multi-gtfs, speeding up importing a GTFS feed.  My experience has been with Tulsa Transit’s GTFS feed, a 1.3 MB zip file, which takes about 30 minutes to import on my laptop.  arichev was importing the Southeast Queensland’s GTFS feed, a 21 MB feed which takes 9-12 hours to import.  It shouldn’t take half-a-day to import 3 million rows, even if many of them include PostGIS geometries.  So, it is time to optimize.

My optimization procedure is to:

  1. Measure the performance of the existing code against real-world data, to establish the benchmark,
  2. Create a representative data set that will demonstrate the problem but allow for reasonable iteration,
  3. Make changes and measure improvements against the test data, until significant improvements are gained,
  4. Measure against the real-world data again, and report the gains.

I was having a lot of trouble with step 1. First, I had to clear ticket 10, which was preventing me from importing the Southeast Queensland data at all. Next, I started an import on my MacBook Pro (2.4 GHz Intel Core i7 w/ 8 GB RAM) overnight, and woke to find my laptop unusable, requiring a hard reboot. I spent some pennies and moved to the cloud. A t1.micro instance was unable to import the Tulsa feed, killed by the out-of-memory killer. That seemed odd, but I knew the free tier instances were memory constrained (600 MB RAM), with no swap. I then moved to an m3.medium instance (3.75 GB RAM). The Tulsa feed imported in 50 minutes (vs. 30 minutes on my laptop, which isn’t surprising). However, importing the Southeast Queensland feed also triggered the OOM killer.

How can importing 165MB of text use up almost 4 GB of memory?

My suspicion was a memory leak, or some out-of-control caching. Memory leaks are rare in Python, but possible. Caching is a more likely danger – some helpful code that speeds up a web request could cause an offline task to eat up memory. I remembered a recent Planet Python post on memory leaks, and quickly found mem_top. There are a few variations that all do the same basic thing: asking Python about current memory usage and identifying the top memory users. mem_top is an easy pip install, and easy to call:

from mem_top import mem_top

logger.debug('Memory hogs: %s', mem_top())

I added this code in several places, tested it against a small sample feed, and then ran it against a small but real feed. The output looks like this:

multigtfs.models.base - DEBUG - Memory hogs: 
102371	<type 'list'> [{u'time': u'0.003', u'sql': u'SELECT postgis_lib_version()'}, {u'time': u'0.001', u'sql': u'INSERT 
2510	<type 'dict'> {'_multiprocessing': <module '_multiprocessing' from '/Users/john/.virtualenvs/explore/lib/python2.7
444	<type 'dict'> {'SocketType': <class 'socket._socketobject'>, 'getaddrinfo': , 'AI_N
440	<type 'dict'> {'WTERMSIG': , 'lseek': , 'EX_IOERR': 74, 'EX_N
437	<type 'list'> [,  {'SocketType': <type '_socket.socket'>, 'getaddrinfo': , 'AI_NUMERICS
364	<type 'dict'> {'SocketType': <type '_socket.socket'>, 'getaddrinfo': , 'AI_NUMERICS
330	<type 'dict'> {'WTERMSIG': , 'lseek': , 'EX_IOERR': 74, 'EX_N
330	<type 'dict'> {'WTERMSIG': , 'lseek': , 'EX_IOERR': 74, 'EX_N
328	<type 'dict'> {'empty_provider': <pkg_resources.EmptyProvider instance at 0x105781ab8>, 'ExtractionError': 
5067	 <type 'tuple'>
4765	 <type 'dict'>
2736	 <type 'list'>
1862	 <type 'cell'>
1857	 <type 'weakref'>
1423	 <type 'builtin_function_or_method'>
1326	 <type 'wrapper_descriptor'>
1200	 <type 'type'>
1072	 <type 'getset_descriptor'>

The top memory consumer, which grew as the feed was imported, was that list of dictionaries, with timing and SQL queries.  There’s not enough here to nail down the issue, but it was enough to trigger my memory of the common advice to never run Django with DEBUG=True in production.  They’ve listed it in the FAQ under “Why is Django leaking memory?”, which then links to “How can I see the raw SQL queries Django is running?”  When DEBUG=True in your settings, you can access the SQL queries used by the connection:

>>> from django.db import connection
>>> connection.queries
[{'sql': 'SELECT polls_polls.id, polls_polls.question, polls_polls.pub_date FROM polls_polls',
'time': '0.002'}]

This is done in db/backends/__init__.py:

    def cursor(self):
        """Creates a cursor, opening a connection if necessary."""
        if (self.use_debug_cursor or
                (self.use_debug_cursor is None and settings.DEBUG)):
            cursor = self.make_debug_cursor(self._cursor())
        else:
            cursor = util.CursorWrapper(self._cursor(), self)
        return cursor

make_debug_cursor returns CursorDebugWrapper from db/backend/, which does some extra work:

class CursorDebugWrapper(CursorWrapper):
    def execute(self, sql, params=None):
        start = time()
        try:
            return super(CursorDebugWrapper, self).execute(sql, params)
        finally:
            stop = time()
            duration = stop - start
            sql = self.db.ops.last_executed_query(self.cursor, sql, params)
            self.db.queries.append({
                'sql': sql,
                'time': "%.3f" % duration,
            })
            logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

When you are running Django as a web server, the request_started signal will reset the queries to an empty list, so you’ll get fresh data for the next request. This is exactly what you want for single-machine debugging, and it’s how tools like debug toolbar give you valuable information about your SQL queries. Lots of SQL queries are usually the thing making your requests slow. The solution is usually to add smarter indexes, combine similar queries to reduce total database requests, read from a cache instead of the database, and defer writes to an async task runner like celery. I usually leave these optimizations until I’m pretty sure the business team is happy with the design.

This same query caching logic runs in offline tasks. Loading an image from disk into a BLOB? The base64-encoded version is in memory. Crunching usage data for engagement metrics? Here’s a handful of SQL queries for every instance in the system. And, surprising to me, running a management command also logs all of the SQL queries.
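The failure mode is easy to reproduce without Django: a debug wrapper that records every statement in a list, with nothing ever resetting the list, grows without bound. A simplified sketch, not Django’s actual classes:

```python
import time

class DebugCursor(object):
    """Simplified stand-in for a debug cursor wrapper: it records
    every statement it executes, and nothing ever clears the log."""
    def __init__(self):
        self.queries = []  # grows for the life of the process

    def execute(self, sql):
        start = time.time()
        # ... the real database work would happen here ...
        duration = time.time() - start
        self.queries.append({'sql': sql, 'time': "%.3f" % duration})

cursor = DebugCursor()
for i in range(100000):  # a long-running import loop
    cursor.execute("INSERT INTO stop_times VALUES (%d)" % i)
print(len(cursor.queries))  # one dict per statement, held forever
```

In a web process the list is reset per request, so it stays small; in a management command it just keeps growing.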

Setting DEBUG=False will disable this behavior. I suspect New Relic acts like DEBUG="Yes and lots of it". For one project, I was running my celery tasks under the New Relic Python agent, but I’m going to stop doing that, and see if some memory issues around stats crunching go away.

I want DEBUG=True for local testing, but I want to be able to run database-intensive management commands without having to remember to turn it off.  My solution is to add this snippet to the management command:

        # Disable database query logging
        from django.conf import settings
        if settings.DEBUG:
            from django.db import connection
            connection.use_debug_cursor = False

This uses the undocumented use_debug_cursor attribute, so it may not work with exotic database backends. It only disables debugging on the default connection, so you’ll have to do more work if you use multiple database connections. Iterating through django.db.connections should do the trick (see the reset_queries code).

Also, it looks like changes are in progress for Django 1.7. The “memory leak” question has been removed from the FAQ, and the queries code has recently been updated to use a limited queue, and to always turn on database query recording when DEBUG=True. Maybe this change will take care of the issue, but I’ll miss the ability to disable this for management commands, since I just discovered it last night. I’ve added ticket #22873; we’ll see if they agree with my patch.

Update: The Django developer doesn’t want to change the behavior back.  The attribute use_debug_cursor should probably be named force_debug_cursor, since the intention is to force SQL logging for support of assertNumQueries.  I’ve got ideas for supporting my use case in Django 1.7, but the code will get more complex.  Oh well.

This Week

June 13, 2014

It’s been a busy week.

Scott Philips is presenting in Pittsburgh, PA today (June 13th).  Civic Ninjas was awarded a grant from the Knight Prototype Fund back in January, and this is a chance for the projects to show what they’ve built.  I imported Pennsylvania census data, so that the visualizations would show live data in Pittsburgh.  I also failed to learn AngularJS in an evening, so it’s on the to-do list.  The rest of the team did amazing work on the front end, visualizations, wristbands, and fleshing out metric descriptions.  Scott will report back next week on how the demo went, and we’ll figure out next steps.

I’m on the Cultivate918 team investigating “building an online resource center with an actively managed calendar of events”.  I’m suspicious.  Why does the community need the calendar?  Is it a marketing site for event promotion?  Is it for picking events based on speakers and topics?  Is it for facilitating networking?  I’m a fan of Portland’s Calagator, which aggregates calendars for hundreds of tech events, but this feels like a poor fit for Tulsa.  There are some Oklahoma City efforts, Made in OK and Sivi, which might be part of our solution.

Step one for me is lots of discussions with different people around town, to identify the core needs and desires. I feel that the solution has to excite and engage entrepreneurs (and not just support orgs), or it will be dead on arrival. And, when we do start work, I want to have analytics in the first release.

For django-multi-gtfs, I’m tackling some long-standing tickets on the path to 0.4.0. I converted the code to run under both Python 2.7 and 3.4, which took a few nights. I got a lot of useful advice from Brett Cannon’s post on his experience with caniusepython3, Django’s docs on porting to Python 3, and the six docs. The biggest pain point was the built-in csv module. In Python 2.7, you read and write bytes, but in Python 3.4, you read and write unicode. I should have converted the Python 2.7 code to unicodecsv first, and I still might. For now, I’m ready to move on to the next ticket.

In the indefensible project, I’m working through a d3 book, learning about layouts.  I’m almost done with the book, but it still feels like a tour of d3 rather than a tutorial.  I’ll have to cut my teeth on a real project before I get d3.  Or, maybe I picked the wrong book, and should read a different one.

It’s not all work.  It’s my daughter’s tenth birthday this weekend, and she’s having a sleepover.  It’s been a week of getting the house in order, assembling presents, and getting ready.  I’m reading A Feast for Crows, and I’m worried Cersei is really screwing things up.  I haven’t seen the Apple keynote yet, but boy have I heard all about it.

And, I’m trying to make my company a real thing, and regularly blog again.  You’ll know if I succeeded this time next week.


The Tulsa Transit Project at 20 months

October 12, 2012

Back in March 2011, the Tulsa Web Devs started a simple project – get the Tulsa Transit (aka MTTA) data on Google Maps. By July 2011, we had a working GTFS feed. We had some well-deserved drinks to celebrate, and I put together a presentation of the progress. It felt like we’d have the bus schedule on Google Maps by the Fall.

Well, it’s over a year later, and Tulsa is still not on the list. If you go to Google Maps and try to get transit directions, it looks a lot like Tulsa doesn’t have a bus system.

The data is ready. It’s not perfect, not 100%. It was at 75% in July 2011, it’s at 95% today. There is plenty I’d like to improve, but at any point in the last year, Google would have accepted the data and put Tulsa’s bus schedule on their maps. It is not on Google Maps because Tulsa Transit has not taken the next steps to sign up for Google Transit. There is no cost to join – MTTA just needs to provide a free license to Google for the data. We’ve recently met with the MTTA staff, and I’m hopeful that we’ll see some progress in the next 30 days.

While we haven’t met our main goal, there has been a lot of progress over the last year. Here are the highlights:

  • Tulsa Transit launched MILES, their online trip planner. It’s better than the paper or PDF schedules. It’s not as useful as Google Maps.
  • The Tulsa Web Devs launched the Tulsa Trip Planner, using OpenTripPlanner to display the GTFS data. This came with some interesting analysis tools. It’s not as useful as Google Maps, but we get to show what we’ve been working on for the last 18 months.
  • I’ve decoded the MTTA pattern data.  This means that buses appear to drive down streets and turn at intersections.  Last year, it looked like buses cut through parking lots and buildings to get from stop to stop.
  • I’ve added additional data to the GTFS feed, like the holiday schedule, so that the feed is more accurate.
  • I’ve converted the parsing code from a bunch of command-line Python scripts to a Django 1.4 web application. I still have to type on the command line to parse the data, but I get Django’s awesome admin interface as well. The code is available on github. The web app is only on my laptop. I hope to change that soon.
  • I’ve moved the GTFS code to a stand-alone project, django-multi-gtfs. This is the place for all the non-MTTA bits, which are potentially useful for other projects.
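To make a couple of those bullets concrete: in GTFS terms, the pattern work produces shapes.txt (the ordered points a trip traces along streets) and the holiday schedule lands in calendar_dates.txt (per-date service exceptions).  A minimal sketch with Python’s csv module — the IDs and coordinates here are made up for illustration, not from the real MTTA feed:

```python
import csv

# shapes.txt: ordered lat/lon points tracing the street path a bus
# follows, so maps draw the route along roads instead of straight
# lines between stops.
shape_rows = [
    ('shape_id', 'shape_pt_lat', 'shape_pt_lon', 'shape_pt_sequence'),
    ('route100_out', '36.1540', '-95.9928', '1'),
    ('route100_out', '36.1552', '-95.9911', '2'),
]

# calendar_dates.txt: per-date exceptions to the regular calendar;
# exception_type 2 removes service for that date (e.g. a holiday
# falling on a weekday).
calendar_rows = [
    ('service_id', 'date', 'exception_type'),
    ('weekday', '20121225', '2'),  # no weekday service on Christmas
]

for name, rows in [('shapes.txt', shape_rows),
                   ('calendar_dates.txt', calendar_rows)]:
    with open(name, 'w', newline='') as out:
        csv.writer(out).writerows(rows)
```

Google Transit validates exactly these files on submission, which is part of why getting the feed past 95% matters.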

We’re going into a 24-hour Hackathon this weekend, and I hope to improve the Tulsa Transit project. My personal goals are:

  • Improve the web app:
    • Get the Django app hosted, so that others can see a running
      version for themselves.
    • Support parsing the MTTA signup data from the web interface rather
      than the command line, so that MTTA staff can use it.
  • Improve the trip planner:
    • Update to the latest OpenTripPlanner code (and keep it updated).
    • Add features like street lookup and mobile support.
  • Improve the data:
    • Show data parsing issues in the web app, so that MTTA can fix
      issues in the signup data.
    • Add an interface for linking stop data to schedule data, when it can’t
      be done automatically.
    • Add support for ‘fixups’ that can be re-applied to new MTTA
      signup data.
However, my goal for the weekend is:

  • Get other people involved in the project

So, I’ll be focused on getting people up to speed – get a copy of the source
code, install all the dependencies, and get it running on their laptop.
After that, I’ll find out what they think is interesting, and get started.

I’ll let you know how it goes.