MDN has paid writing staff, but is also supported by community contributions. Much like Wikipedia, anyone can see a problem, create an account, and fix it. Some of the most valuable community contributions are translating content into non-English languages, which would be very expensive to do with paid staff.
MDN assumes users want to be helpful, and publishes changes immediately. This openness leaves MDN vulnerable to spam. Over the years, new features have been added to monitor changes, revert unwanted ones, and ban users with ill intent. These small mitigations were enough for the small volume of spam, about 1% of all edits.
On February 12th, a new phase in the spam fight started. Over 50% of the new accounts created that day were used to create spam, and spam was being created faster than staff could handle it. We responded by shutting down new accounts, which stopped the problem, but also shut the door on legitimate new contributors for almost three months.
There’s a few things we tried but didn’t work:
- Adding reCAPTCHA to account creation. The spammers were able to bypass it and create as much spam as before. This suggests humans are in the spam creation loop.
- Turning on our first Akismet integration. We had developed custom code to send edits to Akismet, but hadn’t had a chance to train it on our content before the attack. Too much spam was published, and too many legitimate edits were blocked.
After some further mitigations, we turned on account creation on April 26th, and have been able to manage the amount of new spam. Here’s what did work:
- Analyzing the attack. With some scripting, I was able to gather data about the attack, make some calculations, and create some graphs. We were able to reject some planned mitigations, such as delaying the time to the first edit (it turns out legit editors are just as fast as spammers). The time we saved not implementing bad fixes meant more time for the fixes that worked.
- New users can no longer create pages by default. During the spam attack, over 90% of new pages were spam. Removing this ability has cut spam back down to manageable levels. The page creation privilege can be added when requested, but the spammers don’t bother.
- Content training is improved. At first, we trained Akismet on historical spam and historical “good” edits. We rolled out training on live edits, and then turned on edit blocking, with further training for incorrectly blocked edits.
- Removing spam magnets from profiles. Some fields on the user profiles are used more by spammers than legitimate users, and have been removed.
Spam is back down to less than 2%. There’s still more to do. I expect that the spammers are temporarily blocked, and will return once they have a new strategy. I want to be ready.
It’s good to have account creation open. Many new pages have been translated, many small typos fixed. We’re starting to see ambitious new changes as well. It also has increased MDN staff workload, monitoring all the changes, finding bugs, and keeping a busy site working. The increased workload broke my one-a-week blog habit. Still, better than one post every two years.