Archive for May, 2016

Faster virtualenv workflows with mkvirtualenvhere and workonthis

May 25, 2016

I use virtualenvwrapper to create Python virtual environments. I have a couple of functions in my ~/.bash_profile that make it even easier to use:

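if [ -f "$(brew --prefix)/bin/virtualenvwrapper.sh" ]; then
    . "$(brew --prefix)/bin/virtualenvwrapper.sh"
    # Create a virtualenv named after the current directory, then make this directory its project
    function mkvirtualenvhere() { mkvirtualenv "$@" "${PWD##*/}" && setvirtualenvproject; }
    # Activate the virtualenv named after the current directory
    function workonthis() { workon "${PWD##*/}"; }
fi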

This is for OS X and Homebrew. You’ll need something slightly different for your flavor of Linux.
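One possible Linux variant is sketched below. It assumes virtualenvwrapper.sh is on your PATH, as it may be after a pip install (distribution packages can put it elsewhere), and finds the script with `command -v` instead of asking Homebrew. The functions are written in plain POSIX style; treat this as a starting point, not the only way to do it:

```shell
# Assumption: virtualenvwrapper.sh is somewhere on PATH.
if wrapper="$(command -v virtualenvwrapper.sh)"; then
    . "$wrapper"
    # Create a virtualenv named after the current directory, then make this directory its project.
    mkvirtualenvhere() { mkvirtualenv "$@" "${PWD##*/}" && setvirtualenvproject; }
    # Activate the virtualenv named after the current directory.
    workonthis() { workon "${PWD##*/}"; }
fi
```

If the script isn't on your PATH, replace the `command -v` lookup with the path your package manager uses.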

mkvirtualenvhere creates a new virtualenv with the same name as the current directory and makes that directory the virtualenv’s project directory. I use it like this:

$ cd ~/src
$ git clone https://github.com/jwhitlock/drf-cached-instances.git
$ cd drf-cached-instances
$ mkvirtualenvhere
$ pip install -r requirements.txt
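Both functions get the name from the `${PWD##*/}` parameter expansion, which removes the longest prefix matching `*/` and leaves only the last component of the current directory. Here is a quick check, using a made-up path in place of `$PWD`:

```shell
# Hypothetical path, standing in for $PWD after "cd drf-cached-instances".
dir=/Users/example/src/drf-cached-instances
# "##*/" strips the longest prefix ending in "/", leaving the last path component.
name="${dir##*/}"
echo "$name"   # prints drf-cached-instances
```

In `/` itself the expansion is empty, so run the functions from inside a project directory.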

The virtualenv is called “drf-cached-instances”, and I can start working on it in two ways. The standard way is:

$ workon drf-cached-instances

This will activate the virtualenv and change the current directory to ~/src/drf-cached-instances, because of the setvirtualenvproject call.

The second way is:

$ cd ~/src/drf-cached-instances
$ workonthis

This activates the virtualenv that shares the name of the current directory. Often I start out just looking at a project, using ack to explore the code, and then decide to run it so I can experiment at an interactive prompt.

Another tool that helps with this workflow is GitHub’s hub, which can be faster than git for working with GitHub-hosted projects. I often use it for code review, but forget to use it when cloning a new project.
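A hypothetical helper that folds hub into the clone-and-create steps above might look like this. It assumes hub is installed, that mkvirtualenvhere is already defined in your shell, and that the argument is in `owner/repo` form, which `hub clone` expands to a GitHub repository. The name `hubclonehere` is made up; it is not part of hub or virtualenvwrapper.

```shell
# Hypothetical helper: clone a GitHub repo with hub, enter its directory,
# and create a matching virtualenv with mkvirtualenvhere.
# Assumes $1 looks like owner/repo, so "${1##*/}" is the clone directory.
hubclonehere() {
    hub clone "$1" && cd "${1##*/}" && mkvirtualenvhere
}
```

Usage would be `cd ~/src` followed by `hubclonehere jwhitlock/drf-cached-instances`, then `pip install -r requirements.txt` as before.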


Reopening Accounts on MDN

May 22, 2016

MDN has a paid writing staff but is also supported by community contributions. Much like Wikipedia, anyone can see a problem, create an account, and fix it. Some of the most valuable community contributions are translations into non-English languages, which would be very expensive to produce with paid staff.

MDN assumes users want to be helpful and publishes changes immediately. This openness leaves MDN vulnerable to spam. Over the years, new features have been added to monitor changes, revert unwanted ones, and ban users with ill intent. These small mitigations were enough for the low volume of spam, about 1% of all edits.

On February 12th, a new phase in the spam fight started. Over 50% of the new accounts created that day were used to create spam, and spam was being created faster than staff could handle it. We responded by disabling new account creation, which stopped the problem but also shut the door on legitimate new contributors for almost three months.

There are a few things we tried that didn’t work:

  • Adding reCAPTCHA to account creation. The spammers were able to bypass it and create as much spam as before. This suggests humans are in the spam creation loop.
  • Turning on our first Akismet integration. We had developed custom code to send edits to Akismet, but hadn’t had a chance to train it on our content before the attack. Too much spam was published, and too many legitimate edits were blocked.

After some further mitigations, we re-enabled account creation on April 26th, and have been able to keep new spam at a manageable level. Here’s what did work:

  • Analyzing the attack. With some scripting, I was able to gather data about the attack, make some calculations, and create some graphs. We were able to reject some planned mitigations, such as requiring a delay before a new account’s first edit (it turns out legitimate editors are just as fast as spammers). The time we saved by not implementing bad fixes meant more time for the fixes that worked.
  • New users can no longer create pages by default. During the spam attack, over 90% of new pages were spam. Removing this ability has cut spam back down to manageable levels. The page creation privilege can be added when requested, but the spammers don’t bother.
  • Content training is improved. At first, we trained Akismet on historical spam and historical “good” edits. We then rolled out training on live edits, turned on edit blocking, and continued training on incorrectly blocked edits.
  • Removing spam magnets from profiles. Some fields on the user profiles are used more by spammers than legitimate users, and have been removed.
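For readers curious what sending edits to Akismet and training it involve, here is a rough sketch against Akismet’s public REST API using curl. It is not MDN’s code: the function names `akismet_call` and `is_spam`, the `AKISMET_KEY` variable, and the example.com site URL are placeholders. The `comment-check` call answers `true` for spam and `false` otherwise, while `submit-spam` and `submit-ham` report misclassified content so Akismet can learn from it.

```shell
# Placeholders: set AKISMET_KEY to a real API key and BLOG_URL to the registered site.
AKISMET_KEY="${AKISMET_KEY:-your-api-key}"
BLOG_URL="https://example.com"

# Send one edit to Akismet.
# $1 = endpoint (comment-check, submit-spam or submit-ham)
# $2 = editor IP, $3 = editor user agent, $4 = edit text
akismet_call() {
    curl -s "https://${AKISMET_KEY}.rest.akismet.com/1.1/$1" \
        --data-urlencode "blog=${BLOG_URL}" \
        --data-urlencode "user_ip=$2" \
        --data-urlencode "user_agent=$3" \
        --data-urlencode "comment_content=$4"
}

# Succeeds (exit status 0) only if Akismet classifies the edit as spam.
is_spam() {
    [ "$(akismet_call comment-check "$1" "$2" "$3")" = "true" ]
}
```

In this sketch, an incorrectly blocked edit would be reported with `akismet_call submit-ham ...` using the same arguments, which corresponds to the “further training” step above.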

Spam is back down to less than 2%. There’s still more to do. I expect that the spammers are only temporarily blocked and will return once they have a new strategy. I want to be ready.

It’s good to have account creation open again. Many new pages have been translated, and many small typos fixed. We’re starting to see ambitious new changes as well. It has also increased the MDN staff workload: monitoring all the changes, finding bugs, and keeping a busy site working. The increased workload broke my once-a-week blog habit. Still, that’s better than one post every two years.