MySQL Connector: Inherited Transactions

mysql python

The MySQL Connector (Python Driver) seems to be leaking file descriptors during forks.

This is what I have noticed today: if a process opens a MySQL connection and then forks, the child process not just inherits the open connection, but also the transaction state. The current transaction becomes shared between the child and the parent. That is, if the child process rolls back, the parent also gets a roll back.

Also, as it is the same transaction, a lock set by one process has no effect on another.

Here is a proof of concept:

"""
Create and populate a database before running this script:

create database mytest;
grant all on mytest.* to ''@'localhost';
flush privileges;
create table foo(a int);
insert into foo (a) values (0);
"""

import time
from multiprocessing import Process
import _mysql

reconnect = False  # change to true to make the child process block (it should)

conn = _mysql.connect("localhost", user="mike", db="mytest", passwd="")

def sub():
    if reconnect:
        sub_conn = _mysql.connect("localhost", user="mike", db="mytest", passwd="")
    else:
        sub_conn = conn
    print "SUB: start", sub_conn.thread_id()
    print "SUB: do this to get the number of connections -> sudo lsof | grep mysql.sock"
    sub_conn.query('begin')
    sub_conn.query('select * from foo for update')
    if not reconnect:
        print "SUB: NOT BLOCKED, sleeping for 30 sec to hold the conneciton open"
        time.sleep(30)
    print "SUB: result", sub_conn.use_result().fetch_row()
    sub_conn.query('rollback')
    print "SUB: end"

print "HOST: start", conn.thread_id()
conn.query('begin')
conn.query('select * from foo for update')
print "HOST: result", conn.use_result().fetch_row()

process = Process(target=sub)
print "HOST: start sub"
process.start()
process.join()
print "HOST: sub joined"

conn.query('rollback')
print "HOST: end"

When reconnect is set to False, the parent’s thread id will be the same as in the child. The reason why is that MySQL uses server-side thread ids as connection identifiers. Here’s the mysql_thread_id function (mysql-connector-c-6.1.0-src/libmysql/libmysql.c:1070):

ulong STDCALL mysql_thread_id(MYSQL *mysql)
{
  ......
  return (mysql)->thread_id;
}

And this is how it is set in CLI_MYSQL_REAL_CONNECT (mysql-connector-c-6.1.0-src/sql-common/client.c:3613):

......
server_version_end= end= strend((char*) net->read_pos+1);
mysql->thread_id=uint4korr(end+1);
end+=5;
......

The direct consequence is that children processes, created for example using the multiprocessing module, must close the inherited MySQL connections and then reopen them to avoid surprises.

When I discovered it, I immediately thought about Django management commands splitting workload between children.

Open questions:

  1. Are Celery tasks affected by this? – probably yes.
  2. What happens when two processes sharing a transaction update data at the same time?