Data race in the codec.errors setter (codecctx_errors_set) on a shared stateful codec #152767

Description

@Naserume

Bug report

Bug description:

In the free-threaded build, codecctx_errors_set decrefs the old handler (ERROR_DECREF(self->errors)) and stores the new one (self->errors = cb) without any synchronization:

static int
codecctx_errors_set(PyObject *op, PyObject *value, void *Py_UNUSED(closure))
{
    PyObject *cb;
    const char *str;
    MultibyteStatefulCodecContext *self = _MultibyteStatefulCodecContext_CAST(op);

    if (value == NULL) {
        PyErr_SetString(PyExc_AttributeError, "cannot delete attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError, "errors must be a string");
        return -1;
    }

    str = PyUnicode_AsUTF8(value);
    if (str == NULL)
        return -1;

    cb = internal_error_callback(str);
    if (cb == NULL)
        return -1;

    ERROR_DECREF(self->errors);
    self->errors = cb;
    return 0;
}

For any handler name other than the strict/ignore/replace sentinels (e.g. backslashreplace), self->errors holds a real refcounted PyUnicode.

The decref-then-assign sequence is not atomic, so concurrent codec.errors = ... assignments can decref the same old handler more than once and can free a handler while it is still referenced.
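
To make the interleaving concrete, here is a toy, single-threaded model of the setter. It is not CPython code: Handler, Codec and setter_steps are hypothetical names, and refcount is a plain integer standing in for the real object refcount. Each generator plays one thread and yields at the point where another thread could be scheduled between the decref and the store.

```python
# Toy model of the decref-then-assign race; all names here are hypothetical.
class Handler:
    def __init__(self, name):
        self.name = name
        self.refcount = 1  # one reference, owned by the slot that stores it


class Codec:
    def __init__(self, handler):
        self.errors = handler


def setter_steps(codec, new):
    """Model codecctx_errors_set as two separately scheduled steps."""
    old = codec.errors   # load done by ERROR_DECREF(self->errors)
    old.refcount -= 1    # the decref itself
    yield                # another thread may run here
    codec.errors = new   # self->errors = cb
    yield


old = Handler("backslashreplace")
codec = Codec(old)
new_a = Handler("namereplace")
new_b = Handler("xmlcharrefreplace")
thread_a = setter_steps(codec, new_a)
thread_b = setter_steps(codec, new_b)

next(thread_a)  # A decrefs the old handler
next(thread_b)  # B loads the same pointer and decrefs it again
next(thread_a)  # A stores its handler
next(thread_b)  # B overwrites A's handler without releasing it

print("old refcount:", old.refcount)
print("slot holds:", codec.errors.name)
print("refcount of A's overwritten handler:", new_a.refcount)
```

In this model the old handler is decref'd twice although the slot only ever owned one reference, and A's handler is overwritten without being released. In the real setter the first decref can already free the object, so the second one would operate on freed memory.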

Reproducer:

import codecs
import random
from threading import Thread
shared = codecs.getincrementalencoder('shift_jis')()
NAMES = ['backslashreplace', 'xmlcharrefreplace', 'namereplace']

def thread1():
    for _ in range(20000):
        try:
            shared.errors = random.choice(NAMES)
        except Exception:
            pass

if __name__ == "__main__":
    threads = [Thread(target=thread1) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
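
As a caller-side stopgap (an assumption, not the fix proposed for multibytecodec.c), the assignments can be serialized with an ordinary lock. The sketch below reuses the reproducer's setup; errors_lock, set_errors and worker are hypothetical names, and the iteration count is reduced to keep it quick.

```python
import codecs
import random
from threading import Lock, Thread

shared = codecs.getincrementalencoder('shift_jis')()
NAMES = ['backslashreplace', 'xmlcharrefreplace', 'namereplace']

# Hypothetical helper: one lock guards every write to shared.errors.
errors_lock = Lock()


def set_errors(name):
    # Only one thread at a time runs the decref-then-assign in the setter.
    with errors_lock:
        shared.errors = name


def worker(iterations=2000):
    for _ in range(iterations):
        set_errors(random.choice(NAMES))


threads = [Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.errors)  # one of NAMES
```

This only serializes writers against each other. Any code that uses the shared codec while another thread reassigns errors would need to take the same lock.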

TSAN Report:

==================
WARNING: ThreadSanitizer: data race (pid=3576762)
  Read of size 8 at 0x7fffb65a1cc0 by thread T2:
    #0 codecctx_errors_set /cpython/./Modules/cjkcodecs/multibytecodec.c:194:5 
    #1 getset_set /cpython/Objects/descrobject.c:250:16 
    #2 _PyObject_GenericSetAttrWithDict /cpython/Objects/object.c:2049:19
    #3 PyObject_GenericSetAttr /cpython/Objects/object.c:2120:12
    #4 PyObject_SetAttr /cpython/Objects/object.c:1533:15
    #5 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:12146:27 
...

  Previous write of size 8 at 0x7fffb65a1cc0 by thread T1:
    #0 codecctx_errors_set /cpython/./Modules/cjkcodecs/multibytecodec.c:195:18
    #1 getset_set /cpython/Objects/descrobject.c:250:16 
    #2 _PyObject_GenericSetAttrWithDict /cpython/Objects/object.c:2049:19
    #3 PyObject_GenericSetAttr /cpython/Objects/object.c:2120:12 
    #4 PyObject_SetAttr /cpython/Objects/object.c:1533:15 
    #5 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:12146:27 
...

SUMMARY: ThreadSanitizer: data race /cpython/./Modules/cjkcodecs/multibytecodec.c:194:5 in codecctx_errors_set
==================

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux
