Bug report
Bug description:
In the free-threaded build, codecctx_errors_set decrefs the old handler (ERROR_DECREF(self->errors)) and stores the new one (self->errors = cb) without any synchronization:
cpython/Modules/cjkcodecs/multibytecodec.c, lines 170 to 197 in ecdef17:

static int
codecctx_errors_set(PyObject *op, PyObject *value, void *Py_UNUSED(closure))
{
    PyObject *cb;
    const char *str;
    MultibyteStatefulCodecContext *self = _MultibyteStatefulCodecContext_CAST(op);

    if (value == NULL) {
        PyErr_SetString(PyExc_AttributeError, "cannot delete attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError, "errors must be a string");
        return -1;
    }

    str = PyUnicode_AsUTF8(value);
    if (str == NULL)
        return -1;

    cb = internal_error_callback(str);
    if (cb == NULL)
        return -1;

    ERROR_DECREF(self->errors);
    self->errors = cb;
    return 0;
}

For any handler name other than the strict/ignore/replace sentinels (e.g. backslashreplace), self->errors holds a real refcounted PyUnicode.
The decref-then-assign sequence is not atomic, so concurrent assignments to codec.errors can drop references incorrectly and can free a handler while it is still referenced.
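To make the failure mode concrete, here is a minimal deterministic model of one bad interleaving, written in plain Python. The Handler and Ctx classes and the refcnt field are hypothetical stand-ins for the C-level objects, not CPython APIs; the sketch only replays the order of loads, decrefs, and stores that two racing setters could produce.

```python
class Handler:
    """Toy stand-in for a refcounted handler object (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.refcnt = 1  # the single reference owned by the codec context


class Ctx:
    """Toy stand-in for the codec context that owns `errors` (hypothetical)."""
    def __init__(self, handler):
        self.errors = handler


def racy_interleaving():
    old = Handler("backslashreplace")
    ctx = Ctx(old)
    new_t1 = Handler("xmlcharrefreplace")  # created by thread T1's setter
    new_t2 = Handler("namereplace")        # created by thread T2's setter

    # Both setters load self->errors before either one stores.
    seen_by_t1 = ctx.errors
    seen_by_t2 = ctx.errors

    # Both run ERROR_DECREF on the same old handler.
    seen_by_t1.refcnt -= 1
    seen_by_t2.refcnt -= 1  # second release of a reference held only once

    # Both store; T2's store overwrites T1's without releasing it.
    ctx.errors = new_t1
    ctx.errors = new_t2

    return old.refcnt, new_t1.refcnt, new_t2.refcnt


if __name__ == "__main__":
    print(racy_interleaving())
```

In this model the old handler ends at a count of -1 (released twice), and T1's new handler keeps a count of 1 although the context no longer points at it (leaked).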
Reproducer:
import codecs
import random
from threading import Thread

shared = codecs.getincrementalencoder('shift_jis')()
NAMES = ['backslashreplace', 'xmlcharrefreplace', 'namereplace']

def thread1():
    for _ in range(20000):
        try:
            shared.errors = random.choice(NAMES)
        except Exception:
            pass

if __name__ == "__main__":
    threads = [Thread(target=thread1) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
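For comparison, a caller-side variant that serializes the assignments is sketched here. This is an assumption for illustration, not a fix proposed in this report: errors_lock and run_serialized are hypothetical names, and the lock only helps if every write to the attribute goes through it. It does not protect other code paths that read the handler concurrently.

```python
import codecs
import random
from threading import Lock, Thread

NAMES = ['backslashreplace', 'xmlcharrefreplace', 'namereplace']


def run_serialized(n_threads=8, iterations=2000):
    shared = codecs.getincrementalencoder('shift_jis')()
    errors_lock = Lock()  # hypothetical caller-side lock guarding shared.errors

    def worker():
        rng = random.Random()
        for _ in range(iterations):
            # Only one thread at a time may run the setter.
            with errors_lock:
                shared.errors = rng.choice(NAMES)

    threads = [Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.errors


if __name__ == "__main__":
    print(run_serialized())
```

Because the setter never runs in two threads at once here, the unsynchronized read and write of self->errors cannot overlap between these writers.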
TSAN Report:
==================
WARNING: ThreadSanitizer: data race (pid=3576762)
Read of size 8 at 0x7fffb65a1cc0 by thread T2:
#0 codecctx_errors_set /cpython/./Modules/cjkcodecs/multibytecodec.c:194:5
#1 getset_set /cpython/Objects/descrobject.c:250:16
#2 _PyObject_GenericSetAttrWithDict /cpython/Objects/object.c:2049:19
#3 PyObject_GenericSetAttr /cpython/Objects/object.c:2120:12
#4 PyObject_SetAttr /cpython/Objects/object.c:1533:15
#5 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:12146:27
...
Previous write of size 8 at 0x7fffb65a1cc0 by thread T1:
#0 codecctx_errors_set /cpython/./Modules/cjkcodecs/multibytecodec.c:195:18
#1 getset_set /cpython/Objects/descrobject.c:250:16
#2 _PyObject_GenericSetAttrWithDict /cpython/Objects/object.c:2049:19
#3 PyObject_GenericSetAttr /cpython/Objects/object.c:2120:12
#4 PyObject_SetAttr /cpython/Objects/object.c:1533:15
#5 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:12146:27
...
SUMMARY: ThreadSanitizer: data race /cpython/./Modules/cjkcodecs/multibytecodec.c:194:5 in codecctx_errors_set
==================
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux