mdbx: add the MDBX_NOSTICKYTHREADS mode in place of MDBX_NOTLS.

Леонид Юрьев (Leonid Yuriev) 2024-04-02 00:22:09 +03:00
parent 1727b697a0
commit e56c73b4e6
14 changed files with 283 additions and 198 deletions

@ -11,8 +11,7 @@ For the same reason ~~Github~~ is blacklisted forever.
So currently most of the links are broken due to noted malicious ~~Github~~ sabotage. So currently most of the links are broken due to noted malicious ~~Github~~ sabotage.
- [Replace SRW-lock on Windows to allow shrink DB with `MDBX_NOTLS` option](https://libmdbx.dqdkfa.ru/dead-github/issues/210). - [Replace SRW-lock on Windows to allow shrink DB with `MDBX_NOSTICKYTHREADS` option](https://libmdbx.dqdkfa.ru/dead-github/issues/210).
- [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200).
- [Migration guide from LMDB to MDBX](https://libmdbx.dqdkfa.ru/dead-github/issues/199). - [Migration guide from LMDB to MDBX](https://libmdbx.dqdkfa.ru/dead-github/issues/199).
- [Support for RAW devices](https://libmdbx.dqdkfa.ru/dead-github/issues/124). - [Support for RAW devices](https://libmdbx.dqdkfa.ru/dead-github/issues/124).
- [Support MessagePack for Keys & Values](https://libmdbx.dqdkfa.ru/dead-github/issues/115). - [Support MessagePack for Keys & Values](https://libmdbx.dqdkfa.ru/dead-github/issues/115).
@ -22,6 +21,7 @@ So currently most of the links are broken due to noted malicious ~~Github~~ sabo
Done Done
---- ----
- [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200).
- [Move most of `mdbx_chk` functional to the library API](https://libmdbx.dqdkfa.ru/dead-github/issues/204). - [Move most of `mdbx_chk` functional to the library API](https://libmdbx.dqdkfa.ru/dead-github/issues/204).
- [Simple careful mode for working with corrupted DB](https://libmdbx.dqdkfa.ru/dead-github/issues/223). - [Simple careful mode for working with corrupted DB](https://libmdbx.dqdkfa.ru/dead-github/issues/223).
- [Engage an "overlapped I/O" on Windows](https://libmdbx.dqdkfa.ru/dead-github/issues/224). - [Engage an "overlapped I/O" on Windows](https://libmdbx.dqdkfa.ru/dead-github/issues/224).

@ -190,18 +190,20 @@ readers without writer" case.
## One thread - One transaction ## One thread - One transaction
A thread can only use one transaction at a time, plus any nested A thread can only use one transaction at a time, plus any nested
read-write transactions in the non-writemap mode. Each transaction read-write transactions in the non-writemap mode. Each transaction
belongs to one thread. The \ref MDBX_NOTLS flag changes this for read-only belongs to one thread. The \ref MDBX_NOSTICKYTHREADS flag changes this,
transactions. See below. see below.
Do not start more than one transaction for a one thread. If you think Do not start more than one transaction for a one thread. If you think
about this, it's really strange to do something with two data snapshots about this, it's really strange to do something with two data snapshots
at once, which may be different. MDBX checks and preventing this by at once, which may be different. MDBX checks and preventing this by
returning corresponding error code (\ref MDBX_TXN_OVERLAPPING, \ref MDBX_BAD_RSLOT, returning corresponding error code (\ref MDBX_TXN_OVERLAPPING,
\ref MDBX_BUSY) unless you using \ref MDBX_NOTLS option on the environment. \ref MDBX_BAD_RSLOT, \ref MDBX_BUSY) unless you using
Nonetheless, with the `MDBX_NOTLS` option, you must know exactly what you \ref MDBX_NOSTICKYTHREADS option on the environment.
are doing, otherwise you will get deadlocks or reading an alien data. Nonetheless, with the `MDBX_NOSTICKYTHREADS` option, you must know
exactly what you are doing, otherwise you will get deadlocks or reading
an alien data.
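
The thread check described above can be pictured with a small model. This is only a sketch under stated assumptions: `model_check_thread`, `MODEL_OK` and `MODEL_THREAD_MISMATCH` are hypothetical names and codes invented for illustration, not the library's API, and thread identifiers are passed in as plain integers. Only the flag value `0x200000` is taken from the enum changed in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Flag value as declared for MDBX_NOSTICKYTHREADS in this commit. */
#define MODEL_NOSTICKYTHREADS UINT32_C(0x200000)

/* Hypothetical result codes, NOT the real MDBX error values. */
enum { MODEL_OK = 0, MODEL_THREAD_MISMATCH = 1 };

/* Model of the "one thread - one transaction" check:
 * - default (sticky) mode: only the thread that started the
 *   transaction may use it;
 * - NOSTICKYTHREADS mode: no check is made, so misuse from another
 *   thread can no longer be detected. */
int model_check_thread(uint32_t env_flags, uintptr_t owner_tid,
                       uintptr_t current_tid) {
  if (env_flags & MODEL_NOSTICKYTHREADS)
    return MODEL_OK; /* the caller must serialize access itself */
  return (owner_tid == current_tid) ? MODEL_OK : MODEL_THREAD_MISMATCH;
}
```

In this model the second case returns `MODEL_OK` for any thread, which is exactly why the text warns that you must know what you are doing when the option is enabled.
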
## Do not open twice ## Do not open twice

@ -129,20 +129,23 @@ no open MDBX-instance(s) during fork(), or at least close it immediately after
necessary) in a child process would be both extreme complicated and so necessary) in a child process would be both extreme complicated and so
fragile. fragile.
Do not start more than one transaction for a one thread. If you think about Do not start more than one transaction for a one thread. If you think
this, it's really strange to do something with two data snapshots at once, about this, it's really strange to do something with two data snapshots
which may be different. MDBX checks and preventing this by returning at once, which may be different. MDBX checks and preventing this by
corresponding error code (\ref MDBX_TXN_OVERLAPPING, \ref MDBX_BAD_RSLOT, \ref MDBX_BUSY) returning corresponding error code (\ref MDBX_TXN_OVERLAPPING,
unless you using \ref MDBX_NOTLS option on the environment. Nonetheless, with the \ref MDBX_BAD_RSLOT, \ref MDBX_BUSY) unless you using
\ref MDBX_NOTLS option, you must know exactly what you are doing, otherwise you \ref MDBX_NOSTICKYTHREADS option on the environment. Nonetheless,
will get deadlocks or reading an alien data. with the \ref MDBX_NOSTICKYTHREADS option, you must know exactly what
you are doing, otherwise you will get deadlocks or reading an alien
data.
Also note that a transaction is tied to one thread by default using Thread Also note that a transaction is tied to one thread by default using
Local Storage. If you want to pass read-only transactions across threads, Thread Local Storage. If you want to pass transactions across threads,
you can use the \ref MDBX_NOTLS option on the environment. Nevertheless, a write you can use the \ref MDBX_NOSTICKYTHREADS option on the environment.
transaction entirely should only be used in one thread from start to finish. Nevertheless, a write transaction must be committed or aborted in the
MDBX checks this in a reasonable manner and return the \ref MDBX_THREAD_MISMATCH same thread which it was started. MDBX checks this in a reasonable
error in rules violation. manner and return the \ref MDBX_THREAD_MISMATCH error in rules
violation.
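
The requirement to finish a write transaction in the thread that started it follows from how operating-system locks behave. The sketch below does not use MDBX at all: it assumes a POSIX system and uses an error-checking pthread mutex as a stand-in for the writer lock, and `unlock_from_other_thread` is a hypothetical helper name. It shows a second thread failing to release a lock it does not own.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the writer lock; this is NOT the real MDBX lock object. */
static pthread_mutex_t writer_lock;

/* Runs in a second thread and tries to release a lock it does not own. */
static void *foreign_unlock(void *result) {
  *(int *)result = pthread_mutex_unlock(&writer_lock);
  return NULL;
}

/* Locks the mutex in the calling thread ("begin"), lets another thread
 * attempt the unlock, then releases it in the owner thread ("commit").
 * Returns the error code observed by the non-owner thread. */
int unlock_from_other_thread(void) {
  pthread_mutexattr_t attr;
  pthread_t thread;
  int foreign_rc = 0;
  int rc;

  rc = pthread_mutexattr_init(&attr);
  assert(rc == 0);
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert(rc == 0);
  rc = pthread_mutex_init(&writer_lock, &attr);
  assert(rc == 0);
  pthread_mutexattr_destroy(&attr);

  rc = pthread_mutex_lock(&writer_lock); /* owner acquires the lock */
  assert(rc == 0);
  rc = pthread_create(&thread, NULL, foreign_unlock, &foreign_rc);
  assert(rc == 0);
  rc = pthread_join(thread, NULL);
  assert(rc == 0);

  rc = pthread_mutex_unlock(&writer_lock); /* owner releases the lock */
  assert(rc == 0);
  pthread_mutex_destroy(&writer_lock);
  (void)rc;
  return foreign_rc;
}
```

With an error-checking mutex, POSIX specifies that the foreign unlock is rejected with `EPERM`. With other mutex types the same mistake is undefined behavior, so in this analogy committing or aborting in a different thread is not something the library could reliably repair.
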
## Transactions, rollbacks etc ## Transactions, rollbacks etc

128
mdbx.h
@ -1207,28 +1207,80 @@ enum MDBX_env_flags_t {
*/ */
MDBX_WRITEMAP = UINT32_C(0x80000), MDBX_WRITEMAP = UINT32_C(0x80000),
/** Tie reader locktable slots to read-only transactions /** Detaches transactions from threads as far as possible.
* instead of to threads.
* *
* Don't use Thread-Local Storage, instead tie reader locktable slots to * This option is intended for applications that multiplex many
* \ref MDBX_txn objects instead of to threads. So, \ref mdbx_txn_reset() * user-level lightweight threads of execution over individual operating
* keeps the slot reserved for the \ref MDBX_txn object. A thread may use * system threads, as happens, for example, in the GoLang and Rust
* parallel read-only transactions. And a read-only transaction may span * runtimes. Such applications are also advised to serialize write
* threads if you synchronizes its use. * transactions in a single operating system thread, because the MDBX
* write lock uses basic system synchronization primitives and knows
* nothing about user threads and/or the lightweight threads of the
* runtime. At a minimum, each write transaction must be finished
* strictly in the same operating system thread in which it was
* started.
* *
* Applications that multiplex many user threads over individual OS threads * \note Starting with version v0.13 the `MDBX_NOSTICKYTHREADS` option
* need this option. Such an application must also serialize the write * completely replaces the \ref MDBX_NOTLS option.
* transactions in an OS thread, since MDBX's write locking is unaware of
* the user threads.
* *
* \note Regardless to `MDBX_NOTLS` flag a write transaction entirely should * With `MDBX_NOSTICKYTHREADS`, transactions are no longer associated
* always be used in one thread from start to finish. MDBX checks this in a * with the threads that created them. Therefore the API functions do
* reasonable manner and return the \ref MDBX_THREAD_MISMATCH error in rules * not check that a transaction matches the current thread. Most
* violation. * functions that work with transactions and cursors can then be
* called from any thread. However, it also becomes impossible to
* detect erroneous simultaneous use of transactions and/or cursors
* from different threads.
* *
* This flag affects only at environment opening but can't be changed after. * Using `MDBX_NOSTICKYTHREADS` also narrows the options for resizing
* the DB, because the ability to track the threads working with the DB
* and to suspend them while the DB memory mapping is removed is lost.
* In particular, for this reason on Windows the DB file cannot be
* shrunk until the DB is closed by the last process working with it or
* until the DB is subsequently opened in read-write mode.
*
* \warning Regardless of \ref MDBX_NOSTICKYTHREADS and \ref MDBX_NOTLS,
* simultaneous use of API objects from different threads is not
* allowed! Taking every measure needed to rule out simultaneous use of
* API objects from different threads is entirely your
* responsibility!
*
* \warning Write transactions can be finished only in the same thread
* in which they were started. This restriction follows from the
* requirement of most operating systems that an acquired
* synchronization primitive (mutex, semaphore, critical section) must be
* released only by the thread that acquired it.
*
* \warning Creating a cursor in the context of a transaction, binding a
* cursor to a transaction, unbinding a cursor from a transaction, and
* closing a cursor bound to a transaction are operations that use both the
* cursor itself and the corresponding transaction. Likewise, committing or
* aborting a transaction is an operation that uses both the transaction
* itself and all cursors bound to it. To avoid corruption of internal data
* structures, unpredictable behavior, destruction of the DB and data loss,
* you must not allow any cursor or transaction to be used simultaneously
* from different threads.
*
* With `MDBX_NOSTICKYTHREADS`, read transactions stop using TLS
* (Thread Local Storage), and the MVCC-snapshot lock slots in the reader
* table are bound only to transactions. Termination of any thread does
* not release MVCC-snapshot locks until the transactions are explicitly
* finished or the corresponding process as a whole terminates.
*
* For write transactions, no check is made that the current thread
* matches the thread that created the transaction. However, committing
* or aborting write transactions must be performed strictly in the
* thread that started the transaction, because these operations involve
* acquiring and releasing synchronization primitives (mutexes, critical
* sections), which most operating systems require to be released only by
* the thread that acquired them.
*
* This flag takes effect when the environment is opened and cannot be changed afterwards.
*/ */
MDBX_NOTLS = UINT32_C(0x200000), MDBX_NOSTICKYTHREADS = UINT32_C(0x200000),
#ifndef _MSC_VER /* avoid madness MSVC */
/** \deprecated Please use \ref MDBX_NOSTICKYTHREADS instead. */
MDBX_NOTLS MDBX_DEPRECATED = MDBX_NOSTICKYTHREADS,
#endif /* avoid madness MSVC */
/** Don't do readahead. /** Don't do readahead.
* *
@ -2121,11 +2173,12 @@ enum MDBX_option_t {
* track readers in the the environment. The default is about 100 for 4K * track readers in the the environment. The default is about 100 for 4K
* system page size. Starting a read-only transaction normally ties a lock * system page size. Starting a read-only transaction normally ties a lock
* table slot to the current thread until the environment closes or the thread * table slot to the current thread until the environment closes or the thread
* exits. If \ref MDBX_NOTLS is in use, \ref mdbx_txn_begin() instead ties the * exits. If \ref MDBX_NOSTICKYTHREADS is in use, \ref mdbx_txn_begin()
* slot to the \ref MDBX_txn object until it or the \ref MDBX_env object is * instead ties the slot to the \ref MDBX_txn object until it or the \ref
* destroyed. This option may only set after \ref mdbx_env_create() and before * MDBX_env object is destroyed. This option may only set after \ref
* \ref mdbx_env_open(), and has an effect only when the database is opened by * mdbx_env_create() and before \ref mdbx_env_open(), and has an effect only
* the first process interacts with the database. * when the database is opened by the first process interacts with the
* database.
* *
* \see mdbx_env_set_maxreaders() \see mdbx_env_get_maxreaders() */ * \see mdbx_env_set_maxreaders() \see mdbx_env_get_maxreaders() */
MDBX_opt_max_readers, MDBX_opt_max_readers,
@ -2389,7 +2442,7 @@ LIBMDBX_API int mdbx_env_get_option(const MDBX_env *env,
* *
* Flags set by mdbx_env_set_flags() are also used: * Flags set by mdbx_env_set_flags() are also used:
* - \ref MDBX_ENV_DEFAULTS, \ref MDBX_NOSUBDIR, \ref MDBX_RDONLY, * - \ref MDBX_ENV_DEFAULTS, \ref MDBX_NOSUBDIR, \ref MDBX_RDONLY,
* \ref MDBX_EXCLUSIVE, \ref MDBX_WRITEMAP, \ref MDBX_NOTLS, * \ref MDBX_EXCLUSIVE, \ref MDBX_WRITEMAP, \ref MDBX_NOSTICKYTHREADS,
* \ref MDBX_NORDAHEAD, \ref MDBX_NOMEMINIT, \ref MDBX_COALESCE, * \ref MDBX_NORDAHEAD, \ref MDBX_NOMEMINIT, \ref MDBX_COALESCE,
* \ref MDBX_LIFORECLAIM. See \ref env_flags section. * \ref MDBX_LIFORECLAIM. See \ref env_flags section.
* *
@ -3385,7 +3438,7 @@ LIBMDBX_API int mdbx_env_get_fd(const MDBX_env *env, mdbx_filehandle_t *fd);
* 2) Temporary close memory mapped is required to change * 2) Temporary close memory mapped is required to change
* geometry, but there read transaction(s) is running * geometry, but there read transaction(s) is running
* and no corresponding thread(s) could be suspended * and no corresponding thread(s) could be suspended
* since the \ref MDBX_NOTLS mode is used. * since the \ref MDBX_NOSTICKYTHREADS mode is used.
* \retval MDBX_EACCESS The environment opened in read-only. * \retval MDBX_EACCESS The environment opened in read-only.
* \retval MDBX_MAP_FULL Specified size smaller than the space already * \retval MDBX_MAP_FULL Specified size smaller than the space already
* consumed by the environment. * consumed by the environment.
@ -3504,11 +3557,11 @@ mdbx_limits_txnsize_max(intptr_t pagesize);
* track readers in the the environment. The default is about 100 for 4K system * track readers in the the environment. The default is about 100 for 4K system
* page size. Starting a read-only transaction normally ties a lock table slot * page size. Starting a read-only transaction normally ties a lock table slot
* to the current thread until the environment closes or the thread exits. If * to the current thread until the environment closes or the thread exits. If
* \ref MDBX_NOTLS is in use, \ref mdbx_txn_begin() instead ties the slot to the * \ref MDBX_NOSTICKYTHREADS is in use, \ref mdbx_txn_begin() instead ties the
* \ref MDBX_txn object until it or the \ref MDBX_env object is destroyed. * slot to the \ref MDBX_txn object until it or the \ref MDBX_env object is
* This function may only be called after \ref mdbx_env_create() and before * destroyed. This function may only be called after \ref mdbx_env_create() and
* \ref mdbx_env_open(), and has an effect only when the database is opened by * before \ref mdbx_env_open(), and has an effect only when the database is
* the first process interacts with the database. * opened by the first process interacts with the database.
* \see mdbx_env_get_maxreaders() * \see mdbx_env_get_maxreaders()
* *
* \param [in] env An environment handle returned * \param [in] env An environment handle returned
@ -3702,8 +3755,8 @@ mdbx_env_get_userctx(const MDBX_env *env);
* \see mdbx_txn_begin() * \see mdbx_txn_begin()
* *
* \note A transaction and its cursors must only be used by a single thread, * \note A transaction and its cursors must only be used by a single thread,
* and a thread may only have a single transaction at a time. If \ref MDBX_NOTLS * and a thread may only have a single transaction at a time unless
* is in use, this does not apply to read-only transactions. * the \ref MDBX_NOSTICKYTHREADS is used.
* *
* \note Cursors may not span transactions. * \note Cursors may not span transactions.
* *
@ -3764,8 +3817,8 @@ LIBMDBX_API int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent,
* \see mdbx_txn_begin_ex() * \see mdbx_txn_begin_ex()
* *
* \note A transaction and its cursors must only be used by a single thread, * \note A transaction and its cursors must only be used by a single thread,
* and a thread may only have a single transaction at a time. If \ref MDBX_NOTLS * and a thread may only have a single transaction at a time unless
* is in use, this does not apply to read-only transactions. * the \ref MDBX_NOSTICKYTHREADS is used.
* *
* \note Cursors may not span transactions. * \note Cursors may not span transactions.
* *
@ -4140,10 +4193,11 @@ LIBMDBX_API int mdbx_txn_break(MDBX_txn *txn);
* Abort the read-only transaction like \ref mdbx_txn_abort(), but keep the * Abort the read-only transaction like \ref mdbx_txn_abort(), but keep the
* transaction handle. Therefore \ref mdbx_txn_renew() may reuse the handle. * transaction handle. Therefore \ref mdbx_txn_renew() may reuse the handle.
* This saves allocation overhead if the process will start a new read-only * This saves allocation overhead if the process will start a new read-only
* transaction soon, and also locking overhead if \ref MDBX_NOTLS is in use. The * transaction soon, and also locking overhead if \ref MDBX_NOSTICKYTHREADS is
* reader table lock is released, but the table slot stays tied to its thread * in use. The reader table lock is released, but the table slot stays tied to
* or \ref MDBX_txn. Use \ref mdbx_txn_abort() to discard a reset handle, and to * its thread or \ref MDBX_txn. Use \ref mdbx_txn_abort() to discard a reset
* free its lock table slot if \ref MDBX_NOTLS is in use. * handle, and to free its lock table slot if \ref MDBX_NOSTICKYTHREADS is in
* use.
* *
* Cursors opened within the transaction must not be used again after this * Cursors opened within the transaction must not be used again after this
* call, except with \ref mdbx_cursor_renew() and \ref mdbx_cursor_close(). * call, except with \ref mdbx_cursor_renew() and \ref mdbx_cursor_close().
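
The reader-slot rules in the `mdbx_txn_reset()` description above can be summarized as a tiny state model. This is a sketch, assuming only what that comment says; the `model_*` names and the two state bits are hypothetical and do not correspond to the library's internal structures.

```c
#include <assert.h>
#include <stdint.h>

/* Flag value as declared for MDBX_NOSTICKYTHREADS in this commit. */
#define MODEL_NOSTICKYTHREADS UINT32_C(0x200000)

/* Hypothetical state bits of a read-only transaction handle. */
enum {
  MODEL_SLOT_BOUND = 1,     /* a reader table slot is reserved */
  MODEL_SNAPSHOT_LOCKED = 2 /* an MVCC snapshot is pinned */
};

/* State right after a read-only transaction has begun. */
unsigned model_begin(void) {
  return MODEL_SLOT_BOUND | MODEL_SNAPSHOT_LOCKED;
}

/* reset: the snapshot lock is released, but the slot stays reserved
 * (tied to the thread, or to the txn object with NOSTICKYTHREADS). */
unsigned model_reset(unsigned state) {
  return state & ~(unsigned)MODEL_SNAPSHOT_LOCKED;
}

/* abort: the snapshot lock is released; the slot is freed only when it
 * is tied to the txn object, i.e. when NOSTICKYTHREADS is in use. */
unsigned model_abort(unsigned state, uint32_t env_flags) {
  state &= ~(unsigned)MODEL_SNAPSHOT_LOCKED;
  if (env_flags & MODEL_NOSTICKYTHREADS)
    state &= ~(unsigned)MODEL_SLOT_BOUND;
  return state;
}
```

In the default mode the slot survives an abort because it belongs to the thread, which is why the comment mentions freeing the slot only for the `MDBX_NOSTICKYTHREADS` case.
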

@ -3679,8 +3679,8 @@ public:
/// \brief Operate options. /// \brief Operate options.
struct LIBMDBX_API_TYPE operate_options { struct LIBMDBX_API_TYPE operate_options {
/// \copydoc MDBX_NOTLS /// \copydoc MDBX_NOSTICKYTHREADS
bool orphan_read_transactions{false}; bool no_sticky_threads{false};
/// \brief Enables nested transactions at the cost of disabling /// \brief Enables nested transactions at the cost of disabling
/// \ref MDBX_WRITEMAP and increased overhead. /// \ref MDBX_WRITEMAP and increased overhead.
bool nested_write_transactions{false}; bool nested_write_transactions{false};

@ -21,7 +21,7 @@ N | MASK | ENV | TXN | DB | PUT | DBI | NOD
18|0004 0000|NOMETASYNC |TXN_NOMETASYNC|CREATE |APPENDDUP | | | | | 18|0004 0000|NOMETASYNC |TXN_NOMETASYNC|CREATE |APPENDDUP | | | | |
19|0008 0000|WRITEMAP |<= | |MULTIPLE | | | | <= | 19|0008 0000|WRITEMAP |<= | |MULTIPLE | | | | <= |
20|0010 0000|UTTERLY | | | | | | | <= | 20|0010 0000|UTTERLY | | | | | | | <= |
21|0020 0000|NOTLS |<= | | | | | | | 21|0020 0000|NOSTICKYTHR|<= | | | | | | |
22|0040 0000|EXCLUSIVE | | | | | | | | 22|0040 0000|EXCLUSIVE | | | | | | | |
23|0080 0000|NORDAHEAD | | | | | | | | 23|0080 0000|NORDAHEAD | | | | | | | |
24|0100 0000|NOMEMINIT |TXN_PREPARE | | | | | | | 24|0100 0000|NOMEMINIT |TXN_PREPARE | | | | | | |
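
Row 21 of the table above and the enum value `UINT32_C(0x200000)` describe the same bit. The sketch below checks this with a stand-alone model: the `MODEL_` names and `flag_bit_index` are hypothetical, and only the value `0x200000` and the fact that the deprecated name aliases the new one come from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the renamed flag and its deprecated alias. */
enum model_env_flags {
  MODEL_NOSTICKYTHREADS = UINT32_C(0x200000),
  MODEL_NOTLS = MODEL_NOSTICKYTHREADS /* deprecated alias, same bit */
};

/* Returns the index of the single set bit in mask,
 * or -1 if mask is zero or has more than one bit set. */
int flag_bit_index(uint32_t mask) {
  int index = 0;
  if (mask == 0 || (mask & (mask - 1)) != 0)
    return -1;
  while ((mask >>= 1) != 0)
    ++index;
  return index;
}
```

Because the alias shares the bit, code that still passes the old name selects the same mode in this model; only the spelling changes.
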

@ -1580,7 +1580,7 @@ __cold int rthc_register(MDBX_env *const env) {
rthc_limit *= 2; rthc_limit *= 2;
} }
if ((env->me_flags & MDBX_NOTLS) == 0) { if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0) {
rc = thread_key_create(&env->me_txkey); rc = thread_key_create(&env->me_txkey);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto bailout; goto bailout;
@ -3275,7 +3275,7 @@ enum {
#define TXN_END_UPDATE 0x10 /* update env state (DBIs) */ #define TXN_END_UPDATE 0x10 /* update env state (DBIs) */
#define TXN_END_FREE 0x20 /* free txn unless it is MDBX_env.me_txn0 */ #define TXN_END_FREE 0x20 /* free txn unless it is MDBX_env.me_txn0 */
#define TXN_END_EOTDONE 0x40 /* txn's cursors already closed */ #define TXN_END_EOTDONE 0x40 /* txn's cursors already closed */
#define TXN_END_SLOT 0x80 /* release any reader slot if MDBX_NOTLS */ #define TXN_END_SLOT 0x80 /* release any reader slot if NOSTICKYTHREADS */
static int txn_end(MDBX_txn *txn, const unsigned mode); static int txn_end(MDBX_txn *txn, const unsigned mode);
static __always_inline pgr_t page_get_inline(const uint16_t ILL, static __always_inline pgr_t page_get_inline(const uint16_t ILL,
@ -6562,60 +6562,63 @@ __cold static int dxb_resize(MDBX_env *const env, const pgno_t used_pgno,
size_bytes == env->me_dxb_mmap.filesize) size_bytes == env->me_dxb_mmap.filesize)
goto bailout; goto bailout;
/* With MDBX_NOSTICKYTHREADS any thread may work with transactions and we
* have no information about which ones. Therefore it is impossible to
* perform remap actions that require suspending the threads working with the DB. */
if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0) {
#if defined(_WIN32) || defined(_WIN64) #if defined(_WIN32) || defined(_WIN64)
if ((env->me_flags & MDBX_NOTLS) == 0 && if ((size_bytes < env->me_dxb_mmap.current && mode > implicit_grow) ||
((size_bytes < env->me_dxb_mmap.current && mode > implicit_grow) || limit_bytes != env->me_dxb_mmap.limit) {
limit_bytes != env->me_dxb_mmap.limit)) { /* 1) Windows allows only extending a read-write section, but not a
/* 1) Windows allows only extending a read-write section, but not a * corresponding mapped view. Therefore in other cases we must suspend
* corresponding mapped view. Therefore in other cases we must suspend * the local threads for safe remap.
* the local threads for safe remap. * 2) At least on Windows 10 1803 the entire mapped section is unavailable
* 2) At least on Windows 10 1803 the entire mapped section is unavailable * for short time during NtExtendSection() or VirtualAlloc() execution.
* for short time during NtExtendSection() or VirtualAlloc() execution. * 3) Under Wine runtime environment on Linux a section extending is not
* 3) Under Wine runtime environment on Linux a section extending is not * supported.
* supported. *
* * THEREFORE LOCAL THREADS SUSPENDING IS ALWAYS REQUIRED! */
* THEREFORE LOCAL THREADS SUSPENDING IS ALWAYS REQUIRED! */ array_onstack.limit = ARRAY_LENGTH(array_onstack.handles);
array_onstack.limit = ARRAY_LENGTH(array_onstack.handles); array_onstack.count = 0;
array_onstack.count = 0; suspended = &array_onstack;
suspended = &array_onstack; rc = osal_suspend_threads_before_remap(env, &suspended);
rc = osal_suspend_threads_before_remap(env, &suspended); if (rc != MDBX_SUCCESS) {
if (rc != MDBX_SUCCESS) { ERROR("failed suspend-for-remap: errcode %d", rc);
ERROR("failed suspend-for-remap: errcode %d", rc);
goto bailout;
}
mresize_flags |= (mode < explicit_resize)
? MDBX_MRESIZE_MAY_UNMAP
: MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE;
}
#else /* Windows */
MDBX_lockinfo *const lck = env->me_lck_mmap.lck;
if (mode == explicit_resize && limit_bytes != env->me_dxb_mmap.limit &&
!(env->me_flags & MDBX_NOTLS)) {
mresize_flags |= MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE;
if (lck) {
int err = osal_rdt_lock(env) /* lock readers table until remap done */;
if (unlikely(MDBX_IS_ERROR(err))) {
rc = err;
goto bailout; goto bailout;
} }
mresize_flags |= (mode < explicit_resize)
? MDBX_MRESIZE_MAY_UNMAP
: MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE;
}
#else /* Windows */
MDBX_lockinfo *const lck = env->me_lck_mmap.lck;
if (mode == explicit_resize && limit_bytes != env->me_dxb_mmap.limit) {
mresize_flags |= MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE;
if (lck) {
int err = osal_rdt_lock(env) /* lock readers table until remap done */;
if (unlikely(MDBX_IS_ERROR(err))) {
rc = err;
goto bailout;
}
/* looking for readers from this process */ /* looking for readers from this process */
const size_t snap_nreaders = const size_t snap_nreaders =
atomic_load32(&lck->mti_numreaders, mo_AcquireRelease); atomic_load32(&lck->mti_numreaders, mo_AcquireRelease);
eASSERT(env, mode == explicit_resize); eASSERT(env, mode == explicit_resize);
for (size_t i = 0; i < snap_nreaders; ++i) { for (size_t i = 0; i < snap_nreaders; ++i) {
if (lck->mti_readers[i].mr_pid.weak == env->me_pid && if (lck->mti_readers[i].mr_pid.weak == env->me_pid &&
lck->mti_readers[i].mr_tid.weak != osal_thread_self()) { lck->mti_readers[i].mr_tid.weak != osal_thread_self()) {
/* the base address of the mapping can't be changed since /* the base address of the mapping can't be changed since
* the other reader thread from this process exists. */ * the other reader thread from this process exists. */
osal_rdt_unlock(env); osal_rdt_unlock(env);
mresize_flags &= ~(MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE); mresize_flags &= ~(MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE);
break; break;
}
} }
} }
} }
}
#endif /* ! Windows */ #endif /* ! Windows */
}
const pgno_t aligned_munlock_pgno = const pgno_t aligned_munlock_pgno =
(mresize_flags & (MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE)) (mresize_flags & (MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE))
@ -8616,26 +8619,30 @@ static int meta_sync(const MDBX_env *env, const meta_ptr_t head) {
return rc; return rc;
} }
static __inline bool env_txn0_owned(const MDBX_env *env) {
return (env->me_flags & MDBX_NOSTICKYTHREADS)
? (env->me_txn0->mt_owner != 0)
: (env->me_txn0->mt_owner == osal_thread_self());
}
__cold static int env_sync(MDBX_env *env, bool force, bool nonblock) { __cold static int env_sync(MDBX_env *env, bool force, bool nonblock) {
bool locked = false; if (unlikely(env->me_flags & MDBX_RDONLY))
return MDBX_EACCESS;
const bool txn0_owned = env_txn0_owned(env);
bool should_unlock = false;
int rc = MDBX_RESULT_TRUE /* means "nothing to sync" */; int rc = MDBX_RESULT_TRUE /* means "nothing to sync" */;
retry:; retry:;
unsigned flags = env->me_flags & ~(MDBX_NOMETASYNC | MDBX_SHRINK_ALLOWED); unsigned flags = env->me_flags & ~(MDBX_NOMETASYNC | MDBX_SHRINK_ALLOWED);
if (unlikely((flags & (MDBX_RDONLY | MDBX_FATAL_ERROR | MDBX_ENV_ACTIVE)) != if (unlikely((flags & (MDBX_FATAL_ERROR | MDBX_ENV_ACTIVE)) !=
MDBX_ENV_ACTIVE)) { MDBX_ENV_ACTIVE)) {
rc = MDBX_EACCESS; rc = (flags & MDBX_FATAL_ERROR) ? MDBX_PANIC : MDBX_EPERM;
if (!(flags & MDBX_ENV_ACTIVE))
rc = MDBX_EPERM;
if (flags & MDBX_FATAL_ERROR)
rc = MDBX_PANIC;
goto bailout; goto bailout;
} }
const bool inside_txn =
(!locked && env->me_txn0->mt_owner == osal_thread_self());
const meta_troika_t troika = const meta_troika_t troika =
(inside_txn | locked) ? env->me_txn0->tw.troika : meta_tap(env); (txn0_owned | should_unlock) ? env->me_txn0->tw.troika : meta_tap(env);
const meta_ptr_t head = meta_recent(env, &troika); const meta_ptr_t head = meta_recent(env, &troika);
const uint64_t unsynced_pages = const uint64_t unsynced_pages =
atomic_load64(&env->me_lck->mti_unsynced_pages, mo_Relaxed); atomic_load64(&env->me_lck->mti_unsynced_pages, mo_Relaxed);
@ -8646,7 +8653,7 @@ retry:;
goto bailout; goto bailout;
} }
if (locked && (env->me_flags & MDBX_WRITEMAP) && if (should_unlock && (env->me_flags & MDBX_WRITEMAP) &&
unlikely(head.ptr_c->mm_geo.next > unlikely(head.ptr_c->mm_geo.next >
bytes2pgno(env, env->me_dxb_mmap.current))) { bytes2pgno(env, env->me_dxb_mmap.current))) {
@ -8676,8 +8683,8 @@ retry:;
osal_monotime() - eoos_timestamp >= autosync_period)) osal_monotime() - eoos_timestamp >= autosync_period))
flags &= MDBX_WRITEMAP /* clear flags for full steady sync */; flags &= MDBX_WRITEMAP /* clear flags for full steady sync */;
if (!inside_txn) { if (!txn0_owned) {
if (!locked) { if (!should_unlock) {
#if MDBX_ENABLE_PGOP_STAT #if MDBX_ENABLE_PGOP_STAT
unsigned wops = 0; unsigned wops = 0;
#endif /* MDBX_ENABLE_PGOP_STAT */ #endif /* MDBX_ENABLE_PGOP_STAT */
@ -8723,7 +8730,7 @@ retry:;
if (unlikely(err != MDBX_SUCCESS)) if (unlikely(err != MDBX_SUCCESS))
return err; return err;
locked = true; should_unlock = true;
#if MDBX_ENABLE_PGOP_STAT #if MDBX_ENABLE_PGOP_STAT
env->me_lck->mti_pgop_stat.wops.weak += wops; env->me_lck->mti_pgop_stat.wops.weak += wops;
#endif /* MDBX_ENABLE_PGOP_STAT */ #endif /* MDBX_ENABLE_PGOP_STAT */
@ -8737,8 +8744,8 @@ retry:;
flags |= MDBX_SHRINK_ALLOWED; flags |= MDBX_SHRINK_ALLOWED;
} }
eASSERT(env, inside_txn || locked); eASSERT(env, txn0_owned || should_unlock);
eASSERT(env, !inside_txn || (flags & MDBX_SHRINK_ALLOWED) == 0); eASSERT(env, !txn0_owned || (flags & MDBX_SHRINK_ALLOWED) == 0);
if (!head.is_steady && unlikely(env->me_stuck_meta >= 0) && if (!head.is_steady && unlikely(env->me_stuck_meta >= 0) &&
troika.recent != (uint8_t)env->me_stuck_meta) { troika.recent != (uint8_t)env->me_stuck_meta) {
@ -8765,7 +8772,7 @@ retry:;
rc = meta_sync(env, head); rc = meta_sync(env, head);
bailout: bailout:
if (locked) if (should_unlock)
osal_txn_unlock(env); osal_txn_unlock(env);
return rc; return rc;
} }
@ -8854,7 +8861,7 @@ static void txn_valgrind(MDBX_env *env, MDBX_txn *txn) {
if (env->me_pid != osal_getpid()) { if (env->me_pid != osal_getpid()) {
/* resurrect after fork */ /* resurrect after fork */
return; return;
} else if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self()) { } else if (env->me_txn && env_txn0_owned(env)) {
/* inside write-txn */ /* inside write-txn */
last = meta_recent(env, &env->me_txn0->tw.troika).ptr_v->mm_geo.next; last = meta_recent(env, &env->me_txn0->tw.troika).ptr_v->mm_geo.next;
} else if (env->me_flags & MDBX_RDONLY) { } else if (env->me_flags & MDBX_RDONLY) {
@ -8950,7 +8957,7 @@ static bind_rslot_result bind_rslot(MDBX_env *env, const uintptr_t tid) {
safe64_reset(&result.rslot->mr_txnid, true); safe64_reset(&result.rslot->mr_txnid, true);
if (slot == nreaders) if (slot == nreaders)
env->me_lck->mti_numreaders.weak = (uint32_t)++nreaders; env->me_lck->mti_numreaders.weak = (uint32_t)++nreaders;
result.rslot->mr_tid.weak = (env->me_flags & MDBX_NOTLS) ? 0 : tid; result.rslot->mr_tid.weak = (env->me_flags & MDBX_NOSTICKYTHREADS) ? 0 : tid;
atomic_store32(&result.rslot->mr_pid, env->me_pid, mo_AcquireRelease); atomic_store32(&result.rslot->mr_pid, env->me_pid, mo_AcquireRelease);
osal_rdt_unlock(env); osal_rdt_unlock(env);
@ -8970,12 +8977,12 @@ __cold int mdbx_thread_register(const MDBX_env *env) {
return (env->me_flags & MDBX_EXCLUSIVE) ? MDBX_EINVAL : MDBX_EPERM; return (env->me_flags & MDBX_EXCLUSIVE) ? MDBX_EINVAL : MDBX_EPERM;
if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) { if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS)); eASSERT(env, env->me_flags & MDBX_NOSTICKYTHREADS);
return MDBX_EINVAL /* MDBX_NOTLS mode */; return MDBX_EINVAL /* MDBX_NOSTICKYTHREADS mode */;
} }
eASSERT(env, (env->me_flags & (MDBX_NOTLS | MDBX_ENV_TXKEY | eASSERT(env, (env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_ENV_TXKEY)) ==
MDBX_EXCLUSIVE)) == MDBX_ENV_TXKEY); MDBX_ENV_TXKEY);
MDBX_reader *r = thread_rthc_get(env->me_txkey); MDBX_reader *r = thread_rthc_get(env->me_txkey);
if (unlikely(r != NULL)) { if (unlikely(r != NULL)) {
eASSERT(env, r->mr_pid.weak == env->me_pid); eASSERT(env, r->mr_pid.weak == env->me_pid);
@ -8986,7 +8993,7 @@ __cold int mdbx_thread_register(const MDBX_env *env) {
} }
const uintptr_t tid = osal_thread_self(); const uintptr_t tid = osal_thread_self();
if (env->me_txn0 && unlikely(env->me_txn0->mt_owner == tid) && env->me_txn) if (env->me_txn && unlikely(env->me_txn0->mt_owner == tid))
return MDBX_TXN_OVERLAPPING; return MDBX_TXN_OVERLAPPING;
return bind_rslot((MDBX_env *)env, tid).err; return bind_rslot((MDBX_env *)env, tid).err;
} }
@ -9000,12 +9007,12 @@ __cold int mdbx_thread_unregister(const MDBX_env *env) {
return MDBX_RESULT_TRUE; return MDBX_RESULT_TRUE;
if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) { if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS)); eASSERT(env, env->me_flags & MDBX_NOSTICKYTHREADS);
return MDBX_RESULT_TRUE /* MDBX_NOTLS mode */; return MDBX_RESULT_TRUE /* MDBX_NOSTICKYTHREADS mode */;
} }
eASSERT(env, (env->me_flags & (MDBX_NOTLS | MDBX_ENV_TXKEY | eASSERT(env, (env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_ENV_TXKEY)) ==
MDBX_EXCLUSIVE)) == MDBX_ENV_TXKEY); MDBX_ENV_TXKEY);
MDBX_reader *r = thread_rthc_get(env->me_txkey); MDBX_reader *r = thread_rthc_get(env->me_txkey);
if (unlikely(r == NULL)) if (unlikely(r == NULL))
return MDBX_RESULT_TRUE /* not registered */; return MDBX_RESULT_TRUE /* not registered */;
@ -9220,7 +9227,7 @@ static bool check_meta_coherency(const MDBX_env *env,
} }
/* Common code for mdbx_txn_begin() and mdbx_txn_renew(). */ /* Common code for mdbx_txn_begin() and mdbx_txn_renew(). */
static int txn_renew(MDBX_txn *txn, const unsigned flags) { static int txn_renew(MDBX_txn *txn, unsigned flags) {
MDBX_env *env = txn->mt_env; MDBX_env *env = txn->mt_env;
int rc; int rc;
@ -9245,14 +9252,15 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
0); 0);
const uintptr_t tid = osal_thread_self(); const uintptr_t tid = osal_thread_self();
flags |= env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP);
if (flags & MDBX_TXN_RDONLY) { if (flags & MDBX_TXN_RDONLY) {
eASSERT(env, (flags & ~(MDBX_TXN_RO_BEGIN_FLAGS | MDBX_WRITEMAP)) == 0); eASSERT(env, (flags & ~(MDBX_TXN_RO_BEGIN_FLAGS | MDBX_WRITEMAP |
txn->mt_flags = MDBX_NOSTICKYTHREADS)) == 0);
MDBX_TXN_RDONLY | (env->me_flags & (MDBX_NOTLS | MDBX_WRITEMAP)); txn->mt_flags = flags;
MDBX_reader *r = txn->to.reader; MDBX_reader *r = txn->to.reader;
STATIC_ASSERT(sizeof(uintptr_t) <= sizeof(r->mr_tid)); STATIC_ASSERT(sizeof(uintptr_t) <= sizeof(r->mr_tid));
if (likely(env->me_flags & MDBX_ENV_TXKEY)) { if (likely(env->me_flags & MDBX_ENV_TXKEY)) {
eASSERT(env, !(env->me_flags & MDBX_NOTLS)); eASSERT(env, !(env->me_flags & MDBX_NOSTICKYTHREADS));
r = thread_rthc_get(env->me_txkey); r = thread_rthc_get(env->me_txkey);
if (likely(r)) { if (likely(r)) {
if (unlikely(!r->mr_pid.weak) && if (unlikely(!r->mr_pid.weak) &&
@ -9265,7 +9273,8 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
} }
} }
} else { } else {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS)); eASSERT(env,
!env->me_lck_mmap.lck || (env->me_flags & MDBX_NOSTICKYTHREADS));
} }
if (likely(r)) { if (likely(r)) {
@ -9313,9 +9322,9 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
mo_Relaxed); mo_Relaxed);
safe64_write(&r->mr_txnid, head.txnid); safe64_write(&r->mr_txnid, head.txnid);
eASSERT(env, r->mr_pid.weak == osal_getpid()); eASSERT(env, r->mr_pid.weak == osal_getpid());
eASSERT(env, eASSERT(env, r->mr_tid.weak == ((env->me_flags & MDBX_NOSTICKYTHREADS)
r->mr_tid.weak == ? 0
((env->me_flags & MDBX_NOTLS) ? 0 : osal_thread_self())); : osal_thread_self()));
eASSERT(env, r->mr_txnid.weak == head.txnid || eASSERT(env, r->mr_txnid.weak == head.txnid ||
(r->mr_txnid.weak >= SAFE64_INVALID_THRESHOLD && (r->mr_txnid.weak >= SAFE64_INVALID_THRESHOLD &&
head.txnid < env->me_lck->mti_oldest_reader.weak)); head.txnid < env->me_lck->mti_oldest_reader.weak));
@ -9374,12 +9383,12 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
tASSERT(txn, db_check_flags(txn->mt_dbs[MAIN_DBI].md_flags)); tASSERT(txn, db_check_flags(txn->mt_dbs[MAIN_DBI].md_flags));
} else { } else {
eASSERT(env, (flags & ~(MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS | eASSERT(env, (flags & ~(MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS |
MDBX_WRITEMAP)) == 0); MDBX_WRITEMAP | MDBX_NOSTICKYTHREADS)) == 0);
if (unlikely(txn->mt_owner == tid || if (unlikely(txn->mt_owner == tid ||
/* not recovery mode */ env->me_stuck_meta >= 0)) /* not recovery mode */ env->me_stuck_meta >= 0))
return MDBX_BUSY; return MDBX_BUSY;
MDBX_lockinfo *const lck = env->me_lck_mmap.lck; MDBX_lockinfo *const lck = env->me_lck_mmap.lck;
if (lck && (env->me_flags & MDBX_NOTLS) == 0 && if (lck && (env->me_flags & MDBX_NOSTICKYTHREADS) == 0 &&
(mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0) { (mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0) {
const size_t snap_nreaders = const size_t snap_nreaders =
atomic_load32(&lck->mti_numreaders, mo_AcquireRelease); atomic_load32(&lck->mti_numreaders, mo_AcquireRelease);
@ -9639,7 +9648,8 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
* since Wine don't support section extending, * since Wine don't support section extending,
* i.e. in both cases unmap+map are required. */ * i.e. in both cases unmap+map are required. */
used_bytes < env->me_dbgeo.upper && env->me_dbgeo.grow)) && used_bytes < env->me_dbgeo.upper && env->me_dbgeo.grow)) &&
/* avoid recursive use SRW */ (txn->mt_flags & MDBX_NOTLS) == 0) { /* avoid recursive use SRW */ (txn->mt_flags &
MDBX_NOSTICKYTHREADS) == 0) {
txn->mt_flags |= MDBX_SHRINK_ALLOWED; txn->mt_flags |= MDBX_SHRINK_ALLOWED;
osal_srwlock_AcquireShared(&env->me_remap_guard); osal_srwlock_AcquireShared(&env->me_remap_guard);
} }
@ -9673,15 +9683,13 @@ static __always_inline int check_txn(const MDBX_txn *txn, int bad_bits) {
return MDBX_BAD_TXN; return MDBX_BAD_TXN;
tASSERT(txn, (txn->mt_flags & MDBX_TXN_FINISHED) || tASSERT(txn, (txn->mt_flags & MDBX_TXN_FINISHED) ||
(txn->mt_flags & MDBX_NOTLS) == (txn->mt_flags & MDBX_NOSTICKYTHREADS) ==
((txn->mt_flags & MDBX_TXN_RDONLY) (txn->mt_env->me_flags & MDBX_NOSTICKYTHREADS));
? txn->mt_env->me_flags & MDBX_NOTLS
: 0));
#if MDBX_TXN_CHECKOWNER #if MDBX_TXN_CHECKOWNER
STATIC_ASSERT(MDBX_NOTLS > MDBX_TXN_FINISHED + MDBX_TXN_RDONLY); STATIC_ASSERT((long)MDBX_NOSTICKYTHREADS > (long)MDBX_TXN_FINISHED);
if (unlikely(txn->mt_owner != osal_thread_self()) && if ((txn->mt_flags & (MDBX_NOSTICKYTHREADS | MDBX_TXN_FINISHED)) <
(txn->mt_flags & (MDBX_NOTLS | MDBX_TXN_FINISHED | MDBX_TXN_RDONLY)) < MDBX_TXN_FINISHED &&
(MDBX_TXN_FINISHED | MDBX_TXN_RDONLY)) unlikely(txn->mt_owner != osal_thread_self()))
return txn->mt_owner ? MDBX_THREAD_MISMATCH : MDBX_BAD_TXN; return txn->mt_owner ? MDBX_THREAD_MISMATCH : MDBX_BAD_TXN;
#endif /* MDBX_TXN_CHECKOWNER */ #endif /* MDBX_TXN_CHECKOWNER */
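
The rewritten owner check above relies on a compact bitmask comparison. The following is a minimal, self-contained sketch of that idea, not the library's code: the flag values and the `owner_check_required` helper are assumptions made for illustration. It only presumes what the `STATIC_ASSERT` states, namely that the "finished" flag is numerically smaller than the "no sticky threads" flag, plus the assumption that "finished" is a single bit. Under those conditions the masked value is below the "finished" bit only when neither flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumed for this sketch): what matters is that
 * FLAG_FINISHED is a single bit and FLAG_NOSTICKY is numerically larger. */
#define FLAG_FINISHED UINT32_C(0x01)
#define FLAG_NOSTICKY UINT32_C(0x200000)

/* Hypothetical helper: returns true when the thread-ownership check
 * should run, i.e. the transaction is neither finished nor opened in
 * the "no sticky threads" mode. */
static bool owner_check_required(uint32_t txn_flags) {
  return (txn_flags & (FLAG_NOSTICKY | FLAG_FINISHED)) < FLAG_FINISHED;
}

/* Walks through all four combinations of the two flags. */
static void demo_owner_check(void) {
  assert(owner_check_required(0));
  assert(!owner_check_required(FLAG_FINISHED));
  assert(!owner_check_required(FLAG_NOSTICKY));
  assert(!owner_check_required(FLAG_NOSTICKY | FLAG_FINISHED));
}
```

A single comparison replaces two separate bit tests, which is presumably why the cheap flag test now comes first and the `osal_thread_self()` call is evaluated only when the check actually applies.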
@ -9762,7 +9770,6 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
~flags)) /* write txn in RDONLY env */ ~flags)) /* write txn in RDONLY env */
return MDBX_EACCESS; return MDBX_EACCESS;
flags |= env->me_flags & MDBX_WRITEMAP;
MDBX_txn *txn = nullptr; MDBX_txn *txn = nullptr;
if (parent) { if (parent) {
/* Nested transactions: Max 1 child, write txns only, no writemap */ /* Nested transactions: Max 1 child, write txns only, no writemap */
@ -9781,10 +9788,11 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
} }
tASSERT(parent, audit_ex(parent, 0, false) == 0); tASSERT(parent, audit_ex(parent, 0, false) == 0);
flags |= parent->mt_flags & (MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS); flags |= parent->mt_flags & (MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS |
MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP);
} else if (flags & MDBX_TXN_RDONLY) { } else if (flags & MDBX_TXN_RDONLY) {
if (env->me_txn0 && if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0 && env->me_txn &&
unlikely(env->me_txn0->mt_owner == osal_thread_self()) && env->me_txn && unlikely(env->me_txn0->mt_owner == osal_thread_self()) &&
(mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0) (mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0)
return MDBX_TXN_OVERLAPPING; return MDBX_TXN_OVERLAPPING;
} else { } else {
@ -9967,12 +9975,13 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
eASSERT(env, txn->mt_flags == (MDBX_TXN_RDONLY | MDBX_TXN_FINISHED)); eASSERT(env, txn->mt_flags == (MDBX_TXN_RDONLY | MDBX_TXN_FINISHED));
else if (flags & MDBX_TXN_RDONLY) else if (flags & MDBX_TXN_RDONLY)
eASSERT(env, (txn->mt_flags & eASSERT(env, (txn->mt_flags &
~(MDBX_NOTLS | MDBX_TXN_RDONLY | MDBX_WRITEMAP | ~(MDBX_NOSTICKYTHREADS | MDBX_TXN_RDONLY | MDBX_WRITEMAP |
/* Win32: SRWL flag */ MDBX_SHRINK_ALLOWED)) == 0); /* Win32: SRWL flag */ MDBX_SHRINK_ALLOWED)) == 0);
else { else {
eASSERT(env, (txn->mt_flags & eASSERT(env,
~(MDBX_WRITEMAP | MDBX_SHRINK_ALLOWED | MDBX_NOMETASYNC | (txn->mt_flags &
MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0); ~(MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP | MDBX_SHRINK_ALLOWED |
MDBX_NOMETASYNC | MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0);
assert(!txn->tw.spilled.list && !txn->tw.spilled.least_removed); assert(!txn->tw.spilled.list && !txn->tw.spilled.least_removed);
} }
txn->mt_signature = MDBX_MT_SIGNATURE; txn->mt_signature = MDBX_MT_SIGNATURE;
@ -10409,6 +10418,13 @@ int mdbx_txn_abort(MDBX_txn *txn) {
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return rc; return rc;
if ((txn->mt_flags & (MDBX_TXN_RDONLY | MDBX_NOSTICKYTHREADS)) ==
MDBX_NOSTICKYTHREADS &&
unlikely(txn->mt_owner != osal_thread_self())) {
mdbx_txn_break(txn);
return MDBX_THREAD_MISMATCH;
}
return txn_abort(txn); return txn_abort(txn);
} }
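
The guard added to `mdbx_txn_abort()` can be read as: in the no-sticky-threads mode a write transaction must still be aborted by the thread that owns it, while read-only transactions are exempt from this particular test. Below is a hedged sketch of that decision in isolation. The `sketch_txn` structure, the `SK_*` constants, and `sketch_abort_guard` are hypothetical names with illustrative values, not libmdbx definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch only. */
#define SK_RDONLY   UINT32_C(0x20000)
#define SK_NOSTICKY UINT32_C(0x200000)

enum { SK_SUCCESS = 0, SK_THREAD_MISMATCH = 1 };

/* Hypothetical stand-in for a transaction handle. */
typedef struct {
  uint32_t flags;  /* combination of SK_RDONLY / SK_NOSTICKY */
  uintptr_t owner; /* id of the thread that started the txn */
  bool broken;     /* set when the txn is marked unusable */
} sketch_txn;

/* Mirrors the shape of the added guard: only a write transaction
 * (RDONLY clear) in no-sticky mode (NOSTICKY set) is rejected when
 * the caller is not the owning thread. */
static int sketch_abort_guard(sketch_txn *txn, uintptr_t self) {
  if ((txn->flags & (SK_RDONLY | SK_NOSTICKY)) == SK_NOSTICKY &&
      txn->owner != self) {
    txn->broken = true; /* analogous to breaking the txn */
    return SK_THREAD_MISMATCH;
  }
  return SK_SUCCESS;
}
```

In the sketch a foreign thread is refused for a write transaction and the transaction is marked broken, whereas a read-only transaction with the same flags passes the guard.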
@ -12093,6 +12109,12 @@ int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
if (unlikely(txn->mt_flags & MDBX_TXN_RDONLY)) if (unlikely(txn->mt_flags & MDBX_TXN_RDONLY))
goto done; goto done;
if ((txn->mt_flags & MDBX_NOSTICKYTHREADS) &&
unlikely(txn->mt_owner != osal_thread_self())) {
rc = MDBX_THREAD_MISMATCH;
goto fail;
}
if (txn->mt_child) { if (txn->mt_child) {
rc = mdbx_txn_commit_ex(txn->mt_child, NULL); rc = mdbx_txn_commit_ex(txn->mt_child, NULL);
tASSERT(txn, txn->mt_child == NULL); tASSERT(txn, txn->mt_child == NULL);
@ -13757,9 +13779,9 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower,
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return rc; return rc;
const bool need_lock = const bool txn0_owned = env->me_txn0 && env_txn0_owned(env);
!env->me_txn0 || env->me_txn0->mt_owner != osal_thread_self(); const bool inside_txn = txn0_owned && env->me_txn;
const bool inside_txn = !need_lock && env->me_txn; bool should_unlock = false;
#if MDBX_DEBUG #if MDBX_DEBUG
if (growth_step < 0) { if (growth_step < 0) {
@ -13770,13 +13792,12 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower,
#endif /* MDBX_DEBUG */ #endif /* MDBX_DEBUG */
intptr_t reasonable_maxsize = 0; intptr_t reasonable_maxsize = 0;
bool should_unlock = false;
if (env->me_map) { if (env->me_map) {
/* env already mapped */ /* env already mapped */
if (unlikely(env->me_flags & MDBX_RDONLY)) if (unlikely(env->me_flags & MDBX_RDONLY))
return MDBX_EACCESS; return MDBX_EACCESS;
if (need_lock) { if (!txn0_owned) {
int err = osal_txn_lock(env, false); int err = osal_txn_lock(env, false);
if (unlikely(err != MDBX_SUCCESS)) if (unlikely(err != MDBX_SUCCESS))
return err; return err;
@ -16024,6 +16045,9 @@ __cold int mdbx_env_close_ex(MDBX_env *env, bool dont_sync) {
#endif /* Windows */ #endif /* Windows */
} }
if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self())
osal_txn_unlock(env);
eASSERT(env, env->me_signature.weak == 0); eASSERT(env, env->me_signature.weak == 0);
rc = env_close(env, false) ? MDBX_PANIC : rc; rc = env_close(env, false) ? MDBX_PANIC : rc;
ENSURE(env, osal_fastmutex_destroy(&env->me_dbi_lock) == MDBX_SUCCESS); ENSURE(env, osal_fastmutex_destroy(&env->me_dbi_lock) == MDBX_SUCCESS);
@ -22997,8 +23021,8 @@ __cold int mdbx_env_set_flags(MDBX_env *env, MDBX_env_flags_t flags,
if (unlikely(env->me_flags & MDBX_RDONLY)) if (unlikely(env->me_flags & MDBX_RDONLY))
return MDBX_EACCESS; return MDBX_EACCESS;
const bool lock_needed = (env->me_flags & MDBX_ENV_ACTIVE) && const bool lock_needed =
env->me_txn0->mt_owner != osal_thread_self(); (env->me_flags & MDBX_ENV_ACTIVE) && !env_txn0_owned(env);
bool should_unlock = false; bool should_unlock = false;
if (lock_needed) { if (lock_needed) {
rc = osal_txn_lock(env, false); rc = osal_txn_lock(env, false);
@ -23233,8 +23257,7 @@ __cold int mdbx_env_stat_ex(const MDBX_env *env, const MDBX_txn *txn,
if (unlikely(err != MDBX_SUCCESS)) if (unlikely(err != MDBX_SUCCESS))
return err; return err;
if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self() && if (env->me_txn && env_txn0_owned(env))
env->me_txn)
/* inside write-txn */ /* inside write-txn */
return stat_acc(env->me_txn, dest, bytes); return stat_acc(env->me_txn, dest, bytes);
@ -26209,7 +26232,7 @@ __cold int mdbx_env_set_option(MDBX_env *env, const MDBX_option_t option,
return err; return err;
const bool lock_needed = ((env->me_flags & MDBX_ENV_ACTIVE) && env->me_txn0 && const bool lock_needed = ((env->me_flags & MDBX_ENV_ACTIVE) && env->me_txn0 &&
env->me_txn0->mt_owner != osal_thread_self()); !env_txn0_owned(env));
bool should_unlock = false; bool should_unlock = false;
switch (option) { switch (option) {
case MDBX_opt_sync_bytes: case MDBX_opt_sync_bytes:
@ -26324,7 +26347,7 @@ __cold int mdbx_env_set_option(MDBX_env *env, const MDBX_option_t option,
return MDBX_EACCESS; return MDBX_EACCESS;
value = osal_16dot16_to_monotime((uint32_t)value); value = osal_16dot16_to_monotime((uint32_t)value);
if (value != env->me_options.gc_time_limit) { if (value != env->me_options.gc_time_limit) {
if (env->me_txn && env->me_txn0->mt_owner != osal_thread_self()) if (env->me_txn && lock_needed)
return MDBX_EPERM; return MDBX_EPERM;
env->me_options.gc_time_limit = value; env->me_options.gc_time_limit = value;
if (!env->me_options.flags.non_auto.rp_augment_limit) if (!env->me_options.flags.non_auto.rp_augment_limit)


@ -842,8 +842,9 @@ MDBX_INTERNAL_FUNC int osal_ipclock_destroy(osal_ipclock_t *ipc);
* read transactions started by the same thread need no further locking to * read transactions started by the same thread need no further locking to
* proceed. * proceed.
* *
* If MDBX_NOTLS is set, the slot address is not saved in thread-specific data. * If MDBX_NOSTICKYTHREADS is set, the slot address is not saved in
* No reader table is used if the database is on a read-only filesystem. * thread-specific data. No reader table is used if the database is on a
* read-only filesystem.
* *
* Since the database uses multi-version concurrency control, readers don't * Since the database uses multi-version concurrency control, readers don't
* actually need any locking. This table is used to keep track of which * actually need any locking. This table is used to keep track of which
@ -1786,8 +1787,8 @@ log2n_powerof2(size_t value_uintptr) {
MDBX_NOMEMINIT | MDBX_COALESCE | MDBX_PAGEPERTURB | MDBX_ACCEDE | \ MDBX_NOMEMINIT | MDBX_COALESCE | MDBX_PAGEPERTURB | MDBX_ACCEDE | \
MDBX_VALIDATION) MDBX_VALIDATION)
#define ENV_CHANGELESS_FLAGS \ #define ENV_CHANGELESS_FLAGS \
(MDBX_NOSUBDIR | MDBX_RDONLY | MDBX_WRITEMAP | MDBX_NOTLS | MDBX_NORDAHEAD | \ (MDBX_NOSUBDIR | MDBX_RDONLY | MDBX_WRITEMAP | MDBX_NOSTICKYTHREADS | \
MDBX_LIFORECLAIM | MDBX_EXCLUSIVE) MDBX_NORDAHEAD | MDBX_LIFORECLAIM | MDBX_EXCLUSIVE)
#define ENV_USABLE_FLAGS (ENV_CHANGEABLE_FLAGS | ENV_CHANGELESS_FLAGS) #define ENV_USABLE_FLAGS (ENV_CHANGEABLE_FLAGS | ENV_CHANGELESS_FLAGS)
#if !defined(__cplusplus) || CONSTEXPR_ENUM_FLAGS_OPERATIONS #if !defined(__cplusplus) || CONSTEXPR_ENUM_FLAGS_OPERATIONS


@ -326,7 +326,7 @@ static int suspend_and_append(mdbx_handle_array_t **array,
MDBX_INTERNAL_FUNC int MDBX_INTERNAL_FUNC int
osal_suspend_threads_before_remap(MDBX_env *env, mdbx_handle_array_t **array) { osal_suspend_threads_before_remap(MDBX_env *env, mdbx_handle_array_t **array) {
eASSERT(env, (env->me_flags & MDBX_NOTLS) == 0); eASSERT(env, (env->me_flags & MDBX_NOSTICKYTHREADS) == 0);
const uintptr_t CurrentTid = GetCurrentThreadId(); const uintptr_t CurrentTid = GetCurrentThreadId();
int rc; int rc;
if (env->me_lck_mmap.lck) { if (env->me_lck_mmap.lck) {


@ -1216,8 +1216,8 @@ env::operate_parameters::make_flags(bool accede, bool use_subdirectory) const {
flags |= MDBX_NOSUBDIR; flags |= MDBX_NOSUBDIR;
if (options.exclusive) if (options.exclusive)
flags |= MDBX_EXCLUSIVE; flags |= MDBX_EXCLUSIVE;
if (options.orphan_read_transactions) if (options.no_sticky_threads)
flags |= MDBX_NOTLS; flags |= MDBX_NOSTICKYTHREADS;
if (options.disable_readahead) if (options.disable_readahead)
flags |= MDBX_NORDAHEAD; flags |= MDBX_NORDAHEAD;
if (options.disable_clear_memory) if (options.disable_clear_memory)
@ -1275,9 +1275,10 @@ env::reclaiming_options::reclaiming_options(MDBX_env_flags_t flags) noexcept
coalesce((flags & MDBX_COALESCE) ? true : false) {} coalesce((flags & MDBX_COALESCE) ? true : false) {}
env::operate_options::operate_options(MDBX_env_flags_t flags) noexcept env::operate_options::operate_options(MDBX_env_flags_t flags) noexcept
: orphan_read_transactions( : no_sticky_threads(((flags & (MDBX_NOSTICKYTHREADS | MDBX_EXCLUSIVE)) ==
((flags & (MDBX_NOTLS | MDBX_EXCLUSIVE)) == MDBX_NOTLS) ? true MDBX_NOSTICKYTHREADS)
: false), ? true
: false),
nested_write_transactions((flags & (MDBX_WRITEMAP | MDBX_RDONLY)) ? false nested_write_transactions((flags & (MDBX_WRITEMAP | MDBX_RDONLY)) ? false
: true), : true),
exclusive((flags & MDBX_EXCLUSIVE) ? true : false), exclusive((flags & MDBX_EXCLUSIVE) ? true : false),
@ -1831,8 +1832,8 @@ __cold ::std::ostream &operator<<(::std::ostream &out,
static const char comma[] = ", "; static const char comma[] = ", ";
const char *delimiter = ""; const char *delimiter = "";
out << "{"; out << "{";
if (it.orphan_read_transactions) { if (it.no_sticky_threads) {
out << delimiter << "orphan_read_transactions"; out << delimiter << "no_sticky_threads";
delimiter = comma; delimiter = comma;
} }
if (it.nested_write_transactions) { if (it.nested_write_transactions) {


@ -378,7 +378,8 @@ const struct option_verb mode_bits[] = {
{"nosync-safe", unsigned(MDBX_SAFE_NOSYNC)}, {"nosync-safe", unsigned(MDBX_SAFE_NOSYNC)},
{"nometasync", unsigned(MDBX_NOMETASYNC)}, {"nometasync", unsigned(MDBX_NOMETASYNC)},
{"writemap", unsigned(MDBX_WRITEMAP)}, {"writemap", unsigned(MDBX_WRITEMAP)},
{"notls", unsigned(MDBX_NOTLS)}, {"nostickythreads", unsigned(MDBX_NOSTICKYTHREADS)},
{"no-sticky-threads", unsigned(MDBX_NOSTICKYTHREADS)},
{"nordahead", unsigned(MDBX_NORDAHEAD)}, {"nordahead", unsigned(MDBX_NORDAHEAD)},
{"nomeminit", unsigned(MDBX_NOMEMINIT)}, {"nomeminit", unsigned(MDBX_NOMEMINIT)},
{"lifo", unsigned(MDBX_LIFORECLAIM)}, {"lifo", unsigned(MDBX_LIFORECLAIM)},


@ -385,9 +385,9 @@ else
fi fi
if [ "$EXTRA" != "no" ]; then if [ "$EXTRA" != "no" ]; then
options=(writemap lifo notls perturb nomeminit nordahead) options=(writemap lifo nostickythreads perturb nomeminit nordahead)
else else
options=(writemap lifo notls) options=(writemap lifo nostickythreads)
fi fi
syncmodes=("" ,+nosync-safe ,+nosync-utterly ,+nometasync) syncmodes=("" ,+nosync-safe ,+nosync-utterly ,+nometasync)
function join { local IFS="$1"; shift; echo "$*"; } function join { local IFS="$1"; shift; echo "$*"; }


@ -106,7 +106,7 @@ MDBX_NORETURN void usage(void) {
" writemap == MDBX_WRITEMAP\n" " writemap == MDBX_WRITEMAP\n"
" nosync-utterly == MDBX_UTTERLY_NOSYNC\n" " nosync-utterly == MDBX_UTTERLY_NOSYNC\n"
" perturb == MDBX_PAGEPERTURB\n" " perturb == MDBX_PAGEPERTURB\n"
" notls == MDBX_NOTLS\n" " nostickythreads== MDBX_NOSTICKYTHREADS\n"
" nordahead == MDBX_NORDAHEAD\n" " nordahead == MDBX_NORDAHEAD\n"
" nomeminit == MDBX_NOMEMINIT\n" " nomeminit == MDBX_NOMEMINIT\n"
" --random-writemap[=YES|no] Toggle MDBX_WRITEMAP randomly\n" " --random-writemap[=YES|no] Toggle MDBX_WRITEMAP randomly\n"


@ -351,7 +351,7 @@ else
fi fi
syncmodes=("" ,+nosync-safe ,+nosync-utterly) syncmodes=("" ,+nosync-safe ,+nosync-utterly)
options=(writemap lifo notls perturb) options=(writemap lifo nostickythreads perturb)
function join { local IFS="$1"; shift; echo "$*"; } function join { local IFS="$1"; shift; echo "$*"; }