Compare commits

...

262 Commits

Author SHA1 Message Date
Леонид Юрьев (Leonid Yuriev)
289276f13c mdbx: merge-in ChangeLog from the stable/0.13.x branch. 2025-08-04 00:08:42 +03:00
Леонид Юрьев (Leonid Yuriev)
16f41ffcd1 mdbx: patch update for older versions of buildroot. 2025-08-04 00:07:34 +03:00
Леонид Юрьев (Leonid Yuriev)
584d5c344d mdbx: update ChangeLog. 2025-07-29 15:03:09 +03:00
Леонид Юрьев (Leonid Yuriev)
933565b1b2 mdbx-chk: count and report %-filling histogram of tree(s). 2025-07-29 14:50:47 +03:00
Леонид Юрьев (Leonid Yuriev)
0cc52a3cc3 mdbx-chk: report switching to non-exclusive/accede mode. 2025-07-29 14:50:42 +03:00
Леонид Юрьев (Leonid Yuriev)
457564c498 mdbx-chk: rename internal variables (cosmetics). 2025-07-29 14:50:42 +03:00
Леонид Юрьев (Leonid Yuriev)
3410e28e1f mdbx: fix comment typo. 2025-07-29 14:50:42 +03:00
Леонид Юрьев (Leonid Yuriev)
ecc36a11ec mdbx: report the parent-pgno in issues during a DB check. 2025-07-28 14:36:47 +03:00
Леонид Юрьев (Leonid Yuriev)
5c6d91f7c8 mdbx: provide the parent-pgno during a tree traversal. 2025-07-28 14:36:47 +03:00
Леонид Юрьев (Leonid Yuriev)
79465dbc7f mdbx: refactor internal walking functions. 2025-07-28 14:36:47 +03:00
Леонид Юрьев (Leonid Yuriev)
d6f397145c mdbx: reorder logging functions. 2025-07-28 14:36:47 +03:00
Леонид Юрьев (Leonid Yuriev)
fb5f7f4f83 mdbx: dbi-related minor changes. 2025-07-27 21:47:12 +03:00
Леонид Юрьев (Leonid Yuriev)
f0e6db59e2
mdbx: update ChangeLog. 2025-07-26 16:12:53 +03:00
Леонид Юрьев (Leonid Yuriev)
2411b88812 mdbx: minor addition to README. 2025-07-26 16:12:05 +03:00
Леонид Юрьев (Leonid Yuriev)
a2547d21af mdbx-tests: random order of parameterized tests inside the stochastic script. 2025-07-26 14:09:05 +03:00
Леонид Юрьев (Leonid Yuriev)
eef334235e
mdbx: *** using english for commit titles at the request of community ***.
At the request of several non-Russian-speaking developers, it was
decided to return to using English, at least in the commit brief-headers (titles).
2025-07-26 14:06:26 +03:00
Леонид Юрьев (Leonid Yuriev)
0899ea8450 mdbx-tests: update the cross-qemu section in GNUmakefile. 2025-07-25 22:33:18 +03:00
Леонид Юрьев (Leonid Yuriev)
58dd21cf98 mdbx-tests: use SysV semaphores for cross-tests run via qemu. 2025-07-25 22:33:18 +03:00
Леонид Юрьев (Leonid Yuriev)
0da87b6423 mdbx: links to mirrors in the documentation. 2025-07-25 14:15:23 +03:00
Леонид Юрьев (Leonid Yuriev)
39326e79bc mdbx: extend ChangeLog. 2025-07-20 16:39:58 +03:00
Леонид Юрьев (Leonid Yuriev)
6f49e7dfeb mdbx: rework file-lock error handling in the copy API, fixing an issue on OSX.
On POSIX platforms the copy API uses both `fcntl(F_SETLK)` and `flock()`
file locks, since only using them together provides locking on all
platforms and filesystems, including NFS and SMB.

However, depending on the platform, the OS kernel version, the filesystem
type, and in the case of NFS/SMB also on the remote side, these system
file locks may not work or may conflict with each other (in particular
on OSX).

Therefore this commit implements a more flexible approach. In short,
failure of one of the locks is tolerated when the other succeeds
(see the illustrative sketch after this entry):

 - When fcntl(F_SETLK) succeeds, EAGAIN/EWOULDBLOCK and EREMOTEIO from flock() are tolerated,
   if the target file is on a non-local filesystem, and also on non-Linux platforms,
   where simultaneous locking by fcntl(F_SETLK) and flock() may not be allowed.

 - When flock() succeeds, ENOTSUP and EREMOTEIO from fcntl(F_SETLK) are tolerated,
   if the target file is on a non-local filesystem.
2025-07-20 16:21:15 +03:00
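A minimal illustrative sketch of the tolerant dual-locking idea described in the entry above, assuming a hypothetical helper name and a simplified `on_local_fs` flag; the real libmdbx code distinguishes more cases (platform, filesystem, remote side) than this simplification does.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/file.h>
#include <unistd.h>

#ifndef EREMOTEIO
#define EREMOTEIO EIO /* not defined on all platforms, e.g. OSX */
#endif

/* Sketch only: try to hold both an fcntl(F_SETLK) record lock and a flock()
 * lock on the copy destination, forgiving failure of one when the other succeeds. */
static int lock_copy_destination(int fd, bool on_local_fs) {
  struct flock lk = {.l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0};
  const int err_fcntl = (fcntl(fd, F_SETLK, &lk) == 0) ? 0 : errno;
  const int err_flock = (flock(fd, LOCK_EX | LOCK_NB) == 0) ? 0 : errno;

  if (err_fcntl == 0 && err_flock == 0)
    return 0; /* both locks acquired */
  if (err_fcntl == 0 && !on_local_fs &&
      (err_flock == EAGAIN || err_flock == EWOULDBLOCK || err_flock == EREMOTEIO))
    return 0; /* fcntl succeeded, flock failure tolerated */
  if (err_flock == 0 && !on_local_fs &&
      (err_fcntl == ENOTSUP || err_fcntl == EREMOTEIO))
    return 0; /* flock succeeded, fcntl failure tolerated */
  return err_fcntl ? err_fcntl : err_flock;
}
```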
Леонид Юрьев (Leonid Yuriev)
43f3deee00 mdbx: conditionally define MDBX_ENOSYS as ENOTSUP/ENOSYS. 2025-07-20 16:21:15 +03:00
Леонид Юрьев (Leonid Yuriev)
65b9b5ec6d mdbx-tests: extend extra/dbi. 2025-07-19 21:18:50 +03:00
Леонид Юрьев (Leonid Yuriev)
0d73718000 mdbx: eliminate the possibility of incorrectly returning MDBX_DBS_FULL when opening DBI handles.
In the lock-free path of opening DBI handles, while scanning the
already-opened tables, entries were skipped not only when the name
differed, but also when the requested flags did not match the actual
flags of an already-opened table.

If the (previously configured) maximum number of open DBI handles had
already been reached, the `MDBX_DBS_FULL` error was returned, including
in situations where the result should have been different.

Thanks to [Artem Vorotnikov](https://github.com/vorot93) for reporting the problem!
2025-07-19 21:18:43 +03:00
Леонид Юрьев (Leonid Yuriev)
8ebdc6d79b mdbx: rework updating/loading of MainDB flags, factoring out latch_maindb_locked().
In the new code:
 - there is no need to acquire the global txn lock;
 - the probability of collisions/races is lower.
2025-07-19 21:18:43 +03:00
Леонид Юрьев (Leonid Yuriev)
9c6eed615a mdbx-tests: extend extra/txn with a scenario provoking a race for the MainDB-DBI attributes. 2025-07-19 21:18:43 +03:00
Леонид Юрьев (Leonid Yuriev)
f5e3cfd533 mdbx: eliminate an unexpected MDBX_BAD_DBI error on races within a process.
Starting read and write transactions does not mutually block. However,
within a single process, DBI handles and table attributes are shared by
all transactions (within an instance of the DB environment). Therefore,
after table attributes change, including on the initial read of the
actual MainDB attributes, a race condition may arise when several
transactions start simultaneously.

This commit fixes flaws in handling such races, due to which an
`MDBX_BAD_DBI` error, unexpected from the user's point of view, could
be returned.

Formally, the bug has been present since commit `e6af7d7c53428ca2892bcbf7eec1c2acee06fd44` of 2023-11-05.
However, before that (historically, as inherited from LMDB) there was
no control at all over MainDB attribute changes during transaction
start and/or execution. So instead of returning any errors, such races
and/or MainDB attribute changes remained unhandled/unnoticed, or showed
up as rare, elusive failures of user applications.

Thanks to [Artem Vorotnikov](https://github.com/vorot93) for reporting the problem!
2025-07-19 21:18:41 +03:00
Леонид Юрьев (Leonid Yuriev)
e9d47291b0 mdbx: cosmetic refactoring of txn_renew(). 2025-07-15 19:01:17 +03:00
Леонид Юрьев (Leonid Yuriev)
0c70b548e8 mdbx-make: add the check-posix-locking target for CI testing of all POSIX locking variants. 2025-07-14 21:06:39 +03:00
Леонид Юрьев (Leonid Yuriev)
b2a7942f8d mdbx-cmake: reformat test properties for readability. 2025-07-14 17:04:19 +03:00
Леонид Юрьев (Leonid Yuriev)
2b115069c1 mdbx: fix resurrect-after-fork when using SysV semaphores (MDBX_LOCKING=5).
The bug/omission had existed since the first implementation of
resurrect-after-fork in November 2023, but remained unnoticed due to
the lack of CI testing on the OSX/Mac platform (which has no support
for shared mutexes).
2025-07-14 17:03:38 +03:00
Леонид Юрьев (Leonid Yuriev)
90a4e1847d mdbx: remove a superfluous/debug assert check inside cursor_put(). 2025-07-14 00:44:35 +03:00
Леонид Юрьев (Leonid Yuriev)
2a41db6b67 mdbx-windows: fix assert checks inside txn_lock()/txn_unlock(). 2025-07-14 00:41:48 +03:00
Leonid Yuriev
8fba09ceb6 mdbx: fix a typo in debug logging. 2025-07-14 00:40:40 +03:00
Леонид Юрьев (Leonid Yuriev)
62f4986731 mdbx-make: change the number of internal test-run iterations to reduce CI costs. 2025-07-12 20:53:04 +03:00
Леонид Юрьев (Leonid Yuriev)
1b1bec2b30 mdbx: disable MSVC warnings C5286 and C5287. 2025-07-11 10:44:17 +03:00
Leonid Yuriev
c7b119af68 mdbx: fix a signed/unsigned comparison. 2025-07-11 10:36:32 +03:00
Леонид Юрьев (Leonid Yuriev)
57ffdf6cd9 mdbx-tests: extend the cmake tests with mdbx_copy run scenarios. 2025-07-11 10:36:09 +03:00
Леонид Юрьев (Leonid Yuriev)
e28f484947 mdbx: remove superfluous/harmful assert checks inside txn_lock()/txn_unlock(). 2025-07-11 10:36:09 +03:00
Леонид Юрьев (Leonid Yuriev)
53c14bc92c mdbx-tools: add the -f option to the mdbx_copy utility. 2025-07-11 10:36:09 +03:00
Леонид Юрьев (Leonid Yuriev)
4bb69a1c8f mdbx-tests: basic testing of MDBX_CP_OVERWRITE. 2025-07-11 10:36:09 +03:00
Леонид Юрьев (Leonid Yuriev)
ba6ce86d5f mdbx: add the MDBX_CP_OVERWRITE option to the DB copy API. 2025-07-11 10:36:09 +03:00
Леонид Юрьев (Leonid Yuriev)
fa73f44ff0 mdbx: extend ChangeLog. 2025-06-28 10:48:11 +03:00
Леонид Юрьев (Leonid Yuriev)
79b33ba8fd mdbx: eliminate an lcc-1.29 warning 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
a600c2a7a2 mdbx: eliminate the possibility of an MDBX_ENODATA error during GC search. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
10bf63eb9a mdbx: replace some PNL macros with functions. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
9020ea9d6c mdbx: allow a shortage of pages during early/non-deferred GC cleanup (continuation). 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
31d5ab62bf mdbx-tests: stop/kill child processes with waiting.
The main goal is to ensure logs are produced without truncated tails, including when duplication (tee) and compression (lz4/gzip) are used.
2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
ecf5acfff0 mdbx: isolate txl lists from the PNL code. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
bb37c93dd5 mdbx: early/non-deferred GC cleanup (beginning). 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
40af0565c5 mdbx-tests: explicit NUMA distribution in the battery/tmux test. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
0359eca01c mdbx-tests: support the --numa # option to pin the stochastic test to a NUMA node. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
d461ba86c7 mdbx++: a minor adjustment of the mdbx::buffer::silo constructors. 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
b49cd49c68 mdbx: intensify page merging by changing the default threshold from 25% to 33%.
Previously two pages were merged if each was filled to 25% or less, producing a page filled to 50% or less.

Now the default threshold is set to 33%, so merging produces a page filled to up to 66% (see the illustrative sketch after this entry).
2025-06-28 00:34:04 +03:00
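A small worked example of the arithmetic above, as a hedged C sketch; the function name and interface are hypothetical, not libmdbx internals. With a 4096-byte page, the old 25% threshold allows merging only when each page holds at most 1024 bytes of payload (result at most 50% full), while the new 33% threshold raises that to 1351 bytes each, so the merged page may be up to about 66% full.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether two sibling pages qualify for merging
 * under a fill threshold given in percent (25 previously, 33 by default now). */
static bool pages_mergeable(size_t used_a, size_t used_b, size_t page_size,
                            unsigned threshold_percent) {
  const size_t limit = page_size * threshold_percent / 100;
  /* each page at or below the threshold => merged page at most 2x threshold full */
  return used_a <= limit && used_b <= limit;
}
```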
Леонид Юрьев (Leonid Yuriev)
75131c082b mdbx: add rkl_destructive_merge() and unify the dst/src argument order of rkl_merge(). 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
3462bc116a mdbx: remove the known_continuous argument of rkl_push(). 2025-06-28 00:34:04 +03:00
Леонид Юрьев (Leonid Yuriev)
b3329fddf2
mdbx: fix an MDBX_ENOMEM typo.
The identifier `ENOMEM` was used instead of `MDBX_ENOMEM`,
which could break the build on non-POSIX/Windows platforms,
depending on the configuration and/or the SDK version.
2025-06-01 11:26:17 +03:00
Леонид Юрьев (Leonid Yuriev)
8b4ec09d08 mdbx: fix parameter names in the rkl_destructive_move() prototype. 2025-05-29 22:56:34 +03:00
Леонид Юрьев (Leonid Yuriev)
ecbffc65f4
mdbx: extend ChangeLog. 2025-05-21 14:13:36 +03:00
Леонид Юрьев (Leonid Yuriev)
e03b8e1227 mdbx: add ignore_enosys_and_einval() and use it to fall back from OFD locks. 2025-05-20 13:37:54 +03:00
Леонид Юрьев (Leonid Yuriev)
c88c51d33c mdbx: split ignore_enosys() and ignore_enosys_and_eagain(). 2025-05-20 13:37:54 +03:00
Леонид Юрьев (Leonid Yuriev)
ef82fea032 mdbx: get rid of memset() inside lck_op(). 2025-05-20 13:37:54 +03:00
Леонид Юрьев (Leonid Yuriev)
f82cf6a4b3 mdbx: move and refine the _FILE_OFFSET_BITS check for Android. 2025-05-20 13:37:54 +03:00
Леонид Юрьев (Leonid Yuriev)
60c0483987 mdbx-tests: eliminate pointless MSVC warnings. 2025-05-16 00:50:55 +03:00
Леонид Юрьев (Leonid Yuriev)
9da03deac0 mdbx: fix a missing const (cosmetics). 2025-05-16 00:08:51 +03:00
Леонид Юрьев (Leonid Yuriev)
34f0f682da mdbx: fix an assert check inside txn_end().
If a transaction fails to start (for example because the mapping cannot be extended after the DB was grown by another process),
the transaction signature is absent, which triggered the assert check.
2025-05-15 23:29:18 +03:00
Леонид Юрьев (Leonid Yuriev)
9fb0919468 mdbx: typo in ChangeLog. 2025-05-06 15:41:31 +03:00
Леонид Юрьев (Leonid Yuriev)
a13147d115
mdbx: release 0.14.1 "Горналь".
The first release in the new version branch/line, adding functionality, extending the API and reworking internals.
For the list of improvements and changes, refer to the [ChangeLog](https://libmdbx.dqdkfa.ru/md__change_log.html).

git diff stat: 166 files changed, 9467 insertions(+), 5597 deletions(-).
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
2025-05-06 14:15:36 +03:00
Леонид Юрьев (Leonid Yuriev)
800c96f22f
mdbx: refine redistribution of the reserve when returning pages to GC.
In extreme cases, scaling just one operand within 32 bits is not enough to prevent loss of significance in the computed coefficient.
Therefore this switches to 32.32 fixed-point arithmetic with one 64-bit division and two full 32*32->64 multiplications (an illustrative sketch follows this entry).

For 32-bit systems this could be made somewhat lighter by replacing the 64-bit arithmetic with scaling (adaptive shifting) of both operands, but for now I see no point in that.
2025-05-02 17:59:30 +03:00
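A hedged sketch of the 32.32 fixed-point idea mentioned above (one 64-bit division plus 32x32->64 multiplications). The helper names are hypothetical and the actual libmdbx code differs; the sketch only shows how a ratio can be carried with 32 fractional bits and applied without losing significance.

```c
#include <stdint.h>

/* ratio = part / whole, represented in 32.32 fixed point (one 64-bit division) */
static uint64_t ratio_fixed_32_32(uint32_t part, uint32_t whole /* nonzero */) {
  return ((uint64_t)part << 32) / whole;
}

/* value * ratio, using two full 32x32->64 multiplications;
 * assumes the final result fits into 32 bits */
static uint32_t apply_ratio(uint32_t value, uint64_t ratio_32_32) {
  const uint64_t int_part = (uint64_t)value * (uint32_t)(ratio_32_32 >> 32);
  const uint64_t frac_part = (uint64_t)value * (uint32_t)ratio_32_32;
  return (uint32_t)(int_part + (frac_part >> 32));
}
```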
Леонид Юрьев (Leonid Yuriev)
d1023dc6b5
mdbx: merge branch devel. 2025-04-29 12:03:50 +03:00
Леонид Юрьев (Leonid Yuriev)
859c350df0
mdbx: extend ChangeLog. 2025-04-29 08:39:35 +03:00
Леонид Юрьев (Leonid Yuriev)
76e2544cc0 mdbx: refinements of gc_handle_dense() for extremely rare cases. 2025-04-29 08:39:18 +03:00
Леонид Юрьев (Leonid Yuriev)
0a96b2ad97 mdbx-doc: extend the "Containers" section in README. 2025-04-28 14:43:01 +03:00
Леонид Юрьев (Leonid Yuriev)
402a8e62be mdbx: merge branch master into devel. 2025-04-26 00:17:57 +03:00
Леонид Юрьев (Leonid Yuriev)
06300de34e mdbx: hints for coverity. 2025-04-26 00:15:52 +03:00
Леонид Юрьев (Leonid Yuriev)
da9f78d2f6 mdbx: minor rkl refinements. 2025-04-26 00:15:52 +03:00
Леонид Юрьев (Leonid Yuriev)
a5af0c1a85 mdbx: fix a silly memory leak in rkl_destroy(). 2025-04-26 00:15:52 +03:00
Леонид Юрьев (Leonid Yuriev)
2b36fd5974
mdbx: new GC update code. 2025-04-26 00:15:41 +03:00
Леонид Юрьев (Leonid Yuriev)
3338551860 mdbx: refactor the TXN_FOREACH_DBI_FROM macro, factoring out the dbi_foreach_step() function. 2025-04-24 23:26:22 +03:00
Леонид Юрьев (Leonid Yuriev)
1c7a5e18fe mdbx: hints for coverity. 2025-04-24 15:39:07 +03:00
Леонид Юрьев (Leonid Yuriev)
6627d14edf mdbx: simplify transaction start and fix a possible double-free on nested-transaction creation failure. 2025-04-24 11:11:31 +03:00
Леонид Юрьев (Leonid Yuriev)
7db9c40fe0 mdbx-tests: set max-dbi for extra/cursor-closing. 2025-04-23 23:01:27 +03:00
Леонид Юрьев (Leonid Yuriev)
52c9ef8807 mdbx: merge branch stable into master. 2025-04-22 15:56:02 +03:00
Леонид Юрьев (Leonid Yuriev)
011c3072da mdbx-tests: support decimal suffixes for batch parameters. 2025-04-21 21:38:17 +03:00
Леонид Юрьев (Leonid Yuriev)
02b56e185f mdbx: add rkl_find() and rkl_merge(). 2025-04-21 21:38:17 +03:00
Леонид Юрьев (Leonid Yuriev)
576fc94fef mdbx: fix a typo in logging (cosmetics). 2025-04-21 21:30:26 +03:00
Леонид Юрьев (Leonid Yuriev)
a56f5acc3d mdbx: refactor tree_rebalance() and the implementation of the MDBX_opt_prefer_waf_insteadof_balance option. 2025-04-20 18:46:08 +03:00
Леонид Юрьев (Leonid Yuriev)
072103ab67
mdbx-tests: fix extra/cursor-closing for old C++ standards. 2025-04-20 00:45:16 +03:00
Леонид Юрьев (Leonid Yuriev)
668a1e42e3
mdbx: extend ChangeLog. 2025-04-19 23:52:19 +03:00
Леонид Юрьев (Leonid Yuriev)
dc747483dd mdbx-tests: tie the number of threads/checks to the number of CPUs in extra/cursor-closing. 2025-04-19 20:02:09 +03:00
Леонид Юрьев (Leonid Yuriev)
89de43293d mdbx: fix returning MDBX_BAD_TXN instead of MDBX_EINVAL from mdbx_cursor_unbind() in special cases. 2025-04-19 20:01:57 +03:00
Леонид Юрьев (Leonid Yuriev)
270cf399aa mdbx: simplify clearing of MDBX_TXN_HAS_CHILD. 2025-04-19 20:01:57 +03:00
Леонид Юрьев (Leonid Yuriev)
b5503b5670 mdbx: fix formatting (cosmetics). 2025-04-19 20:01:36 +03:00
Леонид Юрьев (Leonid Yuriev)
a71cefc288 mdbx: prevent returning an unexpected MDBX_BUSY error from mdbx_txn_lock(dont_wait=false). 2025-04-19 14:07:26 +03:00
Леонид Юрьев (Leonid Yuriev)
6d6a19e3c3 mdbx-tests: print salt/seed information in extra/cursor-closing. 2025-04-19 14:07:26 +03:00
Леонид Юрьев (Leonid Yuriev)
0d7d4db3f1 mdbx: lower the debug-logging level of lru-reduce. 2025-04-19 14:07:26 +03:00
Леонид Юрьев (Leonid Yuriev)
0f505c1377 mdbx: reorder attributes for compatibility with GCC-15 in C23 mode. 2025-04-18 10:49:00 +03:00
Леонид Юрьев (Leonid Yuriev)
f6ce9381af mdbx-tests: zero the pid on entry to osal_actor_poll(). 2025-04-18 10:47:10 +03:00
Леонид Юрьев (Leonid Yuriev)
2ceda89b05 mdbx-tests: extend and refine the extra/cursor-closing scenario. 2025-04-10 12:26:01 +03:00
Леонид Юрьев (Leonid Yuriev)
5bd99d4da2 mdbx: a hint for Coverity to suppress false-positive warnings. 2025-04-10 12:25:50 +03:00
Леонид Юрьев (Leonid Yuriev)
a04053ee98 mdbx: return MDBX_EINVAL from mdbx_cursor_bind() when the cursor cannot be unbound from its current transaction. 2025-04-10 12:25:32 +03:00
Леонид Юрьев (Leonid Yuriev)
f35c1fe5bc mdbx: fix an incorrect assert check, plus a micro-optimization.
In the nested-transaction commit path, the condition in the assert
check was not correct for the case when a table already existed and its
handle was open and used in the nested transaction being committed, but
not used in the parent one.

This fix also propagates the table's cached state, already loaded from
the DB, into the parent transaction.
2025-04-10 12:24:39 +03:00
Леонид Юрьев (Leonid Yuriev)
4691c0b5c8 mdbx: fix merge/rebase mistakes. 2025-04-10 12:18:23 +03:00
Леонид Юрьев (Leonid Yuriev)
f91c2bb8da mdbx-doc: TODO typo and SWIG-url. 2025-04-09 10:58:49 +03:00
Леонид Юрьев (Leonid Yuriev)
6cb1b6754e mdbx-doc: fix a repetition in a comment. 2025-04-06 14:09:51 +03:00
Леонид Юрьев (Leonid Yuriev)
187bd59aa0 mdbx: add a Telegram badge link and swap paragraphs at the beginning of README. 2025-04-02 16:23:59 +03:00
Леонид Юрьев (Leonid Yuriev)
1c49548ea5 mdbx-dc: fix typos. 2025-04-01 21:18:11 +03:00
Леонид Юрьев (Leonid Yuriev)
4b9427685a mdbx: add the internal MDBX_DEBUG_DPL_LIMIT option. 2025-03-31 00:54:07 +03:00
Леонид Юрьев (Leonid Yuriev)
650569cc6a mdbx: merge branch master into devel. 2025-03-31 00:52:52 +03:00
Леонид Юрьев (Leonid Yuriev)
d8f46344b5
mdbx: add MDBX_VERSION_UNSTABLE and mark the master branch to prevent build errors. 2025-03-31 00:51:23 +03:00
Леонид Юрьев (Leonid Yuriev)
ebf1e9d8ba mdbx-tests: extend extra/details-rkl to check the hole-iterators. 2025-03-30 23:35:00 +03:00
Леонид Юрьев (Leonid Yuriev)
4c3df230d3 mdbx: hole-iterator for rkl. 2025-03-30 20:04:49 +03:00
Леонид Юрьев (Leonid Yuriev)
9ea8e9b2cf mdbx-tests: add extra/details-rkl. 2025-03-30 20:04:49 +03:00
Леонид Юрьев (Leonid Yuriev)
b8c1b835ed mdbx: add rkl with iterators.
RKL is a sorted set of txnid values that internally uses a combination
of a contiguous interval and a list. It stores record ids during GC
recycling, cleanup and update, including the return of leftovers of
recycled pages (a structural sketch follows this entry).

The RKL iterator isolates the internals of rkl from the rest of the
code, which simplifies it substantially. In fact, it is the use of rkl
with iterators that eliminates the "puzzle" that had historically
accumulated in gc-update.

--

During GC recycling, records are mostly picked sequentially, but this
is not guaranteed. In LIFO mode, recycling and adding records to rkl
happens mostly in reverse order, but jumps in the forward direction may
occur because read transactions finish. In FIFO mode, GC records are
recycled in forward order and linearly, though not necessarily strictly
sequentially; at the same time it is guaranteed that there are no GC
records between the ids being added to rkl, i.e. between the first
(minimum id) and the last (maximum id) there are no records in the GC,
and the whole interval can be used for returning page leftovers to the
GC.

Thus, the combination of a linear interval and a list (sorted in
ascending order of its elements) is a rational solution, close to the
theoretically optimal limit.

The rkl implementation is fairly simple/transparent, except for the
non-obvious "magic" of exchanging the contiguous interval with the
sequences forming in the list. However, it is exactly this exchange,
performed automatically without extra operations, that justifies all
the overhead.
2025-03-30 20:04:49 +03:00
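A structural sketch, in C, of the combination described in the entry above: a contiguous interval of txnid values plus an ascending-sorted list for the stragglers, walked through an iterator that hides which of the two a given id comes from. The type and field names are assumptions for illustration, not the actual contents of src/rkl.h.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t txnid_t;

typedef struct rkl_sketch {
  txnid_t solid_begin, solid_end; /* contiguous interval [solid_begin, solid_end) */
  txnid_t *list;                  /* ids outside the interval, sorted ascending */
  size_t list_len;
} rkl_sketch_t;

typedef struct rkl_iter_sketch {
  const rkl_sketch_t *rkl;
  txnid_t next_solid; /* next id to yield from the interval (starts at solid_begin) */
  size_t list_pos;    /* next index to yield from the list */
} rkl_iter_sketch_t;

/* Yield the smallest not-yet-visited id from either the interval or the list,
 * or 0 when exhausted (assuming 0 is never a valid txnid). */
static txnid_t rkl_iter_next(rkl_iter_sketch_t *it) {
  const rkl_sketch_t *rkl = it->rkl;
  const int have_solid = it->next_solid < rkl->solid_end;
  const int have_list = it->list_pos < rkl->list_len;
  if (have_solid && (!have_list || it->next_solid < rkl->list[it->list_pos]))
    return it->next_solid++;
  if (have_list)
    return rkl->list[it->list_pos++];
  return 0;
}
```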
Леонид Юрьев (Leonid Yuriev)
db163cbcfd mdbx: move nodes in node_add_branch() after the page-overflow check. 2025-03-30 17:41:33 +03:00
Леонид Юрьев (Leonid Yuriev)
936c25e671 mdbx: add assert checks to catch bugs leading to page overflow/corruption. 2025-03-30 17:41:33 +03:00
Леонид Юрьев (Leonid Yuriev)
b308559dd9 mdbx: lower the logging level for "skip update meta".
Thanks to [Ilya Mikheev](https://github.com/JkLondon) for reporting the flaw.
2025-03-28 15:12:10 +03:00
Леонид Юрьев (Leonid Yuriev)
b4e65f5d21 mdbx: update NOTICE. 2025-03-22 23:29:43 +03:00
Леонид Юрьев (Leonid Yuriev)
390490edf4 mdbx: clarify the address type for donations. 2025-03-22 23:17:35 +03:00
Леонид Юрьев (Leonid Yuriev)
94531a9cdc mdbx++: throw std::invalid_argument with the explicit message "MDBX_EINVAL". 2025-03-22 19:43:23 +03:00
Леонид Юрьев (Leonid Yuriev)
f8e332a205 mdbx-test: extend extra/cursor-closing. 2025-03-22 19:43:23 +03:00
Леонид Юрьев (Leonid Yuriev)
021d83b841 mdbx: fix a regression when using cursors for DBI=0 in read transactions.
As a result of refactoring and a number of optimizations, common code
came to be used for finishing/quenching cursors in read and write
transactions. It was based on the corresponding fragment for write
transactions, in which the user is not allowed to use cursors for
DBI=0, so that iteration was skipped.

Consequently, when a read transaction finished, cursors bound to DBI=0
were not finished properly, and reusing them or explicitly closing them
after the read transaction had finished led to access to already freed
memory. If such cursors were unbound or closed before the read
transaction finished, the bug had no chance to manifest.

Thanks to Ilya Mikheev (https://github.com/JkLondon) and the Erigon team (https://erigon.tech) for reporting the problem.
2025-03-22 19:08:52 +03:00
Леонид Юрьев (Leonid Yuriev)
4e33bad6e7
mdbx: trim the tail of ChangeLog, splitting it off into ChangeLog-01. 2025-03-21 00:31:54 +03:00
Леонид Юрьев (Leonid Yuriev)
a313dd2fae
mdbx: merge branch stable into master. 2025-03-21 00:11:00 +03:00
Леонид Юрьев (Leonid Yuriev)
2e4962a2f3 mdbx-docs: change <title> and meta-title in index.html 2025-03-20 21:50:53 +03:00
Леонид Юрьев (Leonid Yuriev)
00917f8c96 mdbx: correct ChangeLog. 2025-03-20 19:15:56 +03:00
Леонид Юрьев (Leonid Yuriev)
999f8644f6
mdbx: extend ChangeLog. 2025-03-20 18:02:00 +03:00
Леонид Юрьев (Leonid Yuriev)
06f8573f5f mdbx: strengthen cursor-signature checking. 2025-03-20 17:20:47 +03:00
Леонид Юрьев (Leonid Yuriev)
7eb7931a23 mdbx-tests: adjust handling of test interruption via SIGTERM/SIGINT. 2025-03-20 14:13:20 +03:00
Леонид Юрьев (Leonid Yuriev)
e37194affe
mdbx: extend ChangeLog. 2025-03-19 23:50:29 +03:00
Леонид Юрьев (Leonid Yuriev)
917e2827f5 mdbx-tests: reduce test iterations severalfold depending on the Valgrind/Debug/CI configuration. 2025-03-19 23:30:49 +03:00
Леонид Юрьев (Leonid Yuriev)
2fd1772503 mdbx-tests: eliminate unaligned access in extra/close-dbi for UBSAN. 2025-03-18 13:14:47 +03:00
Леонид Юрьев (Leonid Yuriev)
694626727f mdbx: use cmp_lenfast() instead of cmp_lenfast(). 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
2aa47f20c3 mdbx-tests: catch and log exceptions in the extra C++ tests. 2025-03-18 10:46:55 +03:00
Leo Yuriev
e6891b295b mdbx++: minor reflow Doxygen comments. 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
c0b1ab1466 mdbx-tests: extend extra/dupfix-multiple. 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
71d95d1a5f mdbx++: add mdbx::cursor::put_multiple_samelength(). 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
7a923b3d41 mdbx: refactor checks, factoring them out into cursor_check_multiple(). 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
8008afc6e1 mdbx: support MDBX_MULTIPLE with zero data size. 2025-03-18 10:46:55 +03:00
Леонид Юрьев (Leonid Yuriev)
7ae11e0fdb mdbx++: explicitly define the external instantiation of mdbx::buffer<> with API attributes. 2025-03-17 23:28:58 +03:00
Леонид Юрьев (Leonid Yuriev)
5c1745a7cd mdbx: add a histogram of the number of multi-values/duplicates to chk. 2025-03-17 23:28:46 +03:00
Леонид Юрьев (Leonid Yuriev)
23a417fe19 mdbx: fix a regression in the MDBX_MULTIPLE handling path.
Batch insertion of values via the `MDBX_MULTIPLE` operation could lead
to crashes and corruption of the DB structure. The bug remained
unnoticed due to the specific conditions of its manifestation, which
were not reproduced by the tests.

The problem was present in all releases starting from v0.13.1, but the
corresponding bug is not tied to a specific commit in the history; it
is the consequence of several changes (refactoring steps) that together
led to the regression.

Technically the bug was caused by a variable not being zeroed on a
certain execution path; originally it did not need zeroing, but such
zeroing became necessary after a series of code-optimization and
refactoring steps.

The main condition for the bug to manifest is batch insertion of
multi-values into a dupsort table with a fixed value size, such that
the set of values for the key being updated stops fitting on a nested
page and is converted/moved out into a nested tree of pages. If such a
move/conversion happened before the supplied set of values was
exhausted, then on the next iteration the actions corresponding to
moving the data out into a separate tree of pages were performed again.
This could lead to dereferencing invalid pointers (corrupting memory
contents) and/or to corrupting the contents of the pages forming the
DB structure.

The fix came down to adding a single line of code, but the tests were
also extended to cover the corresponding scenarios.
2025-03-17 23:28:28 +03:00
Леонид Юрьев (Leonid Yuriev)
db44f4ed71 mdbx-tools: add the -c (concise) option to mdbx_dump. 2025-03-17 23:28:16 +03:00
Леонид Юрьев (Leonid Yuriev)
ef9fd1f3fb mdbx-tests: reduce the number of iterations in extra/crunched-delete for 32-bit builds to avoid MDBX_MAP_FULL. 2025-03-17 23:27:51 +03:00
Леонид Юрьев (Leonid Yuriev)
2e6d9fd4d4 mdbx++: add mdbx::cursor::seek_multiple_samelength(). 2025-03-17 23:27:34 +03:00
Леонид Юрьев (Leonid Yuriev)
83e42d03bb mdbx: workarounds for CLANG < 20 when using C23 [[attributes]]. 2025-03-17 23:27:14 +03:00
Леонид Юрьев (Leonid Yuriev)
dfd265d46f mdbx-tests: substantially extend extra/cursor-closing. 2025-03-17 23:26:49 +03:00
Леонид Юрьев (Leonid Yuriev)
08d10ad0a1 mdbx-tests: extend extra/txn. 2025-03-17 23:26:30 +03:00
Леонид Юрьев (Leonid Yuriev)
8ebedde181 mdbx++: check __cpp_concepts >= 202002 before using C++ concepts. 2025-03-17 23:26:18 +03:00
Леонид Юрьев (Leonid Yuriev)
dcf35e5306 mdbx: fix cursor shadowing in nested transactions. 2025-03-17 23:25:53 +03:00
Леонид Юрьев (Leonid Yuriev)
aeac971f0b mdbx: rework cursor validation at API-function entry, adding cursor_check(). 2025-03-17 23:25:30 +03:00
Леонид Юрьев (Leonid Yuriev)
6c8047a402 mdbx: rework mdbx_txn_release_all_cursors_ex(). 2025-03-17 23:20:40 +03:00
Леонид Юрьев (Leonid Yuriev)
438d185250 mdbx++: reformat (temporarily) unused code. 2025-03-17 23:20:28 +03:00
Леонид Юрьев (Leonid Yuriev)
ee6843062d mdbx++: remove the exception thrown when requesting the transaction of an unbound cursor. 2025-03-17 23:20:03 +03:00
Леонид Юрьев (Leonid Yuriev)
70adf71770 mdbx++: add inplace_storage_size_rounding to capacity_policy for buffers. 2025-03-17 23:16:30 +03:00
Леонид Юрьев (Leonid Yuriev)
fa2c27fa08 mdbx++: add mdbx::cursor_managed::withdraw_handle(). 2025-03-17 23:16:12 +03:00
Леонид Юрьев (Leonid Yuriev)
7a72d1b273 mdbx: adjust the description of MDBX_MVCC_RETARDED and the corresponding error message. 2025-03-17 23:15:48 +03:00
Леонид Юрьев (Leonid Yuriev)
3e91500fac mdbx: eliminate a race in tbl_setup(MDBX_DUPFIXED | MDBX_INTEGERDUP) when used from different threads.
The problem was that in the fixed-value-size cases clc.lmin/clc.lmax
were set in env->kvs[] and then adjusted to the actual data size in the
DB. So with concurrent calls from different threads, one thread could
be performing the initialization while another read the
temporary/intermediate lmin/lmax values.

As a result, when transactions started concurrently in different
threads using a just-opened dbi handle, the value-length validity check
could end with a spurious MDBX_BAD_VALSIZE error.
2025-03-17 23:13:26 +03:00
Леонид Юрьев (Leonid Yuriev)
546b48b6eb mdbx: rename cursor_validate(). 2025-03-17 23:01:30 +03:00
Леонид Юрьев (Leonid Yuriev)
2ffa5cf371 mdbx: add MDBX_SEEK_AND_GET_MULTIPLE to the cursor-operations API. 2025-03-17 22:58:57 +03:00
Леонид Юрьев (Leonid Yuriev)
b546dc69d2 mdbx-doc: doxygen descriptions for the doubtless-positioning constants. 2025-03-17 22:58:44 +03:00
Леонид Юрьев (Leonid Yuriev)
42706c45a0 mdbx-tests: add support for the MDBX_VALIDATION option and use it in the stochastic test. 2025-03-17 22:58:29 +03:00
Леонид Юрьев (Leonid Yuriev)
8dda33329b mdbx-tests: support on/off values for command-line options. 2025-03-17 22:58:08 +03:00
Леонид Юрьев (Leonid Yuriev)
b2bd8bae38 mdbx: add mdbx_cursor_close2() to the API. 2025-03-17 22:57:38 +03:00
Леонид Юрьев (Leonid Yuriev)
1299653457 mdbx: add cursor_reset() and cursor_drown(). 2025-03-17 22:24:23 +03:00
Леонид Юрьев (Leonid Yuriev)
333069e7a8 mdbx: refactor cursor_eot() to simplify txn_done_cursors(). 2025-03-17 21:38:42 +03:00
Леонид Юрьев (Leonid Yuriev)
436998ca83 mdbx: cosmetic refactoring of cursor_shadow(). 2025-03-17 21:06:45 +03:00
Леонид Юрьев (Leonid Yuriev)
b0665f7016 mdbx: forbid unbind/close of cursors for nested transactions. 2025-03-17 20:48:19 +03:00
Леонид Юрьев (Leonid Yuriev)
4fcfb07b97 mdbx: adjust mdbx_panic() to output the supplied message via __assert_failed(). 2025-03-17 20:47:47 +03:00
Леонид Юрьев (Leonid Yuriev)
ca30365d3b mdbx-make: add the ninja-assertions target and use it in make check. 2025-03-17 20:46:44 +03:00
Леонид Юрьев (Leonid Yuriev)
6424747636 mdbx++: use mdbx_txn_release_all_cursors_ex(). 2025-03-17 20:45:09 +03:00
Леонид Юрьев (Leonid Yuriev)
183610b050 mdbx-doc: fix a url in the sitemap. 2025-03-09 11:41:02 +03:00
Леонид Юрьев (Leonid Yuriev)
920d9b5b2f mdbx-doc: add ld+json to the root index.html 2025-03-05 12:54:51 +03:00
Леонид Юрьев (Leonid Yuriev)
283c962fea mdbx: fix a typo in ChangeLog. 2025-03-05 01:46:57 +03:00
Леонид Юрьев (Leonid Yuriev)
8efcdeae9d mdbx: fix a typo in a date inside ChangeLog. 2025-03-04 20:06:16 +03:00
Леонид Юрьев (Leonid Yuriev)
9c161cdafd mdbx: extend ChangeLog. 2025-03-04 14:27:53 +03:00
Леонид Юрьев (Leonid Yuriev)
a3265e11dc mdbx: add mdbx_txn_release_all_cursors_ex() to the API and change the result semantics of mdbx_txn_release_all_cursors().
Through an oversight, a preliminary/draft variant of the
mdbx_txn_release_all_cursors() function remained in releases; it mixes
error/success information and the number of processed cursors in the
return value. Because of this it is impossible to tell one from the
other, for example the EPERM error on Linux from one successfully
closed cursor.

Now mdbx_txn_release_all_cursors() returns only an error code,
and to obtain the number of closed cursors the mdbx_txn_release_all_cursors_ex() function has been added to the API (a hedged usage sketch follows this entry).
2025-03-04 14:21:25 +03:00
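A hedged usage sketch of the split described in the entry above: the plain function now reports only success or an error, while the `_ex` variant also yields the count of released cursors. The exact prototype of `mdbx_txn_release_all_cursors_ex()` (the `unbind` flag and a `size_t *count` out-parameter) is assumed from the commit message rather than verified against mdbx.h.

```c
#include "mdbx.h"
#include <stdbool.h>
#include <stddef.h>

/* Close every cursor of a transaction and report how many were closed. */
static int close_all_cursors(MDBX_txn *txn, size_t *closed) {
  /* assumed prototype: mdbx_txn_release_all_cursors_ex(txn, unbind, count) */
  int rc = mdbx_txn_release_all_cursors_ex(txn, /* unbind = */ false, closed);
  if (rc != MDBX_SUCCESS)
    return rc; /* a genuine error, no longer confusable with a cursor count */
  /* *closed now holds the number of released cursors */
  return MDBX_SUCCESS;
}
```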
Леонид Юрьев (Leonid Yuriev)
709d524d21 mdbx: check the thread owning a transaction only when MDBX_TXN_CHECKOWNER=ON. 2025-03-04 10:52:30 +03:00
Леонид Юрьев (Leonid Yuriev)
e0843429a1 mdbx-doc: update the MacOS section in README. 2025-03-04 00:02:39 +03:00
Леонид Юрьев (Leonid Yuriev)
329eee4e4f mdbx-make: look for gnu-sed on Darwin/MacOS. 2025-03-03 23:12:55 +03:00
Леонид Юрьев (Leonid Yuriev)
4fd165f8d2 mdbx: extend ChangeLog. 2025-03-03 20:16:51 +03:00
Леонид Юрьев (Leonid Yuriev)
05e7a94619 mdbx-tests: fix extra-open for 32-bit Windows builds (an even smaller DB). 2025-03-03 02:31:32 +03:00
Леонид Юрьев (Leonid Yuriev)
826cdb708f mdbx: adjust log_error() to eliminate spurious errors when mdbx_chk runs with a high logging level.
The problem was that at a high logging level the unavoidable
MDBX_NOTFOUND results, reached at the end of the data being iterated,
were also sent to the logger. In turn, the chk report-building logic
counted these messages as errors found while checking the DB...
2025-03-03 01:12:35 +03:00
Леонид Юрьев (Leonid Yuriev)
da24fda578 mdbx: add print-like functions to chk for debugging convenience. 2025-03-03 01:11:55 +03:00
Леонид Юрьев (Leonid Yuriev)
0fa21a3c0d mdbx: rework env_owned_wrtxn() and its call sites.
The goal is to get rid of the lock collision arising inside
dxb_sanitize_tail() when using Valgrind/ASAN, and also to simplify the
code.
2025-03-02 23:29:40 +03:00
Леонид Юрьев (Leonid Yuriev)
dd9f608320 mdbx: additional cursor-signature checks when iterating linked lists. 2025-03-02 11:46:10 +03:00
Леонид Юрьев (Leonid Yuriev)
28ca18972a mdbx: more thorough cleanup of cursors on close/unbind. 2025-03-02 11:44:10 +03:00
Леонид Юрьев (Leonid Yuriev)
fbb93f9cfb mdbx: remove const from the transaction in cursor_bind() and cursor_renew(). 2025-03-02 10:41:38 +03:00
Леонид Юрьев (Leonid Yuriev)
bc464521c0 mdbx-tests: extend extra/dbi. 2025-03-02 00:42:55 +03:00
Леонид Юрьев (Leonid Yuriev)
9273e2ee60 mdbx: fix inheritance of a dbi handle opened in a child transaction without data changes. 2025-03-02 00:40:18 +03:00
Леонид Юрьев (Leonid Yuriev)
e035f102ab mdbx: fix a table-audit failure when a dbi handle is invalidated due to aborting a nested transaction. 2025-03-02 00:10:56 +03:00
Леонид Юрьев (Leonid Yuriev)
1240ed2ba3 mdbx: fix a slip in the format specification when logging table names. 2025-03-02 00:10:56 +03:00
Леонид Юрьев (Leonid Yuriev)
6ca63b46d8 mdbx: reduce the suggested DB size 16-fold to eliminate Valgrind/ASAN problems. 2025-03-02 00:10:56 +03:00
Леонид Юрьев (Leonid Yuriev)
9fee0bc3a6 mdbx-tests: remove the test DB before starting the test in extra/dupfix_addodd. 2025-03-02 00:10:56 +03:00
Леонид Юрьев (Leonid Yuriev)
c14bb7814f mdbx-tests: fix extra-open for 32-bit builds. 2025-02-20 23:48:48 +03:00
Леонид Юрьев (Leonid Yuriev)
9b31c517e6 mdbx: check DB size alignment against the memory-allocation unit rather than the page size.
Theoretically, before this commit there could be an inconsistency:
 - when opening a DB with a 4K page size on Windows (where the section size is a multiple of 64K) in read-only mode,
 - after the DB had been used on POSIX (where the mapping size is a multiple of the system page size).

Previously the error could be returned by the system (e.g. INVALID_PARAMETER), and it was extremely hard to tell from it what was wrong.
Now the error is logged and MDBX_WANNA_RECOVERY is returned.
2025-02-20 23:11:28 +03:00
Леонид Юрьев (Leonid Yuriev)
66c747e4a9 mdbx-cmake: formatting adjustment (cosmetics). 2025-02-20 23:11:28 +03:00
Леонид Юрьев (Leonid Yuriev)
54d8c0d290 mdbx: rework the check of the DB file size at open.
A rework of 05cdf9d202b14ac09c801c7893e65271fa27f378. The previous
variant had a flaw: when a warning had to be issued and the DB was
opened with a geometry change, the warning was not issued, which can
complicate the analysis of problematic situations.
2025-02-20 23:11:28 +03:00
Леонид Юрьев (Leonid Yuriev)
26cd5ebc43 mdbx: extend ChangeLog. 2025-02-20 00:13:21 +03:00
Леонид Юрьев (Leonid Yuriev)
806f819bae mdbx-tests: extend extra-open. 2025-02-20 00:09:58 +03:00
Леонид Юрьев (Leonid Yuriev)
05cdf9d202 mdbx: eliminate an excessive warning when the DB size changes during open.
Changing the geometry (increasing the size) of large DBs may be
impossible after they are opened, due to system limitations (lack of
free address space).

Therefore the API allows requesting a change of the DB geometry/size
before it is opened. In this scenario, a superfluous/unneeded warning
about the DB file not matching the new size could previously be issued.
This shortcoming has now been fixed.

Thanks to Ilya Mikheev (Erigon) for reporting this flaw.
2025-02-19 23:22:18 +03:00
Леонид Юрьев (Leonid Yuriev)
818740976b mdbx-doc: add a link to the Zig binding. 2025-02-17 15:01:57 +03:00
Леонид Юрьев (Leonid Yuriev)
287bab36a1 mdbx-doc: update the doxygen configuration. 2025-02-17 14:43:20 +03:00
Леонид Юрьев (Leonid Yuriev)
5388d2273b mdbx-doc: typos in README. 2025-02-16 16:52:53 +03:00
Леонид Юрьев (Leonid Yuriev)
d2864029da
mdbx: information about the Github status. 2025-02-15 15:47:33 +03:00
Леонид Юрьев (Leonid Yuriev)
b63ca3c12e mdbx: update the patch for older buildroot versions. 2025-02-14 21:39:39 +03:00
Леонид Юрьев (Leonid Yuriev)
4730abe3e5 mdbx: relax an overly strict condition in an assert check inside recalculate_subpage_thresholds(). 2025-02-11 14:01:10 +03:00
Леонид Юрьев (Leonid Yuriev)
401454dadf mdbx-conan: fix a typo in the version_json_pathname variable name in verbose output.
Thanks to Victor Logunov (https://t.me/vl_username) for reporting the problem.
2025-02-03 18:46:01 +03:00
Леонид Юрьев (Leonid Yuriev)
9568209ee4 mdbx: add pnl_clone() and pnl_maxspan(). 2025-02-01 16:56:00 +03:00
Леонид Юрьев (Leonid Yuriev)
781c04f6e2 mdbx: relax an overly strict condition in an assert check inside recalculate_subpage_thresholds(). 2025-01-29 12:15:01 +03:00
Леонид Юрьев (Leonid Yuriev)
b7206c68a5 mdbx: extend ChangeLog. 2025-01-27 22:41:24 +03:00
Леонид Юрьев (Leonid Yuriev)
3a0b857e1d mdbx-cmake: use -flto=auto for GCC >= 11.4
When building with GCC >= 11.4 the following warnings no longer occur:
  lto-wrapper: warning: using serial compilation of # LTRANS jobs
  lto-wrapper: note: see the ‘-flto’ option documentation for more information

However, using the auto mode is not an optimal solution, since with a
parallel build via make or ninja each already-running compilation
branch will spawn additional threads for every CPU core.

Thus the actual load can grow quadratically, i.e. the more cores you
have, the worse it gets; with 96 cores up to 9216 build threads may be
running.

Nevertheless, using a `job-server` with CMake is not yet possible, and
building libmdbx is not enough work to crush the system with load.
2025-01-27 21:30:01 +03:00
Леонид Юрьев (Leonid Yuriev)
6ccbce9afc mdbx-cmake: avoid needlessly doing the compiler.cmake work twice. 2025-01-27 21:11:57 +03:00
Леонид Юрьев (Leonid Yuriev)
9d7495fa09 mdbx-cmake: relax the conditions for using LTO with CLANG on Linux. 2025-01-27 20:41:44 +03:00
Леонид Юрьев (Leonid Yuriev)
c8f6d90e18 mdbx-cmake: extend the search for LLVMgold.so in relative lib directories. 2025-01-27 20:32:02 +03:00
Леонид Юрьев (Leonid Yuriev)
778aee25fe mdbx: extend ChangeLog. 2025-01-27 11:01:10 +03:00
Леонид Юрьев (Leonid Yuriev)
cb8eec6d11 mdbx: fix a regression causing a chance of SIGSEGV when spilling pages.
The bug was introduced by commit `a6f7d74a32a3cbcc310916a624a31302dbebfa07` of
2024-03-07 and is present in releases v0.13.1, v0.13.2, v0.13.3. The
problem remained unnoticed due to the specific conditions and low
probability of its manifestation.

The essence of the bug:

- the cursor_touch() function prepares the cursor's page stack for
  modifications; all pages in the stack (from the root down to the leaf
  at the current cursor position) must become writable.

- the micro-optimization added by that commit skipped walking the stack
  if the root page was already writable, but this is acceptable/correct
  only when there are no spilled pages in the stack.

- if a situation arose where a spilled non-root page was in the stack,
  it remained non-writable, and an attempt to modify it caused a
  SIGSEGV.
2025-01-27 10:09:04 +03:00
Леонид Юрьев (Leonid Yuriev)
f6d91b3c5b mdbx-doc: fix a typo in the mention of mdbx_env_resurrect_after_fork(). 2025-01-26 17:36:40 +03:00
Леонид Юрьев (Leonid Yuriev)
750fab2427 mdbx: extend ChangeLog. 2025-01-26 16:57:17 +03:00
Леонид Юрьев (Leonid Yuriev)
fffa78d912 mdbx: extend TODO. 2025-01-26 16:49:33 +03:00
Леонид Юрьев (Leonid Yuriev)
fc85d1c61f mdbx-cmake: support MacOS universal binaries.
Thanks so much to Alain Picard (Castor Technologies) for this patch and for supporting the Java bindings!
2025-01-26 16:37:34 +03:00
Леонид Юрьев (Leonid Yuriev)
340bd080c9 mdbx: fix a typo in cursor_touch().
During the cursor rework, a negation was missed in a condition when
estimating the number of pages that may be needed to perform an
operation.

As currently understood, the bug did not lead to any problems, since
the estimate is an upper bound with a substantial margin; in the worst
case it could lead to aborting a transaction because the dirty-pages
limit was reached.
2025-01-26 16:37:00 +03:00
Леонид Юрьев (Leonid Yuriev)
7074b94b2e mdbx: simplify gcu_loose(). 2025-01-26 16:36:55 +03:00
Леонид Юрьев (Leonid Yuriev)
f39542a9f0 mdbx-doc: extend TODO. 2025-01-21 16:26:47 +03:00
Леонид Юрьев (Leonid Yuriev)
d89670bcea mdbx-doc: fix spelling/typos in ChangeLog. 2025-01-21 15:40:26 +03:00
Леонид Юрьев (Leonid Yuriev)
fce40169bd mdbx-doc: refine/update the "Restrictions & Caveats" section. 2025-01-19 02:14:19 +03:00
Леонид Юрьев (Leonid Yuriev)
560aa72f3d mdbx-doc: add a link to the 2020-2024 Telegram-group message archive to the doxygen documentation. 2025-01-19 01:23:31 +03:00
Леонид Юрьев (Leonid Yuriev)
cb7ba6b53f mdbx-doc: favicon for the documentation site. 2025-01-19 00:51:37 +03:00
Леонид Юрьев (Leonid Yuriev)
1b9ad144ea mdbx: fix README layout. 2025-01-18 18:15:51 +03:00
Леонид Юрьев (Leonid Yuriev)
0233eda949 mdbx-doc: add a link to the 2020-2024 Telegram-group message archive to README. 2025-01-17 22:41:26 +03:00
Леонид Юрьев (Leonid Yuriev)
78552a5c84 mdbx-doc: separate current and obsolete/unsupported bindings in README. 2025-01-17 20:39:25 +03:00
Леонид Юрьев (Leonid Yuriev)
beb5a81d12 mdbx-doc: update the version number and date in the man-page headers. 2025-01-17 18:29:15 +03:00
Леонид Юрьев (Leonid Yuriev)
56d1dbef45 mdbx: update the year in ©. 2025-01-15 19:36:07 +03:00
Леонид Юрьев (Leonid Yuriev)
761248cc21 mdbx-doc: extend the description of mdbx_txn_commit(). 2025-01-15 14:56:26 +03:00
Леонид Юрьев (Leonid Yuriev)
72fb45e13d mdbx: extend ChangeLog. 2025-01-15 14:24:43 +03:00
Леонид Юрьев (Leonid Yuriev)
e529cd7d19 mdbx: correct ChangeLog. 2025-01-15 00:50:57 +03:00
Леонид Юрьев (Leonid Yuriev)
2c3b36da64 mdbx: refactor txn_renew(), extracting txn_basal_start(). 2025-01-15 00:50:57 +03:00
Леонид Юрьев (Leonid Yuriev)
314b8ce1f0 mdbx: renaming (cosmetics). 2025-01-15 00:50:57 +03:00
Леонид Юрьев (Leonid Yuriev)
7e772114bc mdbx: refactor read transactions, extracting txn_ro_start(), txn_ro_seize(), txn_ro_slot(). 2025-01-15 00:50:36 +03:00
Леонид Юрьев (Leonid Yuriev)
0accf98ff7 mdbx: add the MDBX_ENABLE_NON_READONLY_EXPORT build option and log the corresponding situations.
Closes the [request](https://gitflic.ru/project/erthink/libmdbx/issue/16).
2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
e4054b56c3 mdbx: use EREMOTEIO, when available, instead of ENOTBLK as MDBX_EREMOTE. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
950db52fe8 mdbx: split the basal/ro/nested txn functions into separate files (no code changes). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
380385c1db mdbx: simplify the not-found exit path of cursor_seek(). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
10e7e5c899 mdbx: refactor mdbx_txn_commit_ex() 5/5 (extracting txn_basal_end()). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
6d92a778a5 mdbx: introduce the MDBX_NOSUCCESS_PURE_COMMIT build option (disabled by default). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
c60f6afe5f mdbx: simplify/straighten/refactor txn_end() and the affected dependencies. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
a5bb555db3 mdbx: refactor mdbx_txn_commit_ex() 4/5 (extracting txn_basal_commit()). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
b9b784c18e mdbx: refactor mdbx_txn_commit_ex() 3/5 (extracting txn_nested_join()). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
c6cd482ea0 mdbx: refactor mdbx_txn_commit_ex() 2/5 (struct commit_timestamp, latency_init/gcprof/done()). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
2b9401e372 mdbx: refactor mdbx_txn_commit_ex() 1/5 (renaming local timestamp variables). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
6fe7baa1b8 mdbx: simplify mdbx_txn_break(). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
1e5fef2c76 mdbx: refactor the txn API, moving out individual txn functions. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
0a4156fe6f mdbx: move check_env() from txn_end() into the txn-API functions. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
a89d418c91 mdbx: refactor mdbx_txn_straggler(), adding an env check. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
585ccdf716 mdbx: change TXN_END_NAMES. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
81e2623a54 mdbx: refactor cursor shadowing and completion, removing TXN_END_EOTDONE and adding txn_may_have_cursors. 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
b681b59434 mdbx: refactor/extract txn_basal_create/destroy(). 2025-01-14 13:26:54 +03:00
Леонид Юрьев (Leonid Yuriev)
67460dd0fd mdbx: update the patch for older buildroot versions. 2025-01-14 13:04:25 +03:00
Леонид Юрьев (Leonid Yuriev)
3a1ac35009 mdbx: extend ChangeLog. 2025-01-13 16:55:41 +03:00
Леонид Юрьев (Leonid Yuriev)
3c60e1e94c mdbx-tests: rework the PRNG seed/salt for more convenient control and reproduction of tests. 2025-01-13 16:55:41 +03:00
Леонид Юрьев (Leonid Yuriev)
a994a9bbcc mdbx: use MDBX_GET_BOTH to check whether the value being added is present in the table. 2025-01-13 16:55:41 +03:00
Леонид Юрьев (Leonid Yuriev)
84e2c70b98
mdbx: begin development of the 0.14 branch. 2025-01-13 16:54:52 +03:00
103 changed files with 8323 additions and 5060 deletions


@@ -132,6 +132,8 @@ if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/preface.h"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/proto.h"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/refund.c"
+AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/rkl.c"
+AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/rkl.h"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/sort.h"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/spill.c"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/spill.h"
@@ -149,6 +151,9 @@ if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/tree-ops.c"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txl.c"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txl.h"
+AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txn-basal.c"
+AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txn-nested.c"
+AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txn-ro.c"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/txn.c"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/unaligned.h"
 AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/src/utils.c"
@@ -829,6 +834,8 @@ else()
 "${MDBX_SOURCE_DIR}/preface.h"
 "${MDBX_SOURCE_DIR}/proto.h"
 "${MDBX_SOURCE_DIR}/refund.c"
+"${MDBX_SOURCE_DIR}/rkl.c"
+"${MDBX_SOURCE_DIR}/rkl.h"
 "${MDBX_SOURCE_DIR}/sort.h"
 "${MDBX_SOURCE_DIR}/spill.c"
 "${MDBX_SOURCE_DIR}/spill.h"
@@ -838,6 +845,9 @@ else()
 "${MDBX_SOURCE_DIR}/tree-ops.c"
 "${MDBX_SOURCE_DIR}/txl.c"
 "${MDBX_SOURCE_DIR}/txl.h"
+"${MDBX_SOURCE_DIR}/txn-basal.c"
+"${MDBX_SOURCE_DIR}/txn-nested.c"
+"${MDBX_SOURCE_DIR}/txn-ro.c"
 "${MDBX_SOURCE_DIR}/txn.c"
 "${MDBX_SOURCE_DIR}/unaligned.h"
 "${MDBX_SOURCE_DIR}/utils.c"

ChangeLog-01.md (new file, 1067 lines)

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -299,9 +299,9 @@ lib-shared libmdbx.$(SO_SUFFIX): mdbx-dylib.o $(call select_by,MDBX_BUILD_CXX,md
@echo ' LD $@' @echo ' LD $@'
$(QUIET)$(call select_by,MDBX_BUILD_CXX,$(CXX) $(CXXFLAGS),$(CC) $(CFLAGS)) $^ -pthread -shared $(LDFLAGS) $(call select_by,MDBX_BUILD_CXX,$(LIB_STDCXXFS)) $(LIBS) -o $@ $(QUIET)$(call select_by,MDBX_BUILD_CXX,$(CXX) $(CXXFLAGS),$(CC) $(CFLAGS)) $^ -pthread -shared $(LDFLAGS) $(call select_by,MDBX_BUILD_CXX,$(LIB_STDCXXFS)) $(LIBS) -o $@
ninja-assertions: CMAKE_OPT += -DMDBX_FORCE_ASSERTIONS=ON ninja-assertions: CMAKE_OPT += -DMDBX_FORCE_ASSERTIONS=ON $(MDBX_BUILD_OPTIONS)
ninja-assertions: cmake-build ninja-assertions: cmake-build
ninja-debug: CMAKE_OPT += -DCMAKE_BUILD_TYPE=Debug ninja-debug: CMAKE_OPT += -DCMAKE_BUILD_TYPE=Debug $(MDBX_BUILD_OPTIONS)
ninja-debug: cmake-build ninja-debug: cmake-build
ninja: cmake-build ninja: cmake-build
cmake-build: cmake-build:
@ -367,7 +367,7 @@ else
.PHONY: build-test build-test-with-valgrind check cross-gcc cross-qemu dist doxygen gcc-analyzer long-test .PHONY: build-test build-test-with-valgrind check cross-gcc cross-qemu dist doxygen gcc-analyzer long-test
.PHONY: reformat release-assets tags smoke test test-asan smoke-fault test-leak .PHONY: reformat release-assets tags smoke test test-asan smoke-fault test-leak
.PHONY: smoke-singleprocess test-singleprocess test-ubsan test-valgrind test-memcheck memcheck smoke-memcheck .PHONY: smoke-singleprocess test-singleprocess test-ubsan test-valgrind test-memcheck memcheck smoke-memcheck
.PHONY: smoke-assertion test-assertion long-test-assertion test-ci test-ci-extra .PHONY: smoke-assertion test-assertion long-test-assertion test-ci test-ci-extra check-posix-locking
test-ci-extra: test-ci cross-gcc cross-qemu test-ci-extra: test-ci cross-gcc cross-qemu
@ -384,8 +384,9 @@ endef
define uname2titer define uname2titer
case "$(UNAME)" in case "$(UNAME)" in
CYGWIN*|MINGW*|MSYS*|Windows*) echo 2;;
Darwin*|Mach*) echo 2;; Darwin*|Mach*) echo 2;;
*) echo 12;; *) if [ -z "${CI}" ]; then echo 7; else echo 3; fi;;
esac esac
endef endef
@ -434,16 +435,27 @@ MDBX_DIST_DIR = libmdbx-$(MDBX_VERSION_NODOT)
MDBX_SMOKE_EXTRA ?= MDBX_SMOKE_EXTRA ?=
check: DESTDIR = $(shell pwd)/@check-install check: DESTDIR = $(shell pwd)/@check-install
check: CMAKE_OPT = -Werror=dev check: CMAKE_OPT += -Werror=dev
check: smoke-assertion ninja-assertions dist install test ctest check: clean | smoke-assertion ninja-assertions dist install test ctest
smoke-assertion: MDBX_BUILD_OPTIONS += -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0
smoke-assertion: MDBX_BUILD_OPTIONS:=$(strip $(MDBX_BUILD_OPTIONS) -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0)
smoke-assertion: smoke smoke-assertion: smoke
test-assertion: MDBX_BUILD_OPTIONS:=$(strip $(MDBX_BUILD_OPTIONS) -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0) test-assertion: MDBX_BUILD_OPTIONS += -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0
test-assertion: smoke test-assertion: smoke
long-test-assertion: MDBX_BUILD_OPTIONS:=$(strip $(MDBX_BUILD_OPTIONS) -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0) long-test-assertion: MDBX_BUILD_OPTIONS += -DMDBX_FORCE_ASSERTIONS=1 -UNDEBUG -DMDBX_DEBUG=0
long-test-assertion: smoke long-test-assertion: smoke
.PHONY: check-posix-locking-sysv check-posix-locking-1988 check-posix-locking-2001 check-posix-locking-2008
check-posix-locking-sysv: MDBX_BUILD_OPTIONS += -DMDBX_LOCKING=5
check-posix-locking-1988: MDBX_BUILD_OPTIONS += -DMDBX_LOCKING=1988
check-posix-locking-2001: MDBX_BUILD_OPTIONS += -DMDBX_LOCKING=2001
check-posix-locking-2008: MDBX_BUILD_OPTIONS += -DMDBX_LOCKING=2008
check-posix-locking-sysv: check
check-posix-locking-1988: check
check-posix-locking-2001: check
check-posix-locking-2008: check
check-posix-locking:
$(QUIET)for LCK in sysv 1988 2001 2008; do $(MAKE) check-posix-locking-$${LCK} || break; done;
smoke: build-test smoke: build-test
@echo ' SMOKE `mdbx_test basic`...' @echo ' SMOKE `mdbx_test basic`...'
$(QUIET)rm -f $(TEST_DB) $(TEST_LOG).gz && (set -o pipefail; \ $(QUIET)rm -f $(TEST_DB) $(TEST_LOG).gz && (set -o pipefail; \
@ -634,11 +646,12 @@ docs/usage.md: docs/__usage.md docs/_starting.md docs/__bindings.md
@echo ' MAKE $@' @echo ' MAKE $@'
$(QUIET)echo -e "\\page usage Usage\n\\section getting Building & Embedding" | cat - $^ | $(SED) 's/^Bindings$$/Bindings {#bindings}/' >$@ $(QUIET)echo -e "\\page usage Usage\n\\section getting Building & Embedding" | cat - $^ | $(SED) 's/^Bindings$$/Bindings {#bindings}/' >$@
doxygen: docs/Doxyfile docs/overall.md docs/intro.md docs/usage.md mdbx.h mdbx.h++ src/options.h ChangeLog.md COPYRIGHT LICENSE NOTICE $(lastword $(MAKEFILE_LIST)) doxygen: docs/Doxyfile docs/overall.md docs/intro.md docs/usage.md mdbx.h mdbx.h++ src/options.h ChangeLog.md COPYRIGHT LICENSE NOTICE docs/favicon.ico docs/manifest.webmanifest docs/ld+json $(lastword $(MAKEFILE_LIST))
@echo ' RUNNING doxygen...' @echo ' RUNNING doxygen...'
$(QUIET)rm -rf docs/html && \ $(QUIET)rm -rf docs/html && \
cat mdbx.h | tr '\n' '\r' | $(SED) -e 's/LIBMDBX_INLINE_API\s*(\s*\([^,]\+\),\s*\([^,]\+\),\s*(\s*\([^)]\+\)\s*)\s*)\s*{/inline \1 \2(\3) {/g' | tr '\r' '\n' >docs/mdbx.h && \ cat mdbx.h | tr '\n' '\r' | $(SED) -e 's/LIBMDBX_INLINE_API\s*(\s*\([^,]\+\),\s*\([^,]\+\),\s*(\s*\([^)]\+\)\s*)\s*)\s*{/inline \1 \2(\3) {/g' | tr '\r' '\n' >docs/mdbx.h && \
cp mdbx.h++ src/options.h ChangeLog.md docs/ && (cd docs && doxygen Doxyfile $(HUSH)) && cp COPYRIGHT LICENSE NOTICE docs/html/ cp mdbx.h++ src/options.h ChangeLog.md docs/ && (cd docs && doxygen Doxyfile $(HUSH)) && cp COPYRIGHT LICENSE NOTICE docs/favicon.ico docs/manifest.webmanifest docs/html/ && \
$(SED) -i docs/html/index.html -e '/\/MathJax.js"><\/script>/r docs/ld+json' -e 's/<title>libmdbx: Overall<\/title>//;T;r docs/title'
mdbx++-dylib.o: src/config.h src/mdbx.c++ mdbx.h mdbx.h++ $(lastword $(MAKEFILE_LIST)) mdbx++-dylib.o: src/config.h src/mdbx.c++ mdbx.h mdbx.h++ $(lastword $(MAKEFILE_LIST))
@echo ' CC $@' @echo ' CC $@'
@ -721,6 +734,7 @@ $(DIST_DIR)/@tmp-internals.inc: $(DIST_DIR)/@tmp-essentials.inc src/version.c $(
-e '/#include "essentials.h"/d' \ -e '/#include "essentials.h"/d' \
-e '/#include "atomics-ops.h"/r src/atomics-ops.h' \ -e '/#include "atomics-ops.h"/r src/atomics-ops.h' \
-e '/#include "proto.h"/r src/proto.h' \ -e '/#include "proto.h"/r src/proto.h' \
-e '/#include "rkl.h"/r src/rkl.h' \
-e '/#include "txl.h"/r src/txl.h' \ -e '/#include "txl.h"/r src/txl.h' \
-e '/#include "unaligned.h"/r src/unaligned.h' \ -e '/#include "unaligned.h"/r src/unaligned.h' \
-e '/#include "cogs.h"/r src/cogs.h' \ -e '/#include "cogs.h"/r src/cogs.h' \
@ -809,17 +823,21 @@ endif
# Cross-compilation simple test # Cross-compilation simple test
CROSS_LIST = \ CROSS_LIST = \
mips64-linux-gnuabi64-gcc mips-linux-gnu-gcc \ aarch64-linux-gnu-gcc \
hppa-linux-gnu-gcc s390x-linux-gnu-gcc \ arm-linux-gnueabihf-gcc \
powerpc64-linux-gnu-gcc powerpc-linux-gnu-gcc \ hppa-linux-gnu-gcc \
arm-linux-gnueabihf-gcc aarch64-linux-gnu-gcc mips64-linux-gnuabi64-gcc \
mips-linux-gnu-gcc \
powerpc64-linux-gnu-gcc \
riscv64-linux-gnu-gcc \
s390x-linux-gnu-gcc \
sh4-linux-gnu-gcc
## On Ubuntu Focal (22.04) with QEMU 6.2 (1:6.2+dfsg-2ubuntu6.6) & GCC 11.3 (11.3.0-1ubuntu1~22.04) ## On Ubuntu Noble (24.04.2) with QEMU 8.2 (8.2.2+ds-0ubuntu1.7) & GCC 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
# sh4-linux-gnu-gcc - coredump (qemu mmap-troubles) # sparc64-linux-gnu-gcc - fails mmap/BAD_ADDRESS (previously: qemu-coredump since mmap-troubles, qemu fails fcntl for F_SETLK/F_GETLK)
# sparc64-linux-gnu-gcc - coredump (qemu mmap-troubles, previously: qemu fails fcntl for F_SETLK/F_GETLK) # alpha-linux-gnu-gcc - qemu-coredump (qemu mmap-troubles)
# alpha-linux-gnu-gcc - coredump (qemu mmap-troubles) # powerpc-linux-gnu-gcc - fails mmap/BAD_ADDRESS (previously: qemu-coredump since mmap-troubles, qemu fails fcntl for F_SETLK/F_GETLK)
# risc64-linux-gnu-gcc - coredump (qemu qemu fails fcntl for F_SETLK/F_GETLK) CROSS_LIST_NOQEMU = sparc64-linux-gnu-gcc alpha-linux-gnu-gcc powerpc-linux-gnu-gcc
CROSS_LIST_NOQEMU = sh4-linux-gnu-gcc sparc64-linux-gnu-gcc alpha-linux-gnu-gcc riscv64-linux-gnu-gcc
cross-gcc: cross-gcc:
@echo ' Re-building by cross-compiler for: $(CROSS_LIST_NOQEMU) $(CROSS_LIST)' @echo ' Re-building by cross-compiler for: $(CROSS_LIST_NOQEMU) $(CROSS_LIST)'
@ -841,7 +859,7 @@ cross-qemu:
$(QUIET)for CC in $(CROSS_LIST); do \ $(QUIET)for CC in $(CROSS_LIST); do \
echo "===================== $$CC + qemu"; \ echo "===================== $$CC + qemu"; \
$(MAKE) IOARENA=false CXXSTD= clean && \ $(MAKE) IOARENA=false CXXSTD= clean && \
CC=$$CC CXX=$$(echo $$CC | $(SED) 's/-gcc/-g++/') EXE_LDFLAGS=-static MDBX_BUILD_OPTIONS="-DMDBX_SAFE4QEMU $(MDBX_BUILD_OPTIONS)" \ CC=$$CC CXX=$$(echo $$CC | $(SED) 's/-gcc/-g++/') EXE_LDFLAGS=-static MDBX_BUILD_OPTIONS="-DMDBX_LOCKING=5 -DMDBX_SAFE4QEMU $(MDBX_BUILD_OPTIONS)" \
$(MAKE) IOARENA=false smoke-singleprocess test-singleprocess || exit $$?; \ $(MAKE) IOARENA=false smoke-singleprocess test-singleprocess || exit $$?; \
done done

View File

@ -1,20 +1,7 @@
<!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences --> <!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences -->
> Please refer to the online [documentation](https://libmdbx.dqdkfa.ru)
> with [`C` API description](https://libmdbx.dqdkfa.ru/group__c__api.html)
> and pay attention to the [`C++` API](https://gitflic.ru/project/erthink/libmdbx/blob?file=mdbx.h%2B%2B#line-num-1).
> Questions, feedback and suggestions are welcome to the [Telegram' group](https://t.me/libmdbx) (archive [1](https://libmdbx.dqdkfa.ru/tg-archive/messages1.html),
> [2](https://libmdbx.dqdkfa.ru/tg-archive/messages2.html), [3](https://libmdbx.dqdkfa.ru/tg-archive/messages3.html), [4](https://libmdbx.dqdkfa.ru/tg-archive/messages4.html),
> [5](https://libmdbx.dqdkfa.ru/tg-archive/messages5.html), [6](https://libmdbx.dqdkfa.ru/tg-archive/messages6.html), [7](https://libmdbx.dqdkfa.ru/tg-archive/messages7.html)).
> See the [ChangeLog](https://gitflic.ru/project/erthink/libmdbx/blob?file=ChangeLog.md) for `NEWS` and latest updates.
> Donations are welcome to the Ethereum/ERC-20 `0xD104d8f8B2dC312aaD74899F83EBf3EEBDC1EA3A`.
> Всё будет хорошо!
libmdbx libmdbx
======== =======
<!-- section-begin overview --> <!-- section-begin overview -->
@ -39,32 +26,44 @@ tree](https://en.wikipedia.org/wiki/B%2B_tree).
[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging), but that might [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging), but that might
be a caveat for write-intensive workloads with durability requirements. be a caveat for write-intensive workloads with durability requirements.
4. **Compact and friendly for fully embedding**. Only ≈25KLOC of `C11`, 4. Enforces [serializability](https://en.wikipedia.org/wiki/Serializability) for
≈64K x86 binary code of core, no internal threads neither server process(es),
but implements a simplified variant of the [Berkeley
DB](https://en.wikipedia.org/wiki/Berkeley_DB) and
[dbm](https://en.wikipedia.org/wiki/DBM_(computing)) API.
5. Enforces [serializability](https://en.wikipedia.org/wiki/Serializability) for
writers just by single writers just by single
[mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords
[wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom)
for parallel readers without atomic/interlocked operations, while for parallel readers without atomic/interlocked operations, while
**writing and reading transactions do not block each other**. **writing and reading transactions do not block each other**.
6. **Guarantee data integrity** after crash unless this was explicitly 5. **Guarantees data integrity** after a crash unless this was explicitly
neglected in favour of write performance. neglected in favour of write performance.
7. Supports Linux, Windows, MacOS, Android, iOS, FreeBSD, DragonFly, Solaris, 6. Supports Linux, Windows, MacOS, Android, iOS, FreeBSD, DragonFly, Solaris,
OpenSolaris, OpenIndiana, NetBSD, OpenBSD and other systems compliant with OpenSolaris, OpenIndiana, NetBSD, OpenBSD and other systems compliant with
**POSIX.1-2008**. **POSIX.1-2008**.
7. **Compact and friendly for fully embedding**. Only ≈25KLOC of `C11`,
≈64K x86 binary code of core, no internal threads nor server process(es),
but implements a simplified variant of the [Berkeley
DB](https://en.wikipedia.org/wiki/Berkeley_DB) and
[dbm](https://en.wikipedia.org/wiki/DBM_(computing)) API.
<!-- section-end --> <!-- section-end -->
Historically, _libmdbx_ is a deeply revised and extended descendant of the amazing Historically, _libmdbx_ is a deeply revised and extended descendant of the legendary
[Lightning Memory-Mapped Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). [Lightning Memory-Mapped Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database).
_libmdbx_ inherits all benefits from _LMDB_, but resolves some issues and adds [a set of improvements](#improvements-beyond-lmdb). _libmdbx_ inherits all benefits from _LMDB_, but resolves some issues and adds [a set of improvements](#improvements-beyond-lmdb).
[![Telegram: Support | Discussions | News](https://img.shields.io/endpoint?color=scarlet&logo=telegram&label=Support%20%7C%20Discussions%20%7C%20News&url=https%3A%2F%2Ftg.sumanjay.workers.dev%2Flibmdbx)](https://t.me/libmdbx)
> Please refer to the online [official libmdbx documentation site](https://libmdbx.dqdkfa.ru)
> with [`C` API description](https://libmdbx.dqdkfa.ru/group__c__api.html)
> and pay attention to the [`C++` API](https://gitflic.ru/project/erthink/libmdbx/blob?file=mdbx.h%2B%2B#line-num-1).
> Donations are welcome to the Ethereum/ERC-20 `0xD104d8f8B2dC312aaD74899F83EBf3EEBDC1EA3A`.
> Всё будет хорошо!
Telegram Group archive: [1](https://libmdbx.dqdkfa.ru/tg-archive/messages1.html),
[2](https://libmdbx.dqdkfa.ru/tg-archive/messages2.html), [3](https://libmdbx.dqdkfa.ru/tg-archive/messages3.html), [4](https://libmdbx.dqdkfa.ru/tg-archive/messages4.html),
[5](https://libmdbx.dqdkfa.ru/tg-archive/messages5.html), [6](https://libmdbx.dqdkfa.ru/tg-archive/messages6.html), [7](https://libmdbx.dqdkfa.ru/tg-archive/messages7.html).
## Github ## Github
### на Русском (мой родной язык) ### на Русском (мой родной язык)
@ -126,8 +125,7 @@ of the database. All fundamental architectural problems of libmdbx/LMDB
have been solved there, but now the active development has been have been solved there, but now the active development has been
suspended for top-three reasons: suspended for top-three reasons:
1. For now _libmdbx_ «mostly» enough for all [our products](https://www.ptsecurity.com/ww-en/products/), 1. For now _libmdbx_ is mostly enough, and I'm busy with scalability.
and Im busy in development of replication for scalability.
2. Waiting for fresh [Elbrus CPU](https://wiki.elbrus.ru/) of [e2k architecture](https://en.wikipedia.org/wiki/Elbrus_2000), 2. Waiting for fresh [Elbrus CPU](https://wiki.elbrus.ru/) of [e2k architecture](https://en.wikipedia.org/wiki/Elbrus_2000),
especially with hardware acceleration of [Streebog](https://en.wikipedia.org/wiki/Streebog) and especially with hardware acceleration of [Streebog](https://en.wikipedia.org/wiki/Streebog) and
[Kuznyechik](https://en.wikipedia.org/wiki/Kuznyechik), which are required for Merkle tree, etc. [Kuznyechik](https://en.wikipedia.org/wiki/Kuznyechik), which are required for Merkle tree, etc.
@ -459,6 +457,12 @@ Currently, libmdbx is only available in a
Package support for common Linux distributions is planned in the future, Package support for common Linux distributions is planned in the future,
starting from the release of version 1.0. starting from the release of version 1.0.
The source code is available on
[Gitflic](https://gitflic.ru/project/erthink/libmdbx) and mirrors on [abf.io](https://abf.io/erthink/libmdbx),
[hub.mos.ru](https://hub.mos.ru/leo/libmdbx) and [Github](https://github.com/erthink/libmdbx).
Please use the `stable` branch or the latest release for production environments (via staging),
and the `master` branch for developing derivative projects.
## Source code embedding ## Source code embedding
_libmdbx_ provides three official ways for integration in source code form: _libmdbx_ provides three official ways for integration in source code form:
@ -556,9 +560,9 @@ Of course, in addition to this, your toolchain must ensure the reproducibility o
For more information please refer to [reproducible-builds.org](https://reproducible-builds.org/). For more information please refer to [reproducible-builds.org](https://reproducible-builds.org/).
#### Containers #### Containers
There are no special traits nor quirks if you use libmdbx ONLY inside the single container. There are no special traits nor quirks if you use _libmdbx_ ONLY inside
But in a cross-container cases or with a host-container(s) mix the two major things MUST be a single container. But in cross-container or host-container(s)
guaranteed: interoperability cases the three major things MUST be guaranteed:
1. Coherence of memory mapping content and unified page cache inside OS 1. Coherence of memory mapping content and unified page cache inside OS
kernel for host and all container(s) operated with a DB. Basically this kernel for host and all container(s) operated with a DB. Basically this
@ -574,6 +578,12 @@ in the system memory.
including `ERROR_ACCESS_DENIED`, including `ERROR_ACCESS_DENIED`,
but not the `ERROR_INVALID_PARAMETER` as for an invalid/non-existent PID. but not the `ERROR_INVALID_PARAMETER` as for an invalid/non-existent PID.
3. The versions/builds of _libmdbx_ and `libc`/`pthreads` (`glibc`, `musl`, etc.) must be compatible, as sketched below.
- Basically, the `options:` string in the output of `mdbx_chk -V` must be the same for the host and the container(s).
See `MDBX_LOCKING`, `MDBX_USE_OFDLOCKS` and other build options for details.
- Avoid using different versions of `libc`, and especially mixing different implementations, e.g. `glibc` with `musl`.
Prefer to use the same LTS version, or switch to full virtualization/isolation if in doubt.
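A minimal runtime cross-check for requirement 3 above is to have both the host-side and the container-side binaries print their build options and compare the strings out-of-band. The sketch below assumes the `mdbx_build` global (with `target`, `options` and `compiler` fields) declared in `mdbx.h`; verify the field names against the header actually shipped with your build.

```c
#include "mdbx.h"
#include <stdio.h>

int main(void) {
  /* The same information is reported by `mdbx_chk -V`; the "options:" line
   * must match between the host and every container sharing the DB. */
  printf("target:   %s\n", mdbx_build.target);
  printf("options:  %s\n", mdbx_build.options);
  printf("compiler: %s\n", mdbx_build.compiler);
  return 0;
}
```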
#### DSO/DLL unloading and destructors of Thread-Local-Storage objects #### DSO/DLL unloading and destructors of Thread-Local-Storage objects
When building _libmdbx_ as a shared library or use static _libmdbx_ as a When building _libmdbx_ as a shared library or use static _libmdbx_ as a
part of another dynamic library, it is advisable to make sure that your part of another dynamic library, it is advisable to make sure that your

TODO.md (21 changed lines)
View File

@ -1,16 +1,15 @@
TODO TODO
---- ----
Unfortunately, on 2022-04-15 the Github administration, without any - [SWIG](https://www.swig.org/).
warning nor explanation, deleted _libmdbx_ along with a lot of other - Parallel LTO build with elimination of warnings.
projects, simultaneously blocking access for many developers. Therefore - Integration with DTrace and its analogues.
on 2022-04-21 we have migrated to a reliable trusted infrastructure. - A new error-handling style that records a "trace" and the causes.
The origin for now is at[GitFlic](https://gitflic.ru/project/erthink/libmdbx) - Generation of debugging information via gdb.
with backup at [ABF by ROSA Лаб](https://abf.rosalinux.ru/erthink/libmdbx). - WASM support.
For the same reason ~~Github~~ is blacklisted forever. - Explicit and automatic compaction/defragmentation.
- Nonlinear GC processing.
So currently most of the links are broken due to noted malicious ~~Github~~ sabotage. - Switch cursors to a doubly-linked list instead of a singly-linked one.
- Inside `txn_renew()`, move the mmap coherence check to beyond/after the resize. - Inside `txn_renew()`, move the mmap coherence check to beyond/after the resize.
- [Migration guide from LMDB to MDBX](https://libmdbx.dqdkfa.ru/dead-github/issues/199). - [Migration guide from LMDB to MDBX](https://libmdbx.dqdkfa.ru/dead-github/issues/199).
- [Support for RAW devices](https://libmdbx.dqdkfa.ru/dead-github/issues/124). - [Support for RAW devices](https://libmdbx.dqdkfa.ru/dead-github/issues/124).
@ -20,6 +19,8 @@ So currently most of the links are broken due to noted malicious ~~Github~~ sabo
Done Done
---- ----
- Early (non-deferred) GC cleanup.
- Refactoring of gc-get/gc-put with a transition to "interval" lists.
- [Engage new terminology](https://libmdbx.dqdkfa.ru/dead-github/issues/137). - [Engage new terminology](https://libmdbx.dqdkfa.ru/dead-github/issues/137).
- [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200). - [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200).
- [Move most of `mdbx_chk` functional to the library API](https://libmdbx.dqdkfa.ru/dead-github/issues/204). - [Move most of `mdbx_chk` functional to the library API](https://libmdbx.dqdkfa.ru/dead-github/issues/204).

File diff suppressed because it is too large

View File

cleans readers, as an aborting process (especially with a core dump) can cleans readers, as an aborting process (especially with a core dump) can
take a long time, and checking readers cannot be performed too often due take a long time, and checking readers cannot be performed too often due
to performance degradation. to performance degradation.
This issue will be addressed in MithrlDB and one of libmdbx releases, This issue will be addressed in MithrilDB and one of libmdbx releases,
presumably in 2025. To do this, nonlinear GC recycling will be presumably in 2025. To do this, nonlinear GC recycling will be
implemented, without stopping garbage recycling on the old MVCC snapshot implemented, without stopping garbage recycling on the old MVCC snapshot
used by a long read transaction. used by a long read transaction.
@ -92,7 +92,7 @@ free consecutive/adjacent pages through GC has been significantly
speeded, including acceleration using NEON/SSE2/AVX2/AVX512 speeded, including acceleration using NEON/SSE2/AVX2/AVX512
instructions. instructions.
This issue will be addressed in MithrlDB and refined within one of This issue will be addressed in MithrilDB and refined within one of
0.15.x libmdbx releases, presumably at end of 2025. 0.15.x libmdbx releases, presumably at end of 2025.

View File

@ -1,8 +1,12 @@
The source code is availale on [Gitflic](https://gitflic.ru/project/erthink/libmdbx). The source code is available on [Gitflic](https://gitflic.ru/project/erthink/libmdbx) and mirrors on [abf.io](https://abf.io/erthink/libmdbx), [hub.mos.ru](https://hub.mos.ru/leo/libmdbx) and [Github](https://github.com/erthink/libmdbx).
Donations are welcome to ETH `0xD104d8f8B2dC312aaD74899F83EBf3EEBDC1EA3A`. Donations are welcome to ETH `0xD104d8f8B2dC312aaD74899F83EBf3EEBDC1EA3A`.
Please use the `stable` branch or the latest release for production environments (via staging), but the `master` branch for developing derivative projects.
Всё будет хорошо! Всё будет хорошо!
> Questions, feedback and suggestions are welcome to the [Telegram group](https://t.me/libmdbx). > Questions, feedback and suggestions are welcome to the [Telegram group](https://t.me/libmdbx) (archive [1](https://libmdbx.dqdkfa.ru/tg-archive/messages1.html),
> [2](https://libmdbx.dqdkfa.ru/tg-archive/messages2.html), [3](https://libmdbx.dqdkfa.ru/tg-archive/messages3.html), [4](https://libmdbx.dqdkfa.ru/tg-archive/messages4.html),
> [5](https://libmdbx.dqdkfa.ru/tg-archive/messages5.html), [6](https://libmdbx.dqdkfa.ru/tg-archive/messages6.html), [7](https://libmdbx.dqdkfa.ru/tg-archive/messages7.html)).
> See the [ChangeLog](https://gitflic.ru/project/erthink/libmdbx/blob?file=ChangeLog.md) for `NEWS` and latest updates.
\section toc Table of Contents \section toc Table of Contents

docs/favicon.ico: new binary file (4.2 KiB), not shown.

View File

@ -1,10 +1,17 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$langISO"> <html xmlns="http://www.w3.org/1999/xhtml" lang="$langISO">
<head> <head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11"/> <meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen $doxygenversion"/> <meta name="generator" content="Doxygen $doxygenversion"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/> <meta name="viewport" content="width=device-width, initial-scale=1"/>
<link rel="icon" href="favicon.ico">
<link rel="icon" href="img/bear.png" type="image/png">
<link rel="apple-touch-icon" href="img/bear.png">
<meta property="og:type" content="article"/>
<meta property="og:url" content="https://libmdbx.dqdkfa.ru/"/>
<meta name="twitter:title" content="One of the fastest embeddable key-value engine"/>
<meta name="twitter:description" content="MDBX surpasses the legendary LMDB in terms of reliability, features and performance. For now libmdbx is chosen by all modern Ethereum frontiers as a storage engine."/>
<!--BEGIN PROJECT_NAME--><title>$projectname: $title</title><!--END PROJECT_NAME--> <!--BEGIN PROJECT_NAME--><title>$projectname: $title</title><!--END PROJECT_NAME-->
<!--BEGIN !PROJECT_NAME--><title>$title</title><!--END !PROJECT_NAME--> <!--BEGIN !PROJECT_NAME--><title>$title</title><!--END !PROJECT_NAME-->
<!--BEGIN PROJECT_ICON--> <!--BEGIN PROJECT_ICON-->

docs/ld+json: new file (27 lines)
View File

@ -0,0 +1,27 @@
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "ItemList",
"itemListElement": [{
"@type": "ListItem",
"position": 1,
"name": "Группа в Telegram",
"url": "https://t.me/libmdbx"
},{
"@type": "ListItem",
"position": 2,
"name": "Исходный код",
"url": "https://gitflic.ru/project/erthink/libmdbx"
},{
"@type": "ListItem",
"position": 3,
"name": "C++ API",
"url": "https://libmdbx.dqdkfa.ru/group__cxx__api.html"
},{
"@type": "ListItem",
"position": 4,
"name": "Mirror on Github",
"url": "https://github.com/erthink/libmdbx"
}]
}
</script>

View File

@ -0,0 +1,6 @@
{
"icons": [
{ "src": "favicon.ico", "type": "image/ico", "sizes": "32x32" },
{ "src": "img/bear.png", "type": "image/png", "sizes": "256x256" }
]
}

docs/title: new file (2 lines)
View File

@ -0,0 +1,2 @@
<title>libmdbx: One of the fastest embeddable key-value engines</title>
<meta name="description" content="libmdbx surpasses the legendary LMDB in terms of reliability, features and performance. For now libmdbx is chosen by all modern Ethereum frontiers as a storage engine.">

mdbx.h (36 changed lines)
View File

@ -581,9 +581,10 @@ typedef mode_t mdbx_mode_t;
extern "C" { extern "C" {
#endif #endif
/* MDBX version 0.13.x */ /* MDBX version 0.14.x, which is still unstable/under development. */
#define MDBX_VERSION_UNSTABLE
#define MDBX_VERSION_MAJOR 0 #define MDBX_VERSION_MAJOR 0
#define MDBX_VERSION_MINOR 13 #define MDBX_VERSION_MINOR 14
#ifndef LIBMDBX_API #ifndef LIBMDBX_API
#if defined(LIBMDBX_EXPORTS) || defined(DOXYGEN) #if defined(LIBMDBX_EXPORTS) || defined(DOXYGEN)
@ -1665,7 +1666,7 @@ DEFINE_ENUM_FLAG_OPERATORS(MDBX_put_flags)
/** \brief Environment copy flags /** \brief Environment copy flags
* \ingroup c_extra * \ingroup c_extra
* \see mdbx_env_copy() \see mdbx_env_copy2fd() */ * \see mdbx_env_copy() \see mdbx_env_copy2fd() \see mdbx_txn_copy2pathname() */
typedef enum MDBX_copy_flags { typedef enum MDBX_copy_flags {
MDBX_CP_DEFAULTS = 0, MDBX_CP_DEFAULTS = 0,
@ -1690,7 +1691,11 @@ typedef enum MDBX_copy_flags {
/** Enable renew/restart read transaction in case it uses an outdated /** Enable renew/restart read transaction in case it uses an outdated
* MVCC snapshot, otherwise the \ref MDBX_MVCC_RETARDED will be returned * MVCC snapshot, otherwise the \ref MDBX_MVCC_RETARDED will be returned
* \see mdbx_txn_copy2fd() \see mdbx_txn_copy2pathname() */ * \see mdbx_txn_copy2fd() \see mdbx_txn_copy2pathname() */
MDBX_CP_RENEW_TXN = 32u MDBX_CP_RENEW_TXN = 32u,
/** Silently overwrite the target file, if it exists, instead of returning an error
* \see mdbx_txn_copy2pathname() \see mdbx_env_copy() */
MDBX_CP_OVERWRITE = 64u
} MDBX_copy_flags_t; } MDBX_copy_flags_t;
DEFINE_ENUM_FLAG_OPERATORS(MDBX_copy_flags) DEFINE_ENUM_FLAG_OPERATORS(MDBX_copy_flags)
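A hedged usage sketch for the new `MDBX_CP_OVERWRITE` flag added above: a compacting backup that silently replaces an existing target file. It assumes an already-opened `MDBX_env` and the long-standing `mdbx_env_copy()` entry point; error handling is reduced to returning the status code.

```c
#include "mdbx.h"

/* Make a compacting copy of the environment at dest_path, overwriting the
 * target if it already exists; without MDBX_CP_OVERWRITE an existing file
 * is expected to be rejected with an error. */
int backup_db(MDBX_env *env, const char *dest_path) {
  return mdbx_env_copy(env, dest_path, MDBX_CP_COMPACT | MDBX_CP_OVERWRITE);
}
```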
@ -1992,7 +1997,7 @@ typedef enum MDBX_error {
MDBX_EREMOTE = ERROR_REMOTE_STORAGE_MEDIA_ERROR, MDBX_EREMOTE = ERROR_REMOTE_STORAGE_MEDIA_ERROR,
MDBX_EDEADLK = ERROR_POSSIBLE_DEADLOCK MDBX_EDEADLK = ERROR_POSSIBLE_DEADLOCK
#else /* Windows */ #else /* Windows */
#ifdef ENODATA #if defined(ENODATA) || defined(DOXYGEN)
MDBX_ENODATA = ENODATA, MDBX_ENODATA = ENODATA,
#else #else
MDBX_ENODATA = 9919 /* for compatibility with LLVM's C++ libraries/headers */, MDBX_ENODATA = 9919 /* for compatibility with LLVM's C++ libraries/headers */,
@ -2001,7 +2006,11 @@ typedef enum MDBX_error {
MDBX_EACCESS = EACCES, MDBX_EACCESS = EACCES,
MDBX_ENOMEM = ENOMEM, MDBX_ENOMEM = ENOMEM,
MDBX_EROFS = EROFS, MDBX_EROFS = EROFS,
#if defined(ENOTSUP) || defined(DOXYGEN)
MDBX_ENOSYS = ENOTSUP,
#else
MDBX_ENOSYS = ENOSYS, MDBX_ENOSYS = ENOSYS,
#endif /* ENOTSUP */
MDBX_EIO = EIO, MDBX_EIO = EIO,
MDBX_EPERM = EPERM, MDBX_EPERM = EPERM,
MDBX_EINTR = EINTR, MDBX_EINTR = EINTR,
@ -2774,10 +2783,10 @@ typedef struct MDBX_stat MDBX_stat;
* Legacy mdbx_env_stat() corresponds to calling \ref mdbx_env_stat_ex() with the * Legacy mdbx_env_stat() corresponds to calling \ref mdbx_env_stat_ex() with the
* null `txn` argument. * null `txn` argument.
* *
* \param [in] env An environment handle returned by \ref mdbx_env_create() * \param [in] env An environment handle returned by \ref mdbx_env_create().
* \param [in] txn A transaction handle returned by \ref mdbx_txn_begin() * \param [in] txn A transaction handle returned by \ref mdbx_txn_begin().
* \param [out] stat The address of an \ref MDBX_stat structure where * \param [out] stat The address of an \ref MDBX_stat structure where
* the statistics will be copied * the statistics will be copied.
* \param [in] bytes The size of \ref MDBX_stat. * \param [in] bytes The size of \ref MDBX_stat.
* *
* \returns A non-zero error value on failure and 0 on success. */ * \returns A non-zero error value on failure and 0 on success. */
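A minimal sketch of the call documented above, reporting statistics for the MVCC snapshot seen by a given read transaction. The `MDBX_stat` field names (`ms_depth`, `ms_branch_pages`, ...) are assumed from the structure declared elsewhere in mdbx.h; double-check them against your header revision.

```c
#include "mdbx.h"
#include <inttypes.h>
#include <stdio.h>

/* Print B-tree statistics for the snapshot seen by `txn`; passing
 * sizeof(stat) lets the library detect an ABI/structure mismatch. */
static int report_stat(const MDBX_env *env, const MDBX_txn *txn) {
  MDBX_stat stat;
  const int err = mdbx_env_stat_ex(env, txn, &stat, sizeof(stat));
  if (err != MDBX_SUCCESS)
    return err;
  printf("depth %u, branch %" PRIu64 ", leaf %" PRIu64 ", entries %" PRIu64 "\n",
         stat.ms_depth, stat.ms_branch_pages, stat.ms_leaf_pages, stat.ms_entries);
  return MDBX_SUCCESS;
}
```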
@ -4196,7 +4205,10 @@ LIBMDBX_API int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency);
* \returns A non-zero error value on failure and 0 on success, * \returns A non-zero error value on failure and 0 on success,
* some possible errors are: * some possible errors are:
* \retval MDBX_RESULT_TRUE Transaction was aborted since it should * \retval MDBX_RESULT_TRUE Transaction was aborted since it should
* be aborted due to previous errors. * be aborted due to previous errors,
* or no changes were made during the transaction
* and the build-time option
* \ref MDBX_NOSUCCESS_PURE_COMMIT was enabled.
* \retval MDBX_PANIC A fatal error occurred earlier * \retval MDBX_PANIC A fatal error occurred earlier
* and the environment must be shut down. * and the environment must be shut down.
* \retval MDBX_BAD_TXN Transaction is already finished or never began. * \retval MDBX_BAD_TXN Transaction is already finished or never began.
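A short sketch of handling the widened `MDBX_RESULT_TRUE` case documented above; it assumes a write transaction and a library built with the `MDBX_NOSUCCESS_PURE_COMMIT` option mentioned in the doc comment.

```c
#include "mdbx.h"
#include <stdio.h>

/* Commit a write transaction and distinguish the "soft" MDBX_RESULT_TRUE
 * outcome from real failures such as MDBX_PANIC or MDBX_BAD_TXN. */
static int commit_verbose(MDBX_txn *txn) {
  MDBX_commit_latency latency;
  const int err = mdbx_txn_commit_ex(txn, &latency);
  if (err == MDBX_RESULT_TRUE) {
    /* Either the txn had to be aborted due to previous errors, or it was an
     * empty ("pure") commit under MDBX_NOSUCCESS_PURE_COMMIT; either way the
     * transaction is finished and must not be used any further. */
    fprintf(stderr, "commit: nothing was written\n");
    return MDBX_SUCCESS;
  }
  return err;
}
```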
@ -6538,6 +6550,12 @@ typedef struct MDBX_chk_table {
struct MDBX_chk_histogram key_len; struct MDBX_chk_histogram key_len;
/// Values length histogram /// Values length histogram
struct MDBX_chk_histogram val_len; struct MDBX_chk_histogram val_len;
/// Histogram of the number of multi-values (aka duplicates)
struct MDBX_chk_histogram multival;
/// Histogram of branch and leaf page filling, in percent
struct MDBX_chk_histogram tree_filling;
/// Histogram of nested tree(s) branch and leaf page filling, in percent
struct MDBX_chk_histogram nested_tree_filling;
} histogram; } histogram;
} MDBX_chk_table_t; } MDBX_chk_table_t;

View File

@ -1766,8 +1766,8 @@ private:
silo() noexcept : allocator_type() { init(0); } silo() noexcept : allocator_type() { init(0); }
MDBX_CXX20_CONSTEXPR MDBX_CXX20_CONSTEXPR
silo(const allocator_type &alloc) noexcept : allocator_type(alloc) { init(0); } silo(const allocator_type &alloc) noexcept : allocator_type(alloc) { init(0); }
MDBX_CXX20_CONSTEXPR silo(size_t capacity) { init(capacity); } MDBX_CXX20_CONSTEXPR silo(size_t capacity) : allocator_type() { init(capacity); }
MDBX_CXX20_CONSTEXPR silo(size_t capacity, const allocator_type &alloc) : silo(alloc) { init(capacity); } MDBX_CXX20_CONSTEXPR silo(size_t capacity, const allocator_type &alloc) : allocator_type(alloc) { init(capacity); }
MDBX_CXX20_CONSTEXPR silo(silo &&ditto) noexcept(::std::is_nothrow_move_constructible<allocator_type>::value) MDBX_CXX20_CONSTEXPR silo(silo &&ditto) noexcept(::std::is_nothrow_move_constructible<allocator_type>::value)
: allocator_type(::std::move(ditto.get_allocator())), bin_(::std::move(ditto.bin_)) {} : allocator_type(::std::move(ditto.get_allocator())), bin_(::std::move(ditto.bin_)) {}
@ -1778,7 +1778,6 @@ private:
put(headroom, ptr, length); put(headroom, ptr, length);
} }
// select_on_container_copy_construction()
MDBX_CXX20_CONSTEXPR silo(size_t capacity, size_t headroom, const void *ptr, size_t length, MDBX_CXX20_CONSTEXPR silo(size_t capacity, size_t headroom, const void *ptr, size_t length,
const allocator_type &alloc) const allocator_type &alloc)
: silo(capacity, alloc) { : silo(capacity, alloc) {

View File

@ -1,7 +1,7 @@
From 349c08cf21b66ecea851340133a1b845c25675f7 Mon Sep 17 00:00:00 2001 From f2f1f6e76c1538d044b552d9e7ecedc3433e6cd9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=9B=D0=B5=D0=BE=D0=BD=D0=B8=D0=B4=20=D0=AE=D1=80=D1=8C?= From: =?UTF-8?q?=D0=9B=D0=B5=D0=BE=D0=BD=D0=B8=D0=B4=20=D0=AE=D1=80=D1=8C?=
=?UTF-8?q?=D0=B5=D0=B2=20=28Leonid=20Yuriev=29?= <leo@yuriev.ru> =?UTF-8?q?=D0=B5=D0=B2=20=28Leonid=20Yuriev=29?= <leo@yuriev.ru>
Date: Tue, 22 Apr 2025 14:38:49 +0300 Date: Sun, 3 Aug 2025 23:59:11 +0300
Subject: [PATCH] package/libmdbx: new package (library/database). Subject: [PATCH] package/libmdbx: new package (library/database).
MIME-Version: 1.0 MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8 Content-Type: text/plain; charset=UTF-8
@ -15,7 +15,7 @@ This patch adds libmdbx:
in terms of reliability, features and performance. in terms of reliability, features and performance.
- more information at https://libmdbx.dqdkfa.ru - more information at https://libmdbx.dqdkfa.ru
The 0.13.6 "Бузина" (Elderberry) is stable release of _libmdbx_ branch with new superior features. The 0.13.7 "Дружба" (Friendship) is stable release of _libmdbx_ branch with new superior features.
The complete ChangeLog: https://gitflic.ru/project/erthink/libmdbx/blob?file=ChangeLog.md The complete ChangeLog: https://gitflic.ru/project/erthink/libmdbx/blob?file=ChangeLog.md
@ -25,8 +25,8 @@ Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
package/Config.in | 1 + package/Config.in | 1 +
package/libmdbx/Config.in | 45 ++++++++++++++++++++++++++++++++++++ package/libmdbx/Config.in | 45 ++++++++++++++++++++++++++++++++++++
package/libmdbx/libmdbx.hash | 6 +++++ package/libmdbx/libmdbx.hash | 6 +++++
package/libmdbx/libmdbx.mk | 42 +++++++++++++++++++++++++++++++++ package/libmdbx/libmdbx.mk | 41 ++++++++++++++++++++++++++++++++
5 files changed, 97 insertions(+) 5 files changed, 96 insertions(+)
create mode 100644 package/libmdbx/Config.in create mode 100644 package/libmdbx/Config.in
create mode 100644 package/libmdbx/libmdbx.hash create mode 100644 package/libmdbx/libmdbx.hash
create mode 100644 package/libmdbx/libmdbx.mk create mode 100644 package/libmdbx/libmdbx.mk
@ -110,35 +110,34 @@ index 0000000000..a9a4ac45c5
+ !BR2_TOOLCHAIN_GCC_AT_LEAST_4_4 + !BR2_TOOLCHAIN_GCC_AT_LEAST_4_4
diff --git a/package/libmdbx/libmdbx.hash b/package/libmdbx/libmdbx.hash diff --git a/package/libmdbx/libmdbx.hash b/package/libmdbx/libmdbx.hash
new file mode 100644 new file mode 100644
index 0000000000..ae5266716b index 0000000000..8c7efb184b
--- /dev/null --- /dev/null
+++ b/package/libmdbx/libmdbx.hash +++ b/package/libmdbx/libmdbx.hash
@@ -0,0 +1,6 @@ @@ -0,0 +1,6 @@
+# Hashes from: https://libmdbx.dqdkfa.ru/release/SHA256SUMS +# Hashes from: https://libmdbx.dqdkfa.ru/release/SHA256SUMS
+sha256 57db987de6f7ccc66a66ae28a7bda9f9fbb48ac5fb9279bcca92fd5de13075d1 libmdbx-amalgamated-0.13.6.tar.xz +sha256 d00c1287ec6bbc366363ccdd3eea97bd470ccb5cc102d56b341f84a9fba7e8e9 libmdbx-amalgamated-0.13.7.tar.xz
+ +
+# Locally calculated +# Locally calculated
+sha256 0d542e0c8804e39aa7f37eb00da5a762149dc682d7829451287e11b938e94594 LICENSE +sha256 0d542e0c8804e39aa7f37eb00da5a762149dc682d7829451287e11b938e94594 LICENSE
+sha256 651f71b46c6bb0046d2122df7f9def9cb24f4dc28c5b11cef059f66565cda30f NOTICE +sha256 651f71b46c6bb0046d2122df7f9def9cb24f4dc28c5b11cef059f66565cda30f NOTICE
diff --git a/package/libmdbx/libmdbx.mk b/package/libmdbx/libmdbx.mk diff --git a/package/libmdbx/libmdbx.mk b/package/libmdbx/libmdbx.mk
new file mode 100644 new file mode 100644
index 0000000000..571757262e index 0000000000..bbb37f21a6
--- /dev/null --- /dev/null
+++ b/package/libmdbx/libmdbx.mk +++ b/package/libmdbx/libmdbx.mk
@@ -0,0 +1,42 @@ @@ -0,0 +1,41 @@
+################################################################################ +################################################################################
+# +#
+# libmdbx +# libmdbx
+# +#
+################################################################################ +################################################################################
+ +
+LIBMDBX_VERSION = 0.13.6 +LIBMDBX_VERSION = 0.13.7
+LIBMDBX_SOURCE = libmdbx-amalgamated-$(LIBMDBX_VERSION).tar.xz +LIBMDBX_SOURCE = libmdbx-amalgamated-$(LIBMDBX_VERSION).tar.xz
+LIBMDBX_SITE = https://libmdbx.dqdkfa.ru/release +LIBMDBX_SITE = https://libmdbx.dqdkfa.ru/release
+LIBMDBX_SUPPORTS_IN_SOURCE_BUILD = NO +LIBMDBX_SUPPORTS_IN_SOURCE_BUILD = NO
+LIBMDBX_LICENSE = Apache-2.0 +LIBMDBX_LICENSE = Apache-2.0
+LIBMDBX_LICENSE_FILES = LICENSE NOTICE +LIBMDBX_LICENSE_FILES = LICENSE NOTICE
+LIBMDBX_REDISTRIBUTE = YES
+LIBMDBX_STRIP_COMPONENTS = 0 +LIBMDBX_STRIP_COMPONENTS = 0
+LIBMDBX_INSTALL_STAGING = YES +LIBMDBX_INSTALL_STAGING = YES
+ +
@ -169,5 +168,5 @@ index 0000000000..571757262e
+ +
+$(eval $(cmake-package)) +$(eval $(cmake-package))
-- --
2.49.0 2.50.1

View File

@ -41,12 +41,16 @@
#include "page-ops.c" #include "page-ops.c"
#include "pnl.c" #include "pnl.c"
#include "refund.c" #include "refund.c"
#include "rkl.c"
#include "spill.c" #include "spill.c"
#include "table.c" #include "table.c"
#include "tls.c" #include "tls.c"
#include "tree-ops.c" #include "tree-ops.c"
#include "tree-search.c" #include "tree-search.c"
#include "txl.c" #include "txl.c"
#include "txn-basal.c"
#include "txn-nested.c"
#include "txn-ro.c"
#include "txn.c" #include "txn.c"
#include "utils.c" #include "utils.c"
#include "version.c" #include "version.c"

View File

@ -394,7 +394,7 @@ __cold static int copy_with_compacting(MDBX_env *env, MDBX_txn *txn, mdbx_fileha
ERROR("%s/%d: %s", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid GC-record content"); ERROR("%s/%d: %s", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid GC-record content");
return MDBX_CORRUPTED; return MDBX_CORRUPTED;
} }
gc_npages += MDBX_PNL_GETSIZE(pnl); gc_npages += pnl_size(pnl);
rc = outer_next(&couple.outer, &key, &data, MDBX_NEXT); rc = outer_next(&couple.outer, &key, &data, MDBX_NEXT);
} }
if (unlikely(rc != MDBX_NOTFOUND)) if (unlikely(rc != MDBX_NOTFOUND))
@ -603,7 +603,7 @@ retry_snap_meta:
continue; continue;
} }
rc = MDBX_ENODATA; rc = MDBX_ENODATA;
if (written == 0 || ignore_enosys(rc = errno) != MDBX_RESULT_TRUE) if (written == 0 || ignore_enosys_and_eagain(rc = errno) != MDBX_RESULT_TRUE)
break; break;
sendfile_unavailable = true; sendfile_unavailable = true;
} }
@ -627,7 +627,7 @@ retry_snap_meta:
maybe useful for others FS */ maybe useful for others FS */
EINVAL) EINVAL)
not_the_same_filesystem = true; not_the_same_filesystem = true;
else if (ignore_enosys(rc) == MDBX_RESULT_TRUE) else if (ignore_enosys_and_eagain(rc) == MDBX_RESULT_TRUE)
copyfilerange_unavailable = true; copyfilerange_unavailable = true;
else else
break; break;
@ -748,42 +748,75 @@ __cold static int copy2pathname(MDBX_txn *txn, const pathchar_t *dest_path, MDBX
* We don't want the OS to cache the writes, since the source data is * We don't want the OS to cache the writes, since the source data is
* already in the OS cache. */ * already in the OS cache. */
mdbx_filehandle_t newfd = INVALID_HANDLE_VALUE; mdbx_filehandle_t newfd = INVALID_HANDLE_VALUE;
int rc = osal_openfile(MDBX_OPEN_COPY, txn->env, dest_path, &newfd, int rc = osal_openfile((flags & MDBX_CP_OVERWRITE) ? MDBX_OPEN_COPY_OVERWRITE : MDBX_OPEN_COPY_EXCL, txn->env,
dest_path, &newfd,
#if defined(_WIN32) || defined(_WIN64) #if defined(_WIN32) || defined(_WIN64)
(mdbx_mode_t)-1 (mdbx_mode_t)-1
#else #else
S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
#endif #endif
); );
if (unlikely(rc != MDBX_SUCCESS))
return rc;
#if defined(_WIN32) || defined(_WIN64) #if defined(_WIN32) || defined(_WIN64)
/* no locking required since the file opened with ShareMode == 0 */ /* no locking required since the file opened with ShareMode == 0 */
#else #else
if (rc == MDBX_SUCCESS) {
MDBX_STRUCT_FLOCK lock_op; MDBX_STRUCT_FLOCK lock_op;
memset(&lock_op, 0, sizeof(lock_op)); memset(&lock_op, 0, sizeof(lock_op));
lock_op.l_type = F_WRLCK; lock_op.l_type = F_WRLCK;
lock_op.l_whence = SEEK_SET; lock_op.l_whence = SEEK_SET;
lock_op.l_start = 0; lock_op.l_start = 0;
lock_op.l_len = OFF_T_MAX; lock_op.l_len = OFF_T_MAX;
if (MDBX_FCNTL(newfd, MDBX_F_SETLK, &lock_op)) const int err_fcntl = MDBX_FCNTL(newfd, MDBX_F_SETLK, &lock_op) ? errno : MDBX_SUCCESS;
rc = errno;
}
#if defined(LOCK_EX) && (!defined(__ANDROID_API__) || __ANDROID_API__ >= 24) const int err_flock =
if (rc == MDBX_SUCCESS && flock(newfd, LOCK_EX | LOCK_NB)) { #ifdef LOCK_EX
const int err_flock = errno, err_fs = osal_check_fs_local(newfd, 0); flock(newfd, LOCK_EX | LOCK_NB) ? errno : MDBX_SUCCESS;
if (err_flock != EAGAIN || err_fs != MDBX_EREMOTE) { #else
ERROR("%s flock(%" MDBX_PRIsPATH ") error %d, remote-fs check status %d", "unexpected", dest_path, err_flock, MDBX_ENOSYS;
err_fs); #endif /* LOCK_EX */
const int err_check_fs_local =
/* avoid call osal_check_fs_local() on success */
(!err_fcntl && !err_flock && !MDBX_DEBUG) ? MDBX_SUCCESS :
#if !defined(__ANDROID_API__) || __ANDROID_API__ >= 24
osal_check_fs_local(newfd, 0);
#else
MDBX_ENOSYS;
#endif
const bool flock_may_fail =
#if defined(__linux__) || defined(__gnu_linux__)
err_check_fs_local != 0;
#else
true;
#endif /* Linux */
if (!err_fcntl &&
(err_flock == EWOULDBLOCK || err_flock == EAGAIN || ignore_enosys_and_eremote(err_flock) == MDBX_RESULT_TRUE)) {
rc = err_flock; rc = err_flock;
} else { if (flock_may_fail) {
WARNING("%s flock(%" MDBX_PRIsPATH ") error %d, remote-fs check status %d", "ignore", dest_path, err_flock, WARNING("ignore %s(%" MDBX_PRIsPATH ") error %d: since %s done, local/remote-fs check %d", "flock", dest_path,
err_fs); err_flock, "fcntl-lock", err_check_fs_local);
rc = MDBX_SUCCESS;
} }
} else if (!err_flock && err_check_fs_local == MDBX_RESULT_TRUE &&
ignore_enosys_and_eremote(err_fcntl) == MDBX_RESULT_TRUE) {
WARNING("ignore %s(%" MDBX_PRIsPATH ") error %d: since %s done, local/remote-fs check %d", "fcntl-lock", dest_path,
err_fcntl, "flock", err_check_fs_local);
} else if (err_fcntl || err_flock) {
ERROR("file-lock(%" MDBX_PRIsPATH ") failed: fcntl-lock %d, flock %d, local/remote-fs check %d", dest_path,
err_fcntl, err_flock, err_check_fs_local);
if (err_fcntl == ENOLCK || err_flock == ENOLCK)
rc = ENOLCK;
else if (err_fcntl == EWOULDBLOCK || err_flock == EWOULDBLOCK)
rc = EWOULDBLOCK;
else if (EWOULDBLOCK != EAGAIN && (err_fcntl == EAGAIN || err_flock == EAGAIN))
rc = EAGAIN;
else
rc = (err_fcntl && ignore_enosys_and_eremote(err_fcntl) != MDBX_RESULT_TRUE) ? err_fcntl : err_flock;
} }
#endif /* LOCK_EX && ANDROID_API >= 24 */
#endif /* Windows / POSIX */ #endif /* Windows / POSIX */
if (rc == MDBX_SUCCESS) if (rc == MDBX_SUCCESS)

View File

@ -73,6 +73,7 @@ int mdbx_cursor_bind(MDBX_txn *txn, MDBX_cursor *mc, MDBX_dbi dbi) {
mc->next = txn->cursors[dbi]; mc->next = txn->cursors[dbi];
txn->cursors[dbi] = mc; txn->cursors[dbi] = mc;
txn->flags |= txn_may_have_cursors;
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
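For context, a hedged sketch of the calling side of `mdbx_cursor_bind()` shown above: a standalone cursor created with `mdbx_cursor_create()` is later bound to a transaction and table, the path that now also raises `txn_may_have_cursors` on the transaction. The `mdbx_cursor_create()`, `mdbx_cursor_get()` and `mdbx_cursor_close()` calls are assumed to match their declarations in mdbx.h.

```c
#include "mdbx.h"
#include <stddef.h>

/* Bind a pre-created cursor to (txn, dbi), read the first record, release it. */
static int with_bound_cursor(MDBX_txn *txn, MDBX_dbi dbi) {
  MDBX_cursor *cursor = mdbx_cursor_create(/* userctx */ NULL);
  if (!cursor)
    return MDBX_ENOMEM;
  int err = mdbx_cursor_bind(txn, cursor, dbi);
  if (err == MDBX_SUCCESS) {
    MDBX_val key, value;
    err = mdbx_cursor_get(cursor, &key, &value, MDBX_FIRST);
    /* ... iterate with MDBX_NEXT ... */
  }
  mdbx_cursor_close(cursor);
  return err;
}
```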

View File

@ -143,7 +143,7 @@ int mdbx_dbi_close(MDBX_env *env, MDBX_dbi dbi) {
if (unlikely(dbi < CORE_DBS)) if (unlikely(dbi < CORE_DBS))
return (dbi == MAIN_DBI) ? MDBX_SUCCESS : LOG_IFERR(MDBX_BAD_DBI); return (dbi == MAIN_DBI) ? MDBX_SUCCESS : LOG_IFERR(MDBX_BAD_DBI);
if (unlikely(dbi >= env->max_dbi)) if (unlikely(dbi >= env->n_dbi))
return LOG_IFERR(MDBX_BAD_DBI); return LOG_IFERR(MDBX_BAD_DBI);
rc = osal_fastmutex_acquire(&env->dbi_lock); rc = osal_fastmutex_acquire(&env->dbi_lock);
@ -167,8 +167,8 @@ int mdbx_dbi_close(MDBX_env *env, MDBX_dbi dbi) {
* in basal_txn, and only afterwards in env->txn. Thus, a crash is possible * in basal_txn, and only afterwards in env->txn. Thus, a crash is possible
* only on a collision with the completion of a nested transaction. * only on a collision with the completion of a nested transaction.
* *
* Alternatively, one could try to perform an update/put of the record in * Alternatively, one could try to perform an update/put of the row in
* mainDb corresponding to the table of the handle being closed. Semantically this * MainDB corresponding to the table of the handle being closed. Semantically this
* is the right path, but the problem is in the current API, in which historically a dbi-handle * is the right path, but the problem is in the current API, in which historically a dbi-handle
* lives and is closed outside of a transaction. Moreover, the problem is not only that * lives and is closed outside of a transaction. Moreover, the problem is not only that
* there is no pointer to the current write transaction, but also that * there is no pointer to the current write transaction, but also that

View File

@ -488,39 +488,12 @@ __cold int mdbx_env_openW(MDBX_env *env, const wchar_t *pathname, MDBX_env_flags
} }
if ((flags & MDBX_RDONLY) == 0) { if ((flags & MDBX_RDONLY) == 0) {
MDBX_txn *txn = nullptr; env->basal_txn = txn_basal_create(env->max_dbi);
const intptr_t bitmap_bytes = if (unlikely(!env->basal_txn)) {
#if MDBX_ENABLE_DBI_SPARSE
ceil_powerof2(env->max_dbi, CHAR_BIT * sizeof(txn->dbi_sparse[0])) / CHAR_BIT;
#else
0;
#endif /* MDBX_ENABLE_DBI_SPARSE */
const size_t base = sizeof(MDBX_txn) + sizeof(cursor_couple_t);
const size_t size = base + bitmap_bytes +
env->max_dbi * (sizeof(txn->dbs[0]) + sizeof(txn->cursors[0]) + sizeof(txn->dbi_seqs[0]) +
sizeof(txn->dbi_state[0]));
txn = osal_calloc(1, size);
if (unlikely(!txn)) {
rc = MDBX_ENOMEM;
goto bailout;
}
txn->dbs = ptr_disp(txn, base);
txn->cursors = ptr_disp(txn->dbs, env->max_dbi * sizeof(txn->dbs[0]));
txn->dbi_seqs = ptr_disp(txn->cursors, env->max_dbi * sizeof(txn->cursors[0]));
txn->dbi_state = ptr_disp(txn, size - env->max_dbi * sizeof(txn->dbi_state[0]));
#if MDBX_ENABLE_DBI_SPARSE
txn->dbi_sparse = ptr_disp(txn->dbi_state, -bitmap_bytes);
#endif /* MDBX_ENABLE_DBI_SPARSE */
txn->env = env;
txn->flags = MDBX_TXN_FINISHED;
env->basal_txn = txn;
txn->tw.retired_pages = pnl_alloc(MDBX_PNL_INITIAL);
txn->tw.repnl = pnl_alloc(MDBX_PNL_INITIAL);
if (unlikely(!txn->tw.retired_pages || !txn->tw.repnl)) {
rc = MDBX_ENOMEM; rc = MDBX_ENOMEM;
goto bailout; goto bailout;
} }
env->basal_txn->env = env;
env_options_adjust_defaults(env); env_options_adjust_defaults(env);
} }
@ -716,7 +689,7 @@ static int env_info_snap(const MDBX_env *env, const MDBX_txn *txn, MDBX_envinfo
#endif #endif
} }
*troika = (txn && !(txn->flags & MDBX_TXN_RDONLY)) ? txn->tw.troika : meta_tap(env); *troika = (txn && !(txn->flags & MDBX_TXN_RDONLY)) ? txn->wr.troika : meta_tap(env);
const meta_ptr_t head = meta_recent(env, troika); const meta_ptr_t head = meta_recent(env, troika);
const meta_t *const meta0 = METAPAGE(env, 0); const meta_t *const meta0 = METAPAGE(env, 0);
const meta_t *const meta1 = METAPAGE(env, 1); const meta_t *const meta1 = METAPAGE(env, 1);
@ -979,16 +952,16 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower, intptr_t si
if (unlikely(err != MDBX_SUCCESS)) if (unlikely(err != MDBX_SUCCESS))
return LOG_IFERR(err); return LOG_IFERR(err);
should_unlock = true; should_unlock = true;
env->basal_txn->tw.troika = meta_tap(env); env->basal_txn->wr.troika = meta_tap(env);
eASSERT(env, !env->txn && !env->basal_txn->nested); eASSERT(env, !env->txn && !env->basal_txn->nested);
env->basal_txn->txnid = env->basal_txn->tw.troika.txnid[env->basal_txn->tw.troika.recent]; env->basal_txn->txnid = env->basal_txn->wr.troika.txnid[env->basal_txn->wr.troika.recent];
txn_snapshot_oldest(env->basal_txn); txn_gc_detent(env->basal_txn);
} }
/* get untouched params from current TXN or DB */ /* get untouched params from current TXN or DB */
if (pagesize <= 0 || pagesize >= INT_MAX) if (pagesize <= 0 || pagesize >= INT_MAX)
pagesize = env->ps; pagesize = env->ps;
const geo_t *const geo = env->txn ? &env->txn->geo : &meta_recent(env, &env->basal_txn->tw.troika).ptr_c->geometry; const geo_t *const geo = env->txn ? &env->txn->geo : &meta_recent(env, &env->basal_txn->wr.troika).ptr_c->geometry;
if (size_lower < 0) if (size_lower < 0)
size_lower = pgno2bytes(env, geo->lower); size_lower = pgno2bytes(env, geo->lower);
if (size_now < 0) if (size_now < 0)
@ -1203,7 +1176,7 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower, intptr_t si
meta_t meta; meta_t meta;
memset(&meta, 0, sizeof(meta)); memset(&meta, 0, sizeof(meta));
if (!env->txn) { if (!env->txn) {
const meta_ptr_t head = meta_recent(env, &env->basal_txn->tw.troika); const meta_ptr_t head = meta_recent(env, &env->basal_txn->wr.troika);
uint64_t timestamp = 0; uint64_t timestamp = 0;
while ("workaround for " while ("workaround for "
@ -1297,7 +1270,7 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower, intptr_t si
env->txn->flags |= MDBX_TXN_DIRTY; env->txn->flags |= MDBX_TXN_DIRTY;
} else { } else {
meta.geometry = new_geo; meta.geometry = new_geo;
rc = dxb_sync_locked(env, env->flags, &meta, &env->basal_txn->tw.troika); rc = dxb_sync_locked(env, env->flags, &meta, &env->basal_txn->wr.troika);
if (likely(rc == MDBX_SUCCESS)) { if (likely(rc == MDBX_SUCCESS)) {
env->geo_in_bytes.now = pgno2bytes(env, new_geo.now = meta.geometry.now); env->geo_in_bytes.now = pgno2bytes(env, new_geo.now = meta.geometry.now);
env->geo_in_bytes.upper = pgno2bytes(env, new_geo.upper = meta.geometry.upper); env->geo_in_bytes.upper = pgno2bytes(env, new_geo.upper = meta.geometry.upper);

View File

@ -77,7 +77,7 @@ static uint16_t default_subpage_reserve_limit(const MDBX_env *env) {
static uint16_t default_merge_threshold_16dot16_percent(const MDBX_env *env) { static uint16_t default_merge_threshold_16dot16_percent(const MDBX_env *env) {
(void)env; (void)env;
return 65536 / 4 /* 25% */; return 65536 / 3 /* 33% */;
} }
static pgno_t default_dp_reserve_limit(const MDBX_env *env) { static pgno_t default_dp_reserve_limit(const MDBX_env *env) {
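The hunk above raises the default page-merge threshold from 25% to 33%. Below is a hedged sketch of overriding it per environment, assuming the public `MDBX_opt_merge_threshold_16dot16_percent` option is the knob behind this default; verify against the `MDBX_option_t` enum in your mdbx.h.

```c
#include "mdbx.h"

/* Restore the previous 25% merge threshold, expressed in 16.16 fixed point
 * (65536 == 100%), on an environment handle obtained from mdbx_env_create(). */
static int restore_quarter_merge_threshold(MDBX_env *env) {
  return mdbx_env_set_option(env, MDBX_opt_merge_threshold_16dot16_percent, 65536 / 4);
}
```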
@ -147,6 +147,9 @@ void env_options_adjust_dp_limit(MDBX_env *env) {
if (env->options.dp_limit < CURSOR_STACK_SIZE * 4) if (env->options.dp_limit < CURSOR_STACK_SIZE * 4)
env->options.dp_limit = CURSOR_STACK_SIZE * 4; env->options.dp_limit = CURSOR_STACK_SIZE * 4;
} }
#ifdef MDBX_DEBUG_DPL_LIMIT
env->options.dp_limit = MDBX_DEBUG_DPL_LIMIT;
#endif /* MDBX_DEBUG_DPL_LIMIT */
if (env->options.dp_initial > env->options.dp_limit && env->options.dp_initial > default_dp_initial(env)) if (env->options.dp_initial > env->options.dp_limit && env->options.dp_initial > default_dp_initial(env))
env->options.dp_initial = env->options.dp_limit; env->options.dp_initial = env->options.dp_limit;
env->options.need_dp_limit_adjust = false; env->options.need_dp_limit_adjust = false;

View File

@ -411,7 +411,7 @@ int mdbx_replace_ex(MDBX_txn *txn, MDBX_dbi dbi, const MDBX_val *key, MDBX_val *
} }
if (is_modifable(txn, page)) { if (is_modifable(txn, page)) {
if (new_data && cmp_lenfast(&present_data, new_data) == 0) { if (new_data && eq_fast(&present_data, new_data)) {
/* if the data matches, there is nothing to do */ /* if the data matches, there is nothing to do */
*old_data = *new_data; *old_data = *new_data;
goto bailout; goto bailout;

View File

@ -10,10 +10,11 @@ __attribute__((__no_sanitize_thread__, __noinline__))
int mdbx_txn_straggler(const MDBX_txn *txn, int *percent) int mdbx_txn_straggler(const MDBX_txn *txn, int *percent)
{ {
int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_PARKED); int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_PARKED);
if (likely(rc == MDBX_SUCCESS))
rc = check_env(txn->env, true);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR((rc > 0) ? -rc : rc); return LOG_IFERR((rc > 0) ? -rc : rc);
MDBX_env *env = txn->env;
if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0)) { if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0)) {
if (percent) if (percent)
*percent = (int)((txn->geo.first_unallocated * UINT64_C(100) + txn->geo.end_pgno / 2) / txn->geo.end_pgno); *percent = (int)((txn->geo.first_unallocated * UINT64_C(100) + txn->geo.end_pgno / 2) / txn->geo.end_pgno);
@ -21,15 +22,15 @@ int mdbx_txn_straggler(const MDBX_txn *txn, int *percent)
} }
txnid_t lag; txnid_t lag;
troika_t troika = meta_tap(env); troika_t troika = meta_tap(txn->env);
do { do {
const meta_ptr_t head = meta_recent(env, &troika); const meta_ptr_t head = meta_recent(txn->env, &troika);
if (percent) { if (percent) {
const pgno_t maxpg = head.ptr_v->geometry.now; const pgno_t maxpg = head.ptr_v->geometry.now;
*percent = (int)((head.ptr_v->geometry.first_unallocated * UINT64_C(100) + maxpg / 2) / maxpg); *percent = (int)((head.ptr_v->geometry.first_unallocated * UINT64_C(100) + maxpg / 2) / maxpg);
} }
lag = (head.txnid - txn->txnid) / xMDBX_TXNID_STEP; lag = (head.txnid - txn->txnid) / xMDBX_TXNID_STEP;
} while (unlikely(meta_should_retry(env, &troika))); } while (unlikely(meta_should_retry(txn->env, &troika)));
return (lag > INT_MAX) ? INT_MAX : (int)lag; return (lag > INT_MAX) ? INT_MAX : (int)lag;
} }
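A usage sketch for `mdbx_txn_straggler()` as refactored above: per the `LOG_IFERR((rc > 0) ? -rc : rc)` line, errors come back negated, otherwise the result is how many transactions the reader's snapshot lags behind the most recent one.

```c
#include "mdbx.h"
#include <stdio.h>

/* Report how far a long-lived read transaction lags behind the latest
 * MVCC snapshot, and roughly how full the database is. */
static void report_straggler(const MDBX_txn *txn) {
  int percent = 0;
  const int lag = mdbx_txn_straggler(txn, &percent);
  if (lag < 0)
    fprintf(stderr, "straggler check failed: %s\n", mdbx_strerror(-lag));
  else
    printf("reader lags %d txn(s) behind, db is ~%d%% used\n", lag, percent);
}
```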
@ -55,8 +56,8 @@ MDBX_txn_flags_t mdbx_txn_flags(const MDBX_txn *txn) {
assert(0 == (int)(txn->flags & MDBX_TXN_INVALID)); assert(0 == (int)(txn->flags & MDBX_TXN_INVALID));
MDBX_txn_flags_t flags = txn->flags; MDBX_txn_flags_t flags = txn->flags;
if (F_ISSET(flags, MDBX_TXN_PARKED | MDBX_TXN_RDONLY) && txn->to.reader && if (F_ISSET(flags, MDBX_TXN_PARKED | MDBX_TXN_RDONLY) && txn->ro.slot &&
safe64_read(&txn->to.reader->tid) == MDBX_TID_TXN_OUSTED) safe64_read(&txn->ro.slot->tid) == MDBX_TID_TXN_OUSTED)
flags |= MDBX_TXN_OUSTED; flags |= MDBX_TXN_OUSTED;
return flags; return flags;
} }
@ -66,6 +67,10 @@ int mdbx_txn_reset(MDBX_txn *txn) {
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc); return LOG_IFERR(rc);
rc = check_env(txn->env, false);
if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
/* This call is only valid for read-only txns */ /* This call is only valid for read-only txns */
if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0)) if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0))
return LOG_IFERR(MDBX_EINVAL); return LOG_IFERR(MDBX_EINVAL);
@ -85,8 +90,6 @@ int mdbx_txn_break(MDBX_txn *txn) {
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc); return LOG_IFERR(rc);
txn->flags |= MDBX_TXN_ERROR; txn->flags |= MDBX_TXN_ERROR;
if (txn->flags & MDBX_TXN_RDONLY)
break;
txn = txn->nested; txn = txn->nested;
} while (txn); } while (txn);
return MDBX_SUCCESS; return MDBX_SUCCESS;
@ -117,6 +120,11 @@ int mdbx_txn_park(MDBX_txn *txn, bool autounpark) {
int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_ERROR); int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_ERROR);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc); return LOG_IFERR(rc);
rc = check_env(txn->env, true);
if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0)) if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0))
return LOG_IFERR(MDBX_TXN_INVALID); return LOG_IFERR(MDBX_TXN_INVALID);
@ -125,7 +133,7 @@ int mdbx_txn_park(MDBX_txn *txn, bool autounpark) {
return LOG_IFERR(rc ? rc : MDBX_OUSTED); return LOG_IFERR(rc ? rc : MDBX_OUSTED);
} }
return LOG_IFERR(txn_park(txn, autounpark)); return LOG_IFERR(txn_ro_park(txn, autounpark));
} }
int mdbx_txn_unpark(MDBX_txn *txn, bool restart_if_ousted) { int mdbx_txn_unpark(MDBX_txn *txn, bool restart_if_ousted) {
@ -133,10 +141,15 @@ int mdbx_txn_unpark(MDBX_txn *txn, bool restart_if_ousted) {
int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_PARKED - MDBX_TXN_ERROR); int rc = check_txn(txn, MDBX_TXN_BLOCKED - MDBX_TXN_PARKED - MDBX_TXN_ERROR);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc); return LOG_IFERR(rc);
rc = check_env(txn->env, true);
if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
if (unlikely(!F_ISSET(txn->flags, MDBX_TXN_RDONLY | MDBX_TXN_PARKED))) if (unlikely(!F_ISSET(txn->flags, MDBX_TXN_RDONLY | MDBX_TXN_PARKED)))
return MDBX_SUCCESS; return MDBX_SUCCESS;
rc = txn_unpark(txn); rc = txn_ro_unpark(txn);
if (likely(rc != MDBX_OUSTED) || !restart_if_ousted) if (likely(rc != MDBX_OUSTED) || !restart_if_ousted)
return LOG_IFERR(rc); return LOG_IFERR(rc);
@ -146,22 +159,24 @@ int mdbx_txn_unpark(MDBX_txn *txn, bool restart_if_ousted) {
} }
int mdbx_txn_renew(MDBX_txn *txn) { int mdbx_txn_renew(MDBX_txn *txn) {
if (unlikely(!txn)) int rc = check_txn(txn, 0);
return LOG_IFERR(MDBX_EINVAL); if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
if (unlikely(txn->signature != txn_signature)) rc = check_env(txn->env, true);
return LOG_IFERR(MDBX_EBADSIGN); if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0)) if (unlikely((txn->flags & MDBX_TXN_RDONLY) == 0))
return LOG_IFERR(MDBX_EINVAL); return LOG_IFERR(MDBX_EINVAL);
if (unlikely(txn->owner != 0 || !(txn->flags & MDBX_TXN_FINISHED))) { if (unlikely(txn->owner != 0 || !(txn->flags & MDBX_TXN_FINISHED))) {
int rc = mdbx_txn_reset(txn); rc = mdbx_txn_reset(txn);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return rc; return rc;
} }
int rc = txn_renew(txn, MDBX_TXN_RDONLY); rc = txn_renew(txn, MDBX_TXN_RDONLY);
if (rc == MDBX_SUCCESS) { if (rc == MDBX_SUCCESS) {
tASSERT(txn, txn->owner == (txn->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self()); tASSERT(txn, txn->owner == (txn->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self());
DEBUG("renew txn %" PRIaTXN "%c %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid, DEBUG("renew txn %" PRIaTXN "%c %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid,
@ -172,7 +187,7 @@ int mdbx_txn_renew(MDBX_txn *txn) {
} }
int mdbx_txn_set_userctx(MDBX_txn *txn, void *ctx) { int mdbx_txn_set_userctx(MDBX_txn *txn, void *ctx) {
int rc = check_txn(txn, MDBX_TXN_FINISHED); int rc = check_txn(txn, 0);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc); return LOG_IFERR(rc);
@ -197,6 +212,8 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags, M
if (unlikely(env->flags & MDBX_RDONLY & ~flags)) /* write txn in RDONLY env */ if (unlikely(env->flags & MDBX_RDONLY & ~flags)) /* write txn in RDONLY env */
return LOG_IFERR(MDBX_EACCESS); return LOG_IFERR(MDBX_EACCESS);
/* Reuse preallocated write txn. However, do not touch it until
* txn_renew() succeeds, since it currently may be active. */
MDBX_txn *txn = nullptr; MDBX_txn *txn = nullptr;
if (parent) { if (parent) {
/* Nested transactions: Max 1 child, write txns only, no writemap */ /* Nested transactions: Max 1 child, write txns only, no writemap */
@ -212,173 +229,44 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags, M
} }
return LOG_IFERR(rc); return LOG_IFERR(rc);
} }
if (unlikely(parent->env != env))
if (env->options.spill_parent4child_denominator) { return LOG_IFERR(MDBX_BAD_TXN);
/* Spill dirty-pages of parent to provide dirtyroom for child txn */
rc = txn_spill(parent, nullptr, parent->tw.dirtylist->length / env->options.spill_parent4child_denominator);
if (unlikely(rc != MDBX_SUCCESS))
return LOG_IFERR(rc);
}
tASSERT(parent, audit_ex(parent, 0, false) == 0);
flags |= parent->flags & (txn_rw_begin_flags | MDBX_TXN_SPILLS | MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP); flags |= parent->flags & (txn_rw_begin_flags | MDBX_TXN_SPILLS | MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP);
} else if ((flags & MDBX_TXN_RDONLY) == 0) { rc = txn_nested_create(parent, flags);
/* Reuse preallocated write txn. However, do not touch it until txn = parent->nested;
* txn_renew() succeeds, since it currently may be active. */
txn = env->basal_txn;
goto renew;
}
const intptr_t bitmap_bytes =
#if MDBX_ENABLE_DBI_SPARSE
ceil_powerof2(env->max_dbi, CHAR_BIT * sizeof(txn->dbi_sparse[0])) / CHAR_BIT;
#else
0;
#endif /* MDBX_ENABLE_DBI_SPARSE */
STATIC_ASSERT(sizeof(txn->tw) > sizeof(txn->to));
const size_t base =
(flags & MDBX_TXN_RDONLY) ? sizeof(MDBX_txn) - sizeof(txn->tw) + sizeof(txn->to) : sizeof(MDBX_txn);
const size_t size = base +
((flags & MDBX_TXN_RDONLY) ? (size_t)bitmap_bytes + env->max_dbi * sizeof(txn->dbi_seqs[0]) : 0) +
env->max_dbi * (sizeof(txn->dbs[0]) + sizeof(txn->cursors[0]) + sizeof(txn->dbi_state[0]));
txn = osal_malloc(size);
if (unlikely(txn == nullptr))
return LOG_IFERR(MDBX_ENOMEM);
#if MDBX_DEBUG
memset(txn, 0xCD, size);
VALGRIND_MAKE_MEM_UNDEFINED(txn, size);
#endif /* MDBX_DEBUG */
MDBX_ANALYSIS_ASSUME(size > base);
memset(txn, 0, (MDBX_GOOFY_MSVC_STATIC_ANALYZER && base > size) ? size : base);
txn->dbs = ptr_disp(txn, base);
txn->cursors = ptr_disp(txn->dbs, env->max_dbi * sizeof(txn->dbs[0]));
#if MDBX_DEBUG
txn->cursors[FREE_DBI] = nullptr; /* avoid SIGSEGV in an assertion later */
#endif
txn->dbi_state = ptr_disp(txn, size - env->max_dbi * sizeof(txn->dbi_state[0]));
txn->flags = flags;
txn->env = env;
if (parent) {
tASSERT(parent, dpl_check(parent));
#if MDBX_ENABLE_DBI_SPARSE
txn->dbi_sparse = parent->dbi_sparse;
#endif /* MDBX_ENABLE_DBI_SPARSE */
txn->dbi_seqs = parent->dbi_seqs;
txn->geo = parent->geo;
rc = dpl_alloc(txn);
if (likely(rc == MDBX_SUCCESS)) {
const size_t len = MDBX_PNL_GETSIZE(parent->tw.repnl) + parent->tw.loose_count;
txn->tw.repnl = pnl_alloc((len > MDBX_PNL_INITIAL) ? len : MDBX_PNL_INITIAL);
if (unlikely(!txn->tw.repnl))
rc = MDBX_ENOMEM;
}
if (unlikely(rc != MDBX_SUCCESS)) { if (unlikely(rc != MDBX_SUCCESS)) {
nested_failed: int err = txn_end(txn, TXN_END_FAIL_BEGIN_NESTED);
pnl_free(txn->tw.repnl); return err ? err : rc;
dpl_free(txn);
osal_free(txn);
return LOG_IFERR(rc);
} }
/* Move loose pages to reclaimed list */
if (parent->tw.loose_count) {
do {
page_t *lp = parent->tw.loose_pages;
tASSERT(parent, lp->flags == P_LOOSE);
rc = pnl_insert_span(&parent->tw.repnl, lp->pgno, 1);
if (unlikely(rc != MDBX_SUCCESS))
goto nested_failed;
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *));
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
parent->tw.loose_pages = page_next(lp);
/* Remove from dirty list */
page_wash(parent, dpl_exist(parent, lp->pgno), lp, 1);
} while (parent->tw.loose_pages);
parent->tw.loose_count = 0;
#if MDBX_ENABLE_REFUND
parent->tw.loose_refund_wl = 0;
#endif /* MDBX_ENABLE_REFUND */
tASSERT(parent, dpl_check(parent));
}
txn->tw.dirtyroom = parent->tw.dirtyroom;
txn->tw.dirtylru = parent->tw.dirtylru;
dpl_sort(parent);
if (parent->tw.spilled.list)
spill_purge(parent);
tASSERT(txn, MDBX_PNL_ALLOCLEN(txn->tw.repnl) >= MDBX_PNL_GETSIZE(parent->tw.repnl));
memcpy(txn->tw.repnl, parent->tw.repnl, MDBX_PNL_SIZEOF(parent->tw.repnl));
eASSERT(env, pnl_check_allocated(txn->tw.repnl, (txn->geo.first_unallocated /* LY: intentional assignment
here, only for assertion */
= parent->geo.first_unallocated) -
MDBX_ENABLE_REFUND));
txn->tw.gc.time_acc = parent->tw.gc.time_acc;
txn->tw.gc.last_reclaimed = parent->tw.gc.last_reclaimed;
if (parent->tw.gc.retxl) {
txn->tw.gc.retxl = parent->tw.gc.retxl;
parent->tw.gc.retxl = (void *)(intptr_t)MDBX_PNL_GETSIZE(parent->tw.gc.retxl);
}
txn->tw.retired_pages = parent->tw.retired_pages;
parent->tw.retired_pages = (void *)(intptr_t)MDBX_PNL_GETSIZE(parent->tw.retired_pages);
txn->txnid = parent->txnid;
txn->front_txnid = parent->front_txnid + 1;
#if MDBX_ENABLE_REFUND
txn->tw.loose_refund_wl = 0;
#endif /* MDBX_ENABLE_REFUND */
txn->canary = parent->canary;
parent->flags |= MDBX_TXN_HAS_CHILD;
parent->nested = txn;
txn->parent = parent;
txn->owner = parent->owner;
txn->tw.troika = parent->tw.troika;
txn->cursors[FREE_DBI] = nullptr;
txn->cursors[MAIN_DBI] = nullptr;
txn->dbi_state[FREE_DBI] = parent->dbi_state[FREE_DBI] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY);
txn->dbi_state[MAIN_DBI] = parent->dbi_state[MAIN_DBI] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY);
memset(txn->dbi_state + CORE_DBS, 0, (txn->n_dbi = parent->n_dbi) - CORE_DBS);
memcpy(txn->dbs, parent->dbs, sizeof(txn->dbs[0]) * CORE_DBS);
tASSERT(parent, parent->tw.dirtyroom + parent->tw.dirtylist->length ==
(parent->parent ? parent->parent->tw.dirtyroom : parent->env->options.dp_limit));
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit));
env->txn = txn;
tASSERT(parent, parent->cursors[FREE_DBI] == nullptr);
rc = parent->cursors[MAIN_DBI] ? cursor_shadow(parent->cursors[MAIN_DBI], txn, MAIN_DBI) : MDBX_SUCCESS;
if (AUDIT_ENABLED() && ASSERT_ENABLED()) { if (AUDIT_ENABLED() && ASSERT_ENABLED()) {
txn->signature = txn_signature; txn->signature = txn_signature;
tASSERT(txn, audit_ex(txn, 0, false) == 0); tASSERT(txn, audit_ex(txn, 0, false) == 0);
} }
if (unlikely(rc != MDBX_SUCCESS)) } else {
txn_end(txn, TXN_END_FAIL_BEGINCHILD); txn = env->basal_txn;
} else { /* MDBX_TXN_RDONLY */ if (flags & MDBX_TXN_RDONLY) {
txn->dbi_seqs = ptr_disp(txn->cursors, env->max_dbi * sizeof(txn->cursors[0])); txn = txn_alloc(flags, env);
#if MDBX_ENABLE_DBI_SPARSE if (unlikely(!txn))
txn->dbi_sparse = ptr_disp(txn->dbi_state, -bitmap_bytes); return LOG_IFERR(MDBX_ENOMEM);
#endif /* MDBX_ENABLE_DBI_SPARSE */
renew:
rc = txn_renew(txn, flags);
} }
rc = txn_renew(txn, flags);
if (unlikely(rc != MDBX_SUCCESS)) { if (unlikely(rc != MDBX_SUCCESS)) {
if (txn != env->basal_txn) if (txn != env->basal_txn)
osal_free(txn); osal_free(txn);
} else { return LOG_IFERR(rc);
}
}
if (flags & (MDBX_TXN_RDONLY_PREPARE - MDBX_TXN_RDONLY)) if (flags & (MDBX_TXN_RDONLY_PREPARE - MDBX_TXN_RDONLY))
eASSERT(env, txn->flags == (MDBX_TXN_RDONLY | MDBX_TXN_FINISHED)); eASSERT(env, txn->flags == (MDBX_TXN_RDONLY | MDBX_TXN_FINISHED));
else if (flags & MDBX_TXN_RDONLY) else if (flags & MDBX_TXN_RDONLY)
eASSERT(env, (txn->flags & ~(MDBX_NOSTICKYTHREADS | MDBX_TXN_RDONLY | MDBX_WRITEMAP | eASSERT(env, (txn->flags & ~(MDBX_NOSTICKYTHREADS | MDBX_TXN_RDONLY | MDBX_WRITEMAP |
/* Win32: SRWL flag */ txn_shrink_allowed)) == 0); /* Win32: SRWL flag */ txn_shrink_allowed)) == 0);
else { else {
eASSERT(env, (txn->flags & ~(MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP | txn_shrink_allowed | MDBX_NOMETASYNC | eASSERT(env, (txn->flags & ~(MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP | txn_shrink_allowed | txn_may_have_cursors |
MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0); MDBX_NOMETASYNC | MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0);
assert(!txn->tw.spilled.list && !txn->tw.spilled.least_removed); assert(!txn->wr.spilled.list && !txn->wr.spilled.least_removed);
} }
txn->signature = txn_signature; txn->signature = txn_signature;
txn->userctx = context; txn->userctx = context;
@ -386,28 +274,81 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags, M
DEBUG("begin txn %" PRIaTXN "%c %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid, DEBUG("begin txn %" PRIaTXN "%c %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid,
(flags & MDBX_TXN_RDONLY) ? 'r' : 'w', (void *)txn, (void *)env, txn->dbs[MAIN_DBI].root, (flags & MDBX_TXN_RDONLY) ? 'r' : 'w', (void *)txn, (void *)env, txn->dbs[MAIN_DBI].root,
txn->dbs[FREE_DBI].root); txn->dbs[FREE_DBI].root);
return MDBX_SUCCESS;
} }
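The lines above are the tail of mdbx_txn_begin_ex(); the nested-transaction setup that used to live inline here has been folded into txn_nested_create() and txn_renew(). The public entry point is unchanged, so a caller-side sketch still looks as below (a hedged illustration with error handling trimmed; the variable names are not from the patch):

MDBX_txn *parent = NULL, *child = NULL;
int rc = mdbx_txn_begin_ex(env, NULL, MDBX_TXN_READWRITE, &parent, NULL);
if (rc == MDBX_SUCCESS)
  /* A nested (child) write transaction; its changes reach 'parent' only on commit. */
  rc = mdbx_txn_begin_ex(env, parent, MDBX_TXN_READWRITE, &child, NULL);
if (rc == MDBX_SUCCESS)
  rc = mdbx_txn_commit(child);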
return LOG_IFERR(rc); static void latency_gcprof(MDBX_commit_latency *latency, const MDBX_txn *txn) {
MDBX_env *const env = txn->env;
if (latency && likely(env->lck) && MDBX_ENABLE_PROFGC) {
pgop_stat_t *const ptr = &env->lck->pgops;
latency->gc_prof.work_counter = ptr->gc_prof.work.spe_counter;
latency->gc_prof.work_rtime_monotonic = osal_monotime_to_16dot16(ptr->gc_prof.work.rtime_monotonic);
latency->gc_prof.work_xtime_cpu = osal_monotime_to_16dot16(ptr->gc_prof.work.xtime_cpu);
latency->gc_prof.work_rsteps = ptr->gc_prof.work.rsteps;
latency->gc_prof.work_xpages = ptr->gc_prof.work.xpages;
latency->gc_prof.work_majflt = ptr->gc_prof.work.majflt;
latency->gc_prof.self_counter = ptr->gc_prof.self.spe_counter;
latency->gc_prof.self_rtime_monotonic = osal_monotime_to_16dot16(ptr->gc_prof.self.rtime_monotonic);
latency->gc_prof.self_xtime_cpu = osal_monotime_to_16dot16(ptr->gc_prof.self.xtime_cpu);
latency->gc_prof.self_rsteps = ptr->gc_prof.self.rsteps;
latency->gc_prof.self_xpages = ptr->gc_prof.self.xpages;
latency->gc_prof.self_majflt = ptr->gc_prof.self.majflt;
latency->gc_prof.wloops = ptr->gc_prof.wloops;
latency->gc_prof.coalescences = ptr->gc_prof.coalescences;
latency->gc_prof.wipes = ptr->gc_prof.wipes;
latency->gc_prof.flushes = ptr->gc_prof.flushes;
latency->gc_prof.kicks = ptr->gc_prof.kicks;
latency->gc_prof.pnl_merge_work.time = osal_monotime_to_16dot16(ptr->gc_prof.work.pnl_merge.time);
latency->gc_prof.pnl_merge_work.calls = ptr->gc_prof.work.pnl_merge.calls;
latency->gc_prof.pnl_merge_work.volume = ptr->gc_prof.work.pnl_merge.volume;
latency->gc_prof.pnl_merge_self.time = osal_monotime_to_16dot16(ptr->gc_prof.self.pnl_merge.time);
latency->gc_prof.pnl_merge_self.calls = ptr->gc_prof.self.pnl_merge.calls;
latency->gc_prof.pnl_merge_self.volume = ptr->gc_prof.self.pnl_merge.volume;
if (txn == env->basal_txn)
memset(&ptr->gc_prof, 0, sizeof(ptr->gc_prof));
}
}
static void latency_init(MDBX_commit_latency *latency, struct commit_timestamp *ts) {
ts->start = 0;
ts->gc_cpu = 0;
if (latency) {
ts->start = osal_monotime();
memset(latency, 0, sizeof(*latency));
}
ts->prep = ts->gc = ts->audit = ts->write = ts->sync = ts->start;
}
static void latency_done(MDBX_commit_latency *latency, struct commit_timestamp *ts) {
if (latency) {
latency->preparation = (ts->prep > ts->start) ? osal_monotime_to_16dot16(ts->prep - ts->start) : 0;
latency->gc_wallclock = (ts->gc > ts->prep) ? osal_monotime_to_16dot16(ts->gc - ts->prep) : 0;
latency->gc_cputime = ts->gc_cpu ? osal_monotime_to_16dot16(ts->gc_cpu) : 0;
latency->audit = (ts->audit > ts->gc) ? osal_monotime_to_16dot16(ts->audit - ts->gc) : 0;
latency->write = (ts->write > ts->audit) ? osal_monotime_to_16dot16(ts->write - ts->audit) : 0;
latency->sync = (ts->sync > ts->write) ? osal_monotime_to_16dot16(ts->sync - ts->write) : 0;
const uint64_t ts_end = osal_monotime();
latency->ending = (ts_end > ts->sync) ? osal_monotime_to_16dot16(ts_end - ts->sync) : 0;
latency->whole = osal_monotime_to_16dot16_noUnderflow(ts_end - ts->start);
}
} }
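The latency_init()/latency_done() pair fills the public MDBX_commit_latency structure with durations in 16.16 fixed-point seconds (units of 1/65536 s), as the osal_monotime_to_16dot16() conversions indicate, while latency_gcprof() copies out the MDBX_ENABLE_PROFGC counters. A hypothetical application-side helper for converting such a field to floating-point seconds (the helper itself is not part of libmdbx):

#include <stdint.h>

/* Convert a 16.16 fixed-point duration (1/65536 of a second) to seconds. */
static inline double latency_to_seconds(uint32_t v16dot16) {
  return (double)v16dot16 / 65536.0;
}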
int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) { int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
STATIC_ASSERT(MDBX_TXN_FINISHED == MDBX_TXN_BLOCKED - MDBX_TXN_HAS_CHILD - MDBX_TXN_ERROR - MDBX_TXN_PARKED); STATIC_ASSERT(MDBX_TXN_FINISHED == MDBX_TXN_BLOCKED - MDBX_TXN_HAS_CHILD - MDBX_TXN_ERROR - MDBX_TXN_PARKED);
const uint64_t ts_0 = latency ? osal_monotime() : 0;
uint64_t ts_1 = 0, ts_2 = 0, ts_3 = 0, ts_4 = 0, ts_5 = 0, gc_cputime = 0;
/* txn_end() mode for a commit which writes nothing */ struct commit_timestamp ts;
unsigned end_mode = TXN_END_PURE_COMMIT | TXN_END_UPDATE | TXN_END_SLOT | TXN_END_FREE; latency_init(latency, &ts);
int rc = check_txn(txn, MDBX_TXN_FINISHED); int rc = check_txn(txn, MDBX_TXN_FINISHED);
if (unlikely(rc != MDBX_SUCCESS)) { if (unlikely(rc != MDBX_SUCCESS)) {
if (rc == MDBX_BAD_TXN && (txn->flags & MDBX_TXN_RDONLY)) { if (rc == MDBX_BAD_TXN && F_ISSET(txn->flags, MDBX_TXN_FINISHED | MDBX_TXN_RDONLY)) {
rc = MDBX_RESULT_TRUE; rc = MDBX_RESULT_TRUE;
goto fail; goto fail;
} }
bailout:
if (latency)
memset(latency, 0, sizeof(*latency));
return LOG_IFERR(rc); return LOG_IFERR(rc);
} }
@ -415,14 +356,17 @@ int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
if (MDBX_ENV_CHECKPID && unlikely(env->pid != osal_getpid())) { if (MDBX_ENV_CHECKPID && unlikely(env->pid != osal_getpid())) {
env->flags |= ENV_FATAL_ERROR; env->flags |= ENV_FATAL_ERROR;
rc = MDBX_PANIC; rc = MDBX_PANIC;
goto bailout; return LOG_IFERR(rc);
} }
if (unlikely(txn->flags & MDBX_TXN_RDONLY)) { if (txn->flags & MDBX_TXN_RDONLY) {
if (txn->flags & MDBX_TXN_ERROR) { if (unlikely(txn->parent || (txn->flags & MDBX_TXN_HAS_CHILD) || txn == env->txn || txn == env->basal_txn)) {
rc = MDBX_RESULT_TRUE; ERROR("attempt to commit %s txn %p", "strange read-only", (void *)txn);
goto fail; return MDBX_PROBLEM;
} }
latency_gcprof(latency, txn);
rc = (txn->flags & MDBX_TXN_ERROR) ? MDBX_RESULT_TRUE : MDBX_SUCCESS;
txn_end(txn, TXN_END_PURE_COMMIT | TXN_END_UPDATE | TXN_END_SLOT | TXN_END_FREE);
goto done; goto done;
} }
@ -436,7 +380,12 @@ int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
if (unlikely(txn->flags & MDBX_TXN_ERROR)) { if (unlikely(txn->flags & MDBX_TXN_ERROR)) {
rc = MDBX_RESULT_TRUE; rc = MDBX_RESULT_TRUE;
goto fail; fail:
latency_gcprof(latency, txn);
int err = txn_abort(txn);
if (unlikely(err != MDBX_SUCCESS))
rc = err;
goto done;
} }
if (txn->nested) { if (txn->nested) {
@ -447,370 +396,38 @@ int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
} }
if (unlikely(txn != env->txn)) { if (unlikely(txn != env->txn)) {
DEBUG("%s", "attempt to commit unknown transaction"); ERROR("attempt to commit %s txn %p", "unknown", (void *)txn);
rc = MDBX_EINVAL; return MDBX_EINVAL;
goto fail;
} }
if (txn->parent) { if (txn->parent) {
tASSERT(txn, audit_ex(txn, 0, false) == 0); if (unlikely(txn->parent->nested != txn || txn->parent->env != env)) {
eASSERT(env, txn != env->basal_txn); ERROR("attempt to commit %s txn %p", "strange nested", (void *)txn);
MDBX_txn *const parent = txn->parent; return MDBX_PROBLEM;
eASSERT(env, parent->signature == txn_signature);
eASSERT(env, parent->nested == txn && (parent->flags & MDBX_TXN_HAS_CHILD) != 0);
eASSERT(env, dpl_check(txn));
if (txn->tw.dirtylist->length == 0 && !(txn->flags & MDBX_TXN_DIRTY) && parent->n_dbi == txn->n_dbi) {
/* fast completion of pure nested transaction */
VERBOSE("fast-complete pure nested txn %" PRIaTXN, txn->txnid);
tASSERT(txn, memcmp(&parent->geo, &txn->geo, sizeof(parent->geo)) == 0);
tASSERT(txn, memcmp(&parent->canary, &txn->canary, sizeof(parent->canary)) == 0);
tASSERT(txn, !txn->tw.spilled.list || MDBX_PNL_GETSIZE(txn->tw.spilled.list) == 0);
tASSERT(txn, txn->tw.loose_count == 0);
/* Update parent's DBs array */
eASSERT(env, parent->n_dbi == txn->n_dbi);
TXN_FOREACH_DBI_ALL(txn, dbi) {
tASSERT(txn, (txn->dbi_state[dbi] & (DBI_CREAT | DBI_DIRTY)) == 0);
if (txn->dbi_state[dbi] & DBI_FRESH) {
parent->dbs[dbi] = txn->dbs[dbi];
/* preserve parent's status */
const uint8_t state = txn->dbi_state[dbi] | DBI_FRESH;
DEBUG("dbi %zu dbi-state %s 0x%02x -> 0x%02x", dbi, (parent->dbi_state[dbi] != state) ? "update" : "still",
parent->dbi_state[dbi], state);
parent->dbi_state[dbi] = state;
} }
}
txn_done_cursors(txn, true); latency_gcprof(latency, txn);
end_mode = TXN_END_PURE_COMMIT | TXN_END_SLOT | TXN_END_FREE | TXN_END_EOTDONE; rc = txn_nested_join(txn, latency ? &ts : nullptr);
goto done; goto done;
} }
/* Preserve space for spill list to avoid parent's state corruption rc = txn_basal_commit(txn, latency ? &ts : nullptr);
* if allocation fails. */ latency_gcprof(latency, txn);
const size_t parent_retired_len = (uintptr_t)parent->tw.retired_pages; int end = TXN_END_COMMITTED | TXN_END_UPDATE;
tASSERT(txn, parent_retired_len <= MDBX_PNL_GETSIZE(txn->tw.retired_pages));
const size_t retired_delta = MDBX_PNL_GETSIZE(txn->tw.retired_pages) - parent_retired_len;
if (retired_delta) {
rc = pnl_need(&txn->tw.repnl, retired_delta);
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
}
if (txn->tw.spilled.list) {
if (parent->tw.spilled.list) {
rc = pnl_need(&parent->tw.spilled.list, MDBX_PNL_GETSIZE(txn->tw.spilled.list));
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
}
spill_purge(txn);
}
if (unlikely(txn->tw.dirtylist->length + parent->tw.dirtylist->length > parent->tw.dirtylist->detent &&
!dpl_reserve(parent, txn->tw.dirtylist->length + parent->tw.dirtylist->length))) {
rc = MDBX_ENOMEM;
goto fail;
}
//-------------------------------------------------------------------------
parent->tw.gc.retxl = txn->tw.gc.retxl;
txn->tw.gc.retxl = nullptr;
parent->tw.retired_pages = txn->tw.retired_pages;
txn->tw.retired_pages = nullptr;
pnl_free(parent->tw.repnl);
parent->tw.repnl = txn->tw.repnl;
txn->tw.repnl = nullptr;
parent->tw.gc.time_acc = txn->tw.gc.time_acc;
parent->tw.gc.last_reclaimed = txn->tw.gc.last_reclaimed;
parent->geo = txn->geo;
parent->canary = txn->canary;
parent->flags |= txn->flags & MDBX_TXN_DIRTY;
/* Move loose pages to parent */
#if MDBX_ENABLE_REFUND
parent->tw.loose_refund_wl = txn->tw.loose_refund_wl;
#endif /* MDBX_ENABLE_REFUND */
parent->tw.loose_count = txn->tw.loose_count;
parent->tw.loose_pages = txn->tw.loose_pages;
/* Merge our cursors into parent's and close them */
txn_done_cursors(txn, true);
end_mode |= TXN_END_EOTDONE;
/* Update parent's DBs array */
eASSERT(env, parent->n_dbi == txn->n_dbi);
TXN_FOREACH_DBI_ALL(txn, dbi) {
if (txn->dbi_state[dbi] != (parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY))) {
eASSERT(env, (txn->dbi_state[dbi] & (DBI_CREAT | DBI_FRESH | DBI_DIRTY)) != 0 ||
(txn->dbi_state[dbi] | DBI_STALE) ==
(parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY)));
parent->dbs[dbi] = txn->dbs[dbi];
/* preserve parent's status */
const uint8_t state = txn->dbi_state[dbi] | (parent->dbi_state[dbi] & (DBI_CREAT | DBI_FRESH | DBI_DIRTY));
DEBUG("dbi %zu dbi-state %s 0x%02x -> 0x%02x", dbi, (parent->dbi_state[dbi] != state) ? "update" : "still",
parent->dbi_state[dbi], state);
parent->dbi_state[dbi] = state;
}
}
if (latency) {
ts_1 = osal_monotime();
ts_2 = /* no gc-update */ ts_1;
ts_3 = /* no audit */ ts_2;
ts_4 = /* no write */ ts_3;
ts_5 = /* no sync */ ts_4;
}
txn_merge(parent, txn, parent_retired_len);
env->txn = parent;
parent->nested = nullptr;
tASSERT(parent, dpl_check(parent));
#if MDBX_ENABLE_REFUND
txn_refund(parent);
if (ASSERT_ENABLED()) {
/* Check parent's loose pages not suitable for refund */
for (page_t *lp = parent->tw.loose_pages; lp; lp = page_next(lp)) {
tASSERT(parent, lp->pgno < parent->tw.loose_refund_wl && lp->pgno + 1 < parent->geo.first_unallocated);
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *));
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
}
/* Check parent's reclaimed pages not suitable for refund */
if (MDBX_PNL_GETSIZE(parent->tw.repnl))
tASSERT(parent, MDBX_PNL_MOST(parent->tw.repnl) + 1 < parent->geo.first_unallocated);
}
#endif /* MDBX_ENABLE_REFUND */
txn->signature = 0;
osal_free(txn);
tASSERT(parent, audit_ex(parent, 0, false) == 0);
rc = MDBX_SUCCESS;
goto provide_latency;
}
if (!txn->tw.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
} else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : env->options.dp_limit));
}
txn_done_cursors(txn, false);
end_mode |= TXN_END_EOTDONE;
if ((!txn->tw.dirtylist || txn->tw.dirtylist->length == 0) &&
(txn->flags & (MDBX_TXN_DIRTY | MDBX_TXN_SPILLS)) == 0) {
TXN_FOREACH_DBI_ALL(txn, i) { tASSERT(txn, !(txn->dbi_state[i] & DBI_DIRTY)); }
#if defined(MDBX_NOSUCCESS_EMPTY_COMMIT) && MDBX_NOSUCCESS_EMPTY_COMMIT
rc = txn_end(txn, end_mode);
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
rc = MDBX_RESULT_TRUE;
goto provide_latency;
#else
goto done;
#endif /* MDBX_NOSUCCESS_EMPTY_COMMIT */
}
DEBUG("committing txn %" PRIaTXN " %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid, (void *)txn,
(void *)env, txn->dbs[MAIN_DBI].root, txn->dbs[FREE_DBI].root);
if (txn->n_dbi > CORE_DBS) {
/* Update table root pointers */
cursor_couple_t cx;
rc = cursor_init(&cx.outer, txn, MAIN_DBI);
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
cx.outer.next = txn->cursors[MAIN_DBI];
txn->cursors[MAIN_DBI] = &cx.outer;
TXN_FOREACH_DBI_USER(txn, i) {
if ((txn->dbi_state[i] & DBI_DIRTY) == 0)
continue;
tree_t *const db = &txn->dbs[i];
DEBUG("update main's entry for sub-db %zu, mod_txnid %" PRIaTXN " -> %" PRIaTXN, i, db->mod_txnid, txn->txnid);
/* mod_txnid may be greater than front after committing nested transactions */
db->mod_txnid = txn->txnid;
MDBX_val data = {db, sizeof(tree_t)};
rc = cursor_put(&cx.outer, &env->kvs[i].name, &data, N_TREE);
if (unlikely(rc != MDBX_SUCCESS)) { if (unlikely(rc != MDBX_SUCCESS)) {
txn->cursors[MAIN_DBI] = cx.outer.next; end = TXN_END_ABORT;
goto fail; if (rc == MDBX_RESULT_TRUE) {
end = TXN_END_PURE_COMMIT | TXN_END_UPDATE;
rc = MDBX_NOSUCCESS_PURE_COMMIT ? MDBX_RESULT_TRUE : MDBX_SUCCESS;
} }
} }
txn->cursors[MAIN_DBI] = cx.outer.next; int err = txn_end(txn, end);
} if (unlikely(err != MDBX_SUCCESS))
rc = err;
ts_1 = latency ? osal_monotime() : 0;
gcu_t gcu_ctx;
gc_cputime = latency ? osal_cputime(nullptr) : 0;
rc = gc_update_init(txn, &gcu_ctx);
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
rc = gc_update(txn, &gcu_ctx);
gc_cputime = latency ? osal_cputime(nullptr) - gc_cputime : 0;
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
tASSERT(txn, txn->tw.loose_count == 0);
txn->dbs[FREE_DBI].mod_txnid = (txn->dbi_state[FREE_DBI] & DBI_DIRTY) ? txn->txnid : txn->dbs[FREE_DBI].mod_txnid;
txn->dbs[MAIN_DBI].mod_txnid = (txn->dbi_state[MAIN_DBI] & DBI_DIRTY) ? txn->txnid : txn->dbs[MAIN_DBI].mod_txnid;
ts_2 = latency ? osal_monotime() : 0;
ts_3 = ts_2;
if (AUDIT_ENABLED()) {
rc = audit_ex(txn, MDBX_PNL_GETSIZE(txn->tw.retired_pages), true);
ts_3 = osal_monotime();
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
}
bool need_flush_for_nometasync = false;
const meta_ptr_t head = meta_recent(env, &txn->tw.troika);
const uint32_t meta_sync_txnid = atomic_load32(&env->lck->meta_sync_txnid, mo_Relaxed);
/* sync prev meta */
if (head.is_steady && meta_sync_txnid != (uint32_t)head.txnid) {
/* Fix for a shortcoming inherited from LMDB:
 *
 * Everything is fine if none of the processes working with the DB use WRITEMAP.
 * Then the meta-page (updated but not yet flushed to disk) will be persisted
 * by the fdatasync() performed while writing this transaction's data.
 *
 * Everything is also fine if all processes working with the DB use WRITEMAP
 * without MDBX_AVOID_MSYNC.
 * Then the meta-page (updated but not yet flushed to disk) will be persisted
 * by the msync() performed while writing this transaction's data.
 *
 * But if the processes working with the DB mix both methods, i.e. both sync()
 * in MDBX_WRITEMAP mode and writes through a file descriptor, it becomes
 * impossible to guarantee that the previous transaction's meta-page and the
 * current transaction's data are fixed on disk by the single sync operation
 * performed after writing the current transaction's data.
 * Consequently, the meta-page must be updated explicitly, which completely
 * destroys the benefit of NOMETASYNC. */
const uint32_t txnid_dist = ((txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC) ? MDBX_NOMETASYNC_LAZY_FD
: MDBX_NOMETASYNC_LAZY_WRITEMAP;
/* The point of the "magic" is to avoid a separate fdatasync() or msync()
 * call for guaranteed on-disk fixation of the meta-page that was "lazily"
 * submitted for writing by the previous transaction but not flushed to disk
 * because of the active MDBX_NOMETASYNC mode. */
if (
#if defined(_WIN32) || defined(_WIN64)
!env->ioring.overlapped_fd &&
#endif
meta_sync_txnid == (uint32_t)head.txnid - txnid_dist)
need_flush_for_nometasync = true;
else {
rc = meta_sync(env, head);
if (unlikely(rc != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "presync-meta", rc);
goto fail;
}
}
}
if (txn->tw.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
tASSERT(txn, txn->tw.loose_count == 0);
mdbx_filehandle_t fd =
#if defined(_WIN32) || defined(_WIN64)
env->ioring.overlapped_fd ? env->ioring.overlapped_fd : env->lazy_fd;
(void)need_flush_for_nometasync;
#else
(need_flush_for_nometasync || env->dsync_fd == INVALID_HANDLE_VALUE ||
txn->tw.dirtylist->length > env->options.writethrough_threshold ||
atomic_load64(&env->lck->unsynced_pages, mo_Relaxed))
? env->lazy_fd
: env->dsync_fd;
#endif /* Windows */
iov_ctx_t write_ctx;
rc = iov_init(txn, &write_ctx, txn->tw.dirtylist->length, txn->tw.dirtylist->pages_including_loose, fd, false);
if (unlikely(rc != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "iov-init", rc);
goto fail;
}
rc = txn_write(txn, &write_ctx);
if (unlikely(rc != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "write", rc);
goto fail;
}
} else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
env->lck->unsynced_pages.weak += txn->tw.writemap_dirty_npages;
if (!env->lck->eoos_timestamp.weak)
env->lck->eoos_timestamp.weak = osal_monotime();
}
/* TODO: use ctx.flush_begin & ctx.flush_end for range-sync */
ts_4 = latency ? osal_monotime() : 0;
meta_t meta;
memcpy(meta.magic_and_version, head.ptr_c->magic_and_version, 8);
meta.reserve16 = head.ptr_c->reserve16;
meta.validator_id = head.ptr_c->validator_id;
meta.extra_pagehdr = head.ptr_c->extra_pagehdr;
unaligned_poke_u64(4, meta.pages_retired,
unaligned_peek_u64(4, head.ptr_c->pages_retired) + MDBX_PNL_GETSIZE(txn->tw.retired_pages));
meta.geometry = txn->geo;
meta.trees.gc = txn->dbs[FREE_DBI];
meta.trees.main = txn->dbs[MAIN_DBI];
meta.canary = txn->canary;
memcpy(&meta.dxbid, &head.ptr_c->dxbid, sizeof(meta.dxbid));
txnid_t commit_txnid = txn->txnid;
#if MDBX_ENABLE_BIGFOOT
if (gcu_ctx.bigfoot > txn->txnid) {
commit_txnid = gcu_ctx.bigfoot;
TRACE("use @%" PRIaTXN " (+%zu) for commit bigfoot-txn", commit_txnid, (size_t)(commit_txnid - txn->txnid));
}
#endif
meta.unsafe_sign = DATASIGN_NONE;
meta_set_txnid(env, &meta, commit_txnid);
rc = dxb_sync_locked(env, env->flags | txn->flags | txn_shrink_allowed, &meta, &txn->tw.troika);
ts_5 = latency ? osal_monotime() : 0;
if (unlikely(rc != MDBX_SUCCESS)) {
env->flags |= ENV_FATAL_ERROR;
ERROR("txn-%s: error %d", "sync", rc);
goto fail;
}
end_mode = TXN_END_COMMITTED | TXN_END_UPDATE | TXN_END_EOTDONE;
done: done:
if (latency) latency_done(latency, &ts);
txn_take_gcprof(txn, latency);
rc = txn_end(txn, end_mode);
provide_latency:
if (latency) {
latency->preparation = ts_1 ? osal_monotime_to_16dot16(ts_1 - ts_0) : 0;
latency->gc_wallclock = (ts_2 > ts_1) ? osal_monotime_to_16dot16(ts_2 - ts_1) : 0;
latency->gc_cputime = gc_cputime ? osal_monotime_to_16dot16(gc_cputime) : 0;
latency->audit = (ts_3 > ts_2) ? osal_monotime_to_16dot16(ts_3 - ts_2) : 0;
latency->write = (ts_4 > ts_3) ? osal_monotime_to_16dot16(ts_4 - ts_3) : 0;
latency->sync = (ts_5 > ts_4) ? osal_monotime_to_16dot16(ts_5 - ts_4) : 0;
const uint64_t ts_6 = osal_monotime();
latency->ending = ts_5 ? osal_monotime_to_16dot16(ts_6 - ts_5) : 0;
latency->whole = osal_monotime_to_16dot16_noUnderflow(ts_6 - ts_0);
}
return LOG_IFERR(rc); return LOG_IFERR(rc);
fail:
txn->flags |= MDBX_TXN_ERROR;
if (latency)
txn_take_gcprof(txn, latency);
txn_abort(txn);
goto provide_latency;
} }
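After the rework, mdbx_txn_commit_ex() keeps its public contract: pass an MDBX_commit_latency pointer to receive the per-stage timings, and treat MDBX_RESULT_TRUE as a non-failure outcome (the code above returns it, for example, for a no-op "pure" commit when MDBX_NOSUCCESS_PURE_COMMIT is enabled). A hedged caller-side sketch, assuming 'txn' is an open write transaction and <stdio.h> is included:

MDBX_commit_latency latency;
int rc = mdbx_txn_commit_ex(txn, &latency);
if (rc == MDBX_SUCCESS || rc == MDBX_RESULT_TRUE) {
  /* Latency fields are 16.16 fixed-point seconds. */
  printf("gc %.3fs, write %.3fs, sync %.3fs, whole %.3fs\n",
         latency.gc_wallclock / 65536.0, latency.write / 65536.0,
         latency.sync / 65536.0, latency.whole / 65536.0);
} else {
  /* On the error paths above the transaction has already been aborted/ended. */
  fprintf(stderr, "commit failed: %s\n", mdbx_strerror(rc));
}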
int mdbx_txn_info(const MDBX_txn *txn, MDBX_txn_info *info, bool scan_rlt) { int mdbx_txn_info(const MDBX_txn *txn, MDBX_txn_info *info, bool scan_rlt) {
@ -848,10 +465,10 @@ int mdbx_txn_info(const MDBX_txn *txn, MDBX_txn_info *info, bool scan_rlt) {
info->txn_reader_lag = head.txnid - info->txn_id; info->txn_reader_lag = head.txnid - info->txn_id;
info->txn_space_dirty = info->txn_space_retired = 0; info->txn_space_dirty = info->txn_space_retired = 0;
uint64_t reader_snapshot_pages_retired = 0; uint64_t reader_snapshot_pages_retired = 0;
if (txn->to.reader && if (txn->ro.slot &&
((txn->flags & MDBX_TXN_PARKED) == 0 || safe64_read(&txn->to.reader->tid) != MDBX_TID_TXN_OUSTED) && ((txn->flags & MDBX_TXN_PARKED) == 0 || safe64_read(&txn->ro.slot->tid) != MDBX_TID_TXN_OUSTED) &&
head_retired > head_retired >
(reader_snapshot_pages_retired = atomic_load64(&txn->to.reader->snapshot_pages_retired, mo_Relaxed))) { (reader_snapshot_pages_retired = atomic_load64(&txn->ro.slot->snapshot_pages_retired, mo_Relaxed))) {
info->txn_space_dirty = info->txn_space_retired = info->txn_space_dirty = info->txn_space_retired =
pgno2bytes(env, (pgno_t)(head_retired - reader_snapshot_pages_retired)); pgno2bytes(env, (pgno_t)(head_retired - reader_snapshot_pages_retired));
@ -878,7 +495,7 @@ int mdbx_txn_info(const MDBX_txn *txn, MDBX_txn_info *info, bool scan_rlt) {
if (snap_txnid < next_reader && snap_tid >= MDBX_TID_TXN_OUSTED) { if (snap_txnid < next_reader && snap_tid >= MDBX_TID_TXN_OUSTED) {
next_reader = snap_txnid; next_reader = snap_txnid;
retired_next_reader = pgno2bytes( retired_next_reader = pgno2bytes(
env, (pgno_t)(snap_retired - atomic_load64(&txn->to.reader->snapshot_pages_retired, mo_Relaxed))); env, (pgno_t)(snap_retired - atomic_load64(&txn->ro.slot->snapshot_pages_retired, mo_Relaxed)));
} }
} }
} }
@ -889,31 +506,33 @@ int mdbx_txn_info(const MDBX_txn *txn, MDBX_txn_info *info, bool scan_rlt) {
info->txn_space_limit_soft = pgno2bytes(env, txn->geo.now); info->txn_space_limit_soft = pgno2bytes(env, txn->geo.now);
info->txn_space_limit_hard = pgno2bytes(env, txn->geo.upper); info->txn_space_limit_hard = pgno2bytes(env, txn->geo.upper);
info->txn_space_retired = info->txn_space_retired =
pgno2bytes(env, txn->nested ? (size_t)txn->tw.retired_pages : MDBX_PNL_GETSIZE(txn->tw.retired_pages)); pgno2bytes(env, txn->nested ? (size_t)txn->wr.retired_pages : pnl_size(txn->wr.retired_pages));
info->txn_space_leftover = pgno2bytes(env, txn->tw.dirtyroom); info->txn_space_leftover = pgno2bytes(env, txn->wr.dirtyroom);
info->txn_space_dirty = info->txn_space_dirty =
pgno2bytes(env, txn->tw.dirtylist ? txn->tw.dirtylist->pages_including_loose pgno2bytes(env, txn->wr.dirtylist ? txn->wr.dirtylist->pages_including_loose
: (txn->tw.writemap_dirty_npages + txn->tw.writemap_spilled_npages)); : (txn->wr.writemap_dirty_npages + txn->wr.writemap_spilled_npages));
info->txn_reader_lag = INT64_MAX; info->txn_reader_lag = INT64_MAX;
lck_t *const lck = env->lck_mmap.lck; lck_t *const lck = env->lck_mmap.lck;
if (scan_rlt && lck) { if (scan_rlt && lck) {
txnid_t oldest_snapshot = txn->txnid; txnid_t oldest_reading = txn->txnid;
const size_t snap_nreaders = atomic_load32(&lck->rdt_length, mo_AcquireRelease); const size_t snap_nreaders = atomic_load32(&lck->rdt_length, mo_AcquireRelease);
if (snap_nreaders) { if (snap_nreaders) {
oldest_snapshot = txn_snapshot_oldest(txn); txn_gc_detent(txn);
if (oldest_snapshot == txn->txnid - 1) { oldest_reading = txn->env->gc.detent;
/* check if there is at least one reader */ if (oldest_reading == txn->wr.troika.txnid[txn->wr.troika.recent]) {
bool exists = false; /* If the oldest snapshot in use is the previous one, i.e. the one immediately
 * preceding the current transaction, scan the reader table to find out whether
 * the snapshot is actually still in use by readers. */
oldest_reading = txn->txnid;
for (size_t i = 0; i < snap_nreaders; ++i) { for (size_t i = 0; i < snap_nreaders; ++i) {
if (atomic_load32(&lck->rdt[i].pid, mo_Relaxed) && txn->txnid > safe64_read(&lck->rdt[i].txnid)) { if (atomic_load32(&lck->rdt[i].pid, mo_Relaxed) && txn->env->gc.detent == safe64_read(&lck->rdt[i].txnid)) {
exists = true; oldest_reading = txn->env->gc.detent;
break; break;
} }
} }
oldest_snapshot += !exists;
} }
} }
info->txn_reader_lag = txn->txnid - oldest_snapshot; info->txn_reader_lag = txn->txnid - oldest_reading;
} }
} }
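The mdbx_txn_info() hunk above changes how txn_reader_lag and the space counters are derived, but the call itself is unchanged. A brief usage sketch (assumes <inttypes.h> and an open transaction 'txn'; scan_rlt = true requests the more expensive scan of the reader lock-table):

MDBX_txn_info info;
int err = mdbx_txn_info(txn, &info, /* scan_rlt */ true);
if (err == MDBX_SUCCESS)
  printf("txn %" PRIu64 ": reader lag %" PRIu64 ", dirty %" PRIu64 " bytes\n",
         info.txn_id, info.txn_reader_lag, info.txn_space_dirty);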


@ -24,12 +24,11 @@ static size_t audit_db_used(const tree_t *db) {
return db ? (size_t)db->branch_pages + (size_t)db->leaf_pages + (size_t)db->large_pages : 0; return db ? (size_t)db->branch_pages + (size_t)db->leaf_pages + (size_t)db->large_pages : 0;
} }
__cold static int audit_ex_locked(MDBX_txn *txn, size_t retired_stored, bool dont_filter_gc) { __cold static int audit_ex_locked(MDBX_txn *txn, const size_t retired_stored, const bool dont_filter_gc) {
const MDBX_env *const env = txn->env; const MDBX_env *const env = txn->env;
size_t pending = 0; tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
if ((txn->flags & MDBX_TXN_RDONLY) == 0) const size_t pending =
pending = txn->tw.loose_count + MDBX_PNL_GETSIZE(txn->tw.repnl) + txn->wr.loose_count + pnl_size(txn->wr.repnl) + (pnl_size(txn->wr.retired_pages) - retired_stored);
(MDBX_PNL_GETSIZE(txn->tw.retired_pages) - retired_stored);
cursor_couple_t cx; cursor_couple_t cx;
int rc = cursor_init(&cx.outer, txn, FREE_DBI); int rc = cursor_init(&cx.outer, txn, FREE_DBI);
@ -40,17 +39,16 @@ __cold static int audit_ex_locked(MDBX_txn *txn, size_t retired_stored, bool don
MDBX_val key, data; MDBX_val key, data;
rc = outer_first(&cx.outer, &key, &data); rc = outer_first(&cx.outer, &key, &data);
while (rc == MDBX_SUCCESS) { while (rc == MDBX_SUCCESS) {
if (!dont_filter_gc) {
if (unlikely(key.iov_len != sizeof(txnid_t))) { if (unlikely(key.iov_len != sizeof(txnid_t))) {
ERROR("%s/%d: %s %u", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid GC-key size", (unsigned)key.iov_len); ERROR("%s/%d: %s %u", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid GC-key size", (unsigned)key.iov_len);
return MDBX_CORRUPTED; return MDBX_CORRUPTED;
} }
txnid_t id = unaligned_peek_u64(4, key.iov_base); const txnid_t id = unaligned_peek_u64(4, key.iov_base);
if (txn->tw.gc.retxl ? txl_contain(txn->tw.gc.retxl, id) : (id <= txn->tw.gc.last_reclaimed)) const size_t len = *(pgno_t *)data.iov_base;
goto skip; const bool acc = dont_filter_gc || !gc_is_reclaimed(txn, id);
} TRACE("%s id %" PRIaTXN " len %zu", acc ? "acc" : "skip", id, len);
gc += *(pgno_t *)data.iov_base; if (acc)
skip: gc += len;
rc = outer_next(&cx.outer, &key, &data, MDBX_NEXT); rc = outer_next(&cx.outer, &key, &data, MDBX_NEXT);
} }
tASSERT(txn, rc == MDBX_NOTFOUND); tASSERT(txn, rc == MDBX_NOTFOUND);
@ -89,8 +87,8 @@ __cold static int audit_ex_locked(MDBX_txn *txn, size_t retired_stored, bool don
if ((txn->flags & MDBX_TXN_RDONLY) == 0) if ((txn->flags & MDBX_TXN_RDONLY) == 0)
ERROR("audit @%" PRIaTXN ": %zu(pending) = %zu(loose) + " ERROR("audit @%" PRIaTXN ": %zu(pending) = %zu(loose) + "
"%zu(reclaimed) + %zu(retired-pending) - %zu(retired-stored)", "%zu(reclaimed) + %zu(retired-pending) - %zu(retired-stored)",
txn->txnid, pending, txn->tw.loose_count, MDBX_PNL_GETSIZE(txn->tw.repnl), txn->txnid, pending, txn->wr.loose_count, pnl_size(txn->wr.repnl),
txn->tw.retired_pages ? MDBX_PNL_GETSIZE(txn->tw.retired_pages) : 0, retired_stored); txn->wr.retired_pages ? pnl_size(txn->wr.retired_pages) : 0, retired_stored);
ERROR("audit @%" PRIaTXN ": %zu(pending) + %zu" ERROR("audit @%" PRIaTXN ": %zu(pending) + %zu"
"(gc) + %zu(count) = %zu(total) <> %zu" "(gc) + %zu(count) = %zu(total) <> %zu"
"(allocated)", "(allocated)",


@ -8,7 +8,7 @@ N | MASK | ENV | TXN | DB | PUT | DBI | NOD
5 |0000 0020| |TXN_PARKED |INTEGERDUP|NODUPDATA | | |P_DUPFIX | | 5 |0000 0020| |TXN_PARKED |INTEGERDUP|NODUPDATA | | |P_DUPFIX | |
6 |0000 0040| |TXN_AUTOUNPARK|REVERSEDUP|CURRENT |DBI_OLDEN | |P_SUBP | | 6 |0000 0040| |TXN_AUTOUNPARK|REVERSEDUP|CURRENT |DBI_OLDEN | |P_SUBP | |
7 |0000 0080| |TXN_DRAINED_GC|DB_VALID |ALLDUPS |DBI_LINDO | | | | 7 |0000 0080| |TXN_DRAINED_GC|DB_VALID |ALLDUPS |DBI_LINDO | | | |
8 |0000 0100| _MAY_MOVE | | | | | | | <= | 8 |0000 0100| _MAY_MOVE |TXN_CURSORS | | | | | | <= |
9 |0000 0200| _MAY_UNMAP| | | | | | | <= | 9 |0000 0200| _MAY_UNMAP| | | | | | | <= |
10|0000 0400| | | | | | | | | 10|0000 0400| | | | | | | | |
11|0000 0800| | | | | | | | | 11|0000 0800| | | | | | | | |

src/chk.c

@ -159,6 +159,19 @@ __cold static MDBX_chk_line_t *MDBX_PRINTF_ARGS(2, 3) chk_print(MDBX_chk_line_t
return line; return line;
} }
MDBX_MAYBE_UNUSED __cold static void chk_println_va(MDBX_chk_scope_t *const scope, enum MDBX_chk_severity severity,
const char *fmt, va_list args) {
chk_line_end(chk_print_va(chk_line_begin(scope, severity), fmt, args));
}
MDBX_MAYBE_UNUSED __cold static void chk_println(MDBX_chk_scope_t *const scope, enum MDBX_chk_severity severity,
const char *fmt, ...) {
va_list args;
va_start(args, fmt);
chk_println_va(scope, severity, fmt, args);
va_end(args);
}
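The new chk_println_va()/chk_println() helpers simply combine chk_line_begin(), chk_print_va() and chk_line_end() into a single call. A hypothetical call site (the 'scope' variable and the message are illustrative, not taken from the patch):

chk_println(scope, MDBX_chk_info, "processed %u of %u tables", 3u, 7u);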
__cold static MDBX_chk_line_t *chk_print_size(MDBX_chk_line_t *line, const char *prefix, const uint64_t value, __cold static MDBX_chk_line_t *chk_print_size(MDBX_chk_line_t *line, const char *prefix, const uint64_t value,
const char *suffix) { const char *suffix) {
static const char sf[] = "KMGTPEZY"; /* LY: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta! */ static const char sf[] = "KMGTPEZY"; /* LY: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta! */
@ -213,7 +226,7 @@ __cold static void MDBX_PRINTF_ARGS(5, 6)
issue->next = chk->usr->scope->issues; issue->next = chk->usr->scope->issues;
chk->usr->scope->issues = issue; chk->usr->scope->issues = issue;
} else } else
chk_error_rc(scope, ENOMEM, "adding issue"); chk_error_rc(scope, MDBX_ENOMEM, "adding issue");
} }
va_list args; va_list args;
@ -455,11 +468,10 @@ __cold static void chk_dispose(MDBX_chk_internal_t *chk) {
chk->cb->table_dispose(chk->usr, tbl); chk->cb->table_dispose(chk->usr, tbl);
tbl->cookie = nullptr; tbl->cookie = nullptr;
} }
if (tbl != &chk->table_gc && tbl != &chk->table_main) { if (tbl != &chk->table_gc && tbl != &chk->table_main)
osal_free(tbl); osal_free(tbl);
} }
} }
}
osal_free(chk->v2a_buf.iov_base); osal_free(chk->v2a_buf.iov_base);
osal_free(chk->pagemap); osal_free(chk->pagemap);
chk->usr->internal = nullptr; chk->usr->internal = nullptr;
@ -674,7 +686,7 @@ __cold static void chk_verbose_meta(MDBX_chk_scope_t *const scope, const unsigne
__cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *const ctx, const int deep, __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *const ctx, const int deep,
const walk_tbl_t *tbl_info, const size_t page_size, const page_type_t pagetype, const walk_tbl_t *tbl_info, const size_t page_size, const page_type_t pagetype,
const MDBX_error_t page_err, const size_t nentries, const size_t payload_bytes, const MDBX_error_t page_err, const size_t nentries, const size_t payload_bytes,
const size_t header_bytes, const size_t unused_bytes) { const size_t header_bytes, const size_t unused_bytes, const size_t parent_pgno) {
MDBX_chk_scope_t *const scope = ctx; MDBX_chk_scope_t *const scope = ctx;
MDBX_chk_internal_t *const chk = scope->internal; MDBX_chk_internal_t *const chk = scope->internal;
MDBX_chk_context_t *const usr = chk->usr; MDBX_chk_context_t *const usr = chk->usr;
@ -686,7 +698,7 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
return err; return err;
if (deep > 42) { if (deep > 42) {
chk_scope_issue(scope, "too deeply %u", deep); chk_scope_issue(scope, "too deeply %u, page %zu, parent %zu", deep, pgno, parent_pgno);
return MDBX_CORRUPTED /* avoid infinite loop/recursion */; return MDBX_CORRUPTED /* avoid infinite loop/recursion */;
} }
histogram_acc(deep, &tbl->histogram.deep); histogram_acc(deep, &tbl->histogram.deep);
@ -710,9 +722,11 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
const char *pagetype_caption; const char *pagetype_caption;
bool branch = false; bool branch = false;
struct MDBX_chk_histogram *filling = nullptr;
switch (pagetype) { switch (pagetype) {
default: default:
chk_object_issue(scope, "page", pgno, "unknown page-type", "type %u, deep %i", (unsigned)pagetype, deep); chk_object_issue(scope, "page", pgno, "unknown page-type", "type %u, deep %i, parent %zu", (unsigned)pagetype, deep,
parent_pgno);
pagetype_caption = "unknown"; pagetype_caption = "unknown";
tbl->pages.other += npages; tbl->pages.other += npages;
break; break;
@ -730,42 +744,46 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
pagetype_caption = "large"; pagetype_caption = "large";
histogram_acc(npages, &tbl->histogram.large_pages); histogram_acc(npages, &tbl->histogram.large_pages);
if (tbl->flags & MDBX_DUPSORT) if (tbl->flags & MDBX_DUPSORT)
chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i", (unsigned)pagetype, chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i, parent %zu",
chk_v2a(chk, &tbl->name), tbl->flags, deep); (unsigned)pagetype, chk_v2a(chk, &tbl->name), tbl->flags, deep, parent_pgno);
break; break;
case page_branch: case page_branch:
branch = true; branch = true;
if (!nested) { if (!nested) {
pagetype_caption = "branch"; pagetype_caption = "branch";
tbl->pages.branch += 1; tbl->pages.branch += 1;
filling = &tbl->histogram.tree_filling;
} else { } else {
pagetype_caption = "nested-branch"; pagetype_caption = "nested-branch";
tbl->pages.nested_branch += 1; tbl->pages.nested_branch += 1;
filling = &tbl->histogram.nested_tree_filling;
} }
break; break;
case page_dupfix_leaf: case page_dupfix_leaf:
if (!nested) if (!nested)
chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i", (unsigned)pagetype, chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i, parent %zu",
chk_v2a(chk, &tbl->name), tbl->flags, deep); (unsigned)pagetype, chk_v2a(chk, &tbl->name), tbl->flags, deep, parent_pgno);
/* fall through */ /* fall through */
__fallthrough; __fallthrough;
case page_leaf: case page_leaf:
if (!nested) { if (!nested) {
pagetype_caption = "leaf"; pagetype_caption = "leaf";
tbl->pages.leaf += 1; tbl->pages.leaf += 1;
filling = &tbl->histogram.tree_filling;
if (height != tbl_info->internal->height) if (height != tbl_info->internal->height)
chk_object_issue(scope, "page", pgno, "wrong tree height", "actual %i != %i table %s", height, chk_object_issue(scope, "page", pgno, "wrong tree height", "actual %i != %i table %s, parent %zu", height,
tbl_info->internal->height, chk_v2a(chk, &tbl->name)); tbl_info->internal->height, chk_v2a(chk, &tbl->name), parent_pgno);
} else { } else {
pagetype_caption = (pagetype == page_leaf) ? "nested-leaf" : "nested-leaf-dupfix"; pagetype_caption = (pagetype == page_leaf) ? "nested-leaf" : "nested-leaf-dupfix";
tbl->pages.nested_leaf += 1; tbl->pages.nested_leaf += 1;
filling = &tbl->histogram.nested_tree_filling;
if (chk->last_nested != nested) { if (chk->last_nested != nested) {
histogram_acc(height, &tbl->histogram.nested_tree); histogram_acc(height, &tbl->histogram.nested_tree);
chk->last_nested = nested; chk->last_nested = nested;
} }
if (height != nested->height) if (height != nested->height)
chk_object_issue(scope, "page", pgno, "wrong nested-tree height", "actual %i != %i dupsort-node %s", height, chk_object_issue(scope, "page", pgno, "wrong nested-tree height", "actual %i != %i dupsort-node %s, parent %zu",
nested->height, chk_v2a(chk, &tbl->name)); height, nested->height, chk_v2a(chk, &tbl->name), parent_pgno);
} }
break; break;
case page_sub_dupfix_leaf: case page_sub_dupfix_leaf:
@ -773,11 +791,16 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
pagetype_caption = (pagetype == page_sub_leaf) ? "subleaf-dupsort" : "subleaf-dupfix"; pagetype_caption = (pagetype == page_sub_leaf) ? "subleaf-dupsort" : "subleaf-dupfix";
tbl->pages.nested_subleaf += 1; tbl->pages.nested_subleaf += 1;
if ((tbl->flags & MDBX_DUPSORT) == 0 || nested) if ((tbl->flags & MDBX_DUPSORT) == 0 || nested)
chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i", (unsigned)pagetype, chk_object_issue(scope, "page", pgno, "unexpected", "type %u, table %s flags 0x%x, deep %i, parent %zu",
chk_v2a(chk, &tbl->name), tbl->flags, deep); (unsigned)pagetype, chk_v2a(chk, &tbl->name), tbl->flags, deep, parent_pgno);
else
filling = &tbl->histogram.nested_tree_filling;
break; break;
} }
if (filling)
histogram_acc((page_size - unused_bytes) * 100 / page_size, filling);
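/* Illustrative arithmetic (assumed numbers): with page_size = 4096 and
 * unused_bytes = 1024 the page contributes (4096 - 1024) * 100 / 4096 = 75,
 * i.e. it is accounted as 75% filled. */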
if (npages) { if (npages) {
if (tbl->cookie) { if (tbl->cookie) {
MDBX_chk_line_t *line = chk_line_begin(scope, MDBX_chk_extra); MDBX_chk_line_t *line = chk_line_begin(scope, MDBX_chk_extra);
@ -801,7 +824,8 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
} else if (chk->pagemap[spanpgno]) { } else if (chk->pagemap[spanpgno]) {
const MDBX_chk_table_t *const rival = chk->table[chk->pagemap[spanpgno] - 1]; const MDBX_chk_table_t *const rival = chk->table[chk->pagemap[spanpgno] - 1];
chk_object_issue(scope, "page", spanpgno, (branch && rival == tbl) ? "loop" : "already used", chk_object_issue(scope, "page", spanpgno, (branch && rival == tbl) ? "loop" : "already used",
"%s-page: by %s, deep %i", pagetype_caption, chk_v2a(chk, &rival->name), deep); "%s-page: by %s, deep %i, parent %zu", pagetype_caption, chk_v2a(chk, &rival->name), deep,
parent_pgno);
already_used = true; already_used = true;
} else { } else {
chk->pagemap[spanpgno] = (int16_t)tbl->id + 1; chk->pagemap[spanpgno] = (int16_t)tbl->id + 1;
@ -815,21 +839,21 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
} }
if (MDBX_IS_ERROR(page_err)) { if (MDBX_IS_ERROR(page_err)) {
chk_object_issue(scope, "page", pgno, "invalid/corrupted", "%s-page", pagetype_caption); chk_object_issue(scope, "page", pgno, "invalid/corrupted", "%s-page, parent %zu", pagetype_caption, parent_pgno);
} else { } else {
if (unused_bytes > page_size) if (unused_bytes > page_size)
chk_object_issue(scope, "page", pgno, "illegal unused-bytes", "%s-page: %u < %" PRIuSIZE " < %u", chk_object_issue(scope, "page", pgno, "illegal unused-bytes", "%s-page: %u < %" PRIuSIZE " < %u, parent %zu",
pagetype_caption, 0, unused_bytes, env->ps); pagetype_caption, 0, unused_bytes, env->ps, parent_pgno);
if (header_bytes < (int)sizeof(long) || (size_t)header_bytes >= env->ps - sizeof(long)) { if (header_bytes < (int)sizeof(long) || (size_t)header_bytes >= env->ps - sizeof(long)) {
chk_object_issue(scope, "page", pgno, "illegal header-length", chk_object_issue(scope, "page", pgno, "illegal header-length",
"%s-page: %" PRIuSIZE " < %" PRIuSIZE " < %" PRIuSIZE, pagetype_caption, sizeof(long), "%s-page: %" PRIuSIZE " < %" PRIuSIZE " < %" PRIuSIZE ", parent %zu", pagetype_caption,
header_bytes, env->ps - sizeof(long)); sizeof(long), header_bytes, env->ps - sizeof(long), parent_pgno);
} }
if (nentries < 1 || (pagetype == page_branch && nentries < 2)) { if (nentries < 1 || (pagetype == page_branch && nentries < 2)) {
chk_object_issue(scope, "page", pgno, nentries ? "half-empty" : "empty", chk_object_issue(scope, "page", pgno, nentries ? "half-empty" : "empty",
"%s-page: payload %" PRIuSIZE " bytes, %" PRIuSIZE " entries, deep %i", pagetype_caption, "%s-page: payload %" PRIuSIZE " bytes, %" PRIuSIZE " entries, deep %i, parent %zu",
payload_bytes, nentries, deep); pagetype_caption, payload_bytes, nentries, deep, parent_pgno);
tbl->pages.empty += 1; tbl->pages.empty += 1;
} }
@ -837,8 +861,9 @@ __cold static int chk_pgvisitor(const size_t pgno, const unsigned npages, void *
if (page_bytes != page_size) { if (page_bytes != page_size) {
chk_object_issue(scope, "page", pgno, "misused", chk_object_issue(scope, "page", pgno, "misused",
"%s-page: %" PRIuPTR " != %" PRIuPTR " (%" PRIuPTR "h + %" PRIuPTR "p + %" PRIuPTR "%s-page: %" PRIuPTR " != %" PRIuPTR " (%" PRIuPTR "h + %" PRIuPTR "p + %" PRIuPTR
"u), deep %i", "u), deep %i, parent %zu",
pagetype_caption, page_size, page_bytes, header_bytes, payload_bytes, unused_bytes, deep); pagetype_caption, page_size, page_bytes, header_bytes, payload_bytes, unused_bytes, deep,
parent_pgno);
if (page_size > page_bytes) if (page_size > page_bytes)
tbl->lost_bytes += page_size - page_bytes; tbl->lost_bytes += page_size - page_bytes;
} else { } else {
@ -950,6 +975,12 @@ __cold static int chk_tree(MDBX_chk_scope_t *const scope) {
line = chk_print(line, ", %" PRIuSIZE " empty pages", tbl->pages.empty); line = chk_print(line, ", %" PRIuSIZE " empty pages", tbl->pages.empty);
if (tbl->lost_bytes) if (tbl->lost_bytes)
line = chk_print(line, ", %" PRIuSIZE " bytes lost", tbl->lost_bytes); line = chk_print(line, ", %" PRIuSIZE " bytes lost", tbl->lost_bytes);
line =
histogram_dist(chk_line_feed(line), &tbl->histogram.tree_filling, "tree %-filling density", "1", false);
if (tbl->histogram.nested_tree_filling.count)
line = histogram_dist(chk_line_feed(line), &tbl->histogram.nested_tree_filling,
"nested tree(s) %-filling density", "1", false);
chk_line_end(line); chk_line_end(line);
} }
} }
@ -1127,6 +1158,7 @@ __cold static int chk_db(MDBX_chk_scope_t *const scope, MDBX_dbi dbi, MDBX_chk_t
const size_t maxkeysize = mdbx_env_get_maxkeysize_ex(env, tbl->flags); const size_t maxkeysize = mdbx_env_get_maxkeysize_ex(env, tbl->flags);
MDBX_val prev_key = {nullptr, 0}, prev_data = {nullptr, 0}; MDBX_val prev_key = {nullptr, 0}, prev_data = {nullptr, 0};
MDBX_val key, data; MDBX_val key, data;
size_t dups_count = 0;
err = mdbx_cursor_get(cursor, &key, &data, MDBX_FIRST); err = mdbx_cursor_get(cursor, &key, &data, MDBX_FIRST);
while (err == MDBX_SUCCESS) { while (err == MDBX_SUCCESS) {
err = chk_check_break(scope); err = chk_check_break(scope);
@ -1150,6 +1182,12 @@ __cold static int chk_db(MDBX_chk_scope_t *const scope, MDBX_dbi dbi, MDBX_chk_t
} }
if (prev_key.iov_base) { if (prev_key.iov_base) {
if (key.iov_base == prev_key.iov_base)
dups_count += 1;
else {
histogram_acc(dups_count, &tbl->histogram.multival);
dups_count = 0;
}
if (prev_data.iov_base && !bad_data && (tbl->flags & MDBX_DUPFIXED) && prev_data.iov_len != data.iov_len) { if (prev_data.iov_base && !bad_data && (tbl->flags & MDBX_DUPFIXED) && prev_data.iov_len != data.iov_len) {
chk_object_issue(scope, "entry", record_count, "different data length", "%" PRIuPTR " != %" PRIuPTR, chk_object_issue(scope, "entry", record_count, "different data length", "%" PRIuPTR " != %" PRIuPTR,
prev_data.iov_len, data.iov_len); prev_data.iov_len, data.iov_len);
@ -1236,17 +1274,27 @@ __cold static int chk_db(MDBX_chk_scope_t *const scope, MDBX_dbi dbi, MDBX_chk_t
err = mdbx_cursor_get(cursor, &key, &data, MDBX_NEXT); err = mdbx_cursor_get(cursor, &key, &data, MDBX_NEXT);
} }
if (prev_key.iov_base)
histogram_acc(dups_count, &tbl->histogram.multival);
err = (err != MDBX_NOTFOUND) ? chk_error_rc(scope, err, "mdbx_cursor_get") : MDBX_SUCCESS; err = (err != MDBX_NOTFOUND) ? chk_error_rc(scope, err, "mdbx_cursor_get") : MDBX_SUCCESS;
if (err == MDBX_SUCCESS && record_count != db->items) if (err == MDBX_SUCCESS && record_count != db->items)
chk_scope_issue(scope, "different number of entries %" PRIuSIZE " != %" PRIu64, record_count, db->items); chk_scope_issue(scope, "different number of entries %" PRIuSIZE " != %" PRIu64, record_count, db->items);
bailout: bailout:
if (cursor) { if (cursor) {
if (handler) { if (handler) {
if (tbl->histogram.key_len.count) { if (record_count) {
MDBX_chk_line_t *line = chk_line_begin(scope, MDBX_chk_info); MDBX_chk_line_t *line = chk_line_begin(scope, MDBX_chk_info);
line = histogram_dist(line, &tbl->histogram.key_len, "key length density", "0/1", false); line = histogram_dist(line, &tbl->histogram.key_len, "key length density", "0/1", false);
chk_line_feed(line); chk_line_feed(line);
line = histogram_dist(line, &tbl->histogram.val_len, "value length density", "0/1", false); line = histogram_dist(line, &tbl->histogram.val_len, "value length density", "0/1", false);
if (tbl->histogram.multival.amount) {
chk_line_feed(line);
line = histogram_dist(line, &tbl->histogram.multival, "number of multi-values density", "single", false);
chk_line_feed(line);
line = chk_print(line, "number of keys %" PRIuSIZE ", average values per key %.1f",
tbl->histogram.multival.count, record_count / (double)tbl->histogram.multival.count);
}
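/* Illustrative arithmetic (assumed numbers): 1000 records over 250 distinct keys
 * would be reported as "number of keys 250, average values per key 4.0". */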
chk_line_end(line); chk_line_end(line);
} }
if (scope->stage == MDBX_chk_maindb) if (scope->stage == MDBX_chk_maindb)
@ -1301,9 +1349,9 @@ __cold static int chk_handle_gc(MDBX_chk_scope_t *const scope, MDBX_chk_table_t
(number + 1) * sizeof(pgno_t), data->iov_len); (number + 1) * sizeof(pgno_t), data->iov_len);
number = data->iov_len / sizeof(pgno_t) - 1; number = data->iov_len / sizeof(pgno_t) - 1;
} else if (data->iov_len - (number + 1) * sizeof(pgno_t) >= } else if (data->iov_len - (number + 1) * sizeof(pgno_t) >=
/* LY: allow gap up to one page. it is ok /* LY: allow gap up to two pages. it is ok
 * and better than shrink-and-retry inside gc_update() */ * and better than shrink-and-retry inside gc_update() */
usr->env->ps) usr->env->ps * 2)
chk_object_issue(scope, "entry", txnid, "extra idl space", chk_object_issue(scope, "entry", txnid, "extra idl space",
"%" PRIuSIZE " < %" PRIuSIZE " (minor, not a trouble)", (number + 1) * sizeof(pgno_t), "%" PRIuSIZE " < %" PRIuSIZE " (minor, not a trouble)", (number + 1) * sizeof(pgno_t),
data->iov_len); data->iov_len);


@ -250,9 +250,15 @@ MDBX_NOTHROW_PURE_FUNCTION static inline const page_t *data_page(const void *dat
MDBX_NOTHROW_PURE_FUNCTION static inline meta_t *page_meta(page_t *mp) { return (meta_t *)page_data(mp); } MDBX_NOTHROW_PURE_FUNCTION static inline meta_t *page_meta(page_t *mp) { return (meta_t *)page_data(mp); }
MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_numkeys(const page_t *mp) { return mp->lower >> 1; } MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_numkeys(const page_t *mp) {
assert(mp->lower <= mp->upper);
return mp->lower >> 1;
}
MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_room(const page_t *mp) { return mp->upper - mp->lower; } MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_room(const page_t *mp) {
assert(mp->lower <= mp->upper);
return mp->upper - mp->lower;
}
MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_space(const MDBX_env *env) { MDBX_NOTHROW_PURE_FUNCTION static inline size_t page_space(const MDBX_env *env) {
STATIC_ASSERT(PAGEHDRSZ % 2 == 0); STATIC_ASSERT(PAGEHDRSZ % 2 == 0);
@ -352,7 +358,7 @@ MDBX_CONST_FUNCTION static inline lck_t *lckless_stub(const MDBX_env *env) {
} }
#if !(defined(_WIN32) || defined(_WIN64)) #if !(defined(_WIN32) || defined(_WIN64))
MDBX_MAYBE_UNUSED static inline int ignore_enosys(int err) { MDBX_CONST_FUNCTION static inline int ignore_enosys(int err) {
#ifdef ENOSYS #ifdef ENOSYS
if (err == ENOSYS) if (err == ENOSYS)
return MDBX_RESULT_TRUE; return MDBX_RESULT_TRUE;
@ -373,10 +379,21 @@ MDBX_MAYBE_UNUSED static inline int ignore_enosys(int err) {
if (err == EOPNOTSUPP) if (err == EOPNOTSUPP)
return MDBX_RESULT_TRUE; return MDBX_RESULT_TRUE;
#endif /* EOPNOTSUPP */ #endif /* EOPNOTSUPP */
if (err == EAGAIN)
return MDBX_RESULT_TRUE;
return err; return err;
} }
MDBX_MAYBE_UNUSED MDBX_CONST_FUNCTION static inline int ignore_enosys_and_eagain(int err) {
return (err == EAGAIN) ? MDBX_RESULT_TRUE : ignore_enosys(err);
}
MDBX_MAYBE_UNUSED MDBX_CONST_FUNCTION static inline int ignore_enosys_and_einval(int err) {
return (err == EINVAL) ? MDBX_RESULT_TRUE : ignore_enosys(err);
}
MDBX_MAYBE_UNUSED MDBX_CONST_FUNCTION static inline int ignore_enosys_and_eremote(int err) {
return (err == MDBX_EREMOTE) ? MDBX_RESULT_TRUE : ignore_enosys(err);
}
#endif /* defined(_WIN32) || defined(_WIN64) */ #endif /* defined(_WIN32) || defined(_WIN64) */
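The ignore_enosys_and_*() additions let POSIX call sites tolerate one extra error code besides the ENOSYS/ENOTSUP family already folded by ignore_enosys(). A hedged usage sketch; posix_fadvise() here is just an example of a call whose absence or transient refusal is acceptable:

#if !(defined(_WIN32) || defined(_WIN64))
int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED); /* returns an errno value */
err = ignore_enosys_and_eagain(err);
if (err != MDBX_SUCCESS && err != MDBX_RESULT_TRUE)
  return err; /* a real failure */
/* MDBX_RESULT_TRUE: unsupported or temporarily unavailable; keep going. */
#endif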
static inline int check_env(const MDBX_env *env, const bool wanna_active) { static inline int check_env(const MDBX_env *env, const bool wanna_active) {


@ -6,12 +6,12 @@
#include "internals.h" #include "internals.h"
__cold int cursor_validate(const MDBX_cursor *mc) { __cold int cursor_validate(const MDBX_cursor *mc) {
if (!mc->txn->tw.dirtylist) { if (!mc->txn->wr.dirtylist) {
cASSERT(mc, (mc->txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC); cASSERT(mc, (mc->txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
} else { } else {
cASSERT(mc, (mc->txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); cASSERT(mc, (mc->txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
cASSERT(mc, mc->txn->tw.dirtyroom + mc->txn->tw.dirtylist->length == cASSERT(mc, mc->txn->wr.dirtyroom + mc->txn->wr.dirtylist->length ==
(mc->txn->parent ? mc->txn->parent->tw.dirtyroom : mc->txn->env->options.dp_limit)); (mc->txn->parent ? mc->txn->parent->wr.dirtyroom : mc->txn->env->options.dp_limit));
} }
cASSERT(mc, (mc->checking & z_updating) ? mc->top + 1 <= mc->tree->height : mc->top + 1 == mc->tree->height); cASSERT(mc, (mc->checking & z_updating) ? mc->top + 1 <= mc->tree->height : mc->top + 1 == mc->tree->height);
@ -184,79 +184,74 @@ __hot int cursor_touch(MDBX_cursor *const mc, const MDBX_val *key, const MDBX_va
/*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/
int cursor_shadow(MDBX_cursor *mc, MDBX_txn *nested, const size_t dbi) { int cursor_shadow(MDBX_cursor *cursor, MDBX_txn *nested, const size_t dbi) {
tASSERT(nested, cursor->signature == cur_signature_live);
tASSERT(nested, cursor->txn != nested);
cASSERT(cursor, cursor->txn->flags & txn_may_have_cursors);
cASSERT(cursor, dbi == cursor_dbi(cursor));
tASSERT(nested, dbi > FREE_DBI && dbi < nested->n_dbi); tASSERT(nested, dbi > FREE_DBI && dbi < nested->n_dbi);
const size_t size = mc->subcur ? sizeof(MDBX_cursor) + sizeof(subcur_t) : sizeof(MDBX_cursor);
for (MDBX_cursor *bk; mc; mc = bk->next) { const size_t size = cursor->subcur ? sizeof(MDBX_cursor) + sizeof(subcur_t) : sizeof(MDBX_cursor);
cASSERT(mc, mc != mc->next); MDBX_cursor *const shadow = osal_malloc(size);
if (mc->signature != cur_signature_live) { if (unlikely(!shadow))
ENSURE(nested->env, mc->signature == cur_signature_wait4eot);
bk = mc;
continue;
}
bk = osal_malloc(size);
if (unlikely(!bk))
return MDBX_ENOMEM; return MDBX_ENOMEM;
#if MDBX_DEBUG #if MDBX_DEBUG
memset(bk, 0xCD, size); memset(shadow, 0xCD, size);
VALGRIND_MAKE_MEM_UNDEFINED(bk, size); VALGRIND_MAKE_MEM_UNDEFINED(shadow, size);
#endif /* MDBX_DEBUG */ #endif /* MDBX_DEBUG */
*bk = *mc; *shadow = *cursor;
mc->backup = bk; cursor->backup = shadow;
mc->txn = nested; cursor->txn = nested;
mc->tree = &nested->dbs[dbi]; cursor->tree = &nested->dbs[dbi];
mc->dbi_state = &nested->dbi_state[dbi]; cursor->dbi_state = &nested->dbi_state[dbi];
subcur_t *mx = mc->subcur; subcur_t *subcur = cursor->subcur;
if (mx) { if (subcur) {
*(subcur_t *)(bk + 1) = *mx; *(subcur_t *)(shadow + 1) = *subcur;
mx->cursor.txn = nested; subcur->cursor.txn = nested;
mx->cursor.dbi_state = &nested->dbi_state[dbi]; subcur->cursor.dbi_state = &nested->dbi_state[dbi];
}
mc->next = nested->cursors[dbi];
nested->cursors[dbi] = mc;
} }
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
MDBX_cursor *cursor_eot(MDBX_cursor *mc, MDBX_txn *txn, const bool merge) { MDBX_cursor *cursor_eot(MDBX_cursor *cursor, MDBX_txn *txn) {
MDBX_cursor *const next = mc->next; MDBX_cursor *const next = cursor->next;
const unsigned stage = mc->signature; const unsigned stage = cursor->signature;
MDBX_cursor *const bk = mc->backup; MDBX_cursor *const shadow = cursor->backup;
ENSURE(txn->env, stage == cur_signature_live || (stage == cur_signature_wait4eot && bk)); ENSURE(txn->env, stage == cur_signature_live || (stage == cur_signature_wait4eot && shadow));
tASSERT(txn, mc->txn == txn); tASSERT(txn, cursor->txn == txn);
if (bk) { if (shadow) {
subcur_t *mx = mc->subcur; subcur_t *subcur = cursor->subcur;
tASSERT(txn, mc->txn->parent != nullptr); tASSERT(txn, txn->parent != nullptr && shadow->txn == txn->parent);
tASSERT(txn, bk->txn == txn->parent); /* Zap: Using uninitialized memory '*subcur->backup'. */
/* Zap: Using uninitialized memory '*mc->backup'. */
MDBX_SUPPRESS_GOOFY_MSVC_ANALYZER(6001); MDBX_SUPPRESS_GOOFY_MSVC_ANALYZER(6001);
ENSURE(txn->env, bk->signature == cur_signature_live); ENSURE(txn->env, shadow->signature == cur_signature_live);
tASSERT(txn, mx == bk->subcur); tASSERT(txn, subcur == shadow->subcur);
if (merge) { if ((txn->flags & MDBX_TXN_ERROR) == 0) {
/* Update pointers to parent txn */ /* Update pointers to parent txn */
mc->next = bk->next; cursor->next = shadow->next;
mc->backup = bk->backup; cursor->backup = shadow->backup;
mc->txn = bk->txn; cursor->txn = shadow->txn;
mc->tree = bk->tree; cursor->tree = shadow->tree;
mc->dbi_state = bk->dbi_state; cursor->dbi_state = shadow->dbi_state;
if (mx) { if (subcur) {
mx->cursor.txn = bk->txn; subcur->cursor.txn = shadow->txn;
mx->cursor.dbi_state = bk->dbi_state; subcur->cursor.dbi_state = shadow->dbi_state;
} }
} else { } else {
/* Restore from backup, i.e. rollback/abort nested txn */ /* Restore from backup, i.e. rollback/abort nested txn */
*mc = *bk; *cursor = *shadow;
mc->signature = stage /* Promote (cur_signature_wait4eot) state to parent txn */; cursor->signature = stage /* Promote (cur_signature_wait4eot) state to parent txn */;
if (mx) if (subcur)
*mx = *(subcur_t *)(bk + 1); *subcur = *(subcur_t *)(shadow + 1);
} }
bk->signature = 0; shadow->signature = 0;
osal_free(bk); osal_free(shadow);
} else { } else {
ENSURE(mc->txn->env, stage == cur_signature_live); ENSURE(cursor->txn->env, stage == cur_signature_live);
mc->signature = cur_signature_ready4dispose /* Cursor may be reused */; cursor->signature = cur_signature_ready4dispose /* Cursor may be reused */;
mc->next = mc; cursor->next = cursor;
cursor_drown((cursor_couple_t *)mc); cursor_drown((cursor_couple_t *)cursor);
} }
return next; return next;
} }
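
The pair cursor_shadow()/cursor_eot() above implements a backup-and-restore pattern: a snapshot of the cursor is taken when a nested transaction starts, then either discarded (commit) or copied back (abort). Below is a minimal sketch of the same idea with a hypothetical toy_cursor type, not the real mdbx structures:

#include <stdbool.h>
#include <stdlib.h>

/* Toy illustration of the shadow/restore pattern; the fields are made up. */
struct toy_cursor {
  int position;              /* state that the nested scope may change */
  struct toy_cursor *backup; /* snapshot taken when the nested scope starts */
};

static bool toy_shadow(struct toy_cursor *c) {
  struct toy_cursor *bk = malloc(sizeof(*bk));
  if (!bk)
    return false; /* mirrors the MDBX_ENOMEM path above */
  *bk = *c;       /* snapshot the current state */
  c->backup = bk;
  return true;
}

static void toy_eot(struct toy_cursor *c, bool keep_changes) {
  struct toy_cursor *bk = c->backup;
  if (keep_changes)
    c->backup = bk->backup; /* commit: keep nested changes, drop the snapshot */
  else
    *c = *bk;               /* abort: restore the snapshot (including ->backup) */
  free(bk);
}
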
@ -643,7 +638,7 @@ static __always_inline int cursor_step(const bool inner, const bool forward, MDB
inner_gone(mc); inner_gone(mc);
} else { } else {
if (mc->flags & z_hollow) { if (mc->flags & z_hollow) {
cASSERT(mc, !inner_pointed(mc)); cASSERT(mc, !inner_pointed(mc) || inner_hollow(mc));
return MDBX_ENODATA; return MDBX_ENODATA;
} }
@ -771,7 +766,7 @@ __hot int cursor_put(MDBX_cursor *mc, const MDBX_val *key, MDBX_val *data, unsig
goto skip_check_samedata; goto skip_check_samedata;
} }
} }
if (!(flags & MDBX_RESERVE) && unlikely(cmp_lenfast(&current_data, data) == 0)) if (!(flags & MDBX_RESERVE) && unlikely(eq_fast(&current_data, data)))
return MDBX_SUCCESS /* the same data, nothing to update */; return MDBX_SUCCESS /* the same data, nothing to update */;
skip_check_samedata:; skip_check_samedata:;
} }
@ -783,8 +778,9 @@ __hot int cursor_put(MDBX_cursor *mc, const MDBX_val *key, MDBX_val *data, unsig
rc = MDBX_NO_ROOT; rc = MDBX_NO_ROOT;
} else if ((flags & MDBX_CURRENT) == 0) { } else if ((flags & MDBX_CURRENT) == 0) {
bool exact = false; bool exact = false;
MDBX_val last_key, old_data; MDBX_val old_data;
if ((flags & MDBX_APPEND) && mc->tree->items > 0) { if ((flags & MDBX_APPEND) && mc->tree->items > 0) {
MDBX_val last_key;
old_data.iov_base = nullptr; old_data.iov_base = nullptr;
old_data.iov_len = 0; old_data.iov_len = 0;
rc = (mc->flags & z_inner) ? inner_last(mc, &last_key) : outer_last(mc, &last_key, &old_data); rc = (mc->flags & z_inner) ? inner_last(mc, &last_key) : outer_last(mc, &last_key, &old_data);
@ -802,52 +798,53 @@ __hot int cursor_put(MDBX_cursor *mc, const MDBX_val *key, MDBX_val *data, unsig
} }
} }
} else { } else {
csr_t csr = csr_t csr = cursor_seek(mc, (MDBX_val *)key, &old_data, MDBX_SET);
/* olddata may not be updated in case DUPFIX-page of dupfix-table */
cursor_seek(mc, (MDBX_val *)key, &old_data, MDBX_SET);
rc = csr.err; rc = csr.err;
exact = csr.exact; exact = csr.exact;
} }
if (likely(rc == MDBX_SUCCESS)) {
if (exact) { if (exact) {
cASSERT(mc, rc == MDBX_SUCCESS);
if (unlikely(flags & MDBX_NOOVERWRITE)) { if (unlikely(flags & MDBX_NOOVERWRITE)) {
DEBUG("duplicate key [%s]", DKEY_DEBUG(key)); DEBUG("duplicate key [%s]", DKEY_DEBUG(key));
*data = old_data; *data = old_data;
return MDBX_KEYEXIST; return MDBX_KEYEXIST;
} }
if (unlikely(mc->flags & z_inner)) { if (unlikely(mc->flags & z_inner)) {
/* nested subtree of DUPSORT-database with the same key, /* nested subtree of DUPSORT-database with the same key, nothing to update */
* nothing to update */ return (flags & MDBX_NODUPDATA) ? MDBX_KEYEXIST : MDBX_SUCCESS;
eASSERT(env, data->iov_len == 0 && (old_data.iov_len == 0 ||
/* olddata may not be updated in case
DUPFIX-page of dupfix-table */
(mc->tree->flags & MDBX_DUPFIXED)));
return MDBX_SUCCESS;
} }
if (unlikely(flags & MDBX_ALLDUPS) && inner_pointed(mc)) { if (inner_pointed(mc)) {
err = cursor_del(mc, MDBX_ALLDUPS); if (unlikely(flags & MDBX_ALLDUPS)) {
if (unlikely(err != MDBX_SUCCESS)) rc = cursor_del(mc, MDBX_ALLDUPS);
return err; if (unlikely(rc != MDBX_SUCCESS))
return rc;
flags -= MDBX_ALLDUPS; flags -= MDBX_ALLDUPS;
cASSERT(mc, mc->top + 1 == mc->tree->height); cASSERT(mc, mc->top + 1 == mc->tree->height);
rc = (mc->top >= 0) ? MDBX_NOTFOUND : MDBX_NO_ROOT; rc = (mc->top >= 0) ? MDBX_NOTFOUND : MDBX_NO_ROOT;
exact = false; } else if ((flags & (MDBX_RESERVE | MDBX_MULTIPLE)) == 0) {
} else if (!(flags & (MDBX_RESERVE | MDBX_MULTIPLE))) { old_data = *data;
/* checking for early exit without dirtying pages */ csr_t csr = cursor_seek(&mc->subcur->cursor, &old_data, nullptr, MDBX_SET_RANGE);
if (unlikely(eq_fast(data, &old_data))) { if (unlikely(csr.exact)) {
cASSERT(mc, mc->clc->v.cmp(data, &old_data) == 0); cASSERT(mc, csr.err == MDBX_SUCCESS);
if (mc->subcur) {
if (flags & MDBX_NODUPDATA) if (flags & MDBX_NODUPDATA)
return MDBX_KEYEXIST; return MDBX_KEYEXIST;
if (flags & MDBX_APPENDDUP) if (flags & MDBX_APPENDDUP)
return MDBX_EKEYMISMATCH; return MDBX_EKEYMISMATCH;
}
/* the same data, nothing to update */ /* the same data, nothing to update */
return MDBX_SUCCESS; return MDBX_SUCCESS;
} else if (csr.err != MDBX_SUCCESS && unlikely(csr.err != MDBX_NOTFOUND)) {
be_poor(mc);
return csr.err;
}
}
} else if (!(flags & (MDBX_RESERVE | MDBX_MULTIPLE))) {
if (unlikely(eq_fast(data, &old_data))) {
cASSERT(mc, mc->clc->v.cmp(data, &old_data) == 0);
/* the same data, nothing to update */
return (mc->subcur && (flags & MDBX_NODUPDATA)) ? MDBX_KEYEXIST : MDBX_SUCCESS;
} }
cASSERT(mc, mc->clc->v.cmp(data, &old_data) != 0); cASSERT(mc, mc->clc->v.cmp(data, &old_data) != 0);
} }
}
} else if (unlikely(rc != MDBX_NOTFOUND)) } else if (unlikely(rc != MDBX_NOTFOUND))
return rc; return rc;
} }
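
For reference, the early-exit branches above surface to callers of the public API roughly as follows. This is only a usage sketch: the transaction and table are assumed to have been opened elsewhere, and put_unless_exists() is a hypothetical helper name.

#include "mdbx.h"

/* Sketch: how MDBX_NOOVERWRITE and the MDBX_KEYEXIST early-exit look from outside. */
static int put_unless_exists(MDBX_txn *txn, MDBX_dbi dbi, const MDBX_val *key, MDBX_val *value) {
  int rc = mdbx_put(txn, dbi, key, value, MDBX_NOOVERWRITE);
  if (rc == MDBX_KEYEXIST) {
    /* Key already present: `value` now points at the stored data,
     * just like `*data = old_data` in the branch above. */
    return MDBX_SUCCESS;
  }
  return rc;
}
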
@ -1052,6 +1049,7 @@ __hot int cursor_put(MDBX_cursor *mc, const MDBX_val *key, MDBX_val *data, unsig
return MDBX_EKEYMISMATCH; return MDBX_EKEYMISMATCH;
} else if (eq_fast(data, &old_data)) { } else if (eq_fast(data, &old_data)) {
cASSERT(mc, mc->clc->v.cmp(data, &old_data) == 0); cASSERT(mc, mc->clc->v.cmp(data, &old_data) == 0);
cASSERT(mc, !"Should not happen since" || batch_dupfix_done);
if (flags & MDBX_NODUPDATA) if (flags & MDBX_NODUPDATA)
return MDBX_KEYEXIST; return MDBX_KEYEXIST;
/* data is match exactly byte-to-byte, nothing to update */ /* data is match exactly byte-to-byte, nothing to update */
@ -1727,6 +1725,7 @@ __hot csr_t cursor_seek(MDBX_cursor *mc, MDBX_val *key, MDBX_val *data, MDBX_cur
csr_t ret; csr_t ret;
ret.exact = false; ret.exact = false;
/* coverity[logical_vs_bitwise] */
if (unlikely(key->iov_len < mc->clc->k.lmin || if (unlikely(key->iov_len < mc->clc->k.lmin ||
(key->iov_len > mc->clc->k.lmax && (key->iov_len > mc->clc->k.lmax &&
(mc->clc->k.lmin == mc->clc->k.lmax || MDBX_DEBUG || MDBX_FORCE_ASSERTIONS)))) { (mc->clc->k.lmin == mc->clc->k.lmax || MDBX_DEBUG || MDBX_FORCE_ASSERTIONS)))) {
@ -1781,8 +1780,7 @@ __hot csr_t cursor_seek(MDBX_cursor *mc, MDBX_val *key, MDBX_val *data, MDBX_cur
} }
int cmp = mc->clc->k.cmp(&aligned_key, &nodekey); int cmp = mc->clc->k.cmp(&aligned_key, &nodekey);
if (unlikely(cmp == 0)) { if (unlikely(cmp == 0)) {
/* Probably happens rarely, but first node on the page /* Probably happens rarely, but first node on the page was the one we wanted. */
* was the one we wanted. */
mc->ki[mc->top] = 0; mc->ki[mc->top] = 0;
ret.exact = true; ret.exact = true;
goto got_node; goto got_node;
@ -1845,8 +1843,9 @@ __hot csr_t cursor_seek(MDBX_cursor *mc, MDBX_val *key, MDBX_val *data, MDBX_cur
* Therefore we put the cursor into an unset state, but without resetting * Therefore we put the cursor into an unset state, but without resetting
* top, which keeps the fastpath working for the subsequent search down the * top, which keeps the fastpath working for the subsequent search down the
* page tree. */ * page tree. */
mc->flags = z_hollow | (mc->flags & z_clear_mask); mc->flags |= z_hollow;
inner_gone(mc); if (inner_pointed(mc))
mc->subcur->cursor.flags |= z_hollow;
ret.err = MDBX_NOTFOUND; ret.err = MDBX_NOTFOUND;
return ret; return ret;
} }


@ -151,7 +151,7 @@ MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION static inline bool is_hollow(const
cASSERT(mc, mc->top >= 0); cASSERT(mc, mc->top >= 0);
cASSERT(mc, (mc->flags & z_eof_hard) || mc->ki[mc->top] < page_numkeys(mc->pg[mc->top])); cASSERT(mc, (mc->flags & z_eof_hard) || mc->ki[mc->top] < page_numkeys(mc->pg[mc->top]));
} else if (mc->subcur) } else if (mc->subcur)
cASSERT(mc, is_poor(&mc->subcur->cursor)); cASSERT(mc, is_poor(&mc->subcur->cursor) || (is_pointed(mc) && mc->subcur->cursor.flags < 0));
return r; return r;
} }
@ -307,8 +307,8 @@ static inline int cursor_check_rw(const MDBX_cursor *mc) {
return cursor_check(mc, (MDBX_TXN_BLOCKED - MDBX_TXN_PARKED) | MDBX_TXN_RDONLY); return cursor_check(mc, (MDBX_TXN_BLOCKED - MDBX_TXN_PARKED) | MDBX_TXN_RDONLY);
} }
MDBX_INTERNAL MDBX_cursor *cursor_eot(MDBX_cursor *mc, MDBX_txn *txn, const bool merge); MDBX_INTERNAL MDBX_cursor *cursor_eot(MDBX_cursor *cursor, MDBX_txn *txn);
MDBX_INTERNAL int cursor_shadow(MDBX_cursor *mc, MDBX_txn *nested, const size_t dbi); MDBX_INTERNAL int cursor_shadow(MDBX_cursor *cursor, MDBX_txn *nested, const size_t dbi);
MDBX_INTERNAL MDBX_cursor *cursor_cpstk(const MDBX_cursor *csrc, MDBX_cursor *cdst); MDBX_INTERNAL MDBX_cursor *cursor_cpstk(const MDBX_cursor *csrc, MDBX_cursor *cdst);


@ -84,22 +84,15 @@ __noinline int dbi_import(MDBX_txn *txn, const size_t dbi) {
/* the dbi slot is not yet initialized in the transaction, and the handle has not been used */ /* the dbi slot is not yet initialized in the transaction, and the handle has not been used */
txn->cursors[dbi] = nullptr; txn->cursors[dbi] = nullptr;
MDBX_txn *const parent = txn->parent; MDBX_txn *const parent = txn->parent;
if (parent) { if (unlikely(parent)) {
/* nested write transaction */ /* nested write transaction */
int rc = dbi_check(parent, dbi); int rc = dbi_check(parent, dbi);
/* copy the table state, clearing the new-flags. */ /* copy the dbi-handle state, clearing the new-flags. */
eASSERT(env, txn->dbi_seqs == parent->dbi_seqs); eASSERT(env, txn->dbi_seqs == parent->dbi_seqs);
txn->dbi_state[dbi] = parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY); txn->dbi_state[dbi] = parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY);
if (likely(rc == MDBX_SUCCESS)) { if (likely(rc == MDBX_SUCCESS)) {
txn->dbs[dbi] = parent->dbs[dbi]; txn->dbs[dbi] = parent->dbs[dbi];
if (parent->cursors[dbi]) { rc = txn_shadow_cursors(parent, dbi);
rc = cursor_shadow(parent->cursors[dbi], txn, dbi);
if (unlikely(rc != MDBX_SUCCESS)) {
/* failed to back up the cursors */
txn->dbi_state[dbi] = DBI_OLDEN | DBI_LINDO | DBI_STALE;
txn->flags |= MDBX_TXN_ERROR;
}
}
} }
return rc; return rc;
} }
@ -107,28 +100,34 @@ __noinline int dbi_import(MDBX_txn *txn, const size_t dbi) {
txn->dbi_state[dbi] = DBI_LINDO; txn->dbi_state[dbi] = DBI_LINDO;
} else { } else {
eASSERT(env, txn->dbi_seqs[dbi] != env->dbi_seqs[dbi].weak); eASSERT(env, txn->dbi_seqs[dbi] != env->dbi_seqs[dbi].weak);
if (unlikely((txn->dbi_state[dbi] & (DBI_VALID | DBI_OLDEN)) || txn->cursors[dbi])) { if (unlikely(txn->cursors[dbi])) {
/* the handle has already been used in the transaction, but was closed or reopened, /* the handle has already been used in the transaction and dangling cursors remain */
* or there are dangling cursors on an explicit re-opening of the handle */
eASSERT(env, (txn->dbi_state[dbi] & DBI_STALE) == 0);
txn->dbi_seqs[dbi] = env->dbi_seqs[dbi].weak; txn->dbi_seqs[dbi] = env->dbi_seqs[dbi].weak;
txn->dbi_state[dbi] = DBI_OLDEN | DBI_LINDO; txn->dbi_state[dbi] = DBI_OLDEN | DBI_LINDO;
return txn->cursors[dbi] ? MDBX_DANGLING_DBI : MDBX_BAD_DBI; return MDBX_DANGLING_DBI;
}
if (unlikely(txn->dbi_state[dbi] & (DBI_OLDEN | DBI_VALID))) {
/* the handle has already been used in the transaction, but was closed or reopened,
* there are no dangling cursors */
txn->dbi_seqs[dbi] = env->dbi_seqs[dbi].weak;
txn->dbi_state[dbi] = DBI_OLDEN | DBI_LINDO;
return MDBX_BAD_DBI;
} }
} }
/* the handle has not been used in the transaction, or is being explicitly re-opened /* the handle has not been used in the transaction, or is being explicitly re-opened
* while there are no dangling cursors */ * while there are no dangling cursors */
eASSERT(env, (txn->dbi_state[dbi] & DBI_LINDO) && !txn->cursors[dbi]); eASSERT(env, (txn->dbi_state[dbi] & (DBI_LINDO | DBI_VALID)) == DBI_LINDO && !txn->cursors[dbi]);
/* read the up-to-date flags and sequence */ /* read the up-to-date flags and sequence */
struct dbi_snap_result snap = dbi_snap(env, dbi); struct dbi_snap_result snap = dbi_snap(env, dbi);
txn->dbi_seqs[dbi] = snap.sequence; txn->dbi_seqs[dbi] = snap.sequence;
if (snap.flags & DB_VALID) { if (snap.flags & DB_VALID) {
txn->dbs[dbi].flags = snap.flags & DB_PERSISTENT_FLAGS; txn->dbs[dbi].flags = snap.flags & DB_PERSISTENT_FLAGS;
txn->dbi_state[dbi] = DBI_LINDO | DBI_VALID | DBI_STALE; txn->dbi_state[dbi] = (dbi >= CORE_DBS) ? DBI_LINDO | DBI_VALID | DBI_STALE : DBI_LINDO | DBI_VALID;
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
return MDBX_BAD_DBI; return MDBX_BAD_DBI;
} }
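
The masking above strips the per-transaction bits when a nested transaction inherits a handle state from its parent. A tiny sketch of that rule follows; the flag names match the diff, but the numeric values here are made up for illustration and are not the mdbx-internal ones.

/* Sketch with assumed flag values: per-txn bits are cleared on inheritance. */
enum {
  TOY_DBI_VALID = 1u << 0, /* handle is usable */
  TOY_DBI_FRESH = 1u << 1, /* opened in this txn */
  TOY_DBI_CREAT = 1u << 2, /* created in this txn */
  TOY_DBI_DIRTY = 1u << 3, /* modified in this txn */
};

static unsigned inherit_dbi_state(unsigned parent_state) {
  /* the nested txn has not opened, created or modified anything itself yet */
  return parent_state & ~(TOY_DBI_FRESH | TOY_DBI_CREAT | TOY_DBI_DIRTY);
}
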
@ -183,7 +182,7 @@ int dbi_defer_release(MDBX_env *const env, defer_free_item_t *const chain) {
} }
/* Export or close DBI handles opened in this txn. */ /* Export or close DBI handles opened in this txn. */
int dbi_update(MDBX_txn *txn, int keep) { int dbi_update(MDBX_txn *txn, bool keep) {
MDBX_env *const env = txn->env; MDBX_env *const env = txn->env;
tASSERT(txn, !txn->parent && txn == env->basal_txn); tASSERT(txn, !txn->parent && txn == env->basal_txn);
bool locked = false; bool locked = false;
@ -223,6 +222,7 @@ int dbi_update(MDBX_txn *txn, int keep) {
if (locked) { if (locked) {
size_t i = env->n_dbi; size_t i = env->n_dbi;
eASSERT(env, env->n_dbi >= CORE_DBS);
while ((env->dbs_flags[i - 1] & DB_VALID) == 0) { while ((env->dbs_flags[i - 1] & DB_VALID) == 0) {
--i; --i;
eASSERT(env, i >= CORE_DBS); eASSERT(env, i >= CORE_DBS);
@ -380,7 +380,7 @@ static int dbi_open_locked(MDBX_txn *txn, unsigned user_flags, MDBX_dbi *dbi, MD
slot = (slot < scan) ? slot : scan; slot = (slot < scan) ? slot : scan;
continue; continue;
} }
if (!env->kvs[MAIN_DBI].clc.k.cmp(&name, &env->kvs[scan].name)) { if (env->kvs[MAIN_DBI].clc.k.cmp(&name, &env->kvs[scan].name) == 0) {
slot = scan; slot = scan;
int err = dbi_check(txn, slot); int err = dbi_check(txn, slot);
if (err == MDBX_BAD_DBI && txn->dbi_state[slot] == (DBI_OLDEN | DBI_LINDO)) { if (err == MDBX_BAD_DBI && txn->dbi_state[slot] == (DBI_OLDEN | DBI_LINDO)) {
@ -416,7 +416,7 @@ static int dbi_open_locked(MDBX_txn *txn, unsigned user_flags, MDBX_dbi *dbi, MD
int err = dbi_check(txn, slot); int err = dbi_check(txn, slot);
eASSERT(env, err == MDBX_BAD_DBI); eASSERT(env, err == MDBX_BAD_DBI);
if (err != MDBX_BAD_DBI) if (unlikely(err != MDBX_BAD_DBI))
return MDBX_PROBLEM; return MDBX_PROBLEM;
/* Find the DB info */ /* Find the DB info */
@ -449,7 +449,7 @@ static int dbi_open_locked(MDBX_txn *txn, unsigned user_flags, MDBX_dbi *dbi, MD
name.iov_base = clone; name.iov_base = clone;
uint8_t dbi_state = DBI_LINDO | DBI_VALID | DBI_FRESH; uint8_t dbi_state = DBI_LINDO | DBI_VALID | DBI_FRESH;
if (unlikely(rc)) { if (unlikely(rc != MDBX_SUCCESS)) {
/* MDBX_NOTFOUND and MDBX_CREATE: Create new DB */ /* MDBX_NOTFOUND and MDBX_CREATE: Create new DB */
tASSERT(txn, rc == MDBX_NOTFOUND); tASSERT(txn, rc == MDBX_NOTFOUND);
body.iov_base = memset(&txn->dbs[slot], 0, body.iov_len = sizeof(tree_t)); body.iov_base = memset(&txn->dbs[slot], 0, body.iov_len = sizeof(tree_t));
@ -536,34 +536,40 @@ int dbi_open(MDBX_txn *txn, const MDBX_val *const name, unsigned user_flags, MDB
#if MDBX_ENABLE_DBI_LOCKFREE #if MDBX_ENABLE_DBI_LOCKFREE
/* Is the DB already open? */ /* Is the DB already open? */
const MDBX_env *const env = txn->env; const MDBX_env *const env = txn->env;
size_t free_slot = env->n_dbi; bool have_free_slot = env->n_dbi < env->max_dbi;
for (size_t i = CORE_DBS; i < env->n_dbi; ++i) { for (size_t i = CORE_DBS; i < env->n_dbi; ++i) {
retry:
if ((env->dbs_flags[i] & DB_VALID) == 0) { if ((env->dbs_flags[i] & DB_VALID) == 0) {
free_slot = i; have_free_slot = true;
continue; continue;
} }
const uint32_t snap_seq = atomic_load32(&env->dbi_seqs[i], mo_AcquireRelease); struct dbi_snap_result snap = dbi_snap(env, i);
const uint16_t snap_flags = env->dbs_flags[i];
const MDBX_val snap_name = env->kvs[i].name; const MDBX_val snap_name = env->kvs[i].name;
if (user_flags != MDBX_ACCEDE &&
(((user_flags ^ snap_flags) & DB_PERSISTENT_FLAGS) || (keycmp && keycmp != env->kvs[i].clc.k.cmp) ||
(datacmp && datacmp != env->kvs[i].clc.v.cmp)))
continue;
const uint32_t main_seq = atomic_load32(&env->dbi_seqs[MAIN_DBI], mo_AcquireRelease); const uint32_t main_seq = atomic_load32(&env->dbi_seqs[MAIN_DBI], mo_AcquireRelease);
MDBX_cmp_func *const snap_cmp = env->kvs[MAIN_DBI].clc.k.cmp; MDBX_cmp_func *const snap_cmp = env->kvs[MAIN_DBI].clc.k.cmp;
if (unlikely(!(snap_flags & DB_VALID) || !snap_name.iov_base || !snap_name.iov_len || !snap_cmp)) if (unlikely(!(snap.flags & DB_VALID) || !snap_name.iov_base || !snap_name.iov_len || !snap_cmp))
continue; /* looks like a collision with a concurrently running update */
goto slowpath_locking;
const bool name_match = snap_cmp(&snap_name, name) == 0; const bool name_match = snap_cmp(&snap_name, name) == 0;
osal_flush_incoherent_cpu_writeback(); if (unlikely(snap.sequence != atomic_load32(&env->dbi_seqs[i], mo_AcquireRelease) ||
if (unlikely(snap_seq != atomic_load32(&env->dbi_seqs[i], mo_AcquireRelease) ||
main_seq != atomic_load32(&env->dbi_seqs[MAIN_DBI], mo_AcquireRelease) || main_seq != atomic_load32(&env->dbi_seqs[MAIN_DBI], mo_AcquireRelease) ||
snap_flags != env->dbs_flags[i] || snap_name.iov_base != env->kvs[i].name.iov_base || snap.flags != env->dbs_flags[i] || snap_name.iov_base != env->kvs[i].name.iov_base ||
snap_name.iov_len != env->kvs[i].name.iov_len)) snap_name.iov_len != env->kvs[i].name.iov_len))
goto retry; /* looks like a collision with a concurrently running update */
if (name_match) { goto slowpath_locking;
if (!name_match)
continue;
osal_flush_incoherent_cpu_writeback();
if (user_flags != MDBX_ACCEDE &&
(((user_flags ^ snap.flags) & DB_PERSISTENT_FLAGS) || (keycmp && keycmp != env->kvs[i].clc.k.cmp) ||
(datacmp && datacmp != env->kvs[i].clc.v.cmp)))
/* it looks like the user is opening the table with different flags/attributes
* or different comparators, so fall back to the safe (locked) path */
goto slowpath_locking;
rc = dbi_check(txn, i); rc = dbi_check(txn, i);
if (rc == MDBX_BAD_DBI && txn->dbi_state[i] == (DBI_OLDEN | DBI_LINDO)) { if (rc == MDBX_BAD_DBI && txn->dbi_state[i] == (DBI_OLDEN | DBI_LINDO)) {
/* the handle was used and became invalid, /* the handle was used and became invalid,
@ -573,17 +579,25 @@ int dbi_open(MDBX_txn *txn, const MDBX_val *const name, unsigned user_flags, MDB
rc = dbi_check(txn, i); rc = dbi_check(txn, i);
} }
if (likely(rc == MDBX_SUCCESS)) { if (likely(rc == MDBX_SUCCESS)) {
if (unlikely(snap.sequence != atomic_load32(&env->dbi_seqs[i], mo_AcquireRelease) ||
main_seq != atomic_load32(&env->dbi_seqs[MAIN_DBI], mo_AcquireRelease) ||
snap.flags != env->dbs_flags[i] || snap_name.iov_base != env->kvs[i].name.iov_base ||
snap_name.iov_len != env->kvs[i].name.iov_len))
/* looks like a collision with a concurrently running update */
goto slowpath_locking;
rc = dbi_bind(txn, i, user_flags, keycmp, datacmp); rc = dbi_bind(txn, i, user_flags, keycmp, datacmp);
if (likely(rc == MDBX_SUCCESS)) if (likely(rc == MDBX_SUCCESS))
*dbi = (MDBX_dbi)i; *dbi = (MDBX_dbi)i;
} }
return rc; return rc;
} }
}
/* Fail, if no free slot and max hit */ /* Fail, if no free slot and max hit */
if (unlikely(free_slot >= env->max_dbi)) if (unlikely(!have_free_slot))
return MDBX_DBS_FULL; return MDBX_DBS_FULL;
slowpath_locking:
#endif /* MDBX_ENABLE_DBI_LOCKFREE */ #endif /* MDBX_ENABLE_DBI_LOCKFREE */
rc = osal_fastmutex_acquire(&txn->env->dbi_lock); rc = osal_fastmutex_acquire(&txn->env->dbi_lock);
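
The reworked fast path above is essentially a seqlock-style optimistic read: take a snapshot, compare the name, then re-check the sequence counters and jump to slowpath_locking if anything moved. A generic sketch of that pattern with plain C11 atomics (hypothetical seq_slot type, not the mdbx helpers):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Optimistic read: retry (or fall back to a lock) whenever the sequence
 * changes between the two reads. The payload read is a simplification;
 * the real code validates the whole snapshot it took. */
struct seq_slot {
  _Atomic uint32_t seq; /* bumped by writers on every update */
  int payload;
};

static bool try_lockfree_read(struct seq_slot *slot, int *out) {
  uint32_t before = atomic_load_explicit(&slot->seq, memory_order_acquire);
  int snapshot = slot->payload;
  uint32_t after = atomic_load_explicit(&slot->seq, memory_order_acquire);
  if (before != after)
    return false; /* collision with a concurrent update: take the slow path */
  *out = snapshot;
  return true;
}
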


@ -43,30 +43,35 @@ static inline size_t dbi_bitmap_ctz(const MDBX_txn *txn, intptr_t bmi) {
return dbi_bitmap_ctz_fallback(txn, bmi); return dbi_bitmap_ctz_fallback(txn, bmi);
} }
static inline bool dbi_foreach_step(const MDBX_txn *const txn, size_t *bitmap_item, size_t *dbi) {
const size_t bitmap_chunk = CHAR_BIT * sizeof(txn->dbi_sparse[0]);
if (*bitmap_item & 1) {
*bitmap_item >>= 1;
return txn->dbi_state[*dbi] != 0;
}
if (*bitmap_item) {
size_t bitmap_skip = dbi_bitmap_ctz(txn, *bitmap_item);
*bitmap_item >>= bitmap_skip;
*dbi += bitmap_skip - 1;
} else {
*dbi = (*dbi - 1) | (bitmap_chunk - 1);
*bitmap_item = txn->dbi_sparse[(1 + *dbi) / bitmap_chunk];
if (*bitmap_item == 0)
*dbi += bitmap_chunk;
}
return false;
}
/* LY: The macro is deliberately built around a single loop, in order to keep /* LY: The macro is deliberately built around a single loop, in order to keep
* the break statement usable */ * the break statement usable */
#define TXN_FOREACH_DBI_FROM(TXN, I, FROM) \ #define TXN_FOREACH_DBI_FROM(TXN, I, FROM) \
for (size_t bitmap_chunk = CHAR_BIT * sizeof(TXN->dbi_sparse[0]), bitmap_item = TXN->dbi_sparse[0] >> FROM, \ for (size_t bitmap_item = TXN->dbi_sparse[0] >> FROM, I = FROM; I < TXN->n_dbi; ++I) \
I = FROM; \ if (dbi_foreach_step(TXN, &bitmap_item, &I))
I < TXN->n_dbi; ++I) \
if (bitmap_item == 0) { \
I = (I - 1) | (bitmap_chunk - 1); \
bitmap_item = TXN->dbi_sparse[(1 + I) / bitmap_chunk]; \
if (!bitmap_item) \
/* coverity[const_overflow] */ \
I += bitmap_chunk; \
continue; \
} else if ((bitmap_item & 1) == 0) { \
size_t bitmap_skip = dbi_bitmap_ctz(txn, bitmap_item); \
bitmap_item >>= bitmap_skip; \
I += bitmap_skip - 1; \
continue; \
} else if (bitmap_item >>= 1, TXN->dbi_state[I])
#else #else
#define TXN_FOREACH_DBI_FROM(TXN, I, SKIP) \ #define TXN_FOREACH_DBI_FROM(TXN, I, FROM) \
for (size_t I = SKIP; I < TXN->n_dbi; ++I) \ for (size_t I = FROM; I < TXN->n_dbi; ++I) \
if (TXN->dbi_state[I]) if (TXN->dbi_state[I])
#endif /* MDBX_ENABLE_DBI_SPARSE */ #endif /* MDBX_ENABLE_DBI_SPARSE */
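
The dbi_foreach_step() helper above walks a sparse bitmap one word at a time, using count-trailing-zeros to skip runs of cleared bits. A standalone sketch of the same skipping technique, assuming a GCC/Clang-style __builtin_ctzl and omitting the dbi-specific bookkeeping:

#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Visit every set bit of `bitmap` (nbits total), skipping cleared runs via ctz. */
static void foreach_set_bit(const unsigned long *bitmap, size_t nbits) {
  const size_t word_bits = CHAR_BIT * sizeof(unsigned long);
  for (size_t w = 0; w * word_bits < nbits; ++w) {
    unsigned long item = bitmap[w];
    while (item) {
      const size_t index = w * word_bits + (size_t)__builtin_ctzl(item);
      if (index >= nbits)
        return;
      printf("slot %zu is in use\n", index);
      item &= item - 1; /* clear the lowest set bit */
    }
  }
}
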
@ -82,7 +87,7 @@ struct dbi_snap_result {
}; };
MDBX_INTERNAL struct dbi_snap_result dbi_snap(const MDBX_env *env, const size_t dbi); MDBX_INTERNAL struct dbi_snap_result dbi_snap(const MDBX_env *env, const size_t dbi);
MDBX_INTERNAL int dbi_update(MDBX_txn *txn, int keep); MDBX_INTERNAL int dbi_update(MDBX_txn *txn, bool keep);
static inline uint8_t dbi_state(const MDBX_txn *txn, const size_t dbi) { static inline uint8_t dbi_state(const MDBX_txn *txn, const size_t dbi) {
STATIC_ASSERT((int)DBI_DIRTY == MDBX_DBI_DIRTY && (int)DBI_STALE == MDBX_DBI_STALE && STATIC_ASSERT((int)DBI_DIRTY == MDBX_DBI_DIRTY && (int)DBI_STALE == MDBX_DBI_STALE &&


@ -28,9 +28,9 @@ static inline size_t dpl_bytes2size(const ptrdiff_t bytes) {
} }
void dpl_free(MDBX_txn *txn) { void dpl_free(MDBX_txn *txn) {
if (likely(txn->tw.dirtylist)) { if (likely(txn->wr.dirtylist)) {
osal_free(txn->tw.dirtylist); osal_free(txn->wr.dirtylist);
txn->tw.dirtylist = nullptr; txn->wr.dirtylist = nullptr;
} }
} }
@ -39,14 +39,14 @@ dpl_t *dpl_reserve(MDBX_txn *txn, size_t size) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
size_t bytes = dpl_size2bytes((size < PAGELIST_LIMIT) ? size : PAGELIST_LIMIT); size_t bytes = dpl_size2bytes((size < PAGELIST_LIMIT) ? size : PAGELIST_LIMIT);
dpl_t *const dl = osal_realloc(txn->tw.dirtylist, bytes); dpl_t *const dl = osal_realloc(txn->wr.dirtylist, bytes);
if (likely(dl)) { if (likely(dl)) {
#ifdef osal_malloc_usable_size #ifdef osal_malloc_usable_size
bytes = osal_malloc_usable_size(dl); bytes = osal_malloc_usable_size(dl);
#endif /* osal_malloc_usable_size */ #endif /* osal_malloc_usable_size */
dl->detent = dpl_bytes2size(bytes); dl->detent = dpl_bytes2size(bytes);
tASSERT(txn, txn->tw.dirtylist == nullptr || dl->length <= dl->detent); tASSERT(txn, txn->wr.dirtylist == nullptr || dl->length <= dl->detent);
txn->tw.dirtylist = dl; txn->wr.dirtylist = dl;
} }
return dl; return dl;
} }
@ -57,15 +57,17 @@ int dpl_alloc(MDBX_txn *txn) {
const size_t wanna = (txn->env->options.dp_initial < txn->geo.upper) ? txn->env->options.dp_initial : txn->geo.upper; const size_t wanna = (txn->env->options.dp_initial < txn->geo.upper) ? txn->env->options.dp_initial : txn->geo.upper;
#if MDBX_FORCE_ASSERTIONS || MDBX_DEBUG #if MDBX_FORCE_ASSERTIONS || MDBX_DEBUG
if (txn->tw.dirtylist) if (txn->wr.dirtylist)
/* zero it so that the assert inside dpl_reserve() does not fire */ /* zero it so that the assert inside dpl_reserve() does not fire */
txn->tw.dirtylist->sorted = txn->tw.dirtylist->length = 0; txn->wr.dirtylist->sorted = txn->wr.dirtylist->length = 0;
#endif /* assertions enabled */ #endif /* assertions enabled */
if (unlikely(!txn->tw.dirtylist || txn->tw.dirtylist->detent < wanna || txn->tw.dirtylist->detent > wanna + wanna) && if (unlikely(!txn->wr.dirtylist || txn->wr.dirtylist->detent < wanna || txn->wr.dirtylist->detent > wanna + wanna) &&
unlikely(!dpl_reserve(txn, wanna))) unlikely(!dpl_reserve(txn, wanna)))
return MDBX_ENOMEM; return MDBX_ENOMEM;
dpl_clear(txn->tw.dirtylist); /* LY: wr.dirtylist cannot be nullptr, since it is either already allocated or will be allocated in dpl_reserve(). */
/* coverity[var_deref_model] */
dpl_clear(txn->wr.dirtylist);
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
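
dpl_alloc() above keeps the dirty-list reservation between wanna and 2*wanna entries and reallocates only when the current capacity falls outside that window. A tiny sketch of that hysteresis rule (hypothetical helper name):

#include <stdbool.h>
#include <stddef.h>

/* Reallocate only when the current capacity (detent) is outside [wanted, 2*wanted]. */
static bool dirtylist_needs_resize(size_t current_detent, size_t wanted) {
  return current_detent < wanted || current_detent > wanted + wanted;
}
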
@ -79,7 +81,7 @@ __hot __noinline dpl_t *dpl_sort_slowpath(const MDBX_txn *txn) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
const size_t unsorted = dl->length - dl->sorted; const size_t unsorted = dl->length - dl->sorted;
if (likely(unsorted < MDBX_RADIXSORT_THRESHOLD) || unlikely(!dp_radixsort(dl->items + 1, dl->length))) { if (likely(unsorted < MDBX_RADIXSORT_THRESHOLD) || unlikely(!dp_radixsort(dl->items + 1, dl->length))) {
@ -133,7 +135,7 @@ __hot __noinline size_t dpl_search(const MDBX_txn *txn, pgno_t pgno) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
if (AUDIT_ENABLED()) { if (AUDIT_ENABLED()) {
for (const dp_t *ptr = dl->items + dl->sorted; --ptr > dl->items;) { for (const dp_t *ptr = dl->items + dl->sorted; --ptr > dl->items;) {
@ -175,7 +177,7 @@ __hot __noinline size_t dpl_search(const MDBX_txn *txn, pgno_t pgno) {
const page_t *debug_dpl_find(const MDBX_txn *txn, const pgno_t pgno) { const page_t *debug_dpl_find(const MDBX_txn *txn, const pgno_t pgno) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
const dpl_t *dl = txn->tw.dirtylist; const dpl_t *dl = txn->wr.dirtylist;
if (dl) { if (dl) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
@ -198,7 +200,7 @@ void dpl_remove_ex(const MDBX_txn *txn, size_t i, size_t npages) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
assert((intptr_t)i > 0 && i <= dl->length); assert((intptr_t)i > 0 && i <= dl->length);
assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
dl->pages_including_loose -= npages; dl->pages_including_loose -= npages;
@ -214,10 +216,10 @@ int __must_check_result dpl_append(MDBX_txn *txn, pgno_t pgno, page_t *page, siz
const dp_t dp = {page, pgno, (pgno_t)npages}; const dp_t dp = {page, pgno, (pgno_t)npages};
if ((txn->flags & MDBX_WRITEMAP) == 0) { if ((txn->flags & MDBX_WRITEMAP) == 0) {
size_t *const ptr = ptr_disp(page, -(ptrdiff_t)sizeof(size_t)); size_t *const ptr = ptr_disp(page, -(ptrdiff_t)sizeof(size_t));
*ptr = txn->tw.dirtylru; *ptr = txn->wr.dirtylru;
} }
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
tASSERT(txn, dl->length <= PAGELIST_LIMIT + MDBX_PNL_GRANULATE); tASSERT(txn, dl->length <= PAGELIST_LIMIT + MDBX_PNL_GRANULATE);
tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
if (AUDIT_ENABLED()) { if (AUDIT_ENABLED()) {
@ -313,7 +315,7 @@ int __must_check_result dpl_append(MDBX_txn *txn, pgno_t pgno, page_t *page, siz
__cold bool dpl_check(MDBX_txn *txn) { __cold bool dpl_check(MDBX_txn *txn) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
const dpl_t *const dl = txn->tw.dirtylist; const dpl_t *const dl = txn->wr.dirtylist;
if (!dl) { if (!dl) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
return true; return true;
@ -322,7 +324,7 @@ __cold bool dpl_check(MDBX_txn *txn) {
assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); assert(dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
tASSERT(txn, tASSERT(txn,
txn->tw.dirtyroom + dl->length == (txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit)); txn->wr.dirtyroom + dl->length == (txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
if (!AUDIT_ENABLED()) if (!AUDIT_ENABLED())
return true; return true;
@ -362,28 +364,28 @@ __cold bool dpl_check(MDBX_txn *txn) {
return false; return false;
} }
const size_t rpa = pnl_search(txn->tw.repnl, dp->pgno, txn->geo.first_unallocated); const size_t rpa = pnl_search(txn->wr.repnl, dp->pgno, txn->geo.first_unallocated);
tASSERT(txn, rpa > MDBX_PNL_GETSIZE(txn->tw.repnl) || txn->tw.repnl[rpa] != dp->pgno); tASSERT(txn, rpa > pnl_size(txn->wr.repnl) || txn->wr.repnl[rpa] != dp->pgno);
if (rpa <= MDBX_PNL_GETSIZE(txn->tw.repnl) && unlikely(txn->tw.repnl[rpa] == dp->pgno)) if (rpa <= pnl_size(txn->wr.repnl) && unlikely(txn->wr.repnl[rpa] == dp->pgno))
return false; return false;
if (num > 1) { if (num > 1) {
const size_t rpb = pnl_search(txn->tw.repnl, dp->pgno + num - 1, txn->geo.first_unallocated); const size_t rpb = pnl_search(txn->wr.repnl, dp->pgno + num - 1, txn->geo.first_unallocated);
tASSERT(txn, rpa == rpb); tASSERT(txn, rpa == rpb);
if (unlikely(rpa != rpb)) if (unlikely(rpa != rpb))
return false; return false;
} }
} }
tASSERT(txn, loose == txn->tw.loose_count); tASSERT(txn, loose == txn->wr.loose_count);
if (unlikely(loose != txn->tw.loose_count)) if (unlikely(loose != txn->wr.loose_count))
return false; return false;
tASSERT(txn, pages == dl->pages_including_loose); tASSERT(txn, pages == dl->pages_including_loose);
if (unlikely(pages != dl->pages_including_loose)) if (unlikely(pages != dl->pages_including_loose))
return false; return false;
for (size_t i = 1; i <= MDBX_PNL_GETSIZE(txn->tw.retired_pages); ++i) { for (size_t i = 1; i <= pnl_size(txn->wr.retired_pages); ++i) {
const page_t *const dp = debug_dpl_find(txn, txn->tw.retired_pages[i]); const page_t *const dp = debug_dpl_find(txn, txn->wr.retired_pages[i]);
tASSERT(txn, !dp); tASSERT(txn, !dp);
if (unlikely(dp)) if (unlikely(dp))
return false; return false;
@ -395,11 +397,11 @@ __cold bool dpl_check(MDBX_txn *txn) {
/*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/
__noinline void dpl_lru_reduce(MDBX_txn *txn) { __noinline void dpl_lru_reduce(MDBX_txn *txn) {
NOTICE("lru-reduce %u -> %u", txn->tw.dirtylru, txn->tw.dirtylru >> 1); VERBOSE("lru-reduce %u -> %u", txn->wr.dirtylru, txn->wr.dirtylru >> 1);
tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0); tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0);
do { do {
txn->tw.dirtylru >>= 1; txn->wr.dirtylru >>= 1;
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
for (size_t i = 1; i <= dl->length; ++i) { for (size_t i = 1; i <= dl->length; ++i) {
size_t *const ptr = ptr_disp(dl->items[i].ptr, -(ptrdiff_t)sizeof(size_t)); size_t *const ptr = ptr_disp(dl->items[i].ptr, -(ptrdiff_t)sizeof(size_t));
*ptr >>= 1; *ptr >>= 1;
@ -411,14 +413,14 @@ __noinline void dpl_lru_reduce(MDBX_txn *txn) {
void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled) { void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
if (MDBX_PNL_GETSIZE(pl) && txn->tw.dirtylist->length) { if (pnl_size(pl) && txn->wr.dirtylist->length) {
tASSERT(txn, pnl_check_allocated(pl, (size_t)txn->geo.first_unallocated << spilled)); tASSERT(txn, pnl_check_allocated(pl, (size_t)txn->geo.first_unallocated << spilled));
dpl_t *dl = dpl_sort(txn); dpl_t *dl = dpl_sort(txn);
/* Scanning in ascending order */ /* Scanning in ascending order */
const intptr_t step = MDBX_PNL_ASCENDING ? 1 : -1; const intptr_t step = MDBX_PNL_ASCENDING ? 1 : -1;
const intptr_t begin = MDBX_PNL_ASCENDING ? 1 : MDBX_PNL_GETSIZE(pl); const intptr_t begin = MDBX_PNL_ASCENDING ? 1 : pnl_size(pl);
const intptr_t end = MDBX_PNL_ASCENDING ? MDBX_PNL_GETSIZE(pl) + 1 : 0; const intptr_t end = MDBX_PNL_ASCENDING ? pnl_size(pl) + 1 : 0;
tASSERT(txn, pl[begin] <= pl[end - step]); tASSERT(txn, pl[begin] <= pl[end - step]);
size_t w, r = dpl_search(txn, pl[begin] >> spilled); size_t w, r = dpl_search(txn, pl[begin] >> spilled);
@ -466,9 +468,9 @@ void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled) {
} }
} }
dl->sorted = dpl_setlen(dl, w - 1); dl->sorted = dpl_setlen(dl, w - 1);
txn->tw.dirtyroom += r - w; txn->wr.dirtyroom += r - w;
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length == tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit)); (txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
return; return;
} }
} }
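
dpl_sift() above removes from the sorted dirty list every page that appears in the given sorted PNL, roughly as a single merge-style pass with a read cursor r and a write cursor w. A condensed sketch of that two-pointer removal over plain sorted arrays of unique values (hypothetical helper, not the mdbx code):

#include <stddef.h>

/* Remove from sorted array `a` (length *na) every value present in sorted
 * array `drop` (length nd), compacting in place. */
static void sift_sorted(unsigned *a, size_t *na, const unsigned *drop, size_t nd) {
  size_t r = 0, w = 0, d = 0;
  while (r < *na) {
    while (d < nd && drop[d] < a[r])
      ++d;
    if (d < nd && drop[d] == a[r])
      ++r; /* present in `drop`: skip it */
    else
      a[w++] = a[r++]; /* keep it, compacting toward the front */
  }
  *na = w;
}
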
@ -477,7 +479,7 @@ void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled) {
void dpl_release_shadows(MDBX_txn *txn) { void dpl_release_shadows(MDBX_txn *txn) {
tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0); tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0);
MDBX_env *env = txn->env; MDBX_env *env = txn->env;
dpl_t *const dl = txn->tw.dirtylist; dpl_t *const dl = txn->wr.dirtylist;
for (size_t i = 1; i <= dl->length; i++) for (size_t i = 1; i <= dl->length; i++)
page_shadow_release(env, dl->items[i].ptr, dpl_npages(dl, i)); page_shadow_release(env, dl->items[i].ptr, dpl_npages(dl, i));


@ -46,14 +46,14 @@ static inline dpl_t *dpl_sort(const MDBX_txn *txn) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
tASSERT(txn, dl->length <= PAGELIST_LIMIT); tASSERT(txn, dl->length <= PAGELIST_LIMIT);
tASSERT(txn, dl->sorted <= dl->length); tASSERT(txn, dl->sorted <= dl->length);
tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
return likely(dl->sorted == dl->length) ? dl : dpl_sort_slowpath(txn); return likely(dl->sorted == dl->length) ? dl : dpl_sort_slowpath(txn);
} }
MDBX_INTERNAL __noinline size_t dpl_search(const MDBX_txn *txn, pgno_t pgno); MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL __noinline size_t dpl_search(const MDBX_txn *txn, pgno_t pgno);
MDBX_MAYBE_UNUSED MDBX_INTERNAL const page_t *debug_dpl_find(const MDBX_txn *txn, const pgno_t pgno); MDBX_MAYBE_UNUSED MDBX_INTERNAL const page_t *debug_dpl_find(const MDBX_txn *txn, const pgno_t pgno);
@ -68,11 +68,11 @@ MDBX_NOTHROW_PURE_FUNCTION static inline pgno_t dpl_endpgno(const dpl_t *dl, siz
return dpl_npages(dl, i) + dl->items[i].pgno; return dpl_npages(dl, i) + dl->items[i].pgno;
} }
static inline bool dpl_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npages) { MDBX_NOTHROW_PURE_FUNCTION static inline bool dpl_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npages) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
tASSERT(txn, dl->sorted == dl->length); tASSERT(txn, dl->sorted == dl->length);
tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID); tASSERT(txn, dl->items[0].pgno == 0 && dl->items[dl->length + 1].pgno == P_INVALID);
size_t const n = dpl_search(txn, pgno); size_t const n = dpl_search(txn, pgno);
@ -96,7 +96,7 @@ static inline bool dpl_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npages
MDBX_NOTHROW_PURE_FUNCTION static inline size_t dpl_exist(const MDBX_txn *txn, pgno_t pgno) { MDBX_NOTHROW_PURE_FUNCTION static inline size_t dpl_exist(const MDBX_txn *txn, pgno_t pgno) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *dl = txn->tw.dirtylist; dpl_t *dl = txn->wr.dirtylist;
size_t i = dpl_search(txn, pgno); size_t i = dpl_search(txn, pgno);
tASSERT(txn, (int)i > 0); tASSERT(txn, (int)i > 0);
return (dl->items[i].pgno == pgno) ? i : 0; return (dl->items[i].pgno == pgno) ? i : 0;
@ -105,7 +105,7 @@ MDBX_NOTHROW_PURE_FUNCTION static inline size_t dpl_exist(const MDBX_txn *txn, p
MDBX_INTERNAL void dpl_remove_ex(const MDBX_txn *txn, size_t i, size_t npages); MDBX_INTERNAL void dpl_remove_ex(const MDBX_txn *txn, size_t i, size_t npages);
static inline void dpl_remove(const MDBX_txn *txn, size_t i) { static inline void dpl_remove(const MDBX_txn *txn, size_t i) {
dpl_remove_ex(txn, i, dpl_npages(txn->tw.dirtylist, i)); dpl_remove_ex(txn, i, dpl_npages(txn->wr.dirtylist, i));
} }
MDBX_INTERNAL int __must_check_result dpl_append(MDBX_txn *txn, pgno_t pgno, page_t *page, size_t npages); MDBX_INTERNAL int __must_check_result dpl_append(MDBX_txn *txn, pgno_t pgno, page_t *page, size_t npages);
@ -114,19 +114,19 @@ MDBX_MAYBE_UNUSED MDBX_INTERNAL bool dpl_check(MDBX_txn *txn);
MDBX_NOTHROW_PURE_FUNCTION static inline uint32_t dpl_age(const MDBX_txn *txn, size_t i) { MDBX_NOTHROW_PURE_FUNCTION static inline uint32_t dpl_age(const MDBX_txn *txn, size_t i) {
tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0); tASSERT(txn, (txn->flags & (MDBX_TXN_RDONLY | MDBX_WRITEMAP)) == 0);
const dpl_t *dl = txn->tw.dirtylist; const dpl_t *dl = txn->wr.dirtylist;
assert((intptr_t)i > 0 && i <= dl->length); assert((intptr_t)i > 0 && i <= dl->length);
size_t *const ptr = ptr_disp(dl->items[i].ptr, -(ptrdiff_t)sizeof(size_t)); size_t *const ptr = ptr_disp(dl->items[i].ptr, -(ptrdiff_t)sizeof(size_t));
return txn->tw.dirtylru - (uint32_t)*ptr; return txn->wr.dirtylru - (uint32_t)*ptr;
} }
MDBX_INTERNAL void dpl_lru_reduce(MDBX_txn *txn); MDBX_INTERNAL void dpl_lru_reduce(MDBX_txn *txn);
static inline uint32_t dpl_lru_turn(MDBX_txn *txn) { static inline uint32_t dpl_lru_turn(MDBX_txn *txn) {
txn->tw.dirtylru += 1; txn->wr.dirtylru += 1;
if (unlikely(txn->tw.dirtylru > UINT32_MAX / 3) && (txn->flags & MDBX_WRITEMAP) == 0) if (unlikely(txn->wr.dirtylru > UINT32_MAX / 3) && (txn->flags & MDBX_WRITEMAP) == 0)
dpl_lru_reduce(txn); dpl_lru_reduce(txn);
return txn->tw.dirtylru; return txn->wr.dirtylru;
} }
MDBX_INTERNAL void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled); MDBX_INTERNAL void dpl_sift(MDBX_txn *const txn, pnl_t pl, const bool spilled);
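
dpl_age() above computes how many LRU turns ago a dirty page was last stamped, and dpl_lru_turn() halves the counter and all stamps once it grows past UINT32_MAX/3, so the subtraction stays meaningful. A toy sketch of that aging scheme with hypothetical fields, not the mdbx ones:

#include <stdint.h>

struct toy_lru {
  uint32_t clock;      /* current generation, bumped on every turn */
  uint32_t stamps[64]; /* per-entry "last touched" generation */
  unsigned count;
};

static uint32_t toy_age(const struct toy_lru *lru, unsigned i) {
  return lru->clock - lru->stamps[i]; /* turns since the entry was touched */
}

static uint32_t toy_turn(struct toy_lru *lru) {
  if (++lru->clock > UINT32_MAX / 3) {
    /* halve everything so ages stay comparable and far from wrap-around */
    lru->clock >>= 1;
    for (unsigned i = 0; i < lru->count; ++i)
      lru->stamps[i] >>= 1;
  }
  return lru->clock;
}
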


@ -234,13 +234,14 @@ __cold int dxb_resize(MDBX_env *const env, const pgno_t used_pgno, const pgno_t
rc = MDBX_RESULT_TRUE; rc = MDBX_RESULT_TRUE;
#if defined(MADV_REMOVE) #if defined(MADV_REMOVE)
if (env->flags & MDBX_WRITEMAP) if (env->flags & MDBX_WRITEMAP)
rc = madvise(ptr_disp(env->dxb_mmap.base, size_bytes), prev_size - size_bytes, MADV_REMOVE) ? ignore_enosys(errno) rc = madvise(ptr_disp(env->dxb_mmap.base, size_bytes), prev_size - size_bytes, MADV_REMOVE)
? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS; : MDBX_SUCCESS;
#endif /* MADV_REMOVE */ #endif /* MADV_REMOVE */
#if defined(MADV_DONTNEED) #if defined(MADV_DONTNEED)
if (rc == MDBX_RESULT_TRUE) if (rc == MDBX_RESULT_TRUE)
rc = madvise(ptr_disp(env->dxb_mmap.base, size_bytes), prev_size - size_bytes, MADV_DONTNEED) rc = madvise(ptr_disp(env->dxb_mmap.base, size_bytes), prev_size - size_bytes, MADV_DONTNEED)
? ignore_enosys(errno) ? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS; : MDBX_SUCCESS;
#elif defined(POSIX_MADV_DONTNEED) #elif defined(POSIX_MADV_DONTNEED)
if (rc == MDBX_RESULT_TRUE) if (rc == MDBX_RESULT_TRUE)
@ -370,7 +371,7 @@ void dxb_sanitize_tail(MDBX_env *env, MDBX_txn *txn) {
return; return;
} else if (env_owned_wrtxn(env)) { } else if (env_owned_wrtxn(env)) {
/* inside write-txn */ /* inside write-txn */
last = meta_recent(env, &env->basal_txn->tw.troika).ptr_v->geometry.first_unallocated; last = meta_recent(env, &env->basal_txn->wr.troika).ptr_v->geometry.first_unallocated;
} else if (env->flags & MDBX_RDONLY) { } else if (env->flags & MDBX_RDONLY) {
/* read-only mode, no write-txn, no wlock mutex */ /* read-only mode, no write-txn, no wlock mutex */
last = NUM_METAS; last = NUM_METAS;
@ -426,7 +427,7 @@ __cold int dxb_set_readahead(const MDBX_env *env, const pgno_t edge, const bool
void *const ptr = ptr_disp(env->dxb_mmap.base, offset); void *const ptr = ptr_disp(env->dxb_mmap.base, offset);
if (enable) { if (enable) {
#if defined(MADV_NORMAL) #if defined(MADV_NORMAL)
err = madvise(ptr, length, MADV_NORMAL) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(ptr, length, MADV_NORMAL) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#elif defined(POSIX_MADV_NORMAL) #elif defined(POSIX_MADV_NORMAL)
@ -454,7 +455,7 @@ __cold int dxb_set_readahead(const MDBX_env *env, const pgno_t edge, const bool
hint.ra_count = unlikely(length > INT_MAX && sizeof(length) > sizeof(hint.ra_count)) ? INT_MAX : (int)length; hint.ra_count = unlikely(length > INT_MAX && sizeof(length) > sizeof(hint.ra_count)) ? INT_MAX : (int)length;
(void)/* Ignore ENOTTY for DB on the ram-disk and so on */ fcntl(env->lazy_fd, F_RDADVISE, &hint); (void)/* Ignore ENOTTY for DB on the ram-disk and so on */ fcntl(env->lazy_fd, F_RDADVISE, &hint);
#elif defined(MADV_WILLNEED) #elif defined(MADV_WILLNEED)
err = madvise(ptr, length, MADV_WILLNEED) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(ptr, length, MADV_WILLNEED) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#elif defined(POSIX_MADV_WILLNEED) #elif defined(POSIX_MADV_WILLNEED)
@ -479,7 +480,7 @@ __cold int dxb_set_readahead(const MDBX_env *env, const pgno_t edge, const bool
} else { } else {
mincore_clean_cache(env); mincore_clean_cache(env);
#if defined(MADV_RANDOM) #if defined(MADV_RANDOM)
err = madvise(ptr, length, MADV_RANDOM) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(ptr, length, MADV_RANDOM) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#elif defined(POSIX_MADV_RANDOM) #elif defined(POSIX_MADV_RANDOM)
@ -686,14 +687,16 @@ __cold int dxb_setup(MDBX_env *env, const int lck_rc, const mdbx_mode_t mode_bit
return err; return err;
#if defined(MADV_DONTDUMP) #if defined(MADV_DONTDUMP)
err = madvise(env->dxb_mmap.base, env->dxb_mmap.limit, MADV_DONTDUMP) ? ignore_enosys(errno) : MDBX_SUCCESS; err =
madvise(env->dxb_mmap.base, env->dxb_mmap.limit, MADV_DONTDUMP) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#endif /* MADV_DONTDUMP */ #endif /* MADV_DONTDUMP */
#if defined(MADV_DODUMP) #if defined(MADV_DODUMP)
if (globals.runtime_flags & MDBX_DBG_DUMP) { if (globals.runtime_flags & MDBX_DBG_DUMP) {
const size_t meta_length_aligned2os = pgno_align2os_bytes(env, NUM_METAS); const size_t meta_length_aligned2os = pgno_align2os_bytes(env, NUM_METAS);
err = madvise(env->dxb_mmap.base, meta_length_aligned2os, MADV_DODUMP) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(env->dxb_mmap.base, meta_length_aligned2os, MADV_DODUMP) ? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
} }
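
The hunks above swap ignore_enosys() for ignore_enosys_and_eagain() around the madvise() calls, so transient EAGAIN is treated as benign in addition to "not implemented". The helper's body is not shown in this diff; a plausible shape for such an errno filter is sketched below, purely as an illustration (the real helper and its return-code mapping may differ):

#include <errno.h>

/* Illustrative only: treat missing-support and transient EAGAIN as skipped advice. */
static int sketch_ignore_enosys_and_eagain(int err) {
  switch (err) {
  case ENOSYS: /* advice not implemented by the kernel */
#if defined(ENOTSUP) && ENOTSUP != ENOSYS
  case ENOTSUP:
#endif
  case EAGAIN: /* transient condition */
    return 0;  /* mapped to a non-error result here; the real mapping may differ */
  default:
    return err;
  }
}
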
@ -932,7 +935,7 @@ __cold int dxb_setup(MDBX_env *env, const int lck_rc, const mdbx_mode_t mode_bit
bytes2pgno(env, env->dxb_mmap.current)); bytes2pgno(env, env->dxb_mmap.current));
err = madvise(ptr_disp(env->dxb_mmap.base, used_aligned2os_bytes), env->dxb_mmap.current - used_aligned2os_bytes, err = madvise(ptr_disp(env->dxb_mmap.base, used_aligned2os_bytes), env->dxb_mmap.current - used_aligned2os_bytes,
MADV_REMOVE) MADV_REMOVE)
? ignore_enosys(errno) ? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS; : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
@ -942,7 +945,7 @@ __cold int dxb_setup(MDBX_env *env, const int lck_rc, const mdbx_mode_t mode_bit
NOTICE("open-MADV_%s %u..%u", "DONTNEED", env->lck->discarded_tail.weak, bytes2pgno(env, env->dxb_mmap.current)); NOTICE("open-MADV_%s %u..%u", "DONTNEED", env->lck->discarded_tail.weak, bytes2pgno(env, env->dxb_mmap.current));
err = madvise(ptr_disp(env->dxb_mmap.base, used_aligned2os_bytes), env->dxb_mmap.current - used_aligned2os_bytes, err = madvise(ptr_disp(env->dxb_mmap.base, used_aligned2os_bytes), env->dxb_mmap.current - used_aligned2os_bytes,
MADV_DONTNEED) MADV_DONTNEED)
? ignore_enosys(errno) ? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS; : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
@ -1034,7 +1037,7 @@ int dxb_sync_locked(MDBX_env *env, unsigned flags, meta_t *const pending, troika
#endif /* MADV_FREE */ #endif /* MADV_FREE */
int err = madvise(ptr_disp(env->dxb_mmap.base, discard_edge_bytes), prev_discarded_bytes - discard_edge_bytes, int err = madvise(ptr_disp(env->dxb_mmap.base, discard_edge_bytes), prev_discarded_bytes - discard_edge_bytes,
advise) advise)
? ignore_enosys(errno) ? ignore_enosys_and_eagain(errno)
: MDBX_SUCCESS; : MDBX_SUCCESS;
#else #else
int err = ignore_enosys(posix_madvise(ptr_disp(env->dxb_mmap.base, discard_edge_bytes), int err = ignore_enosys(posix_madvise(ptr_disp(env->dxb_mmap.base, discard_edge_bytes),
@ -1061,16 +1064,17 @@ int dxb_sync_locked(MDBX_env *env, unsigned flags, meta_t *const pending, troika
#endif /* MADV_DONTNEED || POSIX_MADV_DONTNEED */ #endif /* MADV_DONTNEED || POSIX_MADV_DONTNEED */
/* LY: check conditions to shrink datafile */ /* LY: check conditions to shrink datafile */
const pgno_t backlog_gap = 3 + pending->trees.gc.height * 3; const pgno_t stockpile_gap = 3 + pending->trees.gc.height * 3;
pgno_t shrink_step = 0; pgno_t shrink_step = 0;
if (pending->geometry.shrink_pv && pending->geometry.now - pending->geometry.first_unallocated > if (pending->geometry.shrink_pv && pending->geometry.now - pending->geometry.first_unallocated >
(shrink_step = pv2pages(pending->geometry.shrink_pv)) + backlog_gap) { (shrink_step = pv2pages(pending->geometry.shrink_pv)) + stockpile_gap) {
if (pending->geometry.now > largest_pgno && pending->geometry.now - largest_pgno > shrink_step + backlog_gap) { if (pending->geometry.now > largest_pgno &&
pending->geometry.now - largest_pgno > shrink_step + stockpile_gap) {
const pgno_t aligner = const pgno_t aligner =
pending->geometry.grow_pv ? /* grow_step */ pv2pages(pending->geometry.grow_pv) : shrink_step; pending->geometry.grow_pv ? /* grow_step */ pv2pages(pending->geometry.grow_pv) : shrink_step;
const pgno_t with_backlog_gap = largest_pgno + backlog_gap; const pgno_t with_stockpile_gap = largest_pgno + stockpile_gap;
const pgno_t aligned = const pgno_t aligned =
pgno_align2os_pgno(env, (size_t)with_backlog_gap + aligner - with_backlog_gap % aligner); pgno_align2os_pgno(env, (size_t)with_stockpile_gap + aligner - with_stockpile_gap % aligner);
const pgno_t bottom = (aligned > pending->geometry.lower) ? aligned : pending->geometry.lower; const pgno_t bottom = (aligned > pending->geometry.lower) ? aligned : pending->geometry.lower;
if (pending->geometry.now > bottom) { if (pending->geometry.now > bottom) {
if (TROIKA_HAVE_STEADY(troika)) if (TROIKA_HAVE_STEADY(troika))
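
To make the shrink condition above concrete: with a GC tree of height 3 the reserve is stockpile_gap = 3 + 3*3 = 12 pages, so the datafile only shrinks once the unused tail exceeds shrink_step plus those 12 pages, measured both from first_unallocated and from the largest used page. A small check with assumed example numbers (not values taken from the diff):

#include <stdio.h>

/* Assumed example numbers, only to show the arithmetic of the shrink check. */
int main(void) {
  const unsigned gc_height = 3;
  const unsigned stockpile_gap = 3 + gc_height * 3; /* = 12 pages kept in reserve */
  const unsigned shrink_step = 1024;                /* pv2pages(shrink_pv), assumed */
  const unsigned now = 10000, first_unallocated = 8000, largest_pgno = 8500;

  const int shrink = now - first_unallocated > shrink_step + stockpile_gap &&
                     now > largest_pgno &&
                     now - largest_pgno > shrink_step + stockpile_gap;
  printf("shrink datafile: %s\n", shrink ? "yes" : "no"); /* prints "yes" here */
  return 0;
}
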
@ -1290,6 +1294,7 @@ int dxb_sync_locked(MDBX_env *env, unsigned flags, meta_t *const pending, troika
} }
uint64_t timestamp = 0; uint64_t timestamp = 0;
/* coverity[array_null] */
while ("workaround for https://libmdbx.dqdkfa.ru/dead-github/issues/269") { while ("workaround for https://libmdbx.dqdkfa.ru/dead-github/issues/269") {
rc = coherency_check_written(env, pending->unsafe_txnid, target, rc = coherency_check_written(env, pending->unsafe_txnid, target,
bytes2pgno(env, ptr_dist(target, env->dxb_mmap.base)), &timestamp); bytes2pgno(env, ptr_dist(target, env->dxb_mmap.base)), &timestamp);
@ -1306,8 +1311,8 @@ int dxb_sync_locked(MDBX_env *env, unsigned flags, meta_t *const pending, troika
*troika = meta_tap(env); *troika = meta_tap(env);
for (MDBX_txn *txn = env->basal_txn; txn; txn = txn->nested) for (MDBX_txn *txn = env->basal_txn; txn; txn = txn->nested)
if (troika != &txn->tw.troika) if (troika != &txn->wr.troika)
txn->tw.troika = *troika; txn->wr.troika = *troika;
/* LY: shrink datafile if needed */ /* LY: shrink datafile if needed */
if (unlikely(shrink)) { if (unlikely(shrink)) {


@ -76,7 +76,7 @@ retry:;
goto bailout; goto bailout;
} }
const troika_t troika = (txn_owned || should_unlock) ? env->basal_txn->tw.troika : meta_tap(env); const troika_t troika = (txn_owned || should_unlock) ? env->basal_txn->wr.troika : meta_tap(env);
const meta_ptr_t head = meta_recent(env, &troika); const meta_ptr_t head = meta_recent(env, &troika);
const uint64_t unsynced_pages = atomic_load64(&env->lck->unsynced_pages, mo_Relaxed); const uint64_t unsynced_pages = atomic_load64(&env->lck->unsynced_pages, mo_Relaxed);
if (unsynced_pages == 0) { if (unsynced_pages == 0) {
@ -158,13 +158,13 @@ retry:;
#if MDBX_ENABLE_PGOP_STAT #if MDBX_ENABLE_PGOP_STAT
env->lck->pgops.wops.weak += wops; env->lck->pgops.wops.weak += wops;
#endif /* MDBX_ENABLE_PGOP_STAT */ #endif /* MDBX_ENABLE_PGOP_STAT */
env->basal_txn->tw.troika = meta_tap(env); env->basal_txn->wr.troika = meta_tap(env);
eASSERT(env, !env->txn && !env->basal_txn->nested); eASSERT(env, !env->txn && !env->basal_txn->nested);
goto retry; goto retry;
} }
eASSERT(env, head.txnid == recent_committed_txnid(env)); eASSERT(env, head.txnid == recent_committed_txnid(env));
env->basal_txn->txnid = head.txnid; env->basal_txn->txnid = head.txnid;
txn_snapshot_oldest(env->basal_txn); txn_gc_detent(env->basal_txn);
flags |= txn_shrink_allowed; flags |= txn_shrink_allowed;
} }
@ -182,7 +182,7 @@ retry:;
DEBUG("meta-head %" PRIaPGNO ", %s, sync_pending %" PRIu64, data_page(head.ptr_c)->pgno, DEBUG("meta-head %" PRIaPGNO ", %s, sync_pending %" PRIu64, data_page(head.ptr_c)->pgno,
durable_caption(head.ptr_c), unsynced_pages); durable_caption(head.ptr_c), unsynced_pages);
meta_t meta = *head.ptr_c; meta_t meta = *head.ptr_c;
rc = dxb_sync_locked(env, flags, &meta, &env->basal_txn->tw.troika); rc = dxb_sync_locked(env, flags, &meta, &env->basal_txn->wr.troika);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto bailout; goto bailout;
} }
@ -524,7 +524,7 @@ __cold int env_close(MDBX_env *env, bool resurrect_after_fork) {
env->defer_free = nullptr; env->defer_free = nullptr;
#endif /* MDBX_ENABLE_DBI_LOCKFREE */ #endif /* MDBX_ENABLE_DBI_LOCKFREE */
if (!(env->flags & MDBX_RDONLY)) if ((env->flags & MDBX_RDONLY) == 0)
osal_ioring_destroy(&env->ioring); osal_ioring_destroy(&env->ioring);
env->lck = nullptr; env->lck = nullptr;
@ -593,12 +593,7 @@ __cold int env_close(MDBX_env *env, bool resurrect_after_fork) {
env->pathname.buffer = nullptr; env->pathname.buffer = nullptr;
} }
if (env->basal_txn) { if (env->basal_txn) {
dpl_free(env->basal_txn); txn_basal_destroy(env->basal_txn);
txl_free(env->basal_txn->tw.gc.retxl);
pnl_free(env->basal_txn->tw.retired_pages);
pnl_free(env->basal_txn->tw.spilled.list);
pnl_free(env->basal_txn->tw.repnl);
osal_free(env->basal_txn);
env->basal_txn = nullptr; env->basal_txn = nullptr;
} }
} }


@ -28,12 +28,6 @@
typedef struct iov_ctx iov_ctx_t; typedef struct iov_ctx iov_ctx_t;
#include "osal.h" #include "osal.h"
#if UINTPTR_MAX > 0xffffFFFFul || ULONG_MAX > 0xffffFFFFul || defined(_WIN64)
#define MDBX_WORDBITS 64
#else
#define MDBX_WORDBITS 32
#endif /* MDBX_WORDBITS */
#include "options.h" #include "options.h"
#include "atomics-types.h" #include "atomics-types.h"


@ -162,9 +162,9 @@ MDBX_MAYBE_UNUSED __hot static pgno_t *scan4seq_fallback(pgno_t *range, const si
} }
MDBX_MAYBE_UNUSED static const pgno_t *scan4range_checker(const pnl_t pnl, const size_t seq) { MDBX_MAYBE_UNUSED static const pgno_t *scan4range_checker(const pnl_t pnl, const size_t seq) {
size_t begin = MDBX_PNL_ASCENDING ? 1 : MDBX_PNL_GETSIZE(pnl); size_t begin = MDBX_PNL_ASCENDING ? 1 : pnl_size(pnl);
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
while (seq <= MDBX_PNL_GETSIZE(pnl) - begin) { while (seq <= pnl_size(pnl) - begin) {
if (pnl[begin + seq] - pnl[begin] == seq) if (pnl[begin + seq] - pnl[begin] == seq)
return pnl + begin; return pnl + begin;
++begin; ++begin;
@ -570,14 +570,11 @@ static pgno_t *scan4seq_resolver(pgno_t *range, const size_t len, const size_t s
/*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/
#define ALLOC_COALESCE 4 /* internal state */ static inline bool is_reclaimable(MDBX_txn *txn, const MDBX_cursor *mc, const uint8_t flags) {
#define ALLOC_SHOULD_SCAN 8 /* internal state */
#define ALLOC_LIFO 16 /* internal state */
static inline bool is_gc_usable(MDBX_txn *txn, const MDBX_cursor *mc, const uint8_t flags) {
/* If txn is updating the GC, then the retired-list cannot play catch-up with /* If txn is updating the GC, then the retired-list cannot play catch-up with
* itself by growing while trying to save it. */ * itself by growing while trying to save it. */
if (mc->tree == &txn->dbs[FREE_DBI] && !(flags & ALLOC_RESERVE) && !(mc->flags & z_gcu_preparation)) STATIC_ASSERT(ALLOC_RESERVE == z_gcu_preparation);
if (mc->tree == &txn->dbs[FREE_DBI] && !((flags | mc->flags) & z_gcu_preparation))
return false; return false;
/* avoid search inside empty tree and while tree is updating, /* avoid search inside empty tree and while tree is updating,
@ -590,12 +587,10 @@ static inline bool is_gc_usable(MDBX_txn *txn, const MDBX_cursor *mc, const uint
return true; return true;
} }
static inline bool is_already_reclaimed(const MDBX_txn *txn, txnid_t id) { return txl_contain(txn->tw.gc.retxl, id); }
__hot static pgno_t repnl_get_single(MDBX_txn *txn) { __hot static pgno_t repnl_get_single(MDBX_txn *txn) {
const size_t len = MDBX_PNL_GETSIZE(txn->tw.repnl); const size_t len = pnl_size(txn->wr.repnl);
assert(len > 0); assert(len > 0);
pgno_t *target = MDBX_PNL_EDGE(txn->tw.repnl); pgno_t *target = MDBX_PNL_EDGE(txn->wr.repnl);
const ptrdiff_t dir = MDBX_PNL_ASCENDING ? 1 : -1; const ptrdiff_t dir = MDBX_PNL_ASCENDING ? 1 : -1;
/* There are THREE potentially winning, but mutually opposed tactics: /* There are THREE potentially winning, but mutually opposed tactics:
@ -663,7 +658,7 @@ __hot static pgno_t repnl_get_single(MDBX_txn *txn) {
#else #else
/* cut the element out, shifting the tail */ /* cut the element out, shifting the tail */
const pgno_t pgno = *scan; const pgno_t pgno = *scan;
MDBX_PNL_SETSIZE(txn->tw.repnl, len - 1); pnl_setsize(txn->wr.repnl, len - 1);
while (++scan <= target) while (++scan <= target)
scan[-1] = *scan; scan[-1] = *scan;
return pgno; return pgno;
@ -676,44 +671,44 @@ __hot static pgno_t repnl_get_single(MDBX_txn *txn) {
const pgno_t pgno = *target; const pgno_t pgno = *target;
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
/* cut the element out, shifting the tail */ /* cut the element out, shifting the tail */
MDBX_PNL_SETSIZE(txn->tw.repnl, len - 1); pnl_setsize(txn->wr.repnl, len - 1);
for (const pgno_t *const end = txn->tw.repnl + len - 1; target <= end; ++target) for (const pgno_t *const end = txn->wr.repnl + len - 1; target <= end; ++target)
*target = target[1]; *target = target[1];
#else #else
/* no need to move the tail, just truncate the list */ /* no need to move the tail, just truncate the list */
MDBX_PNL_SETSIZE(txn->tw.repnl, len - 1); pnl_setsize(txn->wr.repnl, len - 1);
#endif #endif
return pgno; return pgno;
} }
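For orientation: the two branches above differ only in whether the element at MDBX_PNL_EDGE can be dropped by pure truncation or has to be cut out by shifting the remainder. A tiny plain-array illustration (a sketch of the PNL layout in which slot 0 stores the length; these are not the real macros):

/* Simplified PNL model: slot 0 holds the length, the payload follows. */
pgno_t desc[] = {4, 13, 12, 11, 10}; /* default (descending) order: the edge is desc[4] */
pgno_t asc[]  = {4, 10, 11, 12, 13}; /* MDBX_PNL_ASCENDING: the edge is asc[1] */

desc[0] = 3;                         /* descending: taking the edge is a pure truncation */

for (unsigned i = 1; i < 4; ++i)     /* ascending: cut asc[1] and shift the tail left */
  asc[i] = asc[i + 1];
asc[0] = 3;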
__hot static pgno_t repnl_get_sequence(MDBX_txn *txn, const size_t num, uint8_t flags) { __hot static pgno_t repnl_get_sequence(MDBX_txn *txn, const size_t num, uint8_t flags) {
const size_t len = MDBX_PNL_GETSIZE(txn->tw.repnl); const size_t len = pnl_size(txn->wr.repnl);
pgno_t *edge = MDBX_PNL_EDGE(txn->tw.repnl); pgno_t *edge = MDBX_PNL_EDGE(txn->wr.repnl);
assert(len >= num && num > 1); assert(len >= num && num > 1);
const size_t seq = num - 1; const size_t seq = num - 1;
#if !MDBX_PNL_ASCENDING #if !MDBX_PNL_ASCENDING
if (edge[-(ptrdiff_t)seq] - *edge == seq) { if (edge[-(ptrdiff_t)seq] - *edge == seq) {
if (unlikely(flags & ALLOC_RESERVE)) if (unlikely(flags & ALLOC_RESERVE))
return P_INVALID; return P_INVALID;
assert(edge == scan4range_checker(txn->tw.repnl, seq)); assert(edge == scan4range_checker(txn->wr.repnl, seq));
/* no need to move the tail, just truncate the list */ /* no need to move the tail, just truncate the list */
MDBX_PNL_SETSIZE(txn->tw.repnl, len - num); pnl_setsize(txn->wr.repnl, len - num);
return *edge; return *edge;
} }
#endif #endif
pgno_t *target = scan4seq_impl(edge, len, seq); pgno_t *target = scan4seq_impl(edge, len, seq);
assert(target == scan4range_checker(txn->tw.repnl, seq)); assert(target == scan4range_checker(txn->wr.repnl, seq));
if (target) { if (target) {
if (unlikely(flags & ALLOC_RESERVE)) if (unlikely(flags & ALLOC_RESERVE))
return P_INVALID; return P_INVALID;
const pgno_t pgno = *target; const pgno_t pgno = *target;
/* cut the found sequence out, shifting the tail */ /* cut the found sequence out, shifting the tail */
MDBX_PNL_SETSIZE(txn->tw.repnl, len - num); pnl_setsize(txn->wr.repnl, len - num);
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
for (const pgno_t *const end = txn->tw.repnl + len - num; target <= end; ++target) for (const pgno_t *const end = txn->wr.repnl + len - num; target <= end; ++target)
*target = target[num]; *target = target[num];
#else #else
for (const pgno_t *const end = txn->tw.repnl + len; ++target <= end;) for (const pgno_t *const end = txn->wr.repnl + len; ++target <= end;)
target[-(ptrdiff_t)num] = *target; target[-(ptrdiff_t)num] = *target;
#endif #endif
return pgno; return pgno;
@ -721,6 +716,10 @@ __hot static pgno_t repnl_get_sequence(MDBX_txn *txn, const size_t num, uint8_t
return 0; return 0;
} }
bool gc_repnl_has_span(const MDBX_txn *txn, const size_t num) {
return (num > 1) ? repnl_get_sequence((MDBX_txn *)txn, num, ALLOC_RESERVE) != 0 : !MDBX_PNL_IS_EMPTY(txn->wr.repnl);
}
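The new gc_repnl_has_span() helper only probes: for num > 1 it reuses repnl_get_sequence() with ALLOC_RESERVE, which reports a hit via a non-zero result without cutting anything out of wr.repnl, and for num <= 1 a non-empty list is enough. A hedged usage sketch (the caller and needed_pages are illustrative, not part of libmdbx):

/* Sketch: ask whether txn->wr.repnl already holds `needed_pages` consecutive
 * page numbers (or is simply non-empty when needed_pages <= 1); the probe has
 * no side effects on the list. */
if (gc_repnl_has_span(txn, needed_pages)) {
  /* the retired chunk could be placed without growing the file */
}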
static inline pgr_t page_alloc_finalize(MDBX_env *const env, MDBX_txn *const txn, const MDBX_cursor *const mc, static inline pgr_t page_alloc_finalize(MDBX_env *const env, MDBX_txn *const txn, const MDBX_cursor *const mc,
const pgno_t pgno, const size_t num) { const pgno_t pgno, const size_t num) {
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
@ -762,7 +761,7 @@ static inline pgr_t page_alloc_finalize(MDBX_env *const env, MDBX_txn *const txn
* the PTE may be updated, followed by a page-fault and a read of the data from * the PTE may be updated, followed by a page-fault and a read of the data from
* the dirty I/O queue. Because of this, the penalty for the extra write can be * the dirty I/O queue. Because of this, the penalty for the extra write can be
* comparable to the unnecessary read being avoided. */ * comparable to the unnecessary read being avoided. */
if (txn->tw.prefault_write_activated) { if (txn->wr.prefault_write_activated) {
void *const pattern = ptr_disp(env->page_auxbuf, need_clean ? env->ps : env->ps * 2); void *const pattern = ptr_disp(env->page_auxbuf, need_clean ? env->ps : env->ps * 2);
size_t file_offset = pgno2bytes(env, pgno); size_t file_offset = pgno2bytes(env, pgno);
if (likely(num == 1)) { if (likely(num == 1)) {
@ -823,7 +822,7 @@ static inline pgr_t page_alloc_finalize(MDBX_env *const env, MDBX_txn *const txn
ret.err = page_dirty(txn, ret.page, (pgno_t)num); ret.err = page_dirty(txn, ret.page, (pgno_t)num);
bailout: bailout:
tASSERT(txn, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); tASSERT(txn, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
size_t majflt_after; size_t majflt_after;
prof->xtime_cpu += osal_cputime(&majflt_after) - cputime_before; prof->xtime_cpu += osal_cputime(&majflt_after) - cputime_before;
@ -842,8 +841,15 @@ pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags)
prof->spe_counter += 1; prof->spe_counter += 1;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
/* If the ALLOC_RESERVE flag is set, only the corresponding reserve in txn->wr.repnl
* and/or txn->wr.gc.reclaimed has to be provided, without allocating and returning a page. Three call variants are possible:
* 1. num == 0: a slot is needed to return into GC the leftovers of previously reclaimed/extracted pages;
* reclaiming long records makes no sense here, since it would not reduce the deficit of free ids/slots;
* 2. num == 1: the reserve has to be increased before updating GC;
* 3. num > 1: a sequence of pages is needed to store retired pages
* when MDBX_ENABLE_BIGFOOT is disabled. */
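Expressed as call sites, the three variants above look roughly as follows (a sketch only; `gc` stands for the GC cursor and `chunk_pages` for the retired-chunk length, both illustrative names here):

pgr_t r;
r = gc_alloc_ex(gc, 0, ALLOC_RESERVE);           /* 1: only a slot/id for returning leftovers */
r = gc_alloc_ex(gc, 1, ALLOC_RESERVE);           /* 2: enlarge the reserve before the GC update */
r = gc_alloc_ex(gc, chunk_pages, ALLOC_RESERVE); /* 3: a page run for retired pages without BIGFOOT */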
eASSERT(env, num > 0 || (flags & ALLOC_RESERVE)); eASSERT(env, num > 0 || (flags & ALLOC_RESERVE));
eASSERT(env, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); eASSERT(env, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
size_t newnext; size_t newnext;
const uint64_t monotime_begin = (MDBX_ENABLE_PROFGC || (num > 1 && env->options.gc_time_limit)) ? osal_monotime() : 0; const uint64_t monotime_begin = (MDBX_ENABLE_PROFGC || (num > 1 && env->options.gc_time_limit)) ? osal_monotime() : 0;
@ -858,21 +864,20 @@ pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags)
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
prof->xpages += 1; prof->xpages += 1;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
if (MDBX_PNL_GETSIZE(txn->tw.repnl) >= num) { if (pnl_size(txn->wr.repnl) >= num) {
eASSERT(env, MDBX_PNL_LAST(txn->tw.repnl) < txn->geo.first_unallocated && eASSERT(env, MDBX_PNL_LAST(txn->wr.repnl) < txn->geo.first_unallocated &&
MDBX_PNL_FIRST(txn->tw.repnl) < txn->geo.first_unallocated); MDBX_PNL_FIRST(txn->wr.repnl) < txn->geo.first_unallocated);
pgno = repnl_get_sequence(txn, num, flags); pgno = repnl_get_sequence(txn, num, flags);
if (likely(pgno)) if (likely(pgno))
goto done; goto done;
} }
} else { } else {
eASSERT(env, num == 0 || MDBX_PNL_GETSIZE(txn->tw.repnl) == 0); eASSERT(env, num == 0 || pnl_size(txn->wr.repnl) == 0 || (flags & ALLOC_RESERVE));
eASSERT(env, !(flags & ALLOC_RESERVE) || num == 0);
} }
//--------------------------------------------------------------------------- //---------------------------------------------------------------------------
if (unlikely(!is_gc_usable(txn, mc, flags))) { if (unlikely(!is_reclaimable(txn, mc, flags))) {
eASSERT(env, (txn->flags & txn_gc_drained) || num > 1); eASSERT(env, (txn->flags & txn_gc_drained) || num > 1);
goto no_gc; goto no_gc;
} }
@ -880,22 +885,13 @@ pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags)
eASSERT(env, (flags & (ALLOC_COALESCE | ALLOC_LIFO | ALLOC_SHOULD_SCAN)) == 0); eASSERT(env, (flags & (ALLOC_COALESCE | ALLOC_LIFO | ALLOC_SHOULD_SCAN)) == 0);
flags += (env->flags & MDBX_LIFORECLAIM) ? ALLOC_LIFO : 0; flags += (env->flags & MDBX_LIFORECLAIM) ? ALLOC_LIFO : 0;
if (/* Don't coalesce records while preparing the reserve for the GC update. /* Don't coalesce records when a slot is requested for returning pages into GC. Otherwise an attempt to grow the reserve
* Otherwise an attempt to grow the reserve may lead to the need for an even * may lead to the need for an even bigger reserve due to the growth of the reclaimed-pages list. */
* bigger reserve due to the growth of the reclaimed-pages list. */ if (num > 0 && txn->dbs[FREE_DBI].branch_pages && pnl_size(txn->wr.repnl) < env->maxgc_large1page / 2)
(flags & ALLOC_RESERVE) == 0) {
if (txn->dbs[FREE_DBI].branch_pages && MDBX_PNL_GETSIZE(txn->tw.repnl) < env->maxgc_large1page / 2)
flags += ALLOC_COALESCE; flags += ALLOC_COALESCE;
}
MDBX_cursor *const gc = ptr_disp(env->basal_txn, sizeof(MDBX_txn)); txn->wr.prefault_write_activated = !env->incore && env->options.prefault_write;
eASSERT(env, mc != gc && gc->next == gc); if (txn->wr.prefault_write_activated) {
gc->txn = txn;
gc->dbi_state = txn->dbi_state;
gc->top_and_flags = z_fresh_mark;
txn->tw.prefault_write_activated = env->options.prefault_write;
if (txn->tw.prefault_write_activated) {
/* Checking via mincore() substantially reduces the cost, but in /* Checking via mincore() substantially reduces the cost, but in
* the simplest cases (a trivial benchmark) the overall performance * the simplest cases (a trivial benchmark) the overall performance
* drops by half. And on platforms without mincore() and with a problematic * drops by half. And on platforms without mincore() and with a problematic
@ -908,48 +904,47 @@ pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags)
(txn->dbs[FREE_DBI].branch_pages == 0 && txn->geo.now < 1234) || (txn->dbs[FREE_DBI].branch_pages == 0 && txn->geo.now < 1234) ||
/* Don't bother if the page is within the enabled readahead zone */ /* Don't bother if the page is within the enabled readahead zone */
(readahead_enabled && pgno + num < readahead_edge)) (readahead_enabled && pgno + num < readahead_edge))
txn->tw.prefault_write_activated = false; txn->wr.prefault_write_activated = false;
} }
retry_gc_refresh_oldest:; MDBX_cursor *const gc = gc_cursor(env);
txnid_t oldest = txn_snapshot_oldest(txn); gc->txn = txn;
retry_gc_have_oldest: gc->tree = txn->dbs;
if (unlikely(oldest >= txn->txnid)) { gc->dbi_state = txn->dbi_state;
ERROR("unexpected/invalid oldest-readed txnid %" PRIaTXN " for current-txnid %" PRIaTXN, oldest, txn->txnid); gc->top_and_flags = z_fresh_mark;
retry_gc_refresh_detent:
txn_gc_detent(txn);
retry_gc_have_detent:
if (unlikely(txn->env->gc.detent >= txn->txnid)) {
FATAL("unexpected/invalid gc-detent %" PRIaTXN " for current-txnid %" PRIaTXN, txn->env->gc.detent, txn->txnid);
ret.err = MDBX_PROBLEM; ret.err = MDBX_PROBLEM;
goto fail; goto fail;
} }
const txnid_t detent = oldest + 1;
txnid_t id = 0; txnid_t id = 0;
MDBX_cursor_op op = MDBX_FIRST; MDBX_cursor_op op = MDBX_FIRST;
if (flags & ALLOC_LIFO) { if (flags & ALLOC_LIFO) {
if (!txn->tw.gc.retxl) {
txn->tw.gc.retxl = txl_alloc();
if (unlikely(!txn->tw.gc.retxl)) {
ret.err = MDBX_ENOMEM;
goto fail;
}
}
/* Begin lookup backward from oldest reader */ /* Begin lookup backward from oldest reader */
id = detent - 1; id = txn->env->gc.detent;
op = MDBX_SET_RANGE; op = MDBX_SET_RANGE;
} else if (txn->tw.gc.last_reclaimed) { } else {
/* Continue lookup forward from last-reclaimed */ /* Continue lookup forward from last-reclaimed */
id = txn->tw.gc.last_reclaimed + 1; id = rkl_highest(&txn->wr.gc.reclaimed);
if (id >= detent) if (id) {
goto depleted_gc; id += 1;
op = MDBX_SET_RANGE; op = MDBX_SET_RANGE;
if (id >= txn->env->gc.detent)
goto depleted_gc;
}
} }
next_gc:; next_gc:
MDBX_val key;
key.iov_base = &id;
key.iov_len = sizeof(id);
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
prof->rsteps += 1; prof->rsteps += 1
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
;
MDBX_val key = {.iov_base = &id, .iov_len = sizeof(id)};
/* Seek first/next GC record */ /* Seek first/next GC record */
ret.err = cursor_ops(gc, &key, nullptr, op); ret.err = cursor_ops(gc, &key, nullptr, op);
@ -967,15 +962,18 @@ next_gc:;
ret.err = MDBX_CORRUPTED; ret.err = MDBX_CORRUPTED;
goto fail; goto fail;
} }
id = unaligned_peek_u64(4, key.iov_base); id = unaligned_peek_u64(4, key.iov_base);
if (flags & ALLOC_LIFO) { if (flags & ALLOC_LIFO) {
op = MDBX_PREV; op = MDBX_PREV;
if (id >= detent || is_already_reclaimed(txn, id)) if (id >= txn->env->gc.detent || gc_is_reclaimed(txn, id))
goto next_gc; goto next_gc;
} else { } else {
op = MDBX_NEXT; if (unlikely(id >= txn->env->gc.detent))
if (unlikely(id >= detent))
goto depleted_gc; goto depleted_gc;
op = MDBX_NEXT;
if (gc_is_reclaimed(txn, id))
goto next_gc;
} }
txn->flags &= ~txn_gc_drained; txn->flags &= ~txn_gc_drained;
@ -993,60 +991,61 @@ next_gc:;
goto fail; goto fail;
} }
const size_t gc_len = MDBX_PNL_GETSIZE(gc_pnl); const size_t gc_len = pnl_size(gc_pnl);
TRACE("gc-read: id #%" PRIaTXN " len %zu, re-list will %zu ", id, gc_len, gc_len + MDBX_PNL_GETSIZE(txn->tw.repnl)); TRACE("gc-read: id #%" PRIaTXN " len %zu, re-list will %zu ", id, gc_len, gc_len + pnl_size(txn->wr.repnl));
if (unlikely(gc_len + MDBX_PNL_GETSIZE(txn->tw.repnl) >= env->maxgc_large1page)) { if (unlikely(!num)) {
/* Don't try to coalesce too much. */ /* TODO: Проверка критериев пункта 2 сформулированного в gc_provide_slots().
* Сейчас тут сильно упрощенная и не совсем верная проверка, так как пока недоступна информация о кол-ве имеющихся
* слотов и их дефиците для возврата wr.repl. */
if (gc_len > env->maxgc_large1page / 4 * 3
/* если запись достаточно длинная, то переработка слота не особо увеличит место для возврата wr.repl, и т.п. */
&& pnl_size(txn->wr.repnl) + gc_len > env->maxgc_large1page /* не помещается в хвост */) {
DEBUG("avoid reclaiming %" PRIaTXN " slot, since it is too long (%zu)", id, gc_len);
ret.err = MDBX_NOTFOUND;
goto reserve_done;
}
}
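Numerically this guard means that a reserve-only (num == 0) request skips a record only when the record alone occupies more than three quarters of one large GC chunk and merging it would overflow that chunk anyway. A worked illustration with assumed figures (the capacity below is only an example, not the exact libmdbx constant):

const size_t maxgc_large1page = 1000; /* assumed capacity of one large GC chunk, in pgno's */
const size_t gc_len = 800;            /* length of the GC record under the cursor */
const size_t repnl_len = 300;         /* page numbers already collected in wr.repnl */

const bool skip = gc_len > maxgc_large1page / 4 * 3      /* 800 > 750  -> the record is "long" */
               && repnl_len + gc_len > maxgc_large1page; /* 1100 > 1000 -> it doesn't fit the tail */
/* skip == true: the slot stays in GC and MDBX_NOTFOUND is returned to the reserve-only caller. */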
if (unlikely(gc_len + pnl_size(txn->wr.repnl) /* Don't try to coalesce too much. */ >= env->maxgc_large1page)) {
if (flags & ALLOC_SHOULD_SCAN) { if (flags & ALLOC_SHOULD_SCAN) {
eASSERT(env, flags & ALLOC_COALESCE); eASSERT(env, (flags & ALLOC_COALESCE) /* && !(flags & ALLOC_RESERVE) */ && num > 0);
eASSERT(env, !(flags & ALLOC_RESERVE));
eASSERT(env, num > 0);
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
env->lck->pgops.gc_prof.coalescences += 1; env->lck->pgops.gc_prof.coalescences += 1;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
TRACE("clear %s %s", "ALLOC_COALESCE", "since got threshold"); TRACE("clear %s %s", "ALLOC_COALESCE", "since got threshold");
if (MDBX_PNL_GETSIZE(txn->tw.repnl) >= num) { if (pnl_size(txn->wr.repnl) >= num) {
eASSERT(env, MDBX_PNL_LAST(txn->tw.repnl) < txn->geo.first_unallocated && eASSERT(env, MDBX_PNL_LAST(txn->wr.repnl) < txn->geo.first_unallocated &&
MDBX_PNL_FIRST(txn->tw.repnl) < txn->geo.first_unallocated); MDBX_PNL_FIRST(txn->wr.repnl) < txn->geo.first_unallocated);
if (likely(num == 1)) { if (likely(num == 1)) {
pgno = repnl_get_single(txn); pgno = (flags & ALLOC_RESERVE) ? P_INVALID : repnl_get_single(txn);
goto done; goto done;
} }
pgno = repnl_get_sequence(txn, num, flags); pgno = repnl_get_sequence(txn, num, flags);
if (likely(pgno)) if (likely(pgno))
goto done; goto done;
} }
flags -= ALLOC_COALESCE | ALLOC_SHOULD_SCAN;
} }
if (unlikely(/* list is too long already */ MDBX_PNL_GETSIZE(txn->tw.repnl) >= env->options.rp_augment_limit) && flags &= ~(ALLOC_COALESCE | ALLOC_SHOULD_SCAN);
if (unlikely(/* list is too long already */ pnl_size(txn->wr.repnl) >= env->options.rp_augment_limit) &&
((/* not a slot-request from gc-update */ num && ((/* not a slot-request from gc-update */ num &&
/* have enough unallocated space */ txn->geo.upper >= txn->geo.first_unallocated + num && /* have enough unallocated space */ txn->geo.upper >= txn->geo.first_unallocated + num &&
monotime_since_cached(monotime_begin, &now_cache) + txn->tw.gc.time_acc >= env->options.gc_time_limit) || monotime_since_cached(monotime_begin, &now_cache) + txn->wr.gc.spent >= env->options.gc_time_limit) ||
gc_len + MDBX_PNL_GETSIZE(txn->tw.repnl) >= PAGELIST_LIMIT)) { gc_len + pnl_size(txn->wr.repnl) >= PAGELIST_LIMIT)) {
/* Stop reclaiming to avoid large/overflow the page list. This is a rare /* Stop reclaiming to avoid large/overflow the page list. This is a rare
* case while search for a continuously multi-page region in a * case while search for a continuously multi-page region in a large database,
* large database, see https://libmdbx.dqdkfa.ru/dead-github/issues/123 */ * see https://libmdbx.dqdkfa.ru/dead-github/issues/123 */
NOTICE("stop reclaiming %s: %zu (current) + %zu " NOTICE("stop reclaiming %s: %zu (current) + %zu "
"(chunk) -> %zu, rp_augment_limit %u", "(chunk) >= %zu, rp_augment_limit %u",
likely(gc_len + MDBX_PNL_GETSIZE(txn->tw.repnl) < PAGELIST_LIMIT) ? "since rp_augment_limit was reached" likely(gc_len + pnl_size(txn->wr.repnl) < PAGELIST_LIMIT) ? "since rp_augment_limit was reached"
: "to avoid PNL overflow", : "to avoid PNL overflow",
MDBX_PNL_GETSIZE(txn->tw.repnl), gc_len, gc_len + MDBX_PNL_GETSIZE(txn->tw.repnl), pnl_size(txn->wr.repnl), gc_len, gc_len + pnl_size(txn->wr.repnl), env->options.rp_augment_limit);
env->options.rp_augment_limit);
goto depleted_gc; goto depleted_gc;
} }
} }
/* Remember ID of readed GC record */ /* Append PNL from GC record to wr.repnl */
txn->tw.gc.last_reclaimed = id; ret.err = pnl_need(&txn->wr.repnl, gc_len);
if (flags & ALLOC_LIFO) {
ret.err = txl_append(&txn->tw.gc.retxl, id);
if (unlikely(ret.err != MDBX_SUCCESS))
goto fail;
}
/* Append PNL from GC record to tw.repnl */
ret.err = pnl_need(&txn->tw.repnl, gc_len);
if (unlikely(ret.err != MDBX_SUCCESS)) if (unlikely(ret.err != MDBX_SUCCESS))
goto fail; goto fail;
@ -1061,53 +1060,83 @@ next_gc:;
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
const uint64_t merge_begin = osal_monotime(); const uint64_t merge_begin = osal_monotime();
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
pnl_merge(txn->tw.repnl, gc_pnl); pnl_merge(txn->wr.repnl, gc_pnl);
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
prof->pnl_merge.calls += 1; prof->pnl_merge.calls += 1;
prof->pnl_merge.volume += MDBX_PNL_GETSIZE(txn->tw.repnl); prof->pnl_merge.volume += pnl_size(txn->wr.repnl);
prof->pnl_merge.time += osal_monotime() - merge_begin; prof->pnl_merge.time += osal_monotime() - merge_begin;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
flags |= ALLOC_SHOULD_SCAN; flags |= ALLOC_SHOULD_SCAN;
if (AUDIT_ENABLED()) { if (AUDIT_ENABLED()) {
if (unlikely(!pnl_check(txn->tw.repnl, txn->geo.first_unallocated))) { if (unlikely(!pnl_check(txn->wr.repnl, txn->geo.first_unallocated))) {
ERROR("%s/%d: %s", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid txn retired-list"); ERROR("%s/%d: %s", "MDBX_CORRUPTED", MDBX_CORRUPTED, "invalid txn retired-list");
ret.err = MDBX_CORRUPTED; ret.err = MDBX_CORRUPTED;
goto fail; goto fail;
} }
} else { } else {
eASSERT(env, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated)); eASSERT(env, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated));
} }
eASSERT(env, dpl_check(txn)); eASSERT(env, dpl_check(txn));
eASSERT(env, MDBX_PNL_GETSIZE(txn->tw.repnl) == 0 || MDBX_PNL_MOST(txn->tw.repnl) < txn->geo.first_unallocated); eASSERT(env, pnl_size(txn->wr.repnl) == 0 || MDBX_PNL_MOST(txn->wr.repnl) < txn->geo.first_unallocated);
if (MDBX_ENABLE_REFUND && MDBX_PNL_GETSIZE(txn->tw.repnl) && if (MDBX_ENABLE_REFUND && pnl_size(txn->wr.repnl) &&
unlikely(MDBX_PNL_MOST(txn->tw.repnl) == txn->geo.first_unallocated - 1)) { unlikely(MDBX_PNL_MOST(txn->wr.repnl) == txn->geo.first_unallocated - 1)) {
/* Refund suitable pages into "unallocated" space */ /* Refund suitable pages into "unallocated" space */
txn_refund(txn); txn_refund(txn);
} }
eASSERT(env, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); eASSERT(env, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
/* Done for a kick-reclaim mode, actually no page needed */
if (unlikely(num == 0)) {
eASSERT(env, ret.err == MDBX_SUCCESS);
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "early-exit for slot", id, MDBX_PNL_GETSIZE(txn->tw.repnl));
goto early_exit;
}
/* TODO: delete reclaimed records */
eASSERT(env, op == MDBX_PREV || op == MDBX_NEXT); eASSERT(env, op == MDBX_PREV || op == MDBX_NEXT);
rkl_t *rkl = &txn->wr.gc.reclaimed;
const char *rkl_name = "reclaimed";
if (mc->dbi_state != txn->dbi_state &&
(MDBX_DEBUG || pnl_size(txn->wr.repnl) > (size_t)gc->tree->height + gc->tree->height + 3)) {
gc->next = txn->cursors[FREE_DBI];
txn->cursors[FREE_DBI] = gc;
ret.err = cursor_del(gc, 0);
txn->cursors[FREE_DBI] = gc->next;
if (likely(ret.err == MDBX_SUCCESS)) {
if (unlikely(txn->dbs[FREE_DBI].items == 0)) {
flags &= ~ALLOC_COALESCE;
txn->flags |= txn_gc_drained;
op = MDBX_FIRST; /* to avoid errors caused by relative cursor movement */
}
rkl = &txn->wr.gc.ready4reuse;
rkl_name = "ready4reuse";
} else {
VERBOSE("gc-early-clean: err %d, repnl %zu, gc-height %u (%u branch, %u leafs)", ret.err, pnl_size(txn->wr.repnl),
gc->tree->height, gc->tree->branch_pages, gc->tree->leaf_pages);
if (unlikely(txn->flags & MDBX_TXN_ERROR))
goto fail;
}
}
ret.err = rkl_push(rkl, id);
TRACE("%" PRIaTXN " len %zu pushed to rkl-%s, err %d", id, gc_len, rkl_name, ret.err);
if (unlikely(ret.err != MDBX_SUCCESS))
goto fail;
if (flags & ALLOC_COALESCE) { if (flags & ALLOC_COALESCE) {
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "coalesce-continue", id, MDBX_PNL_GETSIZE(txn->tw.repnl)); eASSERT(env, op == MDBX_PREV || op == MDBX_NEXT);
if (pnl_size(txn->wr.repnl) < env->maxgc_large1page / 2) {
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "coalesce-continue", id, pnl_size(txn->wr.repnl));
goto next_gc; goto next_gc;
} }
flags -= ALLOC_COALESCE;
}
scan: scan:
if ((flags & ALLOC_RESERVE) && num < 2) {
/* If only a slot/id was needed for gc_reclaim_slot() or gc_reserve4stockpile() */
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "reserve-done", id, pnl_size(txn->wr.repnl));
ret.err = MDBX_SUCCESS;
goto reserve_done;
}
eASSERT(env, flags & ALLOC_SHOULD_SCAN); eASSERT(env, flags & ALLOC_SHOULD_SCAN);
eASSERT(env, num > 0); eASSERT(env, num > 0);
if (MDBX_PNL_GETSIZE(txn->tw.repnl) >= num) { if (pnl_size(txn->wr.repnl) >= num) {
eASSERT(env, MDBX_PNL_LAST(txn->tw.repnl) < txn->geo.first_unallocated && eASSERT(env, MDBX_PNL_LAST(txn->wr.repnl) < txn->geo.first_unallocated &&
MDBX_PNL_FIRST(txn->tw.repnl) < txn->geo.first_unallocated); MDBX_PNL_FIRST(txn->wr.repnl) < txn->geo.first_unallocated);
if (likely(num == 1)) { if (likely(num == 1)) {
eASSERT(env, !(flags & ALLOC_RESERVE)); eASSERT(env, !(flags & ALLOC_RESERVE));
pgno = repnl_get_single(txn); pgno = repnl_get_single(txn);
@ -1118,17 +1147,16 @@ scan:
goto done; goto done;
} }
flags -= ALLOC_SHOULD_SCAN; flags -= ALLOC_SHOULD_SCAN;
if (ret.err == MDBX_SUCCESS) { if ((txn->flags & txn_gc_drained) == 0) {
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "continue-search", id, MDBX_PNL_GETSIZE(txn->tw.repnl)); TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "continue-search", id, pnl_size(txn->wr.repnl));
goto next_gc; goto next_gc;
} }
depleted_gc: depleted_gc:
TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "gc-depleted", id, MDBX_PNL_GETSIZE(txn->tw.repnl)); TRACE("%s: last id #%" PRIaTXN ", re-len %zu", "gc-depleted", id, pnl_size(txn->wr.repnl));
ret.err = MDBX_NOTFOUND; txn->flags |= txn_gc_drained;
if (flags & ALLOC_SHOULD_SCAN) if (flags & ALLOC_SHOULD_SCAN)
goto scan; goto scan;
txn->flags |= txn_gc_drained;
//------------------------------------------------------------------------- //-------------------------------------------------------------------------
@ -1143,11 +1171,11 @@ depleted_gc:
newnext = txn->geo.first_unallocated + num; newnext = txn->geo.first_unallocated + num;
/* Does reclaiming stopped at the last steady point? */ /* Does reclaiming stopped at the last steady point? */
const meta_ptr_t recent = meta_recent(env, &txn->tw.troika); const meta_ptr_t recent = meta_recent(env, &txn->wr.troika);
const meta_ptr_t prefer_steady = meta_prefer_steady(env, &txn->tw.troika); const meta_ptr_t prefer_steady = meta_prefer_steady(env, &txn->wr.troika);
if (recent.ptr_c != prefer_steady.ptr_c && prefer_steady.is_steady && detent == prefer_steady.txnid + 1) { if (recent.ptr_c != prefer_steady.ptr_c && prefer_steady.is_steady && txn->env->gc.detent == prefer_steady.txnid) {
DEBUG("gc-kick-steady: recent %" PRIaTXN "-%s, steady %" PRIaTXN "-%s, detent %" PRIaTXN, recent.txnid, DEBUG("gc-kick-steady: recent %" PRIaTXN "-%s, steady %" PRIaTXN "-%s", recent.txnid, durable_caption(recent.ptr_c),
durable_caption(recent.ptr_c), prefer_steady.txnid, durable_caption(prefer_steady.ptr_c), detent); prefer_steady.txnid, durable_caption(prefer_steady.ptr_c));
const pgno_t autosync_threshold = atomic_load32(&env->lck->autosync_threshold, mo_Relaxed); const pgno_t autosync_threshold = atomic_load32(&env->lck->autosync_threshold, mo_Relaxed);
const uint64_t autosync_period = atomic_load64(&env->lck->autosync_period, mo_Relaxed); const uint64_t autosync_period = atomic_load64(&env->lck->autosync_period, mo_Relaxed);
uint64_t eoos_timestamp; uint64_t eoos_timestamp;
@ -1166,12 +1194,12 @@ depleted_gc:
#if MDBX_ENABLE_PROFGC #if MDBX_ENABLE_PROFGC
env->lck->pgops.gc_prof.wipes += 1; env->lck->pgops.gc_prof.wipes += 1;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
ret.err = meta_wipe_steady(env, detent); ret.err = meta_wipe_steady(env, txn->env->gc.detent);
DEBUG("gc-wipe-steady, rc %d", ret.err); DEBUG("gc-wipe-steady, rc %d", ret.err);
if (unlikely(ret.err != MDBX_SUCCESS)) if (unlikely(ret.err != MDBX_SUCCESS))
goto fail; goto fail;
eASSERT(env, prefer_steady.ptr_c != meta_prefer_steady(env, &txn->tw.troika).ptr_c); eASSERT(env, prefer_steady.ptr_c != meta_prefer_steady(env, &txn->wr.troika).ptr_c);
goto retry_gc_refresh_oldest; goto retry_gc_refresh_detent;
} }
if ((autosync_threshold && atomic_load64(&env->lck->unsynced_pages, mo_Relaxed) >= autosync_threshold) || if ((autosync_threshold && atomic_load64(&env->lck->unsynced_pages, mo_Relaxed) >= autosync_threshold) ||
(autosync_period && (eoos_timestamp = atomic_load64(&env->lck->eoos_timestamp, mo_Relaxed)) && (autosync_period && (eoos_timestamp = atomic_load64(&env->lck->eoos_timestamp, mo_Relaxed)) &&
@ -1183,21 +1211,18 @@ depleted_gc:
env->lck->pgops.gc_prof.flushes += 1; env->lck->pgops.gc_prof.flushes += 1;
#endif /* MDBX_ENABLE_PROFGC */ #endif /* MDBX_ENABLE_PROFGC */
meta_t meta = *recent.ptr_c; meta_t meta = *recent.ptr_c;
ret.err = dxb_sync_locked(env, env->flags & MDBX_WRITEMAP, &meta, &txn->tw.troika); ret.err = dxb_sync_locked(env, env->flags & MDBX_WRITEMAP, &meta, &txn->wr.troika);
DEBUG("gc-make-steady, rc %d", ret.err); DEBUG("gc-make-steady, rc %d", ret.err);
eASSERT(env, ret.err != MDBX_RESULT_TRUE); eASSERT(env, ret.err != MDBX_RESULT_TRUE);
if (unlikely(ret.err != MDBX_SUCCESS)) if (unlikely(ret.err != MDBX_SUCCESS))
goto fail; goto fail;
eASSERT(env, prefer_steady.ptr_c != meta_prefer_steady(env, &txn->tw.troika).ptr_c); eASSERT(env, prefer_steady.ptr_c != meta_prefer_steady(env, &txn->wr.troika).ptr_c);
goto retry_gc_refresh_oldest; goto retry_gc_refresh_detent;
} }
} }
if (unlikely(true == atomic_load32(&env->lck->rdt_refresh_flag, mo_AcquireRelease))) { if (unlikely(true == atomic_load32(&env->lck->rdt_refresh_flag, mo_AcquireRelease)) && txn_gc_detent(txn))
oldest = txn_snapshot_oldest(txn); goto retry_gc_have_detent;
if (oldest >= detent)
goto retry_gc_have_oldest;
}
/* Avoid kick lagging reader(s) if is enough unallocated space /* Avoid kick lagging reader(s) if is enough unallocated space
* at the end of database file. */ * at the end of database file. */
@ -1206,11 +1231,8 @@ depleted_gc:
goto done; goto done;
} }
if (oldest < txn->txnid - xMDBX_TXNID_STEP) { if (txn->txnid - txn->env->gc.detent > xMDBX_TXNID_STEP && mvcc_kick_laggards(env, txn->env->gc.detent))
oldest = mvcc_kick_laggards(env, oldest); goto retry_gc_refresh_detent;
if (oldest >= detent)
goto retry_gc_have_oldest;
}
//--------------------------------------------------------------------------- //---------------------------------------------------------------------------
@ -1263,7 +1285,7 @@ done:
if (likely((flags & ALLOC_RESERVE) == 0)) { if (likely((flags & ALLOC_RESERVE) == 0)) {
if (pgno) { if (pgno) {
eASSERT(env, pgno + num <= txn->geo.first_unallocated && pgno >= NUM_METAS); eASSERT(env, pgno + num <= txn->geo.first_unallocated && pgno >= NUM_METAS);
eASSERT(env, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); eASSERT(env, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
} else { } else {
pgno = txn->geo.first_unallocated; pgno = txn->geo.first_unallocated;
txn->geo.first_unallocated += (pgno_t)num; txn->geo.first_unallocated += (pgno_t)num;
@ -1275,32 +1297,41 @@ done:
if (unlikely(ret.err != MDBX_SUCCESS)) { if (unlikely(ret.err != MDBX_SUCCESS)) {
fail: fail:
eASSERT(env, ret.err != MDBX_SUCCESS); eASSERT(env, ret.err != MDBX_SUCCESS);
eASSERT(env, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); eASSERT(env, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
int level; int level;
const char *what; if (flags & ALLOC_UNIMPORTANT)
if (flags & ALLOC_RESERVE) { level = MDBX_LOG_DEBUG;
level = (flags & ALLOC_UNIMPORTANT) ? MDBX_LOG_DEBUG : MDBX_LOG_NOTICE; else if (flags & ALLOC_RESERVE)
what = num ? "reserve-pages" : "fetch-slot"; level = MDBX_LOG_NOTICE;
} else { else {
txn->flags |= MDBX_TXN_ERROR; txn->flags |= MDBX_TXN_ERROR;
level = MDBX_LOG_ERROR; level = MDBX_LOG_ERROR;
what = "pages";
} }
if (LOG_ENABLED(level)) if (LOG_ENABLED(level)) {
if (num)
debug_log(level, __func__, __LINE__, debug_log(level, __func__, __LINE__,
"unable alloc %zu %s, alloc-flags 0x%x, err %d, txn-flags " "unable %s %zu, alloc-flags 0x%x, err %d, txn-flags "
"0x%x, re-list-len %zu, loose-count %zu, gc: height %u, " "0x%x, re-list-len %zu, loose-count %zu, gc: height %u, "
"branch %zu, leaf %zu, large %zu, entries %zu\n", "branch %zu, leaf %zu, large %zu, entries %zu\n",
num, what, flags, ret.err, txn->flags, MDBX_PNL_GETSIZE(txn->tw.repnl), txn->tw.loose_count, (flags & ALLOC_RESERVE) ? "reserve" : "alloc", num, flags, ret.err, txn->flags,
txn->dbs[FREE_DBI].height, (size_t)txn->dbs[FREE_DBI].branch_pages, pnl_size(txn->wr.repnl), txn->wr.loose_count, txn->dbs[FREE_DBI].height,
(size_t)txn->dbs[FREE_DBI].leaf_pages, (size_t)txn->dbs[FREE_DBI].large_pages, (size_t)txn->dbs[FREE_DBI].branch_pages, (size_t)txn->dbs[FREE_DBI].leaf_pages,
(size_t)txn->dbs[FREE_DBI].items); (size_t)txn->dbs[FREE_DBI].large_pages, (size_t)txn->dbs[FREE_DBI].items);
else
debug_log(level, __func__, __LINE__,
"unable fetch-slot, alloc-flags 0x%x, err %d, txn-flags "
"0x%x, re-list-len %zu, loose-count %zu, gc: height %u, "
"branch %zu, leaf %zu, large %zu, entries %zu\n",
flags, ret.err, txn->flags, pnl_size(txn->wr.repnl), txn->wr.loose_count, txn->dbs[FREE_DBI].height,
(size_t)txn->dbs[FREE_DBI].branch_pages, (size_t)txn->dbs[FREE_DBI].leaf_pages,
(size_t)txn->dbs[FREE_DBI].large_pages, (size_t)txn->dbs[FREE_DBI].items);
}
ret.page = nullptr; ret.page = nullptr;
} }
if (num > 1) if (num > 1)
txn->tw.gc.time_acc += monotime_since_cached(monotime_begin, &now_cache); txn->wr.gc.spent += monotime_since_cached(monotime_begin, &now_cache);
} else { } else {
early_exit: reserve_done:
DEBUG("return nullptr for %zu pages for ALLOC_%s, rc %d", num, num ? "RESERVE" : "SLOT", ret.err); DEBUG("return nullptr for %zu pages for ALLOC_%s, rc %d", num, num ? "RESERVE" : "SLOT", ret.err);
ret.page = nullptr; ret.page = nullptr;
} }
@ -1317,20 +1348,20 @@ __hot pgr_t gc_alloc_single(const MDBX_cursor *const mc) {
tASSERT(txn, F_ISSET(*cursor_dbi_state(mc), DBI_LINDO | DBI_VALID | DBI_DIRTY)); tASSERT(txn, F_ISSET(*cursor_dbi_state(mc), DBI_LINDO | DBI_VALID | DBI_DIRTY));
/* If there are any loose pages, just use them */ /* If there are any loose pages, just use them */
while (likely(txn->tw.loose_pages)) { while (likely(txn->wr.loose_pages)) {
#if MDBX_ENABLE_REFUND #if MDBX_ENABLE_REFUND
if (unlikely(txn->tw.loose_refund_wl > txn->geo.first_unallocated)) { if (unlikely(txn->wr.loose_refund_wl > txn->geo.first_unallocated)) {
txn_refund(txn); txn_refund(txn);
if (!txn->tw.loose_pages) if (!txn->wr.loose_pages)
break; break;
} }
#endif /* MDBX_ENABLE_REFUND */ #endif /* MDBX_ENABLE_REFUND */
page_t *lp = txn->tw.loose_pages; page_t *lp = txn->wr.loose_pages;
MDBX_ASAN_UNPOISON_MEMORY_REGION(lp, txn->env->ps); MDBX_ASAN_UNPOISON_MEMORY_REGION(lp, txn->env->ps);
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *)); VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
txn->tw.loose_pages = page_next(lp); txn->wr.loose_pages = page_next(lp);
txn->tw.loose_count--; txn->wr.loose_count--;
DEBUG_EXTRA("db %d use loose page %" PRIaPGNO, cursor_dbi_dbg(mc), lp->pgno); DEBUG_EXTRA("db %d use loose page %" PRIaPGNO, cursor_dbi_dbg(mc), lp->pgno);
tASSERT(txn, lp->pgno < txn->geo.first_unallocated); tASSERT(txn, lp->pgno < txn->geo.first_unallocated);
tASSERT(txn, lp->pgno >= NUM_METAS); tASSERT(txn, lp->pgno >= NUM_METAS);
@ -1340,7 +1371,7 @@ __hot pgr_t gc_alloc_single(const MDBX_cursor *const mc) {
return ret; return ret;
} }
if (likely(MDBX_PNL_GETSIZE(txn->tw.repnl) > 0)) if (likely(pnl_size(txn->wr.repnl) > 0))
return page_alloc_finalize(txn->env, txn, mc, repnl_get_single(txn), 1); return page_alloc_finalize(txn->env, txn, mc, repnl_get_single(txn), 1);
return gc_alloc_ex(mc, 1, ALLOC_DEFAULT); return gc_alloc_ex(mc, 1, ALLOC_DEFAULT);

File diff suppressed because it is too large.

View File

@ -5,14 +5,37 @@
#include "essentials.h" #include "essentials.h"
/* Histogram driving the decision of how to slice fragments when identifiers/slots are scarce. */
typedef struct gc_dense_histogram {
/* The array size also defines the maximum length of the sequences
* for which the distribution problem is solved.
*
* Using long sequences is counterproductive, since such sequences will
* create/reproduce/repeat similar difficulties during subsequent reclaiming. However,
* in rare situations this may be the only way out. */
unsigned end;
pgno_t array[31];
} gc_dense_histogram_t;
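A hedged sketch of how such a histogram could be filled: walk an ascending list of reclaimed page numbers, count maximal runs of consecutive pages, and clamp run lengths to the array size (the routine below is illustrative only, not the actual libmdbx code):

#include <string.h>

static void dense_histogram_fill(gc_dense_histogram_t *h, const pgno_t *pages, size_t n) {
  const unsigned max_run = (unsigned)(sizeof(h->array) / sizeof(h->array[0]));
  memset(h->array, 0, sizeof(h->array));
  h->end = max_run;
  for (size_t i = 0; i < n;) {
    size_t run = 1; /* length of the run of consecutive page numbers starting at pages[i] */
    while (i + run < n && pages[i + run] == pages[i] + run)
      ++run;
    const size_t bucket = (run < max_run) ? run : max_run;
    h->array[bucket - 1] += 1; /* array[k-1] counts runs of length k (last bucket: k or longer) */
    i += run;
  }
}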
typedef struct gc_update_context { typedef struct gc_update_context {
unsigned loop; unsigned loop;
pgno_t prev_first_unallocated; unsigned goodchunk;
bool dense; bool dense;
size_t reserve_adj; pgno_t prev_first_unallocated;
size_t retired_stored; size_t retired_stored;
size_t amount, reserved, cleaned_slot, reused_slot, fill_idx; size_t return_reserved_lo, return_reserved_hi;
txnid_t cleaned_id, rid; txnid_t gc_first;
intptr_t return_left;
#ifndef MDBX_DEBUG_GCU
#define MDBX_DEBUG_GCU 0
#endif
#if MDBX_DEBUG_GCU
struct {
txnid_t prev;
unsigned n;
} dbg;
#endif /* MDBX_DEBUG_GCU */
rkl_t sequel;
#if MDBX_ENABLE_BIGFOOT #if MDBX_ENABLE_BIGFOOT
txnid_t bigfoot; txnid_t bigfoot;
#endif /* MDBX_ENABLE_BIGFOOT */ #endif /* MDBX_ENABLE_BIGFOOT */
@ -20,21 +43,40 @@ typedef struct gc_update_context {
MDBX_cursor cursor; MDBX_cursor cursor;
cursor_couple_t couple; cursor_couple_t couple;
}; };
gc_dense_histogram_t dense_histogram;
} gcu_t; } gcu_t;
static inline int gc_update_init(MDBX_txn *txn, gcu_t *ctx) { MDBX_INTERNAL int gc_put_init(MDBX_txn *txn, gcu_t *ctx);
memset(ctx, 0, offsetof(gcu_t, cursor)); MDBX_INTERNAL void gc_put_destroy(gcu_t *ctx);
ctx->dense = txn->txnid <= MIN_TXNID;
#if MDBX_ENABLE_BIGFOOT #define ALLOC_DEFAULT 0 /* regular/ordinary page allocation */
ctx->bigfoot = txn->txnid; #define ALLOC_UNIMPORTANT 1 /* the request is non-critical, an allocation failure will not put the transaction into an error state */
#endif /* MDBX_ENABLE_BIGFOOT */ #define ALLOC_RESERVE 2 /* preparing a reserve for the GC update, without allocation */
return cursor_init(&ctx->cursor, txn, FREE_DBI); #define ALLOC_COALESCE 4 /* internal state/flag */
} #define ALLOC_SHOULD_SCAN 8 /* internal state/flag */
#define ALLOC_LIFO 16 /* internal state/flag */
#define ALLOC_DEFAULT 0
#define ALLOC_RESERVE 1
#define ALLOC_UNIMPORTANT 2
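Of the new values, the first three are request flags chosen by callers (and may be OR-ed together), while ALLOC_COALESCE, ALLOC_SHOULD_SCAN and ALLOC_LIFO are internal bits that gc_alloc_ex() maintains on top of them. A short sketch (the condition variable is an illustrative stand-in):

uint8_t flags = ALLOC_RESERVE | ALLOC_UNIMPORTANT; /* a non-critical reserve request */
if (lifo_reclaim_enabled)                          /* stand-in for (env->flags & MDBX_LIFORECLAIM) */
  flags |= ALLOC_LIFO;                             /* internal bit, normally added inside gc_alloc_ex() */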
MDBX_INTERNAL pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags); MDBX_INTERNAL pgr_t gc_alloc_ex(const MDBX_cursor *const mc, const size_t num, uint8_t flags);
MDBX_INTERNAL pgr_t gc_alloc_single(const MDBX_cursor *const mc); MDBX_INTERNAL pgr_t gc_alloc_single(const MDBX_cursor *const mc);
MDBX_INTERNAL int gc_update(MDBX_txn *txn, gcu_t *ctx); MDBX_INTERNAL int gc_update(MDBX_txn *txn, gcu_t *ctx);
MDBX_NOTHROW_PURE_FUNCTION static inline size_t gc_stockpile(const MDBX_txn *txn) {
return pnl_size(txn->wr.repnl) + txn->wr.loose_count;
}
MDBX_NOTHROW_PURE_FUNCTION static inline size_t gc_chunk_bytes(const size_t chunk) {
return (chunk + 1) * sizeof(pgno_t);
}
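gc_stockpile() is the number of pages the transaction can hand out without touching GC or growing the file, and gc_chunk_bytes() is the byte size of the GC value needed to store a chunk of page numbers, the extra element being the PNL length word. For example, assuming sizeof(pgno_t) == 4:

/* a chunk of 1000 page numbers -> (1000 + 1) * 4 = 4004 bytes of value payload */
const size_t bytes = gc_chunk_bytes(1000);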
MDBX_INTERNAL bool gc_repnl_has_span(const MDBX_txn *txn, const size_t num);
static inline bool gc_is_reclaimed(const MDBX_txn *txn, const txnid_t id) {
return rkl_contain(&txn->wr.gc.reclaimed, id) || rkl_contain(&txn->wr.gc.comeback, id);
}
static inline txnid_t txnid_min(txnid_t a, txnid_t b) { return (a < b) ? a : b; }
static inline txnid_t txnid_max(txnid_t a, txnid_t b) { return (a > b) ? a : b; }
static inline MDBX_cursor *gc_cursor(MDBX_env *env) { return ptr_disp(env->basal_txn, sizeof(MDBX_txn)); }
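gc_cursor() encodes the layout assumption that the cursor used for the FREE_DBI/GC tree is stored immediately after the MDBX_txn structure of the basal write transaction, so it can be recovered from the environment with plain pointer arithmetic:

/* env->basal_txn points at: [ MDBX_txn | MDBX_cursor for GC/FREE_DBI | ... ],
 * hence gc_cursor(env) == ptr_disp(env->basal_txn, sizeof(MDBX_txn)). */
MDBX_cursor *const gc = gc_cursor(env);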

View File

@ -41,11 +41,12 @@ typedef struct node_search_result {
typedef struct bind_reader_slot_result { typedef struct bind_reader_slot_result {
int err; int err;
reader_slot_t *rslot; reader_slot_t *slot;
} bsr_t; } bsr_t;
#include "atomics-ops.h" #include "atomics-ops.h"
#include "proto.h" #include "proto.h"
#include "rkl.h"
#include "txl.h" #include "txl.h"
#include "unaligned.h" #include "unaligned.h"
#if defined(_WIN32) || defined(_WIN64) #if defined(_WIN32) || defined(_WIN64)
@ -155,7 +156,8 @@ enum txn_flags {
txn_rw_begin_flags = MDBX_TXN_NOMETASYNC | MDBX_TXN_NOSYNC | MDBX_TXN_TRY, txn_rw_begin_flags = MDBX_TXN_NOMETASYNC | MDBX_TXN_NOSYNC | MDBX_TXN_TRY,
txn_shrink_allowed = UINT32_C(0x40000000), txn_shrink_allowed = UINT32_C(0x40000000),
txn_parked = MDBX_TXN_PARKED, txn_parked = MDBX_TXN_PARKED,
txn_gc_drained = 0x40 /* GC was depleted up to oldest reader */, txn_gc_drained = 0x80 /* GC was depleted up to oldest reader */,
txn_may_have_cursors = 0x100,
txn_state_flags = MDBX_TXN_FINISHED | MDBX_TXN_ERROR | MDBX_TXN_DIRTY | MDBX_TXN_SPILLS | MDBX_TXN_HAS_CHILD | txn_state_flags = MDBX_TXN_FINISHED | MDBX_TXN_ERROR | MDBX_TXN_DIRTY | MDBX_TXN_SPILLS | MDBX_TXN_HAS_CHILD |
MDBX_TXN_INVALID | txn_gc_drained MDBX_TXN_INVALID | txn_gc_drained
}; };
@ -205,17 +207,17 @@ struct MDBX_txn {
union { union {
struct { struct {
/* For read txns: This thread/txn's reader table slot, or nullptr. */ /* For read txns: This thread/txn's slot table slot, or nullptr. */
reader_slot_t *reader; reader_slot_t *slot;
} to; } ro;
struct { struct {
troika_t troika; troika_t troika;
pnl_t __restrict repnl; /* Reclaimed GC pages */ pnl_t __restrict repnl; /* Reclaimed GC pages */
struct { struct {
/* The list of reclaimed txn-ids from GC */ rkl_t reclaimed; /* The list of reclaimed txn-ids from GC, but not cleared/deleted */
txl_t __restrict retxl; rkl_t ready4reuse; /* The list of reclaimed txn-ids from GC, and cleared/deleted */
txnid_t last_reclaimed; /* ID of last used record */ uint64_t spent; /* Time spent reading and searching GC */
uint64_t time_acc; rkl_t comeback; /* The list of ids of records returned into GC during commit, etc */
} gc; } gc;
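Taken together the three key-lists replace the former retxl/last_reclaimed pair: an id whose GC record was deleted right away goes to ready4reuse, one whose pages were merged but whose record still has to be cleaned goes to reclaimed, and comeback collects the ids of records written back into GC during commit. A hedged sketch of the branching visible in gc_alloc_ex() above (deleted_right_away is illustrative; error handling elided):

if (deleted_right_away)                  /* i.e. cursor_del(gc, 0) succeeded */
  rkl_push(&txn->wr.gc.ready4reuse, id); /* the record is already gone from the GC tree */
else
  rkl_push(&txn->wr.gc.reclaimed, id);   /* pages merged, the record remains to be cleaned */
/* gc_is_reclaimed() later skips ids found in either reclaimed or comeback. */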
bool prefault_write_activated; bool prefault_write_activated;
#if MDBX_ENABLE_REFUND #if MDBX_ENABLE_REFUND
@ -235,7 +237,7 @@ struct MDBX_txn {
/* The list of loose pages that became unused and may be reused /* The list of loose pages that became unused and may be reused
* in this transaction, linked through `page_next()`. */ * in this transaction, linked through `page_next()`. */
page_t *__restrict loose_pages; page_t *__restrict loose_pages;
/* Number of loose pages (tw.loose_pages) */ /* Number of loose pages (wr.loose_pages) */
size_t loose_count; size_t loose_count;
union { union {
struct { struct {
@ -249,7 +251,7 @@ struct MDBX_txn {
size_t writemap_spilled_npages; size_t writemap_spilled_npages;
}; };
/* In write txns, next is located the array of cursors for each DB */ /* In write txns, next is located the array of cursors for each DB */
} tw; } wr;
}; };
}; };
@ -285,13 +287,14 @@ struct MDBX_cursor {
}; };
/* checking flags, including bits for checking the type of leaf pages. */ /* checking flags, including bits for checking the type of leaf pages. */
uint8_t checking; uint8_t checking;
uint8_t pad;
/* Points to txn->dbi_state[] for this cursor's DBI. /* Points to txn->dbi_state[] for this cursor's DBI.
* The __restrict qualifier is useful and safe here as currently understood, * The __restrict qualifier is useful and safe here as currently understood,
* since the only possible overlap is with the transaction's dbi_state, * since the only possible overlap is with the transaction's dbi_state,
* and it happens via reads prior to any subsequent modification/write. */ * and it happens via reads prior to any subsequent modification/write. */
uint8_t *__restrict dbi_state; uint8_t *__restrict dbi_state;
/* Link in the transaction's cursor-tracking list */ /* Link in the transaction's cursor-tracking list. */
MDBX_txn *txn; MDBX_txn *txn;
/* Points to tree->dbs[] for this cursor's DBI. */ /* Points to tree->dbs[] for this cursor's DBI. */
tree_t *tree; tree_t *tree;
@ -362,8 +365,7 @@ struct MDBX_env {
atomic_pgno_t mlocked_pgno; atomic_pgno_t mlocked_pgno;
uint8_t ps2ln; /* log2 of DB page size */ uint8_t ps2ln; /* log2 of DB page size */
int8_t stuck_meta; /* recovery-only: target meta page or less that zero */ int8_t stuck_meta; /* recovery-only: target meta page or less that zero */
uint16_t merge_threshold, merge_threshold_gc; /* pages emptier than this are uint16_t merge_threshold; /* pages emptier than this are candidates for merging */
candidates for merging */
unsigned max_readers; /* size of the reader table */ unsigned max_readers; /* size of the reader table */
MDBX_dbi max_dbi; /* size of the DB table */ MDBX_dbi max_dbi; /* size of the DB table */
uint32_t pid; /* process ID of this env */ uint32_t pid; /* process ID of this env */
@ -465,6 +467,9 @@ struct MDBX_env {
/* --------------------------------------------------- mostly volatile part */ /* --------------------------------------------------- mostly volatile part */
MDBX_txn *txn; /* current write transaction */ MDBX_txn *txn; /* current write transaction */
struct {
txnid_t detent;
} gc;
osal_fastmutex_t dbi_lock; osal_fastmutex_t dbi_lock;
unsigned n_dbi; /* number of DBs opened */ unsigned n_dbi; /* number of DBs opened */
@ -536,7 +541,9 @@ MDBX_MAYBE_UNUSED static void static_checks(void) {
STATIC_ASSERT(offsetof(lck_t, cached_oldest) % MDBX_CACHELINE_SIZE == 0); STATIC_ASSERT(offsetof(lck_t, cached_oldest) % MDBX_CACHELINE_SIZE == 0);
STATIC_ASSERT(offsetof(lck_t, rdt_length) % MDBX_CACHELINE_SIZE == 0); STATIC_ASSERT(offsetof(lck_t, rdt_length) % MDBX_CACHELINE_SIZE == 0);
#endif /* MDBX_LOCKING */ #endif /* MDBX_LOCKING */
#if FLEXIBLE_ARRAY_MEMBERS
STATIC_ASSERT(offsetof(lck_t, rdt) % MDBX_CACHELINE_SIZE == 0); STATIC_ASSERT(offsetof(lck_t, rdt) % MDBX_CACHELINE_SIZE == 0);
#endif /* FLEXIBLE_ARRAY_MEMBERS */
#if FLEXIBLE_ARRAY_MEMBERS #if FLEXIBLE_ARRAY_MEMBERS
STATIC_ASSERT(NODESIZE == offsetof(node_t, payload)); STATIC_ASSERT(NODESIZE == offsetof(node_t, payload));
@ -545,11 +552,7 @@ MDBX_MAYBE_UNUSED static void static_checks(void) {
STATIC_ASSERT(sizeof(clc_t) == 3 * sizeof(void *)); STATIC_ASSERT(sizeof(clc_t) == 3 * sizeof(void *));
STATIC_ASSERT(sizeof(kvx_t) == 8 * sizeof(void *)); STATIC_ASSERT(sizeof(kvx_t) == 8 * sizeof(void *));
#if MDBX_WORDBITS == 64 #define KVX_SIZE_LN2 MDBX_WORDBITS_LN2
#define KVX_SIZE_LN2 6
#else
#define KVX_SIZE_LN2 5
#endif
STATIC_ASSERT(sizeof(kvx_t) == (1u << KVX_SIZE_LN2)); STATIC_ASSERT(sizeof(kvx_t) == (1u << KVX_SIZE_LN2));
} }
#endif /* Disabled for MSVC 19.0 (VisualStudio 2015) */ #endif /* Disabled for MSVC 19.0 (VisualStudio 2015) */

View File

@ -186,7 +186,7 @@ typedef struct reader_slot {
/* The header for the reader table (a memory-mapped lock file). */ /* The header for the reader table (a memory-mapped lock file). */
typedef struct shared_lck { typedef struct shared_lck {
/* Stamp identifying this as an MDBX file. /* Stamp identifying this as an MDBX file.
* It must be set to MDBX_MAGIC with with MDBX_LOCK_VERSION. */ * It must be set to MDBX_MAGIC with MDBX_LOCK_VERSION. */
uint64_t magic_and_version; uint64_t magic_and_version;
/* Format of this lock file. Must be set to MDBX_LOCK_FORMAT. */ /* Format of this lock file. Must be set to MDBX_LOCK_FORMAT. */

View File

@ -93,10 +93,11 @@ __cold static void choice_fcntl(void) {
static int lck_op(const mdbx_filehandle_t fd, int cmd, const int lck, const off_t offset, off_t len) { static int lck_op(const mdbx_filehandle_t fd, int cmd, const int lck, const off_t offset, off_t len) {
STATIC_ASSERT(sizeof(off_t) >= sizeof(void *) && sizeof(off_t) >= sizeof(size_t)); STATIC_ASSERT(sizeof(off_t) >= sizeof(void *) && sizeof(off_t) >= sizeof(size_t));
#ifdef __ANDROID_API__ #if defined(__ANDROID_API__) && __ANDROID_API__ < 24
STATIC_ASSERT_MSG((sizeof(off_t) * 8 == MDBX_WORDBITS), "The bitness of system `off_t` type is mismatch. Please " STATIC_ASSERT_MSG((sizeof(off_t) * CHAR_BIT == MDBX_WORDBITS),
"The bitness of system `off_t` type is mismatch. Please "
"fix build and/or NDK configuration."); "fix build and/or NDK configuration.");
#endif /* Android */ #endif /* Android && API < 24 */
assert(offset >= 0 && len > 0); assert(offset >= 0 && len > 0);
assert((uint64_t)offset < (uint64_t)INT64_MAX && (uint64_t)len < (uint64_t)INT64_MAX && assert((uint64_t)offset < (uint64_t)INT64_MAX && (uint64_t)len < (uint64_t)INT64_MAX &&
(uint64_t)(offset + len) > (uint64_t)offset); (uint64_t)(offset + len) > (uint64_t)offset);
@ -108,17 +109,12 @@ static int lck_op(const mdbx_filehandle_t fd, int cmd, const int lck, const off_
jitter4testing(true); jitter4testing(true);
for (;;) { for (;;) {
MDBX_STRUCT_FLOCK lock_op; MDBX_STRUCT_FLOCK lock_op = {.l_type = lck, .l_whence = SEEK_SET, .l_start = offset, .l_len = len};
STATIC_ASSERT_MSG(sizeof(off_t) <= sizeof(lock_op.l_start) && sizeof(off_t) <= sizeof(lock_op.l_len) && STATIC_ASSERT_MSG(sizeof(off_t) <= sizeof(lock_op.l_start) && sizeof(off_t) <= sizeof(lock_op.l_len) &&
OFF_T_MAX == (off_t)OFF_T_MAX, OFF_T_MAX == (off_t)OFF_T_MAX,
"Support for large/64-bit-sized files is misconfigured " "Support for large/64-bit-sized files is misconfigured "
"for the target system and/or toolchain. " "for the target system and/or toolchain. "
"Please fix it or at least disable it completely."); "Please fix it or at least disable it completely.");
memset(&lock_op, 0, sizeof(lock_op));
lock_op.l_type = lck;
lock_op.l_whence = SEEK_SET;
lock_op.l_start = offset;
lock_op.l_len = len;
int rc = MDBX_FCNTL(fd, cmd, &lock_op); int rc = MDBX_FCNTL(fd, cmd, &lock_op);
jitter4testing(true); jitter4testing(true);
if (rc != -1) { if (rc != -1) {
@ -132,7 +128,8 @@ static int lck_op(const mdbx_filehandle_t fd, int cmd, const int lck, const off_
} }
rc = errno; rc = errno;
#if MDBX_USE_OFDLOCKS #if MDBX_USE_OFDLOCKS
if (rc == EINVAL && (cmd == MDBX_F_OFD_SETLK || cmd == MDBX_F_OFD_SETLKW || cmd == MDBX_F_OFD_GETLK)) { if (ignore_enosys_and_einval(rc) == MDBX_RESULT_TRUE &&
(cmd == MDBX_F_OFD_SETLK || cmd == MDBX_F_OFD_SETLKW || cmd == MDBX_F_OFD_GETLK)) {
/* fallback to non-OFD locks */ /* fallback to non-OFD locks */
if (cmd == MDBX_F_OFD_SETLK) if (cmd == MDBX_F_OFD_SETLK)
cmd = MDBX_F_SETLK; cmd = MDBX_F_SETLK;
@ -460,6 +457,10 @@ __cold MDBX_INTERNAL int lck_destroy(MDBX_env *env, MDBX_env *inprocess_neighbor
jitter4testing(false); jitter4testing(false);
} }
#if MDBX_LOCKING == MDBX_LOCKING_SYSV
env->me_sysv_ipc.semid = -1;
#endif /* MDBX_LOCKING */
if (current_pid != env->pid) { if (current_pid != env->pid) {
eASSERT(env, !inprocess_neighbor); eASSERT(env, !inprocess_neighbor);
NOTICE("drown env %p after-fork pid %d -> %d", __Wpedantic_format_voidptr(env), env->pid, current_pid); NOTICE("drown env %p after-fork pid %d -> %d", __Wpedantic_format_voidptr(env), env->pid, current_pid);
@ -776,14 +777,14 @@ static int osal_ipclock_lock(MDBX_env *env, osal_ipclock_t *ipc, const bool dont
return rc; return rc;
} }
int osal_ipclock_unlock(MDBX_env *env, osal_ipclock_t *ipc) { static int osal_ipclock_unlock(MDBX_env *env, osal_ipclock_t *ipc) {
int err = MDBX_ENOSYS; int err = MDBX_ENOSYS;
#if MDBX_LOCKING == MDBX_LOCKING_POSIX2001 || MDBX_LOCKING == MDBX_LOCKING_POSIX2008 #if MDBX_LOCKING == MDBX_LOCKING_POSIX2001 || MDBX_LOCKING == MDBX_LOCKING_POSIX2008
err = pthread_mutex_unlock(ipc); err = pthread_mutex_unlock(ipc);
#elif MDBX_LOCKING == MDBX_LOCKING_POSIX1988 #elif MDBX_LOCKING == MDBX_LOCKING_POSIX1988
err = sem_post(ipc) ? errno : MDBX_SUCCESS; err = sem_post(ipc) ? errno : MDBX_SUCCESS;
#elif MDBX_LOCKING == MDBX_LOCKING_SYSV #elif MDBX_LOCKING == MDBX_LOCKING_SYSV
if (unlikely(*ipc != (pid_t)env->pid)) if (unlikely(*ipc != (pid_t)env->pid || env->me_sysv_ipc.key == -1))
err = EPERM; err = EPERM;
else { else {
*ipc = 0; *ipc = 0;
@ -823,7 +824,6 @@ MDBX_INTERNAL void lck_rdt_unlock(MDBX_env *env) {
int lck_txn_lock(MDBX_env *env, bool dont_wait) { int lck_txn_lock(MDBX_env *env, bool dont_wait) {
TRACE("%swait %s", dont_wait ? "dont-" : "", ">>"); TRACE("%swait %s", dont_wait ? "dont-" : "", ">>");
eASSERT(env, env->basal_txn || (env->lck == lckless_stub(env) && (env->flags & MDBX_RDONLY)));
jitter4testing(true); jitter4testing(true);
const int err = osal_ipclock_lock(env, &env->lck->wrt_lock, dont_wait); const int err = osal_ipclock_lock(env, &env->lck->wrt_lock, dont_wait);
int rc = err; int rc = err;
@ -841,10 +841,8 @@ int lck_txn_lock(MDBX_env *env, bool dont_wait) {
void lck_txn_unlock(MDBX_env *env) { void lck_txn_unlock(MDBX_env *env) {
TRACE("%s", ">>"); TRACE("%s", ">>");
if (env->basal_txn) { if (env->basal_txn) {
eASSERT(env, !env->basal_txn || env->basal_txn->owner == osal_thread_self()); eASSERT(env, env->basal_txn->owner == osal_thread_self());
env->basal_txn->owner = 0; env->basal_txn->owner = 0;
} else {
eASSERT(env, env->lck == lckless_stub(env) && (env->flags & MDBX_RDONLY));
} }
int err = osal_ipclock_unlock(env, &env->lck->wrt_lock); int err = osal_ipclock_unlock(env, &env->lck->wrt_lock);
TRACE("<< err %d", err); TRACE("<< err %d", err);

View File

@ -87,7 +87,7 @@ int lck_txn_lock(MDBX_env *env, bool dontwait) {
} }
} }
eASSERT(env, !env->basal_txn->owner); eASSERT(env, !env->basal_txn || !env->basal_txn->owner);
if (env->flags & MDBX_EXCLUSIVE) if (env->flags & MDBX_EXCLUSIVE)
goto done; goto done;
@ -104,10 +104,11 @@ int lck_txn_lock(MDBX_env *env, bool dontwait) {
} }
if (rc == MDBX_SUCCESS) { if (rc == MDBX_SUCCESS) {
done: done:
if (env->basal_txn)
env->basal_txn->owner = osal_thread_self();
/* Zap: Failing to release lock 'env->windowsbug_lock' /* Zap: Failing to release lock 'env->windowsbug_lock'
* in function 'mdbx_txn_lock' */ * in function 'mdbx_txn_lock' */
MDBX_SUPPRESS_GOOFY_MSVC_ANALYZER(26115); MDBX_SUPPRESS_GOOFY_MSVC_ANALYZER(26115);
env->basal_txn->owner = osal_thread_self();
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
@ -116,13 +117,14 @@ int lck_txn_lock(MDBX_env *env, bool dontwait) {
} }
void lck_txn_unlock(MDBX_env *env) { void lck_txn_unlock(MDBX_env *env) {
eASSERT(env, env->basal_txn->owner == osal_thread_self()); eASSERT(env, !env->basal_txn || env->basal_txn->owner == osal_thread_self());
if ((env->flags & MDBX_EXCLUSIVE) == 0) { if ((env->flags & MDBX_EXCLUSIVE) == 0) {
const HANDLE fd4data = env->ioring.overlapped_fd ? env->ioring.overlapped_fd : env->lazy_fd; const HANDLE fd4data = env->ioring.overlapped_fd ? env->ioring.overlapped_fd : env->lazy_fd;
int err = funlock(fd4data, DXB_BODY); int err = funlock(fd4data, DXB_BODY);
if (err != MDBX_SUCCESS) if (err != MDBX_SUCCESS)
mdbx_panic("%s failed: err %u", __func__, err); mdbx_panic("%s failed: err %u", __func__, err);
} }
if (env->basal_txn)
env->basal_txn->owner = 0; env->basal_txn->owner = 0;
LeaveCriticalSection(&env->windowsbug_lock); LeaveCriticalSection(&env->windowsbug_lock);
} }

View File

@ -69,13 +69,13 @@ __cold static int lck_setup_locked(MDBX_env *env) {
return err; return err;
#ifdef MADV_DODUMP #ifdef MADV_DODUMP
err = madvise(env->lck_mmap.lck, size, MADV_DODUMP) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(env->lck_mmap.lck, size, MADV_DODUMP) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#endif /* MADV_DODUMP */ #endif /* MADV_DODUMP */
#ifdef MADV_WILLNEED #ifdef MADV_WILLNEED
err = madvise(env->lck_mmap.lck, size, MADV_WILLNEED) ? ignore_enosys(errno) : MDBX_SUCCESS; err = madvise(env->lck_mmap.lck, size, MADV_WILLNEED) ? ignore_enosys_and_eagain(errno) : MDBX_SUCCESS;
if (unlikely(MDBX_IS_ERROR(err))) if (unlikely(MDBX_IS_ERROR(err)))
return err; return err;
#elif defined(POSIX_MADV_WILLNEED) #elif defined(POSIX_MADV_WILLNEED)

View File

@ -3,6 +3,9 @@
#include "internals.h" #include "internals.h"
/*------------------------------------------------------------------------------
logging */
__cold void debug_log_va(int level, const char *function, int line, const char *fmt, va_list args) { __cold void debug_log_va(int level, const char *function, int line, const char *fmt, va_list args) {
ENSURE(nullptr, osal_fastmutex_acquire(&globals.debug_lock) == 0); ENSURE(nullptr, osal_fastmutex_acquire(&globals.debug_lock) == 0);
if (globals.logger.ptr) { if (globals.logger.ptr) {
@ -110,8 +113,49 @@ __cold const char *mdbx_dump_val(const MDBX_val *val, char *const buf, const siz
return buf; return buf;
} }
__cold static int setup_debug(MDBX_log_level_t level, MDBX_debug_flags_t flags, union logger_union logger, char *buffer,
size_t buffer_size) {
ENSURE(nullptr, osal_fastmutex_acquire(&globals.debug_lock) == 0);
const int rc = globals.runtime_flags | (globals.loglevel << 16);
if (level != MDBX_LOG_DONTCHANGE)
globals.loglevel = (uint8_t)level;
if (flags != MDBX_DBG_DONTCHANGE) {
flags &=
#if MDBX_DEBUG
MDBX_DBG_ASSERT | MDBX_DBG_AUDIT | MDBX_DBG_JITTER |
#endif
MDBX_DBG_DUMP | MDBX_DBG_LEGACY_MULTIOPEN | MDBX_DBG_LEGACY_OVERLAP | MDBX_DBG_DONT_UPGRADE;
globals.runtime_flags = (uint8_t)flags;
}
assert(MDBX_LOGGER_DONTCHANGE == ((MDBX_debug_func *)(intptr_t)-1));
if (logger.ptr != (void *)((intptr_t)-1)) {
globals.logger.ptr = logger.ptr;
globals.logger_buffer = buffer;
globals.logger_buffer_size = buffer_size;
}
ENSURE(nullptr, osal_fastmutex_release(&globals.debug_lock) == 0);
return rc;
}
__cold int mdbx_setup_debug_nofmt(MDBX_log_level_t level, MDBX_debug_flags_t flags, MDBX_debug_func_nofmt *logger,
char *buffer, size_t buffer_size) {
union logger_union thunk;
thunk.nofmt = (logger && buffer && buffer_size) ? logger : MDBX_LOGGER_NOFMT_DONTCHANGE;
return setup_debug(level, flags, thunk, buffer, buffer_size);
}
__cold int mdbx_setup_debug(MDBX_log_level_t level, MDBX_debug_flags_t flags, MDBX_debug_func *logger) {
union logger_union thunk;
thunk.fmt = logger;
return setup_debug(level, flags, thunk, nullptr, 0);
}
/*------------------------------------------------------------------------------ /*------------------------------------------------------------------------------
LY: debug stuff */ debug stuff */
__cold const char *pagetype_caption(const uint8_t type, char buf4unknown[16]) { __cold const char *pagetype_caption(const uint8_t type, char buf4unknown[16]) {
switch (type) { switch (type) {
@ -207,44 +251,3 @@ __cold void page_list(page_t *mp) {
VERBOSE("Total: header %u + contents %zu + unused %zu\n", is_dupfix_leaf(mp) ? PAGEHDRSZ : PAGEHDRSZ + mp->lower, VERBOSE("Total: header %u + contents %zu + unused %zu\n", is_dupfix_leaf(mp) ? PAGEHDRSZ : PAGEHDRSZ + mp->lower,
total, page_room(mp)); total, page_room(mp));
} }
__cold static int setup_debug(MDBX_log_level_t level, MDBX_debug_flags_t flags, union logger_union logger, char *buffer,
size_t buffer_size) {
ENSURE(nullptr, osal_fastmutex_acquire(&globals.debug_lock) == 0);
const int rc = globals.runtime_flags | (globals.loglevel << 16);
if (level != MDBX_LOG_DONTCHANGE)
globals.loglevel = (uint8_t)level;
if (flags != MDBX_DBG_DONTCHANGE) {
flags &=
#if MDBX_DEBUG
MDBX_DBG_ASSERT | MDBX_DBG_AUDIT | MDBX_DBG_JITTER |
#endif
MDBX_DBG_DUMP | MDBX_DBG_LEGACY_MULTIOPEN | MDBX_DBG_LEGACY_OVERLAP | MDBX_DBG_DONT_UPGRADE;
globals.runtime_flags = (uint8_t)flags;
}
assert(MDBX_LOGGER_DONTCHANGE == ((MDBX_debug_func *)(intptr_t)-1));
if (logger.ptr != (void *)((intptr_t)-1)) {
globals.logger.ptr = logger.ptr;
globals.logger_buffer = buffer;
globals.logger_buffer_size = buffer_size;
}
ENSURE(nullptr, osal_fastmutex_release(&globals.debug_lock) == 0);
return rc;
}
__cold int mdbx_setup_debug_nofmt(MDBX_log_level_t level, MDBX_debug_flags_t flags, MDBX_debug_func_nofmt *logger,
char *buffer, size_t buffer_size) {
union logger_union thunk;
thunk.nofmt = (logger && buffer && buffer_size) ? logger : MDBX_LOGGER_NOFMT_DONTCHANGE;
return setup_debug(level, flags, thunk, buffer, buffer_size);
}
__cold int mdbx_setup_debug(MDBX_log_level_t level, MDBX_debug_flags_t flags, MDBX_debug_func *logger) {
union logger_union thunk;
thunk.fmt = logger;
return setup_debug(level, flags, thunk, nullptr, 0);
}
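
For orientation, a minimal caller-side sketch of the API moved above: it assumes the public MDBX_debug_func callback takes the same (level, function, line, fmt, va_list) arguments as debug_log_va(), and the my_logger/enable_mdbx_logging names are illustrative only.

#include <stdarg.h>
#include <stdio.h>
#include "mdbx.h"

static void my_logger(MDBX_log_level_t level, const char *function, int line, const char *fmt, va_list args) {
  fprintf(stderr, "[mdbx:%d] %s:%d: ", (int)level, function ? function : "?", line);
  vfprintf(stderr, fmt, args);
}

static void enable_mdbx_logging(void) {
  /* the returned int packs the previous runtime_flags and loglevel, per setup_debug() above */
  (void)mdbx_setup_debug(MDBX_LOG_VERBOSE, MDBX_DBG_DONTCHANGE, my_logger);
}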


@ -1,6 +1,6 @@
.\" Copyright 2015-2025 Leonid Yuriev <leo@yuriev.ru>. .\" Copyright 2015-2025 Leonid Yuriev <leo@yuriev.ru>.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_CHK 1 "2024-08-29" "MDBX 0.13" .TH MDBX_CHK 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_chk \- MDBX checking tool mdbx_chk \- MDBX checking tool
.SH SYNOPSIS .SH SYNOPSIS


@ -2,7 +2,7 @@
.\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>. .\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>.
.\" Copyright 2012-2015 Howard Chu, Symas Corp. All Rights Reserved. .\" Copyright 2012-2015 Howard Chu, Symas Corp. All Rights Reserved.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_COPY 1 "2024-08-29" "MDBX 0.13" .TH MDBX_COPY 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_copy \- MDBX environment copy tool mdbx_copy \- MDBX environment copy tool
.SH SYNOPSIS .SH SYNOPSIS
@ -14,6 +14,8 @@ mdbx_copy \- MDBX environment copy tool
[\c [\c
.BR \-c ] .BR \-c ]
[\c [\c
.BR \-f ]
[\c
.BR \-d ] .BR \-d ]
[\c [\c
.BR \-p ] .BR \-p ]
@ -49,6 +51,9 @@ or unused pages will be omitted from the copy. This option will
slow down the backup process as it is more CPU-intensive. slow down the backup process as it is more CPU-intensive.
Currently it fails if the environment has suffered a page leak. Currently it fails if the environment has suffered a page leak.
.TP .TP
.BR \-f
Silently overwrite the target file if it exists, instead of failing with an error.
.TP
.BR \-d .BR \-d
Alters geometry to enforce the copy to be a dynamic size DB, Alters geometry to enforce the copy to be a dynamic size DB,
which could be growth and shrink by reasonable steps on the fly. which could be growth and shrink by reasonable steps on the fly.


@ -1,7 +1,7 @@
.\" Copyright 2021-2025 Leonid Yuriev <leo@yuriev.ru>. .\" Copyright 2021-2025 Leonid Yuriev <leo@yuriev.ru>.
.\" Copyright 2014-2021 Howard Chu, Symas Corp. All Rights Reserved. .\" Copyright 2014-2021 Howard Chu, Symas Corp. All Rights Reserved.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_DROP 1 "2024-08-29" "MDBX 0.13" .TH MDBX_DROP 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_drop \- MDBX database delete tool mdbx_drop \- MDBX database delete tool
.SH SYNOPSIS .SH SYNOPSIS


@ -2,7 +2,7 @@
.\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>. .\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>.
.\" Copyright 2014-2015 Howard Chu, Symas Corp. All Rights Reserved. .\" Copyright 2014-2015 Howard Chu, Symas Corp. All Rights Reserved.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_DUMP 1 "2024-08-29" "MDBX 0.13" .TH MDBX_DUMP 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_dump \- MDBX environment export tool mdbx_dump \- MDBX environment export tool
.SH SYNOPSIS .SH SYNOPSIS
@ -12,6 +12,8 @@ mdbx_dump \- MDBX environment export tool
[\c [\c
.BR \-q ] .BR \-q ]
[\c [\c
.BR \-c ]
[\c
.BI \-f \ file\fR] .BI \-f \ file\fR]
[\c [\c
.BR \-l ] .BR \-l ]
@ -41,6 +43,9 @@ Write the library version number to the standard output, and exit.
.BR \-q .BR \-q
Be quiet. Be quiet.
.TP .TP
.BR \-c
Concise mode: keys are not repeated in the dump, but the resulting format is incompatible with Berkeley DB and LMDB.
.TP
.BR \-f \ file .BR \-f \ file
Write to the specified file instead of to the standard output. Write to the specified file instead of to the standard output.
.TP .TP


@ -2,7 +2,7 @@
.\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>. .\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>.
.\" Copyright 2014-2015 Howard Chu, Symas Corp. All Rights Reserved. .\" Copyright 2014-2015 Howard Chu, Symas Corp. All Rights Reserved.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_LOAD 1 "2024-08-29" "MDBX 0.13" .TH MDBX_LOAD 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_load \- MDBX environment import tool mdbx_load \- MDBX environment import tool
.SH SYNOPSIS .SH SYNOPSIS


@ -2,7 +2,7 @@
.\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>. .\" Copyright 2015,2016 Peter-Service R&D LLC <http://billing.ru/>.
.\" Copyright 2012-2015 Howard Chu, Symas Corp. All Rights Reserved. .\" Copyright 2012-2015 Howard Chu, Symas Corp. All Rights Reserved.
.\" Copying restrictions apply. See COPYRIGHT/LICENSE. .\" Copying restrictions apply. See COPYRIGHT/LICENSE.
.TH MDBX_STAT 1 "2024-08-29" "MDBX 0.13" .TH MDBX_STAT 1 "2025-01-14" "MDBX 0.14"
.SH NAME .SH NAME
mdbx_stat \- MDBX environment status tool mdbx_stat \- MDBX environment status tool
.SH SYNOPSIS .SH SYNOPSIS


@ -252,9 +252,9 @@ __cold int meta_wipe_steady(MDBX_env *env, txnid_t inclusive_upto) {
/* force oldest refresh */ /* force oldest refresh */
atomic_store32(&env->lck->rdt_refresh_flag, true, mo_Relaxed); atomic_store32(&env->lck->rdt_refresh_flag, true, mo_Relaxed);
env->basal_txn->tw.troika = meta_tap(env); env->basal_txn->wr.troika = meta_tap(env);
for (MDBX_txn *scan = env->basal_txn->nested; scan; scan = scan->nested) for (MDBX_txn *scan = env->basal_txn->nested; scan; scan = scan->nested)
scan->tw.troika = env->basal_txn->tw.troika; scan->wr.troika = env->basal_txn->wr.troika;
return err; return err;
} }


@ -50,23 +50,23 @@ bsr_t mvcc_bind_slot(MDBX_env *env) {
} }
} }
result.rslot = &env->lck->rdt[slot]; result.slot = &env->lck->rdt[slot];
/* Claim the reader slot, carefully since other code /* Claim the reader slot, carefully since other code
* uses the reader table un-mutexed: First reset the * uses the reader table un-mutexed: First reset the
* slot, next publish it in lck->rdt_length. After * slot, next publish it in lck->rdt_length. After
* that, it is safe for mdbx_env_close() to touch it. * that, it is safe for mdbx_env_close() to touch it.
* When it will be closed, we can finally claim it. */ * When it will be closed, we can finally claim it. */
atomic_store32(&result.rslot->pid, 0, mo_AcquireRelease); atomic_store32(&result.slot->pid, 0, mo_AcquireRelease);
safe64_reset(&result.rslot->txnid, true); safe64_reset(&result.slot->txnid, true);
if (slot == nreaders) if (slot == nreaders)
env->lck->rdt_length.weak = (uint32_t)++nreaders; env->lck->rdt_length.weak = (uint32_t)++nreaders;
result.rslot->tid.weak = (env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self(); result.slot->tid.weak = (env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self();
atomic_store32(&result.rslot->pid, env->pid, mo_AcquireRelease); atomic_store32(&result.slot->pid, env->pid, mo_AcquireRelease);
lck_rdt_unlock(env); lck_rdt_unlock(env);
if (likely(env->flags & ENV_TXKEY)) { if (likely(env->flags & ENV_TXKEY)) {
eASSERT(env, env->registered_reader_pid == env->pid); eASSERT(env, env->registered_reader_pid == env->pid);
thread_rthc_set(env->me_txkey, result.rslot); thread_rthc_set(env->me_txkey, result.slot);
} }
return result; return result;
} }
@ -300,7 +300,7 @@ __cold MDBX_INTERNAL int mvcc_cleanup_dead(MDBX_env *env, int rdt_locked, int *d
return rc; return rc;
} }
__cold txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler) { __cold bool mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler) {
DEBUG("DB size maxed out by reading #%" PRIaTXN, straggler); DEBUG("DB size maxed out by reading #%" PRIaTXN, straggler);
osal_memory_fence(mo_AcquireRelease, false); osal_memory_fence(mo_AcquireRelease, false);
MDBX_hsr_func *const callback = env->hsr_callback; MDBX_hsr_func *const callback = env->hsr_callback;
@ -308,7 +308,7 @@ __cold txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler) {
bool notify_eof_of_loop = false; bool notify_eof_of_loop = false;
int retry = 0; int retry = 0;
do { do {
const txnid_t steady = env->txn->tw.troika.txnid[env->txn->tw.troika.prefer_steady]; const txnid_t steady = env->txn->wr.troika.txnid[env->txn->wr.troika.prefer_steady];
env->lck->rdt_refresh_flag.weak = /* force refresh */ true; env->lck->rdt_refresh_flag.weak = /* force refresh */ true;
oldest = mvcc_shapshot_oldest(env, steady); oldest = mvcc_shapshot_oldest(env, steady);
eASSERT(env, oldest < env->basal_txn->txnid); eASSERT(env, oldest < env->basal_txn->txnid);
@ -374,7 +374,7 @@ __cold txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler) {
if (safe64_read(&stucked->txnid) != straggler || !pid) if (safe64_read(&stucked->txnid) != straggler || !pid)
continue; continue;
const meta_ptr_t head = meta_recent(env, &env->txn->tw.troika); const meta_ptr_t head = meta_recent(env, &env->txn->wr.troika);
const txnid_t gap = (head.txnid - straggler) / xMDBX_TXNID_STEP; const txnid_t gap = (head.txnid - straggler) / xMDBX_TXNID_STEP;
const uint64_t head_retired = unaligned_peek_u64(4, head.ptr_c->pages_retired); const uint64_t head_retired = unaligned_peek_u64(4, head.ptr_c->pages_retired);
const size_t space = (head_retired > hold_retired) ? pgno2bytes(env, (pgno_t)(head_retired - hold_retired)) : 0; const size_t space = (head_retired > hold_retired) ? pgno2bytes(env, (pgno_t)(head_retired - hold_retired)) : 0;
@ -410,5 +410,5 @@ __cold txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler) {
NOTICE("hsr-kick: done turn %" PRIaTXN " -> %" PRIaTXN " +%" PRIaTXN, straggler, oldest, turn); NOTICE("hsr-kick: done turn %" PRIaTXN " -> %" PRIaTXN " +%" PRIaTXN, straggler, oldest, turn);
callback(env, env->txn, 0, 0, straggler, (turn < UINT_MAX) ? (unsigned)turn : UINT_MAX, 0, -retry); callback(env, env->txn, 0, 0, straggler, (turn < UINT_MAX) ? (unsigned)turn : UINT_MAX, 0, -retry);
} }
return oldest; return oldest > straggler;
} }
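
The hunk above narrows mvcc_kick_laggards() to a bool that only says whether the oldest reader moved past the straggler; the laggard itself is still reported to the application through the Handle-Slow-Readers callback invoked as callback(env, txn, pid, tid, txnid, gap, space, retry). A hedged user-side sketch, assuming the public mdbx_env_set_hsr()/MDBX_hsr_func API from mdbx.h (check the header for the exact return-value contract):

#include <inttypes.h>
#include <stdio.h>
#include "mdbx.h"

static int my_hsr(const MDBX_env *env, const MDBX_txn *txn, mdbx_pid_t pid, mdbx_tid_t tid,
                  uint64_t laggard, unsigned gap, size_t space, int retry) {
  (void)env; (void)txn; (void)tid; (void)space;
  fprintf(stderr, "mdbx: reader pid=%u lags at txn #%" PRIu64 " (gap %u, retry %d)\n",
          (unsigned)pid, laggard, gap, retry);
  return 0; /* take no action; other return values are described in mdbx.h */
}

/* during environment setup: mdbx_env_set_hsr(env, my_hsr); */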


@ -50,14 +50,9 @@ int __must_check_result node_add_branch(MDBX_cursor *mc, size_t indx, const MDBX
is_subpage(mp) ? "sub-" : "", mp->pgno, indx, pgno, key ? key->iov_len : 0, DKEY_DEBUG(key)); is_subpage(mp) ? "sub-" : "", mp->pgno, indx, pgno, key ? key->iov_len : 0, DKEY_DEBUG(key));
cASSERT(mc, page_type(mp) == P_BRANCH); cASSERT(mc, page_type(mp) == P_BRANCH);
cASSERT(mc, mp->txnid >= mc->txn->front_txnid);
STATIC_ASSERT(NODESIZE % 2 == 0); STATIC_ASSERT(NODESIZE % 2 == 0);
/* Move higher pointers up one slot. */
const size_t nkeys = page_numkeys(mp);
cASSERT(mc, nkeys >= indx);
for (size_t i = nkeys; i > indx; --i)
mp->entries[i] = mp->entries[i - 1];
/* Adjust free space offsets. */ /* Adjust free space offsets. */
const size_t branch_bytes = branch_size(mc->txn->env, key); const size_t branch_bytes = branch_size(mc->txn->env, key);
const intptr_t lower = mp->lower + sizeof(indx_t); const intptr_t lower = mp->lower + sizeof(indx_t);
@ -66,6 +61,13 @@ int __must_check_result node_add_branch(MDBX_cursor *mc, size_t indx, const MDBX
mc->txn->flags |= MDBX_TXN_ERROR; mc->txn->flags |= MDBX_TXN_ERROR;
return MDBX_PAGE_FULL; return MDBX_PAGE_FULL;
} }
/* Move higher pointers up one slot. */
const size_t nkeys = page_numkeys(mp);
cASSERT(mc, nkeys >= indx);
for (size_t i = nkeys; i > indx; --i)
mp->entries[i] = mp->entries[i - 1];
mp->lower = (indx_t)lower; mp->lower = (indx_t)lower;
mp->entries[indx] = mp->upper = (indx_t)upper; mp->entries[indx] = mp->upper = (indx_t)upper;


@ -17,7 +17,6 @@ MDBX_NOTHROW_PURE_FUNCTION static inline pgno_t node_pgno(const node_t *const __
/* Set the page number in a branch node */ /* Set the page number in a branch node */
static inline void node_set_pgno(node_t *const __restrict node, pgno_t pgno) { static inline void node_set_pgno(node_t *const __restrict node, pgno_t pgno) {
assert(pgno >= MIN_PAGENO && pgno <= MAX_PAGENO); assert(pgno >= MIN_PAGENO && pgno <= MAX_PAGENO);
UNALIGNED_POKE_32(node, node_t, child_pgno, (uint32_t)pgno); UNALIGNED_POKE_32(node, node_t, child_pgno, (uint32_t)pgno);
} }


@ -257,6 +257,14 @@
#error MDBX_HAVE_BUILTIN_CPU_SUPPORTS must be defined as 0 or 1 #error MDBX_HAVE_BUILTIN_CPU_SUPPORTS must be defined as 0 or 1
#endif /* MDBX_HAVE_BUILTIN_CPU_SUPPORTS */ #endif /* MDBX_HAVE_BUILTIN_CPU_SUPPORTS */
/** If enabled, the commit of a pure (nothing-changed) transaction is treated as a special
 * case and returns \ref MDBX_RESULT_TRUE instead of \ref MDBX_SUCCESS. */
#ifndef MDBX_NOSUCCESS_PURE_COMMIT
#define MDBX_NOSUCCESS_PURE_COMMIT 0
#elif !(MDBX_NOSUCCESS_PURE_COMMIT == 0 || MDBX_NOSUCCESS_PURE_COMMIT == 1)
#error MDBX_NOSUCCESS_PURE_COMMIT must be defined as 0 or 1
#endif /* MDBX_NOSUCCESS_PURE_COMMIT */
/** if enabled then instead of the returned error `MDBX_REMOTE`, only a warning is issued, when /** if enabled then instead of the returned error `MDBX_REMOTE`, only a warning is issued, when
* the database being opened in non-read-only mode is located in a file system exported via NFS. */ * the database being opened in non-read-only mode is located in a file system exported via NFS. */
#ifndef MDBX_ENABLE_NON_READONLY_EXPORT #ifndef MDBX_ENABLE_NON_READONLY_EXPORT
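
A hedged caller-side sketch of what the new MDBX_NOSUCCESS_PURE_COMMIT build option implies, assuming the standard mdbx_txn_commit() API (the helper name is illustrative):

/* With -DMDBX_NOSUCCESS_PURE_COMMIT=1 a commit that wrote nothing yields MDBX_RESULT_TRUE. */
static int commit_and_normalize(MDBX_txn *txn) {
  int rc = mdbx_txn_commit(txn);
  if (rc == MDBX_RESULT_TRUE)
    rc = MDBX_SUCCESS; /* "pure" commit: the transaction changed nothing, so nothing was written */
  return rc;
}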


@ -248,7 +248,7 @@ __cold void mdbx_panic(const char *fmt, ...) {
unlikely(num < 1 || !message) ? "<troubles with panic-message preparation>" : message; unlikely(num < 1 || !message) ? "<troubles with panic-message preparation>" : message;
if (globals.logger.ptr) if (globals.logger.ptr)
debug_log(MDBX_LOG_FATAL, "panic", 0, "%s", const_message); debug_log(MDBX_LOG_FATAL, "mdbx-panic", 0, "%s", const_message);
while (1) { while (1) {
#if defined(_WIN32) || defined(_WIN64) #if defined(_WIN32) || defined(_WIN64)
@ -262,7 +262,7 @@ __cold void mdbx_panic(const char *fmt, ...) {
#endif #endif
FatalExit(ERROR_UNHANDLED_ERROR); FatalExit(ERROR_UNHANDLED_ERROR);
#else #else
__assert_fail(const_message, "mdbx", 0, "panic"); __assert_fail(const_message, "mdbx-panic", 0, const_message);
abort(); abort();
#endif #endif
} }
@ -1198,29 +1198,29 @@ MDBX_INTERNAL int osal_openfile(const enum osal_openfile_purpose purpose, const
break; break;
case MDBX_OPEN_DXB_OVERLAPPED_DIRECT: case MDBX_OPEN_DXB_OVERLAPPED_DIRECT:
FlagsAndAttributes |= FILE_FLAG_NO_BUFFERING; FlagsAndAttributes |= FILE_FLAG_NO_BUFFERING;
/* fall through */ __fallthrough /* fall through */;
__fallthrough;
case MDBX_OPEN_DXB_OVERLAPPED: case MDBX_OPEN_DXB_OVERLAPPED:
FlagsAndAttributes |= FILE_FLAG_OVERLAPPED; FlagsAndAttributes |= FILE_FLAG_OVERLAPPED;
/* fall through */ __fallthrough /* fall through */;
__fallthrough;
case MDBX_OPEN_DXB_DSYNC: case MDBX_OPEN_DXB_DSYNC:
CreationDisposition = OPEN_EXISTING; CreationDisposition = OPEN_EXISTING;
DesiredAccess |= GENERIC_WRITE | GENERIC_READ; DesiredAccess |= GENERIC_WRITE | GENERIC_READ;
FlagsAndAttributes |= FILE_FLAG_WRITE_THROUGH; FlagsAndAttributes |= FILE_FLAG_WRITE_THROUGH;
break; break;
case MDBX_OPEN_COPY:
CreationDisposition = CREATE_NEW;
ShareMode = 0;
DesiredAccess |= GENERIC_WRITE;
if (env->ps >= globals.sys_pagesize)
FlagsAndAttributes |= FILE_FLAG_NO_BUFFERING;
break;
case MDBX_OPEN_DELETE: case MDBX_OPEN_DELETE:
CreationDisposition = OPEN_EXISTING; CreationDisposition = OPEN_EXISTING;
ShareMode |= FILE_SHARE_DELETE; ShareMode |= FILE_SHARE_DELETE;
DesiredAccess = FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES | DELETE | SYNCHRONIZE; DesiredAccess = FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES | DELETE | SYNCHRONIZE;
break; break;
case MDBX_OPEN_COPY_EXCL:
CreationDisposition = CREATE_NEW;
__fallthrough /* fall through */;
case MDBX_OPEN_COPY_OVERWRITE:
ShareMode = 0;
DesiredAccess |= GENERIC_WRITE;
if (env->ps >= globals.sys_pagesize)
FlagsAndAttributes |= FILE_FLAG_NO_BUFFERING;
break;
} }
*fd = CreateFileW(pathname, DesiredAccess, ShareMode, nullptr, CreationDisposition, FlagsAndAttributes, nullptr); *fd = CreateFileW(pathname, DesiredAccess, ShareMode, nullptr, CreationDisposition, FlagsAndAttributes, nullptr);
@ -1260,9 +1260,6 @@ MDBX_INTERNAL int osal_openfile(const enum osal_openfile_purpose purpose, const
case MDBX_OPEN_DXB_LAZY: case MDBX_OPEN_DXB_LAZY:
flags |= O_RDWR; flags |= O_RDWR;
break; break;
case MDBX_OPEN_COPY:
flags = O_CREAT | O_WRONLY | O_EXCL;
break;
case MDBX_OPEN_DXB_DSYNC: case MDBX_OPEN_DXB_DSYNC:
flags |= O_WRONLY; flags |= O_WRONLY;
#if defined(O_DSYNC) #if defined(O_DSYNC)
@ -1276,9 +1273,14 @@ MDBX_INTERNAL int osal_openfile(const enum osal_openfile_purpose purpose, const
case MDBX_OPEN_DELETE: case MDBX_OPEN_DELETE:
flags = O_RDWR; flags = O_RDWR;
break; break;
case MDBX_OPEN_COPY_EXCL:
flags |= O_EXCL;
__fallthrough /* fall through */;
case MDBX_OPEN_COPY_OVERWRITE:
flags |= O_WRONLY;
} }
const bool direct_nocache_for_copy = env->ps >= globals.sys_pagesize && purpose == MDBX_OPEN_COPY; const bool direct_nocache_for_copy = env->ps >= globals.sys_pagesize && purpose >= MDBX_OPEN_COPY_EXCL;
if (direct_nocache_for_copy) { if (direct_nocache_for_copy) {
#if defined(O_DIRECT) #if defined(O_DIRECT)
flags |= O_DIRECT; flags |= O_DIRECT;


@ -193,7 +193,14 @@ typedef struct osal_mmap {
#elif defined(__ANDROID_API__) #elif defined(__ANDROID_API__)
#if __ANDROID_API__ < 24 #if __ANDROID_API__ < 24
/* https://android-developers.googleblog.com/2017/09/introducing-android-native-development.html
* https://android.googlesource.com/platform/bionic/+/master/docs/32-bit-abi.md */
#define MDBX_HAVE_PWRITEV 0 #define MDBX_HAVE_PWRITEV 0
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS != MDBX_WORDBITS
#error "_FILE_OFFSET_BITS != MDBX_WORDBITS and __ANDROID_API__ < 24" (_FILE_OFFSET_BITS != MDBX_WORDBITS)
#elif defined(__FILE_OFFSET_BITS) && __FILE_OFFSET_BITS != MDBX_WORDBITS
#error "__FILE_OFFSET_BITS != MDBX_WORDBITS and __ANDROID_API__ < 24" (__FILE_OFFSET_BITS != MDBX_WORDBITS)
#endif
#else #else
#define MDBX_HAVE_PWRITEV 1 #define MDBX_HAVE_PWRITEV 1
#endif #endif
@ -439,8 +446,9 @@ enum osal_openfile_purpose {
MDBX_OPEN_DXB_OVERLAPPED_DIRECT, MDBX_OPEN_DXB_OVERLAPPED_DIRECT,
#endif /* Windows */ #endif /* Windows */
MDBX_OPEN_LCK, MDBX_OPEN_LCK,
MDBX_OPEN_COPY, MDBX_OPEN_DELETE,
MDBX_OPEN_DELETE MDBX_OPEN_COPY_EXCL,
MDBX_OPEN_COPY_OVERWRITE,
}; };
MDBX_MAYBE_UNUSED static inline bool osal_isdirsep(pathchar_t c) { MDBX_MAYBE_UNUSED static inline bool osal_isdirsep(pathchar_t c) {


@ -443,8 +443,8 @@ static __always_inline pgr_t page_get_inline(const uint16_t ILL, const MDBX_curs
const size_t i = dpl_search(spiller, pgno); const size_t i = dpl_search(spiller, pgno);
tASSERT(txn, (intptr_t)i > 0); tASSERT(txn, (intptr_t)i > 0);
if (spiller->tw.dirtylist->items[i].pgno == pgno) { if (spiller->wr.dirtylist->items[i].pgno == pgno) {
r.page = spiller->tw.dirtylist->items[i].ptr; r.page = spiller->wr.dirtylist->items[i].ptr;
break; break;
} }
@ -457,6 +457,8 @@ static __always_inline pgr_t page_get_inline(const uint16_t ILL, const MDBX_curs
goto bailout; goto bailout;
} }
TRACE("dbi %zu, mc %p, page %u, %p", cursor_dbi(mc), __Wpedantic_format_voidptr(mc), pgno,
__Wpedantic_format_voidptr(r.page));
if (unlikely(mc->checking & z_pagecheck)) if (unlikely(mc->checking & z_pagecheck))
return check_page_complete(ILL, r.page, mc, front); return check_page_complete(ILL, r.page, mc, front);


@ -144,14 +144,14 @@ __cold pgr_t __must_check_result page_unspill(MDBX_txn *const txn, const page_t
} }
__hot int page_touch_modifable(MDBX_txn *txn, const page_t *const mp) { __hot int page_touch_modifable(MDBX_txn *txn, const page_t *const mp) {
tASSERT(txn, is_modifable(txn, mp) && txn->tw.dirtylist); tASSERT(txn, is_modifable(txn, mp) && txn->wr.dirtylist);
tASSERT(txn, !is_largepage(mp) && !is_subpage(mp)); tASSERT(txn, !is_largepage(mp) && !is_subpage(mp));
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
const size_t n = dpl_search(txn, mp->pgno); const size_t n = dpl_search(txn, mp->pgno);
if (MDBX_AVOID_MSYNC && unlikely(txn->tw.dirtylist->items[n].pgno != mp->pgno)) { if (MDBX_AVOID_MSYNC && unlikely(txn->wr.dirtylist->items[n].pgno != mp->pgno)) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP)); tASSERT(txn, (txn->flags & MDBX_WRITEMAP));
tASSERT(txn, n > 0 && n <= txn->tw.dirtylist->length + 1); tASSERT(txn, n > 0 && n <= txn->wr.dirtylist->length + 1);
VERBOSE("unspill page %" PRIaPGNO, mp->pgno); VERBOSE("unspill page %" PRIaPGNO, mp->pgno);
#if MDBX_ENABLE_PGOP_STAT #if MDBX_ENABLE_PGOP_STAT
txn->env->lck->pgops.unspill.weak += 1; txn->env->lck->pgops.unspill.weak += 1;
@ -159,11 +159,11 @@ __hot int page_touch_modifable(MDBX_txn *txn, const page_t *const mp) {
return page_dirty(txn, (page_t *)mp, 1); return page_dirty(txn, (page_t *)mp, 1);
} }
tASSERT(txn, n > 0 && n <= txn->tw.dirtylist->length); tASSERT(txn, n > 0 && n <= txn->wr.dirtylist->length);
tASSERT(txn, txn->tw.dirtylist->items[n].pgno == mp->pgno && txn->tw.dirtylist->items[n].ptr == mp); tASSERT(txn, txn->wr.dirtylist->items[n].pgno == mp->pgno && txn->wr.dirtylist->items[n].ptr == mp);
if (!MDBX_AVOID_MSYNC || (txn->flags & MDBX_WRITEMAP) == 0) { if (!MDBX_AVOID_MSYNC || (txn->flags & MDBX_WRITEMAP) == 0) {
size_t *const ptr = ptr_disp(txn->tw.dirtylist->items[n].ptr, -(ptrdiff_t)sizeof(size_t)); size_t *const ptr = ptr_disp(txn->wr.dirtylist->items[n].ptr, -(ptrdiff_t)sizeof(size_t));
*ptr = txn->tw.dirtylru; *ptr = txn->wr.dirtylru;
} }
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
@ -179,19 +179,22 @@ __hot int page_touch_unmodifable(MDBX_txn *txn, MDBX_cursor *mc, const page_t *c
page_t *np; page_t *np;
if (is_frozen(txn, mp)) { if (is_frozen(txn, mp)) {
/* CoW the page */ /* CoW the page */
rc = pnl_need(&txn->tw.retired_pages, 1);
if (unlikely(rc != MDBX_SUCCESS))
goto fail;
const pgr_t par = gc_alloc_single(mc); const pgr_t par = gc_alloc_single(mc);
rc = par.err; rc = par.err;
np = par.page; np = par.page;
if (unlikely(rc != MDBX_SUCCESS)) {
if (likely(mc->dbi_state != txn->dbi_state) || (rc != MDBX_MAP_FULL && rc != MDBX_BACKLOG_DEPLETED))
goto fail;
return rc;
}
rc = pnl_append(&txn->wr.retired_pages, mp->pgno);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto fail; goto fail;
const pgno_t pgno = np->pgno; const pgno_t pgno = np->pgno;
DEBUG("touched db %d page %" PRIaPGNO " -> %" PRIaPGNO, cursor_dbi_dbg(mc), mp->pgno, pgno); DEBUG("touched db %d page %" PRIaPGNO " -> %" PRIaPGNO, cursor_dbi_dbg(mc), mp->pgno, pgno);
tASSERT(txn, mp->pgno != pgno); tASSERT(txn, mp->pgno != pgno);
pnl_append_prereserved(txn->tw.retired_pages, mp->pgno);
/* Update the parent page, if any, to point to the new page */ /* Update the parent page, if any, to point to the new page */
if (likely(mc->top)) { if (likely(mc->top)) {
page_t *parent = mc->pg[mc->top - 1]; page_t *parent = mc->pg[mc->top - 1];
@ -227,7 +230,7 @@ __hot int page_touch_unmodifable(MDBX_txn *txn, MDBX_cursor *mc, const page_t *c
} }
DEBUG("clone db %d page %" PRIaPGNO, cursor_dbi_dbg(mc), mp->pgno); DEBUG("clone db %d page %" PRIaPGNO, cursor_dbi_dbg(mc), mp->pgno);
tASSERT(txn, txn->tw.dirtylist->length <= PAGELIST_LIMIT + MDBX_PNL_GRANULATE); tASSERT(txn, txn->wr.dirtylist->length <= PAGELIST_LIMIT + MDBX_PNL_GRANULATE);
/* No - copy it */ /* No - copy it */
np = page_shadow_alloc(txn, 1); np = page_shadow_alloc(txn, 1);
if (unlikely(!np)) { if (unlikely(!np)) {
@ -369,7 +372,7 @@ static inline bool suitable4loose(const MDBX_txn *txn, pgno_t pgno) {
* страница не примыкает к какой-либо из уже находящийся в reclaimed. * страница не примыкает к какой-либо из уже находящийся в reclaimed.
* 2) стоит подумать над тем, чтобы при большом loose-списке отбрасывать * 2) стоит подумать над тем, чтобы при большом loose-списке отбрасывать
половину в reclaimed. */ половину в reclaimed. */
return txn->tw.loose_count < txn->env->options.dp_loose_limit && return txn->wr.loose_count < txn->env->options.dp_loose_limit &&
(!MDBX_ENABLE_REFUND || (!MDBX_ENABLE_REFUND ||
/* skip pages near to the end in favor of compactification */ /* skip pages near to the end in favor of compactification */
txn->geo.first_unallocated > pgno + txn->env->options.dp_loose_limit || txn->geo.first_unallocated > pgno + txn->env->options.dp_loose_limit ||
@ -417,14 +420,14 @@ int page_retire_ex(MDBX_cursor *mc, const pgno_t pgno, page_t *mp /* maybe null
status = frozen; status = frozen;
if (ASSERT_ENABLED()) { if (ASSERT_ENABLED()) {
for (MDBX_txn *scan = txn; scan; scan = scan->parent) { for (MDBX_txn *scan = txn; scan; scan = scan->parent) {
tASSERT(txn, !txn->tw.spilled.list || !spill_search(scan, pgno)); tASSERT(txn, !txn->wr.spilled.list || !spill_search(scan, pgno));
tASSERT(txn, !scan->tw.dirtylist || !debug_dpl_find(scan, pgno)); tASSERT(txn, !scan->wr.dirtylist || !debug_dpl_find(scan, pgno));
} }
} }
goto status_done; goto status_done;
} else if (pageflags && txn->tw.dirtylist) { } else if (pageflags && txn->wr.dirtylist) {
if ((di = dpl_exist(txn, pgno)) != 0) { if ((di = dpl_exist(txn, pgno)) != 0) {
mp = txn->tw.dirtylist->items[di].ptr; mp = txn->wr.dirtylist->items[di].ptr;
tASSERT(txn, is_modifable(txn, mp)); tASSERT(txn, is_modifable(txn, mp));
status = modifable; status = modifable;
goto status_done; goto status_done;
@ -461,16 +464,16 @@ int page_retire_ex(MDBX_cursor *mc, const pgno_t pgno, page_t *mp /* maybe null
tASSERT(txn, !is_spilled(txn, mp)); tASSERT(txn, !is_spilled(txn, mp));
tASSERT(txn, !is_shadowed(txn, mp)); tASSERT(txn, !is_shadowed(txn, mp));
tASSERT(txn, !debug_dpl_find(txn, pgno)); tASSERT(txn, !debug_dpl_find(txn, pgno));
tASSERT(txn, !txn->tw.spilled.list || !spill_search(txn, pgno)); tASSERT(txn, !txn->wr.spilled.list || !spill_search(txn, pgno));
} else if (is_modifable(txn, mp)) { } else if (is_modifable(txn, mp)) {
status = modifable; status = modifable;
if (txn->tw.dirtylist) if (txn->wr.dirtylist)
di = dpl_exist(txn, pgno); di = dpl_exist(txn, pgno);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) || !is_spilled(txn, mp)); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) || !is_spilled(txn, mp));
tASSERT(txn, !txn->tw.spilled.list || !spill_search(txn, pgno)); tASSERT(txn, !txn->wr.spilled.list || !spill_search(txn, pgno));
} else if (is_shadowed(txn, mp)) { } else if (is_shadowed(txn, mp)) {
status = shadowed; status = shadowed;
tASSERT(txn, !txn->tw.spilled.list || !spill_search(txn, pgno)); tASSERT(txn, !txn->wr.spilled.list || !spill_search(txn, pgno));
tASSERT(txn, !debug_dpl_find(txn, pgno)); tASSERT(txn, !debug_dpl_find(txn, pgno));
} else { } else {
tASSERT(txn, is_spilled(txn, mp)); tASSERT(txn, is_spilled(txn, mp));
@ -504,7 +507,7 @@ status_done:
if (status == frozen) { if (status == frozen) {
retire: retire:
DEBUG("retire %zu page %" PRIaPGNO, npages, pgno); DEBUG("retire %zu page %" PRIaPGNO, npages, pgno);
rc = pnl_append_span(&txn->tw.retired_pages, pgno, npages); rc = pnl_append_span(&txn->wr.retired_pages, pgno, npages);
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
return rc; return rc;
} }
@ -560,17 +563,17 @@ status_done:
if (status == modifable) { if (status == modifable) {
/* Dirty page from this transaction */ /* Dirty page from this transaction */
/* If suitable we can reuse it through loose list */ /* If suitable we can reuse it through loose list */
if (likely(npages == 1 && suitable4loose(txn, pgno)) && (di || !txn->tw.dirtylist)) { if (likely(npages == 1 && suitable4loose(txn, pgno)) && (di || !txn->wr.dirtylist)) {
DEBUG("loosen dirty page %" PRIaPGNO, pgno); DEBUG("loosen dirty page %" PRIaPGNO, pgno);
if (MDBX_DEBUG != 0 || unlikely(txn->env->flags & MDBX_PAGEPERTURB)) if (MDBX_DEBUG != 0 || unlikely(txn->env->flags & MDBX_PAGEPERTURB))
memset(page_data(mp), -1, txn->env->ps - PAGEHDRSZ); memset(page_data(mp), -1, txn->env->ps - PAGEHDRSZ);
mp->txnid = INVALID_TXNID; mp->txnid = INVALID_TXNID;
mp->flags = P_LOOSE; mp->flags = P_LOOSE;
page_next(mp) = txn->tw.loose_pages; page_next(mp) = txn->wr.loose_pages;
txn->tw.loose_pages = mp; txn->wr.loose_pages = mp;
txn->tw.loose_count++; txn->wr.loose_count++;
#if MDBX_ENABLE_REFUND #if MDBX_ENABLE_REFUND
txn->tw.loose_refund_wl = (pgno + 2 > txn->tw.loose_refund_wl) ? pgno + 2 : txn->tw.loose_refund_wl; txn->wr.loose_refund_wl = (pgno + 2 > txn->wr.loose_refund_wl) ? pgno + 2 : txn->wr.loose_refund_wl;
#endif /* MDBX_ENABLE_REFUND */ #endif /* MDBX_ENABLE_REFUND */
VALGRIND_MAKE_MEM_NOACCESS(page_data(mp), txn->env->ps - PAGEHDRSZ); VALGRIND_MAKE_MEM_NOACCESS(page_data(mp), txn->env->ps - PAGEHDRSZ);
MDBX_ASAN_POISON_MEMORY_REGION(page_data(mp), txn->env->ps - PAGEHDRSZ); MDBX_ASAN_POISON_MEMORY_REGION(page_data(mp), txn->env->ps - PAGEHDRSZ);
@ -608,8 +611,8 @@ status_done:
reclaim: reclaim:
DEBUG("reclaim %zu %s page %" PRIaPGNO, npages, "dirty", pgno); DEBUG("reclaim %zu %s page %" PRIaPGNO, npages, "dirty", pgno);
rc = pnl_insert_span(&txn->tw.repnl, pgno, npages); rc = pnl_insert_span(&txn->wr.repnl, pgno, npages);
tASSERT(txn, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND)); tASSERT(txn, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - MDBX_ENABLE_REFUND));
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
return rc; return rc;
} }
@ -660,10 +663,10 @@ status_done:
__hot int __must_check_result page_dirty(MDBX_txn *txn, page_t *mp, size_t npages) { __hot int __must_check_result page_dirty(MDBX_txn *txn, page_t *mp, size_t npages) {
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
mp->txnid = txn->front_txnid; mp->txnid = txn->front_txnid;
if (!txn->tw.dirtylist) { if (!txn->wr.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
txn->tw.writemap_dirty_npages += npages; txn->wr.writemap_dirty_npages += npages;
tASSERT(txn, txn->tw.spilled.list == nullptr); tASSERT(txn, txn->wr.spilled.list == nullptr);
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
@ -671,29 +674,29 @@ __hot int __must_check_result page_dirty(MDBX_txn *txn, page_t *mp, size_t npage
#if xMDBX_DEBUG_SPILLING == 2 #if xMDBX_DEBUG_SPILLING == 2
txn->env->debug_dirtied_act += 1; txn->env->debug_dirtied_act += 1;
ENSURE(txn->env, txn->env->debug_dirtied_act < txn->env->debug_dirtied_est); ENSURE(txn->env, txn->env->debug_dirtied_act < txn->env->debug_dirtied_est);
ENSURE(txn->env, txn->tw.dirtyroom + txn->tw.loose_count > 0); ENSURE(txn->env, txn->wr.dirtyroom + txn->wr.loose_count > 0);
#endif /* xMDBX_DEBUG_SPILLING == 2 */ #endif /* xMDBX_DEBUG_SPILLING == 2 */
int rc; int rc;
if (unlikely(txn->tw.dirtyroom == 0)) { if (unlikely(txn->wr.dirtyroom == 0)) {
if (txn->tw.loose_count) { if (txn->wr.loose_count) {
page_t *lp = txn->tw.loose_pages; page_t *lp = txn->wr.loose_pages;
DEBUG("purge-and-reclaim loose page %" PRIaPGNO, lp->pgno); DEBUG("purge-and-reclaim loose page %" PRIaPGNO, lp->pgno);
rc = pnl_insert_span(&txn->tw.repnl, lp->pgno, 1); rc = pnl_insert_span(&txn->wr.repnl, lp->pgno, 1);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto bailout; goto bailout;
size_t di = dpl_search(txn, lp->pgno); size_t di = dpl_search(txn, lp->pgno);
tASSERT(txn, txn->tw.dirtylist->items[di].ptr == lp); tASSERT(txn, txn->wr.dirtylist->items[di].ptr == lp);
dpl_remove(txn, di); dpl_remove(txn, di);
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *)); MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *));
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *)); VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
txn->tw.loose_pages = page_next(lp); txn->wr.loose_pages = page_next(lp);
txn->tw.loose_count--; txn->wr.loose_count--;
txn->tw.dirtyroom++; txn->wr.dirtyroom++;
if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP)) if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP))
page_shadow_release(txn->env, lp, 1); page_shadow_release(txn->env, lp, 1);
} else { } else {
ERROR("Dirtyroom is depleted, DPL length %zu", txn->tw.dirtylist->length); ERROR("Dirtyroom is depleted, DPL length %zu", txn->wr.dirtylist->length);
if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP)) if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP))
page_shadow_release(txn->env, mp, npages); page_shadow_release(txn->env, mp, npages);
return MDBX_TXN_FULL; return MDBX_TXN_FULL;
@ -706,7 +709,7 @@ __hot int __must_check_result page_dirty(MDBX_txn *txn, page_t *mp, size_t npage
txn->flags |= MDBX_TXN_ERROR; txn->flags |= MDBX_TXN_ERROR;
return rc; return rc;
} }
txn->tw.dirtyroom--; txn->wr.dirtyroom--;
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
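
On the caller side, the MDBX_TXN_FULL path above (dirty-page room exhausted and no loose page to recycle) is typically handled by splitting the work into smaller write transactions. A hedged sketch with illustrative names, using only the public mdbx_put/mdbx_txn_commit/mdbx_txn_begin API:

/* Hypothetical batching helper: commits early when the write txn runs out of dirty-page room. */
static int put_with_batch_split(MDBX_env *env, MDBX_txn **txn, MDBX_dbi dbi, const MDBX_val *key, MDBX_val *val) {
  int rc = mdbx_put(*txn, dbi, key, val, MDBX_UPSERT);
  if (rc != MDBX_TXN_FULL)
    return rc;
  rc = mdbx_txn_commit(*txn); /* flush the current batch */
  if (rc != MDBX_SUCCESS)
    return rc;
  rc = mdbx_txn_begin(env, NULL, MDBX_TXN_READWRITE, txn);
  return (rc == MDBX_SUCCESS) ? mdbx_put(*txn, dbi, key, val, MDBX_UPSERT) : rc;
}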


@ -88,7 +88,7 @@ static inline int page_touch(MDBX_cursor *mc) {
} }
if (is_modifable(txn, mp)) { if (is_modifable(txn, mp)) {
if (!txn->tw.dirtylist) { if (!txn->wr.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) && !MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) && !MDBX_AVOID_MSYNC);
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
@ -114,14 +114,14 @@ static inline void page_wash(MDBX_txn *txn, size_t di, page_t *const mp, const s
mp->txnid = INVALID_TXNID; mp->txnid = INVALID_TXNID;
mp->flags = P_BAD; mp->flags = P_BAD;
if (txn->tw.dirtylist) { if (txn->wr.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
tASSERT(txn, MDBX_AVOID_MSYNC || (di && txn->tw.dirtylist->items[di].ptr == mp)); tASSERT(txn, MDBX_AVOID_MSYNC || (di && txn->wr.dirtylist->items[di].ptr == mp));
if (!MDBX_AVOID_MSYNC || di) { if (!MDBX_AVOID_MSYNC || di) {
dpl_remove_ex(txn, di, npages); dpl_remove_ex(txn, di, npages);
txn->tw.dirtyroom++; txn->wr.dirtyroom++;
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length == tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit)); (txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP)) { if (!MDBX_AVOID_MSYNC || !(txn->flags & MDBX_WRITEMAP)) {
page_shadow_release(txn->env, mp, npages); page_shadow_release(txn->env, mp, npages);
return; return;
@ -129,7 +129,7 @@ static inline void page_wash(MDBX_txn *txn, size_t di, page_t *const mp, const s
} }
} else { } else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) && !MDBX_AVOID_MSYNC && !di); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) && !MDBX_AVOID_MSYNC && !di);
txn->tw.writemap_dirty_npages -= (txn->tw.writemap_dirty_npages > npages) ? npages : txn->tw.writemap_dirty_npages; txn->wr.writemap_dirty_npages -= (txn->wr.writemap_dirty_npages > npages) ? npages : txn->wr.writemap_dirty_npages;
} }
VALGRIND_MAKE_MEM_UNDEFINED(mp, PAGEHDRSZ); VALGRIND_MAKE_MEM_UNDEFINED(mp, PAGEHDRSZ);
VALGRIND_MAKE_MEM_NOACCESS(page_data(mp), pgno2bytes(txn->env, npages) - PAGEHDRSZ); VALGRIND_MAKE_MEM_NOACCESS(page_data(mp), pgno2bytes(txn->env, npages) - PAGEHDRSZ);


@ -23,12 +23,19 @@ void pnl_free(pnl_t pnl) {
osal_free(pnl - 1); osal_free(pnl - 1);
} }
pnl_t pnl_clone(const pnl_t src) {
pnl_t pl = pnl_alloc(pnl_alloclen(src));
if (likely(pl))
memcpy(pl, src, MDBX_PNL_SIZEOF(src));
return pl;
}
void pnl_shrink(pnl_t __restrict *__restrict ppnl) { void pnl_shrink(pnl_t __restrict *__restrict ppnl) {
assert(pnl_bytes2size(pnl_size2bytes(MDBX_PNL_INITIAL)) >= MDBX_PNL_INITIAL && assert(pnl_bytes2size(pnl_size2bytes(MDBX_PNL_INITIAL)) >= MDBX_PNL_INITIAL &&
pnl_bytes2size(pnl_size2bytes(MDBX_PNL_INITIAL)) < MDBX_PNL_INITIAL * 3 / 2); pnl_bytes2size(pnl_size2bytes(MDBX_PNL_INITIAL)) < MDBX_PNL_INITIAL * 3 / 2);
assert(MDBX_PNL_GETSIZE(*ppnl) <= PAGELIST_LIMIT && MDBX_PNL_ALLOCLEN(*ppnl) >= MDBX_PNL_GETSIZE(*ppnl)); assert(pnl_size(*ppnl) <= PAGELIST_LIMIT && pnl_alloclen(*ppnl) >= pnl_size(*ppnl));
MDBX_PNL_SETSIZE(*ppnl, 0); pnl_setsize(*ppnl, 0);
if (unlikely(MDBX_PNL_ALLOCLEN(*ppnl) > if (unlikely(pnl_alloclen(*ppnl) >
MDBX_PNL_INITIAL * (MDBX_PNL_PREALLOC_FOR_RADIXSORT ? 8 : 4) - MDBX_CACHELINE_SIZE / sizeof(pgno_t))) { MDBX_PNL_INITIAL * (MDBX_PNL_PREALLOC_FOR_RADIXSORT ? 8 : 4) - MDBX_CACHELINE_SIZE / sizeof(pgno_t))) {
size_t bytes = pnl_size2bytes(MDBX_PNL_INITIAL * 2); size_t bytes = pnl_size2bytes(MDBX_PNL_INITIAL * 2);
pnl_t pnl = osal_realloc(*ppnl - 1, bytes); pnl_t pnl = osal_realloc(*ppnl - 1, bytes);
@ -43,9 +50,9 @@ void pnl_shrink(pnl_t __restrict *__restrict ppnl) {
} }
int pnl_reserve(pnl_t __restrict *__restrict ppnl, const size_t wanna) { int pnl_reserve(pnl_t __restrict *__restrict ppnl, const size_t wanna) {
const size_t allocated = MDBX_PNL_ALLOCLEN(*ppnl); const size_t allocated = pnl_alloclen(*ppnl);
assert(MDBX_PNL_GETSIZE(*ppnl) <= PAGELIST_LIMIT && MDBX_PNL_ALLOCLEN(*ppnl) >= MDBX_PNL_GETSIZE(*ppnl)); assert(pnl_size(*ppnl) <= PAGELIST_LIMIT && pnl_alloclen(*ppnl) >= pnl_size(*ppnl));
if (likely(allocated >= wanna)) if (unlikely(allocated >= wanna))
return MDBX_SUCCESS; return MDBX_SUCCESS;
if (unlikely(wanna > /* paranoia */ PAGELIST_LIMIT)) { if (unlikely(wanna > /* paranoia */ PAGELIST_LIMIT)) {
@ -82,15 +89,15 @@ static __always_inline int __must_check_result pnl_append_stepped(unsigned step,
} }
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
size_t w = MDBX_PNL_GETSIZE(pnl); size_t w = pnl_size(pnl);
do { do {
pnl[++w] = pgno; pnl[++w] = pgno;
pgno += step; pgno += step;
} while (--n); } while (--n);
MDBX_PNL_SETSIZE(pnl, w); pnl_setsize(pnl, w);
#else #else
size_t w = MDBX_PNL_GETSIZE(pnl) + n; size_t w = pnl_size(pnl) + n;
MDBX_PNL_SETSIZE(pnl, w); pnl_setsize(pnl, w);
do { do {
pnl[w--] = pgno; pnl[w--] = pgno;
pgno += step; pgno += step;
@ -114,8 +121,8 @@ __hot int __must_check_result pnl_insert_span(__restrict pnl_t *ppnl, pgno_t pgn
return rc; return rc;
const pnl_t pnl = *ppnl; const pnl_t pnl = *ppnl;
size_t r = MDBX_PNL_GETSIZE(pnl), w = r + n; size_t r = pnl_size(pnl), w = r + n;
MDBX_PNL_SETSIZE(pnl, w); pnl_setsize(pnl, w);
while (r && MDBX_PNL_DISORDERED(pnl[r], pgno)) while (r && MDBX_PNL_DISORDERED(pnl[r], pgno))
pnl[w--] = pnl[r--]; pnl[w--] = pnl[r--];
@ -127,15 +134,15 @@ __hot int __must_check_result pnl_insert_span(__restrict pnl_t *ppnl, pgno_t pgn
__hot __noinline bool pnl_check(const const_pnl_t pnl, const size_t limit) { __hot __noinline bool pnl_check(const const_pnl_t pnl, const size_t limit) {
assert(limit >= MIN_PAGENO - MDBX_ENABLE_REFUND); assert(limit >= MIN_PAGENO - MDBX_ENABLE_REFUND);
if (likely(MDBX_PNL_GETSIZE(pnl))) { if (likely(pnl_size(pnl))) {
if (unlikely(MDBX_PNL_GETSIZE(pnl) > PAGELIST_LIMIT)) if (unlikely(pnl_size(pnl) > PAGELIST_LIMIT))
return false; return false;
if (unlikely(MDBX_PNL_LEAST(pnl) < MIN_PAGENO)) if (unlikely(MDBX_PNL_LEAST(pnl) < MIN_PAGENO))
return false; return false;
if (unlikely(MDBX_PNL_MOST(pnl) >= limit)) if (unlikely(MDBX_PNL_MOST(pnl) >= limit))
return false; return false;
if ((!MDBX_DISABLE_VALIDATION || AUDIT_ENABLED()) && likely(MDBX_PNL_GETSIZE(pnl) > 1)) { if ((!MDBX_DISABLE_VALIDATION || AUDIT_ENABLED()) && likely(pnl_size(pnl) > 1)) {
const pgno_t *scan = MDBX_PNL_BEGIN(pnl); const pgno_t *scan = MDBX_PNL_BEGIN(pnl);
const pgno_t *const end = MDBX_PNL_END(pnl); const pgno_t *const end = MDBX_PNL_END(pnl);
pgno_t prev = *scan++; pgno_t prev = *scan++;
@ -182,10 +189,10 @@ static __always_inline void pnl_merge_inner(pgno_t *__restrict dst, const pgno_t
__hot size_t pnl_merge(pnl_t dst, const pnl_t src) { __hot size_t pnl_merge(pnl_t dst, const pnl_t src) {
assert(pnl_check_allocated(dst, MAX_PAGENO + 1)); assert(pnl_check_allocated(dst, MAX_PAGENO + 1));
assert(pnl_check(src, MAX_PAGENO + 1)); assert(pnl_check(src, MAX_PAGENO + 1));
const size_t src_len = MDBX_PNL_GETSIZE(src); const size_t src_len = pnl_size(src);
const size_t dst_len = MDBX_PNL_GETSIZE(dst); const size_t dst_len = pnl_size(dst);
size_t total = dst_len; size_t total = dst_len;
assert(MDBX_PNL_ALLOCLEN(dst) >= total); assert(pnl_alloclen(dst) >= total);
if (likely(src_len > 0)) { if (likely(src_len > 0)) {
total += src_len; total += src_len;
if (!MDBX_DEBUG && total < (MDBX_HAVE_CMOV ? 21 : 12)) if (!MDBX_DEBUG && total < (MDBX_HAVE_CMOV ? 21 : 12))
@ -200,7 +207,7 @@ __hot size_t pnl_merge(pnl_t dst, const pnl_t src) {
dst[0] = /* the detent */ (MDBX_PNL_ASCENDING ? 0 : P_INVALID); dst[0] = /* the detent */ (MDBX_PNL_ASCENDING ? 0 : P_INVALID);
pnl_merge_inner(dst + total, dst + dst_len, src + src_len, src); pnl_merge_inner(dst + total, dst + dst_len, src + src_len, src);
} }
MDBX_PNL_SETSIZE(dst, total); pnl_setsize(dst, total);
} }
assert(pnl_check_allocated(dst, MAX_PAGENO + 1)); assert(pnl_check_allocated(dst, MAX_PAGENO + 1));
return total; return total;
@ -216,8 +223,8 @@ RADIXSORT_IMPL(pgno, pgno_t, MDBX_PNL_EXTRACT_KEY, MDBX_PNL_PREALLOC_FOR_RADIXSO
SORT_IMPL(pgno_sort, false, pgno_t, MDBX_PNL_ORDERED) SORT_IMPL(pgno_sort, false, pgno_t, MDBX_PNL_ORDERED)
__hot __noinline void pnl_sort_nochk(pnl_t pnl) { __hot __noinline void pnl_sort_nochk(pnl_t pnl) {
if (likely(MDBX_PNL_GETSIZE(pnl) < MDBX_RADIXSORT_THRESHOLD) || if (likely(pnl_size(pnl) < MDBX_RADIXSORT_THRESHOLD) ||
unlikely(!pgno_radixsort(&MDBX_PNL_FIRST(pnl), MDBX_PNL_GETSIZE(pnl)))) unlikely(!pgno_radixsort(&MDBX_PNL_FIRST(pnl), pnl_size(pnl))))
pgno_sort(MDBX_PNL_BEGIN(pnl), MDBX_PNL_END(pnl)); pgno_sort(MDBX_PNL_BEGIN(pnl), MDBX_PNL_END(pnl));
} }
@ -225,8 +232,8 @@ SEARCH_IMPL(pgno_bsearch, pgno_t, pgno_t, MDBX_PNL_ORDERED)
__hot __noinline size_t pnl_search_nochk(const pnl_t pnl, pgno_t pgno) { __hot __noinline size_t pnl_search_nochk(const pnl_t pnl, pgno_t pgno) {
const pgno_t *begin = MDBX_PNL_BEGIN(pnl); const pgno_t *begin = MDBX_PNL_BEGIN(pnl);
const pgno_t *it = pgno_bsearch(begin, MDBX_PNL_GETSIZE(pnl), pgno); const pgno_t *it = pgno_bsearch(begin, pnl_size(pnl), pgno);
const pgno_t *end = begin + MDBX_PNL_GETSIZE(pnl); const pgno_t *end = begin + pnl_size(pnl);
assert(it >= begin && it <= end); assert(it >= begin && it <= end);
if (it != begin) if (it != begin)
assert(MDBX_PNL_ORDERED(it[-1], pgno)); assert(MDBX_PNL_ORDERED(it[-1], pgno));
@ -234,3 +241,18 @@ __hot __noinline size_t pnl_search_nochk(const pnl_t pnl, pgno_t pgno) {
assert(!MDBX_PNL_ORDERED(it[0], pgno)); assert(!MDBX_PNL_ORDERED(it[0], pgno));
return it - begin + 1; return it - begin + 1;
} }
size_t pnl_maxspan(const pnl_t pnl) {
size_t len = pnl_size(pnl);
if (len > 1) {
size_t span = 1, left = len - span;
const pgno_t *scan = MDBX_PNL_BEGIN(pnl);
do {
const bool contiguous = MDBX_PNL_CONTIGUOUS(*scan, scan[span], span);
span += contiguous;
scan += 1 - contiguous;
} while (--left);
len = span;
}
return len;
}
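
For readers, the new pnl_maxspan() slides a window over the sorted list and, for a strictly ascending list of unique page numbers, yields the length of the longest run of consecutive pages. The same idea over a plain array, as a standalone sketch with illustrative names:

#include <stddef.h>
#include <stdint.h>

/* longest run of consecutive values in a strictly ascending array (pgno_t is a 32-bit page number) */
static size_t max_contiguous_span(const uint32_t *pages, size_t len) {
  size_t best = (len != 0), span = 1;
  for (size_t i = 1; i < len; ++i) {
    span = (pages[i] == pages[i - 1] + 1) ? span + 1 : 1;
    if (span > best)
      best = span;
  }
  return best;
}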


@ -28,48 +28,36 @@ typedef const pgno_t *const_pnl_t;
#define MDBX_PNL_GRANULATE (1 << MDBX_PNL_GRANULATE_LOG2) #define MDBX_PNL_GRANULATE (1 << MDBX_PNL_GRANULATE_LOG2)
#define MDBX_PNL_INITIAL (MDBX_PNL_GRANULATE - 2 - MDBX_ASSUME_MALLOC_OVERHEAD / sizeof(pgno_t)) #define MDBX_PNL_INITIAL (MDBX_PNL_GRANULATE - 2 - MDBX_ASSUME_MALLOC_OVERHEAD / sizeof(pgno_t))
#define MDBX_PNL_ALLOCLEN(pl) ((pl)[-1]) MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION static inline size_t pnl_alloclen(const_pnl_t pnl) { return pnl[-1]; }
#define MDBX_PNL_GETSIZE(pl) ((size_t)((pl)[0]))
#define MDBX_PNL_SETSIZE(pl, size) \ MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION static inline size_t pnl_size(const_pnl_t pnl) { return pnl[0]; }
do { \
const size_t __size = size; \ MDBX_MAYBE_UNUSED static inline void pnl_setsize(pnl_t pnl, size_t len) {
assert(__size < INT_MAX); \ assert(len < INT_MAX);
(pl)[0] = (pgno_t)__size; \ pnl[0] = (pgno_t)len;
} while (0) }
#define MDBX_PNL_FIRST(pl) ((pl)[1]) #define MDBX_PNL_FIRST(pl) ((pl)[1])
#define MDBX_PNL_LAST(pl) ((pl)[MDBX_PNL_GETSIZE(pl)]) #define MDBX_PNL_LAST(pl) ((pl)[pnl_size(pl)])
#define MDBX_PNL_BEGIN(pl) (&(pl)[1]) #define MDBX_PNL_BEGIN(pl) (&(pl)[1])
#define MDBX_PNL_END(pl) (&(pl)[MDBX_PNL_GETSIZE(pl) + 1]) #define MDBX_PNL_END(pl) (&(pl)[pnl_size(pl) + 1])
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
#define MDBX_PNL_EDGE(pl) ((pl) + 1) #define MDBX_PNL_EDGE(pl) ((pl) + 1)
#define MDBX_PNL_LEAST(pl) MDBX_PNL_FIRST(pl) #define MDBX_PNL_LEAST(pl) MDBX_PNL_FIRST(pl)
#define MDBX_PNL_MOST(pl) MDBX_PNL_LAST(pl) #define MDBX_PNL_MOST(pl) MDBX_PNL_LAST(pl)
#define MDBX_PNL_CONTIGUOUS(prev, next, span) (((next) - (prev)) == (span))
#else #else
#define MDBX_PNL_EDGE(pl) ((pl) + MDBX_PNL_GETSIZE(pl)) #define MDBX_PNL_EDGE(pl) ((pl) + pnl_size(pl))
#define MDBX_PNL_LEAST(pl) MDBX_PNL_LAST(pl) #define MDBX_PNL_LEAST(pl) MDBX_PNL_LAST(pl)
#define MDBX_PNL_MOST(pl) MDBX_PNL_FIRST(pl) #define MDBX_PNL_MOST(pl) MDBX_PNL_FIRST(pl)
#define MDBX_PNL_CONTIGUOUS(prev, next, span) (((prev) - (next)) == (span))
#endif #endif
#define MDBX_PNL_SIZEOF(pl) ((MDBX_PNL_GETSIZE(pl) + 1) * sizeof(pgno_t)) #define MDBX_PNL_SIZEOF(pl) ((pnl_size(pl) + 1) * sizeof(pgno_t))
#define MDBX_PNL_IS_EMPTY(pl) (MDBX_PNL_GETSIZE(pl) == 0) #define MDBX_PNL_IS_EMPTY(pl) (pnl_size(pl) == 0)
MDBX_MAYBE_UNUSED static inline size_t pnl_size2bytes(size_t size) { MDBX_NOTHROW_PURE_FUNCTION MDBX_MAYBE_UNUSED static inline pgno_t pnl_bytes2size(const size_t bytes) {
assert(size > 0 && size <= PAGELIST_LIMIT);
#if MDBX_PNL_PREALLOC_FOR_RADIXSORT
size += size;
#endif /* MDBX_PNL_PREALLOC_FOR_RADIXSORT */
STATIC_ASSERT(MDBX_ASSUME_MALLOC_OVERHEAD +
(PAGELIST_LIMIT * (MDBX_PNL_PREALLOC_FOR_RADIXSORT + 1) + MDBX_PNL_GRANULATE + 3) * sizeof(pgno_t) <
SIZE_MAX / 4 * 3);
size_t bytes =
ceil_powerof2(MDBX_ASSUME_MALLOC_OVERHEAD + sizeof(pgno_t) * (size + 3), MDBX_PNL_GRANULATE * sizeof(pgno_t)) -
MDBX_ASSUME_MALLOC_OVERHEAD;
return bytes;
}
MDBX_MAYBE_UNUSED static inline pgno_t pnl_bytes2size(const size_t bytes) {
size_t size = bytes / sizeof(pgno_t); size_t size = bytes / sizeof(pgno_t);
assert(size > 3 && size <= PAGELIST_LIMIT + /* alignment gap */ 65536); assert(size > 3 && size <= PAGELIST_LIMIT + /* alignment gap */ 65536);
size -= 3; size -= 3;
@ -79,29 +67,54 @@ MDBX_MAYBE_UNUSED static inline pgno_t pnl_bytes2size(const size_t bytes) {
return (pgno_t)size; return (pgno_t)size;
} }
MDBX_NOTHROW_PURE_FUNCTION MDBX_MAYBE_UNUSED static inline size_t pnl_size2bytes(size_t wanna_size) {
size_t size = wanna_size;
assert(size > 0 && size <= PAGELIST_LIMIT);
#if MDBX_PNL_PREALLOC_FOR_RADIXSORT
size += size;
#endif /* MDBX_PNL_PREALLOC_FOR_RADIXSORT */
STATIC_ASSERT(MDBX_ASSUME_MALLOC_OVERHEAD +
(PAGELIST_LIMIT * (MDBX_PNL_PREALLOC_FOR_RADIXSORT + 1) + MDBX_PNL_GRANULATE + 3) * sizeof(pgno_t) <
SIZE_MAX / 4 * 3);
size_t bytes =
ceil_powerof2(MDBX_ASSUME_MALLOC_OVERHEAD + sizeof(pgno_t) * (size + 3), MDBX_PNL_GRANULATE * sizeof(pgno_t)) -
MDBX_ASSUME_MALLOC_OVERHEAD;
assert(pnl_bytes2size(bytes) >= wanna_size);
return bytes;
}
MDBX_INTERNAL pnl_t pnl_alloc(size_t size); MDBX_INTERNAL pnl_t pnl_alloc(size_t size);
MDBX_INTERNAL void pnl_free(pnl_t pnl); MDBX_INTERNAL void pnl_free(pnl_t pnl);
MDBX_MAYBE_UNUSED MDBX_INTERNAL pnl_t pnl_clone(const pnl_t src);
MDBX_INTERNAL int pnl_reserve(pnl_t __restrict *__restrict ppnl, const size_t wanna); MDBX_INTERNAL int pnl_reserve(pnl_t __restrict *__restrict ppnl, const size_t wanna);
MDBX_MAYBE_UNUSED static inline int __must_check_result pnl_need(pnl_t __restrict *__restrict ppnl, size_t num) { MDBX_MAYBE_UNUSED static inline int __must_check_result pnl_need(pnl_t __restrict *__restrict ppnl, size_t num) {
assert(MDBX_PNL_GETSIZE(*ppnl) <= PAGELIST_LIMIT && MDBX_PNL_ALLOCLEN(*ppnl) >= MDBX_PNL_GETSIZE(*ppnl)); assert(pnl_size(*ppnl) <= PAGELIST_LIMIT && pnl_alloclen(*ppnl) >= pnl_size(*ppnl));
assert(num <= PAGELIST_LIMIT); assert(num <= PAGELIST_LIMIT);
const size_t wanna = MDBX_PNL_GETSIZE(*ppnl) + num; const size_t wanna = pnl_size(*ppnl) + num;
return likely(MDBX_PNL_ALLOCLEN(*ppnl) >= wanna) ? MDBX_SUCCESS : pnl_reserve(ppnl, wanna); return likely(pnl_alloclen(*ppnl) >= wanna) ? MDBX_SUCCESS : pnl_reserve(ppnl, wanna);
} }
MDBX_MAYBE_UNUSED static inline void pnl_append_prereserved(__restrict pnl_t pnl, pgno_t pgno) { MDBX_MAYBE_UNUSED static inline void pnl_append_prereserved(__restrict pnl_t pnl, pgno_t pgno) {
assert(MDBX_PNL_GETSIZE(pnl) < MDBX_PNL_ALLOCLEN(pnl)); assert(pnl_size(pnl) < pnl_alloclen(pnl));
if (AUDIT_ENABLED()) { if (AUDIT_ENABLED()) {
for (size_t i = MDBX_PNL_GETSIZE(pnl); i > 0; --i) for (size_t i = pnl_size(pnl); i > 0; --i)
assert(pgno != pnl[i]); assert(pgno != pnl[i]);
} }
*pnl += 1; *pnl += 1;
MDBX_PNL_LAST(pnl) = pgno; MDBX_PNL_LAST(pnl) = pgno;
} }
MDBX_MAYBE_UNUSED static inline int __must_check_result pnl_append(__restrict pnl_t *ppnl, pgno_t pgno) {
int rc = pnl_need(ppnl, 1);
if (likely(rc == MDBX_SUCCESS))
pnl_append_prereserved(*ppnl, pgno);
return rc;
}
MDBX_INTERNAL void pnl_shrink(pnl_t __restrict *__restrict ppnl); MDBX_INTERNAL void pnl_shrink(pnl_t __restrict *__restrict ppnl);
MDBX_INTERNAL int __must_check_result spill_append_span(__restrict pnl_t *ppnl, pgno_t pgno, size_t n); MDBX_INTERNAL int __must_check_result spill_append_span(__restrict pnl_t *ppnl, pgno_t pgno, size_t n);
@ -110,14 +123,14 @@ MDBX_INTERNAL int __must_check_result pnl_append_span(__restrict pnl_t *ppnl, pg
MDBX_INTERNAL int __must_check_result pnl_insert_span(__restrict pnl_t *ppnl, pgno_t pgno, size_t n); MDBX_INTERNAL int __must_check_result pnl_insert_span(__restrict pnl_t *ppnl, pgno_t pgno, size_t n);
MDBX_INTERNAL size_t pnl_search_nochk(const pnl_t pnl, pgno_t pgno); MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL size_t pnl_search_nochk(const pnl_t pnl, pgno_t pgno);
MDBX_INTERNAL void pnl_sort_nochk(pnl_t pnl); MDBX_INTERNAL void pnl_sort_nochk(pnl_t pnl);
MDBX_INTERNAL bool pnl_check(const const_pnl_t pnl, const size_t limit); MDBX_INTERNAL bool pnl_check(const const_pnl_t pnl, const size_t limit);
MDBX_MAYBE_UNUSED static inline bool pnl_check_allocated(const const_pnl_t pnl, const size_t limit) { MDBX_MAYBE_UNUSED static inline bool pnl_check_allocated(const const_pnl_t pnl, const size_t limit) {
return pnl == nullptr || (MDBX_PNL_ALLOCLEN(pnl) >= MDBX_PNL_GETSIZE(pnl) && pnl_check(pnl, limit)); return pnl == nullptr || (pnl_alloclen(pnl) >= pnl_size(pnl) && pnl_check(pnl, limit));
} }
MDBX_MAYBE_UNUSED static inline void pnl_sort(pnl_t pnl, size_t limit4check) { MDBX_MAYBE_UNUSED static inline void pnl_sort(pnl_t pnl, size_t limit4check) {
@ -126,7 +139,8 @@ MDBX_MAYBE_UNUSED static inline void pnl_sort(pnl_t pnl, size_t limit4check) {
(void)limit4check; (void)limit4check;
} }
MDBX_MAYBE_UNUSED static inline size_t pnl_search(const pnl_t pnl, pgno_t pgno, size_t limit) { MDBX_NOTHROW_PURE_FUNCTION MDBX_MAYBE_UNUSED static inline size_t pnl_search(const pnl_t pnl, pgno_t pgno,
size_t limit) {
assert(pnl_check_allocated(pnl, limit)); assert(pnl_check_allocated(pnl, limit));
if (MDBX_HAVE_CMOV) { if (MDBX_HAVE_CMOV) {
/* cmov-ускоренный бинарный поиск может читать (но не использовать) один /* cmov-ускоренный бинарный поиск может читать (но не использовать) один
@ -144,3 +158,5 @@ MDBX_MAYBE_UNUSED static inline size_t pnl_search(const pnl_t pnl, pgno_t pgno,
} }
MDBX_INTERNAL size_t pnl_merge(pnl_t dst, const pnl_t src); MDBX_INTERNAL size_t pnl_merge(pnl_t dst, const pnl_t src);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL size_t pnl_maxspan(const pnl_t pnl);


@ -122,6 +122,8 @@
#pragma warning(disable : 6235) /* <expression> is always a constant */ #pragma warning(disable : 6235) /* <expression> is always a constant */
#pragma warning(disable : 6237) /* <expression> is never evaluated and might \ #pragma warning(disable : 6237) /* <expression> is never evaluated and might \
have side effects */ have side effects */
#pragma warning(disable : 5286) /* implicit conversion from enum type 'type 1' to enum type 'type 2' */
#pragma warning(disable : 5287) /* operands are different enum types 'type 1' and 'type 2' */
#endif #endif
#pragma warning(disable : 4710) /* 'xyz': function not inlined */ #pragma warning(disable : 4710) /* 'xyz': function not inlined */
#pragma warning(disable : 4711) /* function 'xyz' selected for automatic \ #pragma warning(disable : 4711) /* function 'xyz' selected for automatic \
@ -433,11 +435,6 @@ __extern_C key_t ftok(const char *, int);
#if __ANDROID_API__ >= 21 #if __ANDROID_API__ >= 21
#include <sys/sendfile.h> #include <sys/sendfile.h>
#endif #endif
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS != MDBX_WORDBITS
#error "_FILE_OFFSET_BITS != MDBX_WORDBITS" (_FILE_OFFSET_BITS != MDBX_WORDBITS)
#elif defined(__FILE_OFFSET_BITS) && __FILE_OFFSET_BITS != MDBX_WORDBITS
#error "__FILE_OFFSET_BITS != MDBX_WORDBITS" (__FILE_OFFSET_BITS != MDBX_WORDBITS)
#endif
#endif /* Android */ #endif /* Android */
#if defined(HAVE_SYS_STAT_H) || __has_include(<sys/stat.h>) #if defined(HAVE_SYS_STAT_H) || __has_include(<sys/stat.h>)
@ -522,6 +519,14 @@ __extern_C key_t ftok(const char *, int);
#endif #endif
#endif /* __BYTE_ORDER__ || __ORDER_LITTLE_ENDIAN__ || __ORDER_BIG_ENDIAN__ */ #endif /* __BYTE_ORDER__ || __ORDER_LITTLE_ENDIAN__ || __ORDER_BIG_ENDIAN__ */
#if UINTPTR_MAX > 0xffffFFFFul || ULONG_MAX > 0xffffFFFFul || defined(_WIN64)
#define MDBX_WORDBITS 64
#define MDBX_WORDBITS_LN2 6
#else
#define MDBX_WORDBITS 32
#define MDBX_WORDBITS_LN2 5
#endif /* MDBX_WORDBITS */
/*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/
/* Availability of CMOV or equivalent */ /* Availability of CMOV or equivalent */
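
A small sanity sketch for the MDBX_WORDBITS detection added above (hypothetical, not part of the tree): on mainstream ILP32/LP64/LLP64 targets the detected width matches the pointer width, and the LN2 constant stays consistent with it.

#include <limits.h>
_Static_assert(MDBX_WORDBITS == CHAR_BIT * sizeof(void *), "word width matches pointer width");
_Static_assert((1u << MDBX_WORDBITS_LN2) == MDBX_WORDBITS, "MDBX_WORDBITS_LN2 is consistent");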


@ -15,9 +15,8 @@ MDBX_INTERNAL bsr_t mvcc_bind_slot(MDBX_env *env);
MDBX_MAYBE_UNUSED MDBX_INTERNAL pgno_t mvcc_largest_this(MDBX_env *env, pgno_t largest); MDBX_MAYBE_UNUSED MDBX_INTERNAL pgno_t mvcc_largest_this(MDBX_env *env, pgno_t largest);
MDBX_INTERNAL txnid_t mvcc_shapshot_oldest(MDBX_env *const env, const txnid_t steady); MDBX_INTERNAL txnid_t mvcc_shapshot_oldest(MDBX_env *const env, const txnid_t steady);
MDBX_INTERNAL pgno_t mvcc_snapshot_largest(const MDBX_env *env, pgno_t last_used_page); MDBX_INTERNAL pgno_t mvcc_snapshot_largest(const MDBX_env *env, pgno_t last_used_page);
MDBX_INTERNAL txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t straggler);
MDBX_INTERNAL int mvcc_cleanup_dead(MDBX_env *env, int rlocked, int *dead); MDBX_INTERNAL int mvcc_cleanup_dead(MDBX_env *env, int rlocked, int *dead);
MDBX_INTERNAL txnid_t mvcc_kick_laggards(MDBX_env *env, const txnid_t laggard); MDBX_INTERNAL bool mvcc_kick_laggards(MDBX_env *env, const txnid_t laggard);
/* dxb.c */ /* dxb.c */
MDBX_INTERNAL int dxb_setup(MDBX_env *env, const int lck_rc, const mdbx_mode_t mode_bits); MDBX_INTERNAL int dxb_setup(MDBX_env *env, const int lck_rc, const mdbx_mode_t mode_bits);
@ -39,37 +38,53 @@ static inline void dxb_sanitize_tail(MDBX_env *env, MDBX_txn *txn) {
#endif /* ENABLE_MEMCHECK || __SANITIZE_ADDRESS__ */ #endif /* ENABLE_MEMCHECK || __SANITIZE_ADDRESS__ */
/* txn.c */ /* txn.c */
MDBX_INTERNAL bool txn_refund(MDBX_txn *txn);
MDBX_INTERNAL txnid_t txn_snapshot_oldest(const MDBX_txn *const txn);
MDBX_INTERNAL int txn_abort(MDBX_txn *txn);
MDBX_INTERNAL int txn_renew(MDBX_txn *txn, unsigned flags);
MDBX_INTERNAL int txn_park(MDBX_txn *txn, bool autounpark);
MDBX_INTERNAL int txn_unpark(MDBX_txn *txn);
MDBX_INTERNAL int txn_check_badbits_parked(const MDBX_txn *txn, int bad_bits);
MDBX_INTERNAL void txn_done_cursors(MDBX_txn *txn, const bool merge);
#define TXN_END_NAMES \ #define TXN_END_NAMES \
{"committed", "empty-commit", "abort", "reset", "fail-begin", "fail-beginchild", "ousted", nullptr} {"committed", "pure-commit", "abort", "reset", "fail-begin", "fail-begin-nested", "ousted", nullptr}
enum { enum {
/* txn_end operation number, for logging */ /* txn_end operation number, for logging */
TXN_END_COMMITTED, TXN_END_COMMITTED /* 0 */,
TXN_END_PURE_COMMIT, TXN_END_PURE_COMMIT /* 1 */,
TXN_END_ABORT, TXN_END_ABORT /* 2 */,
TXN_END_RESET, TXN_END_RESET /* 3 */,
TXN_END_FAIL_BEGIN, TXN_END_FAIL_BEGIN /* 4 */,
TXN_END_FAIL_BEGINCHILD, TXN_END_FAIL_BEGIN_NESTED /* 5 */,
TXN_END_OUSTED, TXN_END_OUSTED /* 6 */,
TXN_END_OPMASK = 0x07 /* mask for txn_end() operation number */, TXN_END_OPMASK = 0x07 /* mask for txn_end() operation number */,
TXN_END_UPDATE = 0x10 /* update env state (DBIs) */, TXN_END_UPDATE = 0x10 /* update env state (DBIs) */,
TXN_END_FREE = 0x20 /* free txn unless it is env.basal_txn */, TXN_END_FREE = 0x20 /* free txn unless it is env.basal_txn */,
TXN_END_EOTDONE = 0x40 /* txn's cursors already closed */, TXN_END_SLOT = 0x40 /* release any reader slot if NOSTICKYTHREADS */
TXN_END_SLOT = 0x80 /* release any reader slot if NOSTICKYTHREADS */
}; };
struct commit_timestamp {
uint64_t start, prep, gc, audit, write, sync, gc_cpu;
};
MDBX_INTERNAL bool txn_refund(MDBX_txn *txn);
MDBX_INTERNAL bool txn_gc_detent(const MDBX_txn *const txn);
MDBX_INTERNAL int txn_check_badbits_parked(const MDBX_txn *txn, int bad_bits);
MDBX_INTERNAL void txn_done_cursors(MDBX_txn *txn);
MDBX_INTERNAL int txn_shadow_cursors(const MDBX_txn *parent, const size_t dbi);
MDBX_INTERNAL MDBX_txn *txn_alloc(const MDBX_txn_flags_t flags, MDBX_env *env);
MDBX_INTERNAL int txn_abort(MDBX_txn *txn);
MDBX_INTERNAL int txn_renew(MDBX_txn *txn, unsigned flags);
MDBX_INTERNAL int txn_end(MDBX_txn *txn, unsigned mode); MDBX_INTERNAL int txn_end(MDBX_txn *txn, unsigned mode);
MDBX_INTERNAL int txn_write(MDBX_txn *txn, iov_ctx_t *ctx);
MDBX_INTERNAL void txn_take_gcprof(const MDBX_txn *txn, MDBX_commit_latency *latency); MDBX_INTERNAL int txn_nested_create(MDBX_txn *parent, const MDBX_txn_flags_t flags);
MDBX_INTERNAL void txn_merge(MDBX_txn *const parent, MDBX_txn *const txn, const size_t parent_retired_len); MDBX_INTERNAL void txn_nested_abort(MDBX_txn *nested);
MDBX_INTERNAL int txn_nested_join(MDBX_txn *txn, struct commit_timestamp *ts);
MDBX_INTERNAL MDBX_txn *txn_basal_create(const size_t max_dbi);
MDBX_INTERNAL void txn_basal_destroy(MDBX_txn *txn);
MDBX_INTERNAL int txn_basal_start(MDBX_txn *txn, unsigned flags);
MDBX_INTERNAL int txn_basal_commit(MDBX_txn *txn, struct commit_timestamp *ts);
MDBX_INTERNAL int txn_basal_end(MDBX_txn *txn, unsigned mode);
MDBX_INTERNAL int txn_ro_park(MDBX_txn *txn, bool autounpark);
MDBX_INTERNAL int txn_ro_unpark(MDBX_txn *txn);
MDBX_INTERNAL int txn_ro_start(MDBX_txn *txn, unsigned flags);
MDBX_INTERNAL int txn_ro_end(MDBX_txn *txn, unsigned mode);
/* env.c */ /* env.c */
MDBX_INTERNAL int env_open(MDBX_env *env, mdbx_mode_t mode); MDBX_INTERNAL int env_open(MDBX_env *env, mdbx_mode_t mode);

View File

@ -7,37 +7,37 @@
static void refund_reclaimed(MDBX_txn *txn) { static void refund_reclaimed(MDBX_txn *txn) {
/* Scanning in descend order */ /* Scanning in descend order */
pgno_t first_unallocated = txn->geo.first_unallocated; pgno_t first_unallocated = txn->geo.first_unallocated;
const pnl_t pnl = txn->tw.repnl; const pnl_t pnl = txn->wr.repnl;
tASSERT(txn, MDBX_PNL_GETSIZE(pnl) && MDBX_PNL_MOST(pnl) == first_unallocated - 1); tASSERT(txn, pnl_size(pnl) && MDBX_PNL_MOST(pnl) == first_unallocated - 1);
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
size_t i = MDBX_PNL_GETSIZE(pnl); size_t i = pnl_size(pnl);
tASSERT(txn, pnl[i] == first_unallocated - 1); tASSERT(txn, pnl[i] == first_unallocated - 1);
while (--first_unallocated, --i > 0 && pnl[i] == first_unallocated - 1) while (--first_unallocated, --i > 0 && pnl[i] == first_unallocated - 1)
; ;
MDBX_PNL_SETSIZE(pnl, i); pnl_setsize(pnl, i);
#else #else
size_t i = 1; size_t i = 1;
tASSERT(txn, pnl[i] == first_unallocated - 1); tASSERT(txn, pnl[i] == first_unallocated - 1);
size_t len = MDBX_PNL_GETSIZE(pnl); size_t len = pnl_size(pnl);
while (--first_unallocated, ++i <= len && pnl[i] == first_unallocated - 1) while (--first_unallocated, ++i <= len && pnl[i] == first_unallocated - 1)
; ;
MDBX_PNL_SETSIZE(pnl, len -= i - 1); pnl_setsize(pnl, len -= i - 1);
for (size_t move = 0; move < len; ++move) for (size_t move = 0; move < len; ++move)
pnl[1 + move] = pnl[i + move]; pnl[1 + move] = pnl[i + move];
#endif #endif
VERBOSE("refunded %" PRIaPGNO " pages: %" PRIaPGNO " -> %" PRIaPGNO, txn->geo.first_unallocated - first_unallocated, VERBOSE("refunded %" PRIaPGNO " pages: %" PRIaPGNO " -> %" PRIaPGNO, txn->geo.first_unallocated - first_unallocated,
txn->geo.first_unallocated, first_unallocated); txn->geo.first_unallocated, first_unallocated);
txn->geo.first_unallocated = first_unallocated; txn->geo.first_unallocated = first_unallocated;
tASSERT(txn, pnl_check_allocated(txn->tw.repnl, txn->geo.first_unallocated - 1)); tASSERT(txn, pnl_check_allocated(txn->wr.repnl, txn->geo.first_unallocated - 1));
} }
static void refund_loose(MDBX_txn *txn) { static void refund_loose(MDBX_txn *txn) {
tASSERT(txn, txn->tw.loose_pages != nullptr); tASSERT(txn, txn->wr.loose_pages != nullptr);
tASSERT(txn, txn->tw.loose_count > 0); tASSERT(txn, txn->wr.loose_count > 0);
dpl_t *const dl = txn->tw.dirtylist; dpl_t *const dl = txn->wr.dirtylist;
if (dl) { if (dl) {
tASSERT(txn, dl->length >= txn->tw.loose_count); tASSERT(txn, dl->length >= txn->wr.loose_count);
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
} else { } else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
@ -46,23 +46,23 @@ static void refund_loose(MDBX_txn *txn) {
pgno_t onstack[MDBX_CACHELINE_SIZE * 8 / sizeof(pgno_t)]; pgno_t onstack[MDBX_CACHELINE_SIZE * 8 / sizeof(pgno_t)];
pnl_t suitable = onstack; pnl_t suitable = onstack;
if (!dl || dl->length - dl->sorted > txn->tw.loose_count) { if (!dl || dl->length - dl->sorted > txn->wr.loose_count) {
/* Dirty list is useless since unsorted. */ /* Dirty list is useless since unsorted. */
if (pnl_bytes2size(sizeof(onstack)) < txn->tw.loose_count) { if (pnl_bytes2size(sizeof(onstack)) < txn->wr.loose_count) {
suitable = pnl_alloc(txn->tw.loose_count); suitable = pnl_alloc(txn->wr.loose_count);
if (unlikely(!suitable)) if (unlikely(!suitable))
return /* this is not a reason for transaction fail */; return /* this is not a reason for transaction fail */;
} }
/* Collect loose-pages which may be refunded. */ /* Collect loose-pages which may be refunded. */
tASSERT(txn, txn->geo.first_unallocated >= MIN_PAGENO + txn->tw.loose_count); tASSERT(txn, txn->geo.first_unallocated >= MIN_PAGENO + txn->wr.loose_count);
pgno_t most = MIN_PAGENO; pgno_t most = MIN_PAGENO;
size_t w = 0; size_t w = 0;
for (const page_t *lp = txn->tw.loose_pages; lp; lp = page_next(lp)) { for (const page_t *lp = txn->wr.loose_pages; lp; lp = page_next(lp)) {
tASSERT(txn, lp->flags == P_LOOSE); tASSERT(txn, lp->flags == P_LOOSE);
tASSERT(txn, txn->geo.first_unallocated > lp->pgno); tASSERT(txn, txn->geo.first_unallocated > lp->pgno);
if (likely(txn->geo.first_unallocated - txn->tw.loose_count <= lp->pgno)) { if (likely(txn->geo.first_unallocated - txn->wr.loose_count <= lp->pgno)) {
tASSERT(txn, w < ((suitable == onstack) ? pnl_bytes2size(sizeof(onstack)) : MDBX_PNL_ALLOCLEN(suitable))); tASSERT(txn, w < ((suitable == onstack) ? pnl_bytes2size(sizeof(onstack)) : pnl_alloclen(suitable)));
suitable[++w] = lp->pgno; suitable[++w] = lp->pgno;
most = (lp->pgno > most) ? lp->pgno : most; most = (lp->pgno > most) ? lp->pgno : most;
} }
@ -72,13 +72,13 @@ static void refund_loose(MDBX_txn *txn) {
if (most + 1 == txn->geo.first_unallocated) { if (most + 1 == txn->geo.first_unallocated) {
/* Sort suitable list and refund pages at the tail. */ /* Sort suitable list and refund pages at the tail. */
MDBX_PNL_SETSIZE(suitable, w); pnl_setsize(suitable, w);
pnl_sort(suitable, MAX_PAGENO + 1); pnl_sort(suitable, MAX_PAGENO + 1);
/* Scanning in descend order */ /* Scanning in descend order */
const intptr_t step = MDBX_PNL_ASCENDING ? -1 : 1; const intptr_t step = MDBX_PNL_ASCENDING ? -1 : 1;
const intptr_t begin = MDBX_PNL_ASCENDING ? MDBX_PNL_GETSIZE(suitable) : 1; const intptr_t begin = MDBX_PNL_ASCENDING ? pnl_size(suitable) : 1;
const intptr_t end = MDBX_PNL_ASCENDING ? 0 : MDBX_PNL_GETSIZE(suitable) + 1; const intptr_t end = MDBX_PNL_ASCENDING ? 0 : pnl_size(suitable) + 1;
tASSERT(txn, suitable[begin] >= suitable[end - step]); tASSERT(txn, suitable[begin] >= suitable[end - step]);
tASSERT(txn, most == suitable[begin]); tASSERT(txn, most == suitable[begin]);
@ -90,11 +90,11 @@ static void refund_loose(MDBX_txn *txn) {
const size_t refunded = txn->geo.first_unallocated - most; const size_t refunded = txn->geo.first_unallocated - most;
DEBUG("refund-suitable %zu pages %" PRIaPGNO " -> %" PRIaPGNO, refunded, most, txn->geo.first_unallocated); DEBUG("refund-suitable %zu pages %" PRIaPGNO " -> %" PRIaPGNO, refunded, most, txn->geo.first_unallocated);
txn->geo.first_unallocated = most; txn->geo.first_unallocated = most;
txn->tw.loose_count -= refunded; txn->wr.loose_count -= refunded;
if (dl) { if (dl) {
txn->tw.dirtyroom += refunded; txn->wr.dirtyroom += refunded;
dl->pages_including_loose -= refunded; dl->pages_including_loose -= refunded;
assert(txn->tw.dirtyroom <= txn->env->options.dp_limit); assert(txn->wr.dirtyroom <= txn->env->options.dp_limit);
/* Filter-out dirty list */ /* Filter-out dirty list */
size_t r = 0; size_t r = 0;
@ -115,8 +115,8 @@ static void refund_loose(MDBX_txn *txn) {
} }
} }
dpl_setlen(dl, w); dpl_setlen(dl, w);
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length == tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit)); (txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
} }
goto unlink_loose; goto unlink_loose;
} }
@ -141,15 +141,15 @@ static void refund_loose(MDBX_txn *txn) {
if (dl->sorted != dl->length) { if (dl->sorted != dl->length) {
const size_t refunded = dl->sorted - dl->length; const size_t refunded = dl->sorted - dl->length;
dl->sorted = dl->length; dl->sorted = dl->length;
txn->tw.loose_count -= refunded; txn->wr.loose_count -= refunded;
txn->tw.dirtyroom += refunded; txn->wr.dirtyroom += refunded;
dl->pages_including_loose -= refunded; dl->pages_including_loose -= refunded;
tASSERT(txn, txn->tw.dirtyroom + txn->tw.dirtylist->length == tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->tw.dirtyroom : txn->env->options.dp_limit)); (txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
/* Filter-out loose chain & dispose refunded pages. */ /* Filter-out loose chain & dispose refunded pages. */
unlink_loose: unlink_loose:
for (page_t *__restrict *__restrict link = &txn->tw.loose_pages; *link;) { for (page_t *__restrict *__restrict link = &txn->wr.loose_pages; *link;) {
page_t *dp = *link; page_t *dp = *link;
tASSERT(txn, dp->flags == P_LOOSE); tASSERT(txn, dp->flags == P_LOOSE);
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(dp), sizeof(page_t *)); MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(dp), sizeof(page_t *));
@ -168,21 +168,21 @@ static void refund_loose(MDBX_txn *txn) {
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
if (suitable != onstack) if (suitable != onstack)
pnl_free(suitable); pnl_free(suitable);
txn->tw.loose_refund_wl = txn->geo.first_unallocated; txn->wr.loose_refund_wl = txn->geo.first_unallocated;
} }
bool txn_refund(MDBX_txn *txn) { bool txn_refund(MDBX_txn *txn) {
const pgno_t before = txn->geo.first_unallocated; const pgno_t before = txn->geo.first_unallocated;
if (txn->tw.loose_pages && txn->tw.loose_refund_wl > txn->geo.first_unallocated) if (txn->wr.loose_pages && txn->wr.loose_refund_wl > txn->geo.first_unallocated)
refund_loose(txn); refund_loose(txn);
while (true) { while (true) {
if (MDBX_PNL_GETSIZE(txn->tw.repnl) == 0 || MDBX_PNL_MOST(txn->tw.repnl) != txn->geo.first_unallocated - 1) if (pnl_size(txn->wr.repnl) == 0 || MDBX_PNL_MOST(txn->wr.repnl) != txn->geo.first_unallocated - 1)
break; break;
refund_reclaimed(txn); refund_reclaimed(txn);
if (!txn->tw.loose_pages || txn->tw.loose_refund_wl <= txn->geo.first_unallocated) if (!txn->wr.loose_pages || txn->wr.loose_refund_wl <= txn->geo.first_unallocated)
break; break;
const pgno_t memo = txn->geo.first_unallocated; const pgno_t memo = txn->geo.first_unallocated;
@ -194,7 +194,7 @@ bool txn_refund(MDBX_txn *txn) {
if (before == txn->geo.first_unallocated) if (before == txn->geo.first_unallocated)
return false; return false;
if (txn->tw.spilled.list) if (txn->wr.spilled.list)
/* Squash deleted pagenums if we refunded any */ /* Squash deleted pagenums if we refunded any */
spill_purge(txn); spill_purge(txn);

646
src/rkl.c Normal file
View File

@ -0,0 +1,646 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2025
#include "internals.h"
static inline size_t rkl_size2bytes(const size_t size) {
assert(size > 0 && size <= txl_max * 2);
size_t bytes = ceil_powerof2(MDBX_ASSUME_MALLOC_OVERHEAD + sizeof(txnid_t) * size, txl_granulate * sizeof(txnid_t)) -
MDBX_ASSUME_MALLOC_OVERHEAD;
return bytes;
}
static inline size_t rkl_bytes2size(const size_t bytes) {
size_t size = bytes / sizeof(txnid_t);
assert(size > 0 && size <= txl_max * 2);
return size;
}
void rkl_init(rkl_t *rkl) {
rkl->list_limit = ARRAY_LENGTH(rkl->inplace);
rkl->list = rkl->inplace;
rkl_clear(rkl);
}
void rkl_clear(rkl_t *rkl) {
rkl->solid_begin = UINT64_MAX;
rkl->solid_end = 0;
rkl->list_length = 0;
}
void rkl_destroy(rkl_t *rkl) {
void *ptr = rkl->list;
rkl->list = nullptr;
if (ptr != rkl->inplace)
osal_free(ptr);
}
static inline bool solid_empty(const rkl_t *rkl) { return !(rkl->solid_begin < rkl->solid_end); }
#define RKL_ORDERED(first, last) ((first) < (last))
SEARCH_IMPL(rkl_bsearch, txnid_t, txnid_t, RKL_ORDERED)
void rkl_destructive_move(rkl_t *src, rkl_t *dst) {
assert(rkl_check(src));
dst->solid_begin = src->solid_begin;
dst->solid_end = src->solid_end;
dst->list_length = src->list_length;
if (dst->list != dst->inplace)
osal_free(dst->list);
if (src->list != src->inplace) {
dst->list = src->list;
dst->list_limit = src->list_limit;
} else {
dst->list = dst->inplace;
dst->list_limit = ARRAY_LENGTH(src->inplace);
memcpy(dst->inplace, src->list, sizeof(dst->inplace));
}
rkl_init(src);
}
static int rkl_resize(rkl_t *rkl, size_t wanna_size) {
assert(wanna_size > rkl->list_length);
assert(rkl_check(rkl));
STATIC_ASSERT(txl_max < INT_MAX / sizeof(txnid_t));
if (unlikely(wanna_size > txl_max)) {
ERROR("rkl too long (%zu >= %zu)", wanna_size, (size_t)txl_max);
return MDBX_TXN_FULL;
}
if (unlikely(wanna_size < rkl->list_length)) {
ERROR("unable shrink rkl to %zu since length is %u", wanna_size, rkl->list_length);
return MDBX_PROBLEM;
}
if (unlikely(wanna_size <= ARRAY_LENGTH(rkl->inplace))) {
if (rkl->list != rkl->inplace) {
assert(rkl->list_limit > ARRAY_LENGTH(rkl->inplace) && rkl->list_length <= ARRAY_LENGTH(rkl->inplace));
memcpy(rkl->inplace, rkl->list, sizeof(rkl->inplace));
rkl->list_limit = ARRAY_LENGTH(rkl->inplace);
osal_free(rkl->list);
rkl->list = rkl->inplace;
} else {
assert(rkl->list_limit == ARRAY_LENGTH(rkl->inplace));
}
return MDBX_SUCCESS;
}
if (wanna_size != rkl->list_limit) {
size_t bytes = rkl_size2bytes(wanna_size);
void *ptr = (rkl->list == rkl->inplace) ? osal_malloc(bytes) : osal_realloc(rkl->list, bytes);
if (unlikely(!ptr))
return MDBX_ENOMEM;
#ifdef osal_malloc_usable_size
bytes = osal_malloc_usable_size(ptr);
#endif /* osal_malloc_usable_size */
rkl->list_limit = rkl_bytes2size(bytes);
if (rkl->list == rkl->inplace)
memcpy(ptr, rkl->inplace, sizeof(rkl->inplace));
rkl->list = ptr;
}
return MDBX_SUCCESS;
}
int rkl_copy(const rkl_t *src, rkl_t *dst) {
assert(rkl_check(src));
rkl_init(dst);
if (!rkl_empty(src)) {
if (dst->list_limit < src->list_length) {
int err = rkl_resize(dst, src->list_limit);
if (unlikely(err != MDBX_SUCCESS))
return err;
}
memcpy(dst->list, src->list, sizeof(txnid_t) * src->list_length);
dst->list_length = src->list_length;
dst->solid_begin = src->solid_begin;
dst->solid_end = src->solid_end;
}
return MDBX_SUCCESS;
}
size_t rkl_len(const rkl_t *rkl) { return rkl_empty(rkl) ? 0 : rkl->solid_end - rkl->solid_begin + rkl->list_length; }
__hot bool rkl_contain(const rkl_t *rkl, txnid_t id) {
assert(rkl_check(rkl));
if (id >= rkl->solid_begin && id < rkl->solid_end)
return true;
if (rkl->list_length) {
const txnid_t *it = rkl_bsearch(rkl->list, rkl->list_length, id);
const txnid_t *const end = rkl->list + rkl->list_length;
assert(it >= rkl->list && it <= end);
if (it != rkl->list)
assert(RKL_ORDERED(it[-1], id));
if (it != end) {
assert(!RKL_ORDERED(it[0], id));
return *it == id;
}
}
return false;
}
__hot bool rkl_find(const rkl_t *rkl, const txnid_t id, rkl_iter_t *iter) {
assert(rkl_check(rkl));
*iter = rkl_iterator(rkl, false);
if (id >= rkl->solid_begin) {
if (id < rkl->solid_end) {
iter->pos = iter->solid_offset + (unsigned)(id - rkl->solid_begin);
return true;
}
iter->pos = (unsigned)(rkl->solid_end - rkl->solid_begin);
}
if (rkl->list_length) {
const txnid_t *it = rkl_bsearch(rkl->list, rkl->list_length, id);
const txnid_t *const end = rkl->list + rkl->list_length;
assert(it >= rkl->list && it <= end);
if (it != rkl->list)
assert(RKL_ORDERED(it[-1], id));
iter->pos += (unsigned)(it - rkl->list);
if (it != end) {
assert(!RKL_ORDERED(it[0], id));
return *it == id;
}
}
return false;
}
static inline txnid_t list_remove_first(rkl_t *rkl) {
assert(rkl->list_length > 0);
const txnid_t first = rkl->list[0];
if (--rkl->list_length) {
/* TODO: Можно подумать о том, чтобы для избавления от memove() добавить headroom или вместо длины и
* указателя на список использовать три поля: list_begin, list_end и list_buffer. */
size_t i = 0;
do
rkl->list[i] = rkl->list[i + 1];
while (++i <= rkl->list_length);
}
return first;
}
static inline txnid_t after_cut(rkl_t *rkl, const txnid_t out) {
if (rkl->list_length == 0 && rkl->solid_begin == rkl->solid_end) {
rkl->solid_end = 0;
rkl->solid_begin = UINT64_MAX;
}
return out;
}
static int extend_solid(rkl_t *rkl, txnid_t solid_begin, txnid_t solid_end, const txnid_t id) {
if (rkl->list_length) {
const txnid_t *i = rkl_bsearch(rkl->list, rkl->list_length, id);
const txnid_t *const end = rkl->list + rkl->list_length;
/* если начало или конец списка примыкает к непрерывному интервалу,
* то переносим эти элементы из списка в непрерывный интервал */
txnid_t *f = (txnid_t *)i;
while (f > rkl->list && f[-1] >= solid_begin - 1) {
f -= 1;
solid_begin -= 1;
if (unlikely(*f != solid_begin))
return MDBX_RESULT_TRUE;
}
txnid_t *t = (txnid_t *)i;
while (t < end && *t <= solid_end) {
if (unlikely(*t != solid_end))
return MDBX_RESULT_TRUE;
solid_end += 1;
t += 1;
}
if (f < t) {
rkl->list_length -= t - f;
while (t < end)
*f++ = *t++;
}
}
rkl->solid_begin = solid_begin;
rkl->solid_end = solid_end;
assert(rkl_check(rkl));
return MDBX_SUCCESS;
}
int rkl_push(rkl_t *rkl, const txnid_t id) {
assert(id >= MIN_TXNID && id < INVALID_TXNID);
assert(rkl_check(rkl));
const bool known_continuous = false;
if (rkl->solid_begin >= rkl->solid_end) {
/* непрерывный интервал пуст */
return extend_solid(rkl, id, id + 1, id);
} else if (id < rkl->solid_begin) {
if (known_continuous || id + 1 == rkl->solid_begin)
/* id примыкает к solid_begin */
return extend_solid(rkl, id, rkl->solid_end, id);
} else if (id >= rkl->solid_end) {
if (known_continuous || id == rkl->solid_end)
/* id примыкает к solid_end */
return extend_solid(rkl, rkl->solid_begin, id + 1, id);
} else {
/* id входит в интервал между solid_begin и solid_end, т.е. подан дубликат */
return MDBX_RESULT_TRUE;
}
if (rkl->list_length == 1 && rkl->solid_end == rkl->solid_begin + 1 &&
(rkl->list[0] == id + 1 || rkl->list[0] == id - 1)) {
/* В списке один элемент и добавляемый id примыкает к нему, при этом в непрерывном интервале тоже один элемент.
* Лучше поменять элементы списка и непрерывного интервала. */
const txnid_t couple = (rkl->list[0] == id - 1) ? id - 1 : id;
rkl->list[0] = rkl->solid_begin;
rkl->solid_begin = couple;
rkl->solid_end = couple + 2;
assert(rkl_check(rkl));
return MDBX_SUCCESS;
}
if (unlikely(rkl->list_length == rkl->list_limit)) {
/* удваиваем размер буфера если закончилось место */
size_t x2 = (rkl->list_limit + 1) << 1;
x2 = (x2 > 62) ? x2 : 62;
x2 = (x2 < txl_max) ? x2 : txl_max;
x2 = (x2 > rkl->list_length) ? x2 : rkl->list_length + 42;
int err = rkl_resize(rkl, x2);
if (unlikely(err != MDBX_SUCCESS))
return err;
assert(rkl->list_limit > rkl->list_length);
}
size_t i = rkl->list_length;
/* ищем место для вставки двигаясь от конца к началу списка, сразу переставляя/раздвигая элементы */
while (i > 0) {
if (RKL_ORDERED(id, rkl->list[i - 1])) {
rkl->list[i] = rkl->list[i - 1];
i -= 1;
continue;
}
if (unlikely(id == rkl->list[i - 1])) {
while (++i < rkl->list_length)
rkl->list[i - 1] = rkl->list[i];
return MDBX_RESULT_TRUE;
}
break;
}
rkl->list[i] = id;
rkl->list_length++;
assert(rkl_check(rkl));
/* После добавления id в списке могла образоваться длинная последовательность,
* которую (возможно) стоит обменять с непрерывным интервалом. */
if (rkl->list_length > (MDBX_DEBUG ? 2 : 16) &&
((i > 0 && rkl->list[i - 1] == id - 1) || (i + 1 < rkl->list_length && rkl->list[i + 1] == id + 1))) {
txnid_t new_solid_begin = id;
size_t from = i;
while (from > 0 && rkl->list[from - 1] == new_solid_begin - 1) {
from -= 1;
new_solid_begin -= 1;
}
txnid_t new_solid_end = id + 1;
size_t to = i + 1;
while (to < rkl->list_length && rkl->list[to] == new_solid_end) {
to += 1;
new_solid_end += 1;
}
const size_t new_solid_len = to - from;
if (new_solid_len > 3) {
const size_t old_solid_len = rkl->solid_end - rkl->solid_begin;
if (new_solid_len > old_solid_len) {
/* Новая непрерывная последовательность длиннее текущей.
* Считаем обмен выгодным, если он дешевле пути развития событий с добавлением следующего элемента в список. */
const size_t old_solid_pos = rkl_bsearch(rkl->list, rkl->list_length, rkl->solid_begin) - rkl->list;
const size_t swap_cost =
/* количество элементов списка после изымаемой из списка последовательности,
* которые нужно переместить */
rkl->list_length - to +
/* количество элементов списка после позиции добавляемой в список последовательности,
* которые нужно переместить */
((from > old_solid_pos) ? from - old_solid_pos : 0)
/* количество элементов списка добавляемой последовательности, которые нужно добавить */
+ old_solid_len;
/* количество элементов списка, которые нужно переместить для вставки еще-одного/следующего элемента */
const size_t new_insert_cost = rkl->list_length - i;
/* coverity[logical_vs_bitwise] */
if (unlikely(swap_cost < new_insert_cost) || MDBX_DEBUG) {
/* Изымаемая последовательность длиннее добавляемой, поэтому:
* - список станет короче;
* - перемещать хвост нужно всегда к началу;
* - если начальные элементы потребуется раздвигать,
* то места хватит и остающиеся элементы в конце не будут перезаписаны. */
size_t moved = 0;
if (from > old_solid_pos) {
/* добавляемая последовательность ближе к началу, нужно раздвинуть элементы в голове для вставки. */
moved = from - old_solid_pos;
do {
from -= 1;
rkl->list[from + old_solid_len] = rkl->list[from];
} while (from > old_solid_pos);
} else if (from + new_solid_len < old_solid_pos) {
/* добавляемая последовательность дальше от начала,
* перемещаем часть элементов из хвоста после изымаемой последовательности */
do
rkl->list[from++] = rkl->list[to++];
while (from < old_solid_pos - new_solid_len);
}
/* вставляем последовательноть */
i = 0;
do
rkl->list[from++] = rkl->solid_begin + i++;
while (i != old_solid_len);
/* сдвигаем оставшийся хвост */
while (to < rkl->list_length)
rkl->list[moved + from++] = rkl->list[to++];
rkl->list_length = rkl->list_length - new_solid_len + old_solid_len;
rkl->solid_begin = new_solid_begin;
rkl->solid_end = new_solid_end;
assert(rkl_check(rkl));
}
}
}
}
return MDBX_SUCCESS;
}
txnid_t rkl_pop(rkl_t *rkl, const bool highest_not_lowest) {
assert(rkl_check(rkl));
if (rkl->list_length) {
assert(rkl->solid_begin <= rkl->solid_end);
if (highest_not_lowest && (solid_empty(rkl) || rkl->solid_end < rkl->list[rkl->list_length - 1]))
return after_cut(rkl, rkl->list[rkl->list_length -= 1]);
if (!highest_not_lowest && (solid_empty(rkl) || rkl->solid_begin > rkl->list[0]))
return after_cut(rkl, list_remove_first(rkl));
}
if (!solid_empty(rkl))
return after_cut(rkl, highest_not_lowest ? --rkl->solid_end : rkl->solid_begin++);
assert(rkl_empty(rkl));
return 0;
}
txnid_t rkl_lowest(const rkl_t *rkl) {
if (rkl->list_length)
return (solid_empty(rkl) || rkl->list[0] < rkl->solid_begin) ? rkl->list[0] : rkl->solid_begin;
return !solid_empty(rkl) ? rkl->solid_begin : INVALID_TXNID;
}
txnid_t rkl_highest(const rkl_t *rkl) {
if (rkl->list_length)
return (solid_empty(rkl) || rkl->list[rkl->list_length - 1] >= rkl->solid_end) ? rkl->list[rkl->list_length - 1]
: rkl->solid_end - 1;
return !solid_empty(rkl) ? rkl->solid_end - 1 : 0;
}
int rkl_merge(const rkl_t *src, rkl_t *dst, bool ignore_duplicates) {
if (src->list_length) {
size_t i = src->list_length;
do {
int err = rkl_push(dst, src->list[i - 1]);
if (unlikely(err != MDBX_SUCCESS) && (!ignore_duplicates || err != MDBX_RESULT_TRUE))
return err;
} while (--i);
}
txnid_t id = src->solid_begin;
while (id < src->solid_end) {
int err = rkl_push(dst, id);
if (unlikely(err != MDBX_SUCCESS) && (!ignore_duplicates || err != MDBX_RESULT_TRUE))
return err;
++id;
}
return MDBX_SUCCESS;
}
int rkl_destructive_merge(rkl_t *src, rkl_t *dst, bool ignore_duplicates) {
int err = rkl_merge(src, dst, ignore_duplicates);
rkl_destroy(src);
return err;
}
rkl_iter_t rkl_iterator(const rkl_t *rkl, const bool reverse) {
rkl_iter_t iter = {.rkl = rkl, .pos = reverse ? rkl_len(rkl) : 0, .solid_offset = 0};
if (!solid_empty(rkl) && rkl->list_length) {
const txnid_t *it = rkl_bsearch(rkl->list, rkl->list_length, rkl->solid_begin);
const txnid_t *const end = rkl->list + rkl->list_length;
assert(it >= rkl->list && it <= end && (it == end || *it > rkl->solid_begin));
iter.solid_offset = it - rkl->list;
}
return iter;
}
txnid_t rkl_turn(rkl_iter_t *iter, const bool reverse) {
assert((unsigned)reverse == (unsigned)!!reverse);
size_t pos = iter->pos - reverse;
if (unlikely(pos >= rkl_len(iter->rkl)))
return 0;
iter->pos = pos + !reverse;
assert(iter->pos <= rkl_len(iter->rkl));
const size_t solid_len = iter->rkl->solid_end - iter->rkl->solid_begin;
if (iter->rkl->list_length) {
if (pos < iter->solid_offset)
return iter->rkl->list[pos];
else if (pos < iter->solid_offset + solid_len)
return iter->rkl->solid_begin + pos - iter->solid_offset;
else
return iter->rkl->list[pos - solid_len];
}
assert(pos < solid_len);
return iter->rkl->solid_begin + pos;
}
size_t rkl_left(rkl_iter_t *iter, const bool reverse) {
assert(iter->pos <= rkl_len(iter->rkl));
return reverse ? iter->pos : rkl_len(iter->rkl) - iter->pos;
}
#if 1
#define DEBUG_HOLE(hole) \
do { \
} while (0)
#else
#define DEBUG_HOLE(hole) \
do { \
printf(" return-%sward: %d, ", reverse ? "back" : "for", __LINE__); \
if (hole.begin == hole.end) \
printf("empty-hole\n"); \
else if (hole.end - hole.begin == 1) \
printf("hole %" PRIaTXN "\n", hole.begin); \
else \
printf("hole %" PRIaTXN "-%" PRIaTXN "\n", hole.begin, hole.end - 1); \
fflush(nullptr); \
} while (0)
#endif
rkl_hole_t rkl_hole(rkl_iter_t *iter, const bool reverse) {
assert((unsigned)reverse == (unsigned)!!reverse);
rkl_hole_t hole;
const size_t len = rkl_len(iter->rkl);
size_t pos = iter->pos;
if (unlikely(pos >= len)) {
if (len == 0) {
hole.begin = 1;
hole.end = MAX_TXNID;
iter->pos = 0;
DEBUG_HOLE(hole);
return hole;
} else if (pos == len && reverse) {
/* шаг назад из позиции на конце rkl */
} else if (reverse) {
hole.begin = 1;
hole.end = 1 /* rkl_lowest(iter->rkl); */;
iter->pos = 0;
DEBUG_HOLE(hole);
return hole;
} else {
hole.begin = MAX_TXNID /* rkl_highest(iter->rkl) + 1 */;
hole.end = MAX_TXNID;
iter->pos = len;
DEBUG_HOLE(hole);
return hole;
}
}
const size_t solid_len = iter->rkl->solid_end - iter->rkl->solid_begin;
if (iter->rkl->list_length) {
/* список элементов не пуст */
txnid_t here, there;
for (size_t next;; pos = next) {
next = reverse ? pos - 1 : pos + 1;
if (pos < iter->solid_offset) {
/* текущая позиция перед непрерывным интервалом */
here = iter->rkl->list[pos];
if (next == iter->solid_offset) {
/* в следующей позиции начинается непрерывный интерал (при поиске вперед) */
assert(!reverse);
hole.begin = here + 1;
hole.end = iter->rkl->solid_begin;
next += solid_len;
assert(hole.begin < hole.end /* зазор обязан быть, иначе это ошибка не-слияния */);
/* зазор между элементом списка перед сплошным интервалом и началом интервала */
iter->pos = next - 1;
DEBUG_HOLE(hole);
return hole;
}
if (next >= len)
/* уперлись в конец или начало rkl */
break;
/* следующая позиция также перед непрерывным интервалом */
there = iter->rkl->list[next];
} else if (pos >= iter->solid_offset + solid_len) {
/* текущая позиция после непрерывного интервала */
here = (pos < len) ? iter->rkl->list[pos - solid_len] : MAX_TXNID;
if (next >= len)
/* уперлись в конец или начало rkl */
break;
if (next == iter->solid_offset + solid_len - 1) {
/* в следующей позиции конец непрерывного интервала (при поиске назад) */
assert(reverse);
hole.begin = iter->rkl->solid_end;
hole.end = here;
pos = iter->solid_offset;
assert(hole.begin < hole.end /* зазор обязан быть, иначе это ошибка не-слияния */);
/* зазор между элементом списка после сплошного интервала и концом интервала */
iter->pos = pos;
DEBUG_HOLE(hole);
return hole;
}
/* следующая позиция также после непрерывного интервала */
there = iter->rkl->list[next - solid_len];
} else if (reverse) {
/* текущая позиция внутри непрерывного интервала и поиск назад */
next = iter->solid_offset - 1;
here = iter->rkl->solid_begin;
if (next >= len)
/* нет элементов списка перед непрерывным интервалом */
break;
/* предыдущая позиция перед непрерывным интервалом */
there = iter->rkl->list[next];
} else {
/* текущая позиция внутри непрерывного интервала и поиск вперед */
next = iter->solid_offset + solid_len;
here = iter->rkl->solid_end - 1;
if (next >= len)
/* нет элементов списка после непрерывного интервала */
break;
/* следующая позиция после непрерывного интервала */
there = iter->rkl->list[next - solid_len];
}
hole.begin = (reverse ? there : here) + 1;
hole.end = reverse ? here : there;
if (hole.begin < hole.end) {
/* есть зазор между текущей и следующей позицией */
iter->pos = next;
DEBUG_HOLE(hole);
return hole;
}
}
if (reverse) {
/* уперлись в начало rkl, возвращаем зазор перед началом rkl */
hole.begin = 1;
hole.end = here;
iter->pos = 0;
DEBUG_HOLE(hole);
} else {
/* уперлись в конец rkl, возвращаем зазор после конца rkl */
hole.begin = here + 1;
hole.end = MAX_TXNID;
iter->pos = len;
DEBUG_HOLE(hole);
}
return hole;
}
/* список элементов пуст, но есть непрерывный интервал */
iter->pos = reverse ? 0 : len;
if (reverse && pos < len) {
/* возвращаем зазор перед непрерывным интервалом */
hole.begin = 1;
hole.end = iter->rkl->solid_begin;
DEBUG_HOLE(hole);
} else {
/* возвращаем зазор после непрерывного интервала */
hole.begin = iter->rkl->solid_end;
hole.end = MAX_TXNID;
DEBUG_HOLE(hole);
}
return hole;
}
bool rkl_check(const rkl_t *rkl) {
if (!rkl)
return false;
if (rkl->list == rkl->inplace && unlikely(rkl->list_limit != ARRAY_LENGTH(rkl->inplace)))
return false;
if (unlikely(rkl->list_limit < ARRAY_LENGTH(rkl->inplace)))
return false;
if (rkl_empty(rkl))
return rkl->list_length == 0 && solid_empty(rkl);
if (rkl->list_length) {
for (size_t i = 1; i < rkl->list_length; ++i)
if (unlikely(!RKL_ORDERED(rkl->list[i - 1], rkl->list[i])))
return false;
if (!solid_empty(rkl) && rkl->solid_begin - 1 <= rkl->list[rkl->list_length - 1] &&
rkl->solid_end >= rkl->list[0]) {
/* непрерывный интервал "плавает" внутри списка, т.е. находится между какими-то соседними значениями */
const txnid_t *it = rkl_bsearch(rkl->list, rkl->list_length, rkl->solid_begin);
const txnid_t *const end = rkl->list + rkl->list_length;
if (it < rkl->list || it > end)
return false;
if (it > rkl->list && it[-1] >= rkl->solid_begin)
return false;
if (it < end && it[0] <= rkl->solid_end)
return false;
}
}
return true;
}

76
src/rkl.h Normal file
View File

@ -0,0 +1,76 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2025
#pragma once
#include "essentials.h"
/* Сортированный набор txnid, использующий внутри комбинацию непрерывного интервала и списка.
* Обеспечивает хранение id записей при переработке, очистку и обновлении GC, включая возврат остатков переработанных
* страниц.
*
* При переработке GC записи преимущественно выбираются последовательно, но это не гарантируется. В LIFO-режиме
* переработка и добавление записей в rkl происходит преимущественно в обратном порядке, но из-за завершения читающих
* транзакций могут быть «скачки» в прямом направлении. В FIFO-режиме записи GC перерабатываются в прямом порядке и при
* этом линейно, но не обязательно строго последовательно, при этом гарантируется что между добавляемыми в rkl
* идентификаторами в GC нет записей, т.е. между первой (минимальный id) и последней (максимальный id) в GC нет записей
* и весь интервал может быть использован для возврата остатков страниц в GC.
*
* Таким образом, комбинация линейного интервала и списка (отсортированного в порядке возрастания элементов) является
* рациональным решением, близким к теоретически оптимальному пределу.
*
* Реализация rkl достаточно проста/прозрачная, если не считать неочевидную «магию» обмена непрерывного интервала и
* образующихся в списке последовательностей. Однако, именно этот автоматически выполняемый без лишних операций обмен
* оправдывает все накладные расходы. */
typedef struct MDBX_rkl {
txnid_t solid_begin, solid_end; /* начало и конец непрерывной последовательности solid_begin ... solid_end-1. */
unsigned list_length; /* текущая длина списка. */
unsigned list_limit; /* размер буфера выделенного под список, равен ARRAY_LENGTH(inplace) когда list == inplace. */
txnid_t *list; /* список отдельных элементов в порядке возрастания (наименьший в начале). */
txnid_t inplace[4 + 8]; /* статический массив для коротких списков, чтобы избавиться от выделения/освобождения памяти
* в большинстве случаев. */
} rkl_t;
MDBX_MAYBE_UNUSED MDBX_INTERNAL void rkl_init(rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_INTERNAL void rkl_clear(rkl_t *rkl);
static inline void rkl_clear_and_shrink(rkl_t *rkl) { rkl_clear(rkl); /* TODO */ }
MDBX_MAYBE_UNUSED MDBX_INTERNAL void rkl_destroy(rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_INTERNAL void rkl_destructive_move(rkl_t *src, rkl_t *dst);
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result int rkl_copy(const rkl_t *src, rkl_t *dst);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION static inline bool rkl_empty(const rkl_t *rkl) {
return rkl->solid_begin > rkl->solid_end;
}
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL bool rkl_check(const rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL size_t rkl_len(const rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL txnid_t rkl_lowest(const rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL txnid_t rkl_highest(const rkl_t *rkl);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION static inline txnid_t rkl_edge(const rkl_t *rkl,
const bool highest_not_lowest) {
return highest_not_lowest ? rkl_highest(rkl) : rkl_lowest(rkl);
}
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result int rkl_push(rkl_t *rkl, const txnid_t id);
MDBX_MAYBE_UNUSED MDBX_INTERNAL txnid_t rkl_pop(rkl_t *rkl, const bool highest_not_lowest);
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result int rkl_merge(const rkl_t *src, rkl_t *dst, bool ignore_duplicates);
MDBX_MAYBE_UNUSED MDBX_INTERNAL int rkl_destructive_merge(rkl_t *src, rkl_t *dst, bool ignore_duplicates);
/* Итератор для rkl.
* Обеспечивает изоляцию внутреннего устройства rkl от остального кода, чем существенно его упрощает.
* Фактически именно использованием rkl с итераторами ликвидируется "ребус" исторически образовавшийся в gc-update. */
typedef struct MDBX_rkl_iter {
const rkl_t *rkl;
unsigned pos;
unsigned solid_offset;
} rkl_iter_t;
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result rkl_iter_t rkl_iterator(const rkl_t *rkl, const bool reverse);
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result txnid_t rkl_turn(rkl_iter_t *iter, const bool reverse);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION MDBX_INTERNAL size_t rkl_left(rkl_iter_t *iter, const bool reverse);
MDBX_MAYBE_UNUSED MDBX_INTERNAL bool rkl_find(const rkl_t *rkl, const txnid_t id, rkl_iter_t *iter);
MDBX_MAYBE_UNUSED MDBX_NOTHROW_PURE_FUNCTION __must_check_result MDBX_INTERNAL bool rkl_contain(const rkl_t *rkl,
txnid_t id);
typedef struct MDBX_rkl_hole {
txnid_t begin;
txnid_t end;
} rkl_hole_t;
MDBX_MAYBE_UNUSED MDBX_INTERNAL __must_check_result rkl_hole_t rkl_hole(rkl_iter_t *iter, const bool reverse);

View File

@ -4,44 +4,42 @@
#include "internals.h" #include "internals.h"
void spill_remove(MDBX_txn *txn, size_t idx, size_t npages) { void spill_remove(MDBX_txn *txn, size_t idx, size_t npages) {
tASSERT(txn, idx > 0 && idx <= MDBX_PNL_GETSIZE(txn->tw.spilled.list) && txn->tw.spilled.least_removed > 0); tASSERT(txn, idx > 0 && idx <= pnl_size(txn->wr.spilled.list) && txn->wr.spilled.least_removed > 0);
txn->tw.spilled.least_removed = (idx < txn->tw.spilled.least_removed) ? idx : txn->tw.spilled.least_removed; txn->wr.spilled.least_removed = (idx < txn->wr.spilled.least_removed) ? idx : txn->wr.spilled.least_removed;
txn->tw.spilled.list[idx] |= 1; txn->wr.spilled.list[idx] |= 1;
MDBX_PNL_SETSIZE(txn->tw.spilled.list, pnl_setsize(txn->wr.spilled.list, pnl_size(txn->wr.spilled.list) - (idx == pnl_size(txn->wr.spilled.list)));
MDBX_PNL_GETSIZE(txn->tw.spilled.list) - (idx == MDBX_PNL_GETSIZE(txn->tw.spilled.list)));
while (unlikely(npages > 1)) { while (unlikely(npages > 1)) {
const pgno_t pgno = (txn->tw.spilled.list[idx] >> 1) + 1; const pgno_t pgno = (txn->wr.spilled.list[idx] >> 1) + 1;
if (MDBX_PNL_ASCENDING) { if (MDBX_PNL_ASCENDING) {
if (++idx > MDBX_PNL_GETSIZE(txn->tw.spilled.list) || (txn->tw.spilled.list[idx] >> 1) != pgno) if (++idx > pnl_size(txn->wr.spilled.list) || (txn->wr.spilled.list[idx] >> 1) != pgno)
return; return;
} else { } else {
if (--idx < 1 || (txn->tw.spilled.list[idx] >> 1) != pgno) if (--idx < 1 || (txn->wr.spilled.list[idx] >> 1) != pgno)
return; return;
txn->tw.spilled.least_removed = (idx < txn->tw.spilled.least_removed) ? idx : txn->tw.spilled.least_removed; txn->wr.spilled.least_removed = (idx < txn->wr.spilled.least_removed) ? idx : txn->wr.spilled.least_removed;
} }
txn->tw.spilled.list[idx] |= 1; txn->wr.spilled.list[idx] |= 1;
MDBX_PNL_SETSIZE(txn->tw.spilled.list, pnl_setsize(txn->wr.spilled.list, pnl_size(txn->wr.spilled.list) - (idx == pnl_size(txn->wr.spilled.list)));
MDBX_PNL_GETSIZE(txn->tw.spilled.list) - (idx == MDBX_PNL_GETSIZE(txn->tw.spilled.list)));
--npages; --npages;
} }
} }
pnl_t spill_purge(MDBX_txn *txn) { pnl_t spill_purge(MDBX_txn *txn) {
tASSERT(txn, txn->tw.spilled.least_removed > 0); tASSERT(txn, txn->wr.spilled.least_removed > 0);
const pnl_t sl = txn->tw.spilled.list; const pnl_t sl = txn->wr.spilled.list;
if (txn->tw.spilled.least_removed != INT_MAX) { if (txn->wr.spilled.least_removed != INT_MAX) {
size_t len = MDBX_PNL_GETSIZE(sl), r, w; size_t len = pnl_size(sl), r, w;
for (w = r = txn->tw.spilled.least_removed; r <= len; ++r) { for (w = r = txn->wr.spilled.least_removed; r <= len; ++r) {
sl[w] = sl[r]; sl[w] = sl[r];
w += 1 - (sl[r] & 1); w += 1 - (sl[r] & 1);
} }
for (size_t i = 1; i < w; ++i) for (size_t i = 1; i < w; ++i)
tASSERT(txn, (sl[i] & 1) == 0); tASSERT(txn, (sl[i] & 1) == 0);
MDBX_PNL_SETSIZE(sl, w - 1); pnl_setsize(sl, w - 1);
txn->tw.spilled.least_removed = INT_MAX; txn->wr.spilled.least_removed = INT_MAX;
} else { } else {
for (size_t i = 1; i <= MDBX_PNL_GETSIZE(sl); ++i) for (size_t i = 1; i <= pnl_size(sl); ++i)
tASSERT(txn, (sl[i] & 1) == 0); tASSERT(txn, (sl[i] & 1) == 0);
} }
return sl; return sl;
@ -57,7 +55,7 @@ static int spill_page(MDBX_txn *txn, iov_ctx_t *ctx, page_t *dp, const size_t np
const pgno_t pgno = dp->pgno; const pgno_t pgno = dp->pgno;
int err = iov_page(txn, ctx, dp, npages); int err = iov_page(txn, ctx, dp, npages);
if (likely(err == MDBX_SUCCESS)) if (likely(err == MDBX_SUCCESS))
err = spill_append_span(&txn->tw.spilled.list, pgno, npages); err = spill_append_span(&txn->wr.spilled.list, pgno, npages);
return err; return err;
} }
@ -72,25 +70,29 @@ static size_t spill_cursor_keep(const MDBX_txn *const txn, const MDBX_cursor *mc
intptr_t i = 0; intptr_t i = 0;
do { do {
mp = mc->pg[i]; mp = mc->pg[i];
TRACE("dbi %zu, mc-%p[%zu], page %u %p", cursor_dbi(mc), __Wpedantic_format_voidptr(mc), i, mp->pgno,
__Wpedantic_format_voidptr(mp));
tASSERT(txn, !is_subpage(mp)); tASSERT(txn, !is_subpage(mp));
if (is_modifable(txn, mp)) { if (is_modifable(txn, mp)) {
size_t const n = dpl_search(txn, mp->pgno); size_t const n = dpl_search(txn, mp->pgno);
if (txn->tw.dirtylist->items[n].pgno == mp->pgno && if (txn->wr.dirtylist->items[n].pgno == mp->pgno &&
/* не считаем дважды */ dpl_age(txn, n)) { /* не считаем дважды */ dpl_age(txn, n)) {
size_t *const ptr = ptr_disp(txn->tw.dirtylist->items[n].ptr, -(ptrdiff_t)sizeof(size_t)); size_t *const ptr = ptr_disp(txn->wr.dirtylist->items[n].ptr, -(ptrdiff_t)sizeof(size_t));
*ptr = txn->tw.dirtylru; *ptr = txn->wr.dirtylru;
tASSERT(txn, dpl_age(txn, n) == 0); tASSERT(txn, dpl_age(txn, n) == 0);
++keep; ++keep;
DEBUG("keep page %" PRIaPGNO " (%p), dbi %zu, %scursor %p[%zu]", mp->pgno, __Wpedantic_format_voidptr(mp),
cursor_dbi(mc), is_inner(mc) ? "sub-" : "", __Wpedantic_format_voidptr(mc), i);
} }
} }
} while (++i <= mc->top); } while (++i <= mc->top);
tASSERT(txn, is_leaf(mp)); tASSERT(txn, is_leaf(mp));
if (!mc->subcur || mc->ki[mc->top] >= page_numkeys(mp)) if (!inner_pointed(mc))
break;
if (!(node_flags(page_node(mp, mc->ki[mc->top])) & N_TREE))
break; break;
mc = &mc->subcur->cursor; mc = &mc->subcur->cursor;
if (is_subpage(mc->pg[0]))
break;
} }
return keep; return keep;
} }
@ -115,7 +117,7 @@ static size_t spill_txn_keep(MDBX_txn *txn, MDBX_cursor *m0) {
* ... * ...
* > 255 = must not be spilled. */ * > 255 = must not be spilled. */
MDBX_NOTHROW_PURE_FUNCTION static unsigned spill_prio(const MDBX_txn *txn, const size_t i, const uint32_t reciprocal) { MDBX_NOTHROW_PURE_FUNCTION static unsigned spill_prio(const MDBX_txn *txn, const size_t i, const uint32_t reciprocal) {
dpl_t *const dl = txn->tw.dirtylist; dpl_t *const dl = txn->wr.dirtylist;
const uint32_t age = dpl_age(txn, i); const uint32_t age = dpl_age(txn, i);
const size_t npages = dpl_npages(dl, i); const size_t npages = dpl_npages(dl, i);
const pgno_t pgno = dl->items[i].pgno; const pgno_t pgno = dl->items[i].pgno;
@ -178,14 +180,14 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
int rc = MDBX_SUCCESS; int rc = MDBX_SUCCESS;
if (unlikely(txn->tw.loose_count >= if (unlikely(txn->wr.loose_count >=
(txn->tw.dirtylist ? txn->tw.dirtylist->pages_including_loose : txn->tw.writemap_dirty_npages))) (txn->wr.dirtylist ? txn->wr.dirtylist->pages_including_loose : txn->wr.writemap_dirty_npages)))
goto done; goto done;
const size_t dirty_entries = txn->tw.dirtylist ? (txn->tw.dirtylist->length - txn->tw.loose_count) : 1; const size_t dirty_entries = txn->wr.dirtylist ? (txn->wr.dirtylist->length - txn->wr.loose_count) : 1;
const size_t dirty_npages = const size_t dirty_npages =
(txn->tw.dirtylist ? txn->tw.dirtylist->pages_including_loose : txn->tw.writemap_dirty_npages) - (txn->wr.dirtylist ? txn->wr.dirtylist->pages_including_loose : txn->wr.writemap_dirty_npages) -
txn->tw.loose_count; txn->wr.loose_count;
const size_t need_spill_entries = spill_gate(txn->env, wanna_spill_entries, dirty_entries); const size_t need_spill_entries = spill_gate(txn->env, wanna_spill_entries, dirty_entries);
const size_t need_spill_npages = spill_gate(txn->env, wanna_spill_npages, dirty_npages); const size_t need_spill_npages = spill_gate(txn->env, wanna_spill_npages, dirty_npages);
@ -196,17 +198,17 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
if (txn->flags & MDBX_WRITEMAP) { if (txn->flags & MDBX_WRITEMAP) {
NOTICE("%s-spilling %zu dirty-entries, %zu dirty-npages", "msync", dirty_entries, dirty_npages); NOTICE("%s-spilling %zu dirty-entries, %zu dirty-npages", "msync", dirty_entries, dirty_npages);
const MDBX_env *env = txn->env; const MDBX_env *env = txn->env;
tASSERT(txn, txn->tw.spilled.list == nullptr); tASSERT(txn, txn->wr.spilled.list == nullptr);
rc = osal_msync(&txn->env->dxb_mmap, 0, pgno_align2os_bytes(env, txn->geo.first_unallocated), MDBX_SYNC_KICK); rc = osal_msync(&txn->env->dxb_mmap, 0, pgno_align2os_bytes(env, txn->geo.first_unallocated), MDBX_SYNC_KICK);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto bailout; goto bailout;
#if MDBX_AVOID_MSYNC #if MDBX_AVOID_MSYNC
MDBX_ANALYSIS_ASSUME(txn->tw.dirtylist != nullptr); MDBX_ANALYSIS_ASSUME(txn->wr.dirtylist != nullptr);
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
env->lck->unsynced_pages.weak += txn->tw.dirtylist->pages_including_loose - txn->tw.loose_count; env->lck->unsynced_pages.weak += txn->wr.dirtylist->pages_including_loose - txn->wr.loose_count;
dpl_clear(txn->tw.dirtylist); dpl_clear(txn->wr.dirtylist);
txn->tw.dirtyroom = env->options.dp_limit - txn->tw.loose_count; txn->wr.dirtyroom = env->options.dp_limit - txn->wr.loose_count;
for (page_t *lp = txn->tw.loose_pages; lp != nullptr; lp = page_next(lp)) { for (page_t *lp = txn->wr.loose_pages; lp != nullptr; lp = page_next(lp)) {
tASSERT(txn, lp->flags == P_LOOSE); tASSERT(txn, lp->flags == P_LOOSE);
rc = dpl_append(txn, lp->pgno, lp, 1); rc = dpl_append(txn, lp->pgno, lp, 1);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
@ -216,22 +218,22 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
} }
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
#else #else
tASSERT(txn, txn->tw.dirtylist == nullptr); tASSERT(txn, txn->wr.dirtylist == nullptr);
env->lck->unsynced_pages.weak += txn->tw.writemap_dirty_npages; env->lck->unsynced_pages.weak += txn->wr.writemap_dirty_npages;
txn->tw.writemap_spilled_npages += txn->tw.writemap_dirty_npages; txn->wr.writemap_spilled_npages += txn->wr.writemap_dirty_npages;
txn->tw.writemap_dirty_npages = 0; txn->wr.writemap_dirty_npages = 0;
#endif /* MDBX_AVOID_MSYNC */ #endif /* MDBX_AVOID_MSYNC */
goto done; goto done;
} }
NOTICE("%s-spilling %zu dirty-entries, %zu dirty-npages", "write", need_spill_entries, need_spill_npages); NOTICE("%s-spilling %zu dirty-entries, %zu dirty-npages", "write", need_spill_entries, need_spill_npages);
MDBX_ANALYSIS_ASSUME(txn->tw.dirtylist != nullptr); MDBX_ANALYSIS_ASSUME(txn->wr.dirtylist != nullptr);
tASSERT(txn, txn->tw.dirtylist->length - txn->tw.loose_count >= 1); tASSERT(txn, txn->wr.dirtylist->length - txn->wr.loose_count >= 1);
tASSERT(txn, txn->tw.dirtylist->pages_including_loose - txn->tw.loose_count >= need_spill_npages); tASSERT(txn, txn->wr.dirtylist->pages_including_loose - txn->wr.loose_count >= need_spill_npages);
if (!txn->tw.spilled.list) { if (!txn->wr.spilled.list) {
txn->tw.spilled.least_removed = INT_MAX; txn->wr.spilled.least_removed = INT_MAX;
txn->tw.spilled.list = pnl_alloc(need_spill); txn->wr.spilled.list = pnl_alloc(need_spill);
if (unlikely(!txn->tw.spilled.list)) { if (unlikely(!txn->wr.spilled.list)) {
rc = MDBX_ENOMEM; rc = MDBX_ENOMEM;
bailout: bailout:
txn->flags |= MDBX_TXN_ERROR; txn->flags |= MDBX_TXN_ERROR;
@ -240,7 +242,7 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
} else { } else {
/* purge deleted slots */ /* purge deleted slots */
spill_purge(txn); spill_purge(txn);
rc = pnl_reserve(&txn->tw.spilled.list, need_spill); rc = pnl_reserve(&txn->wr.spilled.list, need_spill);
(void)rc /* ignore since the resulting list may be shorter (void)rc /* ignore since the resulting list may be shorter
and pnl_append() will increase pnl on demand */ and pnl_append() will increase pnl on demand */
; ;
@ -251,9 +253,9 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
/* Preserve pages which may soon be dirtied again */ /* Preserve pages which may soon be dirtied again */
const size_t unspillable = spill_txn_keep(txn, m0); const size_t unspillable = spill_txn_keep(txn, m0);
if (unspillable + txn->tw.loose_count >= dl->length) { if (unspillable + txn->wr.loose_count >= dl->length) {
#if xMDBX_DEBUG_SPILLING == 1 /* avoid false failure in debug mode */ #if xMDBX_DEBUG_SPILLING == 1 /* avoid false failure in debug mode */
if (likely(txn->tw.dirtyroom + txn->tw.loose_count >= need)) if (likely(txn->wr.dirtyroom + txn->wr.loose_count >= need))
return MDBX_SUCCESS; return MDBX_SUCCESS;
#endif /* xMDBX_DEBUG_SPILLING */ #endif /* xMDBX_DEBUG_SPILLING */
ERROR("all %zu dirty pages are unspillable since referenced " ERROR("all %zu dirty pages are unspillable since referenced "
@ -293,7 +295,7 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
age_max = (age_max >= age) ? age_max : age; age_max = (age_max >= age) ? age_max : age;
} }
VERBOSE("lru-head %u, age-max %u", txn->tw.dirtylru, age_max); VERBOSE("lru-head %u, age-max %u", txn->wr.dirtylru, age_max);
/* half of 8-bit radix-sort */ /* half of 8-bit radix-sort */
pgno_t radix_entries[256], radix_npages[256]; pgno_t radix_entries[256], radix_npages[256];
@ -388,8 +390,8 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
tASSERT(txn, r - w == spilled_entries || rc != MDBX_SUCCESS); tASSERT(txn, r - w == spilled_entries || rc != MDBX_SUCCESS);
dl->sorted = dpl_setlen(dl, w); dl->sorted = dpl_setlen(dl, w);
txn->tw.dirtyroom += spilled_entries; txn->wr.dirtyroom += spilled_entries;
txn->tw.dirtylist->pages_including_loose -= spilled_npages; txn->wr.dirtylist->pages_including_loose -= spilled_npages;
tASSERT(txn, dpl_check(txn)); tASSERT(txn, dpl_check(txn));
if (!iov_empty(&ctx)) { if (!iov_empty(&ctx)) {
@ -400,10 +402,10 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
goto bailout; goto bailout;
txn->env->lck->unsynced_pages.weak += spilled_npages; txn->env->lck->unsynced_pages.weak += spilled_npages;
pnl_sort(txn->tw.spilled.list, (size_t)txn->geo.first_unallocated << 1); pnl_sort(txn->wr.spilled.list, (size_t)txn->geo.first_unallocated << 1);
txn->flags |= MDBX_TXN_SPILLS; txn->flags |= MDBX_TXN_SPILLS;
NOTICE("spilled %u dirty-entries, %u dirty-npages, now have %zu dirty-room", spilled_entries, spilled_npages, NOTICE("spilled %u dirty-entries, %u dirty-npages, now have %zu dirty-room", spilled_entries, spilled_npages,
txn->tw.dirtyroom); txn->wr.dirtyroom);
} else { } else {
tASSERT(txn, rc == MDBX_SUCCESS); tASSERT(txn, rc == MDBX_SUCCESS);
for (size_t i = 1; i <= dl->length; ++i) { for (size_t i = 1; i <= dl->length; ++i) {
@ -414,18 +416,18 @@ __cold int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, const intp
} }
#if xMDBX_DEBUG_SPILLING == 2 #if xMDBX_DEBUG_SPILLING == 2
if (txn->tw.loose_count + txn->tw.dirtyroom <= need / 2 + 1) if (txn->wr.loose_count + txn->wr.dirtyroom <= need / 2 + 1)
ERROR("dirty-list length: before %zu, after %zu, parent %zi, loose %zu; " ERROR("dirty-list length: before %zu, after %zu, parent %zi, loose %zu; "
"needed %zu, spillable %zu; " "needed %zu, spillable %zu; "
"spilled %u dirty-entries, now have %zu dirty-room", "spilled %u dirty-entries, now have %zu dirty-room",
dl->length + spilled_entries, dl->length, dl->length + spilled_entries, dl->length,
(txn->parent && txn->parent->tw.dirtylist) ? (intptr_t)txn->parent->tw.dirtylist->length : -1, (txn->parent && txn->parent->wr.dirtylist) ? (intptr_t)txn->parent->wr.dirtylist->length : -1,
txn->tw.loose_count, need, spillable_entries, spilled_entries, txn->tw.dirtyroom); txn->wr.loose_count, need, spillable_entries, spilled_entries, txn->wr.dirtyroom);
ENSURE(txn->env, txn->tw.loose_count + txn->tw.dirtyroom > need / 2); ENSURE(txn->env, txn->wr.loose_count + txn->wr.dirtyroom > need / 2);
#endif /* xMDBX_DEBUG_SPILLING */ #endif /* xMDBX_DEBUG_SPILLING */
done: done:
return likely(txn->tw.dirtyroom + txn->tw.loose_count > ((need > CURSOR_STACK_SIZE) ? CURSOR_STACK_SIZE : need)) return likely(txn->wr.dirtyroom + txn->wr.loose_count > ((need > CURSOR_STACK_SIZE) ? CURSOR_STACK_SIZE : need))
? MDBX_SUCCESS ? MDBX_SUCCESS
: MDBX_TXN_FULL; : MDBX_TXN_FULL;
} }

View File

@ -13,19 +13,19 @@ MDBX_INTERNAL int spill_slowpath(MDBX_txn *const txn, MDBX_cursor *const m0, con
static inline size_t spill_search(const MDBX_txn *txn, pgno_t pgno) { static inline size_t spill_search(const MDBX_txn *txn, pgno_t pgno) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC); tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
const pnl_t pnl = txn->tw.spilled.list; const pnl_t pnl = txn->wr.spilled.list;
if (likely(!pnl)) if (likely(!pnl))
return 0; return 0;
pgno <<= 1; pgno <<= 1;
size_t n = pnl_search(pnl, pgno, (size_t)MAX_PAGENO + MAX_PAGENO + 1); size_t n = pnl_search(pnl, pgno, (size_t)MAX_PAGENO + MAX_PAGENO + 1);
return (n <= MDBX_PNL_GETSIZE(pnl) && pnl[n] == pgno) ? n : 0; return (n <= pnl_size(pnl) && pnl[n] == pgno) ? n : 0;
} }
static inline bool spill_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npages) { static inline bool spill_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npages) {
const pnl_t pnl = txn->tw.spilled.list; const pnl_t pnl = txn->wr.spilled.list;
if (likely(!pnl)) if (likely(!pnl))
return false; return false;
const size_t len = MDBX_PNL_GETSIZE(pnl); const size_t len = pnl_size(pnl);
if (LOG_ENABLED(MDBX_LOG_EXTRA)) { if (LOG_ENABLED(MDBX_LOG_EXTRA)) {
DEBUG_EXTRA("PNL len %zu [", len); DEBUG_EXTRA("PNL len %zu [", len);
for (size_t i = 1; i <= len; ++i) for (size_t i = 1; i <= len; ++i)
@ -36,12 +36,12 @@ static inline bool spill_intersect(const MDBX_txn *txn, pgno_t pgno, size_t npag
const pgno_t spilled_range_last = ((pgno + (pgno_t)npages) << 1) - 1; const pgno_t spilled_range_last = ((pgno + (pgno_t)npages) << 1) - 1;
#if MDBX_PNL_ASCENDING #if MDBX_PNL_ASCENDING
const size_t n = pnl_search(pnl, spilled_range_begin, (size_t)(MAX_PAGENO + 1) << 1); const size_t n = pnl_search(pnl, spilled_range_begin, (size_t)(MAX_PAGENO + 1) << 1);
tASSERT(txn, n && (n == MDBX_PNL_GETSIZE(pnl) + 1 || spilled_range_begin <= pnl[n])); tASSERT(txn, n && (n == pnl_size(pnl) + 1 || spilled_range_begin <= pnl[n]));
const bool rc = n <= MDBX_PNL_GETSIZE(pnl) && pnl[n] <= spilled_range_last; const bool rc = n <= pnl_size(pnl) && pnl[n] <= spilled_range_last;
#else #else
const size_t n = pnl_search(pnl, spilled_range_last, (size_t)MAX_PAGENO + MAX_PAGENO + 1); const size_t n = pnl_search(pnl, spilled_range_last, (size_t)MAX_PAGENO + MAX_PAGENO + 1);
tASSERT(txn, n && (n == MDBX_PNL_GETSIZE(pnl) + 1 || spilled_range_last >= pnl[n])); tASSERT(txn, n && (n == pnl_size(pnl) + 1 || spilled_range_last >= pnl[n]));
const bool rc = n <= MDBX_PNL_GETSIZE(pnl) && pnl[n] >= spilled_range_begin; const bool rc = n <= pnl_size(pnl) && pnl[n] >= spilled_range_begin;
#endif #endif
if (ASSERT_ENABLED()) { if (ASSERT_ENABLED()) {
bool check = false; bool check = false;
@ -56,10 +56,10 @@ static inline int txn_spill(MDBX_txn *const txn, MDBX_cursor *const m0, const si
tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0); tASSERT(txn, (txn->flags & MDBX_TXN_RDONLY) == 0);
tASSERT(txn, !m0 || cursor_is_tracked(m0)); tASSERT(txn, !m0 || cursor_is_tracked(m0));
const intptr_t wanna_spill_entries = txn->tw.dirtylist ? (need - txn->tw.dirtyroom - txn->tw.loose_count) : 0; const intptr_t wanna_spill_entries = txn->wr.dirtylist ? (need - txn->wr.dirtyroom - txn->wr.loose_count) : 0;
const intptr_t wanna_spill_npages = const intptr_t wanna_spill_npages =
need + (txn->tw.dirtylist ? txn->tw.dirtylist->pages_including_loose : txn->tw.writemap_dirty_npages) - need + (txn->wr.dirtylist ? txn->wr.dirtylist->pages_including_loose : txn->wr.writemap_dirty_npages) -
txn->tw.loose_count - txn->env->options.dp_limit; txn->wr.loose_count - txn->env->options.dp_limit;
/* production mode */ /* production mode */
if (likely(wanna_spill_npages < 1 && wanna_spill_entries < 1) if (likely(wanna_spill_npages < 1 && wanna_spill_entries < 1)

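The spilled-pages list in the hunk above stores each page number shifted left by one (`pgno <<= 1` before the lookup), leaving the low bit free as a per-entry mark; later hunks skip or purge entries whose low bit is set. Below is a minimal standalone sketch of that doubled-pgno encoding (not libmdbx code; it uses a linear scan for brevity where the real pnl_search() does a binary search over a sorted list):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t pgno_t;

/* Entries are (pgno << 1); the low bit is reserved as a per-entry mark. */
static size_t sketch_spill_search(const uint32_t *list, size_t len, pgno_t pgno) {
  const uint32_t needle = (uint32_t)pgno << 1;
  for (size_t i = 0; i < len; ++i)
    if ((list[i] & ~1u) == needle)
      return i + 1; /* 1-based slot, 0 means "not spilled" */
  return 0;
}

int main(void) {
  const uint32_t spilled[] = {3u << 1, (7u << 1) | 1u /* marked */, 42u << 1};
  printf("pgno 42 -> slot %zu\n", sketch_spill_search(spilled, 3, 42)); /* prints 3 */
  printf("pgno 5  -> slot %zu\n", sketch_spill_search(spilled, 3, 5));  /* prints 0 */
  return 0;
}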

@ -65,7 +65,7 @@ MDBX_chk_flags_t chk_flags = MDBX_CHK_DEFAULTS;
MDBX_chk_stage_t chk_stage = MDBX_chk_none; MDBX_chk_stage_t chk_stage = MDBX_chk_none;
static MDBX_chk_line_t line_struct; static MDBX_chk_line_t line_struct;
static size_t anchor_lineno; static size_t anchor_cookie;
static size_t line_count; static size_t line_count;
static FILE *line_output; static FILE *line_output;
@ -275,7 +275,7 @@ static MDBX_chk_user_table_cookie_t *table_filter(MDBX_chk_context_t *ctx, const
static int stage_begin(MDBX_chk_context_t *ctx, enum MDBX_chk_stage stage) { static int stage_begin(MDBX_chk_context_t *ctx, enum MDBX_chk_stage stage) {
(void)ctx; (void)ctx;
chk_stage = stage; chk_stage = stage;
anchor_lineno = line_count; anchor_cookie = line_count;
flush(); flush();
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
@ -284,7 +284,7 @@ static int conclude(MDBX_chk_context_t *ctx);
static int stage_end(MDBX_chk_context_t *ctx, enum MDBX_chk_stage stage, int err) { static int stage_end(MDBX_chk_context_t *ctx, enum MDBX_chk_stage stage, int err) {
if (stage == MDBX_chk_conclude && !err) if (stage == MDBX_chk_conclude && !err)
err = conclude(ctx); err = conclude(ctx);
suffix(anchor_lineno, err ? "error(s)" : "done"); suffix(anchor_cookie, err ? "error(s)" : "done");
flush(); flush();
chk_stage = MDBX_chk_none; chk_stage = MDBX_chk_none;
return err; return err;
@ -367,7 +367,7 @@ static int conclude(MDBX_chk_context_t *ctx) {
(chk_flags & (MDBX_CHK_SKIP_BTREE_TRAVERSAL | MDBX_CHK_SKIP_KV_TRAVERSAL)) == 0 && (chk_flags & (MDBX_CHK_SKIP_BTREE_TRAVERSAL | MDBX_CHK_SKIP_KV_TRAVERSAL)) == 0 &&
(env_flags & MDBX_RDONLY) == 0 && !only_table.iov_base && stuck_meta < 0 && (env_flags & MDBX_RDONLY) == 0 && !only_table.iov_base && stuck_meta < 0 &&
ctx->result.steady_txnid < ctx->result.recent_txnid) { ctx->result.steady_txnid < ctx->result.recent_txnid) {
const size_t step_lineno = print(MDBX_chk_resolution, const size_t cookie = print(MDBX_chk_resolution,
"Perform sync-to-disk for make steady checkpoint" "Perform sync-to-disk for make steady checkpoint"
" at txn-id #%" PRIi64 "...", " at txn-id #%" PRIi64 "...",
ctx->result.recent_txnid); ctx->result.recent_txnid);
@ -376,7 +376,7 @@ static int conclude(MDBX_chk_context_t *ctx) {
if (err == MDBX_SUCCESS) { if (err == MDBX_SUCCESS) {
ctx->result.problems_meta -= 1; ctx->result.problems_meta -= 1;
ctx->result.total_problems -= 1; ctx->result.total_problems -= 1;
suffix(step_lineno, "done"); suffix(cookie, "done");
} }
} }
@ -384,15 +384,14 @@ static int conclude(MDBX_chk_context_t *ctx) {
!only_table.iov_base && (env_flags & (MDBX_RDONLY | MDBX_EXCLUSIVE)) == MDBX_EXCLUSIVE) { !only_table.iov_base && (env_flags & (MDBX_RDONLY | MDBX_EXCLUSIVE)) == MDBX_EXCLUSIVE) {
const bool successful_check = (err | ctx->result.total_problems | ctx->result.problems_meta) == 0; const bool successful_check = (err | ctx->result.total_problems | ctx->result.problems_meta) == 0;
if (successful_check || force_turn_meta) { if (successful_check || force_turn_meta) {
const size_t step_lineno = const size_t cookie = print(MDBX_chk_resolution, "Performing turn to the specified meta-page (%d) due to %s!",
print(MDBX_chk_resolution, "Performing turn to the specified meta-page (%d) due to %s!", stuck_meta, stuck_meta, successful_check ? "successful check" : "the -T option was given");
successful_check ? "successful check" : "the -T option was given");
flush(); flush();
err = mdbx_env_turn_for_recovery(ctx->env, stuck_meta); err = mdbx_env_turn_for_recovery(ctx->env, stuck_meta);
if (err != MDBX_SUCCESS) if (err != MDBX_SUCCESS)
error_fn("mdbx_env_turn_for_recovery", err); error_fn("mdbx_env_turn_for_recovery", err);
else else
suffix(step_lineno, "done"); suffix(cookie, "done");
} else { } else {
print(MDBX_chk_resolution, print(MDBX_chk_resolution,
"Skipping turn to the specified meta-page (%d) due to " "Skipping turn to the specified meta-page (%d) due to "
@ -605,8 +604,10 @@ int main(int argc, char *argv[]) {
rc == EBUSY || rc == EAGAIN rc == EBUSY || rc == EAGAIN
#endif #endif
)) { )) {
const size_t cookie = print(MDBX_chk_resolution, "Try open in non-exclusive mode...");
env_flags &= ~MDBX_EXCLUSIVE; env_flags &= ~MDBX_EXCLUSIVE;
rc = mdbx_env_open(env, envname, env_flags | MDBX_ACCEDE, 0); rc = mdbx_env_open(env, envname, env_flags | MDBX_ACCEDE, 0);
suffix(cookie, rc ? "failed" : "done");
} }
} }
@ -619,14 +620,14 @@ int main(int argc, char *argv[]) {
print_ln(MDBX_chk_verbose, "%s mode", (env_flags & MDBX_EXCLUSIVE) ? "monopolistic" : "cooperative"); print_ln(MDBX_chk_verbose, "%s mode", (env_flags & MDBX_EXCLUSIVE) ? "monopolistic" : "cooperative");
if (warmup) { if (warmup) {
anchor_lineno = print(MDBX_chk_verbose, "warming up..."); anchor_cookie = print(MDBX_chk_verbose, "warming up...");
flush(); flush();
rc = mdbx_env_warmup(env, nullptr, warmup_flags, 3600 * 65536); rc = mdbx_env_warmup(env, nullptr, warmup_flags, 3600 * 65536);
if (MDBX_IS_ERROR(rc)) { if (MDBX_IS_ERROR(rc)) {
error_fn("mdbx_env_warmup", rc); error_fn("mdbx_env_warmup", rc);
goto bailout; goto bailout;
} }
suffix(anchor_lineno, rc ? "timeout" : "done"); suffix(anchor_cookie, rc ? "timeout" : "done");
} }
rc = mdbx_env_chk(env, &cb, &chk, chk_flags, MDBX_chk_result + (verbose << MDBX_chk_severity_prio_shift), 0); rc = mdbx_env_chk(env, &cb, &chk, chk_flags, MDBX_chk_result + (verbose << MDBX_chk_severity_prio_shift), 0);
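The anchor_lineno -> anchor_cookie rename and the new "Try open in non-exclusive mode..." report both lean on the same calling convention: print() returns a cookie for the line it started, and suffix(cookie, ...) later attaches a status to it. The sketch below only illustrates that convention; the actual mdbx_chk helpers may behave differently, so treat the semantics as an assumption:

#include <stdio.h>

static size_t line_count;

static size_t chk_print(const char *msg) {
  fputs(msg, stdout);
  return ++line_count; /* cookie identifying the line just started */
}

static void chk_suffix(size_t cookie, const char *status) {
  if (cookie == line_count)
    printf(" %s\n", status); /* still the same line: append the status */
  else
    printf("%s\n", status); /* something was printed in between: own line */
}

int main(void) {
  const size_t cookie = chk_print("Try open in non-exclusive mode...");
  /* ... attempt the reopen here ... */
  chk_suffix(cookie, "done");
  return 0;
}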


@ -42,6 +42,7 @@ static void usage(const char *prog) {
" -V\t\tprint version and exit\n" " -V\t\tprint version and exit\n"
" -q\t\tbe quiet\n" " -q\t\tbe quiet\n"
" -c\t\tenable compactification (skip unused pages)\n" " -c\t\tenable compactification (skip unused pages)\n"
" -f\t\tforce copying even the target file exists\n"
" -d\t\tenforce copy to be a dynamic size DB\n" " -d\t\tenforce copy to be a dynamic size DB\n"
" -p\t\tusing transaction parking/ousting during copying MVCC-snapshot\n" " -p\t\tusing transaction parking/ousting during copying MVCC-snapshot\n"
" \t\tto avoid stopping recycling and overflowing the DB\n" " \t\tto avoid stopping recycling and overflowing the DB\n"
@ -87,6 +88,8 @@ int main(int argc, char *argv[]) {
cpflags |= MDBX_CP_FORCE_DYNAMIC_SIZE; cpflags |= MDBX_CP_FORCE_DYNAMIC_SIZE;
else if (argv[1][1] == 'p' && argv[1][2] == '\0') else if (argv[1][1] == 'p' && argv[1][2] == '\0')
cpflags |= MDBX_CP_THROTTLE_MVCC; cpflags |= MDBX_CP_THROTTLE_MVCC;
else if (argv[1][1] == 'f' && argv[1][2] == '\0')
cpflags |= MDBX_CP_OVERWRITE;
else if (argv[1][1] == 'q' && argv[1][2] == '\0') else if (argv[1][1] == 'q' && argv[1][2] == '\0')
quiet = true; quiet = true;
else if (argv[1][1] == 'u' && argv[1][2] == '\0') else if (argv[1][1] == 'u' && argv[1][2] == '\0')

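The new -f switch of mdbx_copy maps to MDBX_CP_OVERWRITE, alongside the existing -c (MDBX_CP_COMPACT) and -p (MDBX_CP_THROTTLE_MVCC) mappings shown above. A short sketch of driving the same flags from the C API; it assumes the usual public entry points (mdbx_env_create/open/copy/close, mdbx_strerror) and illustrative paths:

#include <stdio.h>
#include "mdbx.h"

int main(void) {
  MDBX_env *env = NULL;
  int rc = mdbx_env_create(&env);
  if (rc != MDBX_SUCCESS)
    return rc;
  rc = mdbx_env_open(env, "./source-db", MDBX_RDONLY, 0);
  if (rc == MDBX_SUCCESS)
    /* equivalent of "mdbx_copy -c -f": compact while copying, overwrite an existing target */
    rc = mdbx_env_copy(env, "./backup-db", MDBX_CP_COMPACT | MDBX_CP_OVERWRITE);
  if (rc != MDBX_SUCCESS)
    fprintf(stderr, "copy failed: %s\n", mdbx_strerror(rc));
  mdbx_env_close(env);
  return rc;
}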

@ -20,6 +20,7 @@
#define PRINT 1 #define PRINT 1
#define GLOBAL 2 #define GLOBAL 2
#define CONCISE 4
static int mode = GLOBAL; static int mode = GLOBAL;
typedef struct flagbit { typedef struct flagbit {
@ -55,42 +56,23 @@ static void signal_handler(int sig) {
#endif /* !WINDOWS */ #endif /* !WINDOWS */
static const char hexc[] = "0123456789abcdef"; static void dumpval(const MDBX_val *v) {
static const char digits[] = "0123456789abcdef";
static void dumpbyte(unsigned char c) {
putchar(hexc[c >> 4]);
putchar(hexc[c & 15]);
}
static void text(MDBX_val *v) {
unsigned char *c, *end;
putchar(' '); putchar(' ');
c = v->iov_base; for (const unsigned char *c = v->iov_base, *end = c + v->iov_len; c < end; ++c) {
end = c + v->iov_len; if (mode & PRINT) {
while (c < end) {
if (isprint(*c) && *c != '\\') { if (isprint(*c) && *c != '\\') {
putchar(*c); putchar(*c);
} else { continue;
} else
putchar('\\'); putchar('\\');
dumpbyte(*c);
} }
c++; putchar(digits[*c >> 4]);
putchar(digits[*c & 15]);
} }
putchar('\n'); putchar('\n');
} }
static void dumpval(MDBX_val *v) {
unsigned char *c, *end;
putchar(' ');
c = v->iov_base;
end = c + v->iov_len;
while (c < end)
dumpbyte(*c++);
putchar('\n');
}
bool quiet = false, rescue = false; bool quiet = false, rescue = false;
const char *prog; const char *prog;
static void error(const char *func, int rc) { static void error(const char *func, int rc) {
@ -185,12 +167,19 @@ static int dump_tbl(MDBX_txn *txn, MDBX_dbi dbi, char *name) {
rc = MDBX_EINTR; rc = MDBX_EINTR;
break; break;
} }
if (mode & PRINT) {
text(&key);
text(&data);
} else {
dumpval(&key); dumpval(&key);
dumpval(&data); dumpval(&data);
if ((flags & MDBX_DUPSORT) && (mode & CONCISE)) {
while ((rc = mdbx_cursor_get(cursor, &key, &data, MDBX_NEXT_DUP)) == MDBX_SUCCESS) {
if (user_break) {
rc = MDBX_EINTR;
break;
}
putchar(' ');
dumpval(&data);
}
if (rc != MDBX_NOTFOUND)
break;
} }
} }
printf("DATA=END\n"); printf("DATA=END\n");
@ -206,10 +195,12 @@ static int dump_tbl(MDBX_txn *txn, MDBX_dbi dbi, char *name) {
static void usage(void) { static void usage(void) {
fprintf(stderr, fprintf(stderr,
"usage: %s " "usage: %s "
"[-V] [-q] [-f file] [-l] [-p] [-r] [-a|-s table] [-u|U] " "[-V] [-q] [-c] [-f file] [-l] [-p] [-r] [-a|-s table] [-u|U] "
"dbpath\n" "dbpath\n"
" -V\t\tprint version and exit\n" " -V\t\tprint version and exit\n"
" -q\t\tbe quiet\n" " -q\t\tbe quiet\n"
" -c\t\tconcise mode without repeating keys,\n"
" \t\tbut incompatible with Berkeley DB and LMDB\n"
" -f\t\twrite to file instead of stdout\n" " -f\t\twrite to file instead of stdout\n"
" -l\t\tlist tables and exit\n" " -l\t\tlist tables and exit\n"
" -p\t\tuse printable characters\n" " -p\t\tuse printable characters\n"
@ -268,6 +259,7 @@ int main(int argc, char *argv[]) {
"s:" "s:"
"V" "V"
"r" "r"
"c"
"q")) != EOF) { "q")) != EOF) {
switch (i) { switch (i) {
case 'V': case 'V':
@ -298,6 +290,9 @@ int main(int argc, char *argv[]) {
break; break;
case 'n': case 'n':
break; break;
case 'c':
mode |= CONCISE;
break;
case 'p': case 'p':
mode |= PRINT; mode |= PRINT;
break; break;
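As read from the dump_tbl() change above, concise mode (-c) prints the key and the first value as usual, and for a DUPSORT table each additional duplicate value gets its own line prefixed with one extra space instead of a repeated key. The standalone sketch below produces output of that shape; the exact format is an inference from the diff:

#include <stdio.h>

static void dump_hex(const char *s) {
  static const char digits[] = "0123456789abcdef";
  putchar(' ');
  for (const unsigned char *c = (const unsigned char *)s; *c; ++c) {
    putchar(digits[*c >> 4]);
    putchar(digits[*c & 15]);
  }
  putchar('\n');
}

int main(void) {
  const char *key = "k1";
  const char *dups[] = {"a", "b", "c"};
  dump_hex(key);     /* key line */
  dump_hex(dups[0]); /* first value */
  for (size_t i = 1; i < sizeof(dups) / sizeof(dups[0]); ++i) {
    putchar(' ');    /* extra leading space marks "same key as before" */
    dump_hex(dups[i]);
  }
  puts("DATA=END");
  return 0;
}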


@ -380,7 +380,16 @@ __hot static int readline(MDBX_val *out, MDBX_val *buf) {
return badend(); return badend();
} }
} }
if (fgets(buf->iov_base, (int)buf->iov_len, stdin) == nullptr)
/* modern concise mode, where a space in the second position means the same (previous) value */
c = fgetc(stdin);
if (c == EOF)
return errno ? errno : EOF;
if (c == ' ')
return (ungetc(c, stdin) == c) ? MDBX_SUCCESS : (errno ? errno : EOF);
*(char *)buf->iov_base = c;
if (fgets((char *)buf->iov_base + 1, (int)buf->iov_len - 1, stdin) == nullptr)
return errno ? errno : EOF; return errno ? errno : EOF;
lineno++; lineno++;
@ -721,8 +730,8 @@ int main(int argc, char *argv[]) {
} }
int batch = 0; int batch = 0;
MDBX_val key = {.iov_base = nullptr, .iov_len = 0}, data = {.iov_base = nullptr, .iov_len = 0};
while (err == MDBX_SUCCESS) { while (err == MDBX_SUCCESS) {
MDBX_val key, data;
err = readline(&key, &kbuf); err = readline(&key, &kbuf);
if (err == EOF) if (err == EOF)
break; break;
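The readline() change above peeks at the first character of the line: a space signals a value-only line (the previous key is reused), otherwise the byte is kept as the start of a new record. A standalone sketch of that peek step using the same fgetc()/ungetc() trick; the labelling in main() is illustrative only:

#include <stdio.h>

/* Returns 1 if the next line begins with a space (concise continuation),
 * 0 for a regular line, EOF at end of input; the byte is pushed back. */
static int peek_concise(FILE *in) {
  const int c = fgetc(in);
  if (c == EOF)
    return EOF;
  if (ungetc(c, in) != c)
    return EOF;
  return c == ' ';
}

int main(void) {
  char line[512];
  for (;;) {
    const int kind = peek_concise(stdin);
    if (kind == EOF || !fgets(line, sizeof(line), stdin))
      break;
    printf("%s: %s", kind ? "value for the previous key" : "new key or value", line);
  }
  return 0;
}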


@ -38,11 +38,10 @@ static MDBX_cursor *cursor_clone(const MDBX_cursor *csrc, cursor_couple_t *coupl
/*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/
void recalculate_merge_thresholds(MDBX_env *env) { void recalculate_merge_thresholds(MDBX_env *env) {
const size_t bytes = page_space(env); const size_t whole_page_space = page_space(env);
env->merge_threshold = (uint16_t)(bytes - (bytes * env->options.merge_threshold_16dot16_percent >> 16)); env->merge_threshold =
env->merge_threshold_gc = (uint16_t)(whole_page_space - (whole_page_space * env->options.merge_threshold_16dot16_percent >> 16));
(uint16_t)(bytes - ((env->options.merge_threshold_16dot16_percent > 19005) ? bytes / 3 /* 33 % */ eASSERT(env, env->merge_threshold >= whole_page_space / 2 && env->merge_threshold <= whole_page_space / 64 * 63);
: bytes / 4 /* 25 % */));
} }
int tree_drop(MDBX_cursor *mc, const bool may_have_tables) { int tree_drop(MDBX_cursor *mc, const bool may_have_tables) {
@ -56,7 +55,7 @@ int tree_drop(MDBX_cursor *mc, const bool may_have_tables) {
if (!(may_have_tables | mc->tree->large_pages)) if (!(may_have_tables | mc->tree->large_pages))
cursor_pop(mc); cursor_pop(mc);
rc = pnl_need(&txn->tw.retired_pages, rc = pnl_need(&txn->wr.retired_pages,
(size_t)mc->tree->branch_pages + (size_t)mc->tree->leaf_pages + (size_t)mc->tree->large_pages); (size_t)mc->tree->branch_pages + (size_t)mc->tree->leaf_pages + (size_t)mc->tree->large_pages);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
goto bailout; goto bailout;
@ -446,8 +445,8 @@ static int page_merge(MDBX_cursor *csrc, MDBX_cursor *cdst) {
cASSERT(cdst, cdst->top > 0); cASSERT(cdst, cdst->top > 0);
cASSERT(cdst, cdst->top + 1 < cdst->tree->height || is_leaf(cdst->pg[cdst->tree->height - 1])); cASSERT(cdst, cdst->top + 1 < cdst->tree->height || is_leaf(cdst->pg[cdst->tree->height - 1]));
cASSERT(csrc, csrc->top + 1 < csrc->tree->height || is_leaf(csrc->pg[csrc->tree->height - 1])); cASSERT(csrc, csrc->top + 1 < csrc->tree->height || is_leaf(csrc->pg[csrc->tree->height - 1]));
cASSERT(cdst, cASSERT(cdst, cursor_dbi(csrc) == FREE_DBI || csrc->txn->env->options.prefer_waf_insteadof_balance ||
csrc->txn->env->options.prefer_waf_insteadof_balance || page_room(pdst) >= page_used(cdst->txn->env, psrc)); page_room(pdst) >= page_used(cdst->txn->env, psrc));
const int pagetype = page_type(psrc); const int pagetype = page_type(psrc);
/* Move all nodes from src to dst */ /* Move all nodes from src to dst */
@ -680,8 +679,18 @@ int tree_rebalance(MDBX_cursor *mc) {
const size_t minkeys = (pagetype & P_BRANCH) + (size_t)1; const size_t minkeys = (pagetype & P_BRANCH) + (size_t)1;
/* Pages emptier than this are candidates for merging. */ /* Pages emptier than this are candidates for merging. */
size_t room_threshold = size_t room_threshold = mc->txn->env->merge_threshold;
likely(mc->tree != &mc->txn->dbs[FREE_DBI]) ? mc->txn->env->merge_threshold : mc->txn->env->merge_threshold_gc; bool minimize_waf = mc->txn->env->options.prefer_waf_insteadof_balance;
if (unlikely(mc->tree == &mc->txn->dbs[FREE_DBI])) {
/* In the GC case we always minimize WAF, and merge under-filled pages only when gc_stockpile() has headroom.
* This reduces WAF and avoids extra actions/cycles both when recycling the GC
* and when returning unused pages. The b-tree balance barely degrades from this,
* since adding/deleting/updating records almost always happens only at the edges. */
minimize_waf = true;
room_threshold = page_space(mc->txn->env);
if (gc_stockpile(mc->txn) > (size_t)mc->tree->height + mc->tree->height)
room_threshold >>= 1;
}
const size_t numkeys = page_numkeys(tp); const size_t numkeys = page_numkeys(tp);
const size_t room = page_room(tp); const size_t room = page_room(tp);
@ -802,10 +811,26 @@ int tree_rebalance(MDBX_cursor *mc) {
const size_t right_room = right ? page_room(right) : 0; const size_t right_room = right ? page_room(right) : 0;
const size_t left_nkeys = left ? page_numkeys(left) : 0; const size_t left_nkeys = left ? page_numkeys(left) : 0;
const size_t right_nkeys = right ? page_numkeys(right) : 0; const size_t right_nkeys = right ? page_numkeys(right) : 0;
/* We must choose between the right and the left page, either for merging the current page into it or for moving a node into the current page.
* Thus one of four options has to be chosen according to the criteria.
*
* If minimize_waf is enabled, we try not to involve clean pages,
* sacrificing ideal balance for the sake of reducing WAF.
*
* Some of the options may be unavailable, or may "not work out", because:
* - some branch page may lack space due to the propagation/update of the first keys,
*   which are stored in the parent pages;
* - with minimize_waf enabled, the propagation/update of the first keys
*   may require splitting some page, which increases WAF and therefore devalues any further
*   adherence to minimize_waf. */
bool involve = !(left && right); bool involve = !(left && right);
retry: retry:
cASSERT(mc, mc->top > 0); cASSERT(mc, mc->top > 0);
if (left_room > room_threshold && left_room >= right_room && (is_modifable(mc->txn, left) || involve)) { const bool consider_left = left && (involve || is_modifable(mc->txn, left));
const bool consider_right = right && (involve || is_modifable(mc->txn, right));
if (consider_left && left_room > room_threshold && left_room >= right_room) {
/* try merge with left */ /* try merge with left */
cASSERT(mc, left_nkeys >= minkeys); cASSERT(mc, left_nkeys >= minkeys);
mn->pg[mn->top] = left; mn->pg[mn->top] = left;
@ -825,7 +850,7 @@ retry:
return rc; return rc;
} }
} }
if (right_room > room_threshold && (is_modifable(mc->txn, right) || involve)) { if (consider_right && right_room > room_threshold) {
/* try merge with right */ /* try merge with right */
cASSERT(mc, right_nkeys >= minkeys); cASSERT(mc, right_nkeys >= minkeys);
mn->pg[mn->top] = right; mn->pg[mn->top] = right;
@ -843,8 +868,7 @@ retry:
} }
} }
if (left_nkeys > minkeys && (right_nkeys <= left_nkeys || right_room >= left_room) && if (consider_left && left_nkeys > minkeys && (right_nkeys <= left_nkeys || right_room >= left_room)) {
(is_modifable(mc->txn, left) || involve)) {
/* try move from left */ /* try move from left */
mn->pg[mn->top] = left; mn->pg[mn->top] = left;
mn->ki[mn->top - 1] = (indx_t)(ki_pre_top - 1); mn->ki[mn->top - 1] = (indx_t)(ki_pre_top - 1);
@ -860,7 +884,7 @@ retry:
return rc; return rc;
} }
} }
if (right_nkeys > minkeys && (is_modifable(mc->txn, right) || involve)) { if (consider_right && right_nkeys > minkeys) {
/* try move from right */ /* try move from right */
mn->pg[mn->top] = right; mn->pg[mn->top] = right;
mn->ki[mn->top - 1] = (indx_t)(ki_pre_top + 1); mn->ki[mn->top - 1] = (indx_t)(ki_pre_top + 1);
@ -884,17 +908,20 @@ retry:
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
if (mc->txn->env->options.prefer_waf_insteadof_balance && likely(room_threshold > 0)) { if (minimize_waf && room_threshold > 0) {
/* If minimize_waf is enabled, proceed to attempting merges with heavily
* filled pages before involving clean pages (not modified in this transaction) */
room_threshold = 0; room_threshold = 0;
goto retry; goto retry;
} }
if (likely(!involve) && if (!involve) {
(likely(mc->tree != &mc->txn->dbs[FREE_DBI]) || mc->txn->tw.loose_pages || MDBX_PNL_GETSIZE(mc->txn->tw.repnl) || /* Now we allow involving clean pages (not modified in this transaction),
(mc->flags & z_gcu_preparation) || (mc->txn->flags & txn_gc_drained) || room_threshold)) { * which improves the tree balance, but increases WAF. */
involve = true; involve = true;
goto retry; goto retry;
} }
if (likely(room_threshold > 0)) { if (room_threshold > 0) {
/* If no suitable neighbour was found, then allow merging with heavily filled pages */
room_threshold = 0; room_threshold = 0;
goto retry; goto retry;
} }
@ -905,8 +932,17 @@ retry:
return MDBX_PROBLEM; return MDBX_PROBLEM;
} }
static int do_page_split(MDBX_cursor *mc, const MDBX_val *const newkey, MDBX_val *const newdata, pgno_t newpgno,
const unsigned naf);
int page_split(MDBX_cursor *mc, const MDBX_val *const newkey, MDBX_val *const newdata, pgno_t newpgno, int page_split(MDBX_cursor *mc, const MDBX_val *const newkey, MDBX_val *const newdata, pgno_t newpgno,
const unsigned naf) { const unsigned naf) {
int rc = do_page_split(mc, newkey, newdata, newpgno, naf);
return rc;
}
int do_page_split(MDBX_cursor *mc, const MDBX_val *const newkey, MDBX_val *const newdata, pgno_t newpgno,
const unsigned naf) {
unsigned flags; unsigned flags;
int rc = MDBX_SUCCESS, foliage = 0; int rc = MDBX_SUCCESS, foliage = 0;
MDBX_env *const env = mc->txn->env; MDBX_env *const env = mc->txn->env;
@ -1228,6 +1264,7 @@ int page_split(MDBX_cursor *mc, const MDBX_val *const newkey, MDBX_val *const ne
/* root split? */ /* root split? */
prev_top += mc->top - top; prev_top += mc->top - top;
cASSERT(mn, prev_top <= mn->top && prev_top <= mc->top);
/* Right page might now have changed parent. /* Right page might now have changed parent.
* Check if left page also changed parent. */ * Check if left page also changed parent. */
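The rewritten recalculate_merge_thresholds() above works in 16.16 fixed point: the configured percentage is multiplied in and shifted right by 16, and the threshold is what remains of the page space (the new eASSERT bounds it between half and 63/64 of the page). A standalone arithmetic sketch, with an illustrative page-space figure:

#include <stdint.h>
#include <stdio.h>

static unsigned merge_threshold_sketch(unsigned page_space, unsigned percent_16dot16) {
  /* threshold = page_space - page_space * fraction, in bytes of free room */
  return page_space - (unsigned)(((uint64_t)page_space * percent_16dot16) >> 16);
}

int main(void) {
  const unsigned page_space = 4076; /* e.g. a 4096-byte page minus a 20-byte header */
  const unsigned pct25 = 65536 / 4; /* 25% in 16.16 fixed point */
  const unsigned pct50 = 65536 / 2; /* 50%, the largest fraction the assert above allows */
  printf("25%%: merge candidates have room > %u of %u bytes\n", merge_threshold_sketch(page_space, pct25), page_space);
  printf("50%%: merge candidates have room > %u of %u bytes\n", merge_threshold_sketch(page_space, pct50), page_space);
  return 0;
}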


@ -38,8 +38,8 @@ void txl_free(txl_t txl) {
} }
static int txl_reserve(txl_t __restrict *__restrict ptxl, const size_t wanna) { static int txl_reserve(txl_t __restrict *__restrict ptxl, const size_t wanna) {
const size_t allocated = (size_t)MDBX_PNL_ALLOCLEN(*ptxl); const size_t allocated = txl_alloclen(*ptxl);
assert(MDBX_PNL_GETSIZE(*ptxl) <= txl_max && MDBX_PNL_ALLOCLEN(*ptxl) >= MDBX_PNL_GETSIZE(*ptxl)); assert(txl_size(*ptxl) <= txl_max && txl_alloclen(*ptxl) >= txl_size(*ptxl));
if (likely(allocated >= wanna)) if (likely(allocated >= wanna))
return MDBX_SUCCESS; return MDBX_SUCCESS;
@ -63,35 +63,35 @@ static int txl_reserve(txl_t __restrict *__restrict ptxl, const size_t wanna) {
return MDBX_ENOMEM; return MDBX_ENOMEM;
} }
static __always_inline int __must_check_result txl_need(txl_t __restrict *__restrict ptxl, size_t num) { static inline int __must_check_result txl_need(txl_t __restrict *__restrict ptxl, size_t num) {
assert(MDBX_PNL_GETSIZE(*ptxl) <= txl_max && MDBX_PNL_ALLOCLEN(*ptxl) >= MDBX_PNL_GETSIZE(*ptxl)); assert(txl_size(*ptxl) <= txl_max && txl_alloclen(*ptxl) >= txl_size(*ptxl));
assert(num <= PAGELIST_LIMIT); assert(num <= PAGELIST_LIMIT);
const size_t wanna = (size_t)MDBX_PNL_GETSIZE(*ptxl) + num; const size_t wanna = txl_size(*ptxl) + num;
return likely(MDBX_PNL_ALLOCLEN(*ptxl) >= wanna) ? MDBX_SUCCESS : txl_reserve(ptxl, wanna); return likely(txl_alloclen(*ptxl) >= wanna) ? MDBX_SUCCESS : txl_reserve(ptxl, wanna);
} }
static __always_inline void txl_xappend(txl_t __restrict txl, txnid_t id) { static inline void txl_append_prereserved(txl_t __restrict txl, txnid_t id) {
assert(MDBX_PNL_GETSIZE(txl) < MDBX_PNL_ALLOCLEN(txl)); assert(txl_size(txl) < txl_alloclen(txl));
txl[0] += 1; size_t end = txl[0] += 1;
MDBX_PNL_LAST(txl) = id; txl[end] = id;
} }
#define TXNID_SORT_CMP(first, last) ((first) > (last)) #define TXNID_SORT_CMP(first, last) ((first) > (last))
SORT_IMPL(txnid_sort, false, txnid_t, TXNID_SORT_CMP) SORT_IMPL(txnid_sort, false, txnid_t, TXNID_SORT_CMP)
void txl_sort(txl_t txl) { txnid_sort(MDBX_PNL_BEGIN(txl), MDBX_PNL_END(txl)); } void txl_sort(txl_t txl) { txnid_sort(txl + 1, txl + txl_size(txl) + 1); }
int __must_check_result txl_append(txl_t __restrict *ptxl, txnid_t id) { int __must_check_result txl_append(txl_t __restrict *ptxl, txnid_t id) {
if (unlikely(MDBX_PNL_GETSIZE(*ptxl) == MDBX_PNL_ALLOCLEN(*ptxl))) { if (unlikely(txl_size(*ptxl) == txl_alloclen(*ptxl))) {
int rc = txl_need(ptxl, txl_granulate); int rc = txl_need(ptxl, txl_granulate);
if (unlikely(rc != MDBX_SUCCESS)) if (unlikely(rc != MDBX_SUCCESS))
return rc; return rc;
} }
txl_xappend(*ptxl, id); txl_append_prereserved(*ptxl, id);
return MDBX_SUCCESS; return MDBX_SUCCESS;
} }
__hot bool txl_contain(const txl_t txl, txnid_t id) { __hot bool txl_contain(const txl_t txl, txnid_t id) {
const size_t len = MDBX_PNL_GETSIZE(txl); const size_t len = txl_size(txl);
for (size_t i = 1; i <= len; ++i) for (size_t i = 1; i <= len; ++i)
if (txl[i] == id) if (txl[i] == id)
return true; return true;


@ -15,12 +15,16 @@ enum txl_rules {
txl_max = (1u << 26) - 2 - MDBX_ASSUME_MALLOC_OVERHEAD / sizeof(txnid_t) txl_max = (1u << 26) - 2 - MDBX_ASSUME_MALLOC_OVERHEAD / sizeof(txnid_t)
}; };
MDBX_INTERNAL txl_t txl_alloc(void); MDBX_MAYBE_UNUSED MDBX_INTERNAL txl_t txl_alloc(void);
MDBX_INTERNAL void txl_free(txl_t txl); MDBX_MAYBE_UNUSED MDBX_INTERNAL void txl_free(txl_t txl);
MDBX_INTERNAL int __must_check_result txl_append(txl_t __restrict *ptxl, txnid_t id); MDBX_MAYBE_UNUSED MDBX_INTERNAL int __must_check_result txl_append(txl_t __restrict *ptxl, txnid_t id);
MDBX_INTERNAL void txl_sort(txl_t txl); MDBX_MAYBE_UNUSED MDBX_INTERNAL void txl_sort(txl_t txl);
MDBX_INTERNAL bool txl_contain(const txl_t txl, txnid_t id); MDBX_MAYBE_UNUSED MDBX_INTERNAL bool txl_contain(const txl_t txl, txnid_t id);
static inline size_t txl_alloclen(const_txl_t txl) { return txl[-1]; }
static inline size_t txl_size(const_txl_t txl) { return txl[0]; }
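The new txl_alloclen()/txl_size() accessors make the list layout explicit: the element at index -1 holds the allocated capacity, index 0 holds the current length, and payload occupies slots 1..length. A standalone sketch of that layout with illustrative names (not the libmdbx allocator):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t txnid_t;
typedef txnid_t *txl_t;

static txl_t sketch_txl_alloc(size_t capacity) {
  txnid_t *base = malloc((capacity + 2) * sizeof(txnid_t));
  if (!base)
    return NULL;
  base[0] = capacity; /* becomes txl[-1] after the +1 shift */
  base[1] = 0;        /* becomes txl[0]: current size */
  return base + 1;
}

static size_t sketch_alloclen(const txnid_t *txl) { return (size_t)txl[-1]; }
static size_t sketch_size(const txnid_t *txl) { return (size_t)txl[0]; }

static void sketch_append_prereserved(txl_t txl, txnid_t id) {
  const size_t end = (size_t)(txl[0] += 1);
  txl[end] = id; /* mirrors txl_append_prereserved() above */
}

int main(void) {
  txl_t txl = sketch_txl_alloc(8);
  if (!txl)
    return 1;
  sketch_append_prereserved(txl, 42);
  sketch_append_prereserved(txl, 7);
  printf("capacity=%zu size=%zu last=%llu\n", sketch_alloclen(txl), sketch_size(txl),
         (unsigned long long)txl[sketch_size(txl)]);
  free(txl - 1);
  return 0;
}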

src/txn-basal.c (new file, 371 lines)

@ -0,0 +1,371 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2015-2025
#include "internals.h"
static int txn_write(MDBX_txn *txn, iov_ctx_t *ctx) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
dpl_t *const dl = dpl_sort(txn);
int rc = MDBX_SUCCESS;
size_t r, w, total_npages = 0;
for (w = 0, r = 1; r <= dl->length; ++r) {
page_t *dp = dl->items[r].ptr;
if (dp->flags & P_LOOSE) {
dl->items[++w] = dl->items[r];
continue;
}
unsigned npages = dpl_npages(dl, r);
total_npages += npages;
rc = iov_page(txn, ctx, dp, npages);
if (unlikely(rc != MDBX_SUCCESS))
return rc;
}
if (!iov_empty(ctx)) {
tASSERT(txn, rc == MDBX_SUCCESS);
rc = iov_write(ctx);
}
if (likely(rc == MDBX_SUCCESS) && ctx->fd == txn->env->lazy_fd) {
txn->env->lck->unsynced_pages.weak += total_npages;
if (!txn->env->lck->eoos_timestamp.weak)
txn->env->lck->eoos_timestamp.weak = osal_monotime();
}
txn->wr.dirtylist->pages_including_loose -= total_npages;
while (r <= dl->length)
dl->items[++w] = dl->items[r++];
dl->sorted = dpl_setlen(dl, w);
txn->wr.dirtyroom += r - 1 - w;
tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
tASSERT(txn, txn->wr.dirtylist->length == txn->wr.loose_count);
tASSERT(txn, txn->wr.dirtylist->pages_including_loose == txn->wr.loose_count);
return rc;
}
__cold MDBX_txn *txn_basal_create(const size_t max_dbi) {
MDBX_txn *txn = nullptr;
const intptr_t bitmap_bytes =
#if MDBX_ENABLE_DBI_SPARSE
ceil_powerof2(max_dbi, CHAR_BIT * sizeof(txn->dbi_sparse[0])) / CHAR_BIT;
#else
0;
#endif /* MDBX_ENABLE_DBI_SPARSE */
const size_t base = sizeof(MDBX_txn) + /* GC cursor */ sizeof(cursor_couple_t);
const size_t size =
base + bitmap_bytes +
max_dbi * (sizeof(txn->dbs[0]) + sizeof(txn->cursors[0]) + sizeof(txn->dbi_seqs[0]) + sizeof(txn->dbi_state[0]));
txn = osal_calloc(1, size);
if (unlikely(!txn))
return txn;
rkl_init(&txn->wr.gc.reclaimed);
rkl_init(&txn->wr.gc.ready4reuse);
rkl_init(&txn->wr.gc.comeback);
txn->dbs = ptr_disp(txn, base);
txn->cursors = ptr_disp(txn->dbs, max_dbi * sizeof(txn->dbs[0]));
txn->dbi_seqs = ptr_disp(txn->cursors, max_dbi * sizeof(txn->cursors[0]));
txn->dbi_state = ptr_disp(txn, size - max_dbi * sizeof(txn->dbi_state[0]));
#if MDBX_ENABLE_DBI_SPARSE
txn->dbi_sparse = ptr_disp(txn->dbi_state, -bitmap_bytes);
#endif /* MDBX_ENABLE_DBI_SPARSE */
txn->flags = MDBX_TXN_FINISHED;
txn->wr.retired_pages = pnl_alloc(MDBX_PNL_INITIAL);
txn->wr.repnl = pnl_alloc(MDBX_PNL_INITIAL);
if (unlikely(!txn->wr.retired_pages || !txn->wr.repnl)) {
txn_basal_destroy(txn);
txn = nullptr;
}
return txn;
}
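txn_basal_create() above carves every per-transaction array out of a single osal_calloc() block, deriving each pointer by a byte offset from the base. A minimal standalone sketch of that single-allocation pattern; the struct, sizes, and field names are illustrative only:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
  unsigned flags;
  void *owner; /* keeps the tail pointer-aligned for the arrays below */
} sketch_txn_t;

int main(void) {
  const size_t max_dbi = 4;
  const size_t base = sizeof(sketch_txn_t);
  const size_t size = base + max_dbi * (sizeof(void *) + sizeof(unsigned));
  char *block = calloc(1, size);
  if (!block)
    return 1;
  sketch_txn_t *txn = (sketch_txn_t *)block;
  void **cursors = (void **)(block + base);                                    /* like txn->cursors */
  unsigned *dbi_state = (unsigned *)(block + base + max_dbi * sizeof(void *)); /* like txn->dbi_state */
  txn->flags = 1;
  cursors[0] = NULL;
  dbi_state[0] = 0;
  printf("one %zu-byte block holds the txn plus %zu per-DBI slots\n", size, max_dbi);
  free(block);
  return 0;
}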
__cold void txn_basal_destroy(MDBX_txn *txn) {
dpl_free(txn);
rkl_destroy(&txn->wr.gc.reclaimed);
rkl_destroy(&txn->wr.gc.ready4reuse);
rkl_destroy(&txn->wr.gc.comeback);
pnl_free(txn->wr.retired_pages);
pnl_free(txn->wr.spilled.list);
pnl_free(txn->wr.repnl);
osal_free(txn);
}
int txn_basal_start(MDBX_txn *txn, unsigned flags) {
MDBX_env *const env = txn->env;
txn->wr.troika = meta_tap(env);
const meta_ptr_t head = meta_recent(env, &txn->wr.troika);
uint64_t timestamp = 0;
/* coverity[array_null] */
while ("workaround for https://libmdbx.dqdkfa.ru/dead-github/issues/269") {
int err = coherency_fetch_head(txn, head, &timestamp);
if (likely(err == MDBX_SUCCESS))
break;
if (unlikely(err != MDBX_RESULT_TRUE))
return err;
}
eASSERT(env, meta_txnid(head.ptr_v) == txn->txnid);
txn->txnid = safe64_txnid_next(txn->txnid);
if (unlikely(txn->txnid > MAX_TXNID)) {
ERROR("txnid overflow, raise %d", MDBX_TXN_FULL);
return MDBX_TXN_FULL;
}
tASSERT(txn, txn->dbs[FREE_DBI].flags == MDBX_INTEGERKEY);
tASSERT(txn, check_table_flags(txn->dbs[MAIN_DBI].flags));
txn->flags = flags;
txn->nested = nullptr;
txn->wr.loose_pages = nullptr;
txn->wr.loose_count = 0;
#if MDBX_ENABLE_REFUND
txn->wr.loose_refund_wl = 0;
#endif /* MDBX_ENABLE_REFUND */
pnl_setsize(txn->wr.retired_pages, 0);
txn->wr.spilled.list = nullptr;
txn->wr.spilled.least_removed = 0;
txn->wr.gc.spent = 0;
tASSERT(txn, rkl_empty(&txn->wr.gc.reclaimed));
tASSERT(txn, rkl_empty(&txn->wr.gc.ready4reuse));
tASSERT(txn, rkl_empty(&txn->wr.gc.comeback));
txn->env->gc.detent = 0;
env->txn = txn;
return MDBX_SUCCESS;
}
int txn_basal_end(MDBX_txn *txn, unsigned mode) {
MDBX_env *const env = txn->env;
tASSERT(txn, (txn->flags & (MDBX_TXN_FINISHED | txn_may_have_cursors)) == 0 && txn->owner);
ENSURE(env, txn->txnid >= /* paranoia is appropriate here */ env->lck->cached_oldest.weak);
dxb_sanitize_tail(env, nullptr);
txn->flags = MDBX_TXN_FINISHED;
env->txn = nullptr;
pnl_free(txn->wr.spilled.list);
txn->wr.spilled.list = nullptr;
rkl_clear_and_shrink(&txn->wr.gc.reclaimed);
rkl_clear_and_shrink(&txn->wr.gc.ready4reuse);
rkl_clear_and_shrink(&txn->wr.gc.comeback);
eASSERT(env, txn->parent == nullptr);
pnl_shrink(&txn->wr.retired_pages);
pnl_shrink(&txn->wr.repnl);
if (!(env->flags & MDBX_WRITEMAP))
dpl_release_shadows(txn);
/* Export or close DBI handles created in this txn */
int err = dbi_update(txn, (mode & TXN_END_UPDATE) != 0);
if (unlikely(err != MDBX_SUCCESS)) {
ERROR("unexpected error %d during export the state of dbi-handles to env", err);
err = MDBX_PROBLEM;
}
/* The writer mutex was locked in mdbx_txn_begin. */
lck_txn_unlock(env);
return err;
}
int txn_basal_commit(MDBX_txn *txn, struct commit_timestamp *ts) {
MDBX_env *const env = txn->env;
tASSERT(txn, txn == env->basal_txn && !txn->parent && !txn->nested);
if (!txn->wr.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
} else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length == env->options.dp_limit);
}
if (txn->flags & txn_may_have_cursors)
txn_done_cursors(txn);
bool need_flush_for_nometasync = false;
const meta_ptr_t head = meta_recent(env, &txn->wr.troika);
const uint32_t meta_sync_txnid = atomic_load32(&env->lck->meta_sync_txnid, mo_Relaxed);
/* sync prev meta */
if (head.is_steady && meta_sync_txnid != (uint32_t)head.txnid) {
/* Fixing a shortcoming inherited from LMDB:
*
* Everything is fine if none of the processes working with the DB use WRITEMAP.
* Then the meta-page (updated but not yet flushed to disk) will be
* persisted by the fdatasync() performed when writing this transaction's data.
*
* Everything is also fine if all processes working with the DB use WRITEMAP
* without MDBX_AVOID_MSYNC.
* Then the meta-page (updated but not yet flushed to disk) will be
* persisted by the msync() performed when writing this transaction's data.
*
* However, if the processes working with the DB use both methods, i.e. sync()
* in MDBX_WRITEMAP mode as well as writes through a file descriptor, then
* it becomes impossible to commit to disk both the previous transaction's
* meta-page and the current transaction's data with a single
* sync operation performed after writing the current transaction's data.
* Accordingly, the meta-page has to be updated explicitly, which completely
* destroys the benefit of NOMETASYNC. */
const uint32_t txnid_dist = ((txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC) ? MDBX_NOMETASYNC_LAZY_FD
: MDBX_NOMETASYNC_LAZY_WRITEMAP;
/* The point of the "magic" is to avoid a separate fdatasync()
* or msync() call for guaranteed on-disk durability of the meta-page
* that was "lazily" submitted for writing by the previous transaction,
* but not flushed to disk because the MDBX_NOMETASYNC mode was active. */
if (
#if defined(_WIN32) || defined(_WIN64)
!env->ioring.overlapped_fd &&
#endif
meta_sync_txnid == (uint32_t)head.txnid - txnid_dist)
need_flush_for_nometasync = true;
else {
int err = meta_sync(env, head);
if (unlikely(err != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "presync-meta", err);
return err;
}
}
}
if ((!txn->wr.dirtylist || txn->wr.dirtylist->length == 0) &&
(txn->flags & (MDBX_TXN_DIRTY | MDBX_TXN_SPILLS | MDBX_TXN_NOSYNC | MDBX_TXN_NOMETASYNC)) == 0 &&
!need_flush_for_nometasync && !head.is_steady && !AUDIT_ENABLED()) {
TXN_FOREACH_DBI_ALL(txn, i) { tASSERT(txn, !(txn->dbi_state[i] & DBI_DIRTY)); }
/* fast completion of pure transaction */
return MDBX_NOSUCCESS_PURE_COMMIT ? MDBX_RESULT_TRUE : MDBX_SUCCESS;
}
DEBUG("committing txn %" PRIaTXN " %p on env %p, root page %" PRIaPGNO "/%" PRIaPGNO, txn->txnid, (void *)txn,
(void *)env, txn->dbs[MAIN_DBI].root, txn->dbs[FREE_DBI].root);
if (txn->n_dbi > CORE_DBS) {
/* Update table root pointers */
cursor_couple_t cx;
int err = cursor_init(&cx.outer, txn, MAIN_DBI);
if (unlikely(err != MDBX_SUCCESS))
return err;
cx.outer.next = txn->cursors[MAIN_DBI];
txn->cursors[MAIN_DBI] = &cx.outer;
TXN_FOREACH_DBI_USER(txn, i) {
if ((txn->dbi_state[i] & DBI_DIRTY) == 0)
continue;
tree_t *const db = &txn->dbs[i];
DEBUG("update main's entry for sub-db %zu, mod_txnid %" PRIaTXN " -> %" PRIaTXN, i, db->mod_txnid, txn->txnid);
/* mod_txnid may be > front after committing nested transactions */
db->mod_txnid = txn->txnid;
MDBX_val data = {db, sizeof(tree_t)};
err = cursor_put(&cx.outer, &env->kvs[i].name, &data, N_TREE);
if (unlikely(err != MDBX_SUCCESS)) {
txn->cursors[MAIN_DBI] = cx.outer.next;
return err;
}
}
txn->cursors[MAIN_DBI] = cx.outer.next;
}
if (ts) {
ts->prep = osal_monotime();
ts->gc_cpu = osal_cputime(nullptr);
}
gcu_t gcu_ctx;
int rc = gc_put_init(txn, &gcu_ctx);
if (likely(rc == MDBX_SUCCESS))
rc = gc_update(txn, &gcu_ctx);
#if MDBX_ENABLE_BIGFOOT
const txnid_t commit_txnid = gcu_ctx.bigfoot;
if (commit_txnid > txn->txnid)
TRACE("use @%" PRIaTXN " (+%zu) for commit bigfoot-txn", commit_txnid, (size_t)(commit_txnid - txn->txnid));
#else
const txnid_t commit_txnid = txn->txnid;
#endif
gc_put_destroy(&gcu_ctx);
if (ts)
ts->gc_cpu = osal_cputime(nullptr) - ts->gc_cpu;
if (unlikely(rc != MDBX_SUCCESS))
return rc;
tASSERT(txn, txn->wr.loose_count == 0);
txn->dbs[FREE_DBI].mod_txnid = (txn->dbi_state[FREE_DBI] & DBI_DIRTY) ? txn->txnid : txn->dbs[FREE_DBI].mod_txnid;
txn->dbs[MAIN_DBI].mod_txnid = (txn->dbi_state[MAIN_DBI] & DBI_DIRTY) ? txn->txnid : txn->dbs[MAIN_DBI].mod_txnid;
if (ts) {
ts->gc = osal_monotime();
ts->audit = ts->gc;
}
if (AUDIT_ENABLED()) {
rc = audit_ex(txn, pnl_size(txn->wr.retired_pages), true);
if (ts)
ts->audit = osal_monotime();
if (unlikely(rc != MDBX_SUCCESS))
return rc;
}
if (txn->wr.dirtylist) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0 || MDBX_AVOID_MSYNC);
tASSERT(txn, txn->wr.loose_count == 0);
mdbx_filehandle_t fd =
#if defined(_WIN32) || defined(_WIN64)
env->ioring.overlapped_fd ? env->ioring.overlapped_fd : env->lazy_fd;
(void)need_flush_for_nometasync;
#else
(need_flush_for_nometasync || env->dsync_fd == INVALID_HANDLE_VALUE ||
txn->wr.dirtylist->length > env->options.writethrough_threshold ||
atomic_load64(&env->lck->unsynced_pages, mo_Relaxed))
? env->lazy_fd
: env->dsync_fd;
#endif /* Windows */
iov_ctx_t write_ctx;
rc = iov_init(txn, &write_ctx, txn->wr.dirtylist->length, txn->wr.dirtylist->pages_including_loose, fd, false);
if (unlikely(rc != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "iov-init", rc);
return rc;
}
rc = txn_write(txn, &write_ctx);
if (unlikely(rc != MDBX_SUCCESS)) {
ERROR("txn-%s: error %d", "write", rc);
return rc;
}
} else {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) != 0 && !MDBX_AVOID_MSYNC);
env->lck->unsynced_pages.weak += txn->wr.writemap_dirty_npages;
if (!env->lck->eoos_timestamp.weak)
env->lck->eoos_timestamp.weak = osal_monotime();
}
/* TODO: use ctx.flush_begin & ctx.flush_end for range-sync */
if (ts)
ts->write = osal_monotime();
meta_t meta;
memcpy(meta.magic_and_version, head.ptr_c->magic_and_version, 8);
meta.reserve16 = head.ptr_c->reserve16;
meta.validator_id = head.ptr_c->validator_id;
meta.extra_pagehdr = head.ptr_c->extra_pagehdr;
unaligned_poke_u64(4, meta.pages_retired,
unaligned_peek_u64(4, head.ptr_c->pages_retired) + pnl_size(txn->wr.retired_pages));
meta.geometry = txn->geo;
meta.trees.gc = txn->dbs[FREE_DBI];
meta.trees.main = txn->dbs[MAIN_DBI];
meta.canary = txn->canary;
memcpy(&meta.dxbid, &head.ptr_c->dxbid, sizeof(meta.dxbid));
meta.unsafe_sign = DATASIGN_NONE;
meta_set_txnid(env, &meta, commit_txnid);
rc = dxb_sync_locked(env, env->flags | txn->flags | txn_shrink_allowed, &meta, &txn->wr.troika);
if (ts)
ts->sync = osal_monotime();
if (unlikely(rc != MDBX_SUCCESS)) {
env->flags |= ENV_FATAL_ERROR;
ERROR("txn-%s: error %d", "sync", rc);
return rc;
}
return MDBX_SUCCESS;
}
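txn_basal_commit() above fills the optional commit_timestamp structure with monotonic checkpoints after each phase (prep, gc, audit, write, sync), so per-phase costs come out as differences. A standalone sketch of that bookkeeping; the field names mirror the diff, while the use of POSIX clock_gettime() as the time source is only an assumption:

#include <stdio.h>
#include <time.h>

static unsigned long long monotime_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (unsigned long long)ts.tv_sec * 1000000000ull + (unsigned long long)ts.tv_nsec;
}

struct commit_ts_sketch {
  unsigned long long prep, gc, audit, write, sync;
};

int main(void) {
  struct commit_ts_sketch ts;
  ts.prep = monotime_ns();  /* after preparing root pages           */
  ts.gc = monotime_ns();    /* after the GC update                  */
  ts.audit = monotime_ns(); /* after the optional audit             */
  ts.write = monotime_ns(); /* after flushing the dirty list        */
  ts.sync = monotime_ns();  /* after the final meta/durability sync */
  printf("gc took %llu ns, write took %llu ns\n", ts.gc - ts.prep, ts.write - ts.audit);
  return 0;
}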

src/txn-nested.c (new file, 599 lines)

@ -0,0 +1,599 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2015-2025
#include "internals.h"
/* Merge pageset of the nested txn into parent */
static void txn_merge(MDBX_txn *const parent, MDBX_txn *const txn, const size_t parent_retired_len) {
tASSERT(txn, (txn->flags & MDBX_WRITEMAP) == 0);
dpl_t *const src = dpl_sort(txn);
/* Remove refunded pages from parent's dirty list */
dpl_t *const dst = dpl_sort(parent);
if (MDBX_ENABLE_REFUND) {
size_t n = dst->length;
while (n && dst->items[n].pgno >= parent->geo.first_unallocated) {
const unsigned npages = dpl_npages(dst, n);
page_shadow_release(txn->env, dst->items[n].ptr, npages);
--n;
}
parent->wr.dirtyroom += dst->sorted - n;
dst->sorted = dpl_setlen(dst, n);
tASSERT(parent, parent->wr.dirtyroom + parent->wr.dirtylist->length ==
(parent->parent ? parent->parent->wr.dirtyroom : parent->env->options.dp_limit));
}
/* Remove reclaimed pages from parent's dirty list */
const pnl_t reclaimed_list = parent->wr.repnl;
dpl_sift(parent, reclaimed_list, false);
/* Move retired pages from parent's dirty & spilled list to reclaimed */
size_t r, w, d, s, l;
for (r = w = parent_retired_len; ++r <= pnl_size(parent->wr.retired_pages);) {
const pgno_t pgno = parent->wr.retired_pages[r];
const size_t di = dpl_exist(parent, pgno);
const size_t si = !di ? spill_search(parent, pgno) : 0;
unsigned npages;
const char *kind;
if (di) {
page_t *dp = dst->items[di].ptr;
tASSERT(parent, (dp->flags & ~(P_LEAF | P_DUPFIX | P_BRANCH | P_LARGE | P_SPILLED)) == 0);
npages = dpl_npages(dst, di);
page_wash(parent, di, dp, npages);
kind = "dirty";
l = 1;
if (unlikely(npages > l)) {
/* An OVERFLOW page may have been reused piecemeal. In that case
* the retired list may contain only the beginning of the sequence,
* while the remainder is scattered across the dirty, spilled and reclaimed lists.
* Therefore we move it to reclaimed while checking for a break in the sequence.
* In any case all fragments will be accounted for and filtered out, i.e. if
* the page was split into parts, the important thing is to remove the dirty entry,
* and every fragment will be accounted for separately. */
/* The list of retired pages is not sorted, but to speed up sorting
* it is appended in accordance with MDBX_PNL_ASCENDING */
#if MDBX_PNL_ASCENDING
const size_t len = pnl_size(parent->wr.retired_pages);
while (r < len && parent->wr.retired_pages[r + 1] == pgno + l) {
++r;
if (++l == npages)
break;
}
#else
while (w > parent_retired_len && parent->wr.retired_pages[w - 1] == pgno + l) {
--w;
if (++l == npages)
break;
}
#endif
}
} else if (unlikely(si)) {
l = npages = 1;
spill_remove(parent, si, 1);
kind = "spilled";
} else {
parent->wr.retired_pages[++w] = pgno;
continue;
}
DEBUG("reclaim retired parent's %u -> %zu %s page %" PRIaPGNO, npages, l, kind, pgno);
int err = pnl_insert_span(&parent->wr.repnl, pgno, l);
ENSURE(txn->env, err == MDBX_SUCCESS);
}
pnl_setsize(parent->wr.retired_pages, w);
/* Filter-out parent spill list */
if (parent->wr.spilled.list && pnl_size(parent->wr.spilled.list) > 0) {
const pnl_t sl = spill_purge(parent);
size_t len = pnl_size(sl);
if (len) {
/* Remove refunded pages from parent's spill list */
if (MDBX_ENABLE_REFUND && MDBX_PNL_MOST(sl) >= (parent->geo.first_unallocated << 1)) {
#if MDBX_PNL_ASCENDING
size_t i = pnl_size(sl);
assert(MDBX_PNL_MOST(sl) == MDBX_PNL_LAST(sl));
do {
if ((sl[i] & 1) == 0)
DEBUG("refund parent's spilled page %" PRIaPGNO, sl[i] >> 1);
i -= 1;
} while (i && sl[i] >= (parent->geo.first_unallocated << 1));
pnl_setsize(sl, i);
#else
assert(MDBX_PNL_MOST(sl) == MDBX_PNL_FIRST(sl));
size_t i = 0;
do {
++i;
if ((sl[i] & 1) == 0)
DEBUG("refund parent's spilled page %" PRIaPGNO, sl[i] >> 1);
} while (i < len && sl[i + 1] >= (parent->geo.first_unallocated << 1));
pnl_setsize(sl, len -= i);
memmove(sl + 1, sl + 1 + i, len * sizeof(sl[0]));
#endif
}
tASSERT(txn, pnl_check_allocated(sl, (size_t)parent->geo.first_unallocated << 1));
/* Remove reclaimed pages from parent's spill list */
s = pnl_size(sl), r = pnl_size(reclaimed_list);
/* Scanning from end to begin */
while (s && r) {
if (sl[s] & 1) {
--s;
continue;
}
const pgno_t spilled_pgno = sl[s] >> 1;
const pgno_t reclaimed_pgno = reclaimed_list[r];
if (reclaimed_pgno != spilled_pgno) {
const bool cmp = MDBX_PNL_ORDERED(spilled_pgno, reclaimed_pgno);
s -= !cmp;
r -= cmp;
} else {
DEBUG("remove reclaimed parent's spilled page %" PRIaPGNO, reclaimed_pgno);
spill_remove(parent, s, 1);
--s;
--r;
}
}
/* Remove anything in our dirty list from parent's spill list */
/* Scanning spill list in descend order */
const intptr_t step = MDBX_PNL_ASCENDING ? -1 : 1;
s = MDBX_PNL_ASCENDING ? pnl_size(sl) : 1;
d = src->length;
while (d && (MDBX_PNL_ASCENDING ? s > 0 : s <= pnl_size(sl))) {
if (sl[s] & 1) {
s += step;
continue;
}
const pgno_t spilled_pgno = sl[s] >> 1;
const pgno_t dirty_pgno_form = src->items[d].pgno;
const unsigned npages = dpl_npages(src, d);
const pgno_t dirty_pgno_to = dirty_pgno_form + npages;
if (dirty_pgno_form > spilled_pgno) {
--d;
continue;
}
if (dirty_pgno_to <= spilled_pgno) {
s += step;
continue;
}
DEBUG("remove dirtied parent's spilled %u page %" PRIaPGNO, npages, dirty_pgno_form);
spill_remove(parent, s, 1);
s += step;
}
/* Squash deleted pagenums if we deleted any */
spill_purge(parent);
}
}
/* Remove anything in our spill list from parent's dirty list */
if (txn->wr.spilled.list) {
tASSERT(txn, pnl_check_allocated(txn->wr.spilled.list, (size_t)parent->geo.first_unallocated << 1));
dpl_sift(parent, txn->wr.spilled.list, true);
tASSERT(parent, parent->wr.dirtyroom + parent->wr.dirtylist->length ==
(parent->parent ? parent->parent->wr.dirtyroom : parent->env->options.dp_limit));
}
/* Find length of merging our dirty list with parent's and release
* filter-out pages */
for (l = 0, d = dst->length, s = src->length; d > 0 && s > 0;) {
page_t *sp = src->items[s].ptr;
tASSERT(parent, (sp->flags & ~(P_LEAF | P_DUPFIX | P_BRANCH | P_LARGE | P_LOOSE | P_SPILLED)) == 0);
const unsigned s_npages = dpl_npages(src, s);
const pgno_t s_pgno = src->items[s].pgno;
page_t *dp = dst->items[d].ptr;
tASSERT(parent, (dp->flags & ~(P_LEAF | P_DUPFIX | P_BRANCH | P_LARGE | P_SPILLED)) == 0);
const unsigned d_npages = dpl_npages(dst, d);
const pgno_t d_pgno = dst->items[d].pgno;
if (d_pgno >= s_pgno + s_npages) {
--d;
++l;
} else if (d_pgno + d_npages <= s_pgno) {
if (sp->flags != P_LOOSE) {
sp->txnid = parent->front_txnid;
sp->flags &= ~P_SPILLED;
}
--s;
++l;
} else {
dst->items[d--].ptr = nullptr;
page_shadow_release(txn->env, dp, d_npages);
}
}
assert(dst->sorted == dst->length);
tASSERT(parent, dst->detent >= l + d + s);
dst->sorted = l + d + s; /* the merged length */
while (s > 0) {
page_t *sp = src->items[s].ptr;
tASSERT(parent, (sp->flags & ~(P_LEAF | P_DUPFIX | P_BRANCH | P_LARGE | P_LOOSE | P_SPILLED)) == 0);
if (sp->flags != P_LOOSE) {
sp->txnid = parent->front_txnid;
sp->flags &= ~P_SPILLED;
}
--s;
}
/* Merge our dirty list into parent's, i.e. merge(dst, src) -> dst */
if (dst->sorted >= dst->length) {
/* from end to begin with dst extending */
for (l = dst->sorted, s = src->length, d = dst->length; s > 0 && d > 0;) {
if (unlikely(l <= d)) {
/* squash to get a gap of free space for merge */
for (r = w = 1; r <= d; ++r)
if (dst->items[r].ptr) {
if (w != r) {
dst->items[w] = dst->items[r];
dst->items[r].ptr = nullptr;
}
++w;
}
VERBOSE("squash to begin for extending-merge %zu -> %zu", d, w - 1);
d = w - 1;
continue;
}
assert(l > d);
if (dst->items[d].ptr) {
dst->items[l--] = (dst->items[d].pgno > src->items[s].pgno) ? dst->items[d--] : src->items[s--];
} else
--d;
}
if (s > 0) {
assert(l == s);
while (d > 0) {
assert(dst->items[d].ptr == nullptr);
--d;
}
do {
assert(l > 0);
dst->items[l--] = src->items[s--];
} while (s > 0);
} else {
assert(l == d);
while (l > 0) {
assert(dst->items[l].ptr != nullptr);
--l;
}
}
} else {
/* from begin to end with shrinking (a lot of new large/overflow pages) */
for (l = s = d = 1; s <= src->length && d <= dst->length;) {
if (unlikely(l >= d)) {
/* squash to get a gap of free space for merge */
for (r = w = dst->length; r >= d; --r)
if (dst->items[r].ptr) {
if (w != r) {
dst->items[w] = dst->items[r];
dst->items[r].ptr = nullptr;
}
--w;
}
VERBOSE("squash to end for shrinking-merge %zu -> %zu", d, w + 1);
d = w + 1;
continue;
}
assert(l < d);
if (dst->items[d].ptr) {
dst->items[l++] = (dst->items[d].pgno < src->items[s].pgno) ? dst->items[d++] : src->items[s++];
} else
++d;
}
if (s <= src->length) {
assert(dst->sorted - l == src->length - s);
while (d <= dst->length) {
assert(dst->items[d].ptr == nullptr);
--d;
}
do {
assert(l <= dst->sorted);
dst->items[l++] = src->items[s++];
} while (s <= src->length);
} else {
assert(dst->sorted - l == dst->length - d);
while (l <= dst->sorted) {
assert(l <= d && d <= dst->length && dst->items[d].ptr);
dst->items[l++] = dst->items[d++];
}
}
}
parent->wr.dirtyroom -= dst->sorted - dst->length;
assert(parent->wr.dirtyroom <= parent->env->options.dp_limit);
dpl_setlen(dst, dst->sorted);
parent->wr.dirtylru = txn->wr.dirtylru;
/* As currently understood, it is cheaper to recount the number of pages
* than to mix extra branches and calculations into the loops above. */
dst->pages_including_loose = 0;
for (r = 1; r <= dst->length; ++r)
dst->pages_including_loose += dpl_npages(dst, r);
tASSERT(parent, dpl_check(parent));
dpl_free(txn);
if (txn->wr.spilled.list) {
if (parent->wr.spilled.list) {
/* Must not fail since space was preserved above. */
pnl_merge(parent->wr.spilled.list, txn->wr.spilled.list);
pnl_free(txn->wr.spilled.list);
} else {
parent->wr.spilled.list = txn->wr.spilled.list;
parent->wr.spilled.least_removed = txn->wr.spilled.least_removed;
}
tASSERT(parent, dpl_check(parent));
}
if (parent->wr.spilled.list) {
assert(pnl_check_allocated(parent->wr.spilled.list, (size_t)parent->geo.first_unallocated << 1));
if (pnl_size(parent->wr.spilled.list))
parent->flags |= MDBX_TXN_SPILLS;
}
}
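The final merge loop in txn_merge() above runs back-to-front whenever the destination list already has enough slack (dst->sorted >= dst->length), which merges two sorted lists in place without a scratch buffer. A standalone sketch of that back-to-front merge on plain integers:

#include <stdio.h>

/* Merge src (sorted, src_len items) into dst (sorted, dst_len items,
 * capacity slots available), writing from the tail to avoid a temporary. */
static void merge_backward(int *dst, int dst_len, int capacity, const int *src, int src_len) {
  int w = capacity - 1, d = dst_len - 1, s = src_len - 1;
  while (s >= 0)
    dst[w--] = (d >= 0 && dst[d] > src[s]) ? dst[d--] : src[s--];
  /* any remaining dst[0..d] elements are already in their final slots */
}

int main(void) {
  int dst[8] = {2, 5, 9}; /* 3 used, room for 6 after the merge */
  const int src[] = {1, 7, 11};
  merge_backward(dst, 3, 6, src, 3);
  for (int i = 0; i < 6; ++i)
    printf("%d ", dst[i]); /* prints: 1 2 5 7 9 11 */
  putchar('\n');
  return 0;
}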
int txn_nested_create(MDBX_txn *parent, const MDBX_txn_flags_t flags) {
if (parent->env->options.spill_parent4child_denominator) {
/* Spill dirty-pages of parent to provide dirtyroom for child txn */
int err =
txn_spill(parent, nullptr, parent->wr.dirtylist->length / parent->env->options.spill_parent4child_denominator);
if (unlikely(err != MDBX_SUCCESS))
return LOG_IFERR(err);
}
tASSERT(parent, audit_ex(parent, 0, false) == 0);
MDBX_txn *const txn = txn_alloc(flags, parent->env);
if (unlikely(!txn))
return LOG_IFERR(MDBX_ENOMEM);
tASSERT(parent, dpl_check(parent));
txn->txnid = parent->txnid;
txn->front_txnid = parent->front_txnid + 1;
txn->canary = parent->canary;
parent->flags |= MDBX_TXN_HAS_CHILD;
parent->nested = txn;
txn->parent = parent;
txn->env->txn = txn;
txn->owner = parent->owner;
txn->wr.troika = parent->wr.troika;
#if MDBX_ENABLE_DBI_SPARSE
txn->dbi_sparse = parent->dbi_sparse;
#endif /* MDBX_ENABLE_DBI_SPARSE */
txn->dbi_seqs = parent->dbi_seqs;
txn->geo = parent->geo;
int err = dpl_alloc(txn);
if (unlikely(err != MDBX_SUCCESS))
return LOG_IFERR(err);
const size_t len = pnl_size(parent->wr.repnl) + parent->wr.loose_count;
txn->wr.repnl = pnl_alloc((len > MDBX_PNL_INITIAL) ? len : MDBX_PNL_INITIAL);
if (unlikely(!txn->wr.repnl))
return LOG_IFERR(MDBX_ENOMEM);
/* Move loose pages to reclaimed list */
if (parent->wr.loose_count) {
do {
page_t *lp = parent->wr.loose_pages;
tASSERT(parent, lp->flags == P_LOOSE);
err = pnl_insert_span(&parent->wr.repnl, lp->pgno, 1);
if (unlikely(err != MDBX_SUCCESS))
return LOG_IFERR(err);
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *));
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
parent->wr.loose_pages = page_next(lp);
/* Remove from dirty list */
page_wash(parent, dpl_exist(parent, lp->pgno), lp, 1);
} while (parent->wr.loose_pages);
parent->wr.loose_count = 0;
#if MDBX_ENABLE_REFUND
parent->wr.loose_refund_wl = 0;
#endif /* MDBX_ENABLE_REFUND */
tASSERT(parent, dpl_check(parent));
}
#if MDBX_ENABLE_REFUND
txn->wr.loose_refund_wl = 0;
#endif /* MDBX_ENABLE_REFUND */
txn->wr.dirtyroom = parent->wr.dirtyroom;
txn->wr.dirtylru = parent->wr.dirtylru;
dpl_sort(parent);
if (parent->wr.spilled.list)
spill_purge(parent);
tASSERT(txn, pnl_alloclen(txn->wr.repnl) >= pnl_size(parent->wr.repnl));
memcpy(txn->wr.repnl, parent->wr.repnl, MDBX_PNL_SIZEOF(parent->wr.repnl));
/* coverity[assignment_where_comparison_intended] */
tASSERT(txn, pnl_check_allocated(txn->wr.repnl, (txn->geo.first_unallocated /* LY: intentional assignment
here, only for assertion */
= parent->geo.first_unallocated) -
MDBX_ENABLE_REFUND));
txn->wr.gc.spent = parent->wr.gc.spent;
rkl_init(&txn->wr.gc.comeback);
err = rkl_copy(&parent->wr.gc.reclaimed, &txn->wr.gc.reclaimed);
if (unlikely(err != MDBX_SUCCESS))
return err;
err = rkl_copy(&parent->wr.gc.ready4reuse, &txn->wr.gc.ready4reuse);
if (unlikely(err != MDBX_SUCCESS))
return err;
txn->wr.retired_pages = parent->wr.retired_pages;
parent->wr.retired_pages = (void *)(intptr_t)pnl_size(parent->wr.retired_pages);
txn->cursors[FREE_DBI] = nullptr;
txn->cursors[MAIN_DBI] = nullptr;
txn->dbi_state[FREE_DBI] = parent->dbi_state[FREE_DBI] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY);
txn->dbi_state[MAIN_DBI] = parent->dbi_state[MAIN_DBI] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY);
memset(txn->dbi_state + CORE_DBS, 0, (txn->n_dbi = parent->n_dbi) - CORE_DBS);
memcpy(txn->dbs, parent->dbs, sizeof(txn->dbs[0]) * CORE_DBS);
tASSERT(parent, parent->wr.dirtyroom + parent->wr.dirtylist->length ==
(parent->parent ? parent->parent->wr.dirtyroom : parent->env->options.dp_limit));
tASSERT(txn, txn->wr.dirtyroom + txn->wr.dirtylist->length ==
(txn->parent ? txn->parent->wr.dirtyroom : txn->env->options.dp_limit));
return txn_shadow_cursors(parent, MAIN_DBI);
}
void txn_nested_abort(MDBX_txn *nested) {
MDBX_txn *const parent = nested->parent;
tASSERT(nested, !(nested->flags & txn_may_have_cursors));
nested->signature = 0;
nested->owner = 0;
tASSERT(nested, rkl_empty(&nested->wr.gc.comeback));
rkl_destroy(&nested->wr.gc.reclaimed);
rkl_destroy(&nested->wr.gc.ready4reuse);
if (nested->wr.retired_pages) {
tASSERT(parent, pnl_size(nested->wr.retired_pages) >= (uintptr_t)parent->wr.retired_pages);
pnl_setsize(nested->wr.retired_pages, (uintptr_t)parent->wr.retired_pages);
parent->wr.retired_pages = nested->wr.retired_pages;
}
tASSERT(parent, dpl_check(parent));
tASSERT(parent, audit_ex(parent, 0, false) == 0);
dpl_release_shadows(nested);
dpl_free(nested);
pnl_free(nested->wr.repnl);
osal_free(nested);
}
int txn_nested_join(MDBX_txn *txn, struct commit_timestamp *ts) {
MDBX_env *const env = txn->env;
MDBX_txn *const parent = txn->parent;
tASSERT(txn, audit_ex(txn, 0, false) == 0);
eASSERT(env, txn != env->basal_txn);
eASSERT(env, parent->signature == txn_signature);
eASSERT(env, parent->nested == txn && (parent->flags & MDBX_TXN_HAS_CHILD) != 0);
eASSERT(env, dpl_check(txn));
if (txn->wr.dirtylist->length == 0 && !(txn->flags & MDBX_TXN_DIRTY) && parent->n_dbi == txn->n_dbi) {
VERBOSE("fast-complete pure nested txn %" PRIaTXN, txn->txnid);
tASSERT(txn, memcmp(&parent->geo, &txn->geo, sizeof(parent->geo)) == 0);
tASSERT(txn, memcmp(&parent->canary, &txn->canary, sizeof(parent->canary)) == 0);
tASSERT(txn, !txn->wr.spilled.list || pnl_size(txn->wr.spilled.list) == 0);
tASSERT(txn, txn->wr.loose_count == 0);
/* Update parent's DBs array */
eASSERT(env, parent->n_dbi == txn->n_dbi);
TXN_FOREACH_DBI_ALL(txn, dbi) {
tASSERT(txn, (txn->dbi_state[dbi] & (DBI_CREAT | DBI_DIRTY)) == 0);
if (txn->dbi_state[dbi] & DBI_FRESH) {
parent->dbs[dbi] = txn->dbs[dbi];
/* preserve parent's status */
const uint8_t state = txn->dbi_state[dbi] | DBI_FRESH;
DEBUG("dbi %zu dbi-state %s 0x%02x -> 0x%02x", dbi, (parent->dbi_state[dbi] != state) ? "update" : "still",
parent->dbi_state[dbi], state);
parent->dbi_state[dbi] = state;
}
}
return txn_end(txn, TXN_END_PURE_COMMIT | TXN_END_SLOT | TXN_END_FREE);
}
/* Preserve space for spill list to avoid parent's state corruption
* if allocation fails. */
const size_t parent_retired_len = (uintptr_t)parent->wr.retired_pages;
tASSERT(txn, parent_retired_len <= pnl_size(txn->wr.retired_pages));
const size_t retired_delta = pnl_size(txn->wr.retired_pages) - parent_retired_len;
if (retired_delta) {
int err = pnl_need(&txn->wr.repnl, retired_delta);
if (unlikely(err != MDBX_SUCCESS))
return err;
}
if (txn->wr.spilled.list) {
if (parent->wr.spilled.list) {
int err = pnl_need(&parent->wr.spilled.list, pnl_size(txn->wr.spilled.list));
if (unlikely(err != MDBX_SUCCESS))
return err;
}
spill_purge(txn);
}
if (unlikely(txn->wr.dirtylist->length + parent->wr.dirtylist->length > parent->wr.dirtylist->detent &&
!dpl_reserve(parent, txn->wr.dirtylist->length + parent->wr.dirtylist->length))) {
return MDBX_ENOMEM;
}
//-------------------------------------------------------------------------
parent->wr.retired_pages = txn->wr.retired_pages;
txn->wr.retired_pages = nullptr;
pnl_free(parent->wr.repnl);
parent->wr.repnl = txn->wr.repnl;
txn->wr.repnl = nullptr;
parent->wr.gc.spent = txn->wr.gc.spent;
rkl_destructive_move(&txn->wr.gc.reclaimed, &parent->wr.gc.reclaimed);
rkl_destructive_move(&txn->wr.gc.ready4reuse, &parent->wr.gc.ready4reuse);
tASSERT(txn, rkl_empty(&txn->wr.gc.comeback));
parent->geo = txn->geo;
parent->canary = txn->canary;
parent->flags |= txn->flags & MDBX_TXN_DIRTY;
/* Move loose pages to parent */
#if MDBX_ENABLE_REFUND
parent->wr.loose_refund_wl = txn->wr.loose_refund_wl;
#endif /* MDBX_ENABLE_REFUND */
parent->wr.loose_count = txn->wr.loose_count;
parent->wr.loose_pages = txn->wr.loose_pages;
if (txn->flags & txn_may_have_cursors)
/* Merge our cursors into parent's and close them */
txn_done_cursors(txn);
/* Update parent's DBs array */
eASSERT(env, parent->n_dbi == txn->n_dbi);
TXN_FOREACH_DBI_ALL(txn, dbi) {
if (txn->dbi_state[dbi] != (parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY))) {
eASSERT(env,
(txn->dbi_state[dbi] & (DBI_CREAT | DBI_FRESH | DBI_DIRTY)) != 0 ||
(txn->dbi_state[dbi] | DBI_STALE) == (parent->dbi_state[dbi] & ~(DBI_FRESH | DBI_CREAT | DBI_DIRTY)));
parent->dbs[dbi] = txn->dbs[dbi];
/* preserve parent's status */
const uint8_t state = txn->dbi_state[dbi] | (parent->dbi_state[dbi] & (DBI_CREAT | DBI_FRESH | DBI_DIRTY));
DEBUG("dbi %zu dbi-state %s 0x%02x -> 0x%02x", dbi, (parent->dbi_state[dbi] != state) ? "update" : "still",
parent->dbi_state[dbi], state);
parent->dbi_state[dbi] = state;
}
}
if (ts) {
ts->prep = osal_monotime();
ts->gc = /* no gc-update */ ts->prep;
ts->audit = /* no audit */ ts->gc;
ts->write = /* no write */ ts->audit;
ts->sync = /* no sync */ ts->write;
}
txn_merge(parent, txn, parent_retired_len);
tASSERT(parent, parent->flags & MDBX_TXN_HAS_CHILD);
parent->flags -= MDBX_TXN_HAS_CHILD;
env->txn = parent;
parent->nested = nullptr;
tASSERT(parent, dpl_check(parent));
#if MDBX_ENABLE_REFUND
txn_refund(parent);
if (ASSERT_ENABLED()) {
/* Check parent's loose pages not suitable for refund */
for (page_t *lp = parent->wr.loose_pages; lp; lp = page_next(lp)) {
tASSERT(parent, lp->pgno < parent->wr.loose_refund_wl && lp->pgno + 1 < parent->geo.first_unallocated);
MDBX_ASAN_UNPOISON_MEMORY_REGION(&page_next(lp), sizeof(page_t *));
VALGRIND_MAKE_MEM_DEFINED(&page_next(lp), sizeof(page_t *));
}
/* Check parent's reclaimed pages not suitable for refund */
if (pnl_size(parent->wr.repnl))
tASSERT(parent, MDBX_PNL_MOST(parent->wr.repnl) + 1 < parent->geo.first_unallocated);
}
#endif /* MDBX_ENABLE_REFUND */
txn->signature = 0;
osal_free(txn);
tASSERT(parent, audit_ex(parent, 0, false) == 0);
return MDBX_SUCCESS;
}

src/txn-ro.c (new file, 289 lines)

@ -0,0 +1,289 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2015-2025
#include "internals.h"
static inline int txn_ro_rslot(MDBX_txn *txn) {
reader_slot_t *slot = txn->ro.slot;
STATIC_ASSERT(sizeof(uintptr_t) <= sizeof(slot->tid));
if (likely(slot)) {
if (likely(slot->pid.weak == txn->env->pid && slot->txnid.weak >= SAFE64_INVALID_THRESHOLD)) {
tASSERT(txn, slot->pid.weak == osal_getpid());
tASSERT(txn, slot->tid.weak == ((txn->env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self()));
return MDBX_SUCCESS;
}
return MDBX_BAD_RSLOT;
}
if (unlikely(!txn->env->lck_mmap.lck))
return MDBX_SUCCESS;
MDBX_env *const env = txn->env;
if (env->flags & ENV_TXKEY) {
eASSERT(env, !(env->flags & MDBX_NOSTICKYTHREADS));
slot = thread_rthc_get(env->me_txkey);
if (likely(slot)) {
if (likely(slot->pid.weak == env->pid && slot->txnid.weak >= SAFE64_INVALID_THRESHOLD)) {
tASSERT(txn, slot->pid.weak == osal_getpid());
tASSERT(txn, slot->tid.weak == ((env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self()));
txn->ro.slot = slot;
return MDBX_SUCCESS;
}
if (unlikely(slot->pid.weak) || !(globals.runtime_flags & MDBX_DBG_LEGACY_MULTIOPEN))
return MDBX_BAD_RSLOT;
thread_rthc_set(env->me_txkey, nullptr);
}
} else {
eASSERT(env, (env->flags & MDBX_NOSTICKYTHREADS));
}
bsr_t brs = mvcc_bind_slot(env);
if (likely(brs.err == MDBX_SUCCESS)) {
tASSERT(txn, brs.slot->pid.weak == osal_getpid());
tASSERT(txn, brs.slot->tid.weak == ((env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self()));
}
txn->ro.slot = brs.slot;
return brs.err;
}
static inline int txn_ro_seize(MDBX_txn *txn) {
/* Seek & fetch the last meta */
troika_t troika = meta_tap(txn->env);
uint64_t timestamp = 0;
size_t loop = 0;
do {
MDBX_env *const env = txn->env;
const meta_ptr_t head = likely(env->stuck_meta < 0) ? /* regular */ meta_recent(env, &troika)
: /* recovery mode */ meta_ptr(env, env->stuck_meta);
reader_slot_t *const r = txn->ro.slot;
if (likely(r != nullptr)) {
safe64_reset(&r->txnid, true);
atomic_store32(&r->snapshot_pages_used, head.ptr_v->geometry.first_unallocated, mo_Relaxed);
atomic_store64(&r->snapshot_pages_retired, unaligned_peek_u64_volatile(4, head.ptr_v->pages_retired), mo_Relaxed);
safe64_write(&r->txnid, head.txnid);
eASSERT(env, r->pid.weak == osal_getpid());
eASSERT(env, r->tid.weak == ((env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self()));
eASSERT(env, r->txnid.weak == head.txnid ||
(r->txnid.weak >= SAFE64_INVALID_THRESHOLD && head.txnid < env->lck->cached_oldest.weak));
atomic_store32(&env->lck->rdt_refresh_flag, true, mo_AcquireRelease);
} else {
/* exclusive mode without lck */
eASSERT(env, !env->lck_mmap.lck && env->lck == lckless_stub(env));
}
jitter4testing(true);
if (unlikely(meta_should_retry(env, &troika))) {
timestamp = 0;
continue;
}
/* Snap the state from current meta-head */
int err = coherency_fetch_head(txn, head, &timestamp);
jitter4testing(false);
if (unlikely(err != MDBX_SUCCESS)) {
if (err != MDBX_RESULT_TRUE)
return err;
continue;
}
const uint64_t snap_oldest = atomic_load64(&env->lck->cached_oldest, mo_AcquireRelease);
if (unlikely(txn->txnid < snap_oldest)) {
if (env->stuck_meta >= 0) {
ERROR("target meta-page %i is referenced to an obsolete MVCC-snapshot "
"%" PRIaTXN " < cached-oldest %" PRIaTXN,
env->stuck_meta, txn->txnid, snap_oldest);
return MDBX_MVCC_RETARDED;
}
continue;
}
if (!r || likely(txn->txnid == atomic_load64(&r->txnid, mo_Relaxed)))
return MDBX_SUCCESS;
} while (likely(++loop < 42));
ERROR("bailout waiting for valid snapshot (%s)", "meta-pages are too volatile");
return MDBX_PROBLEM;
}
int txn_ro_start(MDBX_txn *txn, unsigned flags) {
MDBX_env *const env = txn->env;
eASSERT(env, flags & MDBX_TXN_RDONLY);
eASSERT(env, (flags & ~(txn_ro_begin_flags | MDBX_WRITEMAP | MDBX_NOSTICKYTHREADS)) == 0);
txn->flags = flags;
int err = txn_ro_rslot(txn);
if (unlikely(err != MDBX_SUCCESS))
goto bailout;
STATIC_ASSERT(MDBX_TXN_RDONLY_PREPARE > MDBX_TXN_RDONLY);
reader_slot_t *r = txn->ro.slot;
if (flags & (MDBX_TXN_RDONLY_PREPARE - MDBX_TXN_RDONLY)) {
eASSERT(env, txn->txnid == 0);
eASSERT(env, txn->owner == 0);
eASSERT(env, txn->n_dbi == 0);
if (likely(r)) {
eASSERT(env, r->snapshot_pages_used.weak == 0);
eASSERT(env, r->txnid.weak >= SAFE64_INVALID_THRESHOLD);
atomic_store32(&r->snapshot_pages_used, 0, mo_Relaxed);
}
txn->flags = MDBX_TXN_RDONLY | MDBX_TXN_FINISHED;
return MDBX_SUCCESS;
}
txn->owner = likely(r) ? (uintptr_t)r->tid.weak : ((env->flags & MDBX_NOSTICKYTHREADS) ? 0 : osal_thread_self());
if ((env->flags & MDBX_NOSTICKYTHREADS) == 0 && env->txn && unlikely(env->basal_txn->owner == txn->owner) &&
(globals.runtime_flags & MDBX_DBG_LEGACY_OVERLAP) == 0) {
err = MDBX_TXN_OVERLAPPING;
goto bailout;
}
err = txn_ro_seize(txn);
if (unlikely(err != MDBX_SUCCESS))
goto bailout;
if (unlikely(txn->txnid < MIN_TXNID || txn->txnid > MAX_TXNID)) {
ERROR("%s", "environment corrupted by died writer, must shutdown!");
err = MDBX_CORRUPTED;
goto bailout;
}
return MDBX_SUCCESS;
bailout:
tASSERT(txn, err != MDBX_SUCCESS);
txn->txnid = INVALID_TXNID;
if (likely(txn->ro.slot))
safe64_reset(&txn->ro.slot->txnid, true);
return err;
}
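/* Finishes or resets a read-only transaction: sanity-checks the reader slot,
 * releases the held snapshot and, depending on `mode`, keeps the slot for
 * reuse, clears it, or frees the transaction object. */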
int txn_ro_end(MDBX_txn *txn, unsigned mode) {
MDBX_env *const env = txn->env;
tASSERT(txn, (txn->flags & txn_may_have_cursors) == 0);
txn->n_dbi = 0; /* prevent further DBI activity */
if (txn->ro.slot) {
reader_slot_t *slot = txn->ro.slot;
if (unlikely(!env->lck))
txn->ro.slot = nullptr;
else {
eASSERT(env, slot->pid.weak == env->pid);
if (likely((txn->flags & MDBX_TXN_FINISHED) == 0)) {
if (likely((txn->flags & MDBX_TXN_PARKED) == 0)) {
ENSURE(env, txn->txnid >=
/* paranoia is appropriate here */ env->lck->cached_oldest.weak);
eASSERT(env, txn->txnid == slot->txnid.weak && slot->txnid.weak >= env->lck->cached_oldest.weak);
} else {
if ((mode & TXN_END_OPMASK) != TXN_END_OUSTED && safe64_read(&slot->tid) == MDBX_TID_TXN_OUSTED)
mode = (mode & ~TXN_END_OPMASK) | TXN_END_OUSTED;
do {
safe64_reset(&slot->txnid, false);
atomic_store64(&slot->tid, txn->owner, mo_AcquireRelease);
atomic_yield();
} while (
unlikely(safe64_read(&slot->txnid) < SAFE64_INVALID_THRESHOLD || safe64_read(&slot->tid) != txn->owner));
}
dxb_sanitize_tail(env, nullptr);
atomic_store32(&slot->snapshot_pages_used, 0, mo_Relaxed);
safe64_reset(&slot->txnid, true);
atomic_store32(&env->lck->rdt_refresh_flag, true, mo_Relaxed);
} else {
eASSERT(env, slot->pid.weak == env->pid);
eASSERT(env, slot->txnid.weak >= SAFE64_INVALID_THRESHOLD);
}
if (mode & TXN_END_SLOT) {
if ((env->flags & ENV_TXKEY) == 0)
atomic_store32(&slot->pid, 0, mo_Relaxed);
txn->ro.slot = nullptr;
}
}
}
#if defined(_WIN32) || defined(_WIN64)
if (txn->flags & txn_shrink_allowed)
imports.srwl_ReleaseShared(&env->remap_guard);
#endif
txn->flags = ((mode & TXN_END_OPMASK) != TXN_END_OUSTED) ? MDBX_TXN_RDONLY | MDBX_TXN_FINISHED
: MDBX_TXN_RDONLY | MDBX_TXN_FINISHED | MDBX_TXN_OUSTED;
txn->owner = 0;
if (mode & TXN_END_FREE) {
txn->signature = 0;
osal_free(txn);
}
return MDBX_SUCCESS;
}
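/* Parks a read-only transaction: the MVCC snapshot is released so the reader
 * no longer restrains garbage collection, while the slot is kept so that the
 * transaction can later be unparked or be ousted. */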
int txn_ro_park(MDBX_txn *txn, bool autounpark) {
reader_slot_t *const rslot = txn->ro.slot;
tASSERT(txn, (txn->flags & (MDBX_TXN_FINISHED | MDBX_TXN_RDONLY | MDBX_TXN_PARKED)) == MDBX_TXN_RDONLY);
tASSERT(txn, txn->ro.slot->tid.weak < MDBX_TID_TXN_OUSTED);
if (unlikely((txn->flags & (MDBX_TXN_FINISHED | MDBX_TXN_RDONLY | MDBX_TXN_PARKED)) != MDBX_TXN_RDONLY))
return MDBX_BAD_TXN;
const uint32_t pid = atomic_load32(&rslot->pid, mo_Relaxed);
const uint64_t tid = atomic_load64(&rslot->tid, mo_Relaxed);
const uint64_t txnid = atomic_load64(&rslot->txnid, mo_Relaxed);
if (unlikely(pid != txn->env->pid)) {
ERROR("unexpected pid %u%s%u", pid, " != must ", txn->env->pid);
return MDBX_PROBLEM;
}
if (unlikely(tid != txn->owner || txnid != txn->txnid)) {
ERROR("unexpected thread-id 0x%" PRIx64 "%s0x%0zx"
" and/or txn-id %" PRIaTXN "%s%" PRIaTXN,
tid, " != must ", txn->owner, txnid, " != must ", txn->txnid);
return MDBX_BAD_RSLOT;
}
atomic_store64(&rslot->tid, MDBX_TID_TXN_PARKED, mo_AcquireRelease);
atomic_store32(&txn->env->lck->rdt_refresh_flag, true, mo_Relaxed);
txn->flags += autounpark ? MDBX_TXN_PARKED | MDBX_TXN_AUTOUNPARK : MDBX_TXN_PARKED;
return MDBX_SUCCESS;
}
int txn_ro_unpark(MDBX_txn *txn) {
if (unlikely((txn->flags & (MDBX_TXN_FINISHED | MDBX_TXN_HAS_CHILD | MDBX_TXN_RDONLY | MDBX_TXN_PARKED)) !=
(MDBX_TXN_RDONLY | MDBX_TXN_PARKED)))
return MDBX_BAD_TXN;
for (reader_slot_t *const rslot = txn->ro.slot; rslot; atomic_yield()) {
const uint32_t pid = atomic_load32(&rslot->pid, mo_Relaxed);
uint64_t tid = safe64_read(&rslot->tid);
uint64_t txnid = safe64_read(&rslot->txnid);
if (unlikely(pid != txn->env->pid)) {
ERROR("unexpected pid %u%s%u", pid, " != expected ", txn->env->pid);
return MDBX_PROBLEM;
}
if (unlikely(tid == MDBX_TID_TXN_OUSTED || txnid >= SAFE64_INVALID_THRESHOLD))
break;
if (unlikely(tid != MDBX_TID_TXN_PARKED || txnid != txn->txnid)) {
ERROR("unexpected thread-id 0x%" PRIx64 "%s0x%" PRIx64 " and/or txn-id %" PRIaTXN "%s%" PRIaTXN, tid, " != must ",
MDBX_TID_TXN_OUSTED, txnid, " != must ", txn->txnid);
break;
}
if (unlikely((txn->flags & MDBX_TXN_ERROR)))
break;
#if MDBX_64BIT_CAS
if (unlikely(!atomic_cas64(&rslot->tid, MDBX_TID_TXN_PARKED, txn->owner)))
continue;
#else
atomic_store32(&rslot->tid.high, (uint32_t)((uint64_t)txn->owner >> 32), mo_Relaxed);
if (unlikely(!atomic_cas32(&rslot->tid.low, (uint32_t)MDBX_TID_TXN_PARKED, (uint32_t)txn->owner))) {
atomic_store32(&rslot->tid.high, (uint32_t)(MDBX_TID_TXN_PARKED >> 32), mo_AcquireRelease);
continue;
}
#endif
txnid = safe64_read(&rslot->txnid);
tid = safe64_read(&rslot->tid);
if (unlikely(txnid != txn->txnid || tid != txn->owner)) {
ERROR("unexpected thread-id 0x%" PRIx64 "%s0x%zx"
" and/or txn-id %" PRIaTXN "%s%" PRIaTXN,
tid, " != must ", txn->owner, txnid, " != must ", txn->txnid);
break;
}
txn->flags &= ~(MDBX_TXN_PARKED | MDBX_TXN_AUTOUNPARK);
return MDBX_SUCCESS;
}
int err = txn_end(txn, TXN_END_OUSTED | TXN_END_RESET | TXN_END_UPDATE);
return err ? err : MDBX_OUSTED;
}

src/txn.c (1015 lines changed; diff suppressed because it is too large)

@@ -3,6 +3,17 @@
 #include "internals.h"

+MDBX_NOTHROW_CONST_FUNCTION MDBX_MAYBE_UNUSED MDBX_INTERNAL unsigned ceil_log2n(size_t value_uintptr) {
+  assert(value_uintptr > 0 && value_uintptr < INT32_MAX);
+  value_uintptr -= 1;
+  value_uintptr |= value_uintptr >> 1;
+  value_uintptr |= value_uintptr >> 2;
+  value_uintptr |= value_uintptr >> 4;
+  value_uintptr |= value_uintptr >> 8;
+  value_uintptr |= value_uintptr >> 16;
+  return log2n_powerof2(value_uintptr + 1);
+}
+
 MDBX_MAYBE_UNUSED MDBX_NOTHROW_CONST_FUNCTION MDBX_INTERNAL unsigned log2n_powerof2(size_t value_uintptr) {
   assert(value_uintptr > 0 && value_uintptr < INT32_MAX && is_powerof2(value_uintptr));
   assert((value_uintptr & -(intptr_t)value_uintptr) == value_uintptr);

@@ -58,6 +58,8 @@ MDBX_NOTHROW_CONST_FUNCTION MDBX_MAYBE_UNUSED static inline size_t ceil_powerof2
 MDBX_NOTHROW_CONST_FUNCTION MDBX_MAYBE_UNUSED MDBX_INTERNAL unsigned log2n_powerof2(size_t value_uintptr);
+MDBX_NOTHROW_CONST_FUNCTION MDBX_MAYBE_UNUSED MDBX_INTERNAL unsigned ceil_log2n(size_t value_uintptr);
+
 MDBX_NOTHROW_CONST_FUNCTION MDBX_INTERNAL uint64_t rrxmrrxmsx_0(uint64_t v);

 struct monotime_cache {

@@ -3,11 +3,16 @@
 #include "internals.h"

-#if MDBX_VERSION_MAJOR != ${MDBX_VERSION_MAJOR} || MDBX_VERSION_MINOR != ${MDBX_VERSION_MINOR}
+#if !defined(MDBX_VERSION_UNSTABLE) && \
+    (MDBX_VERSION_MAJOR != ${MDBX_VERSION_MAJOR} || MDBX_VERSION_MINOR != ${MDBX_VERSION_MINOR})
 #error "API version mismatch! Had `git fetch --tags` done?"
 #endif

-static const char sourcery[] = MDBX_STRINGIFY(MDBX_BUILD_SOURCERY);
+static const char sourcery[] =
+#ifdef MDBX_VERSION_UNSTABLE
+    "UNSTABLE@"
+#endif
+    MDBX_STRINGIFY(MDBX_BUILD_SOURCERY);

 __dll_export
 #ifdef __attribute_used__

@@ -3,17 +3,6 @@
 #include "internals.h"

-typedef struct walk_ctx {
-  void *userctx;
-  walk_options_t options;
-  int deep;
-  walk_func *visitor;
-  MDBX_txn *txn;
-  MDBX_cursor *cursor;
-} walk_ctx_t;
-
-__cold static int walk_tbl(walk_ctx_t *ctx, walk_tbl_t *tbl);
-
 static page_type_t walk_page_type(const page_t *mp) {
   if (mp)
     switch (mp->flags & ~P_SPILLED) {
@@ -41,7 +30,8 @@ static page_type_t walk_subpage_type(const page_t *sp) {
 }

 /* Depth-first tree traversal. */
-__cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno, txnid_t parent_txnid) {
+__cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno, txnid_t parent_txnid,
+                            const pgno_t parent_pgno) {
   assert(pgno != P_INVALID);
   page_t *mp = nullptr;
   int err = page_get(ctx->cursor, pgno, &mp, parent_txnid);
@@ -90,7 +80,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
     const size_t pagesize = pgno2bytes(ctx->txn->env, npages);
     const size_t over_unused = pagesize - over_payload - over_header;
     const int rc = ctx->visitor(large_pgno, npages, ctx->userctx, ctx->deep, tbl, pagesize, page_large, err, 1,
-                                over_payload, over_header, over_unused);
+                                over_payload, over_header, over_unused, pgno);
     if (unlikely(rc != MDBX_SUCCESS))
       return (rc == MDBX_RESULT_TRUE) ? MDBX_SUCCESS : rc;
     payload_size += sizeof(pgno_t);
@@ -159,7 +149,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
       }
       const int rc = ctx->visitor(pgno, 0, ctx->userctx, ctx->deep + 1, tbl, node_data_size, subtype, err, nsubkeys,
-                                  subpayload_size, subheader_size, subunused_size + subalign_bytes);
+                                  subpayload_size, subheader_size, subunused_size + subalign_bytes, pgno);
       if (unlikely(rc != MDBX_SUCCESS))
         return (rc == MDBX_RESULT_TRUE) ? MDBX_SUCCESS : rc;
       header_size += subheader_size;
@@ -176,7 +166,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
   }
   const int rc = ctx->visitor(pgno, 1, ctx->userctx, ctx->deep, tbl, ctx->txn->env->ps, type, err, nentries,
-                              payload_size, header_size, unused_size + align_bytes);
+                              payload_size, header_size, unused_size + align_bytes, parent_pgno);
   if (unlikely(rc != MDBX_SUCCESS))
     return (rc == MDBX_RESULT_TRUE) ? MDBX_SUCCESS : rc;
@@ -188,7 +178,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
     if (type == page_branch) {
       assert(err == MDBX_SUCCESS);
       ctx->deep += 1;
-      err = walk_pgno(ctx, tbl, node_pgno(node), mp->txnid);
+      err = walk_pgno(ctx, tbl, node_pgno(node), mp->txnid, pgno);
       ctx->deep -= 1;
       if (unlikely(err != MDBX_SUCCESS)) {
         if (err == MDBX_RESULT_TRUE)
@@ -236,7 +226,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
       ctx->cursor = &ctx->cursor->subcur->cursor;
       ctx->deep += 1;
       tbl->nested = &aligned_db;
-      err = walk_pgno(ctx, tbl, aligned_db.root, mp->txnid);
+      err = walk_pgno(ctx, tbl, aligned_db.root, mp->txnid, 0);
       tbl->nested = nullptr;
       ctx->deep -= 1;
       subcur_t *inner_xcursor = container_of(ctx->cursor, subcur_t, cursor);
@@ -251,7 +241,7 @@ __cold static int walk_pgno(walk_ctx_t *ctx, walk_tbl_t *tbl, const pgno_t pgno,
   return MDBX_SUCCESS;
 }

-__cold static int walk_tbl(walk_ctx_t *ctx, walk_tbl_t *tbl) {
+__cold int walk_tbl(walk_ctx_t *ctx, walk_tbl_t *tbl) {
   tree_t *const db = tbl->internal;
   if (unlikely(db->root == P_INVALID))
     return MDBX_SUCCESS; /* empty db */
@@ -268,7 +258,7 @@ __cold static int walk_tbl(walk_ctx_t *ctx, walk_tbl_t *tbl) {
   couple.outer.next = ctx->cursor;
   couple.outer.top_and_flags = z_disable_tree_search_fastpath;
   ctx->cursor = &couple.outer;
-  rc = walk_pgno(ctx, tbl, db->root, db->mod_txnid ? db->mod_txnid : ctx->txn->txnid);
+  rc = walk_pgno(ctx, tbl, db->root, db->mod_txnid ? db->mod_txnid : ctx->txn->txnid, 0);
   ctx->cursor = couple.outer.next;
   return rc;
 }

@@ -13,8 +13,19 @@ typedef struct walk_tbl {

 typedef int walk_func(const size_t pgno, const unsigned number, void *const ctx, const int deep,
                       const walk_tbl_t *table, const size_t page_size, const page_type_t page_type,
                       const MDBX_error_t err, const size_t nentries, const size_t payload_bytes,
-                      const size_t header_bytes, const size_t unused_bytes);
+                      const size_t header_bytes, const size_t unused_bytes, const size_t parent_pgno);

 typedef enum walk_options { dont_check_keys_ordering = 1 } walk_options_t;

 MDBX_INTERNAL int walk_pages(MDBX_txn *txn, walk_func *visitor, void *user, walk_options_t options);
+
+typedef struct walk_ctx {
+  void *userctx;
+  walk_options_t options;
+  int deep;
+  walk_func *visitor;
+  MDBX_txn *txn;
+  MDBX_cursor *cursor;
+} walk_ctx_t;
+
+MDBX_INTERNAL int walk_tbl(walk_ctx_t *ctx, walk_tbl_t *tbl);

@@ -218,27 +218,24 @@ else()
     set_tests_properties(smoke PROPERTIES TIMEOUT 600 RUN_SERIAL OFF)
     if(MDBX_BUILD_TOOLS)
       add_test(NAME smoke_chk COMMAND ${MDBX_OUTPUT_DIR}/mdbx_chk -nvv smoke.db)
-      set_tests_properties(
-        smoke_chk
-        PROPERTIES DEPENDS
-                   smoke
-                   TIMEOUT
-                   60
-                   FAIL_REGULAR_EXPRESSION
-                   "cooperative mode"
-                   REQUIRED_FILES
-                   smoke.db)
+      set_tests_properties(smoke_chk PROPERTIES
+                           DEPENDS smoke
+                           TIMEOUT 60
+                           FAIL_REGULAR_EXPRESSION "cooperative mode"
+                           REQUIRED_FILES smoke.db)
       add_test(NAME smoke_chk_copy COMMAND ${MDBX_OUTPUT_DIR}/mdbx_chk -nvv smoke.db-copy)
-      set_tests_properties(
-        smoke_chk_copy
-        PROPERTIES DEPENDS
-                   smoke
-                   TIMEOUT
-                   60
-                   FAIL_REGULAR_EXPRESSION
-                   "cooperative mode"
-                   REQUIRED_FILES
-                   smoke.db-copy)
+      set_tests_properties(smoke_chk_copy PROPERTIES
+                           DEPENDS smoke
+                           TIMEOUT 60
+                           FAIL_REGULAR_EXPRESSION "cooperative mode"
+                           REQUIRED_FILES smoke.db-copy)
+
+      add_test(NAME smoke_copy_asis COMMAND ${MDBX_OUTPUT_DIR}/mdbx_copy -f smoke.db copy_asis.db)
+      set_tests_properties(smoke_copy_asis PROPERTIES DEPENDS smoke TIMEOUT 60 REQUIRED_FILES smoke.db)
+
+      add_test(NAME smoke_copy_compactify COMMAND ${MDBX_OUTPUT_DIR}/mdbx_copy -f -c smoke.db copy_compactify.db)
+      set_tests_properties(smoke_copy_compactify PROPERTIES DEPENDS smoke TIMEOUT 60 REQUIRED_FILES smoke.db)
     endif()

     add_test(
@@ -252,16 +249,11 @@ else()
     set_tests_properties(dupsort_writemap_chk PROPERTIES DEPENDS dupsort_writemap TIMEOUT 60 REQUIRED_FILES
                          dupsort_writemap.db)
     add_test(NAME dupsort_writemap_chk_copy COMMAND ${MDBX_OUTPUT_DIR}/mdbx_chk -nvvc dupsort_writemap.db-copy)
-    set_tests_properties(
-      dupsort_writemap_chk_copy
-      PROPERTIES DEPENDS
-                 dupsort_writemap
-                 TIMEOUT
-                 60
-                 FAIL_REGULAR_EXPRESSION
-                 "monopolistic mode"
-                 REQUIRED_FILES
-                 dupsort_writemap.db-copy)
+    set_tests_properties(dupsort_writemap_chk_copy PROPERTIES
+                         DEPENDS dupsort_writemap
+                         TIMEOUT 60
+                         FAIL_REGULAR_EXPRESSION "monopolistic mode"
+                         REQUIRED_FILES dupsort_writemap.db-copy)
   endif()

   add_test(NAME uniq_nested
@@ -270,27 +262,17 @@ else()
     set_tests_properties(uniq_nested PROPERTIES TIMEOUT 1800 RUN_SERIAL OFF)
     if(MDBX_BUILD_TOOLS)
      add_test(NAME uniq_nested_chk COMMAND ${MDBX_OUTPUT_DIR}/mdbx_chk -nvvw uniq_nested.db)
-      set_tests_properties(
-        uniq_nested_chk
-        PROPERTIES DEPENDS
-                   uniq_nested
-                   TIMEOUT
-                   60
-                   FAIL_REGULAR_EXPRESSION
-                   "cooperative mode"
-                   REQUIRED_FILES
-                   uniq_nested.db)
+      set_tests_properties(uniq_nested_chk PROPERTIES
+                           DEPENDS uniq_nested
+                           TIMEOUT 60
+                           FAIL_REGULAR_EXPRESSION "cooperative mode"
+                           REQUIRED_FILES uniq_nested.db)
       add_test(NAME uniq_nested_chk_copy COMMAND ${MDBX_OUTPUT_DIR}/mdbx_chk -nvv uniq_nested.db-copy)
-      set_tests_properties(
-        uniq_nested_chk_copy
-        PROPERTIES DEPENDS
-                   uniq_nested
-                   TIMEOUT
-                   60
-                   FAIL_REGULAR_EXPRESSION
-                   "cooperative mode"
-                   REQUIRED_FILES
-                   uniq_nested.db-copy)
+      set_tests_properties(uniq_nested_chk_copy PROPERTIES
+                           DEPENDS uniq_nested
+                           TIMEOUT 60
+                           FAIL_REGULAR_EXPRESSION "cooperative mode"
+                           REQUIRED_FILES uniq_nested.db-copy)
     endif()

     if(NOT SUBPROJECT)
@@ -298,6 +280,7 @@ else()
       add_extra_test(upsert_alldups SOURCE extra/upsert_alldups.c)
       add_extra_test(dupfix_addodd SOURCE extra/dupfix_addodd.c)
     endif()
+    add_extra_test(details_rkl SOURCE extra/details_rkl.c)
     if(MDBX_BUILD_CXX)
       if(NOT WIN32 OR NOT MDBX_CXX_STANDARD LESS 17)
         add_extra_test(cursor_closing TIMEOUT 10800)

@@ -3,36 +3,62 @@
 # Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru>
 # SPDX-License-Identifier: Apache-2.0

-TEST="./test/stochastic.sh --skip-make --db-upto-gb 32"
+TMUX=tmux
+DIR="$(dirname ${BASH_SOURCE[0]})"
+TEST="${DIR}/stochastic.sh --skip-make --db-upto-gb 32"
 PREFIX="/dev/shm/mdbxtest-"
+NUMACTL="$(which numactl 2>-)"
+NUMALIST=()
+NUMAIDX=0
+if [ -n "${NUMACTL}" -a $(${NUMACTL} --hardware | grep 'node [0-9]\+ cpus' | wc -l) -gt 1 ]; then
+  NUMALIST=($(${NUMACTL} --hardware | grep 'node [0-9]\+ cpus' | cut -d ' ' -f 2))
+fi
+
+function test_numacycle {
+  NUMAIDX=$((NUMAIDX + 1))
+  if [ ${NUMAIDX} -ge ${#NUMALIST[@]} ]; then
+    NUMAIDX=0
+  fi
+}
+
+function test_numanode {
+  if [[ ${#NUMALIST[@]} > 1 ]]; then
+    echo "${TEST} --numa ${NUMALIST[$NUMAIDX]}"
+  else
+    echo "${TEST}"
+  fi
+}

-tmux kill-session -t mdbx
+${TMUX} kill-session -t mdbx
 rm -rf ${PREFIX}*

 # git clean -x -f -d && make test-assertions
-tmux -f ./test/tmux.conf new-session -d -s mdbx htop
+${TMUX} -f "${DIR}/tmux.conf" new-session -d -s mdbx htop

 W=0
 for ps in min 4k max; do
   for from in 1 30000; do
     for n in 0 1 2 3; do
-      CMD="${TEST} --delay $((n * 7)) --page-size ${ps} --from ${from} --dir ${PREFIX}page-${ps}.from-${from}.${n}"
+      CMD="$(test_numanode) --delay $((n * 7)) --page-size ${ps} --from ${from} --dir ${PREFIX}page-${ps}.from-${from}.${n}"
       if [ $n -eq 0 ]; then
-        tmux new-window -t mdbx:$((++W)) -n "page-${ps}.from-${from}" -k -d "$CMD"
-        tmux select-layout -E tiled
+        ${TMUX} new-window -t mdbx:$((++W)) -n "page-${ps}.from-${from}" -k -d "$CMD"
+        ${TMUX} select-layout -E tiled
       else
-        tmux split-window -t mdbx:$W -l 20% -d $CMD
+        ${TMUX} split-window -t mdbx:$W -l 20% -d $CMD
       fi
+      test_numacycle
     done
     for n in 0 1 2 3; do
-      CMD="${TEST} --delay $((3 + n * 7)) --extra --page-size ${ps} --from ${from} --dir ${PREFIX}page-${ps}.from-${from}.${n}-extra"
+      CMD="$(test_numanode) --delay $((3 + n * 7)) --extra --page-size ${ps} --from ${from} --dir ${PREFIX}page-${ps}.from-${from}.${n}-extra"
       if [ $n -eq 0 ]; then
-        tmux new-window -t mdbx:$((++W)) -n "page-${ps}.from-${from}-extra" -k -d "$CMD"
-        tmux select-layout -E tiled
+        ${TMUX} new-window -t mdbx:$((++W)) -n "page-${ps}.from-${from}-extra" -k -d "$CMD"
+        ${TMUX} select-layout -E tiled
       else
-        tmux split-window -t mdbx:$W -l 20% -d $CMD
+        ${TMUX} split-window -t mdbx:$W -l 20% -d $CMD
       fi
+      test_numacycle
     done
   done
 done
-tmux attach -t mdbx
+${TMUX} attach -t mdbx

@@ -72,6 +72,7 @@ void configure_actor(unsigned &last_space_id, const actor_testcase testcase, con
   log_trace("configure_actor: space %lu for %s", space_id, testcase2str(testcase));
   global::actors.emplace_back(actor_config(testcase, params, unsigned(space_id), wait4id));
   global::databases.insert(params.pathname_db);
+  params.prng_seed += bleach64(space_id);
 }

 void testcase_setup(const char *casename, const actor_params &params, unsigned &last_space_id) {

@@ -15,12 +15,18 @@ public:
 REGISTER_TESTCASE(copy);

 void testcase_copy::copy_db(const bool with_compaction) {
-  int err = mdbx_env_delete(copy_pathname.c_str(), MDBX_ENV_JUST_DELETE);
-  if (err != MDBX_SUCCESS && err != MDBX_RESULT_TRUE)
-    failure_perror("osal_removefile()", err);
+  int err;
+  const bool overwrite = flipcoin();
+  if (!overwrite) {
+    err = mdbx_env_delete(copy_pathname.c_str(), MDBX_ENV_JUST_DELETE);
+    if (err != MDBX_SUCCESS && err != MDBX_RESULT_TRUE)
+      failure_perror("osal_removefile()", err);
+  }
   if (flipcoin()) {
-    err = mdbx_env_copy(db_guard.get(), copy_pathname.c_str(), with_compaction ? MDBX_CP_COMPACT : MDBX_CP_DEFAULTS);
+    err = mdbx_env_copy(db_guard.get(), copy_pathname.c_str(),
+                        (with_compaction ? MDBX_CP_COMPACT : MDBX_CP_DEFAULTS) |
+                            (overwrite ? MDBX_CP_OVERWRITE : MDBX_CP_DEFAULTS));
     log_verbose("mdbx_env_copy(%s), err %d", with_compaction ? "true" : "false", err);
     if (unlikely(err != MDBX_SUCCESS))
       failure_perror(with_compaction ? "mdbx_env_copy(MDBX_CP_COMPACT)" : "mdbx_env_copy(MDBX_CP_ASIS)", err);
@@ -31,11 +37,11 @@ void testcase_copy::copy_db(const bool with_compaction) {
     const bool dynsize = flipcoin();
     const bool flush = flipcoin();
     const bool enable_renew = flipcoin();
-    const MDBX_copy_flags_t flags = (with_compaction ? MDBX_CP_COMPACT : MDBX_CP_DEFAULTS) |
-                                    (dynsize ? MDBX_CP_FORCE_DYNAMIC_SIZE : MDBX_CP_DEFAULTS) |
-                                    (throttle ? MDBX_CP_THROTTLE_MVCC : MDBX_CP_DEFAULTS) |
-                                    (flush ? MDBX_CP_DEFAULTS : MDBX_CP_DONT_FLUSH) |
-                                    (enable_renew ? MDBX_CP_RENEW_TXN : MDBX_CP_DEFAULTS);
+    const MDBX_copy_flags_t flags =
+        (with_compaction ? MDBX_CP_COMPACT : MDBX_CP_DEFAULTS) |
+        (dynsize ? MDBX_CP_FORCE_DYNAMIC_SIZE : MDBX_CP_DEFAULTS) |
+        (throttle ? MDBX_CP_THROTTLE_MVCC : MDBX_CP_DEFAULTS) | (flush ? MDBX_CP_DEFAULTS : MDBX_CP_DONT_FLUSH) |
+        (enable_renew ? MDBX_CP_RENEW_TXN : MDBX_CP_DEFAULTS) | (overwrite ? MDBX_CP_OVERWRITE : MDBX_CP_DEFAULTS);
     txn_begin(ro);
     err = mdbx_txn_copy2pathname(txn_guard.get(), copy_pathname.c_str(), flags);
     log_verbose("mdbx_txn_copy2pathname(flags=0x%X), err %d", flags, err);

@@ -23,7 +23,13 @@
 #define RELIEF_FACTOR 1
 #endif

-#define NN (1000 / RELIEF_FACTOR)
+static const auto NN = 1000u / RELIEF_FACTOR;
+
+#if defined(__cpp_lib_latch) && __cpp_lib_latch >= 201907L
+static const auto N = std::min(17u, std::thread::hardware_concurrency());
+#else
+static const auto N = 3u;
+#endif

 static void logger_nofmt(MDBX_log_level_t loglevel, const char *function, int line, const char *msg,
                          unsigned length) noexcept {
@@ -107,6 +113,7 @@ bool case0(mdbx::env env) {
  * 4. Wait for the background threads to finish.
  * 5. Close the remaining cursors and close the DB. */

+size_t global_seed = size_t(std::chrono::high_resolution_clock::now().time_since_epoch().count());
 thread_local size_t salt;

 static size_t prng() {
@@ -262,7 +269,7 @@ void case1_write_cycle(mdbx::txn_managed txn, std::deque<mdbx::map_handle> &dbi,
   pre.unbind();
   if (!pre.txn())
     pre.bind(txn, dbi[prng(dbi.size())]);
-  for (auto i = 0; i < NN; ++i) {
+  for (auto i = 0u; i < NN; ++i) {
     auto k = mdbx::default_buffer::wrap(prng(NN));
     auto v = mdbx::default_buffer::wrap(prng(NN));
     if (pre.find_multivalue(k, v, false))
@@ -284,7 +291,16 @@
 }

 bool case1_thread(mdbx::env env, std::deque<mdbx::map_handle> dbi, mdbx::cursor pre) {
-  salt = size_t(std::chrono::high_resolution_clock::now().time_since_epoch().count());
+#if defined(__cpp_lib_latch) && __cpp_lib_latch >= 201907L
+  mdbx::error::success_or_throw(mdbx_txn_lock(env, false));
+  std::hash<std::thread::id> hasher;
+  salt = global_seed ^ hasher(std::this_thread::get_id());
+  std::cout << "thread " << std::this_thread::get_id() << ", salt " << salt << std::endl << std::flush;
+  mdbx_txn_unlock(env);
+#else
+  salt = global_seed;
+#endif
   std::vector<MDBX_cursor *> pool;
   for (auto loop = 0; loop < 333 / RELIEF_FACTOR; ++loop) {
     for (auto read = 0; read < 333 / RELIEF_FACTOR; ++read) {
@@ -311,12 +327,7 @@ bool case1(mdbx::env env) {
   bool ok = true;
   std::deque<mdbx::map_handle> dbi;
   std::vector<mdbx::cursor_managed> cursors;
-#if defined(__cpp_lib_latch) && __cpp_lib_latch >= 201907L
-  static const auto N = 10;
-#else
-  static const auto N = 3;
-#endif
-  for (auto t = 0; t < N; ++t) {
+  for (auto t = 0u; t < N; ++t) {
     auto txn = env.start_write();
     auto table = txn.create_map(std::to_string(t), mdbx::key_mode::ordinal, mdbx::value_mode::multi_samelength);
     auto cursor = txn.open_cursor(table);
@@ -331,7 +342,7 @@ bool case1(mdbx::env env) {
 #if defined(__cpp_lib_latch) && __cpp_lib_latch >= 201907L
   std::latch s(1);
   std::vector<std::thread> threads;
-  for (auto t = 1; t < N; ++t) {
+  for (auto t = 1u; t < cursors.size(); ++t) {
     case1_cycle_dbi(dbi);
     threads.push_back(std::thread([&, t]() {
       s.wait();
@@ -382,7 +393,7 @@ int doit() {
   mdbx::env::remove(db_filename);

   mdbx::env_managed env(db_filename, mdbx::env_managed::create_parameters(),
-                        mdbx::env::operate_parameters(42, 0, mdbx::env::nested_transactions));
+                        mdbx::env::operate_parameters(N + 2, 0, mdbx::env::nested_transactions));

   bool ok = case0(env);
   ok = case1(env) && ok;

@@ -2,17 +2,9 @@
 #include <iostream>

-static char log_buffer[1024];
-
-static void logger_nofmt(MDBX_log_level_t loglevel, const char *function, int line, const char *msg,
-                         unsigned length) noexcept {
-  (void)length;
-  (void)loglevel;
-  fprintf(stdout, "%s:%u %s", function, line, msg);
-}
-
-int doit() {
-  mdbx::path db_filename = "test-dbi";
+mdbx::path db_filename = "test-dbi";
+
+bool case1() {
   mdbx::env::remove(db_filename);

   mdbx::env::operate_parameters operateParameters(100, 10, mdbx::env::nested_transactions);
@@ -45,15 +37,15 @@
     MDBX_stat stat;
     int err = mdbx_dbi_stat(txn, dbi, &stat, sizeof(stat));
     if (err != MDBX_BAD_DBI) {
-      std::cerr << "unexpected result err-code " << err;
-      return EXIT_FAILURE;
+      std::cerr << "Unexpected err " << err << " (wanna MDBX_BAD_DBI/-30780)\n";
+      return false;
     }
     txn.commit();
   }

   {
-    // check again that the table opens and the handle is accessible in the parent transaction after the commit of
-    // the nested transaction that opened it
+    // check again that the table opens and the handle is accessible in the parent transaction,
+    // after the commit of the nested transaction that opened it
     mdbx::txn_managed txn = env.start_write();
     mdbx::txn_managed nested = txn.start_nested();
     mdbx::map_handle dbi = nested.open_map_accede("fap1");
@@ -63,8 +55,165 @@
     env.close_map(dbi);
   }
return true;
}
bool case2() {
bool ok = true;
mdbx::env_managed::create_parameters createParameters;
mdbx::env::remove(db_filename);
{
mdbx::env::operate_parameters operateParameters(0, 10, mdbx::env::nested_transactions);
mdbx::env_managed env(db_filename, createParameters, operateParameters);
{
mdbx::txn_managed txn = env.start_write();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "test", MDBX_CREATE, &dbi);
if (err != MDBX_DBS_FULL) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_DBS_FULL/-30791)\n";
ok = false;
}
}
{
mdbx::txn_managed txn = env.start_write();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "test", MDBX_CREATE | MDBX_DUPSORT | MDBX_DUPFIXED, &dbi);
if (err != MDBX_DBS_FULL) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_DBS_FULL/-30791)\n";
ok = false;
}
}
}
{
mdbx::env::operate_parameters operateParameters(1, 10, mdbx::env::nested_transactions);
mdbx::env_managed env(db_filename, createParameters, operateParameters);
{
mdbx::txn_managed txn = env.start_write();
mdbx::map_handle dbi = txn.create_map("dup", mdbx::key_mode::ordinal, mdbx::value_mode::multi_ordinal);
txn.commit();
env.close_map(dbi);
}
{
mdbx::txn_managed txn = env.start_write();
mdbx::map_handle dbi = txn.create_map("uni", mdbx::key_mode::reverse, mdbx::value_mode::single);
txn.commit();
env.close_map(dbi);
}
}
{
mdbx::env::operate_parameters operateParameters(0, 10, mdbx::env::nested_transactions);
mdbx::env_managed env(db_filename, createParameters, operateParameters);
{
mdbx::txn_managed txn = env.start_read();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "uni", MDBX_DB_ACCEDE, &dbi);
if (err != MDBX_DBS_FULL) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_DBS_FULL/-30791)\n";
ok = false;
}
if (dbi)
env.close_map(dbi);
}
{
mdbx::txn_managed txn = env.start_read();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "dup", MDBX_DB_ACCEDE, &dbi);
if (err != MDBX_DBS_FULL) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_DBS_FULL/-30791)\n";
ok = false;
}
if (dbi)
env.close_map(dbi);
}
}
{
{
mdbx::env::operate_parameters operateParameters(1, 10, mdbx::env::nested_transactions);
mdbx::env_managed env(db_filename, createParameters, operateParameters);
{
mdbx::txn_managed txn = env.start_read();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "uni", MDBX_DB_ACCEDE, &dbi);
if (err != MDBX_SUCCESS) {
std::cerr << "Unexpected err " << err << "\n";
ok = false;
}
if (dbi)
env.close_map(dbi);
}
{
mdbx::txn_managed txn = env.start_read();
MDBX_dbi dbi = 0;
int err = mdbx_dbi_open(txn, "dup", MDBX_DB_ACCEDE, &dbi);
if (err != MDBX_SUCCESS) {
std::cerr << "Unexpected err " << err << "\n";
ok = false;
}
if (dbi)
env.close_map(dbi);
}
}
}
return ok;
}
bool case3() {
bool ok = true;
mdbx::env_managed::create_parameters createParameters;
mdbx::env::remove(db_filename);
{
mdbx::env::operate_parameters operateParameters(1, 10, mdbx::env::nested_transactions);
mdbx::env_managed env(db_filename, createParameters, operateParameters);
{
mdbx::txn_managed txn = env.start_write();
MDBX_dbi notexists_dbi = 0;
int err = mdbx_dbi_open(txn, "test", MDBX_DB_DEFAULTS, &notexists_dbi);
if (err != MDBX_NOTFOUND) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_NOTFOUND/-30798)\n";
ok = false;
}
mdbx::map_handle dbi = txn.create_map("test", mdbx::key_mode::ordinal, mdbx::value_mode::single);
dbi = txn.open_map("test", mdbx::key_mode::ordinal, mdbx::value_mode::single);
err = mdbx_dbi_close(env, dbi);
if (err != MDBX_DANGLING_DBI) {
std::cerr << "Unexpected err " << err << " (wanna MDBX_DANGLING_DBI/-30412)\n";
ok = false;
}
txn.commit();
env.close_map(dbi);
}
}
return ok;
}
int doit() {
bool ok = true;
ok = case1() && ok;
ok = case2() && ok;
ok = case3() && ok;
if (ok) {
std::cout << "OK\n";
return EXIT_SUCCESS;
} else {
std::cerr << "FAIL\n";
return EXIT_FAILURE;
}
}
static char log_buffer[1024];
static void logger_nofmt(MDBX_log_level_t loglevel, const char *function, int line, const char *msg,
unsigned length) noexcept {
(void)length;
(void)loglevel;
fprintf(stdout, "%s:%u %s", function, line, msg);
}

int main(int argc, char *argv[]) {

test/extra/details_rkl.c (new file, 488 lines)

@@ -0,0 +1,488 @@
/// \copyright SPDX-License-Identifier: Apache-2.0
/// \author Леонид Юрьев aka Leonid Yuriev <leo@yuriev.ru> \date 2025
#define debug_log debug_log_sub
#include "../../src/rkl.c"
#include "../../src/txl.c"
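/* Standalone unit-test for the rkl_t txnid-list internals: basic invariants,
 * stochastic push/pop/iteration compared against a plain txl_t list, and
 * enumeration of holes (gaps) between the stored ids. */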
MDBX_MAYBE_UNUSED __cold void debug_log_sub(int level, const char *function, int line, const char *fmt, ...) {
(void)level;
(void)function;
(void)line;
(void)fmt;
}
/*-----------------------------------------------------------------------------*/
static size_t tst_failed, tst_ok, tst_iterations, tst_cases, tst_cases_hole;
#ifndef NDEBUG
static size_t tst_target;
#endif
static bool check_bool(bool v, bool expect, const char *fn, unsigned line) {
if (unlikely(v != expect)) {
++tst_failed;
fflush(nullptr);
fprintf(stderr, "iteration %zi: got %s, expected %s, at %s:%u\n", tst_iterations, v ? "true" : "false",
expect ? "true" : "false", fn, line);
fflush(nullptr);
return false;
}
++tst_ok;
return true;
}
static bool check_eq(uint64_t v, uint64_t expect, const char *fn, unsigned line) {
if (unlikely(v != expect)) {
++tst_failed;
fflush(nullptr);
fprintf(stderr, "iteration %zi: %" PRIu64 " (got) != %" PRIu64 " (expected), at %s:%u\n", tst_iterations, v, expect,
fn, line);
fflush(nullptr);
return false;
}
++tst_ok;
return true;
}
#define CHECK_BOOL(T, EXPECT) check_bool((T), (EXPECT), __func__, __LINE__)
#define CHECK_TRUE(T) CHECK_BOOL(T, true)
#define CHECK_FALSE(T) CHECK_BOOL(T, false)
#define CHECK_EQ(T, EXPECT) check_eq((T), (EXPECT), __func__, __LINE__)
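/* Basic sanity checks on an empty list and on a single-element list (txnid 42):
 * iteration in both directions, hole enumeration, resize and destructive move. */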
void trivia(void) {
rkl_t x, y;
rkl_init(&x);
rkl_init(&y);
CHECK_TRUE(rkl_check(&x));
CHECK_TRUE(rkl_empty(&x));
CHECK_EQ(rkl_len(&x), 0);
rkl_iter_t f = rkl_iterator(&x, false);
rkl_iter_t r = rkl_iterator(&x, true);
CHECK_EQ(rkl_left(&f, false), 0);
CHECK_EQ(rkl_left(&f, true), 0);
CHECK_EQ(rkl_left(&r, false), 0);
CHECK_EQ(rkl_left(&r, true), 0);
CHECK_EQ(rkl_turn(&f, false), 0);
CHECK_EQ(rkl_turn(&f, true), 0);
CHECK_EQ(rkl_turn(&r, false), 0);
CHECK_EQ(rkl_turn(&r, true), 0);
CHECK_TRUE(rkl_check(&x));
rkl_hole_t hole;
hole = rkl_hole(&f, true);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&f, false);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&r, true);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&r, false);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, MAX_TXNID);
CHECK_EQ((uint64_t)rkl_push(&x, 42), (uint64_t)MDBX_SUCCESS);
CHECK_TRUE(rkl_check(&x));
CHECK_FALSE(rkl_empty(&x));
CHECK_EQ(rkl_len(&x), 1);
// CHECK_EQ((uint64_t)rkl_push(&x, 42, true), (uint64_t)MDBX_RESULT_TRUE);
// CHECK_TRUE(rkl_check(&x));
f = rkl_iterator(&x, false);
r = rkl_iterator(&x, true);
CHECK_EQ(rkl_left(&f, false), 1);
CHECK_EQ(rkl_left(&f, true), 0);
CHECK_EQ(rkl_left(&r, false), 0);
CHECK_EQ(rkl_left(&r, true), 1);
CHECK_EQ(rkl_turn(&f, true), 0);
CHECK_EQ(rkl_turn(&f, false), 42);
CHECK_EQ(rkl_turn(&f, false), 0);
CHECK_EQ(rkl_turn(&f, true), 42);
CHECK_EQ(rkl_turn(&f, true), 0);
CHECK_EQ(rkl_turn(&r, false), 0);
CHECK_EQ(rkl_turn(&r, true), 42);
CHECK_EQ(rkl_turn(&r, true), 0);
CHECK_EQ(rkl_turn(&r, false), 42);
CHECK_EQ(rkl_turn(&r, false), 0);
f = rkl_iterator(&x, false);
hole = rkl_hole(&f, false);
CHECK_EQ(hole.begin, 43);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&f, false);
CHECK_EQ(hole.begin, MAX_TXNID);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&f, true);
CHECK_EQ(hole.begin, 43);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&f, true);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, 42);
hole = rkl_hole(&f, true);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, 42);
r = rkl_iterator(&x, true);
hole = rkl_hole(&r, false);
CHECK_EQ(hole.begin, MAX_TXNID);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&r, true);
CHECK_EQ(hole.begin, 43);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&r, true);
CHECK_EQ(hole.begin, 1);
CHECK_EQ(hole.end, 42);
hole = rkl_hole(&r, false);
CHECK_EQ(hole.begin, 43);
CHECK_EQ(hole.end, MAX_TXNID);
hole = rkl_hole(&r, false);
CHECK_EQ(hole.begin, MAX_TXNID);
CHECK_EQ(hole.end, MAX_TXNID);
rkl_resize(&x, 222);
CHECK_FALSE(rkl_empty(&x));
CHECK_TRUE(rkl_check(&x));
rkl_destructive_move(&x, &y);
CHECK_TRUE(rkl_check(&x));
CHECK_TRUE(rkl_check(&y));
rkl_destroy(&x);
rkl_destroy(&y);
}
/*-----------------------------------------------------------------------------*/
uint64_t prng_state;
static uint64_t prng(void) {
prng_state = prng_state * UINT64_C(6364136223846793005) + 1;
return prng_state;
}
static bool flipcoin(void) { return (bool)prng() & 1; }
static bool stochastic_pass(const unsigned start, const unsigned width, const unsigned n) {
rkl_t k, c;
txl_t l = txl_alloc();
if (!CHECK_TRUE(l))
return false;
rkl_init(&k);
rkl_init(&c);
const size_t errors = tst_failed;
rkl_iter_t f = rkl_iterator(&k, false);
rkl_iter_t r = rkl_iterator(&k, true);
txnid_t lowest = UINT_MAX;
txnid_t highest = 0;
while (txl_size(l) < n) {
txnid_t id = (txnid_t)(prng() % width + start);
if (id < MIN_TXNID || id >= INVALID_TXNID)
continue;
if (txl_contain(l, id)) {
if (CHECK_TRUE(rkl_contain(&k, id)) && CHECK_EQ((uint64_t)rkl_push(&k, id), (uint64_t)MDBX_RESULT_TRUE))
continue;
break;
}
if (!CHECK_FALSE(rkl_contain(&k, id)))
break;
if (tst_iterations % (1u << 24) == 0 && tst_iterations) {
printf("done %.3fM iteration, %zu cases\n", tst_iterations / 1000000.0, tst_cases);
fflush(nullptr);
}
tst_iterations += 1;
#ifndef NDEBUG
if (tst_iterations == tst_target) {
printf("reach %zu iteration\n", tst_iterations);
fflush(nullptr);
}
#endif
if (!CHECK_EQ(rkl_push(&k, id), MDBX_SUCCESS))
break;
if (!CHECK_TRUE(rkl_check(&k)))
break;
if (!CHECK_EQ(txl_append(&l, id), MDBX_SUCCESS))
break;
if (!CHECK_TRUE(rkl_contain(&k, id)))
break;
lowest = (lowest < id) ? lowest : id;
highest = (highest > id) ? highest : id;
if (!CHECK_EQ(rkl_lowest(&k), lowest))
break;
if (!CHECK_EQ(rkl_highest(&k), highest))
break;
}
txl_sort(l);
CHECK_EQ(rkl_len(&k), n);
CHECK_EQ(txl_size(l), n);
f = rkl_iterator(&k, false);
r = rkl_iterator(&k, true);
CHECK_EQ(rkl_left(&f, false), n);
CHECK_EQ(rkl_left(&f, true), 0);
CHECK_EQ(rkl_left(&r, false), 0);
CHECK_EQ(rkl_left(&r, true), n);
for (size_t i = 0; i < n; ++i) {
CHECK_EQ(rkl_turn(&f, false), l[n - i]);
CHECK_EQ(rkl_left(&f, false), n - i - 1);
CHECK_EQ(rkl_left(&f, true), i + 1);
CHECK_EQ(rkl_turn(&r, true), l[i + 1]);
r.pos += 1;
CHECK_EQ(rkl_turn(&r, true), l[i + 1]);
CHECK_EQ(rkl_left(&r, true), n - i - 1);
CHECK_EQ(rkl_left(&r, false), i + 1);
}
if (CHECK_EQ(rkl_copy(&k, &c), MDBX_SUCCESS)) {
for (size_t i = 1; i <= n; ++i) {
if (!CHECK_FALSE(rkl_empty(&k)))
break;
if (!CHECK_FALSE(rkl_empty(&c)))
break;
CHECK_EQ(rkl_pop(&k, true), l[i]);
CHECK_EQ(rkl_pop(&c, false), l[1 + n - i]);
}
}
CHECK_TRUE(rkl_empty(&k));
CHECK_TRUE(rkl_empty(&c));
rkl_destroy(&k);
rkl_destroy(&c);
txl_free(l);
++tst_cases;
return errors == tst_failed;
}
static bool stochastic(const size_t limit_cases, const size_t limit_loops) {
for (unsigned loop = 0; tst_cases < limit_cases || loop < limit_loops; ++loop)
for (unsigned width = 2; width < 10; ++width)
for (unsigned n = 1; n < width; ++n)
for (unsigned prev = 1, start = 0, t; start < 4242; t = start + prev, prev = start, start = t)
if (!stochastic_pass(start, 1u << width, 1u << n) || tst_failed > 42) {
puts("bailout\n");
return false;
}
return true;
}
/*-----------------------------------------------------------------------------*/
static bool bit(size_t set, size_t n) {
assert(n < CHAR_BIT * sizeof(set));
return (set >> n) & 1;
}
static size_t hamming_weight(size_t v) {
const size_t m1 = (size_t)UINT64_C(0x5555555555555555);
const size_t m2 = (size_t)UINT64_C(0x3333333333333333);
const size_t m4 = (size_t)UINT64_C(0x0f0f0f0f0f0f0f0f);
const size_t h01 = (size_t)UINT64_C(0x0101010101010101);
v -= (v >> 1) & m1;
v = (v & m2) + ((v >> 2) & m2);
v = (v + (v >> 4)) & m4;
return (v * h01) >> (sizeof(v) * 8 - 8);
}
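/* Checks one hole reported by rkl_hole() against the reference bitmask `set`:
 * its bounds must abut set bits and every id inside must be absent;
 * the hole length is accumulated into *acc. */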
static bool check_hole(const size_t set, const rkl_hole_t hole, size_t *acc) {
const size_t errors = tst_failed;
++tst_iterations;
if (hole.begin > 1)
CHECK_EQ(bit(set, hole.begin - 1), 1);
if (hole.end < CHAR_BIT * sizeof(set))
CHECK_EQ(bit(set, hole.end), 1);
for (size_t n = hole.begin; n < hole.end && n < CHAR_BIT * sizeof(set); n++) {
CHECK_EQ(bit(set, n), 0);
*acc += 1;
}
return errors == tst_failed;
}
static void debug_set(const size_t set, const char *str, int iter_offset) {
#if 1
(void)set;
(void)str;
(void)iter_offset;
#else
printf("\ncase %s+%d: count %zu, holes", str, iter_offset, hamming_weight(~set) - 1);
for (size_t k, i = 1; i < CHAR_BIT * sizeof(set); ++i) {
if (!bit(set, i)) {
printf(" %zu", i);
for (k = i; k < CHAR_BIT * sizeof(set) - 1 && !bit(set, k + 1); ++k)
;
if (k > i) {
printf("-%zu", k);
i = k;
}
}
}
printf("\n");
fflush(nullptr);
#endif
}
static bool check_holes_bothsides(const size_t set, rkl_iter_t const *i) {
const size_t number_of_holes = hamming_weight(~set) - 1;
size_t acc = 0;
rkl_iter_t f = *i;
for (;;) {
rkl_hole_t hole = rkl_hole(&f, false);
if (hole.begin == hole.end)
break;
if (!check_hole(set, hole, &acc))
return false;
if (hole.end >= CHAR_BIT * sizeof(set))
break;
}
rkl_iter_t b = *i;
for (;;) {
rkl_hole_t hole = rkl_hole(&b, true);
if (hole.begin == hole.end)
break;
if (!check_hole(set, hole, &acc))
return false;
if (hole.begin == 1)
break;
}
if (!CHECK_EQ(acc, number_of_holes))
return false;
return true;
}
static bool check_holes_fourways(const size_t set, const rkl_t *rkl) {
rkl_iter_t i = rkl_iterator(rkl, false);
int o = 0;
do {
debug_set(set, "initial-forward", o++);
if (!check_holes_bothsides(set, &i))
return false;
} while (rkl_turn(&i, false));
do {
debug_set(set, "recoil-reverse", --o);
if (!check_holes_bothsides(set, &i))
return false;
} while (rkl_turn(&i, true));
i = rkl_iterator(rkl, true);
o = 0;
do {
debug_set(set, "initial-reverse", --o);
if (!check_holes_bothsides(set, &i))
return false;
} while (rkl_turn(&i, false));
do {
debug_set(set, "recoil-forward", o++);
if (!check_holes_bothsides(set, &i))
return false;
} while (rkl_turn(&i, true));
return true;
}
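/* Builds an rkl from the bits of `set`, verifies hole enumeration, then trims
 * random ends one element at a time, re-checking the holes after every step. */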
static bool stochastic_pass_hole(size_t set, size_t trims) {
const size_t one = 1;
set &= ~one;
if (!set)
return true;
++tst_cases_hole;
rkl_t rkl;
rkl_init(&rkl);
for (size_t n = 1; n < CHAR_BIT * sizeof(set); ++n)
if (bit(set, n))
CHECK_EQ(rkl_push(&rkl, n), MDBX_SUCCESS);
if (!check_holes_fourways(set, &rkl))
return false;
while (rkl_len(&rkl) > 1 && trims-- > 0) {
if (flipcoin()) {
const size_t l = (size_t)rkl_pop(&rkl, false);
if (l == 0)
break;
assert(bit(set, l));
set -= one << l;
if (!check_holes_fourways(set, &rkl))
return false;
} else {
const size_t h = (size_t)rkl_pop(&rkl, true);
if (h == 0)
break;
assert(bit(set, h));
set -= one << h;
if (!check_holes_fourways(set, &rkl))
return false;
}
}
return true;
}
static size_t prng_word(void) {
size_t word = (size_t)(prng() >> 32);
if (sizeof(word) > 4)
word = (uint64_t)word << 32 | (size_t)(prng() >> 32);
return word;
}
static bool stochastic_hole(size_t probes) {
for (size_t n = 0; n < probes; ++n) {
size_t set = prng_word();
if (!stochastic_pass_hole(set, prng() % 11))
return false;
if (!stochastic_pass_hole(set & prng_word(), prng() % 11))
return false;
if (!stochastic_pass_hole(set | prng_word(), prng() % 11))
return false;
}
return true;
}
/*-----------------------------------------------------------------------------*/
int main(int argc, const char *argv[]) {
(void)argc;
(void)argv;
#ifndef NDEBUG
// tst_target = 281870;
#endif
prng_state = (uint64_t)time(nullptr);
printf("prng-seed %" PRIu64 "\n", prng_state);
fflush(nullptr);
trivia();
stochastic(42 * 42 * 42, 42);
stochastic_hole(24 * 24 * 24);
printf("done: %zu+%zu cases, %zu iterations, %zu checks ok, %zu checks failed\n", tst_cases, tst_cases_hole,
tst_iterations, tst_ok, tst_failed);
fflush(nullptr);
return tst_failed ? EXIT_FAILURE : EXIT_SUCCESS;
}

@@ -31,15 +31,6 @@ int main(int argc, const char *argv[]) {
 #include <latch>
 #include <thread>

-static char log_buffer[1024];
-
-static void logger_nofmt(MDBX_log_level_t loglevel, const char *function, int line, const char *msg,
-                         unsigned length) noexcept {
-  (void)length;
-  (void)loglevel;
-  fprintf(stdout, "%s:%u %s", function, line, msg);
-}
-
 bool case0(const mdbx::path &path) {
   mdbx::env_managed::create_parameters createParameters;
   createParameters.geometry.make_dynamic(21 * mdbx::env::geometry::MiB, 84 * mdbx::env::geometry::MiB);
@@ -322,19 +313,85 @@
   return true;
 }
bool case3(const mdbx::path &path, bool no_sticky_threads) {
mdbx::env::remove(path);
mdbx::env_managed::create_parameters createParameters;
createParameters.geometry.make_dynamic(21 * mdbx::env::geometry::MiB, 84 * mdbx::env::geometry::MiB);
mdbx::env::operate_parameters operateParameters(100, 10);
operateParameters.options.no_sticky_threads = no_sticky_threads;
mdbx::env_managed env(path, createParameters, operateParameters);
mdbx::pair pair = {"key", "val"};
const auto N = std::thread::hardware_concurrency() * 2;
std::latch s0(N + 1), s1(N + 1), s2(N + 1);
std::vector<std::thread> l;
volatile bool ok = true;
for (size_t n = 0; n < N; ++n)
l.push_back(std::thread([&]() {
try {
s0.arrive_and_wait();
{
auto txn = env.start_read();
mdbx::slice value;
int err = mdbx_get(txn, 1, pair.key, &value);
if (err != MDBX_NOTFOUND) {
ok = false;
std::cerr << "Unexpected error " << err << "\n";
}
}
s1.arrive_and_wait();
s2.arrive_and_wait();
{
auto txn = env.start_read();
if (txn.get(1, pair.key) != pair.value)
ok = false;
}
} catch (const std::exception &ex) {
std::cerr << "Exception: " << ex.what() << "\n";
ok = false;
}
}));
s0.arrive_and_wait();
auto txn = env.start_write();
s1.arrive_and_wait();
txn.insert(1, pair);
txn.commit();
s2.arrive_and_wait();
for (auto &t : l)
t.join();
return ok;
}
int doit() {
mdbx::path path = "test-txn";
mdbx::env::remove(path);
-bool ok = case0(path);
+bool ok = true;
+ok = case0(path) && ok;
ok = case1(path) && ok;
ok = case2(path, false) && ok;
ok = case2(path, true) && ok;
+ok = case3(path, false) && ok;
+ok = case3(path, true) && ok;
std::cout << (ok ? "OK\n" : "FAIL\n");
return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
static char log_buffer[1024];
static void logger_nofmt(MDBX_log_level_t loglevel, const char *function, int line, const char *msg,
unsigned length) noexcept {
(void)length;
(void)loglevel;
fprintf(stdout, "%s:%u %s", function, line, msg);
}
int main(int argc, char *argv[]) {
(void)argc;
(void)argv;

@@ -460,9 +460,9 @@ int main(int argc, char *const argv[]) {
       params.datalen_max = params.datalen_min;
       continue;
     }
-    if (config::parse_option(argc, argv, narg, "batch.read", params.batch_read, config::no_scale, 1))
+    if (config::parse_option(argc, argv, narg, "batch.read", params.batch_read, config::decimal, 1))
       continue;
-    if (config::parse_option(argc, argv, narg, "batch.write", params.batch_write, config::no_scale, 1))
+    if (config::parse_option(argc, argv, narg, "batch.write", params.batch_write, config::decimal, 1))
       continue;
     if (config::parse_option(argc, argv, narg, "delay", params.delaystart, config::duration))
       continue;

@@ -381,7 +381,28 @@ int osal_actor_start(const actor_config &config, mdbx_pid_t &pid) {

 actor_status osal_actor_info(const mdbx_pid_t pid) { return children.at(pid); }

+static void wait_actors(unsigned timeout) {
+  for (auto &pair : children)
+    if (pair.second <= as_running) {
+      osal_yield();
+      mdbx_pid_t pid = 0;
+      osal_actor_poll(pid, timeout);
+      if (!pid)
+        return;
+    }
+}
+
 void osal_killall_actors(void) {
+  for (auto &pair : children)
+    kill(pair.first, SIGINT);
+  wait_actors(0);
+
+  for (auto &pair : children) {
+    osal_yield();
+    kill(pair.first, SIGTERM);
+  }
+  wait_actors(1);
+
   for (auto &pair : children) {
     kill(pair.first, SIGKILL);
     pair.second = as_killed;

View File

@ -28,6 +28,7 @@ REPORT_DEPTH=no
REPEAT=11 REPEAT=11
ROUNDS=1 ROUNDS=1
SMALL=no SMALL=no
NUMABIND=
while [ -n "$1" ] while [ -n "$1" ]
do do
@ -51,6 +52,7 @@ do
echo "--db-upto-gb NN --''--''--''--''--''--''--''--''-- NN gigabytes" echo "--db-upto-gb NN --''--''--''--''--''--''--''--''-- NN gigabytes"
echo "--no-geometry-jitter Disable jitter for geometry upper-size" echo "--no-geometry-jitter Disable jitter for geometry upper-size"
echo "--pagesize NN Use specified page size (256 is minimal and used by default)" echo "--pagesize NN Use specified page size (256 is minimal and used by default)"
echo "--numa NODE Bind to the specified NUMA node"
echo "--dont-check-ram-size Don't check available RAM" echo "--dont-check-ram-size Don't check available RAM"
echo "--extra Iterate extra modes/flags" echo "--extra Iterate extra modes/flags"
echo "--taillog Dump tail of test log on failure" echo "--taillog Dump tail of test log on failure"
@ -209,6 +211,15 @@ do
--small) --small)
SMALL=yes SMALL=yes
;; ;;
--numa)
NUMANODE=$2
if [[ ! $NUMANODE =~ ^[0-9]+$ ]]; then
echo "Invalid value '$NUMANODE' for --numa option, expect an integer of NUMA-node"
exit -2
fi
NUMABIND="numactl --membind ${NUMANODE} --cpunodebind ${NUMANODE}"
shift
;;
*)
echo "Unknown option '$1'"
exit -2
@ -393,7 +404,7 @@ if [ "$SKIP_MAKE" != "yes" ]; then
fi
###############################################################################
- # 5. run stochastic iterations
+ # 5. internal preparations
if which setsid >/dev/null 2>/dev/null; then
SETSID=$(which setsid)
@ -504,9 +515,9 @@ function probe {
else
exec {LFD}> >(logger)
fi
- ${MONITOR} ./mdbx_test ${speculum} --random-writemap=no --ignore-dbfull --repeat=${REPEAT} --pathname=${TESTDB_DIR}/long.db --cleanup-after=no --geometry-jitter=${GEOMETRY_JITTER} "$@" $case >&${LFD} \
+ ${NUMABIND} ${MONITOR} ./mdbx_test ${speculum} --random-writemap=no --ignore-dbfull --repeat=${REPEAT} --pathname=${TESTDB_DIR}/long.db --cleanup-after=no --geometry-jitter=${GEOMETRY_JITTER} "$@" $case >&${LFD} \
- && ${MONITOR} ./mdbx_chk -q ${TESTDB_DIR}/long.db | tee ${TESTDB_DIR}/long-chk.log \
+ && ${NUMABIND} ${MONITOR} ./mdbx_chk -q ${TESTDB_DIR}/long.db | tee ${TESTDB_DIR}/long-chk.log \
- && ([ ! -e ${TESTDB_DIR}/long.db-copy ] || ${MONITOR} ./mdbx_chk -q ${TESTDB_DIR}/long.db-copy | tee ${TESTDB_DIR}/long-chk-copy.log) \
+ && ([ ! -e ${TESTDB_DIR}/long.db-copy ] || ${NUMABIND} ${MONITOR} ./mdbx_chk -q ${TESTDB_DIR}/long.db-copy | tee ${TESTDB_DIR}/long-chk-copy.log) \
|| failed
if [ ${LFD} -ne 0 ]; then
echo "@@@ END-OF-LOG/ITERATION" >&${LFD}
@ -516,6 +527,115 @@ function probe {
done
}
# generate caseset
declare -A caseset_id2caption
declare -A caseset_id2args
cases=0
for ((bits=2**${#options[@]}; --bits >= 0; )); do
split=30
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,int-data, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
split=24
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,int-data, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
split=16
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,int-data, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
if [ "$EXTRA" != "no" ]; then
split=10
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,int-data, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="with-dups, split=${split}"
caseset_id2args[${cases}]="--table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
fi
split=4
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,int-data, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="w/o-dups, split=${split}"
caseset_id2args[${cases}]="--table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="int-key,fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
cases=$((++cases))
caseset_id2caption[${cases}]="fixdups, split=${split}"
caseset_id2args[${cases}]="--table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd --mode=$(bits2options $bits)${syncmodes[cases%4]}"
done
###############################################################################
# 6. run stochastic iterations
function pass {
for ((round=1; round <= ROUNDS; ++round)); do
echo "======================================================================="
@ -524,122 +644,22 @@ function pass {
else
${BANNER} "$nops / $wbatch"
fi
- subcase=0
- for ((bits=2**${#options[@]}; --bits >= 0; )); do
seed=$(($(date +%s) + RANDOM))
+ subcase=0
- split=30
+ for id in $(seq 1 ${cases} | shuf); do
- caption="$((++count)) int-key,with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
+ caption="$((++count)) ${caseset_id2caption[${id}]}, case $((++subcase))/${id} of ${cases}" probe \
- --prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
+ --prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M ${caseset_id2args[${id}]} --nops=$nops --batch.write=$wbatch
- --nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
+ done
caption="$((++count)) int-key,int-data, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
split=24
caption="$((++count)) int-key,with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,int-data, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
split=16
caption="$((++count)) int-key,w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,int-data, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
if [ "$EXTRA" != "no" ]; then
split=10
caption="$((++count)) int-key,w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,int-data, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) with-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
fi
split=4
caption="$((++count)) int-key,w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,int-data, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.integer --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=max \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) w/o-dups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=-data.multi --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen.min=min --datalen.max=1111 \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) int-key,fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+key.integer,+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
caption="$((++count)) fixdups, split=${split}, case $((++subcase)) of ${cases}" probe \
--prng-seed=${seed} --pagesize=$PAGESIZE --size-upper-upto=${db_size_mb}M --table=+data.fixed --keygen.split=${split} --keylen.min=min --keylen.max=max --datalen=rnd \
--nops=$nops --batch.write=$wbatch --mode=$(bits2options $bits)${syncmodes[count%4]}
done # options
cases="${subcase}"
done
}
#------------------------------------------------------------------------------
if [ "$DELAY" != "0" ]; then if [ "$DELAY" != "0" ]; then
sleep $DELAY sleep $DELAY
fi fi
count=0 count=0
loop=0 loop=0
cases='?'
if [[ $SMALL != "yes" ]]; then if [[ $SMALL != "yes" ]]; then
for nops in 10 33 100 333 1000 3333 10000 33333 100000 333333 1000000 3333333 10000000 33333333 100000000 333333333 1000000000; do for nops in 10 33 100 333 1000 3333 10000 33333 100000 333333 1000000 3333333 10000000 33333333 100000000 333333333 1000000000; do
if [ $nops -lt $FROM ]; then continue; fi if [ $nops -lt $FROM ]; then continue; fi
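The rewritten pass() above no longer hard-codes the probe cascade: the case set is materialized once into caseset_id2caption/caseset_id2args and then walked in random order via `seq 1 ${cases} | shuf`, so every round still covers all cases, just in a different sequence. A sketch of the same idea in C++ terms, with a hypothetical Case struct standing in for the caption/args pair a real harness would carry:

#include <algorithm>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

struct Case {            // hypothetical: caption plus argument string, as in the script
  std::string caption;
  std::string args;
};

int main() {
  // Materialize the full case set first (mirrors the fixed split values above).
  std::vector<Case> cases;
  for (int split : {30, 24, 16, 4})
    cases.push_back({"int-key,int-data, split=" + std::to_string(split),
                     "--table=+key.integer,+data.integer --keygen.split=" + std::to_string(split)});

  // One seed per round keeps the order random yet reproducible from that seed.
  std::mt19937 rng(20250726u);
  std::shuffle(cases.begin(), cases.end(), rng);

  for (size_t i = 0; i < cases.size(); ++i)
    std::printf("case %zu of %zu: %s ... %s\n", i + 1, cases.size(),
                cases[i].caption.c_str(), cases[i].args.c_str());
  return 0;
}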

View File

@ -770,7 +770,7 @@ static bool execute_thunk(const actor_config *const_config, const mdbx_pid_t pid
size_t iter = 0;
do {
if (iter) {
- prng_seed(config.params.prng_seed += INT32_C(0xA4F4D37B));
+ prng_salt(iter);
log_verbose("turn PRNG to %u", config.params.prng_seed);
}
iter++;
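Previously the retry loop above advanced the PRNG by bumping prng_seed with a fixed constant on every repeat; the change switches to prng_salt(iter), i.e. mixing the iteration index into the generator state. prng_salt() is internal to the test suite, so the following is only a generic illustration of the idea: derive a distinct per-iteration stream from one base seed plus the iteration number, keeping repeats different but the whole run reproducible.

#include <cstdint>
#include <cstdio>
#include <random>

// Derive a separate PRNG stream for each retry iteration from the same base seed.
static std::mt19937_64 prng_for_iteration(uint64_t base_seed, uint64_t iter) {
  std::seed_seq seq{base_seed, iter};
  return std::mt19937_64(seq);
}

int main() {
  const uint64_t base_seed = 42;
  for (uint64_t iter = 0; iter < 3; ++iter) {
    auto rng = prng_for_iteration(base_seed, iter);
    std::printf("iter %llu: first draw %llu\n",
                (unsigned long long)iter, (unsigned long long)rng());
  }
  return 0;
}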

Some files were not shown because too many files have changed in this diff.