Выяснилось что утилита `mdbx_copy` и функции `mdbx_env_copy()` могут
создавать ПРОБЛЕМЫ если целевой файл расположен в encryptfs (такая
файловая система в Linux).
При этом может быть четыре исхода в зависимости от версии ядра и
положения звезд на небе:
- всё хорошо;
- плохие данные в копии без возврата ошибок;
- ошибка EINVAL(22) при копировании;
- oops или зависание ядра, отвал смонтированной encryptfs и т.п.
В текущем понимании, причина обусловлена ошибой в коде fs, которая
проявляется при использовании системного вызова `copy_file_range`.
Есть основание полагать, что mremap() может возвращать MAP_FAILED, но НЕ
устанавливать errno в некоторых пограничных ситуациях. Например, когда
системных ресурсов не хватает на актуализацию/копирование/клонирование
состояния отображения на финальной стадии, в том числе из-за раскраски
исходного отображения разными флагами через madvise().
Это решает проблему срабатывания проверочного утверждения при сборке для
платформ где тип off_t шире соответствующих полей структуры flock,
используемой для блокировки файлов.
Изменение формата LCK-файла означает что версии libmdbx использующие
разный формат не смогут работать с одной БД одновременно, а только
поочередно (LCK-файл переписывается при открытии первым открывающим БД
процессом).
1. Поле mti_unsynced_pages теперь 64-битное (чтобы не контролировать
переполнение) и перемещено для соблюдения выравнивания.
2. Поле mti_sync_timestamp переименовано в mti_eoos_timestamp
одновременно со сменой семантики. Теперь время отсчитывается не от
момента сброса данных на диск, а с момента входа в «грязное» состояние.
Скорее всего, текущая версия формата LCK не окончательная
и изменится до релиза.
При проверке использовалось глобальное значение me_dxb_mmap.current,
к которому не должны обращаться читающие транзакции. В результате,
в сложных много-поточных сценариях с изменением размера БД и её
переполнением, проверка могла выдавать ложно-положительный результат.
С точки зрения пользователя, ошибка могла проявляться как возврат
`MDBX_CORRUPTED` из читающей транзакции, когда включен "безопасный
режим" (дополнительный контроль), а в параллельной пишущей транзакции
происходит увеличение размера БД с последующим переполнением и откатом
этой транзакции. При этом никакого повреждения структуры БД нет.
Ассерт мог срабатывать из-за отсутствия бита P_LEAF2 в передаваемом проверочном значении.
На что-либо другое не влияло, но не следует понять почему этот недочет ны был выявлен тестами раньше.
В режиме MDBX_WRITEMAP с опцией сборки MDBX_AVOID_MSYNC=0 отслеживание грязных страниц не требуется.
Эта доработка устраняет еще одну из недоделок (пункт в TODO).
Ранее, при конвертации очень коротких интервалов в формат фиксированной
точки 16-точка-16, всегда выполнялось замещение нуля единицей. Т.е. если
интервал был не нулевым, но меньше 15.259 микросекунд (1/65536 секунды),
то вместо 0 возвращалось 1.
Это приводило к тому, что сумма длительности отдельных стадий нередко
была больше чем общее время фиксации транзакции. Проблема усугублялась,
если получаемые значения аккумулировались по серии транзакций.
Теперь такая защита от нуля выполняется только для общего времени,
но не для отдельных стадий.
Было:
latency(ms): preparation=72.69 gc=72.69 write=73.04 sync=141.40 ending=72.69 whole=142.14
Аккумулированная сумма длительности этапов ВТРОЕ(!) больше общей длительности.
Стало:
latency(ms): preparation=0.00 gc=0.02 write=0.79 sync=67.98 ending=0.00 whole=140.81
Аккумулированная сумма длительности этапов меньше общей длительности,
так как для каждой транзакции общая длительность возвращается не менее 15.259 микросекунд.
The planned frontward release with new superior features on the day of 20 anniversary of [Positive Technologies](https://ptsecurty.com).
New:
----
- The `Big Foot` feature which significantly reduces GC overhead for processing large lists of retired pages from huge transactions.
Now _libmdbx_ avoid creating large chunks of PNLs (page number lists) which required a long sequences of free pages, aka large/overflow pages.
Thus avoiding searching, allocating and storing such sequences inside GC.
- Improved hot/online validation and checking of database pages both for more robustness and performance.
- New solid and fast method to latch meta-pages called `Troika`.
The minimum of memory barriers, reads, comparisons and conditional transitions are used.
- New `MDBX_VALIDATION` environment options to extra validation of DB structure and pages content for carefully/safe handling damaged or untrusted DB.
- Accelerated ×16/×8/×4 by AVX512/AVX2/SSE2/Neon implementations of search page sequences.
- Added the `gcrtime_seconds16dot16` counter to the "Page Operation Statistics" that accumulates time spent for GC searching and reclaiming.
- Copy-with-compactification now clears/zeroes unused gaps inside database pages.
- The `C` and `C++` APIs has been extended and/or refined to simplify using `wchar_t` pathnames.
On Windows the `mdbx_env_openW()`, `mdbx_env_get_pathW()`, `mdbx_env_copyW()`, `mdbx_env_open_for_recoveryW()` are available for now,
but the `mdbx_env_get_path()` has been replaced in favor of `mdbx_env_get_pathW()`.
- Added explicit error message for Buildroot's Microblaze toolchain maintainers.
- Added `MDBX_MANAGE_BUILD_FLAGS` build options for CMake.
- Speed-up internal `bsearch`/`lower_bound` implementation using branchless tactic, including workaround for CLANG x86 optimiser bug.
- A lot internal refinement and micro-optimisations.
- Internally counted volume of dirty pages (unused for now but for coming features).
Fixes:
------
- Never use modern `__cxa_thread_atexit()` on Apple's OSes.
- Don't check owner for finished transactions.
- Fixed typo in `MDBX_EINVAL` which breaks MingGW builds with CLANG.
37 files changed, 7604 insertions(+), 7417 deletions(-)
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
The stable bugfix release.
It is planned that this will be the last release of the v0.11 branch.
New:
----
- The C++ API has been refined to simplify support for `wchar_t` in path names.
- Added explicit error message for Buildroot's Microblaze toolchain maintainers.
Fixes:
------
- Never use modern `__cxa_thread_atexit()` on Apple's OSes.
- Use `MultiByteToWideChar(CP_THREAD_ACP)` instead of `mbstowcs()`.
- Don't check owner for finished transactions.
- Fixed typo in `MDBX_EINVAL` which breaks MingGW builds with CLANG.
Minors:
-------
- Fixed variable name typo.
- Using `ldd` to check used dso.
- Added `MDBX_WEAK_IMPORT_ATTRIBUTE` macro.
- Use current transaction geometry for untouched parameters when `env_set_geometry()` called within a write transaction.
- Minor clarified `iov_page()` failure case.
14 files changed, 263 insertions(+), 252 deletions(-)
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
Устранение крайне маловероятного регресса после перехода на мета-тройку:
- процесс А открыает БД и читает мета-траницы для формирования тройки;
- процесс Б постоянно коммитит новые транзакции;
- есть шанс что процесс А при чтении разных мета страниц попадет на момент их обновления более одного раза,
это может привести к ложной ошибке коллизии мета-страниц,
так как для обновляемых мета-страниц будет виден нулевой номер транзакции.
The stable bugfix release.
It is planned that this will be the last release of the v0.11 branch.
Acknowledgements:
-----------------
- [Alex Sharov](https://github.com/AskAlexSharov) and Erigon team for reporting and testing.
- [Andrew Ashikhmin](https://gitflic.ru/user/yperbasis) for contributing.
New:
----
- Ability to customise `MDBX_LOCK_SUFFIX`, `MDBX_DATANAME`, `MDBX_LOCKNAME` just by predefine ones during build.
- Added to [`mdbx::env_managed`](https://libmdbx.dqdkfa.ru/group__cxx__api.html#classmdbx_1_1env__managed)'s methods a few overloads with `const char* pathname` parameter (C++ API).
Fixes:
------
- Fixed hang copy-with-compactification of a corrupted DB
or in case the volume of output pages is a multiple of `MDBX_ENVCOPY_WRITEBUF`.
- Fixed standalone non-CMake build on MacOS (`#include AvailabilityMacros.h>`).
- Fixed unexpected `MDBX_PAGE_FULL` error in rare cases with large database page sizes.
Minors:
-------
- Minor fixes Doxygen references, comments, descriptions, etc.
- Fixed copy&paste typo inside `meta_checktxnid()`.
- Minor fix `meta_checktxnid()` to avoid assertion in debug mode.
- Minor fix `mdbx_env_set_geometry()` to avoid returning `EINVAL` in particular rare cases.
- Minor refine/fix batch-get testcase for large page size.
- Added `--pagesize NN` option to long-stotastic test script.
- Updated Valgrind-suppressions file for modern GCC.
- Fixed `has no symbols` warning from Apple's ranlib.
18 files changed, 318 insertions(+), 178 deletions(-)
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
1. Explicitly check and handle a race/collision case with `find_oldest_reader()`.
2. Handle "recovery mode" (me_stuck_meta >= 0) by the same code as for regular latch.
3. Add bailout error message for buggy compiler and/or hardware (paranoid).
1. Fix regression `assert: oldest >= lck->mti_oldest_reader.weak` after d4bf0a3332c7b05331ab0a87e3cd65b0903edc3c.
2. Add explicit check, kick and notice for stuck reader.
3. Made more e2k-frendly.
Кроме небольшого рефакторинга здесь реализуется более регулярный способ
обхода дерева при копировании с компактификаций. В частности, полная
инициализация курсоров позволяет выполнять больше проверок/контроля
структуры БД и избавиться от флажка CC_COPYING.
Beside a small refactoring, a more regular way of traversing the tree
when copying with compactification is implemented here. In particular,
full initialization of cursors allows to perform more checks/control of
the DB structure and get rid of the CC_COPYING flag.
Здесь основная часть изменений преобразующих отладочную проверку страниц
в регулярный и доступный пользователю осторожный/безопасный режим работы
с потенциально поврежденной БД.
Here the major part of the changes that transform a debugging check of
pages into a regular and user-accessible careful/safe mode for working
with a potentially corrupted database.
Chunking long list of retired pages during huge transactions commit
to avoid use sequences of pages:
- splits a long retired page-number-list into chunks
which fits one per single overflow/large page;
- this requires a few unique id for keys
for create such records into GC/freeDB;
- just use the necessary subsequent IDs following the current
transaction ID and then take the last of ones to update a meta-page.
Thus avoids using/allocating/searching a sequence of free pages
but just increase txnid more than one during the commit
a huge write transaction with a long retired-pages-list.
The stable release with an important fixes and workaround for the critical macOS thread-local-storage issue.
Acknowledgements:
-----------------
- [Masatoshi Fukunaga](https://github.com/mah0x211) for [Lua bindings](https://github.com/mah0x211/lua-libmdbx).
New:
----
- Added most of transactions flags to the public API.
- Added `MDBX_NOSUCCESS_EMPTY_COMMIT` build option to return non-success result (`MDBX_RESULT_TRUE`) on empty commit.
- Reworked validation and import of DBI-handles into a transaction.
Assumes these changes will be invisible to most users, but will cause fewer surprises in complex DBI cases.
- Added ability to open DB in without-LCK (exclusive read-only) mode in case no permissions to create/write LCK-file.
Fixes:
------
- A series of fixes and improvements for automatically generated documentation (Doxygen).
- Fixed copy&paste bug with could lead to `SIGSEGV` (nullptr dereference) in the exclusive/no-lck mode.
- Fixed minor warnings from modern Apple's CLANG 13.
- Fixed minor warnings from CLANG 14 and in-development CLANG 15.
- Fixed `SIGSEGV` regression in without-LCK (exclusive read-only) mode.
- Fixed `mdbx_check_fs_local()` for CDROM case on Windows.
- Fixed nasty typo of typename which caused false `MDBX_CORRUPTED` error in a rare execution path,
when the size of the thread-ID type not equal to 8.
- Fixed write-after-free memory corruption on latest `macOS` during finalization/cleanup of thread(s) that executed read transaction(s).
> The issue was suddenly discovered by a [CI](https://en.wikipedia.org/wiki/Continuous_integration)
> after adding an iteration with macOS 11 "Big Sur", and then reproduced on recent release of macOS 12 "Monterey".
> The issue was never noticed nor reported on macOS 10 "Catalina" nor others.
> Analysis shown that the problem caused by a change in the behavior of the system library (internals of dyld and pthread)
> during thread finalization/cleanup: now a memory allocated for a `__thread` variable(s) is released
> before execution of the registered Thread-Local-Storage destructor(s),
> thus a TLS-destructor will write-after-free just by legitime dereference any `__thread` variable.
> This is unexpected crazy-like behavior since the order of resources releasing/destroying
> is not the reverse of ones acquiring/construction order. Nonetheless such surprise
> is now workarounded by using atomic compare-and-swap operations on a 64-bit signatures/cookies.
- Fixed Elbrus/E2K LCC 1.26 compiler warnings (memory model for atomic operations, etc).
Minors:
-------
- Refined `release-assets` GNU Make target.
- Added logging to `mdbx_fetch_sdb()` to help debugging complex DBI-handels use cases.
- Added explicit error message from probe of no-support for `std::filesystem`.
- Added contributors "score" table by `git fame` to generated docs.
- Added `mdbx_assert_fail()` to public API (mostly for backtracing).
- Now C++20 concepts used/enabled only when `__cpp_lib_concepts >= 202002`.
- Don't provide nor report package information if used as a CMake subproject.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
The stable risen release after the Github's intentional malicious disaster.
We have migrated to a reliable trusted infrastructure
-----------------------------------------------------
The origin for now is at [GitFlic](https://gitflic.ru/project/erthink/libmdbx)
since on 2022-04-15 the Github administration, without any warning nor
explanation, deleted _libmdbx_ along with a lot of other projects,
simultaneously blocking access for many developers.
For the same reason ~~Github~~ is blacklisted forever.
GitFlic already support Russian and English languages, plan to support more,
including 和 中文. You are welcome!
New:
----
- Added the `tools-static` make target to build statically linked MDBX tools.
- Support for Microsoft Visual Studio 2022.
- Support build by MinGW' make from command line without CMake.
- Added `mdbx::filesystem` C++ API namespace that corresponds to `std::filesystem` or `std::experimental::filesystem`.
- Created [website](https://libmdbx.website.yandexcloud.net/) for online auto-generated documentation.
- Used `todo4recovery://erased_by_github/` for dead (or temporarily lost) resources deleted by ~~Github~~.
- Added `--loglevel=` command-line option to the `mdbx_test` tool.
- Added few fast smoke-like tests into CMake builds.
Fixes:
------
- Fixed a race between starting a transaction and creating a DBI descriptor that could lead to `SIGSEGV` in the cursor tracking code.
- Clarified description of `MDBX_EPERM` error returned from `mdbx_env_set_geometry()`.
- Fixed non-promoting the parent transaction to be dirty in case the undo of the geometry update failed during abortion of a nested transaction.
- Resolved linking issues with `libstdc++fs`/`libc++fs`/`libc++experimental` for C++ `std::filesystem` or `std::experimental::filesystem` for legacy compilers.
- Added workaround for GNU Make 3.81 and earlier.
- Added workaround for Elbrus/LCC 1.25 compiler bug of class inline `static constexpr` member field.
- [Fixed](https://github.com/ledgerwatch/erigon/issues/3874) minor assertion regression (only debug builds were affected).
- Fixed detection of `C++20` concepts accessibility.
- Fixed detection of Clang's LTO availability for Android.
- Fixed build for ARM/ARM64 by MSVC.
- Fixed non-x86 Windows builds with `MDBX_WITHOUT_MSVC_CRT=ON` and `MDBX_BUILD_SHARED_LIBRARY=ON`.
Minors:
-------
- Resolve minor MSVC warnings: avoid `/INCREMENTAL[:YES]` with `/LTCG`, `/W4` with `/W3`, the `C5105` warning.
- Switched to using `MDBX_EPERM` instead of `MDBX_RESULT_TRUE' to indicate that the geometry cannot be updated.
- Added `NULL` checking during memory allocation inside `mdbx_chk`.
- Resolved all warnings from MinGW while used without CMake.
- Added inheretable `target_include_directories()` to `CMakeLists.txt` for easy integration.
- Added build-time checks and paranoid runtime assertions for the `off_t` arguments of `fcntl()` which are used for locking.
- Added `-Wno-lto-type-mismatch` to avoid false-positive warnings from old GCC during LTO-enabled builds.
- Added checking for TID (system thread id) to avoid hang on 32-bit Bionic/Android within `pthread_mutex_lock()`.
- Reworked `MDBX_BUILD_TARGET` of CMake builds.
- Added `CMAKE_HOST_ARCH` and `CMAKE_HOST_CAN_RUN_EXECUTABLES_BUILT_FOR_TARGET`.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
Fixes https://github.com/ledgerwatch/erigon/issues/3874.
This was a minor regression after the c4a5325aafd3f03ce7520731b9da7253d7d178f0
that affects only debug builgs (with enabled assertions) and only when the added
code catch a incoherency of unified page/buffer cache.
The error was that the array of pointers in the transaction zeroed by the
value of env->me_numdbs and before txn->mt_numdbs was set to env->me_numdbs.
Thus, a cursor pointer(s) in the starting transaction could uninitialized if
another thread opened a dbi-handle(s) between the two aforementioned events.
The stable release with the complete workaround for an incoherence flaw of Linux unified page/buffer cache.
Nonetheless the cause for this trouble may be an issue of Intel CPU cache/MESI.
See [issue#269](https://github.com/erthink/libmdbx/issues/269) for more information.
Acknowledgements:
-----------------
- [David Bouyssié](https://github.com/david-bouyssie) for [Scala bindings](https://github.com/david-bouyssie/mdbx4s).
- [Michelangelo Riccobene](https://github.com/mriccobene) for reporting and testing.
Fixes:
------
- [Added complete workaround](https://github.com/erthink/libmdbx/issues/269) for an incoherence flaw of Linux unified page/buffer cache.
- [Fixed](https://github.com/erthink/libmdbx/issues/272) cursor reusing for read-only transactions.
- Fixed copy&paste typo inside `mdbx::cursor::find_multivalue()`.
Minors:
-------
- Minor refine C++ API for convenience.
- Minor internals refines.
- Added `lib-static` and `lib-shared` targets for make.
- Added minor workaround for AppleClang 13.3 bug.
- Clarified error messages of a signature/version mismatch.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
Briefly, this commit fixes a missed flaw:
- Cursor tracking is required to replacing shaded pages and adjusting the positions in writing transactions;
- Thus, historically, an internal linked list was maintained for a read-write transactions, but not for a read-only.
For this reason, the API for using cursors should be different for writing and reading transactions;
- However, the libmdbx's API has been significantly improved, including the ability to reuse cursors and a uniform cursors behavior for any kind of transactions.
My mistake is that due to working with MithrilDB, I forgot to make a same changes to libmdbx.
Fixes https://github.com/erthink/libmdbx/issues/272.
The stable release with the hotfix/workaround for a flaw of Linux 4.19 (at least) unified page/buffer cache.
See [issue#269](https://github.com/erthink/libmdbx/issues/269) for more information.
Acknowledgements:
-----------------
- [Simon Leier](https://github.com/leisim) for reporting and testing.
- [Kai Wetlesen](https://github.com/kaiwetlesen) for [RPMs](http://copr.fedorainfracloud.org/coprs/kwetlesen/libmdbx/).
- [Tullio Canepa](https://github.com/canepat) for reporting C++ API issue and contributing.
Fixes:
------
- [Added workaround](https://github.com/erthink/libmdbx/issues/269) for a flaw of Linux 4.19 (at least) unified page/buffer cache.
- [Fixed/Reworked](https://github.com/erthink/libmdbx/pull/270) move-assignment operators for "managed" classes of C++ API.
- Fixed potential `SIGSEGV` while open DB with overrided non-default page size.
- [Made](https://github.com/erthink/libmdbx/issues/267) `mdbx_env_open()` idempotence in failure cases.
- Refined/Fixed pages reservation inside `mdbx_update_gc()` to avoid non-reclamation in a rare cases.
- Fixed typo in a retained space calculation for the hsr-callback.
Minors:
-------
- Reworked functions for meta-pages, split-off non-volatile.
- Disentangled C11-atomic fences/barriers and pure-functions (with `__attribute__((__pure__))`) to avoid compiler misoptimization.
- Fixed hypotetic unaligned access to 64-bit dwords on ARM with `__ARM_FEATURE_UNALIGNED` defined.
- Reasonable paranoia that makes clarity for code readers.
- Minor fixes Doxygen references, comments, descriptions, etc.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
The three points:
- disentangle C11-atomic fences/barriers and pure-functions (with `__attribute__((__pure__))`) to avoid compiler misoptimization;
- fix hypotetic unaligned access to 64-bit dwords on ARM with `__ARM_FEATURE_UNALIGNED` defined;
- reasonable paranoia that makes clarity for code readers.
The stable release with fixes for large and huge databases sized of 4..128 TiB.
Acknowledgements:
-----------------
- Ledgerwatch, Binance and Positive Technologies teams for reporting, assistance in investigation and testing.
- Alex Sharov for reporting, testing and provide resources for remote debugging/investigation.
- Kris Zyp for Deno support.
New features, extensions and improvements:
------------------------------------------
- Added treating the `UINT64_MAX` value as maximum for given option inside `mdbx_env_set_option()`.
- Added `to_hex/to_base58/to_base64::output(std::ostream&)` overloads without using temporary string objects as buffers.
- Added `--geometry-jitter=YES|no` option to the test framework.
- Added support for [Deno](https://deno.land/) support by [Kris Zyp](https://github.com/kriszyp).
Fixes:
------
- Fixed handling `MDBX_opt_rp_augment_limit` for GC's records from huge transactions (Erigon/Akula/Ethereum).
- [Fixed](https://github.com/erthink/libmdbx/issues/258) build on Android (avoid including `sys/sem.h`).
- [Fixed](https://github.com/erthink/libmdbx/pull/261) missing copy assignment operator for `mdbx::move_result`.
- Fixed missing `&` for `std::ostream &operator<<()` overloads.
- Fixed unexpected `EXDEV` (Cross-device link) error from `mdbx_env_copy()`.
- Fixed base64 encoding/decoding bugs in auxillary C++ API.
- Fixed overflow of `pgno_t` during checking PNL on 64-bit platforms.
- [Fixed](https://github.com/erthink/libmdbx/issues/260) excessive PNL checking after sort for spilling.
- Reworked checking `MAX_PAGENO` and DB upper-size geometry limit.
- [Fixed](https://github.com/erthink/libmdbx/issues/265) build for some combinations of versions of MSVC and Windows SDK.
Minors:
-------
- Added workaround for CLANG bug [D79919/PR42445](https://reviews.llvm.org/D79919).
- Fixed build test on Android (using `pthread_barrier_t` stub).
- Disabled C++20 concepts for CLANG < 14 on Android.
- Fixed minor `unused parameter` warning.
- Added CI for Android.
- Refine/cleanup internal logging.
- Refined line splitting inside hex/base58/base64 encoding to avoid `\n` at the end.
- Added workaround for modern libstdc++ with CLANG < 4.x
- Relaxed txn-check rules for auxiliary functions.
- Clarified a comments and descriptions, etc.
- Using the `-fno-semantic interposition` option to reduce the overhead to calling self own public functions.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
This bug triggered only in the DEBUG builds or when the assertion checking is forcibly enabled.
It does not affect any core logic and cannot lead to DB corruption, data loss, and so on.
Fixes https://github.com/erthink/libmdbx/issues/260.