Briefly, this commit fixes a missed flaw:
- Cursor tracking is required for replacing shadowed pages and adjusting cursor positions in write transactions;
- Thus, historically, an internal linked list was maintained for read-write transactions, but not for read-only ones.
For this reason, the API for using cursors had to differ between write and read transactions;
- However, libmdbx's API has been significantly improved, including the ability to reuse cursors and uniform cursor behavior for any kind of transaction.
My mistake is that, due to working with MithrilDB, I forgot to make the same changes to libmdbx.
Fixes https://github.com/erthink/libmdbx/issues/272.
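To illustrate the idea of uniform cursor tracking described above, here is a minimal sketch in plain C. It is not libmdbx code: the `txn` and `cursor` structs and the `cursor_bind()`, `cursor_unbind()` and `txn_cursor_count()` helpers are hypothetical names invented for this example. The only point shown is that the tracking list is kept for any kind of transaction, so a cursor can be reused by rebinding it.

```c
#include <assert.h>
#include <stddef.h>

struct txn;

/* Hypothetical cursor: an intrusive node in the owner's tracking list. */
typedef struct cursor {
  struct cursor *next; /* next cursor tracked by the same transaction */
  struct txn *txn;     /* owning transaction, NULL while unbound */
} cursor;

/* Hypothetical transaction: the list exists for read-only ones too. */
typedef struct txn {
  int read_only;
  cursor *cursors; /* head of the tracking list */
} txn;

/* Detach a cursor from its transaction, if it is bound to one. */
static void cursor_unbind(cursor *c) {
  assert(c != NULL);
  if (c->txn == NULL)
    return;
  for (cursor **link = &c->txn->cursors; *link; link = &(*link)->next)
    if (*link == c) {
      *link = c->next;
      break;
    }
  c->txn = NULL;
  c->next = NULL;
}

/* Bind (or rebind) a cursor; the same path serves both txn kinds. */
static void cursor_bind(txn *t, cursor *c) {
  assert(t != NULL && c != NULL);
  cursor_unbind(c); /* reuse: leave the previous transaction first */
  c->txn = t;
  c->next = t->cursors;
  t->cursors = c;
}

/* Count the cursors currently tracked by a transaction. */
static size_t txn_cursor_count(const txn *t) {
  size_t n = 0;
  for (const cursor *c = t->cursors; c; c = c->next)
    ++n;
  return n;
}
```

Because binding goes through one code path, a caller does not need to know whether the transaction is read-only before reusing a cursor.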
The stable release with the hotfix/workaround for a flaw in the unified page/buffer cache of Linux 4.19 (at least).
See [issue#269](https://github.com/erthink/libmdbx/issues/269) for more information.
Acknowledgements:
-----------------
- [Simon Leier](https://github.com/leisim) for reporting and testing.
- [Kai Wetlesen](https://github.com/kaiwetlesen) for [RPMs](http://copr.fedorainfracloud.org/coprs/kwetlesen/libmdbx/).
- [Tullio Canepa](https://github.com/canepat) for reporting a C++ API issue and contributing.
Fixes:
------
- [Added workaround](https://github.com/erthink/libmdbx/issues/269) for a flaw in the unified page/buffer cache of Linux 4.19 (at least).
- [Fixed/Reworked](https://github.com/erthink/libmdbx/pull/270) move-assignment operators for "managed" classes of C++ API.
- Fixed a potential `SIGSEGV` when opening a DB with an overridden non-default page size.
- [Made](https://github.com/erthink/libmdbx/issues/267) `mdbx_env_open()` idempotent in failure cases.
- Refined/Fixed pages reservation inside `mdbx_update_gc()` to avoid non-reclamation in rare cases.
- Fixed a typo in the retained space calculation for the hsr-callback.
Minors:
-------
- Reworked functions for meta-pages, split-off non-volatile.
- Disentangled C11-atomic fences/barriers and pure-functions (with `__attribute__((__pure__))`) to avoid compiler misoptimization.
- Fixed hypothetical unaligned access to 64-bit dwords on ARM with `__ARM_FEATURE_UNALIGNED` defined.
- Added reasonable paranoia that improves clarity for code readers.
- Minor fixes to Doxygen references, comments, descriptions, etc.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
The three points:
- disentangle C11-atomic fences/barriers and pure-functions (with `__attribute__((__pure__))`) to avoid compiler misoptimization;
- fix hypothetical unaligned access to 64-bit dwords on ARM with `__ARM_FEATURE_UNALIGNED` defined;
- add reasonable paranoia that improves clarity for code readers.
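As an illustration of the unaligned-access point above, the following is a minimal sketch of a portable way to read a 64-bit value from a possibly unaligned address. It is a general C technique, not the code from this commit; the helper name `load_u64_unaligned()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an address that may not be 8-byte aligned.
 * memcpy() avoids dereferencing a misaligned uint64_t pointer, which is
 * undefined behavior in C regardless of what the target CPU tolerates. */
static uint64_t load_u64_unaligned(const void *ptr) {
  assert(ptr != NULL);
  uint64_t value;
  memcpy(&value, ptr, sizeof(value));
  return value;
}
```

Compilers typically turn the fixed-size `memcpy()` into a single load where the target allows it, so the portable form usually costs nothing on platforms with hardware support for unaligned access.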
The stable release with fixes for large and huge databases of 4..128 TiB in size.
Acknowledgements:
-----------------
- Ledgerwatch, Binance and Positive Technologies teams for reporting, assistance in investigation and testing.
- Alex Sharov for reporting, testing and providing resources for remote debugging/investigation.
- Kris Zyp for Deno support.
New features, extensions and improvements:
------------------------------------------
- Added treating the `UINT64_MAX` value as the maximum for the given option inside `mdbx_env_set_option()`.
- Added `to_hex/to_base58/to_base64::output(std::ostream&)` overloads without using temporary string objects as buffers.
- Added `--geometry-jitter=YES|no` option to the test framework.
- Added [Deno](https://deno.land/) support by [Kris Zyp](https://github.com/kriszyp).
Fixes:
------
- Fixed handling `MDBX_opt_rp_augment_limit` for GC's records from huge transactions (Erigon/Akula/Ethereum).
- [Fixed](https://github.com/erthink/libmdbx/issues/258) build on Android (avoid including `sys/sem.h`).
- [Fixed](https://github.com/erthink/libmdbx/pull/261) missing copy assignment operator for `mdbx::move_result`.
- Fixed missing `&` for `std::ostream &operator<<()` overloads.
- Fixed unexpected `EXDEV` (Cross-device link) error from `mdbx_env_copy()`.
- Fixed base64 encoding/decoding bugs in auxiliary C++ API.
- Fixed overflow of `pgno_t` during checking PNL on 64-bit platforms.
- [Fixed](https://github.com/erthink/libmdbx/issues/260) excessive PNL checking after sort for spilling.
- Reworked checking `MAX_PAGENO` and DB upper-size geometry limit.
- [Fixed](https://github.com/erthink/libmdbx/issues/265) build for some combinations of versions of MSVC and Windows SDK.
Minors:
-------
- Added workaround for CLANG bug [D79919/PR42445](https://reviews.llvm.org/D79919).
- Fixed build test on Android (using `pthread_barrier_t` stub).
- Disabled C++20 concepts for CLANG < 14 on Android.
- Fixed minor `unused parameter` warning.
- Added CI for Android.
- Refined/cleaned up internal logging.
- Refined line splitting inside hex/base58/base64 encoding to avoid `\n` at the end.
- Added workaround for modern libstdc++ with CLANG < 4.x.
- Relaxed txn-check rules for auxiliary functions.
- Clarified comments and descriptions, etc.
- Using the `-fno-semantic-interposition` option to reduce the overhead of calling the library's own public functions.
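The line-splitting refinement listed above (no `\n` at the end of encoded output) can be illustrated with a small sketch. It is not the library's encoder: `wrap_lines()` is a hypothetical helper that only shows the technique of emitting the separator before a new line starts rather than after each full line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `src` to `dst`, starting a new line after every `wrap` characters.
 * The '\n' is written only when more input follows, so the output never
 * ends with a line break. A `wrap` of 0 disables wrapping.
 * `dst` must have room for strlen(src) + strlen(src) / wrap + 1 bytes.
 * Returns the number of characters written, excluding the final NUL. */
static size_t wrap_lines(const char *src, size_t wrap, char *dst) {
  assert(src != NULL && dst != NULL);
  const size_t len = strlen(src);
  size_t out = 0;
  for (size_t i = 0; i < len; ++i) {
    if (wrap != 0 && i != 0 && i % wrap == 0)
      dst[out++] = '\n';
    dst[out++] = src[i];
  }
  dst[out] = '\0';
  return out;
}
```

Placing the separator in front of the next chunk handles the case where the input length is an exact multiple of the wrap width, which is where a trailing `\n` would otherwise appear.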
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
This bug is triggered only in DEBUG builds or when assertion checking is forcibly enabled.
It does not affect any core logic and cannot lead to DB corruption, data loss, and so on.
Fixes https://github.com/erthink/libmdbx/issues/260.
Added a check that the data of the BIGDATA node (containing the target page number) is located within the boundaries of the page being checked.
The third case of https://github.com/erthink/libmdbx/issues/217.
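A bounds check of the kind described above can be sketched as follows. This is an illustration under assumptions, not the actual checking code: `node_data_within_page()` is a hypothetical helper, and offsets are taken relative to the start of the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the byte range [data_offset, data_offset + data_size)
 * lies entirely inside a page of `page_size` bytes. The comparison is
 * arranged so that it cannot overflow, even for corrupted offsets. */
static bool node_data_within_page(size_t page_size, size_t data_offset,
                                  size_t data_size) {
  assert(page_size > 0);
  return data_offset <= page_size && data_size <= page_size - data_offset;
}
```

For a BIGDATA node the data would be the target page number, so `data_size` would be the size of that number (for example `sizeof(uint32_t)` if page numbers are 32-bit, which is an assumption of this sketch).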
Here are some changes to avoid recursive acquisition of the SRW-lock,
which is still in use:
- Read transactions don't acquire the shared SRW-lock with `MDBX_NOTLS`.
- Memory-mapping of the DB is always kept while the DB is open,
  therefore the following limitations apply:
  - The DB file can't be shrunk while it is in use,
    including auto-shrink due to auto-compactification with corresponding geometry settings.
  - The upper limit of DB size can't be changed while the DB is in use.
  - The DB can grow within the upper size limit defined while opening by the first process,
    but this does not work under Wine since there is no `NtExtendSection()` function.
Partially fixes https://github.com/erthink/libmdbx/issues/203
Acknowledgements:
-----------------
- [Mahlon E. Smith](https://github.com/mahlonsmith) for [Ruby bindings](https://rubygems.org/gems/mdbx/).
- [Alex Sharov](https://github.com/AskAlexSharov) for [mdbx-go](https://github.com/torquem-ch/mdbx-go), bug reporting and testing.
- [Artem Vorotnikov](https://github.com/vorot93) for bug reporting and PR.
- [Paolo Rebuffo](https://www.linkedin.com/in/paolo-rebuffo-8255766/), [Alexey Akhunov](https://github.com/AlexeyAkhunov) and Mark Grosberg for donations.
- [Noel Kuntze](https://github.com/Thermi) for preliminary [Python bindings](https://github.com/Thermi/libmdbx/tree/python-bindings).
New features:
-------------
- Added `mdbx_env_set_option()` and `mdbx_env_get_option()` to control
various runtime options for an environment (the announcement of this feature was missed in the previous news).
- Added the `MDBX_DISABLE_PAGECHECKS` build option to disable some checks, which reduces overhead
but also lowers the probability of detecting database corruption to values closer to LMDB's.
`MDBX_DISABLE_PAGECHECKS=1` provides a performance boost of about 10% in CRUD scenarios,
and together with the `MDBX_ENV_CHECKPID=0` and `MDBX_TXN_CHECKOWNER=0` options can yield
up to 30% more performance compared to LMDB.
- Using a floating-point (exponentially quantized) representation for the internal 16-bit values
of grow step and shrink threshold when they are huge (https://github.com/erthink/libmdbx/issues/166).
To minimize the impact on compatibility, only the odd values inside the upper half
of the range (i.e. 32769..65533) are used for the new representation.
- Added the `mdbx_drop` command-line tool, similar to LMDB's, to purge or delete (sub)database(s).
- [Ruby bindings](https://rubygems.org/gems/mdbx/) are now available, by [Mahlon E. Smith](https://github.com/mahlonsmith).
- Added `MDBX_ENABLE_MADVISE` build option which controls the use of POSIX `madvise()` hints and friends.
- The internal node sizes were refined, resulting in a reduction in large/overflow pages in some use cases
and a slight increase in the key size limit to ≈½ of page size.
- Added the number of keys/items on pages to the `mdbx_chk` output.
- Added explicit `install-strip` and `install-no-strip` targets to the `Makefile` (https://github.com/erthink/libmdbx/pull/180).
- Major rework of page splitting (af9b7b560505684249b76730997f9e00614b8113) for:
  - An "auto-appending" feature upon insertion for both ascending and
    descending key sequences. As a result, the optimality of page filling
    increases significantly (denser, with less slack) while
    inserting ordered sequences of keys;
  - A "splitting at middle" to make the page tree more balanced on average.
- Added `mdbx_get_sysraminfo()` to the API.
- Added guessing a reasonable maximum DB size for the default upper limit of geometry (https://github.com/erthink/libmdbx/issues/183).
- Major rework of the internal labeling of dirty pages (958fd5b9479f52f2124ab7e83c6b18b04b0e7dda) for
a "transparent spilling" feature, the gist of which is to make dirty pages
ready for spilling (writing to disk) without further altering them.
Thus in the `MDBX_WRITEMAP` mode the OS kernel is able to oust dirty pages
to the DB file without further penalty during transaction commit.
As a result, page swapping and I/O could be significantly reduced during extra large transactions and/or lack of memory.
- Minimized reading leaf-pages during dropping subDB(s) and nested trees.
- Major rework of the spilling of dirty pages to support an [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU))
policy and prioritization for large/overflow pages.
- Statistics of page operations (split, merge, copy, spill, etc) are now available through `mdbx_env_info_ex()`.
- Auto-setup limit for length of dirty pages list (`MDBX_opt_txn_dp_limit` option).
- Support `make options` to list available build options.
- Support `make help` to list available make targets.
- `make` now builds silently by default.
- Preliminary [Python bindings](https://github.com/Thermi/libmdbx/tree/python-bindings) are now available,
by [Noel Kuntze](https://github.com/Thermi) (https://github.com/erthink/libmdbx/issues/147).
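The exponentially quantized 16-bit representation mentioned in the list above (odd values in 32769..65533) can be illustrated with a decoding sketch. The bit layout used here is an assumption made for the example, not a statement of the on-disk format: bit 15 and bit 0 act as markers, bits 14..12 hold a 3-bit exponent and bits 11..1 an 11-bit mantissa. The function name `unpack_pages()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit packed value into a number of pages.
 * Assumed layout (for illustration only):
 *   f e d c b a 9 8 7 6 5 4 3 2 1 0
 *   1 e e e m m m m m m m m m m m 1
 * Values without both marker bits keep their plain meaning, which is
 * why only odd values in the upper half of the range are affected. */
static uint32_t unpack_pages(uint16_t packed) {
  if ((packed & 0x8001u) != 0x8001u)
    return packed; /* plain representation: the value itself */
  assert(packed != 65535u); /* outside the 32769..65533 range used here */
  const unsigned m = (packed >> 1) & 2047u; /* mantissa */
  const unsigned e = (packed >> 12) & 7u;   /* exponent */
  return 32768u + ((uint32_t)(m + 1) << (e + 8));
}
```

Under this assumed layout the smallest packed value, 32769, decodes to 33024 pages, and the largest, 65533, decodes to 67108864 pages, while every even value and every value below 32768 is unchanged.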
Backward compatibility break:
-----------------------------
- The `MDBX_AVOID_CRT` build option was renamed to `MDBX_WITHOUT_MSVC_CRT`.
This option is only relevant when building for Windows.
- `mdbx_env_stat()` always, and `mdbx_env_stat_ex()` when called with the zeroed transaction parameter,
now internally start a temporary read transaction and thus may return the `MDBX_BAD_RSLOT` error.
So, just never use the deprecated `mdbx_env_stat()` and call `mdbx_env_stat_ex()` with a transaction parameter.
- The build option `MDBX_CONFIG_MANUAL_TLS_CALLBACK` was removed; now just a non-zero value of
the `MDBX_MANUAL_MODULE_HANDLER` macro indicates the requirement to manually call `mdbx_module_handler()`
when loading libraries and applications that use statically linked libmdbx on obsolete Windows versions.
Fixes:
------
- Fixed performance regression due to non-optimal C11 atomics usage (https://github.com/erthink/libmdbx/issues/160).
- Fixed "reincarnation" of subDB after its deletion (https://github.com/erthink/libmdbx/issues/168).
- Fixed (disallowing) implicit subDB deletion via operations on `@MAIN`'s DBI-handle.
- Fixed a crash of `mdbx_env_info_ex()` in case of a call for a non-open environment (https://github.com/erthink/libmdbx/issues/171).
- Fixed the selecting/adjustment values inside `mdbx_env_set_geometry()` for implicit out-of-range cases (https://github.com/erthink/libmdbx/issues/170).
- Fixed `mdbx_env_set_option()` for setting the initial and limit size of the dirty page list (https://github.com/erthink/libmdbx/issues/179).
- Fixed an unreasonably huge default upper limit for DB geometry (https://github.com/erthink/libmdbx/issues/183).
- Fixed `constexpr` specifier for the `slice::invalid()`.
- Fixed (no)readahead auto-handling (https://github.com/erthink/libmdbx/issues/164).
- Fixed non-alloy build for Windows.
- Switched to using Heap-functions instead of LocalAlloc/LocalFree on Windows.
- Fixed `mdbx_env_stat_ex()` to return statistics of the whole environment instead of MainDB only (https://github.com/erthink/libmdbx/issues/190).
- Fixed building by GCC 4.8.5 (added workaround for a preprocessor's bug).
- Fixed building C++ part for iOS <= 13.0 (unavailability of `std::filesystem::path`).
- Fixed building for Windows target versions prior to Windows Vista (`WIN32_WINNT < 0x0600`).
- Fixed building by MinGW for Windows (https://github.com/erthink/libmdbx/issues/155).
TODO for the next releases:
---------------------------
- [Get rid of dirty-pages list in MDBX_WRITEMAP mode](https://github.com/erthink/libmdbx/issues/193).
- [Large/Overflow pages accounting for dirty-room](https://github.com/erthink/libmdbx/issues/192).
- [C++ Buffer issue](https://github.com/erthink/libmdbx/issues/191).
- Finalize C++ API (a few typos and trivial bugs are still likely for now).
- [Support for RAW devices](https://github.com/erthink/libmdbx/issues/124).
- [Test framework issue](https://github.com/erthink/libmdbx/issues/127).
- [Support MessagePack for Keys & Values](https://github.com/erthink/libmdbx/issues/115).
- [Engage new terminology](https://github.com/erthink/libmdbx/issues/137).
- Packages for [Astra Linux](https://astralinux.ru/), [ALT Linux](https://www.altlinux.org/), [ROSA Linux](https://www.rosalinux.ru/), Fedora/RHEL, Debian/Ubuntu.
Briefly:
- Now the constructor/destructor of "Thread Local Storage" is handled automatically when possible.
- Otherwise the `MDBX_MANUAL_MODULE_HANDLER` macro is defined to 1 to indicate that `mdbx_module_handler()` should be called manually.
- The corresponding build option `MDBX_CONFIG_MANUAL_TLS_CALLBACK` was removed.
Related to https://github.com/erthink/libmdbx/issues/155
Change-Id: Ic4e6a34b44f874676f0ab212ff473460e3d80559
Resolves https://github.com/erthink/libmdbx/issues/164
---
NOTE: There seems to be a bug in the Mach/Darwin/OSX kernel,
because `MADV_WILLNEED` with offset != 0 may cause `SIGBUS`
on a subsequent access to the hinted region.
19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
Change-Id: I11ebbf2bd35e3dba9d078be16cb5678aecf8329c
Basically, this (squashed) commit introduces:
- An "auto-appending" feature upon insertion for both ascending and
descending key sequences. As a result, the optimality of page filling
increases significantly (denser, with less slack) while
inserting ordered sequences of keys;
- A "splitting at middle" for a more balanced page tree on average.
---
1. Using left/middle/right tactics for finding the split point of a page:
- If a key is inserted close to an edge of page,
then the page splits at that edge;
- Otherwise a page splits at the middle,
which leads to a more balanced tree on average;
- So I expect better behavior on average,
but the actual effects should be studied further in practice.
2. New code for calculating the midpoint of a page split.
3. APPEND-flags no longer affect choosing the page split point.
4. Added left-side splitting by inserting a pure page with a new entry.
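The left/middle/right tactic from item 1 can be sketched as below. This is a simplified illustration, not the code of this commit: the names `split_tactic`, `choose_split()` and `split_index()` are hypothetical, "close to an edge" is narrowed to "exactly at the edge", and the midpoint is computed by key count, whereas a real implementation would more likely weigh the sizes of the entries.

```c
#include <assert.h>

typedef enum { SPLIT_LEFT, SPLIT_MIDDLE, SPLIT_RIGHT } split_tactic;

/* Pick a tactic from the insertion position within a full page.
 * `nkeys` is the number of entries on the page and `insert_at` is the
 * index at which the new entry would be placed (0..nkeys). */
static split_tactic choose_split(unsigned nkeys, unsigned insert_at) {
  assert(insert_at <= nkeys);
  if (insert_at == 0)
    return SPLIT_LEFT; /* prepending: descending key sequence */
  if (insert_at == nkeys)
    return SPLIT_RIGHT; /* appending: ascending key sequence */
  return SPLIT_MIDDLE; /* otherwise keep the tree balanced */
}

/* Index of the first entry that moves to the right-hand page. */
static unsigned split_index(unsigned nkeys, unsigned insert_at) {
  switch (choose_split(nkeys, insert_at)) {
  case SPLIT_LEFT:
    return 0; /* new entry goes alone onto a fresh left page */
  case SPLIT_RIGHT:
    return nkeys; /* new entry goes alone onto a fresh right page */
  default:
    return nkeys / 2;
  }
}
```

With edge splits the existing page stays full, which is what makes pages denser for ordered insertions; the middle split is the fallback for keys inserted elsewhere.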
Change-Id: Id7441acfc8c90636e3be6bc00a0df15714690f3c