Added a check that the data of the BIGDATA node (containing the target page number) is located within the boundaries of the page being checked.
The third case of https://github.com/erthink/libmdbx/issues/217.
Here are some changes to avoid recursive acquisition of SRW-lock,
which is still in use:
- Read transactions don't acquire the shared SRW-lock with `MDBX_NOTLS.
- Memory-mapping of DB is always kept while DB opened,
therefore following limitations are:
- DB file can't be shrinked while it used,
including auto-shrink due to auto-compactification with corresponding geometry settings.
- The upper limit of DB size can't be changed while DB is used.
- The DB can grow within the upper size limit defined while opening by a first process,
but this does not work under Wine since there is no `NtExtendSection()` function.
Partially fix https://github.com/erthink/libmdbx/issues/203
Acknowledgements:
-----------------
- [Mahlon E. Smith](https://github.com/mahlonsmith) for [Ruby bindings](https://rubygems.org/gems/mdbx/).
- [Alex Sharov](https://github.com/AskAlexSharov) for [mdbx-go](https://github.com/torquem-ch/mdbx-go), bug reporting and testing.
- [Artem Vorotnikov](https://github.com/vorot93) for bug reporting and PR.
- [Paolo Rebuffo](https://www.linkedin.com/in/paolo-rebuffo-8255766/), [Alexey Akhunov](https://github.com/AlexeyAkhunov) and Mark Grosberg for donations.
- [Noel Kuntze](https://github.com/Thermi) for preliminary [Python bindings](https://github.com/Thermi/libmdbx/tree/python-bindings)
New features:
-------------
- Added `mdbx_env_set_option()` and `mdbx_env_get_option()` for controls
various runtime options for an environment (announce of this feature was missed in a previous news).
- Added `MDBX_DISABLE_PAGECHECKS` build option to disable some checks to reduce an overhead
and detection probability of database corruption to a values closer to the LMDB.
The `MDBX_DISABLE_PAGECHECKS=1` provides a performance boost of about 10% in CRUD scenarios,
and conjointly with the `MDBX_ENV_CHECKPID=0` and `MDBX_TXN_CHECKOWNER=0` options can yield
up to 30% more performance compared to LMDB.
- Using float point (exponential quantized) representation for internal 16-bit values
of grow step and shrink threshold when huge ones (https://github.com/erthink/libmdbx/issues/166).
To minimize the impact on compatibility, only the odd values inside the upper half
of the range (i.e. 32769..65533) are used for the new representation.
- Added the `mdbx_drop` similar to LMDB command-line tool to purge or delete (sub)database(s).
- [Ruby bindings](https://rubygems.org/gems/mdbx/) is available now by [Mahlon E. Smith](https://github.com/mahlonsmith).
- Added `MDBX_ENABLE_MADVISE` build option which controls the use of POSIX `madvise()` hints and friends.
- The internal node sizes were refined, resulting in a reduction in large/overflow pages in some use cases
and a slight increase in limits for a keys size to ≈½ of page size.
- Added to `mdbx_chk` output number of keys/items on pages.
- Added explicit `install-strip` and `install-no-strip` targets to the `Makefile` (https://github.com/erthink/libmdbx/pull/180).
- Major rework page splitting (af9b7b560505684249b76730997f9e00614b8113) for
- An "auto-appending" feature upon insertion for both ascending and
descending key sequences. As a result, the optimality of page filling
increases significantly (more densely, less slackness) while
inserting ordered sequences of keys,
- A "splitting at middle" to make page tree more balanced on average.
- Added `mdbx_get_sysraminfo()` to the API.
- Added guessing a reasonable maximum DB size for the default upper limit of geometry (https://github.com/erthink/libmdbx/issues/183).
- Major rework internal labeling of a dirty pages (958fd5b9479f52f2124ab7e83c6b18b04b0e7dda) for
a "transparent spilling" feature with the gist to make a dirty pages
be ready to spilling (writing to a disk) without further altering ones.
Thus in the `MDBX_WRITEMAP` mode the OS kernel able to oust dirty pages
to DB file without further penalty during transaction commit.
As a result, page swapping and I/O could be significantly reduced during extra large transactions and/or lack of memory.
- Minimized reading leaf-pages during dropping subDB(s) and nested trees.
- Major rework a spilling of dirty pages to support [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU))
policy and prioritization for a large/overflow pages.
- Statistics of page operations (split, merge, copy, spill, etc) now available through `mdbx_env_info_ex()`.
- Auto-setup limit for length of dirty pages list (`MDBX_opt_txn_dp_limit` option).
- Support `make options` to list available build options.
- Support `make help` to list available make targets.
- Silently `make`'s build by default.
- Preliminary [Python bindings](https://github.com/Thermi/libmdbx/tree/python-bindings) is available now
by [Noel Kuntze](https://github.com/Thermi) (https://github.com/erthink/libmdbx/issues/147).
Backward compatibility break:
-----------------------------
- The `MDBX_AVOID_CRT` build option was renamed to `MDBX_WITHOUT_MSVC_CRT`.
This option is only relevant when building for Windows.
- The `mdbx_env_stat()` always, and `mdbx_env_stat_ex()` when called with the zeroed transaction parameter,
now internally start temporary read transaction and thus may returns `MDBX_BAD_RSLOT` error.
So, just never use deprecated `mdbx_env_stat()' and call `mdbx_env_stat_ex()` with transaction parameter.
- The build option `MDBX_CONFIG_MANUAL_TLS_CALLBACK` was removed and now just a non-zero value of
the `MDBX_MANUAL_MODULE_HANDLER` macro indicates the requirement to manually call `mdbx_module_handler()`
when loading libraries and applications uses statically linked libmdbx on an obsolete Windows versions.
Fixes:
------
- Fixed performance regression due non-optimal C11 atomics usage (https://github.com/erthink/libmdbx/issues/160).
- Fixed "reincarnation" of subDB after it deletion (https://github.com/erthink/libmdbx/issues/168).
- Fixed (disallowing) implicit subDB deletion via operations on `@MAIN`'s DBI-handle.
- Fixed a crash of `mdbx_env_info_ex()` in case of a call for a non-open environment (https://github.com/erthink/libmdbx/issues/171).
- Fixed the selecting/adjustment values inside `mdbx_env_set_geometry()` for implicit out-of-range cases (https://github.com/erthink/libmdbx/issues/170).
- Fixed `mdbx_env_set_option()` for set initial and limit size of dirty page list ((https://github.com/erthink/libmdbx/issues/179).
- Fixed an unreasonably huge default upper limit for DB geometry (https://github.com/erthink/libmdbx/issues/183).
- Fixed `constexpr` specifier for the `slice::invalid()`.
- Fixed (no)readahead auto-handling (https://github.com/erthink/libmdbx/issues/164).
- Fixed non-alloy build for Windows.
- Switched to using Heap-functions instead of LocalAlloc/LocalFree on Windows.
- Fixed `mdbx_env_stat_ex()` to returning statistics of the whole environment instead of MainDB only (https://github.com/erthink/libmdbx/issues/190).
- Fixed building by GCC 4.8.5 (added workaround for a preprocessor's bug).
- Fixed building C++ part for iOS <= 13.0 (unavailability of `std::filesystem::path`).
- Fixed building for Windows target versions prior to Windows Vista (`WIN32_WINNT < 0x0600`).
- Fixed building by MinGW for Windows (https://github.com/erthink/libmdbx/issues/155).
TODO for a next releases:
-------------------------
- [Get rid of dirty-pages list in MDBX_WRITEMAP mode](https://github.com/erthink/libmdbx/issues/193).
- [Large/Overflow pages accounting for dirty-room](https://github.com/erthink/libmdbx/issues/192).
- [C++ Buffer issue](https://github.com/erthink/libmdbx/issues/191).
- Finalize C++ API (few typos and trivia bugs are still likely for now).
- [Support for RAW devices](https://github.com/erthink/libmdbx/issues/124).
- [Test framework issue](https://github.com/erthink/libmdbx/issues/127).
- [Support MessagePack for Keys & Values](https://github.com/erthink/libmdbx/issues/115).
- [Engage new terminology](https://github.com/erthink/libmdbx/issues/137).
- Packages for [Astra Linux](https://astralinux.ru/), [ALT Linux](https://www.altlinux.org/), [ROSA Linux](https://www.rosalinux.ru/), Fedora/RHEL, Debian/Ubuntu.
Briefly:
- Now constructor/destructor of "Thread Local Storage" handled automatically when possible.
- Otherwise the MDBX_CONFIG_MANUAL_TLS_CALLBACK macro defined to 1 to indicate that mdbx_module_handle() should be called manually.
- Corresponding build option MDBX_CONFIG_MANUAL_TLS_CALLBACK was removed.
Related to https://github.com/erthink/libmdbx/issues/155
Change-Id: Ic4e6a34b44f874676f0ab212ff473460e3d80559
Resolves https://github.com/erthink/libmdbx/issues/164
---
NOTE: Seems there is a bug in the Mach/Darwin/OSX kernel,
because MADV_WILLNEED with offset != 0 may cause SIGBUS
on following access to the hinted region.
19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
Change-Id: I11ebbf2bd35e3dba9d078be16cb5678aecf8329c
Basically, this (squashed) commit introduces:
- An "auto-appending" feature upon insertion for both ascending and
descending key sequences. As a result, the optimality of page filling
increases significantly (more densely, less slackness) while
inserting ordered sequences of keys,
- A "splitting at middle" for more balanced page tree on average.
---
1. Using left/middle/right tactics for finding the split point of a page:
- If a key is inserted close to an edge of page,
then the page splits at that edge;
- Otherwise a page splits at the middle,
which leads to a more balanced tree on average;
- So I expect a better behavior on average,
but actually effects should be studied further practically.
2. New code for calculating the midpoint of a page split.
3. APPEND-flags no longer affect choosing the page split point.
4. Added left-side splitting by inserting a pure page with a new entry.
Change-Id: Id7441acfc8c90636e3be6bc00a0df15714690f3c
Using float point (exponential quantized) representation for internal 16-bit values
of grow step and shrink threshold when huge ones
.
To minimize the impact on compatibility, only the odd values inside the upper half
of the range (i.e. 32769..65533) are used for the new representation.
Resolve https://github.com/erthink/libmdbx/issues/166
Change-Id: I273127c1842deef0d7d8885b55a805b1463556eb
Acknowledgements:
-----------------
- [Mahlon E. Smith](http://www.martini.nu/) for [FreeBSD port of libmdbx](https://svnweb.freebsd.org/ports/head/databases/mdbx/).
- [장세연](http://www.castis.com) for bug fixing and PR.
- [Clément Renault](https://github.com/Kerollmops/heed) for [Heed](https://github.com/Kerollmops/heed) fully typed Rust wrapper.
- [Alex Sharov](https://github.com/AskAlexSharov) for bug reporting.
- [Noel Kuntze](https://github.com/Thermi) for bug reporting.
Removed options and features:
-----------------------------
- Drop `MDBX_HUGE_TRANSACTIONS` build-option (now no longer required).
New features:
-------------
- Package for FreeBSD is available now by Mahlon E. Smith.
- New API functions to get/set various options (https://github.com/erthink/libmdbx/issues/128):
- the maximum number of named databases for the environment;
- the maximum number of threads/reader slots;
- threshold (since the last unsteady commit) to force flush the data buffers to disk;
- relative period (since the last unsteady commit) to force flush the data buffers to disk;
- limit to grow a list of reclaimed/recycled page's numbers for finding a sequence of contiguous pages for large data items;
- limit to grow a cache of dirty pages for reuse in the current transaction;
- limit of a pre-allocated memory items for dirty pages;
- limit of dirty pages for a write transaction;
- initial allocation size for dirty pages list of a write transaction;
- maximal part of the dirty pages may be spilled when necessary;
- minimal part of the dirty pages should be spilled when necessary;
- how much of the parent transaction dirty pages will be spilled while start each child transaction;
- Unlimited/Dynamic size of retired and dirty page lists (https://github.com/erthink/libmdbx/issues/123).
- Added `-p` option (purge subDB before loading) to `mdbx_load` tool.
- Reworked spilling of large transaction and committing of nested transactions:
- page spilling code reworked to avoid the flaws and bugs inherited from LMDB;
- limit for number of dirty pages now is controllable at runtime;
- a spilled pages, including overflow/large pages, now can be reused and refunded/compactified in nested transactions;
- more effective refunding/compactification especially for the loosed page cache.
- Added `MDBX_ENABLE_REFUND` and `MDBX_PNL_ASCENDING` internal/advanced build options.
- Added `mdbx_default_pagesize()` function.
- Better support architectures with a weak/relaxed memory consistency model (ARM, AARCH64, PPC, MIPS, RISC-V, etc) by means [C11 atomics](https://en.cppreference.com/w/c/atomic).
- Speed up page number lists and dirty page lists (https://github.com/erthink/libmdbx/issues/132).
- Added `LIBMDBX_NO_EXPORTS_LEGACY_API` build option.
Fixes:
------
- Fixed missing cleanup (null assigned) in the C++ commit/abort (https://github.com/erthink/libmdbx/pull/143).
- Fixed `mdbx_realloc()` for case of nullptr and `MDBX_AVOID_CRT=ON` for Windows.
- Fixed the possibility to use invalid and renewed (closed & re-opened, dropped & re-created) DBI-handles (https://github.com/erthink/libmdbx/issues/146).
- Fixed 4-byte aligned access to 64-bit integers, including access to the `bootid` meta-page's field (https://github.com/erthink/libmdbx/issues/153).
- Fixed minor/potential memory leak during page flushing and unspilling.
- Fixed handling states of cursors's and subDBs's for nested transactions.
- Fixed page leak in extra rare case the list of retired pages changed during update GC on transaction commit.
- Fixed assertions to avoid false-positive UB detection by CLANG/LLVM (https://github.com/erthink/libmdbx/issues/153).
- Fixed `MDBX_TXN_FULL` and regressive `MDBX_KEYEXIST` during large transaction commit with `MDBX_LIFORECLAIM` (https://github.com/erthink/libmdbx/issues/123).
- Fixed auto-recovery (`weak->steady` with the same boot-id) when Database size at last weak checkpoint is large than at last steady checkpoint.
- Fixed operation on systems with unusual small/large page size, including PowerPC (https://github.com/erthink/libmdbx/issues/157).
TODO:
-----
- Engage new terminology (https://github.com/erthink/libmdbx/issues/137).
- Resolve few TODOs (https://github.com/erthink/libmdbx/issues/124, https://github.com/erthink/libmdbx/issues/127, https://github.com/erthink/libmdbx/issues/115).
- Finalize C++ API.
- Packages for [ROSA Linux](https://www.rosalinux.ru/), [ALT Linux](https://www.altlinux.org/), Fedora/RHEL, Debian/Ubuntu.
Change-Id: I414b8ef2e4b90e04fb344779c0e3f1b4bd1c06be
This done better support architectures with a weak/relaxed memory consistency model (ARM, AARCH64, PPC, MIPS, RISC-V, etc).
Change-Id: Iee831c8dc564f1d027ff84b0d6daa559325d5a9b
Fix regression related to https://github.com/erthink/libmdbx/issues/123 and https://github.com/erthink/libmdbx/issues/128.
Related to https://github.com/erthink/libmdbx/issues/131.
В lifo-режиме при фиксации транзакции, записи в GC могли быть перезаписаны (с утечкой страниц БД), либо могла возникать ошибка MDBX_KEYEXISTS, по следующему сценарию:
- В истории БД были две транзакции с огромным кол-вом retired pages, после которых в GC остались две соответствующие записи.
- В ходе очередной транзакции первая из огромных GC-записей попадает в переработку и образует огромный reclaimed list.
- При фиксации транзакции производится попытка разбить огромный reclaimed list на чанки размером в одну страницу. Для этого требуется много id для записей, которые в соответствии с lifo должны быть максимально близки к голове GC, т. е. получены путем переработки последних записей GC.
- В ходе переработки последних записей очередь доходит до второй огромной записи, при этом переработка прерывается, ибо иначе reclaimed list переполнится.
- Однако прерывание переработки внутри mdbx_update_gc() трактовалось как отсутствие записей в GC, поэтому список доступных просто добавлялись соответствующие id-шники.
- Если в списке доступных id-шников для помещения в GC были переработанные, то записи с id по всему списку удалялись - тогда вторая большая запись (и возможно предыдущие) удалялись, а содержащиеся в них номера страниц выпадали из оборота.
- Если же в списке доступных id-шников не было переработанных, то чистка не проводилась - тогда при последующая попытка помещения чанков reclaimed list в GC завершалась ошибкой MDBX_KEYEXISTS, которая и возвращалась из mdbx_commit_ex().
Change-Id: I3e5d40ef7950b7476da0513c6836fcba1de74879
Historically, the page header provides 4-byte data alignment.
Therefore, unfortunately, the meta page data is also aligned on a 4-byte boundary, but contains 64-bit values.
This commit eliminates potentially unsafe access (SPARC, MIPS, etc) to these 64-bit values aligned on a 4-byte boundary.
Thus, a build with the `-fsanitize=undefined` now passes the tests both with CLANG 11 and GCC 10.
Change-Id: Ie441103e53ed96fd40507d8c0be0689e3fee69f5