mirror of
https://github.com/isar/libmdbx.git
synced 2025-01-23 07:28:20 +08:00
256 lines
9.8 KiB
Diff
256 lines
9.8 KiB
Diff
|
From 0bf9d06e8b090e2d9783d03074f3752ed708f6cf Mon Sep 17 00:00:00 2001
|
||
|
In-Reply-To: <CAO2+NUDJUL_FYfQ3E_cmmu20F3VOagxDpy_243a6mU3oAeASNA@mail.gmail.com>
|
||
|
References: <CAO2+NUDJUL_FYfQ3E_cmmu20F3VOagxDpy_243a6mU3oAeASNA@mail.gmail.com>
|
||
|
From: Leonid Yuriev <leo@yuriev.ru>
|
||
|
Date: Tue, 6 Oct 2020 22:19:56 +0300
|
||
|
Cc: Heiko Thiery <heiko.thiery@gmail.com>
|
||
|
Subject: [PATCH v4 0/1] package/libmdbx: new package (library/database)
|
||
|
|
||
|
This patch adds libmdbx v0.9.1 and below is a brief overview of libmdbx.
|
||
|
|
||
|
Changes v1 -> v2:
|
||
|
- libmdbx version v0.8.2 -> v0.9.1 (released 2020-09-30)
|
||
|
|
||
|
Changes v2 -> v3:
|
||
|
- removed outcommented stuff (suggested by Heiko Thiery).
|
||
|
- cleaned up and simplified the makefile (suggested by Heiko Thiery).
|
||
|
- added patch for C++ header installation.
|
||
|
- added patch with fix minor copy&paste typo.
|
||
|
- added patch with pthread workaround for buggy toolchain/cmake/buildroot.
|
||
|
- added patch for `pthread_yield()`.
|
||
|
- passed `utils/check-package package/libmdbx/*` (suggested by Heiko Thiery).
|
||
|
- passed `utils/test-pkg -a -p libmdbx` (suggested by Heiko Thiery).
|
||
|
|
||
|
Changes v3 -> v4:
|
||
|
- fix passing BR2_PACKAGE_LIBMDBX_TOOLS option to cmake.
|
||
|
- fix using `depend on` instead of `select`,
|
||
|
and add `comment` for C++ toolchain (suggested by Heiko Thiery).
|
||
|
- fix minor help typo.
|
||
|
|
||
|
Please merge.
|
||
|
|
||
|
Regards,
|
||
|
Leonid.
|
||
|
|
||
|
--
|
||
|
|
||
|
libmdbx is an extremely fast, compact, powerful, embedded, transactional
|
||
|
key-value database, with permissive license. libmdbx has a specific set
|
||
|
of properties and capabilities, focused on creating unique lightweight
|
||
|
solutions.
|
||
|
|
||
|
Historically, libmdbx (MDBX) is a deeply revised and extended descendant
|
||
|
of the legendary LMDB (Lightning Memory-Mapped Database). libmdbx
|
||
|
inherits all benefits from LMDB, but resolves some issues and adds a set
|
||
|
of improvements.
|
||
|
|
||
|
According to developers, for now libmdbx surpasses the LMDB in terms of
|
||
|
reliability, features and performance.
|
||
|
|
||
|
|
||
|
The most important differences MDBX from LMDB:
|
||
|
==============================================
|
||
|
|
||
|
1. More attention is paid to the quality of the code, to an
|
||
|
"unbreakability" of the API, to testing and automatic checks (i.e.
|
||
|
sanitizers, etc). So there:
|
||
|
- more control during operation;
|
||
|
- more checking parameters, internal audit of database structures;
|
||
|
- no warnings from compiler;
|
||
|
- no issues from ASAN, UBSAN, Valgrind, Coverity;
|
||
|
- etc.
|
||
|
|
||
|
2. Keys could be more than 2 times longer than LMDB.
|
||
|
|
||
|
3. Up to 20% faster than LMDB in CRUD benchmarks.
|
||
|
|
||
|
4. Automatic on-the-fly database size adjustment,
|
||
|
both increment and reduction.
|
||
|
|
||
|
5. Automatic continuous zero-overhead database compactification.
|
||
|
|
||
|
6. The same database format for 32- and 64-bit builds.
|
||
|
|
||
|
7. LIFO policy for Garbage Collection recycling (this can significantly
|
||
|
increase write performance due write-back disk cache up to several times
|
||
|
in a best case scenario).
|
||
|
|
||
|
8. Range query estimation.
|
||
|
|
||
|
9. Utility for checking the integrity of the database structure with
|
||
|
some recovery capabilities.
|
||
|
|
||
|
For more info please refer:
|
||
|
- https://github.com/erthink/libmdbx for source code and README.
|
||
|
- https://erthink.github.io/libmdbx for API description.
|
||
|
|
||
|
--
|
||
|
|
||
|
MDBX is a Btree-based database management library modeled loosely on the
|
||
|
BerkeleyDB API, but much simplified. The entire database (aka
|
||
|
"environment") is exposed in a memory map, and all data fetches return
|
||
|
data directly from the mapped memory, so no malloc's or memcpy's occur
|
||
|
during data fetches. As such, the library is extremely simple because it
|
||
|
requires no page caching layer of its own, and it is extremely high
|
||
|
performance and memory-efficient. It is also fully transactional with
|
||
|
full ACID semantics, and when the memory map is read-only, the database
|
||
|
integrity cannot be corrupted by stray pointer writes from application
|
||
|
code.
|
||
|
|
||
|
The library is fully thread-aware and supports concurrent read/write
|
||
|
access from multiple processes and threads. Data pages use a
|
||
|
copy-on-write strategy so no active data pages are ever overwritten,
|
||
|
which also provides resistance to corruption and eliminates the need of
|
||
|
any special recovery procedures after a system crash. Writes are fully
|
||
|
serialized; only one write transaction may be active at a time, which
|
||
|
guarantees that writers can never deadlock. The database structure is
|
||
|
multi-versioned so readers run with no locks; writers cannot block
|
||
|
readers, and readers don't block writers.
|
||
|
|
||
|
Unlike other well-known database mechanisms which use either write-ahead
|
||
|
transaction logs or append-only data writes, MDBX requires no
|
||
|
maintenance during operation. Both write-ahead loggers and append-only
|
||
|
databases require periodic checkpointing and/or compaction of their log
|
||
|
or database files otherwise they grow without bound. MDBX tracks
|
||
|
retired/freed pages within the database and re-uses them for new write
|
||
|
operations, so the database size does not grow without bound in normal
|
||
|
use.
|
||
|
|
||
|
The memory map can be used as a read-only or read-write map. It is
|
||
|
read-only by default as this provides total immunity to corruption.
|
||
|
Using read-write mode offers much higher write performance, but adds the
|
||
|
possibility for stray application writes thru pointers to silently
|
||
|
corrupt the database.
|
||
|
|
||
|
|
||
|
Features
|
||
|
========
|
||
|
|
||
|
- Key-value data model, keys are always sorted.
|
||
|
|
||
|
- Fully ACID-compliant, through to MVCC and CoW.
|
||
|
|
||
|
- Multiple key-value sub-databases within a single datafile.
|
||
|
|
||
|
- Range lookups, including range query estimation.
|
||
|
|
||
|
- Efficient support for short fixed length keys, including native
|
||
|
32/64-bit integers.
|
||
|
|
||
|
- Ultra-efficient support for multimaps. Multi-values sorted, searchable
|
||
|
and iterable. Keys stored without duplication.
|
||
|
|
||
|
- Data is memory-mapped and accessible directly/zero-copy. Traversal of
|
||
|
database records is extremely-fast.
|
||
|
|
||
|
- Transactions for readers and writers, ones do not block others.
|
||
|
|
||
|
- Writes are strongly serialized. No transaction conflicts nor
|
||
|
deadlocks.
|
||
|
|
||
|
- Readers are non-blocking, notwithstanding snapshot isolation.
|
||
|
|
||
|
- Nested write transactions.
|
||
|
|
||
|
- Reads scale linearly across CPUs.
|
||
|
|
||
|
- Continuous zero-overhead database compactification.
|
||
|
|
||
|
- Automatic on-the-fly database size adjustment.
|
||
|
|
||
|
- Customizable database page size.
|
||
|
|
||
|
- Olog(N) cost of lookup, insert, update, and delete operations by
|
||
|
virtue of B+ tree characteristics.
|
||
|
|
||
|
- Online hot backup.
|
||
|
|
||
|
- Append operation for efficient bulk insertion of pre-sorted data.
|
||
|
|
||
|
- No WAL nor any transaction journal. No crash recovery needed. No
|
||
|
maintenance is required.
|
||
|
|
||
|
- No internal cache and/or memory management, all done by basic OS
|
||
|
services.
|
||
|
|
||
|
|
||
|
Limitations
|
||
|
===========
|
||
|
|
||
|
- Page size: a power of 2, maximum 65536 bytes, default 4096 bytes.
|
||
|
|
||
|
- Key size: minimum 0, maximum ≈¼ pagesize (1300 bytes for default 4K
|
||
|
pagesize, 21780 bytes for 64K pagesize).
|
||
|
|
||
|
- Value size: minimum 0, maximum 2146435072 (0x7FF00000) bytes for maps,
|
||
|
≈¼ pagesize for multimaps (1348 bytes default 4K pagesize, 21828 bytes
|
||
|
for 64K pagesize).
|
||
|
|
||
|
- Write transaction size: up to 4194301 (0x3FFFFD) pages (16 GiB for
|
||
|
default 4K pagesize, 256 GiB for 64K pagesize).
|
||
|
|
||
|
- Database size: up to 2147483648 pages (8 TiB for default 4K pagesize,
|
||
|
128 TiB for 64K pagesize).
|
||
|
|
||
|
- Maximum sub-databases: 32765.
|
||
|
|
||
|
|
||
|
Gotchas
|
||
|
=======
|
||
|
|
||
|
- There cannot be more than one writer at a time, i.e. no more than one
|
||
|
write transaction at a time.
|
||
|
|
||
|
- libmdbx is based on B+ tree, so access to database pages is mostly
|
||
|
random. Thus SSDs provide a significant performance boost over
|
||
|
spinning disks for large databases.
|
||
|
|
||
|
- libmdbx uses shadow paging instead of WAL. Thus syncing data to disk
|
||
|
might be a bottleneck for write intensive workload.
|
||
|
|
||
|
- libmdbx uses copy-on-write for snapshot isolation during updates, but
|
||
|
read transactions prevents recycling an old retired/freed pages, since
|
||
|
it read ones. Thus altering of data during a parallel long-lived read
|
||
|
operation will increase the process work set, may exhaust entire free
|
||
|
database space, the database can grow quickly, and result in
|
||
|
performance degradation. Try to avoid long running read transactions.
|
||
|
|
||
|
- libmdbx is extraordinarily fast and provides minimal overhead for data
|
||
|
access, so you should reconsider using brute force techniques and
|
||
|
double check your code. On the one hand, in the case of libmdbx, a
|
||
|
simple linear search may be more profitable than complex indexes. On
|
||
|
the other hand, if you make something suboptimally, you can notice
|
||
|
detrimentally only on sufficiently large data.
|
||
|
|
||
|
--
|
||
|
Leonid Yuriev (1):
|
||
|
package/libmdbx: new package (library/database).
|
||
|
|
||
|
DEVELOPERS | 3 +
|
||
|
package/Config.in | 1 +
|
||
|
.../0001-mdbx-fix-minor-copy-paste-typo.patch | 27 ++++++++
|
||
|
...e-fix-missing-installation-of-mdbx.h.patch | 54 +++++++++++++++
|
||
|
.../0003-mdbx-fix-obsolete-__noreturn.patch | 27 ++++++++
|
||
|
...ield-instruction-on-ARM-if-unsupport.patch | 28 ++++++++
|
||
|
...fix-minor-false-positive-GCC-warning.patch | 27 ++++++++
|
||
|
...ad-workaround-for-buggy-toolchain-cm.patch | 65 +++++++++++++++++++
|
||
|
...mdbx-fix-pthread_yield-for-non-GLIBC.patch | 42 ++++++++++++
|
||
|
package/libmdbx/Config.in | 42 ++++++++++++
|
||
|
package/libmdbx/libmdbx.hash | 2 +
|
||
|
package/libmdbx/libmdbx.mk | 24 +++++++
|
||
|
12 files changed, 342 insertions(+)
|
||
|
create mode 100644 package/libmdbx/0001-mdbx-fix-minor-copy-paste-typo.patch
|
||
|
create mode 100644 package/libmdbx/0002-mdbx-cmake-fix-missing-installation-of-mdbx.h.patch
|
||
|
create mode 100644 package/libmdbx/0003-mdbx-fix-obsolete-__noreturn.patch
|
||
|
create mode 100644 package/libmdbx/0004-mdbx-don-t-use-yield-instruction-on-ARM-if-unsupport.patch
|
||
|
create mode 100644 package/libmdbx/0005-mdbx-load-fix-minor-false-positive-GCC-warning.patch
|
||
|
create mode 100644 package/libmdbx/0006-mdbx-cmake-pthread-workaround-for-buggy-toolchain-cm.patch
|
||
|
create mode 100644 package/libmdbx/0007-mdbx-fix-pthread_yield-for-non-GLIBC.patch
|
||
|
create mode 100644 package/libmdbx/Config.in
|
||
|
create mode 100644 package/libmdbx/libmdbx.hash
|
||
|
create mode 100644 package/libmdbx/libmdbx.mk
|
||
|
|
||
|
--
|
||
|
2.28.0
|
||
|
|