mirror of
https://github.com/isar/libmdbx.git
synced 2025-01-08 04:54:13 +08:00
9e5ea95f0f
Change-Id: I27c1ad3843d712823258da1067f9fd79fa255853
220 lines
7.8 KiB
Diff
220 lines
7.8 KiB
Diff
From 0bf9d06e8b090e2d9783d03074f3752ed708f6cf Mon Sep 17 00:00:00 2001
|
|
From: Leonid Yuriev <leo@yuriev.ru>
|
|
Date: Fri, 27 Nov 2020 16:31:12 +0300
|
|
Cc: Heiko Thiery <heiko.thiery@gmail.com>
|
|
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
|
|
Subject: [PATCH v5 0/1] cover letter for package/libmdbx: new package (library/database)
|
|
|
|
This patch adds libmdbx v0.9.2 and below is a brief overview of libmdbx.
|
|
Please merge.
|
|
|
|
Regards,
|
|
Leonid.
|
|
|
|
--
|
|
|
|
libmdbx is an extremely fast, compact, powerful, embedded, transactional
|
|
key-value database, with permissive license. libmdbx has a specific set
|
|
of properties and capabilities, focused on creating unique lightweight
|
|
solutions.
|
|
|
|
Historically, libmdbx (MDBX) is a deeply revised and extended descendant
|
|
of the legendary LMDB (Lightning Memory-Mapped Database). libmdbx
|
|
inherits all benefits from LMDB, but resolves some issues and adds a set
|
|
of improvements.
|
|
|
|
According to developers, for now libmdbx surpasses the LMDB in terms of
|
|
reliability, features and performance.
|
|
|
|
|
|
The most important differences MDBX from LMDB:
|
|
==============================================
|
|
|
|
1. More attention is paid to the quality of the code, to an
|
|
"unbreakability" of the API, to testing and automatic checks (i.e.
|
|
sanitizers, etc). So there:
|
|
- more control during operation;
|
|
- more checking parameters, internal audit of database structures;
|
|
- no warnings from compiler;
|
|
- no issues from ASAN, UBSAN, Valgrind, Coverity;
|
|
- etc.
|
|
|
|
2. Keys could be more than 2 times longer than LMDB.
|
|
|
|
3. Up to 20% faster than LMDB in CRUD benchmarks.
|
|
|
|
4. Automatic on-the-fly database size adjustment,
|
|
both increment and reduction.
|
|
|
|
5. Automatic continuous zero-overhead database compactification.
|
|
|
|
6. The same database format for 32- and 64-bit builds.
|
|
|
|
7. LIFO policy for Garbage Collection recycling (this can significantly
|
|
increase write performance due write-back disk cache up to several times
|
|
in a best case scenario).
|
|
|
|
8. Range query estimation.
|
|
|
|
9. Utility for checking the integrity of the database structure with
|
|
some recovery capabilities.
|
|
|
|
For more info please refer:
|
|
- https://github.com/erthink/libmdbx for source code and README.
|
|
- https://erthink.github.io/libmdbx for API description.
|
|
|
|
--
|
|
|
|
MDBX is a Btree-based database management library modeled loosely on the
|
|
BerkeleyDB API, but much simplified. The entire database (aka
|
|
"environment") is exposed in a memory map, and all data fetches return
|
|
data directly from the mapped memory, so no malloc's or memcpy's occur
|
|
during data fetches. As such, the library is extremely simple because it
|
|
requires no page caching layer of its own, and it is extremely high
|
|
performance and memory-efficient. It is also fully transactional with
|
|
full ACID semantics, and when the memory map is read-only, the database
|
|
integrity cannot be corrupted by stray pointer writes from application
|
|
code.
|
|
|
|
The library is fully thread-aware and supports concurrent read/write
|
|
access from multiple processes and threads. Data pages use a
|
|
copy-on-write strategy so no active data pages are ever overwritten,
|
|
which also provides resistance to corruption and eliminates the need of
|
|
any special recovery procedures after a system crash. Writes are fully
|
|
serialized; only one write transaction may be active at a time, which
|
|
guarantees that writers can never deadlock. The database structure is
|
|
multi-versioned so readers run with no locks; writers cannot block
|
|
readers, and readers don't block writers.
|
|
|
|
Unlike other well-known database mechanisms which use either write-ahead
|
|
transaction logs or append-only data writes, MDBX requires no
|
|
maintenance during operation. Both write-ahead loggers and append-only
|
|
databases require periodic checkpointing and/or compaction of their log
|
|
or database files otherwise they grow without bound. MDBX tracks
|
|
retired/freed pages within the database and re-uses them for new write
|
|
operations, so the database size does not grow without bound in normal
|
|
use.
|
|
|
|
The memory map can be used as a read-only or read-write map. It is
|
|
read-only by default as this provides total immunity to corruption.
|
|
Using read-write mode offers much higher write performance, but adds the
|
|
possibility for stray application writes thru pointers to silently
|
|
corrupt the database.
|
|
|
|
|
|
Features
|
|
========
|
|
|
|
- Key-value data model, keys are always sorted.
|
|
|
|
- Fully ACID-compliant, through to MVCC and CoW.
|
|
|
|
- Multiple key-value sub-databases within a single datafile.
|
|
|
|
- Range lookups, including range query estimation.
|
|
|
|
- Efficient support for short fixed length keys, including native
|
|
32/64-bit integers.
|
|
|
|
- Ultra-efficient support for multimaps. Multi-values sorted, searchable
|
|
and iterable. Keys stored without duplication.
|
|
|
|
- Data is memory-mapped and accessible directly/zero-copy. Traversal of
|
|
database records is extremely-fast.
|
|
|
|
- Transactions for readers and writers, ones do not block others.
|
|
|
|
- Writes are strongly serialized. No transaction conflicts nor
|
|
deadlocks.
|
|
|
|
- Readers are non-blocking, notwithstanding snapshot isolation.
|
|
|
|
- Nested write transactions.
|
|
|
|
- Reads scale linearly across CPUs.
|
|
|
|
- Continuous zero-overhead database compactification.
|
|
|
|
- Automatic on-the-fly database size adjustment.
|
|
|
|
- Customizable database page size.
|
|
|
|
- Olog(N) cost of lookup, insert, update, and delete operations by
|
|
virtue of B+ tree characteristics.
|
|
|
|
- Online hot backup.
|
|
|
|
- Append operation for efficient bulk insertion of pre-sorted data.
|
|
|
|
- No WAL nor any transaction journal. No crash recovery needed. No
|
|
maintenance is required.
|
|
|
|
- No internal cache and/or memory management, all done by basic OS
|
|
services.
|
|
|
|
|
|
Limitations
|
|
===========
|
|
|
|
- Page size: a power of 2, maximum 65536 bytes, default 4096 bytes.
|
|
|
|
- Key size: minimum 0, maximum ≈¼ pagesize (1300 bytes for default 4K
|
|
pagesize, 21780 bytes for 64K pagesize).
|
|
|
|
- Value size: minimum 0, maximum 2146435072 (0x7FF00000) bytes for maps,
|
|
≈¼ pagesize for multimaps (1348 bytes default 4K pagesize, 21828 bytes
|
|
for 64K pagesize).
|
|
|
|
- Write transaction size: up to 4194301 (0x3FFFFD) pages (16 GiB for
|
|
default 4K pagesize, 256 GiB for 64K pagesize).
|
|
|
|
- Database size: up to 2147483648 pages (8 TiB for default 4K pagesize,
|
|
128 TiB for 64K pagesize).
|
|
|
|
- Maximum sub-databases: 32765.
|
|
|
|
|
|
Gotchas
|
|
=======
|
|
|
|
- There cannot be more than one writer at a time, i.e. no more than one
|
|
write transaction at a time.
|
|
|
|
- libmdbx is based on B+ tree, so access to database pages is mostly
|
|
random. Thus SSDs provide a significant performance boost over
|
|
spinning disks for large databases.
|
|
|
|
- libmdbx uses shadow paging instead of WAL. Thus syncing data to disk
|
|
might be a bottleneck for write intensive workload.
|
|
|
|
- libmdbx uses copy-on-write for snapshot isolation during updates, but
|
|
read transactions prevents recycling an old retired/freed pages, since
|
|
it read ones. Thus altering of data during a parallel long-lived read
|
|
operation will increase the process work set, may exhaust entire free
|
|
database space, the database can grow quickly, and result in
|
|
performance degradation. Try to avoid long running read transactions.
|
|
|
|
- libmdbx is extraordinarily fast and provides minimal overhead for data
|
|
access, so you should reconsider using brute force techniques and
|
|
double check your code. On the one hand, in the case of libmdbx, a
|
|
simple linear search may be more profitable than complex indexes. On
|
|
the other hand, if you make something suboptimally, you can notice
|
|
detrimentally only on sufficiently large data.
|
|
|
|
--
|
|
Leonid Yuriev (1):
|
|
package/libmdbx: new package (library/database).
|
|
|
|
DEVELOPERS | 3 +++
|
|
package/Config.in | 1 +
|
|
package/libmdbx/Config.in | 45 ++++++++++++++++++++++++++++++++++++
|
|
package/libmdbx/libmdbx.hash | 5 ++++
|
|
package/libmdbx/libmdbx.mk | 33 ++++++++++++++++++++++++++
|
|
5 files changed, 87 insertions(+)
|
|
create mode 100644 package/libmdbx/Config.in
|
|
create mode 100644 package/libmdbx/libmdbx.hash
|
|
create mode 100644 package/libmdbx/libmdbx.mk
|
|
|
|
--
|
|
2.29.2
|