mdbx: made README content less ugly.

Change-Id: I537ab63a2d8a1cd3b84d5865f689ee53a29d4ad4
2026-02-02 09:22:23 +08:00 · 2019-07-16 03:16:25 +03:00
parent 4adb1ab2d8
commit 7c7d5f4434
1 changed files with 70 additions and 97 deletions
--- a/README.md
+++ b/README.md
@@ -54,8 +54,8 @@ and free Continuous Integration service will be available.
 - [Main features](#main-features)
 - [Improvements over LMDB](#improvements-over-lmdb)
 - [Gotchas](#gotchas)
-  - [Long-time read transactions problem](#long-time-read-transactions-problem)
+  - [Problem of long-time reading](#problem-of-long-time-reading)
-  - [Data safety in async-write-mode](#data-safety-in-async-write-mode)
+  - [Durability in asynchronous writing mode](#durability-in-asynchronous-writing-mode)
 - [Performance comparison](#performance-comparison)
  - [Integral performance](#integral-performance)
  - [Read scalability](#read-scalability)
@@ -72,42 +72,31 @@ for performance under Linux and Windows.
 _libmdbx_ allows multiple processes to read and update several key-value
 tables concurrently, while being
 [ACID](https://en.wikipedia.org/wiki/ACID)-compliant, with minimal
-overhead and operation cost of Olog(N).
+overhead and O(log N) operation cost.
-_libmdbx_ provides
+_libmdbx_ enforces [serializability](https://en.wikipedia.org/wiki/Serializability) for writers with a single [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) parallel reading without atomic/interlocked operations, while writing and reading transactions do not block each other.
 [serializability](https://en.wikipedia.org/wiki/Serializability) and
 consistency of data after crash. Read-write transactions don't block
 read-only transactions and are
 [serialized](https://en.wikipedia.org/wiki/Serializability) by
 [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion).
-_libmdbx_
+_libmdbx_ can guarantee consistency after a crash, depending on the operation mode.
 [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom)
 provides parallel read transactions without atomic operations or
 synchronization primitives.
 _libmdbx_ uses [B+Trees](https://en.wikipedia.org/wiki/B%2B_tree) and
-[mmap](https://en.wikipedia.org/wiki/Memory-mapped_file), doesn't use
+[Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), doesn't use
-[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging). This might
+[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging), which
-have caveats for some workloads.
+might be a caveat for some workloads.
 ### Comparison with other DBs
-Because _libmdbx_ is currently overhauled, I think it's better to just
+For now, please refer to the [chapter "BoltDB comparison with other
-link [chapter of Comparison with other
+databases"](https://github.com/coreos/bbolt#comparison-with-other-databases)
-databases](https://github.com/coreos/bbolt#comparison-with-other-databases)
+which is also (mostly) applicable to MDBX.
 here.
 ### History
 The _libmdbx_ design is based on [Lightning Memory-Mapped
 Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database).
-Initial development was going in
+Initial development was carried out in the [ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project.
-[ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project, about a
+About a year later _libmdbx_ was isolated into a separate project, which was [presented at Highload++
 year later it received separate development effort and in autumn 2015
 was isolated to separate project, which was [presented at Highload++
 2015 conference](http://www.highload.ru/2015/abstracts/1831.html).
-Since early 2017 _libmdbx_ is used in [Fast PositiveTables](https://github.com/leo-yuriev/libfpta),
+Since early 2017 _libmdbx_ has been used in [Fast Positive Tables](https://github.com/leo-yuriev/libfpta),
-by [Positive Technologies](https://www.ptsecurity.com).
+and its development is funded by [Positive Technologies](https://www.ptsecurity.com).
 #### Acknowledgments
 Howard Chu (Symas Corporation) - the author of LMDB, from which
@@ -143,10 +132,10 @@ don't use [atomic
 operations](https://en.wikipedia.org/wiki/Linearizability#High-level_atomic_operations).
 Readers don't block each other and aren't blocked by writers. Read
 performance scales linearly with CPU core count.
-  > Though "connect to DB" (start of first read transaction in thread) and
+  > Nonetheless, "connect to DB" (start of the first read transaction in a thread) and
  > "disconnect from DB" (shutdown or thread termination) require
  > acquiring a lock to register/unregister the current thread in the "readers
-  > table"
+  > table".
 5. Keys with multiple values are stored efficiently without key
 duplication, sorted by value, including integers (reasonable for
@@ -201,7 +190,7 @@ optimal query execution plan.
 6. Support for keys and values of zero length, including sorted
 duplicates.
-7. Ability to assign up to 3 markers to commiting transaction with
+7. Ability to assign up to 3 persistent 64-bit markers to a committing transaction with
 `mdbx_canary_put()` and then get them in read transaction by
 `mdbx_canary_get()`.
@@ -346,7 +335,7 @@ performance bottleneck in `MAPASYNC` mode.
  > storage then it's much more preferable to use `std::map`.
-4. LMDB has a problem of long-time readers which degrades performance
+4. _LMDB_ has a problem of long-time readers which degrades performance
 and bloats DB.
  > _libmdbx_ addresses that, details below.
@@ -357,56 +346,41 @@ of data.
  > Details below.
-#### Long-time read transactions problem
+#### Problem of long-time reading
 Garbage collection problem exists in all databases one way or another
 (e.g. VACUUM in PostgreSQL). But in _libmdbx_ and LMDB it's even more
-important because of high performance and deliberate simplification of
+discernible because of the high transaction rate and the intentional
-internals with emphasis on performance.
+simplification of internals in favor of performance.
-* Altering data during long read operation may exhaust available space
+Understanding the problem requires some explanation, which can be
-on persistent storage.
+difficult to grasp quickly. So it is reasonable
 to simplify this as follows:
-* If available space is exhausted then any attempt to update data
+* Massive data modification during a parallel long-running read operation may
-results in `MAP_FULL` error until long read operation ends.
+exhaust the free DB space.
-* Main examples of long readers is hot backup and debugging of client
+* If the available space is exhausted, any attempt to update the data
-application which actively uses read transactions.
+will cause a `MAP_FULL` error until the long read transaction is completed.
 * A good example of long readers is a hot backup or debugging of
 a client application while retaining an active read transaction.
 * In _LMDB_ this results in degraded performance of all operations of
-syncing data to persistent storage.
+writing data to persistent storage.
-* _libmdbx_ has a mechanism which aborts such operations and `LIFO RECLAIM`
+* _libmdbx_ has the `OOM-KICK` mechanism, which allows aborting such
-mode which addresses performance degradation.
+operations and the `LIFO RECLAIM` mode which addresses performance
 degradation.
-Read operations operate only over snapshot of DB which is consistent on
+#### Durability in asynchronous writing mode
-the moment when read transaction started. This snapshot doesn't change
+In `WRITEMAP+MAPASYNC` mode, updated (aka dirty) pages are written
-throughout the transaction but this leads to inability to reclaim the
+to persistent storage by the OS kernel. This means that if the
-pages until read transaction ends.
+application fails, the OS kernel will finish writing all updated
-
+data to disk and nothing will be lost.
-In _LMDB_ this leads to a problem that memory pages, allocated for
+However, in the case of hardware malfunction or OS kernel fatal error,
-operations during long read, will be used for operations and won't be
+only some of the updated data may be written to disk, and the database structure
-reclaimed until DB process terminates. In _LMDB_ they are used in
+is likely to be destroyed.
-[FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics))
+In such a situation, the DB is completely corrupted and can't be repaired.
 manner, which causes increased page count and less chance of cache hit
 during I/O. In other words: one long-time reader can impact performance
 of the whole database until it is reopened.
 _libmdbx_ addresses the problem, details below. Illustrations to this
 problem can be found in the
 [presentation](http://www.slideshare.net/leoyuriev/lmdb). There is also
 an example of a performance increase thanks to
 [BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration)
 when `LIFO RECLAIM` enabled in _libmdbx_.
 #### Data safety in async-write mode
 In `WRITEMAP+MAPASYNC` mode dirty pages are written to persistent storage
 by the kernel. This means that in case of an application crash the OS kernel will
 write all dirty data to disk and nothing will be lost. But in case of
 hardware malfunction or OS kernel fatal error only some dirty data might
 be synced to disk, and there is high probability that pages with
 metadata saved, will point to non-saved, hence non-existent, data pages.
 In such situation, DB is completely corrupted and can't be repaired even
 if there was a full sync before the crash via `mdbx_env_sync()`.
 _libmdbx_ addresses this by fully reimplementing write path of data:
@@ -414,39 +388,38 @@ _libmdbx_ addresses this by fully reimplementing write path of data:
 instead their shadow copies are used and their updates are synced after
 data is flushed to disk.
-* During transaction commit _libmdbx_ marks synchronization points as
+* During a transaction commit _libmdbx_ marks the commit as either steady or weak,
-steady or weak depending on how much synchronization needed between RAM
+depending on the synchronization status between RAM and persistent storage.
-and persistent storage, e.g. in `WRITEMAP+MAPSYNC` commited transactions
+For instance, in the `WRITEMAP+MAPASYNC` mode, committed transactions
-are marked as weak, but during explicit data synchronization - as
+are marked as weak by default, but as steady after explicit data flushes.
 steady.
 * _libmdbx_ maintains three separate meta-pages instead of two. This
-allows to commit transaction with steady or weak synchronization point
+allows committing a transaction as steady or weak without losing the two
-without losing two previous synchronization points (one of them can be
+previous commit points (one of them can be steady, and another
-steady, and second - weak). This allows to order weak and steady
+weak). Thus, after a fatal system failure, it will be possible to
-synchronization points in any order without losing consistency in case
+roll back to the last steady commit point.
 of system crash.
-* During DB open _libmdbx_ rollbacks to the last steady synchronization
+* During DB open _libmdbx_ rolls back to the last steady commit point,
-point, this guarantees database integrity.
+which guarantees database integrity after a crash. However, if the
 database is opened in read-only mode, such a rollback cannot be performed
 and the `MDBX_WANNA_RECOVERY` error is returned.
-For data safety pages which form database snapshot with steady
+For data integrity, the pages which form the database snapshot with a steady
-synchronization point must not be updated until next steady
+commit point must not be updated until the next steady commit point.
-synchronization point. So last steady synchronization point creates
+Therefore the last steady commit point creates an effect analogous to "long-time read".
-"long-time read" effect. The only difference that in case of memory
+The only difference is that now, in case of space exhaustion, the problem
-exhaustion the problem will be immediately addressed by flushing changes
+will be immediately addressed by writing changes to disk and forming
-to persistent storage and forming new steady synchronization point.
+the new steady commit point.
-So in async-write mode _libmdbx_ will always use new pages until memory
+So in async-write mode _libmdbx_ will always use new pages until the
-is exhausted or `mdbx_env_sync()` is invoked. Total disk usage will be
+free DB space is exhausted or `mdbx_env_sync()` is invoked,
-almost the same as in sync-write mode.
+and the total write traffic to the disk will be the same as in sync-write mode.
-Current _libmdbx_ gives a choice of safe async-write mode (default) and
+Currently _libmdbx_ gives a choice between a safe async-write mode (default) and
-`UTTERLY_NOSYNC` mode which may result in full DB corruption during
+the `UTTERLY_NOSYNC` mode, which may lead to DB corruption after a system crash, as with LMDB.
 system crash as with LMDB.
-Next version of _libmdbx_ will create steady synchronization points
+The next version of _libmdbx_ will automatically create steady commit
-automatically in async-write mode.
+points in async-write mode upon completion of data transfer to the disk.
 --------------------------------------------------------------------------------
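The "long-time reading" problem described in the README can be illustrated with a toy model. The C sketch below is hypothetical. `txnid_t`, `oldest_reader()`, `reclaimable_pages()` and `would_hit_map_full()` are illustrative names, not part of the libmdbx API. The reuse rule is also a simplification: it assumes a page retired by transaction T may be reused only when every active reader's snapshot is newer than T.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the long-reader problem.
 * These names and this layout are NOT the libmdbx API. */
typedef uint64_t txnid_t;

/* Snapshot txnid of the oldest active reader; `current` if there are none. */
txnid_t oldest_reader(const txnid_t *readers, size_t nreaders, txnid_t current) {
    txnid_t oldest = current;
    for (size_t i = 0; i < nreaders; ++i)
        if (readers[i] < oldest)
            oldest = readers[i];
    return oldest;
}

/* A page retired by transaction `freed_by[i]` may be reused only if it was
 * freed strictly before the oldest snapshot that is still being read. */
size_t reclaimable_pages(const txnid_t *freed_by, size_t npages,
                         const txnid_t *readers, size_t nreaders,
                         txnid_t current) {
    txnid_t oldest = oldest_reader(readers, nreaders, current);
    size_t count = 0;
    for (size_t i = 0; i < npages; ++i)
        if (freed_by[i] < oldest)
            ++count;
    return count;
}

/* Nonzero when an update needing `need` pages fits neither into the unused
 * DB space nor into reclaimable pages: the situation reported as MAP_FULL. */
int would_hit_map_full(size_t need, size_t unused, size_t reclaimable) {
    return need > unused + reclaimable;
}
```

Consider pages retired by transactions 10 to 13, with one reader still pinned at snapshot 11. Only the page retired by transaction 10 is reusable. Once that reader finishes, all four pages are reusable. A hot backup holding a read transaction open plays the role of that pinned reader.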
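The README contrasts LMDB's FIFO reuse of retired pages with the `LIFO RECLAIM` mode of _libmdbx_. The difference is essentially the order in which retired pages are taken back. The following is a minimal sketch, assuming a plain array of page numbers ordered from oldest to most recently retired. `reclaim_one()` and `reclaim_order` are hypothetical names, not libmdbx internals.

```c
#include <stddef.h>

/* Hypothetical illustration of reclaim order; not libmdbx internals. */
typedef enum { RECLAIM_FIFO, RECLAIM_LIFO } reclaim_order;

/* Takes one page number from `retired` (ordered oldest-first), stores it
 * in *page and shrinks *count. Returns 0 on success, -1 if empty. */
int reclaim_one(unsigned *retired, size_t *count, reclaim_order order,
                unsigned *page) {
    if (*count == 0)
        return -1;
    if (order == RECLAIM_LIFO) {
        /* Most recently retired page: likely still hot in caches. */
        *page = retired[*count - 1];
    } else {
        /* Oldest retired page: FIFO, then shift the rest down. */
        *page = retired[0];
        for (size_t i = 1; i < *count; ++i)
            retired[i - 1] = retired[i];
    }
    --*count;
    return 0;
}
```

LIFO hands back the most recently retired pages first. The README associates this ordering with better cache hit rates and BBWC gains. FIFO hands back the oldest pages first, which keeps cycling through a growing set of pages.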
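The three meta-page scheme with steady and weak commit points can be modelled in a few lines of C. This is a hypothetical sketch, not the libmdbx on-disk format or code. `meta_page`, `meta_commit()`, `meta_recover()` and `recovery_needed()` are illustrative names. The victim-selection rule is also an assumption, chosen to match the behaviour the README describes: overwrite the oldest meta-page that is neither the newest commit nor the newest steady commit.

```c
#include <stdint.h>

/* Hypothetical model of the three meta-pages; not the libmdbx format. */
typedef struct {
    uint64_t txnid; /* commit described by this meta-page; 0 = never written */
    int steady;     /* 1 if data was flushed to disk before this commit */
} meta_page;

/* Index of the newest (optionally steady-only) meta-page, or -1 if none. */
static int newest_meta(const meta_page m[3], int steady_only) {
    int best = -1;
    for (int i = 0; i < 3; ++i) {
        if (m[i].txnid == 0 || (steady_only && !m[i].steady))
            continue;
        if (best < 0 || m[i].txnid > m[best].txnid)
            best = i;
    }
    return best;
}

/* Slot for the next commit: the oldest meta-page that is neither the
 * newest commit nor the newest steady commit, so both are preserved. */
int meta_to_overwrite(const meta_page m[3]) {
    int keep_newest = newest_meta(m, 0);
    int keep_steady = newest_meta(m, 1);
    int victim = -1;
    for (int i = 0; i < 3; ++i) {
        if (i == keep_newest || i == keep_steady)
            continue;
        if (victim < 0 || m[i].txnid < m[victim].txnid)
            victim = i;
    }
    return victim;
}

/* Records a commit as steady or weak and returns the slot it used. */
int meta_commit(meta_page m[3], uint64_t txnid, int steady) {
    int slot = meta_to_overwrite(m);
    m[slot].txnid = txnid;
    m[slot].steady = steady;
    return slot;
}

/* On open after a crash: the newest steady meta-page to roll back to. */
int meta_recover(const meta_page m[3]) { return newest_meta(m, 1); }

/* Nonzero if the newest commit is weak, i.e. a rollback would be needed. */
int recovery_needed(const meta_page m[3]) {
    return newest_meta(m, 0) != newest_meta(m, 1);
}
```

Start from one steady commit (txnid 1) and commit 2, 3 and 4 as weak. The steady meta-page is never overwritten, so `meta_recover()` still finds txnid 1 after a crash. A read-write open would roll back to it. A read-only open cannot, which is the situation the README reports as `MDBX_WANNA_RECOVERY`.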