mdbx: update README (MacOS support).

Change-Id: Id85b79fb605702fff606b62a5114951bfb9cb22e
Leonid Yuriev 2019-08-20 15:04:56 +03:00
parent e04bfc05fa
commit 887cbc7f00
2 changed files with 124 additions and 99 deletions


@ -1,4 +1,4 @@
## The [repository was moved out from Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. ## The [repository now only mirrored on the Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans.
<!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences --> <!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences -->
--- ---
@ -6,31 +6,24 @@ libmdbx
====================================== ======================================
**The revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).** **The revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).**
*The Future will Positive. Всё будет хорошо.* *The Future will (be) Positive. Всё будет хорошо.*
[![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx) [![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx)
[![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) [![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master)
[![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx) [![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx)
English version [by Google](https://translate.googleusercontent.com/translate_c?act=url&ie=UTF8&sl=ru&tl=en&u=https://github.com/leo-yuriev/libmdbx/tree/master) English version of this README is [here](README.md), also the translations [by Google](https://translate.googleusercontent.com/translate_c?act=url&ie=UTF8&sl=ru&tl=en&u=https://github.com/leo-yuriev/libmdbx/tree/master)
and [by Yandex](https://translate.yandex.ru/translate?url=https%3A%2F%2Fgithub.com%2FReOpen%2Flibmdbx%2Ftree%2Fmaster&lang=ru-en). and [by Yandex](https://translate.yandex.ru/translate?url=https%3A%2F%2Fgithub.com%2FReOpen%2Flibmdbx%2Ftree%2Fmaster&lang=ru-en).
### Project Status ### Статус проекта
-**MDBX is currently _being actively reworked_**: a major change to both
-the API and the database format lies ahead. Unfortunately, the update
-will break compatibility with previous versions.
-The goal of this revolution is to provide a clearer, more reliable API
-and to add new features, as well as to give the database new
-properties.
-At present MDBX works on Linux and on operating systems compliant with
-POSIX.1-2008, and also supports Windows (starting with Windows XP) as a
-complementary platform. Support for other operating systems can be
-provided on a commercial basis. However, such enhancements (i.e.
-pull requests) can be accepted into the mainstream only if a
-corresponding public and free Continuous Integration service is
-available.
+_libmdbx_ works on Linux, FreeBSD, MacOS X and other operating systems
+compliant with POSIX.1-2008, and also supports Windows (starting with
+Windows XP) as a complementary platform.
+The next version is being developed separately and non-publicly; it
+will bring a major change to both the API and the database format. The
+goal of this revolution is to provide a clearer and more reliable API,
+to add new features, and to give the database new properties.
## Contents ## Contents
- [Overview](#Обзор) - [Overview](#Обзор)
@ -53,8 +46,7 @@ POSIX.1-2008, а также поддерживает Windows (начиная с
## Overview ## Overview
-_libmdbx_ is an embeddable key-value storage engine with a specific
-set of properties and capabilities, aimed at building unique
-lightweight solutions with ultimate performance under Linux and
-Windows.
+_libmdbx_ is an embeddable key-value storage engine with a specific
+set of properties and capabilities, aimed at building unique
+lightweight solutions with ultimate performance.
_libmdbx_ allows multiple processes to concurrently read and update _libmdbx_ allows multiple processes to concurrently read and update
several key-value tables while observing several key-value tables while observing
@ -84,10 +76,11 @@ _libmdbx_ не использует
### Comparison with other DBMSs ### Comparison with other DBMSs
-Since _libmdbx_ is currently undergoing a revolution, I decided it was
-best to limit this section to a link to the [Comparison with other
-databases chapter](https://github.com/coreos/bbolt#comparison-with-other-databases)
-in the _BoltDB_ description.
+For now, please refer to the [chapter "BoltDB comparison with other
+databases"](https://github.com/coreos/bbolt#comparison-with-other-databases),
+which is also (mostly) applicable to MDBX.
### History ### History
@ -108,13 +101,13 @@ Tables](https://github.com/leo-yuriev/libfpta), aka ["Позитивные
Technologies](https://www.ptsecurity.ru). Technologies](https://www.ptsecurity.ru).
#### Acknowledgments #### Выражение признательности
-Howard Chu (Symas Corporation) - the author of LMDB, from which
-originated the MDBX in 2015.
-Martin Hedenfalk <martin@bzero.se> - the author of `btree.c` code, which
-was used for begin development of LMDB.
+Howard Chu <hyc@openldap.org> - the author of the LMDB engine, from
+which MDBX originated in 2015.
+Martin Hedenfalk <martin@bzero.se> - the author of the `btree.c` code,
+which was used to begin the development of LMDB.
Key features Key features
================= =================
@ -332,6 +325,25 @@ Amplification Factor) и RAF (Read Amplification Factor) также Olog(N).
> - double-free attempts; > - double-free attempts;
> - memory corruption and segmentation faults. > - memory corruption and segmentation faults.
32. On **MacOS X** the `fcntl(F_FULLFSYNC)` system call is used _by
default_ to synchronize data with the disk, since [only this guarantees
data
durability](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html)
in case of power failure. Unfortunately, in scenarios with a high rate
of write transactions, using `F_FULLFSYNC` leads to a substantial
performance degradation compared to LMDB, which uses the `fsync()`
system call. Therefore _libmdbx_ allows overriding this behavior by
defining the `MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` option when
building the library.
33. On **Windows** _libmdbx_ uses `LockFileEx()` file locks, since this
allows placing the database on network drives and also provides
protection against incompetent user actions
([foolproofing](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D1%89%D0%B8%D1%82%D0%B0_%D0%BE%D1%82_%D0%B4%D1%83%D1%80%D0%B0%D0%BA%D0%B0)).
Therefore _libmdbx_ may lag slightly behind LMDB in performance tests,
where named mutexes are used.
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
## Drawbacks and Trade-offs ## Drawbacks and Trade-offs

README.md

@ -1,4 +1,4 @@
## The [repository was moved out from Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. ## The [repository now only mirrored on the Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans.
<!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences --> <!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences -->
--- ---
@ -6,52 +6,33 @@ libmdbx
====================================== ======================================
**Revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).** **Revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).**
*The Future will be positive.* *The Future will (be) positive.*
[![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx) [![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx)
[![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) [![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master)
[![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx) [![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx)
-## Project Status for now
-- The stable versions
-([_stable/0.0_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.0)
-and
-[_stable/0.1_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.1)
-branches) of _MDBX_ are frozen, i.e. no new features or API changes, but
-only bug fixes.
-- The next version
-([_devel_](https://github.com/leo-yuriev/libmdbx/tree/devel) branch)
-**is under active non-public development**, i.e. current API and set of
-features are extreme volatile.
-- The immediate goal of development is formation of the stable API and
-the stable internal database format, which allows realise all PLANNED
-FEATURES:
-1. Integrity check by [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree);
-2. Support for [raw block devices](https://en.wikipedia.org/wiki/Raw_device);
-3. Separate place (HDD) for large data items;
-4. Using "[Roaring bitmaps](http://roaringbitmap.org/about/)" inside garbage collector;
-5. Non-sequential reclaiming, like PostgreSQL's [Vacuum](https://www.postgresql.org/docs/9.1/static/sql-vacuum.html);
-6. [Asynchronous lazy data flushing](https://sites.fas.harvard.edu/~cs265/papers/kathuria-2008.pdf) to disk(s);
-7. etc...
+The Russian version of this README is [here](README-RU.md).
+## Project Status
+_libmdbx_ works on Linux, FreeBSD, MacOS X and other systems compliant
+with POSIX.1-2008, and also supports Windows as a complementary platform.
+The next version
+([_devel_](https://github.com/leo-yuriev/libmdbx/tree/devel) branch) is
+under active non-public development, i.e. the API and feature set are
+volatile. The goal of this revolution is to provide a clearer and more
+reliable API, to add new features, and to give the database new properties.
Don't miss libmdbx for other runtimes. Don't miss libmdbx for other runtimes:
| Runtime | GitHub | Author | | Runtime | GitHub | Author |
| ------------- | ------------- | ------------- | | ------------- | ------------- | ------------- |
| JVM | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) | | Java | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) |
| .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) | | .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) |
----- -----
Nowadays MDBX works on Linux and OS'es compliant with POSIX.1-2008, but
also support Windows (since Windows XP) as a complementary platform.
Support for other OS could be implemented on commercial basis. However
such enhancements (i.e. pull requests) could be accepted in mainstream
only when corresponding public and free Continuous Integration service
will be available.
## Contents ## Contents
- [Overview](#overview) - [Overview](#overview)
- [Comparison with other DBs](#comparison-with-other-dbs) - [Comparison with other DBs](#comparison-with-other-dbs)
@ -72,21 +53,28 @@ will be available.
## Overview ## Overview
_libmdbx_ is an embedded lightweight key-value database engine oriented _libmdbx_ is an embedded lightweight key-value database engine oriented
for performance under Linux and Windows. for performance.
_libmdbx_ allows multiple processes to read and update several key-value _libmdbx_ allows multiple processes to read and update several key-value
tables concurrently, while being tables concurrently, while being
[ACID](https://en.wikipedia.org/wiki/ACID)-compliant, with minimal [ACID](https://en.wikipedia.org/wiki/ACID)-compliant, with minimal
overhead and Olog(N) operation cost. overhead and Olog(N) operation cost.
_libmdbx_ enforce [serializability](https://en.wikipedia.org/wiki/Serializability) for writers by single [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) for parallel readers without atomic/interlocked operations, while writing and reading transactions do not block each other. _libmdbx_ enforce
[serializability](https://en.wikipedia.org/wiki/Serializability) for
writers by single
[mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords
[wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom)
for parallel readers without atomic/interlocked operations, while
writing and reading transactions do not block each other.
_libmdbx_ can guarantee consistency after crash depending of operation mode. _libmdbx_ can guarantee consistency after crash depending of operation
mode.
_libmdbx_ uses [B+Trees](https://en.wikipedia.org/wiki/B%2B_tree) and _libmdbx_ uses [B+Trees](https://en.wikipedia.org/wiki/B%2B_tree) and
[Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), doesn't use [Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file),
[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) which doesn't use [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging)
might be a caveat for some workloads. which might be a caveat for some workloads.
### Comparison with other DBs ### Comparison with other DBs
For now please refer to [chapter of "BoltDB comparison with other For now please refer to [chapter of "BoltDB comparison with other
@ -96,15 +84,17 @@ which is also (mostly) applicable to MDBX.
### History ### History
The _libmdbx_ design is based on [Lightning Memory-Mapped The _libmdbx_ design is based on [Lightning Memory-Mapped
Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database).
Initial development was going in [ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project. Initial development was going in
About a year later libmdbx was isolated to separate project, which was [presented at Highload++ [ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project. About a
2015 conference](http://www.highload.ru/2015/abstracts/1831.html). year later libmdbx was isolated to separate project, which was
[presented at Highload++ 2015
conference](http://www.highload.ru/2015/abstracts/1831.html).
Since early 2017 _libmdbx_ is used in [Fast Positive Tables](https://github.com/leo-yuriev/libfpta), Since early 2017 _libmdbx_ is used in [Fast Positive Tables](https://github.com/leo-yuriev/libfpta),
and development is funded by [Positive Technologies](https://www.ptsecurity.com). and development is funded by [Positive Technologies](https://www.ptsecurity.com).
#### Acknowledgments #### Acknowledgments
Howard Chu (Symas Corporation) - the author of LMDB, from which Howard Chu <hyc@openldap.org> - the author of LMDB, from which
originated the MDBX in 2015. originated the MDBX in 2015.
Martin Hedenfalk <martin@bzero.se> - the author of `btree.c` code, which Martin Hedenfalk <martin@bzero.se> - the author of `btree.c` code, which
@ -184,20 +174,23 @@ additional resources for that.
[BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration) [BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration)
this may greatly improve write performance. this may greatly improve write performance.
4. Fast estimation of range query result size via functions `mdbx_estimate_range()`, 4. Fast estimation of range query result size via functions
`mdbx_estimate_move()` and `mdbx_estimate_distance()`. E.g. for selection the `mdbx_estimate_range()`, `mdbx_estimate_move()` and
optimal query execution plan. `mdbx_estimate_distance()`. E.g. for selection the optimal query
execution plan.
5. `mdbx_chk` tool for DB integrity check. 5. `mdbx_chk` tool for DB integrity check.
6. Support for keys and values of zero length, including multi-values (aka sorted duplicates). 6. Support for keys and values of zero length, including multi-values
(aka sorted duplicates).
7. Ability to assign up to 3 persistent 64-bit markers to commiting transaction with 7. Ability to assign up to 3 persistent 64-bit markers to commiting
`mdbx_canary_put()` and then get them in read transaction by transaction with `mdbx_canary_put()` and then get them in read
`mdbx_canary_get()`. transaction by `mdbx_canary_get()`.
8. Ability to update or delete record and get previous value via `mdbx_replace()`. 8. Ability to update or delete record and get previous value via
Also allows update the specific item from multi-value with the same key. `mdbx_replace()`. Also allows update the specific item from multi-value
with the same key.
9. Sequence generation via `mdbx_dbi_sequence()`. 9. Sequence generation via `mdbx_dbi_sequence()`.
@ -297,6 +290,24 @@ to avoid hard-to-debug errors.
> - double-free; > - double-free;
> - memory corruption and segfaults. > - memory corruption and segfaults.
32. On **Mac OS X** the `fcntl(F_FULLFSYNC)` syscall is used _by
default_ to synchronize data with the disk, as this is [the only way to
guarantee data
durability](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html)
in case of power failure. Unfortunately, in scenarios with high write
intensity, the use of `F_FULLFSYNC` significantly degrades performance
compared to LMDB, where the `fsync()` syscall is used. Therefore,
_libmdbx_ allows you to override this behavior by defining the
`MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` option when building the library.
33. On **Windows** the `LockFileEx()` syscall is used for locking, since
it allows placing the database on network drives and provides protection
against incompetent user actions (aka
[poka-yoke](https://en.wikipedia.org/wiki/Poka-yoke)). Therefore
_libmdbx_ may lag slightly behind LMDB in performance tests, where
named mutexes are used.
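
The macOS trade-off in item 32 can be illustrated with a minimal sketch in C. `flush_to_disk()` is a hypothetical helper, not a libmdbx function, and the way `MDBX_OSX_SPEED_INSTEADOF_DURABILITY` is tested here is only an assumption about how such a build option could be consumed, not the library's actual code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: the build option defaults to 0 (durability first). */
#ifndef MDBX_OSX_SPEED_INSTEADOF_DURABILITY
#define MDBX_OSX_SPEED_INSTEADOF_DURABILITY 0
#endif

/* Hypothetical helper (not part of libmdbx): flush a file's data to
 * stable storage. Returns 0 on success, -1 on error (errno is set). */
static int flush_to_disk(int fd) {
  assert(fd >= 0);
#if defined(__APPLE__) && !MDBX_OSX_SPEED_INSTEADOF_DURABILITY
  /* macOS: also ask the drive to flush its own write cache. */
  return fcntl(fd, F_FULLFSYNC) == -1 ? -1 : 0;
#else
  /* Other POSIX systems, or speed preferred over durability on macOS. */
  return fsync(fd);
#endif
}
```

Compiling this sketch with `-DMDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` would make it fall back to `fsync()` on macOS, trading power-failure durability for write throughput.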
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
## Gotchas ## Gotchas
@ -360,7 +371,8 @@ to simplify this as follows:
exhaust the free DB space. exhaust the free DB space.
* If the available space is exhausted, any attempt to update the data * If the available space is exhausted, any attempt to update the data
will cause a "MAP_FULL" error until a long read transaction is completed. will cause a "MAP_FULL" error until a long read transaction is
completed.
* A good example of long readers is a hot backup or debugging of * A good example of long readers is a hot backup or debugging of
a client application while retaining an active read transaction. a client application while retaining an active read transaction.
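
The long-reader effect described in the bullets above can be sketched with a toy model in C. This is not libmdbx code: `toy_db`, `toy_update()` and the four-page pool are hypothetical names and sizes chosen only for illustration. The model assumes copy-on-write updates, where a page retired by write transaction `t` may be reused only when no open reader holds a snapshot older than `t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_PAGES = 4 };           /* hypothetical, tiny "database" size */
#define TOY_FREE UINT64_C(0)      /* page was never used */
#define TOY_LIVE UINT64_MAX       /* page holds the current version */

typedef struct {
  uint64_t txn;             /* id of the last committed write transaction */
  bool has_reader;          /* is a read transaction open? */
  uint64_t reader_snapshot; /* txn id the oldest reader is pinned to */
  uint64_t page[TOY_PAGES]; /* TOY_FREE, TOY_LIVE, or id of the retiring txn */
} toy_db;

static void toy_init(toy_db *db) {
  db->txn = 0;
  db->has_reader = false;
  db->reader_snapshot = 0;
  db->page[0] = TOY_LIVE;
  for (int i = 1; i < TOY_PAGES; i++)
    db->page[i] = TOY_FREE;
}

static void toy_reader_begin(toy_db *db) {
  db->has_reader = true;
  db->reader_snapshot = db->txn; /* the reader sees the state after this txn */
}

static void toy_reader_end(toy_db *db) { db->has_reader = false; }

/* Copy-on-write update: returns 0 on success, or -1 when no page can be
 * reused (the analogue of a "MAP_FULL" error). */
static int toy_update(toy_db *db) {
  int target = -1, live = -1;
  for (int i = 0; i < TOY_PAGES; i++) {
    uint64_t state = db->page[i];
    if (state == TOY_LIVE)
      live = i;
    else if (target < 0 && (state == TOY_FREE || !db->has_reader ||
                            state <= db->reader_snapshot))
      target = i; /* free page, or retired page no reader can still see */
  }
  if (target < 0)
    return -1;
  assert(live >= 0);
  db->txn += 1;
  db->page[live] = db->txn; /* old version is retired by this transaction */
  db->page[target] = TOY_LIVE;
  return 0;
}
```

In this model a reader opened before any update lets three updates through and blocks the fourth; once the reader ends, retired pages become reusable and updates succeed again.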
@ -373,14 +385,13 @@ operations and the `LIFO RECLAIM` mode which addresses performance
degradation. degradation.
#### Durability in asynchronous writing mode #### Durability in asynchronous writing mode
In `WRITEMAP+MAPSYNC` mode updated (aka dirty) pages are written In `WRITEMAP+MAPSYNC` mode updated (aka dirty) pages are written to
to persistent storage by the OS kernel. This means that if the persistent storage by the OS kernel. This means that if the application
application fails, the OS kernel will finish writing all updated fails, the OS kernel will finish writing all updated data to disk and
data to disk and nothing will be lost. nothing will be lost. However, in the case of hardware malfunction or OS
However, in the case of hardware malfunction or OS kernel fatal error, kernel fatal error, only some updated data can be written to disk and
only some updated data can be written to disk and the database structure the database structure is likely to be destroyed. In such situation, DB
is likely to be destroyed. is completely corrupted and can't be repaired.
In such situation, DB is completely corrupted and can't be repaired.
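
The difference between leaving dirty pages to the kernel and forcing them to disk can be shown with plain POSIX calls. The sketch below is not libmdbx code; `write_via_mmap()` is a hypothetical helper that updates a file through a shared memory mapping and then calls `msync(MS_SYNC)`, which blocks until the modified pages have been written out.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libmdbx): store msg in the file at
 * path through a shared mapping. Returns 0 on success, -1 on error. */
static int write_via_mmap(const char *path, const char *msg) {
  size_t len = strlen(msg);
  assert(len > 0);

  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  if (ftruncate(fd, (off_t)len) != 0) {
    close(fd);
    return -1;
  }

  char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED) {
    close(fd);
    return -1;
  }

  /* The page is now dirty. Without msync() the kernel writes it back at
   * some later time, which survives an application crash but not
   * necessarily a power failure or a kernel crash. */
  memcpy(map, msg, len);

  /* Block until the dirty pages have been written to the file. */
  int rc = msync(map, len, MS_SYNC);

  munmap(map, len);
  close(fd);
  return rc;
}
```

Skipping the `msync()` call corresponds to relying on the kernel alone, which is exactly the situation where only part of the updated data may reach the disk.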
_libmdbx_ addresses this by fully reimplementing write path of data: _libmdbx_ addresses this by fully reimplementing write path of data:
@ -406,17 +417,19 @@ which will cause returning the MDBX_WANNA_RECOVERY error.
For data integrity a pages which form database snapshot with steady For data integrity a pages which form database snapshot with steady
commit point, must not be updated until next steady commit point. commit point, must not be updated until next steady commit point.
Therefore the last steady commit point creates an effect analogues to "long-time read". Therefore the last steady commit point creates an effect analogues to
The only difference that now in case of space exhaustion the problem "long-time read". The only difference that now in case of space
will be immediately addressed by writing changes to disk and forming exhaustion the problem will be immediately addressed by writing changes
the new steady commit point. to disk and forming the new steady commit point.
So in async-write mode _libmdbx_ will always use new pages until the So in async-write mode _libmdbx_ will always use new pages until the
free DB space will be exhausted or `mdbx_env_sync()` will be invoked, free DB space will be exhausted or `mdbx_env_sync()` will be invoked,
and the total write traffic to the disk will be the same as in sync-write mode. and the total write traffic to the disk will be the same as in
sync-write mode.
Currently libmdbx gives a choice between a safe async-write mode (default) and Currently libmdbx gives a choice between a safe async-write mode
`UTTERLY_NOSYNC` mode which may lead to DB corruption after a system crash, i.e. like the LMDB. (default) and `UTTERLY_NOSYNC` mode which may lead to DB corruption
after a system crash, i.e. like the LMDB.
Next version of _libmdbx_ will be automatically create steady commit Next version of _libmdbx_ will be automatically create steady commit
points in async-write mode upon completion transfer data to the disk. points in async-write mode upon completion transfer data to the disk.