diff --git a/README-RU.md b/README-RU.md index 28a5c0d1..cf8b79ba 100644 --- a/README-RU.md +++ b/README-RU.md @@ -1,4 +1,4 @@ -## The [repository was moved out from Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. +## The [repository now only mirrored on the Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. --- @@ -6,31 +6,24 @@ libmdbx ====================================== **The revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).** -*The Future will Positive. Всё будет хорошо.* +*The Future will (be) Positive. Всё будет хорошо.* [![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx) [![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) [![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx) -English version [by Google](https://translate.googleusercontent.com/translate_c?act=url&ie=UTF8&sl=ru&tl=en&u=https://github.com/leo-yuriev/libmdbx/tree/master) +English version of this README is [here](README.md), also the translations [by Google](https://translate.googleusercontent.com/translate_c?act=url&ie=UTF8&sl=ru&tl=en&u=https://github.com/leo-yuriev/libmdbx/tree/master) and [by Yandex](https://translate.yandex.ru/translate?url=https%3A%2F%2Fgithub.com%2FReOpen%2Flibmdbx%2Ftree%2Fmaster&lang=ru-en). -### Project Status +### Статус проекта -**Сейчас MDBX _активно перерабатывается_** предстоит -большое изменение как API, так и формата базы данных. К сожалению, -обновление приведет к потере совместимости с предыдущими версиями. +_libmdbx_ работает на Linux, FreeBSD, MacOS X и других ОС +соответствующих POSIX.1-2008, а также поддерживает Windows (начиная с +Windows XP) в качестве дополнительной платформы. -Цель этой революции - обеспечение более четкого надежного API и -добавление новых функции, а также наделение базы данных новыми -свойствами. - -В настоящее время MDBX работает на Linux и ОС соответствующих -POSIX.1-2008, а также поддерживает Windows (начиная с Windows XP) в -качестве дополнительной платформы. Поддержка других ОС может быть -обеспечена на коммерческой основе. Однако такие усовершенствования (т. -е. pull-requests) могут быть приняты в мейнстрим только в том случае, -если будет доступен соответствующий публичный и бесплатный сервис -непрерывной интеграции (aka Continuous Integration). +Отдельно ведётся не-публичная разработка следующей версии, в которой +будет большое изменение как API, так и формата базы данных. Цель этой +революции - обеспечение более четкого и надежного API, добавление новых +функций, а также наделение базы данных новыми свойствами. ## Содержание - [Обзор](#Обзор) @@ -53,8 +46,7 @@ POSIX.1-2008, а также поддерживает Windows (начиная с ## Обзор _libmdbx_ - это встраиваемый key-value движок хранения со специфическим набором свойств и возможностей, ориентированный на создание уникальных -легковесных решений с предельной производительностью под Linux и -Windows. +легковесных решений с предельной производительностью. _libmdbx_ позволяет множеству процессов совместно читать и обновлять несколько key-value таблиц с соблюдением @@ -84,10 +76,11 @@ _libmdbx_ не использует ### Сравнение с другими СУБД -Ввиду того, что в _libmdbx_ сейчас происходит революция, я посчитал -лучшим решением ограничится здесь ссылкой на [главу Comparison with -other databases](https://github.com/coreos/bbolt#comparison-with-other-databases) -в описании _BoltDB_. + +На данный момент, пожалуйста, обратитесь к [главе "сравнение BoltDB с +другими базами +данных"](https://github.com/coreos/bbolt#comparison-with-other-databases), +которая также (в основном) применима к MDBX. ### История @@ -108,13 +101,13 @@ Tables](https://github.com/leo-yuriev/libfpta), aka ["Позитивные Technologies](https://www.ptsecurity.ru). -#### Acknowledgments -Howard Chu (Symas Corporation) - the author of LMDB, from which -originated the MDBX in 2015. +#### Выражение признательности -Martin Hedenfalk - the author of `btree.c` code, which -was used for begin development of LMDB. +Говард Чу (Howard Chu) - автор движка LMDB, от +которого в 2015 году произошел MDBX. +Мартин Хеденфальк (Martin Hedenfalk) - автор кода +`btree.c`, который использовался для начала разработки LMDB. Основные свойства ================= @@ -332,6 +325,25 @@ Amplification Factor) и RAF (Read Amplification Factor) также Olog(N). > - попытки повторного освобождения памяти; > - повреждение памяти и ошибки сегментации. +32. На **MacOS X** для синхронизации данных с диском _по-умолчанию_ +используется системная функция `fcntl(F_FULLFSYNC)`, так как [только +этим гарантируется сохранность +данных](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html) +при сбое электропитания. К сожалению, в сценариях с высокой +интенсивностью пишущих транзакций, использование `F_FULLFSYNC` приводит +к существенной деградации производительности в сравнении с LMDB, где +используется системная функция `fsync()`. Поэтому _libmdbx_ позволяет +переопределить это поведение определением опции +`MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` при сборке библиотеки. + +33. На **Windows** _libmdbx_ использует файловые блокировки +`LockFileEx()`, так как это позволяет размещать БД на сетевых дисках, а +также обеспечивает защиту от некомпетентных действий пользователя +([защиту от +дурака](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D1%89%D0%B8%D1%82%D0%B0_%D0%BE%D1%82_%D0%B4%D1%83%D1%80%D0%B0%D0%BA%D0%B0)). +Поэтому _libmdbx_ может немного отставать в тестах производительность от +LMDB, где используются именованные мьютексы. + -------------------------------------------------------------------------------- ## Недостатки и Компромиссы diff --git a/README.md b/README.md index 37c446de..0ab65ea4 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -## The [repository was moved out from Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. +## The [repository now only mirrored on the Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. --- @@ -6,52 +6,33 @@ libmdbx ====================================== **Revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).** -*The Future will be positive.* +*The Future will (be) positive.* [![Build Status](https://travis-ci.org/leo-yuriev/libmdbx.svg?branch=master)](https://travis-ci.org/leo-yuriev/libmdbx) [![Build status](https://ci.appveyor.com/api/projects/status/ue94mlopn50dqiqg/branch/master?svg=true)](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) [![Coverity Scan Status](https://scan.coverity.com/projects/12915/badge.svg)](https://scan.coverity.com/projects/reopen-libmdbx) -## Project Status for now +Русскоязычная версия этого README [здесь](README-RU.md). - - The stable versions - ([_stable/0.0_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.0) - and - [_stable/0.1_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.1) - branches) of _MDBX_ are frozen, i.e. no new features or API changes, but - only bug fixes. +## Project Status - - The next version - ([_devel_](https://github.com/leo-yuriev/libmdbx/tree/devel) branch) - **is under active non-public development**, i.e. current API and set of - features are extreme volatile. +_libmdbx_ works on Linux, FreeBSD, MacOS X and other systems compliant +with POSIX.1-2008, but also support Windows as a complementary platform. - - The immediate goal of development is formation of the stable API and - the stable internal database format, which allows realise all PLANNED - FEATURES: - 1. Integrity check by [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree); - 2. Support for [raw block devices](https://en.wikipedia.org/wiki/Raw_device); - 3. Separate place (HDD) for large data items; - 4. Using "[Roaring bitmaps](http://roaringbitmap.org/about/)" inside garbage collector; - 5. Non-sequential reclaiming, like PostgreSQL's [Vacuum](https://www.postgresql.org/docs/9.1/static/sql-vacuum.html); - 6. [Asynchronous lazy data flushing](https://sites.fas.harvard.edu/~cs265/papers/kathuria-2008.pdf) to disk(s); - 7. etc... +The next version +([_devel_](https://github.com/leo-yuriev/libmdbx/tree/devel) branch) is +under active non-public development, i.e. API and set of features are +volatile. The goal of this revolution is to provide a clearer and more +reliable API, adding set of features and a new database properties. -Don't miss libmdbx for other runtimes. +Don't miss libmdbx for other runtimes: -| Runtime | GitHub | Author | -| ------------- | ------------- | ------------- | -| JVM | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) | -| .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) | + | Runtime | GitHub | Author | + | ------------- | ------------- | ------------- | + | Java | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) | + | .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) | ----- -Nowadays MDBX works on Linux and OS'es compliant with POSIX.1-2008, but -also support Windows (since Windows XP) as a complementary platform. -Support for other OS could be implemented on commercial basis. However -such enhancements (i.e. pull requests) could be accepted in mainstream -only when corresponding public and free Continuous Integration service -will be available. - ## Contents - [Overview](#overview) - [Comparison with other DBs](#comparison-with-other-dbs) @@ -72,21 +53,28 @@ will be available. ## Overview _libmdbx_ is an embedded lightweight key-value database engine oriented -for performance under Linux and Windows. +for performance. _libmdbx_ allows multiple processes to read and update several key-value tables concurrently, while being [ACID](https://en.wikipedia.org/wiki/ACID)-compliant, with minimal overhead and Olog(N) operation cost. -_libmdbx_ enforce [serializability](https://en.wikipedia.org/wiki/Serializability) for writers by single [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) for parallel readers without atomic/interlocked operations, while writing and reading transactions do not block each other. +_libmdbx_ enforce +[serializability](https://en.wikipedia.org/wiki/Serializability) for +writers by single +[mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords +[wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) +for parallel readers without atomic/interlocked operations, while +writing and reading transactions do not block each other. -_libmdbx_ can guarantee consistency after crash depending of operation mode. +_libmdbx_ can guarantee consistency after crash depending of operation +mode. _libmdbx_ uses [B+Trees](https://en.wikipedia.org/wiki/B%2B_tree) and -[Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), doesn't use -[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) which -might be a caveat for some workloads. +[Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), +doesn't use [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) +which might be a caveat for some workloads. ### Comparison with other DBs For now please refer to [chapter of "BoltDB comparison with other @@ -96,15 +84,17 @@ which is also (mostly) applicable to MDBX. ### History The _libmdbx_ design is based on [Lightning Memory-Mapped Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). -Initial development was going in [ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project. -About a year later libmdbx was isolated to separate project, which was [presented at Highload++ -2015 conference](http://www.highload.ru/2015/abstracts/1831.html). +Initial development was going in +[ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project. About a +year later libmdbx was isolated to separate project, which was +[presented at Highload++ 2015 +conference](http://www.highload.ru/2015/abstracts/1831.html). Since early 2017 _libmdbx_ is used in [Fast Positive Tables](https://github.com/leo-yuriev/libfpta), and development is funded by [Positive Technologies](https://www.ptsecurity.com). #### Acknowledgments -Howard Chu (Symas Corporation) - the author of LMDB, from which +Howard Chu - the author of LMDB, from which originated the MDBX in 2015. Martin Hedenfalk - the author of `btree.c` code, which @@ -184,20 +174,23 @@ additional resources for that. [BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration) this may greatly improve write performance. -4. Fast estimation of range query result size via functions `mdbx_estimate_range()`, -`mdbx_estimate_move()` and `mdbx_estimate_distance()`. E.g. for selection the -optimal query execution plan. +4. Fast estimation of range query result size via functions +`mdbx_estimate_range()`, `mdbx_estimate_move()` and +`mdbx_estimate_distance()`. E.g. for selection the optimal query +execution plan. 5. `mdbx_chk` tool for DB integrity check. -6. Support for keys and values of zero length, including multi-values (aka sorted duplicates). +6. Support for keys and values of zero length, including multi-values +(aka sorted duplicates). -7. Ability to assign up to 3 persistent 64-bit markers to commiting transaction with -`mdbx_canary_put()` and then get them in read transaction by -`mdbx_canary_get()`. +7. Ability to assign up to 3 persistent 64-bit markers to commiting +transaction with `mdbx_canary_put()` and then get them in read +transaction by `mdbx_canary_get()`. -8. Ability to update or delete record and get previous value via `mdbx_replace()`. -Also allows update the specific item from multi-value with the same key. +8. Ability to update or delete record and get previous value via +`mdbx_replace()`. Also allows update the specific item from multi-value +with the same key. 9. Sequence generation via `mdbx_dbi_sequence()`. @@ -297,6 +290,24 @@ to avoid hard-to-debug errors. > - double-free; > - memory corruption and segfaults. + +32. On **Mac OS X** the `fcntl(F_FULLFSYNC)` syscall is used _by +default_ to synchronize data with the disk, as this is [the only way to +guarantee data +durability](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html) +in case of power failure. Unfortunately, in scenarios with high write +intensity, the use of `F_FULLFSYNC` significant degrades performance +compared to LMDB, where the `fsync()` syscall is used. Therefore, +_libmdbx_ allows you to override this behavior by defining the +`MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` option while build the library. + +33. On **Windows** the `LockFileEx()` syscall is used for locking, since +it allows place the database on network drives, and provides protection +against incompetent user actions (aka +[poka-yoke](https://en.wikipedia.org/wiki/Poka-yoke)). Therefore +_libmdbx_ may be a little lag in performance tests from LMDB where a +named mutexes are used. + -------------------------------------------------------------------------------- ## Gotchas @@ -360,7 +371,8 @@ to simplify this as follows: exhaust the free DB space. * If the available space is exhausted, any attempt to update the data -will cause a "MAP_FULL" error until a long read transaction is completed. +will cause a "MAP_FULL" error until a long read transaction is +completed. * A good example of long readers is a hot backup or debugging of a client application while retaining an active read transaction. @@ -373,14 +385,13 @@ operations and the `LIFO RECLAIM` mode which addresses performance degradation. #### Durability in asynchronous writing mode -In `WRITEMAP+MAPSYNC` mode updated (aka dirty) pages are written -to persistent storage by the OS kernel. This means that if the -application fails, the OS kernel will finish writing all updated -data to disk and nothing will be lost. -However, in the case of hardware malfunction or OS kernel fatal error, -only some updated data can be written to disk and the database structure -is likely to be destroyed. -In such situation, DB is completely corrupted and can't be repaired. +In `WRITEMAP+MAPSYNC` mode updated (aka dirty) pages are written to +persistent storage by the OS kernel. This means that if the application +fails, the OS kernel will finish writing all updated data to disk and +nothing will be lost. However, in the case of hardware malfunction or OS +kernel fatal error, only some updated data can be written to disk and +the database structure is likely to be destroyed. In such situation, DB +is completely corrupted and can't be repaired. _libmdbx_ addresses this by fully reimplementing write path of data: @@ -406,17 +417,19 @@ which will cause returning the MDBX_WANNA_RECOVERY error. For data integrity a pages which form database snapshot with steady commit point, must not be updated until next steady commit point. -Therefore the last steady commit point creates an effect analogues to "long-time read". -The only difference that now in case of space exhaustion the problem -will be immediately addressed by writing changes to disk and forming -the new steady commit point. +Therefore the last steady commit point creates an effect analogues to +"long-time read". The only difference that now in case of space +exhaustion the problem will be immediately addressed by writing changes +to disk and forming the new steady commit point. So in async-write mode _libmdbx_ will always use new pages until the free DB space will be exhausted or `mdbx_env_sync()` will be invoked, -and the total write traffic to the disk will be the same as in sync-write mode. +and the total write traffic to the disk will be the same as in +sync-write mode. -Currently libmdbx gives a choice between a safe async-write mode (default) and -`UTTERLY_NOSYNC` mode which may lead to DB corruption after a system crash, i.e. like the LMDB. +Currently libmdbx gives a choice between a safe async-write mode +(default) and `UTTERLY_NOSYNC` mode which may lead to DB corruption +after a system crash, i.e. like the LMDB. Next version of _libmdbx_ will be automatically create steady commit points in async-write mode upon completion transfer data to the disk.