mdbx-doc: provide non-API docs via doxygen (squashed).

Change-Id: Ie33858517f964f794ec182a1e8bb630730a0f172
2025-12-24 19:52:22 +08:00 · 2020-07-21 01:24:29 +03:00
parent bb3d4ab9ba
commit 5f4f828bae
11 changed files with 685 additions and 633 deletions
--- a/docs/_starting.md
+++ b/docs/_starting.md
@@ -0,0 +1,241 @@
+Getting started {#starting}
+===============
+
+> This section is based on Bert Hubert's intro "LMDB Semantics", with
+> edits reflecting the improvements and enhancements made in MDBX.
+> See Bert Hubert's [original](https://github.com/ahupowerdns/ahutils/blob/master/lmdb-semantics.md).
+
+Everything starts with an environment, created by `mdbx_env_create()`.
+Once created, this environment must also be opened with `mdbx_env_open()`,
+and after use be closed by `mdbx_env_close()`. A non-zero value of the last
+argument, "mode", tells MDBX to create the database and its directory if they
+do not exist. In that case "mode" also specifies the file mode bits applied
+when new files are created by the `open()` function.
+
+Within that directory, a lock file (aka LCK-file) and a storage file (aka
+DXB-file) will be generated. If you don't want to use a directory, you can
+pass the `MDBX_NOSUBDIR` option, in which case the path you provided is used
+directly as the DXB-file, and another file with a "-lck" suffix added
+will be used for the LCK-file.
+
+Once the environment is open, a transaction can be created within it using
+`mdbx_txn_begin()`. Transactions may be read-write or read-only, and read-write
+transactions may be nested. A transaction must only be used by one thread at
+a time. Transactions are always required, even for read-only access. The
+transaction provides a consistent view of the data.
+
+Once a transaction has been created, a database (i.e. key-value space inside
+the environment) can be opened within it using `mdbx_dbi_open()`. If only one
+database will ever be used in the environment, a `NULL` can be passed as the
+database name. For named databases, the `MDBX_CREATE` flag must be used to
+create the database if it doesn't already exist. Also, `mdbx_env_set_maxdbs()`
+must be called after `mdbx_env_create()` and before `mdbx_env_open()` to set
+the maximum number of named databases you want to support.
+
+\note A single transaction can open multiple databases. Generally databases
+should only be opened once, by the first transaction in the process.
+
+Within a transaction, `mdbx_get()` and `mdbx_put()` can retrieve and store
+single key-value pairs if that is all you need to do (but see \ref Cursors
+below if you want to do more).
+
+A key-value pair is expressed as two `MDBX_val` structures. This struct is
+exactly like POSIX's `struct iovec` and has two fields, `iov_len` and
+`iov_base`. The data, `iov_base`, is a `void` pointer to an array of `iov_len`
+bytes.
+\note A notable difference between MDBX and LMDB is that MDBX supports
+zero-length keys.
+
+Because MDBX is very efficient (and usually zero-copy), the data returned in
+an `MDBX_val` structure may be memory-mapped straight from disk. In other words,
+look but do not touch (or `free()` for that matter). Once a transaction is
+closed, the values can no longer be used, so make a copy if you need to keep
+them after that.
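+
+As a minimal sketch of "make a copy", assuming a POSIX system and using
+`struct iovec` as a stand-in for `MDBX_val` (the text above describes both as
+having the same two fields), a value could be duplicated before its
+transaction ends. The helper name `copy_value()` is hypothetical and is not
+part of the MDBX API:
+
+```c
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+
+/* Hypothetical helper, not part of MDBX: returns a heap-allocated copy of
+ * the bytes described by `val`, or NULL if the allocation fails. */
+static void *copy_value(const struct iovec *val) {
+  assert(val != NULL);
+  /* Allocate at least one byte so that a zero-length value still yields a
+   * pointer that can be told apart from an allocation failure. */
+  void *copy = malloc(val->iov_len ? val->iov_len : 1);
+  if (copy != NULL && val->iov_len != 0)
+    memcpy(copy, val->iov_base, val->iov_len);
+  return copy;
+}
+```
+
+The caller owns the returned buffer and releases it with `free()`; the
+memory-mapped original is never written to or freed.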
+
+## Cursors {#Cursors}
+To do more powerful things, we must use a cursor.
+
+Within the transaction, a cursor can be created with `mdbx_cursor_open()`.
+With this cursor we can store/retrieve/delete (multiple) values using
+`mdbx_cursor_get()`, `mdbx_cursor_put()` and `mdbx_cursor_del()`.
+
+`mdbx_cursor_get()` positions the cursor depending on the cursor operation
+requested, and for some operations, on the supplied key. For example, to list
+all key-value pairs in a database, use operation `MDBX_FIRST` for the first
+call to `mdbx_cursor_get()`, and `MDBX_NEXT` on subsequent calls, until the end
+is hit.
+
+To retrieve all keys starting from a specified key value, use `MDBX_SET`. For
+more cursor operations, see the API description below.
+
+When using `mdbx_cursor_put()`, either the function will position the cursor
+for you based on the key, or you can use operation `MDBX_CURRENT` to use the
+current position of the cursor.
+\note The key must then match the current position's key.
+
+
+## Summarizing the opening
+
+So we have a cursor in a transaction which opened a database in an
+environment which is opened from a filesystem after it was separately
+created.
+
+Or, we create an environment, open it from a filesystem, create a transaction
+within it, open a database within that transaction, and create a cursor
+within all of the above.
+
+Got it?
+
+
+## Threads and processes
+
+Do not open a database twice in the same process at the same time; MDBX
+will track and prevent this. Instead, share the MDBX environment that has
+opened the file across all threads. The reasons for this are:
+ - When the "Open file description" locks (aka OFD-locks) are not available,
+   MDBX uses POSIX locks on files, and these locks have issues if one process
+   opens a file multiple times.
+ - If a single process opens the same environment multiple times, closing it
+   once will remove all the locks held on it, and the other instances will be
+   vulnerable to corruption from other processes.
+ + For compatibility with LMDB, which allows multi-opening, MDBX can be
+   configured at runtime by `mdbx_setup_debug(MDBX_DBG_LEGACY_MULTIOPEN, ...)`
+   prior to calling other MDBX functions. In this mode MDBX will track
+   database openings, detect multi-opening cases and then recover POSIX file
+   locks as necessary. However, lock recovery can cause unexpected pauses,
+   such as when another process opened the database in exclusive mode before
+   the lock was restored - we have to wait until such a process releases the
+   database, and so on.
+
+Do not use opened MDBX environment(s) after `fork()` in child process(es);
+MDBX will check and prevent this at critical points. Instead, ensure there are
+no open MDBX instances during `fork()`, or at least close them immediately
+after `fork()` in the child process and reopen if required - for instance by
+using `pthread_atfork()`. The reasons for this are:
+ - For concurrent consistent reading, MDBX assigns a slot in the shared
+   table for each process that interacts with the database. This slot is
+   populated with process attributes, including the PID.
+ - After `fork()`, in order to remain connected to a database, the child
+   process must have its own such "slot", which can't be assigned in any
+   simple and robust way other than the regular one.
+ - A write transaction from a parent process cannot continue in a child
+   process for obvious reasons.
+ - Moreover, in a multithreaded process at the moment of `fork()` any number
+   of threads could be running in critical and/or intermediate sections of
+   MDBX code, interacting and/or racing with threads from other
+   process(es). For instance: shrinking a database or copying it to a pipe,
+   opening or closing an environment, beginning or finishing a transaction,
+   and so on.
+ = Therefore, any solution other than simply closing the database (and
+   reopening it if necessary) in a child process would be both extremely
+   complicated and fragile.
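+
+The `pthread_atfork()` approach mentioned above could be wired up roughly as
+follows. This is only a sketch under stated assumptions: the flag `env_usable`
+and both function names are hypothetical, and the comment marks the place
+where a real application would close its inherited environment handle. Only
+`pthread_atfork()` itself is a real POSIX call here:
+
+```c
+#include <pthread.h>
+#include <stdbool.h>
+
+/* Hypothetical application-level flag: true while this process may use
+ * the environment handle it opened. */
+static bool env_usable = false;
+
+/* Runs in the child process right after fork(). A real application would
+ * close the inherited environment here (how the handle is stored is an
+ * assumption of this sketch) and reopen it later if required. */
+static void on_fork_child(void) {
+  env_usable = false;
+}
+
+/* Call once, for example right after opening the environment.
+ * Returns 0 on success or the error number from pthread_atfork(). */
+static int install_fork_guard(void) {
+  env_usable = true;
+  return pthread_atfork(NULL, NULL, on_fork_child);
+}
+```
+
+With such a guard in place, code in the child can test `env_usable` and
+reopen the environment before touching the database, while the parent's
+handle is left untouched.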
+
+Do not start more than one transaction per thread. If you think about
+it, it is really strange to work with two data snapshots at once,
+which may differ. MDBX checks for and prevents this by returning the
+corresponding error code (`MDBX_TXN_OVERLAPPING`, `MDBX_BAD_RSLOT`, `MDBX_BUSY`)
+unless you are using the `MDBX_NOTLS` option on the environment. Nonetheless,
+with the `MDBX_NOTLS` option you must know exactly what you are doing,
+otherwise you will get deadlocks or read alien data.
+
+Also note that a transaction is tied to one thread by default using Thread
+Local Storage. If you want to pass read-only transactions across threads,
+you can use the `MDBX_NOTLS` option on the environment. Nevertheless, a write
+transaction should be used entirely in one thread, from start to finish.
+MDBX checks this in a reasonable manner and returns the `MDBX_THREAD_MISMATCH`
+error when the rule is violated.
+
+
+## Transactions, rollbacks etc
+
+To actually get anything done, a transaction must be committed using
+`mdbx_txn_commit()`. Alternatively, all of a transaction's operations
+can be discarded using `mdbx_txn_abort()`.
+
+\attention An important difference between MDBX and LMDB is that in MDBX
+any opened cursor can be reused and must be freed explicitly, regardless of
+whether it was opened in a read-only or write transaction. The REASON for this
+is to eliminate ambiguity, which helps to avoid errors such as use-after-free
+and double-free, i.e. memory corruption and segfaults.
+
+For read-only transactions, obviously there is nothing to commit to storage.
+\attention Another notable difference between MDBX and LMDB is that MDBX makes
+handles opened for existing databases immediately available to other
+transactions, regardless of whether this transaction is later aborted or
+reset. The REASON for this is to avoid the need to open the same handles
+multiple times in concurrent read transactions, and to avoid tracking such
+open but hidden handles until the read transactions which opened them complete.
+
+In addition, as long as a transaction is open, a consistent view of the
+database is kept alive, which requires storage. A read-only transaction that
+no longer requires this consistent view should be terminated (committed or
+aborted) when the view is no longer needed (but see below for an
+optimization).
+
+There can be multiple simultaneously active read-only transactions but only
+one that can write. Once a single read-write transaction is opened, all
+further attempts to begin one will block until the first one is committed or
+aborted. This has no effect on read-only transactions, however, and they may
+continue to be opened at any time.
+
+
+## Duplicate keys aka Multi-values
+
+`mdbx_get()` and `mdbx_put()` respectively have no and only some support for
+multiple key-value pairs with identical keys. If there are multiple values
+for a key, `mdbx_get()` will only return the first value.
+
+When multiple values for one key are required, pass the `MDBX_DUPSORT` flag to
+`mdbx_dbi_open()`. In an `MDBX_DUPSORT` database, by default `mdbx_put()` will
+not replace the value for a key if the key existed already. Instead it will add
+the new value to the key. In addition, `mdbx_del()` will pay attention to the
+value field too, allowing for specific values of a key to be deleted.
+
+Finally, additional cursor operations become available for traversing through
+and retrieving duplicate values.
+
+
+## Some optimization
+
+If you frequently begin and abort read-only transactions, as an optimization,
+it is possible to only reset and renew a transaction.
+
+`mdbx_txn_reset()` releases any old copies of data kept around for a read-only
+transaction. To reuse this reset transaction, call `mdbx_txn_renew()` on it.
+Any cursors in this transaction can also be renewed using `mdbx_cursor_renew()`
+or freed by `mdbx_cursor_close()`.
+
+To permanently free a transaction, reset or not, use `mdbx_txn_abort()`.
+
+
+## Cleaning up
+
+Any created cursors must be closed using `mdbx_cursor_close()`. It is advisable
+to repeat:
+\note An important difference between MDBX and LMDB is that in MDBX any
+opened cursor can be reused and must be freed explicitly, regardless of
+whether it was opened in a read-only or write transaction. The REASON for this
+is to eliminate ambiguity, which helps to avoid errors such as use-after-free
+and double-free, i.e. memory corruption and segfaults.
+
+It is very rarely necessary to close a database handle, and in general they
+should just be left open. When you close a handle, it immediately becomes
+unavailable for all transactions in the environment. Therefore, you should
+avoid closing the handle while at least one transaction is using it.
+
+
+## Now read up on the full API!
+
+The full MDBX documentation lists further details below, like how to:
+
+- configure database size and automatic size management
+- drop and clean a database
+- detect and report errors
+- optimize (bulk) loading speed
+- (temporarily) reduce robustness to gain even more speed
+- gather statistics about the database
+- estimate size of range query result
+- double performance by LIFO reclaiming on storages with write-back
+- use sequences and canary markers
+- use lack-of-space callback (aka OOM-KICK)
+- use exclusive mode
+- define custom sort orders (but this is recommended to be avoided)