Getting started {#starting} =============== > This section is based on Bert Hubert's intro "LMDB Semantics", with > edits reflecting the improvements and enhancements were made in MDBX. > See Bert Hubert's [original](https://github.com/ahupowerdns/ahutils/blob/master/lmdb-semantics.md). Everything starts with an environment, created by `mdbx_env_create()`. Once created, this environment must also be opened with `mdbx_env_open()`, and after use be closed by `mdbx_env_close()`. At that a non-zero value of the last argument "mode" supposes MDBX will create database and directory if ones does not exist. In this case the non-zero "mode" argument specifies the file mode bits be applied when a new files are created by `open()` function. Within that directory, a lock file (aka LCK-file) and a storage file (aka DXB-file) will be generated. If you don't want to use a directory, you can pass the `MDBX_NOSUBDIR` option, in which case the path you provided is used directly as the DXB-file, and another file with a "-lck" suffix added will be used for the LCK-file. Once the environment is open, a transaction can be created within it using `mdbx_txn_begin()`. Transactions may be read-write or read-only, and read-write transactions may be nested. A transaction must only be used by one thread at a time. Transactions are always required, even for read-only access. The transaction provides a consistent view of the data. Once a transaction has been created, a database (i.e. key-value space inside the environment) can be opened within it using `mdbx_dbi_open()`. If only one database will ever be used in the environment, a `NULL` can be passed as the database name. For named databases, the `MDBX_CREATE` flag must be used to create the database if it doesn't already exist. Also, `mdbx_env_set_maxdbs()` must be called after `mdbx_env_create()` and before `mdbx_env_open()` to set the maximum number of named databases you want to support. \note A single transaction can open multiple databases. Generally databases should only be opened once, by the first transaction in the process. Within a transaction, `mdbx_get()` and `mdbx_put()` can store single key-value pairs if that is all you need to do (but see \ref Cursors below if you want to do more). A key-value pair is expressed as two `MDBX_val` structures. This struct that is exactly similar to POSIX's `struct iovec` and has two fields, `iov_len` and `iov_base`. The data is a `void` pointer to an array of `iov_len` bytes. \note The notable difference between MDBX and LMDB is that MDBX support zero length keys. Because MDBX is very efficient (and usually zero-copy), the data returned in an `MDBX_val` structure may be memory-mapped straight from disk. In other words look but do not touch (or `free()` for that matter). Once a transaction is closed, the values can no longer be used, so make a copy if you need to keep them after that. ## Cursors {#Cursors} To do more powerful things, we must use a cursor. Within the transaction, a cursor can be created with `mdbx_cursor_open()`. With this cursor we can store/retrieve/delete (multiple) values using `mdbx_cursor_get()`, `mdbx_cursor_put()` and `mdbx_cursor_del()`. The `mdbx_cursor_get()` positions itself depending on the cursor operation requested, and for some operations, on the supplied key. For example, to list all key-value pairs in a database, use operation `MDBX_FIRST` for the first call to `mdbx_cursor_get()`, and `MDBX_NEXT` on subsequent calls, until the end is hit. To retrieve all keys starting from a specified key value, use `MDBX_SET`. For more cursor operations, see the API description below. When using `mdbx_cursor_put()`, either the function will position the cursor for you based on the key, or you can use operation `MDBX_CURRENT` to use the current position of the cursor. \note Note that key must then match the current position's key. ## Summarizing the opening So we have a cursor in a transaction which opened a database in an environment which is opened from a filesystem after it was separately created. Or, we create an environment, open it from a filesystem, create a transaction within it, open a database within that transaction, and create a cursor within all of the above. Got it? ## Threads and processes Do not have open an database twice in the same process at the same time, MDBX will track and prevent this. Instead, share the MDBX environment that has opened the file across all threads. The reason for this is: - When the "Open file description" locks (aka OFD-locks) are not available, MDBX uses POSIX locks on files, and these locks have issues if one process opens a file multiple times. - If a single process opens the same environment multiple times, closing it once will remove all the locks held on it, and the other instances will be vulnerable to corruption from other processes. + For compatibility with LMDB which allows multi-opening, MDBX can be configured at runtime by `mdbx_setup_debug(MDBX_DBG_LEGACY_MULTIOPEN, ...)` prior to calling other MDBX funcitons. In this way MDBX will track databases opening, detect multi-opening cases and then recover POSIX file locks as necessary. However, lock recovery can cause unexpected pauses, such as when another process opened the database in exclusive mode before the lock was restored - we have to wait until such a process releases the database, and so on. Do not use opened MDBX environment(s) after `fork()` in a child process(es), MDBX will check and prevent this at critical points. Instead, ensure there is no open MDBX-instance(s) during fork(), or atleast close it immediately after `fork()` in the child process and reopen if required - for instance by using `pthread_atfork()`. The reason for this is: - For competitive consistent reading, MDBX assigns a slot in the shared table for each process that interacts with the database. This slot is populated with process attributes, including the PID. - After `fork()`, in order to remain connected to a database, the child process must have its own such "slot", which can't be assigned in any simple and robust way another than the regular. - A write transaction from a parent process cannot continue in a child process for obvious reasons. - Moreover, in a multithreaded process at the fork() moment any number of threads could run in critical and/or intermediate sections of MDBX code with interaction and/or racing conditions with threads from other process(es). For instance: shrinking a database or copying it to a pipe, opening or closing environment, begining or finishing a transaction, and so on. = Therefore, any solution other than simply close database (and reopen if necessary) in a child process would be both extreme complicated and so fragile. Do not start more than one transaction for a one thread. If you think about this, it's really strange to do something with two data snapshots at once, which may be different. MDBX checks and preventing this by returning corresponding error code (`MDBX_TXN_OVERLAPPING`, `MDBX_BAD_RSLOT`, `MDBX_BUSY`) unless you using `MDBX_NOTLS` option on the environment. Nonetheless, with the `MDBX_NOTLS option`, you must know exactly what you are doing, otherwise you will get deadlocks or reading an alien data. Also note that a transaction is tied to one thread by default using Thread Local Storage. If you want to pass read-only transactions across threads, you can use the MDBX_NOTLS option on the environment. Nevertheless, a write transaction entirely should only be used in one thread from start to finish. MDBX checks this in a reasonable manner and return the MDBX_THREAD_MISMATCH error in rules violation. ## Transactions, rollbacks etc To actually get anything done, a transaction must be committed using `mdbx_txn_commit()`. Alternatively, all of a transaction's operations can be discarded using `mdbx_txn_abort()`. \attention An important difference between MDBX and LMDB is that MDBX required that any opened cursors can be reused and must be freed explicitly, regardless ones was opened in a read-only or write transaction. The REASON for this is eliminates ambiguity which helps to avoid errors such as: use-after-free, double-free, i.e. memory corruption and segfaults. For read-only transactions, obviously there is nothing to commit to storage. \attention An another notable difference between MDBX and LMDB is that MDBX make handles opened for existing databases immediately available for other transactions, regardless this transaction will be aborted or reset. The REASON for this is to avoiding the requirement for multiple opening a same handles in concurrent read transactions, and tracking of such open but hidden handles until the completion of read transactions which opened them. In addition, as long as a transaction is open, a consistent view of the database is kept alive, which requires storage. A read-only transaction that no longer requires this consistent view should be terminated (committed or aborted) when the view is no longer needed (but see below for an optimization). There can be multiple simultaneously active read-only transactions but only one that can write. Once a single read-write transaction is opened, all further attempts to begin one will block until the first one is committed or aborted. This has no effect on read-only transactions, however, and they may continue to be opened at any time. ## Duplicate keys aka Multi-values `mdbx_get()` and `mdbx_put()` respectively have no and only some support or multiple key-value pairs with identical keys. If there are multiple values for a key, `mdbx_get()` will only return the first value. When multiple values for one key are required, pass the `MDBX_DUPSORT` flag to `mdbx_dbi_open()`. In an `MDBX_DUPSORT` database, by default `mdbx_put()` will not replace the value for a key if the key existed already. Instead it will add the new value to the key. In addition, `mdbx_del()` will pay attention to the value field too, allowing for specific values of a key to be deleted. Finally, additional cursor operations become available for traversing through and retrieving duplicate values. ## Some optimization If you frequently begin and abort read-only transactions, as an optimization, it is possible to only reset and renew a transaction. `mdbx_txn_reset()` releases any old copies of data kept around for a read-only transaction. To reuse this reset transaction, call `mdbx_txn_renew()` on it. Any cursors in this transaction can also be renewed using `mdbx_cursor_renew()` or freed by `mdbx_cursor_close()`. To permanently free a transaction, reset or not, use `mdbx_txn_abort()`. ## Cleaning up Any created cursors must be closed using `mdbx_cursor_close()`. It is advisable to repeat: \note An important difference between MDBX and LMDB is that MDBX required that any opened cursors can be reused and must be freed explicitly, regardless ones was opened in a read-only or write transaction. The REASON for this is eliminates ambiguity which helps to avoid errors such as: use-after-free, double-free, i.e. memory corruption and segfaults. It is very rarely necessary to close a database handle, and in general they should just be left open. When you close a handle, it immediately becomes unavailable for all transactions in the environment. Therefore, you should avoid closing the handle while at least one transaction is using it. ## Now read up on the full API! The full MDBX documentation lists further details below, like how to: - configure database size and automatic size management - drop and clean a database - detect and report errors - optimize (bulk) loading speed - (temporarily) reduce robustness to gain even more speed - gather statistics about the database - estimate size of range query result - double perfomance by LIFO reclaiming on storages with write-back - use sequences and canary markers - use lack-of-space callback (aka OOM-KICK) - use exclusive mode - define custom sort orders (but this is recommended to be avoided)