Plain SSTable format
=====================

| *PlainTable* is a RocksDB SST file format optimized for low query latency on
 pure-memory or very low-latency media.

Top-Level
------------

+--------------+---------------+
| label        | type          |
+--------------+---------------+
| data rows    | row[N]        |
+--------------+---------------+
| property     |               |
+--------------+---------------+
| footer       | fixed size    |
+--------------+---------------+

0. Row Format
---------------

.. note::
    The format of a data row.

+------------------+------------------+
| label            | type             |
+------------------+------------------+
| encoded_key      | see Key Encoding |
+------------------+------------------+
| value_size       | varint32         |
+------------------+------------------+
| value            | char[value_size] |
+------------------+------------------+
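
As a rough illustration, the sketch below splits one data row into its parts. It assumes the length of ``encoded_key`` is already known (for example, with fixed-length keys) and that *varint32* is the usual little-endian base-128 varint. ``read_row``, ``encode_varint32`` and ``decode_varint32`` are hypothetical helper names, not RocksDB APIs.

.. code-block:: python

    from typing import Tuple


    def encode_varint32(value: int) -> bytes:
        """Encode a non-negative int as a little-endian base-128 varint."""
        out = bytearray()
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)  # low 7 bits + continuation bit
            value >>= 7
        out.append(value)
        return bytes(out)


    def decode_varint32(buf: bytes, pos: int) -> Tuple[int, int]:
        """Decode a varint32 starting at pos; return (value, next_pos)."""
        result = 0
        for shift in range(0, 35, 7):  # a varint32 uses at most 5 bytes
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
        raise ValueError("varint32 longer than 5 bytes")


    def read_row(buf: bytes, pos: int,
                 encoded_key_len: int) -> Tuple[bytes, bytes, int]:
        """Split the row at pos into (encoded_key, value, next_row_pos).

        encoded_key_len is assumed to be known here (hypothetical input);
        in practice it follows from the Key Encoding section.
        """
        encoded_key = buf[pos:pos + encoded_key_len]
        pos += encoded_key_len
        value_size, pos = decode_varint32(buf, pos)
        value = buf[pos:pos + value_size]
        if len(value) != value_size:
            raise ValueError("truncated row")
        return encoded_key, value, pos + value_size

For example, ``read_row(b"k1" + encode_varint32(3) + b"abc", 0, 2)`` returns ``(b"k1", b"abc", 6)``, where 6 is the offset of the next row.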

0.0 Key Encoding
``````````````````

1. Plain Encoding

   - with a fixed key size: the internal key as is, i.e. [user key] +
     [internal encoding]
   - without a fixed key size: [length of key : varint32] + [user key] +
     [internal encoding]
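
A minimal sketch of the two Plain Encoding variants follows. It assumes that the varint32 length counts only the user key bytes, that ``fixed_key_len`` (see Property) is the user-key length with 0 meaning "not fixed", and that the internal encoding always takes its full 8 bytes (the compressed single-byte form is not handled). ``encode_plain_key`` and ``decode_plain_key`` are hypothetical names.

.. code-block:: python

    from typing import Tuple

    INTERNAL_LEN = 8  # assumed: 1-byte type + 7-byte sequence ID, uncompressed


    def encode_varint32(value: int) -> bytes:
        out = bytearray()
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        out.append(value)
        return bytes(out)


    def decode_varint32(buf: bytes, pos: int) -> Tuple[int, int]:
        result = 0
        for shift in range(0, 35, 7):
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
        raise ValueError("varint32 longer than 5 bytes")


    def encode_plain_key(user_key: bytes, internal: bytes,
                         fixed_key_len: int) -> bytes:
        """Plain Encoding of one key; fixed_key_len == 0 means variable."""
        if len(internal) != INTERNAL_LEN:
            raise ValueError("expected 8 internal bytes")
        if fixed_key_len:
            if len(user_key) != fixed_key_len:
                raise ValueError("key length differs from fixed_key_len")
            return user_key + internal  # no length prefix needed
        return encode_varint32(len(user_key)) + user_key + internal


    def decode_plain_key(buf: bytes, pos: int,
                         fixed_key_len: int) -> Tuple[bytes, bytes, int]:
        """Return (user_key, internal_bytes, next_pos)."""
        if fixed_key_len:
            key_len = fixed_key_len
        else:
            key_len, pos = decode_varint32(buf, pos)
        user_key = buf[pos:pos + key_len]
        pos += key_len
        internal = buf[pos:pos + INTERNAL_LEN]
        return user_key, internal, pos + INTERNAL_LEN

With a fixed key size the length prefix disappears entirely, which is why ``fixed_key_len`` is worth recording in the properties.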

2. Prefix Encoding

.. note::
    Keys that share a prefix with the previous key do not repeat it, to save
    space.

There are three types of packets:

| - Full Key (the first key with a given prefix) stores the full key bytes:
    [full key flag + size] + [full user key] + [internal encoding]
| - Second Key (the second key with that prefix) stores the prefix size:
    [prefix key flag + size] + [suffix key flag + size] + [suffix key] +
    [internal encoding]
| - Others (the third and later keys) store only the suffix key bytes:
    [suffix key flag + size] + [suffix key] + [internal encoding]

| The [flag + size] field is a single byte with the layout
| [type(2b)|size(6b)]. When the size is too large for 6 bits (0x3F or more),
    all size bits are set to 1 and a *varint32* is written after this byte; the
    size is then the sum of 0x3F and the varint32 value. The type is one of
    *full key*, *prefix key* and *suffix key*.
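
The sketch below shows the [flag + size] byte and how a run of sorted keys sharing one prefix maps onto the three packet types. The numeric type codes (0 = full key, 1 = prefix key, 2 = suffix key) are an assumption modelled on RocksDB's ``plain_table_key_coding.cc``, since the text above does not list them. Sizes are assumed to count user-key bytes, a single placeholder ``internal`` value stands in for each key's internal encoding, and all function names are hypothetical.

.. code-block:: python

    from typing import List, Tuple

    # Assumed 2-bit type codes (not stated in the format description).
    FULL_KEY, PREFIX_KEY, SUFFIX_KEY = 0, 1, 2
    SIZE_ESCAPE = 0x3F  # all six size bits set to 1


    def encode_varint32(value: int) -> bytes:
        out = bytearray()
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        out.append(value)
        return bytes(out)


    def decode_varint32(buf: bytes, pos: int) -> Tuple[int, int]:
        result = 0
        for shift in range(0, 35, 7):
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
        raise ValueError("varint32 longer than 5 bytes")


    def encode_flag_size(entry_type: int, size: int) -> bytes:
        """Pack [type(2b)|size(6b)], spilling large sizes into a varint32."""
        if size < SIZE_ESCAPE:
            return bytes([(entry_type << 6) | size])
        return (bytes([(entry_type << 6) | SIZE_ESCAPE])
                + encode_varint32(size - SIZE_ESCAPE))


    def decode_flag_size(buf: bytes, pos: int) -> Tuple[int, int, int]:
        """Return (entry_type, size, next_pos)."""
        byte = buf[pos]
        entry_type, size = byte >> 6, byte & SIZE_ESCAPE
        pos += 1
        if size == SIZE_ESCAPE:  # escaped: real size = 0x3F + varint32
            extra, pos = decode_varint32(buf, pos)
            size += extra
        return entry_type, size, pos


    def encode_prefix_run(user_keys: List[bytes], prefix_len: int,
                          internal: bytes = bytes(8)) -> bytes:
        """Encode sorted user keys that all share their first prefix_len bytes."""
        prefix = user_keys[0][:prefix_len]
        out = bytearray()
        for i, key in enumerate(user_keys):
            if key[:prefix_len] != prefix:
                raise ValueError("all keys must share the same prefix")
            suffix = key[prefix_len:]
            if i == 0:  # Full Key packet
                out += encode_flag_size(FULL_KEY, len(key)) + key
            elif i == 1:  # Second Key packet: prefix size, then the suffix
                out += encode_flag_size(PREFIX_KEY, prefix_len)
                out += encode_flag_size(SUFFIX_KEY, len(suffix)) + suffix
            else:  # Others: suffix only
                out += encode_flag_size(SUFFIX_KEY, len(suffix)) + suffix
            out += internal
        return bytes(out)

For instance, ``encode_prefix_run([b"abc1", b"abc2", b"abc3"], 3, internal=b"")`` yields ``b"\x04abc1\x43\x812\x813"``: the full key, then a prefix size of 3 plus the suffix ``2``, then the suffix ``3`` alone.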

3. Internal Encoding

| In both Plain and Prefix encoding, the internal bytes of each key are
 encoded in the same way, laid out as below:

+--------------+----------+-------------------------------------+
| label        | type     |             note                    |
+--------------+----------+-------------------------------------+
| type         | char     | row type (value, delete, merge, ...)|
+--------------+----------+-------------------------------------+
| sequence ID  |  char[7] |                                     |
+--------------+----------+-------------------------------------+

| When there is no previous value for this key in the system, these bytes can
 be compressed into a single byte:

+------+
| 0x80 |
+------+
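
A sketch of packing and unpacking the internal bytes follows. The byte order is an assumption: it treats the 8 bytes as a little-endian 64-bit integer ``(sequence << 8) | type``, which puts the type byte first and the 7-byte sequence ID after it, as in the table. Whether to emit the compressed 0x80 form is left to a hypothetical ``compress`` flag supplied by the caller, and the decoder does not guess which type and sequence ID that byte stands for.

.. code-block:: python

    import struct
    from typing import Optional, Tuple

    COMPRESSED = 0x80             # single-byte form from the table above
    MAX_SEQUENCE = (1 << 56) - 1  # sequence ID must fit in 7 bytes


    def encode_internal(row_type: int, sequence: int,
                        compress: bool = False) -> bytes:
        """Return the internal bytes for one key (8 bytes, or 1 if compressed)."""
        if compress:  # hypothetical flag: caller knows no earlier value exists
            return bytes([COMPRESSED])
        if not 0 <= row_type < COMPRESSED:
            raise ValueError("row type must not collide with the 0x80 marker")
        if not 0 <= sequence <= MAX_SEQUENCE:
            raise ValueError("sequence ID does not fit in 7 bytes")
        # Assumed layout: little-endian, so the type byte comes first.
        return struct.pack("<Q", (sequence << 8) | row_type)


    def decode_internal(buf: bytes,
                        pos: int) -> Tuple[Optional[int], Optional[int], int]:
        """Return (row_type, sequence, next_pos); (None, None, ...) for 0x80."""
        if buf[pos] == COMPRESSED:
            return None, None, pos + 1
        (packed,) = struct.unpack_from("<Q", buf, pos)
        return packed & 0xFF, packed >> 8, pos + 8

Because every uncompressed type byte stays below 0x80 in this sketch, a reader can tell the two forms apart by looking at the first byte alone.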

1. Property
------------

| 1. data_size: the offset at which the data part of the file ends.
| 2. fixed_key_len: the length of the keys if all keys have the same length,
 0 otherwise.


In-Memory Index
-----------------

.. warning::
    The In-Memory Index is built by scanning the Plain SSTable file, so it is
    currently not part of the Plain SSTable file format.

| At the top level, the In-Memory Index is a hash table in which each bucket
 holds either an offset in the file or an offset into a binary search buffer.
 A binary search buffer is needed in two cases:

| 1. Hash collisions: two or more prefixes hash to the same bucket.
| 2. Too many keys for one prefix: the lookup inside the prefix needs to be
 sped up.

Format
```````

| The index consists of two pieces of memory: an array of hash buckets and a
 set of binary search buffers.

+-------------+---------------------------------------------+
| record                                                    |
+-------------+---------------------------------------------+
| Flag(1b)    | Offset to binary search buffer or file(31b) |
+-------------+---------------------------------------------+

| 1. If Flag = 0 and Offset equals the offset of the end of the file's data
 (data_size), the bucket is NULL and holds no data; if Offset is smaller,
 only one prefix falls into this bucket and Offset points into the file.
| 2. If Flag = 1, Offset points into the binary search buffers.
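
The bucket rules above can be sketched as follows, assuming each bucket is a 32-bit integer with Flag in the most significant bit and Offset in the low 31 bits. ``make_bucket`` and ``classify_bucket`` are hypothetical names, and ``data_size`` is the property of the same name.

.. code-block:: python

    from typing import Tuple

    FLAG_BIT = 1 << 31          # assumed: Flag is the top bit
    OFFSET_MASK = FLAG_BIT - 1  # low 31 bits


    def make_bucket(is_buffer: bool, offset: int) -> int:
        """Pack Flag and a 31-bit Offset into one 32-bit bucket value."""
        if not 0 <= offset <= OFFSET_MASK:
            raise ValueError("offset does not fit in 31 bits")
        return (FLAG_BIT if is_buffer else 0) | offset


    def classify_bucket(bucket: int, data_size: int) -> Tuple[str, int]:
        """Interpret a bucket; data_size is the end of the file's data part."""
        offset = bucket & OFFSET_MASK
        if bucket & FLAG_BIT:
            return "binary_search_buffer", offset
        if offset == data_size:
            return "empty", offset
        if offset < data_size:
            return "single_prefix", offset  # offset into the file
        raise ValueError("file offset past the end of the data")

Using ``data_size`` as the NULL marker avoids reserving a separate sentinel value, since no row can start at the very end of the data.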

A binary search buffer has the following format:

+-------------------+-------------------------------------+
| label             | type                                |
+-------------------+-------------------------------------+
| number_of_records | varint32                            |
+-------------------+-------------------------------------+
| records           | fixed32[number_of_records]          |
+-------------------+-------------------------------------+
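
A minimal sketch of reading one binary search buffer and searching it is given below. It assumes each fixed32 record is a little-endian file offset of a row, with records ordered by key, and that the caller supplies a hypothetical ``key_at(offset)`` function that reads the key stored at that offset. The search returns the last record whose key is not greater than the target (or the first record), which is one reasonable starting point for a forward scan; it is not necessarily how RocksDB performs the lookup.

.. code-block:: python

    import struct
    from typing import Callable, List, Tuple


    def decode_varint32(buf: bytes, pos: int) -> Tuple[int, int]:
        result = 0
        for shift in range(0, 35, 7):
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
        raise ValueError("varint32 longer than 5 bytes")


    def parse_search_buffer(buf: bytes, pos: int) -> List[int]:
        """Read number_of_records, then that many little-endian fixed32 records."""
        count, pos = decode_varint32(buf, pos)
        return list(struct.unpack_from("<%dI" % count, buf, pos))


    def find_scan_start(records: List[int], target: bytes,
                        key_at: Callable[[int], bytes]) -> int:
        """Return the offset of the last record whose key <= target,
        or the first record's offset if every key is greater."""
        if not records:
            raise ValueError("empty binary search buffer")
        lo, hi = 0, len(records)
        # Invariant: keys of records[:lo] are <= target, records[hi:] are > target.
        while lo < hi:
            mid = (lo + hi) // 2
            if key_at(records[mid]) <= target:
                lo = mid + 1
            else:
                hi = mid
        return records[max(lo - 1, 0)]

With records ``[0, 20, 45]`` holding the keys ``apple``, ``banana`` and ``cherry``, a lookup for ``blueberry`` starts scanning at offset 20.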