key-value结构如何存储表中各个索引,于是增加了

作者: www.9159.com  发布:2019-11-29

     Rocksdb是一个kv引擎,由facebook团队基于levelDB改进而来,Rocksdb采用LSM-tree存储数据,良好的读写特性以及压缩特性使得其非常受欢迎。此外,Rocksdb引擎作为插件已经集成在facebook维护的MySQL分支,用户可以通过SQL来访问rocksDB。本文主要通过分析Rocksdb引擎的记录格式,并通过对比innodb,来让大家了解Rocksdb。Rocksdb作为一个kv引擎,用户通过put(key,value)来写入key,或者通过get(key)接口来获取value,对rocksdb本身而言,每条记录都是一个key-value。当Rocksdb作为一个存储引擎接入到MySQL时,key-value结构如何存储表中各个索引,以及如何记录中各个列的信息是本文要具体讨论的。rocksdb引擎与innodb引擎类似,也是采用索引组织表,无论是表(主键索引)还是二级索引都是以LSM tree方式组织,rocksdb记录主要包括三部分,key,value和meta三部分内容,具体见下表,然后我通过介绍一条具体记录在rocksdb引擎中的存储格式来说明问题。

Using innodb_large_prefix to avoid ERROR 1071

       单列索引限制
上面有提到单列索引限制767,起因是256×3-1。这个3是字符最大占用空间(utf8)。但是在5.5以后,开始支持4个字节的uutf8。255×4>767, 于是增加了一个参数叫做 innodb_large_prefix。
这个参数默认值是OFF。当改为ON时,允许列索引最大达到3072。
       联合索引3072
我们知道InnoDB一个page的默认大小是16k。由于是Btree组织,要求叶子节点上一个page至少要包含两条记录(否则就退化链表了)。
所以一个记录最多不能超过8k。
又由于InnoDB的聚簇索引结构,一个二级索引要包含主键索引,因此每个单个索引不能超过4k (极端情况,pk和某个二级索引都达到这个限制)。
由于需要预留和辅助空间,扣掉后不能超过3500,取个“整数”就是(1024*3)。
具体参见丁奇的分享: http://www.mysqlops.com/2012/09/ ... 6%E7%9A%84tips.html

 

原文  http://mechanics.flite.com/blog/2014/07/29/using-innodb-large-prefix-to-avoid-error-1071/

If you’ve ever tried to add an index that includes a long varchar column to an InnoDB table in MySQL, you may have seen this error:

 

The character limit depends on the character set you use. For example if you uselatin1 then the largest column you can index is varchar(767) , but if you useutf8 then the limit is varchar(255) . There is also a separate 3072 byte limit per index. The 767 byte limit is per column, so you can include multiple columns (each 767 bytes or smaller) up to 3072 total bytes per index, but no column longer than 767 bytes. (MyISAM is a little different. It has a 1000 byte index length limit, but no separate column length limit within that).

One workaround for these limits is to only index a prefix of the longer columns, but what if you want to index more than 767 bytes of a column in InnoDB?

In that case you should consider using innodb_large_prefix , which was introduced in MySQL 5.5.14 and allows you to include columns up to 3072 bytes long in InnoDB indexes. It does not affect the index limit, which is still 3072 bytes as quoted inthe manual :

 

The InnoDB internal maximum key length is 3500 bytes, but MySQL itself restricts this to 3072 bytes. This limit applies to the length of the combined index key in a multi-column index.

 

Read on for details and examples about innodb_large_prefix .

Here are a few pre-requisites for using innodb_large_prefix :

  • At the database level you have to use innodb_file_format=BARRACUDA
  • At the table level you have to use ROW_FORMAT=DYNAMIC orROW_FORMAT=COMPRESSED

The default file format is still Antelope for backwards compatibility, and the default row format is COMPACT.

You can set both innodb_file_format and innodb_large_prefixdynamically, but you should also set them in my.cnf so they survive a restart.  innodb_file_format 默认值为Antelope,  innodb_large_prefixdynamically默认值为OFF.

Here’s an example. If I try to create this table with innodb_large_prefixdisabled I get an error:

 

If I enable innodb_large_prefix I can create the table successfully:

 

在/etc/my.cnf中添加这两行,效果一样: innodb_file_format=Barracuda,  innodb_large_prefix=ON

The examples are similar for latin1 , but I can use columns three times as long since it’s a single-byte character set.

 

And here’s what happens if I try to create an index longer than 3072 bytes, 1500*3=4500>3072:

 

只要各个column的各列不大于3072就不会报错:

create table if not exists tmp1 (
     x0 varchar(3072) not null,
     x1 varchar(3072) not null,
     x2 varchar(3072) not null,
     x3 varchar(3072) not null,
     unique index unique_index (x3)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

Query OK, 0 rows affected (0.23 sec)

有一column超过3072即报错:

mysql> create table if not exists tmp1 (
    ->      x3 varchar(3073) not null,
    ->      unique index unique_index (x3)
    ->     ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes

 

 

 

 

stackoverflow上的回答:

Using DYNAMIC or COMPRESSED means that InnoDB stores varchar/text/blob fields that don't fit in the page completely off-page. But other than those columns, which then only count 20 bytes per column, the InnoDB row size limit has not changed; it's still limited to about 8000 bytes per row.

InnoDB only supports indexes of 767 bytes per column. You can raise this 3072 bytes by setting innodb_large_prefix=1 and using either DYNAMIC or COMPRESSED row format.

Using COMPRESSED row format does not make InnoDB support longer indexes.

Regarding performance, this is one of those cases where "it depends." Compression is generally a tradeoff between storage size and CPU load to compress and uncompress. It's true that this takes a bit more CPU to work with compressed data, but you have to keep in mind that database servers are typically waiting for I/O and have CPU resources to spare.

But not always -- if you do complex queries against data that is in the buffer pool, you may be constrained by CPU more than I/O. So it depends on many factors, like how well your data fits in RAM, the type of queries you run and how many queries per second, as well as hardware specs. Too many factors for anyone else to be able to answer for your application on your server. You'll just have to test it.


Re your comment:

One possibility is that the index is not fitting in the buffer pool. Performance degrades significantly if an index search needs to load pages and evict pages during every SELECT query. An EXPLAIN analysis can't tell you whether the index fits in the buffer pool.

I don't know how many columns or what data types of the columns in your index, but if you are indexing long varchar columns you should consider using prefix indexes (or decreasing the length of the columns).

You could also get more RAM and increase the size of the buffer pool.

 

 

 

 

rocksdb基本记录存储格式

Official doc 5.6

Limits on InnoDB Tables

The InnoDB internal maximum key length is 3500 bytes, but MySQL itself restricts this to 3072 bytes. This limit applies to the length of the combined index key in a multi-column index.


The MyISAM Storage Engine

The maximum key length is 1000 bytes. This can also be changed by changing the source and recompiling. For the case of a key longer than 250 bytes, a larger key block size than the default of 1024 bytes is used.


However your table is declared as InnoDB. So I don't know what to think

There's also a clue at the end of this old bug

If you need this you should really look at MySQL 5.5 and the innodb_large_prefix option that is available from 5.5.14 (July 2011) onwards because it probably does what you are looking for:

"Enable this option to allow index key prefixes longer than 767 bytes (up to 3072 bytes), for InnoDB tables that use the DYNAMIC and COMPRESSED row formats. (Creating such tables also requires the option values innodb_file_format=barracuda and innodb_file_per_table=true.) See Section 13.3.15, “Limits on InnoDB Tables” for the relevant maximums associated with index key prefixes under various settings.

 

 

 

 

 

key_size

Enabling Compression for a Table

Before creating a compressed table, make sure the innodb_file_per_table configuration option is enabled, and innodb_file_format is set to Barracuda. You can set these parameters in the MySQL configuration file my.cnf or my.ini, or with theSET statement without shutting down the MySQL server.

To enable compression for a table, you use the clauses ROW_FORMAT=COMPRESSEDKEY_BLOCK_SIZE, or both in a CREATE TABLE or ALTER TABLE statement.

To create a compressed table, you might use statements like these:

SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_file_format=Barracuda;
CREATE TABLE t1
 (c1 INT PRIMARY KEY) 
 ROW_FORMAT=COMPRESSED 
 KEY_BLOCK_SIZE=8;
  • If you specify ROW_FORMAT=COMPRESSED, you can omit KEY_BLOCK_SIZE; the default compressed page size of 8KB is used.

  • If you specify KEY_BLOCK_SIZE, you can omit ROW_FORMAT=COMPRESSED; compression is enabled automatically.

  • To determine the best value for KEY_BLOCK_SIZE, typically you create several copies of the same table with different values for this clause, then measure the size of the resulting .ibd files and see how well each performs with a realistic workload.

  • For additional performance-related configuration options, see Section 14.12.3, “Tuning Compression for InnoDB Tables”.

The default uncompressed size of InnoDB data pages is 16KB. Depending on the combination of option values, MySQL uses a page size of 1KB, 2KB, 4KB, 8KB, or 16KB for the .ibd file of the table. The actual compression algorithm is not affected by theKEY_BLOCK_SIZE value; the value determines how large each compressed chunk is, which in turn affects how many rows can be packed into each compressed page.

Setting KEY_BLOCK_SIZE=16 typically does not result in much compression, since the normal InnoDB page size is 16KB. This setting may still be useful for tables with many long BLOBVARCHAR or TEXT columns, because such values often do compress well, and might therefore require fewer overflow pages as described in Section 14.12.5, “How Compression Works for InnoDB Tables”.

All indexes of a table (including the clustered index) are compressed using the same page size, as specified in the CREATE TABLE or ALTER TABLE statement. Table attributes such as ROW_FORMAT and KEY_BLOCK_SIZE are not part of the CREATE INDEXsyntax, and are ignored if they are specified (although you see them in the output of the SHOW CREATE TABLE statement).

key

Restrictions on Compressed Tables

Because MySQL versions prior to 5.1 cannot process compressed tables, using compression requires specifying the configuration parameter innodb_file_format=Barracuda, to avoid accidentally introducing compatibility issues.

Table compression is also not available for the InnoDB system tablespace. The system tablespace (space 0, the ibdata* files) can contain user data, but it also contains internal system information, and therefore is never compressed. Thus, compression applies only to tables (and indexes) stored in their own tablespaces, that is, created with the innodb_file_per_table option enabled.

Compression applies to an entire table and all its associated indexes, not to individual rows, despite the clause name ROW_FORMAT.

 

 

value_size

MyISAM和InnoDB的行格式ROW_FORMAT

MyISAM行存储

MyISAM有3种行存储格式:fixed/dynamic/compressed;

其中fixed为默认格式,只有当表不包含变长字段(varchar/varbinary/blob/text)时使用,该每行都是固定的,所以很容易获取行在页上的具体位置,存取效率比较高,但是占用磁盘空间较多;

dynamic

每行都有一个行头部,包含bitmap,用以记录那些列为空(NULL列不算为空);

相比于fixed,其有如下特性:

所有字符串列都是动态存储的,除非长度小于4;

字符类型若长度为0/数字类型为0都会不占用存储空间,由bitmap标注,NULL值不包含在内;

如果要update行,其扩展后很容易导致行链接既而产生碎片,一旦crash若link丢失则比较难恢复,fixed模式update不会产生碎片;

 

compressed只能通过myisampack创建且为只读;

 

MyISAM的索引文件包含一个flag记录基表是否正常关闭,如果mysqld启动时指定了--myisam-recover-options,则在打开表时检测并自动修复表

 

 

 

InnoDB行存储

Innodb plugin新引入Barracuda梭子鱼,其包含compressed/dynamic两种行格式,而之前的compact/redundant统属于antelope羚羊;

 

Barracuda VS antelope

由innodb_file_format(动态)参数决定,目前可选值由Antelope和Barracuda,默认为前者;要想要此参数生效,

因为共享表空间默认为Antelope,因此要想使用Barracuda为默认值,还必须先声明innodb_file_per_table;

Innodb_file_format用于控制行格式,全局变量可动态调整,5.5默认依旧是Antelope;

 

下面只看antelope格式

 

Redundant行结构

字段长度偏移列表

记录头信息

列1数据

列2数据

….

 

行头部为字段长度偏移信息,包括变长和非变长的, 还包含了3个隐藏列:RowID(没有主键时使用)/Transaction ID/Roll Point; 而compact只包含变长的,节约了空间;

冗余行格式没有NULL标志位;对于redundant格式,varchar为Null时不占用空间,但是char为NULL需要占用空间,因为其没有Null标志位;

记录头信息占用6个字节,比compact多1字节;

对于定长char,若为NULL依旧填充整个字段,而varchar为Null时不占用空间;

记录头信息,与compact相比,多了黑体字部分,缺失record_type

名称

长度bit

功能

Deleted_flag

1

是否被删除

Min_rec_flag

1

1则表示该记录为预先被定义的最小记录

N_owned

4

该记录拥有的总记录数

Heap_no

13

索引中该行的排序记录

N_fields

10

记录中列数量

1byte_offs_flag

1

偏移量列表是1字节还是2字节

Next_recorder

16

下一条记录相对位置

()

1

未知

()

1

未知

 

Create table test(t1 varchar(10), t2 varchar(10), t3 char(10),t4 varchar(10)) charset=latin1 row_format=redundant;

--该表有3个变长列

Insert into test values(‘a’,’bb’,’bb’,’ccc’);

使用hexdump –C –v test.idb查看其二进制代码

--长度偏移列表,

 

 

 

compact行格式

字段长度偏移列表

NULL标志位

记录头信息

列1数据

列2数据

….

 

5.0引入

行头存放该行内变长字段的length,当列小于255字节时占用1个字节,大于255而小于65535时占用2个字节;故varchar最大长度为2的16次方-1;

第2个指示该行是否有NULL值,占用1字节;NULL列不占用数据存储空间;

记录头信息:5个字节共计40bit,用于链接相邻的记录案的行级锁

名称

长度bit

功能

Deleted_flag

1

是否被删除

Min_rec_flag

1

1则表示该记录为预先被定义的最小记录

N_owned

4

该记录拥有的总记录数

Heap_no

13

索引中该行的排序记录

Record_type

3

行类型 0=普通 1=B+节点指针

Next_recorder

16

下一条记录相对位置

()

1

未知

()

1

未知

 

除此之外,每页还有两个隐含字段:

DB_TRX_ID:6字节,记录最近的一个事务标示符

DB_ROLL_ID:7字节,指向回滚日志记录

--若没有主键,则还会有DB_ROW_ID:6字节,包含在clustered索引中

创建一个compact行格式的表

Create table test(t1 varchar(10), t2 varchar(10), t3 char(10),t4 varchar(10)) row_format=compact;

--该表有3个变长列

Insert into test values(‘a’,’bb’,’bb’,’ccc’);

使用hexdump –C –v test.idb查看其二进制代码

第一行

03 02 01—变长字段长度列表(逆序),实际顺序为01 02 03,这也是t1,t2,t4的实际长度

00—Null标志位,第一行没有NULL

00 00 10 00 2c—记录头信息,5字节,后4个字节指向下一个记录next_recorder

00 00 00 2b 68 00—6字节rowid,因为没有主键

00 00 00 00 06 05 –事务ID,6字节

80 00 00 00 32 01 10—回滚指针,7字节

61 –列1

62 62 –列2

62 62 20 20 20 20 20 20 20 20 –列3,char会填充余下部分

63 63 63 –列4

 

余下的为列数据,其中t3由于采用固定长度,故会填充满10个字节;

第二行

Insert into test values(‘d’,null,null,’fff’);

03 01--变长字段长度列表,逆序

06-- Null标志位,有NULL值,转换为二进制00000110,表示第2/3列为null

……

64—列1数据

66 66 66—列4数据,而第2/3列为NULL不占用存储空间

注:对于redundant格式,varchar为Null时同样不占用空间,但是char为NULL需要占用空间,因为其没有Null标志位

 

 

行溢出

Innodb表为IOT,采用了B+数类型,故每个页面至少要存储2行数据,如果行过大则会产生行溢出;

理论上mysql的varchar可存储65525字节,强于oracle的4000,但对于InnoDB其实际上限为65532,且该值为表所有varchar列长度总和;对于utf8字符集,一个字符占3个字节,则其上限又缩小为1/3;

如果强行创建varchar(65535)的字段,在sql_mode不为restricted的情况下,其会被隐式转换为mediumtext;

 

不论是varchar还是blob/text,只要保证一个16k的页面能容下2行数据,应该不会行溢出;

而一旦行溢出,字段前768字节依旧存放于当前页面,数据一般使用B-tree Node页,而溢出的行存放于Uncompress Blob页;

 

而barracuda采用了完全行溢出,即只保留字段的前20字节;

 

 

 

 

value

PK/SecKey

Columns data

SeqenceId,flag 

create table row_format(  
id int not null,  
c1 int,  
c2 char(10) not null,  
c3 char(10),  
c4 varchar(10),  
c5 varchar(10) not null,  
c6 blob,  
c7 binary(10) not null,  
c8 varbinary(10)) engine=rocksdb;
insert into row_format(id,c2,c4,c5,c7) values(1,'abc','abc','efg','111')

 key部分:

Index_id

key

4bytes

8bytes

0x7fdfa4278ea0: 0x00    0x00    0x01    0x7b    0x00    0x00    0x00    0x00

0x7fdfa4278ea8: 0x00    0x00    0x00    0x05

Index_id:索引的编号,全局唯一。

rowid:由于表没有主键,系统会产生一个bigint类型的rowid作为主键,占用8个字节,而innodb引擎的rowid占6个字节,需要注意的是rowid存储采用的大端的存储(高位存储低字节),这里主要是为了memcompare。

Value部分:

 

Null-flag

ID

C1

C2

C3

C4

C5

C6

C7

C8

Length

1B

4B

----

30B

----

4B

4B

----

10B

----

Value

 

1

 

abc0x20…

 

len+value

len+value

 

1110x00…

 

0x7fdfa4251e50: 0x1b    0x01    0x00    0x00    0x00    0x61    0x62    0x63

0x7fdfa4251e58: 0x20    0x20    0x20    0x20    0x20    0x20    0x20    0x20

0x7fdfa4251e60: 0x20    0x20    0x20    0x20    0x20    0x20    0x20    0x20

0x7fdfa4251e68: 0x20    0x20    0x20    0x20    0x20    0x20    0x20    0x20

0x7fdfa4251e70: 0x20    0x20    0x20    0x03    0x61    0x62    0x63    0x03

0x7fdfa4251e78: 0x65    0x66    0x67    0x31    0x31    0x31    0x00    0x00

0x7fdfa4251e80: 0x00    0x00    0x00    0x00    0x00

说明:

  1. Value的最前面部分(0x1b)就是存放记录的null信息。根据记录中可以为null字段的个数,确认需要占用的字节数,如果小于8个,则只需要一个字节。例子中,c1,c3,c4,c6,c8均可以为null,因此需要5个bit,所以用1个byte表示Null-flag即可,由于插入记录中,c4不为null,则对应的bit为0,也就是0x00011011。
  2. 对于null,无论是定长还是非定长数据类型,都不占用真实的存储空间,只需要一个bit位来表示为null即可。
  3. 空串’’与null,上面提到了null需要占一个标记位,而对于’’,如果是变长字段仍然需要存储长度信息,对于定长字段,则会补全。
  4. 对于变长字段,比如varchar,0x03 0x61 0x62 0x63数据有len+data组成,如果数据长度小于256,len只需要占用一个byte;如果len大于255,且小于65536,则需要占用2个字节,对于longblob类型,则需要占4个字节。
  5. 对于定长字段,不需要存长度信息直接存储data,如果不足则补充。补充字符有点诧异,对于char类型,补充0x20,对于binary类型,补充0x00。
  6. 对于lob类型,比如tinyblob,blob,mediumblob,longblob,以及对应的text类型,处理策略与varchar类似,存储长度的字节数根据数据类型的范围确定,比如blob长度占用2个字节,而longblob的长度占4个字节。所以在rocksdb里面,没有innodb中所谓“溢出页”的概念。对于innodb引擎,如果blob字段内容超过768字节,多余的data存储在溢出页,页内通过20个字节指向溢出页,主要包括第一个blob页的space_id,page_no和起始偏移,如果存在多个blob页,则页与页之间通过类似的方式进行关联。具体可以参考btr0cur.h文件中关于BTR_EXTERN_xxx相关的宏定义,以及接口btr_copy_externally_stored_field_prefix_low。
  7. 有关value部分的存储实现可以参考rocksdb引擎接口convert_record_to_storage_format,convert_record_from_storage_format和innodb引擎接口row_mysql_store_col_in_innobase_format,row_sel_field_store_in_mysql_format。

Meta部分:

Meta部分主要是SequenceID,这个SequenceID在事务提交时产生,主要用于rocksDB实现MVCC,用于可见性判断,此外Meta中还包含flag信息,由于标示记录类型,put,delete,singleDelete等,具体而言Sequence占7个字节,flag占1个字节。

rocksdb索引格式

      Rocksdb中,所有的数据都是通过索引来组织,与Innodb类似,也是索引组织表,每个索引有一个全局唯一的index_id。索引主要包括两类:主键索引和二级索引,前面介绍的记录格式,也就是主键索引的格式,包括key,value和meta三部分。二级索引也包含key,value和meta三部分,但是value中不包含任何数据,只是包含checksum信息。

主键索引

key

Value

Meta

Index_id

PK

NULL标记位

列数据

Checksum(可选)

SeqId,flag

二级索引

key

Value

Meta

Index_id

SecondaryKey

PK

Checksum(可选)

SeqId,flag

对比innodb引擎(innodb_file_format=Barracuda,row_format=compact)

innodb记录格式

变长字段长度列表

NULL标记位

record_header

Trxid

Roll_ptr

列数据

create table row_format(  
id int not null,  
c1 int,  
c2 char(10) not null,  
c3 char(10),  
c4 varchar(10),  
c5 varchar(10) not null,  
c6 blob,  
c7 binary(10) not null,  
c8 varbinary(10)) engine=innodb;
insert into row_format(id,c2,c4,c5,c7) values(1,'1234','ab','efg','111');

记录内容:

0000c0b0  00 00 03 02 0a 1b 00 00  18 ff b5 00 00 00 00 28  |...............(|

0000c0c0  00 00 00 00 01 01 03 83  00 00 01 36 01 10 80 00  |...........6....|

0000c0d0  00 01 31 32 33 34 20 20  20 20 20 20 61 62 65 66  |..1234      abef|

0000c0e0  67 31 31 31 00 00 00 00  00 00 00 00 00 00 00 00  |g111............|

说明:

     1.   03 02 0a,这里存的是长度信息,所有非null的变长列信息都逆序存在一起,这里按先后顺序是c5,c4,c2,这里innodb将char(10)也当作变长字段处理了。

     2.  1b存储的是null信息,与rocksdb对null处理一致。00 00  18 ff b5存储的是record-header。

     3. 00 00 00 00 28 00 00 00 00 01 01 03 83  00 00 01 36 01 10, 这三部分别是rowid,trxid和roll_ptr,分别占6个字节,6个字节和7个字节。

     4. 最后一部分是数据,null不占任何存储空间,与rocksdb处理类似。区别在于对于char类型的处理,innodb将字段c2类型char(10)补齐到10个字节,存储为31 32 33 34 20 20 20 20 20 20,将其作为varchar处理,记录了长度信息;而Rocksdb则是补齐到30个字节(utf8字符集),作为char处理,不记录长度信息。

      整体而言,innodb记录格式包含了record_header(记录头信息),占5个字节,主要包括记录号(heap_no),列数目,下一条记录的位置以及是否删除等信息。rocksdb则相对简单,只有整体的value-size,以及通过Meta中flag标示记录的状态put 或者是delete。innodb将变长列长度信息集中存放在一起,使得查找任意列的代价都差不多,而rocksdb的变长列信息则是放在每列的前面,访问最后一列需要逐一计算前面的列,才能定位。此外,由于innodb引擎与rocksdb引擎由于实现MVCC的机制不同,导致innodb引擎和rocksdb引擎需要存储的额外信息也不同。Innodb实现MVCC依赖于回滚段信息,记录需要额外存储trxid和roll_ptr两个字段,分别是6个字节和7个字节(type,rsegid,pageNO,offset),其中type占一个bit位,标示insert 或者是update类型,rsegid回滚段id占7bit位,pageNo占4个字节,页内偏移占2个字节。Rocksdb实现MVCC则是依赖于SequenceID,通过SequenceID来判断记录的可见性,SequenceID占7个字节。

      细节上来说,RocksDB引擎和innodb引擎在处理null,char和varchar的方式类似,但innodb对于char类型做了优化,统一作为varchar处理。另外rocksdb引擎没有对blob做特殊处理。你可能会有疑问,rocksdb不是也有block_size吗,如果设置为16k,blob数据超过16k怎么办?对于innodb而言,由于表实质是以一个个page通过B-tree组织起来的,每个page是固定大小,当记录非常大时,就需要借助溢出页,通过链接的方式关联起来。而rocksdb中block_size只是一个压缩单位,并没有严格约束,文件内容以block组织,由于文件中block可能是压缩过的,因此每个block的大小不固定,通过偏移来定位具体某个block的位置。如果遇到大的blob数据,则可能这个block比较大,记录所有数据存储在一起,不会跨block。

      对于索引长度限制也有所不同,对于innodb引擎来说,索引中单列长度不能超过767个字节,而rocksdb引擎单列长度不超过2048个字节,具体可以参考max_supported_key_part_length各自的实现;整个索引的长度,rocksdb和innodb都限制在3072个字节,实际上是server层的限制,因为它们的各自限制的长度都比server层的大。具体可以参考各自max_supported_key_length的实现。

参考文档

 

本文由9159.com发布于www.9159.com,转载请注明出处:key-value结构如何存储表中各个索引,于是增加了

关键词:

上一篇:没有了
下一篇:没有了