7 MySQL Table Types
As of MySQL Version 3.23.6, you can choose between three basic
table formats (ISAM, HEAP and MyISAM). Newer
versions of MySQL support additional table types (InnoDB,
or BDB), depending on how you compile it. A database may contain
tables of different types.
When you create a new table, you can tell MySQL what type of table to create.
The default table type is usually MyISAM.
MySQL will always create a `.frm' file to hold the table and column definitions. The table's index and data will be stored in one or more other files, depending on the table type.
If you try to use a table type that is not compiled-in or activated,
MySQL will instead create a table of type MyISAM. This behavior
is convenient when you want to copy tables between MySQL servers that
support different table types. (Perhaps your master server supports
transactional storage engines for increased safety, while the slave servers use
only non-transactional storage engines for greater speed.)
This automatic change of table types can be confusing for new MySQL users. We plan to fix this by introducing warnings in the new client/server protocol in version 4.1 and generating a warning when a table type is automatically changed.
You can convert tables between different types with the ALTER
TABLE statement. See section 6.5.4 ALTER TABLE Syntax.
Note that MySQL supports two different kinds of
tables: transaction-safe tables (InnoDB and BDB)
and not transaction-safe tables (HEAP, ISAM,
MERGE, and MyISAM).
Advantages of transaction-safe tables (TST):
- Safer. Even if MySQL crashes or you get hardware problems, you can get your data back, either by automatic recovery or from a backup + the transaction log.
-
You can combine many statements and accept these all in one go with
the
COMMITcommand. -
You can execute
ROLLBACKto ignore your changes (if you are not running in auto-commit mode). - If an update fails, all your changes will be restored. (With NTST tables all changes that have taken place are permanent)
- Can provide better concurrency if the table gets many updates concurrently with reads.
Note that to use InnoDB tables you have to use at least the
innodb_data_file_path startup option. See section 7.5.3 InnoDB Startup Options.
Advantages of not transaction-safe tables (NTST):
- Much faster as there is no transaction overhead.
- Will use less disk space as there is no overhead of transactions.
- Will use less memory to do updates.
You can combine TST and NTST tables in the same statements to get the best of both worlds.
7.1 MyISAM Tables
MyISAM is the default table type in MySQL Version 3.23. It's
based on the ISAM code and has a lot of useful extensions.
The index is stored in a file with the `.MYI' (MYIndex) extension,
and the data is stored in a file with the `.MYD' (MYData) extension.
You can check/repair MyISAM tables with the myisamchk
utility. See section 4.5.6.7 Using myisamchk for Crash Recovery. You can compress MyISAM tables with
myisampack to take up much less space.
See section 4.8.4 myisampack, The MySQL Compressed Read-only Table Generator.
The following is new in MyISAM:
-
There is a flag in the
MyISAMfile that indicates whether the table was closed correctly. Ifmysqldis started with--myisam-recover,MyISAMtables will automatically be checked and/or repaired on open if the table wasn't closed properly. -
You can
INSERTnew rows in a table that doesn't have free blocks in the middle of the datafile, at the same time other threads are reading from the table (concurrent insert). A free block can come from an update of a dynamic length row with much data to a row with less data or when deleting rows. When all free blocks are used up, all future inserts will be concurrent again. - Support for big files (63-bit) on filesystems/operating systems that support big files.
- All data is stored with the low byte first. This makes the data machine and OS independent. The only requirement for binary portability is that the machine uses two's-complement signed integers (as every machine for the last 20 years has) and IEEE floating-point format (also totally dominant among mainstream machines). The only area of machines that may not support binary compatibility are embedded systems (because they sometimes have peculiar processors). There is no big speed penalty in storing data low byte first; the bytes in a table row is normally unaligned and it doesn't take that much more power to read an unaligned byte in order than in reverse order. The actual fetch-column-value code is also not time critical compared to other code.
- All number keys are stored with high byte first to give better index compression.
-
Internal handling of one
AUTO_INCREMENTcolumn.MyISAMwill automatically update this onINSERT/UPDATE. TheAUTO_INCREMENTvalue can be reset withmyisamchk. This will makeAUTO_INCREMENTcolumns faster (at least 10%) and old numbers will not be reused as with the oldISAM. Note that when anAUTO_INCREMENTis defined on the end of a multi-part-key the old behavior is still present. -
When inserted in sorted order (as when you are using an
AUTO_INCREMENTcolumn) the key tree will be split so that the high node only contains one key. This will improve the space utilisation in the key tree. -
BLOBandTEXTcolumns can be indexed. -
NULLvalues are allowed in indexed columns. This takes 0-1 bytes/key. - Maximum key length is 500 bytes by default (can be changed by recompiling). In cases of keys longer than 250 bytes, a bigger key block size than the default of 1024 bytes is used for this key.
-
Maximum number of keys/table is 32 as default. This can be enlarged to 64
without having to recompile
myisamchk. -
myisamchkwill mark tables as checked if one runs it with--update-state.myisamchk --fastwill only check those tables that don't have this mark. -
myisamchk -astores statistics for key parts (and not only for whole keys as inISAM). - Dynamic size rows will now be much less fragmented when mixing deletes with updates and inserts. This is done by automatically combining adjacent deleted blocks and by extending blocks if the next block is deleted.
-
myisampackcan packBLOBandVARCHARcolumns. -
You can use put the datafile and index file on different directories
to get more speed (with the
DATA/INDEX DIRECTORY="path"option toCREATE TABLE). See section 6.5.3CREATE TABLESyntax.
MyISAM also supports the following things, which MySQL
will be able to use in the near future:
-
Support for a true
VARCHARtype; aVARCHARcolumn starts with a length stored in 2 bytes. -
Tables with
VARCHARmay have fixed or dynamic record length. -
VARCHARandCHARmay be up to 64K. All key segments have their own language definition. This will enable MySQL to have different language definitions per column. -
A hashed computed index can be used for
UNIQUE. This will allow you to haveUNIQUEon any combination of columns in a table. (You can't search on aUNIQUEcomputed index, however.)
Note that index files are usually much smaller with MyISAM than with
ISAM. This means that MyISAM will normally use less
system resources than ISAM, but will need more CPU time when inserting
data into a compressed index.
The following options to mysqld can be used to change the behavior of
MyISAM tables. See section 4.6.8.4 SHOW VARIABLES.
| Option | Description |
--myisam-recover=# | Automatic recovery of crashed tables. |
-O myisam_sort_buffer_size=# | Buffer used when recovering tables. |
--delay-key-write=ALL | Don't flush key buffers between writes for any MyISAM table |
-O myisam_max_extra_sort_file_size=# | Used to help MySQL to decide when to use the slow but safe key cache index create method. Note that this parameter is given in megabytes before 4.0.3 and in bytes beginning with this version. |
-O myisam_max_sort_file_size=# | Don't use the fast sort index method to created index if the temporary file would get bigger than this. Note that this parameter is given in megabytes before 4.0.3 and in bytes beginning with this version. |
-O bulk_insert_buffer_size=# | Size of tree cache used in bulk insert optimization. Note that this is a limit per thread! |
The automatic recovery is activated if you start mysqld with
--myisam-recover=#. See section 4.1.1 mysqld Command-line Options.
On open, the table is checked if it's marked as crashed or if the open
count variable for the table is not 0 and you are running with
--skip-external-locking. If either of the above is true the following
happens.
- The table is checked for errors.
- If we found an error, try to do a fast repair (with sorting and without re-creating the datafile) of the table.
- If the repair fails because of an error in the datafile (for example a duplicate key error), we try again, but this time we re-create the datafile.
- If the repair fails, retry once more with the old repair option method (write row by row without sorting) which should be able to repair any type of error with little disk requirements..
If the recover wouldn't be able to recover all rows from a previous
completed statement and you didn't specify FORCE as an option to
myisam-recover, then the automatic repair will abort with an error
message in the error file:
Error: Couldn't repair table: test.g00pages
If you in this case had used the FORCE option you would instead have got
a warning in the error file:
Warning: Found 344 of 354 rows when repairing ./test/g00pages
Note that if you run automatic recover with the BACKUP option,
you should have a cron script that automatically moves file with names
like `tablename-datetime.BAK' from the database directories to a
backup media.
See section 4.1.1 mysqld Command-line Options.
7.1.1 Space Needed for Keys
MySQL can support different index types, but the normal type is
ISAM or MyISAM. These use a B-tree index, and you can roughly calculate
the size for the index file as (key_length+4)/0.67, summed over
all keys. (This is for the worst case when all keys are inserted in
sorted order and we don't have any compressed keys.)
String indexes are space compressed. If the first index part is a
string, it will also be prefix compressed. Space compression makes the
index file smaller than the above figures if the string column has a lot
of trailing space or is a VARCHAR column that is not always used
to the full length. Prefix compression is used on keys that start
with a string. Prefix compression helps if there are many strings
with an identical prefix.
In MyISAM tables, you can also prefix compress numbers by specifying
PACK_KEYS=1 when you create the table. This helps when you have
many integer keys that have an identical prefix when the numbers are stored
high-byte first.
7.1.2 MyISAM Table Formats
MyISAM supports 3 different table types. Two of them are chosen
automatically depending on the type of columns you are using. The third,
compressed tables, can only be created with the myisampack tool.
When you CREATE or ALTER a table that doesn't have
BLOB values, you can force the table format to DYNAMIC or
FIXED with the ROW_FORMAT=# table option. In the future
you will be able to compress/decompress tables by specifying
ROW_FORMAT=compressed | default to ALTER TABLE.
See section 6.5.3 CREATE TABLE Syntax.
7.1.2.1 Static (Fixed-length) Table Characteristics
This is the default format. It's used when the table contains no
VARCHAR, BLOB, or TEXT columns.
This format is the simplest and most secure format. It is also the fastest of the on-disk formats. The speed comes from the easy way data can be found on disk. When looking up something with an index and static format it is very simple. Just multiply the row number by the row length.
Also, when scanning a table it is very easy to read a constant number of records with each disk read.
The security is evidenced if your computer crashes when writing to a
fixed-size MyISAM file, in which case myisamchk can easily figure out where each
row starts and ends. So it can usually reclaim all records except the
partially written one. Note that in MySQL all indexes can always be
reconstructed:
-
All
CHAR,NUMERIC, andDECIMALcolumns are space-padded to the column width. - Very quick.
- Easy to cache.
- Easy to reconstruct after a crash, because records are located in fixed positions.
-
Doesn't have to be reorganized (with
myisamchk) unless a huge number of records are deleted and you want to return free disk space to the operating system. - Usually requires more disk space than dynamic tables.
7.1.2.2 Dynamic Table Characteristics
This format is used if the table contains any VARCHAR, BLOB,
or TEXT columns or if the table was created with
ROW_FORMAT=dynamic.
This format is a little more complex because each row has to have a header that says how long it is. One record can also end up at more than one location when it is made longer at an update.
You can use OPTIMIZE table or myisamchk to defragment a
table. If you have static data that you access/change a lot in the same
table as some VARCHAR or BLOB columns, it might be a good
idea to move the dynamic columns to other tables just to avoid
fragmentation:
- All string columns are dynamic (except those with a length less than 4).
-
Each record is preceded by a bitmap indicating which columns are empty
(
'') for string columns, or zero for numeric columns. (This isn't the same as columns containingNULLvalues.) If a string column has a length of zero after removal of trailing spaces, or a numeric column has a value of zero, it is marked in the bit map and not saved to disk. Non-empty strings are saved as a length byte plus the string contents. - Usually takes much less disk space than fixed-length tables.
- Each record uses only as much space as is required. If a record becomes larger, it is split into as many pieces as are required. This results in record fragmentation.
-
If you update a row with information that extends the row length, the
row will be fragmented. In this case, you may have to run
myisamchk -rfrom time to time to get better performance. Usemyisamchk -ei tbl_namefor some statistics. - Not as easy to reconstruct after a crash, because a record may be fragmented into many pieces and a link (fragment) may be missing.
-
The expected row length for dynamic sized records is:
3 + (number of columns + 7) / 8 + (number of char columns) + packed size of numeric columns + length of strings + (number of NULL columns + 7) / 8
There is a penalty of 6 bytes for each link. A dynamic record is linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You may check how many links there are withmyisamchk -ed. All links may be removed withmyisamchk -r.
7.1.2.3 Compressed Table Characteristics
This is a read-only type that is generated with the optional
myisampack tool (pack_isam for ISAM tables):
-
All MySQL distributions, even those that existed before MySQL
went
GPL, can read tables that were compressed withmyisampack. - Compressed tables take very little disk space. This minimises disk usage, which is very nice when using slow disks (like CD-ROMs).
-
Each record is compressed separately (very little access overhead). The
header for a record is fixed (1-3 bytes) depending on the biggest record in the
table. Each column is compressed differently. Some of the compression types
are:
- There is usually a different Huffman table for each column.
- Suffix space compression.
- Prefix space compression.
-
Numbers with value
0are stored using 1 bit. -
If values in an integer column have a small range, the column is stored using
the smallest possible type. For example, a
BIGINTcolumn (8 bytes) may be stored as aTINYINTcolumn (1 byte) if all values are in the range0to255. -
If a column has only a small set of possible values, the column type is
converted to
ENUM. - A column may use a combination of the above compressions.
- Can handle fixed- or dynamic-length records.
-
Can be uncompressed with
myisamchk.
7.1.3 MyISAM Table Problems
The file format that MySQL uses to store data has been extensively tested, but there are always circumstances that may cause database tables to become corrupted.
7.1.3.1 Corrupted MyISAM Tables
Even if the MyISAM table format is very reliable (all changes to a table is written before the SQL statements returns) , you can still get corrupted tables if some of the following things happens:
-
The
mysqldprocess being killed in the middle of a write. - Unexpected shutdown of the computer (for example, if the computer is turned off).
- A hardware error.
- You are using an external program (like myisamchk) on a live table.
- A software bug in the MySQL or MyISAM code.
Typial typical symptoms for a corrupt table is:
-
You get the error
Incorrect key file for table: '...'. Try to repair itwhile selecting data from the table. - Queries doesn't find rows in the table or returns incomplete data.
You can check if a table is ok with the command CHECK
TABLE. See section 4.5.4 CHECK TABLE Syntax.
You can repair a corrupted table with REPAIR TABLE. See section 4.5.5 REPAIR TABLE Syntax.
You can also repair a table, when mysqld is not running with
the myisamchk command. myisamchk syntax.
If your tables get corrupted a lot you should try to find the reason for this! See section A.4.1 What To Do If MySQL Keeps Crashing.
In this case the most important thing to know is if the table got
corrupted if the mysqld died (one can easily verify this by
checking if there is a recent row restarted mysqld in the mysqld
error file). If this isn't the case, then you should try to make a test
case of this. See section E.1.6 Making a Test Case If You Experience Table Corruption.
7.1.3.2 Clients is using or hasn't closed the table properly
Each MyISAM `.MYI' file has in the header a counter that can
be used to check if a table has been closed properly.
If you get the following warning from CHECK TABLE or myisamchk:
# clients is using or hasn't closed the table properly
this means that this counter has come out of sync. This doesn't mean that the table is corrupted, but means that you should at least do a check on the table to verify that it's okay.
The counter works as follows:
- The first time a table is updated in MySQL, a counter in the header of the index files is incremented.
- The counter is not changed during further updates.
-
When the last instance of a table is closed (because of a
FLUSHor because there isn't room in the table cache) the counter is decremented if the table has been updated at any point. - When you repair the table or check the table and it was okay, the counter is reset to 0.
- To avoid problems with interaction with other processes that may do a check on the table, the counter is not decremented on close if it was 0.
In other words, the only ways this can go out of sync are:
-
The
MyISAMtables are copied without aLOCKandFLUSH TABLES. - MySQL has crashed between an update and the final close. (Note that the table may still be okay, as MySQL always issues writes for everything between each statement.)
-
Someone has done a
myisamchk --recoverormyisamchk --update-stateon a table that was in use bymysqld. -
Many
mysqldservers are using the table and one has done aREPAIRorCHECKof the table while it was in use by another server. In this setup theCHECKis safe to do (even if you will get the warning from other servers), butREPAIRshould be avoided as it currently replaces the datafile with a new one, which is not signaled to the other servers.
7.2 MERGE Tables
MERGE tables are new in MySQL Version 3.23.25. The code
is still in gamma, but should be reasonable stable.
A MERGE table (also known as a MRG_MyISAM table) is a
collection of identical MyISAM tables that can be used as one.
You can only SELECT, DELETE, and UPDATE from the
collection of tables. If you DROP the MERGE table, you
are only dropping the MERGE specification.
Note that DELETE FROM merge_table used without a WHERE
will only clear the mapping for the table, not delete everything in the
mapped tables. (We plan to fix this in 4.1).
With identical tables we mean that all tables are created with identical
column and key information. You can't merge tables in which the
columns are packed differently, doesn't have exactly the same columns,
or have the keys in different order. However, some of the tables can be
compressed with myisampack. See section 4.8.4 myisampack, The MySQL Compressed Read-only Table Generator.
When you create a MERGE table, you will get a `.frm' table
definition file and a `.MRG' table list file. The `.MRG' just
contains a list of the index files (`.MYI' files) that should
be used as one. Before 4.1.1 all used tables had to be in the same
database as the MERGE table itself.
For the moment, you need to have SELECT, UPDATE, and
DELETE privileges on the tables you map to a MERGE table.
MERGE tables can help you solve the following problems:
-
Easily manage a set of log tables. For example, you can put data from
different months into separate files, compress some of them with
myisampack, and then create aMERGEto use these as one. -
Give you more speed. You can split a big read-only table based on some
criteria and then put the different table part on different disks.
A
MERGEtable on this could be much faster than using the big table. (You can, of course, also use a RAID to get the same kind of benefits.) -
Do more efficient searches. If you know exactly what you are looking
after, you can search in just one of the split tables for some queries
and use a
MERGEtable for others. You can even have many differentMERGEtables active, with possible overlapping files. -
More efficient repairs. It's easier to repair the individual files that
are mapped to a
MERGEfile than trying to repair a really big file. -
Instant mapping of many files as one. A
MERGEtable uses the index of the individual tables. It doesn't need to maintain an index of its one. This makesMERGEtable collections VERY fast to make or remap. Note that you must specify the key definitions when you create aMERGEtable!. -
If you have a set of tables that you join to a big table on demand or
batch, you should instead create a
MERGEtable on them on demand. This is much faster and will save a lot of disk space. - Go around the file-size limit for the operating system.
-
You can create an alias/synonym for a table by just using
MERGEover one table. There shouldn't be any really notable performance impacts of doing this (only a couple of indirect calls andmemcpy()calls for each read).
The disadvantages with MERGE tables are:
-
You can only use identical
MyISAMtables for aMERGEtable. -
REPLACEdoesn't work. -
MERGEtables uses more file descriptors. If you are using aMERGEtable that maps over 10 tables and 10 users are using this, you are using 10*10 + 10 file descriptors. (10 datafiles for 10 users and 10 shared index files.) -
Key reads are slower. When you do a read on a key, the
MERGEstorage engine will need to issue a read on all underlying tables to check which one most closely matches the given key. If you then do a "read-next" then theMERGEstorage engine will need to search the read buffers to find the next key. Only when one key buffer is used up, the storage engine will need to read the next key block. This makesMERGEkeys much slower oneq_refsearches, but not much slower onrefsearches. See section 5.2.1EXPLAINSyntax (Get Information About aSELECT). -
You can't do
DROP TABLE,ALTER TABLE,DELETE FROM table_namewithout aWHEREclause,REPAIR TABLE,TRUNCATE TABLE,OPTIMIZE TABLE, orANALYZE TABLEon any of the table that is mapped by aMERGEtable that is "open". If you do this, theMERGEtable may still refer to the original table and you will get unexpected results. The easiest way to get around this deficiency is to issue theFLUSH TABLEScommand, ensuring noMERGEtables remain "open".
When you create a MERGE table you have to specify with
UNION=(list-of-tables) which tables you want to use as
one. Optionally you can specify with INSERT_METHOD if you want
insert for the MERGE table to happen in the first or last table
in the UNION list. If you don't specify INSERT_METHOD or
specify NO, then all INSERT commands on the MERGE
table will return an error.
The following example shows you how to use MERGE tables:
CREATE TABLE t1 (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
CREATE TABLE t2 (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
INSERT INTO t1 (message) VALUES ("Testing"),("table"),("t1");
INSERT INTO t2 (message) VALUES ("Testing"),("table"),("t2");
CREATE TABLE total (a INT NOT NULL AUTO_INCREMENT, message CHAR(20), KEY(a))
TYPE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;
SELECT * FROM total;
Note that we didn't create a UNIQUE or PRIMARY KEY in the
total table as the key isn't going to be unique in the total
table.
Note that you can also manipulate the `.MRG' file directly from the outside of the MySQL server:
shell> cd /mysql-data-directory/current-database shell> ls -1 t1.MYI t2.MYI > total.MRG shell> mysqladmin flush-tables
Now you can do things like:
mysql> SELECT * FROM total; +---+---------+ | a | message | +---+---------+ | 1 | Testing | | 2 | table | | 3 | t1 | | 1 | Testing | | 2 | table | | 3 | t2 | +---+---------+
Note that the a column, though declared as PRIMARY KEY,
is not really unique, as MERGE table cannot enforce uniqueness
over a set of underlying MyISAM tables.
To remap a MERGE table you can do one of the following:
-
DROPthe table and re-create it -
Use
ALTER TABLE table_name UNION=(...) -
Change the `.MRG' file and issue a
FLUSH TABLEon theMERGEtable and all underlying tables to force the storage engine to read the new definition file.
If you use ALTER TABLE to change a MERGE table to another
table type, the mapping to the underlying tables is lost. Instead, the rows
from the underlying MyISAM tables are copied into the altered table,
which then is assigned the new type.
7.2.1 MERGE Table Problems
The following are the known problems with MERGE tables:
-
A
MERGEtable cannot maintainUNIQUEconstraints over the whole table. When you doINSERT, the data goes into the first or last table (according toINSERT_METHOD=xxx) and thisMyISAMtable ensures that the data are unique, but it knows nothing about othersMyISAMtables. -
DELETE FROM merge_tableused without aWHEREwill only clear the mapping for the table, not delete everything in the mapped tables. -
RENAME TABLEon a table used in an activeMERGEtable may corrupt the table. This will be fixed in MySQL 4.1.x. -
Creation of a table of type
MERGEdoesn't check if the underlying tables are of compatible types or if they exists. MySQL will do a quick check if the record length is equal between mapped tables when theMERGEtable is used, but this is not a fullproof check. If you useMERGEtables in this fashion, you are very likely to run into strange problems. -
If you use
ALTER TABLEto first add anUNIQUEindex to a table used in aMERGEtable and then useALTER TABLEto add a normal index on theMERGEtable, the key order will be different for the tables if there was an old non-unique key in the table. This is becauseALTER TABLEputsUNIQUEkeys before normal keys to be able to detect duplicate keys as early as possible. -
DROP TABLEon a table that is in use by aMERGEtable will not work on Windows because theMERGEstorage engine does the table mapping hidden from the upper layer of MySQL. Because Windows doesn't allow you to drop files that are open, you first must flush allMERGEtables (withFLUSH TABLES) or drop theMERGEtable before dropping the table. We will fix this at the same time we introduce views.
7.3 ISAM Tables
The deprecated ISAM table type will disappear in MySQL version 5.0.
In MySQL 4.1 it's included in the source but not compiled anymore.
MyISAM is a better implementation of this table handler and you should
convert all ISAM tables to MySAM tables as soon as possible.
ISAM uses a B-tree index. The index is stored in a file
with the `.ISM' extension, and the data is stored in a file with
the `.ISD' extension.
You can check/repair ISAM tables with the isamchk utility.
See section 4.5.6.7 Using myisamchk for Crash Recovery.
ISAM has the following features/properties:
- Compressed and fixed-length keys
- Fixed and dynamic record length
- 16 keys with 16 key parts/key
- Max key length 256 (default)
- Data is stored in machine format; this is fast, but is machine/OS dependent.
Most of the things true for MyISAM tables are also true for ISAM
tables. See section 7.1 MyISAM Tables. The major differences compared
to MyISAM tables are:
ISAMtables are not binary portable across OS/Platforms.- Can't handle tables > 4G.
- Only support prefix compression on strings.
- Smaller key limits.
- Dynamic tables get more fragmented.
- Tables are compressed with
pack_isamrather than withmyisampack.
If you want to convert an ISAM table to a MyISAM table so
that you can use utilities such as mysqlcheck, use an ALTER
TABLE statement:
mysql> ALTER TABLE tbl_name TYPE = MYISAM;
The embedded MySQL versions doesn't support ISAM tables.
7.4 HEAP Tables
HEAP tables use hashed indexes and are stored in memory. This
makes them very fast, but if MySQL crashes you will lose all
data stored in them. HEAP is very useful for temporary tables!
The MySQL internal HEAP tables use 100% dynamic hashing
without overflow areas. There is no extra space needed for free lists.
HEAP tables also don't have problems with delete + inserts, which
normally is common with hashed tables:
mysql> CREATE TABLE test TYPE=HEAP SELECT ip,SUM(downloads) AS down
-> FROM log_table GROUP BY ip;
mysql> SELECT COUNT(ip),AVG(down) FROM test;
mysql> DROP TABLE test;
Here are some things you should consider when you use HEAP tables:
-
You should always use specify
MAX_ROWSin theCREATEstatement to ensure that you accidentally do not use all memory. -
Indexes will only be used with
=and<=>(but are VERY fast). -
HEAPtables can only use whole keys to search for a row; compare this toMyISAMtables where any prefix of the key can be used to find rows. -
HEAPtables use a fixed record length format. -
HEAPdoesn't supportBLOB/TEXTcolumns. -
HEAPdoesn't supportAUTO_INCREMENTcolumns. -
Prior to MySQL 4.0.2,
HEAPdoesn't support an index on aNULLcolumn. -
You can have non-unique keys in a
HEAPtable (this isn't common for hashed tables). -
HEAPtables are shared between all clients (just like any other table). -
You can't search for the next entry in order (that is, to use the index
to do an
ORDER BY). -
Data for
HEAPtables are allocated in small blocks. The tables are 100% dynamic (on inserting). No overflow areas and no extra key space are needed. Deleted rows are put in a linked list and are reused when you insert new data into the table. -
You need enough extra memory for all
HEAPtables that you want to use at the same time. -
To free memory, you should execute
DELETE FROM heap_table,TRUNCATE heap_tableorDROP TABLE heap_table. -
MySQL cannot find out approximately how many rows there
are between two values (this is used by the range optimizer to decide which
index to use). This may affect some queries if you change a
MyISAMtable to aHEAPtable. -
To ensure that you accidentally don't do anything foolish, you can't create
HEAPtables bigger thanmax_heap_table_size.
The memory needed for one row in a HEAP table is:
SUM_OVER_ALL_KEYS(max_length_of_key + sizeof(char*) * 2) + ALIGN(length_of_row+1, sizeof(char*))
sizeof(char*) is 4 on 32-bit machines and 8 on 64-bit machines.
7.5 InnoDB Tables
7.5.1 InnoDB Tables Overview
InnoDB provides MySQL with a transaction-safe (ACID compliant)
storage engine with commit, rollback, and crash recovery capabilities.
InnoDB does locking on row level and also provides an Oracle-style
consistent
non-locking read in SELECT statements. These features increase
multiuser concurrency and performance. There is no need for
lock escalation in InnoDB,
because row level locks in InnoDB fit in very small space.
InnoDB is the first storage manager in MySQL to support
FOREIGN KEY constraints.
InnoDB has been designed for maximum performance when processing
large data volumes. Its CPU efficiency is probably not
matched by any other disk-based relational database engine.
InnoDB is used in production at numerous
large database sites requiring high performance.
The famous Internet news site Slashdot.org runs on
InnoDB. Mytrix, Inc. stores over 1 TB of data in
InnoDB, and another site handles an average
load of 800 inserts/updates per second in InnoDB.
Technically, InnoDB is a complete database backend placed under MySQL.
InnoDB has its own buffer pool for caching data and indexes in main
memory. InnoDB stores its tables and indexes in a tablespace, which
may consist of several files (or raw disk partitions).
This is different from, for example,
MyISAM tables where each table is stored as a separate file.
InnoDB tables can be of any size even on operating
systems where file-size is limited to 2 GB.
You can find the latest information about InnoDB at
http://www.innodb.com/. The most up-to-date version of the
InnoDB manual is always placed there.
InnoDB is published under the same GNU GPL License Version 2
(of June 1991) as MySQL. If you distribute MySQL/InnoDB, and your application
does not satisfy the restrictions of the GPL license, you have to buy a
commercial
MySQL Pro license from https://order.mysql.com/?sub=pg&pg_no=1.
7.5.2 InnoDB in MySQL Version 3.23
From MySQL version 4.0, InnoDB is enabled by default.
The following information only applies to the 3.23 series.
InnoDB tables are included in the MySQL source distribution starting from 3.23.34a and are activated in the MySQL -Max binary of the 3.23 series. For Windows the -Max binaries are contained in the standard distribution.
If you have downloaded a binary version of MySQL that includes
support for InnoDB, simply follow the instructions of the
MySQL manual
for installing a binary version of MySQL. If you already have
MySQL-3.23 installed, then the simplest way to install
MySQL -Max is to replace the server executable `mysqld'
with the corresponding executable in the -Max distribution.
MySQL and MySQL -Max differ only in the server executable.
See section 2.2.6 Installing a MySQL Binary Distribution.
See section 4.8.5 mysqld-max, An Extended mysqld Server.
To compile MySQL with InnoDB support,
download MySQL-3.23.34a or newer version from
http://www.mysql.com/
and configure MySQL with the
--with-innodb option. See the
MySQL manual
about installing a MySQL source distribution.
See section 2.3 Installing a MySQL Source Distribution.
cd /path/to/source/of/mysql-3.23.37 ./configure --with-innodb
To use InnoDB tables in MySQL-Max-3.23 you must specify
configuration parameters in the [mysqld] section of the
configuration file `my.cnf', or on Windows optionally in
`my.ini'.
At the minimum, in 3.23 you must specify innodb_data_file_path
where you specify the names and the sizes of datafiles. If you do
not mention innodb_data_home_dir in `my.cnf' the default
is to create these files to the datadir of MySQL.
If you specify innodb_data_home_dir as an empty string,
then you can give absolute paths to your datafiles in
innodb_data_file_path.
The minimal way
to modify it is to add to the [mysqld] section the line
innodb_data_file_path=ibdata:30M
but to get good performance it is best that you specify options as recommended. See section 7.5.3 InnoDB Startup Options.
7.5.3 InnoDB Startup Options
To enable InnoDB tables in MySQL version 3.23, see
section 7.5.2 InnoDB in MySQL Version 3.23.
In MySQL-4.0 you are not required to do anything specific to
enable InnoDB tables.
The default behavior in MySQL-4.0 and MySQL-4.1 is to
create an auto-extending 10 MB file
`ibdata1' in the datadir of MySQL and
two 5 MB `ib_logfile' log files to the datadir.
(In MySQL-4.0.0 and 4.0.1 the datafile is 64 MB and not auto-extending.)
Note: To get good performance you should explicitly set the InnoDB parameters listed in the following examples.
If you don't want to use InnoDB tables, you can add the
skip-innodb option to your MySQL option file.
Starting from versions 3.23.50 and 4.0.2, InnoDB allows the last
datafile on the innodb_data_file_path line
to be specified as auto-extending. The syntax for
innodb_data_file_path is then the following:
pathtodatafile:sizespecification;pathtodatafile:sizespecification;... ... ;pathtodatafile:sizespecification[:autoextend[:max:sizespecification]]
If you specify the last datafile with the autoextend option,
InnoDB
will extend the last datafile if it runs out of free space in the
tablespace. The increment is 8 MB at a time. An example:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:100M:autoextend
instructs InnoDB to create just a single datafile whose initial size is
100 MB and which is extended in 8 MB blocks when space runs out.
If the disk becomes full you may want to add another datafile
to another disk, for example. Then you have to look at the size
of `ibdata1', round the size downward to
the closest multiple of 1024 * 1024 bytes (= 1 MB), and specify
the rounded size of `ibdata1' explicitly in
innodb_data_file_path.
After that you can add another datafile:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend
Be cautious on filesystems where the maximum file-size is 2 GB. InnoDB is not aware of the OS maximum file-size. On those filesystems you might want to specify the max size for the datafile:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:100M:autoextend:max:2000M
A simple `my.cnf' example. Suppose you have a computer
with 128 MB RAM and one hard disk. Below is an example of
possible configuration parameters in `my.cnf' or
`my.ini' for InnoDB. We assume you are running
MySQL-Max-3.23.50 or later, or MySQL-4.0.2 or later.
This example suits most users, both on Unix and Windows,
who do not want to distribute InnoDB datafiles and
log files on several disks. This creates an
auto-extending datafile `ibdata1' and two InnoDB log files
`ib_logfile0' and `ib_logfile1' to the
datadir of MySQL (typically `/mysql/data').
Also the small archived InnoDB log file
`ib_arch_log_0000000000' ends up in the datadir.
[mysqld] # You can write your other MySQL server options here # ... # Datafile(s) must be able to # hold your data and indexes. # Make sure you have enough # free disk space. innodb_data_file_path = ibdata1:10M:autoextend # Set buffer pool size to # 50 - 80 % of your computer's # memory set-variable = innodb_buffer_pool_size=70M set-variable = innodb_additional_mem_pool_size=10M # Set the log file size to about # 25 % of the buffer pool size set-variable = innodb_log_file_size=20M set-variable = innodb_log_buffer_size=8M # Set ..flush_log_at_trx_commit # to 0 if you can afford losing # some last transactions innodb_flush_log_at_trx_commit=1
Check that the MySQL server has the rights to create files in
datadir.
Note that datafiles must be < 2 GB in some file systems! The combined size of the log files must be < 4 GB. The combined size of datafiles must be >= 10 MB.
When you for the first time create an InnoDB database, it is best that you start the MySQL server from the command prompt. Then InnoDB will print the information about the database creation to the screen, and you see what is happening. See below next section what the printout should look like. For example, in Windows you can start `mysqld-max.exe' with:
your-path-to-mysqld\mysqld-max --console
Where to put `my.cnf' or `my.ini' in Windows? The rules for Windows are the following:
- Only one of `my.cnf' or `my.ini' should be created.
- The `my.cnf' file should be placed in the root directory of the drive `C:'.
- The `my.ini' file should be placed in the WINDIR directory, e.g,
`C:\WINDOWS' or `C:\WINNT'. You can use the
SETcommand of MS-DOS to print the value of WINDIR. - If your PC uses a boot loader where the `C:' drive is not the boot drive, then your only option is to use the `my.ini' file.
Where to specify options in Unix? On Unix `mysqld' reads options from the following files, if they exist, in the following order:
- `/etc/my.cnf' Global options.
- `COMPILATION_DATADIR/my.cnf' Server-specific options.
- `defaults-extra-file' The file specified with
--defaults-extra-file=.... - `~/.my.cnf' User-specific options.
`COMPILATION_DATADIR' is the MySQL data directory which was
specified as a ./configure option when `mysqld'
was compiled
(typically `/usr/local/mysql/data' for a binary installation or `/usr/local/var' for a source installation).
If you are not sure from where `mysqld' reads its `my.cnf'
or `my.ini', you can give the path as the first command-line
option to the server:
mysqld --defaults-file=your_path_to_my_cnf.
InnoDB forms the directory path to a datafile by textually catenating
innodb_data_home_dir to a datafile name or path in
innodb_data_file_path, adding a possible slash or
backslash in between if needed. If the keyword
innodb_data_home_dir is not mentioned in
`my.cnf' at all, the default for it is the
'dot' directory `./' which means the datadir of MySQL.
An advanced `my.cnf' example. Suppose you have a Linux computer with 2 GB RAM and three 60 GB hard disks (at directory paths `/', `/dr2' and `/dr3'). Below is an example of possible configuration parameters in `my.cnf' for InnoDB.
Note that InnoDB does not create directories: you
have to create them yourself. Use the Unix or MS-DOS
mkdir command to create the data and log group home directories.
[mysqld] # You can write your other MySQL server options here # ... innodb_data_home_dir = # Datafiles must be able to # hold your data and indexes innodb_data_file_path = /ibdata/ibdata1:2000M;/dr2/ibdata/ibdata2:2000M:autoextend # Set buffer pool size to # 50 - 80 % of your computer's # memory, but make sure on Linux # x86 total memory usage is # < 2 GB set-variable = innodb_buffer_pool_size=1G set-variable = innodb_additional_mem_pool_size=20M innodb_log_group_home_dir = /dr3/iblogs # .._log_arch_dir must be the same # as .._log_group_home_dir innodb_log_arch_dir = /dr3/iblogs set-variable = innodb_log_files_in_group=3 # Set the log file size to about # 15 % of the buffer pool size set-variable = innodb_log_file_size=150M set-variable = innodb_log_buffer_size=8M # Set ..flush_log_at_trx_commit to # 0 if you can afford losing # some last transactions innodb_flush_log_at_trx_commit=1 set-variable = innodb_lock_wait_timeout=50 #innodb_flush_method=fdatasync #set-variable = innodb_thread_concurrency=5
Note that we have placed the two datafiles on different disks. InnoDB will fill the tablespace formed by the datafiles from bottom up. In some cases it will improve the performance of the database if all data is not placed on the same physical disk. Putting log files on a different disk from data is very often beneficial for performance. You can also use raw disk partitions (raw devices) as datafiles. In some Unixes they speed up I/O. See the manual section on InnoDB file space management about how to specify them in `my.cnf'.
Warning: on Linux x86 you must be careful you do not set memory usage too high. glibc will allow the process heap to grow over thread stacks, which will crash your server. It is a risk if the value of
innodb_buffer_pool_size + key_buffer + max_connections * (sort_buffer + read_buffer_size) + max_connections * 2 MB
is close to 2 GB or exceeds 2 GB. Each thread will use a stack
(often 2 MB, but in MySQL AB binaries only 256 KB) and in the worst case also
sort_buffer + read_buffer_size
additional memory.
How to tune other `mysqld' server parameters? Typical values which suit most users are:
skip-locking set-variable = max_connections=200 set-variable = read_buffer_size=1M set-variable = sort_buffer=1M # Set key_buffer to 5 - 50% # of your RAM depending on how # much you use MyISAM tables, but # keep key_buffer + InnoDB # buffer pool size < 80% of # your RAM set-variable = key_buffer=...
Note that some parameters are given using the numeric `my.cnf'
parameter format: set-variable = innodb... = 123, others
(string and boolean parameters) with another format:
innodb_... = ... .
The meanings of the configuration parameters are the following:
| Option | Description |
innodb_file_per_table | Available starting from 4.1.1. This option makes InnoDB to store each created table into its own `.ibd' file. See the section about multiple tablespaces. |
innodb_open_files | Available starting from 4.1.1. This is relevant only if you use multiple tablespaces in InnoDB. This specifies the maximum how many `.ibd' files InnoDB can keep open at one time. The minimum value for this is 10. The default is 300. Numeric `my.cnf' parameter format. |
innodb_data_home_dir |
The common part of the directory path for all InnoDB datafiles.
If you do not mentioned this option in `my.cnf'
the default is the datadir of MySQL.
You can specify this also as an empty string, in which case you
can use absolute file paths in innodb_data_file_path.
|
innodb_data_file_path |
Paths to individual datafiles and their sizes. The full directory path
to each datafile is acquired by concatenating innodb_data_home_dir to
the paths specified here. The file sizes are specified in megabytes,
hence the 'M' after the size specification above.
InnoDB also understands the abbreviation 'G', 1 G meaning 1024 MB.
Starting from
3.23.44 you can set the file-size bigger than 4 GB on those
operating systems which support big files.
On some operating systems files must be < 2 GB.
If you do not specify innodb_data_file_path, the default
behavior starting from 4.0 is to create a 10 MB auto-extending
datafile `ibdata1'.
The sum of the sizes of the files must be at least 10 MB.
|
innodb_mirrored_log_groups | Number of identical copies of log groups we keep for the database. Currently this should be set to 1. |
innodb_log_group_home_dir |
Directory path to InnoDB log files.
If you do not mentioned this option in `my.cnf'
the default is the datadir of MySQL.
|
innodb_log_files_in_group | Number of log files in the log group. InnoDB writes to the files in a circular fashion. Value 2 is recommended here. The default is 2. |
innodb_log_file_size | Size of each log file in a log group in megabytes. Sensible values range from 1M to 1/n-th of the size of the buffer pool specified below, where n is the number of log files in the group. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery will be slower in case of a crash. The combined size of log files must be less than 4 GB on 32-bit computers. The default is 5M. |
innodb_log_buffer_size | The size of the buffer which InnoDB uses to write log to the log files on disk. Sensible values range from 1M to 8M. A big log buffer allows large transactions to run without a need to write the log to disk until the transaction commit. Thus, if you have big transactions, making the log buffer big will save disk I/O. |
innodb_flush_log_at_trx_commit | Normally you set this to 1, meaning that at a transaction commit the log is flushed to disk, and the modifications made by the transaction become permanent, and survive a database crash. If you are willing to compromise this safety, and you are running small transactions, you may set this to 0 or 2 to reduce disk I/O to the logs. Value 0 means that the log is only written to the log file and the log file flushed to disk approximately once per second. Value 2 means the log is written to the log file at each commit, but the log file is only flushed to disk approximately once per second. The default value is 1 starting from MySQL-4.0.13; previously it was 0. |
innodb_log_arch_dir |
The directory where fully written log files would be archived if we used
log archiving. The value of this parameter should currently be set the
same as innodb_log_group_home_dir.
|
innodb_log_archive | This value should currently be set to 0. As recovery from a backup is done by MySQL using its own log files, there is currently no need to archive InnoDB log files. |
innodb_buffer_pool_size | The size of the memory buffer InnoDB uses to cache data and indexes of its tables. The bigger you set this the less disk I/O is needed to access data in tables. On a dedicated database server you may set this parameter up to 80% of the machine physical memory size. Do not set it too large, though, because competition of the physical memory may cause paging in the operating system. |
innodb_buffer_pool_awe_mem_mb | Size of the buffer pool in MB, if it is placed in the AWE memory of 32-bit Windows. Available starting from 4.1.0 and only relevant in 32-bit Windows. If your 32-bit Windows operating system supports > 4 GB memory, so-called Address Windowing Extensions, you can allocate the InnoDB buffer pool into the AWE physical memory using this parameter. The maximum possible value for this is 64000. If this parameter is specified, then innodb_buffer_pool_size is the window in the 32-bit address space of mysqld where InnoDB maps that AWE memory. A good value for innodb_buffer_pool_size is then 500M. |
innodb_additional_mem_pool_size | Size of a memory pool InnoDB uses to store data dictionary information and other internal data structures. A sensible value for this might be 2M, but the more tables you have in your application the more you will need to allocate here. If InnoDB runs out of memory in this pool, it will start to allocate memory from the operating system, and write warning messages to the MySQL error log. |
innodb_file_io_threads | Number of file I/O threads in InnoDB. Normally, this should be 4, but on Windows disk I/O may benefit from a larger number. |
innodb_lock_wait_timeout |
Timeout in seconds an InnoDB transaction may wait for a lock before
being rolled back. InnoDB automatically detects transaction deadlocks
in its own lock table and rolls back the transaction. If you use the
LOCK TABLES command, or other transaction-safe storage engines
than InnoDB in the same transaction, then a deadlock may arise which
InnoDB cannot notice. In cases like this the timeout is useful to
resolve the situation.
|
innodb_flush_method |
(Available from 3.23.40 up.)
The default value for this is fdatasync.
Another option is O_DSYNC.
|
innodb_force_recovery | Warning: this option should only be defined in an emergency situation when you want to dump your tables from a corrupt database! Possible values are 1 - 6. See below at section 'Forcing recovery' about the meanings of the values. As a safety measure InnoDB prevents a user from modifying data when this option is > 0. This option is available starting from version 3.23.44. |
7.5.4 Creating InnoDB Tablespace
Suppose you have installed MySQL and have edited `my.cnf' so that it contains the necessary InnoDB configuration parameters. Before starting MySQL you should check that the directories you have specified for InnoDB datafiles and log files exist and that you have access rights to those directories. InnoDB cannot create directories, only files. Check also you have enough disk space for the data and log files.
When you now start MySQL, InnoDB will start creating your datafiles and log files. InnoDB will print something like the following:
~/mysqlm/sql > mysqld InnoDB: The first specified datafile /home/heikki/data/ibdata1 did not exist: InnoDB: a new database to be created! InnoDB: Setting file /home/heikki/data/ibdata1 size to 134217728 InnoDB: Database physically writes the file full: wait... InnoDB: datafile /home/heikki/data/ibdata2 did not exist: new to be created InnoDB: Setting file /home/heikki/data/ibdata2 size to 262144000 InnoDB: Database physically writes the file full: wait... InnoDB: Log file /home/heikki/data/logs/ib_logfile0 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile0 size to 5242880 InnoDB: Log file /home/heikki/data/logs/ib_logfile1 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile1 size to 5242880 InnoDB: Log file /home/heikki/data/logs/ib_logfile2 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile2 size to 5242880 InnoDB: Started mysqld: ready for connections
A new InnoDB database has now been created. You can connect to the MySQL
server with the usual MySQL client programs like mysql.
When you shut down the MySQL server with `mysqladmin shutdown',
InnoDB output will be like the following:
010321 18:33:34 mysqld: Normal shutdown 010321 18:33:34 mysqld: Shutdown Complete InnoDB: Starting shutdown... InnoDB: Shutdown completed
You can now look at the datafiles and logs directories and you will see the files created. The log directory will also contain a small file named `ib_arch_log_0000000000'. That file resulted from the database creation, after which InnoDB switched off log archiving. When MySQL is again started, the output will be like the following:
~/mysqlm/sql > mysqld InnoDB: Started mysqld: ready for connections
7.5.4.1 If Something Goes Wrong in Database Creation
If InnoDB prints an operating system error in a file operation, usually the problem is one of the following:
- You did not create InnoDB data or log directories.
- `mysqld' does not have the rights to create files in those directories.
- `mysqld' does not read the right `my.cnf' or `my.ini' file, and consequently does not see the options you specified.
- The disk is full or a disk quota is exceeded.
- You have created a subdirectory whose name is equal to a datafile you specified.
- There is a syntax error in
innodb_data_home_dirorinnodb_data_file_path.
If something goes wrong in an InnoDB database creation, you should delete all files created by InnoDB. This means all datafiles, all log files, the small archived log file, and in the case you already did create some InnoDB tables, delete also the corresponding `.frm' files for these tables from the MySQL database directories. Then you can try the InnoDB database creation again.
7.5.5 Creating InnoDB Tables
Suppose you have started the MySQL client with the command
mysql test.
To create a table in the InnoDB format you must specify
TYPE = InnoDB in the table creation SQL command:
CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
This SQL command will create a table and an index on column A
into the InnoDB tablespace consisting of the datafiles you specified
in `my.cnf'. In addition MySQL will create a file
`CUSTOMER.frm' to the MySQL database directory `test'.
Internally, InnoDB will add to its own data dictionary an entry
for table 'test/CUSTOMER'. Thus you can create a table
of the same name CUSTOMER in another database of MySQL, and
the table names will not collide inside InnoDB.
You can query the amount of free space in the InnoDB tablespace
by issuing the table status command of MySQL for any table you have
created with TYPE = InnoDB. Then the amount of free
space in the tablespace appears in the table comment section in the
output of SHOW. An example:
SHOW TABLE STATUS FROM test LIKE 'CUSTOMER'
Note that the statistics SHOW gives about InnoDB tables
are only approximate: they are used in SQL optimization. Table and
index reserved sizes in bytes are accurate, though.
7.5.5.1 Converting MyISAM Tables to InnoDB
InnoDB does not have a special optimization for separate index creation.
Therefore it does not pay to export and import the table and create indexes
afterwards.
The fastest way to alter a table to InnoDB is to do the inserts
directly to an InnoDB table, that is, use ALTER TABLE ... TYPE=INNODB,
or create an empty InnoDB table with identical definitions and insert
the rows with INSERT INTO ... SELECT * FROM ....
To get better control over the insertion process, it may be good to insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all data has been inserted you can rename the tables.
During the conversion of big tables you should set the InnoDB buffer pool size big to reduce disk I/O. Not bigger than 80% of the physical memory, though. You should set InnoDB log files big, and also the log buffer large.
Make sure you do not run out of tablespace: InnoDB tables take a lot
more space than MyISAM tables. If an ALTER TABLE runs out
of space, it will start a rollback, and that can take hours if it is
disk-bound.
In inserts InnoDB uses the insert buffer to merge secondary index records
to indexes in batches. That saves a lot of disk I/O. In rollback no such
mechanism is used, and the rollback can take 30 times longer than the
insertion.
In the case of a runaway rollback, if you do not have valuable data in your database, it is better that you kill the database process and delete all InnoDB datafiles and log files and all InnoDB table `.frm' files, and start your job again, rather than wait for millions of disk I/Os to complete.
7.5.5.2 FOREIGN KEY Constraints
Starting from version 3.23.43b InnoDB features foreign key constraints. InnoDB is the first MySQL table type which allows you to define foreign key constraints to guard the integrity of your data.
The syntax of a foreign key constraint definition in InnoDB:
[CONSTRAINT [symbol]] FOREIGN KEY (index_col_name, ...)
REFERENCES table_name (index_col_name, ...)
[ON DELETE {CASCADE | SET NULL | NO ACTION
| RESTRICT}]
[ON UPDATE {CASCADE | SET NULL | NO ACTION
| RESTRICT}]
Both tables have to be InnoDB type, in the table there must be an INDEX where the foreign key columns are listed as the FIRST columns in the same order, and in the referenced table there must be an INDEX where the referenced columns are listed as the FIRST columns in the same order. InnoDB does not auto-create indexes on foreign keys or referenced keys: you have to create them explicitly. The indexes are needed for foreign key checks to be fast and not require a table scan.
Corresponding columns in the foreign key
and the referenced key must have similar internal datatypes
inside InnoDB so that they can be compared without a type
conversion.
The size and the signedness of integer types has to be the same.
The length of string types need not be the same.
If you specify a SET NULL action, make sure you
have not declared the columns in the child table
NOT NULL.
If MySQL gives the error number 1005 from a CREATE TABLE
statement, and the error message string refers to errno 150, then
the table creation failed because a foreign key constraint was not
correctly formed.
Similarly, if an ALTER TABLE fails and it refers to errno
150, that means a foreign key definition would be incorrectly
formed for the altered table. Starting from version 4.0.13,
you can use SHOW INNODB STATUS to look at a detailed explanation
of the latest InnoDB foreign key error in the server.
Starting from version 3.23.50, InnoDB does not check foreign key constraints on those foreign key or referenced key values which contain a NULL column.
A deviation from SQL standards: if in the parent table
there are several rows which have the same referenced key value,
then InnoDB acts in foreign key checks like the other parent
rows with the same key value would not exist. For example,
if you have defined a RESTRICT type constraint, and there
is a child row with several parent rows, InnoDB does not allow
the deletion of any of those parent rows.
Starting from version 3.23.50, you can also associate the
ON DELETE CASCADE or ON DELETE SET NULL clause with
the foreign key constraint. Corresponding ON UPDATE options
are available starting from 4.0.8. If ON DELETE CASCADE is
specified, and a row in the parent table is deleted, then InnoDB
automatically deletes also all those rows in the child table
whose foreign key values are equal to the referenced key value in
the parent row. If ON DELETE SET NULL is specified, the
child rows are automatically updated so that the columns in the
foreign key are set to the SQL NULL value.
A deviation from SQL standards: if
ON UPDATE CASCADE or ON UPDATE SET NULL recurses to
update the SAME TABLE it has already updated during the cascade,
it acts like RESTRICT. This is to prevent infinite loops
resulting from cascaded updates. A self-referential ON DELETE
SET NULL, on the other hand, works starting from 4.0.13.
A self-referential ON DELETE CASCADE has always worked.
An example:
CREATE TABLE parent(id INT NOT NULL, PRIMARY KEY (id)) TYPE=INNODB;
CREATE TABLE child(id INT, parent_id INT, INDEX par_ind (parent_id),
FOREIGN KEY (parent_id) REFERENCES parent(id)
ON DELETE SET NULL
) TYPE=INNODB;
A complex example:
CREATE TABLE product (category INT NOT NULL, id INT NOT NULL,
price DECIMAL,
PRIMARY KEY(category, id)) TYPE=INNODB;
CREATE TABLE customer (id INT NOT NULL,
PRIMARY KEY (id)) TYPE=INNODB;
CREATE TABLE product_order (no INT NOT NULL AUTO_INCREMENT,
product_category INT NOT NULL,
product_id INT NOT NULL,
customer_id INT NOT NULL,
PRIMARY KEY(no),
INDEX (product_category, product_id),
FOREIGN KEY (product_category, product_id)
REFERENCES product(category, id)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (customer_id),
FOREIGN KEY (customer_id)
REFERENCES customer(id)) TYPE=INNODB;
Starting from version 3.23.50, InnoDB allows you to add a new foreign key constraint to a table through
ALTER TABLE yourtablename ADD [CONSTRAINT [symbol]] FOREIGN KEY (...) REFERENCES anothertablename(...) [on_delete_and_on_update_actions]
Remember to create the required indexes first, though.
Starting from version 4.0.13, InnoDB supports
ALTER TABLE yourtablename DROP FOREIGN KEY internally_generated_foreign_key_id
You have to use SHOW CREATE TABLE to determine the internally
generated foreign key ID when you want to drop a foreign key.
In InnoDB versions < 3.23.50 ALTER TABLE
or CREATE INDEX
should not be used in connection with tables which have foreign
key constraints or which are referenced in foreign key constraints:
Any ALTER TABLE removes all foreign key
constraints defined for the table. You should not use
ALTER TABLE to the referenced table either, but
use DROP TABLE and CREATE TABLE to modify the
schema. When MySQL does an ALTER TABLE it may internally
use RENAME TABLE, and that will confuse the
foreign key costraints which refer to the table.
A CREATE INDEX statement is in MySQL
processed as an ALTER TABLE, and these
restrictions apply also to it.
When doing foreign key checks, InnoDB sets shared row level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately: the check is not deferred to transaction commit.
If you want to ignore foreign key constraints during, for example for a
LOAD DATA operation, you can do SET FOREIGN_KEY_CHECKS=0.
InnoDB allows you to drop any table even though that would break the foreign key constraints which reference the table. When you drop a table the constraints which were defined in its create statement are also dropped.
If you re-create a table which was dropped, it has to have a definition which conforms to the foreign key constraints referencing it. It must have the right column names and types, and it must have indexes on the referenced keys, as stated above. If these are not satisfied, MySQL returns error number 1005 and refers to errno 150 in the error message string.
Starting from version 3.23.50 InnoDB returns the foreign key definitions of a table when you call
SHOW CREATE TABLE yourtablename
Then also `mysqldump' produces correct definitions of tables to the dump file, and does not forget about the foreign keys.
You can also list the foreign key constraints for a table
T with
SHOW TABLE STATUS FROM yourdatabasename LIKE 'T'
The foreign key constraints are listed in the table comment of the output.
7.5.5.3 Multiple tablespaces - putting each table into its own .ibd file
IMPORTANT NOTE: if you upgrade to InnoDB-4.1.1, you cannot downgrade any more! That is because earlier versions of InnoDB are not aware of multiple tablespaces.
Starting from MySQL-4.1.1, you can now store each InnoDB table and its indexes into its own file. This feature is called multiple tablespaces, because then each table is stored into its own tablespace.
You can enable this feature by putting the line
innodb_file_per_table
in the [mysqld] section of `my.cnf'. Then InnoDB stores each
table into its own file `tablename.ibd' in the database directory where
the table belongs. This is like MyISAM does, but MyISAM divides the table
into a data file `tablename.MYD' and the index file
`tablename.MYI'. For InnoDB, both the data and the indexes are in the
`.ibd' file.
If you remove the line innodb_file_per_table from `my.cnf',
then InnoDB creates tables inside the `ibdata' files again. The old
tables you had in the `ibdata' files before an upgrade to >= 4.1.1
remain there, they are not converted into `.ibd' files.
InnoDB always needs the system tablespace, `.ibd' files are not enough. The system tablespace consists of the familiar `ibdata' files. InnoDB puts there its internal data dictionary and undo logs.
You CANNOT FREELY MOVE .ibd files around, like you can MyISAM tables. This is because the table definition is stored in the InnoDB system tablespace, and also because InnoDB must preserve the consistency of transaction id's and log sequence numbers.
You can move an `.ibd' file and the associated table from a database
to another (within the same MySQL/InnoDB installation) with the familiar
RENAME command:
RENAME TABLE olddatabasename.tablename TO newdatabasename.tablename;
If you have a clean backup of an `.ibd' file taken from the SAME MySQL/InnoDB installation, you can restore it to an InnoDB database with the commands:
ALTER TABLE tablename DISCARD TABLESPACE; /* CAUTION: deletes the current .ibd file! */ <put the backup .ibd file to the proper place> ALTER TABLE tablename IMPORT TABLESPACE;
Clean in this context means:
- There are no uncommitted modifications by transactions in the `.ibd' file.
- There are no unmerged insert buffer entries to the `.ibd' file.
- Purge has removed all delete-marked index records from the `.ibd' file.
- `mysqld' has flushed all modified pages of the `.ibd' file from the buffer pool to the file.
You can make such a clean backup `.ibd' file with the following method.
- Stop all activity from the `mysqld' server and commit all transactions.
-
Wait that
SHOW INNODB STATUS\Gshows that there are no active transactions in the database, and the main thread of InnoDB isWaiting for server activity. Then you can take a copy of the `.ibd' file.
Another (non-free) method to make such a clean `.ibd' file is to
- Use InnoDB Hot Backup to backup the InnoDB installation.
- Start a second `mysqld' server on the backup and let it clean up the `.ibd' files in the backup.
It is in the TODO to allow moving clean `.ibd' files also to another MySQL/InnoDB installation. That requires resetting of trx id's and log sequence numbers in the `.ibd' file.
7.5.6 Adding and Removing InnoDB Data and Log Files
From version 3.23.50 and 4.0.2 you can specify the last InnoDB datafile
to autoextend. Alternatively, you can increase to your tablespace
by specifying an additional datafile. To do this you have to shut down
the MySQL server, edit the `my.cnf' file adding a new datafile
to the end of innodb_data_file_path, and then start
the MySQL server again.
Currently you cannot remove a datafile from InnoDB. To decrease the size of your database you have to use `mysqldump' to dump all your tables, create a new database, and import your tables to the new database.
If you want to change the number or the size of your InnoDB log files, you have to shut down MySQL and make sure that it shuts down without errors. Then copy the old log files into a safe place just in case something went wrong in the shutdown and you will need them to recover the database. Delete then the old log files from the log file directory, edit `my.cnf', and start MySQL again. InnoDB will tell you at the startup that it is creating new log files.
7.5.7 Backing up and Recovering an InnoDB Database
The key to safe database management is taking regular backups.
InnoDB Hot Backup is an online backup tool you can use to backup your InnoDB database while it is running. InnoDB Hot Backup does not require you to shut down your database and it does not set any locks or disturb your normal database processing. InnoDB Hot Backup is a non-free additional tool which is not included in the standard MySQL distribution. See the InnoDB Hot Backup homepage http://www.innodb.com/manual.php for detailed information and screenshots.
If you are able to shut down your MySQL server, then to take a 'binary' backup of your database you have to do the following:
- Shut down your MySQL database and make sure it shuts down without errors.
- Copy all your datafiles into a safe place.
- Copy all your InnoDB log files to a safe place.
- Copy your `my.cnf' configuration file or files to a safe place.
- Copy all the `.frm' files for your InnoDB tables into a safe place.
In addition to taking the binary backups described above, you should also regularly take dumps of your tables with `mysqldump'. The reason to this is that a binary file may be corrupted without you noticing it. Dumped tables are stored into text files which are human-readable and much simpler than database binary files. Seeing table corruption from dumped files is easier, and since their format is simpler, the chance for serious data corruption in them is smaller.
A good idea is to take the dumps at the same time you take a binary backup of your database. You have to shut out all clients from your database to get a consistent snapshot of all your tables into your dumps. Then you can take the binary backup, and you will then have a consistent snapshot of your database in two formats.
To be able to recover your InnoDB database to the present from the binary backup described above, you have to run your MySQL database with the general logging and log archiving of MySQL switched on. Here by the general logging we mean the logging mechanism of the MySQL server which is independent of InnoDB logs.
To recover from a crash of your MySQL server process, the only thing you have to do is to restart it. InnoDB will automatically check the logs and perform a roll-forward of the database to the present. InnoDB will automatically roll back uncommitted transactions which were present at the time of the crash. During recovery, InnoDB will print out something like the following:
~/mysqlm/sql > mysqld InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 0 13674004 InnoDB: Doing recovery: scanned up to log sequence number 0 13739520 InnoDB: Doing recovery: scanned up to log sequence number 0 13805056 InnoDB: Doing recovery: scanned up to log sequence number 0 13870592 InnoDB: Doing recovery: scanned up to log sequence number 0 13936128 ... InnoDB: Doing recovery: scanned up to log sequence number 0 20555264 InnoDB: Doing recovery: scanned up to log sequence number 0 20620800 InnoDB: Doing recovery: scanned up to log sequence number 0 20664692 InnoDB: 1 uncommitted transaction(s) which must be rolled back InnoDB: Starting rollback of uncommitted transactions InnoDB: Rolling back trx no 16745 InnoDB: Rolling back of trx no 16745 completed InnoDB: Rollback of uncommitted transactions completed InnoDB: Starting an apply batch of log records to the database... InnoDB: Apply batch completed InnoDB: Started mysqld: ready for connections
If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup which is not corrupted. From a backup do the recovery from the general log files of MySQL according to instructions in the MySQL manual.
7.5.7.1 Forcing recovery
If there is database page corruption, you may want to dump
your tables from the database with SELECT INTO OUTFILE,
and usually most of the data
is intact and correct. But the corruption may cause
SELECT * FROM table, or InnoDB
background operations to crash or assert, or even the InnoDB
roll-forward recovery to crash. Starting from the InnoDB
version 3.23.44, there is a `my.cnf' option with which
you can force
InnoDB to start up, and you can also prevent background operations
from running, so that
you will be able to dump your tables. For example, you can set
set-variable = innodb_force_recovery = 4
in `my.cnf'.
The alternatives for innodb_force_recovery are listed below.
The database must not otherwise be used with these options!
As a safety measure InnoDB prevents a user from doing INSERT,
UPDATE, or DELETE when this option is > 0.
Starting from version 3.23.53 and 4.0.4, you are allowed to
DROP or CREATE a table even if
forced recovery is used. If you know that a certain table is causing
a crash in rollback, you can drop it.
You can use this also to stop a runaway rollback
caused by a failing mass import or ALTER TABLE.
You can kill the mysqld process and
use the `my.cnf' option innodb_force_recovery=3
to bring your database up without the rollback.
Then DROP the table which is causing the runaway rollback.
A bigger number below means that all precautions of lower numbers are included. If you are able to dump your tables with an option at most 4, then you are relatively safe that only some data on corrupt individual pages is lost. Option 6 is more dramatic, because database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.
- 1 (SRV_FORCE_IGNORE_CORRUPT) let the server run even if it detects a
corrupt page; try to make
SELECT * FROM tablejump over corrupt index records and pages, which helps in dumping tables; - 2 (SRV_FORCE_NO_BACKGROUND) prevent the main thread from running: if a crash would occur in purge, this prevents it;
- 3 (SRV_FORCE_NO_TRX_UNDO) do not run transaction rollbacks after recovery;
- 4 (SRV_FORCE_NO_IBUF_MERGE) prevent also insert buffer merge operations: if they would cause a crash, better not do them; do not calculate table statistics;
- 5 (SRV_FORCE_NO_UNDO_LOG_SCAN) do not look at undo logs when starting the database: InnoDB will treat even incomplete transactions as committed;
- 6 (SRV_FORCE_NO_LOG_REDO) do not do the log roll-forward in connection with recovery.
7.5.7.2 Checkpoints
InnoDB implements a checkpoint mechanism called a fuzzy checkpoint. InnoDB will flush modified database pages from the buffer pool in small batches, there is no need to flush the buffer pool in one single batch, which would in practice stop processing of user SQL statements for a while.
In crash recovery InnoDB looks for a checkpoint label written to the log files. It knows that all modifications to the database before the label are already present on the disk image of the database. Then InnoDB scans the log files forward from the place of the checkpoint applying the logged modifications to the database.
InnoDB writes to the log files in a circular fashion. All committed modifications which make the database pages in the buffer pool different from the images on disk must be available in the log files in case InnoDB has to do a recovery. This means that when InnoDB starts to reuse a log file in the circular fashion, it has to make sure that the database page images on disk already contain the modifications logged in the log file InnoDB is going to reuse. In other words, InnoDB has to make a checkpoint and often this involves flushing of modified database pages to disk.
The above explains why making your log files very big may save disk I/O in checkpointing. It can make sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback in big log files is that crash recovery can last longer because there will be more log to apply to the database.
7.5.8 Moving an InnoDB Database to Another Machine
On Windows InnoDB stores the database names and table names internally always in lower case. To move databases in a binary format from Unix to Windows or from Windows to Unix you should have all table and database names in lower case. A convenient way to accomplish this is to add on Unix the line
set-variable=lower_case_table_names=1
to the [mysqld] section of your `my.cnf' before you start
creating your tables. On Windows the setting 1 is the default.
InnoDB data and log files are binary-compatible on all platforms
if the floating-point number format on the machines is the same.
You can move an InnoDB database simply by copying all the relevant
files, which we already listed in the previous section on backing up
a database. If the floating-point formats on the machines are
different but you have not used FLOAT or DOUBLE
datatypes in your tables then the procedure is the same: just copy
the relevant files. If the formats are different and your tables
contain floating-point data, you have to use `mysqldump'
and `mysqlimport' to move those tables.
A performance tip is to switch off auto-commit mode when you import data into your database, assuming your tablespace has enough space for the big rollback segment the big import transaction will generate. Do the commit only after importing a whole table or a segment of a table.
7.5.9 InnoDB Transaction Model and Locking
In the InnoDB transaction model the goal has been to combine the best
properties of a multi-versioning database to traditional two-phase locking.
InnoDB does locking on row level and runs queries by default
as non-locking consistent reads, in the style of Oracle.
The lock table in InnoDB is stored so space-efficiently that lock
escalation is not needed: typically several users are allowed
to lock every row in the database, or any random subset of the rows,
without InnoDB running out of memory.
In InnoDB all user activity happens inside transactions. If the
autocommit mode is used in MySQL, then each SQL statement
forms a single transaction.
MySQL always starts a new connection with the autocommit
mode switched on.
If the autocommit mode is
switched off with SET AUTOCOMMIT = 0,
then we can think that a user always has a transaction
open. If he issues
the SQL COMMIT or ROLLBACK statement,
it ends the current transaction, and a new one starts. Both statements
will release all InnoDB locks that were set during the
current transaction. A COMMIT means that the
changes made in the current transaction are made permanent
and become visible to other users. A ROLLBACK statement,
on the other hand, cancels all modifications made by the current
transaction.
If the connection has AUTOCOMMIT = 1, then the user
can still perform a multi-statement transaction by starting it with
START TRANSACTION or
BEGIN and ending it with COMMIT
or ROLLBACK.
7.5.9.1 InnoDB and SET ... TRANSACTION ISOLATION LEVEL ...
In terms of the SQL-92 transaction isolation levels,
the InnoDB default is REPEATABLE READ.
Starting from version 4.0.5, InnoDB offers all 4 different
transaction isolation levels described by the SQL-92 standard.
You can set the default isolation level for all connections
in the [mysqld] section of `my.cnf':
transaction-isolation = {READ-UNCOMMITTED | READ-COMMITTED
| REPEATABLE-READ | SERIALIZABLE}
A user can change the isolation level of a single session or
all new incoming connections with the SET TRANSACTION
statement. Its syntax is as follows:
SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL
{READ UNCOMMITTED | READ COMMITTED
| REPEATABLE READ | SERIALIZABLE}
Note that there are no hyphens in level names in the SQL syntax.
The default behavior is to set the isolation level for the next (not
started) transaction. If you use the GLOBAL keyword, the statement
sets the default transaction level globally for all new connections
created from that point on (but not existing connections).
You need the SUPER privilege to do this. Using the SESSION
keyword sets the default transaction level for all future transactions
performed on the current connection. Any client is free to change the
session isolation level (even in the middle of a transaction), or the
isolation level for the next transaction.
In versions earlier than 3.23.50, SET TRANSACTION had no effect
on InnoDB tables. In versions < 4.0.5 only REPEATABLE READ
and SERIALIZABLE were available.
You can query the global and session transaction isolation levels with:
SELECT @@global.tx_isolation; SELECT @@tx_isolation;
In row level locking InnoDB uses so-called next-key locking.
That means that besides index records, InnoDB can also lock
the ``gap'' before an index record to block insertions by other users
immediately before the index record. A next-key lock means
a lock which locks an index record and the gap before it.
A gap lock means a lock which only locks a gap before some
index record.
A detailed description of each isolation level in InnoDB:
-
READ UNCOMMITTEDThis is also called ``dirty read'': non-lockingSELECTstatements are performed so that we do not look at a possible earlier version of a record; thus they are not 'consistent' reads under this isolation level; otherwise this level works likeREAD COMMITTED. -
READ COMMITTEDSomewhat Oracle-like isolation level. AllSELECT ... FOR UPDATEandSELECT ... LOCK IN SHARE MODEstatements only lock the index records, not the gaps before them, and thus allow free inserting of new records next to locked records.UPDATEandDELETEwhich use a unique index with a unique search condition, only lock the index record found, not the gap before it. But still in range typeUPDATEandDELETE,InnoDBmust set next-key or gap locks and block insertions by other users to the gaps covered by the range. This is necessary since ``phantom rows'' have to be blocked for MySQL replication and recovery to work. Consistent reads behave as in Oracle: each consistent read, even within the same transaction, sets and reads its own fresh snapshot. -
REPEATABLE READThis is the default isolation level ofInnoDB.SELECT ... FOR UPDATE,SELECT ... LOCK IN SHARE MODE,UPDATE, andDELETEwhich use a unique index with a unique search condition, only lock the index record found, not the gap before it. Otherwise these operations employ next-key locking, locking the index range scanned with next-key or gap locks, and block new insertions by other users. In consistent reads there is an important difference from the previous isolation level: in this level all consistent reads within the same transaction read the same snapshot established by the first read. This convention means that if you issue several plainSELECTstatements within the same transaction, theseSELECTstatements are consistent also with respect to each other. -
SERIALIZABLEThis level is like the previous one, but all plainSELECTstatements are implicitly converted toSELECT ... LOCK IN SHARE MODE.
7.5.9.2 Consistent Non-Locking Read
A consistent read means that InnoDB uses its multi-versioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself which issues the query.
If you are running with the default REPEATABLE READ isolation level,
then all consistent reads within the same transaction read the snapshot
established by the first such read in that transaction. You can get a
fresher snapshot for your queries by committing the current transaction
and after that issuing new queries.
Consistent read is the default mode in which InnoDB processes
SELECT statements in READ COMMITTED and
REPEATABLE READ isolation levels. A consistent read
does not set any locks on the tables it accesses, and
therefore other users are free to modify those tables at
the same time a consistent read is being performed on the table.
7.5.9.3 Locking Reads SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE
A consistent read is not convenient in some circumstances.
Suppose you want to add a new row into your table CHILD,
and make sure that the child already has a parent in table
PARENT.
Suppose you use a consistent read to read the table PARENT
and indeed see the parent of the child in the table. Can you now safely
add the child row to table CHILD? No, because it may
happen that meanwhile some other user has deleted the parent row
from the table PARENT, and you are not aware of that.
The solution is to perform the SELECT in a locking
mode, LOCK IN SHARE MODE.
SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;
Performing a read in share mode means that we read the latest
available data, and set a shared mode lock on the rows we read.
If the latest data belongs to a yet uncommitted transaction of another
user, we will wait until that transaction commits.
A shared mode lock prevents others from updating or deleting
the row we have read. After we see that the above query returns
the parent 'Jones', we can safely add his child
to table CHILD, and commit our transaction.
This example shows how to implement referential
integrity in your application code.
Let us look at another example: we have an integer counter field in
a table CHILD_CODES which we use to assign
a unique identifier to each child we add to table CHILD.
Obviously, using a consistent read or a shared mode read
to read the present value of the counter is not a good idea, since
then two users of the database may see the same value for the
counter, and we will get a duplicate key error when we add
the two children with the same identifier to the table.
In this case there are two good ways to implement the
reading and incrementing of the counter: (1) update the counter
first by incrementing it by 1 and only after that read it,
or (2) read the counter first with
a lock mode FOR UPDATE, and increment after that:
SELECT COUNTER_FIELD FROM CHILD_CODES FOR UPDATE; UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;
A SELECT ... FOR UPDATE will read the latest
available data setting exclusive locks on each row it reads.
Thus it sets the same locks a searched SQL UPDATE would set
on the rows.
7.5.9.4 Next-key Locking: Avoiding the Phantom Problem
In row level locking InnoDB uses an algorithm called next-key locking. InnoDB does the row level locking so that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records it encounters. Thus the row level locks are more precisely called index record locks.
The locks InnoDB sets on index records also affect the 'gap'
before that index record. If a user has a shared or exclusive
lock on record R in an index, then another user cannot insert
a new index record immediately before R in the index order.
This locking of gaps is done to prevent the so-called phantom
problem. Suppose I want to read and lock all children with identifier
bigger than 100 from table CHILD,
and update some field in the selected rows.
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
Suppose there is an index on table CHILD on column
ID. Our query will scan that index starting from
the first record where ID is bigger than 100.
Now, if the locks set on the index records would not lock out
inserts made in the gaps, a new child might meanwhile be
inserted to the table. If now I in my transaction execute
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
again, I will see a new child in the result set the query returns. This is against the isolation principle of transactions: a transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, then the new 'phantom' child would break this isolation principle.
When InnoDB scans an index it can also lock the gap
after the last record in the index. Just that happens in the previous
example: the locks set by InnoDB will prevent any insert to
the table where ID would be bigger than 100.
You can use next-key locking to implement a uniqueness check in your application: if you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to 'lock' the non-existence of something in your table.
7.5.9.5 Locks Set by Different SQL Statements in InnoDB
-
SELECT ... FROM ...: this is a consistent read, reading a snapshot of the database and setting no locks. -
SELECT ... FROM ... LOCK IN SHARE MODE: sets shared next-key locks on all index records the read encounters. -
SELECT ... FROM ... FOR UPDATE: sets exclusive next-key locks on all index records the read encounters. -
INSERT INTO ... VALUES (...): sets an exclusive lock on the inserted row; note that this lock is not a next-key lock and does not prevent other users from inserting to the gap before the inserted row. If a duplicate key error occurs, sets a shared lock on the duplicate index record. -
INSERT INTO T SELECT ... FROM S WHERE ...sets an exclusive (non-next-key) lock on each row inserted intoT. Does the search onSas a consistent read, but sets shared next-key locks onSif the MySQL logging is on.InnoDBhas to set locks in the latter case because in roll-forward recovery from a backup every SQL statement has to be executed in exactly the same way as it was done originally. -
CREATE TABLE ... SELECT ...performs theSELECTas a consistent read or with shared locks, like in the previous item. -
REPLACEis done like an insert if there is no collision on a unique key. Otherwise, an exclusive next-key lock is placed on the row which has to be updated. -
UPDATE ... SET ... WHERE ...: sets an exclusive next-key lock on every record the search encounters. -
DELETE FROM ... WHERE ...: sets an exclusive next-key lock on every record the search encounters. -
If a
FOREIGN KEYconstraint is defined on a table, any insert, update, or delete which requires checking of the constraint condition sets shared record level locks on the records it looks at to check the constraint. Also in the case where the constraint fails,InnoDBsets these locks. -
LOCK TABLES ...: sets table locks. In the implementation the MySQL layer of code sets these locks. The automatic deadlock detection ofInnoDBcannot detect deadlocks where such table locks are involved: see the following section. Also, since MySQL does know about row level locks, it is possible that you get a table lock on a table where another user currently has row level locks. But that does not put transaction integrity into danger. See section 7.5.15 Restrictions on InnoDB Tables.
7.5.9.6 Deadlock Detection and Rollback
InnoDB automatically detects a deadlock of transactions and rolls back a
transaction or transactions to prevent the deadlock. Starting from
version 4.0.5, InnoDB will try to pick small transactions to roll
back. The size of a transaction is determined by the number of rows
it has inserted, updated, or deleted. Previous to 4.0.5, InnoDB
always rolled back the transaction whose lock request was the last
one to build a deadlock, that is, a cycle in the waits-for graph
of transactions.
InnoDB cannot detect deadlocks where a lock set by a MySQL
LOCK TABLES statement is involved, or if a lock set
in another storage engine than InnoDB is involved. You have to resolve
these situations using innodb_lock_wait_timeout set in
`my.cnf'.
When InnoDB performs a complete rollback of a transaction, all the
locks of the transaction are released. However, if just a single SQL
statement is rolled back as a result of an error, some of the locks
set by the SQL statement may be preserved. This is because InnoDB
stores row locks in a format where it cannot afterwards know which was
set by which SQL statement.
7.5.9.7 An Example of How the Consistent Read Works in InnoDB
Suppose you are running on the default REPEATABLE READ isolation level.
When you issue a consistent read, that is, an ordinary SELECT
statement, InnoDB will give your transaction a timepoint according
to which your query sees the database. Thus, if transaction B deletes
a row and commits after your timepoint was assigned, then you will
not see the row deleted. Similarly with inserts and updates.
You can advance your timepoint by committing your transaction
and then doing another SELECT.
This is called multi-versioned concurrency control.
User A User B
SET AUTOCOMMIT=0; SET AUTOCOMMIT=0;
time
| SELECT * FROM t;
| empty set
| INSERT INTO t VALUES (1, 2);
|
v SELECT * FROM t;
empty set
COMMIT;
SELECT * FROM t;
empty set;
COMMIT;
SELECT * FROM t;
---------------------
| 1 | 2 |
---------------------
Thus user A sees the row inserted by B only when B has committed the insert, and A has committed his own transaction so that the timepoint is advanced past the commit of B.
If you want to see the ``freshest'' state of the database, you should use a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
7.5.9.8 How to Cope With Deadlocks
Deadlocks are a classic problem in transactional databases, but they are not dangerous, unless they are so frequent that you cannot run certain transactions at all. Normally you have to write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
InnoDB uses automatic row level locking. You can get deadlocks
even in the case of transactions which just insert or delete a
single row. That is because these operations are not really 'atomic':
they automatically set locks on the (possibly several) index
records of the row inserted/deleted.
You can cope with deadlocks and reduce the number of them with the following tricks:
-
Use
SHOW INNODB STATUSin MySQL versions >= 3.23.52 and >= 4.0.3 to determine the cause of the latest deadlock. That can help you to tune your application to avoid deadlocks. - Always be prepared to re-issue a transaction if it fails in a deadlock. Deadlocks are not dangerous. Just try again.
- Commit your transactions often. Small transactions are less prone to collide.
-
If you are using locking reads
SELECT ... FOR UPDATEor... LOCK IN SHARE MODE, try using a lower isolation levelREAD COMMITTED. - Access your tables and rows in a fixed order. Then transactions will form nice queues, and do not deadlock.
-
Add well-chosen indexes to your tables. Then your queries need to
scan fewer index records and consequently set fewer locks.
Use
EXPLAIN SELECTto determine that MySQL picks appropriate indexes for your queries. -
Use less locking: if you can afford a
SELECTto return data from an old snapshot, do not add the clauseFOR UPDATEorLOCK IN SHARE MODEto it. UsingREAD COMMITTEDisolation level is good here, because each consistent read within the same transaction reads from its own fresh snapshot. -
If nothing helps, serialize your transactions with table
level locks:
LOCK TABLES t1 WRITE, t2 READ, ... ; [do something with tables t1 and t2 here]; UNLOCK TABLES. Table level locks make you transactions to queue nicely, and deadlocks are avoided. Note thatLOCK TABLESimplicitly starts a transaction, just like the commandBEGIN, andUNLOCK TABLESimplicitly ends the transaction in aCOMMIT. -
Another solution to serialize transactions is to create
an auxiliary 'semaphore' table where there is just a
single row. Each transaction updates that row before
accessing other tables. In that way all transactions
happen in a serial fashion. Note that then also the
InnoDBinstant deadlock detection algorithm works, because the serializing lock is a row level lock. In MySQL table level locks we have to resort to the timeout method to resolve a deadlock.
7.5.10 Performance Tuning Tips
- If the Unix `top' or the Windows `Task Manager' shows that the CPU usage percentage with your workload is less than 70%, your workload is probably disk-bound. Maybe you are making too many transaction commits, or the buffer pool is too small. Making the buffer pool bigger can help, but do not set it bigger than 80% of physical memory.
-
Wrap several modifications into one transaction.
InnoDBmust flush the log to disk at each transaction commit, if that transaction made modifications to the database. Since the rotation speed of a disk is typically at most 167 revolutions/second, that constrains the number of commits to the same 167/second if the disk does not fool the operating system. -
If you can afford the loss of some latest committed transactions, you can
set the `my.cnf' parameter
innodb_flush_log_at_trx_committo 0.InnoDBtries to flush the log once per second anyway, though the flush is not guaranteed. -
Make your log files big, even as big as the buffer pool. When
InnoDBhas written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files will cause many unnecessary disk writes. The drawback in big log files is that recovery time will be longer. - Also the log buffer should be quite big, say 8 MB.
-
(Relevant from 3.23.39 up.)
In some versions of Linux and Unix, flushing files to disk with the Unix
fdatasyncand other similar methods is surprisingly slow. The default methodInnoDBuses is thefdatasyncfunction. If you are not satisfied with the database write performance, you may try settinginnodb_flush_methodin `my.cnf' toO_DSYNC, thoughO_DSYNCseems to be slower on most systems. -
In importing data to
InnoDB, make sure that MySQL does not haveautocommit=1on. Then every insert requires a log flush to disk. Put before your plain SQL import file lineSET AUTOCOMMIT=0;
and after itCOMMIT;
If you use the `mysqldump' option--opt, you will get dump files which are fast to import also to anInnoDBtable, even without wrapping them to the aboveSET AUTOCOMMIT=0; ... COMMIT;wrappers. -
Beware of big rollbacks of mass inserts:
InnoDBuses the insert buffer to save disk I/O in inserts, but in a corresponding rollback no such mechanism is used. A disk-bound rollback can take 30 times the time of the corresponding insert. Killing the database process will not help because the rollback will start again at the database startup. The only way to get rid of a runaway rollback is to increase the buffer pool so that the rollback becomes CPU-bound and runs fast, or delete the wholeInnoDBdatabase. -
Beware also of other big disk-bound operations.
Use
DROP TABLEorTRUNCATE(from MySQL-4.0 up) to empty a table, notDELETE FROM yourtable. -
Use the multi-line
INSERTto reduce communication overhead between the client and the server if you need to insert many rows:INSERT INTO yourtable VALUES (1, 2), (5, 5);
This tip is of course valid for inserts into any table type, not justInnoDB.
7.5.10.1 SHOW INNODB STATUS and the InnoDB Monitors
Starting from version 3.23.42, InnoDB includes InnoDB
Monitors that print information about the InnoDB internal state.
Starting from versions 3.23.52 and 4.0.3 you can use the SQL
command SHOW INNODB STATUS
to fetch the output of the standard InnoDB Monitor to the SQL client.
The data is useful in performance tuning. If you are using the
`mysql' interactive SQL client, the output is more readable
if you replace the usual semicolon statement terminator by \G:
SHOW INNODB STATUS\G
Another way to use InnoDB Monitors is to let them continuosly
write data to the standard output of the server `mysqld'
(note: the MySQL client will not print anything).
When switched on, InnoDB Monitors print data
about once every 15 seconds. If you run `mysqld'
as a daemon then this output is usually directed to
the `.err' log in the MySQL datadir.
This data is useful in performance tuning.
On Windows you must start mysqld-max
from an MS-DOS prompt with the --console
option if you want to direct the output to the MS-DOS prompt
window.
There is a separate innodb_lock_monitor which
prints the same information as innodb_monitor
plus information on locks set by each transaction.
The printed information includes data on:
- lock waits of a transactions,
- semaphore waits of threads,
- pending file I/O requests,
- buffer pool statistics, and
-
purge and insert buffer merge activity of the main thread
of
InnoDB.
You can start InnoDB Monitor through the following SQL command:
CREATE TABLE innodb_monitor(a INT) type = innodb;
and stop it by
DROP TABLE innodb_monitor;
The CREATE TABLE syntax is just a way to pass a command
to the InnoDB engine through the MySQL SQL parser: the created
table is not relevant at all for InnoDB Monitor. If you shut down
the database when the monitor is running, and you want to start
the monitor again, you have to drop the
table before you can issue a new CREATE TABLE
to start the monitor.
This syntax may change in a future release.
A sample output of the InnoDB Monitor:
================================ 010809 18:45:06 INNODB MONITOR OUTPUT ================================ -------------------------- LOCKS HELD BY TRANSACTIONS -------------------------- LOCK INFO: Number of locks in the record hash table 1294 LOCKS FOR TRANSACTION ID 0 579342744 TABLE LOCK table test/mytable trx id 0 582333343 lock_mode IX RECORD LOCKS space id 0 page no 12758 n bits 104 table test/mytable index PRIMARY trx id 0 582333343 lock_mode X Record lock, heap no 2 PHYSICAL RECORD: n_fields 74; 1-byte offs FALSE; info bits 0 0: len 4; hex 0001a801; asc ;; 1: len 6; hex 000022b5b39f; asc ";; 2: len 7; hex 000002001e03ec; asc ;; 3: len 4; hex 00000001; ... ----------------------------------------------- CURRENT SEMAPHORES RESERVED AND SEMAPHORE WAITS ----------------------------------------------- SYNC INFO: Sorry, cannot give mutex list info in non-debug version! Sorry, cannot give rw-lock list info in non-debug version! ----------------------------------------------------- SYNC ARRAY INFO: reservation count 6041054, signal count 2913432 4a239430 waited for by thread 49627477 op. S-LOCK file NOT KNOWN line 0 Mut ex 0 sp 5530989 r 62038708 sys 2155035; rws 0 8257574 8025336; rwx 0 1121090 1848344 ----------------------------------------------------- CURRENT PENDING FILE I/O'S -------------------------- Pending normal aio reads: Reserved slot, messages 40157658 4a4a40b8 Reserved slot, messages 40157658 4a477e28 ... Reserved slot, messages 40157658 4a4424a8 Reserved slot, messages 40157658 4a39ea38 Total of 36 reserved aio slots Pending aio writes: Total of 0 reserved aio slots Pending insert buffer aio reads: Total of 0 reserved aio slots Pending log writes or reads: Reserved slot, messages 40158c98 40157f98 Total of 1 reserved aio slots Pending synchronous reads or writes: Total of 0 reserved aio slots ----------- BUFFER POOL ----------- LRU list length 8034 Free list length 0 Flush list length 999 Buffer pool size in pages 8192 Pending reads 39 Pending writes: LRU 0, flush list 0, single page 0 Pages read 31383918, created 51310, written 2985115 ---------------------------- END OF INNODB MONITOR OUTPUT ============================ 010809 18:45:22 InnoDB starts purge 010809 18:45:22 InnoDB purged 0 pages
Some notes on the output:
- If the section LOCKS HELD BY TRANSACTIONS reports lock waits, then your application may have lock contention. The output can also help to trace reasons for transaction deadlocks.
-
Section SYNC INFO will report reserved semaphores
if you compile InnoDB with
UNIV_SYNC_DEBUGdefined in `univ.i'. - Section SYNC ARRAY INFO reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A big number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries, or problems in operating system thread scheduling.
- Section CURRENT PENDING FILE I/O'S lists pending file I/O requests. A large number of these indicates that the workload is disk I/O-bound.
- Section BUFFER POOL gives you statistics on pages read and written. You can calculate from these numbers how many datafile I/Os your queries are currently doing.
7.5.11 Implementation of Multi-versioning
Since InnoDB is a multi-versioned database, it must keep information of old versions of rows in the tablespace. This information is stored in a data structure we call a rollback segment after an analogous data structure in Oracle.
InnoDB internally adds two fields to each row stored in the database. A 6-byte field tells the transaction identifier for the last transaction which inserted or updated the row. Also a deletion is internally treated as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, then the undo log record contains the information necessary to rebuild the content of the row before it was updated.
InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are only needed in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, and they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
You must remember to commit your transactions regularly, also those transactions which only issue consistent reads. Otherwise InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space need for your rollback segment.
In our multi-versioning scheme a row is not physically removed from the database immediately when you delete it with an SQL statement. Only when InnoDB can discard the update undo log record written for the deletion, it can also physically remove the corresponding row and its index records from the database. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement which did the deletion.
7.5.12 Table and Index Structures
MySQL stores its data dictionary information of tables
in `.frm'
files in database directories. But every InnoDB type table
also has its own entry in InnoDB internal data dictionaries
inside the tablespace. When MySQL drops a table or a database,
it has to delete both a `.frm' file or files, and
the corresponding entries inside the InnoDB data dictionary.
This is the reason why you cannot move InnoDB tables between
databases simply by moving the `.frm' files, and why
DROP DATABASE did not work for InnoDB type tables
in MySQL versions <= 3.23.43.
Every InnoDB table has a special index called the clustered index
where the data of the rows is stored. If you define a
PRIMARY KEY on your table, then the index of the primary key
will be the clustered index.
If you do not define a primary key for your table, InnoDB will internally generate a clustered index where the rows are ordered by the row ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field which monotonically increases as new rows are inserted. Thus the rows ordered by the row ID will be physically in the insertion order.
Accessing a row through the clustered index is fast, because the row data will be on the same page where the index search leads us. In many databases the data is traditionally stored on a different page from the index record. If a table is large, the clustered index architecture often saves a disk I/O when compared to the traditional solution.
The records in non-clustered indexes (we also call them secondary indexes), in InnoDB contain the primary key value for the row. InnoDB uses this primary key value to search for the row from the clustered index. Note that if the primary key is long, the secondary indexes will use more space.
7.5.12.1 Physical Structure of an Index
All indexes in InnoDB are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16 KB. When new records are inserted, InnoDB tries to leave 1 / 16 of the page free for future insertions and updates of the index records.
If index records are inserted in a sequential (ascending or descending) order, the resulting index pages will be about 15/16 full. If records are inserted in a random order, then the pages will be 1/2 - 15/16 full. If the fillfactor of an index page drops below 1/2, InnoDB will try to contract the index tree to free the page.
7.5.12.2 Insert Buffering
It is a common situation in a database application that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus the insertions to the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually non-unique and insertions happen in a relatively random order into secondary indexes. This would cause a lot of random disk I/Os without a special mechanism used in InnoDB.
If an index record should be inserted to a non-unique secondary index, InnoDB checks if the secondary index page is already in the buffer pool. If that is the case, InnoDB will do the insertion directly to the index page. But, if the index page is not found from the buffer pool, InnoDB inserts the record to a special insert buffer structure. The insert buffer is kept so small that it entirely fits in the buffer pool, and insertions can be made to it very fast.
The insert buffer is periodically merged to the secondary index trees in the database. Often we can merge several insertions on the same page in of the index tree, and hence save disk I/Os. It has been measured that the insert buffer can speed up insertions to a table up to 15 times.
7.5.12.3 Adaptive Hash Indexes
If a database fits almost entirely in main memory, then the fastest way to perform queries on it is to use hash indexes. InnoDB has an automatic mechanism which monitors index searches made to the indexes defined for a table, and if InnoDB notices that queries could benefit from building of a hash index, such an index is automatically built.
But note that the hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on what search pattern InnoDB observes on the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB will build hash indexes on demand to those pages of the index which are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB adapts itself to ample main memory, coming closer to the architecture of main memory databases.
7.5.12.4 Physical Record Structure
- Each index record in InnoDB contains a header of 6 bytes. The header is used to link consecutive records together, and also in the row level locking.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte field for the transaction id and a 7-byte field for the roll pointer.
- If the user has not defined a primary key for a table, then each clustered index record contains also a 6-byte row ID field.
- Each secondary index record contains also all the fields defined for the clustered index key.
- A record contains also a pointer to each field of the record. If the total length of the fields in a record is < 128 bytes, then the pointer is 1 byte, else 2 bytes.
7.5.12.5 How an AUTO_INCREMENT Column Works in InnoDB
After a database startup, when a user first does an insert to a
table T
where an auto-increment column has been defined, and the user does not provide
an explicit value for the column, then InnoDB executes SELECT
MAX(auto-inc-column) FROM T, and assigns that value incremented
by one to the column and the auto-increment counter of the table.
We say that
the auto-increment counter for table T has been initialized.
InnoDB follows the same procedure in initializing the auto-increment counter for a freshly created table.
Note that if the user specifies in an insert the value 0 to the auto-increment column, then InnoDB treats the row like the value would not have been specified.
After the auto-increment counter has been initialized, if a user inserts a row where he explicitly specifies the column value, and the value is bigger than the current counter value, then the counter is set to the specified column value. If the user does not explicitly specify a value, then InnoDB increments the counter by one and assigns its new value to the column.
The auto-increment mechanism, when assigning values from the counter, bypasses locking and transaction handling. Therefore you may also get gaps in the number sequence if you roll back transactions which have got numbers from the counter.
The behavior of auto-increment is not defined if a user gives a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
7.5.13 File Space Management and Disk I/O
7.5.13.1 Disk I/O
In disk I/O InnoDB uses asynchronous I/O. On Windows NT it uses the native asynchronous I/O provided by the operating system. On Unix, InnoDB uses simulated asynchronous I/O built into InnoDB: InnoDB creates a number of I/O threads to take care of I/O operations, such as read-ahead. In a future version we will add support for simulated aio on Windows NT and native aio on those versions of Unix which have one.
On Windows NT InnoDB uses non-buffered I/O. That means that the disk pages InnoDB reads or writes are not buffered in the operating system file cache. This saves some memory bandwidth.
Starting from 3.23.41 InnoDB uses a novel file flush technique called doublewrite. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync operations.
Doublewrite means that InnoDB before writing pages to a datafile first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed, InnoDB writes the pages to their proper positions in the datafile. If the operating system crashes in the middle of a page write, InnoDB will in recovery find a good copy of the page from the doublewrite buffer.
Starting from 3.23.41
you can also use a raw disk partition as a datafile, though this has
not been tested yet. When you create a new datafile you have
to put the keyword newraw immediately after the datafile size
in innodb_data_file_path. The partition must be at least as large
as the size that you specify. Note that 1M in InnoDB is
1024 x 1024 bytes, while in disk specifications 1 MB usually means
1000 000 bytes.
innodb_data_file_path=/dev/hdd1:5Gnewraw;/dev/hdd2:2Gnewraw
When you start the database again you must change the keyword
to raw. Otherwise, InnoDB will write over your
partition!
innodb_data_file_path=/dev/hdd1:5Graw;/dev/hdd2:2Graw
By using a raw disk you can on some versions of Unix perform unbuffered I/O.
When you use raw disk partitions, make sure they have permissions that allow read and write access to the account used for running the MySQL server.
There are two read-ahead heuristics in InnoDB: sequential read-ahead and random read-ahead. In sequential read-ahead InnoDB notices that the access pattern to a segment in the tablespace is sequential. Then InnoDB will post in advance a batch of reads of database pages to the I/O system. In random read-ahead InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool. Then InnoDB posts the remaining reads to the I/O system.
7.5.13.2 File Space Management
The datafiles you define in the configuration file form the tablespace of InnoDB. The files are simply catenated to form the tablespace, there is no striping in use. Currently you cannot define where in the tablespace your tables will be allocated. However, in a newly created tablespace, InnoDB will allocate space starting from the low end.
The tablespace consists of database pages whose default size is 16 KB. The pages are grouped into extents of 64 consecutive pages. The 'files' inside a tablespace are called segments in InnoDB. The name of the rollback segment is somewhat misleading because it actually contains many segments in the tablespace.
For each index in InnoDB we allocate two segments: one is for non-leaf nodes of the B-tree, the other is for the leaf nodes. The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.
When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.
Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.
When you issue a query SHOW TABLE STATUS FROM ... LIKE ...
to ask for available free space in the tablespace, InnoDB will
report the extents which are definitely free in the tablespace.
InnoDB always reserves some extents for clean-up and other internal
purposes; these reserved extents are not included in the free space.
When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. It depends on the pattern of deletes if that frees individual pages or extents to the tablespace, so that the freed space is available for other users. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows can be physically removed only in a purge operation after they are no longer needed in transaction rollback or consistent read.
7.5.13.3 Defragmenting a Table
If there are random insertions or deletions in the indexes of a table, the indexes may become fragmented. By fragmentation we mean that the physical ordering of the index pages on the disk is not close to the alphabetical ordering of the records on the pages, or that there are many unused pages in the 64-page blocks which were allocated to the index.
It can speed up index scans if you
periodically use mysqldump to dump the table to
a text file, drop the table, and reload it from the dump.
Another way to do the defragmenting is to
perform a 'null' alter table operation
ALTER TABLE tablename TYPE=InnoDB.
That makes MySQL to rebuild the table.
If the insertions to an index are always ascending and records are deleted only from the end, then the file space management algorithm of InnoDB guarantees that fragmentation in the index will not occur.
7.5.14 Error Handling
The error handling in InnoDB is not always the same as specified in the SQL standard. According to SQL-99, any error during an SQL statement should cause the rollback of that statement. InnoDB sometimes rolls back only part of the statement, or the whole transaction. The following list specifies the error handling of InnoDB.
-
If you run out of file space in the tablespace,
you will get the MySQL
'Table is full'error and InnoDB rolls back the SQL statement. - A transaction deadlock or a timeout in a lock wait make InnoDB to roll back the whole transaction.
-
A duplicate key error only rolls back the insert of that particular row,
even in a statement like
INSERT INTO ... SELECT .... This will probably change so that the SQL statement will be rolled back if you have not specified theIGNOREoption in your statement. - A 'row too long' error rolls back the SQL statement.
- Other errors are mostly detected by the MySQL layer of code, and they roll back the corresponding SQL statement.
7.5.15 Restrictions on InnoDB Tables
- InnoDB tables do not support full-text indexes.
- In Windows, InnoDB stores the database names and table names internally always in lower case. To move databases in a binary format from Unix to Windows or from Windows to Unix you should have all table and database names in lower case.
-
TRUNCATE table_namedoesn't reset anyAUTO_INCREMENTcounters. -
Warning: do NOT convert MySQL system tables from
MyISAM TO InnoDB tables! This is not supported; if you do this MySQL
will not restart until you restore the old system tables from a backup
or re-generate them with the
mysql_install_dbscript. -
SHOW TABLE STATUSdoes not give accurate statistics on InnoDB tables, except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimization. -
If you try to create a unique index on a prefix of a column you will get an
error:
CREATE TABLE T (A CHAR(20), B INT, UNIQUE (A(5))) TYPE = InnoDB;
If you create a non-unique index on a prefix of a column, InnoDB will create an index over the whole column. -
INSERT DELAYEDis not supported for InnoDB tables. -
The MySQL
LOCK TABLESoperation does not know of InnoDB row level locks set in already completed SQL statements: this means that you can get a table lock on a table even if there still exist transactions of other users which have row level locks on the same table. Thus your operations on the table may have to wait if they collide with these locks of other users. Also a deadlock is possible. However, this does not endanger transaction integrity, because the row level locks set by InnoDB will always take care of the integrity. Also, a table lock prevents other transactions from acquiring more row level locks (in a conflicting lock mode) on the table. - A table cannot contain more than 1000 columns.
-
DELETE FROM TABLEdoes not regenerate the table but instead deletes all rows, one by one, which is not that fast. In future versions of MySQL you can useTRUNCATEwhich is fast. - The default database page size in InnoDB is 16 KB. By recompiling the code one can set it from 8 KB to 64 KB. The maximun row length is slightly less than half of a database page in versions <= 3.23.40 of InnoDB. Starting from source release 3.23.41 BLOB and TEXT columns are allowed to be < 4 GB, the total row length must also be < 4 GB. InnoDB does not store fields whose size is <= 128 bytes on separate pages. After InnoDB has modified the row by storing long fields on separate pages, the remaining length of the row must be less than half a database page. The maximun key length is 7000 bytes.
- On some operating systems datafiles must be < 2 GB. The combined size of log files must be < 4 GB.
- The maximum tablespace size is 4 billion database pages. This is also the maximum size for a table. The minimum tablespace size is 10 MB.
-
When you restart the MySQL server, InnoDB may reuse an old value for
an
AUTO_INCREMENTcolumn. -
You cannot set the first
AUTO_INCREMENTcolumn value in InnoDB withCREATE TABLE ... AUTO_INCREMENT=...(orALTER TABLE ...). To set the value insert a dummy row with a value one less, and delete that dummy row.
7.5.16 InnoDB Change History
7.5.16.1 MySQL/InnoDB-4.1.1, December 4, 2003
- Multiple tablespaces now available for InnoDB. You can store each InnoDB type table and its indexes into a separate `.ibd' file into a MySQL database directory, into the same directory where the `.frm' file is stored.
-
The MySQL query cache now works for InnoDB tables also if
AUTOCOMMIT=0, or the statements are enclosed insideBEGIN ... COMMIT. - Reduced InnoDB memory consumption by a few megabytes if one sets the buffer pool size < 8 MB.
- You can use raw disk partitions also in Windows.
7.5.16.2 MySQL/InnoDB-4.0.16, October 22, 2003
- Fixed a bug: in contrary to what was said in the manual, in a locking read InnoDB set two record locks if a unique exact match search condition was used on a multi-column unique key. For a single column unique key it worked right.
-
Fixed a bug: if one used the rename trick
#sql... -> rsql...to recover a temporary table, InnoDB asserted inrow_mysql_lock_data_dictionary(). - There are several outstanding non-critical bugs reported in the MySQL bugs database. Their fixing has been delayed, because resources are allocated to the upcoming 4.1.1 release.
7.5.16.3 MySQL/InnoDB-3.23.58, September 15, 2003
-
Fixed a bug: InnoDB could make the index page directory corrupt in the
first B-tree page splits after `mysqld' startup. A symptom would be
an assertion failure in `page0page.c', in function
page_dir_find_slot(). -
Fixed a bug: InnoDB could in rare cases return an extraneous row if a
rollback, purge, and a
SELECTcoincided. -
Fixed a possible hang over the `btr0sea.c' latch if
SELECTwas used insideLOCK TABLES. -
Fixed a bug: if a single
DELETEstatement first managed to delete some rows and then failed in aFOREIGN KEYerror or aTable is fullerror, MySQL did not roll back the whole SQL statement as it should.
7.5.16.4 MySQL/InnoDB-4.0.15, September 10, 2003
-
Fixed a bug: if you updated a row so that the 8000 byte maximum length
(without
BLOBandTEXT) was exceeded, InnoDB simply removed the record from the clustered index. In a similar insert, InnoDB would leak reserved file space extents, which would only be freed at the next mysqld startup. -
Fixed a bug: if you used big
BLOBvalues, and your log files were relatively small, InnoDB could in a bigBLOBoperation temporarily write over the log produced after the latest checkpoint. If InnoDB would crash at that moment, then the crash recovery would fail, because InnoDB would not be able to scan the log even up to the latest checkpoint. Starting from this version, InnoDB tries to ensure the latest checkpoint is young enough. If that is not possible, InnoDB prints a warning to the `.err' log of MySQL and advises you to make the log files bigger. -
Fixed a bug: setting
innodb_fast_shutdown=0had no effect. -
Fixed a bug introduced in 4.0.13: if a
CREATE TABLEended in a comment, that could cause a memory overrun. -
Fixed a bug: If InnoDB printed
Operating system error number .. in a file operationto the `.err' log in Windows, the error number explanation was wrong. Workaround: look at section 13.2 of http://www.innodb.com/ibman.php about Windows error numbers. -
Fixed a bug: If you created a column prefix
PRIMARY KEYlike int(a CHAR(200), PRIMARY KEY (a(10)))on a fixed-lengthCHARcolumn, InnoDB would crash even in a simpleSELECT. CCHECK TABLEwould report the table as corrupt, also in the case where the created key was notPRIMARY.
7.5.16.5 MySQL/InnoDB-4.0.14, July 22, 2003
-
InnoDB now supports the
SAVEPOINTandROLLBACK TO SAVEPOINTSQL statements. See http://www.innodb.com/ibman.php#Savepoints for the syntax. -
You can now create column prefix keys like in
CREATE TABLE t (a BLOB, INDEX (a(10))). -
You can also use
O_DIRECTas theinnodb_flush_methodon the latest versions of Linux and FreeBSD. Beware of possible bugs in those operating systems, though. -
Fixed the checksum calculation of data pages. Previously most OS file
system corruption went unnoticed. Note that if you downgrade from
version >= 4.0.14 to an earlier version < 4.0.14 then in the first
startup(s) InnoDB will print warnings:
InnoDB: Warning: an inconsistent page in the doublewrite buffer InnoDB: space id 2552202359 page number 8245, 127'th page in dblwr buf.
but that is not dangerous and can be ignored. - Modified the buffer pool replacement algorithm so that it tries to flush modified pages if there are no replaceable pages in the last 10 % of the LRU list. This can reduce disk i/o if the workload is a mixture of reads and writes.
- The buffer pool checkpoint flush algorithm now tries to flush also close neighbors of the page at the end of the flush list. This can speed up database shutdown, and can also speed up disk writes if InnoDB log files are very small compared to the buffer pool size.
-
In 4.0.13 we made
SHOW INNODB STATUSto print detailed info on the latestUNIQUE KEYerror, but storing that info could slow downREPLACEsignificantly. We no longer store or print the info. -
Fixed a bug:
SET FOREIGN_KEY_CHECKS=0was not replicated properly in the MySQL replication. The fix will not be backported to 3.23. -
Fixed a bug: the parameter
innodb_max_dirty_pages_pctforgot to take into account the free pages in the buffer pool. This could lead to excessive flushing even though there were lots of free pages in the buffer pool. Workaround:SET GLOBAL innodb_max_dirty_pages_pct = 100. - Fixed a bug: if there were big index scans then a file read request could starve and InnoDB could assert because of a very long semaphore wait.
-
Fixed a bug: if
AUTOCOMMIT=1then insideLOCK TABLESMySQL failed to do the commit after an updating SQL statement if binlogging was not on, and forSELECTstatements did not commit regardless of binlogging state. - Fixed a bug: InnoDB could make the index page directory corrupt in the first B-tree page splits after a mysqld startup. A symptom would be an assertion in page0page.c, in function page_dir_find_slot().
-
Fixed a bug: if in a
FOREIGN KEYwith anUPDATE CASCADEclause the parent column was of a different internal storage length than the child column, then a cascaded update would make the column length wrong in the child table and corrupt the child table. Because of MySQL's 'silent column specification changes' a fixed-lengthCHARcolumn can change internally to aVARCHARand cause this error. -
Fixed a bug: if a non-
latin1character set was used and if in aFOREIGN KEYthe parent column was of a different internal storage length than the child column, then all inserts to the child table would fail in a foreign key error. -
Fixed a bug: InnoDB could complain that it cannot find the clustered
index record, or in rare cases return an extraneous row if a rollback,
purge, and a
SELECTcoincided. -
Fixed a possible hang over the btr0sea.c latch if
SELECTwas used insideLOCK TABLES. - Fixed a bug: contrary to what the release note of 4.0.13 said, the group commit still did not work if the MySQL binlogging was on.
- Fixed a bug: os_event_wait() did not work properly in Unix, which might have caused starvation in various log operations.
-
Fixed a bug: if a single
DELETEstatement first managed to delete some rows and then failed in aFOREIGN KEYerror or a 'Table is full error', MySQL did not roll back the whole SQL statement as it should, and also wrote the failed statement to the binlog, reporting there a non-zero error_code. -
Fixed a bug: the maximum allowed number of columns in a table is 1000, but
InnoDB did not check that limit in
CREATE TABLE, and a subsequentINSERTorSELECTfrom that table could cause an assertion.
7.5.16.6 MySQL/InnoDB-3.23.57, June 20, 2003
-
Changed the default value of
innodb_flush_log_at_trx_commitfrom 0 to 1. If you have not specified it explicitly in your `my.cnf', and your application runs much slower with this new release, it is because the value 1 causes a log flush to disk at each transaction commit. - Fixed a bug: InnoDB forgot to call pthread_mutex_destroy() when a table was dropped. That could cause memory leakage on FreeBSD and other non-Linux Unixes.
- Fixed a bug: MySQL could erroneously return 'Empty set' if InnoDB estimated an index range size to 0 records though the range was not empty; MySQL also failed to do the next-key locking in the case of an empty index range.
-
Fixed a bug:
GROUP BYandDISTINCTcould treat NULL values inequal.
7.5.16.7 MySQL/InnoDB-4.0.13, May 20, 2003
-
InnoDBnow supportsALTER TABLE DROP FOREIGN KEY. You have to useSHOW CREATE TABLEto find the internally generated foreign key ID when you want to drop a foreign key. -
SHOW INNODB STATUSnow prints detailed information of the latest detectedFOREIGN KEYandUNIQUE KEYerrors. If you do not understand whyInnoDBgives the error 150 from aCREATE TABLE, you can use this statement to study the reason. -
ANALYZE TABLEnow works also forInnoDBtype tables. It makes 10 random dives to each of the index trees and updates index cardinality estimates accordingly. Note that since it is only an estimate, repeated runs ofANALYZE TABLEmay produce different numbers. MySQL uses index cardinality estimates only in join optimization. If some join is not optimized in the right way, you may try usingANALYZE TABLE. -
InnoDBgroup commit capability now works also when MySQL binlogging is switched on. There have to be > 2 client threads for the group commit to become active. -
Changed the default value of
innodb_flush_log_at_trx_commitfrom 0 to 1. If you have not specified it explicitly in your `my.cnf', and your application runs much slower with this new release, it is because the value 1 causes a log flush to disk at each transaction commit. -
Added a new global settable MySQL system variable
innodb_max_dirty_pages_pct. It is an integer in the range 0 - 100. The default is 90. The main thread inInnoDBtries to flush pages from the buffer pool so that at most this many percents are not yet flushed at any time. -
If
innodb_force_recovery=6, do not letInnoDBdo repair of corrupt pages based on the doublewrite buffer. -
InnoDBstart-up now happens faster because it does not set the memory in the buffer pool to zero. -
Fixed a bug: The
InnoDBparser forFOREIGN KEYdefinitions was confused by the keywords 'foreign key' inside MySQL comments. -
Fixed a bug: If you dropped a table to which there was a
FOREIGN KEYreference, and later created the same table with non-matching column types,InnoDBcould assert in `dict0load.c', in functiondict_load_table(). -
Fixed a bug:
GROUP BYandDISTINCTcould treatNULLvalues as not equal. MySQL also failed to do the next-key locking in the case of an empty index range. -
Fixed a bug: Do not commit the current transaction when a MyISAM
table is updated; this also makes
CREATE TABLEnot to commit anInnoDBtransaction, even when binlogging is enabled. -
Fixed a bug: We did not allow
ON DELETE SET NULLto modify the same table where the delete was made; we can allow it because that cannot produce infinite loops in cascaded operations. -
Fixed a bug: Allow
HANDLER PREVandNEXTalso after positioning the cursor with a unique search on the primary key. -
Fixed a bug: If
MIN()orMAX()resulted in a deadlock or a lock wait timeout, MySQL did not return an error, but returnedNULLas the function value. -
Fixed a bug:
InnoDBforgot to callpthread_mutex_destroy()when a table was dropped. That could cause memory leakage on FreeBSD and other non-Linux Unix systems.
7.5.16.8 MySQL/InnoDB-4.1.0, April 3, 2003
-
InnoDBnow supports up to 64 GB of buffer pool memory in a Windows 32-bit Intel computer. This is possible becauseInnoDBcan use the AWE extension of Windows to address memory over the 4 GB limit of a 32-bit process. A new startup variableinnodb_buffer_pool_awe_mem_mbenables AWE and sets the size of the buffer pool in megabytes. -
Reduced the size of buffer headers and the lock table.
InnoDBuses 2 % less memory.
7.5.16.9 MySQL/InnoDB-3.23.56, March 17, 2003
- Fixed a major bug in InnoDB query optimization: queries of type SELECT ... WHERE indexcolumn < x and SELECT ... WHERE indexcolumn > x could cause a table scan even if the selectivity would have been very good.
- Fixed a potential bug if MySQL calls store_lock with TL_IGNORE in the middle of a query.
7.5.16.10 MySQL/InnoDB-4.0.12, March 18, 2003
- In crash recovery InnoDB now prints the progress in percents of a transaction rollback.
- Fixed a bug/feature: if your application program used mysql_use_result(), and used >= 2 connections to send SQL queries, it could deadlock on the adaptive hash S-latch in btr0sea.c. Now mysqld releases the S-latch whenever it passes data from a SELECT to the client.
- Fixed a bug: MySQL could erroneously return 'Empty set' if InnoDB estimated an index range size to 0 records though the range was not empty; MySQL also failed to do the next-key locking in the case of an empty index range.
7.5.16.11 MySQL/InnoDB-4.0.11, February 25, 2003
- Fixed a bug introduced in 4.0.10: SELECT ... FROM ... ORDER BY ... DESC could hang in an infinite loop.
- An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication.
7.5.16.12 MySQL/InnoDB-4.0.10, February 4, 2003
- In INSERT INTO t1 SELECT ... FROM t2 WHERE ... MySQL previously set a table level read lock on t2. This lock is now removed.
- Increased SHOW INNODB STATUS max printed length to 200 KB.
- Fixed a major bug in InnoDB query optimization: queries of type SELECT ... WHERE indexcolumn < x and SELECT ... WHERE indexcolumn > x could cause a table scan even if the selectivity would have been very good.
- Fixed a bug: purge could cause a hang in a BLOB table where the primary key index tree was of height 1. Symptom: semaphore waits caused by an X-latch set in btr_free_externally_stored_field().
- Fixed a bug: using InnoDB HANDLER commands on a fresh handle crashed mysqld in ha_innobase::change_active_index().
- Fixed a bug: if MySQL estimated a query in the middle of a SELECT statement, InnoDB could hang on the adaptive hash index latch in btr0sea.c.
- Fixed a bug: InnoDB could report table corruption and assert in page_dir_find_owner_slot() if an adaptive hash index search coincided with purge or an insert.
- Fixed a bug: some file system snapshot tool in Windows 2000 could cause an InnoDB file write to fail with error 33 ERROR_LOCK_VIOLATION. In synchronous writes InnoDB now retries the write 100 times at 1 second intervals.
- Fixed a bug: REPLACE INTO t1 SELECT ... did not work if t1 has an auto-inc column.
- An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication.
7.5.16.13 MySQL/InnoDB-3.23.55, January 24, 2003
- In INSERT INTO t1 SELECT ... FROM t2 WHERE ... MySQL previously set a table level read lock on t2. This lock is now removed.
- Fixed a bug: if the combined size of InnoDB log files was >= 2 GB in a 32-bit computer, InnoDB would write log in a wrong position. That could make crash recovery and InnoDB Hot Backup to fail in log scan.
- Fixed a bug: index cursor restoration could theoretically fail.
- Fixed a bug: an assertion in btr0sea.c, in function btr_search_info_update_slow could theoretically fail in a race of 3 threads.
- Fixed a bug: purge could cause a hang in a BLOB table where the primary key index tree was of height 1. Symptom: semaphore waits caused by an X-latch set in btr_free_externally_stored_field().
- Fixed a bug: if MySQL estimated a query in the middle of a SELECT statement, InnoDB could hang on the adaptive hash index latch in btr0sea.c.
- Fixed a bug: InnoDB could report table corruption and assert in page_dir_find_owner_slot() if an adaptive hash index search coincided with purge or an insert.
- Fixed a bug: some file system snapshot tool in Windows 2000 could cause an InnoDB file write to fail with error 33 ERROR_LOCK_VIOLATION. In synchronous writes InnoDB now retries the write 100 times at 1 second intervals.
- An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication. The fix will appear in 4.0.11 and will probably not be backported to 3.23.
- Fixed bug in InnoDB page0cur.c in function page_cur_search_with_match which caused InnoDB to remain on the same page forever. This bug is evident only in tables with more than one page.
7.5.16.14 MySQL/InnoDB-4.0.9, January 14, 2003
- Removed the warning message: 'InnoDB: Out of memory in additional memory pool.'
- Fixed a bug: if the combined size of InnoDB log files was >= 2 GB in a 32-bit computer, InnoDB would write log in a wrong position. That could make crash recovery and InnoDB Hot Backup to fail.
- Fixed a bug: index cursor restoration could theoretically fail.
7.5.16.15 MySQL/InnoDB-4.0.8, January 7, 2003
- InnoDB now supports also FOREIGN KEY (...) REFERENCES ...(...) [ON UPDATE CASCADE | ON UPDATE SET NULL | ON UPDATE RESTRICT | ON UPDATE NO ACTION].
- Tables and indexes now reserve 4 % less space in the tablespace. Also existing tables reserve less space. By upgrading to 4.0.8 you will see more free space in "InnoDB free" in SHOW TABLE STATUS.
- Fixed bugs: updating the PRIMARY KEY of a row would generate a foreign key error on all FOREIGN KEYs which referenced secondary keys of the row to be updated. Also, if a referencing FOREIGN KEY constraint only referenced the first columns in an index, and there were more columns in that index, updating the additional columns generated a foreign key error.
- Fixed a bug: if an index contains some column twice, and that column is updated, the table will become corrupt. From now on InnoDB prevents creation of such indexes.
- Fixed a bug: removed superfluous error 149 and 150 printouts from the .err log when a locking SELECT caused a deadlock or a lock wait timeout.
- Fixed a bug: an assertion in btr0sea.c, in function btr_search_info_update_slow could theoretically fail in a race of 3 threads.
- Fixed a bug: one could not switch a session transaction isolation level back to REPEATABLE READ after setting it to something else.
7.5.16.16 MySQL/InnoDB-4.0.7, December 26, 2002
- InnoDB in 4.0.7 is essentially the same as in 4.0.6.
7.5.16.17 MySQL/InnoDB-4.0.6, December 19, 2002
- Since innodb_log_arch_dir has no relevance under MySQL, there is no need to specify it any more in my.cnf.
- LOAD DATA INFILE in AUTOCOMMIT=1 mode no longer does implicit commits for each 1 MB of written binlog.
- Fixed a bug introduced in 4.0.4: LOCK TABLES ... READ LOCAL should not set row locks on the rows read. This caused deadlocks and lock wait timeouts in mysqldump.
- Fixed two bugs introduced in 4.0.4: in AUTO_INCREMENT, REPLACE could cause the counter to be left 1 too low. A deadlock or a lock wait timeout could cause the same problem.
- Fixed a bug: TRUNCATE on a TEMPORARY table crashed InnoDB.
- Fixed a bug introduced in 4.0.5: if binlogging was not switched on, INSERT INTO ... SELECT ... or CREATE TABLE ... SELECT ... could cause InnoDB to hang on a semaphore created in btr0sea.c, line 128. Workaround: switch binlogging on.
- Fixed a bug: in replication issuing SLAVE STOP in the middle of a multi-statement transaction could cause that SLAVE START would only perform a part of the transaction. A similar error could occur if the slave crashed and was restarted.
7.5.16.18 MySQL/InnoDB-3.23.54, December 12, 2002
- Fixed a bug: the InnoDB range estimator greatly exaggerated the size of a short index range if the paths to the endpoints of the range in the index tree happened to branch already in the root. This could cause unnecessary table scans in SQL queries.
- Fixed a bug: ORDER BY could fail if you had not created a primary key to a table, but had defined several indexes of which at least one was a UNIQUE index with all its columns declared as NOT NULL.
- Fixed a bug: a lock wait timeout in connection with ON DELETE CASCADE could cause corruption in indexes.
- Fixed a bug: if a SELECT was done with a unique key from a primary index, and the search matched to a delete-marked record, InnoDB could erroneously return the NEXT record.
- Fixed a bug introduced in 3.23.53: LOCK TABLES ... READ LOCAL should not set row locks on the rows read. This caused deadlocks and lock wait timeouts in mysqldump.
- Fixed a bug: if an index contains some column twice, and that column is updated, the table will become corrupt. From now on InnoDB prevents creation of such indexes.
7.5.16.19 MySQL/InnoDB-4.0.5, November 18, 2002
- InnoDB now supports also transaction isolation levels READ COMMITTED and READ UNCOMMITTED. READ COMMITTED more closely emulates Oracle and makes porting of applications from Oracle to MySQL easier.
- Deadlock resolution is now selective: we try to pick as victims transactions with less modified or inserted rows.
- FOREIGN KEY definitions are now aware of the lower_case_table_names setting in my.cnf.
- SHOW CREATE TABLE does not output the database name to a FOREIGN KEY definition if the referred table is in the same database as the table.
- InnoDB does a consistency check to most index pages before writing them to a datafile.
- If you set innodb_force_recovery > 0, InnoDB tries to jump over corrupt index records and pages when doing SELECT * FROM table. This helps in dumping.
- InnoDB now again uses asynchronous unbuffered I/O in Windows 2000 and XP; only unbuffered simulated async I/O in NT, 95/98/ME.
- Fixed a bug: the InnoDB range estimator greatly exaggerated the size of a short index range if the paths to the endpoints of the range in the index tree happened to branch already in the root. This could cause unnecessary table scans in SQL queries. The fix will also be backported to 3.23.54.
- Fixed a bug present in 3.23.52, 4.0.3, 4.0.4: InnoDB startup could take very long or even crash on some Windows 95/98/ME computers.
- Fixed a bug: the AUTO-INC lock was held to the end of the transaction if it was granted after a lock wait. This could cause unnecessary deadlocks.
- Fixed a bug: if SHOW INNODB STATUS, innodb_monitor, or innodb_lock_monitor had to print several hundred transactions in one report, and the output became truncated, InnoDB would hang, printing to the error log many waits for a mutex created at srv0srv.c, line 1621.
- Fixed a bug: SHOW INNODB STATUS on Unix always reported average file read size as 0 bytes.
- Fixed a potential bug in 4.0.4: InnoDB now does ORDER BY ... DESC like MyISAM.
- Fixed a bug: DROP TABLE could cause crash or a hang if there was a rollback concurrently running on the table. The fix will only be backported to 3.23 if this appears a real problem for users.
- Fixed a bug: ORDER BY could fail if you had not created a primary key to a table, but had defined several indexes of which at least one was a UNIQUE index with all its columns declared as NOT NULL.
- Fixed a bug: a lock wait timeout in connection with ON DELETE CASCADE could cause corruption in indexes.
- Fixed a bug: if a SELECT was done with a unique key from a primary index, and the search matched to a delete-marked record, InnoDB could return the NEXT record.
- Outstanding bugs: in 4.0.4 two bugs were introduced to AUTO_INCREMENT. REPLACE can cause the counter to be left 1 too low. A deadlock or a lock wait timeout can cause the same problem. These will be fixed in 4.0.6.
7.5.16.20 MySQL/InnoDB-3.23.53, October 9, 2002
- We again use unbuffered disk I/O to datafiles in Windows. Windows XP and Windows 2000 read performance seems to be very poor with normal I/O.
- Tuned range estimator so that index range scans are preferred over full index scans.
- Allow dropping and creating a table even if innodb_force_recovery is set. One can use this to drop a table which would cause a crash in rollback or purge, or if a failed table import causes a runaway rollback in recovery.
- Fixed a bug present in 3.23.52, 4.0.3, 4.0.4: InnoDB startup could take very long or even crash on some Windows 95/98/ME computers.
- Fixed a bug: fast shutdown (which is the default) sometimes was slowed down by purge and insert buffer merge.
- Fixed a bug: doing a big SELECT from a table where no rows were visible in a consistent read could cause a very long (> 600 seconds) semaphore wait in btr0cur.c line 310.
- Fixed a bug: the AUTO-INC lock was held to the end of the transaction if it was granted after a lock wait. This could cause unnecessary deadlocks.
- Fixed a bug: if you created a temporary table inside LOCK TABLES, and used that temporary table, that caused an assertion failure in ha_innobase.cc.
- Fixed a bug: if SHOW INNODB STATUS, innodb_monitor, or innodb_lock_monitor had to print several hundred transactions in one report, and the output became truncated, InnoDB would hang, printing to the error log many waits for a mutex created at srv0srv.c, line 1621.
- Fixed a bug: SHOW INNODB STATUS on Unix always reported average file read size as 0 bytes.
7.5.16.21 MySQL/InnoDB-4.0.4, October 2, 2002
- We again use unbuffered disk I/O in Windows. Windows XP and Windows 2000 read performance seems to be very poor with normal I/O.
- Increased the max key length of InnoDB tables from 500 to 1024 bytes.
- Increased the table comment field in SHOW TABLE STATUS so that up to 16000 characters of foreign key definitions can be printed there.
- The auto-increment counter is no longer incremented if an insert of a row immediately fails in an error.
- Allow dropping and creating a table even if innodb_force_recovery is set. One can use this to drop a table which would cause a crash in rollback or purge, or if a failed table import causes a runaway rollback in recovery.
- Fixed a bug: Using ORDER BY primarykey DESC in 4.0.3 causes an assertion failure in btr0pcur.c, line 203.
- Fixed a bug: fast shutdown (which is the default) sometimes was slowed down by purge and insert buffer merge.
- Fixed a bug: doing a big SELECT from a table where no rows were visible in a consistent read could cause a very long (> 600 seconds) semaphore wait in btr0cur.c line 310.
- Fixed a bug: if the MySQL query cache was used, it did not get invalidated by a modification done by ON DELETE CASCADE or ...SET NULL.
- Fixed a bug: if you created a temporary table inside LOCK TABLES, and used that temporary table, that caused an assertion failure in ha_innodb.cc.
- Fixed a bug: if you set innodb_flush_log_at_trx_commit to 1, SHOW VARIABLES would show its value as 16 million.
7.5.16.22 MySQL/InnoDB-4.0.3, August 28, 2002
- Removed unnecessary deadlocks when inserts have to wait for a locking read, update, or delete to release its next-key lock.
-
The MySQL
HANDLERSQL commands now work also forInnoDBtype tables.InnoDBdoes theHANDLERreads always as consistent reads.HANDLERis a direct access path to read individual indexes of tables. In some casesHANDLERcan be used as a substitute of server-side cursors. - Fixed a bug in 4.0.2: even a simple insert could crash the AIX version.
- Fixed a bug: if you used in a table name characters whose code is > 127, in DROP TABLE InnoDB could assert on line 155 of pars0sym.c.
- Compilation from source now provides a working version both on HP-UX-11 and HP-UX-10.20. The source of 4.0.2 worked only on 11, and the source of 3.23.52 only on 10.20.
- Fixed a bug: if compiled on 64-bit Solaris, InnoDB produced a bus error at startup.
7.5.16.23 MySQL/InnoDB-3.23.52, August 16, 2002
- The feature set of 3.23 will be frozen from this version on. New features will go the 4.0 branch, and only bug fixes will be made to the 3.23 branch.
- Many CPU-bound join queries now run faster. On Windows also many other CPU-bound queries run faster.
- A new SQL command SHOW INNODB STATUS returns the output of the InnoDB Monitor to the client. The InnoDB Monitor now prints detailed information on the latest detected deadlock.
- InnoDB made the SQL query optimizer to avoid too much index-only range scans and choose full table scans instead. This is now fixed.
-
BEGINandCOMMITare now added in the binlog around transactions. The MySQL replication now respects transaction borders: a user will no longer see half transactions in replication slaves. - A replication slave now prints in crash recovery the last master binlog position it was able to recover to.
- A new setting innodb_flush_log_at_trx_commit=2 makes InnoDB to write the log to the operating system file cache at each commit. This is almost as fast as the setting innodb_flush_log_at_trx_commit=0, and the setting 2 also has the nice feature that in a crash where the operating system does not crash, no committed transaction is lost. If the operating system crashes or there is a power outage, then the setting 2 is no safer than the setting 0.
- Added checksum fields to log blocks.
- SET FOREIGN_KEY_CHECKS=0 helps in importing tables in an arbitrary order which does not respect the foreign key rules.
- SET UNIQUE_CHECKS=0 speeds up table imports into InnoDB if you have UNIQUE constraints on secondary indexes. This flag should be used only if you are certain that the input records contain no UNIQUE constraint violations.
- SHOW TABLE STATUS now lists also possible ON DELETE CASCADE or ON DELETE SET NULL in the comment field of the table.
- When CHECK TABLE is run on any InnoDB type table, it now checks also the adaptive hash index for all tables.
- If you defined ON DELETE CASCADE or SET NULL and updated the referenced key in the parent row, InnoDB deleted or updated the child row. This is now changed to conform to SQL-92: you get the error 'Cannot delete parent row'.
- Improved the auto-increment algorithm: now the first insert or SHOW TABLE STATUS initializes the auto-increment counter for the table. This removes almost all surprising deadlocks caused by SHOW TABLE STATUS.
- Aligned some buffers used in reading and writing to datafiles. This allows using unbuffered raw devices as datafiles in Linux.
- Fixed a bug: If you updated the primary key of a table so that only the case of characters changed, that could cause assertion failures, mostly in page0page.ic line 515.
- Fixed a bug: If you delete or update a row referenced in a foreign key constraint and the foreign key check has to wait for a lock, then the check may report an erroneous result. This affects also the ON DELETE... operation.
- Fixed a bug: A deadlock or a lock wait timeout error in InnoDB causes InnoDB to roll back the whole transaction, but MySQL could still write the earlier SQL statements to the binlog, even though InnoDB rolled them back. This could, for example, cause replicated databases to get out-of-sync.
- Fixed a bug: If the database happened to crash in the middle of a commit, then the recovery might leak tablespace pages.
- Fixed a bug: If you specified a non-latin1 character set in my.cnf, then, in contrary to what is stated in the manual, in a foreign key constraint a string type column had to have the same length specification in the referencing table and the referenced table.
- Fixed a bug: DROP TABLE or DROP DATABASE could fail if there simultaneously was a CREATE TABLE running.
- Fixed a bug: If you configured the buffer pool bigger than 2 GB in a 32-bit computer, InnoDB would assert in buf0buf.ic line 214.
- Fixed a bug: on 64-bit computers updating rows which contained the SQL NULL in some column could cause the undo log and the ordinary log to become corrupt.
- Fixed a bug: innodb_log_monitor caused a hang if it suppressed lock prints for a page.
- Fixed a bug: in the HP-UX-10.20 version mutexes would leak and cause race conditions and crashes in any part of InnoDB code.
- Fixed a bug: if you ran in the AUTOCOMMIT mode, executed a SELECT, and immediately after that a RENAME TABLE, then RENAME would fail and MySQL would complain about error 1192.
- Fixed a bug: if compiled on 64-bit Solaris, InnoDB produced a bus error at startup.
7.5.16.24 MySQL/InnoDB-4.0.2, July 10, 2002
- InnoDB is essentially the same as InnoDB-3.23.51.
- If no innodb_data_file_path is specified, InnoDB at the database creation now creates a 10 MB auto-extending datafile ibdata1 to the datadir of MySQL. In 4.0.1 the file was 64 MB and not auto-extending.
7.5.16.25 MySQL/InnoDB-3.23.51, June 12, 2002
- Fixed a bug: a join could result in a seg fault in copying of a BLOB or TEXT column if some of the BLOB or TEXT columns in the table contained SQL NULL values.
- Fixed a bug: if you added self-referential foreign key constraints with ON DELETE CASCADE to tables and a row deletion caused InnoDB to attempt the deletion of the same row twice because of a cascading delete, then you got an assertion failure.
- Fixed a bug: if you use MySQL 'user level locks' and close a connection, then InnoDB may assert in ha_innobase.cc, line 302.
7.5.16.26 MySQL/InnoDB-3.23.50, April 23, 2002
- InnoDB now supports an auto-extending last datafile. You do not need to preallocate the whole datafile at the database startup.
- Made several changes to facilitate the use of the InnoDB Hot Backup tool. It is a separate non-free tool you can use to take online backups of your database without shutting down the server or setting any locks.
- If you want to run the InnoDB Hot Backup tool on an auto-extending datafile you have to upgrade it to version ibbackup-0.35.
- The log scan phase in crash recovery will now run much faster.
- Starting from this server version, the hot backup tool truncates unused ends in the backup InnoDB datafiles.
- To allow the hot backup tool to work, on Windows we no longer use unbuffered I/O or native async I/O; instead we use the same simulated async I/O as on Unix.
- You can now define the ON DELETE CASCADE or ON DELETE SET NULL clause on foreign keys.
- FOREIGN KEY constraints now survive ALTER TABLE and CREATE INDEX.
- We suppress the FOREIGN KEY check if any of the column values in the foreign key or referenced key to be checked is the SQL NULL. This is compatible with Oracle, for example.
- SHOW CREATE TABLE now lists also foreign key constraints. Also mysqldump no longer forgets about foreign keys in table definitions.
- You can now add a new foreign key constraint with ALTER TABLE ... ADD CONSTRAINT FOREIGN KEY (...) REFERENCES ... (...).
- FOREIGN KEY definitions now allow backquotes around table and column names.
- MySQL command SET TRANSACTION ISOLATION LEVEL ... has now the following effect on InnoDB tables: if a transaction is defined as SERIALIZABLE then InnoDB conceptually adds LOCK IN SHARE MODE to all consistent reads. If a transaction is defined to have any other isolation level, then InnoDB obeys its default locking strategy which is REPEATABLE READ.
- SHOW TABLE STATUS no longer sets an x-lock at the end of an auto-increment index if the auto-increment counter has already been initialized. This removes in almost all cases the surprising deadlocks caused by SHOW TABLE STATUS.
- Fixed a bug: in a CREATE TABLE statement the string 'foreign' followed by a non-space character confused the FOREIGN KEY parser and caused table creation to fail with errno 150.
7.5.16.27 MySQL/InnoDB-3.23.49, February 17, 2002
- Fixed a bug: if you called DROP DATABASE for a database on which there simultaneously were running queries, the MySQL server could crash or hang. Crashes fixed, but a full fix has to wait some changes in the MySQL layer of code.
- Fixed a bug: on Windows one had to put the database name in lower case for DROP DATABASE to work. Fixed in 3.23.49: case no longer matters on Windows. On Unix the database name remains case-sensitive.
- Fixed a bug: if one defined a non-latin1 character set as the default character set, then definition of foreign key constraints could fail in an assertion failure in dict0crea.c, reporting an internal error 17.
7.5.16.28 MySQL/InnoDB-3.23.48, February 9, 2002
- Tuned the SQL optimizer to favor more often index searches over table scans.
- Fixed a performance problem when several large SELECT queries are run concurrently on a multiprocessor Linux computer. Large CPU-bound SELECT queries will now also generally run faster on all platforms.
- If MySQL binlogging is used, InnoDB now prints after crash recovery the latest MySQL binlog file name and the position in that file (= byte offset) InnoDB was able to recover to. This is useful, for example, when resynchronizing a master and a slave database in replication.
- Added better error messages to help in installation problems.
- One can now recover also MySQL temporary tables which have become orphaned inside the InnoDB tablespace.
- InnoDB now prevents a FOREIGN KEY declaration where the signedness is not the same in the referencing and referenced integer columns.
- Fixed a bug: calling SHOW CREATE TABLE or SHOW TABLE STATUS could cause memory corruption and make mysqld to crash. Especially at risk was mysqldump, because it calls frequently SHOW CREATE TABLE.
- Fixed a bug: if on Unix you did an ALTER TABLE to an InnoDB table and simultaneously did queries to it, mysqld could crash with an assertion failure in row0row.c, line 474.
- Fixed a bug: if inserts to several tables containing an auto-inc column were wrapped inside one LOCK TABLES, InnoDB asserted in lock0lock.c.
- In 3.23.47 we allowed several NULLS in a UNIQUE secondary index. But CHECK TABLE was not relaxed: it reports the table as corrupt. CHECK TABLE no longer complains in this situation.
- Fixed a bug: on Sparc and other high-endian processors SHOW VARIABLES showed innodb_flush_log_at_trx_commit and other boolean-valued startup parameters always OFF even if they were switched on.
- Fixed a bug: if you ran mysqld-max-nt as a service on Windows NT/2000, the service shutdown did not always wait long enough for the InnoDB shutdown to finish.
7.5.16.29 MySQL/InnoDB-3.23.47, December 28, 2001
- Recovery happens now faster, especially in a lightly loaded system, because background checkpointing has been made more frequent.
- InnoDB allows now several similar key values in a UNIQUE secondary index if those values contain SQL NULLs. Thus the convention is now the same as in MyISAM tables.
- InnoDB gives a better row count estimate for a table which contains BLOBs.
- In a FOREIGN KEY constraint InnoDB is now case-insensitive to column names, and in Windows also to table names.
- InnoDB allows a FOREIGN KEY column of CHAR type to refer to a column of VARCHAR type, and vice versa. MySQL silently changes the type of some columns between CHAR and VARCHAR, and these silent changes do not hinder FOREIGN KEY declaration any more.
- Recovery has been made more resilient to corruption of log files.
- Unnecessary statistics calculation has been removed from queries which generate a temporary table. Some ORDER BY and DISTINCT queries will now run much faster.
- MySQL now knows that the table scan of an InnoDB table is done through the primary key. This will save a sort in some ORDER BY queries.
- The maximum key length of InnoDB tables is again restricted to 500 bytes. The MySQL interpreter is not able to handle longer keys.
- The default value of innodb_lock_wait_timeout was changed from infinite to 50 seconds, the default value of innodb_file_io_threads from 9 to 4.
7.5.16.30 MySQL/InnoDB-4.0.1, December 23, 2001
- InnoDB is the same as in 3.23.47.
- In 4.0.0 the MySQL interpreter did not know the syntax LOCK IN SHARE MODE. This has been fixed.
- In 4.0.0 multiple-table delete did not work for transactional tables. This has been fixed.
7.5.16.31 MySQL/InnoDB-3.23.46, November 30, 2001
- This is the same as 3.23.45.
7.5.16.32 MySQL/InnoDB-3.23.45, November 23, 2001
- This is a bugfix release.
- In versions 3.23.42-.44 when creating a table on Windows you have to use lower case letters in the database name to be able to access the table. Fixed in 3.23.45.
- InnoDB now flushes stdout and stderr every 10 seconds: if these are redirected to files, the file contents can be better viewed with an editor.
- Fixed an assertion failure in .44, in trx0trx.c, line 178 when you drop a table which has the .frm file but does not exist inside InnoDB.
- Fixed a bug in the insert buffer. The insert buffer tree could get into an inconsistent state, causing a crash, and also crashing the recovery. This bug could appear especially in large table imports or alterations.
- Fixed a bug in recovery: InnoDB could go into an infinite loop constantly printing a warning message that it cannot find free blocks from the buffer pool.
- Fixed a bug: when you created a temporary table of the InnoDB type, and then used ALTER TABLE to it, the MySQL server could crash.
- Prevented creation of MySQL system tables 'mysql.user', 'mysql.host', or 'mysql.db', in the InnoDB type.
- Fixed a bug which can cause an assertion failure in 3.23.44 in srv0srv.c, line 1728.
7.5.16.33 MySQL/InnoDB-3.23.44, November 2, 2001
- You can define foreign key constraints on InnoDB tables. An example: FOREIGN KEY (col1) REFERENCES table2(col2).
- You can create > 4 GB datafiles in those file systems that allow it.
- Improved InnoDB monitors, including a new innodb_table_monitor which allows you to print the contents of the InnoDB internal data dictionary.
- DROP DATABASE will now work also for InnoDB tables.
- Accent characters in the default character set latin1 will be ordered according to the MySQL ordering.<br> NOTE: if you are using latin1 and have inserted characters whose code is > 127 to an indexed CHAR column, you should run CHECK TABLE on your table when you upgrade to 3.23.43, and drop and reimport the table if CHECK TABLE reports an error!
- InnoDB will calculate better table cardinality estimates.
- Change in deadlock resolution: in .43 a deadlock rolls back only the SQL statement, in .44 it will roll back the whole transaction.
- Deadlock, lock wait timeout, and foreign key constraint violations (no parent row, child rows exist) now return native MySQL error codes 1213, 1205, 1216, 1217, respectively.
- A new my.cnf parameter innodb_thread_concurrency helps in performance tuning in high concurrency environments.
- A new my.cnf option innodb_force_recovery will help you in dumping tables from a corrupted database.
- A new my.cnf option innodb_fast_shutdown will speed up shutdown. Normally InnoDB does a full purge and an insert buffer merge at shutdown.
- Raised maximum key length to 7000 bytes from a previous limit of 500 bytes.
- Fixed a bug in replication of auto-inc columns with multiline inserts.
- Fixed a bug when the case of letters changes in an update of an indexed secondary column.
- Fixed a hang when there are > 24 datafiles.
- Fixed a crash when MAX(col) is selected from an empty table, and col is a not the first column in a multi-column index.
- Fixed a bug in purge which could cause crashes.
7.5.16.34 MySQL/InnoDB-3.23.43, October 4, 2001
- This is essentially the same as InnoDB-3.23.42.
7.5.16.35 MySQL/InnoDB-3.23.42, September 9, 2001
- Fixed a bug which corrupted the table if the primary key of a > 8000-byte row was updated.
- There are now 3 types of InnoDB Monitors: innodb_monitor, innodb_lock_monitor, and innodb_tablespace_monitor. innodb_monitor now prints also buffer pool hit rate and the total number of rows inserted, updated, deleted, read.
- Fixed a bug in RENAME TABLE.
- Fixed a bug in replication with an auto-increment column.
7.5.16.36 MySQL/InnoDB-3.23.41, August 13, 2001
- Support for < 4 GB rows. The previous limit was 8000 bytes.
- Use the doublewrite file flush method.
- Raw disk partitions supported as datafiles.
- InnoDB Monitor.
-
Several hang bugs fixed and an
ORDER BYbug ('Sort aborted') fixed.
7.5.16.37 MySQL/InnoDB-3.23.40, July 16, 2001
- Only a few rare bugs fixed.
7.5.16.38 MySQL/InnoDB-3.23.39, June 13, 2001
-
CHECK TABLEnow works forInnoDBtables. -
A new `my.cnf' parameter
innodb_unix_file_flush_methodintroduced. It can be used to tune disk write performance. - An auto-increment column now gets new values past the transaction mechanism. This saves CPU time and eliminates transaction deadlocks in new value assignment.
- Several bug fixes, most notably the rollback bug in 3.23.38.
7.5.16.39 MySQL/InnoDB-3.23.38, May 12, 2001
-
The new syntax
SELECT ... LOCK IN SHARE MODEis introduced. -
InnoDBnow callsfsync()after every disk write and calculates a checksum for every database page it writes or reads, which will reveal disk defects. - Several bug fixes.
7.5.17 InnoDB Contact Information
Contact information of Innobase Oy, producer of the InnoDB engine.
Web site: http://www.innodb.com/.
Email: sales@innodb.com
phone: 358-9-6969 3250 (office) 358-40-5617367 (mobile) Innobase Oy Inc. World Trade Center Helsinki Aleksanterinkatu 17 P.O.Box 800 00101 Helsinki Finland
7.6 BDB or BerkeleyDB Tables
7.6.1 Overview of BDB Tables
BerkeleyDB, available at http://www.sleepycat.com/ has provided
MySQL with a transactional storage engine. Support for this storage engine is
included in the MySQL source distribution starting from version 3.23.34 and is
activated in the MySQL-Max binary. This storage engine is typically called
BDB for short.
BDB tables may have a greater chance of surviving crashes and are also
capable of COMMIT and ROLLBACK operations on transactions.
The MySQL source distribution comes with a BDB distribution that has a
couple of small patches to make it work more smoothly with MySQL.
You can't use a non-patched BDB version with MySQL.
We at MySQL AB are working in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high.
When it comes to supporting BDB tables, we are committed to help our
users to locate the problem and help creating a reproducible test case
for any problems involving BDB tables. Any such test case will be
forwarded to Sleepycat who in turn will help us find and fix the
problem. As this is a two-stage operation, any problems with BDB tables
may take a little longer for us to fix than for other storage engines.
However, as the BerkeleyDB code itself has been used by many other
applications than MySQL, we don't envision any big problems with
this. See section 1.4.1 Support Offered by MySQL AB.
7.6.2 Installing BDB
If you have downloaded a binary version of MySQL that includes
support for BerkeleyDB, simply follow the instructions for installing a
binary version of MySQL.
See section 2.2.6 Installing a MySQL Binary Distribution. See section 4.8.5 mysqld-max, An Extended mysqld Server.
To compile MySQL with Berkeley DB support, download MySQL
Version 3.23.34 or newer and configure MySQL with the
--with-berkeley-db option. See section 2.3 Installing a MySQL Source Distribution.
cd /path/to/source/of/mysql-3.23.34 ./configure --with-berkeley-db
Please refer to the manual provided with the BDB distribution for
more updated information.
Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered gamma quality. We are actively improving and optimizing it to make it stable very soon.
7.6.3 BDB Startup Options
If you are running with AUTOCOMMIT=0 then your changes in BDB
tables will not be updated until you execute COMMIT. Instead of commit
you can execute ROLLBACK to forget your changes. See section 6.7.1 START TRANSACTION, COMMIT, and ROLLBACK Syntax.
If you are running with AUTOCOMMIT=1 (the default), your changes
will be committed immediately. You can start an extended transaction with
the BEGIN WORK SQL command, after which your changes will not be
committed until you execute COMMIT (or decide to ROLLBACK
the changes).
The following options to mysqld can be used to change the behavior of
BDB tables:
| Option | Description |
--bdb-home=directory | Base directory for BDB tables. This should be the same directory you use for --datadir.
|
--bdb-lock-detect=# | Berkeley lock detection method. One of (DEFAULT, OLDEST, RANDOM, or YOUNGEST).
|
--bdb-logdir=directory | Berkeley DB log file directory. |
--bdb-no-sync | Don't synchronously flush logs. |
--bdb-no-recover | Don't start Berkeley DB in recover mode. |
--bdb-shared-data | Start Berkeley DB in multi-process mode (Don't use DB_PRIVATE when initializing Berkeley DB)
|
--bdb-tmpdir=directory | Berkeley DB temporary file directory. |
--skip-bdb | Disable usage of BDB tables.
|
-O bdb_max_lock=1000 | Set the maximum number of locks possible. See section 4.6.8.4 SHOW VARIABLES.
|
If you use --skip-bdb, MySQL will not initialize the
Berkeley DB library and this will save a lot of memory. Of course,
you cannot use BDB tables if you are using this option. If you try
to create a BDB table, MySQL will instead create a MyISAM table.
Normally you should start mysqld without --bdb-no-recover if you
intend to use BDB tables. This may, however, give you problems when you
try to start mysqld if the BDB log files are corrupted. See section 2.4.2 Starting and Troubleshooting the MySQL Server.
With bdb_max_lock you can specify the maximum number of locks
(10000 by default) you can have active on a BDB table. You should
increase this if you get errors of type bdb: Lock table is out of
available locks or Got error 12 from ... when you have do long
transactions or when mysqld has to examine a lot of rows to
calculate the query.
You may also want to change binlog_cache_size and
max_binlog_cache_size if you are using big multi-line transactions.
See section 6.7.1 START TRANSACTION, COMMIT, and ROLLBACK Syntax.
7.6.4 Characteristics of BDB Tables
-
To be able to rollback transactions, the
BDBstorage engine maintains log files. For maximum performance you should place these on another disk than your databases by using the--bdb-logdiroption. -
MySQL performs a checkpoint each time a new
BDBlog file is started, and removes any log files that are not needed for current transactions. One can also runFLUSH LOGSat any time to checkpoint the Berkeley DB tables. For disaster recovery, one should use table backups plus MySQL's binary log. See section 4.5.1 Database Backups. Warning: If you delete old log files that are in use,BDBwill not be able to do recovery at all and you may lose data if something goes wrong. -
MySQL requires a
PRIMARY KEYin eachBDBtable to be able to refer to previously read rows. If you don't create one, MySQL will create an maintain a hiddenPRIMARY KEYfor you. The hidden key has a length of 5 bytes and is incremented for each insert attempt. -
If all columns you access in a
BDBtable are part of the same index or part of the primary key, then MySQL can execute the query without having to access the actual row. In aMyISAMtable the above holds only if the columns are part of the same index. -
The
PRIMARY KEYwill be faster than any other key, as thePRIMARY KEYis stored together with the row data. As the other keys are stored as the key data + thePRIMARY KEY, it's important to keep thePRIMARY KEYas short as possible to save disk and get better speed. -
LOCK TABLESworks onBDBtables as with other tables. If you don't useLOCK TABLE, MySQL will issue an internal multiple-write lock on the table to ensure that the table will be properly locked if another thread issues a table lock. -
Internal locking in
BDBtables is done on page level. -
SELECT COUNT(*) FROM table_nameis slow asBDBtables doesn't maintain a count of the number of rows in the table. -
Sequential scanning is slower than with
MyISAMtables as the data inBDBtables stored in B-trees and not in a separate datafile. -
The application must always be prepared to handle cases where
any change of a
BDBtable may make an automatic rollback and any read may fail with a deadlock error. -
Keys are not prefix or suffix-compressed like keys in
MyISAMtables. In other words, the key information will take a little more space inBDBtables compared toMyISAMtables. -
There are often holes in the
BDBtable to allow you to insert new rows in the middle of the key tree. This makesBDBtables somewhat larger thanMyISAMtables. -
The optimizer needs to know an approximation of the number of rows in
the table. MySQL solves this by counting inserts and
maintaining this in a separate segment in each
BDBtable. If you don't issue a lot ofDELETEorROLLBACKstatements, this number should be accurate enough for the MySQL optimizer, but as MySQL only stores the number on close, it may be incorrect if MySQL dies unexpectedly. It should not be fatal even if this number is not 100% correct. One can update the number of rows by executingANALYZE TABLEorOPTIMIZE TABLE. See section 4.6.2ANALYZE TABLESyntax . See section 4.6.1OPTIMIZE TABLESyntax. -
If you get full disk with a
BDBtable, you will get an error (probably error 28) and the transaction should roll back. This is in contrast withMyISAMandISAMtables wheremysqldwill wait for enough free disk before continuing.
7.6.5 Things We Need to Fix for BDB in the Near Future
-
It's very slow to open many
BDBtables at the same time. If you are going to useBDBtables, you should not have a very big table cache (like >256) and you should use--no-auto-rehashwith themysqlclient. We plan to partly fix this in 4.0. -
SHOW TABLE STATUSdoesn't yet provide that much information forBDBtables. - Optimize performance.
- Change to not use page locks at all when we are scanning tables.
7.6.6 Operating Systems Supported by BDB
Currently we know that the BDB storage engine works with the following
operating systems:
- Linux 2.x Intel
- Sun Solaris (SPARC and x86)
- FreeBSD 4.x/5.x (x86, sparc64)
- IBM AIX 4.3.x
- SCO OpenServer
- SCO UnixWare 7.1.x
It doesn't work with the following operating systems:
- Linux 2.x Alpha
- Linux 2.x AMD64
- Linux 2.x IA64
- Linux 2.x s390
- Max OS X
Note: The above list is not complete; we will update it as we receive more information.
If you build MySQL with support for BDB tables and get
the following error in the log file when you start mysqld:
bdb: architecture lacks fast mutexes: applications cannot be threaded Can't init databases
This means that BDB tables are not supported for your architecture.
In this case you must rebuild MySQL without BDB table support.
7.6.7 Restrictions on BDB Tables
Here follows the restrictions you have when using BDB tables:
-
BDBtables store in the `.db' file the path to the file as it was created. (This was done to be able to detect locks in a multi-user environment that supports symlinks). The effect of this is thatBDBtables are not movable between directories! -
When taking backups of
BDBtables, you have to either usemysqldumpor take a backup of alltable_name.dbfiles and theBDBlog files. TheBDBlog files are the files in the base data directory namedlog.XXXXXXXXXX(ten digits); TheBDBstorage engine stores unfinished transactions in the log files and requires these logs to be present whenmysqldstarts.
7.6.8 Errors That May Occur When Using BDB Tables
-
If you get the following error in the
hostname.err logwhen startingmysqld:bdb: Ignoring log file: .../log.XXXXXXXXXX: unsupported log version #
it means that the newBDBversion doesn't support the old log file format. In this case you have to delete allBDBlogs from your database directory (the files with names that have the formatlog.XXXXXXXXXX) and restartmysqld. We would also recommend you to do amysqldump --optof your oldBDBtables, delete the old tables, and restore the dump. -
If you are not running in auto-commit mode and delete a table that is referenced
in another transaction, you may get the following error messages in your MySQL
error log:
001119 23:43:56 bdb: Missing log fileid entry 001119 23:43:56 bdb: txn_abort: Log undo failed for LSN: 1 3644744: InvalidThis is not fatal but we don't recommend that you delete tables if you are not in auto-commit mode, until this problem is fixed (the fix is not trivial).
Go to the first, previous, next, last section, table of contents.
