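In MySQL 5.7 the in-memory structures behind InnoDB asynchronous I/O changed somewhat. In particular, the async I/O array is no longer a plain struct but a class named AIO. The substance is unchanged, though: a number of functions were simply wrapped as methods, and each actual async I/O request now lives in a Slot struct. AIO and Slot correspond to os_aio_array_t and os_aio_slot_t in 5.6 respectively. I will not go through the meaning of every member here, because Taobao's kernel monthly report has already documented them and 5.7 is essentially the same.
This article covers only the Linux native part and does not analyze the asynchronous I/O that InnoDB simulates itself, because it is rarely used and my own ability is limited. A few concepts still need to be made clear first:
mysql> show variables like '%io_threads%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| innodb_read_io_threads  | 2     |
| innodb_write_io_threads | 2     |
+-------------------------+-------+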
In this database I configured 2 read async I/O threads and 2 write async I/O threads. In addition there is one log async I/O thread and one ibuf (insert buffer) async I/O thread, so 6 async I/O threads in total can be found in the database:
mysql> select a.thd_id, b.THREAD_OS_ID, a.user, a.conn_id, b.TYPE, a.source, a.program_name
       from sys.processlist a, performance_schema.threads b
       where b.thread_id = a.thd_id and user like '%io%';
+--------+--------------+------------------------+---------+------------+--------------------+--------------+
| thd_id | THREAD_OS_ID | user                   | conn_id | TYPE       | source             | program_name |
+--------+--------------+------------------------+---------+------------+--------------------+--------------+
|      3 |        14059 | innodb/io_ibuf_thread  |    NULL | BACKGROUND | NULL               | NULL         |
|      4 |        14060 | innodb/io_log_thread   |    NULL | BACKGROUND | sync0debug.cc:1296 | NULL         |
|      5 |        14061 | innodb/io_read_thread  |    NULL | BACKGROUND | NULL               | NULL         |
|      6 |        14062 | innodb/io_read_thread  |    NULL | BACKGROUND | NULL               | NULL         |
|      7 |        14063 | innodb/io_write_thread |    NULL | BACKGROUND | sync0debug.cc:1296 | NULL         |
|      8 |        14064 | innodb/io_write_thread |    NULL | BACKGROUND | NULL               | NULL         |
+--------+--------------+------------------------+---------+------------+--------------------+--------------+
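Each AIO object corresponds to one type of asynchronous I/O: ibuf, log, read and write each have their own AIO object. At the end of the class definition, a static member pointer refers to each of them:
/** Insert buffer */
static AIO* s_ibuf;
/** Redo log */
static AIO* s_log;
/** Reads */
static AIO* s_reads;
/** Writes */
static AIO* s_writes;
There are actually 6 async I/O threads, because s_reads covers 2 threads and s_writes covers 2 threads. This is where the notion of a local segment comes in: each thread corresponds to one local segment. Under each AIO object hangs a vector of Slot, whose size depends on the number of threads (local segments) of that type and on the maximum number of slots per thread. The source defines that per-thread maximum as: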
8 * OS_AIO_N_PENDING_IOS_PER_THREAD
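where the macro OS_AIO_N_PENDING_IOS_PER_THREAD is defined as 32, so each thread has at most 8 * 32 = 256 slots.
s_ibuf and s_log each have only one thread (local segment), so each has 256 slots. s_reads and s_writes each have 2 threads (local segments) in my database, so each has 2 * 256 = 512 slots.
Unless stated otherwise, "segment" in this article always means a local segment.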
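The slot arithmetic above can be checked with a few lines of C++. This is only a sketch: the constant carries the value quoted in the article, while slots_per_segment() and total_slots() are hypothetical helper names introduced for illustration and are not InnoDB functions.

```cpp
#include <cassert>
#include <cstddef>

// Value quoted in the article for the InnoDB macro of the same name.
const std::size_t OS_AIO_N_PENDING_IOS_PER_THREAD = 32;

// Per-segment limit, computed the same way as `limit` in os_aio_init().
std::size_t slots_per_segment() {
    return 8 * OS_AIO_N_PENDING_IOS_PER_THREAD;
}

// Hypothetical helper: total slots owned by one AIO array that serves
// n_segments threads (local segments).
std::size_t total_slots(std::size_t n_segments) {
    assert(n_segments > 0);  // every AIO array serves at least one thread
    return n_segments * slots_per_segment();
}
```

With the settings used in this article, total_slots(1) gives the size of the s_ibuf and s_log arrays, and total_slots(2) gives the size of s_reads and s_writes.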
Initialization of the whole memory structure starts from the following code, which is called from innobase_start_or_create_for_mysql:
if (!os_aio_init(srv_n_read_io_threads,
                 srv_n_write_io_threads,
                 SRV_MAX_N_PENDING_SYNC_IOS)) {

    ib::error() << "Cannot initialize AIO sub-system";

    return(srv_init_abort(DB_ERROR));
}
In other words it simply calls os_aio_init. Let us analyze it layer by layer.
Call stack in this database:
#0  os_aio_init (n_readers=2, n_writers=2, n_slots_sync=100) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6734
#1  0x0000000001b8dde1 in innobase_start_or_create_for_mysql () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0start.cc:1792
Source code with comments:
bool
os_aio_init(
    ulint n_readers,
    ulint n_writers,
    ulint n_slots_sync)
{
    /* Maximum number of pending aio operations allowed per segment */
    ulint limit = 8 * OS_AIO_N_PENDING_IOS_PER_THREAD;
    // This is where limit is defined: the number of slots
    // per thread (local segment), i.e. 256
    .....
    return(AIO::start(limit, n_readers, n_writers, n_slots_sync));
}
The main job of this function is to call AIO::start, which we look at next.
Call stack in this database:
#1  0x0000000001a7db18 in AIO::start (n_per_seg=256, n_readers=2, n_writers=2, n_slots_sync=100) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6610
#2  0x0000000001a7e289 in os_aio_init (n_readers=2, n_writers=2, n_slots_sync=100) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6762
Source code with comments:
bool
AIO::start(
    ulint n_per_seg,    // number of slots per segment
    ulint n_readers,    // number of async read I/O threads
    ulint n_writers,    // number of async write I/O threads
    ulint n_slots_sync)
{
#if defined(LINUX_NATIVE_AIO)
    /* Check if native aio is supported on this system and tmpfs */
    // The decision combines the innodb_use_native_aio setting with a test of
    // whether native AIO actually works. Even if the parameter is ON,
    // srv_use_native_aio is set to FALSE when the test fails.
    if (srv_use_native_aio && !is_linux_native_aio_supported()) {

        ib::warn() << "Linux Native AIO disabled."; // a frequently seen warning: native AIO is unavailable

        srv_use_native_aio = FALSE;
    }
#endif /* LINUX_NATIVE_AIO */

    srv_reset_io_thread_op_info(); /* reset the state of every thread to "not started" */

    /* Initialize the AIO read array. Here n_readers = 2 read threads
    and n_per_seg = 256 slots per thread. */
    s_reads = create(
        LATCH_ID_OS_AIO_READ_MUTEX, n_readers * n_per_seg, n_readers);

    if (s_reads == NULL) {
        return(false);
    }

    /* Read-only check: in read-only mode the log and ibuf async threads
    are not used, which is controlled by srv_read_only_mode ? 0 : 2.
    The thread names are set next. Without log and ibuf the readers
    start at index 0, otherwise at index 2. The arrays involved are:

    Array of English strings describing the current state of an
    i/o handler thread
    const char* srv_io_thread_op_info[SRV_MAX_N_IO_THREADS];
    const char* srv_io_thread_function[SRV_MAX_N_IO_THREADS];
    */
    ulint start = srv_read_only_mode ? 0 : 2;

    ulint n_segs = n_readers + start;

    /* 0 is the ibuf segment and 1 is the redo log segment. */
    for (ulint i = start; i < n_segs; ++i) {
        ut_a(i < SRV_MAX_N_IO_THREADS);
        srv_io_thread_function[i] = "read thread";
    }

    ulint n_segments = n_readers;

    /* Unless the server is read-only, the ibuf and log AIO
    structures are initialized here */
    if (!srv_read_only_mode) {

        s_ibuf = create(LATCH_ID_OS_AIO_IBUF_MUTEX, n_per_seg, 1);

        if (s_ibuf == NULL) {
            return(false);
        }

        ++n_segments;

        srv_io_thread_function[0] = "insert buffer thread";

        s_log = create(LATCH_ID_OS_AIO_LOG_MUTEX, n_per_seg, 1);

        if (s_log == NULL) {
            return(false);
        }

        ++n_segments;

        srv_io_thread_function[1] = "log thread";

    } else {
        s_ibuf = s_log = NULL;
    }

    /* The write thread AIO structure is initialized the same way */
    s_writes = create(
        LATCH_ID_OS_AIO_WRITE_MUTEX, n_writers * n_per_seg, n_writers);

    if (s_writes == NULL) {
        return(false);
    }

    n_segments += n_writers; // final number of global segments = 2 + 1 + 1 + 2 = 6

    for (ulint i = start + n_readers; i < n_segments; ++i) {
        ut_a(i < SRV_MAX_N_IO_THREADS);
        srv_io_thread_function[i] = "write thread";
    }

    ut_ad(n_segments >= static_cast<ulint>(srv_read_only_mode ? 2 : 4));

    s_sync = create(LATCH_ID_OS_AIO_SYNC_MUTEX, n_slots_sync, 1);
    /* The purpose of this one still needs a closer look; n_slots_sync = 100 */

    if (s_sync == NULL) {
        return(false);
    }

    os_aio_n_segments = n_segments; // = 6

    os_aio_validate();

    // Allocate the events; an event wraps a cond and a mutex
    os_aio_segment_wait_events = static_cast<os_event_t*>(
        ut_zalloc_nokey(
            n_segments * sizeof *os_aio_segment_wait_events));
    // Memory for n_segments events is allocated here, 6 in this case.
    /* Array of events used in simulated AIO
    static os_event_t* os_aio_segment_wait_events = NULL;
    The pointer ends up in this file-level static variable. */

    if (os_aio_segment_wait_events == NULL) {
        return(false);
    }

    /* Initialize them. This shows clearly that every segment, that is
    every thread, has its own condition variable and mutex. They are
    used by simulated async I/O, for example in
    AIO::wake_simulated_handler_thread. */
    for (ulint i = 0; i < n_segments; ++i) {
        os_aio_segment_wait_events[i] = os_event_create(0);
    }

    os_last_printout = ut_time();

    return(true);
}
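The global segment numbering that AIO::start sets up can be reproduced in a small stand-alone sketch. It only mirrors the ordering visible in the code above (ibuf, log, then read threads, then write threads, with ibuf and log absent in read-only mode). The function name io_thread_functions and the use of std::vector are assumptions made for illustration; InnoDB fills the fixed array srv_io_thread_function instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the role of each global segment, indexed
// the same way AIO::start fills srv_io_thread_function[].
std::vector<std::string> io_thread_functions(std::size_t n_readers,
                                             std::size_t n_writers,
                                             bool read_only) {
    assert(n_readers > 0 && n_writers > 0);

    std::vector<std::string> names;

    // Segments 0 and 1 exist only when the server is not read-only.
    if (!read_only) {
        names.push_back("insert buffer thread");  // global segment 0
        names.push_back("log thread");            // global segment 1
    }

    // Read threads follow, then write threads.
    names.insert(names.end(), n_readers, std::string("read thread"));
    names.insert(names.end(), n_writers, std::string("write thread"));

    return names;
}
```

With 2 readers and 2 writers this yields 6 entries in normal mode, matching the 6 threads listed earlier, and 4 entries in read-only mode.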
As we can see, the core work is done by AIO::create. After AIO::create succeeds, the pointer to each type of AIO object is assigned to the corresponding static class member pointer for later use.
Call stack in this database:
#0  AIO::create (id=LATCH_ID_OS_AIO_READ_MUTEX, n=512, n_segments=2) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6529
#1  0x0000000001a7db18 in AIO::start (n_per_seg=256, n_readers=2, n_writers=2, n_slots_sync=100) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6610
Source code with comments:
AIO*
AIO::create(
    latch_id_t id,
    ulint n,            // number of slots this type of AIO object should hold
    ulint n_segments)   // number of threads (segments)
{
    if ((n % n_segments)) { // sanity check: n must be a multiple of n_segments

        ib::error()
            << "Maximum number of AIO operations must be "
            << "divisible by number of segments";

        return(NULL);
    }

    AIO* array = UT_NEW_NOKEY(AIO(id, n, n_segments)); // step 1: call the constructor AIO(id, n, n_segments)

    if (array != NULL && array->init() != DB_SUCCESS) { // step 2: call array->init()

        UT_DELETE(array);

        array = NULL;
    }

    return(array);
}
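The shape of AIO::create (validate, construct, then initialize, and discard the object if initialization fails) can be sketched in isolation. Everything here is a stand-in: SlotArray, its members and create_slot_array are hypothetical names, std::unique_ptr replaces UT_NEW_NOKEY/UT_DELETE, and the extra check for zero segments is an addition that the original code does not make.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the AIO class: a vector of slots shared by
// a fixed number of segments.
struct SlotArray {
    SlotArray(std::size_t n, std::size_t segs) : slots(n), n_segments(segs) {}

    // Second construction phase; fails if there are no slots at all.
    bool init() { return !slots.empty(); }

    std::size_t slots_per_segment() const {
        assert(n_segments > 0);
        return slots.size() / n_segments;
    }

    std::vector<int> slots;   // placeholder element type for Slot
    std::size_t n_segments;
};

// Validate, construct, then initialize. Returns an empty pointer on failure.
std::unique_ptr<SlotArray> create_slot_array(std::size_t n,
                                             std::size_t n_segments) {
    // n must divide evenly among the segments (zero segments is rejected
    // too, which the original code leaves to the caller).
    if (n_segments == 0 || n % n_segments != 0) {
        return std::unique_ptr<SlotArray>();
    }

    std::unique_ptr<SlotArray> array(new SlotArray(n, n_segments));

    if (!array->init()) {
        return std::unique_ptr<SlotArray>();  // array is freed here
    }

    return array;
}
```

Splitting construction from init() lets the factory report a failure through its return value, which is the same reason the original returns NULL rather than a half-initialized object.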
This function does two main things: it constructs the object through AIO(id, n, n_segments), and it calls array->init().
First, AIO(id, n, n_segments).
Call stack in this database:
#0  AIO::AIO (this=0x32ea658, id=LATCH_ID_OS_AIO_READ_MUTEX, n=512, segments=2) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6396
#1  0x0000000001a7d862 in AIO::create (id=LATCH_ID_OS_AIO_READ_MUTEX, n=512, n_segments=2) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6538
Source code with comments:
AIO::AIO(
    latch_id_t id,
    ulint n,          // number of slots this type of AIO object should hold
    ulint segments)   // number of threads (segments)
    :
    m_slots(n),               // allocates the slots: 512 here, because I have 2 io_read threads with 256 slots each
    m_n_segments(segments),   // number of segments: 2 here, one per io_read thread
    m_n_reserved(),           // set to 0
    m_aio_ctx(),              // set to NULL
    m_events(m_slots.size())  // the events array is sized to the number of slots
{
    ....
    mutex_create(id, &m_mutex); // create the mutex from the given id; it protects the array against concurrent use by several threads

    m_not_full = os_event_create("aio_not_full"); // create an event, which wraps a condition variable and a mutex
    m_is_empty = os_event_create("aio_is_empty"); // create an event, which wraps a condition variable and a mutex

    memset(&m_slots[0], 0x0, sizeof(m_slots[0]) * m_slots.size()); // zero the whole slot memory

#ifdef LINUX_NATIVE_AIO
    memset(&m_events[0], 0x0, sizeof(m_events[0]) * m_events.size()); // zero the whole events memory; this is what the io_getevents call needs
#endif /* LINUX_NATIVE_AIO */

    os_event_set(m_is_empty); // wake, by broadcast, every thread blocked on m_is_empty
}
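The comments above describe an event as a wrapper around a condition variable and a mutex, with os_event_set waking all waiters by broadcast. A minimal sketch of that idea using the C++ standard library follows. The Event class and its method names are hypothetical and much simpler than InnoDB's os_event; it only illustrates the set/wait relationship.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical minimal event: a flag guarded by a mutex, plus a condition
// variable that waiters block on until the flag is set.
class Event {
public:
    // Set the flag and wake every waiter (a broadcast).
    void set() {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_is_set = true;
        m_cond.notify_all();
    }

    // Clear the flag so later waiters block again.
    void reset() {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_is_set = false;
    }

    // Block until the flag is set; returns at once if it already is.
    void wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_is_set; });
        assert(m_is_set);  // the predicate holds while the lock is held
    }

    bool is_set() {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_is_set;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_is_set = false;
};
```

In this picture, setting m_is_empty right after construction means a thread that waits for the array to be empty does not block on a freshly created, empty array.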
After this constructor, most members of AIO have been initialized:
m_slots/m_n_segments/m_n_reserved/m_aio_ctx/m_events/m_mutex/m_not_full/m_is_empty. Some of them do not hold anything meaningful yet, m_aio_ctx for example. Next we look at the second step, AIO::init().
Call stack in this database:
#0  AIO::init (this=0x32ea658) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6493
#1  0x0000000001a7d8a2 in AIO::create (id=LATCH_ID_OS_AIO_READ_MUTEX, n=512, n_segments=2) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6540
Source code with comments:
/** Initialise the array */
dberr_t
AIO::init()
{
    ut_a(!m_slots.empty()); // assert that the slot array is not empty; it cannot be unless something went wrong

#ifdef _WIN32
    ut_a(m_handles == NULL);

    m_handles = UT_NEW_NOKEY(Handles(m_slots.size()));
#endif /* _WIN32 */

    if (srv_use_native_aio) { // not the raw parameter, but the combined result of the parameter and InnoDB's native AIO support test described earlier
#ifdef LINUX_NATIVE_AIO
        dberr_t err = init_linux_native_aio(); // step 1: if innodb_use_native_aio is on and native AIO is supported, call init_linux_native_aio()

        if (err != DB_SUCCESS) {
            return(err);
        }
#endif /* LINUX_NATIVE_AIO */
    }

    return(init_slots()); // step 2: call init_slots() to initialize the slot memory
}
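This function also does two main things: it calls init_linux_native_aio() when native AIO is in use, and it calls init_slots().
We look at the AIO::init_linux_native_aio() call first.
Call stack in this database:
#0  AIO::init_linux_native_aio (this=0x32ea658) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6460
#1  0x0000000001a7d701 in AIO::init (this=0x32ea658) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6503
Source code with comments: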
/** Initialise the Linux Native AIO interface */
dberr_t
AIO::init_linux_native_aio()
{
    /* Initialize the io_context array. One io_context
    per segment in the array. */

    ut_a(m_aio_ctx == NULL);

    m_aio_ctx = static_cast<io_context**>(
        ut_zalloc_nokey(m_n_segments * sizeof(*m_aio_ctx)));
    // By now the members of the AIO object are initialized. m_n_segments is the
    // number of threads this AIO object serves: 1 for ibuf and for redo, and 2
    // each for write and read on this system, as set by the parameters. The
    // element type is io_context_t, so each thread gets one io_context_t,
    // which Linux native AIO requires.

    if (m_aio_ctx == NULL) {
        return(DB_OUT_OF_MEMORY);
    }

    io_context** ctx = m_aio_ctx;
    ulint max_events = slots_per_segment();
    // Maximum number of events per thread, used to initialize io_context_t.
    // It is return(m_slots.size() / m_n_segments), i.e. 256.

    for (ulint i = 0; i < m_n_segments; ++i, ++ctx) {
        // Initialize the io_context_t of each thread through io_setup; the
        // maximum number of events in its queue is
        // m_slots.size() / m_n_segments, i.e. 256.

        if (!linux_create_io_ctx(max_events, ctx)) { // the main job of linux_create_io_ctx is to initialize the io_context_t
            /* If something bad happened during aio setup
            we should call it a day and return right away.
            We don't care about any leaks because a failure
            to initialize the io subsystem means that the
            server (or atleast the innodb storage engine)
            is not going to startup. */
            return(DB_IO_ERROR);
        }
    }

    return(DB_SUCCESS); // success: io_context_t, i.e. m_aio_ctx in the AIO object, is now initialized
}
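The "one context per segment" idea can be tried outside InnoDB with a small Linux-only sketch. Note the differences from the code above: InnoDB goes through linux_create_io_ctx and libaio, whereas this sketch invokes the raw io_setup and io_destroy system calls so that it needs no extra library, and it uses the kernel type aio_context_t instead of libaio's io_context_t. The function names create_contexts and destroy_contexts are hypothetical.

```cpp
#include <linux/aio_abi.h>  // aio_context_t
#include <sys/syscall.h>    // SYS_io_setup, SYS_io_destroy
#include <unistd.h>         // syscall

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: create one kernel AIO context per segment, each able
// to hold max_events in-flight requests. Stops at the first failure, so a
// result shorter than n_segments means io_setup was refused.
std::vector<aio_context_t> create_contexts(std::size_t n_segments,
                                           unsigned max_events) {
    assert(max_events > 0);

    std::vector<aio_context_t> ctxs;

    for (std::size_t i = 0; i < n_segments; ++i) {
        aio_context_t ctx = 0;  // must be zero before io_setup

        if (syscall(SYS_io_setup, max_events, &ctx) != 0) {
            break;
        }

        ctxs.push_back(ctx);
    }

    return ctxs;
}

// Release every context that was created.
void destroy_contexts(std::vector<aio_context_t>& ctxs) {
    for (aio_context_t ctx : ctxs) {
        syscall(SYS_io_destroy, ctx);
    }

    ctxs.clear();
}
```

Calling create_contexts(2, 256) corresponds to what the read array needs in this article's configuration. Whether it succeeds depends on the kernel and on limits such as the system-wide cap on AIO events, so callers should compare the returned size with n_segments.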
At this point the io_context_t structures have been allocated. Next AIO::init_slots() is called, which we analyze now.
Call stack in this database:
#0  AIO::init_slots (this=0x32ea658) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6421
#1  0x0000000001a7d71b in AIO::init (this=0x32ea658) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/os/os0file.cc:6512
Source code with comments:
/** Initialise the slots */
dberr_t
AIO::init_slots()
{
    for (ulint i = 0; i < m_slots.size(); ++i) {
        Slot& slot = m_slots[i]; // a reference to the i-th element

        slot.pos = static_cast<uint16_t>(i); // numbered 0 to m_slots.size() - 1, so pos is the index of each slot

        slot.is_reserved = false; // initialize

#ifdef WIN_ASYNC_IO // Windows handling, not analyzed here

        slot.handle = CreateEvent(NULL, TRUE, FALSE, NULL);

        OVERLAPPED* over = &slot.control;

        over->hEvent = slot.handle;

        (*m_handles)[i] = over->hEvent;

#elif defined(LINUX_NATIVE_AIO)

        slot.ret = 0;     // ret set to 0

        slot.n_bytes = 0; // n_bytes set to 0

        memset(&slot.control, 0x0, sizeof(slot.control));
        // Zero the iocb structure. It was in fact already zeroed in AIO::AIO,
        // because the iocb is embedded in the slot rather than referenced
        // through a pointer.

#endif /* WIN_ASYNC_IO */
    }

    return(DB_SUCCESS);
}
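At this point the iocb structures for Linux native async I/O have been allocated, 256 * (number of segments) of them per AIO object, and zero-initialized.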
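Since every slot is numbered by pos and each segment owns slots_per_segment() slots, a slot index can be mapped back to its local segment. The sketch below assumes the slots are divided into contiguous, equal blocks per segment, which is what the division m_slots.size() / m_n_segments suggests but which the code shown in this article does not itself spell out. The function name local_segment_of is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: local segment that owns slot `pos`, assuming the
// n_slots slots are split into n_segments contiguous blocks of equal size.
std::size_t local_segment_of(std::size_t pos,
                             std::size_t n_slots,
                             std::size_t n_segments) {
    assert(n_segments > 0 && n_slots % n_segments == 0);
    assert(pos < n_slots);

    const std::size_t per_segment = n_slots / n_segments;  // 256 in this article

    return pos / per_segment;
}
```

Under that assumption, in the read array of this article (512 slots, 2 segments) slots 0 to 255 belong to local segment 0 and slots 256 to 511 belong to local segment 1.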
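The following simple diagram shows the memory layout after initialization:
With initialization done, the next question is how these structures are actually used. That needs careful analysis and will be covered in a separate article.
Finally, here are the conditions that must all be met in order to use Linux native AIO. None of them can be skipped, but they are easy to satisfy.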
Reposted from: http://nlbel.baihongyu.com/