Exadata – I/O statistics

In some cases you may need more detailed information about the I/O on a specific cell node. You can use documented tools and commands like CellCLI> list activerequest, cellsrvstat, iostat and so on. But in most cases you need more specific information about your I/O requests and want to know exactly what is happening on the cell node.
Fortunately cellsrv provides quite a good set of statistics, which I'll try to cover in a few words here.

This is a short list of the commands:
I/O latency statistics:
alter cell events="immediate cellsrv.cellsrv_dump('iolstats',0)";
alter cell events="immediate cellsrv.cellsrv_resetstats('iolstats')";

I/O latency statistics – last 384 sec:
alter cell events="immediate cellsrv.cellsrv_dump('iolhiststats',0)";

I/O reason statistics:
alter cell events="immediate cellsrv.cellsrv_dump('ioreasons',0)";
alter cell events="immediate cellsrv.cellsrv_resetstats('ioreasons')";

Basic I/O statistics:
alter cell events="immediate cellsrv.cellsrv_dump('devio_stats',0)";

Predicate I/O statistics:
alter cell events="immediate cellsrv.cellsrv_dump('predicateio',0)";
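
All of these dumps follow the same pattern: the alter cell events command writes a trace file under the cell's diag directory, and CellCLI prints the path to it. Below is a minimal Python sketch, purely illustrative, that fires one of the dumps through cellcli -e and extracts the trace file path from that output; it assumes it runs locally on the cell node with cellcli on the PATH, and the dump_stats helper name is my own.

    import re
    import subprocess

    # Minimal sketch: run one of the dump commands via "cellcli -e" and pull the
    # trace file path out of the "Dump sequence #N has been written to <path>" line.
    # Assumes this runs locally on the cell node and cellcli is on the PATH.
    def dump_stats(stat_name: str) -> str:
        command = 'alter cell events="immediate cellsrv.cellsrv_dump(\'%s\',0)"' % stat_name
        output = subprocess.run(["cellcli", "-e", command],
                                capture_output=True, text=True, check=True).stdout
        match = re.search(r"has been written to (\S+\.trc)", output)
        return match.group(1) if match else output

    # Example usage:
    # print(dump_stats("iolstats"))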

  • I/O latency statistics
    The following command will generate a trace file with latency statistics for each celldisk on the current cell node:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_dump('iolstats',0)";
    Dump sequence #40 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell01/trace/svtrc_30850_2.trc
    Cell cell01 successfully altered
    
    CellCLI>
    

    Trace file content:

    IO length (bytes):          Num read IOs:       Num write IOs:
    [   8192 -   16383)                    34                 1879
    [  16384 -   32767)                  1824                 5048
    [  32768 -   65535)                     0                   26
    [  65536 -  131071)                     0                    9
    
    Average IO-latency distribution stats for CDisk FD_00_cell01
    
    Number of Reads iosize-latency distribution
    IO len(B)\IO lat(us) || [      128 | [      256 | [      512 | [     1024 | [     2048 |
                         ||       255) |       511) |      1023) |      2047) |      4095) |
    ---------------------||------------|------------|------------|------------|------------|
    [    8192,    16383) ||          3 |         23 |          0 |          7 |          1 |
    [   16384,    32767) ||        689 |        409 |          6 |        180 |        540 |
    
    Number of Writes iosize-latency distribution
    IO len(B)\IO lat(us) || [       64 | [      128 | [      256 | [      512 | [     1024 | [     2048 | [     4096 |
                         ||       127) |       255) |       511) |      1023) |      2047) |      4095) |      8191) |
    ---------------------||------------|------------|------------|------------|------------|------------|------------|
    [    8192,    16383) ||         70 |       1715 |         14 |          1 |         38 |         41 |          0 |
    [   16384,    32767) ||          0 |       2000 |        170 |         51 |        956 |       1870 |          1 |
    [   32768,    65535) ||          0 |         11 |         14 |          0 |          1 |          0 |          0 |
    [   65536,   131071) ||          0 |          0 |          8 |          0 |          1 |          0 |          0 |
    and so on.
    

    The first table shows the I/O size and the number of reads/writes executed with that size on the specific celldisk.
    The second and third tables break the same I/Os down by size and latency.
    For example, the first table shows 1824 reads with a size between 16384 and 32767 bytes; the second table then shows that 689 of those reads completed in 128-255 us, another 409 in 256-511 us, and so on.
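
    To make a row of the distribution easier to read, here is a minimal Python sketch; the bucket boundaries and counts are copied by hand from the reads table above (no trace parsing), and note that the per-bucket counts add up to the 1824 reads reported in the first table:

    # Minimal sketch: summarise one row of the reads iosize-latency histogram.
    # The numbers below are copied from the [16384, 32767) row of the trace above;
    # the latency bucket boundaries (in microseconds) come from the table header.
    buckets = {
        (128, 255): 689,
        (256, 511): 409,
        (512, 1023): 6,
        (1024, 2047): 180,
        (2048, 4095): 540,
    }

    total = sum(buckets.values())
    print("total reads in this size bucket:", total)   # 1824, matches the first table

    for (lo, hi), count in buckets.items():
        print(f"{lo:>5} - {hi:>5} us: {count:>5} reads ({100.0 * count / total:.1f}%)")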

    As far as I know there is no automatic mechanism for flushing the latency statistics, so in some cases you may need to flush them manually first:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_resetstats('iolstats')";
    Cell cell01 successfully altered
    
    CellCLI>
    

    This flushes your I/O latency statistics.

    Of course there is another, more detailed way to view those statistics. You can use the following command to generate a trace file with latency statistics for the last 384 seconds for each celldisk on the current cell node:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_dump('iolhiststats',0)";
    Dump sequence #41 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell01/trace/svtrc_30850_30.trc
    Cell cell01 successfully altered
    
    CellCLI>
    

    Trace file content:

    2013-10-28 19:20:13.087400*: Dump sequence #44:
    Dmwg - Disk Media Working Group ID
    numDmwgPeers - Number of Peers in DMWG
    numDmwgPeersFl - Number of Peers in DMWG which do not belong to the same failure group (Flash only)
    triggerConfine - triggered confinement now
    NRd - Number of reads completed
    NWr - Number of writes completed
    LRd - Average latency for reads (microseconds)
    LWr - Average latency for writes (microseconds)
    DLRd - Average latency for reads for the DMWG peers (microseconds)
    DLWr - Average latency for writes for the DMWG peers (microseconds)
    QRd - Average queue depth for reads
    QWr - Average queue depth for writes
    DQRd - Average queue depth for reads for the DMWG peers
    DQWr - Average queue depth for writes for the DMWG peers
    SRd - Average queue sumsize for reads (Bytes)
    SWr - Average queue sumsize for writes (Bytes)
    DSRd - Average queue sumsize for reads for the DMWG peers (Bytes)
    DSWr - Average queue sumsize for writes for the DMWG peers (Bytes)
    TimeIndex: 18 -- Dmwg numDmwgPeers numDmwgPeersFl triggerConfine --- 1 1 0 0
     IOBkt    NRd    NWr    LRd    LWr   DLRd   DLWr    QRd   QWr   DQRd   DQWr    SRd    SWr   DSRd   DSWr
         6      7     20   1389   1521      0      0      0      0      0      0   4096   4096      0      0
    TimeIndex: 17 -- Dmwg numDmwgPeers numDmwgPeersFl triggerConfine --- 1 1 0 0
     IOBkt    NRd    NWr    LRd    LWr   DLRd   DLWr    QRd   QWr   DQRd   DQWr    SRd    SWr   DSRd   DSWr
         6      6     20   1265   1699      0      0      0      0      0      0   4096   4915      0      0
    TimeIndex: 16 -- Dmwg numDmwgPeers numDmwgPeersFl triggerConfine --- 1 1 0 0
     IOBkt    NRd    NWr    LRd    LWr   DLRd   DLWr    QRd   QWr   DQRd   DQWr    SRd    SWr   DSRd   DSWr
         6      6     20   1239    930      0      0      0      0      0      0   4096   4096      0      0
    TimeIndex: 15 -- Dmwg numDmwgPeers numDmwgPeersFl triggerConfine --- 1 1 0 0
     IOBkt    NRd    NWr    LRd    LWr   DLRd   DLWr    QRd   QWr   DQRd   DQWr    SRd    SWr   DSRd   DSWr
         6     13     24    990    809  38549      0      0      0      0      0   4096   4096   4096      0
    TimeIndex: 14 -- Dmwg numDmwgPeers numDmwgPeersFl triggerConfine --- 1 1 0 0
     IOBkt    NRd    NWr    LRd    LWr   DLRd   DLWr    QRd   QWr   DQRd   DQWr    SRd    SWr   DSRd   DSWr
         6     10     24    814   1124   4387      0      0      0      0      0   5734   4778   4096      0
    and so on.
    

    At the beginning of the dump we can see that this trace is quite well documented. Even better, each block has a TimeIndex from which we can calculate how old those statistics are.
    For example, the highest TimeIndex (18 in this dump, i.e. offset 0) is the current time, TimeIndex 17 is 32 seconds earlier, TimeIndex 16 is 64 seconds earlier and so on. In other words, for each device we have a total of 12 snapshots, one every 32 seconds.
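
    Here is a minimal sketch of that arithmetic; it assumes, as described above, that the highest TimeIndex in the dump is the current snapshot and that each step back is 32 seconds older (snapshot_age_seconds is my own helper name):

    # Minimal sketch: map a TimeIndex to the approximate age of its snapshot,
    # assuming the highest TimeIndex in the dump is "now" and snapshots are 32 s apart.
    SNAP_INTERVAL_SEC = 32

    def snapshot_age_seconds(time_index: int, current_index: int) -> int:
        """Age of the snapshot with the given TimeIndex, in seconds."""
        return (current_index - time_index) * SNAP_INTERVAL_SEC

    # Using the dump above, where the newest block has TimeIndex 18:
    for idx in (18, 17, 16, 15, 14):
        print(f"TimeIndex {idx}: ~{snapshot_age_seconds(idx, 18)} seconds ago")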


  • I/O reason statistics
    The following command will again create a trace file, this time with information about the reason for each I/O executed on the current cell node:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_dump('ioreasons',0)";
    Dump sequence #48 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell01/trace/svtrc_30850_55.trc
    Cell cell01 successfully altered
    
    CellCLI>
    

    Trace file contents:

    Cache::dumpReasons           I/O Reason Table
    2013-10-28 19:53:51.544064*: Dump sequence #48:
    Cache::dumpReasons Reason                  Reads Writes
    Cache::dumpReasons ------------------------------------
    Cache::dumpReasons UNKNOWN                     0     11
    Cache::dumpReasons RedoLog Write               0   2138
    Cache::dumpReasons RedoLog Read                2      0
    Cache::dumpReasons Resilvering Write           0      1
    Cache::dumpReasons ControlFile Read         3792      0
    Cache::dumpReasons ControlFile Write           0   2393
    Cache::dumpReasons BufferCache Read         9303      0
    Cache::dumpReasons DataHeader Read            10      0
    Cache::dumpReasons DataHeader Write            0      9
    Cache::dumpReasons Datafile SeqWrite           0     85
    Cache::dumpReasons DBWR Aged Write             0    691
    Cache::dumpReasons Selftune Checkpoint Write      0   1899
    Cache::dumpReasons ASM CacheCleanup IO         0      6
    Cache::dumpReasons ASM Redo IO                 0      6
    Cache::dumpReasons ASM Heartbeat IO           10   1050
    Cache::dumpReasons OSD Header IO               0      1
    

    The information is self-explanatory; for example, we can see 2138 redo log writes.
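
    If you want to work with these numbers programmatically, for example to diff two dumps taken before and after a test, here is a minimal parsing sketch; it assumes the line layout is exactly as in the trace above, and parse_io_reasons is my own helper name:

    import re

    # Minimal sketch: parse the "Cache::dumpReasons" table from an ioreasons trace
    # into a dict of {reason: (reads, writes)}. The layout is assumed to be exactly
    # as in the trace above: "Cache::dumpReasons <reason text> <reads> <writes>".
    ROW = re.compile(r"^Cache::dumpReasons\s+(.+?)\s+(\d+)\s+(\d+)\s*$")

    def parse_io_reasons(trace_path):
        reasons = {}
        with open(trace_path) as trace:
            for line in trace:
                match = ROW.match(line.strip())
                if match:
                    reason, reads, writes = match.groups()
                    reasons[reason] = (int(reads), int(writes))
        return reasons

    # Example usage (hypothetical path):
    # stats = parse_io_reasons("/opt/oracle/.../trace/svtrc_30850_55.trc")
    # print(stats.get("RedoLog Write"))   # -> (0, 2138) for the dump shown above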

    To reset the I/O reason statistics you can use:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_resetstats('ioreasons')";
    Cell cell01 successfully altered
    
    CellCLI>
    

    I/O reason statistics can be quite useful in conjunction with the I/O latency statistics. For example, you can reset both sets of statistics (latency and reason), let the workload run for a while, and then correlate the observed latencies with the likely reasons.
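
    A minimal sketch of that workflow is shown below; it assumes the script runs locally on the cell node with cellcli on the PATH, the 600-second window is arbitrary, and run_cellcli is my own wrapper (quoting of the events string may need adjusting in your environment):

    import subprocess
    import time

    # Minimal sketch of the reset -> observe -> dump cycle described above.
    def run_cellcli(command: str) -> str:
        """Run a single CellCLI command via 'cellcli -e' and return its output."""
        result = subprocess.run(["cellcli", "-e", command],
                                capture_output=True, text=True, check=True)
        return result.stdout

    # Reset both latency and reason statistics.
    for stat in ("iolstats", "ioreasons"):
        run_cellcli('alter cell events="immediate cellsrv.cellsrv_resetstats(\'%s\')"' % stat)

    time.sleep(600)  # let the workload of interest run during the observation window

    # Dump both; CellCLI prints the trace file path for each dump.
    for stat in ("iolstats", "ioreasons"):
        print(run_cellcli('alter cell events="immediate cellsrv.cellsrv_dump(\'%s\',0)"' % stat))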


  • Basic I/O statistics
    This command will create a trace file with general I/O statistics for each device on the current node:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_dump('devio_stats',0)";
    Dump sequence #49 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell01/trace/svtrc_30850_28.trc
    Cell cell01 successfully altered
    
    CellCLI>
    

    Trace file content:

    [IOSTAT] Dump IO device stats for the last 1800 seconds
    [IOSTAT] Device - /dev/sde
    ServiceTime Latency AverageRQ numReads numWrites DMWG numDmwgPeers numDmwgPeersFl trigerConfine avgSrvcTimeDmwg avgSrvcTimeDmwgFl
    0.000000 0.000000 0 0 0 0 0 0 0 0.000000 0.000000
    0.000000 0.000000 0 0 0 1 1 0 0 0.000000 0.000000
    0.000000 0.000000 0 0 0 1 1 0 0 0.000000 0.000000
    and so on.
    

    Column meanings:
    Dmwg - Disk Media Working Group ID
    numDmwgPeers - Number of Peers in DMWG
    numDmwgPeersFl - Number of Peers in DMWG which do not belong to the same failure group (Flash only)
    trigerConfine - triggered confinement now
    avgSrvcTimeDmwg - avg ServiceTime for Dmwg
    avgSrvcTimeDmwgFl - avg ServiceTime for Dmwg Flash

    From the trace file we can see that those statistics cover the last 1800 seconds. So cellsrv takes a snapshot every 3 seconds, which gives 600 snapshots for each celldisk on this cell.
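
    If you want to load those per-device samples into something easier to work with, here is a minimal parsing sketch; it assumes the layout shown above (a device line, a header line with the column names, then one whitespace-separated row per 3-second sample), and parse_devio_stats is my own helper name:

    # Minimal sketch: parse a devio_stats trace into {device: [list of sample dicts]}.
    def parse_devio_stats(trace_path):
        samples = {}
        device, columns = None, None
        with open(trace_path) as trace:
            for line in trace:
                line = line.strip()
                if line.startswith("[IOSTAT] Device -"):
                    device = line.split("-", 1)[1].strip()
                    samples[device] = []
                    columns = None
                elif device and columns is None and line.startswith("ServiceTime"):
                    columns = line.split()   # header line with the column names
                elif device and columns and line and line[0].isdigit():
                    values = line.split()
                    if len(values) == len(columns):
                        samples[device].append(dict(zip(columns, values)))
        return samples

    # Example usage (hypothetical path):
    # stats = parse_devio_stats("/opt/oracle/.../trace/svtrc_30850_28.trc")
    # print(len(stats["/dev/sde"]))   # ~600 samples covering the last 1800 seconds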


  • Predicate I/O statistics
    Finally, with this command we can dump statistics about the predicate I/O executed on this node:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_dump('predicateio',0)";
    Dump sequence #50 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell01/trace/svtrc_30850_53.trc
    Cell cell01 successfully altered
    
    CellCLI>
    

    Trace file output:

    PredicateIO 0x7fa5c134fe88 cell level stats
    Number of active predicate disks: 0 Hwm active predicate disks: 2
    Number of completed predicate disks: 12 Number of predicate disk uses: 14
    curFindJobIteration 10571077
    numPolls_Resource 198622590 numEmptyPolls_Resource 188055173
    numSuccessfulPolls_Resource 0 numUnsuccessfulPolls_Resource 10582561
    numCannotIssueMoreIOs 145
    numEmptyDestinationBuffersSent 0 numEmptyDestinationBuffersRequeued 10
    totalMapElementsReceived 427 totalMapWrites 14
    totalDiskReadJobs 64 totalDiskWriteJobs 2 totalFilterJobs 127
    totalPredicateCacheGetJobs 77
    totalPredicateCacheGetMisses 49 totalIORequestMisses 0
    totalPredicateCachePutJobs 14
    hwmDiskReadJobs 2 hwmDiskWriteJobs 1 hwmFilterJobs 4
    hwmPredicateCacheGetJobs 1 hwmPredicateCachePutJobs 1
    totalBufferAllocationFailures 0
    numFailuresToAllocateReads 0 numReadAllocationNotAttempted 24
    numFailuresToAllocateWrites 0 numWriteAllocationNotAttempted 0
    numFailuresToJobSchedFailDisable 0 numPredicateDataFilteringJobCalls 470
    numFailuresToAllocateFilters 0 numFilterAllocationNotAttempted 0
    totalBufferReserveFailures 0
    hwmDestinationBufferBytesUsed 1042424 avgDestinationBufferBytesUsed 563295
    numDestinationBuffersFull 37 numCompletedIOsEmpty 90
    numPtDataOnlyDestBuffers 0 numNonPtDataOnlyDestBuffers 0
    numIOErrors 0 numInternalErrors 0 numVersionErrors 0
    avgMetadataLength 1592 hwmMetadataLength 1808 hwmFmetadataLength 68
    myPredicateTracking 0x7fa5c134fca8
    predicateDiskSlots 0x7fa5c132a3d8 predicateDiskNumSlots 1320 nextPredDiskSlotToPoll 2
    nextPredDiskSlotForBufEviction 0 predDiskSlotHighest 0
    numEvictionsCalls 0 totalWastedUnfilteredIO 0 totalWastedFilteredIO 0
    totalNumIOsCompletedForCC 0 maxCoreExecs 7
    numMapElementsDontHaveDataFiltered 127 numMapElemStoridxHasWrongResult 0
    totalSmartScanIOsCompleted 264
    totalSmartScanIOsCompletedFromFlash 0
    totalSmartScanIOsCompletedFromFlashAndDisk 0
    flashCacheReadBytes 0 flashCacheWriteBytes 0
    hardDiskReadBytes 529793024 hardDiskWriteBytes 10485760
    numFlashCacheWriteIOs 0 flashCacheHitRatio 0
    numDeferredDiskDeletions 0 numDeferredDiskDeletions 0
    numMapElementQuarantined 0
    numFiltersExecsInPassThru 0 numFiltersExecsInPassThruFromCPU 0
    numPrefetchLimitReached 3
    numCorruptBlocks 0 numCorruptBlockRequeues 0
    totalToBeIssuedIOSize 434765824 totalIOIssuedIOSize 431620096
    totalFlashPopulationIOSize 0 totalIOCompletedIOSize 445530112
    totalInFilteringIOSize  442318848 totalFilteringCompletedIOSize 428408832
    totalFilteredOutputSize 42743000 filterCpuPassThruOutputSize 0
    passThruOutputSize 0 totalInSendCompletedIOSize 404357120
    totalInSendFilteredOutputSize 34470664
    curToBeIssuedIOSize 0 curIOIssuedIOSize 0
    curFlashPopulationIOSize 0 curIOCompletedIOSize 0
    curInFilteringIOSize 0 curFilteringCompletedIOSize 0
    curFilteredOutputSize 0 curInSendCompletedIOSize 0
    curInSendFilteredOutputSize 0
    .
    <output omitted>
    .
    IO-request-sizes histogram
            <            1 KB-num IOs: 0
            <            2 KB-num IOs: 0
            <            4 KB-num IOs: 0
            <            8 KB-num IOs: 0
            <           16 KB-num IOs: 1
            <           32 KB-num IOs: 0
            <           64 KB-num IOs: 0
            <          128 KB-num IOs: 0
            <          256 KB-num IOs: 13
            <          512 KB-num IOs: 0
            <         1024 KB-num IOs: 0
            <         2048 KB-num IOs: 14
            <         4096 KB-num IOs: 13
            <         8192 KB-num IOs: 93
            <        16384 KB-num IOs: 0
            <     infinity KB-num IOs: 0
    and so on.
    

    Yes, a bit messy, but still useful. Predicate I/O means offloaded data processing, with the data filtering done on the cell side. So from here we can gather a lot of useful information, most notably the Smart Scan I/O counters and the various issued, completed and filtered I/O size totals.
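
    For example, here is a minimal sketch that estimates how much data the cell-side filtering eliminated; the interpretation of the two counters is my own assumption, not an official formula, and the values are the ones from the dump above:

    # Minimal sketch: estimate the cell-side filtering reduction from two counters
    # in the predicateio dump. Counter meanings are my own assumption:
    #   totalFilteringCompletedIOSize - bytes that went through the filtering stage
    #   totalFilteredOutputSize       - bytes left after the predicates were applied
    filtering_completed_bytes = 428408832   # totalFilteringCompletedIOSize from the dump above
    filtered_output_bytes = 42743000        # totalFilteredOutputSize from the dump above

    reduction = 1 - filtered_output_bytes / filtering_completed_bytes
    print(f"data eliminated by cell-side filtering: ~{100 * reduction:.1f}%")   # ~90%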

    You can also reset the predicate I/O statistics with:

    CellCLI> alter cell events="immediate cellsrv.cellsrv_resetstats('predicateio')";
    Cell cell01 successfully altered
    
    CellCLI>
    
    

    Cheers,
    Kovachev
