Tool to simulate non-sequential disk I/O (simulate db file sequential read) in C POSIX

06-16-2011


Writing a Tool to simulate non-sequential disk I/O (simulate db file sequential read) in C POSIX

Over the years I have run into the same issue a couple of times: read speed on a SAN is absolutely atrocious when doing non-sequential I/O to the disks. The problem, of course, is that most databases do non-sequential I/O to disk. The most common database read event is the "db file sequential read", which, despite its name, does not result in a sequential read of the actual blocks on the device.

My second issue is that it is normally tricky to isolate the different processes well enough to test, or even demonstrate, the exact problem with non-sequential reads and writes. So I end up in very lengthy discussions about the possibility of changing computing theory rather than actually changing the SAN so that it can handle this type of request. While that allows for very creative use of similes, it is not a very efficient use of my time, and I really have little need for more overtime.

So my thought was: how would one go about writing a utility that opens a large file and reads random blocks of data throughout it, simulating the same effect in a controlled environment?

The general layout I was thinking is

Input for program

name [file to read] [block size] [number of reads]

set block size
set number of reads

get file size

open file

for i < number of reads
    set random block address
    read random block address from file

close

My problem is: how would I go about reading a block at a random address from a file?
And is there any way to get the time, in milliseconds, that the operation took?

As for the POSIX bit: the systems I need to use this code on are locked down pretty heavily, and installing a new compiler is a couple of months' worth of work, so I want a tool that can be compiled with almost any old compiler.

vrghost


06-16-2011


On Linux:

Code:

#!/bin/bash


for ((N=0; N<10; N++))
do
        dd if=gigabytefile of=/dev/null skip=$((RANDOM % 1024)) bs=$((1024*1024)) count=1
done

Code:

$ ./disktest.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0916211 s, 11.4 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0867173 s, 12.1 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00175788 s, 597 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0166869 s, 62.8 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00172908 s, 606 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.018392 s, 57.0 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0261557 s, 40.1 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0181711 s, 57.7 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0152302 s, 68.8 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0144154 s, 72.7 MB/s
$

Corona688


06-16-2011


Corona688: Absolutely brilliantly simple solution. Will try to see what would actually happen if I use that.

Thank you very much.

vrghost


06-16-2011


No problem.

Unless you've got SSDs, disks are atrocious for random reads in general. Reading sequential disk blocks (512 bytes each), you can get 100+ MB per second on a modern disk. Reading random blocks from that same modern disk, assuming a 15 ms seek time, you get worst-case transfer rates in double-digit kilobytes per second. Until SSDs started becoming practical, the usual way to overcome this was gigantic amounts of cache.

Corona688


06-16-2011


Quote:

Originally Posted by Corona688

On Linux:

Code:

#!/bin/bash


for ((N=0; N<10; N++))
do
        dd if=gigabytefile of=/dev/null skip=$((RANDOM % 1024)) bs=$((1024*1024)) count=1
done


That's not really random IO from a single process. That's 1 MB of sequential IO from multiple processes, in series.

You really need something lower-level, something maybe like this:

Code:

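#define _GNU_SOURCE     // needed on Linux for O_DIRECT
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>

int main( int argc, char **argv )
{
    int fd;
    int ii;
    int failed = 0;
    off_t *offsets;
    off_t size;
    struct stat sb;
    struct timeval start, finish;
    double ms;
    int numReads = 1024;
    size_t readSize;
    char *buffer;

    if ( argc < 2 )
    {
        fprintf( stderr, "usage: %s file-or-device\n", argv[ 0 ] );
        return( 1 );
    }

    // read one page per request (Linux direct IO
    // fails unless IO requests are exact
    // multiples of the device block size)
    readSize = sysconf( _SC_PAGESIZE );

    // get page-aligned buffer (Linux can't handle
    // direct IO unless the buffer is aligned)
    buffer = valloc( readSize );

    // get an array for offsets to read from, since
    // calling lrand48() during the read loop can
    // be slow enough to impact the results, especially
    // on fast devices
    offsets = ( off_t * ) calloc( numReads, sizeof( offsets[ 0 ] ) );
    if ( buffer == NULL || offsets == NULL )
    {
        perror( "allocation" );
        return( 1 );
    }

    // must actually touch the memory to create
    // the physical page mapping in the process
    // address space
    memset( buffer, 0, readSize );

    // need to do direct IO to avoid page cache
#ifdef __linux__
    fd = open( argv[ 1 ], O_RDONLY | O_DIRECT );
#else
    fd = open( argv[ 1 ], O_RDONLY );
#endif
    if ( fd < 0 )
    {
        perror( argv[ 1 ] );
        return( 1 );
    }

#ifdef __sun
    directio( fd, DIRECTIO_ON );
#endif

    // regular files report their size via fstat(); Linux
    // block devices report 0 there, so fall back to lseek()
    size = 0;
    if ( fstat( fd, &sb ) == 0 )
        size = sb.st_size;
    if ( size <= 0 )
        size = lseek( fd, 0, SEEK_END );
    if ( size <= 0 )
    {
        fprintf( stderr, "%s: cannot determine size\n", argv[ 1 ] );
        return( 1 );
    }

    for ( ii = 0; ii < numReads; ii++ )
    {
        // get a random offset that's no larger
        // than the file/device we're reading
        offsets[ ii ] = ( off_t ) lrand48();
        offsets[ ii ] <<= 32;
        offsets[ ii ] += ( off_t ) lrand48();
        offsets[ ii ] %= size;
        // mask to get 512-byte offsets
        offsets[ ii ] &= ~( ( off_t ) 511 );
    }

    // do the reads, timing only the read loop
    gettimeofday( &start, NULL );
    for ( ii = 0; ii < numReads; ii++ )
    {
        if ( pread( fd, buffer, readSize, offsets[ ii ] ) < 0 )
            failed++;
    }
    gettimeofday( &finish, NULL );

    // print out results in milliseconds
    ms = ( finish.tv_sec - start.tv_sec ) * 1000.0
       + ( finish.tv_usec - start.tv_usec ) / 1000.0;
    printf( "%d reads of %lu bytes in %.3f ms (%d failed)\n",
            numReads, ( unsigned long ) readSize, ms, failed );

    close( fd );
    return( 0 );
}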

Compile that with "-m64" to get a 64-bit binary that can easily handle devices > 2GB, and run like this on Linux:

Code:

./RandomIOTest /dev/sda1

or this on Solaris:

Code:

./RandomIOTest /dev/rdsk/c1t2d4s2

Also of note: if you're reading small chunks (8K or so, the most common page size) and your storage device has a large read-ahead setting, you'll get much slower performance than you'd otherwise expect, because each 8K read can cause the disk controllers to read a whole lot more than 8K.

You can get some strange effects with high-speed disk systems. Try to malloc() a 1 GB buffer, and read into it from a high-speed storage system without actually setting the memory you malloc()'d to zero. The data can come in from disk faster than your system's virtual memory system can create the pages to put it into.

And then you can turn around and do something pathologically bad and get only a few KB/sec from that same storage.

achenle


06-17-2011


achenle: thank you very much. I hope you will not mind if I butcher that code a bit to suit my exact needs; I will post the result once I've completed it.

Thanks to everyone who has responded. I'll go with reading random (OK, somewhat random, if we are going to be exact) blocks in a fast loop, as that lets me separate the reads as cleanly as possible from any other activity (shell, file open ...) and simulate what happens when a database reads blocks of data all over the disk.

/Ben

vrghost


06-17-2011


Have at it. Don't forget: that's meant to be 64-bit code. It might work if you compile it with large-file compile flags/defines, but I'm not sure, as I pretty much do nothing but 64-bit code any more.

I did forget to call srand48() to seed the random number generator. FWIW, something like this would work:

Code:

srand48( time( NULL ) );

On Solaris I usually prefer the following, because running something like the above C code over and over in a script can give successive runs the same "random" sequence when time() is the seed: runs that start within the same second get the same seed value.

Code:

srand48( gethrtime() );

I don't recall offhand if Linux has gethrtime() or an equivalent.

I also didn't try to test or even compile that code. There's a good chance I missed include files and put in something utterly brain dead.

And the location of each read isn't technically uniformly random: using the % operator gives the beginning of the file a slightly larger chance of being selected, because of the way it causes the full 64-bit random offset to wrap. But unless the file/device is HUGE it won't really matter; even multiple petabytes won't make the difference measurable.

achenle
