

Tool to simulate non-sequential disk I/O (simulate db file sequential read) in C POSIX


 
# 1  
Old 06-16-2011

Writing a Tool to simulate non-sequential disk I/O (simulate db file sequential read) in C POSIX

Over the years I have come across the same issue a couple of times: read speed on a SAN is absolutely atrocious when doing non-sequential I/O to the disks. The problem, of course, is that most databases do non-sequential I/O to disk. The most common database read operation is the "db file sequential read", which does not cause a sequential read of the actual blocks on the device.

My second issue is that it is normally tricky to isolate the different processes well enough to clearly test, or even demonstrate, the exact problem with non-sequential reads and writes. So I end up in very lengthy discussions about the possibility of changing computing theory rather than actually changing the SAN so that it can handle this type of request. While that allows for very creative use of similes, it is not a very efficient use of my time, and I really have little need for more overtime.

So my thought was: how would one go about writing a utility that opens a large file and reads random blocks of data throughout it, thereby simulating the same effect in a controlled environment?

The general layout I was thinking of is:

Input for program

name [file to read] [block size] [number of reads]

set block size
set number of reads

get file size

open file

for i < number of reads
    set random block address
    read random block address from file

close


My problem is: how would I go about reading a random block address from a file?
And is there any way to get the time, in milliseconds, that the operation took?

As for the POSIX bit: the systems I need to use this code on are locked down pretty heavily, and installing a new compiler is a couple of months' worth of work, so I want a tool that can be compiled with almost any old compiler.
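
For the input handling in the layout above, a minimal sketch of validating the [block size] and [number of reads] arguments might look like the following. The helper name parse_positive is hypothetical (it is not from this thread), and it relies only on strtol() from standard C, so it should build with very old compilers.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a strictly positive decimal number from a command-line
   argument. Returns -1 for empty input, trailing junk, overflow,
   zero or negative values. (Hypothetical helper for illustration.) */
static long parse_positive( const char *text )
{
    char *end;
    long value;

    errno = 0;
    value = strtol( text, &end, 10 );

    if ( errno != 0 || end == text || *end != '\0' || value <= 0 )
        return -1;

    return value;
}
```

In main(), calling parse_positive( argv[ 2 ] ) for the block size and parse_positive( argv[ 3 ] ) for the number of reads, and bailing out on -1, would cover both numeric arguments.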
# 2  
Old 06-16-2011
On Linux:
Code:
#!/bin/bash


for ((N=0; N<10; N++))
do
        dd if=gigabytefile of=/dev/null skip=$((RANDOM % 1024)) bs=$((1024*1024)) count=1
done

Code:
$ ./disktest.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0916211 s, 11.4 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0867173 s, 12.1 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00175788 s, 597 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0166869 s, 62.8 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00172908 s, 606 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.018392 s, 57.0 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0261557 s, 40.1 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0181711 s, 57.7 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0152302 s, 68.8 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0144154 s, 72.7 MB/s
$

# 3  
Old 06-16-2011
Corona688: Absolutely brilliantly simple solution. Will try to see what would actually happen if I use that.

Thank you very much.
# 4  
Old 06-16-2011
No problem.

Unless you've got SSDs, disks are atrocious at random reads in general. Reading sequential disk blocks (512 bytes each), you can get 100+ MB per second on a modern disk. Reading random blocks from that same modern disk, assuming a 15 ms seek time, you get worst-case transfer rates in double-digit kilobytes per second. Until SSDs started becoming practical, the usual way to overcome this was gigantic amounts of cache.
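
As a rough check of those numbers, here is a back-of-envelope sketch. It assumes every read pays one full seek and ignores rotational latency and transfer time; the function name and its inputs are illustrative only.

```c
#include <stdio.h>

/* Worst-case random-read throughput in bytes per second when each
   block read costs one full seek. (Illustrative helper; it models
   nothing but the seek time.) */
static double random_read_bytes_per_sec( double block_bytes, double seek_ms )
{
    return block_bytes / ( seek_ms / 1000.0 );
}
```

With 512-byte blocks and a 15 ms seek this works out to about 34,000 bytes per second, roughly 33 KB/s. Even 8K blocks only reach about 533 KB/s under the same assumption.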
# 5  
Old 06-16-2011
Quote:
Originally Posted by Corona688
On Linux:
Code:
#!/bin/bash


for ((N=0; N<10; N++))
do
        dd if=gigabytefile of=/dev/null skip=$((RANDOM % 1024)) bs=$((1024*1024)) count=1
done

That's not really random IO from a single process. That's 1 MB of sequential IO from multiple processes, in series.

You really need something lower-level, something maybe like this:

Code:
#define _GNU_SOURCE     // exposes O_DIRECT in <fcntl.h> on Linux
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main( int argc, char **argv )
{
    int fd;
    int ii;
    off_t *offsets;
    off_t size;
    struct stat sb;
    int numReads = 1024;

    // page-sized reads: Linux direct IO fails unless
    // each request is a suitably aligned multiple of
    // the block size, which a page-sized request is
    size_t readSize = sysconf( _SC_PAGESIZE );

    // get page-aligned buffer (Linux can't handle
    // direct IO unless the buffer is aligned)
    char *buffer = valloc( readSize );

    // get an array for offsets to read from, since
    // calling lrand48() during the read loop can
    // be slow enough to impact the results, especially
    // on fast devices
    offsets = ( off_t * ) calloc( numReads, sizeof( offsets[ 0 ] ) );

    if ( argc < 2 )
    {
        fprintf( stderr, "usage: %s file-or-device\n", argv[ 0 ] );
        return( 1 );
    }
    if ( buffer == NULL || offsets == NULL )
    {
        perror( "allocation failed" );
        return( 1 );
    }

    // must actually touch the memory to create
    // the physical page mapping in the process
    // address space
    memset( buffer, 0, readSize );

    // need to do direct IO to avoid page cache
#ifdef __linux
    fd = open( argv[ 1 ], O_RDONLY | O_DIRECT );
#else
    fd = open( argv[ 1 ], O_RDONLY );
#endif
    if ( fd < 0 )
    {
        perror( argv[ 1 ] );
        return( 1 );
    }

#ifdef __sun
    directio( fd, DIRECTIO_ON );
#endif

    // st_size is 0 for block devices on Linux, so
    // fall back to seeking to the end of the device
    size = ( fstat( fd, &sb ) == 0 ) ? sb.st_size : 0;
    if ( size <= 0 )
        size = lseek( fd, 0, SEEK_END );
    if ( size < ( off_t ) readSize )
    {
        fprintf( stderr, "%s: too small or size unknown\n", argv[ 1 ] );
        return( 1 );
    }

    for ( ii = 0; ii < numReads; ii++ )
    {
        // get a random offset that's no larger
        // than the file/device we're reading
        offsets[ ii ] = ( off_t ) lrand48();
        offsets[ ii ] <<= 32;
        offsets[ ii ] += ( off_t ) lrand48();
        offsets[ ii ] %= size;
        // mask to get 512-byte-aligned offsets
        offsets[ ii ] &= ~( off_t ) 511;
    }

    // do the reads
    // add code to get start time here
    for ( ii = 0; ii < numReads; ii++ )
    {
        if ( pread( fd, buffer, readSize, offsets[ ii ] ) < 0 )
        {
            perror( "pread" );
            return( 1 );
        }
    }
    // add code to get finish time here, then
    // print out results

    close( fd );
    free( offsets );
    free( buffer );
    return( 0 );
}

Compile that with "-m64" to get a 64-bit binary that can easily handle devices > 2GB, and run like this on Linux:

Code:
./RandomIOTest /dev/sda1

or this on Solaris:

Code:
./RandomIOTest /dev/rdsk/c1t2d4s2

Also of note: if you're reading small chunks (8K or so, the most common page size) and your storage device has a large read-ahead setting, you'll get much slower performance than you'd otherwise expect, because each 8K read can cause the disk controllers to read a whole lot more than 8K.

You can get some strange effects with high-speed disk systems. Try to malloc() a 1 GB buffer, and read into it from a high-speed storage system without actually setting the memory you malloc()'d to zero. The data can come in from disk faster than your system's virtual memory system can create the pages to put it into.

And then you can turn around and do something pathologically bad and get only a few KB/sec from that same storage.
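
For the "add code to get start time here" placeholders in the code above, one possible sketch uses gettimeofday(), which is plain POSIX and should be available even with old toolchains. The names elapsed_ms and timed_reads are mine, not part of the original code, and wall-clock time from gettimeofday() can jump if the system clock is adjusted during a run.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Milliseconds elapsed between two gettimeofday() samples. */
static double elapsed_ms( const struct timeval *start,
                          const struct timeval *end )
{
    return ( end->tv_sec - start->tv_sec ) * 1000.0
         + ( end->tv_usec - start->tv_usec ) / 1000.0;
}

/* Run the read loop and return how long it took in milliseconds,
   or -1.0 if any pread() call fails. (Hypothetical helper.) */
static double timed_reads( int fd, char *buffer, size_t readSize,
                           const off_t *offsets, int numReads )
{
    struct timeval start, end;
    int ii;

    gettimeofday( &start, NULL );
    for ( ii = 0; ii < numReads; ii++ )
    {
        if ( pread( fd, buffer, readSize, offsets[ ii ] ) < 0 )
            return -1.0;
    }
    gettimeofday( &end, NULL );

    return elapsed_ms( &start, &end );
}
```

Dividing the returned value by numReads gives an average latency per read in milliseconds, which is usually the more telling number for this kind of test.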
# 6  
Old 06-17-2011
achenle: thank you very much. I hope you will not mind if I butcher that code a bit to suit my exact needs; I will post the result once I've finished.

Thanks to everyone who has responded. I'll be going with reading random blocks (OK, somewhat random blocks, if we are going to be exact) in a fast loop, as this lets me separate the reads as cleanly as possible from any other activity (shell, file open ...) and so simulate what happens when a database reads blocks of data all over the disk.

/Ben
# 7  
Old 06-17-2011
Have at it. Don't forget: that's meant to be 64-bit code. It might work if you compile it with large-file compile flags/defines, but I'm not sure, as I pretty much do nothing but 64-bit code any more.

I did forget to call srand48() to seed the random number generator. FWIW, something like this would work:

Code:
srand48( time( NULL ) );

On Solaris I usually prefer the following, because running something like the above C code over and over in a script can result in consecutive runs getting the same "random" sequence when time() is the seed: runs started within the same second get the same seed value.

Code:
srand48( gethrtime() );

I don't recall offhand if Linux has gethrtime() or an equivalent.
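
A portable alternative that stays within POSIX is to build the seed from gettimeofday() microseconds mixed with the process ID. This is only a sketch: make_seed is a hypothetical name, and the particular way the bits are mixed is an arbitrary choice of mine, not an established recipe.

```c
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a seed that differs between runs started within the same
   second by mixing seconds, microseconds and the process ID.
   (Hypothetical helper; the mixing scheme is arbitrary.) */
static long make_seed( void )
{
    struct timeval now;
    unsigned long mixed;

    gettimeofday( &now, NULL );

    mixed  = ( unsigned long ) now.tv_sec;
    mixed ^= ( unsigned long ) now.tv_usec << 11;
    mixed ^= ( unsigned long ) getpid();

    /* keep the result non-negative so it fits a long everywhere */
    return ( long ) ( mixed & 0x7FFFFFFFUL );
}
```

It would be used as srand48( make_seed() ) before the offsets are generated. Where it is available, clock_gettime() with CLOCK_MONOTONIC is probably the closest POSIX counterpart to gethrtime(), though older glibc versions need -lrt to link it.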

I also didn't try to test or even compile that code. There's a good chance I missed include files and put in something utterly brain dead.

And the location of the read isn't technically random because using the % operator means the beginning of the file will have a slightly larger chance of being selected because of the way it causes the full 64-bit random offset to wrap. But unless the file/device is HUGE it won't really matter - even multiple petabytes won't make the difference measurable.
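
If that small bias ever does matter, one way around it is rejection sampling: throw away random values that fall in the uneven tail before applying the % operator. The sketch below is an assumption-laden illustration, not part of the original code. The names rand62 and random_block_offset are hypothetical, it needs a compiler with a 64-bit unsigned long long, and it picks whole blocks so that every read fits inside the file.

```c
#include <stdlib.h>
#include <sys/types.h>

/* 62 random bits built from two lrand48() calls; each call
   returns 31 bits. (Hypothetical helper.) */
static unsigned long long rand62( void )
{
    return ( ( unsigned long long ) lrand48() << 31 )
         | ( unsigned long long ) lrand48();
}

/* Uniformly chosen block-aligned offset such that a read of
   blockSize bytes fits inside size. Returns -1 if the file
   holds less than one block. (Hypothetical helper.) */
static off_t random_block_offset( off_t size, off_t blockSize )
{
    unsigned long long blocks = ( unsigned long long ) ( size / blockSize );
    unsigned long long range = 1ULL << 62;
    unsigned long long limit;
    unsigned long long r;

    if ( blocks == 0 )
        return -1;

    /* largest multiple of blocks that fits in the 62-bit range */
    limit = range - ( range % blocks );

    do
    {
        r = rand62();
    } while ( r >= limit );     /* reject the uneven tail */

    return ( off_t ) ( r % blocks ) * blockSize;
}
```

The loop almost never repeats for realistic file sizes, so the cost over the plain % approach should be negligible when the offsets are generated ahead of the read loop.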
