File alignment and performances... (difficult)

06-17-2009

Hello !

I'll explain my goal in the best English I can. I'm French, so please excuse any lack of precision.

What I would like to do in a shell script is described below (though you may well answer "not possible in a script, you have to use a low-level language" or something like that). The steps marked in blue are already done and working.

- Start from a list of files of the same size, sorted by name. The names are consecutive numbers: 0001, 0002, 0003, etc.
- Concatenate them, one by one, into one big file (I do this with dd and a large block size, because the I/O system is fast and the files are big).
- Cut the big file into pieces to recreate the original files.
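
The steps in the list above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual script: the scratch directory, the 4096-byte demo size, and the names `workdir`, `piece_size`, `bigfile` and `read_piece` are all assumptions made for the example. It packs equal-size files into one container with dd, then reads one piece back by offset with `skip=`, which avoids creating new files at all. Whether the filesystem really lays the container out contiguously is up to the filesystem; the script cannot guarantee it.

```shell
#!/bin/sh
# Sketch: pack equal-size files into one container, then read piece N back
# by offset instead of splitting the container into new files.
set -eu

workdir=$(mktemp -d)                 # hypothetical scratch directory for the demo
trap 'rm -rf "$workdir"' EXIT        # remove the demo files when the script ends
piece_size=4096                      # demo size; the thread's files are about 10 MB
count=3

# Create demo inputs 0001, 0002, 0003, each exactly piece_size bytes,
# filled with the digit of their own index so they differ from each other.
i=1
while [ "$i" -le "$count" ]; do
    name=$(printf '%04d' "$i")
    dd if=/dev/zero bs="$piece_size" count=1 2>/dev/null | tr '\0' "$i" > "$workdir/$name"
    i=$((i + 1))
done

# Pack: write file number i at offset (i - 1) * piece_size in the container.
# conv=notrunc keeps the pieces that were already written.
bigfile="$workdir/bigfile"
: > "$bigfile"
i=1
while [ "$i" -le "$count" ]; do
    name=$(printf '%04d' "$i")
    dd if="$workdir/$name" of="$bigfile" bs="$piece_size" \
       seek=$((i - 1)) conv=notrunc 2>/dev/null
    i=$((i + 1))
done

# Write piece n (1-based) of the container to stdout, without creating a file.
read_piece() {
    dd if="$bigfile" bs="$piece_size" skip=$(( $1 - 1 )) count=1 2>/dev/null
}

# Compare piece 2 read from the container with the original file 0002.
if read_piece 2 | cmp -s - "$workdir/0002"; then
    result="piece 2 matches"
else
    result="piece 2 differs"
fi
echo "$result"
```

If the reading application can be given an offset and a length rather than a file name, the last step (cutting the big file up again) may not be needed; that is an assumption about the application, which the thread does not describe.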

Before explaining why the "split" command does not suit me, let me give the context of my goal.

I'm trying to improve disk I/O performance on groups of files by placing them physically close to each other on the hard drive. These are big files (about 10MB each) that have to be read in sequence (1, then 2, then 3, ...), and the head movement when file1 is far from file2 costs a LOT of performance.
Since it is not possible to change the physical address of a file on a storage device, the idea is to "bluff" the filesystem: copy a lot of files into one big file, so that the filesystem will try to write that big file in adjacent sectors.
I don't want to make this post too long, but if you want more details I will gladly give them.

So I don't want the split command, because it copies from one source file to multiple destination files. As I said above, generating new files lets the filesystem spread them all over the drive, and I lose the performance again.

Could some other command help? Is it possible to cut one big file into pieces just by creating new entries in the inode table, so that it is as fast as possible?
Is there any conceivable solution other than a script?

Thanks a lot for your help and your ideas !
Have a good day !

-----Post Update-----

Perhaps I should have posted this in the filesystem & disk section?

Gnaag

06-17-2009

Yes. You are describing partitioned database tables. Each partition has some common key - like a date or a filenumber. Whatever you choose. And you can then sprinkle the data across many disks and effectively 'parallelize' I/O - thereby having dozens of I/O requests being worked on at the same time, instead of sequentially from a single I/O request queue.

Oracle (or Sybase, DB2 or MSSQL) can access data in those kinds of datasets much faster than you will probably be able to emulate with your method. Even Microsoft got in on the act with MSSQL:

Partitioned Tables and Indexes in SQL Server 2005

jim mcnamara

06-18-2009

What file system type are you using? ext3? ext4?

fpmurphy

06-18-2009

I'm using CVFS (Quantum StorNext SAN shared FS).
But in my context it behaves like any other filesystem:

someone asks for a write
I look in my free inode table
I look at the size of the file
I put the file in a free inode
...

Gnaag
