File alignment and performances... (difficult)


 
# 1  
Old 06-17-2009

Hello!

I will explain my goal in the best English I can. I'm French, so please excuse any lack of precision...

What I would like to do in a shell script is described below (though you may well answer "not possible in a script, you have to use a low-level language" or something like that). Everything in blue is already done and working.

- Start from a list of files of the same size, sorted by name. Their names are consecutive numbers, like 0001, 0002, 0003, etc.
- Copy them into one big file by concatenating them one by one (I do this with dd and a large block size, because the I/O system is fast and the files are big).

- Cut the big file into pieces to recreate the original files...
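
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual script: the temporary directory, the 1 KiB piece size (standing in for the roughly 10 MB files), and the names `bigfile` and `restored_0002` are all assumptions made for the demo.

```shell
#!/bin/sh
# Sketch only: names, sizes and paths are made up for this demo.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

piece_size=1024   # bytes per file; the real files are about 10 MB each

# Create three same-sized sample files named like the originals.
for name in 0001 0002 0003; do
    dd if=/dev/urandom of="$name" bs="$piece_size" count=1 2>/dev/null
done

# Concatenate them in name order into one big file.
# seek= places each piece at its own offset (in blocks of bs bytes);
# conv=notrunc keeps the pieces already written.
index=0
for name in 0001 0002 0003; do
    dd if="$name" of=bigfile bs="$piece_size" seek="$index" conv=notrunc 2>/dev/null
    index=$((index + 1))
done

# Cut the second piece back out: skip= counts input blocks, starting at 0.
dd if=bigfile of=restored_0002 bs="$piece_size" skip=1 count=1 2>/dev/null

# Check that the extracted piece is byte-for-byte identical to the original.
cmp 0002 restored_0002 && echo "piece 0002 restored intact"
```

Note that extracting a piece this way still writes a brand-new file, so the filesystem is free to place it wherever it likes.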

Before explaining why the "split" command does not suit me, let me clarify the context...

I'm trying to improve disk I/O performance for some groups of files by placing them physically close to each other on the hard drive. These are big files (about 10 MB each) that have to be read in sequence (1, then 2, then 3, ...), and the head movement when file 1 is far from file 2 costs a LOT of performance.
Since it is not possible to change the physical address of a file on a storage device, the idea is to "bluff" the filesystem: copy a lot of files into one big file, so that the filesystem will try to write that big file in adjacent sectors.
I don't want to make this post too long, but if you want more details I will gladly give some.

So, I don't want the split command, because it copies from one source file to multiple destination files. As I said before, generating new files allows the filesystem to spread them all over the drive, and I lose performance again...

Could some other command help? Is it possible to cut one big file into pieces by only generating new entries in the inode table, so that it is as fast as possible?
Is there any conceivable solution other than a script?

Thanks a lot for your help and your ideas!
Have a good day!

-----Post Update-----

Perhaps I should have posted this in the filesystem & disk section?
# 2  
Old 06-17-2009
Yes. You are describing partitioned database tables. Each partition has some common key, like a date or a file number, whatever you choose. You can then sprinkle the data across many disks and effectively "parallelize" I/O, thereby having dozens of I/O requests being worked on at the same time, instead of sequentially from a single I/O request queue.

Oracle (or Sybase, DB2, or MSSQL) can access data in those kinds of datasets much faster than you will probably be able to emulate with your method. Even Microsoft got in on the act with MSSQL:

Partitioned Tables and Indexes in SQL Server 2005
# 3  
Old 06-18-2009
What filesystem type are you using? ext3? ext4?
# 4  
Old 06-18-2009
I'm using CVFS (Quantum StorNext SAN shared FS).
But in my context it behaves like any other FS:

someone requests a write
I look in my free inode table
I look at the size of the file
I put the file in a free inode
...