Playing with Volume of data


 
# 1  
Old 06-30-2009

Quick problem statement:
How do I read/extract data from a very big file?

Details:
We have a big problem at work. We are on a Solaris platform (E25).
A very large file has been created, with around 200 million records, and each record is more than 1000 columns long. The data in these columns is separated by semicolons.
Code:
Sample File:
01;1;;0001;123;;;;ZBCA10;;;;;;;;;20060116;99991
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
01;1;;0001;421876;;;;BCA030;BCA010;;;;;;;;20060502
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
01;1;;0001;42187;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;4216;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;4876;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;21876;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;876;;;;BCA030;BCA010;;;;;;;;20060502
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/

About the file:
Every line starts with either 01 or a space (" "). The records follow a header-detail layout: a row starting with 01 is a header line, and the rows below it are its details. Data columns are separated by semicolons. Data column 5 stays the same between a header and its detail rows.
What I need to do:
1) Extract batches of header-detail records. For example, I need to write every 50000 header-detail data rows into a file.
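A minimal sketch of such a split, done in a single pass so the big file is read only once. It assumes that "50000" counts header-detail groups (a header plus its detail rows), that a POSIX awk is available (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk), and that the helper name split_groups and the output file naming are free to choose; none of these come from the original post.

```shell
# split_groups INPUT GROUPS_PER_FILE PREFIX   (hypothetical helper name)
# Writes PREFIX_00001.txt, PREFIX_00002.txt, ... with GROUPS_PER_FILE
# header-detail groups in each file (the last file may hold fewer).
split_groups() {
    awk -F';' -v n="$2" -v prefix="$3" '
        $1 == "01" {                   # a header row opens a new group
            if (groups % n == 0) {     # first header, or current chunk is full
                if (out != "") close(out)
                out = sprintf("%s_%05d.txt", prefix, ++chunk)
            }
            groups++
        }
        out != "" { print > out }      # rows before the first header are skipped
    ' "$1"
}
```

For example, split_groups fileabc.txt 50000 chunk would write chunk_00001.txt, chunk_00002.txt and so on. Because a new output file is opened only on a header row, a header is never separated from its detail rows.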

I tried the sed command
sed -n '100,200002p' fileabc.txt
However, the performance is not up to the mark. The problem with sed is that even when it is told to copy only rows 100 to 200002, it still scans the entire file.
When I ran sed against the entire file, it took a couple of days. That's too much. Are there better options?
Is there a way to make this operation run in parallel?
Is there a shell command that copies only the specified rows and does not scan the entire file?

2) I will let you know the second question later.

Thanks,
darshanw
# 2  
Old 06-30-2009
The q command in sed makes it quit early.
awk may be easier:
Code:
awk 'FNR>=100 && FNR<=200002 {print}
     FNR>200002 {exit}' bigfile > smallerfile
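The sed variant with q mentioned at the top of this reply can be sketched as follows. The function name extract_range is made up for illustration; the two -e expressions are standard sed. The first prints the wanted rows, and the second makes sed quit as soon as the last wanted row has been handled, so the rest of the file is never read.

```shell
# extract_range FIRST LAST FILE   (hypothetical helper name)
# Prints rows FIRST..LAST of FILE and stops reading after row LAST.
extract_range() {
    first=$1
    last=$2
    file=$3
    # -n suppresses automatic printing; p prints only the range;
    # q on row LAST ends the run without touching the remaining rows.
    sed -n -e "${first},${last}p" -e "${last}q" "$file"
}
```

Called as extract_range 100 200002 fileabc.txt > smallerfile, it still has to read the first 200002 rows, but nothing after them. The double quotes let the shell substitute the row numbers before sed sees the expressions.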

# 3  
Old 07-01-2009
Maybe a dumb question:
The script that uses sed is called from inside another shell script. How do I pass the "q" so that sed stops after the relevant line?

Secondly, I tried the awk command as well, but it doesn't print anything. Any clue why?
Regards,
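One possible explanation, which the thread does not confirm: the default /usr/bin/awk on Solaris is the original awk, which predates the FNR variable. There FNR stays empty, the range test never matches, and nothing is printed. nawk and /usr/xpg4/bin/awk do support FNR. Below is a minimal sketch of a check that picks the first awk on the system that tracks FNR. The function name pick_awk and the candidate list are assumptions for illustration, and command -v needs a POSIX shell.

```shell
# pick_awk   (hypothetical helper name)
# Prints the name of the first awk implementation found that tracks FNR.
pick_awk() {
    for candidate in nawk /usr/xpg4/bin/awk gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            # An awk without FNR prints an empty line here instead of 1.
            if [ "$(printf 'x\n' | "$candidate" '{ print FNR }')" = "1" ]; then
                echo "$candidate"
                return 0
            fi
        fi
    done
    return 1
}
```

The result can then replace the bare awk in the command above, for example AWK=$(pick_awk) followed by "$AWK" '...' bigfile. Using NR instead of FNR sidesteps the question entirely when only one input file is read.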
# 4  
Old 08-01-2009
Stumbled and fumbled, but I found a solution:
Code:
# Usage of awk for file creation
# $1 = first row, $2 = last row, $3 = input file, $4 = output file, $5 = log file
awk 'NR>='"$1"' && NR<='"$2"' {print} NR>'"$2"' {exit}' "$3" >> "$4"
echo "`date '+%F_%T'` DD-File created --> " "$4" >> "$5"

Now I can pass my regular shell script parameters to the awk script as well.