Pattern Matching in a Huge File


 
# 1  
Old 02-10-2011

Hi Experts,
I have a problem with a huge file.
I need to check the two characters at positions 155-156 of each line and, if they are 31 or 39, write that line to a separate file for that value.
The main file has around 1459328 lines and is 2 GB in size. I tried the code below, which takes around 2 hours to run.
Code:
while read line
do
    record_type=`echo "$line" | cut -c 155-156`
    if [ "$record_type" -eq 31 ] ; then
    print "$line" >> ./31.txt
    elif  [ "$record_type" -eq 39 ] ; then
    print "$line" >> ./39.txt
    fi
done < LOAD.txt

I then switched to awk, which still takes more than 30 minutes, and the counts do not agree.
Code:
 
awk '/839I/ {print $0}' LOAD.txt > record_39.txt &
awk '/831I/ {print $0}' LOAD.txt > record_31.txt &
cat LOAD.txt | cut -c 155-156 > smp.log
grep -c '31' smp.log
 1182483
wc -l record_31.txt
 1182495 record_31.txt

I also tried this:
Code:
 
awk '$5 ~ 39{print $0;}' LOAD.txt

but the record type at positions 155-156 does not always end up in $5.
Sample records.
Code:
14115726     0000000000         00000000000000000000000000000000000000000000000000000000                                                      000         00I201
06485726     0000000000         00000000000000000000000000000000000000000000000000000000                                                      000        805I201
18005726ABCUS0000005726         01002080000000000000000000000000000000000000000000000000370291010381009    20090218                           000 I      839I201
18005726ABCUS0000005726         08009100000000000000000000000000000000000000000000000000370290173421008    20101203                           000I       839I201
18005726ABCUS0000005726         00000020000000000000000000000000000000000000000000000000370282295281006    20060706                           000C       831I201
18005726ABCUS0000005726         01002080000000000000000000000000000000000000000000000000370282010171003    20090216                           000 I      831I201

Is there any other way to get the correct results?

Thanks
Senthil.
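
A note on the `$5` attempt above: awk splits on runs of whitespace by default, so the number of fields depends on which columns happen to be blank in a given record. The sketch below uses a shortened, hypothetical 23-column layout (the file name `demo_fields.txt` and the column numbers are made up for illustration) to show the field number shifting while the character position stays put.

```shell
# Two hypothetical fixed-width records. Columns 1-16 are padded data,
# the record type sits in columns 18-19. The second record has an
# extra one-character flag in column 12.
printf '%-16s%s\n' 'AAAA  0000'   '839I201' >  demo_fields.txt
printf '%-16s%s\n' 'AAAA  0000 I' '831I201' >> demo_fields.txt

# Whitespace splitting: the field count differs, so $3 is not the same column.
awk '{ print NF, $3 }' demo_fields.txt

# Position-based extraction reads the same columns in every record.
awk '{ print substr($0, 18, 2) }' demo_fields.txt
```

The first awk reports 3 fields for one record and 4 for the other, which is the same effect that moves the record type in and out of `$5` in the real file.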
# 2  
Old 02-10-2011
Do you get the same result from running
Code:
cat LOAD.txt | cut -c 155-156 > smp.log
grep -c '31' smp.log

and

Code:
cat LOAD.txt | cut -c 154-157 > smp.log
grep -c '831I' smp.log

???
This User Gave Thanks to ctsgnb For This Post:
# 3  
Old 02-10-2011
Code:
awk '{p=substr($0,155,2)} p ~ "3[19]" {print > p ".txt"}' file

This User Gave Thanks to Franklin52 For This Post:
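
For anyone who wants to try the single-pass idea on a small file first, here is a minimal sketch. It is a variation on the one-liner above rather than the same command: it uses explicit string comparisons and parenthesizes the output file name, which some awk implementations need. The input file `demo_load.txt`, the `demo_` output prefix and the padded record layout are assumptions made for the example.

```shell
# Build a small hypothetical file: 153 padded columns, then 8<type>I201,
# which puts the two-digit record type in columns 155-156.
for t in 31 39 05 31; do
    printf '%-153s8%sI201\n' 'REC' "$t"
done > demo_load.txt

# One pass over the input. Each record whose type is 31 or 39 is
# appended to demo_<type>.txt; every other record is skipped.
awk '{ p = substr($0, 155, 2) }
     p == "31" || p == "39" { print > ("demo_" p ".txt") }' demo_load.txt

# Count the lines that landed in each output file.
grep -c '' demo_31.txt demo_39.txt
```

The file is read once and no external command is started per line, which is probably where most of the time went in the `while read` loop with its `echo | cut` pipeline for every record.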
# 4  
Old 02-10-2011
Hi,

Try this 'perl' script:

Code:
$ perl -ne 'BEGIN { open $f31, ">", "31.txt" or die $!; open $f39, ">", "39.txt" or die $!; } ($a) = unpack "x154 A2", $_; if ($a == 31) { print $f31 $_; } elsif ($a == 39) { print $f39 $_; }' infile

Regards,
Birei
This User Gave Thanks to birei For This Post:
# 5  
Old 02-10-2011
Perhaps this will go faster:
Code:
grep '^.\{154\}31' infile > 31.txt

To just count the records:
Code:
grep -c '^.\{154\}31' infile

Likewise for 39
This User Gave Thanks to Scrutinizer For This Post:
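
To unpack the grep pattern: `^` anchors the match at the start of the line, `.\{154\}` consumes exactly 154 characters of any kind, and the literal `31` must then appear in columns 155-156. A scaled-down sketch, assuming hypothetical 10-column records with the type in columns 5-6 (the file name `demo_cols.txt` is made up):

```shell
# Three hypothetical records; only the first has 31 in columns 5-6.
printf '%s\n' 'abcd31wxyz' 'ab31cdwxyz' 'abcd39wxyz' > demo_cols.txt

# ^        start of line
# .\{4\}   exactly four characters of any kind (columns 1-4)
# 31       the literal type in columns 5-6
grep -c '^.\{4\}31' demo_cols.txt    # prints 1

# Without the anchor and the column count, 31 matches anywhere in the line.
grep -c '31' demo_cols.txt           # prints 2
```

The second count shows how an unanchored pattern can pick up extra lines where the same characters occur in another column.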
# 6  
Old 02-10-2011
@ctsgnb
Code:
 
cat L*.txt | cut -c 155-156 > smp.log
grep -c '31' smp.log
1182483
grep -c '39' smp.log
32855
cat L*.txt | cut -c 154-157 > smp.log 
grep -c '831I' smp.log
1182483
grep -c '839I' smp.log
32855

@Franklin52 - Many thanks, this deserves a party.
Code:
 
time awk '{p=substr($0,155,2)} p ~ "3[19]" {print > p ".txt"}' LOAD.txt &
real    1m50.57s
user    0m23.54s
sys     0m44.26s
wc -l 39.txt 31.txt
   32855 39.txt
 1182483 31.txt

@Birei - I'm sorry, I don't have perl on this box, so I can't try it.
@Scrutinizer - Could you please explain the command a little? I'm too dumb to understand such an expert-level command.
Code:
 
time grep '^.\{154\}39' LOAD.txt > 39.txt &
real    2m43.49s
user    0m18.01s
sys     0m17.35s
wc -l 39.txt
32855 39.txt

# 7  
Old 02-10-2011
Code:
grep "83[19]I...$" LOAD.txt
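
This last pattern counts from the end of the line instead of the start: `83[19]I` must be followed by exactly three more characters (`...`) and then the end of the line (`$`). That only lines up with columns 155-156 if every record ends the same way, which appears to hold for the sample records but is an assumption about the full file. A small sketch with made-up records in a hypothetical `demo_tail.txt`:

```shell
# Four hypothetical record tails; the third has one extra trailing character.
printf '%s\n' 'x 839I201' 'x 831I201' 'x 831I2019' 'x 805I201' > demo_tail.txt

# Matches only where 831I or 839I is followed by exactly three characters
# before the end of the line.
grep -c '83[19]I...$' demo_tail.txt    # prints 2
```

Note that the command as posted prints both record types to standard output, so it would need to be run once per type, or combined with the awk approach, to get separate files.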
