Pattern Matching in a Huge File (Post 302495374)

Posted by senthil.ak on Thursday 10th of February 2011 04:29:57 AM

Hi Experts,
I have an issue with a huge file.
I need to check the two characters at positions 155-156 of each line and, if they match 31 or 39, route that line to a separate file for that record type.
The main file has about 1,459,328 lines and is 2 GB in size. I tried the code below, which takes around 2 hours to run.

Code:

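# Route each record by the record type found in columns 155-156.
while IFS= read -r line
do
    record_type=$(printf '%s\n' "$line" | cut -c 155-156)
    # Compare as strings so a non-numeric type does not raise a test error.
    if [ "$record_type" = "31" ]; then
        printf '%s\n' "$line" >> ./31.txt
    elif [ "$record_type" = "39" ]; then
        printf '%s\n' "$line" >> ./39.txt
    fi
done < LOAD.txt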
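
Much of the run time of a loop like this probably goes into starting a pipeline with cut once per line, roughly 1.46 million times. A minimal sketch of the same routing without any per-line process, assuming a POSIX shell: a case pattern made of 154 "?" wildcards skips the first 154 columns, so the two characters written after it are compared against columns 155-156. The variable names (tmp, skip) and the three test records are made up for illustration.

```shell
# Hypothetical fixed-width test data: 153 columns of padded filler,
# then a 7-character tail that starts at column 154.
tmp=$(mktemp -d)
{
  printf '%-153s%s\n' 'A' '839I201'   # record type 39
  printf '%-153s%s\n' 'B' '831I201'   # record type 31
  printf '%-153s%s\n' 'C' '805I201'   # record type 05, not routed
} > "$tmp/LOAD.txt"

# 154 "?" wildcards consume columns 1-154 of the line.
skip=$(printf '%154s' '' | tr ' ' '?')

# Output files are opened once, on descriptors 3 and 4, not once per line.
while IFS= read -r line
do
    case $line in
        ${skip}31*) printf '%s\n' "$line" >&3 ;;
        ${skip}39*) printf '%s\n' "$line" >&4 ;;
    esac
done < "$tmp/LOAD.txt" 3> "$tmp/31.txt" 4> "$tmp/39.txt"
```

This is still a line-by-line shell loop, so a single awk pass over the file is likely to be faster on 2 GB of input.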

I then modified this to use awk, which still takes more than 30 minutes, and the results vary.

Code:

 
awk '/839I/ {print $0}' LOAD.txt > record_39.txt &
awk '/831I/ {print $0}' LOAD.txt > record_31.txt &
cat LOAD.txt | cut -c 155-156 > smp.log
grep -c '31' smp.log
 1182483
wc -l record_31.txt
 1182495 record_31.txt
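
One possible reason for the 12-line difference is that /831I/ matches anywhere on the line, while the cut/grep check looks only at columns 155-156. That is a guess about the data, not something confirmed here. A minimal sketch of a single awk pass that tests the fixed position with substr() and writes both output files at once; the variable names (tmp, out31, out39) and the three test records are made up for illustration.

```shell
# Hypothetical fixed-width test data: 153 columns of padded filler,
# then a 7-character tail that starts at column 154.
tmp=$(mktemp -d)
{
  printf '%-153s%s\n' 'A' '839I201'                  # record type 39
  printf '%-153s%s\n' 'B' '831I201'                  # record type 31
  printf '%-153s%s\n' 'C 831I elsewhere' '805I201'   # contains 831I, but type 05
} > "$tmp/LOAD.txt"

# One pass: route each line by the two characters at columns 155-156.
awk -v out31="$tmp/31.txt" -v out39="$tmp/39.txt" '
{
    t = substr($0, 155, 2)
    if (t == "31")      print > out31
    else if (t == "39") print > out39
}' "$tmp/LOAD.txt"

# Compare the position-based count with the unanchored regex count.
anchored=$(awk 'END {print NR}' "$tmp/31.txt")
unanchored=$(awk '/831I/ {n++} END {print n+0}' "$tmp/LOAD.txt")
echo "anchored=$anchored unanchored=$unanchored"
```

On this test data the position-based check selects one line for type 31, while the unanchored regex counts two, because record C carries 831I outside the record-type columns.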

I also tried this:

Code:

 
awk '$5 ~ /39/ {print $0}' LOAD.txt

but $5 does not always fall at positions 155-156, because the number of whitespace-separated fields differs from record to record.
Sample records:

Code:

14115726     0000000000         00000000000000000000000000000000000000000000000000000000                                                      000         00I201
06485726     0000000000         00000000000000000000000000000000000000000000000000000000                                                      000        805I201
18005726ABCUS0000005726         01002080000000000000000000000000000000000000000000000000370291010381009    20090218                           000 I      839I201
18005726ABCUS0000005726         08009100000000000000000000000000000000000000000000000000370290173421008    20101203                           000I       839I201
18005726ABCUS0000005726         00000020000000000000000000000000000000000000000000000000370282295281006    20060706                           000C       831I201
18005726ABCUS0000005726         01002080000000000000000000000000000000000000000000000000370282010171003    20090216                           000 I      831I201
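
The sample records suggest why $5 is unreliable: awk splits on runs of whitespace, so a tail of "000 I" yields one more field than "000I". A small sketch, using two made-up records modelled on the samples (the leading fields K1, F2, F3 are placeholders), that prints the field count, $5, and the fixed columns side by side:

```shell
# Two hypothetical 160-column records whose tails differ only in
# where the blank falls: "000 I" versus "000I".
rec1=$(printf '%-142s%s' 'K1 F2 F3' '000 I      839I201')
rec2=$(printf '%-142s%s' 'K1 F2 F3' '000I       839I201')

# Field numbers shift with the spacing; substr() at a fixed column does not.
result=$(printf '%s\n%s\n' "$rec1" "$rec2" |
  awk '{ printf "NF=%d $5=%s cols155-156=%s\n", NF, $5, substr($0, 155, 2) }')
echo "$result"
```

The first record has six fields and the second has five, so $5 is "I" in one and "839I201" in the other, while columns 155-156 read 39 in both.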

Is there another way to get the correct results?

Thanks
Senthil.
