The UNIX and Linux Forums  

  #1 (permalink)  
Old 05-12-2008
Registered User
 

Join Date: Apr 2007
Posts: 39
Splitting a file based on records in another file

All,

We receive a file with a large number of records (the count varies), and we have to split it into two files based on the contents of another file. For example:

File1:

UHDR 2008112
"25187","00000022","00",21-APR-1991,"" ,"D",-000000519,+0000000000,"C", ,+000000000,+000000000,000000000,"2","" ,21-APR-1991
"8Y3H4","0000004H","00",16-APR-1992,"" ,"H",-001621119,+0000000000,"C", ,+000000000,+000000000,000000000,"2","" ,21-APR-1991
"95Y8U","02100971","00",03-MAR-1991,"" ,"H",-000004499,+0000000000,"" , ,+000000000,+000000000,000000000,"2","US",21-APR-1991
"24567","02100973","00",26-SEP-1991,"" ,"H",-000000362,+0000000000,"" , ,+000000000,+000000000,000000000,"2","US",21-APR-1991
--
--
--
UTRL 00144700


File2:
2518720080512
2456720080512
1256720080512
8WE7820080512
8Y3H020080512
8Y3H220080512
8Y3H420080512
8Y3H620080512
-
--
--
--

If the first 5 characters of a line in File2 match characters 2-6 of a record in File1, that record should be written to one output file; all remaining records should be copied into a second output file.
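
To make the matching rule concrete, here is a small sketch that extracts the key from one line of each file. The two sample lines are taken from the data above (the File1 record is shortened); the variable names key1 and key2 are only for illustration.

```shell
#!/bin/sh
# Key as it appears in File2: the first 5 characters of the line.
key2=$(printf '%s\n' '2518720080512' | cut -c1-5)

# Key as it appears in File1: characters 2-6, just inside the opening quote.
key1=$(printf '%s\n' '"25187","00000022","00",21-APR-1991' | cut -c2-6)

# The record belongs in the "matched" output when the two keys are equal.
if [ "$key1" = "$key2" ]; then
    echo "match on key $key1"
else
    echo "no match"
fi
```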

I tried the cut command, but because File1 is quite large, it took a long time to put the values into a variable and then compare them.

Is there a way to do this task quickly?

Please help as it is needed urgently.

Thanks in anticipation.
  #2 (permalink)  
Old 05-13-2008
era
Herder of Useless Cats
 

Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 2,203
If your grep can read patterns from a file (like GNU grep), try something like this.

Code:
cut -c1-5 file2 >patterns
grep -f patterns file1 >matches
grep -f patterns -v file1 >nonmatches
There is room for improvement: as written, the patterns can match anywhere in a line, not only in the first field. To anchor them at the beginning of the line, you could replace the first command with something like

Code:
sed 's/^\(.....\).*/^"\1",/' file2 >patterns
This wraps the search string in double quotes, appends a comma, and prefixes it with the special character "^", which means "match only at the beginning of the line".
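
As a rough illustration of why the anchored patterns matter, the sketch below runs both variants on a tiny made-up sample. The shortened rows, the extra key 00000, and the file names patterns_loose, matches_loose and matches_anchored are assumptions for the demo, not data from the thread.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample data, shortened from the layout shown in the question.
cat > file1 <<'EOF'
UHDR 2008112
"25187","00000022","00",21-APR-1991
"8Y3H4","0000004H","00",16-APR-1992
"95Y8U","02100971","00",03-MAR-1991
UTRL 00144700
EOF

cat > file2 <<'EOF'
2518720080512
0000020080512
EOF

# Unanchored: the key 00000 also hits the second column of other records.
cut -c1-5 file2 > patterns_loose
grep -f patterns_loose file1 > matches_loose

# Anchored: ^"KEY", can only match the first quoted field.
sed 's/^\(.....\).*/^"\1",/' file2 > patterns
grep -f patterns file1 > matches_anchored
grep -v -f patterns file1 > nonmatches

# Compare how many lines each variant selected.
wc -l matches_loose matches_anchored nonmatches
```

On this sample the unanchored patterns also pick up the "8Y3H4" record, because its second field contains 00000, while the anchored patterns select only the "25187" record. Note that with grep -v the header and trailer lines end up in nonmatches as well.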

Last edited by era : 05-13-2008 at 12:28 AM. Reason: Explain revised patterns
  #3 (permalink)  
Old 05-13-2008
Moderator
 

Join Date: Feb 2007
Posts: 1,291
With awk:

Code:
awk 'NR==FNR{a[substr($0,1,5)];next}
NF<3{next}
substr($1,2,5) in a {print > "file1";next}
{print > "file2"}' file2 file1
Regards
  #4 (permalink)  
Old 05-13-2008
Registered User
 

Join Date: Apr 2007
Posts: 39
Thanks Franklin and era.

Franklin,
If possible, can you explain what the code is doing? I'm an awk novice, and this will help me interpret things better.
Also, what do I do if I don't want to modify the original files?

Thanks again.
  #5 (permalink)  
Old 05-13-2008
Moderator
 

Join Date: Feb 2007
Posts: 1,291
Code:
awk 'NR==FNR{a[substr($0,1,5)];next}
While reading the first file on the command line (file2), stores the first 5 characters of each line as a key in the array a.

Code:
 NF<3{next}
Ignores lines with fewer than 3 whitespace-separated fields (the header and trailer lines).

Code:
 substr($1,2,5) in a {print > "file1";next}
If the 5 characters after the opening quote are a key in the array, prints the record to the output file named "file1"...

Code:
 {print > "file2"}
... otherwise prints it to the output file named "file2".


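The original files should not change, as long as the output names used in the print redirections ("file1" and "file2" above) differ from the names of your input files. If your inputs really are called file1 and file2, pick other output names such as matches and nonmatches; otherwise awk truncates the input file while it is still being read.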

Regards

Last edited by Franklin52 : 05-13-2008 at 10:03 AM. Reason: linguistic correction
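
A minimal end-to-end sketch of the awk approach explained above, run on a shortened, made-up sample. Two things differ from the posted command and are assumptions of this sketch: the outputs are named matches and nonmatches so they cannot collide with inputs called file1 and file2, and the header/trailer test is !/^"/ instead of NF<3, because the shortened sample rows contain fewer whitespace-separated fields than the real data.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample data, shortened from the layout shown in the question.
cat > file1 <<'EOF'
UHDR 2008112
"25187","00000022","00",21-APR-1991,"" ,"D",-000000519
"8Y3H4","0000004H","00",16-APR-1992,"" ,"H",-001621119
"95Y8U","02100971","00",03-MAR-1991,"" ,"H",-000004499
UTRL 00144700
EOF

cat > file2 <<'EOF'
2518720080512
8Y3H420080512
EOF

awk '
    # First file on the command line (file2): remember each 5-character key.
    NR == FNR { keys[substr($0, 1, 5)]; next }
    # Header and trailer do not start with a double quote: skip them.
    !/^"/ { next }
    # Characters 2-6 of the record are the key inside the first quoted field.
    substr($0, 2, 5) in keys { print > "matches"; next }
    # Everything else goes to the second output file.
    { print > "nonmatches" }
' file2 file1

# The inputs are only read, never written.
wc -l file1 matches nonmatches
```

Because the keys are held in an in-memory array, file1 is read only once, which is what makes this faster than comparing values in a shell loop.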
All times are GMT -7.