Shell Programming and Scripting (UNIX and Linux Forums)
omitting lines from file A that are in file B
I've got file A with (say) 1M lines in it ... ascii text, space delimited ...
I've got file B with (say) 10M lines in it ... same structure.
I want to remove any lines from A that appear (identically) in B and print the remaining (say) 900K lines. (And I want to do it in zero time of course!)
Best I've come up with so far is somehow marking the lines in A, then doing a sort and applying an awk script to the result so that the marked lines are only printed if the following (or previous) line isn't "identical" except for the mark.
But after 1000 years of shell programming I've GOT to believe I'm missing an easier/faster solution ... I'm using bash and cygwin tools - and compiling is not an option.
ADVthanksANCE for your help!
=Gneen
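For reference, the sort-and-compare idea may not need marked lines at all: comm can report lines that occur only in the first of two sorted files. A minimal sketch, assuming the coreutils sort and comm that come with cygwin; the sample data and the names under the scratch directory are made up for illustration.

```shell
# Scratch directory with hypothetical sample data standing in for files A and B.
tmpdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' > "$tmpdir/fileA"
printf '%s\n' 'x 9' 'b 2' 'd 4' 'y 8' > "$tmpdir/fileB"

# comm needs both inputs sorted with the same collation;
# LC_ALL=C forces plain byte order and is usually the fastest choice.
LC_ALL=C sort "$tmpdir/fileA" > "$tmpdir/fileA.sorted"
LC_ALL=C sort "$tmpdir/fileB" > "$tmpdir/fileB.sorted"

# -2 suppresses lines found only in B, -3 suppresses lines common to both,
# so what remains is the set of lines found only in A.
LC_ALL=C comm -23 "$tmpdir/fileA.sorted" "$tmpdir/fileB.sorted" > "$tmpdir/only_in_A"
cat "$tmpdir/only_in_A"
```

The result comes out in sorted order rather than in A's original order. If A can contain repeated lines, comm pairs them off one for one against B, so sorting with -u first may be closer to what is wanted.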
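Code:
# Print each line of fileA that has no identical line in fileB.
while IFS= read -r line
do
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$line" fileB
    then
        printf '%s\n' "$line"
    fi
done < fileA > fileC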
but ...
Heh - the grep inside the read loop would "work" ... but I'd have to come back in a year to see the results!
For tiny files this would clearly be the way to go - but for files the size I'm dealing with this would mean one million greps into a file that is ten million lines long ... can you spell "Rip Van Winkle"?
=Gneen
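The cost in the loop comes from starting one grep per line of A. grep can instead read all of B as a list of fixed-string patterns and scan A once. A rough sketch, assuming a grep that supports -F, -x, -v and -f (GNU grep does) and enough memory to hold B's lines as patterns, which is not guaranteed at 10M lines; the sample files are hypothetical.

```shell
# Scratch directory with hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' > "$tmpdir/fileA"
printf '%s\n' 'x 9' 'b 2' 'd 4' 'y 8' > "$tmpdir/fileB"

# -f fileB : take every line of fileB as a pattern
# -F       : treat patterns as fixed strings, not regular expressions
# -x       : a pattern must match the whole line
# -v       : print the lines of fileA that match none of the patterns
grep -Fxv -f "$tmpdir/fileB" "$tmpdir/fileA" > "$tmpdir/fileC"
cat "$tmpdir/fileC"
```

Unlike a sort-based approach, this keeps the surviving lines in the order they had in fileA.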
Use awk/perl hashes/assoc arrays
Assuming awk is fairly memory efficient and you have at least 1M x length-of-line bytes in virtual mem, this should work:
Code:
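awk '
  NR==FNR { A[$0]=1; next }                    # first file (A): remember each line
  ($0 in A) { A[$0]=0 }                        # second file (B): flag lines also found in B
  END { for (k in A) if (A[k]==1) print k }    # print the lines of A never seen in B
' A B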
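To see the hash idea on a small case, here is a variant sketch on made-up sample data. It deletes matched keys rather than flagging them, which is a swapped-in detail and not part of the script above; the array name keep and the file names are placeholders. It assumes file A is not empty, since NR==FNR would otherwise also be true while reading B.

```shell
# Scratch directory with hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' > "$tmpdir/A"
printf '%s\n' 'x 9' 'b 2' 'd 4' 'y 8' > "$tmpdir/B"

# First file (A): store every line as a key of the array keep.
# Second file (B): delete any key that also occurs in B.
# END: the surviving keys are the lines of A that are absent from B.
# "for (k in keep)" visits keys in an unspecified order, so sort for stable output.
awk 'NR == FNR { keep[$0]; next }
     $0 in keep { delete keep[$0] }
     END { for (k in keep) print k }' "$tmpdir/A" "$tmpdir/B" | sort > "$tmpdir/remaining"
cat "$tmpdir/remaining"
```

Because the lines of A are array keys, repeated lines in A collapse to a single copy and the original order of A is lost.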
Thanks otheus!
Nothing quite like a one-line cryptic awk script from a guru ... with a few minor typo corrections it shows excellent promise ... trying it with the giant files and the real data is going to need to wait for tomorrow. SWEET! (I'll post back here with some timing results.)
And thanks to the other folks who replied - this is indeed an incredible resource!