Compare two files when pattern matched


 
# 1  
Old 08-19-2016

I have two files, FILE1 and FILE2.

FILE1 contains 80,000 filenames in sorted order, and FILE2 contains 6,000 filenames, also in sorted order.

I want to compare the filenames in the two files and, whenever a name matches, copy that file into a folder.


File1.txt contains 80,000 filenames:
Code:
./list1.txt
./list.txt
./temp.txt
./1_April_2011_Front0.txt
./1_April_2011_Front10.txt
./1_April_2011_Front11.txt
./1_April_2012_Front12.txt
./1_April_2011_Front13.txt
./1_April_2011_Front14.txt
./1_April_2011_Front15.txt
./1_April_2011_Front16.txt
./1_April_2011_Front17.txt
./1_April_2011_Front18.txt
./1_April_2011_Front19.txt
./1_April_2011_Front1.txt
./5_April_2012_Page323.txt
./6_August_2012_Page328.txt
./10_February_2014_Sportz6.txt
.....
.....

File2.txt contains 6,000 filenames without the extension (.txt):
Code:
1_April_2012_Front16
5_April_2012_Page323
6_August_2012_Page328
15_August_2012_Sportz10
10_February_2014_Sportz6
.....
.....

Files whose names match should be copied to a folder named "output".

Desired output:
Code:
5_April_2012_Page323.txt
6_August_2012_Page328.txt
10_February_2014_Sportz6.txt

I tried this code, but it does not give the desired output:

Code:
counter=0;
for file in `cat FILE1.txt | awk -F'[/_.]' '{print $3$4$5$6}'` 
do
x=`echo "$file"` 
while read eachline
do
y=`echo "$eachline" | cat temp.txt | awk -F'[/_.]' '{print $1$2$3$4}'`
if [ "$x"=="$y" ]
then
cp -v $file /home/imran/Script/data
counter=$((counter+1))
break
fi
done < FILE2.txt
echo $counter
done

I have also tried it this way:

Code:
counter=0;
for f in `awk 'NR>2{print}' FILE1.txt` 
   do
     f3=$(echo $f|awk -F'/' '{print $2}');
     f6=$(echo "${f3%%.*}");    
   for g in `awk 'NR>=1{print}' FILE2.txt`
        do
           if [ "$f"=="$g" ]
           then
           cp $f /home/imran/Script/data
           counter=$((counter+1))    
           break;
           fi
       done
             echo $counter
  done


Please help


Moderator's Comments:
Mod Comment Please use CODE (not ICODE) tags as required by forum rules!

Last edited by RudiC; 08-19-2016 at 06:45 AM.. Reason: Changed ICODE tags.
# 2  
Old 08-19-2016
Does "similar" mean "identical except for the .txt ending"?
Will EVERY single entry in file2 exist in file1 (with leading "./" and trailing ".txt")?
# 3  
Old 08-19-2016
Provided my assumptions above hold, try:

Code:
awk 'NR == FNR {T[$1]; next} {FN = $0; gsub (/^.*\/|.txt$/, _)} $0 in T {system ("echo cp " FN " /some/where")}' file2 file1
cp ./5_April_2012_Page323.txt /some/where
cp ./6_August_2012_Page328.txt /some/where
cp ./10_February_2014_Sportz6.txt /some/where

If you are happy with the result, remove the echo command from the system() call.
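
As an aside on the loops in post #1: one likely reason they misbehave is the test itself. Without spaces around the operator, [ "$x"=="$y" ] passes a single non-empty word to the test command, and a single non-empty word is always true, so the first comparison always "matches". A minimal sketch of the difference (the values of x and y are made up for illustration):

```shell
x=5_April_2012_Page323
y=1_April_2012_Front16

# One argument: the single word "5_April...==1_April..." is non-empty, so this is true.
if [ "$x"=="$y" ]; then glued=true; else glued=false; fi

# Three arguments: a real string comparison ("=" is the portable operator).
if [ "$x" = "$y" ]; then spaced=true; else spaced=false; fi

echo "glued=$glued spaced=$spaced"
```

This prints glued=true spaced=false even though the two names differ. Fixing the test alone would still leave 80,000 x 6,000 shell-level comparisons, which is why a single awk pass is the better fit here.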
This User Gave Thanks to RudiC For This Post:
# 4  
Old 08-19-2016
Thank you so much RudiC Sir!!

---------- Post updated at 05:47 PM ---------- Previous update was at 05:34 PM ----------

RudiC Sir!! Could you please explain your command
# 5  
Old 08-19-2016
Quote:
Originally Posted by imranrasheedamu;
RudiC Sir!! Could you please explain your command
Hello imranrasheedamu,

Please let me know if the following explanation helps.
Code:
awk 'NR == FNR {                       #### NR counts lines across all input files; FNR restarts at 1 for each file. NR == FNR is therefore true only while the first file (file2) is being read.
T[$1];                                 #### Create an element in array T whose index is $1 (the first field, i.e. the name from file2).
next}                                  #### next (an awk keyword) skips the remaining statements and moves on to the next input line.
                                       #### All following statements run only while the second file (file1) is being read.
{FN = $0;                              #### Save the complete line (the full path) in the variable FN.
gsub (/^.*\/|.txt$/, _)}               #### gsub globally substitutes a pattern in $0. Here it replaces everything up to the last / and the trailing .txt with the empty string (_ is an unset variable).
$0 in T {                              #### Test whether the reduced line is an index of array T (filled while file2 was read).
system ("echo cp " FN " /some/where")  #### system runs a shell command from inside awk. Here it runs echo, which only prints the cp command that would be executed.
}' file2 file1                         #### The input files: file2 first, then file1.
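
To see the NR == FNR idiom on its own, here is a small sketch that prints both counters for every line of two throwaway files (the file names first and second and their contents are invented for this illustration only):

```shell
# Two throwaway input files in a temporary directory (names are illustrative only).
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"

# Print file name, NR, FNR and whether NR == FNR holds for every line read.
nr_demo=$(cd "$tmp" && awk '{print FILENAME, NR, FNR, (NR == FNR ? "first-file" : "later-file")}' first second)
echo "$nr_demo"
rm -rf "$tmp"
```

The two lines of first show equal counters (1 1 and 2 2), while the lines of second show NR continuing at 3, 4, 5 and FNR restarting at 1, 2, 3. Note that the idiom relies on the first file being non-empty; with an empty first file, NR == FNR would also hold for the second one.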

Thanks,
R. Singh
# 6  
Old 08-19-2016
Code:
awk '
NR == FNR       {T[$1]                                  # for the first file (NR id. to FNR), collect the names to search in T array
                 next                                   # stop processing this line; read next one
                }
                {FN = $0                                # second file only: save total file path in FN variable
                 gsub (/^.*\/|.txt$/, _)                # remove leading path info and ".txt" ext. from file name
                }
$0 in T         {system ("echo cp " FN " /some/where")  # IF the reduced file name is found in pattern array T, run the 
                                                        # system command to cp FN (full file path) to destination (echo inserted for safety)
                }
' file2 file1
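
The gsub step can be tried in isolation on one sample path. This is only a sketch; the one deliberate difference from the command above is that the dot is escaped (\.txt$), because an unescaped . matches any character and would also strip a name ending in, say, "Atxt":

```shell
# Apply the same substitution to a single sample path and print the result.
# The dot is escaped here so that only a literal ".txt" suffix is removed.
reduced=$(printf '%s\n' './5_April_2012_Page323.txt' |
          awk '{gsub(/^.*\/|\.txt$/, ""); print}')
echo "$reduced"
```

This prints 5_April_2012_Page323, which is the form the names take in file2 and therefore the key looked up in T.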

# 7  
Old 08-19-2016
Each call to system() in awk will invoke a shell which will then invoke cp. If there are 6000 files to be copied, invoking one shell for the copies instead of 6000 should be considerably faster. Consider this small change to RudiC's suggestion:
Code:
awk '
NR == FNR       {T[$1]
                 next
                }
                {FN = $0 
                 gsub (/^.*\/|.txt$/, _)
                }
$0 in T         {print "cp", FN, "/some/where"
                }
' file2 file1 | sh

And, if the cp utility on your system has a -t destination_directory option (which is an extension not covered by the standards), you could gain even more by using xargs to greatly reduce the number of times cp is invoked:
Code:
awk '
NR == FNR       {T[$1]
                 next
                }
                {FN = $0 
                 gsub (/^.*\/|.txt$/, _)
                }
$0 in T         {print FN
                }
' file2 file1 | xargs cp -t "/some/where"
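
One caveat with both pipelines: if any filename contains blanks or quotes, sh and plain xargs will split or misread it. The sample names in this thread do not, so this may not matter here. If it does, a possible variant is sketched below. It assumes xargs supports -0 and cp supports -t (both are extensions, not POSIX), and the directory layout and the name "My File.txt" are invented purely for the demonstration:

```shell
# Sketch only: throwaway directory with three source files, one containing a space.
work=$(mktemp -d)
mkdir "$work/output"
(
  cd "$work" || exit 1
  touch './5_April_2012_Page323.txt' './My File.txt' './unlisted.txt'
  printf '%s\n' './5_April_2012_Page323.txt' './My File.txt' './unlisted.txt' > file1
  printf '%s\n' '5_April_2012_Page323' 'My File' > file2

  # Key on the whole line ($0) instead of $1 so names with blanks survive,
  # then turn newlines into NULs so xargs -0 passes each name as one argument.
  awk '
  NR == FNR       {T[$0]; next}
                  {FN = $0; gsub (/^.*\/|\.txt$/, "")}
  $0 in T         {print FN}
  ' file2 file1 | tr '\n' '\0' | xargs -0 cp -t "$work/output"
)
ls "$work/output"
```

Only the two listed files end up in the output directory. Names containing newlines are still out of reach, since both lists are line-based to begin with.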
