Match files between two folders


 
# 1  
Old 10-20-2014

I have two folders:

FOLDER1
Code:
file_101_1010.txt
file_102_1007.txt
file_103_1003.txt
file_104_1007.txt
file_105_1011.txt
file_106_1006.txt
file_108_1007.txt
file_109_1002.txt
file_110_1006.txt
file_111_1008.txt
file_112_1011.txt
file_113_1012.txt
file_114_1001.txt
file_115_1009.txt
file_116_1002.txt
file_117_1003.txt
file_118_1006.txt
file_119_1012.txt
file_120_1010.txt
file_201_1003.txt
file_202_1004.txt
file_203_1007.txt
file_204_1002.txt
file_205_1010.txt
file_206_1005.txt
file_207_1006.txt
file_208_1006.txt
file_209_1001.txt
file_210_1004.txt
file_211_1003.txt
file_213_1008.txt
file_214_1001.txt
file_215_1004.txt
file_216_1001.txt
file_217_1003.txt
file_218_1005.txt
file_219_1006.txt
file_220_1012.txt
file_301_1010.txt
file_302_1007.txt
file_303_1008.txt
file_304_1011.txt
file_306_1006.txt
file_307_1008.txt
file_308_1009.txt
file_309_1009.txt
file_310_1009.txt
file_311_1012.txt
file_312_1006.txt
file_313_1010.txt
file_314_1005.txt
file_315_1003.txt
file_316_1010.txt
file_318_1002.txt
file_319_1010.txt
file_320_1001.txt

FOLDER2
Code:
myfile_1007_102.log
myfile_1003_103.log
myfile_1011_105.log
myfile_1006_106.log
myfile_1002_107.log
myfile_1007_108.log
myfile_1008_111.log
myfile_1011_112.log
myfile_1012_113.log
myfile_1001_114.log
myfile_1009_115.log
myfile_1002_116.log
myfile_1003_117.log
myfile_1006_118.log
myfile_1012_119.log
myfile_1010_120.log
myfile_1003_201.log
myfile_1004_202.log
myfile_1007_203.log
myfile_1002_204.log
myfile_1010_205.log
myfile_1005_206.log
myfile_1006_207.log
myfile_1006_208.log
myfile_1001_209.log
myfile_1004_210.log
myfile_1003_211.log
myfile_1002_212.log
myfile_1008_213.log
myfile_1001_214.log
myfile_1004_215.log
myfile_1001_216.log
myfile_1003_217.log
myfile_1006_219.log
myfile_1012_220.log
myfile_1010_301.log
myfile_1007_302.log
myfile_1008_303.log
myfile_1011_304.log
myfile_1003_305.log
myfile_1006_306.log
myfile_1008_307.log
myfile_1009_308.log
myfile_1009_309.log
myfile_1009_310.log
myfile_1012_311.log
myfile_1006_312.log
myfile_1010_313.log
myfile_1005_314.log
myfile_1010_316.log
myfile_1001_317.log
myfile_1002_318.log
myfile_1010_319.log
myfile_1001_320.log

The naming convention for the files is:

Code:
FOLDER1/file_ID_VALUE.txt  
FOLDER2/myfile_VALUE_ID.log
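
A minimal sketch of how the two naming conventions can be taken apart with awk, assuming the paths always start with FOLDER1/ or FOLDER2/ as shown above. The helper name parse_name is hypothetical and is only here for illustration:

```shell
# Hypothetical helper: print "FOLDER ID VALUE" for one path.
# Splitting on "_" and "." isolates the two numeric parts of the name.
parse_name() {
    printf '%s\n' "$1" | awk -F'[_.]' '
        # FOLDER1/file_ID_VALUE.txt   -> $2 is the ID, $3 is the VALUE
        $1 ~ /^FOLDER1\// { print "FOLDER1", $2, $3 }
        # FOLDER2/myfile_VALUE_ID.log -> $2 is the VALUE, $3 is the ID
        $1 ~ /^FOLDER2\// { print "FOLDER2", $3, $2 }'
}

parse_name FOLDER1/file_101_1010.txt     # prints: FOLDER1 101 1010
parse_name FOLDER2/myfile_1007_102.log   # prints: FOLDER2 102 1007
```

The point is only that the ID and the VALUE swap positions between the two folders, so each folder needs its own field mapping.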

I'm trying to list all the IDs in sequence (101-120, 201-220, 301-320), match the file IDs between the two folders, and print the VALUE from each folder:

Code:
ID   - VALUE-FOLDER1 VALUE-FOLDER2
101  -  1010        
102  -  1007    1007
103  -  1003    1003
104  -  1007        
105  -  1011    1011
106  -  1006    1006
107  -          1002
108  -  1007    1007
109  -  1002        
110  -  1006        
111  -  1008    1008
112  -  1011    1011
113  -  1012    1012
114  -  1001    1001
115  -  1009    1009
116  -  1002    1002
117  -  1003    1003
118  -  1006    1006
119  -  1012    1012
120  -  1010    1010
201  -  1003    1003
202  -  1004    1004
203  -  1007    1007
204  -  1002    1002
205  -  1010    1010
206  -  1005    1005
207  -  1006    1006
208  -  1006    1006
209  -  1001    1001
210  -  1004    1004
211  -  1003    1003
212  -          1002
213  -  1008    1008
214  -  1001    1001
215  -  1004    1004
216  -  1001    1001
217  -  1003    1003
218  -  1005        
219  -  1006    1006
220  -  1012    1012
301  -  1010    1010
302  -  1007    1007
303  -  1008    1008
304  -  1011    1011
305  -          1003
306  -  1006    1006
307  -  1008    1008
308  -  1009    1009
309  -  1009    1009
310  -  1009    1009
311  -  1012    1012
312  -  1006    1006
313  -  1010    1010
314  -  1005    1005
315  -  1003        
316  -  1010    1010
317  -          1001
318  -  1002    1002
319  -  1010    1010
320  -  1001    1001

Here is what I have done so far (the script is run in FOLDER1):

Code:
#!/bin/gawk -f

BEGIN {
    print "ID   -","VALUE-FOLDER1", "VALUE-FOLDER2"
}
ENDFILE {
    id1[substr(FILENAME,3,3)]=substr(FILENAME,10,4)
    for (i=101; i<=320; i++) {
        if (i%100 <=20 && i%100 > 0) {
            if (i in id1) {
                print i," - ",id1[i]
            }
        }
    }
}

Could you please help me resolve this?

Thanks in advance
# 2  
Old 10-20-2014
Using GNU awk you could do this (similar to your solution):

Code:
gawk -F_ '
BEGIN { print "ID   -","VALUE-FOLDER1", "VALUE-FOLDER2" }
BEGINFILE {
    split(FILENAME, v, "_")
    if(v[2]+0>1000) id2[v[3]+0]=v[2]+0
    if(v[3]+0>1000) id1[v[2]+0]=v[3]+0
    nextfile
}
END {
    for (i=101; i<=320; i++) {
        if (i%100 >20) i+=80
        print i," - ", id1[i], id2[i]
    }
}' OFS="\t" FOLDER1/* FOLDER2/*

Or using standard awk you could do:

Code:
find FOLDER[12] -type f -print | awk -F_ '
BEGIN { print "ID   -","VALUE-FOLDER1", "VALUE-FOLDER2" }
$2+0>1000{id2[$3+0]=$2+0}
$3+0>1000{id1[$2+0]=$3+0}
END {
    for (i=101; i<=320; i++) {
        if (i%100 >20) i+=80
        print i," - ", id1[i], id2[i]
    }
}' OFS="\t"
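
The END loop in both versions above visits only the IDs 101-120, 201-220 and 301-320: once the last two digits pass 20, it jumps ahead by 80 to the next hundred. A small sketch that isolates just that loop (the variable name ids is only for illustration):

```shell
# Collect the ID sequence produced by the same loop as above.
ids=$(awk 'BEGIN {
    for (i = 101; i <= 320; i++) {
        if (i % 100 > 20) i += 80    # 121 -> 201, 221 -> 301
        printf "%s%d", sep, i
        sep = " "
    }
    print ""
}')
echo "$ids"
```

This should list 60 IDs, starting at 101 and ending at 320, with 120 followed directly by 201.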

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 10-21-2014
Thank you, both solutions worked like a charm.

Best Regards
# 4  
Old 10-22-2014
Hello,

Sorry for bumping the same thread, but I just came across a new issue that I cannot solve.

The script below does everything I need except handle duplicates.

Code:
find FOLDER[12] -type f -print | awk -F_ '
BEGIN { print "ID   -","VALUE-FOLDER1|","COUNT FOLDER1|", "VALUE-FOLDER2|","COUNT FOLDER2|","FILE SIZE FOLDER2"}
$2+0>1000{id2[$3+0]=$2+0; c = "stat -c %s " $0; c | getline foo; file2[$3+0]=foo; close(c) }
$3+0>1000{id1[$2+0]=$3+0}
END {
    for (i=101; i<=320; i++) {
        if (i%100 >20) i+=80
        if ( id1[i] !=""){count1++}
        if ( id2[i] !=""){count2++}
        if ( id1[i] !="" &&  id2[i] !="" ) 
        {print i," - ", id1[i],count1,id2[i],count2,file2[i]/1024/1024," MB"}
        else if ( id1[i] !="" &&  id2[i] =="" ) 
        {print i," - ", id1[i],count1,id2[i],"  ","  "}
        else if ( id1[i] =="" &&  id2[i] =="" ) 
        {print i," - ", id1[i],"  ",id2[i],"  ","  "}
        else if ( id1[i] =="" &&  id2[i] !="" ) 
        {print i," - ", id1[i],"  ",id2[i],count2,file2[i]/1024/1024," MB"}
       
    }
}' OFS="\t"

Besides the files listed above, FOLDER1 and FOLDER2 contain several more files:

FOLDER1:
Code:
file_104_1129.txt
file_110_1007.txt

FOLDER2:
Code:
myfile_1957_307.log
myfile_1095_314.log

If possible I would like the following output:

Code:
ID   - VALUE-FOLDER1| COUNT FOLDER1| VALUE-FOLDER2| COUNT FOLDER2| FILE SIZE FOLDER2
101	 - 	1010	1		  	  
102	 - 	1007	2	1007	1	0	 MB
103	 - 	1003	3	1003	2	0	 MB
104	 - 	1007	4		  	  
104	 - 	1129	5		  	  
105	 - 	1011	6	1011	3	0	 MB
106	 - 	1006	7	1006	4	0	 MB
107	 - 		  	1002	5	0	 MB
108	 - 	1007	8	1007	6	0	 MB
109	 - 	1002	9		  	  
110	 - 	1006	10		  	  
110	 - 	1007	11		  	  
111	 - 	1008	12	1008	7	0	 MB
112	 - 	1011	13	1011	8	0	 MB
113	 - 	1012	14	1012	9	0	 MB
114	 - 	1001	15	1001	10	0	 MB
115	 - 	1009	16	1009	11	0	 MB
116	 - 	1002	17	1002	12	0	 MB
117	 - 	1003	18	1003	13	0	 MB
118	 - 	1006	19	1006	14	0	 MB
119	 - 	1012	20	1012	15	0	 MB
120	 - 	1010	21	1010	16	0	 MB
201	 - 	1003	22	1003	17	0	 MB
202	 - 	1004	23	1004	18	0	 MB
203	 - 	1007	24	1007	19	0	 MB
204	 - 	1002	25	1002	20	0	 MB
205	 - 	1010	26	1010	21	0	 MB
206	 - 	1005	27	1005	22	0	 MB
207	 - 	1006	28	1006	23	0	 MB
208	 - 	1006	29	1006	24	0	 MB
209	 - 	1001	30	1001	25	0	 MB
210	 - 	1004	31	1004	26	0	 MB
211	 - 	1003	32	1003	27	0	 MB
212	 - 		  	1002	28	0	 MB
213	 - 	1008	33	1008	29	0	 MB
214	 - 	1001	34	1001	30	0	 MB
215	 - 	1004	35	1004	31	0	 MB
216	 - 	1001	36	1001	32	0	 MB
217	 - 	1003	37	1003	33	0	 MB
218	 - 	1005	38		  	  
219	 - 	1006	39	1006	34	0	 MB
220	 - 	1012	40	1012	35	0	 MB
301	 - 	1010	41	1010	36	0	 MB
302	 - 	1007	42	1007	37	0	 MB
303	 - 	1008	43	1008	38	0	 MB
304	 - 	1011	44	1011	39	0	 MB
305	 - 		  	1003	40	0	 MB
306	 - 	1006	45	1006	41	0	 MB
307	 - 	1008	46	1008	42	0	 MB
307	 - 		  	1957	43	0	 MB
308	 - 	1009	47	1009	44	0	 MB
309	 - 	1009	48	1009	45	0	 MB
310	 - 	1009	49	1009	46	0	 MB
311	 - 	1012	50	1012	47	0	 MB
312	 - 	1006	51	1006	48	0	 MB
313	 - 	1010	52	1010	49	0	 MB
314	 - 	1005	53	1005	50	0	 MB
314	 - 		  	1095	51	0	 MB
315	 - 	1003	54		  	  
316	 - 	1010	55	1010	52	0	 MB
317	 - 		  	1001	53	0	 MB
318	 - 	1002	56	1002	54	0	 MB
319	 - 	1010	57	1010	55	0	 MB
320	 - 	1001	58	1001	56	0	 MB

Thanks in advance.
# 5  
Old 10-22-2014
If your find supports -printf, can I suggest this:

Code:
find FOLDER[12] -type f -printf "%s_%p\n" | awk -F_ '
BEGIN { print "ID   -","VALUE-FOLDER1|","COUNT FOLDER1|", "VALUE-FOLDER2|","COUNT FOLDER2|","FILE SIZE FOLDER2"}
$3+0>1000{i=$4+0;for(p=1;p SUBSEP i in id2;) p++;id2[p,i]=$3+0; file2[p,i]=$1 }
$4+0>1000{i=$3+0;for(p=1;p SUBSEP i in id1;) p++;id1[p,i]=$4+0}
END {
    for (i=101; i<=320; i++) {
        if (i%100 >20) i+=80
        for(p=1; p==1 || p SUBSEP i in id1 || p SUBSEP i in id2; p++) {
            printf("%d\t-\t%s\t%s\t%s\t%s\t", i,
                id1[p,i], id1[p,i]?++count1:"  ",
                id2[p,i], id2[p,i]?++count2:"  ")
            if(id2[p,i]) printf("%.2f\tMB\n", file2[p,i]/1024/1024);
            else printf("  \t  \n");
        }
       
    }
}'

otherwise:

Code:
find FOLDER[12] -type f -print | awk -F_ '
BEGIN { print "ID   -","VALUE-FOLDER1|","COUNT FOLDER1|", "VALUE-FOLDER2|","COUNT FOLDER2|","FILE SIZE FOLDER2"}
$2+0>1000{i=$3+0;for(p=1;p SUBSEP i in id2;) p++;id2[p,i]=$2+0; c = "stat -c %s " $0 ;c |getline foo;file2[p,i]=foo; close(c) }
$3+0>1000{i=$2+0;for(p=1;p SUBSEP i in id1;) p++;id1[p,i]=$3+0}
END {
    for (i=101; i<=320; i++) {
        if (i%100 >20) i+=80
        for(p=1; p==1 || p SUBSEP i in id1 || p SUBSEP i in id2; p++) {
            printf("%d\t-\t%s\t%s\t%s\t%s\t", i,
                id1[p,i], id1[p,i]?++count1:"  ",
                id2[p,i], id2[p,i]?++count2:"  ")
            if(id2[p,i]) printf("%.2f\tMB\n", file2[p,i]/1024/1024);
            else printf("  \t  \n");
        }
       
    }
}'
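
Both versions above handle duplicates the same way: every ID gets numbered slots indexed by (p, ID), and a new file is stored in the first slot that is still free. A cut-down sketch of that idea using three of the FOLDER1 names from this thread (the array name seen and the variable dups are only for illustration, and `(p, id) in seen` is the same test as `p SUBSEP id in seen`):

```shell
dups=$(printf '%s\n' file_104_1007.txt file_104_1129.txt file_105_1011.txt |
awk -F'[_.]' '
{
    id = $2 + 0
    p = 1
    while ((p, id) in seen) p++      # find the first free slot for this ID
    seen[p, id] = $3
}
END {
    # print every occupied slot for each ID, in slot order
    for (id = 104; id <= 105; id++)
        for (p = 1; (p, id) in seen; p++)
            print id, p, seen[p, id]
}')
echo "$dups"
```

ID 104 ends up with two slots (1007 and 1129) and ID 105 with one, which is why the full scripts can print one output row per slot.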


Last edited by Chubler_XL; 10-22-2014 at 05:07 PM.. Reason: Edit: Ensure size for zero MB files is still shown
# 6  
Old 10-23-2014
Thank you very much for your help.

Both scripts work perfectly.

Best Regards