Compare Only "File Names" in 2 Files with file lists having different directory structure


 
# 1  
Old 12-29-2016

I have a tar archive:
Code:
arch_all.tar.gz

and 4 batched tar archives. These batches are supposed to contain all the files from arch_all.tar.gz:

Code:
arch1_batch1.tar.gz
arch1_batch2.tar.gz
arch1_batch3.tar.gz
arch1_batch4.tar.gz

My issue is that the directory structure in "arch_all.tar.gz" is different from the directory structure in batches 1, 2, 3 and 4. I need to find the files missing from batches 1, 2, 3 and 4.

example:

In arch_all.tar.gz:

Code:
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339155.xml
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339156.xml

The same file is named as follows in a batch archive:

Code:
-rw-r--r-- oracle/oracle 40203 2016-12-26 13:21 export_12-26-2016_BATCH1/File_31339155.xml

I was able to create a combined file with the listings from batch1, batch2, batch3 and batch4.

QUESTION:

I need a shell script that extracts only the file names from these 2 list files and shows me the differences, if any.

arch_all.tar.gz has almost 2700 more files than all 4 batches combined.

Example, arch_all:


Code:
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339155.xml
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339156.xml

Combined file with files from batch1 batch2 batch3 and batch4:

Code:
-rw-r--r-- oracle/oracle 40203 2016-12-26 13:21 export_12-26-2016_BATCH1/File_31339155.xml

Since the combined file is missing the file
Code:
File_31339156.xml

I expect to see "File_31339156.xml" as the output.

Can you please help?



Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!


Thanks

Last edited by RudiC; 12-29-2016 at 03:46 PM.. Reason: Added CODE tags.
# 2  
Old 12-29-2016
Hi,

can you try something like this ?

Code:
# List the XML members of the full archive
tar tf all.tar.gz | grep '\.xml$' > all-xml-file-list

# If you don't already have a file list from the batches, create it
rm -f batch-file-list
for i in bat*.tar.gz
do
	tar tf "$i" | grep '\.xml$' >> batch-file-list
done

echo "get missing list"
grep -v -f batch-file-list all-xml-file-list

Note that I only look for XML files in the tar archives; you might need to modify this a bit.
# 3  
Old 12-30-2016
I don't see that greet_sed's suggestion makes any attempt to extract just the last component of any of the pathnames in your two files. You didn't show us how your tar archives are created and you haven't bothered to tell us what operating system or shell you're using. The following awk script should work even if your archives contain directories in addition to regular files, but if your archives only contain regular files, the code could be simplified:
Code:
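awk -F/ '
# Skip directory entries: their lines end in "/", so the last field is empty.
!$NF {	next
}
# First file (arch_all): remember each base name.
NR == FNR {
	files[$NF]
	next
}
# Second file (combined): drop every base name that also appears here.
{	delete files[$NF]
}
# Whatever is left exists only in the first file.
END {	for(file in files)
		print file
}
' arch_all combined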

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
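
For comparison, here is a minimal sketch of the same idea using sed, sort and comm instead of awk. It is not from the post above; the temporary directory, the `basenames` helper and the `*.names` files are made up for illustration, and the sample lines are copied from the first post. Unlike the awk script, it prints the missing names in sorted order.

```shell
# Hypothetical sample data mirroring the listings in the first post.
tmp=$(mktemp -d)
cat > "$tmp/arch_all" <<'EOF'
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339155.xml
-rw-r--r-- oracle/oracle 40203 2016-12-25 14:59 usr/data/output/export_12-25-2016/File_31339156.xml
EOF
cat > "$tmp/combined" <<'EOF'
-rw-r--r-- oracle/oracle 40203 2016-12-26 13:21 export_12-26-2016_BATCH1/File_31339155.xml
EOF

# Hypothetical helper: keep only the text after the last "/", drop empty
# names (directory entries), then sort and de-duplicate for comm.
basenames() { sed 's|.*/||' "$1" | grep -v '^$' | sort -u; }

basenames "$tmp/arch_all" > "$tmp/all.names"
basenames "$tmp/combined" > "$tmp/batch.names"

# comm -23 prints the lines that appear only in the first (sorted) file.
missing=$(comm -23 "$tmp/all.names" "$tmp/batch.names")
echo "$missing"
rm -rf "$tmp"
```

With the two sample listings this prints File_31339156.xml, the name that exists only in arch_all.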
# 4  
Old 12-30-2016
Small perhaps theoretical note:
Code:
!$NF {	next
}

is used to skip directories, but it would also skip files with names like 0, 00 or +0.

A safer method would be to use:
Code:
$NF=="" { 
  next
}
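
A small sketch of the difference between the two tests, using three made-up path lines (a directory entry, a file named 0 and an ordinary file). The variable names are only for illustration.

```shell
# Hypothetical input: a directory entry, a file named "0", a normal file.
input='dir/
dir/0
dir/File_1.xml'

# !$NF is true for an empty last field AND for a numeric-looking "0",
# so the file named 0 is skipped along with the directory entry.
loose=$(printf '%s\n' "$input" | awk -F/ '!$NF { next } { print $NF }')

# $NF == "" is true only for the empty last field (the directory entry).
strict=$(printf '%s\n' "$input" | awk -F/ '$NF == "" { next } { print $NF }')

printf 'with !$NF:\n%s\n' "$loose"
printf 'with $NF == "":\n%s\n' "$strict"
```

The first variant prints only File_1.xml; the second prints 0 and File_1.xml.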

# 5  
Old 12-30-2016
I am on Bash.

I am not a Unix expert. Can you please give me the code to print the final result with the differences between both list files, assuming the first file is named all-xml-file-list.lst and the combined batch list file is named "batch-file-list.lst"? Please note that the *.lst files do have file names with directory structure in them.

Thanks

---------- Post updated at 05:15 AM ---------- Previous update was at 04:58 AM ----------

Please ignore my previous update. I was able to pull only the file names using your AWK script. Now I am doing the compare using the grep -f -v options.

Thanks

---------- Post updated at 05:22 AM ---------- Previous update was at 05:15 AM ----------

grep -v -f batch-file-list batch-file-list.lst > /tmp/difference.lst

Am I using the correct command to print the difference in /tmp/difference.lst? It's been running for a while

Last edited by sumang24; 12-30-2016 at 06:21 AM..
# 6  
Old 12-30-2016
I am completely at a loss from your above statements. In your first post in this thread you said you had two files (one that you referred to as arch_all and one that you said "Combined file with files from batch1 batch2 batch3 and batch4" which my script assumed was named combined). If you had given the names of those two files (in that order) as the names of the files on the last line of the script I provided, the output would have been the output you requested! I.e., the names of the files in 1st input file (after discarding the directories in which those files were located) that were not found in the 2nd input file (after discarding the directories in which those files were located). So, what are you now trying to do with grep -v -f that wasn't already done by the code I provided???
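
If grep is still preferred for the comparison, one possible sketch is shown below. It assumes the two list files are named all-xml-file-list.lst and batch-file-list.lst as in the post quoted above; their contents here, the temporary directory and the batch.names file are made up for illustration. Stripping the directories first and matching whole lines as fixed strings (-F -x) avoids treating thousands of path names as regular expressions, which may be why the earlier grep -v -f run took so long.

```shell
# Hypothetical list files; only their names come from the thread.
tmp=$(mktemp -d)
printf '%s\n' \
  'usr/data/output/export_12-25-2016/File_31339155.xml' \
  'usr/data/output/export_12-25-2016/File_31339156.xml' \
  > "$tmp/all-xml-file-list.lst"
printf '%s\n' \
  'export_12-26-2016_BATCH1/File_31339155.xml' \
  > "$tmp/batch-file-list.lst"

# Strip directories from the batch list so each pattern is a bare file name.
sed 's|.*/||' "$tmp/batch-file-list.lst" > "$tmp/batch.names"

# -F: fixed strings, -x: match whole lines, -v: invert, -f: patterns from file.
sed 's|.*/||' "$tmp/all-xml-file-list.lst" |
  grep -F -x -v -f "$tmp/batch.names" > "$tmp/difference.lst"

difference=$(cat "$tmp/difference.lst")
echo "$difference"
rm -rf "$tmp"
```

For these sample lists the result is File_31339156.xml.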
# 7  
Old 12-30-2016
Your script worked flawlessly. I was able to get the difference in 1 shot.

Thanks

---------- Post updated at 09:05 PM ---------- Previous update was at 09:04 PM ----------

greet_sed, thanks for your help. I had to use Don's script, as it handled stripping the directory component from the file names.