Compare only "file names" in two file lists with different directory structures
I have a tar archive
and 4 batch tar archives. These batches are supposed to contain all the files from arch1.all.tar.gz.
My issue is that the directory structure in "arch_all.tar.gz" is different from the directory structure in batches 1, 2, 3 and 4. I need to find the files missing from batches 1, 2, 3 and 4.
Example:
In arch1.all.tar.gz:
In the batches, the same file is named:
I was able to create a combined file with the file lists from batch1, batch2, batch3 and batch4.
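For reference, one way to produce such lists is `tar -tzf`, which prints the member paths of a gzip-compressed archive, one per line. This is a minimal sketch assuming GNU or BSD tar; the function name `make_listing` and all file names below (`all.lst`, `combined.lst`, `batch1.tar.gz` and so on) are hypothetical.

```shell
#!/usr/bin/env bash
# Write the member list of one or more gzip-compressed tar archives to a file.
# Usage: make_listing OUTPUT ARCHIVE...   (make_listing is a hypothetical helper)
make_listing() {
    out=$1
    shift
    for archive in "$@"; do
        # -t lists members, -z decompresses with gzip, -f names the archive
        tar -tzf "$archive"
    done > "$out"
}

# Hypothetical file names; substitute your own archives:
#   make_listing all.lst arch_all.tar.gz
#   make_listing combined.lst batch1.tar.gz batch2.tar.gz batch3.tar.gz batch4.tar.gz
```

The loop's output is redirected once, so all batch listings end up concatenated in a single combined file.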
QUESTION:
I need a shell script that extracts only the file names (without their directories) from these 2 files and shows me the difference, if any. How can I do that?
arch_all.tar.gz has almost 2700 more files than all 4 batches combined.
Example (arch_all):
Combined file with files from batch1, batch2, batch3 and batch4:
Since the combined file is missing the file
I expect to see "File_31339156.xml" as the output.
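A minimal sketch of one way to get that output with standard tools, assuming each list holds one path per line as printed by `tar -t`. It strips everything up to the last `/`, drops the empty names left by directory entries (which end in `/`), sorts both lists the same way, and lets `comm -23` print only the names unique to the first list. The function name `missing_names` and the list file names are hypothetical.

```shell
#!/usr/bin/env bash
# Print base names present in LIST1 but absent from LIST2, ignoring directories.
# Usage: missing_names LIST1 LIST2   (missing_names is a hypothetical helper)
missing_names() {
    t1=$(mktemp) && t2=$(mktemp) || return 1
    # sed removes the directory part; grep drops the empty names that
    # directory entries such as "dir/sub/" turn into.
    sed 's|.*/||' "$1" | grep -v '^$' | LC_ALL=C sort -u > "$t1"
    sed 's|.*/||' "$2" | grep -v '^$' | LC_ALL=C sort -u > "$t2"
    # comm needs both inputs sorted with the same collation;
    # -23 hides names unique to LIST2 and names common to both.
    LC_ALL=C comm -23 "$t1" "$t2"
    rm -f "$t1" "$t2"
}
```

Called as `missing_names all.lst combined.lst`, it should print names such as File_31339156.xml when they appear in the full list but in none of the batches.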
Can you please help?
Moderator's Comments:
Please use CODE tags as required by forum rules!
Thanks
Last edited by RudiC; 12-29-2016 at 03:46 PM.
Reason: Added CODE tags.
I don't see that greet_sed's suggestion makes any attempt to extract just the last component of any of the pathnames in your two files. You didn't show us how your tar archives are created and you haven't bothered to tell us what operating system or shell you're using. The following awk script should work even if your archives contain directories in addition to regular files, but if your archives only contain regular files, the code could be simplified:
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
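As a rough sketch of that kind of awk approach (not Don's actual script): skip directory entries, which `tar -t` prints with a trailing `/`; remember the base name (the last `/`-separated field, `$NF`) of every path in the first file; record the base names seen in the second file; and at the end print, in their original order, the names from the first file that were never seen. The function name `awk_missing` is hypothetical. The `FILENAME == ARGV[1]` test is used instead of the common `FNR == NR` idiom so that an empty first file is still handled correctly.

```shell
#!/usr/bin/env bash
# Print the base names from LIST1 that do not appear (under any directory) in LIST2.
# Usage: awk_missing LIST1 LIST2   (awk_missing is a hypothetical helper)
awk_missing() {
    awk -F/ '
        /\/$/ || $0 == "" { next }          # skip directory entries and blank lines
        FILENAME == ARGV[1] {               # first list: keep each base name once, in order
            if (!($NF in want)) { want[$NF] = 1; order[++n] = $NF }
            next
        }
        { seen[$NF] = 1 }                   # second list: note every base name present
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in seen)) print order[i]
        }
    ' "$1" "$2"
}
```

Called as `awk_missing arch_all combined`, it prints the names in the first list that are absent from the second. The Solaris advice above applies here too.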
I am on Bash.
I am not a Unix expert. Can you please give me the code to print the final result, showing the differences between the two list files? Assume the first file is named all-xml-file-list.lst and the combined batch list is named "batch-file-list.lst". Please note the *.lst files contain the file names with their directory structure.
Thanks
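A sketch for the two file names given here that reports the difference in both directions, assuming each `.lst` file holds one path per line. Names found only in the first list are prefixed with `<` and names found only in the second with `>`; directory entries and blank lines are skipped, and the output is sorted so the `<` lines come first. The function name `diff_both` is hypothetical.

```shell
#!/usr/bin/env bash
# Report base names found in only one of two path lists, in both directions.
# Usage: diff_both LIST1 LIST2   (diff_both is a hypothetical helper)
diff_both() {
    awk -F/ '
        /\/$/ || $0 == "" { next }          # skip directory entries and blank lines
        FILENAME == ARGV[1] { in1[$NF] = 1; next }
        { in2[$NF] = 1 }
        END {
            for (n in in1) if (!(n in in2)) print "< " n   # only in LIST1
            for (n in in2) if (!(n in in1)) print "> " n   # only in LIST2
        }
    ' "$1" "$2" | LC_ALL=C sort   # for-in order is unspecified, so sort the result
}
```

For example: `diff_both all-xml-file-list.lst batch-file-list.lst > /tmp/difference.lst`.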
---------- Post updated at 05:15 AM ---------- Previous update was at 04:58 AM ----------
Please ignore my previous update. I was able to extract only the file names using your awk script. Now I am doing the comparison with grep -f -v.
Thanks
---------- Post updated at 05:22 AM ---------- Previous update was at 05:15 AM ----------
Am I using the correct command to write the difference to /tmp/difference.lst? It's been running for a while.
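The exact command isn't shown, but one likely reason a `grep -v -f` comparison runs for a long time is that, by default, every line of the pattern file is treated as a regular expression that may match anywhere in a line. With thousands of names that can be slow, and it can also give wrong results: `.` matches any character, and the pattern `File_1.xml` also matches `Old_File_1.xml`. Adding `-F` (fixed strings) and `-x` (whole-line match) avoids both problems. This sketch assumes both inputs already contain base names only; `grep_missing` and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Print lines of NAMES1 that do not exactly equal any line of NAMES2.
# Both inputs are assumed to contain base names only, one per line.
# Usage: grep_missing NAMES1 NAMES2   (grep_missing is a hypothetical helper)
grep_missing() {
    # -F: patterns are fixed strings, so "." is literal
    # -x: a pattern must match the whole line, not a substring
    # -v: print the lines that match no pattern
    # -f: read the patterns from NAMES2
    grep -Fxv -f "$2" "$1"
}
```

For example: `grep_missing all-names.txt batch-names.txt > /tmp/difference.lst`.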
I am completely at a loss from your above statements. In your first post in this thread you said you had two files (one that you referred to as arch_all and one that you said "Combined file with files from batch1 batch2 batch3 and batch4" which my script assumed was named combined). If you had given the names of those two files (in that order) as the names of the files on the last line of the script I provided, the output would have been the output you requested! I.e., the names of the files in 1st input file (after discarding the directories in which those files were located) that were not found in the 2nd input file (after discarding the directories in which those files were located). So, what are you now trying to do with grep -v -f that wasn't already done by the code I provided???
Your script worked flawlessly. I was able to get the difference in one shot.
Thanks
---------- Post updated at 09:05 PM ---------- Previous update was at 09:04 PM ----------
Quote:
Originally Posted by greet_sed
Hi,
Can you try something like this?
Note that in the tar files I look only for XML files; you might need to modify it a bit.
Thanks for your help. I had to use Don's script, as it handled stripping the directory components from the file names.
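If, as in greet_sed's suggestion, only the XML members matter, the listing can be filtered and stripped of its directory components in a single pipeline. A minimal sketch, assuming a gzip-compressed archive and a tar that supports `-tzf`; `xml_names` is a hypothetical name.

```shell
#!/usr/bin/env bash
# List the base names of the .xml members of a gzip-compressed tar archive.
# Usage: xml_names ARCHIVE   (xml_names is a hypothetical helper)
xml_names() {
    # grep keeps only paths ending in ".xml" (directory entries end in "/"),
    # then sed removes everything up to the last "/".
    tar -tzf "$1" | grep '\.xml$' | sed 's|.*/||'
}
```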
Hi all,
I have AIX 7.2.
:/# /usr/IBMAHS/bin/apachectl -v
Server version: Apache/2.4.12 (Unix)
Server built: May 25 2015 04:58:27
:/#:/# /usr/IBMAHS/bin/apachectl -M
Loaded Modules:
core_module (static)
so_module (static)
http_module (static)
mpm_worker_module (static)
... (3 Replies)
Hello.
System : opensuse leap 42.3
I have a bash script that builds a text file.
I would like the last command to do:
print_cmd -o page-left=43 -o page-right=22 -o page-top=28 -o page-bottom=43 -o font=LatinModernMono12:regular:9 some_file.txt
where :
print_cmd ::= some printing... (1 Reply)
How can I use the "mailx" command to send e-mail by reading an input file of e-mail addresses, where column 1 has the name, column 2 contains the “To” e-mail address
and column 3 contains the “cc” e-mail address to include in the same e-mail?
Sample input file, email.txt
Below is an sample code where... (2 Replies)
The "read" command, which is built into bash, takes words from the standard input. However, "read" is not good at taking file names if the file names contain spaces. I would like my bash script to ask the user to enter file names, which may contain spaces. Can you think of any technique for... (14 Replies)
Hi,
I have a line in the input file as below:
3G_CENTRAL;INDONESIA_(M)_TELKOMSEL;SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL
My expected output for that line must be:
"1-Radon1-cMOC_deg"|"LDIndex"|"3G_CENTRAL|INDONESIA_(M)_TELKOMSEL"|LAST|"SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL"
Can someone... (7 Replies)
Hi everyone,
I'll try to explain my request as clearly as I can.
I have 2 folders
Folder A-->This folder receives files through FTP constantly
Folder B-->The files from Folder A are unzipped and then processed in Folder B
Sometimes Folder A doesn't contain all... (2 Replies)
I wrote a script to delete files which are older than "x" days, if the size of the directory is greater than "y"
#!/bin/bash
du -hs $1
while read SIZE ENTRY
do
if ;
then
find $1 -mtime +$2 -exec rm -f {} \;
echo "Files older than $2 days deleted"
else
echo "free Space available"... (4 Replies)
Hi,
I've modified the syslogd source to include a thread that keeps track of a timer (a timer thread). My intention is to check the file size of /var/log/messages every minute and, if the size is more than 128KB, do an echo " " > /var/log/messages, so that the file size will be set... (7 Replies)
I am getting the list of all the files which have "aaa" from files whose name is File*_bbb*.
grep -l "aaa" File*_bbb*
But I want to count the number of files. That is I want the total number of files which have "aaa" in
files File*_bbb*
If I run the following for getting number of... (1 Reply)
Having an issue. Any time a file is created with "01" (zero one) in the name, it cannot be viewed by ls or any other file listing command. Although the file is there, it cannot be seen. I can edit it, run it, anything, except see it.....
What happened? Any ideas? (8 Replies)