ISSUE in handling multiple same name files :-(


 
# 1  

Dear all,
My work is completely stuck because of the following issue; please help me.
The task is as follows:
I have a set of files following this pattern:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  20 Jun 11 10:45 vgtree_75_1_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  41 Jun 11 10:45 vgtree_75_2_pfs.root
-rw-rw-r-- 1 emily emily   8 Jun 11 10:46 vgtree_3_2_pls.root
-rw-rw-r-- 1 emily emily  28 Jun 11 10:46 vgtree_2_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

As you can see, some files repeat: the prefix pattern
vgtree_5_* occurs more than once. So the following files are repeating:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root

Similarly, the files matching vgtree_75_* are repeating.

What I want is to make a separate text file listing only one file name per prefix: the file itself when its prefix is unique, and the file with the maximum size when the prefix repeats. So basically the files shown here:

Code:
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

Greetings,
emily
# 2  
Try

Code:
ls -l | sort -nrk5 | awk '{split($NF,A,"_");if(!X[A[1],A[2]]++){print}}'
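To see what the one-liner does, here is a self-contained sketch (the throwaway temp directory and the reduced four-file set are my own; names and sizes are taken from the listing above): it sorts the listing by size (field 5) in descending order, then prints only the first name seen for each vgtree_<ID> prefix, i.e. the largest file per group. Passing a glob to ls also keeps the "total" header that a bare `ls -l` emits out of the pipe.

```shell
# Build a throwaway directory with files of known sizes (example data).
cd "$(mktemp -d)" || exit 1
head -c 119 /dev/zero > vgtree_5_1_pfs.root
head -c 145 /dev/zero > vgtree_5_3_pfs.root
head -c 20  /dev/zero > vgtree_75_1_pfs.root
head -c 73  /dev/zero > vgtree_75_3_pfs.root

# Sort by size (field 5) descending; awk splits the file name on "_"
# and prints a name only the first time its vgtree_<ID> key appears,
# which is necessarily the largest file for that key.
RESULT=$(ls -l vgtree_* | sort -nrk5 |
    awk '{split($NF, A, "_"); if (!X[A[1], A[2]]++) print $NF}')
echo "$RESULT"
```

With the sample sizes above this prints vgtree_5_3_pfs.root and vgtree_75_3_pfs.root, one per line.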

This User Gave Thanks to pamu For This Post:
# 3  
Shell script solution

I tried this shell script solution.

Code:
DIR=/your/directory
PATTERNS=/tmp/available_patterns.txt
UNIQUE_PATTERNS=/tmp/unique_patterns.txt
SELECTED=/tmp/selected_files.dat

cd "$DIR" || exit 1

#Get all available file patterns or prefixes before the second "_"
for FILENAME in *
do
 PATTERN=$(echo "$FILENAME" | awk -F"_" '{print $1"_"$2}' )
 echo "$PATTERN" >> "$PATTERNS"
done

#Get the unique patterns. Either sort -u or the uniq command would work
sort -u "$PATTERNS" > "$UNIQUE_PATTERNS"

#For each unique prefix, count its occurrences; keep the lone file,
#or the largest file when the prefix repeats
for PREFIX in $(cat "$UNIQUE_PATTERNS")
do
 OCCURS=$(ls "$PREFIX"_* | wc -l)
 if [ "$OCCURS" -eq 1 ]	#Only one file with this prefix: keep it
 then
	ls "$PREFIX"_* >> "$SELECTED"
 else	#Else sort by file size and keep the max-sized file's name
	ls -l "$PREFIX"_* | sort -nrk5,5 | head -1 | awk '{print $NF}' >> "$SELECTED"
 fi
done

Note: This is untested and loops through the directory twice
This User Gave Thanks to krishmaths For This Post:
# 4  
I would go with Pamu's solution.

Simple and effective.
This User Gave Thanks to PikK45 For This Post:
# 5  
Thanks Pamu, it worked like a charm.

---------- Post updated at 04:20 AM ---------- Previous update was at 02:14 AM ----------

Dear Pamu,
I just realized I am still accepting some extra files (I actually deal with hundreds of such files).
And this time it is a little tricky as well.
So on the following files:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  20 Jun 11 10:45 vgtree_75_1_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  41 Jun 11 10:45 vgtree_75_2_pfs.root
-rw-rw-r-- 1 emily emily   8 Jun 11 10:46 vgtree_3_2_pls.root
-rw-rw-r-- 1 emily emily  28 Jun 11 10:46 vgtree_2_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

I selected the files of interest to me, which your command line does correctly:
Code:
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

Now I have to cross-check each file ID (here 5, 75, and 3) against another available text file. That text file looks like this:
Code:
crab:  ExitCodes Summary
 >>>>>>>>> 396 Jobs with Wrapper Exit Code : 0 
	 List of jobs: 1-8,13-66,68,70-81,86-95,97-126,128-166,168-185,187-195,197,200-246,248-261,266-305,307-309,311-326,328-336,340-349,351-352,354-367,369-395,397-411,413-429 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 1 Jobs with Wrapper Exit Code : 8021 
	 List of jobs: 127 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 1 Jobs with Wrapper Exit Code : 50115 
	 List of jobs: 96 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:   429 Total Jobs 
 >>>>>>>>> 399 Jobs Retrieved 
	List of jobs Retrieved: 1-8,13-66,68,70-81,86-166,168-185,187-195,197,200-246,248-261,266-309,311-326,328-336,340-349,351-352,354-367,369-395,397-411,413-429 
 >>>>>>>>> 1 Jobs Cancelled by user 
	List of jobs Cancelled by user: 327 
 >>>>>>>>> 29 Jobs Cancelled 
	List of jobs Cancelled: 9-12,67,69,75, 82-85,167,186,196,198-199,247,262-265,310,337-339,350,353,368,396,412 

Now I need to compare each file ID against the job numbers in the "List of jobs Cancelled" line; if it matches, I should discard that file.
For example, 75 lies in the cancelled jobs list, so I should discard vgtree_75_3_pfs.root after this comparison.
Not sure if that's obvious.
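One possible sketch of this cross-check (my own assumptions: the crab summary lives in a plain text file, written here to /tmp/crab_summary.txt with an abbreviated snippet of the listing above): pull out the "List of jobs Cancelled" line, expand its comma-and-range list into individual job numbers, then test each file ID against that set.

```shell
# Write an abbreviated sample of the crab summary (example data only).
cat > /tmp/crab_summary.txt <<'EOF'
 >>>>>>>>> 29 Jobs Cancelled 
	List of jobs Cancelled: 9-12,67,69,75, 82-85,167,186
EOF

# Grab everything after the colon; tr removes the stray blank after "75,".
# The pattern "Cancelled:" does not match the "Cancelled by user:" line.
CANCELLED=$(grep 'List of jobs Cancelled:' /tmp/crab_summary.txt |
    sed 's/.*: *//' | tr -d ' ')

# Expand "a-b" ranges and single numbers into one job ID per line.
expanded=$(echo "$CANCELLED" | tr ',' '\n' |
    awk -F'-' 'NF==1 {print $1} NF==2 {for (i=$1; i<=$2; i++) print i}')

# Whole-line match, so ID 5 does not match 75 or 85.
is_cancelled() { echo "$expanded" | grep -qx "$1"; }

is_cancelled 75 && STATUS_75=discard || STATUS_75=keep
is_cancelled 5  && STATUS_5=discard  || STATUS_5=keep
echo "75: $STATUS_75  5: $STATUS_5"
```

With the sample list, ID 75 is flagged discard and ID 5 is kept; the same `is_cancelled` test can be applied to each ID extracted from the selected file names.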

Thanks
emily

