ISSUE in handling multiple same name files :-(
# 1  
ISSUE in handling multiple same name files :-(

Dear all,
My work is completely stuck because of the following issue. Please have a look and kindly help me.
The task is the following:
I have a set of files with this pattern:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  20 Jun 11 10:45 vgtree_75_1_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  41 Jun 11 10:45 vgtree_75_2_pfs.root
-rw-rw-r-- 1 emily emily   8 Jun 11 10:46 vgtree_3_2_pls.root
-rw-rw-r-- 1 emily emily  28 Jun 11 10:46 vgtree_2_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

As you can see, some files repeat, i.e. the pattern
vgtree_5_* occurs more than once. The following files repeat:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root

Similarly, the files matching vgtree_75_* repeat.

What I want is a separate text file listing only one file name per pattern: the file itself when it does not repeat and, when it does repeat, the file with the maximum size. For the listing above, those are the following files:

Code:
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

Greetings,
emily
# 2  
Try

Code:
ls -l | sort -nrk5 | awk '{split($NF,A,"_");if(!X[A[1],A[2]]++){print}}'
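The one-liner works by sorting the listing by size (field 5, numeric, descending) and then keeping only the first line seen for each (prefix, file ID) pair, which is necessarily the largest. A commented sketch of the same logic on a reduced sample, printing just the surviving file names:

```shell
printf '%s\n' \
  '-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root' \
  '-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root' \
  '-rw-rw-r-- 1 emily emily  20 Jun 11 10:45 vgtree_75_1_pfs.root' \
  '-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root' |
sort -nrk5 |                     # largest files first (size is field 5)
awk '{
  split($NF, A, "_")             # A[1]="vgtree", A[2]=file ID (5, 75, ...)
  if (!X[A[1], A[2]]++)          # first (largest) hit per ID wins
    print $NF
}'
# prints: vgtree_5_3_pfs.root
#         vgtree_75_3_pfs.root
```

The `print $NF` variant shown here emits only the names; drop it (as in the original one-liner) to keep the full `ls -l` lines.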

This User Gave Thanks to pamu For This Post:
# 3  
Shell script solution

I tried this shell scripting solution.

Code:
DIR=/your/directory
PATTERNS=/tmp/available_patterns.txt
UNIQUE_PATTERNS=/tmp/unique_patterns.txt
RESULT=/tmp/selected_files.dat

cd "$DIR" || exit 1

#Get all available file patterns, i.e. prefixes before the second "_"
for FILENAME in *
do
 PATTERN=$(echo "$FILENAME" | awk -F"_" '{print $1"_"$2}')
 echo "$PATTERN" >> "$PATTERNS"
done

#Get the unique patterns. Either sort -u or sort | uniq would work
sort -u "$PATTERNS" > "$UNIQUE_PATTERNS"

#For each unique pattern count the occurrences; keep a lone file as-is,
#otherwise keep only the largest file of the group
for PATTERN in $(cat "$UNIQUE_PATTERNS")
do
 OCCURS=$(ls "${PATTERN}"_* | wc -l)
 if [ "$OCCURS" -eq 1 ]
 then
	ls "${PATTERN}"_* >> "$RESULT"
 else
	ls -l "${PATTERN}"_* | sort -nrk5,5 | head -1 | awk '{print $NF}' >> "$RESULT"
 fi
done

Note: This is untested and loops through the directory twice
This User Gave Thanks to krishmaths For This Post:
# 4  
I would go with Pamu's solution.

Simple and effective.
This User Gave Thanks to PikK45 For This Post:
# 5  
Thanks Pamu, it worked like a charm!

---------- Post updated at 04:20 AM ---------- Previous update was at 02:14 AM ----------

Dear Pamu,
I just realized I am still accepting some extra files (in practice, I deal with hundreds of such files).
And this time it is a little tricky as well.
Given the following files:
Code:
-rw-rw-r-- 1 emily emily 119 Jun 11 10:45 vgtree_5_1_pfs.root
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  20 Jun 11 10:45 vgtree_75_1_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  41 Jun 11 10:45 vgtree_75_2_pfs.root
-rw-rw-r-- 1 emily emily   8 Jun 11 10:46 vgtree_3_2_pls.root
-rw-rw-r-- 1 emily emily  28 Jun 11 10:46 vgtree_2_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

I selected the files of interest to me, which your command line does:
Code:
-rw-rw-r-- 1 emily emily 145 Jun 11 10:46 vgtree_5_3_pfs.root
-rw-rw-r-- 1 emily emily  73 Jun 11 10:45 vgtree_75_3_pfs.root
-rw-rw-r-- 1 emily emily  75 Jun 11 10:46 vgtree_3_3_pfs.root

Now, again, I have to cross-check each file ID (i.e. 5, 75 and 3) against another available text file. That text file looks like this:
Code:
crab:  ExitCodes Summary
 >>>>>>>>> 396 Jobs with Wrapper Exit Code : 0 
	 List of jobs: 1-8,13-66,68,70-81,86-95,97-126,128-166,168-185,187-195,197,200-246,248-261,266-305,307-309,311-326,328-336,340-349,351-352,354-367,369-395,397-411,413-429 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 1 Jobs with Wrapper Exit Code : 8021 
	 List of jobs: 127 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 1 Jobs with Wrapper Exit Code : 50115 
	 List of jobs: 96 
	See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:   429 Total Jobs 
 >>>>>>>>> 399 Jobs Retrieved 
	List of jobs Retrieved: 1-8,13-66,68,70-81,86-166,168-185,187-195,197,200-246,248-261,266-309,311-326,328-336,340-349,351-352,354-367,369-395,397-411,413-429 
 >>>>>>>>> 1 Jobs Cancelled by user 
	List of jobs Cancelled by user: 327 
 >>>>>>>>> 29 Jobs Cancelled 
	List of jobs Cancelled: 9-12,67,69,75, 82-85,167,186,196,198-199,247,262-265,310,337-339,350,353,368,396,412 

Now, I need to compare each file ID against the job numbers in the "List of jobs Cancelled" lines above; if an ID matches one of those numbers, I should discard that file.
For example, 75 lies in the cancelled jobs, so I should discard vgtree_75_3_pfs.root after this comparison.
I hope that is clear.
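One way the cross-check could be sketched (untested against real crab output; it assumes the summary is saved to a text file, that the job numbers to discard sit on the lines containing "List of jobs Cancelled", and that the file ID is the second "_"-separated field of the name):

```shell
# Hypothetical sample of the crab summary, abridged from the post above
cat > /tmp/crab_summary.txt <<'EOF'
 >>>>>>>>> 1 Jobs Cancelled by user
 List of jobs Cancelled by user: 327
 >>>>>>>>> 29 Jobs Cancelled
 List of jobs Cancelled: 9-12,67,69,75, 82-85,167,186,196,198-199,247
EOF

# Expand "a-b" ranges and single numbers into one cancelled job ID per line
cancelled=$(awk -F': *' '/List of jobs Cancelled/ {print $2}' /tmp/crab_summary.txt |
  tr -d ' ' | tr ',' '\n' |
  awk -F'-' 'NF==2 {for (i=$1; i<=$2; i++) print i; next} NF {print}')

# Keep only the selected files whose ID is not in the cancelled list
for f in vgtree_5_3_pfs.root vgtree_75_3_pfs.root vgtree_3_3_pfs.root
do
  id=${f#vgtree_}; id=${id%%_*}     # e.g. vgtree_75_3_pfs.root -> 75
  printf '%s\n' "$cancelled" | grep -qx "$id" || echo "$f"
done
```

On this sample the loop prints vgtree_5_3_pfs.root and vgtree_3_3_pfs.root, dropping the file whose ID 75 appears among the cancelled jobs. Note that `grep -x` matches whole lines, so ID 3 is not confused with job 327.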

Thanks
emily
