Figure out complex sort


 
# 1  
Old 02-04-2009
Figure out complex sort

This is an extension to a question that was posted earlier on this forum.
I have a task in which I need to pick up a set of files from a directory
depending on the following criteria:
Every month 6 files are expected to arrive at /test.
The file names carry a date and time stamp, and the latest file set for the month needs to be used.
Suppose this is the set of files present in the test directory:
1. a_20080905_214058_200808.txt
2. b_20080905_214058_200808.txt
3. c_20080905_214058_200808.txt
4. d_20080905_214058_200808.txt
5. e_20080905_214058_200808.txt
6. f_20080905_214058_200808.txt
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt
13. d_20080906_114058_200809.txt
14. e_20080906_114058_200809.txt
15. f_20080906_114058_200809.txt
Then these files should be picked up:
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt
Criteria: 200809 is the latest month, 20080906 is the latest date, and 214058 is the latest time.
The criteria can be achieved using the following code:

tr -s '_' ' ' < $LINE2 | \
sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
tr '_' ' ' | sort > resultsfile
sed -e "s/ [ ]*/_/g" resultsfile > outfile # replace spaces with '_', because that is what the file names use
sed -e "s/^_//g" outfile > outfile2 # remove '_' at the beginning of some lines
sed -e "s/_$//g" outfile2 > outfile3 # remove '_' at the end of some lines
grep -v "^$" outfile3 > newfilename
The problem is understanding how to adapt the code above, because the real file names contain many '_' characters, e.g. the file may be
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
How can I modify the above statements to handle this? Also, I am creating many temporary files to achieve this task and would like to avoid that.
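For readers following the pipeline above, the awk stage is the part that selects one file per prefix. Because the input is already sorted in ascending order, `arr[$1]=$0` keeps overwriting the entry for each prefix until only the latest line is left. A minimal sketch with made-up input (the function name `latest_per_key` is hypothetical, and the trailing `sort` is only there because awk's `for (i in arr)` order is unspecified):

```shell
# Keep only the last line seen for each value of field 1.
# The input must already be sorted so that the wanted line comes last per key.
latest_per_key() {
    awk '{ arr[$1] = $0 } END { for (i in arr) print arr[i] }' | sort
}

# Two candidate lines per prefix; the later time should survive.
printf '%s\n' \
    'd 20080906 114058 200809.txt' \
    'd 20080906 214058 200809.txt' \
    'e 20080906 114058 200809.txt' \
    'e 20080906 214058 200809.txt' |
    latest_per_key
```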
# 2  
Old 02-04-2009
Quote:
Originally Posted by w020637
This is an extension to a question that was posted earlier on this forum.
I have a task in which I need to pick up a set of files from a directory
depending on the following criteria:
Every month 6 files are expected to arrive at /test.
The file names carry a date and time stamp, and the latest file set for the month needs to be used.
Suppose this is the set of files present in the test directory:
1. a_20080905_214058_200808.txt
2. b_20080905_214058_200808.txt
3. c_20080905_214058_200808.txt
4. d_20080905_214058_200808.txt
5. e_20080905_214058_200808.txt
6. f_20080905_214058_200808.txt
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt
13. d_20080906_114058_200809.txt
14. e_20080906_114058_200809.txt
15. f_20080906_114058_200809.txt
Then these files should be picked up:
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt

When supplying test data, please use a representative sample. Given the above filenames, the solution is easy; with the filenames you buried at the bottom of the post, it is not so trivial.
Quote:
Criteria: 200809 is the latest month, 20080906 is the latest date, and 214058 is the latest time.
The criteria can be achieved using the following code:

Please put code inside [code] tags.
Quote:
Code:
tr -s '_' ' ' <  $LINE2 | \


There is no need for a backslash after the pipe.

And what does $LINE2 contain?
Quote:
Code:
  sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
  awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
  tr '_' ' ' | sort > resultsfile
sed -e "s/ [ ]*/_/g" resultsfile > outfile # replace spaces with '_', because that is what the file names use
sed -e "s/^_//g" outfile > outfile2 # remove '_' at the beginning of some lines
sed -e "s/_$//g" outfile2 > outfile3 # remove '_' at the end of some lines
grep -v "^$" outfile3 > newfilename

The problem is understanding how to adapt the code above, because the real file names contain many '_' characters, e.g. the file may be
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
How can I modify the above statements to handle this? Also, I am creating many temporary files to achieve this task and would like to avoid that.

Code:
printf "%s\n" *.txt |
   sed 's/_\([a-zA-Z]\)/~\1/g' |
     sort -t_ -k2,2nr -k3,3nr |
       head -n6 |
         sed 's/~/_/g'

Or why not simply:

Code:
ls -t *.txt | head -n6

# 3  
Old 02-04-2009
The ls approach won't work, as the timestamps in the file names are not consistent with the actual file modification times.

Taking the clue, I am replacing the '_' with an unlikely string sequence and replacing them back at the end.
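The placeholder idea can be sketched as a single pipeline with no temporary files. This is only an illustration under stated assumptions: names end in `_YYYYMMDD_HHMMSS_YYYYMM.txt`, every underscore inside the prefix is followed by a letter, and `~` never occurs in a name. The function name `newest_per_prefix` is hypothetical.

```shell
# Print the newest name for each prefix, ordering by month, then date, then time.
newest_per_prefix() {
    sed 's/_\([a-zA-Z]\)/~\1/g' |          # protect underscores inside the prefix
        sort -t_ -k4,4n -k2,2n -k3,3n |    # oldest first: month, date, time
        awk -F_ '{ last[$1] = $0 } END { for (p in last) print last[p] }' |
        sed 's/~/_/g' |                    # restore the protected underscores
        sort
}

printf '%s\n' \
    'pra_hrt_cq_20080905_214058_200808.txt' \
    'pra_hrt_cq_20080906_114058_200809.txt' \
    'pra_hrt_cq_20080906_214058_200809.txt' \
    'rrt_cd_20080906_214058_200809.txt' \
    'rrt_cd_20080906_114058_200809.txt' |
    newest_per_prefix
```

In a real script the input would come from something like `printf '%s\n' *.txt` run in the directory holding the files.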
# 4  
Old 02-04-2009
I still don't understand why you can't do this:

ls -r1 *_????????_??????_??????.txt|sort -t2 +1 -r|head -6

What does it give you? Why make it more difficult than what it is if you have the pattern already?!?
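A note on the `sort` call above: `-t2` uses the character `2` as the field separator and `+1` is the old zero-based way of saying "start the key at the second field", so the comparison runs over everything after the first `2` in the name. Some newer `sort` implementations may not accept the `+1` form; assuming that is the case, `-k2` expresses the same key. A minimal sketch (the function name `newest_first` is hypothetical, and the trick assumes the prefix contains no `2` and the year starts with `2`):

```shell
# Newest name first: split on '2' and compare from the second piece to the
# end of the line, i.e. the date, time and month that follow the first '2'.
newest_first() {
    LC_ALL=C sort -t2 -k2 -r
}

printf '%s\n' \
    'a_20080905_214058_200808.txt' \
    'a_20080906_114058_200809.txt' \
    'a_20080906_214058_200809.txt' |
    newest_first | head -n 1
```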
# 5  
Old 02-04-2009
Actually there is not one pattern but six different patterns
# 6  
Old 02-04-2009
Quote:
Originally Posted by w020637
Actually there is not one pattern but six different patterns
given your quote above:
Quote:
Originally Posted by w020637
The problem is understanding how to adapt the code above, because the real file names contain many '_' characters, e.g. the file may be
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
there's only ONE pattern:
'_n_n_n.txt' preceded by ANYTHING. Isn't that true?
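To illustrate the "one pattern" point, the fixed tail can be anchored at the end of the name so that the prefix may contain any number of underscores. A sketch, assuming the tail is always 8, 6 and 6 digits followed by `.txt` (the function name `split_name` is hypothetical):

```shell
# Split a name into "prefix date time month", whatever the prefix contains.
# Names that do not end in _YYYYMMDD_HHMMSS_YYYYMM.txt are silently dropped.
split_name() {
    sed -n 's/^\(.*\)_\([0-9]\{8\}\)_\([0-9]\{6\}\)_\([0-9]\{6\}\)\.txt$/\1 \2 \3 \4/p'
}

echo 'pra_hrt_cq_20080906_214058_200809.txt' | split_name
```

Once the stamps are separate fields, they can be sorted with ordinary `-k` keys without touching the prefix.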
# 7  
Old 02-05-2009
Hi

Thank you all for the suggestions.

I have made code changes as suggested and the results are awesome.

The file FILENAME=/test/files.txt lists all the patterns I want to keep.

The following code1, of epic proportions, has been replaced by code2. Both work fine, but the shorter the better.

code1
*****************
#!/bin/ksh
#Directories
pathdir=/test
test_dir=/test
#Variables
pym=200812
ym=200901
#Imp Files
FILENAME=/test/files.txt
#Temp Files
LINE1=/test/files1$ym.txt
resultsfile=/test/files2$ym.txt
LINE3=/test/files3$ym.txt
usage(){
print process_test.sh - script to process test data
print "Usage: process_test.sh"
}
if test -f "$LINE1"
then
rm $LINE1
fi
cat $FILENAME | while read LINE
do
cd $test_dir
find . -name "$LINE$ym*$pym.txt" -print| while read obj
do
# echo "Inside DO loop"
echo $obj >> $LINE1
done
done
sed -e "s/^\.\///g" $LINE1| sed -e "s/_\([a-zA-Z]\)/kkk\1/g" > $LINE3
tr -s '_' ' ' < $LINE3| \
sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
tr '_' ' ' | sort > $resultsfile
sed -e "s/ [ ]*/_/g" $resultsfile |sed -e "s/^_//g"|sed -e "s/_$//g"| \
grep -v "^$"|sed -e "s/kkk/_/g"| while read LINE4
do
echo "Process file : $LINE4"
done
rm $LINE1 $LINE3 $resultsfile
# end script

*****************

was replaced by

code2
**************

#!/bin/ksh
##########################################
#Directories
eagle_dir=/test
#Imp Files
FILENAME=/test/files.txt
usage(){
print process_test.sh - script to process test data
print "Usage: process_test.sh"
}
cat $FILENAME | while read LINE
do
ls -r1 "$eagle_dir"/"$LINE"_????????_??????_??????.txt|sort -t2 +1 -r|head -1|while read obj
do
echo "Inside DO $LINE loop"
echo $obj
done
done
# end script
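
The per-prefix lookup in code2 could also be done without `ls` and `sort`. This is a sketch rather than a drop-in replacement: it assumes the shell returns glob matches in sorted order and that the stamps are fixed width, so the last match for a prefix carries the largest date and time. The function name `newest_for_prefix` is hypothetical.

```shell
# Print the newest file for one prefix in a directory.
# Like code2, this orders by date and then time within a single prefix.
newest_for_prefix() {
    dir=$1
    prefix=$2
    set -- "$dir/$prefix"_????????_??????_??????.txt
    [ -e "$1" ] || return 1            # the glob matched nothing
    for f in "$@"; do newest=$f; done  # matches are sorted; keep the last one
    printf '%s\n' "$newest"
}
```

It would be called once per line of the pattern file, for example `newest_for_prefix /test "$LINE"` inside the `while read LINE` loop.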