The UNIX and Linux Forums  

Shell Programming and Scripting
#1  02-04-2009  w020637 (Registered User)
Figure out complex sort

This is an extension of a question that was posted earlier on this forum.
I have a task in which I need to pick up a set of files from a directory
depending on the following criteria:
Every month 6 files are expected to arrive at /test.
The file names carry a date and time stamp, and the latest file set for the month needs to be used.
Suppose this is the set of files present in the test directory:
1. a_20080905_214058_200808.txt
2. b_20080905_214058_200808.txt
3. c_20080905_214058_200808.txt
4. d_20080905_214058_200808.txt
5. e_20080905_214058_200808.txt
6. f_20080905_214058_200808.txt
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt
13. d_20080906_114058_200809.txt
14. e_20080906_114058_200809.txt
15. f_20080906_114058_200809.txt
Then these files should be picked up:
7. a_20080906_214058_200809.txt
8. b_20080906_214058_200809.txt
9. c_20080906_214058_200809.txt
10. d_20080906_214058_200809.txt
11. e_20080906_214058_200809.txt
12. f_20080906_214058_200809.txt
Criteria: 200809 is the latest month, 20080906 is the latest date and 214058 is the latest time.
The selection can be achieved using the following code:

tr -s '_' ' ' < $LINE2 | \
sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
tr '_' ' ' | sort > resultsfile
sed -e "s/ [ ]*/_/g" resultsfile > outfile #to replace spaces with '_' because that's what the file names use
sed -e "s/^_//g" outfile >outfile2 #to remove '_' at the beginning of some lines
sed -e "s/_$//g" outfile2>outfile3 #to remove '_' at the end of some lines
grep -v "^$" outfile3 > newfilename
The problem is adapting the logic of the code above, because the file names can contain many '_' characters. For example, a file may be named
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
How can I modify the above statement to handle this? Also, I am creating many intermediate files to achieve this task and would like to avoid that.
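One way to handle prefixes containing any number of '_' characters, without intermediate files, is to anchor the three numeric fields at the end of the name and let the rest be the prefix. This is a minimal POSIX sh sketch, not the poster's method; it assumes names of the form PREFIX_YYYYMMDD_HHMMSS_YYYYMM.txt with no spaces, and the function name `latest_per_prefix` is hypothetical. Because all three stamps are fixed width, a numeric sort on month, date and time orders them chronologically.

```shell
# Hypothetical helper: print the newest file per prefix, reading names on stdin.
# Assumes PREFIX_YYYYMMDD_HHMMSS_YYYYMM.txt; PREFIX may contain '_' but no spaces.
latest_per_prefix() {
  # The greedy \(.*\) absorbs every '_' in the prefix, because the three
  # numeric fields are anchored at the end of the line.
  # Output fields: MONTH DATE TIME PREFIX NAME; non-matching lines are dropped.
  sed -n 's/^\(.*\)_\([0-9]\{8\}\)_\([0-9]\{6\}\)_\([0-9]\{6\}\)\.txt$/\4 \2 \3 \1 &/p' |
    # Newest first: month, then date, then time.
    sort -k1,1nr -k2,2nr -k3,3nr |
    # Print only the first (newest) name seen for each prefix.
    awk '!seen[$4]++ { print $5 }' |
    # Sort the result by name for predictable output.
    sort
}
```

Usage, under the same assumptions: `cd /test && printf '%s\n' *.txt | latest_per_prefix`. The whole selection is one pipeline, so no resultsfile/outfile temporaries are needed.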
#2  02-04-2009  cfajohnson (Forum Advisor; Shell programmer, author; Toronto, Canada)
When supplying test data, please use a representative sample. Given the above filenames, the solution is easy; with the filenames you buried at the bottom of the post, it is not so trivial.
Quote:
Criteria: 200809 is the latest month, 20080906 is the latest date and 214058 is the latest time.
The selection can be achieved using the following code:

Please put code inside [code] tags.
Quote:
Code:
tr -s '_' ' ' <  $LINE2 | \

There is no need for a backslash after the pipe.

And what does $LINE2 contain?
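To illustrate the remark about backslashes: a line that ends in '|' cannot end a command, so the shell reads the next line as the rest of the pipeline (the same holds after && and ||). A minimal sketch; the function name `count_with_underscore` is hypothetical.

```shell
# Hypothetical example: count the arguments that contain '_'.
# No trailing backslashes are needed: each line ends with '|', so the
# shell keeps reading the pipeline on the following line.
count_with_underscore() {
  printf '%s\n' "$@" |
    grep '_' |
    wc -l
}
```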
Quote:

Code:
  sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
  awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
  tr '_' ' ' | sort > resultsfile
sed -e "s/ [ ]*/_/g" resultsfile  > outfile #to replace spaces with '_' because that's what the file names use
sed -e "s/^_//g" outfile >outfile2 #to remove '_' at the beginning of some lines
sed -e "s/_$//g" outfile2>outfile3 #to remove '_' at the end of some lines
grep -v "^$" outfile3 > newfilename

The problem is adapting the logic of the code above, because the file names can contain many '_' characters. For example, a file may be named
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
How can I modify the above statement to handle this? Also, I am creating many intermediate files to achieve this task and would like to avoid that.


Code:
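printf "%s\n" *.txt |                   # one name per line
   sed 's/_\([a-zA-Z]\)/~\1/g' |         # mask '_' that belong to the prefix
     sort -t_ -k2,2nr -k3,3nr |          # newest date, then time; 'r' on each key,
                                         # since a key's own 'n' overrides a global -r
       head -n6 |                        # keep the latest six
         sed 's/~/_/g'                   # restore the '_'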

Or why not simply:


Code:
ls -t *.txt | head -n6
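The masking idea can be tried on the names from this thread. This sketch wraps the printf/sed/sort pipeline in a hypothetical function `latest_six` that reads names on stdin. Note that the r is given on each numeric key: POSIX sort lets a key's own modifiers (here n) override global options, so a global -r combined with -k2,2n would not reverse that key.

```shell
# Hypothetical wrapper around the masking pipeline; reads names on stdin.
latest_six() {
  # '_' followed by a letter belongs to the prefix: mask it as '~' so that
  # '_'-separated fields 2 and 3 are always the date and the time.
  sed 's/_\([a-zA-Z]\)/~\1/g' |
    # Newest date first, then newest time; 'r' is given on each key.
    sort -t_ -k2,2nr -k3,3nr |
    # Keep the latest set of six files.
    head -n 6 |
    # Restore the masked '_'.
    sed 's/~/_/g'
}
```

Usage: `printf '%s\n' *.txt | latest_six`. The choice of '~' as the mask assumes no real file name contains '~'.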

#3  02-04-2009  w020637 (Registered User)
The ls won't work, because the timestamps in the file names are not consistent with the actual file timestamps.

Taking the hint, I am replacing the '_' with an unlikely character sequence and replacing it back at the end.
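The mismatch between a file's modification time and the stamp in its name is easy to reproduce. In this sketch the older-named file is given the newer modification time with `touch -t`, so `ls -t` and a sort on the name disagree. It assumes `mktemp -d` is available (common, but not POSIX); the function name is hypothetical.

```shell
# Hypothetical demo: ls -t orders by modification time, not by the name stamp.
demo_mtime_vs_name() {
  dir=$(mktemp -d) || return 1
  # Name stamp 2008-09-05, but modified (e.g. re-copied) on 2009-02-01.
  touch -t 200902011200 "$dir/a_20080905_214058_200808.txt"
  # Name stamp 2008-09-06, modified on 2009-01-01.
  touch -t 200901011200 "$dir/a_20080906_214058_200809.txt"
  echo "by mtime: $(cd "$dir" && ls -t | head -n 1)"
  echo "by name: $(cd "$dir" && ls | sort -t_ -k2,2nr -k3,3nr | head -n 1)"
  rm -rf "$dir"
}
```

Running `demo_mtime_vs_name` prints the 20080905 file for "by mtime" and the 20080906 file for "by name", which is why the selection has to be driven by the name.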
#4  02-04-2009  giannicello (Registered User, Phoenix)
I still don't understand why you can't do this:

ls -r1 *_????????_??????_??????.txt | sort -t2 -k2 -r | head -6

What does it give you? Why make it more difficult than it is, if you already have the pattern?
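The `sort -t2` in that one-liner uses the digit 2 as the field separator, so the key from field 2 onward begins just after the first '2' in the name, normally the leading digit of the date stamp, whatever the prefix looks like (`-k2` is the POSIX form of the older `+1` key syntax). `cut` with the same delimiter shows what the key sees. An illustration with a hypothetical function name; the second case shows how a '2' in the prefix shifts the key.

```shell
# Hypothetical helper: show the text that 'sort -t2 -k2' compares,
# i.e. everything after the first '2' in the name.
show_t2_key() {
  printf '%s\n' "$1" | cut -d2 -f2-
}
```

For `pra_hrt_cq_20080906_214058_200809.txt` the key is `0080906_214058_200809.txt`; for `b2_20080906_214058_200809.txt` it becomes `_20080906_214058_200809.txt`, so the trick relies on prefixes (and directory names) containing no '2'.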
#5  02-04-2009  w020637 (Registered User)
Actually there is not one pattern but six different patterns
#6  02-04-2009  vgersh99 (Moderator, Boston, MA)
Quote:
Originally Posted by w020637 View Post
Actually there is not one pattern but six different patterns
Given your quote above:
Quote:
Originally Posted by w020637
The problem is adapting the logic of the code above, because the file names can contain many '_' characters. For example, a file may be named
pra_hrt_cq_20080906_214058_200809.txt or rrt_cd_20080906_214058_200809.txt.
There's only ONE pattern:
'_n_n_n.txt' preceded by ANYTHING. Isn't that true?
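That single pattern can be written as an extended regular expression anchored at the end of the name, so any prefix, with or without '_', is accepted. Unlike the `?` wildcards in the glob above, `[0-9]` accepts only digits. A sketch with a hypothetical function name:

```shell
# Hypothetical filter: keep names ending in _YYYYMMDD_HHMMSS_YYYYMM.txt,
# whatever precedes that suffix.
stamped_names() {
  grep -E '_[0-9]{8}_[0-9]{6}_[0-9]{6}\.txt$'
}
```

Usage: `printf '%s\n' * | stamped_names`.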
#7  02-05-2009  w020637 (Registered User)
Hi

Thank you all for the suggestions.

I have made code changes as suggested and the results are awesome.

The file FILENAME=/test/files.txt contains all the patterns I want to keep.

The following code1 of epic proportions has been replaced by code2. Both work fine, but the shorter the better.

code1
*****************
#!/bin/ksh
#Directories
pathdir=/test
test_dir=/test
#Variables
pym=200812
ym=200901
#Imp Files
FILENAME=/test/files.txt
#Temp Files
LINE1=/test/files1$ym.txt
resultsfile=/test/files2$ym.txt
LINE3=/test/files3$ym.txt
usage(){
    print process_test.sh - script to process test data
    print "Usage: process_test.sh"
}
if test -f "$LINE1"
then
    rm $LINE1
fi
cat $FILENAME | while read LINE
do
    cd $test_dir
    find . -name "$LINE$ym*$pym.txt" -print | while read obj
    do
        # echo "Inside DO loop"
        echo $obj >> $LINE1
    done
done
sed -e "s/^\.\///g" $LINE1 | sed -e "s/_\([a-zA-Z]\)/kkk\1/g" > $LINE3
tr -s '_' ' ' < $LINE3 | \
    sort -n -k 1.1,1.1 -k 2.1,2.8 -k 3.1,3.6 | \
    awk '{ arr[$1]=$0 } END {for (i in arr) { print arr[i] } }' | \
    tr '_' ' ' | sort > $resultsfile
sed -e "s/ [ ]*/_/g" $resultsfile | sed -e "s/^_//g" | sed -e "s/_$//g" | \
    grep -v "^$" | sed -e "s/kkk/_/g" | while read LINE4
do
    echo "Process file : $LINE4"
done
rm $LINE1 $LINE3 $resultsfile
# end script

*****************
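The core of code1 is the awk idiom `arr[$1]=$0`: once the input is sorted oldest first, each line overwrites the previous entry for the same key, so only the last (newest) line per key remains. A minimal sketch of just that step, with a hypothetical function name and blank-separated input; the final `sort` is there because the order of `for (k in arr)` is unspecified.

```shell
# Hypothetical helper: keep the last line seen for each first field.
# Input is assumed to be sorted oldest first, fields separated by blanks.
last_per_key() {
  awk '{ arr[$1] = $0 } END { for (k in arr) print arr[k] }' | sort
}
```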

was replaced by

code2
**************

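#!/bin/ksh
##########################################
#Directories
eagle_dir=/test
#Imp Files
FILENAME=/test/files.txt
usage(){
    print process_test.sh - script to process test data
    print "Usage: process_test.sh"
}
# For each prefix listed in $FILENAME, print the newest matching file.
# sort -t2 splits on the digit '2', so the key from field 2 onward starts just
# after the first '2' of the date stamp (assumes no '2' in the directory or prefix).
while read -r LINE
do
    ls -r1 "$eagle_dir"/"$LINE"_????????_??????_??????.txt | sort -t2 -k2 -r | head -1 | while read -r obj
    do
        echo "Inside DO $LINE loop"
        echo "$obj"
    done
done < "$FILENAME"
# end script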
All times are GMT -4.
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.