The UNIX and Linux Forums  
#1  12-04-2006  Indalecio

Sorting rules on a text section

Hi all

My text file looks like this:

Code:
start doc
...                    (certain number of records)
REC3|Emma|info|
REC3|Lukas|info|
REC3|Arthur|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Tamira|info|
REC3|Henry|info|
...     
end doc
As you can see, the text file contains several document sections, and some of the records in them are tagged with "REC3", which is a unique record code. REC3 lines are always consecutive, so there are as many REC3 blocks as there are documents.

What I want to do is to sort these REC3 lines per document using the quickest UNIX command possible. The expected output file is:

Code:
start doc
...                    (certain number of records)
REC3|Arthur|info|
REC3|Emma|info|
REC3|Lukas|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Henry|info|
REC3|Tamira|info|
...     
end doc
I don't want to touch any of the other record types. Do any of you have a suggestion for how to do this? I would be very grateful. I had in mind that perl could make this simple, but I'm no expert. I tried a couple of things with ksh/awk, but they tended to become complex and inflexible.

Last edited by Indalecio; 12-04-2006 at 11:13 AM..
#2  12-04-2006  Manish Jha

You could do this with a few intermediate steps: create a temporary file for each document section, sort the temporary files that need sorting, and then combine them again in the original order. Hope this helps.

--Manish
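
For illustration, the temporary-file idea could look roughly like the following. This is only a sketch under assumptions that are not in the post: the function name sort_rec3_via_tempfiles, the chunk file naming and the use of mktemp are all made up for the example, and the input is split into chunks wherever the record type changes rather than per document.

```shell
# sort_rec3_via_tempfiles FILE  (hypothetical helper, not from the thread)
# Split FILE into numbered chunk files, sort only the REC3 chunks,
# and print all chunks back in their original order.
sort_rec3_via_tempfiles() {
    workdir=$(mktemp -d) || return 1

    # Start a new chunk file every time the line type changes
    # between "REC3" and "anything else".
    awk -F'|' -v dir="$workdir" '
        { kind = ($1 == "REC3") ? "rec3" : "rest" }
        kind != prev {
            if (out != "") close(out)
            n++
            out = sprintf("%s/%06d.%s", dir, n, kind)
            prev = kind
        }
        { print > out }
    ' "$1"

    # Zero-padded names make the glob expand in creation order.
    for chunk in "$workdir"/*; do
        [ -f "$chunk" ] || continue        # empty input: nothing to print
        case $chunk in
            *.rec3) LC_ALL=C sort "$chunk" ;;
            *)      cat "$chunk" ;;
        esac
    done

    rm -rf "$workdir"
}
```

It would be called as `sort_rec3_via_tempfiles input.txt > output.txt`. It creates two small files per document, so it is probably not the quickest option for large inputs.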
#3  12-04-2006  jim mcnamara (Forum Staff)

input:
Code:
start doc
...                    (certain number of records)
REC3|Emma|info|
REC3|Lukas|info|
REC3|Arthur|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Tamira|info|
REC3|Henry|info|
...     
end doc
output:
Code:
 
start doc
...                    (certain number of records)
REC3|Arthur|info|
REC3|Emma|info|
REC3|Lukas|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Henry|info|
REC3|Tamira|info|
...
end doc
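code:
Code:
#!/bin/ksh
# Sort each run of consecutive REC3 lines by the name field;
# every other line is passed through unchanged.
found=0
set -A arr ""

# Print the buffered REC3 lines in sorted order, then reset the buffer.
flush_block() {
	i=0
	while [[ $i -lt $found ]]
	do
		printf "%s\n" "${arr[i]}"
		i=$((i + 1))
	done | sort -k2.1,2.30 -t'|'
	found=0
	set -A arr ""
}

while IFS= read -r line
do
	if [[ $line = REC3\|* ]] ; then
		arr[$found]="$line"
		found=$((found + 1))
		continue
	fi
	if [[ $found -gt 0 ]] ; then
		flush_block
	fi
	printf "%s\n" "$line"
done < filename
# Flush a REC3 block that runs to the end of the file.
[[ $found -gt 0 ]] && flush_block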
#4  12-05-2006  Indalecio

Thank you for your response

The thing is, I had already tried simple file manipulation like the following:
Code:
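#!/usr/bin/ksh

new_block="N"
rm -f REC3.tmp finalREC3.tmp rest.tmp output.tmp
: > rest.tmp

while IFS= read -r line; do
   # One cut process is started for every input line.
   record=`printf '%s\n' "$line" | cut -d '|' -f 1`
   if [[ "$record" = "REC3" ]]; then
      if [[ "$new_block" = "Y" ]]; then
         printf '%s\n' "$line" >> REC3.tmp
      else
         new_block="Y"
         printf '%s\n' "$line" > REC3.tmp
      fi
   else
      if [[ "$new_block" = "Y" ]]; then
         new_block="N"
         sort REC3.tmp > finalREC3.tmp
         cat rest.tmp finalREC3.tmp >> output.tmp
         : > rest.tmp   # empty it, or earlier lines are copied again
      fi
      printf '%s\n' "$line" >> rest.tmp
   fi
done < "$1"
cat rest.tmp >> output.tmp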
But this is incredibly slow! If I run this on a 20 MB file I have to wait several minutes... I just wonder whether a more powerful command would do the same job a bit quicker (I mentioned perl, but it could be anything).
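
A likely contributor to the slowness is that the loop starts a command substitution and a cut process for every single input line. As a rough sketch (the helper name first_field is made up for this example), the first field can instead be taken with plain parameter expansion, which ksh, bash and POSIX sh all support, so no extra process is needed:

```shell
# first_field LINE: print everything before the first "|".
# Hypothetical helper, shown only to illustrate the idea.
first_field() {
    printf '%s\n' "${1%%|*}"    # drop the longest suffix that starts with "|"
}

# Inside the read loop the same expansion would replace the cut pipeline:
#   record=${line%%|*}
```

This does not remove the per-line file appends, so a single awk pass over the file will probably still be much faster.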
#5  12-05-2006  Indalecio

For your info, the above script took 40 minutes to do the job, which is not acceptable.

I came to another solution, check this out:
Code:
gawk 'BEGIN { FS = "|" }
# Print the buffered REC3 block in sorted order, then empty the buffer.
function print_sorted(   i, n) {
    n = asort(ind)                       # asort() is a gawk extension
    for (i = 1; i <= n; i++) print ind[i]
    delete ind; j = 0                    # otherwise stale lines leak into the next block
}
$1 == "REC3" { ind[++j] = $0; next }     # collect consecutive REC3 lines
j > 0        { print_sorted() }          # first non-REC3 line ends the block
             { print }
END          { if (j > 0) print_sorted() }
' "$1" > output.tmp
I was trying to find something that awk could do. Unfortunately, the asort function is only available in gawk, so I haven't tested this yet since I need to install gawk first. But I think it may save a lot of time.

EDIT: I tested it, it took 2 sec
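
If gawk cannot be installed, a similar result may be possible with any awk by handing each REC3 block to the external sort command through a pipe and closing the pipe when the block ends. A sketch, assuming an awk that provides fflush() (gawk, mawk and the BWK awk do); the function name sort_rec3 is made up here:

```shell
# sort_rec3 FILE  (hypothetical helper, not from the thread)
# Sort each run of consecutive REC3 lines; print other lines unchanged.
sort_rec3() {
    LC_ALL=C awk -F'|' '
        BEGIN { cmd = "sort" }
        $1 == "REC3" {
            # Flush our own pending output first so that it appears
            # before whatever sort writes when the pipe is closed.
            if (!active) { fflush(); active = 1 }
            print | cmd
            next
        }
        active { close(cmd); active = 0 }   # block ended: sort prints it now
               { print }
        END    { if (active) close(cmd) }   # block that runs to end of file
    ' "$1"
}
```

Used as `sort_rec3 input.txt > output.tmp`. It starts one sort process per REC3 block, so on files with many documents it will likely be slower than the asort version.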

Last edited by Indalecio; 12-05-2006 at 05:40 AM..