Sorting rules on a text section

12-04-2006


Hi all

My text file looks like this:

Code:

start doc
...                    (certain number of records)
REC3|Emma|info|
REC3|Lukas|info|
REC3|Arthur|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Tamira|info|
REC3|Henry|info|
...     
end doc

As you can see, the text file contains several document sections, and some of the records in them are tagged with "REC3", which is a unique record code. Within a document the REC3 lines are always consecutive, but obviously there are as many REC3 sections as there are documents.

What I want to do is to sort these REC3 lines per document using the quickest UNIX command possible. The expected output file is:

Code:

start doc
...                    (certain number of records)
REC3|Arthur|info|
REC3|Emma|info|
REC3|Lukas|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Henry|info|
REC3|Tamira|info|
...     
end doc

I don't want to touch any of the other record types. Do any of you have a suggestion about how to do this? I would be very grateful. I had in mind that perl had good potential to make it simple, but I'm no expert. I tried a couple of things with ksh/awk, but they tended to become complex and inflexible.

Last edited by Indalecio; 12-04-2006 at 11:13 AM..

Indalecio


12-04-2006


You could do this with a few intermediate steps: create a temporary file for each document section, sort the temporary files that need it, and then combine them again in the original order. Hope this helps.

--Manish

Manish Jha


12-04-2006


input

Code:

start doc
...                    (certain number of records)
REC3|Emma|info|
REC3|Lukas|info|
REC3|Arthur|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Tamira|info|
REC3|Henry|info|
...     
end doc

output:

Code:

 
start doc
...                    (certain number of records)
REC3|Arthur|info|
REC3|Emma|info|
REC3|Lukas|info|
...                     (certain number of records)
end doc
start doc
...                     (certain number of records)
REC3|Henry|info|
REC3|Tamira|info|
...
end doc

code :

Code:

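#!/bin/ksh
# Sort each run of consecutive REC3 lines by name (field 2);
# every other line is passed through unchanged.
found=0
set -A arr ""
while IFS= read -r line
do
	if [[ $line = REC3\|* ]] ; then
		# Buffer the REC3 line until the run ends.
		arr[$found]="$line"
		found=$((found + 1))
		continue
	fi
	if [[ $found -gt 0 ]] ; then
		# A non-REC3 line ends the run: print the buffered lines, sorted.
		i=0
		while [[ $i -lt $found ]]
		do
			printf "%s\n" "${arr[$i]}"
			i=$((i + 1))
		done | sort -t'|' -k2,2
		found=0
		set -A arr ""
	fi
	printf "%s\n" "$line"
done < filename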
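
To see what the sort key in the script above does in isolation, here is a minimal sketch, assuming a POSIX sort. The helper name sort_by_name is made up for illustration, and LC_ALL=C is an assumption that forces plain byte ordering:

```shell
# Hypothetical helper: sort lines from stdin by the second '|'-delimited field.
sort_by_name() {
    # -t'|' sets the field separator; -k2,2 starts and ends the key at field 2.
    LC_ALL=C sort -t'|' -k2,2
}

# Prints the Arthur line first, then Emma, then Lukas.
printf '%s\n' 'REC3|Emma|info|' 'REC3|Lukas|info|' 'REC3|Arthur|info|' | sort_by_name
```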

jim mcnamara


12-05-2006


Thank you for your response

The thing is, I had already tried simple file manipulation like the following:

Code:

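#!/usr/bin/ksh

new_block="N"
rm -f *.tmp

while IFS= read -r line; do
   record=$(printf '%s\n' "$line" | cut -d '|' -f 1)
   if [[ "$record" = "REC3" ]]; then
      if [[ "$new_block" = "Y" ]]; then
         printf '%s\n' "$line" >> REC3.tmp
      else
         # First REC3 line of a new block: start a fresh buffer file.
         new_block="Y"
         printf '%s\n' "$line" > REC3.tmp
      fi
   else
      if [[ "$new_block" = "Y" ]]; then
         # The REC3 block just ended: write the pending lines, then the sorted block.
         new_block="N"
         sort REC3.tmp > finalREC3.tmp
         cat rest.tmp finalREC3.tmp >> output.tmp
         : > rest.tmp   # empty rest.tmp so its lines are not written again
      fi
      printf '%s\n' "$line" >> rest.tmp
   fi
done < "$1"
cat rest.tmp >> output.tmp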

But this is incredibly slow! If I run it on a 20 MB file, I have to wait a few minutes... I just wonder whether a more powerful command would do the same job a bit quicker (I mentioned perl, but it could be anything).
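
Part of the run time in a loop like this probably comes from starting a cut process once per input line. As a minimal sketch, assuming a POSIX-style shell, parameter expansion can extract the record code without any subprocess. The function name record_code is hypothetical:

```shell
# Hypothetical helper: set $record to the text before the first '|'.
record_code() {
    # %% removes the longest suffix matching '|*', i.e. everything
    # from the first '|' onward; a line without '|' is left unchanged.
    record=${1%%\|*}
}

record_code 'REC3|Emma|info|'
printf '%s\n' "$record"   # prints REC3
```

Inside the read loop this would replace the echo/cut pipeline; the rest of the logic stays the same.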

Indalecio


12-05-2006


For your info, the script above took 40 minutes to do the job, which is not acceptable.

I came up with another solution, check this out:

Code:

gawk 'BEGIN { FS = "|"; blockREC3 = "N" }
{
  if ($1 == "REC3") {
    # Start of a block: clear lines left over from the previous block.
    if (blockREC3 == "N") { blockREC3 = "Y"; j = 0; delete ind }
    ind[++j] = $0
  } else {
    if (blockREC3 == "Y") {
      blockREC3 = "N"
      n = asort(ind)
      for (i = 1; i <= n; i++) print ind[i]
    }
    print $0
  }
}' "$1" > output.tmp

I was looking for something that awk could do; unfortunately the asort function is only available in gawk. So I haven't tested it yet, since I need to install gawk first, but I think this may save a lot of time.

EDIT: I tested it, and it took 2 seconds.
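
Where gawk cannot be installed, a similar effect may be possible without asort by piping each run of REC3 lines to the external sort command. This is only a sketch under assumptions: the function name sort_rec3_blocks is made up, LC_ALL=C is chosen to get plain byte ordering, and the whole line is used as the sort key, as in the gawk version.

```shell
# Hypothetical helper: sort each run of REC3 lines in the file named by $1.
sort_rec3_blocks() {
    awk -F'|' -v cmd='LC_ALL=C sort' '
        $1 == "REC3" {
            # Write out pending lines first so sort output lands after them.
            if (!inrun) { fflush(); inrun = 1 }
            print | cmd
            next
        }
        # Run ended: closing the pipe makes sort print the run before we go on.
        inrun { close(cmd); inrun = 0 }
        { print }
        # Flush a run that reaches the end of the file.
        END { if (inrun) close(cmd) }
    ' "$1"
}
```

It would be called as sort_rec3_blocks input.txt > output.tmp, where input.txt is a placeholder name. Because it starts one sort process per REC3 block, it is likely to be slower than the asort version on files with many documents.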

Last edited by Indalecio; 12-05-2006 at 05:40 AM..

Indalecio
