Sorting and saving values based on unique entries

01-07-2014

Hi all,

I want to split a file into separate output files, one for each unique value in a specific column (column 4). My sample file looks like the following:

Code:

input file: 200006-07file.txt
145 35 10 3
147 35 12 4
146 36 11 3
145 34 12 5
143 31 15 4
146 30 14 5

desired output files:
200006-07_003.txt (this contains:)
145 35 10 3
146 36 11 3

200006-07_004.txt
147 35 12 4
143 31 15 4

200006-07_004.txt
145 34 12 5
146 30 14 5

Thank you and happy new year!

ida1215

01-07-2014

What have you tried so far?

PS Am I correct in assuming that the second output file named 200006-07_004.txt is really supposed to be named 200006-07_005.txt?

Last edited by Don Cragun; 01-07-2014 at 11:52 PM.. Reason: Add PS.

Don Cragun

01-08-2014

Hi,
Yes, I typed it incorrectly. It should be 200006-07_005.txt. I have tried the following, but I saved the output files manually, and it's very inefficient.

Code:

sort -n -k4 200006-07file.txt > file.tmp

awk '$4==3 {print}' file.tmp > 200006-07_003.txt
awk '$4==4 {print}' file.tmp > 200006-07_004.txt
awk '$4==5 {print}' file.tmp > 200006-07_005.txt

Thanks!

ida1215

01-08-2014

You could try something like:

Code:

awk '
FNR == 1 {
        # Get filename base and clear list of output files for this input file.
        if((i = index(FILENAME, "file.txt")) == 0) {
                printf("Filename (\"%s\") does not end in \"file.txt\"\n",
                        FILENAME)
                exit 1
        }
        base = substr(FILENAME, 1, i - 1)
        for (i in outlist) delete outlist[i]
}
{       # Generate output filename for this line:
        of = sprintf("%s_%03d.txt", base, $4)
        if(lf != of) {
                # Close previous output file, if there was one.
                if(lf != "") close(lf)
                # If this is the 1st time for a new output file, add it to the
                # list of output files and remove any existing file with that
                # name.
                if(!($4 in outlist)) {
                        # Remove any existing file with this name:
                        system("rm -f " of)
                        # Save this index in outlist:
                        outlist[$4]
                }
                lf = of
        }
        # Save the current line in the current output file:
        print >> lf
}' 200006-07file.txt

This is probably more complex than is needed. It works when given multiple input files, and it removes any output files that already exist when the script starts.

If you want to try this script on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.
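
For the single-file case with only a handful of distinct values in column 4, a shorter variant may be enough. The following is a minimal sketch, not the script above: split_by_col4 is a hypothetical helper name, and it assumes that input names end in "file.txt" and that awk can keep one output file open per distinct value at the same time.

```shell
#!/bin/sh
# Sketch: split each input file into one output file per distinct value
# in column 4. split_by_col4 is a hypothetical helper name.
split_by_col4() {
    awk '
    FNR == 1 {
        # Starting a new input file: close its predecessor'"'"'s outputs
        # and clear the list (split("", arr) is a portable array reset).
        for (f in seen) close(f)
        split("", seen)
        # Derive the base name by stripping a trailing "file.txt".
        # If the name does not end that way, the full name is used.
        base = FILENAME
        sub(/file\.txt$/, "", base)
    }
    {
        # Zero-pad column 4 to three digits, e.g. 3 -> "003".
        out = sprintf("%s_%03d.txt", base, $4)
        # ">" truncates the file on its first write in this run and then
        # keeps appending while it stays open, so no "rm -f" is needed.
        print > out
        seen[out]
    }' "$@"
}
```

Called as split_by_col4 200006-07file.txt, this should produce 200006-07_003.txt, 200006-07_004.txt and 200006-07_005.txt for the sample input. The trade-off against closing each file whenever the output name changes is that every output file stays open until the next input file begins, so a very large number of distinct values could exceed the open-file limit of some awk implementations.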

Don Cragun

01-08-2014

Is there any way to get the return status of a script 2-3 hours after it has completed, to check whether it was aborted or finished successfully?

Moderator's Comments:

When posting a question about a new topic, please start a new thread. Hijacking a thread by posting an unrelated question as a response to an existing thread creates confusion for anyone trying to help answer either question.

Last edited by Don Cragun; 01-08-2014 at 01:40 AM..

Kshitij Mishra

01-08-2014

Thank you very much for the quick reply. I will give it a try and will get back to you.

ida1215
