

sort and split file by 2 cols (1 col after the other)


 
# 8  
Old 07-27-2009
Quote:
Originally Posted by Ghetz
I focused on it out of frustration, I guess.
Post a data sample using [code] tags and we will try to help you.
# 9  
Old 07-27-2009
danmero,

Here is a data sample:
Code:
76576.867188 6232.454102 1.944000 0.000000 1
76576.867188 6232.454102 2.549000 0.000000 2
76576.867188 6232.454102 3.517000 0.000000 3
76576.867188 6232.454102 4.976000 0.000000 4
76576.867188 6232.454102 0.792961 55.000000 2
76576.867188 6232.454102 2.008904 55.000000 3
76576.867188 6232.454102 3.607231 55.000000 4
76576.867188 6232.454102 1.555146 65.000000 3
76576.867188 6232.454102 3.226928 65.000000 4
76576.867188 6232.454102 2.180096 100.000000 4
66576.867188 4232.454102 2.944000 0.000000 1
66576.867188 4232.454102 3.549000 0.000000 2
66576.867188 4232.454102 4.517000 0.000000 3
66576.867188 4232.454102 5.976000 0.000000 4
66576.867188 4232.454102 0.855956 55.000000 2
66576.867188 4232.454102 2.104038 55.000000 3
66576.867188 4232.454102 3.737020 55.000000 4
66576.867188 4232.454102 1.608926 65.000000 3
66576.867188 4232.454102 3.319681 65.000000 4
66576.867188 4232.454102 2.234251 100.000000 4
86576.867188 5232.454102 3.944000 0.000000 1
86576.867188 5232.454102 3.549000 0.000000 2
86576.867188 5232.454102 4.537932 0.000000 3
86576.867188 5232.454102 6.022880 0.000000 4
86576.867188 5232.454102 0.705243 55.000000 2
86576.867188 5232.454102 1.563464 55.000000 3
86576.867188 5232.454102 3.311949 55.000000 4
86576.867188 5232.454102 1.635146 65.000000 3
86576.867188 5232.454102 3.370348 65.000000 4
86576.867188 5232.454102 2.266589 100.000000 4

Thanks
# 10  
Old 07-27-2009
Code:
# cat file
76576.867188 6232.454102 1.944000 0.000000 1
76576.867188 6232.454102 2.549000 0.000000 2
76576.867188 6232.454102 3.517000 0.000000 3
76576.867188 6232.454102 4.976000 0.000000 4
76576.867188 6232.454102 0.792961 55.000000 2
76576.867188 6232.454102 2.008904 55.000000 3
76576.867188 6232.454102 3.607231 55.000000 4
76576.867188 6232.454102 1.555146 65.000000 3
76576.867188 6232.454102 3.226928 65.000000 4
76576.867188 6232.454102 2.180096 100.000000 4
66576.867188 4232.454102 2.944000 0.000000 1
66576.867188 4232.454102 3.549000 0.000000 2
66576.867188 4232.454102 4.517000 0.000000 3
66576.867188 4232.454102 5.976000 0.000000 4
66576.867188 4232.454102 0.855956 55.000000 2
66576.867188 4232.454102 2.104038 55.000000 3
66576.867188 4232.454102 3.737020 55.000000 4
66576.867188 4232.454102 1.608926 65.000000 3
66576.867188 4232.454102 3.319681 65.000000 4
66576.867188 4232.454102 2.234251 100.000000 4
86576.867188 5232.454102 3.944000 0.000000 1
86576.867188 5232.454102 3.549000 0.000000 2
86576.867188 5232.454102 4.537932 0.000000 3
86576.867188 5232.454102 6.022880 0.000000 4
86576.867188 5232.454102 0.705243 55.000000 2
86576.867188 5232.454102 1.563464 55.000000 3
86576.867188 5232.454102 3.311949 55.000000 4
86576.867188 5232.454102 1.635146 65.000000 3
86576.867188 5232.454102 3.370348 65.000000 4
86576.867188 5232.454102 2.266589 100.000000 4

# sort -nk4 file | awk '{newfile=$NF".txt"; print > newfile}'

# cat 1.txt
66576.867188 4232.454102 2.944000 0.000000 1
76576.867188 6232.454102 1.944000 0.000000 1
86576.867188 5232.454102 3.944000 0.000000 1

# cat 2.txt
66576.867188 4232.454102 3.549000 0.000000 2
76576.867188 6232.454102 2.549000 0.000000 2
86576.867188 5232.454102 3.549000 0.000000 2
66576.867188 4232.454102 0.855956 55.000000 2
76576.867188 6232.454102 0.792961 55.000000 2
86576.867188 5232.454102 0.705243 55.000000 2

# cat 3.txt
66576.867188 4232.454102 4.517000 0.000000 3
76576.867188 6232.454102 3.517000 0.000000 3
86576.867188 5232.454102 4.537932 0.000000 3
66576.867188 4232.454102 2.104038 55.000000 3
76576.867188 6232.454102 2.008904 55.000000 3
86576.867188 5232.454102 1.563464 55.000000 3
66576.867188 4232.454102 1.608926 65.000000 3
76576.867188 6232.454102 1.555146 65.000000 3
86576.867188 5232.454102 1.635146 65.000000 3

# cat 4.txt
66576.867188 4232.454102 5.976000 0.000000 4
76576.867188 6232.454102 4.976000 0.000000 4
86576.867188 5232.454102 6.022880 0.000000 4
66576.867188 4232.454102 3.737020 55.000000 4
76576.867188 6232.454102 3.607231 55.000000 4
86576.867188 5232.454102 3.311949 55.000000 4
66576.867188 4232.454102 3.319681 65.000000 4
76576.867188 6232.454102 3.226928 65.000000 4
86576.867188 5232.454102 3.370348 65.000000 4
66576.867188 4232.454102 2.234251 100.000000 4
76576.867188 6232.454102 2.180096 100.000000 4
86576.867188 5232.454102 2.266589 100.000000 4

Is that what you want? :)
# 11  
Old 07-27-2009
Quote:
Originally Posted by Ghetz
I want output files like:

55.000000 3.txt
76576.867188 6232.454102 2.008904 55.000000 3
...................................................55.000000 3

55.000000 4.txt
76576.867188 6232.454102 3.607231 55.000000 4
...................................................55.000000 4

65.000000 3.txt
76576.867188 6232.454102 1.555146 65.000000 3
...................................................65.000000 3

65.000000 4.txt
76576.867188 6232.454102 3.226928 65.000000 4
...................................................65.000000 4

etc.
So will this do...?

Code:
awk '{print >> $4 " " $5 ".txt"}' datafile

Or are you literally after that extra line, like this ...?

Code:
awk '{ print >> $4 " " $5 ".txt"; f[$4 " " $5]=1 }
     END { for (s in f) print "..................................................." s >> s ".txt" }' datafile

# 12  
Old 07-27-2009
cambridge and danmero

cambridge's code generates the following error message:
Code:
awk: syntax error at source line 1
 context is
	{print >> $4 " >>>  " <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1
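
A likely cause, though nobody in the thread confirms it: some awk implementations (the wording of the message suggests a traditional BSD-style awk) reject an unparenthesized concatenation as the target of `>` or `>>`. Wrapping the file name expression in parentheses is the portable form. The sketch below assumes a scratch directory and a three-row excerpt of the sample data; `datafile` is the name used in the post above.

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Three rows taken from the sample data in this thread.
cat > datafile <<'EOF'
76576.867188 6232.454102 1.944000 0.000000 1
76576.867188 6232.454102 0.792961 55.000000 2
66576.867188 4232.454102 0.855956 55.000000 2
EOF

# The parentheses make the whole concatenation the redirection target.
awk '{ print >> ($4 " " $5 ".txt") }' datafile

# List the files that were created (names contain a space).
ls
```

The same parentheses would be needed around `s ".txt"` in the END block of the second variant. Note that `>>` appends, so running the command twice doubles the contents of each output file.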

The following are the output files I am after:

Code:
0.000000 1.txt
66576.867188 4232.454102 2.944000 0.000000 1
76576.867188 6232.454102 1.944000 0.000000 1
86576.867188 5232.454102 3.944000 0.000000 1

0.000000 2.txt
66576.867188 4232.454102 3.549000 0.000000 2
76576.867188 6232.454102 2.549000 0.000000 2
86576.867188 5232.454102 3.549000 0.000000 2

55.000000 2.txt
66576.867188 4232.454102 0.855956 55.000000 2
76576.867188 6232.454102 0.792961 55.000000 2
86576.867188 5232.454102 0.705243 55.000000 2

0.000000 3.txt
66576.867188 4232.454102 4.517000 0.000000 3
76576.867188 6232.454102 3.517000 0.000000 3
86576.867188 5232.454102 4.537932 0.000000 3

55.000000 3.txt
66576.867188 4232.454102 2.104038 55.000000 3
76576.867188 6232.454102 2.008904 55.000000 3
86576.867188 5232.454102 1.563464 55.000000 3

65.000000 3.txt
66576.867188 4232.454102 1.608926 65.000000 3
76576.867188 6232.454102 1.555146 65.000000 3
86576.867188 5232.454102 1.635146 65.000000 3

0.000000 4.txt
66576.867188 4232.454102 5.976000 0.000000 4
76576.867188 6232.454102 4.976000 0.000000 4
86576.867188 5232.454102 6.022880 0.000000 4

55.000000 4.txt
66576.867188 4232.454102 3.737020 55.000000 4
76576.867188 6232.454102 3.607231 55.000000 4
86576.867188 5232.454102 3.311949 55.000000 4

65.000000 4.txt
66576.867188 4232.454102 3.319681 65.000000 4
76576.867188 6232.454102 3.226928 65.000000 4
86576.867188 5232.454102 3.370348 65.000000 4

100.000000 4.txt
66576.867188 4232.454102 2.234251 100.000000 4
76576.867188 6232.454102 2.180096 100.000000 4
86576.867188 5232.454102 2.266589 100.000000 4

It is important that each output file name is made up of the values in the 4th and 5th columns.

I hope this clarifies my question.

---------- Post updated at 01:21 AM ---------- Previous update was at 12:58 AM ----------

danmero and cambridge

This seems to work:

Code:
sort -nk4 infile | awk '{ file=substr($0,35,11)".txt"; print >> file }'

Thanks for your efforts
# 13  
Old 07-28-2009
Quote:
Originally Posted by Ghetz
This seems to work:

Code:
sort -nk4 infile | awk '{ file=substr($0,35,11)".txt"; print >> file }'

Almost :)
Code:
# sort -nk4 infile | awk '{ file=substr($0,35,11)".txt"; print file }'
0.000000 1.txt
0.000000 2.txt
0.000000 3.txt
0.000000 4.txt
0.000000 1.txt
0.000000 2.txt
0.000000 3.txt
0.000000 4.txt
0.000000 2.txt
0.000000 1.txt
0.000000 3.txt
0.000000 4.txt
55.000000 2.txt
55.000000 3.txt
55.000000 4.txt
55.000000 2.txt
55.000000 3.txt
55.000000 4.txt
55.000000 2.txt
55.000000 3.txt
55.000000 4.txt
65.000000 3.txt
65.000000 4.txt
65.000000 3.txt
65.000000 4.txt
65.000000 3.txt
65.000000 4.txt
100.000000 .txt
100.000000 .txt
100.000000 .txt
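
The last three names lose the 5th column because `substr($0,35,11)` counts characters, and `100.000000` is one character wider than `55.000000`. A small sketch of the difference, using one row from the sample data (the variable names `fixed` and `by_field` are only for illustration):

```shell
# One of the rows whose 4th column is 100.000000.
line='66576.867188 4232.454102 2.234251 100.000000 4'

# Fixed character offsets: 11 characters starting at column 35.
fixed=$(printf '%s\n' "$line" | awk '{ print substr($0, 35, 11) ".txt" }')

# Field-based: awk splits on whitespace, so the width of a field does not matter.
by_field=$(printf '%s\n' "$line" | awk '{ print $4 " " $5 ".txt" }')

printf '[%s]\n' "$fixed" "$by_field"
```

The field-based form keeps working if any column changes width, which the character-offset form cannot guarantee.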

Maybe
Code:
# sort -nk4 infile | awk '{ file=$4"_"$5".txt"; print file }'
0.000000_1.txt
0.000000_2.txt
0.000000_3.txt
0.000000_4.txt
0.000000_1.txt
0.000000_2.txt
0.000000_3.txt
0.000000_4.txt
0.000000_2.txt
0.000000_1.txt
0.000000_3.txt
0.000000_4.txt
55.000000_2.txt
55.000000_3.txt
55.000000_4.txt
55.000000_2.txt
55.000000_3.txt
55.000000_4.txt
55.000000_2.txt
55.000000_3.txt
55.000000_4.txt
65.000000_3.txt
65.000000_4.txt
65.000000_3.txt
65.000000_4.txt
65.000000_3.txt
65.000000_4.txt
100.000000_4.txt
100.000000_4.txt
100.000000_4.txt

See your previous post/requirement :)
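
The command above only prints the file names. A minimal sketch of a version that actually writes the files follows; it is not from the thread. It assumes the underscore naming, sorts on columns 4, 5 and 1 explicitly so that each group is contiguous, and closes each file once its group is finished so that awk does not run out of open file descriptors on large inputs. `LC_ALL=C` is an assumption to keep the decimal point handling of `sort -n` predictable.

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# A few rows taken from the sample data in this thread.
cat > infile <<'EOF'
76576.867188 6232.454102 1.944000 0.000000 1
76576.867188 6232.454102 0.792961 55.000000 2
76576.867188 6232.454102 2.180096 100.000000 4
66576.867188 4232.454102 2.944000 0.000000 1
66576.867188 4232.454102 0.855956 55.000000 2
66576.867188 4232.454102 2.234251 100.000000 4
EOF

# Sort by column 4, then 5, then 1, all numerically.
LC_ALL=C sort -k4,4n -k5,5n -k1,1n infile |
awk '{
    file = $4 "_" $5 ".txt"
    if (file != prev) {              # sorted input: a new name means a new group
        if (prev != "") close(prev)  # done with the previous output file
        prev = file
    }
    print > file                     # ">" truncates once, on first open
}'

ls
```

Because each file is opened only once per run, `>` is safe here and rerunning the command replaces the old output instead of appending to it.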


PS: Next time, post the data sample and required output when you start a new thread; that will save everybody time. :)
# 14  
Old 07-28-2009
Thanks danmero, your last solution is even better.

Sorry I didn't make myself clear initially.

Now one last request:

If I enter values for the 4th and 5th columns on the command line, how can I extract the corresponding files? For example, if I enter the following:
Code:
4th column = 0.000000, 55.000000, 100.000000
5th column = 1

How do I extract/select the corresponding
Code:
0.000000 1.txt, 55.000000 1.txt, 100.000000 1.txt

files?

Thanks again.
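
One possible approach, offered only as a sketch since the thread excerpt ends before anyone answers: build each file name from the values given on the command line and check whether it exists. The function name `select_files` and its argument order (5th-column value first, then any number of 4th-column values) are assumptions made for this example, as is the underscore naming from the previous post. The values have to be typed exactly as they appear in the data (`0.000000`, not `0`), because they are compared as strings.

```shell
# select_files COL5 COL4...   (hypothetical helper, not from the thread)
# Prints the name of every existing file COL4_COL5.txt; reports the rest on stderr.
select_files() {
    col5=$1
    shift
    for col4 in "$@"; do
        file="${col4}_${col5}.txt"
        if [ -f "$file" ]; then
            printf '%s\n' "$file"
        else
            printf 'no such file: %s\n' "$file" >&2
        fi
    done
}

# Demo in a scratch directory with a few empty stand-in files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
: > 0.000000_1.txt
: > 55.000000_2.txt
: > 100.000000_4.txt

selected=$(select_files 1 0.000000 55.000000 100.000000)
printf '%s\n' "$selected"
```

With the sample data in this thread, only `0.000000` occurs together with a 5th column of 1, so the other two requests would be reported as missing. The printed names can be passed on to `cat`, `cp` or similar, for example through a `while read` loop.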
