Full Discussion: Sort mixed data file
UNIX for Advanced & Expert Users: Post 302845907 by jnrohit2k on Thursday 22nd of August 2013 02:51:04 PM
I have a text file in which the fields are separated by semicolons ( ; ). Field 7 is further divided internally by commas ( , ) and pipe ( | ) symbols. I want to sort the file on four different fields, which are marked in BOLD: the third comma-separated value of field 7, and fields 15, 16 and 17.

The first BOLD field holds a number of up to 9 digits, the second BOLD field holds a start date in 'YYYYMMDD' format, the third BOLD field holds an end date in 'YYYYMMDD' format, and the fourth BOLD field holds a 2-character string (such as "IP" or "GC") or two white spaces.
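
Before sorting, it may be worth checking that every record really follows the formats described above. The following is only a sketch: the function name check_keys is made up for illustration, and it assumes the four keys are the third comma-separated value of field 7 plus fields 15, 16 and 17.

```shell
# check_keys FILE: hypothetical helper (not from the original post).
# Reports records whose key fields do not match the described formats.
check_keys() {
  LC_ALL=C awk -F';' '{
    split($7, a, ",")                        # a[3] = third comma-separated value of field 7
    ok = (a[3] ~ /^[0-9]+$/ && length(a[3]) <= 9) &&
         ($15 ~ /^[0-9]+$/ && length($15) == 8) &&
         ($16 ~ /^[0-9]+$/ && length($16) == 8) &&
         ($17 ~ /^[A-Z][A-Z]$/ || $17 == "  ")
    if (!ok) print "line " NR ": unexpected key format"
  }' "$1"
}
```

Running check_keys temp.dat prints nothing when all records match; any line it reports would need a closer look before the sort keys can be trusted.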


Input File:

Code:
 
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013


Required output: the file should be sorted on the following four fields, in this order of priority:

(1) Fourth BOLD field in descending order (CHARACTER)
(2) First BOLD field in ascending order (NUMBER)
(3) Second BOLD field in ascending order (NUMBER)
(4) Third BOLD field in descending order (NUMBER)
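
One possible way to express these four rules is a decorate, sort, undecorate pipeline. This is a sketch, not a tested production solution: the function name sort_records is made up, the '%' separator is assumed not to occur in the key fields, and the keys are assumed to be field 17, the third comma-separated value of field 7, field 15 and field 16.

```shell
# sort_records FILE: hypothetical helper (not from the original post).
sort_records() {
  # Decorate: put the four sort keys in front of each record, '%'-separated.
  awk -F';' '{ split($7, a, ","); print $17 "%" a[3] "%" $15 "%" $16 "%" $0 }' "$1" |
    # Sort: key 1 character descending, key 2 numeric ascending,
    # key 3 numeric ascending, key 4 numeric descending.
    # LC_ALL=C gives plain byte order, so two spaces sort below letters.
    LC_ALL=C sort -t'%' -k1,1r -k2,2n -k3,3n -k4,4nr |
    # Undecorate: drop the four prepended keys again.
    cut -d'%' -f5-
}
```

It could be called as sort_records temp.dat > sorted.dat. Each -k option carries its own modifiers, which is what allows ascending and descending keys to be mixed in a single sort call.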

Output file:

Code:
REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013

I tried the command below, but it does not show field 17 (i.e., the fourth BOLD field) in the output.

Code:
 
awk -F';' '{$17 "%" split($7,a,","); print a[3] "%" $15 "%" $16 "%" $0}' temp.dat
 
100010463%20120103%20120401%CLMREQ;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
18663%20130101%20130331%CLMREQ;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
18663%20121201%20130229%CLMREQ;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
18663%20121201%20130101%CLMREQ;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
18663%20130101%20130430%CLMREQ;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
34082%20120203%20120301%CLMREQ;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
18663%20121201%20130131%CLMREQ;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
100009394%20120103%20120401%CLMREQ;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
18663%20121101%20130531%CLMREQ;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
100010463%20110203%20110501%CLMREQ;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
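
A likely reason field 17 is missing: in the command above, $17 "%" split($7,a,",") is a statement of its own. awk evaluates it (so the split still happens) but discards the resulting string, and only the later print produces output. The sketch below reproduces this on one shortened, made-up record with the same field layout and shows an adjusted form; the variable names are for illustration only.

```shell
# One shortened, made-up record with the same 18-field layout as the post.
rec='REQSTS;id;x;EIN;x;;AAAA_SAK,1,18663,Y,0;;;x;;;;EP;20130101;20130331;IP;2013'

# As posted: the first statement builds a string and throws it away.
buggy=$(printf '%s\n' "$rec" |
  awk -F';' '{ $17 "%" split($7, a, ","); print a[3] "%" $15 "%" $16 }')

# Adjusted: split first, then include $17 in the print list.
fixed=$(printf '%s\n' "$rec" |
  awk -F';' '{ split($7, a, ","); print $17 "%" a[3] "%" $15 "%" $16 }')

echo "$buggy"   # 18663%20130101%20130331
echo "$fixed"   # IP%18663%20130101%20130331
```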

Thanks!
 
