Unix/Linux Go Back    


UNIX for Advanced & Expert Users Expert-to-Expert. Learn advanced UNIX, UNIX commands, Linux, Operating Systems, System Administration, Programming, Shell, Shell Scripts, Solaris, Linux, HP-UX, AIX, OS X, BSD.

Sort mixed data file

UNIX for Advanced & Expert Users


Closed Linux or Unix Question    
 
Thread Tools Search this Thread Display Modes
    #1  
Old Unix and Linux 07-20-2013
jnrohit2k jnrohit2k is offline
Registered User
 
Join Date: May 2013
Last Activity: 16 March 2014, 7:24 PM EDT
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
Sort mixed data file

I have a text file and each field is separated by semicolon ( ; ). Field number 7 is internally separated by comma ( , ) and pipe ( | ) symbol. I want to sort file based on three different fields which are marked in BOLD.

Here first BOLD field will have numbers upto the length of 9 characters, second BOLD field will have start date in 'YYYYMMDD' format and third field will have end date in 'YYYYMMDD' format.


Input File:


Code:
 
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013


Required output: First and second field should be in ascending order, and third field should be in descending order.

Output file:


Code:
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013

Thanks!

Last edited by jnrohit2k; 07-20-2013 at 01:37 PM..
Sponsored Links
    #2  
Old Unix and Linux 07-20-2013
mjf mjf is online now
Registered User
 
Join Date: Nov 2011
Last Activity: 21 April 2015, 5:35 AM EDT
Location: Newtown, PA
Posts: 151
Thanks: 9
Thanked 37 Times in 35 Posts
Here is one solution assuming your file will not contain '%':


Code:
 awk -F';' '{split($7,a,","); print a[3] "%" $15 "%" $16 "%" $0}' input.txt | sort -t '%' -k 1n -k 2n -k 3nr | awk -F'%' '{print $4}'


Code:
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013

Sponsored Links
    #3  
Old Unix and Linux 07-20-2013
jnrohit2k jnrohit2k is offline
Registered User
 
Join Date: May 2013
Last Activity: 16 March 2014, 7:24 PM EDT
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
It worked. Thanks!
    #4  
Old Unix and Linux 08-22-2013
jnrohit2k jnrohit2k is offline
Registered User
 
Join Date: May 2013
Last Activity: 16 March 2014, 7:24 PM EDT
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
I have a text file and each field is separated by semicolon ( ; ). Field number 7 is internally separated by comma ( , ) and pipe ( | ) symbol. I want to sort file based on four different fields which are marked in BOLD.

Here first BOLD field will have numbers upto the length of 9 characters, second BOLD field will have start date in 'YYYYMMDD' format, third field will have end date in 'YYYYMMDD' format and fourth BOLD field will have string of 2 characters (Like "IP", "GC" ) or two WHITE SPACE.


Input File:


Code:
 
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013


Required output: Sorting should happen based on following four fields in same sequence:-

(1) Fourth BOLD field should be in decending order (CHARACTER)
(2) First BOLD field should be in ascending order (NUMBER)
(3) Second BOLD field should be in ascending order (NUMBER)
(4) Third BOLD field should be in descending order (NUMBER)

Output file:


Code:
REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013

I tried below command but it is not showing field 17th (i.e., fourth BOLD field) in the output.


Code:
 
awk -F';' '{$17 "%" split($7,a,","); print a[3] "%" $15 "%" $16 "%" $0}' temp.dat
 
100010463%20120103%20120401%CLMREQ;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
18663%20130101%20130331%CLMREQ;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
18663%20121201%20130229%CLMREQ;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
18663%20121201%20130101%CLMREQ;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
18663%20130101%20130430%CLMREQ;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
34082%20120203%20120301%CLMREQ;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
18663%20121201%20130131%CLMREQ;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
100009394%20120103%20120401%CLMREQ;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
18663%20121101%20130531%CLMREQ;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
100010463%20110203%20110501%CLMREQ;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013

Thanks!
Sponsored Links
    #5  
Old Unix and Linux 08-22-2013
RudiC RudiC is offline Forum Advisor  
Registered User
 
Join Date: Jul 2012
Last Activity: 21 April 2015, 4:50 AM EDT
Location: Aachen, Germany
Posts: 5,727
Thanks: 101
Thanked 1,506 Times in 1,415 Posts
What about analysing and trying to understand mjf's suggestion, and then adapting it YOURSELF to your new specification?
Sponsored Links
    #6  
Old Unix and Linux 08-22-2013
jnrohit2k jnrohit2k is offline
Registered User
 
Join Date: May 2013
Last Activity: 16 March 2014, 7:24 PM EDT
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
Sorry, I did a mistake here. The awk command should be;

awk -F';' '{split($7,a,","); print $17 "%" a[3] "%" $15 "%" $16 "%" $0}' temp1.dat | sort -t '%' -k 1r -k 2n -k 3n -k 4nr

It gave me following output but it did not sort first bold field correctly. Still trying....


Code:
 
IP%34082%20120203%20120301%CLMREQ;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
IP%18663%20130101%20130331%CLMREQ;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
IP%18663%20121201%20130229%CLMREQ;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
IP%18663%20121201%20130101%CLMREQ;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
IP%100009394%20120103%20120401%CLMREQ;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
GC%18663%20130101%20130430%CLMREQ;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
GC%100010463%20120103%20120401%CLMREQ;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
GC%100010463%20110203%20110501%CLMREQ;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
  %18663%20121201%20130131%CLMREQ;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
  %18663%20121101%20130531%CLMREQ;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013

Thanks!
Sponsored Links
    #7  
Old Unix and Linux 08-22-2013
RudiC RudiC is offline Forum Advisor  
Registered User
 
Join Date: Jul 2012
Last Activity: 21 April 2015, 4:50 AM EDT
Location: Aachen, Germany
Posts: 5,727
Thanks: 101
Thanked 1,506 Times in 1,415 Posts
You should modify the sort parameters:
Code:
sort -t '%' -k1,1r -k2,2n -k3,3n -k4,4nr

-k2n starts a key (field) 2 and runs to the end of line.
-k2,2 starts at 2 and stops at 2.
Sponsored Links
Closed Linux or Unix Question

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Unix or Linux Image More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
awk - mixed for and if to select particular lines in a data file naska Shell Programming and Scripting 4 07-05-2013 02:58 PM
How to use FS for mixed file? Akshay Hegde Shell Programming and Scripting 5 03-08-2013 01:26 PM
Advanced: Sort, count data in column, append file name JamesT Shell Programming and Scripting 14 08-14-2012 07:36 AM
Ignore Header and Footer and Sort the data in fixed width file sasikari Shell Programming and Scripting 5 07-14-2011 11:42 AM
Sort a big data file rubber08 Shell Programming and Scripting 2 10-18-2010 07:05 AM



All times are GMT -4. The time now is 05:49 AM.