Rearrange fields of delimited text file

09-07-2017

In that case:

Code:

awk '
function rbefore(STR)   { return(substr(STR, 1, RSTART-1)); }# before match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after

# First line:  Reformat headers for sorting and find new order
NR==1 {
    for(N=1; N<=NF; N++)
    {
        IN=$N
        OUT=""

        # Convert a_3 to a_00000003 so it will sort
        while(match(IN, /[0-9]+/))
        {
                OUT=OUT rbefore(IN) sprintf("%08d", rall(IN));
                IN=rafter(IN);
        }

        OUT=OUT IN;
        A[OUT]=N # Creating an array of A["string_0001"]=N
    }

    C=asorti(A, B); # Sort it into B[1]="string_0001", B[2]="string_0003" etc
    for(X in B) D[X]=A[B[X]]; # D[1]=4, maps in to out column
}

# All lines: Assemble string from column order and print
{
        OUT=""
        for(N=1; N<=C; N++) OUT=OUT OFS $(D[N])
        print substr(OUT,2);
}

' FS=";" OFS=";" inputfile > outputfile

This also requires GNU awk, since asorti() is a gawk extension.
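
If GNU awk is not available, the same idea can probably be done in POSIX awk by sorting by hand. This is only a sketch under a few assumptions: the function name reorder_columns is made up for illustration, a simple insertion sort stands in for asorti(), and the separator is hard-wired to ";".

```shell
# Hypothetical helper: reorder ';'-separated columns by their zero-padded header keys.
# Uses only POSIX awk features (no asorti).
reorder_columns() {
    awk -F';' -v OFS=';' '
    # Pad every run of digits to 8 places so string order matches numeric order.
    function padkey(s,    out) {
        out = ""
        while (match(s, /[0-9]+/)) {
            out = out substr(s, 1, RSTART - 1) sprintf("%08d", substr(s, RSTART, RLENGTH))
            s = substr(s, RSTART + RLENGTH)
        }
        return out s
    }
    NR == 1 {
        n = NF
        for (i = 1; i <= n; i++) { key[i] = padkey($i); col[i] = i }
        # Insertion sort of the column indexes by padded key.
        for (i = 2; i <= n; i++) {
            c = col[i]
            for (j = i - 1; j >= 1 && key[col[j]] > key[c]; j--) col[j + 1] = col[j]
            col[j + 1] = c
        }
    }
    {
        line = $(col[1])
        for (i = 2; i <= n; i++) line = line OFS $(col[i])
        print line
    }' "$@"
}

# Example call on inline sample data.
printf 'a_13;a_2;a_1;a_10\n13;2;1;10\n' | reorder_columns
```

With no arguments the function reads stdin; otherwise call it as reorder_columns inputfile > outputfile.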

Corona688

09-07-2017

Try also

Code:

IFS= read -r LINE < file
echo "$LINE" | tr '_;' ' '$'\n' | sort -k2n | tr ' ' '_' | awk -F";" '
FNR == NR       {T[$0] = NR
                 MX = NR
                 next
                }
FNR == 1        {for (i=1; i<=NF; i++)  S[T[$i]] = i
                }
                {for (i=1; i<=MX; i++)  printf "%s%s", $(S[i]), (i==MX)?ORS:FS
                }
' - file

EDIT: Or, if you have the line tool to copy exactly one line from stdin to stdout:

Code:

line < file2 | tr ';' $'\n' | sort -t_ -k2n | awk ...

Last edited by RudiC; 09-07-2017 at 03:08 PM..

RudiC

09-07-2017

Thank you all.

@Corona688: you asked, "When you say 'only first line', what do you mean? Clearly the second line is changed too."
Yes, that is what I mean.

---------- Post updated at 04:24 PM ---------- Previous update was at 04:13 PM ----------

@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

Last edited by RudiC; 09-09-2017 at 01:18 PM..

andy2000

09-07-2017

Quote:

Originally Posted by andy2000

...
...
@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

...

The Perl script will not work either, because the numeric part of the column headers has moved to the left of the underscore ("_") instead of the right.

So, the index to compare after the "split" function should be 0 and not 1.

Code:

$ 
$ cat input_1.txt
1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3
$ 
$ 
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        @s = map{ $x{$_} } sort { (split "_", $a)[0] <=> (split "_", $b)[0] } @F;
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' input_1.txt
1_ICD;3_ICD;11_ICD;15_ICD
a1;a3;a11;a15
$ 
$

If, however, the numeric part in the header row can be on either side of the underscore, like so:

Code:

$ 
$ cat input_2.txt
1_ICD;ICD_11;15_ICD;ICD_3
a1;a11;a15;a3
$ 
$

then things get a bit more serious:

Code:

$ 
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        %y = map{ $m = $_; ($n = $m) =~ s/\D//g; $m => $n } @F;
                        @s = map{ $x{$_} } sort { $y{$a} <=> $y{$b} } keys(%y);
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' input_2.txt
1_ICD;ICD_3;ICD_11;15_ICD
a1;a3;a11;a15
$ 
$

Of course, the last script will not work for a header row that can have a numeric part on both sides of the underscore, like so:

Code:

1_ICD;2_ICD_6;5_ICD;ICD_7
aa;bb;cc;dd

In that case, you first have to decide whether the numeric part of the 2nd column header is 2, 6 or 26.
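
To make the three readings concrete, here is a small sketch. The helper name header_key and the policy names first, last and all are made up for illustration; "all" joins every digit, which is what the s/\D//g in the last Perl script does.

```shell
# Hypothetical helper: print the numeric sort key of one header under a chosen policy.
#   first = leftmost run of digits, last = rightmost run, all = every digit joined.
header_key() {
    awk -v policy="$1" -v h="$2" 'BEGIN {
        first = ""; last = ""; all = ""
        while (match(h, /[0-9]+/)) {
            run = substr(h, RSTART, RLENGTH)
            if (first == "") first = run   # remember only the first run
            last = run                     # overwritten until the final run
            all = all run                  # concatenate every run
            h = substr(h, RSTART + RLENGTH)
        }
        if (policy == "first")     print first
        else if (policy == "last") print last
        else                       print all
    }'
}

# The ambiguous header from the example above, under each policy.
header_key first 2_ICD_6
header_key last  2_ICD_6
header_key all   2_ICD_6
```

The three calls print 2, 6 and 26 respectively. Whichever policy fits the data then has to be used consistently for every column header.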

durden_tyler

09-07-2017

Hi.

Making lots of assumptions about the input data, here is a solution that transposes the file, sorts it (in a hybrid manner), and re-transposes:

Code:

#!/usr/bin/env bash

# @(#) s1       Demonstrate sort headers, carrying data fields, datamash, msort

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dixf datamash msort

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
head $FILE

pl " Expected output:"
head $E

# See f3 and f2 for intermediate output.
pl " Results:"
datamash -t ';' transpose < $FILE |
tee f3 |
msort -q -l -n 1,1 -d ';' --comparison-type hybrid |
tee f2 |
datamash -t ';' transpose |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

pl " Some detail for datamash, msort:"
dixf datamash msort

exit 0

producing:

Code:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.8 (jessie) 
bash GNU bash 4.3.30
dixf (local) 1.50
datamash (GNU datamash) 1.0.6
msort 8.53

-----
 Input data file data1:
a_13;a_2;a_1;a_10
13;2;1;10

-----
 Expected output:
a_1;a_2;a_10;a_13
1;2;10;13

-----
 Results:
a_1;a_2;a_10;a_13
1;2;10;13

-----
 Verify results if possible:

-----
 Comparison of 2 created lines with 2 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.

-----
 Some detail for datamash, msort:

datamash        command-line calculations (man)
Path    : /usr/bin/datamash
Version : 1.0.6
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with -h,--help
Repo    : Debian 8.8 (jessie) 
Home    : https://savannah.gnu.org/projects/datamash/ (pm)

msort   sort records in complex ways (man)
Path    : /usr/bin/msort
Version : 8.53
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.8 (jessie) 
Home    : http://www.billposer.org/Software/msort.html (pm)
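
Where datamash and msort are not installed, the transpose, sort, transpose-back idea can be approximated with POSIX awk and sort. This is only a sketch: the function names transpose and sort_columns are made up, it assumes headers of the form a_13 (number after the first underscore), and it holds the whole file in memory.

```shell
# Hypothetical helper: transpose ';'-separated rows and columns with POSIX awk.
# The whole input is held in memory, so this suits modest file sizes.
transpose() {
    awk -F';' '
        { for (i = 1; i <= NF; i++) cell[NR, i] = $i
          if (NF > nf) nf = NF }
        END {
            for (i = 1; i <= nf; i++) {
                row = cell[1, i]
                for (r = 2; r <= NR; r++) row = row ";" cell[r, i]
                print row
            }
        }' "$@"
}

# Transpose, sort the rows on the number after the first underscore, transpose back.
sort_columns() {
    transpose "$@" | sort -t_ -k2n | transpose
}

# Example call on the sample data from the run above.
printf 'a_13;a_2;a_1;a_10\n13;2;1;10\n' | sort_columns
```

Unlike msort's hybrid comparison, sort -t_ -k2n only looks at the leading number of the text after the first underscore, so headers with a different layout would need a different key.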

Best wishes ... cheers, drl

drl

09-08-2017

Quote:

Originally Posted by andy2000

.
.
.
@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

It does, on the sample data given. If you change the data's structure, you need to adapt the code as well. For your new data structure, use the first field as the sort key instead of the second.
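
As a rough sketch of that adaptation, assuming headers of the form 11_ICD (number before the underscore): the wrapper name reorder_numeric_prefix is made up for illustration, and head -n 1 stands in for read or line to fetch the header.

```shell
# Hypothetical wrapper around the pipeline idea, for headers like 11_ICD.
reorder_numeric_prefix() {
    file=$1
    # One header per line, sorted numerically on the part before the underscore.
    head -n 1 "$file" | tr ';' '\n' | sort -t_ -k1,1n |
    awk -F';' '
        FNR == NR { T[$0] = NR; MX = NR; next }              # sorted header -> target position
        FNR == 1  { for (i = 1; i <= NF; i++) S[T[$i]] = i } # target position -> source column
                  { for (i = 1; i <= MX; i++) printf "%s%s", $(S[i]), (i == MX) ? ORS : FS }
    ' - "$file"
}

# Example call on a temporary copy of the new sample data.
tmp=$(mktemp)
printf '1_ICD;11_ICD;15_ICD;3_ICD\na1;a11;a15;a3\n' > "$tmp"
reorder_numeric_prefix "$tmp"
rm -f "$tmp"
```

The only change to the sort step is the key: -t_ -k1,1n sorts on the text before the underscore, where the earlier commands sorted on the second field.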

RudiC

09-07-2017

Note:

tr understands the newline escape sequence \n, so there is no need to use a hard newline character. So

Code:

tr ';' '\n'

can be used to translate semicolons into newlines.

Scrutinizer
