Rearrange fields of delimited text file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Rearrange fields of delimited text file
# 1  
Old 09-07-2017
In that case:

Code:
awk '
function rbefore(STR)   { return(substr(STR, 0, RSTART-1)); }# before match
function rall(STR)      { return(substr(STR, RSTART, RLENGTH)); }# Entire match
function rafter(STR)    { return(substr(STR, RSTART+RLENGTH)); }# after

# First line:  Reformat headers for sorting and find new order
NR==1 {
    for(N=1; N<=NF; N++)
    {
        IN=$N
        OUT=""

        # Convert a_3 to a_00000003 so it will sort
        while(match(IN, /[0-9]+/))
        {
                OUT=OUT rbefore(IN) sprintf("%08d", rall(IN));
                IN=rafter(IN);
        }

        OUT=OUT IN;
        A[OUT]=N # Creating an array of A["string_0001"]=N
    }

    C=asorti(A, B); # Sort it into B[1]="string_0001", B[2]="string_0003" etc
    for(X in B) D[X]=A[B[X]]; # D[1]=4, maps in to out column
}

# All lines: Assemble string from column order and print
{
        OUT=""
        for(N=1; N<=C; N++) OUT=OUT OFS $(D[N])
        print substr(OUT,2);
}

' FS=";" OFS=";" inputfile > outputfile

Also for GNU awk.
This User Gave Thanks to Corona688 For This Post:
# 2  
Old 09-07-2017
Try also
Code:
read LINE < file
echo $LINE | tr '_;' ' '$'\n' | sort -k2n | tr ' ' '_' | awk -F";" '
FNR == NR       {T[$0] = NR
                 MX = NR
                 next
                }
FNR == 1        {for (i=1; i<=NF; i++)  S[T[$i]] = i
                }
                {for (i=1; i<=MX; i++)  printf "%s%s", $(S[i]), (i==MX)?ORS:FS
                }
' - file


EDIT: Or, if you have the line tool to copy exactly one line from stdin to stdout:
Code:
line < file2 | tr ';' $'\n' | sort -t_ -k2n | awk ...


Last edited by RudiC; 09-07-2017 at 03:08 PM..
These 2 Users Gave Thanks to RudiC For This Post:
# 3  
Old 09-07-2017
thank you all

@Corona688: you say "When you say "only first line", what do you mean? Clearly the second line is changed too. "
..and that what I mean

---------- Post updated at 04:24 PM ---------- Previous update was at 04:13 PM ----------

@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

Smilie

Last edited by RudiC; 09-09-2017 at 01:18 PM..
# 4  
Old 09-07-2017
Quote:
Originally Posted by andy2000
...
...
@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

...
The Perl script will not work either because the numeric part of the header columns moved to the left of the underscore ("_") instead of the right.

So, the index to compare after the "split" function should be 0 and not 1.

Code:
$ 
$ cat input_1.txt
1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3
$ 
$ 
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        @s = map{ $x{$_} } sort { (split "_", $a)[0] <=> (split "_", $b)[0] } @F;
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' input_1.txt
1_ICD;3_ICD;11_ICD;15_ICD
a1;a3;a11;a15
$ 
$

If, however, the numeric part in the header row can be on either side of the underscore, like so:

Code:
$ 
$ cat input_2.txt
1_ICD;ICD_11;15_ICD;ICD_3
a1;a11;a15;a3
$ 
$

then things get a bit more serious Smilie

Code:
$ 
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        %y = map{ $m = $_; ($n = $m) =~ s/\D//g; $m => $n } @F;
                        @s = map{ $x{$_} } sort { $y{$a} <=> $y{$b} } keys(%y);
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' input_2.txt
1_ICD;ICD_3;ICD_11;15_ICD
a1;a3;a11;a15
$ 
$

Of course, the last script will not work for a header row that could have numeric part on both sides of the underscore, like so:

Code:
1_ICD;2_ICD_6;5_ICD;ICD_7
aa;bb;cc;dd

In that case, it has to be first determined if the numeric part of 2nd column header is 2, 6 or 26.
This User Gave Thanks to durden_tyler For This Post:
# 5  
Old 09-07-2017
Hi.

Making lots of assumptions about the input data, here is a solution that transposes the file, sorts it (in a hybrid manner), and re-transposes:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate sort headers, carrying data fields, datamash, msort

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dixf datamash msort

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
head $FILE

pl " Expected output:"
head $E

# See f3 and f2 for intermediate output.
pl " Results:"
datamash -t ';' transpose < $FILE |
tee f3 |
msort -q -l -n 1,1 -d ';' --comparison-type hybrid |
tee f2 |
datamash -t ';' transpose |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

pl " Some detail for datamash, msort:"
dixf datamash msort

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.8 (jessie) 
bash GNU bash 4.3.30
dixf (local) 1.50
datamash (GNU datamash) 1.0.6
msort 8.53

-----
 Input data file data1:
a_13;a_2;a_1;a_10
13;2;1;10

-----
 Expected output:
a_1;a_2;a_10;a_13
1;2;10;13

-----
 Results:
a_1;a_2;a_10;a_13
1;2;10;13

-----
 Verify results if possible:

-----
 Comparison of 2 created lines with 2 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.

-----
 Some detail for datamash, msort:

datamash        command-line calculations (man)
Path    : /usr/bin/datamash
Version : 1.0.6
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with -h,--help
Repo    : Debian 8.8 (jessie) 
Home    : https://savannah.gnu.org/projects/datamash/ (pm)

msort   sort records in complex ways (man)
Path    : /usr/bin/msort
Version : 8.53
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.8 (jessie) 
Home    : http://www.billposer.org/Software/msort.html (pm)

Best wishes ... cheers, drl
# 6  
Old 09-08-2017
Quote:
Originally Posted by andy2000
.
.
.
@RudiC: your solution doesn't work correctly:

1_ICD;11_ICD;15_ICD;3_ICD
a1;a11;a15;a3

Smilie
It does, on the sample data given. If you change the data's structure, you need to adapt the code as well. For you new data structure, don't use the second field for the sort key but the first.
# 7  
Old 09-07-2017
Note:
--

tr understands the newline escape sequence \n so there is no need to use a hard newline character..

so
Code:
tr ';' '\n'

can be used to translate semicolons into newlines.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pattern Match and Rearrange the Fields in UNIX

For an Output like below Input : <Subject A="I" B="1039502" C="2015-06-30" D="010101010101"> Output : <Subject D="010101010101" B="1039502" C="2015-06-30" A="I"> I have been using something like below but not getting the desired output : awk -F ' ' '/Subject/ BEGIN{OFS=" ";}... (19 Replies)
Discussion started by: arunkesi
19 Replies

2. UNIX for Dummies Questions & Answers

Need to convert a pipe delimited text file to tab delimited

Hi, I have a rquirement in unix as below . I have a text file with me seperated by | symbol and i need to generate a excel file through unix commands/script so that each value will go to each column. ex: Input Text file: 1|A|apple 2|B|bottle excel file to be generated as output as... (9 Replies)
Discussion started by: raja kakitapall
9 Replies

3. Shell Programming and Scripting

Using awk to rearrange fields

Hi, I am required to arrange columns of a file i.e make the 15th column into the 1st column. I am doing awk 'begin {fs=ofs=","} {print $15,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14}' ad.data>ad.csv the problem is that column 15 gets to column 1 but it is not comma separated with the... (10 Replies)
Discussion started by: seddoubt
10 Replies

4. Shell Programming and Scripting

Split a free form text delimited by space to words with other fields

Hi, I need your help for below with shell scripting or perl I/P key, Sentence customer1, I am David customer2, I am Taylor O/P Key, Words Customer1,I Customer1,am Customer1,David Customer2,I Customer2,am Customer2,Taylor (4 Replies)
Discussion started by: monishathampi
4 Replies

5. Shell Programming and Scripting

Print records which do not have expected number of fields in a comma delimited file

Hi, I have a comma (,) delimited file, in which few fields are enclosed with in double quotes " ". I have to print the records in the file which donot have expected number of field with the line number. File1 ==== name,desgnation,doj,project #header#... (7 Replies)
Discussion started by: machomaddy
7 Replies

6. UNIX for Advanced & Expert Users

Problem while counting number of fields in TAB delimited file

I'm facing a strange problem, please help me out. Here we go. I want to count number of fields in particular file. filename and delimiter character will be passed through parameter. On command prompt if i type following i get 27 as output (which is correct) cat customer.dat | head -1 | awk... (12 Replies)
Discussion started by: vikanna
12 Replies

7. Shell Programming and Scripting

Rearrange the text file

Gents, I have a large file and each line of the file contains more than 200 bytes.Please let me a way to have the new line to start when the word "FIT" appears. I was trialling with 'tr' command but i am not sure how to get it based on bytes and so it wasn't working... Current... (3 Replies)
Discussion started by: appu2176
3 Replies

8. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

I have a large flat file with variable length fields that are pipe delimited. The file has no new line or CR/LF characters to indicate a new record. I need to parse the file and after some number of fields, I need to insert a CR/LF to start the next record. Input file ... (2 Replies)
Discussion started by: clintrpeterson
2 Replies

9. Shell Programming and Scripting

awk sed cut? to rearrange random number of fields into 3 fields

I'm working on formatting some attendance data to meet a vendors requirements to upload to their system. With some help on the forums here, I have the data close. But they've since changed what they want. The vendor wants me to submit three fields to them. Field 1 is the studentid field,... (4 Replies)
Discussion started by: axo959
4 Replies

10. UNIX for Dummies Questions & Answers

Sort the fields in a comma delimited file

Hi, I have a comma delimited file. I want to sort the fields alphabetically and again store them in a comma delimited file. For example, My file looks like this. abc,aaa,xyz,xxx,def pqr,ggg,eee,iii,qqq zyx,lmo,pqr,abc,fff and I want my output to look like this, all fields sorted... (3 Replies)
Discussion started by: swethapatil
3 Replies
Login or Register to Ask a Question