Pattern Match and Rearrange the Fields in UNIX

10-16-2015

Please use code tags as required by forum rules!

Difficult to believe. The default action for the "1" pattern is "print the line in $0 unconditionally".

BTW, the "subject" might start with an upper-case "S".
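A minimal sketch of both points, assuming a sample line in the shape used elsewhere in this thread (the "matched" text is only a placeholder action): a bare 1 is an always-true pattern with the default action, so it prints any line the first rule did not consume, and a lower-case /subject/ never matches "<Subject".

```shell
line='<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">'

# Lower-case /subject/ does not match "<Subject", so only the bare 1 fires
# and the line is printed unchanged.
lower=$(printf '%s\n' "$line" | awk '/subject/ {print "matched"; next} 1')

# With an upper-case S the first rule fires and "next" skips the bare 1.
upper=$(printf '%s\n' "$line" | awk '/Subject/ {print "matched"; next} 1')

printf '%s\n%s\n' "$lower" "$upper"
```

So if the output looks untouched, the pattern's case is the first thing to check.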

RudiC

10-16-2015

Quote:

Originally Posted by zaxxon

Hi Ravinder,
unless the OP made a mistake, he does not want the order simply reversed; he wants "D B C A".

Thank you Zaxxon. I have tried to make it more generic. Maybe the following can help the OP.

Code:

awk --re-interval '{match($0,/[A-Z]="[0-9]{4}-[0-9]{2}-[0-9]{2}"/);val3=substr($0,RSTART,RLENGTH);match($0,/[A-Z]="[0-1]{12}"/);val2=substr($0,RSTART,RLENGTH);match($0,/[A-Z]="[0-9]{7}"/);val4=substr($0,RSTART,RLENGTH);match($0,/[A-Z]="[a-zA-Z]+"/);val5=substr($0,RSTART,RLENGTH);print "<Subject " val2 OFS val4 OFS val3 OFS val5 ">"}' Input_file

Output will be as follows.

Code:

<Subject D="010101010101" B="1039502" C="2015-06-30" A="I">

The solution above assumes that every line of Input_file carries four attributes of the form [A-Z]="value": one whose value is letters only ([A-Za-z]+), one with exactly 7 digits, one with a date in YYYY-MM-DD format (4 digits, 2 digits, 2 digits), and one with exactly 12 digits that are each 0 or 1. If all of Input_file follows that syntax, the solution above may help the OP.
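As a minimal sketch of the building block used here: match() sets RSTART and RLENGTH, and substr() then cuts out the matched text wherever the attribute sits in the line. The helper name extract_date and the shuffled sample line are only illustrative, and the regex uses + instead of interval expressions so it does not depend on --re-interval.

```shell
# Illustrative helper (not from the thread): print the first attribute whose
# value looks like digits-digits-digits, wherever it sits in the line.
extract_date() {
  awk '{
    if (match($0, /[A-Z]="[0-9]+-[0-9]+-[0-9]+"/))
      print substr($0, RSTART, RLENGTH)   # RSTART/RLENGTH are set by match()
  }'
}

date_attr=$(printf '%s\n' '<Subject D="010101010101" C="2015-06-30" A="I">' | extract_date)
printf '%s\n' "$date_attr"
```

Because each attribute is located by the shape of its value, the input order does not matter, but two attributes with the same value shape would be indistinguishable.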

EDIT: Adding a multi-line form of the solution as follows.

Code:

awk --re-interval '{
    # C: a date in YYYY-MM-DD form
    match($0, /[A-Z]="[0-9]{4}-[0-9]{2}-[0-9]{2}"/)
    val3 = substr($0, RSTART, RLENGTH)
    # D: exactly 12 digits, each 0 or 1
    match($0, /[A-Z]="[0-1]{12}"/)
    val2 = substr($0, RSTART, RLENGTH)
    # B: exactly 7 digits
    match($0, /[A-Z]="[0-9]{7}"/)
    val4 = substr($0, RSTART, RLENGTH)
    # A: letters only
    match($0, /[A-Z]="[a-zA-Z]+"/)
    val5 = substr($0, RSTART, RLENGTH)
    print "<Subject " val2 OFS val4 OFS val3 OFS val5 ">"
}' Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 10-16-2015 at 08:15 AM.. Reason: Added a non one-liner form of solution.

RavinderSingh13

10-16-2015

I've come up with this using (GNU) sed.
I've just started learning the command and would gladly appreciate input on efficiency and style!

Code:

sed -n -r 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p' testfile

<Subject D="010101010101" C="2015-06-30" B="1039502" A="I">

Thanks

Klasform

10-16-2015

.* stands for "anystring", so .*.* is anystring followed by anystring, which is equivalent to .* . Your script yields the same as

Code:

sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p' file

Also, it does not print the unmodified (i.e. unmatched) lines, which the OP requested.
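A quick way to check both remarks, as a sketch with a made-up two-line sample. It uses -E rather than -r, on the assumption that your sed accepts it (GNU sed takes either).

```shell
# Two sample lines (made up): one Subject line and one unrelated line.
input='<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">
<Other X="1">'

# Eight groups, as in the original attempt.
long=$(printf '%s\n' "$input" |
  sed -n -E 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p')

# Four value groups with the separating spaces kept outside the groups.
short=$(printf '%s\n' "$input" |
  sed -n -E 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p')

printf '%s\n%s\n' "$long" "$short"
```

Both variables hold the same single rewritten line, and the <Other ...> line is gone from each because -n suppresses everything that the p flag does not print.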

RudiC

10-16-2015

The command:

Code:

sed -n -r 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p' testfile

should work because of greedy matching forcing the 1st .* to grab everything that the following .* could also grab. Stylistically, I generally avoid putting expressions in parentheses when I don't need the string matched by that expression in the replacement. With that in mind consider this simplification:

Code:

sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p' testfile

But, the requested output was:

Code:

<Subject D="010101010101" B="1039502" C="2015-06-30" A="I">

while the above sed scripts produce the output:

Code:

<Subject D="010101010101" C="2015-06-30" B="1039502" A="I">

The requested output could be achieved by rearranging the replacement string references:

Code:

sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \3 \4 \2>/p' testfile

And, of course, to print the lines that don't match the search pattern without changing them, get rid of the -n option. The p flag has to go as well; otherwise every matching line would be printed twice:

Code:

sed -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \3 \4 \2>/' testfile

And, if you don't have GNU sed (with the -r option), it can be done with standard basic regular expressions with:

Code:

sed 's/\(<Subject \)\(.*\) \(.*\) \(.*\) \(.*\)>/\1\5 \3 \4 \2>/' testfile
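As a rough check of the pass-through behaviour, here is a sketch with a made-up two-line sample (the <Other ...> line is only there to stand in for any non-matching input):

```shell
input='<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">
<Other X="1">'

# No -n and no p flag: sed auto-prints every line exactly once,
# rewritten when the pattern matches and untouched otherwise.
result=$(printf '%s\n' "$input" |
  sed 's/\(<Subject \)\(.*\) \(.*\) \(.*\) \(.*\)>/\1\5 \3 \4 \2>/')
printf '%s\n' "$result"
```

The Subject line comes out in D B C A order and the other line is passed through as-is. Note that this still relies on the four attributes arriving in a fixed order with single spaces between them.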

Don Cragun

10-25-2015

Input

Code:

<Subject Q="I" W="1039502" E="2015-06-30" R="010101010101">

Output

Code:

<Subject R="010101010101" W="1039502" E="2015-06-30" Q="I">

Code

Code:

awk '/Subject/ {sub(/>/,"",$NF); print $1,$5,$3,$4,$2">"; next} 1' infile > outfile

Thanks for all the help, this seems to be working fine. But can we tweak this awk one-liner so that it handles all the cases in the input? The input does not have a fixed order for Q, W, E and R (they can be in any position), so we need to search for them and write the output in R W E Q order.

Thanks so much for all your help thus far

arunkesi

10-25-2015

Try:

Code:

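awk '
  /Subject/ {
    # The separator regex eats "<Subject " at the start and every ="value"
    # after a name, so F[i] ends up holding the name of attribute field $i.
    split($0,F,/(^|=)[^ ]*( |$)/)
    for(i in F) P[F[i]]=i       # map attribute name -> field number
    sub(/>/,x)                  # x is unset, so this deletes the ">"
    print $1,$P["R"],$P["W"],$P["E"],$P["Q"] ">"
    next
  }
  1
' file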

Or using the order as a variable:

Code:

awk -v order="R W E Q" '
  BEGIN{
    split(order,O)              # O[1..4] hold the wanted attribute names
  }
  /Subject/ {
    split($0,F,/(^|=)[^ ]*( |$)/)
    for(i in F) P[F[i]]=i       # map attribute name -> field number
    sub(/>/,x)
    print $1,$P[O[1]],$P[O[2]],$P[O[3]],$P[O[4]] ">"
    next
  }
  1
' file
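If the number of attributes is not always four, one possible variation is to loop over the order list instead of hard-coding O[1] to O[4]. The sketch below uses a different technique from the split() trick above (it takes the text before "=" in each field as the key), the function name reorder_subject is made up for illustration, and it assumes attribute values contain no spaces.

```shell
# reorder_subject is a hypothetical helper name; $1 lists the attribute
# names in the wanted output order, separated by spaces.
reorder_subject() {
  awk -v order="$1" '
    BEGIN { n = split(order, O, " ") }
    /^<Subject / {
      sub(/>$/, "")                      # drop the closing ">" from the line
      split("", attr)                    # clear the lookup table for this line
      for (i = 2; i <= NF; i++) {
        key = $i
        sub(/=.*/, "", key)              # text before "=" is the attribute name
        attr[key] = $i
      }
      out = $1
      for (j = 1; j <= n; j++)
        if (O[j] in attr) out = out " " attr[O[j]]
      print out ">"
      next
    }
    1                                    # every other line is printed unchanged
  '
}

shuffled=$(printf '%s\n' '<Subject E="2015-06-30" R="010101010101" Q="I" W="1039502">' 'plain line' |
  reorder_subject "R W E Q")
printf '%s\n' "$shuffled"
```

Attributes named in the order list but missing from a line are skipped, and attributes not named in the list are dropped from the output.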

Scrutinizer
