Pattern Match and Rearrange the Fields in UNIX


 
# 8  
Old 10-16-2015
Please use code tags as required by forum rules!

Difficult to believe. The default action for the "1" pattern is "print the line in $0 unconditionally".
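
As a minimal sketch of that point: an awk pattern with no action runs the default action, print $0, and 1 is a pattern that is always true. The two sample lines and the function name demo_default_print are made up for illustration.

```shell
# Hypothetical sample: the first line matches /Subject/, the second does not.
demo_default_print() {
  printf '%s\n' '<Subject A="I" B="2">' 'other line' |
    awk '
      /Subject/ { print "matched: " $0; next }   # handled here, then skip to the next record
      1                                          # always-true pattern, default action: print $0
    '
}
demo_default_print
```

The non-matching line reaches the trailing 1 and is printed unchanged, so it cannot silently disappear.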

BTW, "subject" might be led in by an upper-case "S".
# 9  
Old 10-16-2015
Quote:
Originally Posted by zaxxon
Hi Ravinder,
if it is not an error by the OP, he wants to have the order not reversed but "D B C A".
Thank you Zaxxon. I have tried to make it more generic. Maybe the following can help the OP.
Code:
awk --re-interval '{match($0,/[A-Z]=\"[0-9]{4}\-[0-9]{2}\-[0-9]{2}\"/);val3=substr($0,RSTART,RLENGTH)};{match($0,/[A-Z]=\"[0-1]{12}\"/);val2=substr($0,RSTART,RLENGTH);{match($0,/[A-Z]=\"[0-9]{7}\"/);val4=substr($0,RSTART,RLENGTH)};{match($0,/[A-Z]=\"[a-zA-Z]+\"/);val5=substr($0,RSTART,RLENGTH);print "<Subject " val2 OFS val4 OFS val3 OFS val5 ">"}}' Input_file

Output will be as follows.
Code:
<Subject D="010101010101" B="1039502" C="2015-06-30" A="I">

The solution above assumes that every line of Input_file carries four attributes of these forms: [A-Z]="[A-Za-z]+" (letters only), [A-Z]="[0-9]{7}" (exactly 7 digits), [A-Z]="[0-9]{4}-[0-9]{2}-[0-9]{2}" (a date in YYYY-MM-DD format) and [A-Z]="[0-1]{12}" (exactly 12 digits, each 0 or 1). If all of Input_file follows that syntax, the solution should help the OP.
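
To illustrate how match() and substr() pick out one attribute wherever it sits on the line, here is a small sketch for the date attribute only. It spells the date pattern without interval expressions, so it does not depend on --re-interval. The function name extract_date_attr is hypothetical.

```shell
# Sketch: extract the date attribute regardless of its position on the line.
# match() sets RSTART (1-based start) and RLENGTH (length) of the leftmost match.
extract_date_attr() {
  awk '{
    if (match($0, /[A-Z]="[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"/))
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
      print "no date attribute"      # match() returned 0: RSTART=0, RLENGTH=-1
  }'
}
printf '%s\n' '<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">' | extract_date_attr
# prints: 28 14 C="2015-06-30"
```

When match() fails, substr($0, 0, -1) is simply an empty string, so a line without the attribute gives an empty value rather than an error.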

EDIT: Adding a non one-liner form of solution as follows.
Code:
awk --re-interval '{
    # date in YYYY-MM-DD form
    match($0, /[A-Z]="[0-9]{4}-[0-9]{2}-[0-9]{2}"/)
    val3 = substr($0, RSTART, RLENGTH)
    # 12 digits, each 0 or 1
    match($0, /[A-Z]="[0-1]{12}"/)
    val2 = substr($0, RSTART, RLENGTH)
    # exactly 7 digits
    match($0, /[A-Z]="[0-9]{7}"/)
    val4 = substr($0, RSTART, RLENGTH)
    # letters only
    match($0, /[A-Z]="[a-zA-Z]+"/)
    val5 = substr($0, RSTART, RLENGTH)
    print "<Subject " val2 OFS val4 OFS val3 OFS val5 ">"
}' Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 10-16-2015 at 08:15 AM.. Reason: Added a non one-liner form of solution.
# 10  
Old 10-16-2015
I've come up with this using (GNU) sed.
I have just started to learn the command and would appreciate input on efficiency and style!

Code:
sed -n -r 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p' testfile

<Subject D="010101010101" C="2015-06-30" B="1039502" A="I">

Thanks
# 11  
Old 10-16-2015
.* stands for "anystring", so .*.* is anystring followed by anystring, which is equivalent to .* . Your script yields the same as
Code:
sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p' file

Also, it does not print the unmodified (i.e. unmatched) lines, which the OP requested.
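
Both points can be checked with a quick sketch. It compares the doubled form with a pattern in which each .*.* pair is collapsed to a single .* (the blanks are kept outside the groups so no extra spaces end up in the output). The sample input and the function names are made up, and -E is used as the spelling of -r that both GNU and BSD sed accept.

```shell
# Hypothetical input: one matching line plus one line that should pass through untouched.
sample() {
  printf '%s\n' '<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">' 'unrelated line'
}

# Original form with doubled .* groups.
doubled() {
  sample | sed -n -E 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p'
}

# Each ".*.*" collapsed to a single ".*".
collapsed() {
  sample | sed -n -E 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p'
}

doubled
collapsed
```

Each function prints the same single rearranged line, and "unrelated line" is missing from both outputs because of -n combined with the p flag.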
This User Gave Thanks to RudiC For This Post:
# 12  
Old 10-16-2015
The command:
Code:
sed -n -r 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/\1\8 \6 \4 \2>/p' testfile

should work because of greedy matching forcing the 1st .* to grab everything that the following .* could also grab. Stylistically, I generally avoid putting expressions in parentheses when I don't need the string matched by that expression in the replacement. With that in mind consider this simplification:
Code:
sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \4 \3 \2>/p' testfile
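
To see how greedy matching spreads the line over the groups of the original eight-group command, the replacement can be changed to print each group in brackets. This is only a debugging sketch on a sample line; show_groups is a made-up name and -E stands in for -r.

```shell
# Print groups 2..8 of the original pattern, each wrapped in [ ] to make blanks visible.
show_groups() {
  printf '%s\n' '<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">' |
    sed -n -E 's/(<Subject )(.*)(.* )(.*)(.* )(.*)(.* )(.*)>/[\2][\3][\4][\5][\6][\7][\8]/p'
}
show_groups
# prints: [A="I"][ ][B="1039502"][ ][C="2015-06-30"][ ][D="010101010101"]
```

Groups 3, 5 and 7 end up holding nothing but the separating blank, which is why they can be replaced by a literal space outside the parentheses.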

But, the requested output was:
Code:
<Subject D="010101010101" B="1039502" C="2015-06-30" A="I">

while the above sed scripts produce the output:
Code:
<Subject D="010101010101" C="2015-06-30" B="1039502" A="I">

The requested output could be achieved by rearranging the replacement string references:
Code:
sed -n -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \3 \4 \2>/p' testfile

And, of course, to print the lines that don't match the search pattern without changing them, get rid of the -n option (and of the p flag, which would otherwise print each changed line twice):
Code:
sed -r 's/(<Subject )(.*) (.*) (.*) (.*)>/\1\5 \3 \4 \2>/' testfile

And, if you don't have GNU sed (with the -r option), it can be done with standard basic regular expressions with:
Code:
sed 's/\(<Subject \)\(.*\) \(.*\) \(.*\) \(.*\)>/\1\5 \3 \4 \2>/' testfile
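
A quick way to check what the p flag does once -n is gone is the sketch below, which uses a hypothetical two-line input and made-up function names. Without -n, sed already prints every line once at the end of each cycle; the p flag adds a second copy of every line on which the substitution succeeded.

```shell
# Hypothetical input: one non-matching line and one matching line.
sample() {
  printf '%s\n' 'header line' '<Subject A="I" B="1039502" C="2015-06-30" D="010101010101">'
}

# No -n, no p flag: every line is printed exactly once, matching lines rearranged.
rearrange() {
  sample | sed 's/\(<Subject \)\(.*\) \(.*\) \(.*\) \(.*\)>/\1\5 \3 \4 \2>/'
}

# No -n but with the p flag: the rearranged line comes out twice.
rearrange_with_p() {
  sample | sed 's/\(<Subject \)\(.*\) \(.*\) \(.*\) \(.*\)>/\1\5 \3 \4 \2>/p'
}

rearrange
rearrange_with_p | grep -c 'Subject'   # counts the duplicated line
```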

This User Gave Thanks to Don Cragun For This Post:
# 13  
Old 10-25-2015
Input
Code:
<Subject Q="I" W="1039502" E="2015-06-30" R="010101010101">

Output
Code:
<Subject R="010101010101" W="1039502" E="2015-06-30" Q="I">

Code
Code:
awk '/Subject/ {sub(/>/,"",$NF); print $1,$5,$3,$4,$2">"; next} 1' infile > outfile


Thanks for all the help, this seems to be working fine. But can we tweak this awk one-liner so that it handles all cases in the input? That is, Q, W, E and R do not appear in a fixed order in the input (they can be in any position), but we need to find them and print them in the order R W E Q.


Thanks so much for all your help thus far
# 14  
Old 10-25-2015
Try:
Code:
awk '
  /Subject/ {
    split($0,F,/(^|=)[^ ]*( |$)/)
    for(i in F) P[F[i]]=i
    sub(/>/,x)
    print $1,$P["R"],$P["W"],$P["E"],$P["Q"] ">"
    next
  }
  1
' file

Or using the order as a variable:
Code:
awk -v order="R W E Q" '
  BEGIN{
    split(order,O)
  } 
  /Subject/ {
    split($0,F,/(^|=)[^ ]*( |$)/)
    for(i in F) P[F[i]]=i
    sub(/>/,x)
    print $1,$P[O[1]],$P[O[2]],$P[O[3]],$P[O[4]] ">"
    next
  }
  1
' file
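
If the number of attributes is not fixed at four, the same idea can be written with a loop over the requested names. The following is only a sketch of a variation, not the code above: it maps names to fields with index() and substr() instead of a regex split, and it assumes that attribute values contain no blanks. The function name reorder_attrs is made up.

```shell
# $1: space-separated attribute names in the desired output order.
# Assumption: names and values contain no blanks, so every name="value" is one awk field.
reorder_attrs() {
  awk -v order="$1" '
    BEGIN { n = split(order, O, " ") }
    /^<Subject / {
      sub(/>$/, "")                  # drop the closing ">"; this also re-splits the fields
      split("", V)                   # clear the map left over from the previous line
      for (i = 2; i <= NF; i++) {
        eq = index($i, "=")
        if (eq) V[substr($i, 1, eq - 1)] = $i    # map name to the whole name="value" field
      }
      out = $1
      for (j = 1; j <= n; j++)
        if (O[j] in V) out = out " " V[O[j]]     # names missing from the line are skipped
      print out ">"
      next
    }
    1
  '
}
printf '%s\n' '<Subject E="2015-06-30" Q="I" R="010101010101" W="1039502">' 'other line' |
  reorder_attrs 'R W E Q'
```

For the sample input this prints the Subject line with the attributes in R W E Q order, followed by "other line" unchanged.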

This User Gave Thanks to Scrutinizer For This Post: