Find and replace single character w/awk given conditions

07-16-2015

I have a file that looks like this:

Code:

14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                          verb                      to observe
15656      KC                          verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

When there are two characters in $2, $3 does not line up with the rows that have other strings in $3. That is to say, when there are three characters in $2, there are 25 spaces until $3 should begin. However, when there are two characters in $2, there are 26 spaces, and this throws off the justification of both $3 and >=$4.

What I want to do is search for when $3 begins on the 40th character and delete a space so that it begins on the 39th.

Thus:

Code:

14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                         verb                      to observe
15656      KC                         verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

In order to do this, I've attempted this awk code, but have had trouble combining conditional statements with substrings and substitutions.

Code:

gawk '{if(substr($0,39,1)==" " && $3 ~/verb/); sub(/^ verb/,"verb", $3);print}' file.txt

I've also tried this:

Code:

gawk '{if(substr($0,39,1)==" " && $3 ~/verb/); sub(substr($0,39,2)," ");print}' file.txt

...and this with a variable:

Code:

gawk '$3 ~/^verb$/{X=substr($0,39,1); sub(/ /,"",X)} 1 {print}' SEDRAt

Perhaps I'm going at this all wrong, but ideally what I'd like is all of my columns to line up, but since my last column will have multiple spaces in it, I've had difficulty executing printf(). Perhaps there is some iteration of FIXEDWIDTH that is escaping me. Nevertheless, I need to be able to learn how to effectively combine conditionals, substrings, and substitutions in awk so this is why I'm asking for help in this manner.

Thank you all so much.

jvoot

07-16-2015

FIELDWIDTHS seems to affect only how the input is split, so printf format specifiers are still needed to print the fields neatly again. This strips spaces from the beginning and end of every field.

Code:

mute@tiny:~$ gawk -v FIELDWIDTHS='11 27 26 999' '{for (i=1;i<=NF;i++)gsub(/^  *|  *$/,"",$i);printf("%-11s%-27s%-26s%s\n",$1,$2,$3,$4);}' input
14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                         verb                      to observe
15656      KC                         verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

neutronscott

07-16-2015

It's not FIXEDWIDTH but FIELDWIDTHS. Try defining FIELDWIDTHS="11 27 26 30" (the last field width is a guess), and then sub(/^ /, "", $3).

RudiC

07-16-2015

Thanks so much for this, RudiC, and sorry for the embarrassing miscue on FIELDWIDTHS. While neutronscott's code nails it, for some reason I cannot get your suggestion to work, although it seems as though it should. I must confess that I seem to mess up sub() and gsub() quite often. Here is my code, as per your suggestion:

Code:

gawk -v FIELDWIDTHS='11 27 26 999' '{sub(/^ /, "", $3); print}' input

Did I do something amiss?

jvoot

07-16-2015

How about:

Code:

awk 'length($2)==2{sub($2 FS, $2)}1' file
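For readers unsure how this one-liner works: when $2 is two characters long, `sub($2 FS, $2)` uses the string `$2 FS` (for example `DQ` followed by one space) as a dynamic regular expression and replaces its first match in $0 with `$2` alone, which deletes one space and shifts the rest of the line left by one column. It relies on $2 containing no regex metacharacters and not occurring earlier in the line. A position-based sketch that is closer to what the question asked for (test columns 39 and 40 with substr(), then rebuild the line) might look like this; the function name `fix_col39` is made up for illustration:

```shell
fix_col39() {
  awk '
    # Column 39 blank and column 40 non-blank: field 3 starts one column late.
    length($0) >= 40 && substr($0, 39, 1) == " " && substr($0, 40, 1) != " " {
        $0 = substr($0, 1, 38) substr($0, 40)   # drop the character at column 39
    }
    { print }
  ' "$@"
}
```

Usage would be `fix_col39 file.txt`. In the earlier attempts, the `;` directly after `if (...)` appears to end the if statement, so the sub() that follows runs on every line regardless of the condition.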

Scrutinizer

07-17-2015

Quote:

Originally Posted by jvoot

Code:

gawk -v FIELDWIDTHS='11 27 26 999' '{sub(/^ /, "", $3); print}' input

Did I do something amiss?

That will remove at most one leading space. I don't have gawk on my system and awk on my system doesn't support the FIELDWIDTHS variable, so I can't test anything using it. But, neutronscott's suggestion might be a good starting point. (Have you tried it?)

It would seem that adjusting field 3 (with:

Code:

sub(/^ */, "", $3)

not:

Code:

sub(/^ /, "", $3)

) would line up field 3; but might not fix alignment of field 4.
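A likely additional reason the plain `print` version looks wrong even when sub() matches: in awk, changing a field makes awk rebuild $0 by joining all fields with OFS (a single space by default), so the original spacing between columns is not kept. The sketch below shows the effect with ordinary whitespace splitting on a made-up sample string; the same rebuilding should apply to fields split with FIELDWIDTHS in gawk, although that was not tested here.

```shell
# A successful sub() on a field rebuilds $0 with single-space separators.
rebuilt=$(printf 'a   b   c\n' | awk '{ sub(/^b/, "B", $2); print }')
# A sub() on the whole record leaves the untouched spacing alone.
kept=$(printf 'a   b   c\n' | awk '{ sub(/b/, "B"); print }')
printf '%s\n%s\n' "$rebuilt" "$kept"
```

This is why the working answers either edit $0 directly or print each field with an explicit printf width.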

Assuming field widths for your 1st three fields of 11 characters, 27 characters, and 26 characters (guessing from jvoot's sample output), the following should work with any awk that adheres to the POSIX standards' requirements (implementing a subset of the gawk field widths option using standard awk features):

Code:

awk -v FIELDWIDTHS='11 27 26 999' '
BEGIN {	nf = split(FIELDWIDTHS, fw)
	sc = 1
	for(i = 1; i < nf; i++) {
		fs[i] = sc
		sc += fw[i]
	}
}
{	for(i = 1; i < nf; i++) {
		f[i] = substr($0, fs[i], fw[i])
		gsub(/^ *| *$/, "", f[i])
		printf("%-*s", fw[i], f[i])
	}
	f[i] = substr($0, sc)
	gsub(/^ *| *$/, "", f[i])
	print f[i]
}' file

This should line up all fields as long as data from one field doesn't spill over into following fields (other than <space>s). This will align any number of fields (not just 4) with their sizes specified by the list provided in the FIELDWIDTHS variable. The last field width specified is unimportant. This code assumes that the last field starts at the calculated starting column (based on earlier field widths) and runs to the end of the line; no trailing spaces are included in the output of the last output field. With jvoot's sample input, it produces the output jvoot requested.
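As a small illustration of the bookkeeping in the BEGIN block above, the following sketch prints the starting column computed for each field from the same width list. It only repeats the arithmetic and is not part of the original script.

```shell
starts=$(awk 'BEGIN {
    n = split("11 27 26 999", fw)   # same widths as in the script above
    sc = 1                          # the first field starts at column 1
    for (i = 1; i <= n; i++) {
        printf "field %d starts at column %d\n", i, sc
        sc += fw[i]                 # the next field starts right after this one
    }
}')
printf '%s\n' "$starts"
```

With these widths the third field starts at column 39, which matches the column the original question wanted $3 to begin on.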

Note that many implementations of awk do not adhere to the standards when using substr() on fields that contain multi-byte characters. (The standards say that start and length parameters count characters; some awk implementations count bytes instead. As long as you're only dealing with single-byte characters, both work correctly.)

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.

Don Cragun

07-17-2015

Being awk-ignorant, I might be mistaken, but isn't the following much easier as a reformatting filter:

Code:

awk '$0=sprintf( "<some-format>", $1, $2, $3, $4);' /path/to/infile

For instance, to get the above alignment:

Code:

awk '$0=sprintf( "%-11s%-27s%-26s%s", $1, $2, $3, $4);' /path/to/infile
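One caveat with this approach on the sample data: with default whitespace splitting, $4 is only the first word of the last column (for example `PPa` rather than `PPa to spend.`). A minimal sketch that re-joins the remaining words is shown below; the function name `reformat_words` is made up for illustration. It assumes every row has a non-empty third column, which the `DPNDJN` row in the sample does not, so that row would still come out wrong. It also collapses runs of spaces inside the last column to single spaces.

```shell
reformat_words() {
  awk '{
      rest = $4                                       # first word of the last column
      for (i = 5; i <= NF; i++) rest = rest " " $i    # re-join the remaining words
      printf "%-11s%-27s%-26s%s\n", $1, $2, $3, rest
  }' "$@"
}
```

For data with empty columns, splitting by column position, as in the earlier answers, is the safer route.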

I hope this helps.

bakunin
