Find and replace single character w/awk given conditions


 
# 1  
Old 07-16-2015

I have a file that looks like this:

Code:
14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                          verb                      to observe
15656      KC                          verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

When there are two characters in $2, $3 does not line up whenever $3 contains a string. That is to say, when there are three characters in $2, there are 25 spaces until $3 should begin. However, when there are two characters in $2, there are 26 spaces, and this throws off the justification of both $3 and every field from $4 onward.

What I want to do is search for when $3 begins on the 40th character and delete a space so that it begins on the 39th.

Thus:

Code:
14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                         verb                      to observe
15656      KC                         verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

In order to do this, I've attempted this awk code, but I have had trouble combining conditional statements with substrings and substitutions.

Code:
gawk '{if(substr($0,39,1)==" " && $3 ~/verb/); sub(/^ verb/,"verb", $3);print}' file.txt

I've also tried this:

Code:
gawk '{if(substr($0,39,1)==" " && $3 ~/verb/); sub(substr($0,39,2)," ");print}' file.txt

...and this with a variable:

Code:
gawk '$3 ~/^verb$/{X=substr($0,39,1); sub(/ /,"",X)} 1 {print}' SEDRAt

Perhaps I'm going at this all wrong, but ideally what I'd like is all of my columns to line up, but since my last column will have multiple spaces in it, I've had difficulty executing printf(). Perhaps there is some iteration of FIXEDWIDTH that is escaping me. Nevertheless, I need to be able to learn how to effectively combine conditionals, substrings, and substitutions in awk so this is why I'm asking for help in this manner.

Thank you all so much.
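
For reference, here is a minimal sketch of one way to combine a condition, substr(), and a replacement along the lines attempted above. In the first two attempts, the semicolon right after `if (...)` ends the if with an empty body, so sub() runs on every line; and with default splitting $3 never has a leading space, so /^ verb/ cannot match. In the third attempt, X is only a copy, so changing it leaves $0 untouched. The sample rows below are built with printf and are an assumption about the exact column layout (verb at column 39 or 40).

```shell
# Two sample rows (assumed layout): "verb" starts at column 39 in the
# first row and at column 40 in the second.
sample=$(printf '%-11s%-27s%-26s%s\n%-11s%-28s%-26s%s\n' \
    14985 DPN verb 'PPa to spend.' \
    15686 DQ verb 'to observe')

fixed=$(printf '%s\n' "$sample" | awk '
# Test the raw columns: a space at column 39 and "verb" starting at column 40.
substr($0, 39, 1) == " " && substr($0, 40, 4) == "verb" {
    # Rebuild the record without character 39.
    $0 = substr($0, 1, 38) substr($0, 40)
}
{ print }')

printf '%s\n' "$fixed"
```

Working on $0 with substr() rather than on $3 keeps all the original padding, because no individual field is assigned to.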
# 2  
Old 07-16-2015
FIELDWIDTHS seems to affect only how the input is split, so printf format specifiers are still needed to make the output pretty again. This strips leading and trailing whitespace from all fields.

Code:
mute@tiny:~$ gawk -v FIELDWIDTHS='11 27 26 999' '{for (i=1;i<=NF;i++)gsub(/^  *|  *$/,"",$i);printf("%-11s%-27s%-26s%s\n",$1,$2,$3,$4);}' input
14985      DPN                        verb                      PPa to spend.
12886      DPNDJN                                               bay tree.
15686      DQ                         verb                      to observe
15656      KC                         verb                      Pa to stay quiet
15835      KCJ                        verb                      Pp|PPa, PPp.

This User Gave Thanks to neutronscott For This Post:
# 3  
Old 07-16-2015
It's not FIXEDWIDTH but FIELDWIDTHS. Try defining FIELDWIDTHS="11 27 26 30" (the last width is a guess), and then sub (/^ /, "", $3).
This User Gave Thanks to RudiC For This Post:
# 4  
Old 07-16-2015
Thanks so much for this, RudiC, and sorry for the embarrassing miscue on FIELDWIDTHS. While neutronscott's code nails it, for some reason I cannot get your suggestion to work, although it seems as though it should. I must confess that I seem to mess up sub() and gsub() quite often. Here is my code, as per your suggestion:

Code:
gawk -v FIELDWIDTHS='11 27 26 999' '{sub(/^ /, "", $3); print}' input

Did I do something amiss?
# 5  
Old 07-16-2015
How about:
Code:
awk 'length($2)==2{sub($2 FS, $2)}1' file
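
A rough sketch of how this one-liner behaves: `$2 FS` concatenates the second field with the default field separator (a single space) and is used as a dynamic regex; sub() without a third argument works on $0 and replaces the first match with $2 alone, which drops one space after it. The sample row below is built with printf and is only an assumption about the layout. Note that $2 is treated as a regex here, so values containing regex metacharacters would need more care.

```shell
# One sample row (assumed layout) with a two-character $2 and "verb" at column 40.
line=$(printf '%-11s%-28s%-26s%s' 15686 DQ verb 'to observe')

# For two-character $2 values, replace the first "DQ " in $0 with "DQ".
out=$(printf '%s\n' "$line" | awk 'length($2) == 2 { sub($2 FS, $2) } 1')

printf '%s\n' "$out"
```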

This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 07-17-2015
Quote:
Originally Posted by jvoot
Thanks so much for this, RudiC, and sorry for the embarrassing miscue on FIELDWIDTHS. While neutronscott's code nails it, for some reason I cannot get your suggestion to work, although it seems as though it should. I must confess that I seem to mess up sub() and gsub() quite often. Here is my code, as per your suggestion:

Code:
gawk -v FIELDWIDTHS='11 27 26 999' '{sub(/^ /, "", $3); print}' input

Did I do something amiss?
That will remove at most one leading space. I don't have gawk on my system and awk on my system doesn't support the FIELDWIDTHS variable, so I can't test anything using it. But, neutronscott's suggestion might be a good starting point. (Have you tried it?)

It would seem that adjusting field 3 (with:
Code:
sub(/^ */, "", $3)

not:
Code:
sub(/^ /, "", $3)

) would line up field 3; but might not fix alignment of field 4.
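
One likely reason the alignment is lost when sub() targets $3: whenever a field is changed, awk rebuilds $0 by joining all fields with OFS, which defaults to a single space. The sketch below shows that rebuild with plain whitespace splitting on a made-up record (the data and variable names are illustrative only). With gawk's FIELDWIDTHS the fields already carry their own padding, so the default OFS would add an extra space between them; passing -v OFS='' may be worth trying there, though that is not tested here.

```shell
# Printing an untouched record keeps the original spacing.
orig=$(printf 'a   b   c\n' | awk '{ print }')

# Changing $2 makes awk rebuild $0 from the fields joined by OFS (one space).
rebuilt=$(printf 'a   b   c\n' | awk '{ sub(/^b/, "B", $2); print }')

# With an empty OFS the fields are glued together with nothing in between.
glued=$(printf 'a   b   c\n' | awk -v OFS='' '{ sub(/^b/, "B", $2); print }')

printf '%s\n%s\n%s\n' "$orig" "$rebuilt" "$glued"
```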

Assuming field widths for your 1st three fields of 11 characters, 27 characters, and 26 characters (guessing from jvoot's sample output), the following should work with any awk that adheres to the POSIX standards' requirements (implementing a subset of the gawk field widths option using standard awk features):
Code:
awk -v FIELDWIDTHS='11 27 26 999' '
BEGIN {	nf = split(FIELDWIDTHS, fw)
	sc = 1
	for(i = 1; i < nf; i++) {
		fs[i] = sc
		sc += fw[i]
	}
}
{	for(i = 1; i < nf; i++) {
		f[i] = substr($0, fs[i], fw[i])
		gsub(/^ *| *$/, "", f[i])
		printf("%-*s", fw[i], f[i])
	}
	f[i] = substr($0, sc)
	gsub(/^ *| *$/, "", f[i])
	print f[i]
}' file

This should line up all fields as long as data from one field doesn't spill over into following fields (other than <space>s). This will align any number of fields (not just 4) with their sizes specified by the list provided in the FIELDWIDTHS variable. The last field width specified is unimportant. This code assumes that the last field starts at the calculated starting column (based on earlier field widths) and runs to the end of the line; no trailing spaces are included in the output of the last output field. With jvoot's sample input, it produces the output jvoot requested.

Note that many implementations of awk do not adhere to the standards when using substr() on fields that contain multi-byte characters. (The standards say that start and length parameters count characters; some awk implementations count bytes instead. As long as you're only dealing with single-byte characters, both work correctly.)

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
# 7  
Old 07-17-2015
Being awk-ignorant, I might be mistaken, but isn't the following much easier as a reformatting filter:

Code:
awk '$0=sprintf( "<some-format>", $1, $2, $3, $4);' /path/to/infile

For instance, to get the above alignment:

Code:
awk '$0=sprintf( "%-11s%-27s%-26s%s", $1, $2, $3, $4);' /path/to/infile

I hope this helps.

bakunin
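
One caveat with this approach, sketched below on sample rows built with printf (an assumed layout): with default whitespace splitting, $4 is only the fourth word, so a multi-word last column is cut short, and a row with an empty third column shifts its later words into $3 and $4. Reading the last column by position with substr() avoids that, assuming it starts at column 65 (11 + 27 + 26 + 1).

```shell
# A row with a multi-word last column, and a row with an empty third column.
row=$(printf '%-11s%-27s%-26s%s' 14985 DPN verb 'PPa to spend.')
gap=$(printf '%-11s%-27s%-26s%s' 12886 DPNDJN '' 'bay tree.')

nf=$(printf '%s\n' "$row" | awk '{ print NF }')         # number of whitespace-separated words
fourth=$(printf '%s\n' "$row" | awk '{ print $4 }')     # first word of the last column only
third=$(printf '%s\n' "$gap" | awk '{ print $3 }')      # a word from the last column lands in $3

# Positional read of the last column, assuming it starts at column 65.
last=$(printf '%s\n' "$row" | awk '{ print substr($0, 65) }')

printf '%s\n%s\n%s\n%s\n' "$nf" "$fourth" "$third" "$last"
```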
This User Gave Thanks to bakunin For This Post:
 