How to lowercase the values in a column in awk and include a dynamic counter?


 
# 1  
10-30-2015

Hi,
I am trying to incorporate two functions into my `awk` command.
I want column 2 to be a lowercased copy of column 1 (column 1 itself should keep its original capitalization), and I want a counter from 0 to N that restarts at each `<s>` marker and stops at the matching `</s>`.

The data (tab-separated) currently looks like this:

Code:
<s>
He	PRP	-
could	MD	-
tell	VB	-
she	PRP	-
was	VBD	-
teasing	VBG	-
him	PRP	-
.	.	.
</s>
<s>
He	PRP	-
kept	VBD	-
his	PRP$	-
eyes	NNS	-
closed	VBD	-
,	,	-
but	CC	-
he	PRP	-
could	MD	-
feel	VB	-
himself	PRP	-
smiling	VBG	-
.	.	.
</s>

The ideal output would be like this:

Code:
<s>
He	he	PRP	1
could	could	MD	2
tell	tell	VB	3
she	she	PRP	4
was	was	VBD	5
teasing	teasing	VBG	6
him	him	PRP	7
.	.	.	8
</s>
<s>
He	he	PRP	1
kept	kept	VBD	2
his	his	PRP$	3
eyes	eyes	NNS	4
closed	closed	VBD	5
,	,	,	6
but	but	CC	7
he	he	PRP	8
could	could	MD	9
feel	feel	VB	10
himself	himself	PRP	11
smiling	smiling	VBG	12
.	.	.	13
</s>

Here is my two-step `awk` attempt, which does not work:

Step 1:
Code:
awk '!NF{$0=x}1' input | awk '{$1=$1; print "<s>\n" $0 "\t.\n</s>"}' RS=  FS='\n' OFS='\t-\n' > output

Here, I do not know how to turn the "-" into a counter.

Step 2 (which immediately gives me an error):
Code:
awk '{print $1 "\t" '$1 = tolower($1)' "\t" $2 "\t" $3}' input > output
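Step 2 most likely fails because of the nested single quotes: the shell ends the awk program at the second `'`, so awk receives a truncated program. `tolower()` can simply be called inside the `print` list. A minimal sketch of a corrected step 2, assuming the tab-separated three-column input shown above; the function name `add_lower` is hypothetical:

```shell
# Hypothetical helper: insert a lowercase copy of column 1 as a new column 2.
# Reads the named files, or standard input if none are given.
add_lower() {
    awk 'BEGIN { FS = OFS = "\t" }             # tab-separated in and out
         /^<\/?s>$/ { print; next }            # pass <s> and </s> through unchanged
         { print $1, tolower($1), $2, $3 }     # word, lowercase copy, tag, third column
        ' "$@"
}

printf '<s>\nHe\tPRP\t-\nkept\tVBD\t-\n</s>\n' | add_lower
```

The `<s>`/`</s>` guard keeps the markers from getting an extra lowercase copy and empty trailing fields.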

Any suggestions on 1. how to solve the lowercasing and the counter, and 2. whether it is possible to combine these two steps?

Thank you in advance
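On question 2, the two steps can probably be merged into a single awk run. The sketch below assumes (inferred from what step 1 does, since the raw file is not shown) that the original input has one `word<TAB>tag` pair per line, with sentences separated by blank or whitespace-only lines, and that no token contains a space. Instead of paragraph mode it tracks sentence boundaries line by line, so the counter can replace the "-". The function name `tag_sentences` is hypothetical:

```shell
# Hypothetical helper: one awk pass from raw "word<TAB>tag" lines
# (sentences separated by blank lines) to the four-column <s> ... </s> format.
tag_sentences() {
    awk 'BEGIN { OFS = "\t" }                  # default FS splits on blanks and tabs
         NF == 0 {                             # blank or whitespace-only line ends a sentence
             if (inside) { print "</s>"; inside = 0 }
             next
         }
         {
             if (!inside) { print "<s>"; inside = 1; n = 0 }   # first word of a sentence
             print $1, tolower($1), $2, ++n                     # word, lowercase copy, tag, position
         }
         END { if (inside) print "</s>" }      # close the last sentence
        ' "$@"
}

printf 'He\tPRP\ncould\tMD\n.\t.\n\nHe\tPRP\nkept\tVBD\n' | tag_sentences
```

Using `++n` makes the count start at 1 as in the ideal output; `n++` would start it at 0.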
# 2  
10-30-2015
Code:
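awk '/^<s>/{i=0; print $1; next}            # sentence start: reset the counter
/<\/s>/{print $1; next}                     # sentence end: print the marker as-is
{tmp=tolower($1); print $1, tmp, $2, i++}   # word line: i++ makes the count start at 0
' OFS="\t" filename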

# 3  
10-30-2015
Not sure how to reconcile your written spec and the sample output. Do you mean you want to insert a field by copying tolower($1) between $1 and $2? And should the count be each line's position counted from the preceding <s>?


Assuming the above is correct, try:
Code:
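awk '
        {if ($1 ~ /^<\/?s>$/) ST = NR                  # <s> or </s>: remember its line number
         else   {$1 = $1 OFS tolower($1)               # insert the lowercase copy after $1
                 $3 = NR - ST                          # replace the "-" with the position
                }
        }
1                                                      # print every line
' OFS="\t" file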
<s>
He      he      PRP     1
could   could   MD      2
tell    tell    VB      3
she     she     PRP     4
was     was     VBD     5
teasing teasing VBG     6
him     him     PRP     7
.       .       .       8
</s>
<s>
He      he      PRP     1
kept    kept    VBD     2
his     his     PRP$    3
eyes    eyes    NNS     4
closed  closed  VBD     5
,       ,       ,       6
but     but     CC      7
he      he      PRP     8
could   could   MD      9
feel    feel    VB      10
himself himself PRP     11
smiling smiling VBG     12
.       .       .       13
</s>

# 4  
11-02-2015

RudiC's solution works great for ASCII characters. However, some of my characters are non-ASCII, and they are not converted correctly by `tolower`. For instance, this is what happened:

Code:
<s>
Pero	pero	cc	0
lo	lo	da0000	1
más	m?s	rg	2
importante	importante	aq0000	3
,	,	fc	4
no	no	rn	5
sólo	s?lo	rg	6
desde	desde	sp000	7
la	la	da0000	8
visión	visi?n	nc0s000	9
de	de	sp000	10
una	una	di0000	11
parte	parte	nc0s000	12
</s>

Is there a way to preserve non-ASCII characters when using `tolower`?
# 5  
11-02-2015
With my awk version, tolower doesn't replace the non-ASCII character with a question mark; it just returns the character unchanged. Unfortunately, I didn't find an awk solution for your problem, but with bash (4.0 or later), this might work:
Code:
X="más sólo"
echo "${X^^}"    # prints: MÁS SÓLO
Y=${X^^}
echo "${Y,,}"    # prints: más sólo

# 6  
11-02-2015
It looks like there is a mismatch between the locale that was used when creating the file and the locale used when running your awk script.

Try running your awk script again with the LC_CTYPE environment variable set to a locale that uses the same character set as your file, and in which the accented characters in your file belong to the alpha character class.
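A hedged sketch of that advice: the wrapper below (the name `with_utf8_ctype` is hypothetical) picks the first UTF-8 locale reported by `locale -a` and runs a command with `LC_CTYPE` set to it. It unsets `LC_ALL` in a subshell because `LC_ALL`, when set, overrides `LC_CTYPE`. Locale names vary between systems (for example `en_US.UTF-8` or `C.utf8`), and whether accented letters are then lowercased still depends on the awk implementation: GNU awk handles multibyte characters in a UTF-8 locale, while some other awks may only convert ASCII letters.

```shell
# Hypothetical wrapper: run a command with LC_CTYPE set to the first
# installed UTF-8 locale, or with the current settings if none is found.
with_utf8_ctype() {
    loc=$(locale -a 2>/dev/null | grep -i 'utf-\{0,1\}8$' | head -n 1)
    if [ -z "$loc" ]; then
        "$@"                     # no UTF-8 locale installed: run unchanged
        return
    fi
    (
        unset LC_ALL             # LC_ALL would take precedence over LC_CTYPE
        LC_CTYPE=$loc
        export LC_CTYPE
        "$@"
    )
}

# Usage: input word "MÁS" written as UTF-8 octal escapes.
printf 'M\303\201S\trg\n' |
    with_utf8_ctype awk 'BEGIN { FS = OFS = "\t" } { print $1, tolower($1), $2 }'
```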
# 7  
11-12-2015
Quote:
Originally Posted by Don Cragun
It looks like there is a mismatch between the Locale that was used when creating the file and the Locale used when running your awk script.

Try running your awk script again with the LC_CTYPE environment variable set to a locale that uses the same character set used to write your file and that contains the accented characters in your file in class alpha.

I know this is a zombie thread, but I just wanted to mention that setting the `LC` variable to UTF-8 completely solved the problem. Thank you!