awk modify multiple columns with pipes


 
# 1  
Old 09-20-2010

Hello,

I have a CSV-like dataset where some of the columns contain HTML snippets which I need to convert to XHTML. For any given snippet, I have a functioning config for the text processor 'tidy' such that
Code:
tidy -config tidy.cfg example.html

does the job I need done.

I would like to process the whole dataset this way, using something like the following (incorrect) method:
Code:
awk 'BEGIN{FS="`";RS="|";OFS="`";ORS="|"}{print $1,$2,$3,$4|"tidy -config tidy.cfg",$5|"tidy -config tidy.cfg",$6,$7}' infile.csv > outfile.csv

(Notice the unusual field and record separators; my data contains newlines that are not record separators.)

The above command throws an error at ',$5', and I'm thinking that is because I need to close each pipe before moving on.
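
A note on the error itself: in awk's grammar, `| "command"` is an output redirection that ends the whole `print` statement. Everything listed before it is sent to the command, and nothing (such as `,$5`) may follow it, so the failure is a syntax error rather than an unclosed pipe. The sketch below illustrates this with `sort` standing in for `tidy` (an assumption, so that it runs without tidy installed) and two made-up records.

```shell
# Every print to "sort" feeds ONE sort process until that pipe is closed.
one_sort=$(printf 'b`x|a`y|' |
  awk 'BEGIN { FS = "`"; RS = "|" }
       { print $1, $2 | "sort" }
       END { close("sort") }')

# Closing after each record starts a fresh sort for every record instead.
per_record=$(printf 'b`x|a`y|' |
  awk 'BEGIN { FS = "`"; RS = "|" }
       { print $1, $2 | "sort"; close("sort") }')

printf '%s\n---\n%s\n' "$one_sort" "$per_record"
```

Closing the pipe controls when the command runs, but it still cannot transform a single field in the middle of a larger print list.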

Another possibility would be to write just the columns that I am transforming, one at a time, to separate files and then somehow glue the columns back together with awk and getline statements, but I'm too new to awk to know for sure how to do that.

Any help?
Thanks,
Brian
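
One way the temp-file-plus-getline idea could look, as a minimal sketch rather than a tested solution for this dataset: each field to be converted is written to a temporary file, the command is run on that file, and its output is read back with `getline`. The helper name `filter_fields`, the awk function `filter`, and the one-record test input are hypothetical. `tr a-z A-Z` is used as a stand-in filter; passing `'tidy -config tidy.cfg'` instead assumes tidy reads standard input when no file is named.

```shell
# filter_fields CMD
# Reads backtick-separated fields in pipe-terminated records on stdin and
# rewrites fields 4 and 5 with the output of CMD (hypothetical helper).
filter_fields() {
    tmp=$(mktemp) || return 1
    awk -v cmd="$1" -v tmp="$tmp" '
    BEGIN { FS = "`"; RS = "|"; OFS = "`"; ORS = "|" }

    # Write s to the temp file, run cmd on it, and return the output.
    function filter(s,    out, line, full, n) {
        if (s == "") return s              # nothing to convert
        printf "%s", s > tmp
        close(tmp)                         # flush before the command reads it
        full = cmd " < \"" tmp "\""
        out = ""
        while ((full | getline line) > 0)  # rejoin multi-line output
            out = (n++ ? out "\n" : "") line
        close(full)                        # so the next call reruns the command
        return out
    }

    { $4 = filter($4); $5 = filter($5); print }
    '
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Stand-in filter so the sketch runs without tidy installed:
printf '1`a`b`<p>x</p>`<p>y</p>`c`d|' | filter_fields 'tr a-z A-Z'
```

Going through a file avoids quoting the HTML for the shell, which matters because the snippets may contain apostrophes and double quotes. The cost is one process per converted field, which may be slow on a large dataset.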
# 2  
Old 09-20-2010
Include the pipe in print statement within the quotes and let us know if you still have problems
# 3  
Old 09-20-2010
Quote:
Originally Posted by anbu23
Include the pipe in print statement within the quotes and let us know if you still have problems
Thank you for the reply.

I tried this
Code:
awk 'BEGIN{FS="`";RS="|";OFS="`";ORS="|"}{print $1,$2,$3,$4"|tidy -config tidy.cfg",$5"|tidy -config tidy.cfg",$6,$7}' infile.csv > outfile.csv

and this
Code:
awk 'BEGIN{FS="`";RS="|";OFS="`";ORS="|"}{print $1,$2,$3,"print $4|tidy -config tidy.cfg","print $5|tidy -config tidy.cfg",$6,$7}' infile.csv > outfile.csv

And that just results in a literal "|tidy -config tidy.cfg" (or "print $4|tidy -config tidy.cfg") appearing in the output.

Brian
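
The literal text in the output is expected: in awk, placing a string next to a field (`$4"|tidy ..."`) is string concatenation, so nothing is executed. A command runs only through constructs such as `print ... | cmd`, `cmd | getline`, or `system()`. A small sketch of the difference, using `tr a-z A-Z` as a stand-in for tidy and a made-up two-field record:

```shell
demo=$(printf 'abc`def' | awk 'BEGIN { FS = "`" } {
    # Concatenation: the text is glued onto the field, nothing is executed.
    literal = $1 "|tr a-z A-Z"

    # Here the string is handed to the shell by getline, so it does run.
    # Interpolating $1 is safe only because it is plain letters in this demo.
    cmd = "printf %s " $1 " | tr a-z A-Z"
    cmd | getline upper
    close(cmd)

    print literal
    print upper
}')
printf '%s\n' "$demo"
```

Building the command line from field contents like this is not suitable for real HTML, which would need shell quoting or a temporary file.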
# 4  
Old 09-20-2010
CSV files are comma separated files. Then FS should be set to comma.

If you want to have comma separated output then set OFS to comma
# 5  
Old 09-20-2010
Quote:
Originally Posted by anbu23
CSV files are comma separated files. Then FS should be set to comma.

If you want to have comma separated output then set OFS to comma
It is a little odd, but I said "CSV-like" - I know that I am using "`" for a FS and "|" for a RS, but this is not the source of my problem.

Last edited by bstamper; 09-20-2010 at 04:05 PM..
# 6  
Old 09-20-2010
Can you post sample input?
# 7  
Old 09-20-2010
Sample input:
Code:
50`Byrd Polar Research Center``<p>Named in honor of one of America's most famous explorers, the Byrd Polar Research Center of The Ohio State University is recognized internationally as a leader in polar and alpine research. The Center's research programs are conducted throughout the world.</p>`12094``|
53`Ornamental Plant Germplasm Center``The OPGC conserves, assesses and distributes herbaceous ornamental plant germplasm and develops new techniques for conserving seed and clonally propagated germplasm.`12493``|
52`Latin American History``<p>Latin American history is well represented in the OSU Department of History with specialists in Colonial Andean, Argentine and Mexican history and Latino/a history. Thematic emphases include economic history, gender and sexuality studies, race and ethnicity, and revolutionary societies. </p>```|
45`Food, Agricultural, and Environmental Sciences, College of```175839``|
49`American Indian Studies at The Ohio State University``<p>American Indian Studies respects the importance of native protocol. We acknowledge that we are in the world on North America, in central Ohio on the traditional homeland of the Shawnee Nation under the guidance of Our Grandmother (Kokumthena), in a refuge of the Delaware under Kishaylamukong, and with the fire of the Wyandots under Ataentsik.  Through them we belong in Ohio. </p><br>```Connect to:<br>
<a href="http://www.americanindianstudies.osu.edu/">American Indian Studies</a>|
33`OSU Newark``<P>OSU Newark offers general studies coursework applicable to all undergraduate degree programs at The Ohio State University. In addition, we offer upper-division courses in several departments. Entire degree programs may be completed at the Newark Campus in several areas.</P>```|
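
With this layout, every record after the first begins with the newline that follows the previous `|`, and the newline after the final `|` forms one extra, empty-looking record. The sketch below shows one way to account for both when checking how awk splits the data. The helper name `list_records` is hypothetical, and the two input records are shortened versions of the sample above.

```shell
# list_records: hypothetical helper that reports how awk splits the input.
list_records() {
    awk 'BEGIN { FS = "`"; RS = "|" }
         { sub(/^\n/, "") }     # drop the newline that followed the previous "|"
         NF == 0 { next }       # skip the empty record after the final "|"
         { print NR ": " NF " fields, id=" $1 }'
}

printf '50`Byrd``<p>x</p>`12094``|\n45`Food```175839``|\n' | list_records
```

Both shortened records split into seven fields, matching the `$1` through `$7` used in the original command.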
