Need help with awk


 
# 1  
Old 05-30-2016
Need help with awk

Hi Experts,

I am trying to process a CSV file with awk.

One of the columns (fields) in the CSV file is an XML document, which contains white space and newlines.

When I print/extract this column ($2) for processing, awk only prints the first line of that multi-line content. Below is the awk command I am using:

Code:
awk -F, '{for (i=1;i<=1;i++) if ( $1 ~/xyz/ && $2 ~/abc/ ) print $2}'

How can I make sure that awk's field separator ignores spaces/newlines and only separates fields on the value provided (",")?

Appreciate your help

Last edited by Scrutinizer; 05-30-2016 at 10:51 AM.. Reason: code tags
# 2  
Old 05-30-2016
What does your input look like?

Your awk script is equivalent to this:
Code:
awk -F, '$1 ~/xyz/ && $2 ~/abc/{print $2}'

-F, ensures that a comma is used as the field separator. It should work for every line in your input file, printing $2 for every line where the condition $1 ~/xyz/ && $2 ~/abc/ is met.
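To check the equivalence yourself, here is a minimal sketch. The two sample lines are made up for illustration and are not taken from the poster's file:

```shell
# Hypothetical sample: two single-line CSV records.
sample='xyz,abc-1,foo
xyz,nomatch,bar'

# Original form: the for loop body runs exactly once per record.
loop_out=$(printf '%s\n' "$sample" |
    awk -F, '{for (i=1;i<=1;i++) if ($1 ~ /xyz/ && $2 ~ /abc/) print $2}')

# Pattern-action form: same condition, no loop.
short_out=$(printf '%s\n' "$sample" |
    awk -F, '$1 ~ /xyz/ && $2 ~ /abc/ {print $2}')

printf 'loop:  %s\nshort: %s\n' "$loop_out" "$short_out"
```

Both commands should select the same lines, since the loop adds nothing to the condition.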


--- Edit ---
If your file has a multi-line $2, then additional processing needs to be done. It depends on what kind of CSV format you use.
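One possible shape for that additional processing, as a sketch only: it assumes a CSV dialect in which multi-line content sits inside double quotes and embedded quotes are doubled, so a logical record is complete once the running count of double quotes is even. The three-line sample is hypothetical.

```shell
# Hypothetical three-line sample: one record wrapped in double quotes,
# with doubled quotes inside the XML-like field.
joined=$(printf '%s\n' '"xyz,<a x=""1"">' '  <b>abc</b>' '</a>,tail,"' |
awk -F, '
{
    # Append this physical line to the pending logical record.
    rec = (rec == "" ? $0 : rec "\n" $0)

    # gsub on a scratch copy returns how many double quotes rec holds.
    tmp = rec
    if (gsub(/"/, "", tmp) % 2) next   # odd count: still inside quotes

    $0 = rec; rec = ""                 # record complete: re-split on commas
    if ($1 ~ /xyz/ && $2 ~ /abc/) print $2
}')
printf '%s\n' "$joined"
```

With this approach $2 keeps its embedded newlines. A comma inside a quoted field would still split the field, so a dialect that allows that needs a real CSV parser.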

Last edited by Scrutinizer; 05-30-2016 at 11:34 AM..
# 3  
Old 05-30-2016
Sample input

The attached text file shows what my input looks like (I could not paste it here because it contains too many icon characters).

As you can see, $2 has some spaces/newlines, and awk does not parse the entire content as one big text. Instead it only returns the following:

Code:
<soapenv:Envelope xmlns:soapenv=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:cai3=""http://schemas.centra.com/cai3g1.2/"" xmlns:opt=""http://schemas.centra.com/""><soapenv:Header><cai3:SessionId>abc_4d728b7db4cf4950b660c37c0ccad7bd</cai3:SessionId><cai3:TransactionId>1605271108449435658</cai3:TransactionId></soapenv:Header><soapenv:Body>


Last edited by Scrutinizer; 05-30-2016 at 11:54 AM..
# 4  
Old 05-30-2016
Quote:
Originally Posted by sshark
The attached text file shows what my input looks like (I could not paste it here because it contains too many icon characters).

As you can see, $2 has some spaces/newlines, and awk does not parse the entire content as one big text. Instead it only returns the following:

Code:
<soapenv:Envelope xmlns:soapenv=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:cai3=""http://schemas.centra.com/cai3g1.2/"" xmlns:opt=""http://schemas.centra.com/""><soapenv:Header><cai3:SessionId>abc_4d728b7db4cf4950b660c37c0ccad7bd</cai3:SessionId><cai3:TransactionId>1605271108449435658</cai3:TransactionId></soapenv:Header><soapenv:Body>

Your input is quite inconsistent. What output do you expect? Could you post what you would like the output to look like?

Also, what constitutes a record (a line) in your thinking? That is, where does a line begin and where does it end?
# 5  
Old 05-30-2016
Expected output

Please see the attached text file.

A line or a field shall only be separated by a comma (,). White space and newlines shall not be considered delimiters.
# 6  
Old 05-31-2016
Quote:
Originally Posted by sshark
Please see the attached text file.

A line or a field shall only be separated by a comma (,). White space and newlines shall not be considered delimiters.
The newline is never considered a field separator by default; it is the record separator. In other words, it determines how much of the file is read as one record before that record is subdivided into smaller parts, called fields.
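A small sketch of the difference, using a made-up two-line input rather than the attached file:

```shell
# Hypothetical input: two physical lines, three commas.
input='a,b
c,d,e'

# Default RS (newline): each line is one record, split into fields on commas.
per_line=$(printf '%s\n' "$input" | awk -F, '{print NR ": NF=" NF}')

# RS=",": every comma ends a record, so a newline stays inside its record.
per_comma=$(printf '%s\n' "$input" |
    awk 'BEGIN {RS=","} {n++} END {print n " records"}')

printf '%s\n' "$per_line" "$per_comma"
```

With RS set to a comma, the chunk between the first and second comma spans the line break, which is the situation the multi-line XML field is in.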

Nevertheless, this matches your posted output:
Code:
gawk '{gsub("\n", "")} p ~ /xyz/ && /abc/; {p=$0}' RS="," sshark.input

# 7  
Old 05-31-2016
Your sample input and your sample output are inconsistent. Field 2 in your comma-separated input file contains two <newline> characters (which you say are not field delimiters and are part of the field), but there are no <newline> characters in the middle of the single-line sample output that you say you are trying to get.

Furthermore, most CSV files contain more than one record, and each record has a fixed number of fields. But apparently we are to assume that your input file is always exactly one record consisting of an unspecified number of fields. Your sample input file (input.txt) contains exactly four input fields, with the 1st field containing:
Code:
"xyz

the 2nd field containing:
Code:
<soapenv:Envelope xmlns:soapenv=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:cai3=""http://schemas.centra.com/cai3g1.2/"" xmlns:opt=""http://schemas.centra.com/""><soapenv:Header><cai3:SessionId>abc_4d728b7db4cf4950b660c37c0ccad7bd</cai3:SessionId><cai3:TransactionId>1605271108449435658</cai3:TransactionId></soapenv:Header><soapenv:Body>
      <ns2:Set xmlns:ns2=""http://schemas.centra.com/cai3g1.2/"" xmlns:neo=""http://schemas.centra.com/index/""><ns2:MOType>neoSubscription@http://schemas.centra.com/neo/</ns2:MOType><ns2:MOId><neo:imsi>709015700190174</neo:imsi></ns2:MOId><ns2:MOAttributes><neo:SetneoSubscription><neo:SetemsSubscription imsi=""709015700190174"" msisdn=""700345234452""><neo:gprs pciid=""1""><neo:pciid>1</neo:pciid><neo:docid>20</neo:docid><neo:eqosid>10</neo:eqosid><neo:pcich>2</neo:pcich><neo:vpaa>0</neo:vpaa></neo:gprs><neo:gprs pciid=""2""><neo:pciid>2</neo:pciid><neo:docid>10</neo:docid><neo:eqosid>1</neo:eqosid><neo:pcich>2</neo:pcich><neo:vpaa>0</neo:vpaa></neo:gprs><neo:gprs pciid=""3""><neo:pciid>3</neo:pciid><neo:docid>50</neo:docid><neo:eqosid>1</neo:eqosid><neo:pcich>2</neo:pcich><neo:vpaa>0</neo:vpaa></neo:gprs><neo:baoc><neo:provisionState>1</neo:provisionState><neo:ts10><neo:activationState>0</neo:activationState></neo:ts10></neo:baoc><neo:cfb><neo:provisionState>1</neo:provisionState><neo:ts10><neo:activationState>0</neo:activationState></neo:ts10></neo:cfb><neo:cfnrc><neo:provisionState>1</neo:provisionState><neo:ts10><neo:activationState>0</neo:activationState></neo:ts10></neo:cfnrc><neo:cfnry><neo:provisionState>1</neo:provisionState><neo:ts10><neo:activationState>0</neo:activationState></neo:ts10></neo:cfnry><neo:caw><neo:provisionState>1</neo:provisionState><neo:ts10><neo:activationState>1</neo:activationState></neo:ts10></neo:caw><neo:acc>3</neo:acc><neo:hold>1</neo:hold><neo:mpty>1</neo:mpty><neo:ofa>1</neo:ofa><neo:tick>35</neo:tick><neo:ts11>1</neo:ts11><neo:ts21>1</neo:ts21><neo:ts22>1</neo:ts22><neo:rsa>0</neo:rsa><neo:schar>0</neo:schar></neo:SetemsSubscription><neo:CreateserviceSC imsi=""709015700190174""><neo:imsi>709015700190
    </soapenv:Body></soapenv:Envelope>

the 3rd field containing:
Code:
<S:Envelope xmlns:S=""http://schemas.xmlsoap.org/soap/envelope/""><S:Header><SessionId xmlns=""http://schemas.centra.com/cai3g1.2/"">abc_4d728b7db4cf4950b660c37c0ccad7bd</SessionId><TransactionId xmlns=""http://schemas.centra.com/cai3g1.2/"">1605271108449435658</TransactionId><SequenceId xmlns=""http://schemas.centra.com/cai3g1.2/"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" psi:nil=""true""/></S:Header><S:Body><SetResponse xmlns=""http://schemas.centra.com/cai3g1.2/""/></S:Body></S:Envelope>

and the 4th field containing:
Code:
"

(Note that the 4th field consists only of one <double-quote> character. Note also that neither of your sample *.txt files is a text file, since the last line in both files is incomplete: neither file is terminated by a <newline> character. And, finally, note that the line separators in field 2 in input.txt are DOS <carriage-return><newline> character pairs, not just the normal UNIX line-terminating <newline> characters.)
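Both properties can be checked from the shell. This is a sketch against a hypothetical stand-in file built with printf, not against the actual attachment:

```shell
# Hypothetical stand-in for input.txt: CRLF line break, no final newline.
f=$(mktemp)
printf 'line one\r\nline two' > "$f"

# Count lines that contain a carriage return (DOS line endings).
cr_lines=$(awk '/\r/ {n++} END {print n+0}' "$f")

# A text file ends with a newline; inspect the last byte of the file.
last_byte=$(tail -c 1 "$f" | od -An -c | tr -d ' \n')

echo "lines with CR: $cr_lines, last byte: $last_byte"

# Strip the carriage returns before handing the data to awk.
record_count=$(tr -d '\r' < "$f" | awk 'END {print NR}')
echo "records after stripping CR: $record_count"

rm -f "$f"
```

If the last byte shows up as anything other than \n, the final line is incomplete.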

So, if ONLY commas are field separators, why are some whitespace characters in field 2 in the input removed from your desired output?

Why are you calling these text files if they do not meet the qualifications of a text file?