parsing data for certain conditions


 
# 1  
Old 06-05-2009

Hi guys,

I have got this working OK but I am sure there is a more efficient/elegant way of doing it, which I hope you can help me with.

It can be done in whatever is most suitable, e.g. Perl or awk.

Any suggestions are welcome and many thanks in advance.

What I need is three output fields from each line:
The first field is the text before the first " (using " as the FS), up to the last . in that text. Sometimes there are several . in it.
The second field is from that last . to the first ".
The third field is from the first " to the |, with the surrounding spaces removed.

This output is only required if the third field using " as the FS is blank (nothing after the second "), and the second field, up to the |, has data present.


Below is an example of all the variants of the data. The real file has 800,000+ rows.

Quote:
CH0045775191=UBSL.RDN_EXCHD2 " | CH0045775191=UBSL.RDN_EXCHD2 "phil
CH0045775191=UBSL.TILE_DESC " | CH0045775191=UBSL.TILE_DESC "
CH0024226190=UBSL.ISSUE_DATE " | CH0024226190=UBSL.ISSUE_DATE "
CH0024226190=UBSL.CONV_TEXT "G VANKE | CH0024226190=UBSL.CONV_TEXT "
CH0024226190=UBSL.GEN_VAL1 "+16.56 | CH0024226190=UBSL.GEN_VAL1 "J0shua
CH0032678747.UBS.MKT_MKR_NM "govindva | CH0032678747.UBS.MKT_MKR_NM "
This is the output using the above input.

Quote:
CH0024226190=UBSL , CONV_TEXT , G VANKE
CH0032678747.UBS , MKT_MKR_NM , govindva
Code:
#!/bin/bash
IFS='"'
while read line
do
test1=`echo "$line" | awk -F'"' '{print $1}'`
test2=`echo "$line" | awk -F '[|]' '{print $(NF-1)}' | awk -F'"' 'BEGIN {OFS=","} {print $2}'|awk '{$1=$1;print}'`
test3=`echo "$line" | awk -F'"' '{print $3}'|awk '{$1=$1;print}'`
        if [[ -n "${test2}" && -z "${test3}" ]]; then
        FID=`echo  "${test1}"|awk -F"." '{ gsub(/-/,"",$0); for ( i = NF; i > 0; i-- ) printf("%s ",$i); printf("\n");}'| awk -F" " '{print $1}'`
        RIC=`echo  "${test1}"|sed -e 's/'.${FID}'//g'`
        echo "$RIC , $FID , $test2" >> philout
        else
        echo "false"
        fi
done < head_out_orig_phil

Cheers Phil.
# 2  
Old 06-05-2009
Quote:
Originally Posted by PAW
I have got this working OK but I am sure there is a more efficient/elegant way of doing it, which I hope you can help me with.

It can be done in whatever is most suitable, e.g. Perl or awk.

You are calling awk 6,400,000+ times, and sed 800,000+ times.

With 800000+ rows, you need awk, but you only need one call to awk, not eight (including one that does nothing) and one to sed for every line of the file.

Here's a start to an awk script:

Code:
awk -F'"' '
    {
     test1 = $1
     fields = split($0,a,"|")
     test2 = a[fields - 1]
     test3 = $3

     if ( length(test2) > 0 && length(test3) == 0 ) ...
}
' head_out_orig_phil
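
One way the skeleton above could be filled in, as a sketch only and not necessarily what was intended for the elided part: a single awk process reads the whole file, trims blanks before testing, and splits the key on its last dot. The function name `parse_rows` and the variable names are made up for illustration, and the trimming rule (spaces and tabs only) is an assumption about the data.

```shell
#!/bin/bash
# Sketch: one awk process for the whole input instead of several per line.
# parse_rows is a hypothetical name; it reads the files given as arguments,
# or standard input when none are given.
parse_rows() {
    awk -F'"' '
    {
        key   = $1                            # text before the first "
        value = $2                            # text between the first and second "
        last  = $3                            # text after the second "

        sub(/[|].*/, "", value)               # keep only the part before the |
        gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim leading/trailing blanks
        gsub(/[ \t]+/, "", last)              # a field of only blanks counts as empty
        sub(/[ \t]+$/, "", key)               # drop the blank before the first "

        dot = match(key, /\.[^.]*$/)          # position of the last . in the key

        if (value != "" && last == "" && dot > 0)
            print substr(key, 1, dot - 1) " , " substr(key, dot + 1) " , " value
    }' "$@"
}

# Example with one row from the sample data:
printf '%s\n' 'CH0024226190=UBSL.CONV_TEXT "G VANKE | CH0024226190=UBSL.CONV_TEXT "' | parse_rows
```

On the real file this would be called as `parse_rows head_out_orig_phil > philout`.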

# 3  
Old 06-05-2009
OK thanks, I'll give it a go.
# 4  
Old 06-05-2009
Quote:
Originally Posted by PAW
...I am sure there is a more efficient/elegant way of doing it, which I hope you can help me with.

It can be done in whatever is most suitable, e.g. Perl or awk.

Any suggestions are welcome ...
Here's one way to do it in Perl:

Code:
$
$ cat data.txt
CH0045775191=UBSL.RDN_EXCHD2 " | CH0045775191=UBSL.RDN_EXCHD2 "phil
CH0045775191=UBSL.TILE_DESC " | CH0045775191=UBSL.TILE_DESC "
CH0024226190=UBSL.ISSUE_DATE " | CH0024226190=UBSL.ISSUE_DATE "
CH0024226190=UBSL.CONV_TEXT "G VANKE | CH0024226190=UBSL.CONV_TEXT "
CH0024226190=UBSL.GEN_VAL1 "+16.56 | CH0024226190=UBSL.GEN_VAL1 "J0shua
CH0032678747.UBS.MKT_MKR_NM "govindva | CH0032678747.UBS.MKT_MKR_NM "
$
$
$
$ perl -ne '@f = split /["\|]/;
>  if ($f[3] =~ /^\s*$/ && $f[1] !~ /^\s*$/ && $f[0] =~ /^(.*)\.([^.]*?) /) {
>   print "$1 , $2 , $f[1]\n" }' data.txt
CH0024226190=UBSL , CONV_TEXT , G VANKE
CH0032678747.UBS , MKT_MKR_NM , govindva
$
$

tyler_durden
# 5  
Old 06-05-2009
Thanks for the contribution, Tyler.
# 6  
Old 06-18-2009
Hi guys,

OK, the awk script has a problem: it prints output from the test even when the field appears to have no characters, so I presume it contains spaces or tabs.
Can you help with this?

Code:
#!/bin/bash
awk -F'"' '
    {
     test1 = $1
     test2 = $2
     fields = split(test2,a,"|")
     test4 = a[fields - 1]
     test3 = $3
     if ( length(test4) > 0 && length(test3) == 0 )   print test4 ; else print "fail"
}
' head_out_orig_phil

Output
Quote:
fail

G VANKE
fail
govindva
Using the original input file.
Quote:
CH0045775191=UBSL.RDN_EXCHD2 " | CH0045775191=UBSL.RDN_EXCHD2 "phil
CH0045775191=UBSL.TILE_DESC " | CH0045775191=UBSL.TILE_DESC "
CH0024226190=UBSL.ISSUE_DATE " | CH0024226190=UBSL.ISSUE_DATE "
CH0024226190=UBSL.CONV_TEXT "G VANKE | CH0024226190=UBSL.CONV_TEXT "
CH0024226190=UBSL.GEN_VAL1 "+16.56 | CH0024226190=UBSL.GEN_VAL1 "J0shua
CH0032678747.UBS.MKT_MKR_NM "govindva | CH0032678747.UBS.MKT_MKR_NM "
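
A likely cause, sketched here under assumptions rather than confirmed against the real file: `length()` counts spaces and tabs, so a field holding a single space has length 1 and passes the `> 0` test. Stripping blanks before the test avoids that. The function name `check_value` and the variables `value` and `rest` are made up for illustration, and the sketch takes the text before the first `|` (`a[1]`), which is the same element as `a[fields - 1]` when a row has exactly one `|`.

```shell
#!/bin/bash
# Sketch: treat a field that holds only spaces or tabs as empty.
# check_value is a hypothetical name; it reads the files given as arguments,
# or standard input when none are given.
check_value() {
    awk -F'"' '
    {
        n = split($2, a, "|")                 # a[1] is the text before the first |
        value = a[1]
        gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim blanks at both ends
        rest = $3
        gsub(/[ \t]+/, "", rest)              # remove blanks after the second "

        if (length(value) > 0 && length(rest) == 0) print value; else print "fail"
    }' "$@"
}

# A row whose value is only a space is now reported as "fail":
printf '%s\n' 'CH0045775191=UBSL.TILE_DESC " | CH0045775191=UBSL.TILE_DESC "' | check_value
```

If the blanks turn out to be some other whitespace character, the bracket expressions would need extending to cover it.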

The Perl script works OK, except that it fails on a couple of lines containing the highlighted character (the Â in the line below). Do you have any ideas? I would also appreciate it if you could add some comments to the script, as my Perl is not too good.


Code:
CH0042237526=UBSL.GNTXT14_5 "CH0042237526Â                  | CH0042237526=UBSL.GNTXT14_5 "

Many thanks for your help

Phil.
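
On the line with the odd character: a Â followed by what looks like a space is how the two bytes of a UTF-8 non-breaking space (0xC2 0xA0) appear when read as Latin-1. Whether that is what is in this file is only a guess. If it is, one option is to delete every byte outside printable ASCII before the data reaches awk or Perl. A minimal sketch, where `strip_odd_bytes` is a made-up name:

```shell
#!/bin/bash
# Sketch: drop every byte that is not tab, newline, carriage return
# or printable ASCII (octal 40 to 176). strip_odd_bytes is a hypothetical name.
strip_odd_bytes() {
    # LC_ALL=C makes tr treat the input as single bytes
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Example: the two bytes \302\240 (a UTF-8 non-breaking space) are removed.
printf 'CH0042237526\302\240   | CH0042237526=UBSL.GNTXT14_5\n' | strip_odd_bytes
```

This also removes any legitimate accented or non-ASCII text, so it only suits data that is meant to be plain ASCII. It could be placed in front of either script, e.g. `strip_odd_bytes < head_out_orig_phil | perl -ne '...'`.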