Need help with data extraction from files


 
# 1  
09-29-2012

Hello all,

I want to extract some particular data from a file and then add up all the values.
But I'm not able to cut out the particular words (USU-INOCT and USU-OUTOCT), because they don't appear in a fixed column, so I can't add up the values that follow them.

Can anyone help me, please?
Code:
cat <file name>
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-TariffTimeChange=0 USU-INOCT=12772 USU-OUTOCT=39177</USU> <USU> USU-TariffTimeChange=1 USU-INOCT=178 USU-OUTOCT=144</USU> SID=46 RG=46 R.REASON=VLD </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=139072 USU-OUTOCT=818813</USU> SID=46 RG=46 R.REASON=VLD </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=122714 USU-OUTOCT=902004 USU-R.RES=QEX</USU> SID=46 RG=46 </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=121962 USU-OUTOCT=902846 USU-R.RES=QEX</USU> SID=46 RG=46 </MSCC-1>   
 <MSCC-1>   SID=46 RG=46 RESULT-CODE=4012 </MSCC-1>   
 <MSCC-1>   <USU> USU-INOCT=52 USU-OUTOCT=0</USU> SID=46 RG=46 R.REASON=FNL </MSCC-1>

thanks
akash

Last edited by jim mcnamara; 09-29-2012 at 09:28 AM.
# 2  
09-29-2012
That shows your input; please show us your desired output.
# 3  
09-29-2012
Code:
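# Sum all USU-INOCT and USU-OUTOCT values, wherever they appear on a line.
# Run as: awk -f prog.awk input   (prog.awk is just an example file name)
BEGIN {
 a["USU-INOCT"]="USU-INOCT";
 a["USU-OUTOCT"]="USU-OUTOCT";
}

{c=split($0,b,">");              # one piece per XML-like tag
 for (i=1; i<=c; i++) {          # split() fills b[1]..b[c]
  for (j in a) {
   if (k = index(b[i], j)) {     # position of the name in this piece, 0 if absent
      s=substr(b[i], k);         # e.g. "USU-INOCT=12772 USU-OUTOCT=39177</USU"
      sub("^[^=]*= *", "", s);   # drop the name and the "="
      sub("[<> ].*$", "", s);    # drop everything after the number
      s_a[j]+=s;
   }
  }
 }
}

END { for (o in a) print "Sum Total " o ": " s_a[o]; }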
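In the script above, `for (o in a)` visits the array in an unspecified order, so the two totals can come out in either order. A minimal sketch, assuming you want the totals in a fixed order and want to pass the names on the command line; the `sum_keys` function and the `keys` variable are hypothetical names used only for illustration. It relies on awk converting a string such as `"12772 USU-OUTOCT=..."` to its leading number.

```shell
#!/bin/sh
# sum_keys (hypothetical helper): total every NAME=number pair for the
# NAMEs listed in $1, reading the data from stdin.
# Totals are printed in the order the names were given.
# Example: sum_keys "USU-INOCT USU-OUTOCT" < input
sum_keys() {
    awk -v keys="$1" '
    BEGIN { n = split(keys, k, " ") }       # k[1]..k[n] keep the requested order
    {
        for (i = 1; i <= n; i++) {
            line = $0
            # find each "NAME=" in turn, wherever it sits on the line
            while ((p = index(line, k[i] "=")) > 0) {
                line = substr(line, p + length(k[i]) + 1)
                sum[i] += line + 0           # leading digits only: "12772 USU..." -> 12772
            }
        }
    }
    END { for (i = 1; i <= n; i++) print "Sum Total " k[i] ": " (sum[i] + 0) }'
}
```

Looking for `NAME=` with `index()` rather than relying on field positions is what lets this work no matter where the entries sit on the line.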

# 4  
09-29-2012
Hello jim mcnamara,

I want output like this:

USU-INOCT=xxxxx USU-OUTOCT=xxxxx
USU-INOCT=xxxxx USU-OUTOCT=xxxxx
USU-INOCT=xxxxx USU-OUTOCT=xxxxx
...
...
..
USU-INOCT=xxxxx USU-OUTOCT=xxxxx
(n lines)
and then: SUM USU-INOCT=xxxxx SUM USU-OUTOCT=xxxxx

Thanks
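The format requested above can also be produced without depending on field positions, by scanning each line with awk's `match()`, which sets `RSTART` and `RLENGTH` for the leftmost match. A minimal sketch, assuming the data arrives on standard input; `usu_report` is a hypothetical function name. It prints one line for every input line that mentions either name (even if both values are 0), then the two grand totals.

```shell
#!/bin/sh
# usu_report (hypothetical helper): one "USU-INOCT=n USU-OUTOCT=m" line per
# input line containing either name, then "SUM USU-INOCT=... SUM USU-OUTOCT=...".
# Example: usu_report < input
usu_report() {
    awk '
    {
        in_oct = out_oct = 0
        seen = 0
        rest = $0
        # take the leftmost NAME=digits token, then keep scanning after it
        while (match(rest, /USU-(IN|OUT)OCT=[0-9]+/)) {
            tok = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
            eq = index(tok, "=")
            val = substr(tok, eq + 1) + 0
            if (substr(tok, 1, eq - 1) == "USU-INOCT") in_oct += val
            else out_oct += val
            seen = 1
        }
        if (seen) {
            printf("USU-INOCT=%d USU-OUTOCT=%d\n", in_oct, out_oct)
            t_in += in_oct
            t_out += out_oct
        }
    }
    END { printf("SUM USU-INOCT=%d SUM USU-OUTOCT=%d\n", t_in, t_out) }'
}
```

Lines with two `<USU>` blocks, like the first line of the sample, are reported as a single line with both blocks added together.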
# 5  
09-29-2012
Try:
Code:
awk -F "[ =]" 'BEGIN{
        f1 = "USU-INOCT"
        f2 = "USU-OUTOCT"
}       
{       for(i = 1; i < NF; i++)
                if($i == f1) s1 += $(i + 1)
                else if($i == f2) s2 += $(i + 1)
        printf("%s=%s %s=%d\n", f1, s1, f2, s2)
        t1 += s1
        t2 += s2
}               
END {   printf("SUM %s=%d SUM %s=%d\n", f1, t1, f2, t2)
}' input

**********************************
Please ignore this posting; this script will not work. I'll post a correction later.

Last edited by Don Cragun; 09-29-2012 at 12:11 PM. Reason: I must still have been asleep when I posted this.
# 6  
09-29-2012
Hi.

This can be done in two steps: one to extract the token=number strings and another to sum the individual items:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate extraction of string=integer.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C grep awk

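FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results:"
# Pull out every USU-INOCT=n / USU-OUTOCT=n token (grep -o prints each match
# on its own line), keep a copy in f1, then total the two kinds separately.
grep -E -o '(USU-INOCT|USU-OUTOCT)=[0-9]+' "$FILE" |
tee f1 |
awk -F= '
/IN/	{ sumi += $2 ; next }
/OUT/	{ sumo += $2 }
END	{ print sumi, sumo, sumi+sumo }
'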

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
grep GNU grep 2.5.3
awk GNU Awk 3.1.5

-----
 Input data file data1:
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-TariffTimeChange=0 USU-INOCT=12772 USU-OUTOCT=39177</USU> <USU> USU-TariffTimeChange=1 USU-INOCT=178 USU-OUTOCT=144</USU> SID=46 RG=46 R.REASON=VLD </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=139072 USU-OUTOCT=818813</USU> SID=46 RG=46 R.REASON=VLD </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=122714 USU-OUTOCT=902004 USU-R.RES=QEX</USU> SID=46 RG=46 </MSCC-1>   
 <MSCC-1>   <GSU>  GSU-TOTOCT=1024000 </GSU>  SID=46 RG=46 VALTIME=1700 RESULT-CODE=2001 V.Q.THRESHOLD=102400 </MSCC-1>   
 <MSCC-1>   <RSU>  </RSU>  <USU> USU-INOCT=121962 USU-OUTOCT=902846 USU-R.RES=QEX</USU> SID=46 RG=46 </MSCC-1>   
 <MSCC-1>   SID=46 RG=46 RESULT-CODE=4012 </MSCC-1>   
 <MSCC-1>   <USU> USU-INOCT=52 USU-OUTOCT=0</USU> SID=46 RG=46 R.REASON=FNL </MSCC-1>

-----
 Results:
396750 2662984 3059734

See the intermediate file f1 for the extracted strings, and see man pages for other details ... cheers, drl
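`grep -o` is a GNU/BSD extension rather than part of POSIX `grep`, so on a system whose `grep` lacks it, the extraction step can be done by first breaking each line into one token per line with `tr`. A minimal sketch, assuming the same data file and the same three output numbers as the script above; `usu_sums` is a hypothetical function name.

```shell
#!/bin/sh
# usu_sums (hypothetical helper): print "sum_in sum_out sum_in+sum_out"
# for the file named in $1, without needing grep -o.
# Example: usu_sums data1
usu_sums() {
    # 1. turn every blank, "<" and ">" into a newline: one token per line
    tr ' <>' '\n\n\n' < "$1" |
    # 2. keep only whole tokens of the form NAME=digits
    grep -E '^(USU-INOCT|USU-OUTOCT)=[0-9]+$' |
    # 3. sum the two kinds separately
    awk -F= '
    $1 == "USU-INOCT"  { sumi += $2; next }
    $1 == "USU-OUTOCT" { sumo += $2 }
    END { print sumi + 0, sumo + 0, sumi + sumo }'
}
```

Comparing the whole name (`$1 == "USU-INOCT"`) instead of matching `/IN/` keeps the sums correct even if other names containing "IN" or "OUT" ever pass the filter.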
# 7  
09-29-2012
Before anamdev posted details of the desired output in message #4 in this thread, I had a working script that produced nicely formatted results: the total of the USU-INOCT entries, the total of the USU-OUTOCT entries, and the sum of those two values (which is what I had guessed you wanted). For your amusement, this is that script:
Code:
#!/bin/ksh
awk -F "[ =]" 'BEGIN{
        f1 = "USU-INOCT"
        f2 = "USU-OUTOCT"
        h1 = "Sum of " f1 " values:"
        h2 = "Sum of " f2 " values:"
        h3 = "==============="
        h4 = "Total:"
}
{       for(i = 1; i < NF; i++)
                if($i == f1) s1 += $(i + 1)
                else if($i == f2) s2 += $(i + 1)
}
END {   slen = length((s1 + s2) "")     # width of the largest number
        hml = length(h1) > length(h2) ? length(h1) : length(h2)
        printf("%*s %*d\n%*s %*d\n%*.*s\n%*s %*d\n",
                hml, h1, slen, s1,
                hml, h2, slen, s2,
                hml + slen + 1, slen, h3,
                hml, h4, slen, s1 + s2)
}' input

which, with the input you gave as an example, produces the output:
Code:
 Sum of USU-INOCT values:  396750
Sum of USU-OUTOCT values: 2662984
                          =======
                   Total: 3059734

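When I saw your actual requirements, I modified the script above to produce what I posted in message #5, but I obviously didn't think through all of the changes the logic needed: s1 and s2 were never reset for each line, so every printed line (including lines with no USU entries) showed a running total, and the final sums added those running totals up again. Following is a corrected version that does what you actually asked for: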
Code:
#!/bin/ksh
awk -F "[ =]" 'BEGIN{
        f1 = "USU-INOCT"
        f2 = "USU-OUTOCT"
}       
{       s1 = s2 = 0
        for(i = 1; i < NF; i++)
                if($i == f1) s1 += $(i + 1)
                else if($i == f2) s2 += $(i + 1)
        if(s1 || s2) printf("%s=%d %s=%d\n", f1, s1, f2, s2)
        t1 += s1
        t2 += s2
}
END {   printf("SUM %s=%d SUM %s=%d\n", f1, t1, f2, t2)
}' input

which produces the following output when given the same input:
Code:
USU-INOCT=12950 USU-OUTOCT=39321
USU-INOCT=139072 USU-OUTOCT=818813
USU-INOCT=122714 USU-OUTOCT=902004
USU-INOCT=121962 USU-OUTOCT=902846
USU-INOCT=52 USU-OUTOCT=0
SUM USU-INOCT=396750 SUM USU-OUTOCT=2662984

I sincerely apologize for any confusion this may have caused.

- Don
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Data extraction from .xml file

Hello, I'm attempting to extract 13 digit numbers beginning with 978 from a data file with the following command: awk '{ for(i=1;i<=NF;i++) if($i ~ /^978/) print $i; }' datafile > outfile This typically works. However, the new data file is an .xml file, and this command is no longer working... (6 Replies)
Discussion started by: palex
6 Replies

2. Shell Programming and Scripting

CSV file data extraction

Hi I am writing a shell script to parse a CSV file , in which i am facing a problem to separate the columns . Could some one help me with it. IN301330/00001 pvavan kumar limited xyz@ttccpp.com IN302148/00002 PRECIOUS SECURITIES (P) LTD viash@yahoo.co.in IN300239/00000 CENTRE india... (8 Replies)
Discussion started by: nanduri
8 Replies

3. Shell Programming and Scripting

data extraction from a file

Hi Friends, I have a file1.txt in the following format File1.txt I want to get 2 files from the above file filextra.txt should have the lines which are ending with "<" and remaining lines in the filecompare.txt file. Please help. (3 Replies)
Discussion started by: i150371485
3 Replies

4. Shell Programming and Scripting

Data and return code extraction

Hello everybody, Another day another problem. I have to create a script which collects data from 3 csv files. I would like the script to check file1 which contains different values for the date entered previously. As you can see 01/12/2010 contains actions in the first field and in the... (7 Replies)
Discussion started by: freyr
7 Replies

5. Shell Programming and Scripting

Extraction of data from multiple text files, and creation of a chart

Hello dear friends, My problem as explained below seems really basic. Fact is that I'm totally new to programming, and have only a week to produce a script ( CShell or Perl ? ) to perform this action. While searching on the forums, I found a command that could help me, but I don't know... (2 Replies)
Discussion started by: ackheron
2 Replies

6. Shell Programming and Scripting

Selective extraction of data from a files

Hi, I would like to seek for methods to do selective extraction of line from a file. The scenario as follows: I have a file with content: message a received on 11:10:00 file size: 10 bytes send by abc message b received on 11:20:00 file size: 10 bytes send by abc (3 Replies)
Discussion started by: dwgi32
3 Replies

7. Shell Programming and Scripting

Another data extraction question

Hi, I have a tmp file like below: <ADATA> ANUM=900 ADESC=Saving ATYP=0 TXREGD=0 </ADATA> <ADATA> ANUM=890 ADESC=Saving ATYP=0 ABAL=9000 TXREGD=1 </ADATA> <ADATA> (6 Replies)
Discussion started by: kunigirib
6 Replies

8. Shell Programming and Scripting

Data Extraction From a File

Hi All, I have a requirement where I have to search the file with some text say "Exception". This exception word can be repeated for more than 10 times. Suppose the "Exception" word is repeated at line numbers say x=10, 50, 60, 120. Now I want to extract all the lines starting from x-5 to... (3 Replies)
Discussion started by: rrangaraju
3 Replies

9. Shell Programming and Scripting

help with data extraction script

Hello all, Iam newbie here and to unix programming. I have the following text file. A:Woshington,B:London,C:Paris,D:Manchester,C:Lisbon,E:Cape town. Now I would like extract this and store in database. here is the script I have tried but it did work. CITY1:`echo "$text" | grep "A:"... (11 Replies)
Discussion started by: mam
11 Replies

10. Shell Programming and Scripting

Data Extraction issue.

I have a small problem, I have written a following script, which extracts all the rows from source file which starts with T101 and writes it to another file mydata.dat Script my_script #!/bin/ksh YMONTH=$1 dir1='/home/data' dir2='/clients/source_file' cd $dir1 grep "T101"... (5 Replies)
Discussion started by: irehman
5 Replies