extract data and display the missing value


 
# 1  
Old 04-28-2011

Hi,

I have a file with thousands of lines of data that need to be sorted out. My file looks like this:

File1.txt
Code:
  
condition 1
scaf_27   CDS        48317   48517  "e_gww2.27.12.1"    Id 35277
scaf_27   stop_cod   48317   48319  "e_gww2.27.12.1"
scaf_27   CDS        48518   49107  "e_gww2.27.12.1"    Id 135277  
scaf_27   CDS        49159   49527  "e_gww2.27.12.1"    Id 135277                              

condition 2
scaf_27   start_cod  132050  132052  C_scaf_27000026"        
scaf_27   CDS        132729  132788  C_scaf_27000026"   Id 9489
scaf_27   CDS        132829  132956  C_scaf_27000026"   Id 9489
scaf_27   CDS        133017  133411  C_scaf_27000026"   Id 9489

condition 3              
scaf_27   CDS        70283   70452   "gww2.27.28.1"     Id 43177
scaf_27   CDS        70500   70914   "gww2.27.28.1"     Id 43177

condition 4
scaf_27   CDS        51556   51986   C_scaf_27000005"   Id 9468     
scaf_27   start_cod  51556   51558   C_scaf_27000005"    
scaf_27   CDS        52048   52114   C_scaf_27000005"   Id 9468   
scaf_27   CDS        52168   52491   C_scaf_27000005"   Id 9468
scaf_27   stop_cod   55218   55220   C_scaf_27000005"

From this file, I need to extract the Ids that have
1) only stop_cod
2) only start_cod
3) neither start_cod nor stop_cod

An Id that has both start_cod and stop_cod (condition 4) should be ignored.

The output should display the missing value like this:-

if in condition 1

scaf_27 Id 35277 start_cod

if in condition 2
scaf_27 Id 9489 stop_cod

if in condition 3
scaf_27 Id 43177 start_cod & stop_cod

condition 4 should be ignored as it has both start_cod & stop_cod

output.txt
Code:
scaf_27     Id 35277    start_cod
scaf_27     Id 9489     stop_cod
scaf_27     Id 43177    start_cod & stop_cod

I have no idea how to do this. If I get a solution, it would save me a lot of time and I could apply it to other work as well. I would appreciate your help with this. Thanks.
# 2  
Old 04-28-2011
I don't quite understand what "condition" means in your input... Nevertheless:

Try this out:
Code:
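#!/usr/bin/awk -f

# Blocks are separated by lines containing the word "condition".
# For each block, report which of start_cod / stop_cod is missing.
BEGIN {
  a[0] = "start_cod"   # printed when no start_cod was seen (start == 0)
  b[0] = "stop_cod"    # printed when no stop_cod was seen (stop == 0)
  start = 5            # sentinel: nothing to report before the first block
}
/condition/ {
  if ((start + stop) < 2) {
    print first " Id " id " " a[start] " " b[stop]
  }
  start = 0
  stop = 0
  id = ""
  next
}
NF { first = $1 }            # remember the first column of every data line
/stop_cod/  { stop = 1 }
/start_cod/ { start = 1 }
$7 ~ /^[0-9]+$/ { id = $7 }  # the Id value is assumed to be the 7th field
END {
  if ((start + stop) < 2) {
    print first " Id " id " " a[start] " " b[stop]
  }
}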

Save the above in a file, let's name it doIt.awk, then make it executable
Code:
chmod 754 doIt.awk

and invoke with
Code:
./doIt.awk data

This script assumes the Id is always in the 7th column and is the same for the whole block.
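
If you are unsure whether that assumption holds for your file, a quick check along these lines may help. It is only a sketch: the function name check_id_column is made up for illustration, and it simply lists every line where the word Id is not the 6th field followed by a number in the 7th.

```shell
# Sketch only: check_id_column is a hypothetical helper name.
# Prints each line where "Id" appears but is not field 6 with a numeric field 7.
check_id_column() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "Id" && (i != 6 || $7 !~ /^[0-9]+$/))
        print "line " FNR ": " $0
  }' "$@"
}
# Usage: check_id_column data
```

No output means every Id sits where the script expects it.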
# 3  
Old 04-29-2011
Hi mirni,

Thanks so much for your prompt response. Actually, the word "condition" does not exist in the input file; I added it to make the example easier to follow.

I will try your method above and see how it goes. Thanks.

---------- Post updated 04-29-11 at 11:42 AM ---------- Previous update was 04-28-11 at 08:27 PM ----------

Hi mirni,

I tried a couple of times, but it just gives me a blank output file.
# 4  
Old 04-29-2011
The script uses the word 'condition' in the input file to separate the blocks of data. If you don't have the word 'condition' there, it's not gonna work. Please post the input data as is, so that I can adjust the script to make it work.
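
In the meantime, one possible workaround is to generate the separator lines automatically. This is only a sketch, and it assumes that all rows of one block share the same value in the 5th column, which is a guess based on the sample in the first post. The function name add_markers is made up for illustration.

```shell
# Sketch only: add_markers is a hypothetical helper name.
# Assumption: rows of the same block share the same 5th column.
# Emits a "condition" line before each run of rows with a new 5th column.
add_markers() {
  awk 'NF >= 5 && $5 != prev { print "condition"; prev = $5 }
       { print }' "$@"
}
# Usage: add_markers data | ./doIt.awk
```

This only works if the rows of a block are adjacent in the file; a block whose rows are scattered would be split into several blocks.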
# 5  
Old 04-29-2011
Hi,

A sample of the input file is as follows:

Code:
scaf_27   CDS        48317   48517  "e_gww2.27.12.1"    Id 35277
scaf_27   stop_cod   48317   48319  "e_gww2.27.12.1"
scaf_27   CDS        48518   49107  "e_gww2.27.12.1"    Id 135277  
scaf_27   CDS        49159   49527  "e_gww2.27.12.1"    Id 135277                              
scaf_27   start_cod  132050  132052  C_scaf_27000026"        
scaf_27   CDS        132729  132788  C_scaf_27000026"   Id 9489
scaf_27   CDS        132829  132956  C_scaf_27000026"   Id 9489
scaf_27   CDS        133017  133411  C_scaf_27000026"   Id 9489
scaf_29   CDS        70283   70452   "gww2.27.28.1"     Id 43177
scaf_29   CDS        70500   70914   "gww2.27.28.1"     Id 43177
scaf_27   CDS        51556   51986   C_scaf_27000005"   Id 9468     
scaf_27   start_cod  51556   51558   C_scaf_27000005"    
scaf_27   CDS        52048   52114   C_scaf_27000005"   Id 9468   
scaf_27   CDS        52168   52491   C_scaf_27000005"   Id 9468
scaf_27   stop_cod   55218   55220   C_scaf_27000005"

and the output.txt file should be like this:-

Code:
scaf_27     Id 35277    start_cod
scaf_27     Id 9489     stop_cod
scaf_29     Id 43177    start_cod & stop_cod

thanks..
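
One way this could be done for the input above, without any "condition" separator lines, is sketched here. It is not from the thread and rests on two assumptions: that all rows of one gene share the same 5th column, and that the first Id seen for a gene is the one to report (the first block lists both 35277 and 135277, and the expected output shows 35277). The function name find_missing is made up for illustration.

```shell
# Sketch only: find_missing is a hypothetical helper name.
# Assumptions: rows of one gene share the same 5th column,
# and the first Id seen for a gene is the one to report.
find_missing() {
  awk '
    NF < 5 { next }                            # skip blank or short lines
    !($5 in scaf) {                            # new gene: remember order and column 1
      order[++n] = $5
      scaf[$5] = $1
    }
    $2 == "start_cod" { has_start[$5] = 1 }
    $2 == "stop_cod"  { has_stop[$5] = 1 }
    $6 == "Id" && !($5 in id) { id[$5] = $7 }  # keep only the first Id per gene
    END {
      for (i = 1; i <= n; i++) {
        g = order[i]
        if ((g in has_start) && (g in has_stop)) continue   # complete: ignore
        if (!(g in has_start) && !(g in has_stop))
          missing = "start_cod & stop_cod"
        else if (!(g in has_start))
          missing = "start_cod"
        else
          missing = "stop_cod"
        printf "%-11s Id %-8s %s\n", scaf[g], id[g], missing
      }
    }
  ' "$@"
}
# Usage: find_missing File1.txt > output.txt
```

Because the rows are grouped by the 5th column rather than by position, the rows of a gene do not need to be adjacent in the file. Genes are reported in the order in which they first appear.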