filtering one file based on results from other- AGAIN


 
# 8  
Old 12-15-2008
Do you want the previous output plus the total, or only the total?

This will give you only the total:

Code:
awk 'END {
  # print every key that matched more than once, with its running total
  while (++i <= c)
    if (u[o[i]] > 1)
      printf "%s %.2f\n", o[i], t[o[i]]
  }
NR == FNR {
  # first file: remember the cutoff value for each key
  _[$1] = $2
  next
  }
$1 in _ && $4 <= _[$1] {
  # second file: keep the keys in order of first appearance and total the last field
  o[u[$1]++ ? c : ++c] = $1
  t[$1] += $NF
}' file1 file2
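
In case the two-file idiom above is unfamiliar: NR == FNR is true only while awk reads the first file, so that block loads file1 into an array and the remaining rules run only against file2. A minimal stand-alone sketch of the same pattern (using the same file1 and file2 as above):

Code:
# pass 1 (file1): store a limit per key; pass 2 (file2): print the rows within that limit
awk 'NR == FNR { max[$1] = $2; next }
     $1 in max && $4 <= max[$1]' file1 file2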


Last edited by radoulov; 12-15-2008 at 09:26 AM..
# 9  
Old 12-15-2008
Rephrase

Actually, let me rephrase my question. I hope I am not annoying you with this; I put my question wrongly last time.

What I actually want, for each value in the first column, is the difference between the first and the last value in the fourth column. It does not have to be in the same script you originally wrote; it can be a second script that I run after your earlier script that did the magic for me. That script outputs the result as follows:

DATA_444_0 299659.88 2686034.50 -5222.89
DATA_444_0 299646.31 2686026.00 -5226.55
DATA_444_0 299634.50 2686018.50 -5229.11
DATA_444_0 299622.41 2686010.75 -5230.46
DATA_451_0 299369.53 2684876.00 -5191.90
DATA_451_0 299357.28 2684869.25 -5194.87
DATA_451_0 299332.78 2684855.50 -5197.94


I want the difference between the last and the first value in the last (4th) column, grouped by the first column. So the second script (if you can write it as a separate one) should give me the following result:

DATA_444_0 -7.57 # (-5230.46 - (-5222.89))
DATA_451_0 -6.04 # (-5197.94 - (-5191.90))

............
............


That is, each time the value in the first column changes, I want the difference between the last and the first record's 4th-column value for that group, output as above.
I hope I have described the question appropriately. I will remember you in my prayers for helping me with this.
# 10  
Old 12-15-2008
No problem.
If you want to run two separate scripts:

Code:
awk 'END {
  # print the last group: key, last value minus first value
  print k, l - f
  }
!_[$1]++ {
  # a new key starts: print the previous group, if any
  if (k && l) print k, l - f
  k = $1; f = $NF
  }
{ l = $NF }'

So, given your sample data, it would be something like this.

- the first one:

Code:
$ awk 'END {
  for (i = 1; i <= c; i++) {
    split(r[i], t)
    if (u[t[1]] > 1)
      print r[i]
    }
  }
NR == FNR {
  _[$1] = $2
  next
  }
$1 in _ && $4 <= _[$1] {
  r[++c] = $0
  u[$1]++
}' file1 file2
DATA_444_0 299659.88 2686034.50 -5222.89
DATA_444_0 299646.31 2686026.00 -5226.55
DATA_444_0 299634.50 2686018.50 -5229.11
DATA_444_0 299622.41 2686010.75 -5230.46
DATA_451_0 299369.53 2684876.00 -5191.90
DATA_451_0 299357.28 2684869.25 -5194.87
DATA_451_0 299332.78 2684855.50 -5197.94

- both:

Code:
$ awk 'END {
  for (i = 1; i <= c; i++) {
    split(r[i], t)
    if (u[t[1]] > 1)
      print r[i]
    }
  }
NR == FNR {
  _[$1] = $2
  next
  }
$1 in _ && $4 <= _[$1] {
  r[++c] = $0
  u[$1]++
}' file1 file2 | awk 'END {
  print k, l - f
  }
!_[$1]++ {
  if (l) print k, l - f
  k = $1; f = $NF
  }
{ l = $NF }'
DATA_444_0 -7.57
DATA_451_0 -6.04

And of course, you can put all in one script.
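
Something like this (untested, and assuming the records for each key are grouped together, as in your sample) would combine the two steps into a single awk invocation:

Code:
awk 'NR == FNR { _[$1] = $2; next }              # pass 1: limits from file1
$1 in _ && $4 <= _[$1] {                         # pass 2: matching records from file2
  if (!u[$1]++) { o[++c] = $1; f[$1] = $NF }     # first match: remember key order and first value
  l[$1] = $NF                                    # always remember the latest value
  }
END {
  for (i = 1; i <= c; i++)
    if (u[o[i]] > 1)
      print o[i], l[o[i]] - f[o[i]]
}' file1 file2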

Last edited by radoulov; 12-16-2008 at 09:40 AM.. Reason: corrected
# 11  
Old 12-16-2008
Thank you AWK GURU

Dear RADOULOV,

If awk has a GOD, it is you...
How can I thank you for helping me? I don't have words.
Will you kill me or just ignore me if I ask you to help me with one last formatting request? Please...

The combined code you wrote, i.e. the one that does both jobs in one go, works perfectly. Unfortunately I overlooked something and forgot to ask: the output has to be changed slightly. The first part of your script (when using the combined version) outputs the following result.

DATA_444_0 299659.88 2686034.50 -5222.89
DATA_444_0 299646.31 2686026.00 -5226.55
DATA_444_0 299634.50 2686018.50 -5229.11
DATA_444_0 299622.41 2686010.75 -5230.46 <= 2nd and 3rd col from the last row are required in the final result
DATA_451_0 299369.53 2684876.00 -5191.90
DATA_451_0 299357.28 2684869.25 -5194.87
DATA_451_0 299332.78 2684855.50 -5197.94 <= 2nd and 3rd col from the last row are required in the final result

The second part of your code produces this result, which is perfect (based on my previous request):

DATA_444_0 -7.57 # (-5230.46 - (-5222.89))
DATA_451_0 -6.04 # (-5197.94 - (-5191.90))

However, I want to include the last row's 2nd and 3rd columns in this final output for every unique value in the first column. So instead of the above, I would want the result to look like this:

DATA_444_0 299622.41 2686010.75 -7.57 # (2nd and 3rd columns come from the last row of each unique value in the first column)
DATA_451_0 299332.78 2684855.50 -6.04 # (2nd and 3rd columns come from the last row of each unique value in the first column)

Note that the 2nd and 3rd columns in the above result come from the last row of each unique value in the first column. Also, it would be great if I could have the final result formatted, though that is not necessary as I can do it after I get the above result... that is all the awk I know. I hope you will help me as you have done before, and I will pray for a healthy, safe and prosperous life for you.

A complete newbie to Linux / Unix...
# 12  
Old 12-16-2008
If I'm not missing something:

Code:
awk 'END {
  # last group: split its final record, print key, col 2, col 3 and the last-first difference
  split(l, t)
  print k, t[2], t[3], t[4] - f
  }
!_[$1]++ {
  # a new key starts: report the previous group, if any
  if (l) {
    split(l, t)
    print k, t[2], t[3], t[4] - f
    }
  k = $1; f = $NF
  }
{ l = $0 }'

So the result would be:

Code:
$ awk 'END {
  for (i = 1; i <= c; i++) {
    split(r[i], t)
    if (u[t[1]] > 1)
      print r[i]
    }
  }
NR == FNR {
  _[$1] = $2
  next
  }
$1 in _ && $4 <= _[$1] {
  r[++c] = $0
  u[$1]++
}' file1 file2 | awk 'END {
  split(l, t)
  print k, t[2], t[3], t[4] - f
  }
!_[$1]++ {
  if (l) {
    split(l, t)
    print k, t[2], t[3], t[4] - f
    }
  k = $1; f = $NF
  }
{ l = $0 }'
DATA_444_0 299622.41 2686010.75 -7.57
DATA_451_0 299332.78 2684855.50 -6.04
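
If you do want the final result formatted (you mentioned it is optional), the two print statements in the second awk can be switched to printf; a small sketch of that variant, assuming two decimals are wanted for the numeric columns:

Code:
awk 'END {
  split(l, t)
  printf "%s %.2f %.2f %.2f\n", k, t[2], t[3], t[4] - f
  }
!_[$1]++ {
  if (l) {
    split(l, t)
    printf "%s %.2f %.2f %.2f\n", k, t[2], t[3], t[4] - f
    }
  k = $1; f = $NF
  }
{ l = $0 }'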

# 13  
Old 12-17-2008
Thank you very much

Thanks a lot. Your script does the trick for me. I greatly appreciate your patience in helping me out. Once again, you are a real awk guru and a really nice person. I wish you all the success in life. God bless.
If you don't mind, may I ask where in the world you are located?
# 14  
Old 12-17-2008
Thank you for the nice words, but I'm not an AWK guru.
I know programmers who know and use AWK far better than I do.

Look at the Location above: I'm Bulgarian, but I live and work in Italy.

Regards