Count multiple columns and print original file
# 1  
Old 05-26-2017
Compare multiple columns and print original file

Hello, I have two tab-delimited files with headers.

File1: with 4 columns
Code:
header1    header2    header3    header4
44    a    bb    1
57    c    ab    4
64    d    d    5

File2: with 26 columns

Code:
header1..    header5    header6    header7 ... header 22...header26
id1 44    a    bb    
id2 57    c    ab    
id3 64    d    d
id4 103  e g

Output
Code:
header1..    header5    header6    header7 ... header 22...header26
id2.. 57    c    ab...    4 ...
id3.. 64    d    d... 5 ...
id4.. 103  e g ...   Unknown ...

I want to compare File1.$1$2$3 with File2.$5$6$7 and print the File1.$4 value in column 22 of File2, keeping a row only if that value is '4' or '5', or if the row has no entry in File1.

I started by trying to compare the columns to see whether that produces any output, but I have been stuck since.

Code:
awk -F, 'NR == FNR {
  a[$1FS$2FS$3] = $5FS$6$FS7; next 
  }
$4 in a {
  print $0, a[$4]
  }' OFS='\t' file1.txt file2.txt > output.txt

Any help is appreciated. Thank you

Last edited by nans; 05-26-2017 at 08:16 AM..
# 2  
Old 05-26-2017
Try something like this:
Code:
awk '
  NR == FNR {
    a[$1,$2,$3] = $4
    next 
  }

  ($5,$6,$7) in a {
    if ($22 == a[$5,$6,$7]){
      print
    }
    next
  }

  {
    $22="Unknown"
    print
  }
' FS='\t' OFS='\t' file1.txt file2.txt > output.txt
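
A note on the array keys used above: in awk, `a[$1,$2,$3]` joins the subscripts with the built-in `SUBSEP` character into a single string key, and `($5,$6,$7) in a` tests for that key without creating an element. A minimal sketch, with the literal values borrowed from the sample rows in post #1 and the variable name `result` chosen only for illustration:

```shell
# Composite keys: the subscripts are joined with SUBSEP into one string.
result=$(awk 'BEGIN {
  a["57", "c", "ab"] = 4                  # same as a["57" SUBSEP "c" SUBSEP "ab"]
  if (("57", "c", "ab") in a) print "match:", a["57", "c", "ab"]
  if (!(("103", "e", "g") in a)) print "no entry"
  n = 0; for (k in a) n++                 # the "in" test above did not add a key
  print "keys:", n
}')
printf '%s\n' "$result"
```

This is why no explicit separator has to be concatenated into the key by hand, as long as the data itself never contains the `SUBSEP` character.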

# 3  
Old 05-26-2017
@Scrutinizer
Thank you, but it seems to print 'Unknown' for everything, and not the matching values of 4/5 where they are present in File1.
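
One plausible explanation, assuming column 22 of File2 is empty (or at least never equal to the File1 frequency) before processing: the script in post #2 prints a matched row only when `$22` already equals the stored value, so every matched row is dropped and only the unmatched rows come through, labelled Unknown. The sketch below reproduces that effect with a hypothetical reduced layout, where the key is `$1` and `$2` stands in for column 22; the file names and the variable `only_unknown` are made up for the example.

```shell
tmp=$(mktemp -d)
printf '57\t4\n64\t5\n'      > "$tmp/ref.txt"   # reference: key, frequency
printf '57\t\n64\t\n103\t\n' > "$tmp/qry.txt"   # query: key, empty target column

only_unknown=$(awk '
  NR == FNR { a[$1] = $2; next }               # load the reference file
  $1 in a   { if ($2 == a[$1]) print; next }   # empty field never equals 4 or 5: row dropped
            { $2 = "Unknown"; print }          # rows without a reference entry
' FS='\t' OFS='\t' "$tmp/ref.txt" "$tmp/qry.txt")
printf '%s\n' "$only_unknown"
rm -rf "$tmp"
```

Only the row with key 103 survives, which matches the symptom described above.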
# 4  
Old 05-26-2017
Your requirements are not immediately clear to me. Perhaps this is more what you mean:
Code:
awk '
  NR == FNR {
    a[$1,$2,$3] = $4
    next 
  } 

  FNR==1 {
    print
    next
  }

  {
    if(($5,$6,$7) in a) {
      $22 = a[$5,$6,$7]
    }
    else {
      $22="Unknown"
    }
    print
  }
' FS='\t' OFS='\t' file1 file2

# 5  
Old 05-26-2017
Sorry, maybe I should have been clearer. File1, which has 4 columns, is my reference file. Its 4th column holds frequency values from 1 to 5.

The second file, with 26 columns, is my query file. In the output file I want every row whose frequency (File1.$4) is either missing (i.e. the row is not in File1) or is 4 or 5.

To put it the other way around, I want to delete all rows that have a frequency value of 1, 2 or 3 and keep everything else, matching File1.$1$2$3 against File2.$5$6$7.

Thank you
# 6  
Old 05-26-2017
OK, so like this?
Code:
awk '
  NR == FNR {
    a[$1,$2,$3] = $4
    next 
  } 

  FNR==1 {
    print
    next
  }

  ($5,$6,$7) in a {
    if (a[$5,$6,$7]>3){
      $22=a[$5,$6,$7]
      print
    }
    next
  }

  {
    $22="Unknown"
    print
  }
' FS='\t' OFS='\t' file1 file2
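
For anyone wanting to try this without the real data, here is a rough end-to-end sketch. It assumes File2 really has 26 tab-separated columns with the key in columns 5 to 7; the "." filler cells, the generated header names, the temporary directory and the `summary` variable are placeholders invented for the test, not part of the original files.

```shell
tmp=$(mktemp -d)

# File1: the 4-column reference file from post #1.
printf 'header1\theader2\theader3\theader4\n44\ta\tbb\t1\n57\tc\tab\t4\n64\td\td\t5\n' > "$tmp/file1"

# File2: 26 columns, id in column 1, key in columns 5-7, "." everywhere else.
printf 'id1 44 a bb\nid2 57 c ab\nid3 64 d d\nid4 103 e g\n' |
awk -v OFS='\t' '
  BEGIN { h = "header1"; for (i = 2; i <= 26; i++) h = h OFS "header" i; print h }
  { row = $1
    for (i = 2; i <= 26; i++) row = row OFS (i >= 5 && i <= 7 ? $(i - 3) : ".")
    print row }
' > "$tmp/file2"

# The script from this post, unchanged apart from the file paths.
awk '
  NR == FNR { a[$1,$2,$3] = $4; next }    # load frequencies keyed on columns 1-3
  FNR == 1  { print; next }               # keep the File2 header
  ($5,$6,$7) in a {
    if (a[$5,$6,$7] > 3) { $22 = a[$5,$6,$7]; print }   # keep only 4 and 5
    next
  }
  { $22 = "Unknown"; print }              # no entry in File1
' FS='\t' OFS='\t' "$tmp/file1" "$tmp/file2" > "$tmp/output.txt"

# Show just the id, the key columns and column 22.
summary=$(cut -f1,5-7,22 "$tmp/output.txt")
printf '%s\n' "$summary"
rm -rf "$tmp"
```

With this data the id1 row (frequency 1) is dropped, id2 and id3 get 4 and 5 in column 22, and id4 gets Unknown, which agrees with the desired output in post #1.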

# 7  
Old 05-31-2017
Yes, thank you very much.