Get both common and missing values from multiple files


# 1  
Hi,

I have 5 files with two columns each. I need to merge all 5 files on column 1. If a column 1 value is missing from any of the files, the corresponding 2nd column in the output should be replaced by a missing value.

I know how to do this for 2 files, but how can I implement it for 5 files? I tried this for 5 files, but it does not work:

Code:
awk 'FNR==NR{a[$2]=$3;next}{print $0,a[$2]?a[$2]:"NA"}' file2 file1

input1:
Code:
ENSG00001 22
ENSG00002 20
ENSG00003 40

input2:
Code:
ENSG00001 22
ENSG00004 22
ENSG00005 22

input3:
Code:
ENSG00000 22
ENSG00002 20
ENSG00003 40

input4:
Code:
ENSG00002 22
ENSG00004 22
ENSG00005 22

output:
Code:
ENSG00000    22 
ENSG00001 22 22   
ENSG00002 22  20 22
ENSG00003 40  40 
ENSG00004 22  22
ENSG00005  22  22

Missing values should be NULL.

Thanks,

Moderator's Comments:
Mod Comment: Please use CODE tags for code as well.

Last edited by Scrutinizer; 05-21-2014 at 04:55 PM..
# 2  
Well, here is a suggestion for the logic (no code yet):
  • Get all the unique values from all the files into one list.
  • Work out which values are missing from each file and write adjustment records for them.
  • Wrap it all together. Does the output need to be sorted? That might make it easier.
Does that seem a sensible approach?
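
For illustration, the three steps above could be sketched in a single awk pass. This is only a sketch under assumptions: the file names input1 to input4, the placeholder string NULL, a single space as output separator, and sorted output are all guesses, since none of them has been confirmed in this thread. The sample data is copied from post #1.

```shell
#!/bin/sh
# Sketch only: file names, the NULL placeholder and the sort step are assumptions.
workdir=$(mktemp -d)

# Sample inputs copied from post #1.
printf '%s\n' 'ENSG00001 22' 'ENSG00002 20' 'ENSG00003 40' > "$workdir/input1"
printf '%s\n' 'ENSG00001 22' 'ENSG00004 22' 'ENSG00005 22' > "$workdir/input2"
printf '%s\n' 'ENSG00000 22' 'ENSG00002 20' 'ENSG00003 40' > "$workdir/input3"
printf '%s\n' 'ENSG00002 22' 'ENSG00004 22' 'ENSG00005 22' > "$workdir/input4"

awk -v missing=NULL '
    FNR == 1 { nfiles++ }          # a new input file starts (assumes no empty files)
    {
        keys[$1]                   # step 1: remember every key seen in any file
        val[$1, nfiles] = $2       # the value of this key in the current file
    }
    END {
        for (k in keys) {
            line = k
            for (i = 1; i <= nfiles; i++)
                # step 2: fill in the placeholder where a file had no such key
                line = line OFS (((k, i) in val) ? val[k, i] : missing)
            print line             # step 3: one output column per input file
        }
    }
' "$workdir/input1" "$workdir/input2" "$workdir/input3" "$workdir/input4" |
    sort > "$workdir/merged.txt"   # for-in order is unspecified in awk, so sort

cat "$workdir/merged.txt"
```

With the sample inputs, the first output line is `ENSG00000 NULL NULL 22 NULL`. Passing a different `missing` value, or adding more file names to the awk command line, should adapt the sketch to other placeholders and to 5 files.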

How many output files do you want? Is it one output for each input?

Robin
# 3  
I'm confused. You said that you tried:
Code:
awk 'FNR==NR{a[$2]=$3;next}{print $0,a[$2]?a[$2]:"NA"}' file2 file1

to handle five files, but you only gave this script two files?

The code above seems to be using the 2nd field (not the 1st) as the common key between files and using the 3rd field (not the 2nd) as the data for that key. This is probably why this wasn't doing what you wanted for two files.
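
As a hedged illustration of that point, here is what the two-file version might look like with the 1st field as the key and the 2nd field as the data. The file contents are the first two sample inputs from post #1; the names file1 and file2 come from the original command, and the temporary directory is only there to keep the example self-contained.

```shell
#!/bin/sh
# Sketch only: uses the first two sample inputs from post #1 as file1 and file2.
workdir2=$(mktemp -d)
printf '%s\n' 'ENSG00001 22' 'ENSG00002 20' 'ENSG00003 40' > "$workdir2/file1"
printf '%s\n' 'ENSG00001 22' 'ENSG00004 22' 'ENSG00005 22' > "$workdir2/file2"

# Pass 1 (file2): store field 2 under the key found in field 1.
# Pass 2 (file1): print each line plus the stored value, or NA if the key is absent.
awk 'FNR == NR { a[$1] = $2; next }
     { print $0, (($1 in a) ? a[$1] : "NA") }' \
    "$workdir2/file2" "$workdir2/file1" > "$workdir2/joined.txt"

cat "$workdir2/joined.txt"
```

Note that this form only prints keys that occur in file1; keys found only in file2 (ENSG00004 and ENSG00005 here) never appear in the output, so the idiom does not extend directly to a full merge of many files.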

Your sample output shows empty fields (with a single space field separator) when a field was missing; you said "missing fields should be NULL" (which could be interpreted to mean you want a null string or that you want the literal string "NULL"); but the code above seems to be trying to put out the literal string NA when the data is an empty string, 0, or missing.
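
To make the last point concrete, the following small sketch compares the truthiness test used in the original command with a membership test using awk's `in` operator. The keys k0 and k1 and their values are made-up data for illustration only.

```shell
#!/bin/sh
# Sketch only: k0 and k1 are made-up keys; k0 deliberately has the value 0.
workdir3=$(mktemp -d)

printf '%s\n' 'k0 0' 'k1 5' |
awk '{ a[$1] = $2 }
     END {
         # Truthiness test: a stored 0 counts as false, so it is reported as NA.
         by_value = a["k0"] ? a["k0"] : "NA"
         # Membership test: only keys that were never stored become NA.
         by_key = ("k0" in a) ? a["k0"] : "NA"
         print by_value, by_key
     }' > "$workdir3/demo.txt"

cat "$workdir3/demo.txt"
```

This prints `NA 0`: the value test loses the real 0, while the membership test keeps it.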

So, I can guess that you want one output file with one output column for each input file.

But it isn't clear if you care whether or not the output is sorted on the 1st output column. Do you care if the output is sorted?

And it isn't clear what you want for output if the data in the 2nd field of an input file is missing, is an empty string, or is a numeric string that evaluates to 0:
  • If there is no line in an input file for a 1st field value, do you want the output to be an empty string, the string NULL, or the string NA?
  • If there is a line in an input file with a given 1st field value but there is no second field on that line in the input file, do you want the output to be an empty string, the string NULL, or the string NA?
  • If there is a line in an input file with a given 1st field and the 2nd field is a string of one or more zeros, do you want the output to be an empty string, the string NULL, the string NA, or the string of one or more zeros found on the corresponding input line?

What do you want to use as the output field separator?