Merge multiple tab delimited files with index checking Post: 302988715

Post 302988715 by RavinderSingh13 on Sunday 1st of January 2017 01:40:17 AM

Quote:

Originally Posted by RudiC

Try this - very specific to your problem, not as versatile and flexible as Chubler_XL's script - little awk proposal:

Code:

awk '
NR == 1         {HD = $1
                }
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }

                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX 
                }

FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }

                {OUT[IX]  = OUT[IX] OFS OFS
                }

END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
' OFS="\t" A_*_pred.txt
Output:
Id    Name    E0_f0    Name    E0_f1    Name    E0_f3
1    N(,)'1    0.2904    N(,)'1    0.2916    N(,)'1    0.2581    
2    N(,)'2    0.3180    N(,)'2    0.3123    N(,)'2    0.2903    
3    N(,)'3    0.3277    N(,)'3    0.3234    N(,)'3    0.2988    
4    N(,)'4    0.3675    N(,)'4    0.3475    N(,)'4    0.3496    
5    N(,)'5    0.3456    N(,)'5    0.3294    N(,)'5    0.3390

Thanks a lot, RudiC, for this nice script. I know this post is a few days old, but I wanted to add an explanation here so that everybody can benefit from this code snippet.

Code:

awk '
NR == 1         {HD = $1
#### NR == 1 is TRUE only for the very first line of the very first file.
#### Store $1 (the Id column heading) in the variable HD; this starts the output header line.
                }
#### FNR == 1 is TRUE for the first line of every file, because FNR is reset each time awk starts reading the next file.
#### split() breaks the name of the current input file (FILENAME) on "_" and stores the pieces in an array named T.
#### $3 and $4 "_" T[2] are then appended to HD, so after the first file HD is Id Name E0_f0 (separated by OFS), and every further input file adds its own pair of headings.
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }
#### This block has no pattern, so it runs for every line. IX is set to FNR - 1, i.e. the row number within the current file, with the header line counted as 0.
#### MAX keeps the largest IX seen so far, i.e. the number of data lines in the longest input file: if IX is greater than MAX, MAX takes the value of IX, otherwise MAX is left unchanged.
                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX
                }
#### FNR == NR is TRUE only while the very first file is being read. For each of its lines, $1 is stored in an array named ID under index IX, so it will be like
#### ID[0]=Id, ID[1]=1 and so on.
#### In the same way $3 is stored in an array named NAME under index IX, so its values will be NAME[0]=Name, NAME[1]=N(,)'1 and so on.
FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
#### Check whether $1 equals ID[IX] and $3 equals NAME[IX], i.e. whether the current line matches the first file at the same row position.
#### If so, append $3 and $4, each followed by OFS, to the array element OUT[IX]; next then skips all remaining rules for this line.
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }
#### This rule is reached only when the condition above is NOT TRUE, i.e. $1 or $3 does not match the first file at the current IX.
#### In that case two OFS are appended to OUT[IX], so this file contributes two empty fields and the following columns stay aligned.
                {OUT[IX]  = OUT[IX] OFS OFS
                }
#### After all files have been read, print HD (the combined heading of all files), then loop from 1 to MAX (the largest number of data lines in any file)
#### and print ID[i] followed by OUT[i]. Index 0 belongs to the header lines and is skipped.
END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
#### Set the output field separator OFS to a tab and list all the files that awk should read.
' OFS="\t" A_*_pred.txt
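
The conditions this explanation relies on (NR == 1, FNR == 1 and FNR == NR) may be easier to see on a tiny example. The following is only a sketch: the files one.txt and two.txt and their contents are made up for this demonstration and do not come from the thread. It prints NR and FNR for every line, and then shows what split() produces for a file name shaped like the ones used above.

```shell
#!/bin/sh
# Hypothetical sample files, created in a scratch directory for this demo only.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\n' > "$tmp/two.txt"

# For every line print the file name, NR (running total over all files),
# FNR (restarts at 1 for each file) and whether FNR == NR still holds.
counters=$(cd "$tmp" && awk '
    { print FILENAME, NR, FNR, (FNR == NR ? "first-file" : "later-file") }
' one.txt two.txt)
printf '%s\n' "$counters"

# split() on "_" for a file name of the form used in the thread:
# it returns the number of pieces and fills T[1], T[2], T[3].
parts=$(awk 'BEGIN { n = split("A_f0_pred.txt", T, "_"); print n, T[1], T[2], T[3] }')
printf '%s\n' "$parts"

rm -rf "$tmp"
```

Only the lines of one.txt are flagged first-file, which is why FNR == NR can be used to fill ID and NAME from the first file alone. T[2] comes out as f0, the piece that ends up in the E0_f0 heading.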

NOTE: The annotated code above is for explanation purposes only, not for running; to run it and get the output, use the actual code.
The actual code is available at https://www.unix.com/302986828-post4.html
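
To try the script without the original data, something like the following could be used. It is a sketch under assumptions: the two input files, their second column (called Grp here) and all values are invented for illustration, because the real input files are not shown in this post. The awk program itself is the unannotated script quoted at the top. The second file deliberately has a different Name in row 2, so the no-match rule can be seen at work.

```shell
#!/bin/sh
# Hypothetical input: Id, Grp, Name, E0 columns, tab separated.
# The column name Grp and every value are made up for this demo.
tmp=$(mktemp -d)
printf 'Id\tGrp\tName\tE0\n1\tx\tn1\t0.29\n2\tx\tn2\t0.31\n' > "$tmp/A_f0_pred.txt"
printf 'Id\tGrp\tName\tE0\n1\tx\tn1\t0.30\n2\tx\tzz\t0.99\n' > "$tmp/A_f1_pred.txt"

# Run from inside the scratch directory so that FILENAME contains no path
# (an underscore in a directory name would shift the pieces in T).
merged=$(cd "$tmp" && awk '
NR == 1         {HD = $1}
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }
                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX
                }
FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }
                {OUT[IX]  = OUT[IX] OFS OFS
                }
END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
' OFS="\t" A_*_pred.txt)

# Show tabs as "|" so the empty fields of the mismatching row are visible.
printf '%s\n' "$merged" | tr '\t' '|'
rm -rf "$tmp"
```

With this made-up data, row 2 of the second file does not match on Name, so only two empty fields are added for it and its value 0.99 is left out of the merged line.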

Thanks,
R. Singh
