Merge multiple tab delimited files with index checking: Post 302988715 by RavinderSingh13 on Sunday 1st of January 2017 01:40:17 AM
Quote:
Originally Posted by RudiC
Try this - very specific to your problem, not as versatile and flexible as Chubler_XL's script - little awk proposal:
Code:
awk '
NR == 1         {HD = $1
                }
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }

                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX 
                }

FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }

                {OUT[IX]  = OUT[IX] OFS OFS
                }

END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
' OFS="\t" A_*_pred.txt
Id    Name    E0_f0    Name    E0_f1    Name    E0_f3
1    N(,)'1    0.2904    N(,)'1    0.2916    N(,)'1    0.2581    
2    N(,)'2    0.3180    N(,)'2    0.3123    N(,)'2    0.2903    
3    N(,)'3    0.3277    N(,)'3    0.3234    N(,)'3    0.2988    
4    N(,)'4    0.3675    N(,)'4    0.3475    N(,)'4    0.3496    
5    N(,)'5    0.3456    N(,)'5    0.3294    N(,)'5    0.3390

Thanks a lot, RudiC, for this nice script. I know this post is a few days old now, but I wanted to add an explanation here so that everybody can take advantage of this nice code snippet.
Code:
awk '
NR == 1         {HD = $1
#### NR == 1 is TRUE only for the very first line of the very first file.
#### Store the value of $1 in variable HD; this is the start of the heading line.
                }
#### FNR == 1 is TRUE for the first line of each file, because FNR is reset whenever awk starts reading the next file.
#### split breaks the name of the current Input_file on the delimiter "_" and stores the pieces in an array named T.
#### Then $3, $4 and T[2] are appended to HD, so HD is something like Id Name E0_f0 after the very first file and grows by one Name/E0 pair for each further Input_file.
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }
#### For every line, set IX to FNR - 1, so the header line gets index 0 and the data lines are numbered from 1.
#### MAX tracks the largest IX seen so far (the highest number of data lines in any Input_file): it is replaced by IX only when IX is greater than MAX.
                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX
                }
#### FNR == NR is TRUE only while the very first file is being read. Fill an array named ID, indexed by IX, with $1, so it will be like
#### ID[0]=Id, ID[1]=1 and so on.
#### Fill an array named NAME, indexed by IX, with $3, so it will be like NAME[0]=Name, NAME[1]=N(,)'1 and so on.
FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
#### If $1 equals ID[IX] and $3 equals NAME[IX], the current line matches the line with the same index in the first file.
#### Then $3 and $4, each followed by OFS, are appended to array OUT at index IX; next skips all further rules for this line.
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }
#### This rule is reached only when the above condition is NOT TRUE, meaning $1 or $3 did not match the first file at the current IX. Two OFS are appended so that both columns for this file stay empty in the output.
                {OUT[IX]  = OUT[IX] OFS OFS
                }
#### At the end, print HD (the combined heading of all files), then loop from 1 to MAX (the highest number of data lines in any file)
#### and print ID[i] followed by OUT[i] for each index.
END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
#### Set the output field separator to a tab and list all the files that need to be passed to awk for reading.
' OFS="\t" A_*_pred.txt
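
To see how NR, FNR and split (FILENAME, T, "_") behave across files, here is a small sketch that is not part of RudiC's script. The two file names and their contents are made up for this demo and are created in a temporary directory; only the A_*_pred.txt naming pattern comes from the thread.

```shell
# Hypothetical sample files, created in a temporary directory for this demo only.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/A_f0_pred.txt"
printf 'c\nd\n' > "$tmp/A_f1_pred.txt"

# For every line print NR, FNR, the middle part of the file name (T[2]),
# and whether FNR == NR holds (TRUE only while the first file is read).
nr_fnr=$(cd "$tmp" && awk '
        {split (FILENAME, T, "_")
         print NR, FNR, T[2], (FNR == NR ? "first-file" : "later-file")
        }
' A_f0_pred.txt A_f1_pred.txt)

printf '%s\n' "$nr_fnr"
rm -rf "$tmp"
```

This should print 1 1 f0 first-file, 2 2 f0 first-file, 3 1 f1 later-file and 4 2 f1 later-file, one per line: NR keeps counting across files, FNR restarts at 1 for the second file, and T[2] is the piece of the file name between the two underscores.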

NOTE: The code above is annotated for explanation purposes only and is not meant to be run as is; use the original code to run it and get the output.
The original code is available at https://www.unix.com/302986828-post4.html
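
For anyone who wants to try it quickly, below is a rough sketch that runs RudiC's script (copied unchanged from the quote) on two tiny made-up files. The file contents are assumptions, not the original poster's data: the second column (X) is only a placeholder because the real input layout is not shown in this post, and the last line of the second file deliberately carries a different name so that the "no match" rule fires. Tabs are turned into | at the end only to make the empty fields visible.

```shell
# Hypothetical input files in a temporary directory (contents are assumptions).
tmp=$(mktemp -d)
printf 'Id\tX\tName\tE0\n1\tx\tN1\t0.29\n2\tx\tN2\t0.31\n' > "$tmp/A_f0_pred.txt"
printf 'Id\tX\tName\tE0\n1\tx\tN1\t0.30\n2\tx\tZZ\t0.99\n' > "$tmp/A_f1_pred.txt"

# The script from the quoted post; only the tr at the end is added for display.
merged=$(cd "$tmp" && awk '
NR == 1         {HD = $1
                }
FNR == 1        {split (FILENAME, T, "_")
                 HD = HD OFS $3 OFS $4 "_" T[2]
                }

                {IX  = FNR - 1
                 MAX = IX>MAX?IX:MAX
                }

FNR == NR       {ID[IX]   = $1
                 NAME[IX] = $3
                }
$1 == ID[IX] &&
$3 == NAME[IX]  {OUT[IX]  = OUT[IX] $3 OFS $4 OFS
                 next
                }

                {OUT[IX]  = OUT[IX] OFS OFS
                }

END             {print HD
                 for (i=1; i<=MAX; i++) print ID[i], OUT[i]
                }
' OFS="\t" A_*_pred.txt | tr '\t' '|')

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With these sample files the output should be Id|Name|E0_f0|Name|E0_f1, then 1|N1|0.29|N1|0.30| and 2|N2|0.31|||. The name ZZ in the second file does not match N2 from the first file, so the last rule appends two OFS and both columns for that file stay empty.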

Thanks,
R. Singh