How to join one file with multiple files in a directory in UNIX?


 
# 8  
Old 04-23-2015
Quote:
Originally Posted by sajmar
As I mentioned, I have 300 files in the folder that are like file2, and their names all end in .geno. Each of these files has different rows. I want to know how many of the id fields, like (gi|358484521|ref|NW_003764373.1|), are present in each of these 300 files.
Your requirement is unclear to me. In the example you gave, there is no relation between columns 2 and 3 in the output file.

---
*EDIT*
OK I see you have just edited your post #1 and now the requirements are different. Please do not do that, it makes the thread hard to follow. And please try to get your specification right from the start.

So you mean something like this then?
Code:
awk 'NR==FNR{A[$2]=$0; next} $1 in A{print A[$1], $0}' file1 *.geno

or try an xargs approach...
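
For readers unfamiliar with the NR==FNR idiom in the awk command above, here is a minimal sketch of how it behaves. The file names lookup.txt and sample.geno and their contents are invented for illustration; they are not the poster's data.

```shell
#!/bin/sh
# Hypothetical sample data, invented for illustration only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Lookup file: the join key is in field 2.
printf '%s\n' '1 idA' '1 idB' 'LGE64 idC' > lookup.txt

# Data file: the join key is in field 1 and may repeat.
printf '%s\n' 'idA 3935 T C' 'idA 4071 C T' 'idC 681 A C' 'idX 138 A G' > sample.geno

result=$(awk '
    NR == FNR { A[$2] = $0; next }    # first file: remember each line under its field-2 key
    $1 in A   { print A[$1], $0 }     # later files: print only lines whose field 1 was seen
' lookup.txt sample.geno)

printf '%s\n' "$result"
```

In this sketch the idX line is dropped because its key is not in the lookup file, and idB never appears because no data line uses it. Only keys present in both files reach the output, and a repeated key in the data file yields one output line per repetition.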

Last edited by Scrutinizer; 04-23-2015 at 03:09 PM..
# 9  
Old 04-23-2015
Sorry Scrutinizer, if I misled you. This time the command works well. But when I put it inside a loop, this code did not work:
Code:
ls *.geno | while read FN;  awk 'NR==FNR{A[$2]=$0; next} $1 in A{print A[$1], $0}' unlocol_accessions $FN > ${FN/geno/geno2}; done

Do you have any suggestion for putting your command in a loop?
# 10  
Old 04-23-2015
Quote:
Originally Posted by sajmar
Sorry Scrutinizer, if I misled you. This time the command works well. But when I put it inside a loop, this code did not work:
Code:
ls *.geno | while read FN;  awk 'NR==FNR{A[$2]=$0; next} $1 in A{print A[$1], $0}' unlocol_accessions $FN > ${FN/geno/geno2}; done

Do you have any suggestion for putting your command in a loop?
Instead of saying "code did not work", it would help everyone reading your thread if you would post the diagnostics printed by the shell that explain why it didn't work.

In this case, the loop is missing the do keyword after "while read FN;", so using correct syntax would seem to fix your problem:
Code:
ls *.geno | while read FN
do      awk '
                NR==FNR{A[$2]=$0; next}
                $1 in A{print A[$1], $0}
        ' unlocol_accessions "$FN" > "${FN/geno/geno2}"
done

Or, if you insist on single-line code instead of readable code:
Code:
ls *.geno | while read FN;do awk 'NR==FNR{A[$2]=$0; next} $1 in A{print A[$1], $0}' unlocol_accessions "$FN" > "${FN/geno/geno2}"; done

Or, more simply:
Code:
for FN in *.geno
do      awk '
                NR==FNR{A[$2]=$0; next}
                $1 in A{print A[$1], $0}
        ' unlocol_accessions "$FN" > "${FN/geno/geno2}"
done
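
One caveat on the output name, offered as a hedged sketch rather than a correction: ${FN/geno/geno2} is a bash/ksh93-style substitution, not POSIX sh, and it replaces the first "geno" anywhere in the name, so a file called genome1.geno would become geno2me1.geno. The variant below uses the POSIX suffix removal ${FN%.geno} instead. The file names and contents are invented for illustration, and the [ -e ] guard is an added assumption to cover the case where no .geno file exists.

```shell
#!/bin/sh
# Hypothetical file names and contents, invented for illustration only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' '1 idA' 'LGE64 idC' > unlocol_accessions
printf '%s\n' 'idA 3935 T C' 'idX 138 A G' > genome1.geno
printf '%s\n' 'idC 681 A C' > s2.geno

for FN in *.geno
do
    [ -e "$FN" ] || continue          # skip the literal pattern if nothing matched
    out="${FN%.geno}.geno2"           # strip the trailing .geno, then append .geno2
    awk '
        NR == FNR { A[$2] = $0; next }
        $1 in A   { print A[$1], $0 }
    ' unlocol_accessions "$FN" > "$out"
done

ls
```

Each input file gets its own .geno2 output next to it, and the inputs are left untouched.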


Last edited by Don Cragun; 04-23-2015 at 04:14 PM.. Reason: Add alternative without pipeline and add quotes.
# 11  
Old 04-24-2015
I tried to merge the second column of file1 with the first column of file2, and the command ran perfectly, but I have a problem. My file1 has 11580 ids and file2 has 1805 ids. file2 is the original file and I want to merge file2 with file1, but instead of getting 11580 ids in my merged file, I get 1613 common ids. Note that file1 contains ids that are repeated more than once. Does anyone have a suggestion for my problem?
# 12  
Old 04-24-2015
You lost me.

Please show us a sample set of representative input files and the output (or outputs) that should be produced from those input files.
# 13  
Old 04-24-2015
This is file1 (11580 ids)
Code:
gi|358468608|ref|NW_003780270.1| 3935 T C 0 1 1
gi|358468608|ref|NW_003780270.1| 4071 C T 0 1 1
gi|358468608|ref|NW_003780270.1| 4110 C T 0 1 1
gi|358468608|ref|NW_003780270.1| 4377 C G 1 1 2
gi|358468608|ref|NW_003780270.1| 4387 C A 0 1 1
gi|358468608|ref|NW_003780270.1| 4476 A G 1 1 2
gi|358468610|ref|NW_003780268.1| 2707 G A 0 1 1
gi|358468610|ref|NW_003780268.1| 3290 C T 0 1 1
gi|358468610|ref|NW_003780268.1| 5909 A G 0 1 1
gi|358468610|ref|NW_003780268.1| 5950 G A 0 1 1
gi|358468610|ref|NW_003780268.1| 6085 T A 0 1 1
gi|358468624|ref|NW_003780254.1| 392 T C 1 1 2
gi|358468624|ref|NW_003780254.1| 600 A G 0 1 1
gi|358468624|ref|NW_003780254.1| 924 C T 0 1 1
gi|358468624|ref|NW_003780254.1| 972 A G 0 1 1
gi|358468629|ref|NW_003780249.1| 681 A C 0 1 1
gi|358468631|ref|NW_003780247.1| 138 A G 1 1 2
gi|358468631|ref|NW_003780247.1| 327 T G 0 1 1
gi|358468631|ref|NW_003780247.1| 511 A T 0 1 1
gi|358468631|ref|NW_003780247.1| 513 C G 0 1 1
gi|358468633|ref|NW_003780245.1| 1076 T C 0 1 1
gi|358468633|ref|NW_003780245.1| 348 T C 0 1 1
gi|358468633|ref|NW_003780245.1| 460 A G 1 1 2
gi|358468633|ref|NW_003780245.1| 591 G C 0 1 1
.
.
.
gi|358484429|ref|NW_003764465.1| 927 T G 0 1 1
gi|358484430|ref|NW_003764464.1| 366 G A 1 1 2
gi|358484430|ref|NW_003764464.1| 662 C G 0 1 1
gi|358484430|ref|NW_003764464.1| 664 C G 0 1 1
gi|358484430|ref|NW_003764464.1| 709 A G 0 1 1
gi|358484430|ref|NW_003764464.1| 782 T C 0 1 1
gi|358484431|ref|NW_003764463.1| 1295 A G 1 1 2
gi|358484431|ref|NW_003764463.1| 1868 G A 0 1 1
gi|358484431|ref|NW_003764463.1| 1921 G A 0 1 1
gi|358484431|ref|NW_003764463.1| 1980 A G 0 1 1
gi|358484431|ref|NW_003764463.1| 2003 T C 1 1 2
gi|358484431|ref|NW_003764463.1| 3595 T C 0 1 1
gi|358484431|ref|NW_003764463.1| 607 A G 0 1 1
gi|358484431|ref|NW_003764463.1| 686 C G 0 1 1
gi|358484431|ref|NW_003764463.1| 844 C G 0 1 1
gi|358484432|ref|NW_003764462.1| 541 A G 0 1 1
gi|358484451|ref|NW_003764443.1| 1126 G A 0 1 1
gi|358484451|ref|NW_003764443.1| 988 T C 0 1 1
gi|358484469|ref|NW_003764425.1| 785 G T 0 1 1
gi|358484470|ref|NW_003764424.1| 440 C T 1 1 2
gi|358484470|ref|NW_003764424.1| 735 T A 0 1 1
gi|358484471|ref|NW_003764423.1| 551 G T 1 1 2
gi|358484498|ref|NW_003764396.1| 2503 A G 0 1 1
gi|358484498|ref|NW_003764396.1| 381 A G 1 1 2
gi|358484513|ref|NW_003764381.1| 1351 T C 0 1 1

This is file2 (1805 ids)
Code:
1       gi|358484521|ref|NW_003764373.1|
1       gi|358484520|ref|NW_003764374.1|
1       gi|358484519|ref|NW_003764375.1|
1       gi|358484518|ref|NW_003764376.1|
1       gi|358484517|ref|NW_003764377.1|
1       gi|358484516|ref|NW_003764378.1|
1       gi|358484515|ref|NW_003764379.1|
1       gi|358484514|ref|NW_003764380.1|
1       gi|358484513|ref|NW_003764381.1|
1       gi|358484512|ref|NW_003764382.1|
1       gi|358484511|ref|NW_003764383.1|
1       gi|358484510|ref|NW_003764384.1|
1       gi|358484509|ref|NW_003764385.1|
1       gi|358484508|ref|NW_003764386.1|
1       gi|358484507|ref|NW_003764387.1|
1       gi|358484506|ref|NW_003764388.1|
1       gi|358484505|ref|NW_003764389.1|
1       gi|358484504|ref|NW_003764390.1|
1       gi|358484503|ref|NW_003764391.1|
1       gi|358484502|ref|NW_003764392.1|
1       gi|358484501|ref|NW_003764393.1|
1       gi|358484500|ref|NW_003764394.1|
1       gi|358484499|ref|NW_003764395.1|
1       gi|358484498|ref|NW_003764396.1|
1       gi|358484497|ref|NW_003764397.1|
1       gi|358484496|ref|NW_003764398.1|
. 
.
.
LGE64   gi|358482732|ref|NW_003766162.1|
LGE64   gi|358482731|ref|NW_003766163.1|
LGE64   gi|358482730|ref|NW_003766164.1|
LGE64   gi|358482729|ref|NW_003766165.1|
LGE64   gi|358482728|ref|NW_003766166.1|
LGE64   gi|358482727|ref|NW_003766167.1|
LGE64   gi|358482726|ref|NW_003766168.1|
LGE64   gi|358482725|ref|NW_003766169.1|
LGE64   gi|358482724|ref|NW_003766170.1|
LGE64   gi|358482723|ref|NW_003766171.1|
LGE64   gi|358482722|ref|NW_003766172.1|
LGE64   gi|358482721|ref|NW_003766173.1|
LGE64   gi|358482720|ref|NW_003766174.1|
LGE64   gi|358482719|ref|NW_003766175.1|
LGE64   gi|358482718|ref|NW_003766176.1|
LGE64   gi|358482717|ref|NW_003766177.1|

Now, my original file is file2. I want to bring 2 column of file2 in front of file1. Note that file1 contains repeated ids.
# 14  
Old 04-24-2015
This was vice versa in post #1. Looking at these files, there's no match between file1's field 2 and file2's field 1.

Please exercise way more care when specifying your problem.
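
In the samples from post #13, the id sits in field 1 of file1 and in field 2 of file2. Assuming those are the fields meant to match (an assumption, since the requirement is still unclear), a quick way to see how many ids each file holds and how many they share is sketched below. The input lines are copied from the posted samples; the array and counter names are made up for this example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# A few lines copied from the file2 sample in post #13 (id in field 2).
cat > file2 <<'EOF'
1       gi|358484521|ref|NW_003764373.1|
1       gi|358484513|ref|NW_003764381.1|
1       gi|358484498|ref|NW_003764396.1|
EOF

# A few lines copied from the file1 sample in post #13 (id in field 1, repeated).
cat > file1 <<'EOF'
gi|358484469|ref|NW_003764425.1| 785 G T 0 1 1
gi|358484498|ref|NW_003764396.1| 2503 A G 0 1 1
gi|358484498|ref|NW_003764396.1| 381 A G 1 1 2
gi|358484513|ref|NW_003764381.1| 1351 T C 0 1 1
EOF

summary=$(awk '
    NR == FNR { ids2[$2]; next }      # file2: collect distinct ids from field 2
    !($1 in ids1) {                   # file1: handle each distinct id only once
        ids1[$1]
        n1++
        if ($1 in ids2) shared++
    }
    END {
        n2 = 0
        for (k in ids2) n2++
        printf "file1 ids: %d, file2 ids: %d, shared: %d\n", n1, n2, shared
    }
' file2 file1)

printf '%s\n' "$summary"
```

A join of the kind used earlier in the thread only emits lines whose id occurs in both files, so a shared count lower than either file's id count is expected from that command.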