Grep solutions tab-delimited file

 
# 1  
Old 04-26-2018

Hello, I am trying to find a solution to a problem that's proving to be beyond my newbie skills. The files below come from a genetics study. File 1 describes a position on the genome; file 2 does the same but is formatted differently and has more information. I am trying to match every line in file 1 with the right line in file 2, and then print the full line from file 2. The first column in file 1 corresponds to the first and third column in file 2. File 1 uses a colon as separator, whereas file 2 uses tabs.

I tried to come up with a grep solution that would not require reformatting the files. I also tried to come up with some easy manipulation of file 1 or file 2 to make the matching easier. Both attempts failed. Any suggestions would be more than welcome.

Edit: Sorry if I made the first example look confusing. There are matches. Please consider the below example instead:

File 1
Code:
chr1:17373
chr1:17375
chr1:17398
chr1:17407

~100,000 rows


File 2
Code:
chr1    17372    17373    rs750111615
chr1    17374    17375    rs755771866
chr1    17378    17379    rs754322362
chr1    17384    17385    rs201535981
chr1    17395    17398    rs200784459
chr1    17405    17406    rs772228657
chr1    17405    17407    rs372841554

~15 M rows

The result I need is

Code:
chr1:17373    rs750111615
chr1:17375    rs755771866
chr1:17398    rs200784459
chr1:17407    rs372841554

Again, help would be much appreciated.

Last edited by andmal; 04-29-2018 at 04:27 PM..
# 2  
Old 04-26-2018
Hmm, this is a bit confusing:
Code:
The first column in file 1 corresponds to the first and third column in file 2

I don't see the correlation in your sample files.
I don't follow. Can you elaborate?
And maybe provide a desired result based on the samples provided.
This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 04-27-2018
Is there any match in your samples?
# 4  
Old 04-28-2018
Sorry if I made it look confusing. There are matches. Please consider the below example instead:

File 1
Code:
chr1:17373
chr1:17375
chr1:17398
chr1:17407

~100,000 rows


File 2
Code:
chr1    17372    17373    rs750111615
chr1    17374    17375    rs755771866
chr1    17378    17379    rs754322362
chr1    17384    17385    rs201535981
chr1    17395    17398    rs200784459
chr1    17405    17406    rs772228657
chr1    17405    17407    rs372841554

~15 M rows

The result I need is

Code:
chr1:17373    rs750111615
chr1:17375    rs755771866
chr1:17398    rs200784459
chr1:17407    rs372841554

Again, help would be much appreciated.


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 04-28-2018 at 07:23 PM.. Reason: Added CODE tags.
# 5  
Old 04-28-2018
Still not unambiguous. Try
Code:
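awk '
# First file (file1, FS=":"): store each full line, keyed by the position after the colon.
# Note: the key is the position only; the chromosome is not compared.
NR == FNR       {T[$2] = $0
                 next
                }
# Second file (file2, FS=tab): if column 2 or column 3 is a stored position,
# print the stored file1 line followed by the last field (the rs ID).
($2 in T)        {print T[$2], $NF
                }
($3 in T)        {print T[$3], $NF
                }
' FS=: file1 FS="\t" file2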
chr1:17373 rs750111615
chr1:17375 rs755771866
chr1:17398 rs200784459
chr1:17407 rs372841554

If you want the range $2 - $3 checked, it would become more complex and time consuming for files that large.
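
For what it's worth, a range check may stay affordable when the intervals in file 2 are short, because each line then needs only a handful of lookups. The following is only a sketch and has not been timed on 15 M rows. It assumes that $2 and $3 are inclusive bounds and that intervals span just a few positions, and it keys on chromosome plus position. The entry chr1:17396 and the scratch directory are made up for the demonstration.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# file1: the positions from the thread, plus chr1:17396 (made up) to show an in-range hit.
printf '%s\n' chr1:17373 chr1:17375 chr1:17396 chr1:17398 chr1:17407 > "$workdir/file1"

# file2: the tab-delimited sample from the thread.
printf '%s\t%s\t%s\t%s\n' \
    chr1 17372 17373 rs750111615 \
    chr1 17374 17375 rs755771866 \
    chr1 17378 17379 rs754322362 \
    chr1 17384 17385 rs201535981 \
    chr1 17395 17398 rs200784459 \
    chr1 17405 17406 rs772228657 \
    chr1 17405 17407 rs372841554 \
    > "$workdir/file2"

awk '
# file1 (FS=":"): remember each line under the key chromosome,position.
NR == FNR { T[$1, $2] = $0; next }
# file2 (FS=tab): test every position from column 2 through column 3, inclusive.
{
    for (p = $2 + 0; p <= $3 + 0; p++)
        if (($1, p) in T)
            print T[$1, p], $NF
}
' FS=: "$workdir/file1" FS="\t" "$workdir/file2" > "$workdir/range_matches.txt"

cat "$workdir/range_matches.txt"
```

On this sample it should report the same four lines as above plus chr1:17396 rs200784459, which lies inside the 17395 to 17398 interval. If the bounds in the real data are not inclusive on both sides, the loop limits would need adjusting.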
This User Gave Thanks to RudiC For This Post:
# 6  
Old 04-29-2018
Thanks, this produces exactly what I was looking for. Next, I'll try to understand the code and customize it for other similar data. You've given me a great starting point.
Example output below:

Code:
1:40370176:G_GT    g    gt    -0.0103    0.0222    0.6434    ???????+??+- rs564192510
19:4197562:A_AGAAT    a    agaat    0.1321    0.0121    1.019e-27    ?++++??+++-+ rs79305507
19:4210401:C_CA    ca    c    0.2934    0.0137    3.00e-101    ?++++??+++++ rs6683453
19:4259184:T_TTC    t    ttc    0.5397    0.0186    3.27e-185    ++++????++?? rs12092368
19:4268341:A_G    a    g    -0.6392    0.0106    1.95e-798    ----?------- rs12090408
4:38765867:A_G    a    g    -0.0641    0.0116    3.516e-08    ------------ rs1778050
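
A note for anyone adapting the awk command from post #5 to other data: it keys on the position alone, so a line in file 2 with the same position on a different chromosome would be reported as well. Below is a minimal sketch of a variant keyed on chromosome plus position. It assumes the chromosome is written identically in both files (e.g. chr1 in both), and it compares only column 3 of file 2, as described in the first post. The chr2 line and its ID rsHYPOTHETICAL are invented to show the difference, and the scratch directory is only for the demonstration.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# file1: the four positions from the thread.
printf '%s\n' chr1:17373 chr1:17375 chr1:17398 chr1:17407 > "$workdir/file1"

# file2: the sample from the thread plus one invented chr2 line that shares
# the position 17373 with a chr1 entry.
printf '%s\t%s\t%s\t%s\n' \
    chr1 17372 17373 rs750111615 \
    chr2 17372 17373 rsHYPOTHETICAL \
    chr1 17374 17375 rs755771866 \
    chr1 17378 17379 rs754322362 \
    chr1 17384 17385 rs201535981 \
    chr1 17395 17398 rs200784459 \
    chr1 17405 17406 rs772228657 \
    chr1 17405 17407 rs372841554 \
    > "$workdir/file2"

awk '
# file1 (FS=":"): store each line under the key chromosome,position.
NR == FNR       { T[$1, $2] = $0; next }
# file2 (FS=tab): print only when chromosome and column 3 both match.
(($1, $3) in T) { print T[$1, $3], $NF }
' FS=: "$workdir/file1" FS="\t" "$workdir/file2" > "$workdir/matched.txt"

cat "$workdir/matched.txt"
```

With this input the chr2 line is skipped and the four chr1 matches remain. Keys shaped like 1:40370176:G_GT would still split correctly on the colon, but whether the chromosome labels agree between the real files is something to check first.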
