UNIX - 2 tab delimited files, conditional column extraction
Post 303015013 by GTed on Monday 26th of March 2018 06:06:00 AM
Quote:
Originally Posted by RudiC
THAT's the right spirit that we're after in these fora!

Here you go; further questions welcome (after having read the man page); have fun:

Code:
awk -F"\t" '                                                                    # start awk and define the field separator
NR == FNR       {INT[$1] = INT[$1] $2 "-" $3 FS                                 # for the first file, identified by the total record No. (NR)
                                                                                # being equal to the file's record No. (FNR), save intervals
                                                                                # to an array indexed by $1 as a list of L-R L-R L-R etc.
                 next                                                           # stop processing this line, start over with the next one
                }
                                                                                # this is processed for second file only
                {split (INT[$1], T)                                             # split the interval list into individual L-R into
                                                                                # temp array T
                 OUT = "NA"                                                     # predefine OUT should no match be found
                 for (t in T)   {split (T[t], LM, "-")                          # loop across all individual L-R entries, split each 
                                                                                # one into limits array, with LM[1] holding L(eft)    
                                                                                # and LM[2] the R(ight) border
                                 if ($2 >= LM[1] && $2 < LM[2]) OUT = $4        # if $2 fits between limits, set OUT to $4
                                }
                 print OUT                                                      # and print it
                }
' file1 file2                                                                   # specify input files

Hugely appreciate the time you've taken to help me out. I'll now take some time to break this down, read around, and hopefully digest it.

It runs in about 3 hours on the 'real' dataset.
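
A possible way to cut that runtime, offered only as an untested sketch: the script re-splits the interval list for every line of the second file and keeps looping after a hit. The variant below parses the intervals once while reading the first file and stops at the first match. The names `fast_lookup`, `N`, `LO` and `HI` are hypothetical, the sample data is made up, and it assumes numeric, non-negative bounds. Since OUT is set to the same $4 whichever interval matches, stopping early should not change the output; whether it actually helps on the real dataset has not been measured.

```shell
# Sketch only: pre-parse intervals once, stop at the first matching interval.
fast_lookup() {
    awk -F"\t" '
    NR == FNR {                       # first file: store bounds per key
        n = ++N[$1]                   # number of intervals seen for this key
        LO[$1, n] = $2 + 0            # force numeric left bound
        HI[$1, n] = $3 + 0            # force numeric right bound
        next
    }
    {                                 # second file: look up $2 among the key intervals
        OUT = "NA"
        for (i = 1; i <= N[$1]; i++)
            if ($2 >= LO[$1, i] && $2 < HI[$1, i]) { OUT = $4; break }
        print OUT
    }' "$1" "$2"
}

# Hypothetical sample data (NOT from the thread); tab-delimited.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t300\t400\nchr2\t50\t60\n' > "$tmp/file1"
printf 'chr1\t150\tx\tgeneA\nchr1\t250\tx\tgeneB\nchr2\t55\tx\tgeneC\nchr3\t10\tx\tgeneD\n' > "$tmp/file2"

fast_result=$(fast_lookup "$tmp/file1" "$tmp/file2")
printf '%s\n' "$fast_result"
rm -rf "$tmp"
```

If keys have very many intervals each, sorting the intervals and using a binary search may matter more than either change here.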

You're a legend.
 
