I thought I had this figured out but was wrong so am humbly asking for help.
The task is to add an additional column to FILE 1 based on records in FILE 2.
The key is in COLUMN 1 for FILE 1 and in COLUMN 1 OR COLUMN 2 for FILE 2.
I want to add the third column from FILE 2 to the beginning of FILE 1 so that the new FILE shows for example:
DESIRED FILE 3
Code:
1:13109 G_T t g -0.4127 0.1042 7.52e-05 ?---??????-? rs540538026
FILE 1
Code:
1:1057989 G_T t g 0.3000 0.0662 5.909e-06 ??++++++???+
1:11007 C_T t c 0.2874 0.0710 5.19e-05 ?????+++???+
1:2190612 A_G a g 1.1252 0.2605 1.561e-05 ???????????+
1:13109 G_T t g -0.4127 0.1042 7.52e-05 ?---??????-?
1:3674534 G_T t g -0.4187 0.1073 9.559e-05 ?---??????-?
1:6932407 A_G a g 1.4977 0.3322 6.535e-06 ???????????+
1:6938780 C_T t c -1.3632 0.3274 3.135e-05 ???????????-
1:7171050 A_G a g 0.0537 0.0134 6.091e-05 ?+++?-++++++
1:8960594 C_T t c -0.9273 0.2319 6.344e-05 ???????????-
1:12203508 C_T t c -1.4228 0.3469 4.111e-05 ???????????-
Thank you -this is closer to a solution than I've been in several days.
I tried out the script and did some manual sanity checks.
The code correctly identifies both column 1 and column 2 values in FILE 2. However, it only seems to add the column 3 value in FILE 2 if the matched value in FILE 2 was found in column 1.
I am wondering if this part of the code needs to be modified? Does $1 in F22 refer to the first column in the created matrix?
Code:
$1 in f21 { print $0, f21[$1];next }
$1 in f22 { print $0, f21[$1] }
If there is no match in any of the columns the row is eliminated from the output, which actually isn't much of a problem though.
And now it works like a charm and produced exactly the output I was looking for. Thanks !!
---------- Post updated at 01:01 PM ---------- Previous update was at 12:50 PM ----------
The main question is solved thanks to vgersh99. I have a bonus question, if I would like to run this awk line on multiple FILES 1 using the same reference FILE2, would something along these lines do the trick?
Code:
for i in *.txt ; do
awk '
FNR==NR {
f21[$1]=$3
f22[$2]=$3
next
}
$1 in f21 { print $0, f21[$1];next }
$1 in f22 { print $0, f22[$1] }
' FILE2.txt $i
$i > $i.pruned
done
With a VERY sloppy interpretation of "along these lines" you might come close to the desired result, once you corrected the syntax / redirection error in the before-last line, and accepted the higher resource cost as you run the script multiple times.
Why not sth. along THIS line :
Code:
awk '
FNR==NR {f21[$1]=$3
f22[$2]=$3
next
}
$1 in f21 {print $0, f21[$1] > (FILENAME ".pruned")
next
}
$1 in f22 {print $0, f22[$1] > (FILENAME ".pruned")
}
' file2.txt file[^2].txt
-I suspected the suggested code was sloppy -I'm a newbie.
As I understand, this part of your suggestions tells to take the column 1 of the f21 table. Then add the .pruned extension to the stdout file? Or does it process all files with the .pruned extension?
Code:
$1 in f21 {print $0, f21[$1] > (FILENAME ".pruned") next
To be more clear, I have 400 of FILE 1 that should be matched to the FILE 2 table, of which there is only 1. The filename looks as in the below example. I would like to match all of the below FILE1 without having run them each at a time. They all have the same file extension. The resulting files should get an additional extension .pruned.
Code:
FILE1_VEGF.tbl.filtered.tab
FILE1_TL1A.tbl.filtered.tab
FILE1_MMP13.tbl.filtered.tab
FILE1_KYNUR.tbl.filtered.tab
+398 more files
I also don't understand this part
Code:
FILE2 FILE1[^2].txt
Does the ^ mean that the files are combined?
Is it possible to use wildcard definition e.g. *.tab to process many different versions of FILE1?
Please know that I am very new to unix and trying to learn 'on the job'. I'm only manipulating large tab-delimited files (millions of rows), but I'm stuck and don't know how to proceed with the following. Hoping for some friendly advice :)
I have 2 tab-delimited files - with differing column &... (10 Replies)
Hello,
I have 40 data files where the first three columns are the same (in theory) and the 4th column is different. Here is an example of three files,
file 2: A_f0_r179_pred.txt
Id Group Name E0
1 V N(,)'1 0.2904
2 V N(,)'2 0.3180
3 V N(,)'3 0.3277
4 V N(,)'4 0.3675
5 V N(,)'5 0.3456
... (8 Replies)
Hi, I have a rquirement in unix as below .
I have a text file with me seperated by | symbol and i need to generate a excel file through unix commands/script so that each value will go to each column.
ex:
Input Text file:
1|A|apple
2|B|bottle
excel file to be generated as output as... (9 Replies)
Here's a sample of the data:
NAME BIRTHDAY SEX LOCATION AGE ID
Jim 05/11/1986 M Japan 27 86
Rei 08/25/1990 F Korea 24 33
Jane 02/24/1985 F India 29 78
I've been trying to sort files using the... (8 Replies)
Hi Forum.
I'm struggling to find a solution for the following issue.
I have multiple files a1.txt, a2.txt, a3.txt, etc. and I would like to insert a tab-delimited header record at the beginning of each of the files.
This is my code so far but it's not working as expected.
for i in... (2 Replies)
Hi Forum
I have a tab delimited file that opens well in Openoffice calc (excel). But when I perform any operation in command line, it reads the file incorrectly. When I 'save As' the same file in office as tab delimited then it works fine.
The file that I think is tab delimited is actually... (8 Replies)
I have a folder that contains say 50 files in a sequential order:
cdf_1.txt
cdf_2.txt
cdf_3.txt
cdf_3.txt
.
.
.
cdf_50.txt.
I need to merge these files in the same order into a single tab delimited file.
I used the following shell script:
for x in {1..50};
do cat cdf_${x}.txt >>... (3 Replies)
I have a tab-Delimited file:
Eg:
'test' file contains:
a<tab>b<tab>c<tab>....
Based on certain condition, I wanna increase the number of lines of this file.How do I do that
Eg:
If some value in the database is 1 then one line in 'test' file is fine..
If some value in the database is 2... (1 Reply)