08-19-2011
merging files and adding special columns
Hi everyone,
I have a problem merging files and hope one of you has an idea how to approach it. I tried awk, but didn't get far. This is what I have:
I have 40 files like the ones below. All have three columns, but the number of rows differs (20,000 to 50,000).
e.g. file1:
chromosome position_on_chromosome file1
chr1 62138 x
chr1 631246 x
chr1 1238847 x
chr1 1238854 x
....
e.g. file2:
chromosome position_on_chromosome file2
chr1 238398 x
chr1 533005 x
chr1 631246 x
chr1 657484 x
chr1 1281185 x
chr1 1448761 x
....
I now need to merge them by genome coordinate ('chromosome' and 'position_on_chromosome' together give the coordinate). Every coordinate (columns 1 and 2) should be listed, whether it appears in only one file or in all of them (a complete list). The third columns of the original files should be placed side by side, one column per file.
This is what the result should look like:
chromosome position_on_chromosome file1 file2 (followed by file3, file4, etc. for all other files)
chr1 62138 x e
chr1 238398 e x
chr1 533005 e x
chr1 631246 x x
chr1 657484 e x
chr1 1238847 x e
chr1 1238854 x e
chr1 1281185 e x
chr1 1448761 e x
.....
It's a bit complicated to explain, but I hope you get what I mean.
Any help would be greatly appreciated!
Edit note: I just noticed that the table doesn't preserve the blank cells where a file has no 'x' for a coordinate, so I replaced each empty cell with an 'e' for clarity.
Last edited by TuAd; 08-19-2011 at 05:08 PM.
Reason: had some pasting issues, so corrected them
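One possible single-pass awk approach is sketched below; it is not taken from the thread's replies. It assumes whitespace-separated columns, exactly one header line per file whose third field is that file's column name, and that missing cells should be filled with 'e' as in the expected output. merge_coords is a hypothetical name. All rows are collected in memory, then sorted by chromosome name and numerically by position.

```shell
#!/bin/sh
# merge_coords: hypothetical helper name (not from the thread).
# Full outer join of any number of "chromosome position flag" files on the
# (chromosome, position) pair; cells missing from a file are filled with "e".
# Assumes one header line per file; extra trailing fields such as "|" are ignored.
merge_coords() {
    awk '
    FNR == 1 {                          # first line of each file is its header
        nfiles++
        header = header " " $3          # collect the third-column names in order
        next
    }
    NF < 3 { next }                     # skip blank or malformed lines
    {
        key = $1 SUBSEP $2              # coordinate = chromosome + position
        if (!(key in seen)) {
            seen[key] = 1
            keys[++nkeys] = key         # remember each coordinate once
        }
        flag[key, nfiles] = $3          # third column of file number nfiles
    }
    END {
        print "chromosome position_on_chromosome" header
        fflush()                        # header must reach stdout before sort output
        sorter = "LC_ALL=C sort -k1,1 -k2,2n"
        for (i = 1; i <= nkeys; i++) {
            k = keys[i]
            split(k, c, SUBSEP)
            row = c[1] " " c[2]
            for (f = 1; f <= nfiles; f++)
                row = row " " (((k, f) in flag) ? flag[k, f] : "e")
            print row | sorter
        }
        close(sorter)                   # wait for sort to finish writing
    }' "$@"
}
```

For example, merge_coords file1 file2 file3 > merged.txt. With 40 files of up to 50,000 rows this holds roughly two million entries in awk arrays, which is usually manageable but worth checking on the target machine. Chromosome names are compared as plain strings, so chr10 sorts before chr2; GNU sort's -V option gives a natural order if that matters.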
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi,
I need to select a few columns from two text files and write them to a new file. Both files have one field in common.
Please help me with this.
Thanks in advance. (4 Replies)
Discussion started by: kolvi
2. UNIX for Dummies Questions & Answers
Hello!
I want to extract columns from two files and then combine them for plotting with gnuplot. The files file1 and file2 look like this:
file1:
a, 0.62,x
b, 0.61,x
file2:
a, 0.43,x
b, 0.49,x
The desired output is
a 0.62 0.43
b 0.61 0.49
Thank you in advance! (2 Replies)
Discussion started by: kingkong
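For the two-file case in this question, a minimal awk sketch follows; it is not taken from the replies. It assumes fields are separated by a comma plus optional spaces and that each output row's second number should come from file2. pair_columns is a hypothetical name.

```shell
#!/bin/sh
# pair_columns: hypothetical helper name. Joins two files on their first
# comma-separated field and prints "key value_from_file1 value_from_file2".
# Assumes lines like "a, 0.62,x" (comma followed by optional spaces).
pair_columns() {
    awk -F', *' '
    NR == FNR { first[$1] = $2; next }          # 1st file: store field 2 by key
    ($1 in first) { print $1, first[$1], $2 }   # 2nd file: print keys seen in both
    ' "$1" "$2"
}
```

The NR == FNR test only identifies the first file correctly when that file is non-empty; keys that appear in just one file are dropped, which suits a plot of paired values.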
3. Shell Programming and Scripting
Hi,
I want to select columns from multiple files and combine them into one file. The files are simulation data files with 23 columns each and about 50 rows. Currently I use:
cut -f 11 Sweep?wing-30?scale=0.?0?fan2?.txt | pr -3 | awk '{printf("\n%s\t%s\t%s",$1,$2,$3)}' > ../Data_Processed/output.txt
I... (1 Reply)
Discussion started by: isgoed
4. Shell Programming and Scripting
Hello,
I have a number of tab-delimited data files, each consisting of two columns, like this:
File1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
File2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
File3... (19 Replies)
Discussion started by: erden
5. UNIX for Dummies Questions & Answers
Hi,
I have two text files that I would like to merge/join. I would like to join them if the first columns of both text files match and the second column of the first text file matches the third column of the second text file.
Example input:
First file:
1334 10 0 0 1 5.2
1334 12 0 0 1 4.5... (4 Replies)
Discussion started by: evelibertine
6. Shell Programming and Scripting
I have two files.
FileA.txt
30910 rs7468327
36587 rs10814410
91857 rs9408752
105797 rs1133715
146659 rs2262038
152695 rs2810979
181843 rs3008128
182129 rs3008131
192118 rs3008170
FileB.txt
30910 1.9415219673 0
36431 1.3351312477 0.0107191428
36587 1.3169171182... (2 Replies)
Discussion started by: genehunter
7. Shell Programming and Scripting
Hi.
I have two files in the format below.
File1
AA~1~STEVE~3.1~4.1~5.1
AA~2~DANIEL~3.2~4.2~5.2
BB~3~STEVE~3.3~4.3~5.3
BB~4~TIM~3.4~4.4~5.4
File 2
AA~STEVE~AA STEVE WORKS at AUTO COMPANY
AA~DANIEL~AA DANIEL IS A ELECTRICIAN
BB~STEVE~BB STEVE IS A COOK
I want to match 1st and 3rd... (2 Replies)
Discussion started by: crypto87
8. Shell Programming and Scripting
Hello guys,
I have two CSV files that look like this:
CSV1:
Breaking.csv:
UTF-8
"Name","Description","Occupation","Email"
"Walter White","","Chemistry Teacher","w.w@bb.com"
"Jessie Pinkman","","Junkie","j.p@bb.com"
"Hank Schrader","","DEA Agent","h.s@bb.com"
CSV2:
Bad.csv... (7 Replies)
Discussion started by: jeffreybsu
9. Shell Programming and Scripting
Hello,
I have a tab-delimited file that looks like this:
CHROM POS ID REF ALT ID HGVS_C HGVS_P
1 17319011 rs2076603 G A NM_022089.3,NM_001141973.2,NM_001141974.2 c.1815C>T,c.1800C>T,c.1800C>T p.Pro605Pro,p.Pro600Pro,p.Pro600Pro
1 20960230 rs45530340 ... (3 Replies)
Discussion started by: nans
10. Shell Programming and Scripting
I have two files, file1 and file2, which have identical numbers of rows and columns. However, the script is supposed to be used for different files and I cannot know the format in advance. Also, the number of columns changes within the file, some rows have more and some less columns (they are... (13 Replies)
Discussion started by: maya3
LEARN ABOUT OPENSOLARIS
comm
comm(1) User Commands comm(1)
NAME
comm - select or reject lines common to two files
SYNOPSIS
comm [-123] file1 file2
DESCRIPTION
The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output:
lines only in file1; lines only in file2; and lines in both files.
If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating
sequence of the original lines. If not, the results are unspecified.
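Because comm compares whole lines and expects both inputs in the same collating order, a small wrapper that sorts both files under one locale before calling comm avoids the unspecified results described above. This is a sketch: sorted_comm is a hypothetical helper name, and the C locale is an assumed choice.

```shell
#!/bin/sh
# sorted_comm: hypothetical helper name.
# Sorts copies of both files with the same locale (LC_ALL=C) that comm then
# runs under, since comm requires input ordered in the current collating sequence.
# Usage: sorted_comm file1 file2 [-1|-2|-3|-12|-13|-23]
sorted_comm() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort "$1" > "$tmp/a"
    LC_ALL=C sort "$2" > "$tmp/b"
    # Without options: column 1 = lines only in file1,
    # column 2 (one leading tab) = only in file2, column 3 (two tabs) = in both.
    LC_ALL=C comm ${3:+"$3"} "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

For example, if list1 contains cherry, apple, banana and list2 contains date, banana, cherry, then sorted_comm list1 list2 -12 prints banana and cherry, the lines common to both files.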
OPTIONS
The following options are supported:
-1 Suppresses the output column of lines unique to file1.
-2 Suppresses the output column of lines unique to file2.
-3 Suppresses the output column of lines duplicated in file1 and file2.
OPERANDS
The following operands are supported:
file1 A path name of the first file to be compared. If file1 is -, the standard input is used.
file2 A path name of the second file to be compared. If file2 is -, the standard input is used.
USAGE
See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte (2^31 bytes).
EXAMPLES
Example 1 Printing a list of utilities specified by files
If file1, file2, and file3 each contain a sorted list of utilities, the command
example% comm -23 file1 file2 | comm -23 - file3
prints a list of utilities in file1 not specified by either of the other files. The entry:
example% comm -12 file1 file2 | comm -12 - file3
prints a list of utilities specified by all three files. And the entry:
example% comm -12 file2 file3 | comm -23 - file1
prints a list of utilities specified by both file2 and file3, but not specified in file1.
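To make the examples above concrete, the script below runs the same three pipelines on three small, already-sorted lists. The utility names in the lists are made-up sample data, and the standard-input operand is written as a separate '-' argument, as described under OPERANDS.

```shell
#!/bin/sh
# Runs the three EXAMPLES pipelines on made-up, already-sorted lists.
export LC_ALL=C                        # same collation for the sorted input and comm
dir=$(mktemp -d)
printf '%s\n' awk cut grep sed > "$dir/file1"
printf '%s\n' cut grep sort    > "$dir/file2"
printf '%s\n' grep sort tr     > "$dir/file3"

# In file1 but in neither file2 nor file3
only1=$(comm -23 "$dir/file1" "$dir/file2" | comm -23 - "$dir/file3")
# In all three files
all3=$(comm -12 "$dir/file1" "$dir/file2" | comm -12 - "$dir/file3")
# In file2 and file3 but not in file1 ("-" reads the pipe as standard input)
not1=$(comm -12 "$dir/file2" "$dir/file3" | comm -23 - "$dir/file1")

printf 'only in file1:\n%s\n' "$only1"
printf 'in all three files:\n%s\n' "$all3"
printf 'in file2 and file3 but not file1:\n%s\n' "$not1"
rm -rf "$dir"
```

With these lists, only1 holds awk and sed, all3 holds grep, and not1 holds sort.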
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE,
LC_CTYPE, LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 All input files were successfully output as specified.
>0 An error occurred.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Availability |SUNWesu |
+-----------------------------+-----------------------------+
|CSI |enabled |
+-----------------------------+-----------------------------+
|Interface Stability |Standard |
+-----------------------------+-----------------------------+
SEE ALSO
cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5)
SunOS 5.11 3 Mar 2004 comm(1)