Merging two text files by a column and filling in the missing values


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Merging two text files by a column and filling in the missing values
# 1  
Old 06-14-2012
Merging two text files by a column and filling in the missing values

Hi,

I have to text files that I want to merge by the first column. The values in the first column pretty much match for the first part. However there are some values that are present in column 1 and not present in column 2 or vice versa. For such values I would like to substitute X for the corresponding columns. Example:

Code:
File 1:
AB   1
CD   2
EF   3
GH   4

File 2: 
CD  4
EF  5
GH  6
IJ  7

Output
AB 1 X
CD 2 4
EF 3 6
GH 4 6
IJ X 7

Thanks!
# 2  
Old 06-14-2012
just thinking...

If you used the COMM command, after cutting the first column of both files, you would know records that are in common. Save to FileCommon.
Then, maybe a PASTE command to create those in common. Your output lines 2-3-4, to FileB.
GREP -V what is in-common (your FileCommon) against the first file; with AWK for example to create those X entries. Your output line 1, to FileA.
Do the same to create FileC, from the 2nd file. Your output line 5 to FileC.
Finally combine the three files.
# 3  
Old 06-14-2012
Hi.

This probably follows the thought train of joeyg, but uses join to find the initial missing items. The key option for this is "-v". The missing items are appended to the other file with a little awk script. The files are sorted and joined again to produce the result. The data files are changed, so they are copied from a "sacred" version. The commands context, specimen, and pass-fail are local, so not available outside my group, but the script is designed so the absence will not cause an error.
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join with missing values.

# Section 1, setup, pre-solution, $Revision: 1.26 $".
# Infrastructure details, environment, debug commands for forum posts. 
# Uncomment export command to run as external user: not context, pass-fail.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
C=$HOME/bin/context && [ -f $C ] && $C join awk
set -o nounset

cp sacred1 data1
FILE1=data1
cp sacred2 data2
FILE2=data2

# Display sample data files.
pe
specimen $FILE1 $FILE2 expected-output.txt || (
head $FILE1
pe
head $FILE2 
pe 
head expected-output.txt
)

# Section 2, solution.
pl " Results:"
db " Section 2: solution."
join -v 1 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE2 

join -v 2 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE1 

join <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
tee f1

# Section 3, post-solution, check results, clean-up, etc.
db " Section 3: check solution."
(C=$HOME/bin/pass-fail && [ -f $C ] && $C;exit 0) 

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
join (GNU coreutils) 6.10
awk GNU Awk 3.1.5

Whole: 5:0:5 of 4 lines in file "data1"
AB   1
CD   2
EF   3
GH   4

Whole: 5:0:5 of 4 lines in file "data2"
CD  4
EF  5
GH  6
IJ  7

Whole: 5:0:5 of 5 lines in file "expected-output.txt"
AB 1 X
CD 2 4
EF 3 5
GH 4 6
IJ X 7

-----
 Results:
AB 1 X
CD 2 4
EF 3 5
GH 4 6
IJ X 7

-----
 Comparison of 5 created lines with 5 lines of desired results:
 Succeeded -- files have same content.

See man pages for details.

Best wishes ... cheers, drl
These 2 Users Gave Thanks to drl For This Post:
# 4  
Old 06-15-2012
Is there an easy way to alter this script so that instead of just two files, it merges five files by the first column filling in the blanks with X? Thanks!
# 5  
Old 06-15-2012
Hi.
Quote:
Originally Posted by evelibertine
Is there an easy way to alter this script so that instead of just two files, it merges five files by the first column filling in the blanks with X? Thanks!
Please post samples and expected output ... cheers, drl
# 6  
Old 06-15-2012
Code:
File 1:
AB 1
CD 1
EF 1

File 2:
EF 2
GH 2
IJ 2

File 3:
IJ 3
KL 3 
MN 3 

File 4:
MN 4
OP 4
RS 4

File 5:
RS 5
TU 5
VW 5

Output:
AB 1 X X X X
CD 1 X X X X
EF 1 2 X X X 
GH X 2 X X X
IJ X 2 3 X X 
KL X X 3 X X 
MN X X 3 4 X 
OP X X X 4 X 
RS X X X 4 5 
TU X X X X 5
VW X X X X 5

# 7  
Old 06-16-2012
Hi evelibertine ,

Please hava a look at this code , fit for 2 files.

Quote:
(
for loop_1 in `cat file_1 file_2 | awk '{print $1}'| sort -u `
do
if [ `grep -c $loop_1 file_1` -gt 0 ]
then
col2=`grep $loop_1 file_1 | awk '{print $2}'`
else
col2=X
fi
if [ `grep -c $loop_1 file_2` -gt 0 ]
then
col3=`grep $loop_1 file_2 | awk '{print $2}'`
else
col3=X
fi
echo $loop_1 $col2 $col3
## or redirect it to an out file..
done
)

Last edited by dipanchandra; 06-16-2012 at 07:57 AM.. Reason: modifying
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extracting values based on line-column numbers from multiple text files

Dear All, I have to solve the following problems with multiple tab-separated text file but I don't know how. Any help would be greatly appreciated. I have access to Linux mint (but not as a professional). I have multiple tab-delimited files with the following structure: file1: 1 44 2 ... (5 Replies)
Discussion started by: Bastami
5 Replies

2. UNIX for Dummies Questions & Answers

Dynamically merging 2 files on header values

Hi All, I have 2 files which i need to merge together based on the column names provided in the file. The first line in both files are header records. The first file has fixed columns but second file can have subset of the columns from file 1 File 1: ... (6 Replies)
Discussion started by: kushagra
6 Replies

3. Shell Programming and Scripting

Merging 2 text files when there is a common time stamp column in them

Dear Unix experts and users I have 2 kinds of files like below, of which I need to merge them in the order of time. File1: Date_Time Context D1 D2 04/19/2013_23:48:54.819 ABCD x x 04/19/2013_23:48:55.307 ABCD x x 04/19/2013_23:48:55.823 ABCD x ... (7 Replies)
Discussion started by: ks_reddy
7 Replies

4. UNIX Desktop Questions & Answers

merging files and add missing rows

hello all, I have files that have a specific way for naming the first column they are make of five names in Pattern of 3 Y = (no case sensitive) so the files are names $Y-$Y-$Y or $X-$Y-$Z depending how we look they only exist of the pattern exist now I want to create a file from them that... (9 Replies)
Discussion started by: A-V
9 Replies

5. UNIX for Dummies Questions & Answers

Match values/IDs from column and text files

Hello, I am trying to modify 2 files, to yield results in a 3rd file. File-1 is a 8-columned file, separted with tab. 1234:1 xyz1234 blah blah blah blah blah blah 1234:1 xyz1233 blah blah blah blah blah blah 1234:1 abc1234 blah blah blah blah blah blah n/a RRR0000 blah blah blah... (1 Reply)
Discussion started by: ad23
1 Replies

6. UNIX for Dummies Questions & Answers

Comparing two text files by a column and printing values that do not match

I have two text files where the first three columns are exactly the same. I want to compare the fourth column of the text files and if the values are different, print that row into a new output file. How do I go about doing that? File 1: 100 rs3794811 0.01 0.3434 100 rs8066551 0.01... (8 Replies)
Discussion started by: evelibertine
8 Replies

7. UNIX for Dummies Questions & Answers

Merging two text files by a column

I have two text files. One has two columns and looks like below: rs# otherallele_freq rs10399749 0 rs4030303 0 rs4030300 0 rs940550 1.000 rs13328714 0 rs11490937 0 rs6683466 0 rs12025928 1.000 rs6650104 0 rs11240781 0... (5 Replies)
Discussion started by: evelibertine
5 Replies

8. UNIX for Dummies Questions & Answers

Merging two text files by a column

So I have two text files. The first one looks like this: refsnp_id chr_name chrom_start 1 rs1000000 12 126890980 2 rs10000010 4 21618674 3 rs10000012 4 1357325 4 rs10000013 4 37225069 5 rs1000002 3 183635768 And the second one looks like this: AUC rs1000000 0.03 0.1240 AUC ... (4 Replies)
Discussion started by: evelibertine
4 Replies

9. Shell Programming and Scripting

Filling in missing columns

Hi all, I have a file that contains about 1000 rows and 800 columns. Nearly every row has 800 columns but some DONT. I want to extend the rows that dont have values with NA's. Here is an example: my file bob 2 4 5 6 8 9 4 5 tar 2 4 5 4 3 2 9 1 bro 3 5 3 4 yar 2 ... (7 Replies)
Discussion started by: gisele_l
7 Replies

10. Shell Programming and Scripting

Merging column files

Hi,Iam new to Unix.I have a file FileA which is a variable length file where each column is seperated by delimitter "|". FileA: SrNo Name Address 1-234|name1|Addr1 1-34|name2|Addr2 1-2345|name3|Addr3 FileB: SrNo Address 1-34<<06 SPACES>>Addr1<<8 spaces>> 1-234<<05... (1 Reply)
Discussion started by: swapna321
1 Replies
Login or Register to Ask a Question