Go Back   The UNIX and Linux Forums > Top Forums > UNIX for Dummies Questions & Answers


UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!

Closed Thread    
 
Thread Tools Search this Thread Display Modes
    #1  
Old 06-14-2012
Registered User
 
Join Date: May 2011
Posts: 190
Thanks: 92
Thanked 0 Times in 0 Posts
Merging two text files by a column and filling in the missing values

Hi,

I have to text files that I want to merge by the first column. The values in the first column pretty much match for the first part. However there are some values that are present in column 1 and not present in column 2 or vice versa. For such values I would like to substitute X for the corresponding columns. Example:


Code:
File 1:
AB   1
CD   2
EF   3
GH   4

File 2: 
CD  4
EF  5
GH  6
IJ  7

Output
AB 1 X
CD 2 4
EF 3 6
GH 4 6
IJ X 7

Thanks!
Sponsored Links
    #2  
Old 06-14-2012
joeyg's Avatar
joeyg joeyg is offline Forum Staff  
modérateur
 
Join Date: Dec 2007
Location: Out running a Marathon.
Posts: 2,189
Thanks: 46
Thanked 126 Times in 118 Posts
just thinking...

If you used the COMM command, after cutting the first column of both files, you would know records that are in common. Save to FileCommon.
Then, maybe a PASTE command to create those in common. Your output lines 2-3-4, to FileB.
GREP -V what is in-common (your FileCommon) against the first file; with AWK for example to create those X entries. Your output line 1, to FileA.
Do the same to create FileC, from the 2nd file. Your output line 5 to FileC.
Finally combine the three files.
Sponsored Links
    #3  
Old 06-14-2012
drl's Avatar
drl drl is offline Forum Advisor  
Registered Voter
 
Join Date: Apr 2007
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 1,480
Thanks: 15
Thanked 134 Times in 122 Posts
Hi.

This probably follows the thought train of joeyg, but uses join to find the initial missing items. The key option for this is "-v". The missing items are appended to the other file with a little awk script. The files are sorted and joined again to produce the result. The data files are changed, so they are copied from a "sacred" version. The commands context, specimen, and pass-fail are local, so not available outside my group, but the script is designed so the absence will not cause an error.

Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join with missing values.

# Section 1, setup, pre-solution, $Revision: 1.26 $".
# Infrastructure details, environment, debug commands for forum posts. 
# Uncomment export command to run as external user: not context, pass-fail.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
C=$HOME/bin/context && [ -f $C ] && $C join awk
set -o nounset

cp sacred1 data1
FILE1=data1
cp sacred2 data2
FILE2=data2

# Display sample data files.
pe
specimen $FILE1 $FILE2 expected-output.txt || (
head $FILE1
pe
head $FILE2 
pe 
head expected-output.txt
)

# Section 2, solution.
pl " Results:"
db " Section 2: solution."
join -v 1 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE2 

join -v 2 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE1 

join <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
tee f1

# Section 3, post-solution, check results, clean-up, etc.
db " Section 3: check solution."
(C=$HOME/bin/pass-fail && [ -f $C ] && $C;exit 0) 

exit 0

producing:

Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
join (GNU coreutils) 6.10
awk GNU Awk 3.1.5

Whole: 5:0:5 of 4 lines in file "data1"
AB   1
CD   2
EF   3
GH   4

Whole: 5:0:5 of 4 lines in file "data2"
CD  4
EF  5
GH  6
IJ  7

Whole: 5:0:5 of 5 lines in file "expected-output.txt"
AB 1 X
CD 2 4
EF 3 5
GH 4 6
IJ X 7

-----
 Results:
AB 1 X
CD 2 4
EF 3 5
GH 4 6
IJ X 7

-----
 Comparison of 5 created lines with 5 lines of desired results:
 Succeeded -- files have same content.

See man pages for details.

Best wishes ... cheers, drl
The Following User Says Thank You to drl For This Useful Post:
evelibertine (06-15-2012)
    #4  
Old 06-15-2012
Registered User
 
Join Date: May 2011
Posts: 190
Thanks: 92
Thanked 0 Times in 0 Posts
Is there an easy way to alter this script so that instead of just two files, it merges five files by the first column filling in the blanks with X? Thanks!
Sponsored Links
    #5  
Old 06-15-2012
drl's Avatar
drl drl is offline Forum Advisor  
Registered Voter
 
Join Date: Apr 2007
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 1,480
Thanks: 15
Thanked 134 Times in 122 Posts
Hi.
Quote:
Originally Posted by evelibertine View Post
Is there an easy way to alter this script so that instead of just two files, it merges five files by the first column filling in the blanks with X? Thanks!
Please post samples and expected output ... cheers, drl
Sponsored Links
    #6  
Old 06-15-2012
Registered User
 
Join Date: May 2011
Posts: 190
Thanks: 92
Thanked 0 Times in 0 Posts

Code:
File 1:
AB 1
CD 1
EF 1

File 2:
EF 2
GH 2
IJ 2

File 3:
IJ 3
KL 3 
MN 3 

File 4:
MN 4
OP 4
RS 4

File 5:
RS 5
TU 5
VW 5

Output:
AB 1 X X X X
CD 1 X X X X
EF 1 2 X X X 
GH X 2 X X X
IJ X 2 3 X X 
KL X X 3 X X 
MN X X 3 4 X 
OP X X X 4 X 
RS X X X 4 5 
TU X X X X 5
VW X X X X 5

Sponsored Links
    #7  
Old 06-16-2012
Registered User
 
Join Date: Jul 2010
Posts: 15
Thanks: 7
Thanked 0 Times in 0 Posts
Hi evelibertine ,

Please hava a look at this code , fit for 2 files.

Quote:
(
for loop_1 in `cat file_1 file_2 | awk '{print $1}'| sort -u `
do
if [ `grep -c $loop_1 file_1` -gt 0 ]
then
col2=`grep $loop_1 file_1 | awk '{print $2}'`
else
col2=X
fi
if [ `grep -c $loop_1 file_2` -gt 0 ]
then
col3=`grep $loop_1 file_2 | awk '{print $2}'`
else
col3=X
fi
echo $loop_1 $col2 $col3
## or redirect it to an out file..
done
)

Last edited by dipanchandra; 06-16-2012 at 06:57 AM.. Reason: modifying
Sponsored Links
Closed Thread

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
Match values/IDs from column and text files ad23 UNIX for Dummies Questions & Answers 1 02-23-2012 05:18 PM
Comparing two text files by a column and printing values that do not match evelibertine UNIX for Dummies Questions & Answers 8 01-24-2012 09:22 AM
Merging two text files by a column evelibertine UNIX for Dummies Questions & Answers 5 09-13-2011 01:46 AM
Merging two text files by a column evelibertine UNIX for Dummies Questions & Answers 4 08-19-2011 03:02 AM
Merging column files swapna321 Shell Programming and Scripting 1 06-27-2008 06:12 AM



All times are GMT -4. The time now is 10:31 AM.