|
|||||||
| Forums | Search Forums | Register | Forum Rules | Man Pages | Albums | FAQ | Members | Calendar | Search | Today's Posts | Mark Forums Read |
| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
|
|
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
Merging two text files by a column and filling in the missing values
Hi, I have to text files that I want to merge by the first column. The values in the first column pretty much match for the first part. However there are some values that are present in column 1 and not present in column 2 or vice versa. For such values I would like to substitute X for the corresponding columns. Example: Code:
File 1: AB 1 CD 2 EF 3 GH 4 File 2: CD 4 EF 5 GH 6 IJ 7 Output AB 1 X CD 2 4 EF 3 6 GH 4 6 IJ X 7 Thanks! |
| Sponsored Links | ||
|
|
#2
|
||||
|
||||
|
just thinking...
If you used the COMM command, after cutting the first column of both files, you would know records that are in common. Save to FileCommon.
Then, maybe a PASTE command to create those in common. Your output lines 2-3-4, to FileB. GREP -V what is in-common (your FileCommon) against the first file; with AWK for example to create those X entries. Your output line 1, to FileA. Do the same to create FileC, from the 2nd file. Your output line 5 to FileC. Finally combine the three files. |
| Sponsored Links | ||
|
|
#3
|
||||
|
||||
|
Hi. This probably follows the thought train of joeyg, but uses join to find the initial missing items. The key option for this is "-v". The missing items are appended to the other file with a little awk script. The files are sorted and joined again to produce the result. The data files are changed, so they are copied from a "sacred" version. The commands context, specimen, and pass-fail are local, so not available outside my group, but the script is designed so the absence will not cause an error. Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate join with missing values.
# Section 1, setup, pre-solution, $Revision: 1.26 $".
# Infrastructure details, environment, debug commands for forum posts.
# Uncomment export command to run as external user: not context, pass-fail.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
C=$HOME/bin/context && [ -f $C ] && $C join awk
set -o nounset
cp sacred1 data1
FILE1=data1
cp sacred2 data2
FILE2=data2
# Display sample data files.
pe
specimen $FILE1 $FILE2 expected-output.txt || (
head $FILE1
pe
head $FILE2
pe
head expected-output.txt
)
# Section 2, solution.
pl " Results:"
db " Section 2: solution."
join -v 1 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE2
join -v 2 <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
awk '{print $1,"X"}' >> $FILE1
join <( sort -k1,1 $FILE1 ) <( sort -k1,1 $FILE2 ) |
tee f1
# Section 3, post-solution, check results, clean-up, etc.
db " Section 3: check solution."
(C=$HOME/bin/pass-fail && [ -f $C ] && $C;exit 0)
exit 0producing: Code:
% ./s1 Environment: LC_ALL = C, LANG = C (Versions displayed with local utility "version") OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64 Distribution : Debian GNU/Linux 5.0.8 (lenny) bash GNU bash 3.2.39 join (GNU coreutils) 6.10 awk GNU Awk 3.1.5 Whole: 5:0:5 of 4 lines in file "data1" AB 1 CD 2 EF 3 GH 4 Whole: 5:0:5 of 4 lines in file "data2" CD 4 EF 5 GH 6 IJ 7 Whole: 5:0:5 of 5 lines in file "expected-output.txt" AB 1 X CD 2 4 EF 3 5 GH 4 6 IJ X 7 ----- Results: AB 1 X CD 2 4 EF 3 5 GH 4 6 IJ X 7 ----- Comparison of 5 created lines with 5 lines of desired results: Succeeded -- files have same content. See man pages for details. Best wishes ... cheers, drl |
| The Following User Says Thank You to drl For This Useful Post: | ||
evelibertine (06-15-2012) | ||
|
#4
|
|||
|
|||
|
Is there an easy way to alter this script so that instead of just two files, it merges five files by the first column filling in the blanks with X? Thanks!
|
| Sponsored Links | |
|
|
#5
|
||||
|
||||
|
Hi.
Please post samples and expected output ... cheers, drl |
| Sponsored Links | |
|
|
#6
|
|||
|
|||
|
Code:
File 1: AB 1 CD 1 EF 1 File 2: EF 2 GH 2 IJ 2 File 3: IJ 3 KL 3 MN 3 File 4: MN 4 OP 4 RS 4 File 5: RS 5 TU 5 VW 5 Output: AB 1 X X X X CD 1 X X X X EF 1 2 X X X GH X 2 X X X IJ X 2 3 X X KL X X 3 X X MN X X 3 4 X OP X X X 4 X RS X X X 4 5 TU X X X X 5 VW X X X X 5 |
| Sponsored Links | |
|
|
#7
|
|||
|
|||
|
Hi evelibertine ,
Please hava a look at this code , fit for 2 files. Quote:
Last edited by dipanchandra; 06-16-2012 at 06:57 AM.. Reason: modifying |
| Sponsored Links | ||
|
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
More UNIX and Linux Forum Topics You Might Find Helpful
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Match values/IDs from column and text files | ad23 | UNIX for Dummies Questions & Answers | 1 | 02-23-2012 05:18 PM |
| Comparing two text files by a column and printing values that do not match | evelibertine | UNIX for Dummies Questions & Answers | 8 | 01-24-2012 09:22 AM |
| Merging two text files by a column | evelibertine | UNIX for Dummies Questions & Answers | 5 | 09-13-2011 01:46 AM |
| Merging two text files by a column | evelibertine | UNIX for Dummies Questions & Answers | 4 | 08-19-2011 03:02 AM |
| Merging column files | swapna321 | Shell Programming and Scripting | 1 | 06-27-2008 06:12 AM |
|
|