Join based on positions


 
# 1  
06-11-2014

I have two text files as shown below

cat file1.txt

Code:
Id    leng  sal   mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534

cat file2.txt


Code:
Id    number
25671 34343 
76767 34234 
23343 23423 
66776 23343

I want my output file to be

Code:
Id    leng  sal   mon
25671 34343
76767 34234
23343 23423
66776 23343
44888 56565 45554 6868
77765 88688 87464 6848

I want all the rows from file2.txt, plus the rows from file1.txt that have no match in file2.txt. As you can see in my output file, the last two rows are not present in file2.txt but are present in file1.txt, so they appear in the output.
In other words, I want all the rows from file2.txt + the rows from file1.txt that do not match.

The command below returns only the rows of file1.txt whose first field has no match in file2.txt, so the rows from file2.txt are missing from its output.

Code:
join -v1 file1.txt file2.txt >output.txt

Also, is there a way to compare based on position? Say I want to compare the two files using positions 7-12 in file1.txt against positions 7-12 in file2.txt.

Thanks!!

Last edited by Scrutinizer; 06-11-2014 at 05:30 PM.. Reason: code tags
# 2  
06-11-2014
Please use code tags.
Please elaborate on your requirement. As the two files do not have the same format, which fields decide whether a row matches or not?

Is your second question (position match) part of this exercise or a separate one? If it is part of it, please restate your whole requirement in a single post, so that we can get to the final outcome and avoid misleading, repetitive posts.
# 3  
06-11-2014
I have two text files as shown below

Code:
cat file1.txt

Id leng sal mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534

cat file2.txt


Id number
25671 34343 
76767 34234 
23343 23423 
66776 23343

I want my output file to be

Code:
Id leng sal mon
25671 34343
76767 34234
23343 23423
66776 23343
44888 56565 45554 6868
77765 88688 87464 6848


I want all the rows from file2.txt, plus the rows from file1.txt that have no match in file2.txt. As you can see in my output file, the last two rows are not present in file2.txt but are present in file1.txt, so they appear in the output.

The command below returns only the rows of file1.txt whose first field has no match in file2.txt, so the rows from file2.txt are missing from its output.

Code:
join -v1 file1.txt file2.txt >output.txt

I want to join the two files based on position. In the files above, I want to join on the data from position 7 to position 12.


Thanks!!
# 4  
06-11-2014
The join command can be used only on one common field, so I would suggest using awk instead:
Code:
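awk '
        # First file (file2.txt): remember each pair of leading fields and print the row.
        NR == FNR {
                A[$1 FS $2]
                print $0
                next
        }
        # Second file (file1.txt): print only rows whose first two fields were not seen.
        !( ( $1 FS $2 ) in A )
' file2.txt file1.txt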
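
For the position-based part of the question, the same two-pass idea can be keyed on a character range instead of on fields. This is only a minimal sketch, assuming the key is characters 7 through 12 of each line (`substr($0, 7, 6)`), that trailing blanks in that range should be ignored (some sample lines end right after the second column), and that the header of file1.txt should be skipped. The names `key`, `seen` and `output_pos.txt` are made up for illustration.

```shell
# Recreate the sample files from the thread (trailing blanks omitted).
cat > file1.txt <<'EOF'
Id    leng  sal   mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534
EOF
cat > file2.txt <<'EOF'
Id    number
25671 34343
76767 34234
23343 23423
66776 23343
EOF

awk '
    # Key = characters 7-12 of the line, with trailing blanks removed.
    { key = substr($0, 7, 6); sub(/ +$/, "", key) }
    # First file (file2.txt): remember the key and print the row.
    NR == FNR { seen[key]; print; next }
    # Second file (file1.txt): skip its header, print rows with an unseen key.
    FNR > 1 && !(key in seen)
' file2.txt file1.txt > output_pos.txt
```

With the sample data, positions 7-12 happen to cover the second column, so this keeps every row of file2.txt and adds the two file1.txt rows whose second column does not occur in file2.txt. For other layouts, only the start and length passed to `substr` should need to change.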

# 5  
06-11-2014
If a key pair ($1,$2) does not occur more than once per file:
Code:
awk '!A[$1,$2]++' file2.txt file1.txt
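
Both one-liners in this thread print the header line of each file, because the two headers differ in their second field. A minimal sketch of one way around that, assuming the header of file1.txt should appear once at the top (as in the requested output) and that each file has exactly one header line; `seen` is a name picked for illustration:

```shell
# Recreate the sample files from the thread (trailing blanks omitted).
cat > file1.txt <<'EOF'
Id    leng  sal   mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534
EOF
cat > file2.txt <<'EOF'
Id    number
25671 34343
76767 34234
23343 23423
66776 23343
EOF

{
    # Header taken from file1.txt, printed once.
    head -n 1 file1.txt
    # Skip both header lines; print a row only the first time its
    # ($1,$2) pair is seen, reading file2.txt before file1.txt.
    awk 'FNR > 1 && !seen[$1, $2]++' file2.txt file1.txt
} > output.txt
```

The same caveat applies as for the one-liner above: a pair that repeats within a single file is printed only once.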


Last edited by Scrutinizer; 06-11-2014 at 05:52 PM..