Sponsored Content
Top Forums Shell Programming and Scripting Merge two files with different lengths Post 302796587 by ClaraW on Friday 19th of April 2013 09:19:03 PM
Old 04-19-2013
Merge two files with different lengths

Hi there,

I have two very long files like:

file1: two fields

Code:
1 123 
1 125
1 234
2 123
2 234
2 300
2 312
3 10
3 215
4 56
...

file2: 6 fields

Code:
1 123 0 1 0 0
1 126 2 1 0 0
2 123 0 1 0 1
2 138 1 1 1 1
2 300 0 1 2 3
2 311 2 4 6 0
3 120 3 4 1 0
3 215 1 1 2 1
3 216 0 2 1 5
3 345 8 0 1 0
3 357 0 1 1 1
3 500 2 1 0 1
4 17  6 1 0 2
4 70  0 1 0 1
...

The numbers of lines in file1 and file2 are not equal.

I want to get an output file like

file3: 6 fields, and the first two fields are exactly the same as the first two fields in file1. For example, the line with the first two field "1 123" has a match in file2: "1 123 0 1 0 0", then print the whole line in file2:"1 123 0 1 0 0" to file3. If one line in file1 does not have a match in file2, e.g. "1 125", then print "1 125 0 0 0 0" to file3.

Code:
1 123 0 1 0 0
1 125 0 0 0 0 
1 234 0 0 0 0
2 123 0 1 0 1
2 234 0 0 0 0
2 300 0 1 2 3
2 312 0 0 0 0
3 10  0 0 0 0
3 215 1 1 2 1
4 56  0 0 0 0
...

I am wondering if this can be done using awk or join or any other in linux? Since the files are very large, I really want it to be fast. Thanks a lot~~~

Note: Field 2 in both file1 and file2 has only number values, but field 1 in both files may have characters too. The two fields are sorted. And in both files, this kind of situation will not happen, no duplicates.

Code:
1 123
1 123
...

Also we do not need to consider the lines in file2 which do not have any match in file1, for example "1 126 2 1 0 0" (no match in file1), then this line should not be added to file3.

Last edited by ClaraW; 04-26-2013 at 08:38 PM..
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Using grep to find strings of certain lengths?

I am trying to use grep to find strings of certain lengths that all start with the same letter. Is this possible?:confused: (4 Replies)
Discussion started by: crabtruck
4 Replies

2. Shell Programming and Scripting

Merge files of differrent size with one field common in both files using awk

hi, i am facing a problem in merging two files using awk, the problem is as stated below, file1: A|B|C|D|E|F|G|H|I|1 M|N|O|P|Q|R|S|T|U|2 AA|BB|CC|DD|EE|FF|GG|HH|II|1 .... .... .... file2 : 1|Mn|op|qr (2 Replies)
Discussion started by: shashi1982
2 Replies

3. Solaris

limit on Solaris username lengths?

Hi this question applies to Solaris 8,9,10 and opensolaris as in my environment it applies to all of these Is there a limit on the size of the username (in /etc/passwd) or indeed does there come a point where, like the 8 character limitation of passwords, the system receives the input but... (6 Replies)
Discussion started by: hcclnoodles
6 Replies

4. Shell Programming and Scripting

Read lines with different lengths in while loop

Hi there ! I need to treat files with variable line length, and process the tab-delimited words of each line. The tools I know are some basic bash scripting and sed ... I haven't got to python or perl yet. So my file looks like this obj1 0.01953 0.34576 0.04418 0.01249 obj2 0.78140... (7 Replies)
Discussion started by: jossojjos
7 Replies

5. Shell Programming and Scripting

Merging data from 2 files of different lengths?

Hi all, Sorry if someone has answered something like this already, but I have a problem. I am not brilliant with "awk" but think it should be the command to use to get what I am after. I have 2 files: job-file (several hundred lines like): 1018003,LONG MU WAN,1113S 1018004,LONG MU... (4 Replies)
Discussion started by: sgb2301
4 Replies

6. Shell Programming and Scripting

Checking in a directory how many files are present and basing on that merge all the files

Hi, My requirement is,there is a directory location like: :camp/current/ In this location there can be different flat files that are generated in a single day with same header and the data will be different, differentiated by timestamp, so i need to verify how many files are generated... (10 Replies)
Discussion started by: srikanth_sagi
10 Replies

7. Shell Programming and Scripting

Merge files and generate a resume in two files

Dear Gents, Please I need your help... I need small script :) to do the following. I have a thousand of files in a folder produced daily. I need first to merge all files called. txt (0009.txt, 0010.txt, 0011.txt) and and to output a resume of all information on 2 separate files in csv... (14 Replies)
Discussion started by: jiam912
14 Replies

8. Shell Programming and Scripting

Paste files of varying lengths

I have three files of varying lengths and different number of columns. How can I paste all three with all columns aligned? File1 ---- 123 File2 ---- 234 345 678 File3 ---- 456 789 Output should look like: 123 234 456 345 789 (6 Replies)
Discussion started by: Un1xNewb1e
6 Replies

9. UNIX for Advanced & Expert Users

UTF-8,16,32 character lengths using awk

Hi All, I am trying to obtain count of characters using awk, but "length" function returns a value of 1 for 2-byte or 3-byte characters as well unlike wc -c command. I have tried to use the below commands within awk function, but it does not seem to work { cmd="wc -c "stringtocheck ( cmd )... (6 Replies)
Discussion started by: tostay2003
6 Replies

10. Programming

Simple C program to count word lengths

So my program is not working and I keep changing it to figure out why. So I have two questions, can I do tracing similar to bash, and also what is wrong with this. The idea is simple, I want to count "word" lengths, with the loose definition of word not being a space, tab, or newline. Here is... (11 Replies)
Discussion started by: Riker1204
11 Replies
comm(1) 							   User Commands							   comm(1)

NAME
comm - select or reject lines common to two files SYNOPSIS
comm [-123] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified. OPTIONS
The following options are supported: -1 Suppresses the output column of lines unique to file1. -2 Suppresses the output column of lines unique to file2. -3 Suppresses the output column of lines duplicated in file1 and file2. OPERANDS
The following operands are supported: file1 A path name of the first file to be compared. If file1 is -, the standard input is used. file2 A path name of the second file to be compared. If file2 is -, the standard input is used. USAGE
See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte ( 2**31 bytes). EXAMPLES
Example 1: Printing a list of utilities specified by files If file1, file2, and file3 each contain a sorted list of utilities, the command example% comm -23 file1 file2 | comm -23 - file3 prints a list of utilities in file1 not specified by either of the other files. The entry: example% comm -12 file1 file2 | comm -12 - file3 prints a list of utilities specified by all three files. And the entry: example% comm -12 file2 file3 | comm -23 -file1 prints a list of utilities specified by both file2 and file3, but not specified in file1. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 All input files were successfully output as specified. >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5) SunOS 5.10 3 Mar 2004 comm(1)
All times are GMT -4. The time now is 09:51 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy