Full Discussion: File joining and sorting
Post 302825139 by verse123 on Sunday 23rd of June 2013 07:21:53 PM
Thank you, Allister. You pointed out the problem for the joining step.

However, even now that it's working, it's not producing the joined output I am looking for.

From f1.sort and f2.sort which look like this:
Quote:
==> f1.sort <==
2L,1
2L,2
2L,3
2L,4
2L,5
2L,6
2L,7
2L,8
2L,9
2L,10

==> f2.sort <==
10117182,1
10117309,1
10117431,1
10117467,1
10117536,1
10117554,1
10126359,1
10126386,1
10126486,1
10126597,1
The output should merge the two files on the shared column and show which lines carry a "1" tag, like this:

Quote:
===> f3 <===
2L,9
2L,10
...
...
2L,10117182,1
2L,10117183
...
...
2L,10117309,1
The command I'm using is
Code:
join -a 1 -1 2 -2 1 -t, f1.sort f2.sort > f3



I've also tried converting the two input files, f1.sort and f2.sort, from comma-delimited to tab-delimited. I'm on a Mac, where \t doesn't work in sed, so I've been typing a literal tab with Ctrl-V followed by Tab (shown as <TAB> below):
Code:
for i in *.sort; do sed 's/,/<TAB>/g' "$i" > "$i.tab"; done
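
For reference, a minimal sketch of the same comma-to-tab conversion without typing a literal tab, assuming a tr that understands the \t escape (both the BSD and GNU versions do). The file demo.sort and its contents are made up for illustration and are not the real input files.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Made-up sample file standing in for f1.sort / f2.sort.
printf '2L,1\n2L,2\n' > "$tmp/demo.sort"

# tr translates every comma to a tab; no Ctrl-V needed.
for i in "$tmp"/*.sort; do
  tr ',' '\t' < "$i" > "$i.tab"
done

cat "$tmp/demo.sort.tab"
```

Note that join would then need the tab passed as its -t separator, so the conversion alone does not change how the join fields are matched.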

but the join command still isn't working... Do you think it's failing because of the way the files are sorted? I used Yoda's suggestion but join may not know I used -g with sort...

---------- Post updated at 07:21 PM ---------- Previous update was at 05:50 PM ----------

Okay, I figured it out. It was just a sorting issue.
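
A rough sketch of what the sorting fix can look like, assuming the cause is that join expects both files in plain byte (lexical) order on the join field rather than numeric order. The sample data is made up, and forcing LC_ALL=C on both sort and join is an assumption chosen to keep the two tools on the same collation.

```shell
# Scratch directory with made-up sample data; the names mirror the post.
tmp=$(mktemp -d)
printf '2L,9\n2L,10\n2L,100\n' > "$tmp/f1"
printf '10,1\n100,1\n' > "$tmp/f2"

# Sort each file on its own join field, in byte order (no -n or -g).
LC_ALL=C sort -t, -k2,2 "$tmp/f1" > "$tmp/f1.sort"
LC_ALL=C sort -t, -k1,1 "$tmp/f2" > "$tmp/f2.sort"

# Join field 2 of f1 against field 1 of f2; -a 1 keeps unmatched f1 lines.
LC_ALL=C join -a 1 -1 2 -2 1 -t, "$tmp/f1.sort" "$tmp/f2.sort" > "$tmp/f3"

cat "$tmp/f3"
```

In byte order "10" and "100" sort before "9", which is why a file sorted with -n or -g can look correct to a person and still be out of order for join. The join field is printed first on each output line.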

But this now leads me to a strange problem. I have over 1,000 lines in f2 to join to f1, but only 15 are joined. Why do you suppose this is happening? File f1 has ~70,000,000 lines and file f2 has ~1,000 lines. I'm not sure why only 15 lines match when all 1,000 exist in f1.
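
Two checks that might narrow this down, sketched on made-up data: whether a file really is in the byte order join expects on its join field, and whether any lines carry a carriage return that would stop keys from comparing equal. These are assumptions about possible causes, not a diagnosis of the real files.

```shell
# Scratch copy of a small, deliberately mis-sorted sample (numeric order).
tmp=$(mktemp -d)
printf '9,1\n10,1\n' > "$tmp/f2.sort"

# sort -c only checks the order; it exits non-zero at the first misplaced line.
if LC_ALL=C sort -c -t, -k1,1 "$tmp/f2.sort" 2>/dev/null; then
  order=sorted
else
  order=unsorted
fi
echo "f2.sort is $order on field 1"

# Count lines containing a carriage return (e.g. from Windows line endings).
# grep -c prints 0 and exits 1 when nothing matches, hence the '|| true'.
cr_lines=$(grep -c "$(printf '\r')" "$tmp/f2.sort" || true)
echo "lines with a carriage return: $cr_lines"
```

For f1 the same sort -c check would use -k2,2, since field 2 is its join field. If a file is only partly in order, join can silently skip matches after the first misplaced key.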
 

JOIN(1)                      General Commands Manual                      JOIN(1)

NAME
       join - relational database operator

SYNOPSIS
       join [ options ] file1 file2

DESCRIPTION
       Join forms, on the standard output, a join of the two relations
       specified by the lines of file1 and file2. If one of the file names
       is -, the standard input is used.

       File1 and file2 must be sorted in increasing ASCII collating sequence
       on the fields on which they are to be joined, normally the first in
       each line.

       There is one line in the output for each pair of lines in file1 and
       file2 that have identical join fields. The output line normally
       consists of the common field, then the rest of the line from file1,
       then the rest of the line from file2.

       Input fields are normally separated by spaces or tabs; output fields
       by space. In this case, multiple separators count as one, and leading
       separators are discarded.

       The following options are recognized, with POSIX syntax.

       -a n   In addition to the normal output, produce a line for each
              unpairable line in file n, where n is 1 or 2.

       -v n   Like -a, omitting output for paired lines.

       -e s   Replace empty output fields by string s.

       -1 m
       -2 m   Join on the mth field of file1 or file2.

       -jn m  Archaic equivalent for -n m.

       -ofields
              Each output line comprises the designated fields. The
              comma-separated field designators are either 0, meaning the
              join field, or have the form n.m, where n is a file number and
              m is a field number. Archaic usage allows separate arguments
              for field designators.

       -tc    Use character c as the only separator (tab character) on input
              and output. Every appearance of c in a line is significant.

EXAMPLES
       sort /adm/users | join -t: -a 1 -e "" - bdays
              Add birthdays to password information, leaving unknown
              birthdays empty. The layout of /adm/users is given in
              users(6); bdays contains sorted lines.

       tr : ' ' </adm/users | sort -k 3 3 >temp
       join -1 3 -2 3 -o 1.1,2.1 temp temp | awk '$1 < $2'
              Print all pairs of users with identical userids.

SOURCE
       /sys/src/cmd/join.c

SEE ALSO
       sort(1), comm(1), awk(1)

BUGS
       With default field separation, the collating sequence is that of
       sort -b -ky,y; with -t, the sequence is that of sort -tx -ky,y.

       One of the files must be randomly accessible.