As you can see, it works on the copies of the files you posted. So there must be an inherent difference in the original files you are working on. Reduce the files to just the lines in question, try again, and if it doesn't work, post the output of
.
I posted the reply yesterday but I am not sure why it is not reflecting. So here it is again :
The files were originally tab delimited but I made them comma delimited to help me with the join command. I am now trying again with tab delimited files. I also tried some additions in my join command and this is what I gave :
I am getting a result out of this which unfortunately means that UNIX is not finding a common key for the files to join, and that is surprising because there ARE common values between the files. This is what a sample of the result looks like:
As you can see above, 01635163349 is a common key between File 1 that has dates and file 2 that has the cost. So ideally the result should be
The command
does not give me any result as in no output on the console at all.
This is how file 1 looks:
File 2:
Can there be any other way to achieve an inner join between these files?
You have now shown us 3 different input file formats (tab separated fields, <space><comma><space> separated fields, and <space><comma> separated fields). You have shown us commands using <space>, <comma>, and <tab> as the field separator. And it isn't clear which separators have been used in the files those commands are processing.
More importantly, you have said that your file names are File 1, File1, file 1, and file1. Since none of your commands have quoted the filenames being passed as arguments, many of them are asking various utilities to work on files named File or file and 1 and 2 (which presumably result in non-existent file diagnostics that you haven't shown us). The name of a file is case sensitive and having a <space> in a filename requires special handling in LOTS of ways that are being ignored in all of your command lines.
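As a small illustration of the quoting problem (a sketch only; count_args is a hypothetical helper and 'File 1' stands in for whatever the file is really called), this shows how many arguments a utility receives with and without quotes:

```shell
# Hypothetical helper: report how many arguments were actually received.
count_args() { echo "$#"; }

name='File 1'                    # a filename containing a <space>

unquoted=$(count_args $name)     # field splitting turns this into "File" and "1"
quoted=$(count_args "$name")     # quoted: a single argument, "File 1"

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

The same applies to join, awk, sort and every other utility: an unquoted name with a <space> in it arrives as two separate operands.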
Then, it is also important to understand that in an awk script, $0 is the contents of the current input line, $2 is the contents of the 2nd field in the current input line, and a command like:
is never going to work unless
contains lines that, taken as whole lines, exactly match the 2nd field of a line in File1 (which is not true for any of your sample input file pairs).
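To make the difference between $0 and $2 concrete, here is a minimal sketch. The input line is invented for illustration (a date, the key mentioned earlier in this thread, and a cost, separated by tabs); it is not taken from the real files.

```shell
# Hypothetical tab-separated line: date, key, cost.
line=$(printf '2016-08-09\t01635163349\t12.50')

whole=$(printf '%s\n' "$line" | awk -F '\t' '{ print $0 }')   # the entire line
second=$(printf '%s\n' "$line" | awk -F '\t' '{ print $2 }')  # only the 2nd field
count=$(printf '%s\n' "$line" | awk -F '\t' '{ print NF }')   # number of fields

printf 'whole line: %s\n2nd field:  %s\nfields:     %s\n' "$whole" "$second" "$count"
```

A comparison of $2 from one file against $0 from another can only succeed when the other file's lines consist of nothing but the key.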
And, the command line:
will only work correctly if there are no <space> or <tab> characters on any line in File2 AND you are trying to find complete lines from File2 that match a subset of a line from File1.
And, the command line:
should give you a diagnostic similar to:
not the no output that you say you get.
If you keep giving us inconsistent data and don't show us what your command lines and/or the output you get from them really are, you make it impossible for us to help you.
Saying things like:
Quote:
Came out as a typo .... but this is not working either
Doesn't help us. Show us the exact diagnostic that was produced!
Saying things like:
Quote:
These files are being sent by the source. There are many other columns in these files. I have manipulated them to remove the unrequired columns and the header using AWK and SED.
Doesn't give us any indication as to whether or not we are working on UNIX format text files after you have manipulated the files sent by the source. If, after you have manipulated them, the files are still DOS format text files, there is a good chance that the <carriage-return> character at the end of each line is keeping fields from matching, or is causing output sent to your terminal to be obscured by parts of output lines overwriting text already written to your screen.
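A rough sketch of why a trailing <carriage-return> matters, assuming the key is the last field on a DOS format line (the variable names are made up for this example):

```shell
cr=$(printf '\r')                 # a literal <carriage-return>
dos_key="01635163349$cr"          # key as read from the end of a DOS format line
unix_key='01635163349'            # the same key from a UNIX format file

# The two strings look identical on a terminal but do not compare equal.
if [ "$dos_key" = "$unix_key" ]; then same=yes; else same=no; fi

# tr -d '\r' deletes every <carriage-return>.
cleaned=$(printf '%s' "$dos_key" | tr -d '\r')

echo "equal before cleaning: $same"
```

The same filter can be applied to a whole file with something like tr -d '\r' < infile > outfile before sorting and joining.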
Please give us clear answers to the questions we have asked. We are asking for information that will allow us to help you. We are not asking you to do extra work for the fun of it.
Sorry for the confusion that I am creating. Let me start from the beginning. These files are tab delimited files. Because there is a confusion with the file names, I will henceforth use the original file names --dlya0908.tab (which I was referring as File1) and tgpr.tab (which I was referring as File2)
I do not have information on how the source team creates the files. The files are FTPed from an external server.
So, dlya0908.tab looks like this :
and tgpr.tab looks like this :
I am trying to join these files like this :
The above command does not give me any result.
When I give
I get
This is wrong because there is a common key here -- 01635163349.
So the output I am looking for is :
I am looking for a way to inner join these files. tgpr.tab is a full dump file while dlya0908.tab is a daily file.
I hope the information I have given is helpful this time
Files being joined by the join utility must be ordered in the collating sequence of sort -b on the fields on which they are being joined.
The 2nd field in tgpr.tab is NOT in sorted order. And, with the sample data you showed us in post #10, every line in dlya0908.tab sorts before the 1st line in tgpr.tab.
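A minimal sketch of sorting each file on its own join field before calling join. The file names and contents below are hypothetical stand-ins (only the key 01635163349 comes from this thread), and the real files may have more fields:

```shell
LC_ALL=C; export LC_ALL           # same collating sequence for sort and join
tab=$(printf '\t')                # a literal <tab> to use as field separator
dir=$(mktemp -d)

# Hypothetical daily file: key in field 1, date in field 2 (unsorted).
printf '01635163349\t2016-08-09\n00000000001\t2016-08-09\n' > "$dir/daily.tab"
# Hypothetical full dump: cost in field 1, key in field 2 (unsorted).
printf '9.99\t99999999999\n12.50\t01635163349\n' > "$dir/dump.tab"

# Each file is sorted on the field it will be joined on.
sort -t "$tab" -k 1,1 "$dir/daily.tab" > "$dir/daily.sorted"
sort -t "$tab" -k 2,2 "$dir/dump.tab"  > "$dir/dump.sorted"

# Inner join: field 1 of the first file against field 2 of the second.
joined=$(join -t "$tab" -1 1 -2 2 "$dir/daily.sorted" "$dir/dump.sorted")
printf '%s\n' "$joined"

rm -rf "$dir"
```

Only the key present in both files should appear, followed by the remaining fields of the first file and then those of the second.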
Quote:
...
I hope the information I have given is helpful this time
Sorry, no. Nothing new. How about the octal dump?
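For anyone unsure what such a dump looks like, here is a sketch run on an invented line (a key, a <tab>, a date and a DOS line ending) instead of the real files:

```shell
# Hypothetical line; od -c prints every byte, showing non-printing
# characters as escapes such as \t, \r and \n.
dump=$(printf '01635163349\t2016-08-09\r\n' | od -c)
printf '%s\n' "$dump"
```

On the real data, something like od -c dlya0908.tab | head would show whether the separators are tabs or spaces and whether each line ends in \r \n.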
Quote:
join -t" " -11 -22 dlya0908.tab tgpr.tab
, with a <TAB> (\t, 0x09) character following the -t option, will print the desired result unless field 1 in dlya0908.tab can't be joined with field 2 in tgpr.tab due to (obviously non-printing) differences.
Quote:
join -a 1 -a 2 -e "NULL" -o'0,1.1,2.2' dlya0908.tab tgpr.tab
will compare field 1 in dlya0908.tab to field 1 in tgpr.tab and won't find identical entries, of course.
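A small sketch of that effect, using hypothetical files whose key sits in field 1 of the first file but in field 2 of the second (file names and contents are invented for illustration):

```shell
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)

# Hypothetical data, each file already sorted on its field 1.
printf '00000000001\t2016-08-09\n01635163349\t2016-08-09\n' > "$dir/daily.tab"
printf '12.50\t01635163349\n9.99\t99999999999\n' > "$dir/dump.tab"

# No -1/-2 options: field 1 of each file is compared, so nothing pairs up
# and every output line has NULL in place of the missing side.
outer=$(join -a 1 -a 2 -e NULL -o '0,1.1,2.2' "$dir/daily.tab" "$dir/dump.tab")
printf '%s\n' "$outer"

unmatched=$(printf '%s\n' "$outer" | grep -c NULL)
rm -rf "$dir"
```

Every line in the output carries NULL, even though 01635163349 is present in both files, because the key was never compared against field 2.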
If dlya0908.tab is in sorted order by field 1, and tgpr.tab is not changing while your script is running, you might want to try:
which, with the data shown in post #10, produces the output:
Or, with just:
and those same input files, you get the output:
PS: Note, however, that this only works if your input files actually have tab separated fields. The sample files you have provided in this thread use sequences of spaces as field separators (not tabs).
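If sorting the files is a problem, an inner join can also be sketched in awk alone. This is only one possible approach under the assumption that the key is field 1 of the daily file and field 2 of the dump file; the file names and contents below are hypothetical.

```shell
dir=$(mktemp -d)

# Hypothetical inputs, deliberately unsorted.
printf '01635163349\t2016-08-09\n00000000001\t2016-08-09\n' > "$dir/daily.tab"
printf '9.99\t99999999999\n12.50\t01635163349\n' > "$dir/dump.tab"

result=$(awk -F '\t' -v OFS='\t' '
    NR == FNR { date[$1] = $2; next }       # 1st file: remember the date for each key
    $2 in date { print $2, date[$2], $1 }   # 2nd file: print only keys seen in the 1st
' "$dir/daily.tab" "$dir/dump.tab")

printf '%s\n' "$result"
rm -rf "$dir"
```

This reads the smaller daily file into memory and streams the full dump past it, so neither file needs to be sorted. It still depends on the fields really being tab separated.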
Last edited by Don Cragun; 08-15-2016 at 08:03 AM..
Reason: Add PS.