Strange situation of file sorting and merging


 
# 1  
Old 08-08-2012

I have a strange situation: I need to sort and merge two files based on the columns they have in common.

Previously both files had the same number of records, so I used the approach below, which worked fine until the record count of one of the files was reduced.
That is, sometimes both files will have the same number of records and sometimes they won't; only the columns remain unchanged.

Code:
 cat /var/tmp/today.csv > /var/tmp/outputfile
                for file in `ls /var/tmp/yday.csv | xargs `
                do
                        cut -d"," -f7,8,9 $file > /var/tmp/tmp
                        paste -d"," /var/tmp/outputfile /var/tmp/tmp >> /var/tmp/final_outputfile
                done

This works well for files with the same row count.

The columns we are talking about are 1, 2, 3, 5 and 6, which are common to both files.

Now the files have changed.
For today.csv:
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

For yday.csv:
Code:
36000807	A	123 	76	0	1	0.1	 0.2 	0.3
36000807	A	123 	76	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	76	0	10	-0.1 	 0.2 	-0.3

Now final_outputfile
should look like:
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	0.1	0.2	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	NA	NA	NA

It's possible that the first row of yday.csv, or any other row, is missing entirely.
Then
final_outputfile should look like:
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	NA	NA	NA
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	NA	NA	NA

Does anyone have an idea how to work this out?
# 2  
Old 08-08-2012
I am struggling to try and understand the logic which produces that output from that input. What decides which rows get NA'ed and which don't?
# 3  
Old 08-08-2012
Your script does not - by any means - produce the output you describe, even with files of equal line count. The best I could get, with a generous interpretation of the commands in the script, is:
Code:
36000807    A    123     78    0    1    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    1    -0.1      0.2     -0.3,-0.1      0.2     -0.3
36000807    A    123     78    0    5    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    5    -0.1      0.2     -0.3,-0.1      0.2     -0.3
36000807    A    123     78    0    10    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    10    -0.1      0.2     -0.3,-0.1      0.2     -0.3

When files are of different length, the leftover lines will get no or empty fields pasted. Why should a line with field 4 = 79 in today.csv get NAs when field 4 in yday.csv is 76? Please explain the intended algorithm.
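To illustrate the point about unequal lengths, here is a minimal sketch with two made-up files (the names a.txt and b.txt and their contents are hypothetical, not from the thread). paste pairs lines purely by position, ignores any key columns, and fills in an empty field once the shorter file runs out:

```shell
# Hypothetical sample files, created in a temporary directory
dir=$(mktemp -d)
printf '%s\n' 'k1,1' 'k1,2' 'k2,3' > "$dir/a.txt"
printf '%s\n' 'x' 'y'              > "$dir/b.txt"

# paste joins line N of a.txt with line N of b.txt, whatever the keys are
result=$(paste -d, "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$result"
# prints:
# k1,1,x
# k1,2,y
# k2,3,

rm -r "$dir"
```

The last line ends in a bare comma: there is no third line in b.txt, so nothing meaningful (and certainly no NA) is appended.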
# 4  
Old 08-09-2012
Hi Corona,

Apologies for making a mess.

My requirement is to combine the yday and today CSV files and create a final output file, taking into consideration the common columns 1, 2, 3, 5 and 6 in both files.

Sometimes today's file has more records than yday's.

So the merge should attach a row of the smaller file (yday) to the first row of the larger file (today) that has the same combined key (columns 1, 2, 3, 5, 6). Any further rows in today with that same key should get NA NA NA, because the only yday row with that key has already been matched above.

Please refer to the input files below for the different cases.
PS: The field separator is actually "," rather than TAB. My mistake.

case one
today.csv
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv
Code:
36000807	A	123 	76	0	1	0.56	 0.47 	0.39
36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

merged file
Code:
f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	76	0.56	 0.47 	0.39
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	76	-0.14 	 0.25 	-0.53
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	NA	NA	NA
--------------------------today.csv------------------------------------------****-----------yday.csv------------

case two
today.csv
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv
Code:
36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

final merged file
Code:
f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	NA	NA	NA	NA
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	76	-0.14 	 0.25 	-0.53
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	NA	NA	NA
--------------------------today.csv-----------------------------------------***-----------yday.csv------------

case three
today.csv
Code:
36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv
Code:
36000807	A	123 	76	0	1	0.56	 0.47 	0.39
36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

final merged one
Code:
f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	76	0.56	 0.47 	0.39
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	-0.14 	 0.25 	-0.53
--------------------------today.csv----------------------------------------****-----------yday.csv------------

f1=today.csv
f2=yday.csv
f1.X today.csv's X column number
f2.X yday.csv's X column number

---------- Post updated at 06:40 AM ---------- Previous update was at 06:24 AM ----------

Hi Corona,

I tried my best to clear up the confusion :-). Do let me know if I left something out.
Cheers!!!

Last edited by manas_ranjan; 08-09-2012 at 08:37 AM..
# 5  
Old 08-09-2012
Do you mean that the sixth column is common to both files and the merge needs to be done on the basis of the sixth column?
Will the sixth column of file1 be sorted as shown in the sample file?
# 6  
Old 08-09-2012
Would this come near what you expect to see:
Code:
awk '    BEGIN {OFS="\t"}
    {
     a[7]=a[8]=a[9]="NA";
     if (tmp!=$6) {getline p < "yday";
             split(p,a)}
     tmp=$6
    }
    {print $0, a[7], a[8], a[9]}
    ' today

Code:
36000807    A    123     78    0    1    0.1     0.2     0.3    0.56    0.47    0.39
36000807    A    123     79    0    1    -0.1    0.2     -0.3    NA    NA    NA
36000807    A    123     78    0    5    0.1     0.2     0.3    -0.34    0.27    -0.38
36000807    A    123     79    0    5    -0.1    0.2     -0.3    NA    NA    NA
36000807    A    123     78    0    10   0.1     0.2     0.3    -0.14    0.25    -0.53
36000807    A    123     79    0    10   -0.1    0.2     -0.3    NA    NA    NA

---------- Post updated at 02:06 PM ---------- Previous update was at 01:54 PM ----------

This will meet your requirements on line 1:
Code:
awk 'BEGIN {OFS="\t"}
    {
     a[4]=a[7]=a[8]=a[9]="NA";
     if (tmp!=$6 && NR>1)
         {getline p < "yday";  split(p,a)}
     tmp=$6
    }
    {print $0, a[4], a[7], a[8], a[9]}
    ' today

Make the OFS a , if you need that as output delimiter.
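For comma-separated input the input separator has to change as well, not just OFS. A minimal sketch of the first script above with FS and OFS both set to a comma, run on the case-one data from post #4 (converted to commas by hand, stray spaces removed). The `yday` awk variable, the end-of-file guard on getline and the temporary directory are my own additions. Note that this still pairs rows by position: it assumes yday.csv has exactly one row per column-6 value, in the same order as today.csv.

```shell
dir=$(mktemp -d)
cat > "$dir/today.csv" <<'EOF'
36000807,A,123,78,0,1,0.1,0.2,0.3
36000807,A,123,79,0,1,-0.1,0.2,-0.3
36000807,A,123,78,0,5,0.1,0.2,0.3
36000807,A,123,79,0,5,-0.1,0.2,-0.3
36000807,A,123,78,0,10,0.1,0.2,0.3
36000807,A,123,79,0,10,-0.1,0.2,-0.3
EOF
cat > "$dir/yday.csv" <<'EOF'
36000807,A,123,76,0,1,0.56,0.47,0.39
36000807,A,123,76,0,5,-0.34,0.27,-0.38
36000807,A,123,76,0,10,-0.14,0.25,-0.53
EOF

out=$(awk -v yday="$dir/yday.csv" '
    BEGIN { FS = OFS = "," }
    {
        # default to NA, then overwrite when column 6 changes
        a[7] = a[8] = a[9] = "NA"
        if (tmp != $6) {
            # read the next yday line and split it on FS (the comma)
            if ((getline p < yday) > 0) split(p, a)
        }
        tmp = $6
        print $0, a[7], a[8], a[9]
    }
' "$dir/today.csv")
printf '%s\n' "$out"
rm -r "$dir"
```

The first two output lines are `36000807,A,123,78,0,1,0.1,0.2,0.3,0.56,0.47,0.39` and `36000807,A,123,79,0,1,-0.1,0.2,-0.3,NA,NA,NA`.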
# 7  
Old 08-09-2012
Code:
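 awk 'BEGIN{OFS="\t"}
 # first file (yday.csv): remember each line, keyed by column 6
 NR==FNR{
	a[$6]=$0
	}
 # second file (today.csv): append the matching yday fields
 NR!=FNR{
	if(a[$6])
		{	
			split(a[$6],b);
			if(!c[$6])
				{
					# first today row for this key gets the yday values
					c[$6]=1;
					print $0,b[4],b[7],b[8],b[9]
				}
			else
				{
					# later rows with the same key only get yday column 4
					print $0,b[4],"NA","NA","NA"
				}
		}
	else
		{
			# no yday row for this key at all
			print $0,"NA","NA","NA","NA"
		}
	}' yday.csv today.csv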
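
A self-contained variant of the same two-pass idea, as a sketch under assumptions: comma-separated input (as the PS in post #4 says) and a key built from columns 1, 2, 3, 5 and 6 rather than column 6 alone. The sample data is the second case from post #4, converted to commas by hand with the stray spaces removed; the temporary directory, the `key` function and the array names are illustrative only.

```shell
dir=$(mktemp -d)
cat > "$dir/today.csv" <<'EOF'
36000807,A,123,78,0,1,0.1,0.2,0.3
36000807,A,123,79,0,1,-0.1,0.2,-0.3
36000807,A,123,78,0,5,0.1,0.2,0.3
36000807,A,123,79,0,5,-0.1,0.2,-0.3
36000807,A,123,78,0,10,0.1,0.2,0.3
36000807,A,123,79,0,10,-0.1,0.2,-0.3
EOF
cat > "$dir/yday.csv" <<'EOF'
36000807,A,123,76,0,5,-0.34,0.27,-0.38
36000807,A,123,76,0,10,-0.14,0.25,-0.53
EOF

merged=$(awk -F, -v OFS=, '
    # key made of the common columns 1,2,3,5,6
    function key() { return $1 FS $2 FS $3 FS $5 FS $6 }

    # first file (yday.csv): store column 4 and columns 7-9 per key
    FILENAME == ARGV[1] {
        f4[key()]   = $4
        rest[key()] = $7 OFS $8 OFS $9
        next
    }

    # second file (today.csv)
    {
        k = key()
        if (!(k in f4))        # key absent from yday.csv
            print $0, "NA", "NA", "NA", "NA"
        else if (!seen[k]++)   # first today row for this key
            print $0, f4[k], rest[k]
        else                   # key already used by an earlier row
            print $0, f4[k], "NA", "NA", "NA"
    }
' "$dir/yday.csv" "$dir/today.csv")
printf '%s\n' "$merged"
rm -r "$dir"
```

Because the lookup is by key and not by position, a missing yday.csv row only affects the today.csv rows that share its key. Testing `FILENAME == ARGV[1]` instead of `NR==FNR` also keeps the script correct when yday.csv is empty.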
