Strange situation of file sorting and merging

08-08-2012

I have a strange situation: I need to sort and merge two files based on their common columns.

Previously both files had the same number of records, so I used the approach below. It worked fine until the record count of one of the files was reduced.
Sometimes both files have the same number of records and sometimes they don't. Only the column layout stays unchanged.

Code:

cat /var/tmp/today.csv > /var/tmp/outputfile
for file in /var/tmp/yday.csv
do
        # columns 7-9 of yesterday's file
        cut -d"," -f7,8,9 "$file" > /var/tmp/tmp
        # append them, line by line, to today's data
        paste -d"," /var/tmp/outputfile /var/tmp/tmp >> /var/tmp/final_outputfile
done

This works well for files with the same row count.

The columns we are talking about are 1, 2, 3, 5 and 6, which are common to both files.

Now the files have changed.
For today.csv:

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

For yday.csv:

Code:

36000807	A	123 	76	0	1	0.1	 0.2 	0.3
36000807	A	123 	76	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	76	0	10	-0.1 	 0.2 	-0.3

Now final_outputfile should look like this:

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3	0.1	0.2	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	NA	NA	NA

The first row of yday.csv, or any other row, may be missing entirely.
In that case final_outputfile should look like this:

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3	NA	NA	NA
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	-0.1 	 0.2 	-0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	NA	NA	NA

Does anyone have an idea how to work this out?

manas_ranjan

08-08-2012

I am struggling to try and understand the logic which produces that output from that input. What decides which rows get NA'ed and which don't?

Corona688

08-08-2012

Your script does not, by any means, produce the output you describe, even with files of equal line count. The best I could get, with a generous interpretation of the commands in the script, is:

Code:

36000807    A    123     78    0    1    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    1    -0.1      0.2     -0.3,-0.1      0.2     -0.3
36000807    A    123     78    0    5    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    5    -0.1      0.2     -0.3,-0.1      0.2     -0.3
36000807    A    123     78    0    10    0.1     0.2     0.3,0.1     0.2     0.3
36000807    A    123     79    0    10    -0.1      0.2     -0.3,-0.1      0.2     -0.3

When the files differ in length, the leftover lines get no fields, or only empty ones, pasted onto them. Why should a line with field 4 = 79 in today.csv get NAs when field 4 in yday.csv is 76? Please explain the intended algorithm.
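The positional behavior described above can be reproduced with a small shell sketch. The two input files are made-up stand-ins (three lines against one line), not the thread's data. `paste` pairs line N of each file. Once the shorter file runs out, it contributes only an empty field.

```shell
#!/bin/sh
# Made-up stand-ins: three "today" lines, one "yday" line.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/today_demo"
printf 'x\n' > "$tmpdir/yday_demo"

# paste joins files line by line; a file that has run out acts like empty lines.
result=$(paste -d',' "$tmpdir/today_demo" "$tmpdir/yday_demo")
printf '%s\n' "$result"
# a,x
# b,
# c,

rm -r "$tmpdir"
```

This is why the paste-based script only works when both files have the same row count. Nothing in `paste` looks at the key columns.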

RudiC

08-09-2012

Hi Corona,

Apologies for making a mess.

My requirement is to combine the yday and today CSV files into a final output file. The match is based on columns 1, 2, 3, 5 and 6, which are common to both files.

Sometimes today's file has more records than yday's.

The key is columns 1, 2, 3, 5 and 6 combined. The merge should attach the first row with a given key in the lower-count file (yday) to the first row with the same key in the higher-count file (today). The next row with that same key in today should get NA NA NA. There is no matching yday row left for it, because the yday row with that key was already used by the row above.
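The rule described above can be sketched in awk. This is not code from the thread. `merge_days` is a made-up helper name, and the sketch makes four assumptions:

- the files are comma-separated;
- the key is columns 1, 2, 3, 5 and 6 with surrounding blanks stripped;
- only the first yday row for each key is used;
- the output follows the merged samples below: all today columns, then yday columns 4, 7, 8 and 9.

```shell
#!/bin/sh
# merge_days YDAY TODAY  (hypothetical helper, comma-separated input assumed)
# Appends yday columns 4,7,8,9 to each today row with the same key
# (columns 1,2,3,5,6). The first today row with a key gets the yday values,
# later rows with that key get column 4 plus NA, unmatched keys get NA x4.
merge_days() {
    awk -F',' -v OFS=',' '
    # Key = columns 1,2,3,5,6 with leading/trailing blanks stripped.
    function rowkey(   n, i, v, k) {
        n = split("1 2 3 5 6", cols, " ")
        k = ""
        for (i = 1; i <= n; i++) {
            v = $(cols[i])
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            k = k SUBSEP v
        }
        return k
    }
    # First file (yday): keep the first row seen for each key.
    FILENAME == ARGV[1] { k = rowkey(); if (!(k in yday)) yday[k] = $0; next }
    # Second file (today): append the matching yday columns, or NA.
    {
        k = rowkey()
        if (!(k in yday)) { print $0, "NA", "NA", "NA", "NA"; next }
        split(yday[k], y, ",")
        if (!(k in used)) { used[k] = 1; print $0, y[4], y[7], y[8], y[9] }
        else              { print $0, y[4], "NA", "NA", "NA" }
    }' "$1" "$2"
}

# Demo on case two from this thread, rewritten with commas.
tmpdir=$(mktemp -d)
cat > "$tmpdir/today.csv" <<'EOF'
36000807,A,123,78,0,1,0.1,0.2,0.3
36000807,A,123,79,0,1,-0.1,0.2,-0.3
36000807,A,123,78,0,5,0.1,0.2,0.3
36000807,A,123,79,0,5,-0.1,0.2,-0.3
36000807,A,123,78,0,10,0.1,0.2,0.3
36000807,A,123,79,0,10,-0.1,0.2,-0.3
EOF
cat > "$tmpdir/yday.csv" <<'EOF'
36000807,A,123,76,0,5,-0.34,0.27,-0.38
36000807,A,123,76,0,10,-0.14,0.25,-0.53
EOF
merged=$(merge_days "$tmpdir/yday.csv" "$tmpdir/today.csv")
printf '%s\n' "$merged"
rm -r "$tmpdir"
```

Each today row is looked up by key rather than by position. Missing or extra yday rows therefore do not shift the alignment. The `FILENAME == ARGV[1]` test is used instead of `NR == FNR` so that an empty yday file is still handled correctly.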

Please refer to the input files below for the different cases.
PS: The field separator in all files is actually a comma, not a TAB. My mistake.

Case one
today.csv

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv

Code:

36000807	A	123 	76	0	1	0.56	 0.47 	0.39
36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

Merged file:

Code:

f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	76	0.56	 0.47 	0.39
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	76	-0.14 	 0.25 	-0.53
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	NA	NA	NA
--------------------------today.csv------------------------------------------****-----------yday.csv------------

Case two
today.csv

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3
36000807	A	123 	78	0	10	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv

Code:

36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

Final merged file:

Code:

f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	NA	NA	NA	NA
36000807	A	123 	79	0	1	-0.1 	 0.2 	-0.3	NA	NA	NA	NA
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	5	-0.1 	 0.2 	-0.3	76	NA	NA	NA
36000807	A	123 	78	0	10	0.1	 0.2 	0.3	76	-0.14 	 0.25 	-0.53
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	NA	NA	NA
--------------------------today.csv-----------------------------------------***-----------yday.csv------------

Case three
today.csv

Code:

36000807	A	123 	78	0	1	0.1	 0.2 	0.3
36000807	A	123 	78	0	5	0.1	 0.2 	0.3
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3

yday.csv

Code:

36000807	A	123 	76	0	1	0.56	 0.47 	0.39
36000807	A	123 	76	0	5	-0.34 	 0.27 	-0.38
36000807	A	123 	76	0	10	-0.14 	 0.25 	-0.53

Final merged file:

Code:

f1.1		f1.2	f1.3	f1.4	f1.5	f1.6	f1.7	f1.8	f1.9	f2.4	f2.7	f2.8	f2.9
36000807	A	123 	78	0	1	0.1	 0.2 	0.3	76	0.56	 0.47 	0.39
36000807	A	123 	78	0	5	0.1	 0.2 	0.3	76	-0.34 	 0.27 	-0.38
36000807	A	123 	79	0	10	-0.1 	 0.2 	-0.3	76	-0.14 	 0.25 	-0.53
--------------------------today.csv----------------------------------------****-----------yday.csv------------

f1 = today.csv, f2 = yday.csv
f1.X = column X of today.csv
f2.X = column X of yday.csv

---------- Post updated at 06:40 AM ---------- Previous update was at 06:24 AM ----------

Hi Corona,

I have tried my best to clear up your confusion :-). Do let me know if I have left something out.
Cheers!!!

Last edited by manas_ranjan; 08-09-2012 at 08:37 AM..

manas_ranjan

08-09-2012

Do you mean that the sixth column is common to both files and the merge has to be done on the basis of the sixth column?
Is the sixth column of file1 sorted, as shown in the sample file?
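Whether the files are ordered on column 6 matters for approaches that walk through both files in step. One way to check is `sort -c`, which only verifies the order and reports the result in its exit status. The sketch below assumes comma-separated files. The sample rows are adapted from today.csv in this thread.

```shell
#!/bin/sh
# Sample rows adapted from today.csv in this thread, with commas.
tmpdir=$(mktemp -d)
printf '%s\n' \
    '36000807,A,123,78,0,1,0.1,0.2,0.3' \
    '36000807,A,123,79,0,1,-0.1,0.2,-0.3' \
    '36000807,A,123,78,0,5,0.1,0.2,0.3' \
    '36000807,A,123,78,0,10,0.1,0.2,0.3' > "$tmpdir/today.csv"

# -c: check only (exit status 0 = ordered); -k6,6n: compare column 6 numerically.
if LC_ALL=C sort -c -t',' -k6,6n "$tmpdir/today.csv" 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi
echo "column 6 sorted: $sorted"
rm -r "$tmpdir"
```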

raj_saini20

08-09-2012

Would this come close to what you expect to see?

Code:

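awk 'BEGIN {OFS="\t"}
    {
     # default: no yday values for this row
     a[7]=a[8]=a[9]="NA"
     # new column-6 value: read the next yday line, if there is one left
     if (tmp!=$6 && (getline p < "yday") > 0)
         split(p,a)
     tmp=$6
    }
    {print $0, a[7], a[8], a[9]}
    ' today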

Code:

36000807    A    123     78    0    1    0.1     0.2     0.3    0.56    0.47    0.39
36000807    A    123     79    0    1    -0.1    0.2     -0.3    NA    NA    NA
36000807    A    123     78    0    5    0.1     0.2     0.3    -0.34    0.27    -0.38
36000807    A    123     79    0    5    -0.1    0.2     -0.3    NA    NA    NA
36000807    A    123     78    0    10   0.1     0.2     0.3    -0.14    0.25    -0.53
36000807    A    123     79    0    10   -0.1    0.2     -0.3    NA    NA    NA
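When a second file is read with `getline`, its return value is worth checking. At end of file, `getline var < file` returns 0 and leaves `var` unchanged, so code that ignores the return value keeps reusing the last line it read. The small sketch below uses made-up data (the file name `short` is hypothetical) and falls back to NA once the second file runs out.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# The second file has fewer lines than the main input.
printf 'first\n' > "$tmpdir/short"

out=$(printf 'x\ny\n' | awk -v f="$tmpdir/short" '
{
    # getline returns 1 on success, 0 at end of file, -1 on error.
    if ((getline p < f) > 0)
        print $0, p
    else
        print $0, "NA"
}')
printf '%s\n' "$out"
# x first
# y NA
rm -r "$tmpdir"
```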

---------- Post updated at 02:06 PM ---------- Previous update was at 01:54 PM ----------

This will meet your requirements on line 1:

Code:

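awk 'BEGIN {OFS="\t"}
    {
     # default: no yday values for this row
     a[4]=a[7]=a[8]=a[9]="NA"
     # never read yday on line 1; afterwards read the next yday line
     # on each new column-6 value, if there is one left
     if (tmp!=$6 && NR>1 && (getline p < "yday") > 0)
         split(p,a)
     tmp=$6
    }
    {print $0, a[4], a[7], a[8], a[9]}
    ' today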

Set OFS to "," if you need a comma as the output delimiter.
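Watch out for one detail when switching OFS. awk only rebuilds `$0` with the new OFS after a field has been assigned. Until then, `print $0, ...` keeps the record's original separators and uses OFS only between the printed items. The small sketch below uses a made-up tab-separated line, and assigning `$1 = $1` forces the rebuild.

```shell
#!/bin/sh
# Made-up tab-separated record; the goal is comma-separated output.
line=$(printf '1\t2\t3')

# Without a rebuild, $0 keeps its tabs; OFS only joins the print arguments.
mixed=$(printf '%s\n' "$line" | awk -v OFS=',' '{ print $0, "NA" }')

# Assigning a field ($1 = $1) makes awk rebuild $0 using OFS.
clean=$(printf '%s\n' "$line" | awk -v OFS=',' '{ $1 = $1; print $0, "NA" }')

printf '%s\n' "$mixed" "$clean"
```

Here `mixed` still contains tabs followed by ",NA", while `clean` is "1,2,3,NA".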

RudiC

08-09-2012

Code:

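 awk 'BEGIN{OFS="\t"}
 # first file (yday.csv): remember each line by its column-6 value
 NR==FNR{
	a[$6]=$0
	next
	}
 # second file (today.csv)
 {
	if($6 in a)
		{
			split(a[$6],b)
			if(!c[$6])
				{
					# first today row with this key: take yday columns 4,7,8,9
					c[$6]=1
					print $0,b[4],b[7],b[8],b[9]
				}
			else
				{
					# key already used: keep column 4, NA for the rest
					print $0,b[4],"NA","NA","NA"
				}
		}
	else
		{
			# no yday row with this key at all
			print $0,"NA","NA","NA","NA"
		}
	}' yday.csv today.csv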

raj_saini20
