UNIX compare, sort lines and append difference


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting UNIX compare, sort lines and append difference
# 1  
Old 08-13-2013
UNIX compare, sort lines and append difference

To make it easier, i gave following example. It is not homework or classwork. Instead, i have a huge csv file dump from tsql with 15 columns and around 300 rows. I was able to extract content that needs to be really converted. Here is the extract:



Code:
ES FP,B1ES FP,70000,I,SL22,SL22 (70000)
ES FP,B2ES FP,80000,I,XX22,XX22 (80000)
VIL SJ,B1VIL SJ,20000,I,SL22,SL22 (20000)
VIL SJ,B2VIL SJ,20000,I,XX22,XX22 (20000)
VIL SJ,S3VIL SJ,25040222,N,SL22,SL22 (25040222)
VIL SJ,S4VIL SJ,25040222,N,XX22,XX22 (25040222)
MING NO,B1MING NO,240000,I,SL22,SL22 (240000)
MING NO,B2MING NO,240000,I,XX22,XX22 (240000)
BCHN SE,B1BCHN SE,20000,I,SL22,SL22 (20000)
BCHN SE,B2BCHN SE,20000,I,XX22,XX22 (20000)

desire result should be:

Code:
ES FP,I,SL22 (70000)|XX22 (80000)
VIL SJ,I,SL22 (20000)|,XX22 (20000)
VIL SJ,N,SL22 (25040222)|XX22 (25040222)
MING NO,I,SL22 (240000)|XX22 (240000)
BCHN SE,I,SL22 (20000)|XX22 (20000)

based on sorting of field f4.

to make it simple, I created two files using grep for I and N but don't know how to append the third column if f1,f2 are same using comma as delim.

=========================

UNIX compare, sort lines and append difference
Hi,

I have a file that needs to be converted:

content is:

Code:
a, b, 4
a ,b, 5
x, y, 1
a, b, 1
x, y, 3

how can i get:

Code:
a, b, 1|4|5

x,y 1|3


Moderator's Comments:
Mod Comment Use code tags please, check PM.


---------- Post updated at 10:37 AM ---------- Previous update was at 10:27 AM ----------

Code:
ES FP,B1ES FP,70000,I,SL22,SL22 (70000)
ES FP,B2ES FP,80000,I,XX22,XX22 (80000)
VIL SJ,B1VIL SJ,20000,I,SL22,SL22 (20000)
VIL SJ,B2VIL SJ,20000,I,XX22,XX22 (20000)
VIL SJ,S3VIL SJ,25040222,N,SL22,SL22 (25040222)
VIL SJ,S4VIL SJ,25040222,N,XX22,XX22 (25040222)
MING NO,B1MING NO,240000,I,SL22,SL22 (240000)
MING NO,B2MING NO,240000,I,XX22,XX22 (240000)
BCHN SE,B1BCHN SE,20000,I,SL22,SL22 (20000)
BCHN SE,B2BCHN SE,20000,I,XX22,XX22 (20000)


Last edited by zaxxon; 08-13-2013 at 11:29 AM.. Reason: code tags
# 2  
Old 08-13-2013
Code:
cat file

ES FP,B1ES FP,70000,I,SL22,SL22 (70000)
ES FP,B2ES FP,80000,I,XX22,XX22 (80000)
VIL SJ,B1VIL SJ,20000,I,SL22,SL22 (20000)
VIL SJ,B2VIL SJ,20000,I,XX22,XX22 (20000)
VIL SJ,S3VIL SJ,25040222,N,SL22,SL22 (25040222)
VIL SJ,S4VIL SJ,25040222,N,XX22,XX22 (25040222)
MING NO,B1MING NO,240000,I,SL22,SL22 (240000)
MING NO,B2MING NO,240000,I,XX22,XX22 (240000)
BCHN SE,B1BCHN SE,20000,I,SL22,SL22 (20000)
BCHN SE,B2BCHN SE,20000,I,XX22,XX22 (20000)

awk -F, '{$1==$1 && $2==$2;a=$1;b=$4;c=$6;getline;d=$NF;print a","b","c"|"d}' file

ES FP,I,SL22 (70000)|XX22 (80000)
VIL SJ,I,SL22 (20000)|XX22 (20000)
VIL SJ,N,SL22 (25040222)|XX22 (25040222)
MING NO,I,SL22 (240000)|XX22 (240000)
BCHN SE,I,SL22 (20000)|XX22 (20000)

This User Gave Thanks to in2nix4life For This Post:
# 3  
Old 08-13-2013
That a lot, how can i even sort on f1 again and append f2.

i.e

from

Code:
ES FP,I,SL22 (70000)|XX22 (80000)
VIL SJ,I,SL22 (20000)|XX22 (20000)
VIL SJ,N,SL22 (25040222)|XX22 (25040222)
MING NO,I,SL22 (240000)|XX22 (240000)
BCHN SE,I,SL22 (20000)|XX22 (20000)

to
Code:
ES FP,I,SL22 (70000)|XX22 (80000)
VIL SJ,I,SL22 (20000)|XX22 (20000),N,SL22 (25040222)|XX22 (25040222)
MING NO,I,SL22 (240000)|XX22 (240000)
BCHN SE,I,SL22 (20000)|XX22 (20000)


Last edited by nike27; 08-13-2013 at 01:36 PM.. Reason: typo.
# 4  
Old 08-13-2013
Is it always two consecutive lines, or, like in your a, b, 1|4|5 example, can the lines be spread over the file?
# 5  
Old 08-13-2013
yes, it should be fine. Basically, i am looking for three columns that can be feed back to process the file.

so from following:

Code:
ES FP,SL22 (70000)|XX22 (80000)
VIL SJ,SL22 (20000)|XX22 (20000)
VIL SJ,SL22 (25040222)|XX22 (25040222)
MING NO,SL22 (240000)|XX22 (240000)
BCHN SE,SL22 (20000)|XX22 (20000)
BCHN SE,SL22 (100000)|XX22 (100000)
DLAR LN,SL22 (50000)|XX22 (50000)

it needs to be converted to:

Code:
ES FP,SL22 (70000)|XX22 (80000)
VIL SJ,SL22 (20000)|XX22 (20000),SL22 (25040222)|XX22 (25040222)
MING NO,SL22 (240000)|XX22 (240000)
BCHN SE,SL22 (20000)|XX22 (20000),SL22 (100000)|XX22 (100000)
DLAR LN,SL22 (50000)|XX22 (50000)

---------- Post updated at 02:32 PM ---------- Previous update was at 12:50 PM ----------

lines can be anywhere in the file and it can be more than more two lines.

---------- Post updated at 02:36 PM ---------- Previous update was at 02:32 PM ----------

also i sorted so it is always consecutive.

---------- Post updated at 04:08 PM ---------- Previous update was at 02:36 PM ----------

Any thoughts , thanks in advance.
# 6  
Old 08-14-2013
Quote:
Originally Posted by in2nix4life
Code:
cat file

ES FP,B1ES FP,70000,I,SL22,SL22 (70000)
ES FP,B2ES FP,80000,I,XX22,XX22 (80000)
VIL SJ,B1VIL SJ,20000,I,SL22,SL22 (20000)
VIL SJ,B2VIL SJ,20000,I,XX22,XX22 (20000)
VIL SJ,S3VIL SJ,25040222,N,SL22,SL22 (25040222)
VIL SJ,S4VIL SJ,25040222,N,XX22,XX22 (25040222)
MING NO,B1MING NO,240000,I,SL22,SL22 (240000)
MING NO,B2MING NO,240000,I,XX22,XX22 (240000)
BCHN SE,B1BCHN SE,20000,I,SL22,SL22 (20000)
BCHN SE,B2BCHN SE,20000,I,XX22,XX22 (20000)

awk -F, '{$1==$1 && $2==$2;a=$1;b=$4;c=$6;getline;d=$NF;print a","b","c"|"d}' file

$2==$2 means to compare the second column to next second column right? if so why you have used
Code:
$2==$2

as they are not equal when $1==$1 right?
This User Gave Thanks to EAGL€ For This Post:
# 7  
Old 08-14-2013
Actually I do not understand why $1==$1 && $2==$2; is there. IMO it has no function and could just be left out.

--
This should be equivalent to the suggestion in post #2:
Code:
awk '{p=$NF; getline; print $1, $4, p "|" $NF}' FS=, OFS=, file


Last edited by Scrutinizer; 08-14-2013 at 10:28 AM..
These 2 Users Gave Thanks to Scrutinizer For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Difference of Sort -n -k2 -k3 & Sort -n -k2,3

Hi, Could anyone kindly show me a link or explain the difference between sort -n -k2 -k3 & sort -n -k2,3 Also, if I like to remove the row with repetition at both $2 and $3, Can I safely use sort -u -k2 -k3 Example; 100 20 30 100 20 30 So, both $2 and $3 are same and I... (2 Replies)
Discussion started by: Indra2011
2 Replies

2. Shell Programming and Scripting

Compare file1 for matching line in file2 and print the difference in matching lines

Hello, I have two files file 1 and file 2 each having result of a query on certain database tables and need to compare for Col1 in file1 with Col3 in file2, compare Col2 with Col4 and output the value of Col1 from File1 which is a) not present in Col3 of File2 b) value of Col2 is different from... (2 Replies)
Discussion started by: RasB15
2 Replies

3. Shell Programming and Scripting

UNIX compare, sort lines and append difference

To make it easier, i gave following example. It is not homework or classwork. Instead, i have a huge csv file dump from tsql with 15 columns and around 300 rows. I was able to extract content that needs to be really converted. Here is the extract: ES FP,B1ES FP,70000,I,SL22,SL22 (70000) ES... (0 Replies)
Discussion started by: nike27
0 Replies

4. Shell Programming and Scripting

UNIX compare, sort lines and append difference

Hi, I have a file that needs to be converted: content is: a, b, 4 a ,b, 5 x, y, 1 a, b, 1 x, y, 3 how can i get: a, b, 1|4|5 x,y 1|3 (1 Reply)
Discussion started by: nike27
1 Replies

5. UNIX for Dummies Questions & Answers

append following lines to 1st line, every 3 lines

I have output like this: USER_ID 12/31/69 19:00:00 12/31/69 19:00:00 USER_ID 12/31/69 19:00:00 12/31/69 19:00:00 USER_ID 12/31/69 19:00:00 12/31/69 19:00:00 USER_ID 12/31/69 19:00:00 12/31/69 19:00:00 ... where USER_ID is a unique user login followed by their login timestamp and... (6 Replies)
Discussion started by: MaindotC
6 Replies

6. UNIX for Advanced & Expert Users

Script to sort the files and append the extension .sort to the sorted version of the file

Hello all - I am to this forum and fairly new in learning unix and finding some difficulty in preparing a small shell script. I am trying to make script to sort all the files given by user as input (either the exact full name of the file or say the files matching the criteria like all files... (3 Replies)
Discussion started by: pankaj80
3 Replies

7. Shell Programming and Scripting

difference in unix vs. linux sort

Hi, I am using some codes that have been ported from unix to linux, and now the sorting no longer results in the desired ordering. I'm hoping to find a way to mimic the unix sort command in linux. The input file is structured the following: $> cat file.txt... (6 Replies)
Discussion started by: aj.schaeffer
6 Replies

8. Shell Programming and Scripting

Compare two files and print the two lines with difference

I have two files like this: #FILE 1 ABCD 4322 26485 JMTJ 5311 97248 XMPJ 4321 58978 #FILE 2 ABCD 4321 26485 JMTJ 5311 97248 XMPJ 4321 68978 What to do: Compare the two files and find those lines that doesn't match. And have a new file like this: #FILE 3 "from file 1" ABCD 4322 26485... (11 Replies)
Discussion started by: kingpeejay
11 Replies

9. Shell Programming and Scripting

Compare & append

I need to write a Shell Script to compare two files & display the result. If the two files are different append them and store them in a new file. How do i proceed...can someone give me a coding ? (3 Replies)
Discussion started by: rohits1991
3 Replies

10. UNIX for Dummies Questions & Answers

How to Replace,Sort,and Append Character one script

Hi all i am very new to shell scripting,hope u guys can help i need to replace,sort and append character for the file that look like this: 1007032811010001000100000001X700026930409 1007032811010001000200000002X700026930409 1007032711020001000300000003X700026930409... (2 Replies)
Discussion started by: ashikin_8119
2 Replies
Login or Register to Ask a Question