Thanks for the reply, but it's not working. It's not showing me any error, but it's not giving me the correct result either. It's just displaying whatever is in the file.
This is a much harder problem than it appears at first glance.
The sort solution proposed by danmero will print just one line for each set of lines with identical values in the first two fields, but which line is printed depends on the sort order of the remaining fields. The order ginkrf requested was that, for each set of lines with identical values in the first two fields, the last line in the (unsorted) file be printed.
The awk solution proposed by mjf will print the 1st line of each matching set instead of the last line of each matching set. (And, if the 1st 2 fields when concatenated yield the same key even though the fields are different, some desired output lines may be skipped. For example if $1 is "ab" and $2 is "c" in one record and "a" and "bc" in another, they will both have key "abc".)
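To see the collision concretely, here is a small demonstration (assuming whitespace-separated fields, as in mjf's original one-liner; demo.txt is just a scratch file name). The comma in an awk array subscript joins the fields with SUBSEP, so "ab"/"c" and "a"/"bc" stay distinct keys:

```shell
# Two records whose first two fields differ, but concatenate to "abc":
printf 'ab c 1\na bc 2\n' > demo.txt

# Concatenated key: both lines map to the key "abc", so the second
# line is wrongly treated as a duplicate and dropped.
awk '!x[$1 $2]++' demo.txt      # prints only: ab c 1

# SUBSEP-joined key: the keys stay distinct, so both lines survive.
awk '!x[$1,$2]++' demo.txt      # prints both lines
```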
Since ginkrf didn't say whether the order of the lines in the output has to match the order in which they appeared in the input, I won't try to guess at an efficient way to do what has been requested. If the order is important, the input file could be reversed, fed through mjf's awk script (with !x[$1 $2]++ changed to !x[$1,$2]++), and the output then reversed again. Depending on the output constraints, this might or might not be grossly inefficient.
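A sketch of that reverse/filter/reverse pipeline (assuming GNU tac is available; on BSD-ish systems tail -r could be substituted, and the file name infile is a placeholder):

```shell
# Sample input: two lines share the key (a, b); the later one should win.
printf 'a b 1\nc d 2\na b 3\n' > infile

# Keep the LAST line for each (field1, field2) pair while preserving
# the relative input order of the surviving lines.
tac infile | awk '!x[$1,$2]++' | tac
# output:
#   c d 2
#   a b 3
```

Reversing first means the awk filter's "keep the first occurrence" behavior keeps what was originally the last occurrence; the second tac restores the original order.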
If the output order is not important, it could be done easily with an awk script, but that could require almost 400 MB of virtual address space to process 20 million 20-byte records.
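For the order-doesn't-matter case, a minimal one-pass sketch (the memory estimate above comes from awk holding one saved line per unique key in its array):

```shell
# Overwrite the stored line on every occurrence of a key, so the last
# occurrence wins; END prints the survivors in no particular order.
printf 'a b 1\nc d 2\na b 3\n' |
awk '{ x[$1,$2] = $0 } END { for (k in x) print x[k] }'
```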
With a better description of the input (is there anything in a record, other than its position in the input, that can be used to determine which of several lines with matching first two fields should be printed?) and of the output constraints, we might be able to provide a better solution. Are there ever more than two lines with the same first two fields? If yes, out of the 20 million input records, how many output records do you expect to be produced? Are there likely to be lots of lines whose first two fields occur only once? What are the file sizes (input and output) in bytes (instead of records)? What is the longest input line in bytes?
What OS and hardware are you using? How much memory? How much swap space?
Last edited by Don Cragun; 10-20-2013 at 10:31 AM..
Reason: Fix explanation of the possible failure of mjf's awk proposal.
Hi.
We once needed code that would run on a number of different systems yet produce consistent results. We ran into the situation that the uniq utility was not consistent among the systems, so we introduced an option:
By substituting this idea for the system version of uniq, we were able to produce consistent results.
I think this problem can be approached with danmero's sort idea, but with the stable option set and a "final filter" that eliminates the duplicates. Because the file is already sorted, no additional storage is needed: in the final filter, if the key fields of the incoming record differ from those of the saved record, write out the saved line and save the new line; if the fields are the same, just save the new instance of the line. Our code was in Perl, but awk could be used just as easily.
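A minimal sketch of that approach in sort and awk (assuming GNU sort, whose -s flag requests a stable sort, and assuming the last line of each equal-key group is the one to keep; infile is a placeholder name):

```shell
# Sample input: two lines share the key (a, b); the later one should win.
printf 'a b 1\nc d 2\na b 3\n' > infile

# A stable sort on the first two fields keeps equal-key lines in their
# input order; the awk "final filter" prints only the last line of each
# equal-key group.
sort -s -k1,1 -k2,2 infile |
awk '{
    key = $1 SUBSEP $2
    if (NR > 1 && key != prev) print saved   # key changed: emit saved line
    saved = $0; prev = key                   # remember current line and key
}
END { if (NR) print saved }'                 # emit the last group
```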