I need to delete all occurences of the repeated lines from a file and retain only the lines that is not repeated elsewhere in the file. As seen below the first two lines are same except that for the string "From BaseLine" and "From SMS".I shouldn't consider the string "From SMS" and "From BaseLine" for checking the repeated lines. I want to retain only the third line.
From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27
From SMS - 0T001 000 999999999 00101 20080411000000T1023.27
From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019
My output should be the third line alone.
These file size would range from 100 MB to 900MB. The performance factor should also be considered. Can you please help me out?
The following expression/action pair is execute first:
When the string in the second field is seen for the first time the element/value of the associative array u will be 0 (false for AWK), because of the implicit variable initialization. In idiomatic AWK it could be written as:
Which actually means:
So, when NOT array[key]++ returns true (0 -> false, !0 -> true), do the following: build another associative array r (r for record, because it holds the entire record), $2 as key, $0 as element/value. So we store one copy (the first one) of each unique $2 while we're counting the unique values of $2 in the expression part - u[$2] ++.
After all the input has been read the END block is executed.
For every key (k) in the r array verify: if the element/value in the u array with the same key (k) equals 1 (has only one entry in the entire input), print the corresponding element/value of the r (record) array.
Hi,
I have an input file as shown below:
20140102;13:30;FR-AUD-LIBOR-1W;2.495
20140103;13:30;FR-AUD-LIBOR-1W;2.475
20140106;13:30;FR-AUD-LIBOR-1W;2.495
20140107;13:30;FR-AUD-LIBOR-1W;2.475
20140108;13:30;FR-AUD-LIBOR-1W;2.475
20140109;13:30;FR-AUD-LIBOR-1W;2.475... (2 Replies)
Hi, I already succeed moving a new row to another table if the field from new row doesn't have the first word that I categorized (like: IRC blablabla, PTM blablabla, ADM blablabla, BS blablabla).
But it can't delete the old row. Please help me with the script.
my php script:
INSERT INTO... (2 Replies)
Hi, I want to move a new row to another table if the field from new row doesn't have the first word that I categorized (like: IRC blablabla, PTM blablabla, ADM blablabla, BS blablabla).
I already use this script but doesn't work as I expected.
CHECK_KEYWORD="$( mysql -uroot -p123456 smsd -N... (7 Replies)
Hello,
I have a large database in which name homonyms are arranged in a row. Since the database is large and generated by hand, very often dupes creep in. I want to remove the dupes either using an awk or perl script.
An input is given below
The expected output is given below:
As can be... (2 Replies)
Hi all
I have a big file like this in rows and columns from 2 column onwards the next column is desciption of previous column means 3rd columns is description of 2 columns and 5 column is description of 4 column.
All cloumns are separated by comma
... (1 Reply)
Hello,
I'm have a file of xy data with over 1000 records. I want to delete both x and y values for any record that has the same x value as any previous record thus removing the duplicates from my file.
Can anyone help?
Thanks,
Dan (3 Replies)
Hi,
How to identify duplicate columns in a row?
Input data: may have 30 columns
9211480750 LK 120070417 920091030
9211480893 AZ 120070607
9205323621 O7 120090914 120090914 1420090914 2020090914 2020090914
9211479568 AZ 120070327 320090730
9211479571 MM 120070326
9211480892 MM 120070324... (3 Replies)
I'm trying to remove lines of data that contain duplicate data in a specific column.
For example.
apple 12345
apple 54321
apple 14234
orange 55656
orange 88989
orange 99898
I only want to see
apple 12345
orange 55656
How would i go about doing this? (5 Replies)
I have a pipe delimited file. Key is field 2, date is field 5 (as example, my real file is more complicated of course, but the KEY and DATE are accurate)
There can be duplicate rows for a key with different dates.
I need to keep only rows with latest date in this case.
Example data: ... (4 Replies)