12-24-2010
OK, two complications: A. the key is not the whole line, and B. duplicates across files are bad. Reporting a duplicate requires a definition of the original, especially for non-key data.
- If lines have identical keys but different payloads (the non-key fields), will file name order, then order within the file, pick a winner?
- We need to survey all files for duplicate keys, then extract the unique records and the winners to load, and the losers to report. Think of them as two important products, not picking favorites. While most days there may be no duplicates, if one day there are tons, you still want it to blast through.
- There are two approaches to duplicate filtering. You can save every key in an associative array (a magic box that recalls by value, but may not be robust in speed and stability at huge volume), or you can sort in key, priority order (more traditional and quite robust if you have the disk space). With the sort, store just the last key, process the first record of every key and log the others. Worked great on tape in 1960 with 16K of RAM! :-)
- Tagging the duplicates by original file means adding the file name to every record, possible but a bit of a luxury if not needed.
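To make the two approaches concrete, here is a minimal Python sketch, not a definitive implementation. It assumes things the thread does not state: the key is the first whitespace-separated field, priority is file name order and then line order within the file, and the input is an in-memory dict of file name to lines. The names `key_of`, `tag`, `dedupe_with_dict` and `dedupe_with_sort` are made up for illustration.

```python
def key_of(line):
    # Assumption: the key is the first whitespace-separated field.
    return line.split()[0]


def tag(files):
    """Yield (key, file_name, line_no, line) for every record.

    `files` is a hypothetical dict of file name -> list of lines.
    Files are visited in name order, lines in file order.
    """
    for name in sorted(files):
        for line_no, line in enumerate(files[name], start=1):
            yield key_of(line), name, line_no, line


def dedupe_with_dict(files):
    """Associative-array approach: remember every key seen so far."""
    seen = set()
    winners, losers = [], []
    for rec in tag(files):
        if rec[0] in seen:
            losers.append(rec)      # key already loaded: report it
        else:
            seen.add(rec[0])
            winners.append(rec)     # first occurrence wins
    return winners, losers


def dedupe_with_sort(files):
    """Sort approach: order by key, then priority; keep only the last key."""
    winners, losers = [], []
    last_key = None
    # Tuples sort by key, then file name, then line number.
    for rec in sorted(tag(files)):
        if rec[0] != last_key:
            winners.append(rec)     # first record of each key wins
            last_key = rec[0]
        else:
            losers.append(rec)      # later records of the same key are logged
    return winners, losers
```

Both functions return the same winners and losers, just in a different order. Note that `sorted()` here still holds everything in memory; at real volume the sort step would be an external, disk-based sort, and only the comparison against the last key needs to stay in memory. Because each record carries its file name, the losers list can be reported by original file.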