05-31-2011
Once sort can ensure the newest by key is first, and a sort -u following on kye will keep the first.
True, my script bit counts full duplicates uniquely. If you want the count or difference, you can compare the input of 'sort -u' to the output, perhaps using 'comm'. Since 'comm' expects unique records, if there are full duplicates, I run the sorted lists through 'uniq -c' to take full duplicates to a count and one record.
Awk is limited by its heap memory, but sort/comm/uniq/wc and pipes are robust with huge sets.
8 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
I have the following code.
printf "Test Message Report" > report.txt
while read line
do
msgid=$(printf "%n" "$line" | cut -c1-6000| sed -e 's///g' -e 's|.*ex:Msg\(.*\)ex:Msg.*|\1|')
putdate=$(printf "%n" "$line" | cut -c1-6000| sed -e 's///g' -e 's|.*PutDate\(.*\)PutTime.*|\1|')... (9 Replies)
Discussion started by: gugs
9 Replies
2. Shell Programming and Scripting
I have a large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in... (6 Replies)
Discussion started by: metronomadic
6 Replies
3. Shell Programming and Scripting
Hi,
I need some help creating a tidy shell program with awk or other language that will split large length files efficiently.
Here is an example dump:
<A001_MAIL.DAT>
0001 Ronald McDonald 01 H81
0002 Elmo St. Elmo 02 H82
0003 Cookie Monster 01 H81
0004 Oscar ... (16 Replies)
Discussion started by: mkastin
16 Replies
4. Shell Programming and Scripting
Hi All,
I have some 80,000 files in a directory which I need to rename. Below is the command which I am currently running and it seems, it is taking fore ever to run this command. This command seems too slow. Is there any way to speed up the command. I have have GNU Parallel installed on my... (6 Replies)
Discussion started by: shoaibjameel123
6 Replies
5. Programming
Hi there,
I had run into some fortran code to modify. Obviously, it was written without thinking of high performance computing and not parallelized... Now I would like to make the code "on track" and parallel. After a whole afternoon thinking, I still cannot find where to start. Can any one... (3 Replies)
Discussion started by: P_E_M_Lee
3 Replies
6. Shell Programming and Scripting
Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk.
I've got a dataset folder with 10 files (16 millions of row each one - 600MB), and I've got a sorted file with all keys inside.
For example:
a sample_1 200
a.b sample_2 10
a sample_3 10
a sample_1 10
a... (4 Replies)
Discussion started by: camor
4 Replies
7. Shell Programming and Scripting
VARIABLE="jhovan 5259 5241 0 20:11 ? 00:00:00 /proc/self/exe --type=gpu-process --channel=5182.0.1597089149 --supports-dual-gpus=false --gpu-driver-bug-workarounds=2,45,57 --disable-accelerated-video-decode --gpu-vendor-id=0x80ee --gpu-device-id=0xbeef --gpu-driver-vendor... (3 Replies)
Discussion started by: SkySmart
3 Replies
8. Shell Programming and Scripting
I have nginx web server logs with all requests that were made and I'm filtering them by date and time.
Each line has the following structure:
127.0.0.1 - xyz.com GET 123.ts HTTP/1.1 (200) 0.000 s 3182 CoreMedia/1.0.0.15F79 (iPhone; U; CPU OS 11_4 like Mac OS X; pt_br)
These text files are... (21 Replies)
Discussion started by: brenoasrm
21 Replies