Remove dupes in a large file Post: 303024652

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

remove a large number of user from oracle

Hi, on Solaris and Oracle 10g2, I have a number of users created in Oracle. If I have a list of the usernames, is it possible to remove the users quickly? I want to keep the users' access to the system but not to Oracle. Something like a shell script, maybe? I am trying to...

2. Shell Programming and Scripting

Sed or awk script to remove text / or perform calculations from large CSV files

I have large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in...

3. Shell Programming and Scripting

remove a specific line in a LARGE file

Hi guys, I have a really big file, and I want to remove a specific line. sed -i '5d' file This doesn't really work, it takes a lot of time... The whole script is supposed to remove every word containing less than 5 characters and currently looks like this: #!/bin/bash line="1"...

4. Shell Programming and Scripting

Remove Duplicate Filenames in 2 very large directories

Hello Gurus, O/S RHEL4 I have a requirement to compare two linux based directories for duplicate filenames and remove them. These directories are close to 2 TB each. I have tried running a: Prompt>diff -r data1/ data2/ I have tried this as well: jason@jason-desktop:~$ cat script.sh ...

5. Shell Programming and Scripting

How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format: marker1 marker2 marker3 marker4 position1 position2 position3 position4 genotype1 genotype2 genotype3 genotype4 with marker being a name, position a numeric...

6. UNIX for Dummies Questions & Answers

Filtering F-Dupes

Is there an easy way to tell FDupes what filetypes to look at or ignore?

7. Shell Programming and Scripting

Removing Dupes from huge file- awk/perl/uniq

Hi, I have the following command in place: nawk -F, '!a[$1,$2,$3]++' file > file.uniq It has been working perfectly as per requirements, removing duplicates by considering only the first 3 fields. Recently it has started giving the error below: bash-3.2$ nawk -F, '!a[$1,$2,$3]++'...

8. Shell Programming and Scripting

remove large portion of web page code between two tags

Hi everybody, I am trying to remove a bunch of lines from web pages between two tags: one is <h1> and the other is <table it looks like <h1>Anniversary cards roses</h1> many lines here <table summary="Free anniversary greeting cards." cellspacing="8" cellpadding="8" width="70%"> my goal...

9. Shell Programming and Scripting

Removing dupes within 2 delimited areas in a large dictionary file

Hello, I have a very large dictionary file which is in text format and which contains a large number of sub-sections. Each sub-section starts with the following header : #DATA #VALID 1 and ends with a footer as shown below #END The data between the Header and the Footer consists of...

10. Shell Programming and Scripting

Modify script to remove dupes with two delimiters

Hello, I have a script which removes duplicates in a database with a single delimiter = The script is given below: # script to remove dupes from a row with structure word=word BEGIN{FS="="} {for(i=1;i<=NF;i++){a[$i]++;}for(i in a){b=b"="i}{sub("=","",b);$0=b;b="";delete a}}1 How do I modify...

LEARN ABOUT ULTRIX

uuencode

uuencode(5) File Formats Manual uuencode(5)

Name
uuencode - format of an encoded uuencode file

Description
Files output by uuencode consist of a header line, followed by a number of body lines, and a trailer line. The uudecode command ignores any
lines preceding the header or following the trailer. Lines preceding a header must not, of course, look like a header.

The header line is distinguished by its first six characters: the word ``begin'' followed by a space. The next items on the line are a
mode (in octal) and a string which names the remote file. A single space separates the three items in the header line.
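
As an illustration of the header layout described above, here is a minimal Python sketch that splits a header line into its mode and file name. The function name `parse_header` and the sample file name are assumptions made for this example; they are not part of uuencode itself.

```python
def parse_header(line):
    """Split a uuencode header line into (mode, name).

    Hypothetical helper: the header is "begin", a space, an octal mode,
    a space, and the name of the remote file.
    """
    line = line.rstrip("\n")
    if not line.startswith("begin "):
        raise ValueError("not a uuencode header line")
    # Split on the first two spaces only, so a name containing spaces survives.
    _, mode_text, name = line.split(" ", 2)
    # The mode is written in octal.
    return int(mode_text, 8), name


mode, name = parse_header("begin 644 report.txt\n")
print(oct(mode), name)
```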

The body consists of a number of lines, each at most 62 characters long including the trailing newline. These consist of a character
count, followed by encoded characters, followed by a newline. The character count is a single printing character and represents an
integer, the number of bytes the rest of the line represents. Such integers are always in the range from 0 to 63 and can be determined by
subtracting the character space (octal 40) from the character.
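
The count character can be read back with a single subtraction. The Python sketch below assumes a hypothetical helper name, `line_byte_count`, and only checks the range stated above.

```python
def line_byte_count(body_line):
    """Return the number of bytes a body line represents.

    Hypothetical helper: the first character of the line is the count,
    stored as the integer plus the space character (octal 40).
    """
    count = ord(body_line[0]) - 0o40
    if not 0 <= count <= 63:
        raise ValueError("count character out of range")
    return count


# "M" is octal 115, and 115 - 40 (octal) is 45 decimal: a full-length line.
print(line_byte_count("M"))
# A line holding a single space has a count of zero.
print(line_byte_count(" \n"))
```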

Groups of 3 bytes are stored in 4 characters, with 6 bits per character. All are offset by a space to make the characters print. The last
line may be shorter than the normal 45 bytes. If the size is not a multiple of 3, this fact can be determined by the value of the count on
the last line. Extra dummy characters are included to make the character count a multiple of 4. The body is terminated by a line with a
count of zero. This line consists of one ASCII space.
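
The 3-bytes-to-4-characters packing can be sketched in Python as follows. The names `encode_group` and `encode_line` are hypothetical, and padding a short final group with zero bytes is an assumption about what the "dummy characters" contain; the description above only requires that they be present.

```python
def encode_group(three_bytes):
    """Encode exactly 3 bytes as 4 printable characters, 6 bits each."""
    b0, b1, b2 = three_bytes
    sextets = (
        b0 >> 2,
        ((b0 & 0x03) << 4) | (b1 >> 4),
        ((b1 & 0x0F) << 2) | (b2 >> 6),
        b2 & 0x3F,
    )
    # Each 6-bit value is offset by a space (octal 40) to make it print.
    return "".join(chr(0o40 + s) for s in sextets)


def encode_line(chunk):
    """Encode up to 45 bytes as one body line, count character first."""
    if len(chunk) > 45:
        raise ValueError("a body line holds at most 45 bytes")
    # Assumption: pad with zero bytes so the data splits into whole groups.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    groups = (padded[i:i + 3] for i in range(0, len(padded), 3))
    return chr(0o40 + len(chunk)) + "".join(encode_group(g) for g in groups) + "\n"


print(repr(encode_line(b"Cat")))  # '#0V%T\n'
```

A full 45-byte chunk yields 1 count character, 60 encoded characters, and a newline, which matches the 62-character limit given for body lines. An empty chunk yields the terminating line: a single space followed by a newline.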

The trailer line consists of "end" on a line by itself.
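
Putting the pieces together, a decoder might look like the Python sketch below. The function name `decode_uu` is hypothetical, and treating an empty line as a zero count (in case a trailing space was stripped in transit) is an assumption of this sketch, not something the format description states.

```python
def decode_uu(text):
    """Return (mode, name, data) for the first encoded file found in text."""
    lines = text.splitlines()
    # Lines preceding the header are ignored.
    start = next((i for i, l in enumerate(lines) if l.startswith("begin ")), None)
    if start is None:
        raise ValueError("missing begin header")
    _, mode_text, name = lines[start].split(" ", 2)
    data = bytearray()
    for line in lines[start + 1:]:
        if line == "end":
            # Trailer reached: anything following it is ignored.
            return int(mode_text, 8), name, bytes(data)
        # Assumption: an empty line is a zero-count line whose space was lost.
        count = ord(line[0]) - 0o40 if line else 0
        # Four characters per three bytes, rounded up to whole groups.
        n_chars = -(-count // 3) * 4
        sextets = [ord(c) - 0o40 for c in line[1:1 + n_chars].ljust(n_chars)]
        decoded = bytearray()
        for i in range(0, n_chars, 4):
            a, b, c, d = sextets[i:i + 4]
            decoded += bytes((
                (a << 2 | b >> 4) & 0xFF,
                (b << 4 | c >> 2) & 0xFF,
                (c << 6 | d) & 0xFF,
            ))
        # Drop the dummy padding: keep only the bytes the count promises.
        data += decoded[:count]
    raise ValueError("missing end trailer")


sample = "mail header\nbegin 644 cat.txt\n#0V%T\n \nend\nsignature\n"
print(decode_uu(sample))
```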

See Also
mail(1), uucp(1c), uudecode(1c), uuencode(1c), uusend(1c)

uuencode(5)