remove lines from file where > 13 occurrences of character


 
# 1  
Old 11-28-2008

I have a '~'-delimited file of 6 to 7 million rows. Each row should contain 13 columns delimited by 12 tildes. Where there are 13 or more tildes, the row needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field and therefore acts as a delimiter, so the row looks like it has 14 columns instead of 13. I have tried a combination of grep and awk, but it runs very slowly. I suspect it is the way I am using it.

I tried this to print the bad rows with line numbers to a file:
grep -n '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile | awk '{print}' > outputfile

I also tried this to create a file with only the good rows in it:

grep -v '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile > outputfile

Both are extremely slow. The input file is approximately 800 MB.

thanks
# 2  
Old 11-28-2008
Try this with awk:

Code:
awk -F~ 'NF==13' infile > outfile
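
A short illustration of why this one-liner does the job, as far as I can tell: with -F~ awk splits each line on the tilde, and NF holds the number of fields, which is the number of tildes plus one. So NF==13 keeps only the rows with exactly 12 tildes. The sample rows and the temporary directory below are made up for the demonstration; they are not from the original poster's data.

```shell
# Hypothetical sample data (file names and contents are invented for illustration).
demo_dir=$(mktemp -d)
{
  printf 'a~b~c~d~e~f~g~h~i~j~k~l~m\n'     # 12 tildes, 13 fields (good row)
  printf 'a~b~c~d~e~f~g~h~i~j~k~l~m~n\n'   # 13 tildes, 14 fields (bad row)
} > "$demo_dir/infile"

# -F~ sets the field separator; a pattern with no action prints the matching lines.
awk -F~ 'NF==13' "$demo_dir/infile" > "$demo_dir/outfile"

# Print the field count in front of each input line (13 for the first, 14 for the second).
awk -F~ '{ print NF ": " $0 }' "$demo_dir/infile"
```

Because awk only has to split each line once, this avoids the heavy backtracking that a long chain of `.*` in a grep pattern can cause, which is a likely reason for the speed difference.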

Regards
# 3  
Old 12-01-2008

Thank you very much. It works perfectly, and it is very fast too. It only took a couple of minutes.
# 4  
Old 02-10-2009
The awk statement works great, thanks. I am now looking for a variation of it. The current statement eliminates all rows that do not have 13 occurrences but does not trap/write them, so now I would like to find all rows that do not have 13 occurrences of the delimiter.

thanks
# 5  
Old 02-10-2009
Try the following:

awk -F~ 'NF!=13' infile > badfile
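
If both files are wanted, the two filters can probably be combined so the 800 MB input is read only once. The sketch below is one way to do that, not something posted in this thread: it writes the good rows to one file and the bad rows, prefixed with their line numbers as in the original grep -n attempt, to another. The names goodfile and badfile, the work_dir directory, and the sample rows are all assumptions for the demonstration.

```shell
# Hypothetical sample data: rows 1 and 3 have 13 fields, row 2 has 14.
work_dir=$(mktemp -d)
{
  printf 'a~b~c~d~e~f~g~h~i~j~k~l~m\n'
  printf 'a~b~c~d~e~f~g~h~i~j~k~l~m~n\n'
  printf 'n~o~p~q~r~s~t~u~v~w~x~y~z\n'
} > "$work_dir/infile"

# One pass over the input: good rows go to goodfile unchanged,
# everything else goes to badfile with the input line number (NR) in front.
awk -F~ -v good="$work_dir/goodfile" -v bad="$work_dir/badfile" '
  NF == 13 { print > good; next }
           { print NR ":" $0 > bad }
' "$work_dir/infile"
```

After this runs, goodfile holds rows 1 and 3, and badfile holds the single line `2:a~b~c~d~e~f~g~h~i~j~k~l~m~n`.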
# 6  
Old 02-10-2009
Quote:
Originally Posted by kpd
The awk statement works great, thanks. I am now looking for a variation of it. The current statement eliminates all rows that do not have 13 occurrences but does not trap/write them, so now I would like to find all rows that do not have 13 occurrences of the delimiter.

thanks
ok, so what do you think needs to be changed?
Ah, I see a good Samaritan in our midst....

kpd,
How would you implement getting the lines with the number of '~' occurrences between 10 and 17?
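
One possible answer to the exercise, offered as a sketch rather than the answer the poster had in mind: since the number of '~' occurrences on a line is NF-1 when the separator is '~', a range test on NF-1 selects the lines. The generated sample lines and the range_dir directory are invented for the demonstration.

```shell
# Build hypothetical test lines containing 9, 10, 13, 17 and 18 tildes.
range_dir=$(mktemp -d)
for n in 9 10 13 17 18; do
  awk -v n="$n" 'BEGIN { s = "x"; for (i = 0; i < n; i++) s = s "~x"; print s }'
done > "$range_dir/infile"

# Keep lines whose tilde count (NF-1) is between 10 and 17 inclusive.
awk -F~ 'NF-1 >= 10 && NF-1 <= 17' "$range_dir/infile" > "$range_dir/inrange"

# Report the tilde count of each kept line.
awk -F~ '{ print NF-1 }' "$range_dir/inrange" > "$range_dir/counts"
cat "$range_dir/counts"
```

With this sample input the kept lines are the ones with 10, 13 and 17 tildes; the lines with 9 and 18 are dropped.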
# 7  
Old 02-10-2009
Thanks, JoeyG. I appreciate your time. I know it was a dumb question, but we don't have any Unix knowledge here. If it looks Greek and you don't speak Greek, ask a Greek. Maybe we can get vgersh99 to come and hold our hands; he seems like a very patient teacher.