Find and remove rows from a file where multiple occurrences of a character are found


 
Forum: UNIX for Dummies Questions & Answers
Post #1 (11-28-2008)

I have a '~'-delimited file of 6-7 million rows. Each row should contain 13 columns separated by 12 tildes. Rows with 13 tildes need to be removed. The data is alphanumeric, and occasionally a ~ ends up inside a descriptive field, where it acts as an extra delimiter and makes the row look as if it has 14 columns instead of 13. I have tried a combination of grep and awk, but it runs very slowly. I suspect the problem is the way I am using them.
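Before filtering, it can help to confirm the 12-tilde rule by tallying how many rows carry each tilde count. This is a minimal sketch, assuming a POSIX awk and sort; `tilde_histogram` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print "<tilde count> <number of rows>" for every
# tilde count seen, so rows that deviate from 12 tildes stand out.
tilde_histogram() {
    # With -F'~', NF is the column count, so NF - 1 is the tilde count.
    # An empty line has NF == 0 (and zero tildes), hence the guard.
    awk -F'~' '{ count[NF ? NF - 1 : 0]++ }
               END { for (n in count) print n, count[n] }' "$@" | sort -n
}

# Usage: tilde_histogram inputfile
```

On clean data you would expect a single line starting with 12; any other line gives the number of rows that will be affected.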

I tried this to print the bad rows, with line numbers, to a file:
grep -n '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile | awk '{print}' > outputfile
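awk can do this in one pass without a regular expression, by splitting each row on `~` and checking the column count. This is a sketch, assuming a bad row is any row with more than 13 columns (13 or more tildes), which is what the grep pattern above selects; `bad_rows` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print rows with 14 or more columns (13+ tildes),
# each prefixed with its line number in the same "N:row" form as grep -n.
bad_rows() {
    # -F'~' makes NF the column count; NR is the current line number.
    awk -F'~' 'NF > 13 { print NR ":" $0 }' "$@"
}

# Usage: bad_rows inputfile > outputfile
```

Changing `NF > 13` to `NF != 13` would also report rows that are missing a column, which the original grep does not catch.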

I also tried this to create a file containing only the good rows:

grep -v '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile > outputfile
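The complementary filter, keeping only the good rows, is the same awk column test inverted. A sketch, assuming the goal is to keep rows with 13 or fewer columns, i.e. the same rows `grep -v` keeps; `good_rows` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep rows with at most 13 columns (at most 12 tildes).
# A pattern with no action prints the matching line unchanged.
good_rows() {
    awk -F'~' 'NF <= 13' "$@"
}

# Usage: good_rows inputfile > outputfile
```

Using `NF == 13` instead would keep only rows with exactly 13 columns, which is stricter than the original grep.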

Both are extremely slow. The input file is approximately 800 MB.
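To stay with grep, a likely culprit is the pattern itself: each `.*` can also swallow tildes, so a backtracking matcher may try many ways to place the 13 tildes on every line. A negated bracket expression removes that ambiguity. This sketch assumes a grep that supports `-E` with interval expressions (POSIX ERE does). Setting `LC_ALL=C` assumes byte-wise matching is acceptable for this data; it can speed up some grep versions in UTF-8 locales. `drop_tilde_overflow` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: same filter as the grep -v attempt, but each group is
# "[^~]*~" instead of ".*~". Because [^~]* cannot consume a tilde, there is
# only one way for the pattern to match, so nothing needs to be retried.
# The pattern matches any line with 13 or more tildes; -v drops those lines.
drop_tilde_overflow() {
    LC_ALL=C grep -v -E '^([^~]*~){13}' "$@"
}

# Usage: drop_tilde_overflow inputfile > outputfile
```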

thanks
Post #2 (11-28-2008)
No duplicate posting or cross-posting; please read the rules.

Proceed here:

https://www.unix.com/unix-advanced-ex...#post302262764

Thread closed.
 