Post 302262764 by kpd on Friday 28th of November 2008 02:59:02 PM
remove lines from file where > 13 occurrences of character

I have a '~'-delimited file of 6 to 7 million rows. Each row should contain 13 columns delimited by 12 tildes. Any row with 13 or more tildes needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field and therefore acts as a delimiter, so the row appears to have 14 columns instead of 13. I have tried a combination of grep and awk, but it runs very slowly. I suspect the problem is the way I am using them.

I tried this to print the bad rows, with line numbers, to a file:
grep -n '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile | awk {print} > outputfile

I also tried this to create a file containing only the good rows:

grep -v '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile > outputfile

Both are extremely slow. The input file is approximately 800 MB.

thanks
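
One possible approach, offered only as a sketch and not as a reply from the thread: count fields instead of matching a chain of `.*` groups. Each `.*` can match tildes as well as data, so the pattern above may force the regex engine to retry many split points on every row that does not match. Splitting on `~` with awk, or anchoring the pattern and using `[^~]*`, avoids that. The sample rows and the file names `goodrows`, `badrows`, and `goodrows_grep` are hypothetical. `LC_ALL=C` is an assumption that the data can be treated as plain bytes.

```shell
#!/bin/sh
# Hypothetical sample data: the second row has a stray '~' in a
# descriptive field, giving it 14 fields (13 tildes) instead of 13.
tmpdir=$(mktemp -d)
in="$tmpdir/inputfile"
good="$tmpdir/goodrows"
bad="$tmpdir/badrows"
good_grep="$tmpdir/goodrows_grep"

printf '%s\n' \
  'a~b~c~d~e~f~g~h~i~j~k~l~m' \
  'a~b~c~d~e~f~de~sc~g~h~i~j~k~l' \
  'n~o~p~q~r~s~t~u~v~w~x~y~z' > "$in"

# Keep rows with exactly 13 fields (12 tildes). Unlike the grep -v
# attempt, this also drops rows with too few fields; use NF <= 13
# to keep those as well.
LC_ALL=C awk -F'~' 'NF == 13' "$in" > "$good"

# List rows with 14 or more fields, prefixed by their line number,
# in the same "number:line" form that grep -n prints.
LC_ALL=C awk -F'~' 'NF > 13 { print NR ":" $0 }' "$in" > "$bad"

# grep alternative: anchor at the start of the line and let each group
# match only non-tilde characters followed by one tilde, 13 times.
LC_ALL=C grep -vE '^([^~]*~){13}' "$in" > "$good_grep"
```

On the real data, replace the `printf` step with the actual input file. Whether this is fast enough on an 800 MB file depends on the awk or grep implementation in use, so it is worth timing on a sample first.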
 
