11-28-2008
Remove lines from file where a character occurs 13 or more times
I have a '~'-delimited file of 6 to 7 million rows. Each row should contain 13 columns delimited by 12 ~'s. Where there are 13 tildes, the row needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field, where it acts as a delimiter and makes the row look like it has 14 columns instead of 13. I have tried a combination of grep and awk, but it runs very slowly. I suspect it is the way I am using them.
I tried this to print the bad rows, with line numbers, to a file:
grep -n '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile | awk '{print}' > outputfile
I also tried this to create a file with only the good rows in it:
grep -v '~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~.*~' inputfile > outputfile
Both are extremely slow. The input file is approx. 800 MB.
thanks
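One possible approach, offered as a sketch rather than a tested fix for this exact file: the chain of `.*` in the grep pattern can match tildes as well as other characters, so the regex engine likely has to consider many ways of splitting each line. Counting fields with awk avoids that. With `~` as the field separator, a row with 12 tildes has 13 fields, so any row with more than 13 fields is one the grep pattern above would have matched. The function name `filter_rows` and the output file names are placeholders, not anything from the original post.

```shell
# Sketch only: filter_rows and its arguments are hypothetical names.
# $1 = input file
# $2 = output file for good rows (12 or fewer tildes, same as grep -v above)
# $3 = output file for bad rows, prefixed with their line numbers (like grep -n)
filter_rows() {
    awk -F'~' -v good="$2" -v bad="$3" '
        # More than 13 fields means 13 or more tildes: log it with its line number.
        NF > 13 { printf "%d:%s\n", NR, $0 > bad; next }
        # Everything else is kept unchanged.
        { print > good }
    ' "$1"
}
```

Called as `filter_rows inputfile good_rows bad_rows`, this reads the input once and writes both files in the same pass, instead of running grep twice. If rows with fewer than 13 columns should also be rejected, the first condition could be changed to `NF != 13`; that is an assumption about the data, not something stated above.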