delete lines matching a regular expression


 
# 1  
Old 03-05-2012

I have a very large file (over 700 million lines) that has some lines that I need to delete. An example of 5 lines of the file:

HS4_80:8:2303:19153:193032 153 k80:138891 [...]
HS4_80:8:2105:5544:43174 89 k88:81949 [...]
165 k88:81949 323 0 * = 323 0 [...]
HS4_80:8:2105:5544:19502 73 k64:351700 [...]
HS4_80:8:2303:19154:108202 137 k72:245019 [...]

Two questions:
1. How can I confirm that the first character in line 3 above (beginning with 165) is a tab, so that I can use that as a regular expression, i.e. delete all lines that begin with a tab? (Not all the lines I want to keep begin with HS4, nor do all the lines I want to delete begin with 165.)

2. Assuming I find the right regular expression, how do I create a new file that is exactly the same, minus the lines I want to delete (like line 3 above)?

I'm a biologist who is new to programming. I know a lot more than I did a few weeks ago but it's slow going...so thanks for any help!
# 2  
Old 03-05-2012
Hi pathunkathunk,

This sed command deletes all lines that begin with a tab character, which I think is what you want, if I understood your question. It should also be very fast on a file that big.
Code:
$ sed '/^\t/ d' infile

EDIT to add the command that redirects output:
Code:
$ sed '/^\t/ d' infile >outfile
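
A quick sanity check after running it, sketched on a tiny made-up sample (sample.txt and sample.out are placeholder names, not the poster's files): the lines kept plus the lines that begin with a tab should add up to the lines in the original. This assumes a sed that understands \t, such as GNU sed; the tab for grep is generated with printf so the count does not depend on that.

```shell
# Hypothetical sample: lines 2 and 4 start with a real tab.
printf 'abc1\n\tabc2\nab3\n\tabc4\nabc5\n' > sample.txt

# Same deletion as above (assumes sed understands \t, e.g. GNU sed).
sed '/^\t/ d' sample.txt > sample.out

tab=$(printf '\t')                        # a literal tab character
total=$(wc -l < sample.txt)               # lines in the original
kept=$(wc -l < sample.out)                # lines in the new file
dropped=$(grep -c "^${tab}" sample.txt)   # lines that begin with a tab

# If only tab-led lines were removed, kept + dropped equals total.
if [ "$((kept + dropped))" -eq "$total" ]; then
    echo "counts match"
else
    echo "counts do not match: the pattern may not be matching tabs"
fi
```

If the counts do not match and the output file is identical to the input, the sed in use probably does not treat \t as a tab.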

# 3  
Old 03-05-2012
Thanks but this doesn't seem to work for me.

I tried it on my large file and did not notice a difference. Then I created a smaller test file of only 6 lines, ensuring that I used the tab key to indent every other line. Still, the output file was identical to the input.

command: sed '/^\t/ d' test2 >test2b

#test2
abc1
abc2
ab3
abc4
abc5

#test2b
abc1
abc2
ab3
abc4
abc5
# 4  
Old 03-06-2012
Then I don't know what you want to achieve.

The regular expression in my previous post deletes all lines that begin with a tab character. But I don't see any such lines in your files as posted: every line begins with a letter, with no leading whitespace.
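
To settle whether a line really starts with a tab, one option is to make the invisible characters visible. A minimal sketch, assuming a made-up three-line file tabcheck.txt (not the poster's data): sed's l command prints a tab as \t and marks the end of the line with $, and od -c dumps each byte. For the real file, replace 2 with the number of the line in question; the q makes sed stop there instead of reading the remaining lines of a very large file.

```shell
# Hypothetical sample whose second line starts with a real tab.
printf 'abc1\n\tabc2\nab3\n' > tabcheck.txt

# Print line 2 unambiguously, then quit: a leading tab shows up as \t.
sed -n '2{l;q;}' tabcheck.txt      # prints: \tabc2$

# Alternative: dump the bytes of line 2; a tab is also shown as \t.
sed -n '2{p;q;}' tabcheck.txt | od -c
```

If the line starts with spaces instead, l shows them as plain spaces and od -c shows no \t at the front.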
# 5  
Old 03-06-2012
With older seds you cannot use "\t", and you need to do this instead:
Code:
sed '/^    /d' infile

You cannot copy-paste this: the white space between ^ and / must be a single TAB character, entered as CTRL-V TAB.
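
If typing CTRL-V TAB is awkward, another way to get a real tab into the pattern is to let printf generate it and keep it in a shell variable. A sketch, assuming a POSIX shell; the variable name tab and the files demo.txt and demo.out are made up for illustration.

```shell
# Hypothetical sample: line 2 starts with a real tab.
printf 'abc1\n\tabc2\nab3\n' > demo.txt

# printf turns \t into an actual tab character.
tab=$(printf '\t')

# Double quotes let the shell insert the tab into the sed pattern.
sed "/^${tab}/d" demo.txt > demo.out

cat demo.out    # shows abc1 and ab3; the tab-led line is gone
```

This one can be copy-pasted, because the tab is produced at run time rather than stored in the command text.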
# 6  
Old 03-06-2012
With perl, editing the file in place. The original file is retained as inputfile.bak.
Code:
perl -i.bak -ne '!/^\t/ && print' inputfile

# 7  
Old 03-06-2012
Code:
awk '!/^\t/' infile

Code:
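grep -v "$(printf '^\t')" infile

(grep does not interpret \t in a pattern as a tab, so printf supplies the literal tab here.)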

 