Deleting all but a regex using sed, tr, cut etc


 
# 1  
Old 08-28-2011

Hi guys, this is my first post, though I've been looking around the forums for a while trying to find a solution to my problem. I want to take several lines of text (the output from get-iplayer and ls, to be precise) and keep only the alphanumeric code in each line. For example:

Quote:
Originally Posted by get-iplayer
b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War
b0146h0q: Doctor Who: Series 6 - 8. Let's Kill Hitler
and also

Quote:
Originally Posted by ls
Doctor_Who_Confidential_Series_5_-_8._After_Effects_b00sj9qj_default.flv
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv
All I want to do is extract the codes (e.g. b011rf7y, b00sj9qj), so I can compare them.

I've tried using sed, tr, and cut, but I can't seem to get the right output (though I can do plenty of other neat things). Which of these should I be using? I think my regex is
Code:
'((?:[a-z][a-z]*[0-9]+[a-z0-9]*))'

but I'm not sure (from txt2re.com by the way)
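
A note on that pattern: `(?:...)` is Perl-style non-capturing group syntax, which POSIX regular expressions (the kind sed and grep use by default) do not have, and tr and cut do not take regexes at all. A minimal sketch of a translation into extended regex (ERE) form, assuming a grep that supports `-o`; the variable names are made up for illustration:

```shell
# ERE translation of the txt2re pattern: the (?:...) wrapper is dropped
# and [a-z][a-z]* is written as the equivalent [a-z]+.
pattern='[a-z]+[0-9]+[a-z0-9]*'

# One sample file name taken from the ls listing above.
name='Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv'

# -E selects extended regex syntax, -o prints only the matched text.
found=$(printf '%s\n' "$name" | grep -E -o "$pattern")
echo "$found"
```

The pattern requires lowercase letters immediately followed by a digit, so plain words in the title are skipped.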
# 2  
Old 08-28-2011
Code:
[ZSH-4.3.11] tmp % cat testlist
b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War
b0146h0q: Doctor Who: Series 6 - 8. Let's Kill Hitler
Doctor_Who_Confidential_Series_5_-_8._After_Effects_b00sj9qj_default.flv
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv


[ZSH-4.3.11] tmp % cut -d : -f 1 -s  testlist 
b011rf7y
b0146h0q


[ZSH-4.3.11] tmp % cut -d : -f 2-3  testlist  
 Doctor Who: Series 6 - 7. A Good Man Goes to War
 Doctor Who: Series 6 - 8. Let's Kill Hitler
Doctor_Who_Confidential_Series_5_-_8._After_Effects_b00sj9qj_default.flv
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv

#OR

[ZSH-4.3.11] tmp % cut -d : -f 2-3  testlist | sed 's/^ //g; s/://g; s/ /_/g'
Doctor_Who_Series_6_-_7._A_Good_Man_Goes_to_War
Doctor_Who_Series_6_-_8._Let's_Kill_Hitler
Doctor_Who_Confidential_Series_5_-_8._After_Effects_b00sj9qj_default.flv
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv


Last edited by xbin; 08-28-2011 at 11:40 AM..
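
The cut calls above split each line on the colon. A small sketch of what the individual options do, using two lines from the sample data; the variable names are only for illustration:

```shell
# A get-iplayer style line: the code sits before the first colon.
line='b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War'

# -d : sets the field delimiter, -f 1 selects the first field,
# -s suppresses lines that contain no delimiter at all.
code=$(printf '%s\n' "$line" | cut -d : -f 1 -s)
echo "$code"

# An ls style line has no colon, so with -s cut prints nothing for it.
name='Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv'
skipped=$(printf '%s\n' "$name" | cut -d : -f 1 -s)
echo "skipped='$skipped'"
```

This is why cut alone only handles the get-iplayer lines: in the file names the code is not in a fixed colon-delimited field.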
# 3  
Old 08-28-2011
Maybe this would be enough (GNU grep):
Code:
egrep  -o 'b[0-9a-z]{7}'
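
A commented sketch of how that one-liner works, written with `grep -E` (the spelling that newer GNU grep versions prefer over `egrep`). The helper name `extract_pids` and the temporary sample file are assumptions for the demo, not part of get-iplayer:

```shell
# Hypothetical sample file holding one line of each kind from the thread.
sample=$(mktemp)
cat >"$sample" <<'EOF'
b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
EOF

# -E enables extended regex syntax, so {7} needs no backslashes.
# -o prints only the matched text, one match per output line.
# The pattern means: a literal b followed by exactly seven
# lowercase letters or digits.
extract_pids() {
    grep -E -o 'b[0-9a-z]{7}' "$1"
}

extract_pids "$sample"
```

One caveat: the pattern would also match an ordinary lowercase word such as "blackbird" if one appeared in a title, so it relies on the codes being the only such strings in the input.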

# 4  
Old 08-28-2011
Thanks Yazu, I'll give it a go. I think I can see what's happening.

Xbin, I'm really struggling to see what you've done...
# 5  
Old 08-28-2011
So does it work for you or not?
Code:
$ cat >INPUTFILE
b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War
b0146h0q: Doctor Who: Series 6 - 8. Let's Kill Hitler
Doctor_Who_Confidential_Series_5_-_8._After_Effects_b00sj9qj_default.flv
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv

$  egrep  -o 'b[0-9a-z]{7}' INPUTFILE 
b011rf7y
b0146h0q
b00sj9qj
b00sj9sq
b010tb7q

===

Oh, yes. I misread the part about struggling... OK, that wasn't addressed to me.
# 6  
Old 08-28-2011
Yazu, It works perfectly! thanks very much!
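
Since the stated goal was to compare the two sets of codes, here is one possible sketch using sort and comm. The file names and the temporary directory are hypothetical, and the third get-iplayer style line is made up from the file listing purely so that the two lists overlap:

```shell
# Hypothetical inputs: saved output of get-iplayer and of ls.
workdir=$(mktemp -d)
cat >"$workdir/available.txt" <<'EOF'
b011rf7y: Doctor Who: Series 6 - 7. A Good Man Goes to War
b0146h0q: Doctor Who: Series 6 - 8. Let's Kill Hitler
b010tb7q: Doctor Who: Series 6 - 1. The Impossible Astronaut
EOF
cat >"$workdir/downloaded.txt" <<'EOF'
Doctor_Who_Series_5_-_8._The_Hungry_Earth_b00sj9sq_default.flv
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv
EOF

# comm needs sorted input; sort -u also drops duplicate codes.
grep -E -o 'b[0-9a-z]{7}' "$workdir/available.txt" | sort -u >"$workdir/available.codes"
grep -E -o 'b[0-9a-z]{7}' "$workdir/downloaded.txt" | sort -u >"$workdir/downloaded.codes"

# -23 hides column 2 (only in the second file) and column 3 (in both),
# leaving the codes that are available but not yet downloaded.
missing=$(comm -23 "$workdir/available.codes" "$workdir/downloaded.codes")
echo "$missing"
```

Swapping `-23` for `-12` would list the codes present in both files instead.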

---------- Post updated at 06:47 PM ---------- Previous update was at 05:44 PM ----------

OK, the egrep line works, but I'd like it to change the file I run it on, not just print to the terminal.

Code:
egrep  -o 'b[0-9a-z]{7}' ./FILENAME > ./FILENAME

That just wipes all lines from the file. I can't see anything relevant in the man page either.

Advice?
# 7  
Old 08-28-2011
Just write the output to another file.
Code:
egrep  -o 'b[0-9a-z]{7}' FILENAME >FILENAME.codes

And then, if you want:
Code:
rm FILENAME
mv FILENAME.codes FILENAME
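
Some background on why the earlier attempt emptied the file: the shell sets up the `> ./FILENAME` redirection, truncating the file, before egrep starts reading it. The two-step approach can be wrapped in a small function. This is only a sketch: `keep_codes_only` is a hypothetical name, and it assumes a mktemp that accepts a template argument:

```shell
# Replace a file's contents with just the codes found in it.
keep_codes_only() {
    file=$1
    # Create the temporary file next to the original so that mv is a
    # plain rename on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # grep exits non-zero when nothing matches; in that case leave the
    # original untouched and remove the temporary file.
    if grep -E -o 'b[0-9a-z]{7}' "$file" >"$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file.
target=$(mktemp)
printf '%s\n' 'b011rf7y: Doctor Who' 'foo_b00sj9sq_default.flv' >"$target"
keep_codes_only "$target"
cat "$target"
```

Note that mv replaces an existing destination, so a separate rm is not strictly required, and that the result carries the temporary file's permissions rather than the original's.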

It's possible to do it in one step with sed or perl, but it's not safe.
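
For completeness, a hedged sketch of what the one-step sed variant might look like. It assumes a sed that supports `-E` and `-i` with an attached backup suffix (GNU sed does). The risk is presumably that a wrong pattern overwrites the original, which is why this version keeps a `.bak` copy. The demo file is hypothetical:

```shell
# Throwaway demo file with one line of each kind.
demo=$(mktemp)
cat >"$demo" <<'EOF'
b0146h0q: Doctor Who: Series 6 - 8. Let's Kill Hitler
Doctor_Who_Series_6_-_1._The_Impossible_Astronaut_b010tb7q_default.flv
EOF

# -n suppresses the default printing of every line; the trailing p
# prints only lines where the substitution succeeded.
# -E selects extended regex syntax.
# -i.bak edits the file in place and keeps the original as "$demo.bak".
# The substitution replaces the whole line with the captured code.
sed -n -E -i.bak 's/.*(b[0-9a-z]{7}).*/\1/p' "$demo"
cat "$demo"
```

Unlike `grep -o`, this keeps at most one code per line (the last one, because the leading `.*` is greedy), which is fine for the sample data but worth knowing.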
 