How to extract text from string using regular expressions


# 1  
Old 10-16-2008

Hi,

I'm trying to use sed to extract some text and assign it to a variable.

Can anyone provide me with some help? It would be much appreciated!

I'm looking to extract, for example:

filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt

I'm trying to extract the 1042 from that file name.

Again any help would be appreciated!
# 2  
Old 10-16-2008
Unfortunately, you did not tell us which shell you are using or whether the file names follow a regular pattern.

Here are two ways of doing it using ksh93. If the digits are always at the same position in the string, the first way is easier.
Code:
#!/usr/bin/ksh93

filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"
out=${filename:25:4}
print $out

out=${filename/*([[:print:]])(_[[:alpha:]])({4}([[:digit:]]))([[:alpha:]]_)*([[:print:]])/\3}
print $out

Both return 1042
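
If your shell does not support the ${var:offset:length} form, the same fixed-position idea can be sketched with cut, which is part of POSIX. This is only an illustration and carries the same assumption as the first method above: the digits always sit at the same position in the string.

```shell
filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"

# cut numbers characters from 1, so the zero-based offset 25 used above
# becomes column 26, and a length of 4 ends at column 29
out=$(printf '%s\n' "$filename" | cut -c26-29)
echo "$out"
```

This spawns an extra process, so it is slower than a pure parameter expansion inside a loop.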
# 3  
Old 10-16-2008
Quote:
Originally Posted by jtung
I'm looking to extract, for example:

filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt

I'm trying to extract the 1042 from that file name.

What are the criteria for determining which part of the string you want?

Do you want the digits from the fourth field, using underscore as the field delimiter?

Do you want whatever follows _C up to S_?

BTW, you don't want to use sed to work on a string.

Use sed for working on files and shell parameter expansion for manipulating strings.
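
To make the two readings above concrete, here is a rough sketch that uses only POSIX parameter expansion. The variable names (base, rest, field4, lead, digits, after, between) are made up for the illustration, and which reading is right depends on what the file names really look like.

```shell
filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"

# Reading 1: the digits in the fourth underscore-delimited field
base=${filename##*/}          # strip the directory part
rest=${base#*_*_*_}           # drop the first three fields
field4=${rest%%_*}            # keep everything up to the next underscore
lead=${field4%%[0-9]*}        # the non-digit characters before the first digit
tmp=${field4#"$lead"}         # remove them
digits=${tmp%%[!0-9]*}        # cut at the first non-digit
echo "$digits"

# Reading 2: whatever lies between the first _C and the following S_
after=${filename#*_C}
between=${after%%S_*}
echo "$between"
```

For the sample name both readings give the same answer, but they will differ as soon as the field order or the surrounding letters change.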
# 4  
Old 10-16-2008
Sorry for not being more specific.

I have a file name from which I want to extract the four digits that follow the _C.
Everything before and after that should be stripped off.

I tried using ${filename:X:Y}, but I don't think I'm on the latest ksh.

I was reading on the internet that I could use sed to accomplish this.

If there is another way to do this, please let me know.

Thanks in advance!
# 5  
Old 10-16-2008
To answer the first question: the pattern is not regular.
# 6  
Old 10-16-2008

Code:
filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt
temp=${filename#*_C}
num=${temp%%[!0-9]*}
echo "$num"
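
The first expansion above removes the shortest leading match of *_C, and the second removes everything from the first non-digit onward, so only the leading digits remain. If the result must be exactly four digits, a check can be added with case. The sketch below wraps this in a function; extract_code is a hypothetical name, and it assumes the first _C in the string is the one you care about.

```shell
# extract_code is a hypothetical helper name, not something from the thread
extract_code() {
    temp=${1#*_C}            # drop everything up to and including the first _C
    num=${temp%%[!0-9]*}     # keep only the leading run of digits
    case $num in
        [0-9][0-9][0-9][0-9]) printf '%s\n' "$num" ;;   # exactly four digits
        *) return 1 ;;                                  # anything else is a failure
    esac
}

filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt
if num=$(extract_code "$filename"); then
    echo "$num"
else
    echo "no four-digit run after _C in: $filename" >&2
fi
```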

# 7  
Old 10-16-2008
Quote:
Originally Posted by jtung
Sorry for not being more specific.

I have a file name from which I want to extract the four digits that follow the _C.
Everything before and after that should be stripped off.

I tried using ${filename:X:Y}, but I don't think I'm on the latest ksh.

I was reading on the internet that I could use sed to accomplish this.

If there is another way to do this, please let me know.

Thanks in advance!
Code:
# filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"
# echo "$filename" | sed 's/.*_C\(....\)S_.*/\1/'
1042
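
Since the original question was about assigning the result to a variable, here is one possible variation on the sed command above. It is a sketch, not the only way to do it: it restricts the capture to four digits and uses -n with the p flag, so a name that does not match produces an empty result instead of the whole string.

```shell
filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"

# -n suppresses the default output; the trailing p prints only when the
# substitution succeeded. [0-9]\{4\} matches exactly four digits after _C.
num=$(printf '%s\n' "$filename" | sed -n 's/.*_C\([0-9]\{4\}\).*/\1/p')

if [ -n "$num" ]; then
    echo "$num"
else
    echo "no match" >&2
fi
```

Because .* is greedy, sed uses the last _C that is followed by four digits; in the sample name that is still _C1042, since the later _CRFTXT is not followed by digits.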
